The New Method of Building Apps with AI Agents
Designing a WhatsApp payments app as a state machine where each node is an agent — and why that pattern scales to all sorts of conversational flows.
I began thinking about creating an app with Laravel that allows users to interact primarily with AI agents. I've been exploring various design ideas, and recently, an interesting concept has come to mind. Let me share it with you.
I.Paga Fácil
The idea is that users can make payments in Venezuela without ever leaving WhatsApp. Compared to the current method, it should be easier and faster.
The user writes in natural language that they want to make a new payment, and the service guides them through the payment step by step without ever leaving the chat.
Under the hood, we use the Cobra Fácil API, which allows us to make transfers from one account to another.
We use passwordless login: every time users log in to the platform, they receive a 6-digit code via email. Users must be authenticated to perform actions like creating a payment method or updating their profile.
For now, we only support mobile payment, which is the option allowed by Cobra Fácil.
So, let's start designing this app.
II.General Architecture
The basic flow looks like this:
- Users send a message to WhatsApp, and WhatsApp forwards it to our Laravel app via a webhook call.
- We process the inbound message and hand it off to a Routing System. The routing system decides which Agent should handle the message.
- The agent processes the message — possibly interacting with other agents along the way — and sends a response back to the user through WhatsApp.
We also need a way to store the application's state. For that, we're
going to create conversations and conversation_messages tables.
III.Data Architecture
User. The basic user, defined by Laravel.
Conversation. The conversation model has:
- User ID, because it belongs to a user.
- State, the current state of the conversation. A particular agent handles the conversation based on this state.
- Is Active, to know if the conversation is active. Since we have only one chat, there should be only one active conversation.
- Expires At. If the conversation has been active for more than a certain time, we should expire it.
- Metadata, a JSON column that stores all relevant data collected during the flow.
Conversation Message. This model has:
- Conversation ID, because it belongs to a conversation.
- Role, to define whether it's a message from the user or the assistant.
- Direction, to define whether the message is inbound or outbound.
- Content, the content of the message.
Payment Method. Specifies the bank account from which the user transfers the money.
Now let's take a closer look at the conversation's flow.
IV.Conversation Flow
When users start a new conversation, we route them to Intent Detection, which is managed by the Intent Detection Agent.
Intent Detection Agent. This agent figures out the user's intent and, based on that, routes them to either Payment Data Collection or Customer Service.
We take the inbound message and pass it to an LLM to decipher the user's intent. When the LLM knows what the user wants, it calls a function to transition to one of the allowed states.
How does the LLM know what the allowed states are? We create a
StateGraph class to define all the relations between states — the
edges. The class lets us declare what the initial state of a
conversation is, which state is terminal, what the neighbors of a
specific state are, and whether a state can transition to another.
In this case, we pass the neighbors of Intent Detection (Payment Data Collection and Customer Service) to the LLM. If one of those intents is detected, we transition to the corresponding state. Because we want the next agent to respond to the user, we call it directly.
Otherwise, the LLM keeps asking questions to clarify the user's intent.
Let's say the user wants to make a payment. We'll call the Payment Data Collection Agent next.
Payment Data Collection Agent. This agent gathers all the information needed to make a payment: the amount, the recipient's name, phone number, document ID, and so on. Once it has everything, it calls a function to store the data in the conversation's metadata and then hands off to the next agent.
OTP Capture Agent. This agent reads the data from the previous step and makes the first API call to Cobra Fácil, which asks the user's bank to send a verification code confirming they actually want to initiate a transfer. It also sends a message to the user asking for that code. Then we update the conversation state — but we don't call the next agent yet. We need to wait for the user to send back the OTP.
Payment Execution Agent. With the OTP code and the payment information in hand, this agent makes a second API call to Cobra Fácil to execute the payment. We update the conversation state and tell the user the payment is in progress.
Payment Status Monitoring. We don't handle this state like the others. Instead, a Laravel command monitors the pending payment transaction with Cobra Fácil. Once the transaction is processed, we update the conversation state and send the user a message with the result. The last two states are Payment Failed and Payment Success.
Customer Service Agent. If the user is asking questions about the service instead of making a payment, this agent handles the conversation and answers them.
The conversation automatically expires once the user completes a payment or a specified amount of time passes without any interaction.
V.Conversations as state machines
And that's it. We can handle flows like this by designing a state machine where each node is an agent. The approach is flexible enough to support all sorts of conversation flows, and it keeps each agent small and focused on one job.
Should I turn this into a package to make these kinds of applications easier to build?