m2local — a chat-first search that never reached its users

I. Problem

Argentine real-estate search is a filter form fighting the user.

Listings in Argentina live across a dozen portals, half of them duplicates, none of them sharing a schema. The canonical shape of a property (its operation, its neighborhood, its price band, whether it takes pets) has to be reconstructed every time. The search interfaces on top of that data are rigid dropdowns that demand the user translate fuzzy intent into checkboxes: "algo chico cerca de Palermo, bajo 500k, que acepte mascotas" (something small near Palermo, under 500k, that takes pets) becomes a form the user is not quite sure how to fill in.

I wanted a search experience where the user said the sentence and the system did the translating. And underneath it, a dataset where every listing had been geocoded, deduplicated, image-analyzed, and folded into one schema — so the agent was querying one canonical Property, not a pile of differently-shaped scrapes.

The chat was the surface. The work was upstream, in the pipeline that made the chat's answers worth trusting.

II. Approach

A canonical model, a multi-agent search, two front doors.

The system settled into three layers that had to be built in concert:

An ingestion pipeline that scrapes sources through Firecrawl, geocodes through LocationIQ, runs each image through an ImageAnalysisAgent, and funnels everything into a single Property model. Raw payloads stay on ScrapeRun / Extraction records so anything can be reprocessed when the schema moves.
A chat pipeline where the HTTP request never waits on the LLM. A user message is persisted as pending, returned immediately, and picked up by a Horizon job that invokes a PropertySearchAgent. The agent has a single structured tool — search_properties, strict JSON schema — and a cap of five tool calls per turn. Responses come back in one of three enum-constrained shapes: search_results, clarification, or text.
An MCP server at POST /mcp/properties, Sanctum-authenticated, exposing the same search tool to external LLM clients. Same catalog, two front doors.
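
The chat layer's flow can be sketched roughly as follows. This is a minimal in-memory illustration in Python, not the project's code (the tools named above suggest a Laravel application): the deque stands in for a Horizon queue, `fake_agent_step` is a hypothetical stand-in for the LLM call, and every field name and status value is an assumption. It shows three ideas from the description: the request handler returns before any model call, the worker caps tool calls at five per turn, and the reply kind is limited to three values.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional

MAX_TOOL_CALLS = 5  # cap per turn, as described above


class ReplyKind(str, Enum):
    SEARCH_RESULTS = "search_results"
    CLARIFICATION = "clarification"
    TEXT = "text"


@dataclass
class Message:
    id: int
    text: str
    status: str = "pending"  # hypothetical status values: pending -> done
    reply_kind: Optional[ReplyKind] = None
    reply: object = None


@dataclass
class Chat:
    messages: dict = field(default_factory=dict)
    queue: deque = field(default_factory=deque)  # stand-in for the job queue

    def post_message(self, text: str) -> Message:
        """Request handler: persist as pending, enqueue, return immediately."""
        msg = Message(id=len(self.messages) + 1, text=text)
        self.messages[msg.id] = msg
        self.queue.append(msg.id)
        return msg

    def work_one(self, agent_step: Callable, search: Callable) -> None:
        """Queue worker: run the agent loop for the oldest pending message."""
        msg = self.messages[self.queue.popleft()]
        tool_results = []
        for _ in range(MAX_TOOL_CALLS):
            action = agent_step(msg.text, tool_results)
            if action["type"] != "tool_call":
                break
            tool_results.append(search(action["arguments"]))
        else:
            # Cap reached while the agent still wanted the tool: fall back to text.
            action = {"type": "text", "content": "Could not finish the search."}
        msg.reply_kind = ReplyKind(action["type"])  # raises ValueError on an unknown kind
        msg.reply = action["content"]
        msg.status = "done"


def fake_agent_step(text, tool_results):
    # Hypothetical stand-in for the LLM: search once, then answer with the results.
    if not tool_results:
        return {"type": "tool_call", "arguments": {"neighborhood": "Palermo"}}
    return {"type": "search_results", "content": tool_results[-1]}
```

In this sketch the caller of `post_message` gets back a message that is still `pending`; a client would then poll or subscribe for the reply once a worker has run `work_one`.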

Each agent is specialized rather than monolithic (PropertySearchAgent, PropertyAvailabilityAgent, ImageAnalysisAgent, TitleGeneratorAgent), which kept prompts debuggable and token costs predictable. Access to the product is invitation-only, through a Waitlist and admin-issued Invitation tokens, with Fortify handling auth and Filament v4 giving admins CRUD-for-free over sources, runs, invitations, and properties.

III. Outcome

Built end to end, never opened to users.

0

Users ever onboarded

4

Specialized agents behind one structured tool

~4 mo

From first commit to quiet shelving

By January the pipeline ingested, the agent answered in Spanish, the admin panel was usable, the MCP surface worked, and the invitation flow was ready to send its first email. The email was never sent. The reasons were the usual ones — a commercial question the engineering could not answer, a market that was harder to reach than the architecture had assumed, a founding team whose attention was needed elsewhere. We wound it down carefully, with the code in a state a future self could pick up from, and moved on.

So the outcome is, honestly, a system that worked in staging and a set of lessons about the parts of a product the code cannot solve. Both are worth writing down.

IV. Retrospective

The model's job is narrow. The pipeline's job is wide.

The first draft had one big prompt trying to do everything — understand intent, search, clarify, recommend, write the reply. It was brittle in the ways monolithic prompts always are. Splitting the work into specialized agents with a single structured tool each was not an optimization; it was what made the system debuggable. When something went wrong, we could point at the agent and the tool call, not at a thousand-token prompt.
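
To make "point at the agent and the tool call" concrete, here is a minimal sketch of what a strict argument check for a tool like search_properties might look like. It is written in Python for illustration only. The field names (`operation`, `neighborhood`, `max_price`, `pets_allowed`) are hypothetical, loosely taken from the property attributes mentioned earlier, and the hand-rolled validator stands in for whatever schema enforcement the real system used.

```python
# Hypothetical argument schema; field names are illustrative, not the project's.
SEARCH_PROPERTIES_SCHEMA = {
    "type": "object",
    "properties": {
        "operation": {"type": "string", "enum": ["sale", "rent"]},
        "neighborhood": {"type": "string"},
        "max_price": {"type": "number"},
        "pets_allowed": {"type": "boolean"},
    },
    "required": ["operation"],
    "additionalProperties": False,
}

_PY_TYPES = {"string": str, "number": (int, float), "boolean": bool}


def validate_arguments(args, schema=SEARCH_PROPERTIES_SCHEMA):
    """Return a list of violations; an empty list means the call is acceptable."""
    if not isinstance(args, dict):
        return ["arguments must be an object"]
    errors = []
    props = schema["properties"]
    for name in schema["required"]:
        if name not in args:
            errors.append(f"missing required field: {name}")
    for name, value in args.items():
        if name not in props:
            # Strict mode: unknown fields are rejected rather than ignored.
            errors.append(f"unexpected field: {name}")
            continue
        spec = props[name]
        expected = _PY_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it for numbers explicitly.
        is_bool_as_number = spec["type"] == "number" and isinstance(value, bool)
        if not isinstance(value, expected) or is_bool_as_number:
            errors.append(f"{name}: expected {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name}: must be one of {spec['enum']}")
    return errors
```

A rejected call then produces a short, named list of violations (for example `unexpected field: bedrooms`) that can be logged next to the agent name, which is a much smaller thing to debug than a long prompt.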

The second lesson was about the canonical model. We spent the early weeks tempted to keep per-source tables "just in case" — and every week we kept them was a week the search agent had to reason across shapes it should not have had to see. Committing to one Property, and making ScrapeRun / Extraction the audit trail for raw payloads, let the agent assume the schema it was told to assume. Normalization is a product decision, not a storage one.
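
A minimal sketch of that commitment, in Python for illustration. The two portals, their payload shapes, and the `Property` fields are all hypothetical; the point is only that each source gets one small normalizer, and everything downstream sees a single shape.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Property:
    source: str
    source_id: str
    operation: str                 # "sale" or "rent"
    neighborhood: str
    price: Optional[float]
    pets_allowed: Optional[bool]   # None when the source does not say


def from_portal_a(raw: dict) -> Property:
    # Hypothetical portal with flat, Spanish-keyed payloads.
    precio = raw.get("precio")
    return Property(
        source="portal_a",
        source_id=str(raw["id"]),
        operation={"venta": "sale", "alquiler": "rent"}[raw["operacion"]],
        neighborhood=raw["barrio"].strip().title(),
        price=float(precio) if precio is not None else None,
        pets_allowed=raw.get("mascotas"),
    )


def from_portal_b(raw: dict) -> Property:
    # Hypothetical portal with nested, English-keyed payloads.
    listing = raw["listing"]
    amount = listing.get("price", {}).get("amount")
    return Property(
        source="portal_b",
        source_id=listing["ref"],
        operation=listing["type"].lower(),
        neighborhood=listing["location"]["neighborhood"].strip().title(),
        price=float(amount) if amount is not None else None,
        # A missing tag means "unknown", not "no pets".
        pets_allowed=True if "pets" in listing.get("tags", []) else None,
    )


NORMALIZERS = {"portal_a": from_portal_a, "portal_b": from_portal_b}


def normalize(source: str, raw: dict) -> Property:
    """Map a raw payload to the canonical shape; the raw payload is stored separately."""
    return NORMALIZERS[source](raw)
```

Keeping the raw payload on a separate record, as ScrapeRun / Extraction do in the real system, is what makes it safe to change a normalizer later and rerun it.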

The third lesson was about safeguards at the model boundary. LLMs occasionally double-JSON-encode their structured output, returning a JSON string that itself contains JSON instead of an object. A strict schema catches most of it; a small defensive decode catches the rest. Neither was interesting to write. Together they stopped the 2 a.m. pages.
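
A defensive decode of this kind can be very small. The sketch below is in Python for illustration, and the function name and the `max_depth` limit are assumptions, not the project's code. It keeps decoding while the value is still a string, up to a fixed depth, and insists that the final result is an object.

```python
import json


def decode_structured(payload, max_depth: int = 2):
    """Decode model output that may have been JSON-encoded more than once."""
    value = payload
    for _ in range(max_depth):
        if not isinstance(value, str):
            break  # already a decoded value
        try:
            value = json.loads(value)
        except json.JSONDecodeError:
            break  # not JSON at all; rejected by the check below
    if not isinstance(value, dict):
        raise ValueError("structured output did not decode to an object")
    return value


once = json.dumps({"type": "text", "content": "hola"})
twice = json.dumps(once)  # the double-encoded case: a JSON string containing JSON
assert decode_structured(once) == decode_structured(twice)
```

The depth limit is deliberate: output that is still a string after two decodes is more likely a bug worth surfacing than something to keep unwrapping.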

The fourth lesson was the one the architecture could not teach me. A system that works in staging is not a product; a product is a system that has met its users. We spent four months on the parts of the problem that engineering is good at solving, and shelved the project before we met the parts that engineering cannot solve alone — distribution, trust, the slow business of convincing strangers to type into a chat box. I would make most of the technical choices again. I would start talking to users earlier.