Agents keep being handed a browser designed for a person.
The web an agent sees through a naive fetch is a different web from the
one a person sees through Chrome. JavaScript-heavy pages return empty
shells; search engines return HTML meant for a human's eye; video
platforms return player chrome and nothing else. The workaround, in most
agent codebases, is a small pile of bespoke scrapers that each fail in
their own interesting way.
I wanted one surface. A single CLI an agent could reach for whenever it needed to know something about the outside world — and a single JSON shape coming back, regardless of whether the answer was found in a search result, a rendered page, or a video transcript.
The goal was not to build a better browser. It was to stop asking agents to pretend they were people.
A handful of verbs, markdown by default, three providers behind the curtain.
Zurf exposes a small vocabulary — search, fetch, browse, ask,
transcribe — and hides the provider choice behind each. The default
output is markdown, because markdown is what a language model reads
cheaply; --json is there when an agent wants to consume the result
programmatically.
- Search returns URLs, titles, and snippets in a normalized list, with
quickanddeepmodes mapped onto Perplexity's Sonar tiers. - Fetch is the lightweight path for static pages — capped at a megabyte, no browser, no rendering tax.
- Browse is the heavy path, renders JavaScript-heavy pages through a Browserbase session, and flattens the result.
- Ask routes through Sonar for questions that want citations rather than raw pages.
- Transcribe pulls transcripts from YouTube, TikTok, Instagram, X, and Facebook through Supadata — native captions first, AI fallback second.
A small skill layer teaches coding agents how to use the commands progressively, so an agent does not have to hold the full CLI surface in its context to do useful work. The hardest part was not the integrations; it was deciding what not to ship — resisting every provider flag and every optional header. The useful surface turned out to be small.
A tool I reach for before I reach for curl.
Zurf is now the first thing the agents I work with use when they need to touch the web. The uniformity of the output means the prompt that handles a search result handles a rendered page handles a video transcript — the agent stops caring where the text came from and starts caring what it says.
The side effect I did not expect was on my own workflow. Having a single command for read the web turned out to be useful for humans too, and a non-trivial amount of my own research now goes through it.
The right abstraction is the boring one.
The first drafts of Zurf tried to be clever — a unified query command that guessed whether the user wanted search, a page, or an answer. It was a bad idea dressed as a good one. Agents do better with verbs they can choose deliberately than with a single magic endpoint that sometimes does the wrong thing for reasons they cannot inspect.
The second lesson was about output. Returning markdown for browse felt
unambitious next to returning a structured DOM; in practice, markdown is
what the model wants, and the structured alternatives were a cost the
agent paid for no benefit. Meet the consumer where it lives.
Still adding providers. Still cutting flags.