Applications now ship faster than their authors can read them.
A generation of builders is producing working software without ever auditing it — the stack is held together by confidence and a language model. The result, looked at from the outside, is a web full of sites with exposed source maps, absent security headers, open registration on private tooling, and localStorage tokens waiting for their first XSS.
I wanted a standing answer to a simple question: what does a vibe-coded application actually look like from the network? Not a manifesto. A pile of reports, all produced the same way, so that the pattern — if there was one — would be hard to argue with.
The goal was not to shame anyone. It was to find out, concretely, what the cost of skipping the boring parts had become.
Seven skills, a fleet of sub-agents, one weighted grade.
The toolkit is a set of seven Claude Code skills that dispatch sub-agents in parallel and reconcile their output into a single report. Everything is passive: no authentication, no exploitation, no writes — just DNS, TLS, HTTP responses, JavaScript bundles, and public APIs.
A full audit scores across six weighted categories on a hundred-point scale:
- Security (25) — HTTP headers, SSL/DNS, Mozilla Observatory, W3C.
- Application exposure (15) — secrets in client bundles, hardcoded API surface, auth flaws visible from the browser.
- SEO (20) — meta, structured data, crawlability, redirects.
- Performance (20) — PageSpeed Insights, Core Web Vitals, images.
- Accessibility (10) and Mobile (10).
Grades run A+ through F. Missing data — an exhausted PSI quota, a site that blocks one of the APIs — rescales the remaining categories rather than punishing the target for the tool's bad day.
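The rescaling rule can be made concrete with a small sketch. The weights are the ones listed above; everything else (the function name, the 0.0 to 1.0 per-category scores, and `None` as the marker for missing data) is an assumption made for illustration, not the toolkit's actual code. Letter grades are left out because the cutoffs are not stated here.

```python
from typing import Mapping, Optional

# Category weights as listed above; they sum to 100.
WEIGHTS = {
    "security": 25,
    "application_exposure": 15,
    "seo": 20,
    "performance": 20,
    "accessibility": 10,
    "mobile": 10,
}


def weighted_score(results: Mapping[str, Optional[float]]) -> Optional[float]:
    """Combine per-category scores (0.0 to 1.0) into a 0-100 total.

    Hypothetical helper: a category whose score is None (or absent) counts
    as missing data, so its weight is dropped and the remaining weights are
    rescaled instead of the category being scored as zero.
    """
    available = {
        name: score
        for name, score in results.items()
        if name in WEIGHTS and score is not None
    }
    total_weight = sum(WEIGHTS[name] for name in available)
    if total_weight == 0:
        return None  # nothing could be measured at all
    earned = sum(WEIGHTS[name] * score for name, score in available.items())
    return round(100 * earned / total_weight, 1)


# Example inputs: the same site with and without performance data.
measured = {
    "security": 0.8,
    "application_exposure": 0.4,
    "seo": 0.6,
    "accessibility": 0.5,
    "mobile": 0.5,
}
with_missing_psi = weighted_score({**measured, "performance": None})
with_zeroed_psi = weighted_score({**measured, "performance": 0.0})
```

In this example the site scores 60.0 when the missing performance category is rescaled away and 48.0 if the same gap were counted as a zero, which is the difference between reporting on the target and reporting on the tool.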
Every run leaves behind a folder: the technical report.md, the raw
sub-agent JSON (so any finding is traceable to its source), an
optional plain-language report-client.md, and a WeasyPrint PDF for
the founders who would never read the first version.
Fifteen audits in, the pattern is legible.
Grades cluster in the C and D range. Three sites reached B; none reached A; none, to their credit, fell to F. The dominant shape is the same across almost every report: strong transport-layer security paired with weak application-layer hardening. The cloud providers are doing their job. The application is not doing its own.
The individual findings were not subtle.
- A property-management SaaS shipped a 10.2 MB bundle whose data model made SSNs, bank accounts and routing numbers structurally visible.
- A healthcare app exposed 250+ API endpoints and its full data structures to anyone with a browser, kept auth tokens in localStorage, and answered CORS requests by reflecting arbitrary origins with credentials allowed.
- A staffing platform left GraphQL introspection on in production (2,850 types, 345 queries) with a Google OAuth client_secret hardcoded in the client bundle beside it.
- A B2B SaaS shipped two server-side API keys (for a CRM and an auto-leasing provider) inside a 4.5 MB React bundle, with Supabase signup open and email verification bypassed.
- A consumer app exposed Supabase RPC functions that returned business metrics and customer PII to any unauthenticated caller.
- A production SaaS shipped source maps that reconstructed 55+ original TypeScript files — auth flows, server-action signatures, staging bucket names, a dev Auth0 tenant — from the deployed bundle.
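
The source-map finding is the easiest of these to detect passively, because a deployed bundle usually announces its own map in a trailing `//# sourceMappingURL=` comment. A minimal sketch of that check, assuming the bundle text has already been downloaded; the function name is hypothetical, and the sketch neither fetches the map nor looks at the `SourceMap` response header:

```python
import re
from typing import Optional

# Matches the conventional comment, including the legacy "//@" form.
_SOURCE_MAP_RE = re.compile(r"//[#@]\s*sourceMappingURL=(\S+)")


def find_source_map_reference(bundle_text: str) -> Optional[str]:
    """Return the source map URL a JavaScript bundle points to, if any.

    When several comments are present, the last one is returned, since
    that is the one tooling conventionally honours.
    """
    matches = _SOURCE_MAP_RE.findall(bundle_text)
    return matches[-1] if matches else None


bundle = "console.log('hi');\n//# sourceMappingURL=app.3f9a.js.map\n"
reference = find_source_map_reference(bundle)
```

A non-empty result only says that a map is referenced; whether the `.map` file is actually served from production is a separate request.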
Across the set, the same absences recur: no Content-Security-Policy (or
one undermined by unsafe-inline and unsafe-eval), email enumeration
on signup and password-reset endpoints, Swagger UI or GraphQL
introspection in production, OAuth and server-side keys shipped to the
browser, and multi-megabyte bundles that leak hundreds of internal
routes. None of it is exotic. All of it is avoidable by a human who
slows down for an afternoon.
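
The Content-Security-Policy case can be checked mechanically. The sketch below is an illustration under stated assumptions, not the toolkit's check: the function names are hypothetical, it inspects only `script-src` (falling back to `default-src`), and it ignores the browser rule that a nonce or hash makes `'unsafe-inline'` inert.

```python
from typing import Dict, List, Optional


def parse_csp(header_value: str) -> Dict[str, List[str]]:
    """Split a CSP header value into {directive: [source expressions]}."""
    policy: Dict[str, List[str]] = {}
    for part in header_value.split(";"):
        tokens = part.split()
        if not tokens:
            continue
        # If a directive is repeated, browsers keep the first occurrence.
        policy.setdefault(tokens[0].lower(), tokens[1:])
    return policy


def csp_weaknesses(header_value: Optional[str]) -> List[str]:
    """List the CSP problems described above for one response header."""
    if not header_value or not header_value.strip():
        return ["no Content-Security-Policy header"]
    policy = parse_csp(header_value)
    script_sources = policy.get("script-src", policy.get("default-src"))
    if script_sources is None:
        return ["no script-src or default-src directive"]
    lowered = [source.lower() for source in script_sources]
    return [
        f"script sources allow {keyword}"
        for keyword in ("'unsafe-inline'", "'unsafe-eval'")
        if keyword in lowered
    ]


weak_policy = "default-src 'self'; script-src 'self' 'unsafe-inline' 'unsafe-eval'"
findings = csp_weaknesses(weak_policy)
```

For the example policy, both keywords are reported; a policy of just `default-src 'self'` produces no findings.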
The report is for the reader, not the writer.
The first drafts of the tool produced reports that were, in hindsight,
written for me — dense, technical, structured around the scanner's
architecture rather than the reader's next action. The sites whose
authors I most wanted to reach were founders, not security engineers.
Splitting the pipeline into a technical report.md and a translated
report-client.md with urgency badges did more for the project than
any new check I added.
The second lesson was about the shape of the agents. Early versions had sub-agents reading reference files at runtime and running in the background; both were false economies. Embedding the full reference content directly into each sub-agent's prompt made runs reproducible, and keeping them in the foreground meant permission prompts surfaced where I could answer them. Markdown stayed the source of truth; HTML and PDF are derived from it, never the other way around.
The third lesson was about the division of labour between scripts and
agents. The boring, repeatable things — headers, TLS, DNS records,
robots, sitemaps, image sizes — belong in bash. Running them as
scripts means the agent never burns tokens rediscovering that a site
has no Strict-Transport-Security. What the agents are good at is the
opposite: the case-specific dig. Noticing that a bundle's TypeScript
types describe a PII schema. Realising that a Supabase RPC returns
revenue without asking who you are. Reading a CSP closely enough to
see a staging bucket name leaking out of it. The most interesting
findings in every report came from a sub-agent chasing a thread no
script would have known to pull.
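
To illustrate the scripted half of that split: a deterministic check can hand the agent a finished finding rather than raw headers. The scripts described above are bash; this sketch uses Python only to keep the examples in one language. The list of expected headers and both function names are assumptions for illustration, and the response headers are taken as already fetched.

```python
import json
from typing import List, Mapping

# Hypothetical baseline; the set of headers a real audit expects may differ.
EXPECTED_HEADERS = (
    "strict-transport-security",
    "content-security-policy",
    "x-content-type-options",
    "referrer-policy",
)


def missing_security_headers(response_headers: Mapping[str, str]) -> List[str]:
    """Return the expected headers absent from a response (case-insensitive)."""
    present = {name.lower() for name in response_headers}
    return [name for name in EXPECTED_HEADERS if name not in present]


def header_findings_json(url: str, response_headers: Mapping[str, str]) -> str:
    """Serialise the result so a sub-agent can read it without re-deriving it."""
    return json.dumps(
        {"url": url, "missing_headers": missing_security_headers(response_headers)},
        sort_keys=True,
    )


example_headers = {
    "Content-Type": "text/html",
    "Strict-Transport-Security": "max-age=63072000",
}
missing = missing_security_headers(example_headers)
```

The point of the JSON step is that the agent starts from a settled fact and spends its effort on the parts no fixed list can anticipate.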
The fourth lesson was the most uncomfortable. I contacted most of the audited companies with the findings and an offer to help fix them. Most of the time I was ghosted. Occasionally I got a polite "thanks" — and then the same issues were still live a week later. The audit is cheap; the part that turns it into a fix is not technical at all. That is a separate problem, and probably a harder one.
Still scanning. Still adding targets. Still writing the findings down.