Project № II · Case study

Website Reports — an auditor for the vibe-coded web

A passive scanner that reads a public site the way an attacker would, and writes the findings down in language a founder can actually act on. Built to learn what goes wrong when applications are shipped faster than they are understood.

I. Problem

Applications now ship faster than their authors can read them.

A generation of builders is producing working software without ever auditing it — the stack is held together by confidence and a language model. The result, looked at from the outside, is a web full of sites with exposed source maps, absent security headers, open registration on private tooling, and localStorage tokens waiting for their first XSS.

I wanted a standing answer to a simple question: what does a vibe-coded application actually look like from the network? Not a manifesto. A pile of reports, all produced the same way, so that the pattern — if there was one — would be hard to argue with.

The goal was not to shame anyone. It was to find out, concretely, what the cost of skipping the boring parts had become.

II. Approach

Seven skills, a fleet of sub-agents, one weighted grade.

The toolkit is a set of seven Claude Code skills that dispatch sub-agents in parallel and reconcile their output into a single report. Everything is passive: no authentication, no exploitation, no writes — just DNS, TLS, HTTP responses, JavaScript bundles, and public APIs.

A full audit scores across six weighted categories on a hundred-point scale:

  • Security (25) — HTTP headers, SSL/DNS, Mozilla Observatory, W3C.
  • Application exposure (15) — secrets in client bundles, hardcoded API surface, auth flaws visible from the browser.
  • SEO (20) — meta, structured data, crawlability, redirects.
  • Performance (20) — PageSpeed Insights, Core Web Vitals, images.
  • Accessibility (10) and Mobile (10).

Grades run A+ through F. Missing data (an exhausted PageSpeed Insights quota, a site that blocks one of the APIs) rescales the remaining categories rather than punishing the target for the tool's bad day.
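A minimal sketch of how the weighting and rescaling could work, assuming each category produces a 0 to 100 score and `None` when its data could not be collected. The weights come from the list above; the letter-grade cut-offs in `GRADE_BANDS` are hypothetical, since the real thresholds aren't stated here.

```python
from typing import Optional

# Category weights from the rubric (they sum to 100).
WEIGHTS = {
    "security": 25,
    "app_exposure": 15,
    "seo": 20,
    "performance": 20,
    "accessibility": 10,
    "mobile": 10,
}

# Hypothetical letter-grade cut-offs, checked from the top down.
GRADE_BANDS = [(97, "A+"), (90, "A"), (80, "B"), (70, "C"), (60, "D"), (0, "F")]


def weighted_score(scores: dict[str, Optional[float]]) -> float:
    """Combine 0-100 category scores; None means the category could not be measured."""
    available = {c: s for c, s in scores.items() if s is not None and c in WEIGHTS}
    if not available:
        raise ValueError("no category produced a score")
    measured_weight = sum(WEIGHTS[c] for c in available)
    # Rescale: divide by the weight actually measured, not by 100, so a
    # missing category neither adds nor subtracts points.
    return sum(WEIGHTS[c] * s for c, s in available.items()) / measured_weight


def letter_grade(score: float) -> str:
    for cutoff, grade in GRADE_BANDS:
        if score >= cutoff:
            return grade
    return "F"
```

For example, with security 80, application exposure 40, SEO 90, accessibility 70, mobile 85 and performance missing, `weighted_score` divides by the 80 points actually measured and returns 74.375 (a C under these cut-offs). Dividing by 100 instead would give 59.5, an F caused by a quota outage rather than by the site.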

Every run leaves behind a folder: the technical report.md, the raw sub-agent JSON (so any finding is traceable to its source), an optional plain-language report-client.md, and a WeasyPrint PDF for the founders who would never read the first version.
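Reconciliation could look something like the sketch below, assuming (hypothetically) that each sub-agent writes JSON of the form `{"findings": [{"id": ..., "severity": ..., "title": ...}]}`. The severity scale and field names are illustrative. The part that matters is the `sources` list: every merged finding remembers which agents reported it, which is what keeps it traceable back to the raw JSON.

```python
import json
from dataclasses import dataclass, field

# Hypothetical severity ordering, lowest to highest.
SEVERITY_RANK = {"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass
class Finding:
    finding_id: str
    severity: str
    title: str
    sources: list[str] = field(default_factory=list)  # sub-agents that reported it


def reconcile(raw_outputs: dict[str, str]) -> list[Finding]:
    """Merge raw sub-agent JSON (agent name -> JSON text) into one finding list."""
    merged: dict[str, Finding] = {}
    for agent, text in raw_outputs.items():
        for item in json.loads(text).get("findings", []):
            fid = item["id"]
            existing = merged.get(fid)
            if existing is None:
                merged[fid] = Finding(fid, item["severity"], item["title"], [agent])
                continue
            existing.sources.append(agent)
            # When two agents disagree, keep the more severe rating.
            if SEVERITY_RANK[item["severity"]] > SEVERITY_RANK[existing.severity]:
                existing.severity = item["severity"]
    # Most severe first; ties keep the order in which findings were first seen.
    return sorted(merged.values(), key=lambda f: -SEVERITY_RANK[f.severity])
```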

III. Outcome

Fifteen audits in, the pattern is legible.

  • 15 production sites audited.
  • Grades: 3 B, 8 C, 4 D; no A, no F.
  • 10.2 MB: the largest bundle leaking a PII schema.

Grades cluster in the C and D range. Three sites reached B; none reached A; none, to their credit, fell to F. The dominant shape is the same across almost every report: strong transport-layer security paired with weak application-layer hardening. The cloud providers are doing their job. The application is not doing its own.

The individual findings were not subtle.

  • A property-management SaaS shipped a 10.2 MB bundle whose data model made SSNs, bank accounts and routing numbers structurally visible.
  • A healthcare app exposed 250+ API endpoints and its full data structures to anyone with a browser, kept auth tokens in localStorage, and had a CORS policy that reflected arbitrary origins with credentials allowed.
  • A staffing platform left GraphQL introspection on in production (2,850 types, 345 queries) with a Google OAuth client_secret hardcoded in the client bundle beside it.
  • A B2B SaaS shipped two server-side API keys — a CRM and an auto-leasing provider — inside a 4.5 MB React bundle, with Supabase signup open and email verification bypassed.
  • A consumer app exposed Supabase RPC functions that returned business metrics and customer PII to any unauthenticated caller.
  • A production SaaS shipped source maps that reconstructed 55+ original TypeScript files — auth flows, server-action signatures, staging bucket names, a dev Auth0 tenant — from the deployed bundle.
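Two of these findings come down to mechanical checks. The sketch below is a hypothetical helper, not the tool's own code. It finds a JavaScript bundle's `sourceMappingURL` comment, lists which original files a source map embeds in full (the v3 format keeps them in `sourcesContent`), and tests whether a response to a request carrying a made-up `Origin` echoed that origin back while also allowing credentials. Fetching the bundle, the map and the response is left out.

```python
import re
from typing import Optional

# The trailing comment a JS bundle uses to point at its source map ("//@" is the legacy form).
SOURCE_MAP_COMMENT = re.compile(r"//[#@]\s*sourceMappingURL=(\S+)\s*$", re.MULTILINE)


def source_map_url(bundle_text: str) -> Optional[str]:
    """Return the last sourceMappingURL referenced by a JS bundle, or None."""
    matches = SOURCE_MAP_COMMENT.findall(bundle_text)
    return matches[-1] if matches else None


def reconstructable_files(source_map: dict) -> list[str]:
    """Original paths whose full text is embedded in the map's sourcesContent."""
    sources = source_map.get("sources", [])
    contents = source_map.get("sourcesContent") or []
    return [path for path, text in zip(sources, contents) if text]


def cors_reflects_with_credentials(probe_origin: str, response_headers: dict[str, str]) -> bool:
    """True if the server echoed an arbitrary Origin back and also allowed credentials."""
    headers = {k.lower(): v.strip() for k, v in response_headers.items()}
    allow_origin = headers.get("access-control-allow-origin")
    allow_creds = headers.get("access-control-allow-credentials", "").lower()
    return allow_origin == probe_origin and allow_creds == "true"
```

The CORS check needs both conditions: a wildcard origin on its own cannot be combined with credentials in browsers, so it is the reflected origin plus `Access-Control-Allow-Credentials: true` that lets another site read authenticated responses.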

Across the set, the same absences recur: no Content-Security-Policy (or one undermined by unsafe-inline and unsafe-eval), email enumeration on signup and password-reset endpoints, Swagger UI or GraphQL introspection in production, OAuth and server-side keys shipped to the browser, and multi-megabyte bundles that leak hundreds of internal routes. None of it is exotic. All of it is avoidable by a human who slows down for an afternoon.
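The CSP item on that list is easy to check by script. A minimal sketch, assuming the raw header value as input; `csp_weaknesses` is a hypothetical name. It looks only at `script-src` (falling back to `default-src`) and ignores `'unsafe-inline'` when a nonce or hash is present, because browsers implementing CSP Level 2 or later disregard it in that case.

```python
from typing import Optional

HASH_OR_NONCE_PREFIXES = ("'nonce-", "'sha256-", "'sha384-", "'sha512-")


def csp_weaknesses(csp_header: Optional[str]) -> list[str]:
    """Flag a missing CSP, or script sources that undermine it."""
    if not csp_header:
        return ["no Content-Security-Policy header"]
    directives: dict[str, list[str]] = {}
    for part in csp_header.split(";"):
        tokens = part.strip().split()
        if tokens:
            # Browsers honour the first occurrence of a directive and ignore repeats.
            directives.setdefault(tokens[0].lower(), [t.lower() for t in tokens[1:]])
    script_src = directives.get("script-src", directives.get("default-src"))
    if script_src is None:
        return ["no script-src or default-src: scripts are unrestricted"]
    problems = []
    has_nonce_or_hash = any(t.startswith(HASH_OR_NONCE_PREFIXES) for t in script_src)
    if "'unsafe-inline'" in script_src and not has_nonce_or_hash:
        problems.append("script-src allows 'unsafe-inline'")
    if "'unsafe-eval'" in script_src:
        problems.append("script-src allows 'unsafe-eval'")
    return problems
```

A fuller check would also flag wildcard or scheme-only sources such as `https:` and look at `'strict-dynamic'`; this one covers only the two keywords named above.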

IV. Retrospective

The report is for the reader, not the writer.

The first drafts of the tool produced reports that were, in hindsight, written for me — dense, technical, structured around the scanner's architecture rather than the reader's next action. The sites whose authors I most wanted to reach were founders, not security engineers. Splitting the pipeline into a technical report.md and a translated report-client.md with urgency badges did more for the project than any new check I added.

The second lesson was about the shape of the agents. Early versions had sub-agents reading reference files at runtime and running in the background; both were false economies. Embedding the full reference content directly into each sub-agent's prompt made runs reproducible, and keeping them in the foreground meant permission prompts surfaced where I could answer them. Markdown stayed the source of truth; HTML and PDF are derived from it, never the other way around.

The third lesson was about the division of labour between scripts and agents. The boring, repeatable things — headers, TLS, DNS records, robots, sitemaps, image sizes — belong in bash. Running them as scripts means the agent never burns tokens rediscovering that a site has no Strict-Transport-Security. What the agents are good at is the opposite: the case-specific dig. Noticing that a bundle's TypeScript types describe a PII schema. Realising that a Supabase RPC returns revenue without asking who you are. Reading a CSP closely enough to see a staging bucket name leaking out of it. The most interesting findings in every report came from a sub-agent chasing a thread no script would have known to pull.
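The tool runs these deterministic checks in bash. To keep one language across these examples, here is the same idea sketched in Python: a plain GET with no cookies or credentials, a fixed list of expected headers (this particular list is an assumption), and a small JSON result the agent can read instead of rediscovering it. `header_facts`, `fetch_headers` and `facts_json` are hypothetical names.

```python
import json
import urllib.error
import urllib.request

# Headers a deterministic pre-pass can check without any judgment (illustrative list).
EXPECTED_HEADERS = [
    "strict-transport-security",
    "content-security-policy",
    "x-content-type-options",
    "referrer-policy",
    "permissions-policy",
]


def header_facts(headers: dict[str, str]) -> dict[str, list[str]]:
    """Split the expected headers into present and missing, case-insensitively."""
    present = {k.lower() for k in headers}
    return {
        "present": [h for h in EXPECTED_HEADERS if h in present],
        "missing": [h for h in EXPECTED_HEADERS if h not in present],
    }


def fetch_headers(url: str, timeout: float = 10.0) -> dict[str, str]:
    """Plain GET, the same request any visitor makes."""
    req = urllib.request.Request(url, headers={"User-Agent": "header-check-sketch"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return dict(resp.headers.items())
    except urllib.error.HTTPError as err:
        # Error pages still carry headers worth checking.
        return dict(err.headers.items())


def facts_json(url: str) -> str:
    """What the agent receives: a small, already-checked JSON document."""
    return json.dumps({"url": url, "headers": header_facts(fetch_headers(url))}, indent=2)
```

The agent's context then starts from facts like "strict-transport-security: missing" and can spend its effort on the case-specific dig instead.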

The fourth lesson was the most uncomfortable. I contacted most of the audited companies with the findings and an offer to help fix them. Most of the time I was ghosted. Occasionally I got a polite "thanks" — and then the same issues were still live a week later. The audit is cheap; the part that turns it into a fix is not technical at all. That is a separate problem, and probably a harder one.

Still scanning. Still adding targets. Still writing the findings down.