
Shipping Claude Code for real work
A field report from running Claude Code on production codebases — the patterns that scale, the failure modes that look like success, and the rituals I keep.
After six months of running Claude Code as a daily driver across three production codebases, I stopped reaching for autocomplete-style copilots. The difference isn't the model. It's how you work with it.
Here is the field report: the prompt patterns that scale, the failure mode that looks like success, and the small infrastructure investments that turn a clever assistant into an actual teammate.
Treat it like a contractor, not autocomplete
Autocomplete finishes your line. An agent does a task. The mental shift is to stop thinking in keystrokes and start thinking in work orders: "add a rate limiter to these three routes, match the existing error shape, and show me the diff." The closer your request looks to a ticket you would hand a competent contractor, the better the result.
The prompt patterns that actually scale
Three habits did most of the work:
- Point at examples. "Match the pattern in this file" beats any amount of description.
- Constrain the surface. Tell it which files to touch and which to leave alone. Unbounded tasks produce unbounded diffs.
- Ask for the plan first on anything non-trivial. A 30-second plan review catches the wrong approach before the agent writes 300 lines.
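To make the three habits concrete, here is a minimal sketch of a work order rendered as a prompt. The `WorkOrder` class, its field names, and the file paths are all hypothetical. None of this is a Claude Code API; it is just one way to keep myself honest about what goes into a request.

```python
from dataclasses import dataclass, field


@dataclass
class WorkOrder:
    """Hypothetical container for a task request; not part of any real tool."""

    task: str
    example_files: list[str] = field(default_factory=list)    # habit 1: point at examples
    allowed_files: list[str] = field(default_factory=list)    # habit 2: constrain the surface
    forbidden_files: list[str] = field(default_factory=list)  # habit 2: what to leave alone
    plan_first: bool = True                                    # habit 3: plan before code

    def render(self) -> str:
        """Build the plain-text prompt, skipping sections that are empty."""
        lines = [f"Task: {self.task}"]
        if self.example_files:
            lines.append("Match the patterns in: " + ", ".join(self.example_files))
        if self.allowed_files:
            lines.append("Only modify: " + ", ".join(self.allowed_files))
        if self.forbidden_files:
            lines.append("Do not touch: " + ", ".join(self.forbidden_files))
        if self.plan_first:
            lines.append("Before writing code, show me your plan and wait for approval.")
        return "\n".join(lines)


# Illustrative paths only; substitute files from your own repository.
order = WorkOrder(
    task="Add a rate limiter to these routes, match the existing error shape, and show me the diff.",
    example_files=["src/middleware/auth.py"],
    allowed_files=["src/routes/users.py", "src/routes/orders.py", "src/routes/search.py"],
    forbidden_files=["src/db/migrations/"],
)
print(order.render())
```

The point is less the class than the checklist it enforces: if I cannot fill in the example and the allowed files, the task is probably not ready to hand off.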
The failure mode that looks like success
The dangerous output isn't the obviously broken one. It's the confident, plausible diff that passes a glance and quietly does the wrong thing: a test that asserts nothing, an error path swallowed, a security check skipped because it "wasn't in scope." This is the same lesson as building agents that don't hallucinate: the model is most dangerous when it is wrong and smooth about it. The fix is simple and non-negotiable — never merge what you didn't read.
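A small illustration of "a test that asserts nothing," using a made-up `parse_port` function. Every name here is hypothetical. The hollow test passes against a broken implementation, while the real test catches it.

```python
def parse_port(value: str) -> int:
    """Hypothetical function under test: parse a TCP port, rejecting out-of-range values."""
    port = int(value)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


def broken_parse_port(value: str) -> int:
    """A plausible-looking regression: the range check has silently disappeared."""
    return int(value)


def hollow_test(parse) -> None:
    # Looks like a test at a glance, but it asserts nothing and
    # swallows every exception, so it can never fail.
    try:
        parse("8080")
        parse("99999")
    except Exception:
        pass


def real_test(parse) -> None:
    # Checks the happy path and insists that the error path actually fires.
    assert parse("8080") == 8080
    try:
        parse("99999")
    except ValueError:
        return
    raise AssertionError("expected ValueError for an out-of-range port")


hollow_test(broken_parse_port)  # passes silently, which is the problem
real_test(parse_port)           # passes for the right reason

try:
    real_test(broken_parse_port)
except AssertionError as exc:
    print(f"real test caught the regression: {exc}")
```

When I review agent-written tests, the first question I ask is whether the test could fail at all.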
Review is the new bottleneck
When the agent writes faster than you can read, your review becomes the constraint. I leaned into it: smaller diffs, clearer commits, and a habit of asking "what would make this wrong?" before merging. The teams that get burned by AI coding tools are the ones that let velocity outrun review.
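One way to hold the line on "smaller diffs" is a size check before review. The sketch below sums changed lines from `git diff --numstat` output. The 300-line limit and the function names are my own assumptions, not a standard.

```python
import subprocess

MAX_CHANGED_LINES = 300  # assumed threshold; tune it to what you can actually read


def changed_lines(numstat: str) -> int:
    """Sum added + deleted lines from `git diff --numstat` output."""
    total = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue
        added, deleted = parts[0], parts[1]
        # Binary files are reported as "-<TAB>-<TAB>path"; skip them.
        if added.isdigit() and deleted.isdigit():
            total += int(added) + int(deleted)
    return total


def staged_numstat() -> str:
    """Return numstat for staged changes. Requires git and a repository."""
    return subprocess.run(
        ["git", "diff", "--cached", "--numstat"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout


def review_verdict(numstat: str, limit: int = MAX_CHANGED_LINES) -> str:
    """Say whether a diff is small enough to review in one sitting."""
    count = changed_lines(numstat)
    if count > limit:
        return f"{count} changed lines: split this before review"
    return f"{count} changed lines: reviewable"


# Sample numstat text with illustrative paths; in practice pass staged_numstat().
sample = "12\t3\tsrc/routes/users.py\n-\t-\tassets/logo.png\n400\t0\tsrc/generated.py\n"
print(review_verdict(sample))  # prints "415 changed lines: split this before review"
```

A line count is a crude proxy for review effort, but it is cheap and it flags the diffs most likely to get skimmed.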
The infrastructure that turns it into a teammate
A few cheap investments compounded:
- A tight project doc stating conventions, so I'm not re-explaining the stack every session.
- Fast tests and a type-checker it can run itself, so it gets a feedback loop instead of guessing.
- Small, reviewable commits, so when something is wrong I can see exactly where.
None of this is exotic. It is the same hygiene that makes human teams fast — the agent just rewards it more obviously.
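The feedback loop can be as small as one script the agent is told to run after every change. This is a sketch, assuming a Python project that uses `pytest` and `mypy`. The `CHECKS` table, the names, and the timeout are placeholders to swap for your own stack.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    ok: bool
    output: str


# Assumed commands for a Python project; replace with your own test and type-check tools.
CHECKS = {
    "tests": ["pytest", "-q", "-x"],
    "types": ["mypy", "."],
}


def run_checks(checks: dict[str, list[str]], timeout: int = 300) -> list[CheckResult]:
    """Run each command and record pass/fail plus its combined output."""
    results = []
    for name, cmd in checks.items():
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
            results.append(CheckResult(name, proc.returncode == 0, proc.stdout + proc.stderr))
        except FileNotFoundError:
            results.append(CheckResult(name, False, f"command not found: {cmd[0]}"))
        except subprocess.TimeoutExpired:
            results.append(CheckResult(name, False, f"timed out after {timeout}s"))
    return results


def summarize(results: list[CheckResult]) -> str:
    """One line per check, so the result is easy to scan."""
    return "\n".join(f"{'PASS' if r.ok else 'FAIL'} {r.name}" for r in results)


def main() -> None:
    """Entry point: print the summary and exit non-zero if any check failed."""
    results = run_checks(CHECKS)
    print(summarize(results))
    sys.exit(0 if all(r.ok for r in results) else 1)
```

Calling `main()` from a script gives the agent a single command with an unambiguous exit code, which is the whole feedback loop: run it, read the failures, fix, repeat.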
Where I don't use it
I still write the hard parts myself: the security-sensitive code, the gnarly state machine, the thing where being 95% right is worse than not shipping. The agent is fast on the well-defined 80%. The remaining 20% is where the judgment lives, and that judgment is exactly what clients pay an AI architect for.
Claude Code didn't replace engineering judgment. It moved where I spend it: less typing, more reviewing and deciding. If you want this kind of velocity wired into your own stack, the AI Lab shows what I've shipped, and you can book a call to talk about it.

I ship production AI for startups and teams — agents, RAG, automations — on a decade of design & Webflow craft.
