
Build ratios, silent failure modes, and what serious AI-assisted development actually costs


The dominant narrative around AI-assisted development concerns accessibility. Describe what you want; the model builds it. The barrier has lowered. Anyone can ship. That claim is accurate. It is also the smallest part of the story.

What that narrative omits — what the LinkedIn posts and YouTube tutorials quietly skip — is what comes after the code runs. For anyone serious about building something real with AI, that part is where most of the time goes.


The ratio nobody publishes

A concrete example illustrates the scale of the gap. Building a peer-to-peer communication protocol between two AI agent instances — handshake state machine, peer discovery, message routing, deduplication, reply correlation — took roughly 1.5 hours with AI assistance. Four commits. Done by midnight.
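
For a sense of what "handshake state machine" means here, a minimal sketch in Python. The states and events are illustrative, not the actual protocol:

```python
from enum import Enum, auto

class HandshakeState(Enum):
    IDLE = auto()          # no contact yet
    HELLO_SENT = auto()    # we initiated, awaiting acknowledgement
    ESTABLISHED = auto()   # both sides confirmed
    FAILED = auto()        # timed out or rejected

# Illustrative transition table: (current state, event) -> next state.
TRANSITIONS = {
    (HandshakeState.IDLE, "send_hello"): HandshakeState.HELLO_SENT,
    (HandshakeState.HELLO_SENT, "ack_received"): HandshakeState.ESTABLISHED,
    (HandshakeState.HELLO_SENT, "timeout"): HandshakeState.FAILED,
}

def step(state: HandshakeState, event: str) -> HandshakeState:
    # Unknown events leave the state unchanged; whether that is safe
    # is exactly the kind of question the debugging phase surfaces.
    return TRANSITIONS.get((state, event), state)
```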

Debugging it took 13 hours. Nine fix commits. Three internal design documents produced mid-process. Multiple AI agents involved across the session.

Build-to-debug ratio: 1 to 9.

If you search for articles on AI-assisted development, you will find extensive coverage of the 1. Almost none of the 9.

This ratio is not an anomaly. It is structural. Building is fast because AI can generate plausible code at speed. Debugging is slow because debugging requires something AI does not yet reliably have: a whole-system understanding of what should be true, and why reality has diverged from it.

Why debugging with AI is harder than debugging alone

When something breaks in an AI-assisted codebase, you are not just finding a bug. You are finding a bug in code you did not fully write, in a system that may have accumulated subtle architectural assumptions you were not aware of.

The bugs that surface tend to fall into a specific pattern. The obvious ones — wrong outputs, crashes, failed tests — get caught quickly, often by the AI itself. The ones that survive are the ones that look correct. The system appears healthy. Logs are clean. Unit tests pass. And yet, in real-world conditions, something is wrong in a way that is quiet and compounding.

Some examples of what this looks like in practice:

A handshake succeeds with the wrong process — something occupying the expected port that is not the intended peer. The system accepts it as legitimate because it returned the right status code. Nothing fails explicitly.
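
A minimal guard against this, sketched in Python. The /handshake endpoint and peer_id field are hypothetical; the point is to verify an application-level identity, not just a transport-level success code:

```python
import requests

EXPECTED_PEER_ID = "agent-b"  # hypothetical identity the real peer would present

def verified_handshake(host: str, port: int) -> bool:
    # A 200 only proves that *something* answered on that port.
    resp = requests.get(f"http://{host}:{port}/handshake", timeout=5)
    if resp.status_code != 200:
        return False
    # Require the responder to identify itself at the application layer.
    return resp.json().get("peer_id") == EXPECTED_PEER_ID
```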

A message loop forms between two AI agents on different machines. Each reply is treated as a new incoming message. The loop runs silently in the background, consuming resources, until forcibly terminated.
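
One standard guard against this, sketched with hypothetical field names: carry a hop count on the conversation thread and stop replying once it is exceeded, so two auto-responders cannot converse forever.

```python
MAX_HOPS = 4  # assumed cap on how long an agent-to-agent thread may run

def should_reply(message: dict) -> bool:
    # Without a cap, each agent treats the other's reply as fresh
    # input and the exchange never terminates.
    return message.get("hops", 0) < MAX_HOPS

def make_reply(message: dict, body: str) -> dict:
    # The hop count must be carried forward, not reset, or the guard
    # does nothing.
    return {"hops": message.get("hops", 0) + 1, "body": body}
```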

A message is delivered twice. Deduplication logic works correctly. The bug lives in the interaction between a synchronous HTTP call and the async event loop it shares — a self-call that blocks until timeout, after which a fallback path delivers a second copy. Every individual component behaves as designed. The failure is in the interaction.
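
The shape of that bug, reduced to a sketch. The endpoint and port are illustrative: a coroutine makes a blocking HTTP call to a server running on the same event loop, which can never answer because that loop is stuck inside the call.

```python
import requests  # synchronous client: each call blocks the event loop
import aiohttp   # asynchronous client: awaits yield control back to the loop

async def notify_self_broken(port: int) -> None:
    # BUG: requests.post blocks the running event loop. If the target
    # URL is served by this same loop, the server can never respond,
    # the call times out, and a retry or fallback path may then
    # deliver a second copy of the message.
    requests.post(f"http://127.0.0.1:{port}/deliver", json={"id": 1}, timeout=10)

async def notify_self_fixed(port: int) -> None:
    # The await suspends this coroutine, so the loop stays free to
    # serve the incoming request. No timeout, no duplicate delivery.
    async with aiohttp.ClientSession() as session:
        async with session.post(f"http://127.0.0.1:{port}/deliver",
                                json={"id": 1}) as resp:
            resp.raise_for_status()
```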

The liveness state of a peer reverts to stale data seconds after a successful handshake. Fresh data is written correctly. A background merge process treats all fields equally and overwrites the fresh value with older data from a different source. No error. No warning. Silent regression.
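
A sketch of the repair, assuming (my framing, not the post's) that peer state is a dict of fields: merge per field by timestamp, so whichever source ran last cannot blanket-overwrite the record.

```python
from typing import Any

# Each field is stored as (value, timestamp) so the merge can reason
# about freshness per field rather than per record.
Field = tuple[Any, float]

def merge(local: dict[str, Field], incoming: dict[str, Field]) -> dict[str, Field]:
    # Last-write-wins per field. A whole-record overwrite is what lets
    # a stale snapshot from another source clobber a liveness field
    # written seconds earlier.
    merged = dict(local)
    for key, (value, ts) in incoming.items():
        if key not in merged or ts > merged[key][1]:
            merged[key] = (value, ts)
    return merged
```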

These are not beginner mistakes. They are the category of problem that requires holding an entire distributed system in your head simultaneously and asking: what happens when this calls that, which depends on this, which is blocked by that? AI generates components efficiently. It does not naturally simulate their pathological interactions.

What the emotional experience actually is

There is something important to name here that technical writing usually avoids.

Building with AI feels good. It is fast, generative, and, for a stretch, exhilarating. You move from intention to working prototype in hours. If you previously could not read code, this feels like a genuine shift in what is possible.

Then debugging begins, and the emotional texture changes completely.

You will feel clever when you find the first bug. You will feel frustrated when the second one proves more elusive. By the fourth or fifth — the ones that require tracing state through multiple layers of concurrent logic — you will feel something closer to doubt. Have I built on a flawed foundation? Is there something fundamentally wrong that I am missing?

That doubt is useful. It is the part of the process that forces rigour. But it is also genuinely uncomfortable, and none of the material that introduces you to AI-assisted development prepares you for it.

The reason to persist through it — the only reason that actually works — is that the problems are comprehensible. Not easy. Not fast. But always, eventually, logical. Every bug has an explanation. Every explanation makes the system more legible than it was before. You come out knowing something real, not because you read about it, but because you had to find it.

The actual value AI provides

The dominant framing is efficiency. Do more in less time. Lower the barrier.

This gets it backwards.

AI does not make serious work faster. It makes more ambitious work possible. The ceiling rises. The scope of what one person can build — without a team, without years of prior training — expands significantly.

But the cost scales with the ambition. The higher you build, the more complex the failure modes, and the more demanding the verification work required to be confident in what you have.

If you are building something genuinely difficult, expect to spend nine times longer validating it than generating it. That is not a failure of your process. That is an accurate accounting of what serious work with AI actually costs.

The 1 is visible and exciting. The 9 is quiet, necessary, and where the real craft lives.
