The One-Shot Problem

Updated Jun 9, 2026

A task too big to fit in one prompt-response cycle, attempted in one anyway.

The agent produces something. It compiles. Surface tests pass. But pieces silently disappear: a requirement from paragraph three is missing, a constraint is ignored, an error path is absent.

The agent didn't lie. It ran out of context, attention, or tool-call budget, and shipped what fit.

Why it happens

flowchart LR
    Big[Big request] --> CW[Context pressure]
    Big --> AD[Attention dilution]
    Big --> TB[Tool-call budget]
    CW & AD & TB --> Out[Plausible but incomplete]

Context pressure: spec, code, conventions, tests all compete for the same tokens. Past a threshold things disappear, silently.
Attention dilution: earlier instructions get less weight as later ones pile on.
Tool-call budget: every grep/read/edit is a round trip. Once it hits the ceiling, the agent shortcuts the rest.
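
To make context pressure concrete, here is a minimal sketch of a pre-flight budget check. The 4-characters-per-token ratio, the `window` size, and the `reserve` fraction are illustrative assumptions, not properties of any particular model or tokenizer.

```python
from dataclasses import dataclass

# Assumption: ~4 characters per token is a rough rule of thumb for English text.
# A real tokenizer would give different counts.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


@dataclass
class BudgetReport:
    used: int
    limit: int
    per_input: dict[str, int]

    @property
    def fits(self) -> bool:
        return self.used <= self.limit


def check_context_budget(
    inputs: dict[str, str], window: int, reserve: float = 0.5
) -> BudgetReport:
    """Sum estimated tokens for every input competing for the same window.

    `reserve` is a hypothetical fraction of the window kept free for the
    agent's own reasoning, tool results, and output.
    """
    per_input = {name: estimate_tokens(text) for name, text in inputs.items()}
    limit = int(window * (1 - reserve))
    return BudgetReport(used=sum(per_input.values()), limit=limit, per_input=per_input)
```

Calling `check_context_budget({"spec": spec, "code": code, "tests": tests}, window=...)` before a request gives a rough early signal: if `fits` is false, the request is a candidate for splitting rather than for a longer prompt.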

The classic symptom

PR reads complete. Diff covers the expected files. Read closely:

An enum case isn't handled.
A retry path is referenced in a comment but never implemented.
The happy-path test exists; four error paths from the spec don't.
A dependency is imported but never installed.
Code labelled "idempotent" isn't.

Failure is quiet.

What doesn't work

"Try harder." "Use a bigger model." "Add more tests." "Read carefully."

Bigger models push the ceiling out, but there's still a ceiling, and cost grows faster than the ceiling does. The other three are human-side rework.

What works

Break the work into pieces small enough that each one is actually one-shottable.

flowchart LR
    Big[Big request] --> S1[specify]
    S1 --> S2[clarify]
    S2 --> S3[plan]
    S3 --> S4[tasks]
    S4 --> S5[implement]
    S5 --> S6[verify]

Each box is a separate invocation. Each has a tight prompt and fresh context. The agent that writes the spec is not the agent that writes the code, and the agent that writes the code touches one task at a time. This is the whole basis of the Spec-Driven Workflow.
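
The staging above can be sketched as a small driver. This is a sketch under stated assumptions, not the Spec-Driven Workflow's actual tooling: `call_agent` is a hypothetical callable standing in for whatever starts a new agent invocation with fresh context, and the "one task per line" format of the tasks artifact is invented for illustration.

```python
from typing import Callable

# Hypothetical signature: (stage_name, input_artifact) -> output_artifact.
# Each call is assumed to start a new invocation with no shared history.
AgentCall = Callable[[str, str], str]

# Stages that each run once, in order, per the flowchart above.
PLANNING_STAGES = ["specify", "clarify", "plan", "tasks"]


def run_pipeline(request: str, call_agent: AgentCall) -> dict[str, object]:
    """Run every stage as its own invocation, passing artifacts, not chat history."""
    artifacts: dict[str, object] = {}
    current = request
    for stage in PLANNING_STAGES:
        current = call_agent(stage, current)
        artifacts[stage] = current

    # Assumption: the tasks artifact lists one task per non-empty line.
    tasks = [line.strip() for line in current.splitlines() if line.strip()]

    # The implementing agent sees exactly one task per invocation.
    results = [call_agent("implement", task) for task in tasks]
    artifacts["implement"] = results

    # Verify receives the original spec plus what was actually produced.
    verify_input = str(artifacts["specify"]) + "\n" + "\n".join(results)
    artifacts["verify"] = call_agent("verify", verify_input)
    return artifacts
```

The point of the design is what each call does not receive: the implement step never sees the whole conversation, only its own task.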

Logical "runs"

A run is one coherent batch — one feature branch, one PR, one design decision. Each run has a strict envelope: spec, plan, tasks, verify gate.

Task = one-shottable.
Run = graph of tasks.
A run that grows beyond its envelope is the smell. Split it.
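
One way to picture "run = graph of tasks" is the sketch below. The `Task` and `Run` classes and the `max_tasks` envelope are hypothetical names introduced for illustration; only `graphlib.TopologicalSorter` is a real standard-library API (Python 3.9+).

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Task:
    name: str
    depends_on: list[str] = field(default_factory=list)


@dataclass
class Run:
    name: str
    tasks: list[Task]
    max_tasks: int = 12  # hypothetical envelope size; tune per team

    def order(self) -> list[str]:
        """Return task names so that every task follows its dependencies."""
        graph = {t.name: set(t.depends_on) for t in self.tasks}
        # static_order() raises graphlib.CycleError if the tasks form a cycle.
        return list(TopologicalSorter(graph).static_order())

    def exceeds_envelope(self) -> bool:
        """True when the run has grown past its envelope and should be split."""
        return len(self.tasks) > self.max_tasks
```

A "while you're at it" request then becomes an explicit `Task` appended to the run, which makes envelope growth visible instead of hiding it inside a longer prompt.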

Adding "while you're at it" tasks mid-conversation is one-shot territory. Stop, scope, split.

Signs you're about to ship a one-shot

Prompt grows mid-conversation ("...and also...").
Diff touches more than ~3 logical components.
Plan covers the whole feature in a single sketch.
Task list has fewer than ~5 items for non-trivial work.
Implement ends in one big commit.
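
Most of the signs above can be checked mechanically. The following is a rough sketch: `RunSnapshot` and its fields are hypothetical, the thresholds mirror the approximate numbers in the list, and the "single sketch" plan sign is left out because it needs human judgment.

```python
from dataclasses import dataclass

MAX_COMPONENTS = 3  # from "more than ~3 logical components"
MIN_TASKS = 5       # from "fewer than ~5 items for non-trivial work"


@dataclass
class RunSnapshot:
    prompt_additions: int    # "...and also..." requests added mid-conversation
    components_touched: int  # logical components in the diff
    task_count: int
    commit_count: int
    nontrivial: bool = True


def one_shot_warnings(s: RunSnapshot) -> list[str]:
    """Return a human-readable warning for each sign that is present."""
    warnings = []
    if s.prompt_additions > 0:
        warnings.append("prompt grew mid-conversation")
    if s.components_touched > MAX_COMPONENTS:
        warnings.append("diff touches too many components")
    if s.nontrivial and s.task_count < MIN_TASKS:
        warnings.append("task list too short for non-trivial work")
    if s.nontrivial and s.commit_count <= 1:
        warnings.append("implementation landed in one big commit")
    return warnings
```

An empty list does not prove the run is healthy; it only means none of the cheap signals fired.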

Recovering after the fact

Run verify against the diff with the original spec. Don't ask the same agent to "patch the gaps" — that produces a second one-shot on top of the first. Re-enter at tasks. Add the missing items as proper tasks with proper context.
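
As a rough illustration of re-entering at tasks, the sketch below turns gaps into new task descriptions. It assumes a convention this document does not define: that each spec requirement carries an ID such as `REQ-3` and that the diff mentions the IDs it covers. An ID appearing in the diff does not prove the requirement is implemented, so treat this as a cheap first filter before a real verify pass.

```python
import re

# Hypothetical ID convention; adapt the pattern to whatever the spec uses.
REQ_ID = re.compile(r"\bREQ-\d+\b")


def missing_requirements(spec: str, diff: str) -> list[str]:
    """Requirement IDs present in the spec but never mentioned in the diff."""
    required = sorted(set(REQ_ID.findall(spec)), key=lambda r: int(r.split("-")[1]))
    covered = set(REQ_ID.findall(diff))
    return [req_id for req_id in required if req_id not in covered]


def gaps_to_tasks(missing: list[str]) -> list[str]:
    """Turn each gap into its own task description instead of one 'patch the gaps' prompt."""
    return [f"Implement and test {req_id} per the original spec" for req_id in missing]
```

Each resulting task then goes through implement and verify on its own, with the relevant spec excerpt as context.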

Small enough to one-shot, structured enough to verify.