The One-Shot Problem
A task too big to fit in one prompt-response cycle, attempted in one anyway.
The agent produces something. It compiles. Surface tests pass. But pieces silently disappear: a requirement from paragraph three is missing, a constraint is ignored, an error path is absent.
The agent didn't lie. It ran out of context, attention, or tool-call budget — and shipped what fit.
Why it happens
flowchart LR
Big[Big request] --> CW[Context pressure]
Big --> AD[Attention dilution]
Big --> TB[Tool-call budget]
CW & AD & TB --> Out[Plausible but incomplete]
- Context pressure: spec, code, conventions, tests all compete for the same tokens. Past a threshold things disappear, silently.
- Attention dilution: earlier instructions get less weight as later ones pile on.
- Tool-call budget: every grep/read/edit is a round trip. Once it hits the ceiling, the agent shortcuts the rest.
The classic symptom
PR reads complete. Diff covers the expected files. Read closely:
- An enum case isn't handled.
- A retry path is referenced in a comment but never implemented.
- The happy-path test exists; four error paths from the spec don't.
- A dependency is imported but never installed or declared in the project's dependencies.
- Code labelled "idempotent" isn't.
Failure is quiet.
What doesn't work
"Try harder." "Use a bigger model." "Add more tests." "Read carefully."
Bigger models push the ceiling out, but there's still a ceiling, and cost grows faster than the ceiling does. The other three push the rework back onto the human.
What works
Break the work into pieces small enough that each one is actually one-shottable.
flowchart LR
Big[Big request] --> S1[specify]
S1 --> S2[clarify]
S2 --> S3[plan]
S3 --> S4[tasks]
S4 --> S5[implement]
S5 --> S6[verify]
Each box is a separate invocation. Each has a tight prompt and fresh context. The agent that writes the spec is not the agent that writes the code, and the agent that writes the code touches one task at a time. This is the whole basis of the Spec-Driven Workflow.
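The "separate invocation, fresh context" idea can be sketched in a few lines. This is a minimal illustration, not the workflow's actual implementation: the stage names come from the diagram, while `STAGE_INPUTS`, `Invocation`, `run_pipeline`, and the stub agent are hypothetical names, and the choice of which artifacts feed each stage is an assumption.

```python
from dataclasses import dataclass
from typing import Callable

# Stage names follow the diagram. Which artifacts feed each stage is an
# assumption made for this sketch.
STAGE_INPUTS: dict[str, list[str]] = {
    "specify": ["request"],
    "clarify": ["specify"],
    "plan": ["clarify"],
    "tasks": ["plan"],
    "implement": ["tasks"],
    "verify": ["clarify", "implement"],  # clarified spec plus the resulting change
}


@dataclass(frozen=True)
class Invocation:
    stage: str
    prompt: str  # everything this invocation is allowed to see


def run_pipeline(request: str, call_agent: Callable[[Invocation], str]) -> dict[str, str]:
    """Run every stage as its own invocation with only its declared inputs."""
    artifacts = {"request": request}
    for stage, inputs in STAGE_INPUTS.items():
        # Fresh context: the prompt is rebuilt from named artifacts,
        # never from an accumulated conversation.
        context = "\n\n".join(f"{name}:\n{artifacts[name]}" for name in inputs)
        artifacts[stage] = call_agent(Invocation(stage, f"Stage: {stage}\n\n{context}"))
    del artifacts["request"]
    return artifacts


def stub_agent(invocation: Invocation) -> str:
    """Stand-in for a real agent call; it only labels its output."""
    return f"{invocation.stage}-output"


result = run_pipeline("Add rate limiting to the API", stub_agent)
print(list(result))
```

In a real setup the implement stage would presumably loop over the task list and issue one invocation per task. The sketch collapses that into a single call to stay short.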
Logical "runs"
A run is one coherent batch — one feature branch, one PR, one design decision. Each run has a strict envelope: spec, plan, tasks, verify gate.
- Task = one-shottable.
- Run = graph of tasks.
- A run that grows beyond its envelope is the smell. Split it.
Adding "while you're at it" tasks mid-conversation is one-shot territory. Stop, scope, split.
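One way to picture "run = graph of tasks" with a strict envelope is a small data structure that refuses unplanned work. This is a sketch under assumptions: `Task`, `Run`, and the `planned` field are hypothetical, and the envelope is modelled simply as the set of task names fixed when the plan was approved.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Task:
    name: str
    depends_on: list[str] = field(default_factory=list)


@dataclass
class Run:
    name: str
    planned: frozenset[str]  # assumed envelope: task names fixed at plan time
    tasks: dict[str, Task] = field(default_factory=dict)

    def add(self, task: Task) -> None:
        # A "while you're at it" task is not in the plan, so it is rejected.
        if task.name not in self.planned:
            raise ValueError(
                f"'{task.name}' is outside the envelope of run '{self.name}'; "
                "split it into a new run"
            )
        self.tasks[task.name] = task

    def order(self) -> list[str]:
        """Return task names with every dependency before its dependents."""
        graph = {t.name: set(t.depends_on) for t in self.tasks.values()}
        return list(TopologicalSorter(graph).static_order())


run = Run("rate-limiting", planned=frozenset({"token-bucket", "middleware", "error-paths"}))
run.add(Task("token-bucket"))
run.add(Task("middleware", depends_on=["token-bucket"]))
run.add(Task("error-paths", depends_on=["middleware"]))
print(run.order())  # ['token-bucket', 'middleware', 'error-paths']
```

Calling `run.add(Task("refactor-logging"))` here raises `ValueError`, which is the code-level version of "stop, scope, split".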
Signs you're about to ship a one-shot
- Prompt grows mid-conversation ("...and also...").
- Diff touches more than ~3 logical components.
- Plan covers the whole feature in a single sketch.
- Task list has fewer than ~5 items for non-trivial work.
- Implement ends in one big commit.
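The signs above can be turned into a rough pre-merge check. The sketch below is illustrative only: `RunStats` and its fields are hypothetical, how each number gets measured is left open, and the approximate thresholds from the list (~3 components, ~5 tasks) are treated as hard cutoffs purely for the example.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    prompt_additions: int    # "...and also..." follow-ups after the first prompt
    components_touched: int  # logical components in the diff
    plan_sections: int       # 1 means the plan is a single sketch
    task_count: int
    commit_count: int        # commits produced by the implement step
    nontrivial: bool = True


def one_shot_signs(stats: RunStats) -> list[str]:
    """Return the warning signs that apply; an empty list means none fired."""
    signs = []
    if stats.prompt_additions > 0:
        signs.append("prompt grew mid-conversation")
    if stats.components_touched > 3:
        signs.append("diff touches more than ~3 logical components")
    if stats.plan_sections <= 1:
        signs.append("plan covers the whole feature in a single sketch")
    if stats.nontrivial and stats.task_count < 5:
        signs.append("task list has fewer than ~5 items")
    if stats.commit_count == 1:
        signs.append("implement ended in one big commit")
    return signs


suspect = RunStats(prompt_additions=2, components_touched=6,
                   plan_sections=1, task_count=2, commit_count=1)
for sign in one_shot_signs(suspect):
    print("-", sign)
```

A result with several entries does not prove the change is incomplete. It suggests the run should be re-scoped before review.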
Recovering after the fact
Run verify against the diff with the original spec. Don't ask the same agent to "patch the gaps" — that produces a second one-shot on top of the first. Re-enter at tasks. Add the missing items as proper tasks with proper context.
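Re-entering at tasks can be sketched as a pure function from "what verify found uncovered" to new task records. Everything named here is hypothetical: `Requirement`, `FollowUpTask`, and `gaps_to_tasks` are illustrative, and `covered_ids` is assumed to come from a verify step whose mechanics are outside this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    req_id: str
    text: str


@dataclass(frozen=True)
class FollowUpTask:
    title: str
    context: str  # the requirement text travels with the task


def gaps_to_tasks(spec: list[Requirement], covered_ids: set[str]) -> list[FollowUpTask]:
    """Create one task per requirement that verify did not find in the diff."""
    return [
        FollowUpTask(title=f"Cover {req.req_id}", context=req.text)
        for req in spec
        if req.req_id not in covered_ids
    ]


spec = [
    Requirement("R1", "Return 429 when the limit is exceeded."),
    Requirement("R2", "Retry with backoff on upstream timeout."),
    Requirement("R3", "Handle every enum case of LimitScope."),
]
# Assumed verify result: only R1 was found in the diff.
tasks = gaps_to_tasks(spec, covered_ids={"R1"})
print([t.title for t in tasks])  # ['Cover R2', 'Cover R3']
```

Each gap becomes its own task with its own context, so it can be implemented in a fresh invocation instead of being patched onto the original conversation.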
Small enough to one-shot, structured enough to verify.