The One-Shot Problem

A task too big to fit in one prompt-response cycle, attempted in one anyway.

The agent produces something. It compiles. Surface tests pass. But pieces silently disappear: a requirement from paragraph three is missing, a constraint is ignored, an error path is absent.

The agent didn't lie. It ran out of context, attention, or tool-call budget, and shipped what fit.

Why it happens

flowchart LR
    Big[Big request] --> CW[Context pressure]
    Big --> AD[Attention dilution]
    Big --> TB[Tool-call budget]
    CW & AD & TB --> Out[Plausible but incomplete]

  • Context pressure: spec, code, conventions, tests all compete for the same tokens. Past a threshold, things drop out silently.
  • Attention dilution: earlier instructions get less weight as later ones pile on.
  • Tool-call budget: every grep/read/edit is a round trip. Once the agent hits the ceiling, it shortcuts whatever is left.

The classic symptom

PR reads complete. Diff covers the expected files. Read closely:

  • An enum case isn't handled.
  • A retry path is referenced in a comment but never implemented.
  • The happy-path test exists; four error paths from the spec don't.
  • A dependency is imported but never installed.
  • Code labelled "idempotent" isn't.

Failure is quiet.

What doesn't work

"Try harder." "Use a bigger model." "Add more tests." "Read carefully."

A bigger model pushes the ceiling out, but the ceiling is still there, and cost grows faster than the ceiling does. The other three are rework on the human side.

What works

Break the work into pieces small enough that each one is actually one-shottable.

flowchart LR
    Big[Big request] --> S1[specify]
    S1 --> S2[clarify]
    S2 --> S3[plan]
    S3 --> S4[tasks]
    S4 --> S5[implement]
    S5 --> S6[verify]

Each box is a separate invocation. Each has a tight prompt and fresh context. The agent that writes the spec is not the agent that writes the code, and the agent that writes the code touches one task at a time. This is the whole basis of the Spec-Driven Workflow.
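The stage chain above can be sketched as a small driver. This is a minimal illustration, not the workflow's actual tooling: `run_pipeline`, the `Agent` signature, and `fake_agent` are hypothetical names, and a real setup would replace `fake_agent` with a fresh model session per call.

```python
from typing import Callable

# Hypothetical signature: (stage name, prompt) -> output text.
# Every call is assumed to start from an empty context.
Agent = Callable[[str, str], str]


def run_pipeline(request: str, call_agent: Agent) -> str:
    """Run each stage as its own invocation, passing only the previous artifact."""
    spec = call_agent("specify", request)
    clarified = call_agent("clarify", spec)
    plan = call_agent("plan", clarified)
    task_list = [t for t in call_agent("tasks", plan).splitlines() if t.strip()]
    # One invocation per task: each sees its own task text and nothing else.
    changes = [call_agent("implement", task) for task in task_list]
    # Verify sees the clarified spec and the changes, not the chat that produced them.
    return call_agent("verify", clarified + "\n" + "\n".join(changes))


calls: list[tuple[str, str]] = []


def fake_agent(stage: str, prompt: str) -> str:
    # Stand-in for a real model call; records what each invocation was given.
    calls.append((stage, prompt))
    if stage == "tasks":
        return "add retry wrapper\nadd backoff test"
    return f"<{stage} output>"


report = run_pipeline("Add retry with backoff to the uploader", fake_agent)
```

Inspecting `calls` afterwards shows two separate `implement` invocations, each holding a single task as its whole prompt.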

Logical "runs"

A run is one coherent batch — one feature branch, one PR, one design decision. Each run has a strict envelope: spec, plan, tasks, verify gate.

  • Task = one-shottable.
  • Run = graph of tasks.
  • A run that grows beyond its envelope is the smell. Split it.

Adding "while you're at it" tasks mid-conversation is one-shot territory. Stop, scope, split.
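One way to picture "run = graph of tasks" is a dependency map whose task set is fixed at planning time. The sketch below is an assumption for illustration only: the `Run` class, `OutOfEnvelope`, and the task names are hypothetical, and the ordering uses Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter


class OutOfEnvelope(Exception):
    """Raised when work is added to a run after its scope was fixed."""


class Run:
    def __init__(self, name: str, tasks: dict[str, set[str]]) -> None:
        self.name = name
        # Maps each task to the tasks it depends on; fixed once planning ends.
        self.tasks = dict(tasks)

    def order(self) -> list[str]:
        """Return the tasks in an order that respects their dependencies."""
        return list(TopologicalSorter(self.tasks).static_order())

    def add_task(self, task: str) -> None:
        # "While you're at it" work is refused here: stop, scope, split.
        raise OutOfEnvelope(f"{task!r} is outside run {self.name!r}; open a new run")


run = Run(
    "uploader-retry",
    {"schema": set(), "handler": {"schema"}, "tests": {"handler"}},
)
ordered = run.order()
```

Refusing `add_task` outright is deliberately strict. A softer variant could queue the request as the seed of the next run.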

Signs you're about to ship a one-shot

  • Prompt grows mid-conversation ("...and also...").
  • Diff touches more than ~3 logical components.
  • Plan covers the whole feature in a single sketch.
  • Task list has fewer than ~5 items for non-trivial work.
  • Implement ends in one big commit.
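The checklist above could be turned into a pre-merge lint. This is a rough sketch under stated assumptions: `RunStats` and `one_shot_signs` are hypothetical names, the inputs are assumed to be collected by hand or by some other tool, and the thresholds 3 and 5 are the approximate values from the list, not hard limits.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    prompt_followups: int        # "...and also..." additions made mid-conversation
    components_touched: int      # logical components in the diff
    plan_is_single_sketch: bool  # plan covers the whole feature in one sketch
    task_count: int
    commit_count: int
    non_trivial: bool = True


def one_shot_signs(stats: RunStats) -> list[str]:
    """Return the warning signs that apply; an empty list means none were found."""
    signs = []
    if stats.prompt_followups > 0:
        signs.append("prompt grew mid-conversation")
    if stats.components_touched > 3:
        signs.append("diff touches more than ~3 logical components")
    if stats.plan_is_single_sketch:
        signs.append("plan covers the whole feature in a single sketch")
    if stats.non_trivial and stats.task_count < 5:
        signs.append("task list has fewer than ~5 items")
    if stats.commit_count <= 1:
        signs.append("implement ended in one big commit")
    return signs


risky = one_shot_signs(RunStats(2, 6, True, 2, 1))
healthy = one_shot_signs(RunStats(0, 2, False, 7, 7))
```

Any non-empty result is a prompt to stop and split, not an automatic rejection.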

Recovering after the fact

Run verify against the diff with the original spec. Don't ask the same agent to "patch the gaps" — that produces a second one-shot on top of the first. Re-enter at tasks. Add the missing items as proper tasks with proper context.
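The recovery step can be sketched as a set difference between what the spec requires and what verify found in the diff. Everything here is illustrative: the requirement IDs, the `covered` set, and the `gaps_to_tasks` helper are hypothetical, and producing `covered` is assumed to be the verify stage's job.

```python
def gaps_to_tasks(spec: dict[str, str], covered: set[str]) -> list[dict[str, str]]:
    """Turn each uncovered requirement into a task carrying its own context."""
    return [
        {"id": req_id, "context": text}
        for req_id, text in spec.items()
        if req_id not in covered
    ]


# Hypothetical spec, keyed by requirement ID.
spec = {
    "R1": "Upload succeeds for files under the size limit.",
    "R2": "Retry on timeout with backoff.",
    "R3": "Reject files over the size limit with a clear error.",
}
covered = {"R1"}  # assumed output of verify: requirements the diff satisfies

new_tasks = gaps_to_tasks(spec, covered)
```

Each entry in `new_tasks` then re-enters the flow at the tasks stage and is implemented in its own invocation, with no "patch the gaps" prompt to the original agent.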

Small enough to one-shot, structured enough to verify.