Losing Control: When the Agent Goes Off-Plan
You asked for a bug fix. You got a refactor.
You asked for one endpoint. The agent rewrote three.
You said "confirm before running the migration." It ran the migration.
Three modes — drift, hallucination, premature execution — all produce code that looks plausible, runs, and does the wrong thing.
Three modes
flowchart LR
Intent[User intent] --> Agent
Agent --> S1[Scope creep]
Agent --> S2[Hallucination]
Agent --> S3[Premature execution]
S1 & S2 & S3 --> Ship[Code that ships]
Ship -.->|surprise| Intent
Scope creep. Charitable interpretation of your prompt → "improvements" you didn't ask for. "Fix the timezone bug" becomes "fix it + refactor the date utility + rename three functions + extract a new package."
Hallucination. Invented APIs, packages, file paths. The dangerous variant is plausible — a function name that matches the naming convention so closely that a reviewer skims past it.
Premature execution. Tool calls that have side effects: migrations, force-pushes, destructive deletes, emails. By the time you notice, the artefact exists.
What drift feels like
Agent: "I've updated the auth middleware to use the new token format. I also took the opportunity to centralise error logging across the package, and I noticed the old legacy_auth file was unused so I removed it."
User: "wait — what?"
Three steps from intent before you can pause.
Countermeasures
| Failure | Countermeasure | Where |
|---|---|---|
| Scope creep | Reviewed plan before code is written | Spec-Driven Workflow plan phase |
| Hallucination | Pre-fetched context; subagent can't grep outside the list | Subagents and Context Injection |
| Hallucinated facts | Quote-the-source rule | All phases |
| Silent gaps | Adversarial verify on the diff | Trust but Verify |
| Premature execution | Confirm-before-irreversible | Tool policy in system prompt |
The plan becomes the contract. Anything not in the plan is out of scope until the plan is revised.
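A tool-call gate for the scope-creep and premature-execution rows, as an added example (not code from the original page): writes outside the reviewed plan are denied, and irreversible calls pause until the user confirms that exact call. The tool names, the `Plan` fields, the sample paths and the shell heuristics are assumptions made for illustration.

```python
import posixpath
import shlex
from dataclasses import dataclass, field
from fnmatch import fnmatchcase

@dataclass
class Plan:
    allowed_paths: list                          # globs the reviewed plan may touch
    confirmed: set = field(default_factory=set)  # "tool:target" strings the user approved

def irreversible_shell(cmd):
    """Heuristic match for migrations, force-pushes and deletes (constructed rules)."""
    lex = shlex.shlex(cmd, posix=True, punctuation_chars=True)
    lex.whitespace_split = True                  # keeps ';' and '&&' as separate tokens
    try:
        argv = list(lex)
    except ValueError:
        return True                              # unbalanced quotes: fail closed
    flags = {a for a in argv if a.startswith("-")}
    force_push = ("git" in argv and "push" in argv
                  and bool(flags & {"-f", "--force", "--force-with-lease"}))
    return force_push or "rm" in argv or any("migrate" in a.lower() for a in argv)

def decide(plan, tool, target):
    """Return 'allow', 'confirm' (pause and ask the user) or 'deny' (not in the plan)."""
    if tool in ("write_file", "delete_file"):
        path = posixpath.normpath(target)        # collapse ../ before matching
        in_plan = any(fnmatchcase(path, g) for g in plan.allowed_paths)
        if path.startswith(("/", "..")) or not in_plan:
            return "deny"                        # scope creep: revise the plan first
        irreversible = tool == "delete_file"
    elif tool == "shell":
        irreversible = irreversible_shell(target)
    else:
        irreversible = True                      # unknown tool (e.g. send_email): fail closed
    if irreversible and f"{tool}:{target}" not in plan.confirmed:
        return "confirm"
    return "allow"
```

Matching on command text is a heuristic; the gate is stronger when side-effecting actions are exposed as separate structured tools rather than as free-form shell.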
Diagnosing drift after the fact
- Spec says it? If not, you didn't tell it — update spec, re-run from plan.
- Plan says it? If not, tighten task scope and the implementer's leash.
- Verify caught it? If no, sharpen the verify dimensions.
- Happened before? Add to known-failures.md. 3+ → constitution (see The Compounding Layer).
Drift in implementation is almost always a symptom of a leakier phase upstream.
Untyped trust is the same as trust in a hallucination.