Losing Control: When the Agent Goes Off-Plan

Losing Control: When the Agent Goes Off-Plan

You asked for a bug fix. You got a refactor.
You asked for one endpoint. The agent rewrote three.
You said "confirm before running the migration." It ran the migration.

Three modes — drift, hallucination, premature execution — all produce code that looks plausible, runs, and does the wrong thing.

Three modes

flowchart LR
    Intent[User intent] --> Agent
    Agent --> S1[Scope creep]
    Agent --> S2[Hallucination]
    Agent --> S3[Premature execution]
    S1 & S2 & S3 --> Ship[Code that ships]
    Ship -.->|surprise| Intent

Scope creep. Charitable interpretation of your prompt → "improvements" you didn't ask for. "Fix the timezone bug" becomes "fix it + refactor the date utility + rename three functions + extract a new package."

Hallucination. Invented APIs, packages, file paths. The dangerous variant is plausible — a function name that matches the naming convention so closely that a reviewer skims past it.

Premature execution. Tool calls that have side effects: migrations, force-pushes, destructive deletes, emails. By the time you notice, the artefact exists.

What drift feels like

Agent: "I've updated the auth middleware to use the new token format. I also took the opportunity to centralise error logging across the package, and I noticed the old legacy_auth file was unused so I removed it."
User: "wait — what?"

Three steps from intent before you can pause.

Countermeasures

Failure Countermeasure Where
Scope creep Reviewed plan before code is written Spec-Driven Workflow plan phase
Hallucination Pre-fetched context; subagent can't grep outside the list Subagents and Context Injection
Hallucinated facts Quote-the-source rule All phases
Silent gaps Adversarial verify on the diff Trust but Verify
Premature execution Confirm-before-irreversible Tool policy in system prompt

The plan becomes the contract. Anything not in the plan is out of scope until the plan is revised.

Diagnosing drift after the fact

  1. Spec says it? If not, you didn't tell it — update spec, re-run from plan.
  2. Plan says it? If not, tighten task scope and the implementer's leash.
  3. Verify caught it? If no, sharpen the verify dimensions.
  4. Happened before? Add to known-failures.md. 3+ → constitution (see The Compounding Layer).

Drift in implementation is almost always a symptom of a leakier phase upstream.

Untyped trust is the same as trust in a hallucination.