Model Routing: Haiku, Sonnet, Opus
Model Routing: Haiku, Sonnet, Opus
A feature run can spawn 50 agent invocations. All Opus = 20× cost, 5× latency, no marginal benefit on the easy ones. All Haiku = silent failures on the hard ones. Route each task to the smallest model that's correct.
Tiers
flowchart LR
Task --> R{What kind?}
R -->|lookup, grep, read| H[Haiku]
R -->|docs, tests, bulk edit, straight impl| S[Sonnet]
R -->|design, complex impl, review| O[Opus]
| Tier | Best for | Worst at |
|---|---|---|
| Haiku | lookups, grep, mechanical reads, listings | judgement, design |
| Sonnet | well-specified impl, tests, docs, bulk edits | open-ended design |
| Opus | spec writing, plan/design, complex impl, adversarial review | throughput, cost |
Use the next tier up when in doubt.
Where it's encoded
Per phase — frontmatter on each command (model: opus). The harness reads it and routes the invocation.
Per task — @haiku / @sonnet / @opus tag on each line in tasks.md. /implement dispatches subagents at the suggested tier.
Worked example
Feature: add /health endpoint returning {status, deps[]}.
| Task | Tier | Why |
|---|---|---|
| Locate router setup | @haiku | Pure lookup |
| Find dependency-check helpers | @haiku | grep-style |
| Design the response shape | @opus | Design decision; affects callers |
| Write the model struct | @sonnet | Mechanical given the design |
| Implement the handler | @sonnet | Straight impl |
| Wire the route | @sonnet | Mechanical |
| Write the integration test | @sonnet | Behaviour fully specified |
| Adversarial review | @opus | Judgement-heavy |
Two Opus invocations. Six Sonnet/Haiku. ~15% of tokens at Opus tier. ~4× faster wall-clock than all-Opus, same correctness.
Heuristics
Output shape. Listing/extraction → Haiku. Specified function body → Sonnet. New design or cross-file → Opus.
Judgement required. None → Haiku. Bounded by the spec → Sonnet. About the spec, or spanning files → Opus.
Anti-patterns
- All Opus: cheap to do, expensive to run. Wall-clock balloons on trivial subtasks.
- All Haiku: cheap and wrong. Failures look like The One-Shot Problem — surface passes, semantics off.
- Trust the tag blindly: tags are hypotheses. Flaky @sonnet → bump to @opus. Trivial @opus → demote.
Use the smallest model that's correct, not the biggest model you can afford.