Model Routing: Haiku, Sonnet, Opus

A feature run can spawn 50 agent invocations. All Opus = 20× cost, 5× latency, no marginal benefit on the easy ones. All Haiku = silent failures on the hard ones. Route each task to the smallest model that's correct.

Tiers

flowchart LR
    Task --> R{What kind?}
    R -->|lookup, grep, read| H[Haiku]
    R -->|docs, tests, bulk edit, straight impl| S[Sonnet]
    R -->|design, complex impl, review| O[Opus]

| Tier | Best for | Worst at |
|------|----------|----------|
| Haiku | lookups, grep, mechanical reads, listings | judgement, design |
| Sonnet | well-specified impl, tests, docs, bulk edits | open-ended design |
| Opus | spec writing, plan/design, complex impl, adversarial review | throughput, cost |

Use the next tier up when in doubt.
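
The flowchart and table above can be read as a lookup from task kind to tier. A minimal sketch in Python, assuming hypothetical kind labels copied from the flowchart edges and a hypothetical `in_doubt` flag; none of these names come from the harness itself:

```python
# Tiers ordered from smallest to largest model.
TIERS = ["haiku", "sonnet", "opus"]

# Hypothetical task-kind labels, taken from the flowchart edges.
KIND_TO_TIER = {
    "lookup": "haiku", "grep": "haiku", "read": "haiku",
    "docs": "sonnet", "tests": "sonnet",
    "bulk edit": "sonnet", "straight impl": "sonnet",
    "design": "opus", "complex impl": "opus", "review": "opus",
}

def route(kind: str, in_doubt: bool = False) -> str:
    """Return the tier for a task kind, moving one tier up when in doubt."""
    index = TIERS.index(KIND_TO_TIER[kind])  # KeyError on an unknown kind
    if in_doubt:
        # "Use the next tier up when in doubt", capped at the top tier.
        index = min(index + 1, len(TIERS) - 1)
    return TIERS[index]
```

For example, `route("grep")` returns `"haiku"`, while `route("grep", in_doubt=True)` returns `"sonnet"`.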

Where it's encoded

Per phase: frontmatter on each command (model: opus). The harness reads it and routes the invocation.

Per task: a @haiku / @sonnet / @opus tag on each line in tasks.md. /implement dispatches subagents at the suggested tier.
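
To illustrate the per-task tags, here is a rough sketch of how a dispatcher might read the tier off a tasks.md line. The tag syntax (a literal `@haiku`, `@sonnet`, or `@opus` somewhere on the line), the checklist-style line format, and the `sonnet` fallback for untagged lines are all assumptions for illustration, not the actual behaviour of /implement:

```python
import re

# Assumed tag syntax: @haiku, @sonnet, or @opus anywhere on the task line.
TAG_RE = re.compile(r"@(haiku|sonnet|opus)\b")

def tier_for_line(line: str, default: str = "sonnet") -> str:
    """Return the tier tag on a tasks.md line, or a fallback if untagged.

    The fallback tier is a hypothetical choice, not documented behaviour.
    """
    match = TAG_RE.search(line)
    return match.group(1) if match else default
```

With these assumptions, `tier_for_line("- [ ] Locate router setup @haiku")` returns `"haiku"`, and an untagged line falls back to `"sonnet"`.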

Worked example

Feature: add /health endpoint returning {status, deps[]}.

| Task | Tier | Why |
|------|------|-----|
| Locate router setup | @haiku | Pure lookup |
| Find dependency-check helpers | @haiku | grep-style |
| Design the response shape | @opus | Design decision; affects callers |
| Write the model struct | @sonnet | Mechanical given the design |
| Implement the handler | @sonnet | Straight impl |
| Wire the route | @sonnet | Mechanical |
| Write the integration test | @sonnet | Behaviour fully specified |
| Adversarial review | @opus | Judgement-heavy |

Two Opus invocations. Six Sonnet/Haiku. ~15% of tokens at Opus tier. ~4× faster wall-clock than all-Opus, same correctness.
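
The invocation counts can be checked with a short tally over the worked example. This sketch only counts invocations per tier; it does not model token usage or latency, so it cannot reproduce the token-share or wall-clock figures, which depend on measurements not listed here:

```python
from collections import Counter

# (task, tier) pairs copied from the worked example.
tasks = [
    ("Locate router setup", "haiku"),
    ("Find dependency-check helpers", "haiku"),
    ("Design the response shape", "opus"),
    ("Write the model struct", "sonnet"),
    ("Implement the handler", "sonnet"),
    ("Wire the route", "sonnet"),
    ("Write the integration test", "sonnet"),
    ("Adversarial review", "opus"),
]

counts = Counter(tier for _, tier in tasks)
opus_invocations = counts["opus"]
smaller_invocations = counts["sonnet"] + counts["haiku"]
opus_share = opus_invocations / len(tasks)  # share of invocations, not tokens
```

By invocation count, 2 of 8 tasks run at the Opus tier and 6 at Sonnet or Haiku.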

Heuristics

Output shape. Listing/extraction → Haiku. Specified function body → Sonnet. New design or cross-file → Opus.

Judgement required. None → Haiku. Bounded by the spec → Sonnet. About the spec, or spanning files → Opus.
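
The two heuristics can be combined into one decision. A minimal sketch, assuming hypothetical string labels for each axis and assuming that when the two heuristics disagree, the higher tier wins (a choice made here in the spirit of "use the next tier up when in doubt", not something the heuristics state):

```python
TIERS = ["haiku", "sonnet", "opus"]  # smallest to largest

# Hypothetical labels for the "output shape" heuristic.
OUTPUT_SHAPE = {
    "listing": "haiku", "extraction": "haiku",
    "function body": "sonnet",
    "new design": "opus", "cross-file": "opus",
}

# Hypothetical labels for the "judgement required" heuristic.
JUDGEMENT = {
    "none": "haiku",
    "bounded by spec": "sonnet",
    "about the spec": "opus", "spans files": "opus",
}

def pick_tier(shape: str, judgement: str) -> str:
    """Apply both heuristics and keep the higher tier (an assumption)."""
    candidates = [OUTPUT_SHAPE[shape], JUDGEMENT[judgement]]
    return max(candidates, key=TIERS.index)
```

For instance, `pick_tier("function body", "about the spec")` returns `"opus"`, because the judgement axis outranks the output shape.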

Anti-patterns

  • All Opus: cheap to do, expensive to run. Wall-clock balloons on trivial subtasks.
  • All Haiku: cheap and wrong. Failures look like The One-Shot Problem — surface passes, semantics off.
  • Trust the tag blindly: tags are hypotheses. Flaky @sonnet → bump to @opus. Trivial @opus → demote.
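
Treating tags as hypotheses implies some way to revise them. One possible sketch, assuming hypothetical `flaky` and `trivial` signals observed after a run, and assuming a move of exactly one tier in either direction:

```python
TIERS = ["haiku", "sonnet", "opus"]  # smallest to largest

def adjust_tag(tier: str, flaky: bool = False, trivial: bool = False) -> str:
    """Revise a tier tag after observing how the task actually went.

    `flaky` and `trivial` are hypothetical signals; how they are detected
    is left open. Flakiness takes priority if both are set.
    """
    index = TIERS.index(tier)
    if flaky:
        index = min(index + 1, len(TIERS) - 1)  # bump, capped at opus
    elif trivial:
        index = max(index - 1, 0)  # demote, floored at haiku
    return TIERS[index]
```

Here `adjust_tag("sonnet", flaky=True)` returns `"opus"` and `adjust_tag("opus", trivial=True)` returns `"sonnet"`.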

Use the smallest model that's correct, not the biggest model you can afford.