Model Routing: Haiku, Sonnet, Opus

Updated Jun 9, 2026

A feature run can spawn 50 agent invocations. Running all of them on Opus means roughly 20× the cost and 5× the latency, with no marginal benefit on the easy ones. Running all of them on Haiku means silent failures on the hard ones. Route each task to the smallest model that gets it right.

Tiers

flowchart LR
    Task --> R{What kind?}
    R -->|lookup, grep, read| H[Haiku]
    R -->|docs, tests, bulk edit, straight impl| S[Sonnet]
    R -->|design, complex impl, review| O[Opus]

Tier	Best for	Worst at
Haiku	lookups, grep, mechanical reads, listings	judgement, design
Sonnet	well-specified impl, tests, docs, bulk edits	open-ended design
Opus	spec writing, plan/design, complex impl, adversarial review	throughput, cost

Use the next tier up when in doubt.
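The routing table and the "next tier up" rule can be sketched in a few lines. This is a minimal illustration, not the harness's actual code: the `Tier` enum, the `KIND_TO_TIER` keys, and the `route` function are hypothetical names, with task kinds copied from the flowchart above.

```python
from enum import IntEnum


class Tier(IntEnum):
    # Ordered smallest to largest so tiers can be compared and bumped.
    HAIKU = 1
    SONNET = 2
    OPUS = 3


# Task kinds copied from the flowchart; the string keys are illustrative labels.
KIND_TO_TIER = {
    "lookup": Tier.HAIKU,
    "grep": Tier.HAIKU,
    "read": Tier.HAIKU,
    "docs": Tier.SONNET,
    "tests": Tier.SONNET,
    "bulk edit": Tier.SONNET,
    "straight impl": Tier.SONNET,
    "design": Tier.OPUS,
    "complex impl": Tier.OPUS,
    "review": Tier.OPUS,
}


def next_tier_up(tier: Tier) -> Tier:
    """Return the next larger tier, capped at Opus."""
    return Tier(min(int(tier) + 1, int(Tier.OPUS)))


def route(kind: str, in_doubt: bool = False) -> Tier:
    """Pick a tier for a task kind; bump one tier when the caller is unsure."""
    if kind not in KIND_TO_TIER:
        raise ValueError(f"unknown task kind: {kind!r}")
    tier = KIND_TO_TIER[kind]
    return next_tier_up(tier) if in_doubt else tier
```

For example, `route("grep")` gives `Tier.HAIKU`, while `route("grep", in_doubt=True)` gives `Tier.SONNET`. Opus is the ceiling, so a doubtful review stays at Opus.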

Where it's encoded

Per phase — frontmatter on each command (model: opus). The harness reads it and routes the invocation.

Per task — @haiku / @sonnet / @opus tag on each line in tasks.md. /implement dispatches subagents at the suggested tier.
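A rough sketch of how both encodings might be read. The real harness and /implement parsers are not shown in this page, so everything here is an assumption: the function names, the `---` frontmatter delimiters, and the checkbox-style task lines in the usage note are all hypothetical.

```python
import re
from typing import Optional

# Matches the three tier tags named above; \b avoids matching e.g. "@opusx".
TAG_RE = re.compile(r"@(haiku|sonnet|opus)\b")


def tier_for_line(line: str) -> Optional[str]:
    """Return the tier tag on a tasks.md line, or None if the line is untagged."""
    match = TAG_RE.search(line)
    return match.group(1) if match else None


def frontmatter_model(text: str) -> Optional[str]:
    """Return the `model:` value from a leading frontmatter block, if any.

    Assumes the block is delimited by `---` lines (an assumption of this sketch).
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model":
            return value.strip()
    return None
```

With these helpers, `tier_for_line("- [ ] Locate router setup @haiku")` returns `"haiku"`, and `frontmatter_model("---\nmodel: opus\n---\n")` returns `"opus"`. Returning `None` for untagged input leaves the fallback policy to the caller.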

Worked example

Feature: add /health endpoint returning {status, deps[]}.

Task	Tier	Why
Locate router setup	@haiku	Pure lookup
Find dependency-check helpers	@haiku	grep-style
Design the response shape	@opus	Design decision; affects callers
Write the model struct	@sonnet	Mechanical given the design
Implement the handler	@sonnet	Straight impl
Wire the route	@sonnet	Mechanical
Write the integration test	@sonnet	Behaviour fully specified
Adversarial review	@opus	Judgement-heavy

Two Opus invocations, six Sonnet or Haiku. About 15% of tokens run at the Opus tier, and wall-clock time is roughly 4× faster than all-Opus, with the same correctness.
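The invocation counts can be checked by tallying the table. This sketch only counts invocations per tier. It does not reproduce the token or wall-clock figures, which depend on per-task token counts and latencies not listed here. The `WORKED_EXAMPLE` name is illustrative.

```python
from collections import Counter

# (task, tier) pairs copied from the worked-example table.
WORKED_EXAMPLE = [
    ("Locate router setup", "haiku"),
    ("Find dependency-check helpers", "haiku"),
    ("Design the response shape", "opus"),
    ("Write the model struct", "sonnet"),
    ("Implement the handler", "sonnet"),
    ("Wire the route", "sonnet"),
    ("Write the integration test", "sonnet"),
    ("Adversarial review", "opus"),
]

counts = Counter(tier for _, tier in WORKED_EXAMPLE)
opus_invocation_share = counts["opus"] / len(WORKED_EXAMPLE)

print(dict(counts))           # {'haiku': 2, 'opus': 2, 'sonnet': 4}
print(opus_invocation_share)  # 0.25
```

Opus handles a quarter of the invocations. The share of invocations and the share of tokens are different measures, so this tally is consistent with, but does not derive, the token figure.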

Heuristics

Output shape. Listing/extraction → Haiku. Specified function body → Sonnet. New design or cross-file → Opus.

Judgement required. None → Haiku. Bounded by the spec → Sonnet. About the spec, or spanning files → Opus.
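The two heuristics can be combined into one small function. This is a sketch under stated assumptions: the dictionary keys are hypothetical labels for the categories above, and taking the larger tier when the two heuristics disagree is a choice made here, in line with "use the next tier up when in doubt".

```python
from enum import IntEnum


class Tier(IntEnum):
    HAIKU = 1
    SONNET = 2
    OPUS = 3


# Heuristic 1: what the output looks like.
OUTPUT_SHAPE = {
    "listing": Tier.HAIKU,
    "extraction": Tier.HAIKU,
    "function_body": Tier.SONNET,  # a specified function body
    "new_design": Tier.OPUS,
    "cross_file": Tier.OPUS,
}

# Heuristic 2: how much judgement the task needs.
JUDGEMENT = {
    "none": Tier.HAIKU,
    "bounded_by_spec": Tier.SONNET,
    "about_spec": Tier.OPUS,
    "spans_files": Tier.OPUS,
}


def tier_from_heuristics(output_shape: str, judgement: str) -> Tier:
    """Apply both heuristics and keep the larger tier if they disagree."""
    return max(OUTPUT_SHAPE[output_shape], JUDGEMENT[judgement])
```

A listing that needs no judgement lands on Haiku. A specified function body whose correctness depends on questioning the spec lands on Opus, because the judgement heuristic outranks the output shape.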

Anti-patterns

All Opus: easy to set up, expensive to run. Wall-clock balloons on trivial subtasks.
All Haiku: cheap and wrong. Failures look like The One-Shot Problem: the surface passes, the semantics are off.
Trust the tag blindly: tags are hypotheses. Flaky @sonnet → bump to @opus. Trivial @opus → demote.
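Treating tags as hypotheses implies adjusting them from observed results. One possible sketch follows. The `retag` function, the `trivial` flag, and the 20% flakiness threshold are all assumptions made for illustration, not values from this page.

```python
TIERS = ["haiku", "sonnet", "opus"]  # smallest to largest


def retag(
    tag: str,
    failures: int,
    attempts: int,
    trivial: bool = False,
    flaky_threshold: float = 0.2,  # assumed cutoff; tune to your own runs
) -> str:
    """Suggest a new tier tag from observed results.

    Flaky at the current tier -> bump one tier up.
    Judged trivial with no failures -> demote one tier.
    Otherwise keep the tag.
    """
    i = TIERS.index(tag)
    if attempts > 0 and failures / attempts > flaky_threshold:
        return TIERS[min(i + 1, len(TIERS) - 1)]
    if trivial and failures == 0:
        return TIERS[max(i - 1, 0)]
    return tag
```

For instance, `retag("sonnet", failures=2, attempts=5)` suggests `"opus"`, and `retag("opus", failures=0, attempts=5, trivial=True)` suggests `"sonnet"`. Whether a task counts as trivial is left to the caller's judgement.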

Use the smallest model that's correct, not the biggest model you can afford.