Spec-Driven Workflow

Updated Jun 9, 2026

The pipeline that turns one ambiguous request into a chain of small, one-shottable steps.

The pipeline

```mermaid
flowchart TD
    A[Idea] --> B{Small?}
    B -- yes --> Q[/quickfix/]
    B -- no --> S[/specify/]
    S --> CL[/clarify/]
    CL --> P[/plan/]
    P --> R[Pre-Task Review]
    R --> T[/tasks/]
    T --> I[/implement/]
    I --> V[/verify/]
    V -- BLOCKER --> I
    V -- PASS --> RE[/retrospective/]
    RE --> EX[/extract-skills/]
    EX --> KF{3+ repeats?}
    KF -- yes --> CON[/constitution/]
    KF -- no --> END[Done]
    Q --> V
```

Each command is a separate invocation with its own model tier and its own focused context.
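The flowchart is a small state machine, and writing it down as one makes its two loops explicit: verify back to implement on a BLOCKER, and extract-skills into constitution on repeated failures. A minimal Go sketch, assuming hypothetical `Phase` and `RunState` types that are not part of the actual commands:

```go
package main

import "fmt"

// Phase names mirror the commands and nodes in the flowchart above.
type Phase string

const (
	Quickfix      Phase = "quickfix"
	Specify       Phase = "specify"
	Clarify       Phase = "clarify"
	Plan          Phase = "plan"
	PreTaskReview Phase = "pre-task-review"
	Tasks         Phase = "tasks"
	Implement     Phase = "implement"
	Verify        Phase = "verify"
	Retrospective Phase = "retrospective"
	ExtractSkills Phase = "extract-skills"
	Constitution  Phase = "constitution"
	Done          Phase = "done"
)

// RunState holds the only facts the flowchart branches on.
// The field names are hypothetical.
type RunState struct {
	Small      bool // the idea fits the quickfix envelope
	Blockers   int  // BLOCKER count from the latest verify
	MaxRepeats int  // highest cross-run count of any known failure
}

// Start picks the entry point for a new idea.
func Start(s RunState) Phase {
	if s.Small {
		return Quickfix
	}
	return Specify
}

// Next returns the phase that follows p. Constitution leads to Done.
func Next(p Phase, s RunState) Phase {
	switch p {
	case Quickfix, Implement:
		return Verify
	case Specify:
		return Clarify
	case Clarify:
		return Plan
	case Plan:
		return PreTaskReview
	case PreTaskReview:
		return Tasks
	case Tasks:
		return Implement
	case Verify:
		if s.Blockers > 0 {
			return Implement // BLOCKER loops back
		}
		return Retrospective
	case Retrospective:
		return ExtractSkills
	case ExtractSkills:
		if s.MaxRepeats >= 3 {
			return Constitution
		}
		return Done
	default:
		return Done
	}
}

func main() {
	s := RunState{}
	for p := Start(s); p != Done; p = Next(p, s) {
		fmt.Println(p)
	}
}
```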

Phases

| Phase | Model | Purpose |
|-------|-------|---------|
| `quickfix` | sonnet | ≤2 files, ≤20 lines. Bypasses the full pipeline. |
| `specify` | opus | Natural language → structured spec |
| `clarify` | opus | ≤5 questions, answers folded back into spec |
| `plan` | opus | Architecture, sketches, data model, contracts; gated review |
| `tasks` | sonnet | Decompose plan into dependency-ordered tasks |
| `implement` | opus | Orchestrate; dispatch each task to its `@model` tier |
| `verify` | opus | Adversarial review across 5 dimensions |
| `retrospective` | sonnet | Post-mortem → constitution amendments + skill candidates |
| `extract-skills` | sonnet | Materialise skill YAMLs |

Tier rationale: see Model Routing: Haiku, Sonnet, Opus.

Phase-by-phase

Specify

Input: natural-language description.
Output: spec.md — one structured document of WHAT, never HOW.

Sections enforced by the template: User Scenarios (priority-ordered, independently testable), Functional Requirements (FR-001, FR-002…), Success Criteria (measurable, technology-agnostic), Edge Cases, Assumptions, and an optional Key Entities section.

Hard rules:

No tech-stack references. No file paths. No framework names. The spec must read as something a business stakeholder could approve.
Up to 3 [NEEDS CLARIFICATION: …] markers for genuine ambiguities. Anything else gets a reasonable default and is logged under Assumptions.
Every requirement must be testable. "User-friendly" is rejected. "95% of users complete checkout in under 3 minutes" is accepted.
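Several of these hard rules can be linted mechanically before a human reads the spec. A rough Go sketch under stated assumptions: the deny-lists are illustrative placeholders, and `LintSpec` is a hypothetical helper, not part of the template:

```go
package main

import (
	"fmt"
	"regexp"
)

// Illustrative checks only; the deny-lists are hypothetical and would
// need tuning per project.
var (
	clarifyMarker = regexp.MustCompile(`\[NEEDS CLARIFICATION:[^\]]*\]`)
	frDefinition  = regexp.MustCompile(`(?m)^- \*\*(FR-\d{3})\*\*:`)
	stackTerm     = regexp.MustCompile(`(?i)\b(react|postgres|sqlite|django|kubernetes)\b`)
	vagueTerm     = regexp.MustCompile(`(?i)\b(user-friendly|intuitive|seamless)\b`)
)

// LintSpec returns one message per hard-rule violation found in a spec.
func LintSpec(spec string) []string {
	var problems []string
	if n := len(clarifyMarker.FindAllString(spec, -1)); n > 3 {
		problems = append(problems, fmt.Sprintf("%d NEEDS CLARIFICATION markers (max 3)", n))
	}
	for _, m := range stackTerm.FindAllString(spec, -1) {
		problems = append(problems, "tech-stack reference: "+m)
	}
	for _, m := range vagueTerm.FindAllString(spec, -1) {
		problems = append(problems, "untestable wording: "+m)
	}
	// An FR may be referenced many times, but defined only once.
	seen := map[string]bool{}
	for _, m := range frDefinition.FindAllStringSubmatch(spec, -1) {
		if seen[m[1]] {
			problems = append(problems, "duplicate requirement ID: "+m[1])
		}
		seen[m[1]] = true
	}
	return problems
}

func main() {
	for _, p := range LintSpec("- **FR-002**: The checkout page MUST feel intuitive.") {
		fmt.Println(p) // untestable wording: intuitive
	}
}
```

A linter like this only catches the obvious cases; whether a requirement is genuinely testable still needs a reviewer.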

Example FR line:

```markdown
- **FR-005**: System MUST hard-delete artifact rows when their source file
  is removed from disk, including the corresponding full-text-search entry.
```

Done when: every user story has acceptance scenarios, every FR is verifiable in isolation, and every success criterion is a number (or a qualitative but falsifiable statement).

Clarify

Input: spec.md.
Output: updated spec.md with a `## Clarifications` section, dated per session.

Up to 5 questions. One at a time. Each presents a recommended answer with reasoning, plus 2–3 alternatives in a table, plus a free-form "Short" option. The user picks; the spec is updated in place immediately (FR text changed, assumptions added, edge cases extended).

Example interaction:

```markdown
Q3 of ≤5 — Auth posture for AE read routes

Recommended: Option B — gate AE read routes behind WIKI_TOKEN.
AE content is review material for in-flight features; defaulting to
public on a production domain leaks roadmap.

| Option | Description |
|--------|-------------|
| A      | Public read (same as existing wiki articles) |
| B      | Token-gated using existing WIKI_TOKEN |
| C      | AE mode auto-disabled unless server is localhost |
```

The five-question cap is real. The clarifier ranks candidates by impact × uncertainty and asks only the top of the queue. Low-impact questions are silently dropped — better to make a reasonable default than burn a question slot.
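The impact × uncertainty ranking can be sketched in a few lines of Go. The 0 to 1 scores and the `minScore` floor are assumptions for illustration; the section itself only specifies ranking by the product and the five-question cap:

```go
package main

import (
	"fmt"
	"sort"
)

// Candidate is a hypothetical open question with scores in [0, 1].
type Candidate struct {
	Question    string
	Impact      float64 // how much the answer would change the spec
	Uncertainty float64 // how unsure a reasonable default would be
}

func score(c Candidate) float64 { return c.Impact * c.Uncertainty }

// TopQuestions drops candidates scoring below minScore (they get a default
// logged under Assumptions), ranks the rest, and keeps at most limit.
func TopQuestions(cands []Candidate, limit int, minScore float64) []Candidate {
	var kept []Candidate
	for _, c := range cands {
		if score(c) >= minScore {
			kept = append(kept, c)
		}
	}
	sort.SliceStable(kept, func(i, j int) bool { return score(kept[i]) > score(kept[j]) })
	if len(kept) > limit {
		kept = kept[:limit]
	}
	return kept
}

func main() {
	cands := []Candidate{
		{"Auth posture for AE read routes", 0.9, 0.8},
		{"Date format in list views", 0.1, 0.5},
		{"Retention for deleted artifacts", 0.6, 0.7},
	}
	for i, c := range TopQuestions(cands, 5, 0.2) {
		fmt.Printf("Q%d: %s\n", i+1, c.Question)
	}
}
```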

Done when: zero [NEEDS CLARIFICATION] markers remain and the checklist file shows every item checked.

Plan

Input: spec.md plus the project constitution.
Outputs (every run):

| File | Purpose |
|------|---------|
| `plan.md` | Tech context, structure decision, constitution check, complexity-tracking table for any rule violations |
| `research.md` | Numbered decisions with rationale + alternatives considered |
| `data-model.md` | Entities, fields, validation, state transitions, schema additions |
| `contracts/*.md` | HTTP routes, storage interfaces, CLI grammars — whatever the project exposes |
| `quickstart.md` | Runnable end-to-end validation scenarios |
| `preview.md` | Architecture mermaid + 2–4 code sketches with real signatures, real types, real SQL — not pseudocode |

The user reviews preview.md and approves before tasks generation. This is the cheapest place to catch wrong-design errors — at the sketch, not at the diff.

Example research entry:

```markdown
## 4. Storage namespacing

**Decision**: New table `ae_artifacts` distinct from `articles`. Slug stays
globally non-colliding by namespacing at the HTTP route boundary.

**Rationale**: Spec FR-013 requires that AE artifacts cannot collide with wiki
articles. Cleanest enforcement is not putting them in the same table.
Hard-delete-on-disappear becomes a single `DELETE FROM ae_artifacts WHERE
feature_slug=? AND rel_path=?`.

**Alternatives considered**:
- Shared table + origin enum: rejected — every existing query needs an
  origin filter, raising leak risk.
- Separate SQLite file: rejected — operational surface doubles.
```

Constitution check happens twice: before research starts, again after design is done. Violations require either a fix or a documented justification in Complexity Tracking.

Done when: user approves preview.md. Until then, no tasks are generated.
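The two constitution checks and the approval gate reduce to one predicate. A minimal Go sketch with a hypothetical `Violation` type, treating a violation as resolved if it was either fixed or justified in Complexity Tracking:

```go
package main

import "fmt"

// Violation is one constitution rule the design breaks. Hypothetical shape.
type Violation struct {
	Rule          string
	Fixed         bool
	Justification string // Complexity Tracking entry; empty if none
}

// Unresolved returns violations that are neither fixed nor justified.
func Unresolved(vs []Violation) []Violation {
	var open []Violation
	for _, v := range vs {
		if !v.Fixed && v.Justification == "" {
			open = append(open, v)
		}
	}
	return open
}

// ReadyForTasks combines both constitution checks (before research and
// after design) with the preview.md approval gate.
func ReadyForTasks(beforeResearch, afterDesign []Violation, previewApproved bool) bool {
	return len(Unresolved(beforeResearch)) == 0 &&
		len(Unresolved(afterDesign)) == 0 &&
		previewApproved
}

func main() {
	// "no-new-tables" is a made-up rule name for the example.
	after := []Violation{{Rule: "no-new-tables", Justification: "FR-013 isolation; see research.md #4"}}
	fmt.Println(ReadyForTasks(nil, after, false)) // false: preview not approved yet
	fmt.Println(ReadyForTasks(nil, after, true))  // true
}
```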

Tasks

Input: everything from plan.
Output: tasks.md — dependency-ordered, traceable to user stories.

Each task line in exactly one shape:

```markdown
- [ ] T012 [P] [deps: T003,T007] [US1] @sonnet [ctx: src/foo.ts:10-40, src/types.ts:1-30] Implement Foo.Bar in src/foo.ts
```

T012 — sequential ID, also the natural dependency-graph node name.
[P] — present only if parallelisable (different files, no incomplete deps). Absent = sequential.
[deps: T003,T007] — explicit deps. A task only starts when all its deps are [X].
[US1] — links the task back to a user story for traceability (omitted for Setup/Foundational/Polish).
@sonnet — suggested model tier (see Model Routing: Haiku, Sonnet, Opus).
[ctx: file:Lstart-Lend, ...] — pre-fetched line ranges. The orchestrator inlines these into the subagent's prompt under ## Pre-Fetched Context (see Subagents and Context Injection).
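Because every task line has exactly one shape, it can be parsed with a single regular expression. A Go sketch, assuming fields always appear in the order above and the done marker is an uppercase `X`; `Task` and `ParseTask` are hypothetical names:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Task mirrors the fields of one tasks.md line.
type Task struct {
	ID       string
	Done     bool     // [X] instead of [ ]
	Parallel bool     // [P] present
	Deps     []string // [deps: ...]
	Story    string   // [US1]; empty for Setup/Foundational/Polish
	Model    string   // haiku, sonnet, opus
	Ctx      []string // [ctx: ...] entries, not yet split into ranges
	Desc     string
}

// Fields in the documented order; every bracketed field is optional.
var taskLine = regexp.MustCompile(
	`^- \[([ X])\] (T\d+)( \[P\])?(?: \[deps: ([^\]]*)\])?(?: \[(US\d+)\])?(?: @(\w+))?(?: \[ctx: ([^\]]*)\])?\s+(.+)$`)

// splitList turns "a, b,c" into [a b c], dropping empty entries.
func splitList(s string) []string {
	var out []string
	for _, part := range strings.Split(s, ",") {
		if p := strings.TrimSpace(part); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// ParseTask returns false if the line is not a task line.
func ParseTask(line string) (Task, bool) {
	m := taskLine.FindStringSubmatch(strings.TrimSpace(line))
	if m == nil {
		return Task{}, false
	}
	return Task{
		ID: m[2], Done: m[1] == "X", Parallel: m[3] != "",
		Deps: splitList(m[4]), Story: m[5], Model: m[6],
		Ctx: splitList(m[7]), Desc: m[8],
	}, true
}

func main() {
	t, ok := ParseTask("- [ ] T012 [P] [deps: T003,T007] [US1] @sonnet [ctx: src/foo.ts:10-40, src/types.ts:1-30] Implement Foo.Bar in src/foo.ts")
	fmt.Println(ok, t.ID, t.Deps, t.Model, t.Ctx)
}
```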

Phase grouping:

```text
Phase 1 — Setup            (project init, deps, scaffolding)
Phase 2 — Foundational     (shared models, schema, blocking work)
Phase 3+ — User Stories    (one phase per story, P1 first)
Final  — Polish            (cross-cutting concerns, docs, performance)
```

Concurrent tasks live under ### Wave N (parallel) headers. The orchestrator commits between waves so the next wave reads fresh state.

```markdown
### Wave 1 (parallel, after T002)
- [ ] T007 [P] @sonnet  Implement Importer types in internal/ae/ae.go
- [ ] T010 [P] [deps: T003,T004,T005] @opus  Implement Storage AE methods

### Wave 2 (sequential, deps: T007,T010)
- [ ] T011 @opus  Wire Importer.ScanFeature
```

Done when: every user story has all the tasks needed to satisfy its acceptance scenarios, deps form a valid DAG, every non-trivial task has a [ctx: ...] annotation.
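Checking that deps form a valid DAG and grouping tasks into waves can both be done with one layered topological sort (Kahn's algorithm). A Go sketch that looks only at `[deps: ...]`; it assumes, hypothetically, that keeping same-file tasks out of a shared wave is left to the `[P]` marker:

```go
package main

import (
	"fmt"
	"sort"
)

// Waves layers tasks so each wave depends only on earlier waves.
// deps maps task ID -> IDs it waits on. It fails on unknown IDs or cycles.
func Waves(deps map[string][]string) ([][]string, error) {
	indeg := make(map[string]int, len(deps))
	children := make(map[string][]string)
	for id, ds := range deps {
		for _, d := range ds {
			if _, known := deps[d]; !known {
				return nil, fmt.Errorf("%s depends on unknown task %s", id, d)
			}
			children[d] = append(children[d], id)
		}
		indeg[id] = len(ds)
	}
	var layer []string
	for id, n := range indeg {
		if n == 0 {
			layer = append(layer, id)
		}
	}
	var waves [][]string
	placed := 0
	for len(layer) > 0 {
		sort.Strings(layer) // deterministic output
		waves = append(waves, layer)
		placed += len(layer)
		var next []string
		for _, id := range layer {
			for _, c := range children[id] {
				indeg[c]--
				if indeg[c] == 0 {
					next = append(next, c)
				}
			}
		}
		layer = next
	}
	if placed != len(deps) {
		return nil, fmt.Errorf("dependency cycle: %d tasks never became ready", len(deps)-placed)
	}
	return waves, nil
}

func main() {
	waves, err := Waves(map[string][]string{
		"T002": nil, "T003": nil,
		"T007": {"T002"}, "T010": {"T003"},
		"T011": {"T007", "T010"},
	})
	fmt.Println(waves, err)
}
```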

Implement

Input: tasks.md.
Output: code, marked-off tasks, commits between waves.

The orchestrator walks tasks.md in dependency order. For each task:

1. Parse the [ctx: ...] annotation.
2. Read each file/range from disk.
3. Build the subagent prompt: task description + ## Pre-Fetched Context block with file contents inlined + ## Allowed Operations list (paths the subagent may touch).
4. Dispatch at the suggested @model tier.
5. Subagent executes. Returns either a diff (success) or CONTEXT_INSUFFICIENT: <reason> (abort).
6. On success: mark [X]. On abort: widen [ctx] once, retry. Second abort → escalate to the user.
7. At the end of each ### Wave N group: commit.
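The abort-and-retry rule in these steps can be sketched as follows. `dispatch` and `widen` are hypothetical hooks standing in for the subagent call and whatever logic widens the `[ctx]` ranges; only the once-then-escalate control flow comes from the steps above:

```go
package main

import (
	"fmt"
	"strings"
)

const abortPrefix = "CONTEXT_INSUFFICIENT:"

// abortReason reports whether a subagent reply is an abort, and why.
func abortReason(reply string) (string, bool) {
	if !strings.HasPrefix(reply, abortPrefix) {
		return "", false
	}
	return strings.TrimSpace(strings.TrimPrefix(reply, abortPrefix)), true
}

// RunTask dispatches a task, widens its context once on abort, and returns
// an escalation error on a second abort. On success the reply is the diff.
func RunTask(ctx []string, dispatch func([]string) string, widen func([]string, string) []string) (string, error) {
	reply := dispatch(ctx)
	reason, aborted := abortReason(reply)
	if !aborted {
		return reply, nil
	}
	reply = dispatch(widen(ctx, reason)) // single retry with wider context
	if reason, aborted = abortReason(reply); !aborted {
		return reply, nil
	}
	return "", fmt.Errorf("escalate to user: %s", reason)
}

func main() {
	dispatch := func(ctx []string) string {
		if len(ctx) < 2 {
			return "CONTEXT_INSUFFICIENT: need the Foo type definition"
		}
		return "diff --git a/src/foo.ts b/src/foo.ts"
	}
	widen := func(ctx []string, reason string) []string {
		return append(ctx, "src/types.ts:1-30") // naive: always add the types file
	}
	diff, err := RunTask([]string{"src/foo.ts:10-40"}, dispatch, widen)
	fmt.Println(diff, err)
}
```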

The subagent is instructed: do not grep, glob, or read files outside the listed paths. The leash is the whole point — without it the subagent re-explores the codebase and burns 30–70% of its context on lookups the orchestrator already did.

Example dispatched prompt (abridged):

```markdown
## Task
T010 [@opus] Implement all 10 AE methods in internal/storage/ae.go

## Pre-Fetched Context
### internal/storage/sqlite.go:97-168
(actual 70 lines inlined here)

### internal/storage/storage.go:1-22
(actual 22 lines inlined here)

## Allowed Operations
Read/Edit/Write only on:
- internal/storage/ae.go

If you need anything else, ABORT with CONTEXT_INSUFFICIENT.
```
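Assembling the dispatched prompt is mostly string building once the `[ctx: ...]` ranges are parsed. A Go sketch that reproduces the layout of the example prompt; `ParseRange` and `BuildPrompt` are hypothetical, and the `read` parameter is an assumption that keeps file access swappable for testing:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// CtxRange is one "path:start-end" entry from a [ctx: ...] annotation.
type CtxRange struct {
	Path       string
	Start, End int // 1-based, inclusive
}

// ParseRange accepts "path:97-168" and the "path:L97-L168" spelling.
func ParseRange(s string) (CtxRange, error) {
	i := strings.LastIndex(s, ":")
	if i < 0 {
		return CtxRange{}, fmt.Errorf("missing ':' in %q", s)
	}
	lo, hi, ok := strings.Cut(s[i+1:], "-")
	if !ok {
		return CtxRange{}, fmt.Errorf("missing '-' in %q", s)
	}
	start, err1 := strconv.Atoi(strings.TrimPrefix(lo, "L"))
	end, err2 := strconv.Atoi(strings.TrimPrefix(hi, "L"))
	if err1 != nil || err2 != nil || start < 1 || end < start {
		return CtxRange{}, fmt.Errorf("bad line range in %q", s)
	}
	return CtxRange{Path: s[:i], Start: start, End: end}, nil
}

// BuildPrompt inlines each range and lists the only paths the subagent may touch.
func BuildPrompt(task string, ctx, allowed []string, read func(string) (string, error)) (string, error) {
	var b strings.Builder
	fmt.Fprintf(&b, "## Task\n%s\n\n## Pre-Fetched Context\n", task)
	for _, entry := range ctx {
		r, err := ParseRange(entry)
		if err != nil {
			return "", err
		}
		src, err := read(r.Path)
		if err != nil {
			return "", err
		}
		lines := strings.Split(src, "\n")
		if r.Start > len(lines) {
			return "", fmt.Errorf("%s has only %d lines", r.Path, len(lines))
		}
		if r.End > len(lines) {
			r.End = len(lines) // clamp ranges that run past EOF
		}
		fmt.Fprintf(&b, "### %s\n%s\n\n", entry, strings.Join(lines[r.Start-1:r.End], "\n"))
	}
	b.WriteString("## Allowed Operations\nRead/Edit/Write only on:\n")
	for _, p := range allowed {
		fmt.Fprintf(&b, "- %s\n", p)
	}
	b.WriteString("\nIf you need anything else, ABORT with CONTEXT_INSUFFICIENT.\n")
	return b.String(), nil
}

func main() {
	read := func(p string) (string, error) {
		data, err := os.ReadFile(p)
		return string(data), err
	}
	prompt, err := BuildPrompt(
		"T010 [@opus] Implement all 10 AE methods in internal/storage/ae.go",
		[]string{"internal/storage/sqlite.go:97-168", "internal/storage/storage.go:1-22"},
		[]string{"internal/storage/ae.go"},
		read,
	)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(prompt)
}
```

Clamping a range that runs past end-of-file, rather than failing, is a judgment call; a stricter orchestrator might treat it as a stale annotation.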

Done when: every task is [X], every wave has been committed, the diff matches the plan.

Verify

Input: spec.md + plan.md + tasks.md + the git diff produced by implement.
Output: a structured review with a verdict.

Five dimensions, in order of typical catch-rate:

1. Spec divergence — for each FR and acceptance scenario, is it actually implemented? Most common BLOCKER source.
2. Missing tasks — any task marked [X] whose diff content can't be found?
3. Security — auth, injection, secrets, permission scopes, sensitive logging.
4. Correctness — off-by-one, races, edge cases the spec called out but the code skipped, swallowed errors.
5. Contract breaks — schema, struct, signature, response-shape changes that would break callers.

Findings tagged:

| Severity | Effect |
|----------|--------|
| `BLOCKER` | Must fix. Verdict = FAIL. Re-enter implement with the finding as the new task. |
| `WARN` | Real but non-blocking. Verdict still PASS if no BLOCKERs. Tracked. |
| `NOTE` | Observation. Informational. |

The reviewer's prompt explicitly tells it to look for what's missing, not to validate what's present. The same model reviewing the same diff under a different framing produces a measurably different catch rate. Details in Trust but Verify.

Example finding:

```markdown
### Spec divergence — FR-005 (hard delete on file removal)
**[BLOCKER]** Spec FR-005 requires hard-delete of artifact rows when a
feature folder disappears. The diff implements file-level deletion in
ScanFeature, but ScanAll never calls DeleteAEFeature for slugs that have
vanished. Quote: importer.go:50-60.
```

Done when: BLOCKER count is zero. Any nonzero count re-enters implement.
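If findings use the `**[SEVERITY]**` markers shown in the example finding, the verdict can be computed directly from the report text. A minimal Go sketch; `Tally` and `Verdict` are hypothetical helpers:

```go
package main

import (
	"fmt"
	"regexp"
)

// tag matches the bold severity markers used in review findings.
var tag = regexp.MustCompile(`\*\*\[(BLOCKER|WARN|NOTE)\]\*\*`)

// Tally counts findings per severity in a review report.
func Tally(review string) map[string]int {
	counts := map[string]int{}
	for _, m := range tag.FindAllStringSubmatch(review, -1) {
		counts[m[1]]++
	}
	return counts
}

// Verdict is FAIL whenever any BLOCKER exists; WARN and NOTE never flip it.
func Verdict(counts map[string]int) string {
	if counts["BLOCKER"] > 0 {
		return "FAIL"
	}
	return "PASS"
}

func main() {
	review := "**[BLOCKER]** ScanAll never calls DeleteAEFeature.\n**[WARN]** Formatter built per request."
	c := Tally(review)
	fmt.Println(Verdict(c), c) // FAIL map[BLOCKER:1 WARN:1]
}
```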

Retrospective + Extract Skills

Input: every artefact produced this run, plus the verify report.
Outputs: a post-mortem with two structured blocks parsed by downstream commands.

CONSTITUTION_AMENDMENTS — rules that, if adopted, would prevent the failures seen this run:

```yaml
CONSTITUTION_AMENDMENTS:
  - id: cache-stateless-builders
    rationale: |
      Chroma formatter was constructed per HTTP request, costing ~3ms each.
    rule: "Stateless objects with non-trivial construction cost should be
            cached at handler init, not constructed per request."
    severity: warn
```

These don't go straight into the constitution. They queue in known-failures.md with a count, and once the same ID has been seen 3+ times across runs it is promoted automatically.
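The queue-then-promote behaviour can be modelled as a counter per amendment ID. A Go sketch, assuming (this is not stated above) that an ID counts at most once per run and is promoted only once; `Ledger` is a hypothetical in-memory stand-in for known-failures.md:

```go
package main

import "fmt"

const promotionThreshold = 3

// Ledger is a hypothetical in-memory stand-in for known-failures.md.
type Ledger struct {
	counts   map[string]int
	promoted map[string]bool
}

func NewLedger() *Ledger {
	return &Ledger{counts: map[string]int{}, promoted: map[string]bool{}}
}

// Record counts each amendment ID at most once per run and returns the IDs
// that reach the threshold for the first time on this run.
func (l *Ledger) Record(runIDs []string) []string {
	seen := map[string]bool{}
	var promote []string
	for _, id := range runIDs {
		if seen[id] {
			continue // duplicate within one run
		}
		seen[id] = true
		l.counts[id]++
		if l.counts[id] >= promotionThreshold && !l.promoted[id] {
			l.promoted[id] = true
			promote = append(promote, id)
		}
	}
	return promote
}

func main() {
	l := NewLedger()
	for run := 1; run <= 3; run++ {
		fmt.Println(run, l.Record([]string{"cache-stateless-builders"}))
	}
}
```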

SKILLS_TO_EXTRACT — reusable patterns:

```yaml
SKILLS_TO_EXTRACT:
  - name: sqlite-fts5-pattern
    rationale: Canonical FTS5 setup with content/content_rowid + the three
      triggers, used twice now (articles_fts, ae_artifacts_fts).
    template_path: skills/extracted/sqlite-fts5-pattern.md
```

/extract-skills materialises each entry into a YAML under skills/extracted/, with trigger, applies-when preconditions, template, and references. Future runs auto-load relevant skills at the start of /specify.

Without this phase, every run is independent. With it, marginal cost trends down and marginal quality trends up. Details in The Compounding Layer.

Done when: both YAML blocks parse cleanly and known-failures.md is updated.

Fast-track: `quickfix`

Hard preconditions, enforced before any edit:

≤ 2 files
≤ 20 lines
No new files / deps / public-API changes
Commit type in {fix, docs, style, chore, typo, refactor}
No protected paths (configurable)

Tripping any of these escalates to the full pipeline. The fast track exists because ceremony for a typo is worse than no process.
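The quickfix preconditions make a short guard function. A Go sketch with a hypothetical `Change` summary; matching protected paths by prefix is an assumption, since the section only says they are configurable:

```go
package main

import (
	"fmt"
	"strings"
)

// Change summarises a proposed quickfix. Field names are hypothetical.
type Change struct {
	Files        []string
	LinesChanged int
	AddsFile     bool
	AddsDep      bool
	ChangesAPI   bool
	CommitType   string
}

var quickfixTypes = map[string]bool{
	"fix": true, "docs": true, "style": true, "chore": true, "typo": true, "refactor": true,
}

// QuickfixBlocker returns the first tripped precondition, or "" if the
// change may take the fast track. Protected paths are matched by prefix.
func QuickfixBlocker(c Change, protected []string) string {
	switch {
	case len(c.Files) > 2:
		return "touches more than 2 files"
	case c.LinesChanged > 20:
		return "changes more than 20 lines"
	case c.AddsFile || c.AddsDep || c.ChangesAPI:
		return "adds a file or dependency, or changes a public API"
	case !quickfixTypes[c.CommitType]:
		return "commit type " + c.CommitType + " not allowed"
	}
	for _, f := range c.Files {
		for _, p := range protected {
			if strings.HasPrefix(f, p) {
				return "protected path " + f
			}
		}
	}
	return ""
}

func main() {
	c := Change{Files: []string{"README.md"}, LinesChanged: 1, CommitType: "typo"}
	if why := QuickfixBlocker(c, []string{"migrations/"}); why != "" {
		fmt.Println("escalate to full pipeline:", why)
	} else {
		fmt.Println("quickfix OK")
	}
}
```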

Why waterfall

Each phase produces an artefact that gates the next. Unfashionable, but intentional. The whole structure exists to defeat The One-Shot Problem: small steps keep each agent invocation one-shottable, and written checkpoints sit between them.

The trade-off is explicit: heavy planning and review up front, almost mechanical implementation after. The thing being avoided is the ten-minute agent run that returns the wrong feature. Every hard decision is made and recorded before the implementer starts writing.

The alternative — "give the agent the repo, tell it to do the next thing" — is the one-shot trap with extra steps.
