* feat: harness engineering improvements - post-merge test gate, shared file isolation, behavioral verification

  Three improvements inspired by Anthropic's harness engineering research (March 2026) and real-world pain points from parallel worktree execution:

  1. Post-merge test gate (execute-phase.md)
     - Run project test suite after merging each wave's worktrees
     - Catches cross-plan integration failures that individual Self-Checks miss
     - Addresses the Generator self-evaluation blind spot (agents praise own work)
  2. Shared file isolation (execute-phase.md)
     - Executors no longer modify STATE.md or ROADMAP.md in parallel mode
     - Orchestrator updates tracking files centrally after merge
     - Eliminates the #1 source of merge conflicts in parallel execution
  3. Behavioral verification (verify-phase.md)
     - Verifier runs project test suite and CLI commands, not just grep
     - Follows Anthropic's Generator/Evaluator separation principle
     - Tests actual behavior against success criteria, not just file existence

  Real-world evidence: In a session executing 37 plans across 8 phases with parallel worktrees, we observed:
  - 4 test failures after merge that all Self-Checks missed (models.py type loss)
  - STATE.md/ROADMAP.md conflicts on every single parallel merge
  - Verifier reporting PASSED while merged code had broken imports

  References:
  - Anthropic Engineering Blog: Harness Design for Long-Running Apps (2026-03-24)
  - Issue #1451: Massive git worktree problem
  - Issue #1413: Autonomous execution without manual context clearing

* fix: address review feedback - test runner detection, parallel isolation, edge cases

  - Replace hardcoded jest/vitest with `npm test` (reads project's scripts.test)
  - Add Go detection to post-merge test gate (was only in verify-phase)
  - Add 5-minute timeout to post-merge test gate to prevent pipeline stalls
  - Track cumulative wave failures via WAVE_FAILURE_COUNT for cross-wave awareness
  - Guard orchestrator tracking commit against unchanged files (prevent empty commits)
  - Align execute-plan.md with parallel isolation model (skip STATE.md/ROADMAP.md updates when running in parallel mode, orchestrator handles centrally)
  - Scope behavioral verification CLI checks: skip when no fixtures/test data exist, mark as NEEDS HUMAN instead of inventing inputs

* fix: pass PARALLEL_MODE to executor agents to activate shared file isolation

  The executor spawn prompt in execute-phase.md instructed agents not to modify STATE.md/ROADMAP.md, but execute-plan.md gates this behavior on PARALLEL_MODE, which was never defined in the executor context. This adds the variable to the spawn prompt and wraps all three shared-file steps (update_current_position, update_roadmap, git_commit_metadata) with explicit conditional guards.

* fix: replace unreliable PARALLEL_MODE env var with git worktree auto-detection

  Address PR #1486 review feedback (trek-e):

  1. PARALLEL_MODE was never reliably set: the <env> block instructed the LLM to export a bash variable, but each Bash tool call runs in a fresh shell, so the variable never persisted. Replace with self-contained worktree detection: `[ -f .git ]` returns true in worktrees (.git is a file) and false in main repos (.git is a directory). Each bash block detects independently with no external state dependency.
  2. TEST_EXIT only checked for timeout (124): test failures (non-zero, non-124) were silently ignored, making the "If tests fail" prose unreachable. Add full if/elif/else handling: 0=pass, 124=timeout, else=fail with WAVE_FAILURE_COUNT increment.
  3. Add Go detection to regression_gate (was missing go.mod check). Replace hardcoded npx jest/vitest with npm test for consistency.
  4. Renumber steps from 4/4b/4c/5/5/5b to 4a/4b/4c/4d/5/6/7/8/9.

* fix: address remaining review blockers - timeout, tracking guard, shell safety

  - verify-phase.md: wrap behavioral_verification test suite in timeout 300
  - execute-phase.md: gate tracking update on TEST_EXIT=0, skip on failure/timeout
  - Quote all TEST_EXIT variables, add default initialization
  - Add else branch for unrecognized project types
  - Renumber steps to align with upstream (5.x series)

* fix: rephrase worktree success_criteria to satisfy substring test guard

  The worktree mode success_criteria line literally contained "STATE.md" and "ROADMAP.md" inside a prohibition ("No modifications to..."), but the test guard in execute-phase-worktree-artifacts.test.cjs uses a substring check and cannot distinguish prohibition from requirement. Rephrase to "shared orchestrator artifacts" so the substring check passes while preserving the same intent.
<required_reading> Read STATE.md before any operation to load project context. Read config.json for planning behavior settings.
@~/.claude/get-shit-done/references/git-integration.md </required_reading>
<available_agent_types> Valid GSD subagent types (use exact names — do not fall back to 'general-purpose'):
- gsd-executor — Executes plan tasks, commits, creates SUMMARY.md </available_agent_types>
INIT=$(node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" init execute-phase "${PHASE}")
if [[ "$INIT" == @file:* ]]; then INIT=$(cat "${INIT#@file:}"); fi
Extract from init JSON: executor_model, commit_docs, sub_repos, phase_dir, phase_number, plans, summaries, incomplete_plans, state_path, config_path.
If .planning/ missing: error.
Find first PLAN without matching SUMMARY. Decimal phases supported (01.1-hotfix/):
PHASE=$(echo "$PLAN_PATH" | grep -oE '[0-9]+(\.[0-9]+)?-[0-9]+')
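To see what the expression above captures, here is a minimal sketch run against two made-up plan paths. The helper name `extract_phase_plan` and the example paths are illustrative assumptions, not part of gsd-tools. Because the pattern requires digits on both sides of the hyphen, a directory name such as `01.1-hotfix` is skipped and the match comes from the plan filename.

```shell
# Hypothetical helper wrapping the grep expression used above.
extract_phase_plan() {
  # -o prints only the matched text; head keeps the first match
  printf '%s\n' "$1" | grep -oE '[0-9]+(\.[0-9]+)?-[0-9]+' | head -n 1
}

# Illustrative paths (assumed layout: .planning/phases/<phase-dir>/<phase>-<plan>-PLAN.md)
extract_phase_plan ".planning/phases/01.1-hotfix/01.1-02-PLAN.md"   # prints 01.1-02
extract_phase_plan ".planning/phases/08-auth/08-02-PLAN.md"         # prints 08-02
```

Note that the result is the combined `{phase}-{plan}` identifier, which is the form used in commit scopes later in this workflow.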
# config settings can be fetched via gsd-tools config-get if needed
Routing by checkpoint type:
| Checkpoints | Pattern | Execution |
|---|---|---|
| None | A (autonomous) | Single subagent: full plan + SUMMARY + commit |
| Verify-only | B (segmented) | Segments between checkpoints. After none/human-verify → SUBAGENT. After decision/human-action → MAIN |
| Decision | C (main) | Execute entirely in main context |
Pattern A: init_agent_tracking → capture EXPECTED_BASE=$(git rev-parse HEAD) → spawn Task(subagent_type="gsd-executor", model=executor_model) with prompt: execute plan at [path], autonomous, all tasks + SUMMARY + commit, follow deviation/auth rules, report: plan name, tasks, SUMMARY path, commit hash → track agent_id → wait → update tracking → report. Include isolation="worktree" only if workflow.use_worktrees is not false (read via config-get workflow.use_worktrees). When using isolation="worktree", include a <worktree_branch_check> block in the prompt instructing the executor to run git merge-base HEAD {EXPECTED_BASE} and, if the result differs from {EXPECTED_BASE}, reset the branch base with git reset --soft {EXPECTED_BASE} before starting work. This corrects a known issue on Windows where EnterWorktree creates branches from main instead of the feature branch HEAD.
Pattern B: Execute segment-by-segment. Autonomous segments: spawn subagent for assigned tasks only (no SUMMARY/commit). Checkpoints: main context. After all segments: aggregate, create SUMMARY, commit. See segment_execution.
Pattern C: Execute in main using standard flow (step name="execute").
Fresh context per subagent preserves peak quality. Main context stays lean.
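The routing table can be read as a small decision function. The sketch below is one possible interpretation, not the orchestrator's actual logic: it assumes checkpoints appear in the plan as `type="checkpoint:<kind>"` attributes, and it assumes that any checkpoint other than `human-verify` (for example `decision` or `human-action`) routes to Pattern C. The helper name `route_pattern` is hypothetical.

```shell
# Hypothetical helper: pick an execution pattern from the checkpoint kinds in a plan.
route_pattern() {
  plan_file=$1
  # Collect the distinct checkpoint kinds present in the plan
  kinds=$(grep -oE 'type="checkpoint:[a-z-]+"' "$plan_file" | sort -u)
  if [ -z "$kinds" ]; then
    echo "A"    # no checkpoints: autonomous
  elif printf '%s\n' "$kinds" | grep -qv 'checkpoint:human-verify'; then
    echo "C"    # at least one non-verify checkpoint: main context
  else
    echo "B"    # verify-only: segmented
  fi
}

# Usage with a throwaway plan file
demo_plan=$(mktemp)
printf '<task type="auto">build</task>\n<task type="checkpoint:human-verify">check UI</task>\n' > "$demo_plan"
route_pattern "$demo_plan"   # prints B
rm -f "$demo_plan"
```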
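```bash
# Create the history file on first use
if [ ! -f .planning/agent-history.json ]; then
  echo '{"version":"1.0","max_entries":50,"entries":[]}' > .planning/agent-history.json
fi

# A leftover current-agent-id.txt means a previous agent was interrupted.
# Detect it before the marker file is cleared.
if [ -f .planning/current-agent-id.txt ]; then
  INTERRUPTED_ID=$(cat .planning/current-agent-id.txt)
  echo "Found interrupted agent: $INTERRUPTED_ID"
fi
```

If interrupted: ask user to resume (Task resume parameter) or start fresh. When starting fresh, clear the stale marker with `rm -f .planning/current-agent-id.txt`.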
Tracking protocol: On spawn: write agent_id to current-agent-id.txt, append to agent-history.json: {"agent_id":"[id]","task_description":"[desc]","phase":"[phase]","plan":"[plan]","segment":[num|null],"timestamp":"[ISO]","status":"spawned","completion_timestamp":null}. On completion: status → "completed", set completion_timestamp, delete current-agent-id.txt. Prune: if entries > max_entries, remove oldest "completed" (never "spawned").
Run for Pattern A/B before spawning. Pattern C: skip.
Pattern B only (verify-only checkpoints). Skip for A/C.

1. Parse segment map: checkpoint locations and types
2. Per segment:
   - Subagent route: spawn gsd-executor for assigned tasks only. Prompt: task range, plan path, read full plan for context, execute assigned tasks, track deviations, NO SUMMARY/commit. Track via agent protocol.
   - Main route: execute tasks using standard flow (step name="execute")
3. After ALL segments: aggregate files/deviations/decisions → create SUMMARY.md → commit → self-check:
   - Verify key-files.created exist on disk with `[ -f ]`
   - Check `git log --oneline --all --grep="{phase}-{plan}"` returns ≥1 commit
   - Append `## Self-Check: PASSED` or `## Self-Check: FAILED` to SUMMARY

Known Claude Code bug (classifyHandoffIfNeeded): If any segment agent reports "failed" with `classifyHandoffIfNeeded is not defined`, this is a Claude Code runtime bug, not a real failure. Run spot-checks; if they pass, treat as successful.
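The self-check could be scripted roughly as follows. This is a sketch under assumptions: the function name `self_check` and its argument order are invented here, the file list would come from `key-files.created` in the SUMMARY frontmatter, and the result line is printed rather than appended so the caller decides where it goes.

```shell
# Hypothetical helper: $1 is the {phase}-{plan} tag, remaining args are expected files.
self_check() {
  tag=$1; shift
  status="PASSED"
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f"
      status="FAILED"
    fi
  done
  # Count commits on any branch whose message mentions the tag
  commits=$(git log --oneline --all --grep="$tag" 2>/dev/null | wc -l | tr -d ' ')
  if [ "$commits" -lt 1 ]; then
    echo "no commit mentions $tag"
    status="FAILED"
  fi
  echo "## Self-Check: $status"
}
```

A caller might run `self_check "08-02" src/api/auth.ts src/types/user.ts` and append only the final line of the output to SUMMARY.md, keeping the diagnostic lines for the report.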
If plan contains <interfaces> block: These are pre-extracted type definitions and contracts. Use them directly — do NOT re-read the source files to discover types. The planner already extracted what you need.
- Read @context files from prompt
- MCP tools: If CLAUDE.md or project instructions reference MCP tools (e.g. jCodeMunch for code navigation), prefer them over Grep/Glob when available. Fall back to Grep/Glob if MCP tools are not accessible.
- Per task:
  - MANDATORY read_first gate: If the task has a `<read_first>` field, you MUST read every listed file BEFORE making any edits. This is not optional. Do not skip files because you "already know" what's in them; read them. The read_first files establish ground truth for the task.
  - `type="auto"`: if `tdd="true"` → TDD execution. Implement with deviation rules + auth gates. Verify done criteria. Commit (see task_commit). Track hash for Summary.
  - `type="checkpoint:*"`: STOP → checkpoint_protocol → wait for user → continue only after confirmation.
  - MANDATORY acceptance_criteria check: After completing each task, if it has `<acceptance_criteria>`, verify EVERY criterion before moving to the next task. Use grep, file reads, or CLI commands to confirm each criterion. If any criterion fails, fix the implementation before proceeding. Do not skip criteria or mark them as "will verify later".
- Run `<verification>` checks
- Confirm `<success_criteria>` met
- Document deviations in Summary
<authentication_gates>
Authentication Gates
Auth errors during execution are NOT failures — they're expected interaction points.
Indicators: "Not authenticated", "Unauthorized", 401/403, "Please run {tool} login", "Set {ENV_VAR}"
Protocol:
- Recognize auth gate (not a bug)
- STOP task execution
- Create dynamic checkpoint:human-action with exact auth steps
- Wait for user to authenticate
- Verify credentials work
- Retry original task
- Continue normally
Example: vercel --yes → "Not authenticated" → checkpoint asking user to vercel login → verify with vercel whoami → retry deploy → continue
In Summary: Document as normal flow under "## Authentication Gates", not as deviations.
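Recognizing an auth gate from command output can be approximated with a pattern match. The sketch below is a heuristic under assumptions: the helper name `is_auth_gate` is hypothetical, and it covers only some of the indicators listed above (the "Set {ENV_VAR}" form is left out because it is too ambiguous to match reliably with a regular expression).

```shell
# Hypothetical helper: returns 0 when the text looks like an auth gate.
is_auth_gate() {
  printf '%s\n' "$1" | grep -qiE \
    'not authenticated|unauthorized|(^|[^0-9])40[13]([^0-9]|$)|please run .+ login'
}

if is_auth_gate "Error: Not authenticated. Please run vercel login"; then
  echo "auth gate: create checkpoint:human-action"
fi
```

The digit guards around `40[13]` keep unrelated numbers such as a port `4010` from being treated as an HTTP status.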
</authentication_gates>
<deviation_rules>
Deviation Rules
You WILL discover unplanned work. Apply automatically, track all for Summary.
| Rule | Trigger | Action | Permission |
|---|---|---|---|
| 1: Bug | Broken behavior, errors, wrong queries, type errors, security vulns, race conditions, leaks | Fix → test → verify → track `[Rule 1 - Bug]` | Auto |
| 2: Missing Critical | Missing essentials: error handling, validation, auth, CSRF/CORS, rate limiting, indexes, logging | Add → test → verify → track `[Rule 2 - Missing Critical]` | Auto |
| 3: Blocking | Prevents completion: missing deps, wrong types, broken imports, missing env/config/files, circular deps | Fix blocker → verify proceeds → track `[Rule 3 - Blocking]` | Auto |
| 4: Architectural | Structural change: new DB table, schema change, new service, switching libs, breaking API, new infra | STOP → present decision (below) → track `[Rule 4 - Architectural]` | Ask user |
Rule 4 format:
⚠️ Architectural Decision Needed
Current task: [task name]
Discovery: [what prompted this]
Proposed change: [modification]
Why needed: [rationale]
Impact: [what this affects]
Alternatives: [other approaches]
Proceed with proposed change? (yes / different approach / defer)
Priority: Rule 4 (STOP) > Rules 1-3 (auto) > unsure → Rule 4

Edge cases: missing validation → R2 | null crash → R1 | new table → R4 | new column → R1/2

Heuristic: Affects correctness/security/completion? → R1-3. Maybe? → R4.
</deviation_rules>
<deviation_documentation>
Documenting Deviations
Summary MUST include deviations section. None? → ## Deviations from Plan\n\nNone - plan executed exactly as written.
Per deviation: [Rule N - Category] Title — Found during: Task X | Issue | Fix | Files modified | Verification | Commit hash
End with: Total deviations: N auto-fixed (breakdown). Impact: assessment.
</deviation_documentation>
<tdd_plan_execution>
TDD Execution
For type: tdd plans — RED-GREEN-REFACTOR:
- Infrastructure (first TDD plan only): detect project, install framework, config, verify empty suite
- RED: Read `<behavior>` → failing test(s) → run (MUST fail) → commit: `test({phase}-{plan}): add failing test for [feature]`
- GREEN: Read `<implementation>` → minimal code → run (MUST pass) → commit: `feat({phase}-{plan}): implement [feature]`
- REFACTOR: Clean up → tests MUST pass → commit: `refactor({phase}-{plan}): clean up [feature]`
Errors: RED doesn't fail → investigate test/existing feature. GREEN doesn't pass → debug, iterate. REFACTOR breaks → undo.
See ~/.claude/get-shit-done/references/tdd.md for structure.
</tdd_plan_execution>
<precommit_failure_handling>
Pre-commit Hook Failure Handling
Your commits may trigger pre-commit hooks. Auto-fix hooks handle themselves transparently — files get fixed and re-staged automatically.
If running as a parallel executor agent (spawned by execute-phase):
Use --no-verify on all commits. Pre-commit hooks cause build lock contention when multiple agents commit simultaneously (e.g., cargo lock fights in Rust projects). The orchestrator validates once after all agents complete.
If running as the sole executor (sequential mode): If a commit is BLOCKED by a hook:
- The `git commit` command fails with hook error output
- Read the error: it tells you exactly which hook and what failed
- Fix the issue (type error, lint violation, secret leak, etc.)
- `git add` the fixed files
- Retry the commit
- Budget 1-2 retry cycles per commit
</precommit_failure_handling>
<task_commit>
Task Commit Protocol
After each task (verification passed, done criteria met), commit immediately.
1. Check: git status --short
2. Stage individually (NEVER git add . or git add -A):
git add src/api/auth.ts
git add src/types/user.ts
3. Commit type:
| Type | When | Example |
|---|---|---|
| `feat` | New functionality | `feat(08-02): create user registration endpoint` |
| `fix` | Bug fix | `fix(08-02): correct email validation regex` |
| `test` | Test-only (TDD RED) | `test(08-02): add failing test for password hashing` |
| `refactor` | No behavior change (TDD REFACTOR) | `refactor(08-02): extract validation to helper` |
| `perf` | Performance | `perf(08-02): add database index` |
| `docs` | Documentation | `docs(08-02): add API docs` |
| `style` | Formatting | `style(08-02): format auth module` |
| `chore` | Config/deps | `chore(08-02): add bcrypt dependency` |
4. Format: {type}({phase}-{plan}): {description} with bullet points for key changes.
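A quick way to sanity-check a commit subject against this format is a regular expression over the type list in the table. The helper name `is_valid_task_subject` is an assumption for illustration, and the scope pattern assumes numeric phase and plan identifiers with an optional decimal phase (as in `01.1-02`).

```shell
# Hypothetical helper: returns 0 when the subject matches {type}({phase}-{plan}): {description}
is_valid_task_subject() {
  printf '%s\n' "$1" | grep -qE \
    '^(feat|fix|test|refactor|perf|docs|style|chore)\([0-9]+(\.[0-9]+)?-[0-9]+\): .+'
}

for subject in \
  "feat(08-02): create user registration endpoint" \
  "update stuff"
do
  if is_valid_task_subject "$subject"; then
    echo "ok:  $subject"
  else
    echo "bad: $subject"
  fi
done
```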
<sub_repos_commit_flow>
Sub-repos mode: If sub_repos is configured (non-empty array from init context), use commit-to-subrepo instead of standard git commit. This routes files to their correct sub-repo based on path prefix.
node ~/.claude/get-shit-done/bin/gsd-tools.cjs commit-to-subrepo "{type}({phase}-{plan}): {description}" --files file1 file2 ...
The command groups files by sub-repo prefix and commits atomically to each. Returns JSON: { committed: true, repos: { "backend": { hash: "abc", files: [...] }, ... } }.
Record hashes from each repo in the response for SUMMARY tracking.
If sub_repos is empty or not set: Use standard git commit flow below.
</sub_repos_commit_flow>
5. Record hash:
TASK_COMMIT=$(git rev-parse --short HEAD)
TASK_COMMITS+=("Task ${TASK_NUM}: ${TASK_COMMIT}")
6. Check for untracked generated files:
git status --short | grep '^??'
If new untracked files appeared after running scripts or tools, decide for each:
- Commit it: if it's a source file, config, or intentional artifact
- Add to .gitignore: if it's a generated/runtime output (build artifacts, `.env` files, cache files, compiled output)
- Do NOT leave generated files untracked
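To turn the `git status --short` output into a plain list of paths to triage, the status prefix can be stripped. This is a small sketch; the helper name `list_untracked` is hypothetical, and the sample input stands in for real `git status --short` output.

```shell
# Hypothetical helper: reads `git status --short` output on stdin.
list_untracked() {
  # Keep only "??" entries and strip the 3-character status prefix
  grep '^??' | cut -c4-
}

# In a repository: git status --short | list_untracked
printf '?? dist/bundle.js\n M src/api/auth.ts\n?? .env\n' | list_untracked
# prints:
# dist/bundle.js
# .env
```

Paths containing spaces or special characters are quoted by git in short format, so a more robust variant would need to handle that quoting.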
</task_commit>
On `type="checkpoint:*"`: automate everything possible first. Checkpoints are for verification/decisions only.

Display: CHECKPOINT: [Type] box → Progress {X}/{Y} → Task name → type-specific content → YOUR ACTION: [signal]
| Type | Content | Resume signal |
|---|---|---|
| human-verify (90%) | What was built + verification steps (commands/URLs) | "approved" or describe issues |
| decision (9%) | Decision needed + context + options with pros/cons | "Select: option-id" |
| human-action (1%) | What was automated + ONE manual step + verification plan | "done" |
After response: verify if specified. Pass → continue. Fail → inform, wait. WAIT for user — do NOT hallucinate completion.
See ~/.claude/get-shit-done/references/checkpoints.md for details.
When spawned via Task and hitting checkpoint: return structured state (cannot interact with user directly).

Required return: 1) Completed Tasks table (hashes + files) 2) Current Task (what's blocking) 3) Checkpoint Details (user-facing content) 4) Awaiting (what's needed from user)
Orchestrator parses → presents to user → spawns fresh continuation with your completed tasks state. You will NOT be resumed. In main context: use checkpoint_protocol above.
If verification fails:

Check if node repair is enabled (default: on):
NODE_REPAIR=$(node "./.claude/get-shit-done/bin/gsd-tools.cjs" config-get workflow.node_repair 2>/dev/null || echo "true")
If NODE_REPAIR is true: invoke @./.claude/get-shit-done/workflows/node-repair.md with:
- FAILED_TASK: task number, name, done-criteria
- ERROR: expected vs actual result
- PLAN_CONTEXT: adjacent task names + phase goal
- REPAIR_BUDGET: `workflow.node_repair_budget` from config (default: 2)
Node repair will attempt RETRY, DECOMPOSE, or PRUNE autonomously. Only reaches this gate again if repair budget is exhausted (ESCALATE).
If NODE_REPAIR is false OR repair returns ESCALATE: STOP. Present: "Verification failed for Task [X]: [name]. Expected: [criteria]. Actual: [result]. Repair attempted: [summary of what was tried]." Options: Retry | Skip (mark incomplete) | Stop (investigate). If skipped → SUMMARY "Issues Encountered".
DURATION_SEC=$(( PLAN_END_EPOCH - PLAN_START_EPOCH ))
DURATION_MIN=$(( DURATION_SEC / 60 ))
# Report hours and minutes once the plan has run for an hour or more
if [[ "$DURATION_MIN" -ge 60 ]]; then
  HRS=$(( DURATION_MIN / 60 ))
  MIN=$(( DURATION_MIN % 60 ))
  DURATION="${HRS}h ${MIN}m"
else
  DURATION="${DURATION_MIN} min"
fi
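As a worked example of the duration arithmetic, the sketch below wraps the calculation in a function and runs it on two sample values. The function name `format_duration` is an assumption; the workflow itself sets `DURATION` inline from `PLAN_START_EPOCH` and `PLAN_END_EPOCH`.

```shell
# Hypothetical helper: format a duration given in seconds.
format_duration() {
  secs=$1
  mins=$(( secs / 60 ))          # integer division drops leftover seconds
  if [ "$mins" -ge 60 ]; then
    echo "$(( mins / 60 ))h $(( mins % 60 ))m"
  else
    echo "${mins} min"
  fi
}

format_duration 3900   # 65 minutes, prints: 1h 5m
format_duration 300    # prints: 5 min
```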
</step>
<step name="generate_user_setup">
```bash
grep -A 50 "^user_setup:" .planning/phases/XX-name/{phase}-{plan}-PLAN.md | head -50
```
If user_setup exists: create {phase}-USER-SETUP.md using template ~/.claude/get-shit-done/templates/user-setup.md. Per service: env vars table, account setup checklist, dashboard config, local dev notes, verification commands. Status "Incomplete". Set USER_SETUP_CREATED=true. If empty/missing: skip.
Frontmatter: phase, plan, subsystem, tags | requires/provides/affects | tech-stack.added/patterns | key-files.created/modified | key-decisions | requirements-completed (MUST copy requirements array from PLAN.md frontmatter verbatim) | duration ($DURATION), completed ($PLAN_END_TIME date).
Title: # Phase [X] Plan [Y]: [Name] Summary
One-liner SUBSTANTIVE: "JWT auth with refresh rotation using jose library" not "Authentication implemented"
Include: duration, start/end times, task count, file count.
Next: more plans → "Ready for {next-plan}" | last → "Phase complete, ready for next step".
**Skip this step if running in parallel mode** (the orchestrator in execute-phase.md handles STATE.md/ROADMAP.md updates centrally after merging worktrees to avoid merge conflicts).

Update STATE.md using gsd-tools:
# Auto-detect parallel mode: .git is a file in worktrees, a directory in main repo
IS_WORKTREE=$([ -f .git ] && echo "true" || echo "false")
# Skip in parallel mode — orchestrator handles STATE.md centrally
if [ "$IS_WORKTREE" != "true" ]; then
# Advance plan counter (handles last-plan edge case)
node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" state advance-plan
# Recalculate progress bar from disk state
node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" state update-progress
# Record execution metrics
node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" state record-metric \
--phase "${PHASE}" --plan "${PLAN}" --duration "${DURATION}" \
--tasks "${TASK_COUNT}" --files "${FILE_COUNT}"
fi
# Add each decision from SUMMARY key-decisions
# Prefer file inputs for shell-safe text (preserves `$`, `*`, etc. exactly)
node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" state add-decision \
--phase "${PHASE}" --summary-file "${DECISION_TEXT_FILE}" --rationale-file "${RATIONALE_FILE}"
# Add blockers if any found
node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" state add-blocker --text-file "${BLOCKER_TEXT_FILE}"
node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" state record-session \
--stopped-at "Completed ${PHASE}-${PLAN}-PLAN.md" \
--resume-file "None"
Keep STATE.md under 150 lines.
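The worktree check relies on the fact that a linked worktree has a `.git` file (containing a `gitdir:` pointer) where the main checkout has a `.git` directory. The sketch below simulates both layouts in a scratch directory so the test can be observed without running any git commands; the helper name `is_worktree` and the directory names are illustrative assumptions.

```shell
# Hypothetical helper: succeeds when the given directory (default: cwd) has a .git *file*.
is_worktree() {
  [ -f "${1:-.}/.git" ]
}

# Simulate both layouts (assumed names: main = primary checkout, wt = linked worktree)
scratch=$(mktemp -d)
mkdir -p "$scratch/main/.git" "$scratch/wt"
echo "gitdir: /path/to/main/.git/worktrees/wt" > "$scratch/wt/.git"

for dir in main wt; do
  if is_worktree "$scratch/$dir"; then
    echo "$dir: worktree (skip STATE.md/ROADMAP.md updates)"
  else
    echo "$dir: main repo (update tracking files)"
  fi
done
rm -rf "$scratch"
```

One caveat worth keeping in mind: git submodules also use a `.git` file, so this check would report a submodule checkout as a worktree.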
If SUMMARY "Issues Encountered" ≠ "None": yolo → log and continue. Interactive → present issues, wait for acknowledgment.

**Skip this step if running in parallel mode** (the orchestrator handles ROADMAP.md updates centrally after merging worktrees).

# Auto-detect parallel mode: .git is a file in worktrees, a directory in main repo
IS_WORKTREE=$([ -f .git ] && echo "true" || echo "false")
# Skip in parallel mode — orchestrator handles ROADMAP.md centrally
if [ "$IS_WORKTREE" != "true" ]; then
node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" roadmap update-plan-progress "${PHASE}"
fi
Counts PLAN vs SUMMARY files on disk. Updates progress table row with correct count and status (In Progress or Complete with date).
node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" requirements mark-complete ${REQ_IDS}
Extract requirement IDs from the plan's frontmatter (e.g., requirements: [AUTH-01, AUTH-02]). If no requirements field, skip.
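One way to pull the IDs out of the frontmatter is a `sed` substitution. This is a sketch that assumes the inline single-line array form shown above (`requirements: [AUTH-01, AUTH-02]`); a multi-line YAML list would need a real YAML parser. The helper name `extract_req_ids` is hypothetical.

```shell
# Hypothetical helper: prints space-separated requirement IDs, or nothing if the field is absent.
extract_req_ids() {
  # Capture the text between [ and ] on the requirements line, then drop the commas
  sed -n 's/^requirements:[[:space:]]*\[\(.*\)][[:space:]]*$/\1/p' "$1" | head -n 1 | tr -d ','
}

# Usage with a throwaway plan file
demo_plan=$(mktemp)
printf -- '---\nphase: 08\nrequirements: [AUTH-01, AUTH-02]\n---\n' > "$demo_plan"
REQ_IDS=$(extract_req_ids "$demo_plan")
echo "$REQ_IDS"   # prints: AUTH-01 AUTH-02
rm -f "$demo_plan"
```

An empty result corresponds to the "no requirements field, skip" case.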
# Auto-detect parallel mode: .git is a file in worktrees, a directory in main repo
IS_WORKTREE=$([ -f .git ] && echo "true" || echo "false")
# In parallel mode: exclude STATE.md and ROADMAP.md (orchestrator commits these)
if [ "$IS_WORKTREE" = "true" ]; then
node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" commit "docs({phase}-{plan}): complete [plan-name] plan" --files .planning/phases/XX-name/{phase}-{plan}-SUMMARY.md .planning/REQUIREMENTS.md
else
node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" commit "docs({phase}-{plan}): complete [plan-name] plan" --files .planning/phases/XX-name/{phase}-{plan}-SUMMARY.md .planning/STATE.md .planning/ROADMAP.md .planning/REQUIREMENTS.md
fi
FIRST_TASK=$(git log --oneline --grep="feat({phase}-{plan}):" --grep="fix({phase}-{plan}):" --grep="test({phase}-{plan}):" --reverse | head -1 | cut -d' ' -f1)
git diff --name-only ${FIRST_TASK}^..HEAD 2>/dev/null || true
Update only structural changes: new src/ dir → STRUCTURE.md | deps → STACK.md | file pattern → CONVENTIONS.md | API client → INTEGRATIONS.md | config → STACK.md | renamed → update paths. Skip code-only/bugfix/content changes.
node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" commit "" --files .planning/codebase/*.md --amend
(ls -1 .planning/phases/[current-phase-dir]/*-PLAN.md 2>/dev/null || true) | wc -l
(ls -1 .planning/phases/[current-phase-dir]/*-SUMMARY.md 2>/dev/null || true) | wc -l
| Condition | Route | Action |
|---|---|---|
| summaries < plans | A: More plans | Find next PLAN without SUMMARY. Yolo: auto-continue. Interactive: show next plan, suggest /gsd-execute-phase {phase} + /gsd-verify-work. STOP here. |
| summaries = plans, current < highest phase | B: Phase done | Show completion, suggest /gsd-plan-phase {Z+1} + /gsd-verify-work {Z} + /gsd-discuss-phase {Z+1} |
| summaries = plans, current = highest phase | C: Milestone done | Show banner, suggest /gsd-complete-milestone + /gsd-verify-work + /gsd-add-phase |
All routes: /clear first for fresh context.
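The routing table reduces to two comparisons. The sketch below is an illustration under assumptions: the function name `next_route` is hypothetical, the counts would come from the two `ls ... | wc -l` commands above, and phase numbers are treated as plain integers (decimal phases such as 01.1 would need a different comparison).

```shell
# Hypothetical helper: args are plan count, summary count, current phase, highest phase.
next_route() {
  plans=$1; summaries=$2; current=$3; highest=$4
  if [ "$summaries" -lt "$plans" ]; then
    echo "A: more plans"
  elif [ "$current" -lt "$highest" ]; then
    echo "B: phase done"
  else
    echo "C: milestone done"
  fi
}

next_route 3 2 4 8   # prints: A: more plans
next_route 3 3 4 8   # prints: B: phase done
next_route 3 3 8 8   # prints: C: milestone done
```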
<success_criteria>
- All tasks from PLAN.md completed
- All verifications pass
- USER-SETUP.md generated if user_setup in frontmatter
- SUMMARY.md created with substantive content
- STATE.md updated (position, decisions, issues, session) — unless parallel mode (orchestrator handles)
- ROADMAP.md updated — unless parallel mode (orchestrator handles)
- If codebase map exists: map updated with execution changes (or skipped if no significant changes)
- If USER-SETUP.md created: prominently surfaced in completion output </success_criteria>