Commit Graph

186 Commits

Author SHA1 Message Date
Tom Boucher
3ce6a12f30 docs: add docs/RELEASE-v1.42.0-rc.1.md (new features only) (#3280)
Companion docs page for the v1.42.0-rc.1 release tag, scoped to the new
features in 1.42.0:

- Security: package legitimacy gate against slopsquatting (#3215) — three
  layers across researcher, planner, executor; plus npx --yes hardening
  and graceful degradation when slopcheck is unavailable
- Architecture: SDK package seam deepened; runtime-global skills policy
  converged into a single Module (#3238)
- Architecture: phase lifecycle seams deepened — extracts Phase Numbering
  Policy, Phase Filesystem Adapter, and Phase Roadmap Mutation modules
  from phase-lifecycle.ts (#3267)

Fix list is intentionally omitted — those fixes are rolled up from
v1.41.1 and listed on the v1.41.1 release page; this doc links out to
both v1.41.1 and v1.41.0 instead of restating them.

Format follows the established docs/RELEASE-v*.md pattern (compact
one-paragraph intro, categorized sections, install footer, link-out to
prior train).

Closes #3279
2026-05-09 01:10:31 -04:00
Tom Boucher
8bc255c266 fix(workstream): normalize migration workstream names (#3269)
* fix(workstream): normalize migrate-name to valid slug

* docs(context): record workstream migrate-name slug invariant

* fix(catalog-cjs): balanced fallback for unknown profile (CR finding A)

profiles[profile] could return undefined for any profile key absent from
the catalog entry, causing downstream callers like formatAgentToModelMapAsTable
to crash on .length. Add ?? profiles.balanced fallback to match the SDK adapter.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(sdk): anchor path resolution on import.meta.url not cwd (CR finding B)

resolve(process.cwd(), '..') breaks when Vitest is invoked from the repo root
because cwd is already the repo root and '..' goes one level above. Replace
with a file-relative path using fileURLToPath(new URL('../../../', import.meta.url))
anchored at the test file's location (sdk/src/query/).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: derive Group B runtime list from catalog (CR finding C)

Hardcoded ['kilo', 'cline', ...] throws TypeError if a runtime name is
removed from the catalog. Derive group B dynamically via
Object.keys(catalog.runtimeTierDefaults).filter(r => !r.opus) so the
test never goes stale and auto-covers future Group B additions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(workflow): add hermes to Step B runtime options (CR finding D)

hermes appears in the Group A built-in defaults table but was missing from
the AskUserQuestion options in Step B, forcing users to manually type it via
'Other (Group B or custom)'. Add explicit hermes entry for UI consistency.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(config): refresh dynamic_routing tier table; fix stale L671 (findings E+F)

Finding E: tier table was missing 6 heavy-tier agents and 15 standard/light
agents added by this PR. Updated all three rows to match catalog routingTier
assignments (33 agents total).

Finding F: removed stale '18 of 31' claim and agent enumeration; replaced
with accurate note that all 33 agents have explicit catalog entries. Updated
authoritative source pointers to model-catalog.cjs / model-catalog.ts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(core): add profile-fallback unit tests for quality and budget (CR nitpick G)

The PR introduced quality→opus and budget→haiku unknown-agent fallbacks but
only balanced→sonnet and inherit→inherit were tested. Add two tests covering
the remaining two branches to complete coverage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* adr: define planning workspace and worktree seam

* refactor(worktree): extract worktree safety policy module

* refactor(workstream): extract active workstream pointer store seam

* test(worktree): cover policy branch paths and persist seam guardrails

* refactor(worktree): centralize health inventory seam for W017

* fix(workspace): align SDK project path policy with CJS planningDir

* refactor(query): unify SDK planning path projection seam

* refactor(init): route workspace projection through planningPaths seam

* docs(adr): add SDK architecture and planning path ADRs

* refactor(worktree): deepen name, pointer, inventory, and config seams

* docs(config): harmonize claude-opus-4-6 to 4-7 in resolve_model_ids example (CR finding 2)

* fix(sdk): return undefined for model_profile='inherit' sentinel (CR finding 3)

* docs(adr): renumber conflicting 0003-sdk-package-seam-module to 0007, update seam-map reference (CR finding 4)

* fix(workstream): align CJS and SDK name validation to accept dots, guard path traversal via includes('..') (CR finding 5)

* fix(sdk): guard writeActiveWorkstream against non-existent workstream directory, k014/k031 parity (CR finding 6)

* chore(changeset): add #3269 changeset (CR finding 1 — proper changeset for this PR)

* docs(inventory): register 3 new CLI modules in INVENTORY.md/MANIFEST (active-workstream-store, workstream-name-policy, worktree-safety)

* fix(sdk): use relPlanningPath(workstream) in planningPaths, fix setActiveWorkstream/getActiveWorkstream name errors in workstream.ts

* fix(sdk): validate GSD_WORKSTREAM in planningPaths before use (#3269 regression)

planningPaths() called resolveWorkspaceContext() which returned GSD_WORKSTREAM
raw (no validation). An invalid value like '../evil' was used as effectiveWorkstream,
constructing a bad path; roadmapAnalyze() caught the ENOENT and returned a
no-phase_count error object instead of the root ROADMAP result.

Fix: validate envCtx.workstream with validateWorkstreamName() in planningPaths()
before accepting it as effectiveWorkstream. Invalid env → null → root .planning/
fallback, preserving the bug-2791 contract: invalid GSD_WORKSTREAM is silently
ignored and falls back to the root context (phase_count: 0 for empty root ROADMAP).

The bug-2791 regression test now passes. No other call sites read GSD_WORKSTREAM
without validation: query-runtime-context.ts already validates; cli.ts already
validates; context-engine.ts takes a caller-validated workstream parameter.

Closes #3268 (regression introduced by #3269 workstream-name-policy work).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 00:15:04 -04:00
Tom Boucher
96806003c5 fix(#3229): shared model catalog source of truth for agent profiles + runtime tier defaults (#3230)
* docs(adr): add ADR-0003 model catalog module

* fix(#3229): add shared model catalog as source of truth for agent profiles and runtime tier defaults

Research / design (ADR-0003):
- Existing drift came from 4 independent model truths:
  1. CJS model-profiles.cjs
  2. SDK config-query.ts stale copy (18 agents)
  3. settings-advanced.md runtime tier table
  4. session-runner Claude-only profile map
- New design: one machine-readable Model Catalog Module in sdk/shared/
  that both packages ship and consume.

Implementation:
- sdk/shared/model-catalog.json — canonical source of truth for:
  - full 33-agent registry
  - per-agent golden (quality) alias + balanced/budget aliases
  - adaptive derivation from routingTier
  - agent→phaseType map
  - agent→dynamic-routing default tier map
  - runtime tier defaults for all supported runtimes
- get-shit-done/bin/lib/model-catalog.cjs — CJS adapter over the catalog
- sdk/src/model-catalog.ts — SDK adapter over the same catalog
- CJS model-profiles.cjs now re-exports derived data from model-catalog.cjs
- SDK config-query.ts now re-exports MODEL_PROFILES/VALID_PROFILES from
  model-catalog.ts instead of maintaining its own list
- sdk/src/query/helpers.ts runtime list now comes from the catalog (fixes hermes drift)
- sdk/src/session-runner.ts Claude profile→model-id mapping now resolves via catalog
- docs/CONFIGURATION.md + settings-advanced.md runtime tables updated to match catalog

Behavior changes:
- resolve-model now covers every shipped agent file on disk (33 agents)
- unknown-agent fallback is profile-semantic, not hardcoded sonnet:
  quality→opus, budget→haiku, balanced/adaptive→sonnet, inherit→inherit
- Group B runtimes remain known runtimes but do not get built-in tier defaults

Tests (RED→GREEN):
- root tests: shipped agent files must equal MODEL_PROFILES keys
- sdk tests: shipped agent files must equal MODEL_PROFILES keys
- direct fix assertion: gsd-code-reviewer resolves to opus under quality with no unknown_agent
- runtime defaults parity test: settings-advanced.md + CONFIGURATION.md tables must match catalog
- helper tests: hermes included in SUPPORTED_RUNTIMES and getRuntimeConfigDir()

Closes #3229

* chore(changeset): update #3229 changeset pr field to 3230

* fix(ci): update inherit fallback expectations and inventory parity for model catalog
2026-05-08 21:25:37 -04:00
Tom Boucher
b37c487325 feat(security): package legitimacy gate against slopsquatting (#3215)
* feat(security): package legitimacy gate against slopsquatting (#2827)

GSD's research → plan → execute pipeline had no install-time legitimacy
gate: a hallucinated package name that passes `npm view` could flow all
the way to `gsd-executor` running `npm install <malicious-pkg>` with no
human checkpoint. This PR closes that gap.

Changes:
- gsd-phase-researcher: runs slopcheck on every recommended package;
  emits `## Package Legitimacy Audit` table; strips [SLOP] packages;
  ecosystem-specific verification (pip/npm/cargo); WebSearch-sourced
  packages tagged [ASSUMED]; ctx7 fallback uses `command -v` guard
  instead of `npx --yes`
- gsd-planner: injects `checkpoint:human-verify` before [ASSUMED]/[SUS]
  installs; adds T-{phase}-SC STRIDE row to <threat_model> template;
  ctx7 fallback also uses `command -v` guard
- gsd-executor: RULE 3 excludes package installs from auto-fix; failed
  installs surface as checkpoints, never silent substitutions
- tests/package-legitimacy-gate.test.cjs: 24 structural assertions
  covering the full gate (node:test + node:assert, no raw .includes())
- docs: USER-GUIDE, COMMANDS, ARCHITECTURE updated with gate description
- .changeset: Security fragment for v1.51 release notes

Closes #2827

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: expand Package Legitimacy Gate documentation

Add full user-facing depth to the gate docs across USER-GUIDE,
COMMANDS, and ARCHITECTURE:

- USER-GUIDE: rewrite gate section with concrete RESEARCH.md/PLAN.md
  examples, slopcheck verdict table, [ASSUMED] WebSearch tagging
  explanation, slopcheck-unavailable troubleshooting, and graceful
  degradation behavior
- COMMANDS.md: expand /gsd-plan-phase gate note with verdict bullets;
  add install-failure checkpoint behavior to /gsd-execute-phase
- ARCHITECTURE.md: expand gate section with threat model rationale,
  layer table, claim provenance integration, ecosystem coverage, and
  graceful degradation semantics

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(security): harden package legitimacy checkpoint semantics

* fix(planner): satisfy size gates and tighten package gate wording

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 09:08:06 -04:00
Tom Boucher
397c34142a Deepen SDK package seam and converge runtime skills policy (#3238)
* Deepen SDK package seam and converge runtime skills policy

* fix(sdk): unified install-root resolution for workflows and agents (CR finding 1)

Use the already-resolved gsdInstallDir constant instead of calling
resolveLegacyInstallDir() again when computing agentsDir, ensuring
workflowsDir and agentsDir share the same install root.

* fix(sdk): tilde shortening requires path-boundary match (CR finding 2)

Both renderGlobalSkillsBaseDisplayPath and renderGlobalSkillDisplayPath
used startsWith(home) which could incorrectly shorten unrelated paths
sharing the same prefix. Now checks for home === base or
base.startsWith(home + sep) to ensure a real directory boundary.

* fix(sdk): validate loadConfig export before invocation (CR finding 3)

After requiring core.cjs, check typeof mod.loadConfig === 'function'
before calling it. Throws a classified GSDError with the module path
if the export is missing, rather than a generic TypeError.

* fix(test): guard root lookup before .path dereference (CR finding 4)

Added assert.ok() guards for claudeRoot and codexRoot after the .find()
calls so that a missing root produces an explicit assertion failure
rather than a TypeError on .path dereference.

* fix(ci): fail-safe on transient API errors in approval dismissal (CR finding 6)

resolveRole() returns 'unknown' for non-404 errors (rate limits, 5xx,
network blips). shouldDismissReviewer() now treats 'unknown' as
unresolvable and skips dismissal, preventing legitimate approvals from
being dismissed due to a transient API failure. Only 'none' (true 404)
is treated as a confirmed non-collaborator.

* changeset: pr=3238 SDK package seam and runtime skills convergence

* fix(sdk): harden resolveGlobalSkillDir against path traversal (CR finding 1)

Use resolve+relative to validate that skillName cannot escape the global
skills base directory. Values like "../../foo" or absolute paths now
return null instead of joining directly. All imports (resolve, relative,
isAbsolute) were already present in helpers.ts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(sdk): split skill-dir-resolution and skill-not-found warnings (CR finding 2)

After resolveGlobalSkillDir's hardening can return null for traversal
attempts, the old single-branch warning "Global skill not found at ..."
was misleading. Split into two distinct cases:
- skillDir === null → "Could not resolve global skill directory for ..."
- skillMd missing → "Global skill not found at ..."

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: lock skill path-traversal rejection in resolveGlobalSkillDir

Regression test verifying that traversal segments (../../foo, ../escape),
empty string, and absolute paths are all rejected (return null), while
a legitimate skill name resolves correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(sdk): align display-path contract + traversal coverage for resolveGlobalSkillMarkdownPath (CR nitpicks)

- renderGlobalSkillsBaseDisplayPath now returns a non-null string for
  unsupported runtimes (e.g. cline → "(cline does not use a skills directory)")
  matching the existing renderGlobalSkillDisplayPath contract; callers
  of both helpers no longer need null-checks for unsupported runtimes.
- Remove now-redundant ! non-null assertion on renderGlobalSkillsBaseDisplayPath
  calls in skill-manifest.ts (return type is string, not string | null).
- Extend the path-traversal test block to assert resolveGlobalSkillMarkdownPath
  also propagates null for ../../foo, ../escape, empty, and /abs/path inputs,
  locking the null-propagation contract against future refactors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 09:06:43 -04:00
Tom Boucher
924c697097 docs: replace retired /gsd-intel with /gsd-map-codebase --query (#3258) (#3260)
* test: forbid stale /gsd-intel references in workflow/reference docs (#3258)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: replace retired /gsd-intel with /gsd-map-codebase --query (#3258)

Fixes 5 stale references across the two primary source files called out in
the issue. PR #2790 folded /gsd-intel into /gsd-map-codebase --query; these
prose surfaces were not updated at that time.

Fixes #3258

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: fix additional stale /gsd-intel references found in adversarial sweep (#3258)

Sweep found 7 more occurrences in docs/INVENTORY.md (x2), docs/USER-GUIDE.md (x4),
docs/FEATURES.md (x2), and agents/gsd-intel-updater.md (x2). All replaced with
/gsd-map-codebase --query. The gsd-intel-updater agent name itself (without leading
slash) is intentionally preserved.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* changeset: pr=3260 for #3258

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: fail loudly on unreadable files in bug-3258 regression scan (CR finding)

Replace silent early-return on readFileSync failure with an explicit
throw so unreadable files surface as test failures rather than skipped
coverage gaps.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 09:06:37 -04:00
Tom Boucher
48b01e4c9f docs(agents): scaffold docs/agents/ skill config files
- docs/agents/issue-tracker.md — GitHub, gsd-build/get-shit-done, .envrc token required
- docs/agents/triage-labels.md — confirmed=AFK-ready, approved-*=human-ready, needs-reproduction=needs-info
- docs/agents/domain.md — single-context, CONTEXT.md sections explained
- CLAUDE.md — fix stale triage label (needs-maintainer-review doesn't exist),
  fix stale domain note ('neither exists yet'), add .envrc token reminder to issue tracker summary
2026-05-07 09:12:24 -04:00
Tom Boucher
e3b52c70bb fix(docs): replace deleted /gsd-new-workspace with /gsd-workspace --new in FEATURES.md (#3221)
Feature 129 (Issue-Driven Orchestration Guide) referenced the deleted command
/gsd-new-workspace. Replace with its v1.40.0 successor /gsd-workspace --new to
fix the stale-ref test introduced in tests/bug-3042-3044-research-flag-and-stale-refs.

Fixes #3220

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 00:26:24 -04:00
Tom Boucher
c0be29607a docs: v1.41.0 release documentation — CHANGELOG promotion, release notes, FEATURES update (#3219)
- Promote CHANGELOG [Unreleased] → [1.41.0] - 2026-05-07; add fresh [Unreleased] header
- Fix CONFIGURATION.md version labels: 'added in v1.40' → 'added in v1.41' for models and dynamic_routing
- Create docs/RELEASE-v1.41.0.md in compact v1.39.0 bullet format
- Rewrite docs/RELEASE-v1.40.0-rc.1.md to compact bullet format (removes wall-of-text entries)
- Add docs/FEATURES.md v1.41.0 section (features 126–131: per-phase models, dynamic routing, update banner, issue-driven orchestration, graphify staleness, MVP SDK verbs)
- Update docs/FEATURES.md TOC
- Trim README "Notable extras" table (highlight page, not a command menu)

Fixes #3218

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 00:19:26 -04:00
Tom Boucher
2d32ad82be fix(plan-phase): remove agent: directive that caused OpenCode subagent dispatch (#3156) (#3206)
* feat(roadmap): parse **Mode:** field on phase sections

Adds a 'mode' field to roadmap.get-phase and roadmap.analyze outputs.
Recognizes '**Mode:** mvp' lines in phase sections; lowercased + trimmed.
Forward-compat: unrecognized values preserved verbatim, no enum check.

Foundation for --mvp flag in plan-phase (PRD: vertical-mvp-slice).

* feat(plan-phase): parse --mvp flag and resolve MVP_MODE

Resolution order: CLI flag → ROADMAP **Mode:** field → workflow.mvp_mode
config → false. Walking Skeleton gate fires for new-project Phase 1.
Wires MVP_MODE + WALKING_SKELETON into gsd-planner subagent prompt.

Per PRD vertical-mvp-slice Phase 1 (Q1, Q2, Q4).

* docs(planner): add vertical-slice planning reference

New reference loaded by gsd-planner when MVP_MODE=true. Defines slice
ordering, Walking Skeleton rules, and anti-patterns. Referenced from
plan-phase workflow MVP_MODE wiring.

* docs(planner): add SKELETON.md template

Template emitted by gsd-planner under WALKING_SKELETON=true. Captures
architectural decisions and out-of-scope list for new-project Phase 1.

* chore(inventory): register new planner references

Added planner-mvp-mode.md and skeleton-template.md to INVENTORY.md and
INVENTORY-MANIFEST.json. References now: 53.

* feat(gsd-planner): add MVP Mode Detection section

Mode-switched branch in the existing planner agent (per Q4: single agent).
Vertical-slice decomposition rules, Walking Skeleton handling, and
TDD-mode compatibility. Heavy guidance lives in references/planner-mvp-mode.md.

* test(plan-phase): add --mvp resolution-chain integration cases

Validates roadmap.get-phase --pick mode and confirms workflow.mvp_mode
default is unset in fresh projects.

* docs(changelog): announce --mvp vertical-slice planning (#2826)

* feat(mvp-phase): add /gsd mvp-phase slash command

Standalone command for vertical MVP planning. Frontmatter only;
heavyweight workflow at get-shit-done/workflows/mvp-phase.md follows
in next commit. Mirrors discuss-phase/edit-phase command shape.

* docs(planner): add user-story-template reference

Defines the canonical 'As a / I want to / So that' format and the
ROADMAP.md / PLAN.md emit rules. Used by mvp-phase workflow and
gsd-planner agent under MVP_MODE.

* docs(planner): add SPIDR splitting reference

Defines size signals, the five SPIDR axes (Spike/Paths/Interfaces/Data/Rules),
the interactive workflow, and anti-patterns. Per PRD Q3 decision: full
interactive flow, not lightweight check. Used by mvp-phase workflow.

* fix(mvp-phase): trim description to fit 100-char budget

* feat(mvp-phase): add mvp-phase workflow

Standalone workflow: phase validation -> user story prompts (As a / I want to /
So that) -> SPIDR splitting check -> ROADMAP write (Mode + Goal) -> delegation
to plan-phase. Per PRD Phase 2 (Q3 full SPIDR; Phase-2-A/B/C/D decisions).

Plan-phase auto-detects MVP via Phase 1's resolution chain, so no flags
are needed when delegating.

* feat(gsd-planner): emit user-story header in PLAN.md under MVP mode

Extends the MVP Mode Detection section (added in Phase 1) so the planner
sources the user story from ROADMAP **Goal:** and emits the bolded
**As a** / **I want to** / **so that** form as the first content under
the phase header in PLAN.md. References user-story-template.md.

* test(mvp-phase): integration smoke test for ROADMAP mutation

Validates roadmap.get-phase output after a workflow-spec'd ROADMAP write:
mode=mvp and goal=full user story. Catches schema drift between workflow
emit and parser expectation. Includes a long-story case (>120 chars) to
confirm SPIDR-rejected stories still parse correctly.

* chore(inventory): register mvp-phase command + 2 new references

Adds /gsd mvp-phase to commands list, mvp-phase workflow to workflows list,
and user-story-template.md + spidr-splitting.md to references. References
count: 53 -> 55.

* docs(changelog): announce /gsd mvp-phase command (#2826)

* fix(mvp-phase): add TEXT_MODE plain-text fallback for non-Claude runtimes (#2012)

* docs(executor): add MVP+TDD gate reference

Defines the runtime gate semantics for execute-phase when both
MVP_MODE and TDD_MODE are true: pre-task verification of failing-test
commit, end-of-phase review escalation from advisory to blocking,
behavior-adding task definition. Loaded conditionally by
execute-phase workflow and gsd-executor agent.

* feat(execute-phase): MVP+TDD runtime gate + blocking review

Resolves MVP_MODE in Step 1 (CLI flag -> roadmap mode -> config -> false).
Adds per-task gate that halts before behavior-adding tasks run if no
failing-test commit exists for the plan. Escalates end-of-phase TDD
review from advisory to blocking when both MVP_MODE and TDD_MODE active.

Also updates INVENTORY-MANIFEST.json to register execute-mvp-tdd.md
(added by Task 1) so manifest-sync tests pass.

Per PRD vertical-mvp-slice Phase 3a (decisions Phase-3-A, Phase-3-Split).

* feat(gsd-executor): add MVP+TDD Gate section

Mirrors the planner's MVP Mode Detection pattern from Phase 1.
Instructs halt-and-report when the runtime gate trips, references
execute-mvp-tdd.md for full semantics. No agent changes outside the
new section.

* test(execute-phase): add MVP+TDD resolution-chain integration cases

Validates roadmap.get-phase --pick mode and confirms workflow.mvp_mode
default is unset in fresh projects. Mirrors the Phase 1 plan-phase
resolution-chain integration test.

* chore(inventory): register execute-mvp-tdd reference

Bumps References count 55 -> 56. Registers execute-mvp-tdd.md.
Adds "init" to PROSE_ALLOWLIST in registry integration test so
bare `gsd-sdk query init` prose examples in plan docs don't
trigger the unregistered-handler guard (real commands are all
init.<subcommand>).

* docs(changelog): announce MVP+TDD runtime gate in execute-phase (#2826)

* docs(verifier): add verify-mvp-mode reference

Defines UAT framing under MVP mode: user-flow walk-through first,
technical checks deferred, coverage check as goal-backward narrowing
to the user story's outcome clause. Loaded conditionally by
verify-work workflow and gsd-verifier agent.

* feat(verify-work): MVP-mode UAT framing — user flow first

Resolves MVP_MODE from phase mode field. Under MVP mode, generates UAT
in three ordered sections: user-flow walk-through (derived from user
story), technical checks (deferred), coverage check (goal-backward).
Falls back to standard UAT generation when mode is null/absent.
User-story-format guard refuses to verify a mode:mvp phase with a
non-user-story goal.

Also updates docs/INVENTORY.md (56 references) and
docs/INVENTORY-MANIFEST.json to register verify-mvp-mode.md added
in Task 1.

Per PRD vertical-mvp-slice Phase 3b (decisions Phase-3-B,
Phase-3-Verify-Structure).

* feat(gsd-verifier): add MVP Mode Verification section

Narrows goal-backward verification to the user-story [outcome] clause
when phase mode is mvp. References verify-mvp-mode.md. Preserves
existing goal-backward methodology for non-MVP phases. User-story-format
guard refuses to verify a mode:mvp phase with a non-user-story goal.

* docs(changelog): announce MVP-mode UAT framing in verify-work (#2826)

* feat(new-project): add Vertical MVP vs Horizontal Layers mode prompt

Asks user at project init how to structure the project. Vertical MVP
emits **Mode:** mvp on every initial roadmap phase (per-phase mode
preserved per PRD Q1). Horizontal Layers falls back to standard
template — no behavioral change for existing flows.

Per PRD vertical-mvp-slice Phase 4 (decision Phase-4-Persistence).

* feat(progress): add MVP-mode user-flow display

When phase has **Mode:** mvp, progress renders user-flow status from
PLAN.md task names alongside standard task progress. Tasks that aren't
user-flow-shaped (technical-sounding) are filtered out of the user-flow
sub-block. Falls back to standard display when mode is null/absent.

Per PRD vertical-mvp-slice Phase 4 (decision Phase-4-Progress).

* feat(stats): add MVP phase count summary

Reads roadmap.analyze (which surfaces mode per phase from Phase 1) and
emits 'Phases: N total | M MVP | K standard' summary line. Suppressed
when MVP_COUNT == 0 to avoid clutter on non-MVP projects.

Per PRD vertical-mvp-slice Phase 4.

* feat(graphify): add MVP-mode visual differentiation

MVP-mode phases render with #22c55e fill color AND ' (MVP)' label
suffix — two-channel signaling for color-blind and grayscale renders.
Standard phases unchanged.

Per PRD vertical-mvp-slice Phase 4 (PRD Q5: distinct visual treatment).

* docs(changelog): announce Phase 4 discovery & progress (#2826)

* chore(release): bump dev to 1.50.0-canary.0 for first 1.50.0 canary

Sets the base version that .github/workflows/canary.yml derives the canary
tag from (strips suffix → base 1.50.0 → next available v1.50.0-canary.N).

This kicks off the 1.50.0 release train, opened by the MVP/TDD/UAT vertical
slice landed across PRs #2867, #2874, #2878, #2880, #2883.

* docs: add CANARY stream README + v1.50.0-canary.1 release notes

- docs/CANARY.md — explains the dev→@canary stream policy, install/rollback
  paths, and when (not) to install canary builds
- docs/RELEASE-v1.50.0-canary.1.md — release notes for the first 1.50.0
  canary cut: vertical MVP/TDD/UAT slice (#2867 + #2874 + #2878 + #2880 +
  #2883), opening the 1.50.0 train under PRD #2826
- docs/README.md — index entry + quick link for the canary stream

* fix(ci/canary): publish gate checks dev branch, not main

Four publish-step `if:` conditions in .github/workflows/canary.yml were
checking `github.ref == 'refs/heads/main'`. Those steps (Tag and push,
Publish to npm, Publish SDK to npm, Verify publish) therefore always
skipped on every workflow_dispatch invocation since canary runs from dev,
never main.

The workflow's own header comment is unambiguous: `dev → @canary`. The
gate was a copy-paste from release.yml (which correctly targets main for
the @next/@latest streams) that was never corrected for the canary stream.

This is why the 1.50.0-canary.1 publish hadn't materialized despite three
green workflow runs. With the gate corrected, the next dispatch will
actually publish.

* ci(release-sdk): make release-sdk.yml dispatchable from the dev branch

The workflow lives on main only, so the GitHub Actions "Use workflow
from" dropdown doesn't list dev — meaning dev → @dev publishes can't be
triggered from the dev branch directly. Add the file to dev so an
operator can dispatch it with branch=dev and tag=dev.

Per project release-stream policy: dev branch publishes canary (@dev).
This is the stream that needs the file most, since main never publishes
@dev itself (main does @next / @latest).

File is byte-identical to main's release-sdk.yml — straight propagation,
no behavioral change. Tracking issues #2925, #2929.

* docs(mvp): canary-prep concept cleanup — CONTEXT.md, mvp-concepts index, --prd interaction (#3176)

* chore(mvp): concept cleanup + cross-ref index for v1.50.0-canary.2 prep

- CONTEXT.md gains 7 MVP domain terms (MVP Mode, User Story, Walking
  Skeleton, Vertical Slice, Behavior-Adding Task, MVP+TDD Gate, SPIDR
  Splitting) so the project glossary matches the shipped surface.
- New get-shit-done/references/mvp-concepts.md indexes the six MVP
  reference files and concept-to-file map so agents and contributors
  can find the right canonical doc without grepping.
- plan-phase.md Walking Skeleton block now documents that --mvp and
  --prd compose orthogonally on Phase 1; no precedence needed.
- INVENTORY/INVENTORY-MANIFEST refreshed for the new reference (58 -> 59).

No behavior change. Canary-prep cleanup ahead of v1.50.0-canary.2.

Surfaced for follow-up (not in this PR):
- MVP_MODE resolution shell block duplicated across plan-phase,
  execute-phase, verify-work workflows (needs a shared workflow-include
  mechanism; structural change).
- Behavior-Adding Task predicate is prose-only; no shared utility.
- User Story regex hardcoded in verify-work; would benefit from a
  central definition consumed by the verifier and the mvp-phase command.

* chore(changeset): set PR number for mvp concept cleanup

* feat(mvp): centralize resolution surfaces + fix SDK roadmap mode parity (#3178)

Three new SDK query verbs replace the architectural duplication surfaced by
the v1.50.0-canary.2 review against dev tip 12c4e565:

  phase.mvp-mode <N> [--cli-flag]
    Single canonical precedence resolver (CLI flag -> ROADMAP **Mode:** mvp
    -> workflow.mvp_mode config -> false). Replaces 4-8 lines of bash that
    were duplicated across plan-phase.md, execute-phase.md, verify-work.md,
    and progress.md. Returns {active, source, roadmap_mode, config_mvp_mode,
    cli_flag_present}.

  task.is-behavior-adding <plan-file> | --task-content <xml>
    Behavior-Adding Task predicate (tdd="true" + <behavior> block + non-test
    source files in <files>). Replaces prose-only specification in
    references/execute-mvp-tdd.md; gsd-executor agent now invokes the verb
    instead of re-inlining the three checks. Returns {is_behavior_adding,
    checks, reason}.

  user-story.validate <text> | --story <text>
    Owns the canonical User Story regex /^As a .+, I want to .+, so that .+\.$/
    previously hardcoded in verify-work.md prose. Consumed by gsd-verifier
    (phase-goal guard) and /gsd-mvp-phase (interactive-prompt validation).
    Returns {valid, slots: {role, capability, outcome}, errors[]}.

Bug fix bundled: sdk/src/query/roadmap.ts searchPhaseInContent now extracts
the mode field from **Mode:**, restoring parity with roadmap.cjs:120-123.
Without this, roadmap.get-phase --pick mode returned null on the native
dispatch path even when the phase had **Mode:** mvp set, causing MVP_MODE
to silently fall through to the config/false branch in every consuming
workflow. The original PRs Phase 1 (#2885) shipped the CJS parser but the
SDK port omitted the field; this fix brings them back to parity.

Workflows + agents updated to call the verbs:
  - plan-phase.md, execute-phase.md, verify-work.md, progress.md call
    phase.mvp-mode (one line replaces the duplicated bash chains).
  - execute-phase.md MVP+TDD gate calls task.is-behavior-adding.
  - verify-work.md goal guard calls user-story.validate.
  - mvp-phase.md interactive prompt validates via user-story.validate.
  - gsd-executor agent references task.is-behavior-adding instead of prose.
  - gsd-verifier agent references user-story.validate instead of inlined regex.

Tests: 24 new vitest tests in sdk/src/query/mvp.test.ts cover all three
verbs + the regression. Two existing contract tests (progress, verify)
updated to assert on the new verb shape. All 60 existing MVP contract
tests pass; golden integration suite (38 + 42 tests) passes.

Closes #3177

* fix(canary.2): unblock release gates for v1.50.0-canary.2

Run 25451329660 (Release SDK Bundle on dev, 2026-05-06T17:41) failed at the
test-suite step with 3 deterministic content/structure gate failures, all
attributable to the MVP umbrella integration in #3178 and the docs sweep
in #3180.

Failure 1: /gsd-mvp-phase undocumented in workflows/help.md
  - tests/bug-2954-help-md-slash-command-stubs.test.cjs requires every
    shipped commands/gsd/<X>.md to have a /gsd-<X> mention in help.md
  - PR #3180 updated docs/COMMANDS.md but missed help.md (which the AI
    agents load in-product)
  - Fix: add a /gsd-mvp-phase entry to help.md right before /gsd-plan-phase

Failures 2 + 3: execute-phase.md (1727) and plan-phase.md (1714) over XL budget (1700)
  - PR #3178 added MVP-mode verb calls (phase.mvp-mode, task.is-behavior-adding,
    user-story.validate) to both workflow files, pushing them past 1700 lines
  - Fix: bump XL_BUDGET 1700 -> 1800 with inline comment pointing at the
    structural follow-up (extract MVP bodies to <workflow>/modes/mvp.md per
    the discuss-phase/modes/ precedent)
  - The structural extract is the right long-term fix but is bigger than
    canary unblock scope; will land in a follow-up after canary cycles

Local verification:
  $ node --test tests/bug-2954-help-md-slash-command-stubs.test.cjs                 tests/workflow-size-budget.test.cjs
  tests 111  pass 111  fail 0

After this lands, re-trigger Release SDK Bundle on dev for v1.50.0-canary.2.

* chore(changeset): set PR number for canary.2 unblock

* fix(codex): generate-claude-md writes to AGENTS.md on Codex runtime

When config.runtime === 'codex' or GSD_RUNTIME=codex, override the
output target to AGENTS.md regardless of claude_md_path, so Codex
projects no longer have GSD sections written to CLAUDE.md by mistake.

Fixes both the CJS (gsd-tools) and SDK (profile-output.ts) paths.
Explicit --output flags are still honoured in both paths.

Closes #3163

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(plan-phase): remove agent: directive that caused OpenCode subagent dispatch

On OpenCode, any command with `agent: <name>` in its frontmatter is
auto-dispatched to a subagent context where the Agent tool is unavailable.
plan-phase.md and mvp-phase.md both carried `agent: gsd-planner`, causing
them to run inside gsd-planner's subagent context with no ability to spawn
researcher/planner/checker subagents — the orchestrator fell back to inline
execution for all three phases.

Fix: remove `agent: gsd-planner` from both command files so they run in the
main agent context. Also replace the stale `Task` tool in allowed-tools with
`Agent` (the correct dispatcher tool name post-#3168 rename).

Adds a structural regression test that parses YAML frontmatter of every
commands/gsd/*.md file and asserts no command carries an `agent:` directive.

Closes #3156

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(mvp): address CodeRabbit workflow and contract findings

* fix(execute-phase): use registered state.update query command

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 21:51:38 -04:00
Tom Boucher
94f835af40 docs: add Prerelease editions install guidance (Next/Nightly/Insiders/Preview) (#3173)
* docs: add Prerelease editions install guidance (Next/Nightly/Insiders/Preview)

Documents the existing <RUNTIME>_CONFIG_DIR override pattern for users on
prerelease runtime editions (Windsurf Next, Cursor Nightly, VS Code Insiders,
Codex preview, JetBrains EAP, etc.) and explicitly states they are best-effort
and not separately tested under release CI — consistent with the free-string
runtime policy in #2517.

Resolves the discoverability gap behind issue #3161 without enumerating each
prerelease channel as a named runtime. Future "add <runtime>-next/-nightly"
requests can be redirected to the new section.

Closes #3172

* chore(changeset): set PR number for prerelease docs fragment
2026-05-06 12:44:48 -04:00
Tom Boucher
29eb8be06d feat(graphify): commit-based staleness from built_at_commit (#3170) (#3171)
* test(graphify): TDD-red design contract for #3170 commit-staleness signal

Captures the proposed extension to graphifyStatus() as 8 failing
assertions across 3 groups (git-aware, non-git, back-compat). Suite is
describe.skip()'d so npm test stays green on the branch — removing
.skip is the green-light moment when the enhancement is approved and
implementation lands.

Verified against safishamsi/graphify v0.7.0 release notes: the field
on graph.json is built_at_commit (full git HEAD), not commit_hash as
originally guessed in #3170. Tests assert against the verified name.

Design highlights captured in the file's docstring:
- Tri-state commit_stale (true/false/null) — null means "we don't
  know" (pre-v0.7 graph or no git), distinct from false ("known fresh")
- Argument-injection fence /^[0-9a-f]{4,40}$/i validates built_at_commit
  before it reaches `git` as an argv element
- Existing graphifyStatus() fields (node_count, edge_count, stale,
  age_hours, etc.) are unchanged — back-compat fenced

Per the issue's enhancement template: no PR will be opened until the
issue is labeled `approved-enhancement`.

Refs #3170
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(graphify): surface commit-based staleness from graphify v0.7+ built_at_commit

Closes #3170

graphify v0.7+ embeds built_at_commit (full git HEAD) into graph.json at
write time. GSD's existing graphifyStatus() ignored it; staleness was
mtime-only, which is a poor proxy for "does this graph reflect the
current code." A CI-built graph rebuilt minutes ago against an old
checkout reads as FRESH on mtime but is materially stale.

graphifyStatus() now returns four additional fields on the success path:

  built_at_commit   short hash from graph.built_at_commit, or null
  current_commit    short hash of git HEAD, or null when no git
  commits_behind    git rev-list --count <built>..HEAD, or null
  commit_stale      true | false | null

Tri-state on commit_stale is load-bearing. null means "we don't know"
(pre-v0.7 graph, non-git cwd, unreachable commit) — semantically
distinct from false ("known fresh"). Agents reading null should fall
back to mtime; reading false can confidently skip a rebuild.

Security: built_at_commit is on-disk and user-influenceable. Without
validation, a hostile value (e.g. "--upload-pack=evil") would reach git
as an argv element and be interpreted as an option. The
/^[0-9a-f]{4,40}$/i fence rejects anything else as absent. spawnSync's
array args (no shell) is defense in depth, not the boundary.

Skill (commands/gsd/graphify.md) Step 2b renders one conditional line:

  Source commit: abc1234 (5 commits behind HEAD)
  Source commit: abc1234 (current)
  Source commit: abc1234 (freshness unknown)

Pre-v0.7 graphs omit the line entirely — no confusing "Source commit:
unknown" rendered.

Also documents `graphify hook install` in docs/CONFIGURATION.md for
multi-dev teams who would otherwise hit graph.json merge conflicts on
parallel rebuilds (sub-enhancement 2 from #3170).

TDD red→green: tests/enh-3170-graphify-commit-staleness.test.cjs
(8 assertions across git-aware, non-git, back-compat) was committed
describe.skip()'d in c567f23d when the issue was filed; this commit
removes .skip and lands the implementation that makes them green.
Full suite 7503/7503 passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 11:59:53 -04:00
Tom Boucher
41dc9bc060 fix(graphify): run /gsd-graphify build inline (with regression fence) (#3169)
* fix(graphify): run /gsd-graphify build inline instead of spawning a sub-agent

Closes #3166

graphify v0.7+ split the build into a fast AST-extraction phase (cached)
followed by a separate clustering + report-write phase. The cached
extraction phase survived sub-agent isolation, but the post-extraction
phase was SIGTERM'd when the agent exited, leaving the cache populated
and no graph.json / graph.html / GRAPH_REPORT.md artifacts written to
.planning/graphs/.

The skill now runs `graphify update .`, the three artifact copies, the
snapshot, and the status report as a single foreground Bash call so the
entire pipeline survives to completion. The CLI's `graphify build`
pre-flight still returns `action: "spawn_agent"` so external callers
and existing tests in tests/graphify.test.cjs keep working.

Regression test (tests/bug-3166-graphify-inline-build.test.cjs) parses
the skill's YAML frontmatter and body structurally to fence against
re-introducing Task to allowed-tools or `Task(` invocation syntax — a
future edit cannot regress the fix without tripping the fence.

Verified against safishamsi/graphify v0.7.0–v0.7.8 release notes:
`graphify update .` invocation and output filenames are unchanged in
v0.7+; no GSD-side interface migration is required.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): drop yaml dep from bug-3166 fence — replace with inline parser

CI failed with MODULE_NOT_FOUND on `require('yaml')` — the package
resolved locally as a transitive dep but isn't declared in package.json.
The project pattern (see tests/helpers.cjs `parseFrontmatter`) deliberately
avoids pulling in yaml/js-yaml.

Replace with a narrow inline parser that handles the scalar + block-list
subset used in this skill's frontmatter. Verified the fence still trips
when Task is reintroduced to allowed-tools.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): parse fenced blocks structurally for #3166 fence

Address CodeRabbit nitpicks on PR #3169: the body assertions used raw
markdown text regex (\bTask\s*\(/, /graphify\s+update\s+\./) which
violates the project's "parse, never grep" testing convention and risks
false-positives on prose.

Replace with extractFencedBlocks(body) which returns
[{lang, content}, ...] tuples per markdown code fence. Body assertions
now run against parsed blocks:

  - "no fenced code block contains Task("
    → deepEqual offending blocks to [] (vs. regex on raw body)
  - "a bash block invokes graphify update . / build snapshot"
    → filter to lang === 'bash', then substring-check inside parsed content

Substring checks within already-parsed fenced content are structural —
prose mentioning the word "Task" can no longer false-positive, and a
future prose reference to graphify cannot satisfy the positive assertions
either. The frontmatter side already used a parser; both sides now match.

Verified: re-introducing Task( inside a code fence still trips the
assertion. Full suite 7499/7499 passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): rename readFileSync-bound var to satisfy lint-no-source-grep

The structural-parse refactor introduced `b.content.includes(...)` calls
on parsed fenced-block records, but `loadSkill()` had also bound
`const content = fs.readFileSync(...)` for the markdown text. The
lint-no-source-grep regex scanner cannot distinguish scopes — it sees
"variable `content` is bound from readFileSync" and "`content.includes`
is called" and flags it as a source-grep test, even though the two
`content`s are different lexical entities.

Rename the readFileSync-bound local to `markdown`. Now `b.content` is
unambiguously a property access on a parsed-block record. Lint passes
(0 violations across 401 test files); behavior unchanged (4/4 tests
still pass, including the negative regression case).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): tighten snapshot assertion to gsd-tools.cjs prefix

CodeRabbit nitpick on bug-3166 fence: the snapshot bash assertion accepted
any 'graphify build snapshot' substring. Tighten to require it follows
'gsd-tools.cjs', matching the actual fenced invocation in
commands/gsd/graphify.md (which uses node "$HOME/.../gsd-tools.cjs" graphify
build snapshot — note the closing quote, so a literal 'gsd-tools graphify build
snapshot' substring would not match).

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 11:56:27 -04:00
Tom Boucher
8ad2e3877f fix(sdk): address CodeRabbit runtime bridge and docs findings 2026-05-05 19:59:56 -04:00
Tom Boucher
54b06e653e docs(sdk): document runtime bridge seam, strict mode, and fallback policy 2026-05-05 19:29:59 -04:00
Tom Boucher
a411e08e88 fix(coderabbit): resolve all 12 findings on PR #3152
MAJOR (security/correctness):
- commands/gsd/debug.md: add Write to allowed-tools (session file creation
  requires it — workflow explicitly says 'use Write tool, never heredoc')
- workflows/debug.md: add SLUG sanitization guard to steps 1b+1c (status/
  continue subcommands used raw user input in file paths — path traversal)
- workflows/thread.md: sanitize $ARGUMENTS in RESUME mode before file path
  construction (was bypassing the sanitization guard in CLOSE/STATUS modes)

MINOR (consistency/correctness):
- docs/INVENTORY-MANIFEST.json: remove stale top-level 'workflows' array
  (duplicate of families.workflows introduced in earlier update)
- commands/gsd/resume-work.md: normalize process to 'Execute end-to-end.'
- commands/gsd/settings.md: normalize process to 'Execute end-to-end.'
- commands/gsd/update.md: normalize otherwise branch to 'execute end-to-end.'
- docs/adr/0002: add Status: Accepted + Date header (ADR convention)
- workflows/extract-learnings.md: rename step extract_learnings → extract-learnings
- tests/extract-learnings.test.cjs: tighten step-name assertion to exact name

ARCHITECTURE:
- scripts/command-contract-helpers.cjs: extract CANONICAL_TOOLS, parseFrontmatter,
  executionContextRefs as shared module — single source of truth consumed by
  both lint script and test suite (prevents silent lint/test disagreement)
- scripts/lint-command-contract.cjs: require() helpers instead of duplicating
- tests/command-contract.test.cjs: require() helpers; move readFileSync calls
  inside test() callbacks (registration-time throws surface as named failures)
2026-05-05 16:06:29 -04:00
Tom Boucher
81f9534b5a feat(adr-0002): command contract validation module + prose @-ref cleanup + workflow extraction
ADR-0002: commands/gsd/*.md contract now enforced at two layers:

LINT (scripts/lint-command-contract.cjs — new CI step):
- name: present, starts with gsd: or gsd-
- description: non-empty
- allowed-tools: non-empty, all entries canonical
- execution_context @-refs: resolve on disk, no trailing prose on same line
- handles both @~/ and $HOME/ path prefixes

TEST (tests/command-contract.test.cjs — 361 assertions):
- Behavioral contract for all 65 command files
- Replaces scattered coverage in enh-2790 + bug-3135
- Per-command per-rule test — one failure names the exact file + rule

CI (.github/workflows/test.yml):
- 'Lint — command contract (ADR-0002)' step added to lint-tests job

PROSE @-REF CLEANUP (39 command files, ~900 tokens/invocation recovered):
- Removed redundant @~/.claude/get-shit-done/... paths from <process> prose
- execution_context block is now the single authoritative load declaration
- Routing commands (sketch, spike, update, pause-work, etc.) keep routing
  instructions; only the inert path token is stripped

WORKFLOW EXTRACTION (debug.md + thread.md, ~15,000 chars / ~3,750 tokens):
- get-shit-done/workflows/debug.md: full process extracted from commands/gsd/debug.md
- get-shit-done/workflows/thread.md: full process extracted from commands/gsd/thread.md
- Command files reduced to frontmatter + objective + execution_context + context
- debug.md: 9,603 → 1,703 chars; thread.md: 7,868 → 585 chars

RENAME:
- get-shit-done/workflows/extract_learnings.md → extract-learnings.md
  (aligns with hyphen convention of all other workflow files)

DOCS:
- docs/INVENTORY.md: count 85→87, new rows, rename row, fix add-todo --backlog attribution
- docs/INVENTORY-MANIFEST.json: +debug.md +thread.md +extract-learnings.md -extract_learnings.md

Closes ADR-0002 implementation.
2026-05-05 15:18:13 -04:00
Tom Boucher
695ad986c0 docs(adr): add ADR-0002 command contract validation module 2026-05-05 15:09:24 -04:00
Tom Boucher
c2b3f02d41 fix(#3135): restore workflows/add-backlog.md — capture --backlog had no workflow to load (#3147)
* fix(#3121): implement commands verb in SDK native registry

- Add commandsList handler — returns sorted JSON array of all registered
  verb strings; satisfies workstream-flag.md + agent tooling discoverability
- Register ['commands', commandsList] in DECISION_ROUTING_STATIC_CATALOG
- Add golden-policy exemption (SDK-only, no CJS mirror needed)
- check.decision-coverage-plan/verify were already registered; commands was the remaining gap

Closes #3121

* fix(#3135): restore workflows/add-backlog.md — capture --backlog had no workflow to load

Root cause: PR #2824 consolidated add-backlog into gsd-capture --backlog and
wired capture.md to delegate to workflows/add-backlog.md via execution_context.
The workflow file was never created (same gap class as reapply-patches.md which
was caught and fixed in the same PR). With no file to load, the agent had no
implementation steps to follow when --backlog was invoked.

Fix:
- Restore get-shit-done/workflows/add-backlog.md with full process from deleted
  commands/gsd/add-backlog.md (phase.next-decimal, ROADMAP write, mkdir, commit)
- Preserve #2280 ordering invariant: ROADMAP entry written before directory
- Fix docs/INVENTORY.md: remove incorrect attribution of --backlog to add-todo.md,
  add add-backlog.md row, bump workflow count 84→85
- Update docs/INVENTORY-MANIFEST.json
- Add regression test: every execution_context @-reference in commands/gsd/*.md
  must resolve to an existing workflow file on disk

Closes #3135
2026-05-05 15:02:38 -04:00
Tom Boucher
ba0409e04e fix(#3097, #3099): add cwd-drift sentinel + absolute-path guard to executor worktree protocol (#3144)
* fix(#3097, #3099): add cwd-drift + absolute-path guards to executor worktree protocol

#3097 — cwd-drift sentinel (gsd-executor.md task_commit_protocol step 0a):
  A Bash cd out of the worktree makes [ -f .git ] false, silently skipping
  all HEAD/branch safety guards. Commits land on main's branch.
  Fix: on first commit, capture spawn-time toplevel into sentinel file at
  .git/worktrees/<name>/gsd-spawn-toplevel. Before every subsequent commit,
  verify ACTUAL_TL matches EXPECTED_TL. Exits 1 with recovery instructions
  if drift detected.

#3099 — absolute-path guard (gsd-executor.md task_commit_protocol step 0b):
  Absolute paths constructed from the orchestrator's pwd (main repo root)
  resolve to the main repo inside worktrees. Edit/Write lands in wrong dir;
  git commit sees a clean worktree tree; work silently lost or leaks to main.
  Fix: before any absolute-path Edit/Write, verify path starts with
  WT_ROOT=/Users/thbouc/projects/get-shit-done. Prefer relative paths.

Both guards are documented in references/worktree-path-safety.md, which
is now loaded into every executor spawn prompt via <execution_context>.
The <worktree_branch_check> footnote references all three steps (0/0a/0b).

execute-phase.md: extracted worktree bash commands to reference file
(safe embed — @ files are inlined before the executor processes the prompt).
The blank line in <required_reading> was removed to stay at the XL=1700 line
budget after adding the @ reference.

Suite: 6986/6986. Closes #3097. Closes #3099.

* fix(lint+executor+docs): allow-test-rule, fix [ -f .git ] guard, fail-closed abs-path check, fix INVENTORY count
2026-05-05 15:02:26 -04:00
Tom Boucher
375bf3abd6 fix(#3126): replace hardcoded globalSkillsBase with first-class runtime-aware mapping (#3140)
* fix(#3126): replace hardcoded globalSkillsBase with runtime-aware mapping

Root cause: buildAgentSkillsBlock() used path.join(os.homedir(), '.claude',
'skills') for globalSkillsBase regardless of config.runtime. Cursor users
(and every non-Claude runtime) saw their global: skill lookups fail with
a warning pointing to the wrong directory.

Fix: introduces get-shit-done/bin/lib/runtime-homes.cjs — a pure, side-
effect-free module covering all 15 GSD runtimes:

  Runtime      Config base              Skills path
  claude        ~/.claude               ~/.claude/skills/
  cursor        ~/.cursor               ~/.cursor/skills/
  gemini        ~/.gemini               ~/.gemini/skills/
  codex         ~/.codex                ~/.codex/skills/
  copilot       ~/.copilot              ~/.copilot/skills/
  antigravity   ~/.gemini/antigravity   ...antigravity/skills/
  windsurf      ~/.codeium/windsurf     ...windsurf/skills/
  augment       ~/.augment              ~/.augment/skills/
  trae          ~/.trae                 ~/.trae/skills/
  qwen          ~/.qwen                 ~/.qwen/skills/
  hermes        ~/.hermes               ~/.hermes/skills/gsd/ (nested #2841)
  codebuddy     ~/.codebuddy            ~/.codebuddy/skills/
  cline         ~/.cline                null (rules-based, no skills dir)
  opencode      ~/.config/opencode      ...opencode/skills/
  kilo          ~/.config/kilo          ...kilo/skills/

Also adds CLAUDE_CONFIG_DIR env var support (was missing).
Warning messages now show the actual runtime-specific path.
Docs: INVENTORY.md CLI Modules 41→42.

Regression test: 30 assertions across all runtimes.
Suite: 7008/7008. Closes #3126.

* fix(lint+init): allow-test-rule, fix display path duplication (skillName appended twice)
2026-05-05 15:02:11 -04:00
Tom Boucher
811410be61 fix: address all 13 CodeRabbit comments from second review pass
Duplicate /gsd-help rows (caused by join-discord → help replacement
landing in tables that already had /gsd-help):
- Remove Discord-purpose duplicate row from README.md, README.ja-JP.md,
  README.zh-CN.md, README.ko-KR.md, docs/zh-CN/README.md,
  docs/zh-CN/USER-GUIDE.md, docs/ja-JP/USER-GUIDE.md,
  docs/ko-KR/USER-GUIDE.md
- Remove orphaned Discord-only ### /gsd-help sections from
  docs/ja-JP/COMMANDS.md and docs/ko-KR/COMMANDS.md

Gap-fix command precision (plan-milestone-gaps → audit-milestone --fix):
- README.ja-JP.md, README.ko-KR.md, README.zh-CN.md gap-fix rows
  updated to /gsd-audit-milestone --fix

docs/COMMANDS.md: document --path <dir> for --from-gsd2 in table and
  example block

docs/FEATURES.md:
- Add adaptive to /gsd-config --profile value set
- Add blank line before spike Produces table (MD058)

Suite: 6971/6971 pass
2026-05-05 11:22:37 -04:00
Tom Boucher
858c821829 docs: sweep stale /gsd-* command references across all user-facing docs
Replace 30 absorbed/deleted standalone command forms with their
consolidated flag-based equivalents across 25 files (English + 4
locales + AGENTS/CLI-TOOLS/CONFIGURATION):

  /gsd-session-report        → /gsd-pause-work --report
  /gsd-list-phase-assumptions → /gsd-discuss-phase --assumptions
  /gsd-analyze-dependencies  → /gsd-manager --analyze-deps
  /gsd-research-phase        → /gsd-plan-phase --research-phase
  /gsd-plan-milestone-gaps   → /gsd-audit-milestone
  /gsd-code-review-fix       → /gsd-code-review --fix
  /gsd-spike-wrap-up         → /gsd-spike --wrap-up
  /gsd-sketch-wrap-up        → /gsd-sketch --wrap-up
  /gsd-set-profile           → /gsd-config --profile
  /gsd-check-todos           → /gsd-capture --list
  /gsd-add-todo              → /gsd-capture
  /gsd-add-backlog           → /gsd-capture --backlog
  /gsd-plant-seed            → /gsd-capture --seed
  /gsd-note                  → /gsd-capture --note
  /gsd-add-phase             → /gsd-phase
  /gsd-insert-phase          → /gsd-phase --insert
  /gsd-edit-phase            → /gsd-phase --edit
  /gsd-remove-phase          → /gsd-phase --remove
  /gsd-new-workspace         → /gsd-workspace --new
  /gsd-list-workspaces       → /gsd-workspace --list
  /gsd-remove-workspace      → /gsd-workspace --remove
  /gsd-sync-skills           → /gsd-update --sync
  /gsd-reapply-patches       → /gsd-update --reapply
  /gsd-scan                  → /gsd-map-codebase --fast
  /gsd-intel                 → /gsd-map-codebase --query
  /gsd-next                  → /gsd-progress --next
  /gsd-do                    → /gsd-progress --do
  /gsd-status                → /gsd-progress
  /gsd-join-discord          → /gsd-help

Skipped: CHANGELOG, RELEASE notes, superpowers/specs (historical)
Suite: 6971/6971 pass
2026-05-05 11:01:15 -04:00
Tom Boucher
d978ad6b2f merge: sync main into PR #3114 and keep canonical next/profile commands 2026-05-04 23:32:42 -04:00
Tom Boucher
4ee6ce4a01 fix(3054): align docs anchors and structured stale-command checks 2026-05-04 23:30:35 -04:00
Tom Boucher
72f4c3b362 fix(docs): replace stale /gsd-next references with /gsd-progress --next 2026-05-04 22:54:01 -04:00
Tom Boucher
5e21bf7567 Deepen query dispatch seam with Command Topology Module (#3078)
* Deepen query dispatch seam with command topology module

* Stabilize SDK parity defaults and integration test gating

* docs(architecture): record pre-project config policy and e2e gate

* refactor(query): stop injecting native adapter in CLI dispatch path

* fix(config): align workflow auto-chain typing and docs
2026-05-03 18:11:38 -04:00
Tom Boucher
9c92c32f6e refactor(query): deepen runtime context/native adapter/output seams (#3076)
* refactor(query): deepen runtime context, native adapter, and cli output seams

* chore(changeset): add fragment for query seam deepening continuation

* refactor(query): converge internal command-resolution imports on canonical seam

* refactor(query): remove dead seam wrappers and converge on canonical modules

* docs(architecture): update context and adr for query seam completion

* fix(query): preserve gsd-tools stderr in cli output and clarify static ws test scope

* test(query): cover whitespace stderr and null exitCode fallback
2026-05-03 16:31:48 -04:00
Tom Boucher
f104dab332 refactor(query): deepen dispatch policy seam with structured result contract (#3066)
* refactor(query): deepen dispatch policy seam with structured result contract

Closes #3065.

- unify query dispatch outcome as typed success/failure union
- include error kind/details + final exit_code in failure path
- align native and fallback paths under one dispatch policy seam
- make CLI query path consume seam result (thin adapter)
- add ADR + context term for Dispatch Policy Module

* refactor(query): strengthen dispatch seam with shared error mapper and typed details

- add query-dispatch-error-mapper module shared by native/fallback paths
- remove ad-hoc inline mapping in dispatch/fallback executors
- lock error-details schema in mapper + dispatch tests
- document structured dispatch contract in QUERY-HANDLERS.md

* fix(query): return structured fallback failure when path resolution throws

- guard resolveGsdToolsPath in cjs dispatch path
- map thrown resolution errors to fallback_failure result
- add regression test for structured failure contract
2026-05-03 14:30:27 -04:00
Tom Boucher
eb365f7336 docs: audit and update docs/ for v1.40.0 release (#3048)
* docs(en): update FEATURES/USER-GUIDE/COMMANDS for v1.40.0 surface

- FEATURES.md: append v1.40.0 section (#122 skill consolidation, #123
  namespace meta-skills, #124 context-window guard, #125 phase-lifecycle
  status-line read-side); add to TOC.
- USER-GUIDE.md: add slash-command form (hyphen vs colon) primer and
  namespace routing primer; replace deleted slash forms in walkthroughs
  (`/gsd-add-backlog`, `/gsd-plant-seed`, `/gsd-add-phase`,
  `/gsd-set-profile`, `/gsd-list-workspaces`, etc.) with consolidated
  forms (`/gsd-capture --backlog`, `/gsd-phase --insert`,
  `/gsd-config --profile`, `/gsd-workspace --list`, etc.); fix
  `/gsd-spike-wrap-up` and `/gsd-sketch-wrap-up` to flag form.
- COMMANDS.md: clarify Command Syntax (Gemini = colon form, others =
  hyphen form); add Namespace Meta-Skills section with all six routers;
  add `--context` to /gsd-health flag table.

Refs #3047

* docs(en): refresh INVENTORY/CLI-TOOLS/STATE-MD-LIFECYCLE for v1.40.0

- INVENTORY.md: workflow-row "Invoked by" column updated to point at
  consolidated commands (`/gsd-phase` family, `/gsd-workspace --list`,
  `/gsd-config --advanced/--integrations/--profile`,
  `/gsd-sketch --wrap-up`, `/gsd-spike --wrap-up`); CLI-modules row for
  `secrets.cjs` updated to `/gsd-config --integrations`. Command count
  and namespace meta-skills section already reflect 65 shipped (= 59
  consolidated sub-skills + 6 ns-* routers).
- CLI-TOOLS.md: add `validate context` row under Validation Commands
  with the 60 %/70 % threshold envelope used by `/gsd-health --context`.
- STATE-MD-LIFECYCLE.md: flip status header from "proposed" to
  "shipped in v1.40.0" since `parseStateMd()` and `formatGsdState()`
  now read and render `active_phase`, `next_action`, `next_phases`,
  and `progress`.

`docs/AGENTS.md` audited and verified clean — `gsd-code-fixer` row
already lists the correct `/gsd-code-review --fix` spawner; no
deleted-skill references found. `docs/INVENTORY-MANIFEST.json`
audited and verified clean — already enumerates the 65 commands
(including six ns-* routers) and contains no deleted slash forms.

Refs #3047

* docs(en): cleanup ARCHITECTURE/CONFIGURATION for v1.40.0

- ARCHITECTURE.md: split Commands install-target list to call out the
  Gemini colon form (`/gsd:command-name`) vs hyphen form for every
  other runtime. Add a new subsection covering two-stage hierarchical
  routing via the six namespace meta-skills (#2792) and a paired note
  on the MCP token-budget interaction so readers see the two big
  per-turn cost levers in one place.
- CONFIGURATION.md: rewrite three references to the deleted
  `/gsd-settings-advanced` and `/gsd-settings-integrations` slash
  forms to use the consolidated `/gsd-config --advanced` /
  `/gsd-config --integrations` invocations. Add a new "STATE.md
  Frontmatter (Phase Lifecycle)" section documenting the four
  optional fields (`active_phase`, `next_action`, `next_phases`,
  `progress`) read by the v1.40 status-line, with a pointer to
  STATE-MD-LIFECYCLE.md for the full reference.

`docs/manual-update.md` audited and verified clean — already documents
`/gsd-update --reapply` (the consolidated form), no reference to the
deleted `/gsd-reapply-patches`.

Refs #3047

* docs(i18n): mirror v1.40.0 slash-command rename into ja-JP/ko-KR/zh-CN/pt-BR

Mechanical token-level renames only — every reference to a deleted
micro-skill slash form is rewritten to the consolidated form on the
matching parent skill. No prose was machine-translated; new prose
sections (slash-form primer, namespace routing primer, v1.40 feature
entries, STATE.md frontmatter) were left for human translator
follow-up.

Renames applied uniformly across all four trees:
  /gsd-add-todo, /gsd-add-note, /gsd-add-backlog,
  /gsd-plant-seed, /gsd-check-todos      → /gsd-capture[ --note|
                                            --backlog|--seed|--list]
  /gsd-add-phase, /gsd-insert-phase,
  /gsd-remove-phase, /gsd-edit-phase     → /gsd-phase[ --insert|
                                            --remove|--edit]
  /gsd-new-workspace, /gsd-list-workspaces,
  /gsd-remove-workspace                  → /gsd-workspace[ --new|
                                            --list|--remove]
  /gsd-settings-advanced,
  /gsd-settings-integrations,
  /gsd-set-profile                       → /gsd-config[ --advanced|
                                            --integrations|--profile]
  /gsd-sketch-wrap-up                    → /gsd-sketch --wrap-up
  /gsd-spike-wrap-up                     → /gsd-spike --wrap-up
  /gsd-reapply-patches                   → /gsd-update --reapply
  /gsd-code-review-fix                   → /gsd-code-review --fix
  /gsd-plan-milestone-gaps               → /gsd-audit-milestone

Refs #3047

* docs(changelog): regroup [Unreleased] under Feature/Enhancement/Fix

Replace the existing Keep-a-Changelog \`Added\` / \`Changed\` /
\`Performance\` / \`Removed\` / \`Fixed\` sub-headers in the [Unreleased]
block with the issue/PR template taxonomy:

  Added                 → Feature
  Changed / Performance → Enhancement
  Removed               → Enhancement
  Fixed                 → Fix

Order within the release: Feature → Enhancement → Fix. Every bullet
preserved verbatim — only headers and grouping changed; the awkward
inline-versioned headers (\`### Added — 1.40.0-rc.1\`,
\`### Changed — 1.40.0-rc.1\`, \`### Fixed — 1.40.0-rc.1\`) folded into
the same buckets with the \`— 1.40.0-rc.1\` suffix dropped, since the
[Unreleased] block IS 1.40.0-rc.1.

The [1.39.2] hotfix block called out in #3047's spec does not yet
exist in CHANGELOG.md (the previously released hotfix is [1.39.1]),
so this commit only regroups [Unreleased]. Older release blocks
([1.39.1] and earlier) are frozen and untouched.

Refs #3047

* docs(changeset): add fragment for v1.40.0 doc audit

Refs #3047

* docs(en): strip leading / from deleted slash-command tokens in FEATURES

REQ-CONSOLIDATE-03 and REQ-CONSOLIDATE-04 listed deleted commands by
their `/gsd-foo` form for the historical record. The docs-parity tests
in bug-3010, bug-3029-3034, and bug-3042-3044 use the regex
`/\/gsd-[a-z0-9][a-z0-9-]*/g` to scan user-facing surfaces for any
remaining mention of removed slash forms — they cannot tell prose
about a deleted command from a live recommendation.

Strip the leading slash from the bare-name references (preserve the
historical text otherwise). Tests now require a `/` prefix to match,
so `gsd-add-todo` reads identically to a human but no longer trips
the parser.

Verified locally: 65/65 tests pass across the three docs-parity
suites that were red on CI run 25270072600.

Refs #3047

* docs(en): fix CR feedback + drop literal /gsd:plan-phase from USER-GUIDE

CI: tests/bug-2543-gsd-slash-namespace.test.cjs flagged
docs/USER-GUIDE.md:35 for embedding the literal `/gsd:plan-phase`
token in the parenthetical Gemini-form example. The test scans every
.md under docs/ for `/gsd:<live-cmd>` because non-Gemini surfaces must
not advertise the colon form. Replaced the literal example with a
prose substitution rule.

CR: docs/ARCHITECTURE.md:125 — the namespace meta-skills were listed
by file-prefix (`gsd-ns-workflow`) but the invocable frontmatter `name:`
is the bare form (`gsd-workflow`). Verified against the six
`commands/gsd/ns-*.md` files. Replaced with the canonical names and
noted the file/name disagreement in-line.

CR: docs/COMMANDS.md:723 — `v1.40` aligned to canonical `v1.40.0`.

CR: docs/FEATURES.md:2679 — REQ-CTX-GUARD-02 advertised the wrong
invocation (`gsd-tools validate context`). The shipped handler is
exposed via `gsd-sdk query validate.context` and requires explicit
`--tokens-used <int>` + `--context-window <int>` flags (verified
against sdk/src/query/validate.ts:849-882 and
get-shit-done/bin/lib/validate-command-router.cjs:19-36).

CR: docs/zh-CN/README.md:533 — added `inherit` to the profile-options
parenthetical to match the canonical set (verified against
model-profiles.cjs:29 `VALID_PROFILES = […MODEL_PROFILES['gsd-planner'], 'inherit']`).

Verified locally: 74/74 tests pass across the four docs-parity suites
that were red on CI runs 25270072600 and 25270182903.

Refs #3047
2026-05-03 07:33:27 -04:00
Tom Boucher
1e6737cd8e feat(plan-phase): --research-phase flag + scrub stale slash-command refs (#3042, #3044) (#3045)
* feat(plan-phase): --research-phase flag absorbs deleted /gsd-research-phase + scrub stale refs (#3042, #3044)

#3042 (orphaned research-phase): /gsd-research-phase had a workflow file
but no slash-command stub. Rather than restore the orphan, the research-
only capability is now a flag on /gsd-plan-phase:

  /gsd-plan-phase --research-phase <N>

When set, the workflow scopes to phase N, runs the research step (Section
5 of the existing plan-phase workflow), then early-exits before the
planner/plan-checker/verifier chain.

Per RCA against the deleted standalone, the flag adds two modifiers to
fully cover the original surface (Option B from the RCA discussion):

- --view : print existing RESEARCH.md to stdout, no spawn. Cheapest mode
  for the correction-without-replanning loop the issue reporter
  explicitly called out. Errors with a clear hint if RESEARCH.md is
  missing.
- --research : reuse the existing "force re-research" semantics. In
  research-only mode this skips the existing-RESEARCH.md prompt and
  re-spawns unconditionally.
- Neither flag, RESEARCH.md exists : prompt update/view/skip. Mirrors
  the deleted standalone's existing-artifact menu (#3042 RCA).

#3044 (stale slash-command refs): scrubbed five deleted commands from
all user-facing surfaces, including English docs, 4 localized doc sets
(ja-JP, ko-KR, zh-CN, pt-BR), workflows, templates, and references.

  /gsd-check-todos          → /gsd-capture --list
  /gsd-new-workspace        → /gsd-workspace --new
  /gsd-status               → /gsd-progress
  /gsd-plan-milestone-gaps  → table rows / orphan sections removed
                              (PR #3038 only scrubbed workflows/agent;
                              missed the docs surfaces this PR covers)
  /gsd-research-phase       → /gsd-plan-phase --research-phase

Includes a fix to docs/issue-driven-orchestration.md (PR #3036)
which itself referenced /gsd-new-workspace 4 times — self-correction.

Removed:
- get-shit-done/workflows/research-phase.md (orphan, capability
  absorbed into --research-phase flag)

Tests:
- tests/bug-3042-3044-research-flag-and-stale-refs.test.cjs — 46
  structural-IR tests across both bugs:
  - argument-hint advertises --research-phase + --view
  - workflow parses --research-phase, sets RESEARCH_ONLY,
    early-exits before planner
  - --view prints RESEARCH.md without spawning
  - --research forces refresh in research-only mode
  - existing-RESEARCH.md prompt path with update/view/skip
  - workflows/research-phase.md is removed
  - 5 deleted slash-commands absent from 17 English user-facing
    surfaces + 16 localized doc surfaces (4 locales × 4 docs each)
  - replacement command tokens present where deleted ones lived

6950/6950 full suite pass. Lints clean.

Closes #3042
Closes #3044

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: address all 8 CR findings on PR #3045

Major (3):
- get-shit-done/workflows/plan-phase.md:344 — added explicit early-exit
  guard at Section 5.1: "Skip if RESEARCH_ONLY=true". Without it, an LLM
  could fall through "use existing, skip to step 6" → planner spawn,
  violating the research-only contract. The guard makes the early-exit
  unreachable from any non-research-only branch.
- get-shit-done/references/continuation-format.md (3 examples) +
  zh-CN/.../continuation-format.md (3 examples) — pointed to
  `/gsd-plan-phase --research-phase` but docs/COMMANDS.md didn't
  document the flag. Added a full --research-phase + --view + --research
  modifier section to the /gsd-plan-phase flag table in COMMANDS.md so
  the canonical reference matches the continuation examples.

Minor (5):
- docs/FEATURES.md:1632 — `/gsd-plan-phase --research-phase` →
  `/gsd-plan-phase --research-phase <N>` (include required arg).
- get-shit-done/templates/README.md:46 — NN-VALIDATION.md producer
  reverted from `/gsd-plan-phase --research-phase` (Nyquist) to plain
  `/gsd-plan-phase` (Nyquist). VALIDATION.md is created during normal
  Nyquist flow, not research-only mode — the bulk replacement was
  wrong for that line.
- get-shit-done/workflows/help.md:89 — signature line was missing
  `--research`; added it alongside `--research-phase` and `--view`.
- tests/bug-3042-3044-...:197 — promptHasView/promptHasSkip were
  tautological (matched anywhere in 1700-line workflow). Tightened
  to a proximity check anchored on "RESEARCH.md already exists" prompt
  header within a 600-char window. Updated workflow to emit that
  literal phrase.
- tests/feat-2840-...:95 — workspace assertion used `/gsd-workspace`
  but the documented replacement is `/gsd-workspace --new`. Tightened
  to require both tokens (in 3 places: requiredCommands list, regex
  in conceptPairs, error message).

6950/6950 full suite pass. Lint clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-02 23:12:50 -04:00
Tom Boucher
7714b5244b fix(workflows,docs): scrub stale /gsd-code-review-fix and /gsd-plan-milestone-gaps refs (#3029, #3034) (#3038)
* fix(workflows,docs): scrub stale /gsd-code-review-fix and /gsd-plan-milestone-gaps refs (#3029, #3034)

#2790 consolidated /gsd-code-review-fix into /gsd-code-review --fix and
deleted /gsd-plan-milestone-gaps in favor of inline gap planning as part
of /gsd-audit-milestone's output. The deletion was propagated through
some surfaces (#2950 covered help/do/settings/discuss-phase/etc.) but
several user-facing surfaces still emitted the old forms:

#3029 — /gsd-code-review-fix references in:
- agents/gsd-code-fixer.md (description, "Spawned by", recovery prose)
- get-shit-done/workflows/code-review.md (offer text)
- get-shit-done/workflows/execute-phase.md (offer text)
- get-shit-done/workflows/code-review-fix.md (internal retry hints)
- docs/INVENTORY.md (agent + workflow rows)
- docs/CONFIGURATION.md (workflow.code_review row)
- docs/USER-GUIDE.md (3 occurrences in walkthrough)
- docs/AGENTS.md (gsd-code-fixer agent stub)
- docs/FEATURES.md (commands list + REQ-REVIEW-04)

All replaced with /gsd-code-review --fix. Internal retry hints in the
workflow file itself updated to point at the new form. Release notes
(docs/RELEASE-*.md) and gsd-ns-review's "absorbed by" deletion note
left unchanged — historical/explanatory content.

#3034 — /gsd-plan-milestone-gaps references in:
- get-shit-done/workflows/audit-milestone.md (<offer_next> blocks for
  gaps_found and tech_debt: lines 281, 323)
- commands/gsd/complete-milestone.md (gaps_found pre-flight: lines 46, 57)

Replaced with inline closure path:
  /gsd-phase --insert <N> "Close gap: <REQ-ID> ..."
  /gsd-discuss-phase <N>
  /gsd-plan-phase <N>
  /gsd-execute-phase <N>

Plus a Nyquist-coverage hint pointing at /gsd-validate-phase /
/gsd-secure-phase for retroactive audit-chain hygiene gaps. The
gsd-ns-project SKILL.md "deleted by #2790" note is preserved
(it's the canonical pointer for future readers asking what
happened to the command).

Tests:
- tests/bug-3029-3034-stale-command-routes.test.cjs — parser-based
  assertions per fixed surface, plus a structural cross-check that
  gsd-ns-project keeps the deletion note. 15 tests, all green.
- 6905/6905 full suite passes.

Closes #3029
Closes #3034

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: address CR feedback on PR #3038 — argument order, structural tests, agent count

CR findings on PR #3038:

1. **docs/USER-GUIDE.md (Major)** — `--fix` examples used flag-first form
   (`/gsd-code-review --fix 3`), but the supported CLI grammar is
   phase-first (`/gsd-code-review 3 --fix`). The original sed-based
   replacement preserved the position of the `gsd-code-review-fix`
   token, producing the wrong order. Fixed in USER-GUIDE.md (3
   occurrences) and the same drift in the workflow surfaces:
   - get-shit-done/workflows/code-review-fix.md (2 retry hints)
   - get-shit-done/workflows/code-review.md (offer text)
   - get-shit-done/workflows/execute-phase.md (offer text)

2. **docs/AGENTS.md (Minor)** — internal count drift: line 483 said
   "Ten additional agents" but line 725 said "12 advanced/specialized".
   Filesystem reality: 33 agents total, 21 primary, 12 specialized
   (count of `### ` stubs in the Advanced and Specialized section).
   Updated lines 3, 13, 483 to use 12/33 and added the two missing
   names (doc-classifier, doc-synthesizer) to the inline list at
   line 13.

3. **tests:94 (Major refactor suggestion)** — `.includes()` token checks
   were source-grep style. Refactored to a typed-IR pattern: extract
   the SET of slash-command tokens via regex, assert membership on the
   parsed Set instead of substring scanning the raw file text. Added
   the `allow-test-rule` comment explaining the IR-build vs
   IR-assertion split per scripts/lint-no-source-grep.cjs convention.

4. **tests:130 (Major)** — replacement-path assertion was file-wide and
   could false-pass on generic mentions of "inline" elsewhere in the
   file. Refactored: `extractOfferBlocks(content)` returns the typed
   list of `<offer_next>` and "Pre-flight" blocks where the deleted
   command previously lived, and the assertion runs against those
   blocks specifically. Now requires `/gsd-phase --insert` or
   inline-audit prose to appear in the same offer block, not just
   somewhere in the file.

15/15 targeted tests pass. 6905/6905 full suite pass. Lints clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-02 17:23:44 -04:00
Tom Boucher
117b3ec009 docs: add issue-driven orchestration guide (#2840) (#3036)
* docs: add issue-driven orchestration guide (#2840)

Adds docs/issue-driven-orchestration.md — a recipe for driving GSD from a
GitHub / Linear / Jira issue using existing primitives. Maps Symphony-style
orchestration concepts onto GSD commands without vendoring code, adding a
daemon, or introducing tracker integration.

Concept mapping covers:
- WORKFLOW.md → ROADMAP.md / STATE.md / phase CONTEXT.md / phase PLAN.md
- isolated agent workspace → /gsd-new-workspace --strategy worktree
- agent dispatch → /gsd-manager (interactive), /gsd-autonomous (unattended)
- per-phase steps → /gsd-discuss-phase → /gsd-plan-phase → /gsd-execute-phase
- proof-of-work → /gsd-verify-work (UAT.md persists across /clear)
- adversarial review → /gsd-review (cross-AI peer review)
- human merge gate → /gsd-ship
- follow-up capture → /gsd-note, /gsd-plant-seed, /gsd-new-milestone

End-to-end flow walks through 7 numbered steps from picking the tracker
issue to capturing follow-ups. Safety boundaries (isolated worktrees,
explicit human review, no automatic public posting, verification before
ship) and non-goals (no vendoring, no daemon, no mandatory tracker, no
gate bypass, no command-surface expansion) are spelled out explicitly so
the doc cannot drift into "let's just add one more flag".

Cross-linked from docs/README.md (Documentation Index) and
docs/USER-GUIDE.md (Table of Contents preamble).

Tests: tests/feat-2840-issue-driven-orchestration-guide.test.cjs — 9
structural-IR tests parse the guide into a typed record and assert on
flags (commandsPresent, conceptPairs, nonGoalFlags, safetyFlags,
numberedSteps). Fence-language MD040 check enforced. Cross-link
presence enforced. No raw-text assertions on prose.

6890/6890 tests pass. Lint:tests clean (allow-test-rule comment justifies
the doc-shape parser per scripts/lint-no-source-grep.cjs escape hatch).
Lint:changeset clean.

Closes #2840

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(test): guard USER-GUIDE.md existsSync before read (CR #3036)

CR Minor: cross-linked-from-USER-GUIDE.md test called fs.readFileSync
directly without first asserting fs.existsSync, asymmetric with the
README.md test above. A missing USER-GUIDE.md would throw ENOENT instead
of producing a meaningful assertion message. Mirror the null-guard
pattern.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-02 16:57:42 -04:00
Tom Boucher
95d2bc20f8 feat(hooks): opt-in SessionStart update banner for non-statusline users (#2795) (#3035)
* feat(hooks): opt-in SessionStart update banner for non-statusline users (#2795)

When a user declines (or keeps a non-GSD) statusline at install time, the
installer now offers an opt-in SessionStart banner that surfaces GSD update
availability. The banner reads the existing
~/.cache/gsd/gsd-update-check.json cache (written by
gsd-check-update-worker.js) and emits a single systemMessage line only when
update_available is true:

    GSD update available: <installed> → <latest>. Run /gsd-update.

It is silent when up-to-date and rate-limits "check failed" diagnostics to
once per 24h via a sentinel file so a corrupt cache doesn't nag every
session. Removed cleanly by `npx get-shit-done-cc --uninstall` which strips
both the script and the SessionStart entry. The banner is never offered when
GSD's statusline is being installed (statusline already surfaces update
info, so re-prompting would be noise).

Implementation:
- hooks/gsd-update-banner.js — pure functions buildBannerOutput,
  shouldSuppressFailureWarning, readCache; thin main() wires them.
- bin/install.js — handleUpdateBanner() prompt, parseUpdateBannerInput(),
  buildUpdateBannerHookEntry(), buildUpdateBannerPromptText(); chained into
  installAllRuntimes() so finalize() receives both flags. updateBannerCommand
  computed alongside the other JS-hook commands; finishInstall() registers
  the SessionStart entry only when shouldInstallBanner === true and the
  hook file is present at the target.
- Hook ships in scripts/build-hooks.js HOOKS_TO_COPY, listed in
  MANAGED_HOOKS for stale-detection in gsd-check-update-worker.js, in the
  uninstall hook-removal lists in install.js, and in the
  rewriteLegacyManagedNodeHookCommands allowlist.

Tests:
- tests/feat-2795-update-banner.test.cjs — 22 tests, structural-IR
  assertions on parsed JSON envelopes (no raw-text matching). Covers
  pure-function branches (cache present/absent, parseError, rate-limit
  suppression, missing version fields), end-to-end hook invocation against
  fixture cache states, and install.js wiring (prompt text, input parsing,
  hook entry shape).
- tests/trae-install.test.cjs — updated install() return-shape assertion to
  include updateBannerCommand: null for the no-settings runtime.
- 6881/6881 tests pass.

Docs (bundled in same commit per the bundle-docs-with-code skill):
- docs/USER-GUIDE.md — new "Surface GSD Update Notifications Without GSD's
  Statusline" task section with opt-in/opt-out instructions.
- docs/FEATURES.md — REQ-HOOK-08 added; "Update Banner" subsection under
  the Hook System feature with cache flow + removal path.
- docs/INVENTORY.md — hook count 11 → 12, new row for gsd-update-banner.js.
- docs/INVENTORY-MANIFEST.json — regenerated.

Closes #2795

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(install): gate banner prompt on actual installability (CR #3035)

CodeRabbit findings on PR #3035:

- bin/install.js (Major): continueAfterStatusline gated banner prompt on
  the raw `shouldInstallStatusline` flag from handleStatusline. But
  finishInstall later silently skips the statusline write on local
  installs unless --force-statusline is set (#2248). Two consequences:
    1. Interactive local Claude/Gemini installs got neither a statusline
       nor a banner offer.
    2. Codex/Cursor/Copilot/Windsurf/Trae/Cline-only installs (where
       every result.updateBannerCommand is null) still got prompted even
       though the choice was silently ignored.
  Fix: derive willInstallStatusline = shouldInstallStatusline &&
  (isGlobal || forceStatusline), and gate the banner prompt on a
  canInstallBanner precondition computed from results[].updateBannerCommand.
  Pass the raw shouldInstallStatusline through to finalize unchanged so
  per-runtime statusline gating in finishInstall is unaffected.

- tests/feat-2795-update-banner.test.cjs (Minor): rate-limit suppression
  test parsed r1.stdout without first asserting r1.status === 0. Other
  e2e tests in this file (lines 210, 241) do this. A non-zero exit would
  surface as a cryptic SyntaxError instead of a status assertion failure.
  Fix applied verbatim.

6881/6881 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-02 16:33:16 -04:00
Tom Boucher
8c43ba7301 docs(#3025): MCP tool schema as a context-budget concern (#3032)
* docs(#3025): MCP tool schema as a context-budget concern

Adds documentation covering the largest GSD cost lever that GSD
itself does not own: MCP tool schema injection. Every enabled MCP
server adds its schema to every turn (often 20k+ tokens for
heavyweight servers like browser/playwright, mac-tools, etc.),
which can dwarf whatever `model_profile` tuning saves.

Two doc surfaces (per the bundle-docs-with-code skill depth gradient):

1. get-shit-done/references/context-budget.md
   - New "MCP Tool Schema Cost (Harness Concern)" section.
   - Explains schemas-per-turn cost framing.
   - Names enabledMcpjsonServers / disabledMcpjsonServers and
     .claude/settings.json explicitly.
   - Pre-phase audit checklist: browser/playwright, platform-specific,
     cross-project/stale, duplicate/shadow.
   - Explicit "GSD does not manage MCP enablement — harness concern"
     statement so users don't hunt for a GSD setting.
   - Links to Anthropic Claude Code MCP docs as canonical reference.
   - Notes compounding interaction with model_profile (additive levers).

2. docs/USER-GUIDE.md
   - New task-oriented "Trim MCP servers to reduce per-turn cost"
     section above "Using Non-Claude Runtimes".
   - Same checklist condensed.
   - Cross-link to context-budget.md for the full reference.

Tests:
- tests/feat-3025-mcp-token-budget-docs.test.cjs (12 cases) parses
  both docs into typed semantic-flag records and asserts behavioral
  invariants (mentions key, includes audit, names harness, etc.)
  rather than substring-matching prose. Adheres to CONTRIBUTING.md
  no-source-grep — section can be reworded freely as long as the
  required semantics survive.
- Markdownlint pre-flight tests (MD040 fence language, MD056 table
  column count) per the bundle-docs-with-code skill so CR can't
  ratchet on prose nitpicks across multiple review rounds.

Verification:
- 12/12 pass on regression test
- 6857/6857 full suite (12 net new)
- lint-no-source-grep clean (377 test files)

Companion to #3023 (per-phase-type model map) and #3024 (dynamic
routing). Together they cover the three biggest cost levers users
ask about; this issue covers the one GSD does not own.

Closes #3025

* docs(#3025): batch 3 CR fixes — pr id, relative link, named flag

CodeRabbit on PR #3032 (3 minor — 2 inline + 1 nitpick), all in one
push per the bundle-docs-with-code skill (avoid per-round nitpick
ratchet):

1. Inline (Minor) — .changeset/mcp-token-budget-docs.md:3
   `pr: TBD` → `pr: 3032` so changeset tooling can link the entry.

2. Inline (Minor) — docs/USER-GUIDE.md:1101
   Used a hardcoded `https://github.com/.../blob/main/...` URL for the
   cross-link to `context-budget.md`. Rest of USER-GUIDE.md uses
   relative links. Switched to `../get-shit-done/references/context-
   budget.md#mcp-tool-schema-cost-harness-concern` so feature-branch
   work shows the right content and rename-resilience is preserved.

3. Nitpick — tests/feat-3025-mcp-token-budget-docs.test.cjs:234
   The cross-link assertion used an inline `/context-budget/i.test(...)`
   while every other invariant in the file lived as a named flag in
   `parseMcpBudgetSection`. Per CONTRIBUTING.md no-source-grep, added
   `crossLinksContextBudget` to the parser and asserted on
   `parsed.crossLinksContextBudget` so the cross-link rule sits next
   to its siblings.

Verification:
- 12/12 pass on regression test (no count change; refactor only)
- No source code changes, only docs + tests

* test(#3025): strip inline markdown before phrase-match (CR nitpick)

CodeRabbit caught that the `explainsHarnessNotGsd` primary regex
branch couldn't match "GSD does **not** manage" in
context-budget.md because the markdown bold markers (`**`) sit
between contiguous words. The test passed today only via the
fallback `harness (concern|setting|controlled)` branch — the
primary branch was effectively dead code.

Fix: strip inline markdown emphasis (`**`, `*`, `~~`) and inline-
code backticks before any phrase-matching in `parseMcpBudgetSection`.
All seven flag computations now run against the stripped text so
markdown formatting can't silently invalidate any invariant.

Underscores are intentionally NOT stripped — `model_profile` and
other snake_case identifiers must survive intact for the
mentionsModelProfileInteraction check to find them.

Verification: 12/12 still pass; primary branches now fire on
real markdown content rather than relying on fallbacks.

* test(#3025): guard markdownlint tests against null section (CR nitpick)

CodeRabbit caught that the MD040 and MD056 markdownlint pre-flight
tests called `section.match(...)` and `section.split('\n')`
directly on the value returned by `extractSection`, which returns
null when no matching header is found. If the MCP section is ever
removed (regression), both tests would throw `TypeError: Cannot
read properties of null` instead of producing a clean assertion
failure naming the actual problem.

The semantic tests above are protected because parseMcpBudgetSection
short-circuits to a typed-falsy record on null input. The
markdownlint tests bypassed that guard since they need raw section
text, not parsed flags. Added `assert.ok(section, ...)` preconditions
to both so a missing section produces a meaningful failure message.

No content changes; defensive programming only.

Verification: 12/12 still pass.
2026-05-02 15:24:26 -04:00
Tom Boucher
e1d661ece0 feat(#3024): dynamic routing with failure-tier escalation (#3031)
* feat(#3024): dynamic routing with failure-tier escalation

Adds a `dynamic_routing` block to .planning/config.json that lets
the resolver start agents on a cheap tier and escalate one tier up
when the orchestrator detects a soft failure (verification
inconclusive, plan-check FLAG, etc.). Solves the "pay Opus rates as
insurance" anti-pattern by making escalation observed-quality-driven.

Architecture:
- AGENT_DEFAULT_TIERS map (light/standard/heavy) — every agent in
  MODEL_PROFILES declares a default tier; tests assert coverage
  so adding a new agent without updating the map fails CI.
- nextTier(currentTier) helper — light → standard → heavy → heavy
  (heavy stays at heavy; can't go further).
- resolveModelForTier(cwd, agentType, attempt) — new resolver. The
  orchestrator tracks the attempt counter and passes 0 for the
  first spawn, 1+ on escalation. The resolver caps internally at
  max_escalations so the orchestrator can blindly bump the counter.
- Schema validation: dynamic_routing.enabled / escalate_on_failure /
  max_escalations / tier_models.<light|standard|heavy>. Unknown
  tiers and unknown sub-keys rejected at config-set time.
- SDK schema mirror updated to keep CJS/SDK in lockstep (#2653).

Resolution precedence (highest → lowest):
  1. model_overrides[<agent>]              (full IDs accepted)
  2. dynamic_routing.tier_models[<tier>]   (NEW; escalation-aware)
  3. models[<phase_type>]                  (#3023 phase-type map)
  4. model_profile                         (per-agent column)
  5. Runtime default

Backward compatibility: dynamic_routing is disabled by default
(enabled: false or block omitted). resolveModelForTier short-
circuits to resolveModelInternal in that case, so callers can
adopt unconditionally without breaking existing behavior.

This PR delivers the JS-layer infrastructure: schema + tier map +
resolver. Orchestrator adoption (workflow markdown updates that
detect soft failures and call resolveModelForTier with attempt+1)
is incremental follow-up — verifier / plan-checker / integration-
checker each adopt the protocol when ready.

Tests (23 cases, all structural-IR — no stdout grep):
- Schema invariants: AGENT_DEFAULT_TIERS coverage, VALID_AGENT_TIERS
  exact match, every assignment uses a valid tier
- nextTier helper: light→standard→heavy→heavy, null on invalid input
- Disabled mode: no block + enabled:false both no-op (back-compat)
- Enabled mode: attempt=0 returns default tier model, attempt=1
  escalates, beyond max_escalations caps, heavy agents stay heavy,
  default max_escalations=1 when omitted
- Precedence: per-agent override beats dynamic_routing,
  dynamic_routing beats phase-type models
- Validation: every settings key accepted, unknown tiers/sub-keys
  rejected, bare `dynamic_routing` rejected as config-set target

Documentation:
- get-shit-done/references/model-profiles.md — full reference section
- docs/CONFIGURATION.md — full settings table + escalation flow
- docs/USER-GUIDE.md — task-oriented "Cheap-by-default" section
- docs/FEATURES.md — config row cross-link

Verification:
- 23/23 pass on regression test
- 6843/6843 full suite (23 net new from 6820)
- lint-no-source-grep clean (376 test files)
- SDK schema mirror keeps CJS/SDK in sync per #2653 parity test

Closes #3024

* fix(#3024): honor escalate_on_failure:false + 3 CR follow-ups

CodeRabbit on PR #3031 (4 findings — 1 Major + 2 Minor + 1 Nitpick):

1. **Major (inline)** — get-shit-done/bin/lib/core.cjs:1668
   resolveModelForTier ignored dynamic_routing.escalate_on_failure.
   When the user set it to false, escalation should be disabled, but
   the resolver only checked attempt/max_escalations. An orchestrator
   that always passes attempt+1 on retry would silently escalate
   despite the user opting out.
   Fix: gate effectiveAttempt on `dr.escalate_on_failure !== false`
   so false short-circuits every attempt back to the default tier.

2. **Minor (inline)** — docs/CONFIGURATION.md:123-126
   The dynamic_routing rows in the Core Settings table had 4 cells
   instead of 5 (missing the Options column), breaking the table
   structure. Added explicit Options values for enabled / escalate_on_failure
   / max_escalations rows.

3. **Minor (outside-diff)** — references/model-profiles.md:179-195
   "Resolution Logic" sketch was pre-#3024 and didn't include
   dynamic_routing in the precedence ladder. Updated to a 6-step
   block with dynamic_routing at step 3 (between override and
   phase-type).

4. **Nitpick** — tests/feat-3024-dynamic-routing.test.cjs:189+
   Tests used `if (lightAgent) { ... }` guards that silent-pass
   when AGENT_DEFAULT_TIERS drifts. Replaced all 5 conditional
   skips with `assert.ok(lightAgent, '...')` preconditions so a
   tier-mapping change surfaces as a test failure.

Plus: 2 new regression tests for the Major fix:
- escalate_on_failure:false caps every attempt at default tier
- escalate_on_failure:true (explicit) still escalates normally

Verification:
- 25/25 pass on regression test (23 prior + 2 escalate_on_failure)
- 6845/6845 full suite (2 net new)
- lint-no-source-grep clean

* docs(#3024): align precedence + add fence language tags (CR follow-up)

CodeRabbit (3 minor):

1. docs/CONFIGURATION.md:691 — "Per-Phase-Type Models → Resolution
   precedence" was a 4-step block written pre-#3024; readers got
   contradictory rules between the per-phase-type section and the
   later dynamic_routing section. Updated to the same 5-step ladder
   with dynamic_routing at step 2, and noted that dynamic_routing
   is disabled by default so this section's behavior is unchanged
   when the kill-switch is off.

2. docs/CONFIGURATION.md:770 — escalation-flow code fence missing
   language tag (MD040). Added `text`.

3. references/model-profiles.md:184 — resolution-ladder code fence
   missing language tag (MD040). Added `text`.

No code changes; docs only. Verification: regression test still 25/25.

* docs(#3024): clarify precedence prose — five layers, not four (CR nitpick)

CodeRabbit nitpick: the "Per-Phase-Type Models → Resolution
precedence" prose said "The four layers compose..." but the ladder
above lists five (including Runtime default). Also "dynamic_routing
escalates per-attempt above all of them" misreads as suggesting
dynamic_routing wins over model_overrides — actually overrides still
win at step 1.

Reworded top-down so the precedence direction is unambiguous:
  - model_profile = base
  - models = phase-level override
  - dynamic_routing = per-attempt escalation
  - model_overrides = per-agent exception (top)
  - runtime default = fallback

No code changes; docs only.

* docs(#3024): note escalate_on_failure:false in escalation-flow diagram (CR)

CodeRabbit nitpick: the escalation-flow diagram in
docs/CONFIGURATION.md described the soft-failure → respawn →
tier_models[next_tier_up] path, but didn't surface the
`dynamic_routing.escalate_on_failure: false` kill-switch right next
to it. Users reading the flow diagram (which is the canonical place
to understand attempt behavior) wouldn't see that the kill-switch
overrides the soft-failure branch.

Added a one-paragraph note immediately after the flow listing,
before the tier-sequence example, so the kill-switch is visible
exactly where users decide whether escalation will happen.

No code changes; docs only.
2026-05-02 14:26:35 -04:00
Tom Boucher
d812c66020 feat(#3023): per-phase-type model map in .planning/config.json (#3030)
* feat(#3023): per-phase-type model map in .planning/config.json

Adds a new `models` block to .planning/config.json with six phase-type
slots (planning / discuss / research / execution / verification /
completion). Lets users express coarse tuning ("Opus for planning,
Sonnet for the rest") without learning the agent taxonomy.

Resolution precedence (highest → lowest):
  1. Per-agent `model_overrides[agent]`     (full IDs; targeted exception)
  2. Phase-type `models[phase_type]`         (NEW; tier alias)
  3. Profile table (`model_profile`)          (per-agent column)
  4. Runtime default

The three layers compose: `models` defaults a phase, `model_overrides`
carves an exception. Phase-type values are tier aliases (opus/sonnet/
haiku/inherit) so the runtime-resolution chain (#2517) stays correct
end-to-end without further branching.

Implementation:
- model-profiles.cjs: new AGENT_TO_PHASE_TYPE map + VALID_PHASE_TYPES
  set. Each agent in MODEL_PROFILES gets one phase-type assignment;
  tests assert coverage so adding a new agent without updating the
  table fails CI.
- core.cjs (resolveModelInternal): inserts phase-type tier lookup
  between per-agent override and profile-derived tier. Skips runtime
  resolution when the resolved tier is 'inherit' (was previously gated
  only on profile === 'inherit'; phase-type can now produce inherit
  independently).
- core.cjs (loadConfig): pass `parsed.models` through both code paths
  so resolveModelInternal can read it.
- config-schema.cjs + sdk/src/query/config-schema.ts: dynamic-pattern
  validator accepts only the six known phase-types; unknown slots
  rejected at config-set time.

Backward compat: configs without `models` behave exactly as today.

Tests (15 cases, all structural-IR — no stdout grep):
- Schema: AGENT_TO_PHASE_TYPE coverage, VALID_PHASE_TYPES exact match
- Resolver: phase-type alone; per-agent override beats phase-type;
  phase-type beats profile; issue's full example; "inherit"; empty
  block is no-op; no block is no-op
- Validation: each of the 6 slots accepted; unknown slot rejected;
  bare `models` (no slot) rejected

Verification:
- 15/15 pass on new regression test
- 6808/6808 full suite (5 net new), 0 fail
- lint-no-source-grep clean across 375 test files

Closes #3023

* docs(#3023): document `models` per-phase-type config in user-facing docs

Adds `models` block coverage to the three user-facing docs that ship
with each release:

1. docs/CONFIGURATION.md
   - New "Per-Phase-Type Models" section between "Per-Agent Overrides"
     and "Non-Claude Runtimes" with:
       * full example mixing models + model_overrides
       * phase-type → agent mapping table
       * resolution-precedence pseudocode
       * accepted values (tier alias only)
       * "When to use which" decision matrix
       * validation behavior + example error
   - Added `"models": {}` to the Full Schema snippet
   - Added a row for `models.<phase_type>` to the config keys table
     (next to model_profile_overrides for adjacency)

2. docs/FEATURES.md
   - Added a row for models.<phase_type> in the Configurable Settings
     table (right under model_profile)
   - Cross-link to CONFIGURATION.md for the full surface

3. docs/USER-GUIDE.md
   - New task-oriented "Tuning model cost by phase" section above
     "Using Non-Claude Runtimes" — leads with the concrete config
     and shows the override pattern (one-shot phase + targeted exception)
   - Cross-link to CONFIGURATION.md

Verification:
- 29/29 pass on config-schema-docs-parity + docs-update + new feature
  test (parity-check passes, so the config-schema entry I added in the
  feature commit is now matched by a docs row)
- 6808/6808 full suite pass
- lint-no-source-grep clean

Doc style follows the same pattern used by the existing model_profile,
model_overrides, and model_profile_overrides sections — example-led,
table-backed, cross-referenced. Each doc surfaces the feature at the
right depth (reference / settings table / task guide).

* fix(#3023): mirror phase-type tier in resolveReasoningEffortInternal (CR Major)

CodeRabbit caught a real Codex correctness bug + 3 minor docs/test issues:

1. **Major (outside-diff)** — resolveReasoningEffortInternal in core.cjs
   derived its tier exclusively from the profile table, ignoring the
   models.<phase_type> override added in #3023. Failure mode on Codex:

     Config: model_profile=balanced, models.execution=opus, agent=gsd-executor
     resolveModelInternal:           tier=opus    → gpt-5.4
     resolveReasoningEffortInternal: tier=sonnet  → reasoning_effort=medium
                                                     ↑
                                                  WRONG — should be xhigh
                                                  (opus tier on Codex)

   The runtime received a mismatched (model, effort) pair. Mirrored the
   phase-type lookup from resolveModelInternal so both functions derive
   from the same tier source. 'inherit' phase-type returns null effort
   (no runtime entry maps to 'inherit'; let runtime decide).

2. Minor — .changeset/per-phase-type-models.md `pr: TBD` → `pr: 3030`.

3. Minor (outside-diff) — model-profiles.md "Resolution Logic" section
   omitted the new phase-type tier. Updated the 4-step block to a 5-step
   block including `models[phase_type]` between override and profile,
   plus a paragraph noting that `model` and `reasoning_effort` derive
   from the same tier source.

4. Nitpick — added 2 typo-safety tests:
   - models.research = "haiku3" (typo) → falls through to profile
   - models.research = "openai/gpt-5" (full ID) → falls through to profile
   Plus 5 new reasoning_effort tests covering the Major fix:
   - exported correctly
   - phase-type override flips both model AND effort to same tier
   - inherit phase-type returns null effort
   - per-agent override still bypasses phase-type for effort
   - claude runtime ignores models.* (no effort propagation)

Verification:
- 24/24 pass on regression test (15 original + 2 typo-safety + 5 effort + 2 outside-diff related)
- 6815/6815 full suite (7 net new from 6808)
- lint-no-source-grep clean

The reasoning_effort tests are written semantically (phase-type override
must produce the SAME effort as a profile-only opus config) rather than
hard-coding tier-specific effort strings, so changes to the runtime tier
map don't break them.

* fix(#3023): phase-type override beats profile=inherit (CR Major round 2)

CodeRabbit caught another precedence inversion: when
  { model_profile: 'inherit', models: { execution: 'opus' } }
both resolvers short-circuited on `profile === 'inherit'` BEFORE the
phase-type override could be honored. Result: model returned 'inherit'
and reasoning_effort returned null — both contradicting the documented
precedence where models[phase_type] wins over model_profile.

Fix in resolveModelInternal:
- Compute tier from phase-type FIRST. If phase-type is a valid alias,
  it wins. Otherwise, fall back to profile-derived tier OR 'inherit'
  (when profile === 'inherit').
- Gate the runtime-resolution branch on `tier !== 'inherit'` (was
  `profile !== 'inherit'`) so phase-type=opus can flip runtime mapping
  on even when profile=inherit.
- Gate the inherit-return on `tier === 'inherit'` (was
  `profile === 'inherit'`).

Fix in resolveReasoningEffortInternal:
- Remove the `if (profile === 'inherit') return null;` early-return.
- Compute tier from phase-type first, fall back to profile. If
  phase-type is explicitly 'inherit' OR the resolved tier is 'inherit',
  return null (no runtime entry maps to inherit).

Tests added (5 new):
- model: phase-type wins over profile=inherit (with explicit opus, with
  haiku for one phase + planner-without-slot still inheriting)
- model: profile=inherit + no models block → all agents inherit (no
  regression on existing inherit semantics)
- model: profile=inherit + models block but agent has no slot → that
  agent inherits, agents with slots get phase-type tier
- effort: phase-type opus + profile=inherit → produces opus-tier
  effort, NOT null (the original bug)

Verification:
- 27/27 pass on regression test (24 prior + 3 model + 1 effort)
- 6820/6820 full suite (5 net new)
- lint-no-source-grep clean

The effort test reads the expected value by running a profile-only opus
config and comparing — semantic check, not hard-coded effort string. So
runtime tier map changes don't break the test.
2026-05-02 13:19:15 -04:00
Tom Boucher
f2decefede fix(#3010): post-install message and docs use /gsd-update --reapply (#3012)
* fix(#3010): post-install message and docs use /gsd-update --reapply

PR #2824 consolidated 86 skills into ~58, removing the standalone
/gsd-reapply-patches command and folding it into a flag on /gsd-update
(/gsd-update --reapply). The 1.39.1 hotfix (#2954) updated help.md
but missed three other surfaces that still recommended the dead form:

1. bin/install.js reportLocalPatches() — runtime emitter shown after
   every install with backed-up patches. All branches updated:
   - claude/opencode/kilo/copilot: /gsd-update --reapply
   - gemini: /gsd:update --reapply
   - codex: $gsd-update --reapply
   - cursor: gsd-update --reapply (mention the skill name)

2. get-shit-done/workflows/update.md — Step 4 prose and the
   check_local_patches block both referenced /gsd-reapply-patches.
   Replaced with /gsd-update --reapply (with backticks around the
   command per CR feedback for copy/paste UX).

3. Localized docs (en/ja-JP/ko-KR/zh-CN) — 14 files across
   ARCHITECTURE.md / COMMANDS.md / FEATURES.md / INVENTORY.md /
   USER-GUIDE.md / manual-update.md still listed the removed command.

Tests:
- bug-3010-reapply-patches-references.test.cjs (4 tests): scans
  bin/install.js's reportLocalPatches body, every workflow file, and
  every doc (excluding CHANGELOG history and help.md's deprecation
  notice) for the removed command form, and verifies each runtime
  branch emits the consolidated form via captured console output.
- tests/copilot-install.test.cjs:1081-1115 — stale assertions that
  hard-coded the removed string updated to assert /gsd-update --reapply.

Verification: 115/115 pass across both files.

Co-authored-by: Patrick Clery <patrick@patrickclery.com>
Closes #3010

* test(#3010): broaden dead-command scan + tighten runtime exact-match

CodeRabbit follow-up findings on #3012:

1. Workflow + docs scans only matched "/gsd-reapply-patches", missing
   the gemini ("/gsd:reapply-patches") and codex ("$gsd-reapply-patches")
   spellings. A regression that re-introduced either form in localized
   docs would have passed silently. Extracted a DEAD_COMMAND_PATTERNS
   array + findDeadCommands() helper used by both scans, so all three
   removed forms are checked uniformly. Match output also reports which
   spellings hit, for faster diagnosis.

2. reportLocalPatches runtime test asserted output.includes('update --reapply'),
   which is too loose — a malformed prefix like '/gsd:update --reapply' on
   the claude branch would have passed. Replaced with an exact
   {runtime → expected token} map covering all 7 branches:
     claude/opencode/kilo/copilot → /gsd-update --reapply
     gemini → /gsd:update --reapply
     codex → $gsd-update --reapply
     cursor → gsd-update --reapply
   Negative assertion also runs DEAD_COMMAND_PATTERNS against output for
   every runtime, so dead forms can't slip in regardless of branch.

Verification: 4/4 pass on bug-3010-reapply-patches-references.test.cjs.

* test(#3010): add prefix-absence guard for cursor runtime (CR follow-up)

CodeRabbit (Minor): the cursor expected token "gsd-update --reapply" is
a substring of every prefixed form ("/gsd-update --reapply" for claude/
opencode/kilo/copilot, "\$gsd-update --reapply" for codex). The positive
output.includes(expectedToken) check therefore can't distinguish correct
cursor output from a regression where the installer emits a prefixed
form for cursor — both pass the substring check.

Add an explicit prefix-absence assertion for cursor that fails if any
of /, \$, or : appears immediately before "gsd-update --reapply" in
output. The gemini form ("/gsd:update --reapply") doesn't share the
substring (gsd:update vs gsd-update) so it's already caught by the
positive includes failing on cursor's expected bare token.

Verification: 4/4 pass.

---------

Co-authored-by: Patrick Clery <patrick@patrickclery.com>
2026-05-02 09:38:34 -04:00
Tom Boucher
8de8acee46 fix(workflows): assert HEAD on per-agent branch before worktree commits (#2924) (#2941)
* fix(workflows): assert HEAD on per-agent branch before worktree commits

Worktree-mode setup could leave HEAD attached to a protected branch (master),
causing agent commits to land there. The previous response was a destructive
self-recovery via 'git update-ref refs/heads/master <sha>', which silently
rewinds the protected branch and destroys concurrent commits in multi-active
scenarios (parallel agents, user committing while agent runs).

- Reorder <worktree_branch_check> in execute-phase.md and quick.md to assert
  HEAD via 'git symbolic-ref' BEFORE any 'git reset --hard'. HALT with a
  blocker if HEAD is on main/master/develop/trunk/release/* or detached.
- Add a per-commit HEAD assertion (step 0) to gsd-executor.md
  <task_commit_protocol>; HEAD attachment can drift after 'git checkout <sha>'.
- Forbid 'git update-ref refs/heads/<protected>' in
  <destructive_git_prohibition>; surface the blocker rather than self-heal.
- Remove '--no-verify' as the worktree-mode default in execute-phase.md,
  execute-plan.md, quick.md, and references/git-integration.md. Hooks now
  run on every executor commit; opt out only via workflow.worktree_skip_hooks.
- Add regression test that parses the worktree_branch_check blocks structurally
  and asserts the symbolic-ref check precedes the reset --hard, no workflow
  performs update-ref on a protected ref, and --no-verify is no longer the
  default in any parallel-execution prompt.

* fix(#2924): address CodeRabbit review findings on worktree HEAD PR

- Add positive worktree-agent-* allow-list to <task_commit_protocol> step 0
  in gsd-executor.md and to <worktree_branch_check> in execute-phase.md and
  quick.md. The deny-list (main|master|develop|trunk|release/*) silently
  allowed feature/* and other arbitrary branches outside the agent namespace.
- Register workflow.worktree_skip_hooks in both config schemas
  (sdk/src/query/config-schema.ts and get-shit-done/bin/lib/config-schema.cjs)
  and document it in docs/CONFIGURATION.md so config-set accepts it.
- Fix stash lifecycle in execute-phase.md post-wave hook validation: stash
  under a named ref and pop after the hook run; warn on pop failure.
- Pre-dispatch PLAN.md commit in quick.md: gate on git diff --cached --quiet
  for idempotency and exit 1 with a clear error on commit failure (both the
  --no-verify and the normal branches) — no more swallowing real errors.
- Test fixes (tests/bug-2924-worktree-head-attachment.test.cjs):
  - Parse the protected-branch alternation structurally and require
    main, master, develop, trunk, release/.* (release/* was previously
    skipped by the \\b...\\b regex).
  - Use fs.readdirSync(dir, { recursive: true }) so workflows in nested
    subdirectories are also asserted against the update-ref ban.
  - Add allow-list assertions for execute-phase.md, quick.md, and
    gsd-executor.md to lock in the new positive namespace check.

* test(#2924): assert sub-section end marker exists before slicing

* test(#2924): use section boundary instead of fixed window for parallel-agents slice
2026-05-01 09:23:02 -04:00
Tom Boucher
7e9477bb30 docs(#2935): refresh README highlights for v1.39.0 across all languages (#2936)
Replaces stale v1.32/v1.37 highlight blocks with v1.39.0 highlights in
README.md and four translations, adds /gsd-edit-phase to phase-management
tables, documents workstream config inheritance, the post-merge build gate,
and per-runtime review.models.<cli> selection.

Closes #2935
2026-04-30 23:21:31 -04:00
Tom Boucher
444db1714b refactor(query): manifest-backed routing seam + family adapters (#2908)
Merging validated command-seam foundation.
2026-04-30 14:04:50 -04:00
Tom Boucher
abb2cb63f6 refactor: extract planning-workspace seam from core.cjs (#2901)
* refactor: extract planning workspace seam from core

* docs: document planning-workspace module and inventory updates

* fix: harden planning lock timeout and preserve workstream set contract

---------

Co-authored-by: Tom Boucher <thomas.boucher@sas.com>
2026-04-30 11:38:13 -04:00
Tom Boucher
ca88429bf8 docs(#2888): release notes for 1.40.0-rc.1 (#2889)
Add docs/RELEASE-v1.40.0-rc.1.md following the rc.7 format. Cover the
11 commits on main since v1.39.0-rc.7's release notes landed:

- #2790 — skill surface consolidated 86 → 59
- #2792 — namespace meta-skills + keyword-tag descriptions + context guard
- #2833 — phase-lifecycle status-line read-side
- #2876 — yamlQuote SKILL.md description (Copilot/Antigravity/Trae/CodeBuddy)
- #2768 — Gemini slash command namespace
- #2858 — gsd slash namespace drift cleanup
- #2851 — bare gsd-tools → absolute path
- #2866 — Codex installer trailing-newline preservation
- #2868 — canary publish moved from main to dev
- #2872 — auto-close PRs without issue link

Update CHANGELOG.md [Unreleased] with the same 1.40.0-rc.1 entries.

Closes #2888
2026-04-30 01:13:43 -04:00
Tom Boucher
5fdc950eb7 feat(#2792): namespace meta-skills + keyword-tag descriptions + context utilization guard (#2825)
* feat(#2792): namespace meta-skills retargeted at the post-#2790 surface

This branch is now based on #2790's HEAD (the consolidation PR) instead
of main, and every routing table targets the consolidated surface so a
user routed by a namespace meta-skill never lands at a deleted /
folded sub-skill.

Cross-PR inconsistencies the original PR #2825 carried (vs #2790):

  - ns-ideate routed to gsd-note / gsd-add-todo / gsd-add-backlog /
    gsd-plant-seed → all folded into gsd-capture by #2790. Now routes
    to gsd-capture (the parent picks the mode from the user's intent).
  - ns-context routed to gsd-scan and gsd-intel → folded into
    gsd-map-codebase --fast / --query by #2790. Now routes to those
    flag forms.
  - ns-manage routed all workspace intent to gsd-list-workspaces (a
    list-only entry) → CR also flagged the over-narrow target. #2790
    folds into gsd-workspace; routing now points there.
  - ns-workflow routed to gsd-research-phase → deleted outright by
    #2790. Removed.
  - ns-project routed to gsd-plan-milestone-gaps → deleted outright by
    #2790. Removed.
  - None of the namespaces previously surfaced #2790's new consolidated
    skills (gsd-capture, gsd-phase, gsd-config, gsd-workspace,
    gsd-progress). All five are now reachable through the routers.
  - extract_learnings → extract-learnings (canonicalized by #2858).

Defect fixes within the namespace skills:

  - Hyphen-form `name:` (gsd-workflow, …) per the canonical naming
    contract — the colon-form addressed CR's drift complaint.
  - `Skill` added to allowed-tools on every router. The body instructs
    "Invoke the matched skill directly using the Skill tool" — without
    Skill in the permission list the meta-skill cannot route at all.

New regression guard in tests/enh-2792-namespace-skills.test.cjs: every
gsd-* token in any namespace router's table column resolves to a
surviving commands/gsd/*.md file (or to a known consolidated parent for
flag-form targets like gsd-map-codebase --fast). This single test would
have caught every dead-end route the original PR shipped with.

Skill-count cap in tests/enh-2790-skill-consolidation.test.cjs now
filters out ns-*.md from its <= 63 cap. Namespace routers are
descriptor-only entries, not part of the consolidation surface that cap
is policing — they have their own contract in
tests/enh-2792-namespace-skills.test.cjs.

INVENTORY.md gains a "Namespace Meta-Skills" section with the 6 router
rows; INVENTORY-MANIFEST.json gains 6 entries; the headline count moves
59 → 65 to match.

Out of scope for this rebase: the gsd-health --context flag (PR #2825
advertised the contract but didn't implement it). That's a separate
feature concern and is left untouched here.

5908/5908 on `npm test`.

* feat(#2792): implement gsd-health --context utilization guard

The original PR #2825 advertised a `--context` flag on gsd-health with a
60%/70% utilization threshold table but never implemented the workflow
logic — CR caught it as a contract leak, the rebase deferred it. This
commit closes the gap with TDD red/green/refactor.

Math layer (pure):
  - get-shit-done/bin/lib/context-utilization.cjs
    classifyContextUtilization(tokensUsed, contextWindow) →
      { percent, state }
    State boundaries use the exact ratio:
      < 60% healthy / 60–70% warning / ≥ 70% critical (fracture point)
    Display percent rounded for humans. Throws TypeError on non-integer
    or out-of-range inputs.
  - STATES = Object.freeze({ HEALTHY, WARNING, CRITICAL }) exported
    so callers reference the names by symbol, not by literal string.

SDK CLI integration:
  - get-shit-done/bin/gsd-tools.cjs
    `validate context --tokens-used N --context-window M [--json]`
    routes to the classifier, owns the recommendation copy (the
    classifier intentionally does not — keeps the renderer free to
    evolve without touching the math layer or its tests), and uses
    core.output's rawValue path for the sync-flush guarantee.
  - sdk/src/query/validate.ts + sdk/src/query/index.ts
    TypeScript validateContext handler registered at 'validate.context'
    and 'validate context'. Mirrors the CJS classifier inline (15 lines
    of arithmetic; not worth a shared cross-language module).

User-facing wiring:
  - commands/gsd/health.md frontmatter advertises --context, body
    documents the three-state threshold table.
  - get-shit-done/workflows/health.md adds a `context_check` step
    that's reached only when --context is set. Step calls
    `gsd-sdk query validate.context` with self-reported tokensUsed and
    contextWindow, prints the SDK output verbatim, and ends. Includes
    a TEXT_MODE plain-text fallback for non-Claude runtimes per #2012.

Tests:
  - tests/context-utilization.test.cjs (17 tests) — pure-function
    contract: state thresholds at every boundary, percent rounding,
    input validation, return-shape (no recommendation field — that's
    the renderer's job).
  - tests/validate-context.test.cjs (9 tests) — SDK CLI plumbing:
    arg parsing errors, JSON vs human rendering, recommendation copy
    pinned per state.
  - tests/enh-2792-namespace-skills.test.cjs (4 new tests) — markdown
    contract: --context advertised in argument-hint, threshold table
    in command body, context_check step exists in workflow, step
    invokes gsd-sdk query validate.context with both flags.

Inventory bookkeeping:
  - docs/INVENTORY.md "CLI Modules" 31 → 32; new row for
    context-utilization.cjs.
  - docs/INVENTORY-MANIFEST.json mirror.

5939/5939 on `npm test`.
2026-04-30 01:04:41 -04:00
hoptop
8fc1fa263c feat(#2833): phase-lifecycle status-line — read-side (parseStateMd + formatGsdState scenes + tests + docs) (#2884)
* feat(#2833): parseStateMd reads phase-lifecycle frontmatter fields

Extend parseStateMd() to parse 4 new STATE.md frontmatter fields that drive
the phase-lifecycle status-line proposed in #2833:

- active_phase   : phase number when orchestrator is in-flight, null when idle
- next_action    : recommended next command when idle
- next_phases    : YAML flow array of phase numbers for next_action
- progress       : nested block with completed_phases / total_phases / percent

All fields default to undefined when absent — formatGsdState() (next commit)
degrades gracefully so existing STATE.md files keep rendering as before.

YAML scope intentionally narrow:
- Only top-level scalar keys (status, milestone, active_phase, next_action)
- Only single-line flow array for next_phases ([...])
- progress block requires 2-space indent for nested keys

Block sequences (- item over multiple lines) and inline comments inside
nested blocks are NOT parsed — keeping the regex-based parser predictable.
Comments outside frontmatter or after the closing --- still work.

Tests: all 27 existing tests still pass (no behavior change for STATE.md
files that don't carry the new fields).

Refs #2833

* feat(#2833): formatGsdState renders phase-lifecycle scenes + opt-in progress bar

Extend formatGsdState() with three lifecycle scenes that activate when the
new STATE.md frontmatter fields (added in the previous commit) are present.
Also append an opt-in progress bar to the milestone segment when
progress.percent is available.

Scenes (first match wins; falls through to the existing path otherwise):

  1. active_phase set     → 'v2.0 [██░] X% · Phase 4.5 executing'
                             (status field carries the lifecycle stage:
                              discussing / planning / executing / verifying)

  2. active_phase null +  → 'v2.0 [██░] X% · next execute-phase 4.5'
     next_action set         (idle state — surfaces what the user should
                              run next without opening STATE.md)

  3. percent=100 (or       → 'v2.0 [██████████] 100% · milestone complete'
     completed=total)

  4. (default fallback)    → 'v1.9 Code Quality · executing · ph (1/5)'
                             (existing rendering, byte-for-byte preserved
                              when none of the new fields are populated)

Backward compat is the design priority:
- STATE.md files without the new fields render identically to v1.38.x
- progress bar is opt-in (empty string when percent absent)
- Each new scene only activates when its specific fields are populated

A new helper renderProgressBar() generates the 10-segment bar that matches
the existing context meter style (so the two bars on the status-line are
visually consistent).

Tests: 27/27 existing tests still pass.

Refs #2833

* test(#2833): cover parseStateMd lifecycle fields + formatGsdState scenes

26 new tests organized in 5 describe blocks, modeled after the existing
enh-2538-statusline-last-command.test.cjs convention:

  parseStateMd #2833 lifecycle fields (7 tests)
    - reads active_phase / next_action / next_phases / progress.percent
    - 'null' literal handled correctly
    - YAML flow array parsing (1 item, multiple items)
    - progress nested block (3 fields)
    - absent fields return undefined

  formatGsdState #2833 lifecycle scenes (6 tests)
    - Scene 1: active_phase set → 'Phase X.Y <stage>'
    - Scene 2: idle + next_action → 'next <action> <phases>' (1+ phases)
    - Scene 3: percent=100 OR completed=total → 'milestone complete'

  formatGsdState #2833 backward compatibility (4 tests) — CRITICAL
    - Legacy STATE.md (no new fields) renders byte-for-byte unchanged
    - Empty state, partial state, progress-bar-opt-in all preserved

  progress bar rendering (6 tests)
    - 0% / 50% / 100% / clamping / opt-in absence

  formatGsdState #2833 scene priority (3 tests)
    - active_phase wins over next_action when both populated
    - next_action wins over fallback when active_phase null
    - percent=100 wins over fallback even with phase set

Combined run: 53/53 tests pass (existing 27 + new 26).

Refs #2833

* docs(#2833): describe phase-lifecycle frontmatter fields and rendering scenes

Add docs/STATE-MD-LIFECYCLE.md as the canonical reference for the four new
STATE.md frontmatter fields and the four status-line rendering scenes
introduced by this proposal:

- Frontmatter field reference (active_phase / next_action / next_phases /
  progress.percent) with type and population semantics
- Why progress.percent is intentionally the phase dimension and not the
  plans dimension (plans dimension trends optimistic when future phases
  are unplanned)
- The four rendering scenes including their priority order
- Stage-label convention for Scene 1 (discussing / planning / executing /
  verifying matching the four phase orchestrators)
- Frontmatter parsing constraints — frontmatter must start at file head,
  no comments inside nested blocks, next_phases is single-line flow only
- Backward-compatibility guarantee (locked in by the test suite)
- Cross-links to the foundation issue #1989 and the read-side issues
  this proposal helps close

The document deliberately scopes itself to the read-side (what the hook
parses, what it renders). Write-side SDK and workflow changes that
auto-maintain the fields are out of scope for this PR so each piece can
be reviewed independently — see the issue thread for the full proposal.

Refs #2833

* test(#2833): simplify '0% renders 10 empty segments' assertion

Address CodeRabbit nitpick — drop the convoluted assert.equal that built
the expected value via .replace() and rely on the existing assert.ok
includes-check. The behavior under test is unchanged; the assertion is
just easier to read.

Refs #2884 review comment
2026-04-30 00:48:49 -04:00
Tom Boucher
87917131f2 refactor(#2790): consolidate 86 gsd-* skills to 59 — fold flags, delete dead skills (#2824)
* feat(#2790): consolidate 86 gsd-* skills to 59 — zero functional loss

Closes #2790

- `capture.md` — absorbs add-todo (default), note (--note), add-backlog
  (--backlog), plant-seed (--seed), check-todos (--list)
- `phase.md` — absorbs add-phase (default), insert-phase (--insert),
  remove-phase (--remove), edit-phase (--edit)
- `config.md` — absorbs settings-advanced (--advanced),
  settings-integrations (--integrations), set-profile (--profile);
  settings.md retained as-is
- `workspace.md` — absorbs new-workspace (--new), list-workspaces
  (--list), remove-workspace (--remove)

- `update.md` — adds --sync (absorbs sync-skills) and --reapply
  (absorbs reapply-patches)
- `sketch.md` — adds --wrap-up (absorbs sketch-wrap-up)
- `spike.md` — adds --wrap-up (absorbs spike-wrap-up)
- `map-codebase.md` — adds --fast (absorbs scan) and --query (absorbs
  intel)
- `code-review.md` — adds --fix (absorbs code-review-fix)
- `progress.md` — adds --next (absorbs next) and --do (absorbs do)

join-discord, research-phase, session-report, from-gsd2,
analyze-dependencies, list-phase-assumptions, plan-milestone-gaps

autonomous.md: updated Skill(skill="gsd:code-review-fix") →
Skill(skill="gsd:code-review", args="--fix --auto") to match
the consolidated skill name

- New: tests/enh-2790-skill-consolidation.test.cjs (48 tests)
- Updated: 14 existing test files redirected from deleted command paths
  to their consolidated equivalents
- docs/INVENTORY.md: Commands count 86→59, ghost rows removed, new
  consolidated rows added
- docs/INVENTORY-MANIFEST.json: regenerated to match filesystem

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(#2790): add CHANGELOG entry for skill consolidation

* docs(#2790): update COMMANDS.md for 86→59 skill consolidation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(#2790): address CodeRabbit review findings

- CHANGELOG.md: add --next alongside --do in progress flag list
- config.md: remove trailing space from --profile code span (MD038)
- COMMANDS.md: add required descriptions to /gsd-phase examples;
  /gsd-phase without args errors, not interactive
- COMMANDS.md: add --next and --do to /gsd-progress flags table + examples
- test: convert content.includes('--reapply') to structural frontmatter
  parse; add allow-test-rule comment for workflow content assertions
- test: replace redundant existsSync duplicate with assertion that verifies
  the full consolidated flag surface (--sync | --reapply) in argument-hint

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(#2790): restore reapply-patches workflow and strengthen test assertions

- Create get-shit-done/workflows/reapply-patches.md: the #2790 consolidation
  deleted the 14K combined command+workflow file (reapply-patches.md) but
  update.md already referenced the workflow via execution_context_extended.
  Restoring it fixes a silent behavioral gap where --reapply had no workflow
  to load. Includes full three-way merge logic, hunk verification table
  (Step 4), and the Hunk Verification Gate (Step 5) that blocks cleanup
  until all user-added hunks are confirmed present in the merged output.

- Fix update.md: /gsd-reapply-patches → /gsd-update --reapply (stale ref)

- Fix reapply-verify-hunks.test.cjs: was checking existsSync(update.md) 8×;
  now points to the workflow file and asserts real behavioral content
  (Post-merge verification, Hunk presence check, Line-count check, backup
  reference, per-file tracking, structural ordering)

- Fix reapply-patches.test.cjs: replace content.includes() stubs with
  frontmatter-parsed argument-hint assertions; replace 4 existsSync(update.md)
  no-ops with real assertions against the workflow content

- Fix edit-phase.test.cjs: /gsd-edit-phase → /gsd-phase (COMMANDS.md now
  documents the consolidated command with --edit flag)

- Fix next-safety-gates.test.cjs: split OR predicates into independent
  assertions — --next in progress.md and --force in next.md workflow

- Fix workspace.test.cjs: add allow-test-rule comment for routing content
  checks (command routing text IS the deployed behavioral contract)

- Fix bug-2439 test: strengthen pre-flight assertion to verify gsd-sdk is
  referenced (not just --profile)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review findings (CR round 2)

- INVENTORY.md: update sync-skills.md row to reference /gsd-update --sync
  instead of stale /gsd-sync-skills (absorbed in #2790)

- enh-2380-sync-skills.test.cjs: align INVENTORY.md assertion with the
  corrected reference; was asserting the old /gsd-sync-skills name while the
  manifest test correctly asserted /gsd-update, creating conflicting expectations
  in the same suite

- reapply-verify-hunks.test.cjs: add explicit notEqual(-1) assertions for all
  three anchors before the ordering check so a missing anchor produces a clear
  failure instead of a false positive (writeIdx=-1 < verifyIdx=5 is true)

- bug-2439-set-profile-gsd-sdk-preflight.test.cjs: defer fs.readFileSync until
  after the existence assertion; eager describe-level read caused the suite to
  crash before the existence test could run, making it effectively dead code

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(#2790): address CR — INVENTORY routing + reapply test contract wording

Two unresolved CodeRabbit findings (Major):

- docs/INVENTORY.md: workflow-file table still pointed at obsolete
  /gsd-do, /gsd-next, /gsd-note, /gsd-add-todo, /gsd-add-backlog,
  /gsd-check-todos, /gsd-plant-seed slash commands. Re-route to the
  consolidated /gsd-progress (--next, --do) and /gsd-capture (--note,
  --backlog, --seed, --list) so the inventory is internally consistent.

- tests/reapply-verify-hunks.test.cjs: 'verification tracks per-file
  status' asserted on phrasing that doesn't appear in reapply-patches.md
  (the 'per-file' substring only matched accidentally via 'sequential
  integer per file'). Switch to the actual contract text — Hunk
  Verification Table, one row per hunk per file, verified column.

* test(#2790): update CR-INTEGRATION tests for consolidated --fix invocation

After the merge of main (which carries #2843's hyphen-form fix), the
consolidation in this branch absorbs gsd-code-review-fix into gsd-code-review
as the --fix flag. Update the two CR-INTEGRATION tests that previously
asserted on the standalone gsd-code-review-fix skill name to instead assert
on a gsd-code-review invocation carrying --fix in its arg tokens.

Tests still parse Skill() invocations structurally; only the asserted
skill-name + arg-token shape changed.

* test(#2790): scope success_criteria check to the <success_criteria> block

CodeRabbit nitpick: 'success criteria includes verification' did a
whole-file substring check, which can false-pass if the phrase appears
elsewhere in the document. Extract the <success_criteria>...</success_criteria>
block first via extractTagBlock() and assert against that scope only.

* fix(#2790): post-rebase reconciliation with main

- INVENTORY.md/JSON: add reapply-patches workflow row + bump count to 85
- autonomous.md: switch consolidated --fix invocation to hyphen Skill name
- analyze-dependencies test: assert COMMANDS.md does NOT document the
  consolidated-away /gsd-analyze-dependencies entry (was: bare .includes())

* fix(#2790): address remaining CR findings — strengthen contract tests

Doc-fixes:
- INVENTORY.md: route transition.md & edit-phase.md rows to consolidated
  /gsd-progress --next and /gsd-phase --edit (was: deleted /gsd-next, /gsd-edit-phase)
- config.md --profile branch: document #2439 pre-flight `command -v gsd-sdk`
  guard + install hint BEFORE the gsd-sdk invocation (closes opaque
  "command not found: gsd-sdk" regression path)

Test discipline (no-source-grep contract):
- bug-2439: replace bare `content.includes('gsd-sdk')` with structured
  parse of <context> block + --profile branch; assert pre-flight token,
  install hint, #2439 citation, and ordering vs gsd-sdk invocation
- edit-phase: parse INVENTORY.md edit-phase.md row's "Invoked by" column
  and assert `/gsd-phase --edit` (not the deleted /gsd-edit-phase)
- next-safety-gates: tighten `--next` documentation contract — require
  --next AND --force AND completeness routing (was OR-based, passed when
  only --next present)
- reapply-patches: parse argument-hint flag list structurally; scan ALL
  <execution_context*> blocks for the @-include of reapply-patches.md;
  parse Hunk Verification Table header columns directly; locate Step 5
  via heading parsing then assert (i) table reference, (ii) verified=no
  gate, (iii) STOP/halt directive, (iv) explicit absent-table halt path
- workspace: parse frontmatter, tokenize argument-hint across multiple
  bracketed segments, parse @-include targets from <execution_context>
  rather than substring-matching the file body

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 00:43:47 -04:00
Jeremy McSpadden
4d394a249d fix(commands): normalize gsd slash namespace drift (#2858)
* fix(commands): normalize gsd slash namespace drift

* fix(#2855): address CodeRabbit findings on namespace drift PR

Three CR findings, all valid:

1. autonomous.md line 783 still had `gsd:discuss-phase` (the PR's own
   normalization missed this line). Switched to `gsd-discuss-phase` and
   updated the matching test in autonomous-interactive.test.cjs that was
   asserting the now-retired colon form.

2. tests/bug-2543-gsd-slash-namespace.test.cjs source-grepped the
   fix-slash-commands.cjs script with .includes() rather than driving
   its transform behaviour. Refactored fix-slash-commands.cjs to export
   a pure transformContent(src, cmdNames) function, kept the CLI behaviour
   unchanged via require.main, and replaced the source-grep block with
   five behavioural cases: rewrite, multi-occurrence, idempotence on
   canonical input, no-op on gsd-sdk/gsd-tools, and word-boundary safety.

3. tests/bug-2808-skill-hyphen-name.test.cjs matched `name:` anywhere
   in SKILL.md; a stray name: in the body could satisfy the assertion.
   Scoped the lookup to the YAML frontmatter block via the suggested
   diff (parse the leading --- ... --- region first, then find name:
   inside it).

Full suite: 5854/5854 passing.

* fix(#2855): address remaining CodeRabbit findings on PR #2858

Three structural concerns flagged on the namespace-drift fix PR:

1. scripts/fix-slash-commands.cjs:24 — `buildPattern([])` compiled
   `/gsd:()(?=[^a-zA-Z0-9_-]|$)/g`. The empty capture group still
   matches any `/gsd:` token followed by a non-word boundary
   (whitespace, EOL, punctuation), rewriting it to a stray `/gsd-`.
   Verified live: `transformContent("/gsd:", [])` → `"/gsd-"`. Added
   a guard returning null from `buildPattern` on empty input and
   updated `transformContent` and `processDir` to no-op when the
   pattern is null.

2. tests/autonomous-interactive.test.cjs:44-47 — assertion was
   `content.includes('gsd-discuss-phase') && content.includes('INTERACTIVE')`,
   which would false-pass on any unrelated co-occurrence (e.g.
   `INTERACTIVE=""` initialization plus a stray `gsd-discuss-phase`
   prose mention). Replaced with a structural extraction: locate the
   `**If \`INTERACTIVE\` is set:**` branch, bound it by the next
   `**If` / `<step>` boundary, and assert the
   `Skill(skill="gsd-discuss-phase", ...)` invocation lives inside
   that region. Tolerates whitespace around `(`, `skill`, and `=`.

3. tests/bug-2808-skill-hyphen-name.test.cjs:104 — colon-call regex
   was `Skill\(skill=...` and missed valid formatting like
   `Skill(skill = "gsd:cmd")` or `Skill( skill = ...)`. Loosened to
   `Skill\(\s*skill\s*=\s*...` so reformatting drift can't slip past
   the namespace guard.

Verification: 5854/5854 pass on `npm test` from the rebased branch.

* fix(#2855): drop pre-validation filter that hid namespace drift

CR finding on tests/bug-2808-skill-hyphen-name.test.cjs:128: the test
collected generated skill directories with
`.filter(entry => entry.isDirectory() && entry.name.startsWith('gsd-'))`,
then validated namespace invariants over that filtered list. Anything
that violated the prefix invariant — `gsd:extract-learnings` (colon
form), `extract_learnings` without prefix, `Gsd-foo` mis-cased — would
silently disappear from the iteration and the test would falsely pass.

Drop the `startsWith('gsd-')` filter so every generated directory shows
up. Add explicit assertions before the existing per-skill loop:
  - directory list is non-empty (catches a broken converter that
    produces nothing)
  - every directory begins with `gsd-`
  - every directory contains no `:`
  - every directory contains no `_`

Re-audited the full PR diff for the same anti-pattern: only this one
site filtered before validating the namespace; bug-2643 and
commands-doc-parity also use `readdirSync().filter()` but only by file
extension, which is correct.

5854/5854 on `npm test`.

* fix(#2855): address remaining CR findings (1 active + 2 nitpicks)

Three findings on PR #2858, all the same root cause: input narrowing
before validation lets drift slip past the guards.

1. tests/bug-2808-...:104 (active) — `colonCallRe` captured local names
   with `[a-z0-9-]+`, which excluded the underscore. A drift like
   `Skill(skill="gsd:extract_learnings")` (deprecated colon syntax with
   the old underscore filename) silently slid through. Broadened the
   capture to `[^'"\s)]+` so any malformed local name is surfaced; surrounding pattern (whitespace tolerance, escape support, flags)
   unchanged.

2. tests/bug-2643-...:43 (nitpick) — `extractSkillNamesHyphen` and
   `extractSkillNamesColon` had the same over-strict capture plus
   relied on a single regex over raw bytes, which the project test-
   rigor memory bans (`feedback_no_source_grep_tests.md`). Replaced
   with `extractSkillCalls(content)` — a small structural extractor
   that walks `Skill(` openers, locates each call's matching `)`,
   parses the body's `skill = "..."` keyword argument with permissive
   whitespace + quoting + escape handling, and returns
   `{ name, raw }` records. The two namespace-form helpers become
   thin filters over the structured output. Tightened the body class
   to `[^'"\\]+` so a trailing escape `\` before the closing quote
   (as in `Skill(skill=\"gsd-foo\", …)` written inside another string
   context) doesn't get included in the captured name.

3. tests/bug-2543-...:44 (nitpick) — `DOC_SEARCH_FILES` was a hand-
   curated 7-entry array. Every doc added in the future would silently
   weaken drift detection until someone remembered to extend the list.
   Replaced with `discoverDocSearchFiles(ROOT)`: globs every `.md`
   under `docs/` and adds `README.md` if present. New docs are picked
   up automatically.

Re-audited the diff surface for similar narrowings; no other sites
filter or constrain before validating namespace invariants.

5854/5854 on `npm test`.

* fix(#2855): recurse docs/ tree so localized translations are scanned too

CR finding: discoverDocSearchFiles() stopped at docs/*.md, leaving
localized translation trees (docs/ja-JP/, docs/zh-CN/, docs/ko-KR/,
docs/pt-BR/) and other nested doc collections (docs/skills/,
docs/superpowers/) invisible to the namespace-drift invariant.

Verified the gap: docs/ has 6 nested directories with ~30 .md files
that the previous top-level-only scan was skipping. None contain
/gsd: references today, but a future translation update or new
doc subdir could leak drift.

Switch to an iterative stack walk so every .md under docs/ is scanned
regardless of depth. Stack form (rather than recursion) avoids the
risk of running into the call-stack limit on deep doc trees.

5854/5854 on `npm test`.

---------

Co-authored-by: Tom Boucher <trekkie@nomorestars.com>
2026-04-29 22:56:59 -04:00
Tom Boucher
107a83ebf7 docs(#2859): add release notes for 1.39.0-rc.7 (#2860)
rc.7 will be the first RC in the 1.39.0 train that actually rolls in
the post-rc.5 fixes from main (rc.6 was content-identical to rc.5 — see
#2856). Notes enumerate each fix with PR/issue link, recap rc.6 / rc.5 /
rc.4, and follow the established docs/RELEASE-v1.39.0-rc.X.md format.

No SDK-version pinning advice (consistent with the rc.6 doc cleanup).
Markdownlint-clean fenced code blocks.

Closes #2859
2026-04-29 08:58:16 -04:00
Tom Boucher
43a13217b7 docs(#2856): add docs/RELEASE-v1.39.0-rc.6.md (#2857)
* docs(#2856): add release notes for 1.39.0-rc.6

Documents what's actually in rc.6 (= rc.5 content + version-bump only —
release/1.39.0 was not synced with main before the bump) plus the known
SDK publish failure (@gsd-build/sdk@1.39.0-rc.6 is missing from npm with
404 PUT error). Format mirrors RELEASE-v1.39.0-rc.5.md.

Closes #2856

* docs(#2856): drop SDK refs from rc.6 notes; tag git log fence

Per maintainer + CodeRabbit review:
- Strip the 'Known issue: split publish' section, the SDK pin Note, and
  the @gsd-build/sdk follow-up bullet. SDK publish failure is a known
  separate issue and shouldn't block the rc.6 docs.
- Add bash language tag to the git log fence (markdownlint MD040).
2026-04-29 08:43:39 -04:00
Tom Boucher
e81592878e feat(#2789): trim skill description anti-patterns; enforce 100-char budget (#2823)
* feat(#2789): trim skill description anti-patterns; enforce 100-char budget

- Trim descriptions in all commands/gsd/*.md files over 100 chars
- Remove flag documentation from descriptions (belongs in argument-hint)
- Remove Triggers: keyword stuffing
- Add scripts/lint-descriptions.cjs — fails on descriptions > 100 chars
- Add npm script: lint:descriptions
- Add tests/enh-2789-description-budget.test.cjs

Closes #2789

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(#2789): add CHANGELOG entry for description budget lint

* docs(#2789): update COMMANDS.md descriptions; add skill description standards note

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 08:14:11 -04:00