Compare commits

334 Commits

Author SHA1 Message Date
Tom Boucher
42ed7cee8d refactor: deepen GSDTools query execution seams (#3085)
* refactor: deepen gsdtools query execution seams

* docs: add changeset for query seam deepening

* docs: fix changeset summary text

* fix: address coderabbit query seam findings

* test: address remaining coderabbit findings and notes

* refactor: use internal gsdtools error type import
2026-05-03 18:56:41 -04:00
Tom Boucher
5e21bf7567 Deepen query dispatch seam with Command Topology Module (#3078)
* Deepen query dispatch seam with command topology module

* Stabilize SDK parity defaults and integration test gating

* docs(architecture): record pre-project config policy and e2e gate

* refactor(query): stop injecting native adapter in CLI dispatch path

* fix(config): align workflow auto-chain typing and docs
2026-05-03 18:11:38 -04:00
Tom Boucher
9c92c32f6e refactor(query): deepen runtime context/native adapter/output seams (#3076)
* refactor(query): deepen runtime context, native adapter, and cli output seams

* chore(changeset): add fragment for query seam deepening continuation

* refactor(query): converge internal command-resolution imports on canonical seam

* refactor(query): remove dead seam wrappers and converge on canonical modules

* docs(architecture): update context and adr for query seam completion

* fix(query): preserve gsd-tools stderr in cli output and clarify static ws test scope

* test(query): cover whitespace stderr and null exitCode fallback
2026-05-03 16:31:48 -04:00
Tom Boucher
5c9f34bd31 refactor(cli): extract Query CLI Adapter Module seam (#3074)
* refactor(cli): extract query adapter seam from cli entrypoint

* test: update ws forwarding guard for query-cli-adapter seam

* fix(query): close remaining CodeRabbit findings on cli adapter

* test: address remaining CodeRabbit nitpicks on ws forwarding coverage
2026-05-03 15:57:01 -04:00
Tom Boucher
b6c401dc90 refactor(query): deepen command/dispatch seams and resolve coderabbit findings
* refactor(query): deepen command definition seam and fold fallback mapping cleanup

* refactor(query): add shared dispatch formatting module seam

* fix(query): restore QueryResult type import in dispatch deps

* test/query: align raw-output policy and definition normalization contracts

* refactor(query): deepen diagnosis, invariant report, and error taxonomy seams

* refactor(query): deepen dispatch plan, fallback bridge, policy snapshot, and hints seams

* refactor(query): deepen validation, fallback policy, capability, and result builder seams

* refactor(query): deepen resolution strategy, output classifier, observability, and policy-capability seams

* refactor(query): finalize deep strategy/classifier/observability/capability seams

* test/query: address coderabbit inline and out-of-diff dispatch nits

* fix(query): address remaining coderabbit input-validation and bridge stderr threads

* fix(query): address remaining coderabbit dispatch and strategy/output nits
2026-05-03 15:29:34 -04:00
Tom Boucher
c3f896f311 docs(contributing): codify CONTEXT + ADR contribution and testing standards
2026-05-03 14:54:14 -04:00
Tom Boucher
f104dab332 refactor(query): deepen dispatch policy seam with structured result contract (#3066)
* refactor(query): deepen dispatch policy seam with structured result contract

Closes #3065.

- unify query dispatch outcome as typed success/failure union
- include error kind/details + final exit_code in failure path
- align native and fallback paths under one dispatch policy seam
- make CLI query path consume seam result (thin adapter)
- add ADR + context term for Dispatch Policy Module

* refactor(query): strengthen dispatch seam with shared error mapper and typed details

- add query-dispatch-error-mapper module shared by native/fallback paths
- remove ad-hoc inline mapping in dispatch/fallback executors
- lock error-details schema in mapper + dispatch tests
- document structured dispatch contract in QUERY-HANDLERS.md

* fix(query): return structured fallback failure when path resolution throws

- guard resolveGsdToolsPath in cjs dispatch path
- map thrown resolution errors to fallback_failure result
- add regression test for structured failure contract
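
A minimal sketch of the structured result contract described above, assuming a plain-object shape. Only the `fallback_failure` kind and the `exit_code` field come from the commit message; the `ok`/`fail` helpers, the `dispatchFallback` name, the `stage` detail, and the injected `resolvePath`/`run` callbacks are hypothetical stand-ins, not the SDK's real API.

```javascript
// Hypothetical sketch: helper names and result shape are illustrative only.

/** Build a success result. */
function ok(data) {
  return { ok: true, data, exit_code: 0 };
}

/** Build a failure result carrying an error kind, details, and the final exit code. */
function fail(kind, message, details = {}, exitCode = 1) {
  return { ok: false, error: { kind, message, details }, exit_code: exitCode };
}

/**
 * Run a fallback command. A thrown path-resolution error is mapped to a
 * structured `fallback_failure` result instead of propagating to the caller.
 */
function dispatchFallback(command, resolvePath, run) {
  let toolPath;
  try {
    toolPath = resolvePath();
  } catch (err) {
    const message = String((err && err.message) || err);
    return fail('fallback_failure', message, { command, stage: 'resolve' });
  }
  try {
    return ok(run(toolPath, command));
  } catch (err) {
    const message = String((err && err.message) || err);
    return fail('fallback_failure', message, { command, stage: 'execute' });
  }
}
```

With this shape a thin CLI adapter only has to branch on `result.ok` and exit with `result.exit_code`.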
2026-05-03 14:30:27 -04:00
Tom Boucher
5975f06b6a refactor(query): extract command catalog seam for registry wiring (#3060)
* refactor(sdk): extract gsdtools transport seam with per-command policy

* refactor(query): centralize registry command catalog wiring

* refactor(query): unify command resolution seam across sdk callers

* fix(sdk): address CodeRabbit transport policy and timeout findings

* refactor(query): extract mutation event mapper seam

* refactor(query): converge mutation and transport policy data

* refactor(query): share fallback orchestration across cli and sdk

* refactor(query): split static registry catalog by domain clusters

* refactor(query): extract mutation event emission decorator seam

* refactor(query): extract alias-family handler catalog module

* refactor(query): extract cjs fallback execution adapter

* refactor(query): deepen command semantics seam

* refactor(query): extract deep dispatch seam

* refactor(query): deepen cjs fallback execution seam

* refactor(query): merge routing plan into dispatch seam

* fix(query): address CodeRabbit review findings on PR #3060

Critical: prevent double-execution race by checking timeout errors
before subprocess fallback (gsd-transport.ts).

Major: fix execRaw() to respect transport policy outputMode instead
of hardcoding 'raw' (gsd-tools.ts).

Major: add explicit 30s timeout to subprocess fallback execution
(query-fallback-executor.ts).

Major: remove raw args from stderr banner to prevent secret leakage
(query-fallback-executor.ts).

Minor: ensure native text output has trailing newline for CLI parity
(query-dispatch.ts).

Update gsd-tools.test.ts to match new execRaw() behavior.
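
One reading of three of the findings above, sketched synchronously for brevity: a native timeout is rethrown instead of triggering the subprocess fallback, the stderr banner reports only an argument count, and native text output gets a trailing newline. `TimeoutError`, `runWithFallback`, `fallbackBanner`, and `ensureTrailingNewline` are hypothetical names, and the banner wording is invented for illustration.

```javascript
// Hypothetical sketch of the review fixes; none of these names are the SDK's.

class TimeoutError extends Error {}

/**
 * Try the native path first. On a timeout the native work may still be
 * running, so starting the fallback could execute the command twice;
 * rethrow instead. Any other error falls through to the fallback.
 */
function runWithFallback(runNative, runFallback) {
  try {
    return runNative();
  } catch (err) {
    if (err instanceof TimeoutError) throw err;
    return runFallback();
  }
}

/** Banner for stderr that never echoes raw argument values (possible secrets). */
function fallbackBanner(command, args) {
  return `[fallback] ${command} (${args.length} args)`;
}

/** Native text output ends with exactly the newline it already has, or one added. */
function ensureTrailingNewline(text) {
  return text.endsWith('\n') ? text : text + '\n';
}
```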

* fix(tests): update CLI integration tests for catalog-based registration

The refactoring moved handler registration from inline registry.register()
calls to catalog-based registration (registerStaticCatalog/registerAliasCatalog).

- gsd-sdk-query-registry-integration.test.cjs: collectRegisteredNames() now
  also scans catalog files for handler names registered via the new system.
- bug-2492-context-coverage-gate.test.cjs: checks for catalog-based
  registration (DECISION_ROUTING_STATIC_CATALOG) instead of inline strings.
- bug-2524-sdk-query-ws-flag.test.cjs: checks for dispatchNative callback
  pattern instead of direct registry.dispatch() call.

* fix(query): address remaining CodeRabbit review findings

- query-command-semantics.ts: guard stats/progress rewrite so option
  tokens (e.g. --pick) are not turned into subcommands, preserving the
  top-level handler dispatch.

- query-dispatch.ts: formatOutput now skips --pick for text-format
  responses (matching CJS fallback behavior) and surfaces a proper error
  when extractField returns undefined instead of silently producing
  'undefined'.

- query-dispatch.ts: fix backwards error message — 'registered' is the
  restrictive policy that disables fallback, not enables it.

- tests/bug-2492-context-coverage-gate.test.cjs: check
  VERIFY_DECISION_STATIC_CATALOG (the correct catalog for plan-gate
  handlers) instead of DECISION_ROUTING_STATIC_CATALOG.

- tests/gsd-sdk-query-registry-integration.test.cjs: resolve catalog
  variable before loading entries so the drift guard checks each
  referenced catalog individually.
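
The `--pick` behavior described above can be sketched as follows. The function names `formatOutput` and `extractField` appear in the commit message, but the signatures, the dotted-path syntax, and the error text here are assumptions made for illustration.

```javascript
// Hypothetical signatures; only the function names come from the commit text.

/** Walk a dotted path such as "a.b"; returns undefined if any step is missing. */
function extractField(value, path) {
  return path
    .split('.')
    .reduce((cur, key) => (cur == null ? undefined : cur[key]), value);
}

/**
 * Text responses are returned untouched (the pick is skipped). For JSON
 * responses a missing field raises an error rather than printing 'undefined'.
 */
function formatOutput(result, pick) {
  if (typeof result === 'string') return result;
  if (!pick) return JSON.stringify(result, null, 2);
  const picked = extractField(result, pick);
  if (picked === undefined) {
    throw new Error(`--pick: field "${pick}" not found in result`);
  }
  return typeof picked === 'string' ? picked : JSON.stringify(picked);
}
```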

* refactor(query): deepen registry assembly module with strict invariants

- extract registry assembly into dedicated module
- split build vs mutation decoration internals
- add strict assembly invariants:
  1) no duplicate keys
  2) alias canonicals must have handlers
  3) mutation commands must be registered
  4) raw-output policy commands must be registered
- slim query index to thin re-export seam
- add focused registry assembly tests
- update drift-guard tests to target new seam
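
The four assembly invariants listed above could be enforced roughly like this. It is a sketch under stated assumptions: `assembleRegistry`, its input shape (arrays of pairs), and the error messages are hypothetical, and the real module may structure the checks differently.

```javascript
// Hypothetical sketch of strict registry assembly; not the actual module.

function assembleRegistry({ handlers, aliases = [], mutationCommands = [], rawOutputCommands = [] }) {
  const registry = new Map();

  // Invariant 1: no duplicate keys.
  for (const [name, handler] of handlers) {
    if (registry.has(name)) throw new Error(`duplicate key: ${name}`);
    registry.set(name, handler);
  }

  // Invariant 2: alias canonicals must have handlers (aliases also obey invariant 1).
  for (const [alias, canonical] of aliases) {
    if (!registry.has(canonical)) {
      throw new Error(`alias "${alias}" targets unregistered canonical "${canonical}"`);
    }
    if (registry.has(alias)) throw new Error(`duplicate key: ${alias}`);
    registry.set(alias, registry.get(canonical));
  }

  // Invariant 3: mutation commands must be registered.
  for (const name of mutationCommands) {
    if (!registry.has(name)) throw new Error(`mutation command not registered: ${name}`);
  }

  // Invariant 4: raw-output policy commands must be registered.
  for (const name of rawOutputCommands) {
    if (!registry.has(name)) throw new Error(`raw-output command not registered: ${name}`);
  }

  return registry;
}
```

Failing at assembly time means a drifted catalog surfaces as a startup error instead of a missing handler at dispatch.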

* test(query): add thin-seam coverage for query index re-exports

* fix(query): return structured native dispatch errors + tighten decisions.parse guard

- runQueryDispatch native path now catches adapter errors and returns
  QueryDispatchResult.error instead of throwing.
- preserve legacy CLI exit contract by using code=1 for native dispatch
  failures.
- strengthen bug-2492 guard: decisions.parse assertion now checks
  VERIFY_DECISION_STATIC_CATALOG OR explicit command token.
2026-05-03 13:57:32 -04:00
Tom Boucher
0f98952a3d refactor(sdk): extract GSDTools transport seam + policy (#3058)
* refactor(sdk): extract gsdtools transport seam with per-command policy

* fix(sdk): address CodeRabbit transport policy and timeout findings

* fix(sdk): harden raw transport formatting and raw-path coverage
2026-05-03 08:20:05 -04:00
Tom Boucher
eb365f7336 docs: audit and update docs/ for v1.40.0 release (#3048)
* docs(en): update FEATURES/USER-GUIDE/COMMANDS for v1.40.0 surface

- FEATURES.md: append v1.40.0 section (#122 skill consolidation, #123
  namespace meta-skills, #124 context-window guard, #125 phase-lifecycle
  status-line read-side); add to TOC.
- USER-GUIDE.md: add slash-command form (hyphen vs colon) primer and
  namespace routing primer; replace deleted slash forms in walkthroughs
  (`/gsd-add-backlog`, `/gsd-plant-seed`, `/gsd-add-phase`,
  `/gsd-set-profile`, `/gsd-list-workspaces`, etc.) with consolidated
  forms (`/gsd-capture --backlog`, `/gsd-phase --insert`,
  `/gsd-config --profile`, `/gsd-workspace --list`, etc.); fix
  `/gsd-spike-wrap-up` and `/gsd-sketch-wrap-up` to flag form.
- COMMANDS.md: clarify Command Syntax (Gemini = colon form, others =
  hyphen form); add Namespace Meta-Skills section with all six routers;
  add `--context` to /gsd-health flag table.

Refs #3047

* docs(en): refresh INVENTORY/CLI-TOOLS/STATE-MD-LIFECYCLE for v1.40.0

- INVENTORY.md: workflow-row "Invoked by" column updated to point at
  consolidated commands (`/gsd-phase` family, `/gsd-workspace --list`,
  `/gsd-config --advanced/--integrations/--profile`,
  `/gsd-sketch --wrap-up`, `/gsd-spike --wrap-up`); CLI-modules row for
  `secrets.cjs` updated to `/gsd-config --integrations`. Command count
  and namespace meta-skills section already reflect 65 shipped (= 59
  consolidated sub-skills + 6 ns-* routers).
- CLI-TOOLS.md: add `validate context` row under Validation Commands
  with the 60 %/70 % threshold envelope used by `/gsd-health --context`.
- STATE-MD-LIFECYCLE.md: flip status header from "proposed" to
  "shipped in v1.40.0" since `parseStateMd()` and `formatGsdState()`
  now read and render `active_phase`, `next_action`, `next_phases`,
  and `progress`.

`docs/AGENTS.md` audited and verified clean — `gsd-code-fixer` row
already lists the correct `/gsd-code-review --fix` spawner; no
deleted-skill references found. `docs/INVENTORY-MANIFEST.json`
audited and verified clean — already enumerates the 65 commands
(including six ns-* routers) and contains no deleted slash forms.

Refs #3047

* docs(en): cleanup ARCHITECTURE/CONFIGURATION for v1.40.0

- ARCHITECTURE.md: split Commands install-target list to call out the
  Gemini colon form (`/gsd:command-name`) vs hyphen form for every
  other runtime. Add a new subsection covering two-stage hierarchical
  routing via the six namespace meta-skills (#2792) and a paired note
  on the MCP token-budget interaction so readers see the two big
  per-turn cost levers in one place.
- CONFIGURATION.md: rewrite three references to the deleted
  `/gsd-settings-advanced` and `/gsd-settings-integrations` slash
  forms to use the consolidated `/gsd-config --advanced` /
  `/gsd-config --integrations` invocations. Add a new "STATE.md
  Frontmatter (Phase Lifecycle)" section documenting the four
  optional fields (`active_phase`, `next_action`, `next_phases`,
  `progress`) read by the v1.40 status-line, with a pointer to
  STATE-MD-LIFECYCLE.md for the full reference.

`docs/manual-update.md` audited and verified clean — already documents
`/gsd-update --reapply` (the consolidated form), no reference to the
deleted `/gsd-reapply-patches`.

Refs #3047

* docs(i18n): mirror v1.40.0 slash-command rename into ja-JP/ko-KR/zh-CN/pt-BR

Mechanical token-level renames only — every reference to a deleted
micro-skill slash form is rewritten to the consolidated form on the
matching parent skill. No prose was machine-translated; new prose
sections (slash-form primer, namespace routing primer, v1.40 feature
entries, STATE.md frontmatter) were left for human translator
follow-up.

Renames applied uniformly across all four trees:
  /gsd-add-todo, /gsd-add-note, /gsd-add-backlog,
  /gsd-plant-seed, /gsd-check-todos      → /gsd-capture[ --note|
                                            --backlog|--seed|--list]
  /gsd-add-phase, /gsd-insert-phase,
  /gsd-remove-phase, /gsd-edit-phase     → /gsd-phase[ --insert|
                                            --remove|--edit]
  /gsd-new-workspace, /gsd-list-workspaces,
  /gsd-remove-workspace                  → /gsd-workspace[ --new|
                                            --list|--remove]
  /gsd-settings-advanced,
  /gsd-settings-integrations,
  /gsd-set-profile                       → /gsd-config[ --advanced|
                                            --integrations|--profile]
  /gsd-sketch-wrap-up                    → /gsd-sketch --wrap-up
  /gsd-spike-wrap-up                     → /gsd-spike --wrap-up
  /gsd-reapply-patches                   → /gsd-update --reapply
  /gsd-code-review-fix                   → /gsd-code-review --fix
  /gsd-plan-milestone-gaps               → /gsd-audit-milestone
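
A token-level rename like the table above can be sketched as a whole-token lookup, so that a longer token such as `/gsd-code-review-fix` is never half-matched as `/gsd-code-review`. The `RENAMES` map holds only a subset of the table, and `renameSlashCommands` is a hypothetical helper, not the script actually used for this commit.

```javascript
// Hypothetical helper; the map is a partial copy of the rename table above.

const RENAMES = new Map([
  ['/gsd-check-todos', '/gsd-capture --list'],
  ['/gsd-new-workspace', '/gsd-workspace --new'],
  ['/gsd-sketch-wrap-up', '/gsd-sketch --wrap-up'],
  ['/gsd-spike-wrap-up', '/gsd-spike --wrap-up'],
  ['/gsd-reapply-patches', '/gsd-update --reapply'],
  ['/gsd-code-review-fix', '/gsd-code-review --fix'],
]);

// Matches one complete slash-command token, e.g. "/gsd-check-todos".
const SLASH_TOKEN = /\/gsd-[a-z0-9][a-z0-9-]*/g;

/** Replace each complete token that has an entry in RENAMES; leave the rest alone. */
function renameSlashCommands(text) {
  return text.replace(SLASH_TOKEN, (token) => RENAMES.get(token) ?? token);
}
```

A purely positional replacement keeps the new flag wherever the old token stood, so any arguments that followed the old command still need a manual check against the command's grammar.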

Refs #3047

* docs(changelog): regroup [Unreleased] under Feature/Enhancement/Fix

Replace the existing Keep-a-Changelog `Added` / `Changed` /
`Performance` / `Removed` / `Fixed` sub-headers in the [Unreleased]
block with the issue/PR template taxonomy:

  Added                 → Feature
  Changed / Performance → Enhancement
  Removed               → Enhancement
  Fixed                 → Fix

Order within the release: Feature → Enhancement → Fix. Every bullet
preserved verbatim — only headers and grouping changed; the awkward
inline-versioned headers (`### Added — 1.40.0-rc.1`,
`### Changed — 1.40.0-rc.1`, `### Fixed — 1.40.0-rc.1`) folded into
the same buckets with the `— 1.40.0-rc.1` suffix dropped, since the
[Unreleased] block IS 1.40.0-rc.1.

The [1.39.2] hotfix block called out in #3047's spec does not yet
exist in CHANGELOG.md (the previously released hotfix is [1.39.1]),
so this commit only regroups [Unreleased]. Older release blocks
([1.39.1] and earlier) are frozen and untouched.

Refs #3047

* docs(changeset): add fragment for v1.40.0 doc audit

Refs #3047

* docs(en): strip leading / from deleted slash-command tokens in FEATURES

REQ-CONSOLIDATE-03 and REQ-CONSOLIDATE-04 listed deleted commands by
their `/gsd-foo` form for the historical record. The docs-parity tests
in bug-3010, bug-3029-3034, and bug-3042-3044 use the regex
`/\/gsd-[a-z0-9][a-z0-9-]*/g` to scan user-facing surfaces for any
remaining mention of removed slash forms — they cannot tell prose
about a deleted command from a live recommendation.

Strip the leading slash from the bare-name references (preserve the
historical text otherwise). Tests now require a `/` prefix to match,
so `gsd-add-todo` reads identically to a human but no longer trips
the parser.
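
To see why dropping the leading slash is enough, here is a small sketch that applies the regex quoted above. `findDeletedSlashForms` and the `deleted` set are hypothetical; the real suites build their own lists of removed commands.

```javascript
// The regex is the one quoted in the commit message; the helper is hypothetical.

const SLASH_FORM = /\/gsd-[a-z0-9][a-z0-9-]*/g;

/** Return the deleted slash forms that still appear in `text`. */
function findDeletedSlashForms(text, deleted) {
  // String.prototype.match returns null when nothing matches a /g regex.
  const tokens = new Set(text.match(SLASH_FORM) ?? []);
  return [...tokens].filter((token) => deleted.has(token));
}

const deleted = new Set(['/gsd-add-todo']);

// A live-looking reference is flagged.
console.log(findDeletedSlashForms('Run /gsd-add-todo to capture it.', deleted));

// The bare historical name has no "/" prefix, so the regex never sees it.
console.log(findDeletedSlashForms('gsd-add-todo was folded into capture.', deleted));
```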

Verified locally: 65/65 tests pass across the three docs-parity
suites that were red on CI run 25270072600.

Refs #3047

* docs(en): fix CR feedback + drop literal /gsd:plan-phase from USER-GUIDE

CI: tests/bug-2543-gsd-slash-namespace.test.cjs flagged
docs/USER-GUIDE.md:35 for embedding the literal `/gsd:plan-phase`
token in the parenthetical Gemini-form example. The test scans every
.md under docs/ for `/gsd:<live-cmd>` because non-Gemini surfaces must
not advertise the colon form. Replaced the literal example with a
prose substitution rule.

CR: docs/ARCHITECTURE.md:125 — the namespace meta-skills were listed
by file-prefix (`gsd-ns-workflow`) but the invocable frontmatter `name:`
is the bare form (`gsd-workflow`). Verified against the six
`commands/gsd/ns-*.md` files. Replaced with the canonical names and
noted the file/name disagreement in-line.

CR: docs/COMMANDS.md:723 — `v1.40` aligned to canonical `v1.40.0`.

CR: docs/FEATURES.md:2679 — REQ-CTX-GUARD-02 advertised the wrong
invocation (`gsd-tools validate context`). The shipped handler is
exposed via `gsd-sdk query validate.context` and requires explicit
`--tokens-used <int>` + `--context-window <int>` flags (verified
against sdk/src/query/validate.ts:849-882 and
get-shit-done/bin/lib/validate-command-router.cjs:19-36).

CR: docs/zh-CN/README.md:533 — added `inherit` to the profile-options
parenthetical to match the canonical set (verified against
model-profiles.cjs:29 `VALID_PROFILES = […MODEL_PROFILES['gsd-planner'], 'inherit']`).

Verified locally: 74/74 tests pass across the four docs-parity suites
that were red on CI runs 25270072600 and 25270182903.

Refs #3047
2026-05-03 07:33:27 -04:00
Tom Boucher
1e6737cd8e feat(plan-phase): --research-phase flag + scrub stale slash-command refs (#3042, #3044) (#3045)
* feat(plan-phase): --research-phase flag absorbs deleted /gsd-research-phase + scrub stale refs (#3042, #3044)

#3042 (orphaned research-phase): /gsd-research-phase had a workflow file
but no slash-command stub. Rather than restore the orphan, the research-
only capability is now a flag on /gsd-plan-phase:

  /gsd-plan-phase --research-phase <N>

When set, the workflow scopes to phase N, runs the research step (Section
5 of the existing plan-phase workflow), then early-exits before the
planner/plan-checker/verifier chain.

Per RCA against the deleted standalone, the flag adds two modifiers to
fully cover the original surface (Option B from the RCA discussion):

- --view : print existing RESEARCH.md to stdout, no spawn. Cheapest mode
  for the correction-without-replanning loop the issue reporter
  explicitly called out. Errors with a clear hint if RESEARCH.md is
  missing.
- --research : reuse the existing "force re-research" semantics. In
  research-only mode this skips the existing-RESEARCH.md prompt and
  re-spawns unconditionally.
- Neither flag, RESEARCH.md exists : prompt update/view/skip. Mirrors
  the deleted standalone's existing-artifact menu (#3042 RCA).
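
The flag combinations above reduce to a small decision table, sketched here. `resolveResearchAction` and its return labels are hypothetical. Two cases are not stated in the commit message and are assumptions: `--view` winning over `--research` when both are passed, and spawning research when neither flag is set and no RESEARCH.md exists.

```javascript
// Hypothetical decision helper for research-only mode; labels are illustrative.

function resolveResearchAction({ view = false, research = false, researchExists = false }) {
  if (view) {
    // --view never spawns: print the file, or fail with a hint if it is missing.
    return researchExists ? 'print' : 'error-missing';
  }
  if (research) {
    // --research forces a refresh and skips the existing-file prompt.
    return 'spawn';
  }
  // Neither flag: prompt update/view/skip when a file already exists.
  // Assumption: with no existing file, research simply runs.
  return researchExists ? 'prompt' : 'spawn';
}
```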

#3044 (stale slash-command refs): scrubbed five deleted commands from
all user-facing surfaces, including English docs, 4 localized doc sets
(ja-JP, ko-KR, zh-CN, pt-BR), workflows, templates, and references.

  /gsd-check-todos          → /gsd-capture --list
  /gsd-new-workspace        → /gsd-workspace --new
  /gsd-status               → /gsd-progress
  /gsd-plan-milestone-gaps  → table rows / orphan sections removed
                              (PR #3038 only scrubbed workflows/agent;
                              missed the docs surfaces this PR covers)
  /gsd-research-phase       → /gsd-plan-phase --research-phase

Includes a fix to docs/issue-driven-orchestration.md (PR #3036)
which itself referenced /gsd-new-workspace 4 times — self-correction.

Removed:
- get-shit-done/workflows/research-phase.md (orphan, capability
  absorbed into --research-phase flag)

Tests:
- tests/bug-3042-3044-research-flag-and-stale-refs.test.cjs — 46
  structural-IR tests across both bugs:
  - argument-hint advertises --research-phase + --view
  - workflow parses --research-phase, sets RESEARCH_ONLY,
    early-exits before planner
  - --view prints RESEARCH.md without spawning
  - --research forces refresh in research-only mode
  - existing-RESEARCH.md prompt path with update/view/skip
  - workflows/research-phase.md is removed
  - 5 deleted slash-commands absent from 17 English user-facing
    surfaces + 16 localized doc surfaces (4 locales × 4 docs each)
  - replacement command tokens present where deleted ones lived

6950/6950 full suite pass. Lints clean.

Closes #3042
Closes #3044

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: address all 8 CR findings on PR #3045

Major (3):
- get-shit-done/workflows/plan-phase.md:344 — added explicit early-exit
  guard at Section 5.1: "Skip if RESEARCH_ONLY=true". Without it, an LLM
  could fall through "use existing, skip to step 6" → planner spawn,
  violating the research-only contract. The guard makes the early-exit
  unreachable from any non-research-only branch.
- get-shit-done/references/continuation-format.md (3 examples) +
  zh-CN/.../continuation-format.md (3 examples) — pointed to
  `/gsd-plan-phase --research-phase` but docs/COMMANDS.md didn't
  document the flag. Added a full --research-phase + --view + --research
  modifier section to the /gsd-plan-phase flag table in COMMANDS.md so
  the canonical reference matches the continuation examples.

Minor (5):
- docs/FEATURES.md:1632 — `/gsd-plan-phase --research-phase` →
  `/gsd-plan-phase --research-phase <N>` (include required arg).
- get-shit-done/templates/README.md:46 — NN-VALIDATION.md producer
  reverted from `/gsd-plan-phase --research-phase` (Nyquist) to plain
  `/gsd-plan-phase` (Nyquist). VALIDATION.md is created during normal
  Nyquist flow, not research-only mode — the bulk replacement was
  wrong for that line.
- get-shit-done/workflows/help.md:89 — signature line was missing
  `--research`; added it alongside `--research-phase` and `--view`.
- tests/bug-3042-3044-...:197 — promptHasView/promptHasSkip were
  tautological (matched anywhere in 1700-line workflow). Tightened
  to a proximity check anchored on "RESEARCH.md already exists" prompt
  header within a 600-char window. Updated workflow to emit that
  literal phrase.
- tests/feat-2840-...:95 — workspace assertion used `/gsd-workspace`
  but the documented replacement is `/gsd-workspace --new`. Tightened
  to require both tokens (in 3 places: requiredCommands list, regex
  in conceptPairs, error message).

6950/6950 full suite pass. Lint clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-02 23:12:50 -04:00
Tom Boucher
dca12242b5 fix(install): skip Gemini local commands/gsd when global GSD present (#3037) (#3041)
* fix(install): skip Gemini local commands/gsd when global GSD present (#3037)

Reporter showed that running `npx get-shit-done-cc --gemini --global`
followed by `--gemini --local` in a project creates the same 65 GSD
command files in both Gemini scopes:
  - ~/.gemini/commands/gsd/ (user scope)
  - <project>/.gemini/commands/gsd/ (workspace scope)

Gemini conflict-detects by command name across scopes and renames every
overlapping /gsd:* command to /workspace.gsd:* and /user.gsd:*, breaking
the documented /gsd:* namespace.

Fix: in bin/install.js, when handling --gemini --local, detect whether
~/.gemini/commands/gsd/ already exists with managed-shape content. If
so, skip the local copy and print a clear three-line warning explaining
the conflict avoidance. The user-scope install already provides the
same /gsd:* commands in this project; the local copy adds zero value.

Sibling fixes (test isolation):
- tests/install-minimal-all-runtimes.test.cjs: pass HOME/USERPROFILE
  through the spawned installer's env so the developer's real
  ~/.gemini/commands/gsd/ doesn't trigger the new skip path during
  test runs that want to assert the local-install populates
  commands/gsd/.
- tests/gemini-namespacing.test.cjs: the "Gemini Install (Behavioral)"
  describe block now creates an isolated tmpHome and points
  process.env.HOME at it before calling install(false, 'gemini'),
  with proper restore in afterEach.

Test:
- tests/bug-3037-gemini-duplicate-commands.test.cjs — 4 structural
  tests:
    1. global install populates HOME/.gemini/commands/gsd
    2. local install AFTER global skips the local copy
    3. local install with NO existing global still populates locally
       (no-regression)
    4. local install when HOME has .gemini/ but no GSD-managed
       commands/gsd/ still populates locally (non-GSD-Gemini-user
       no-regression)

6909/6909 full suite pass. Lints clean.

Closes #3037

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: address CR feedback on PR #3041 — narrower detection + USERPROFILE restore

CR findings:

1. **bin/install.js (Major)** — userScopeHasGsd used
   `fs.readdirSync(homeGeminiGsd).length > 0` which would skip the
   local install for any non-empty directory, including a user who
   hand-dropped a single override at ~/.gemini/commands/gsd/<thing>
   .toml without ever running --gemini --global. Narrowed the
   detection to require at least 3 canonical GSD command files
   (help.toml, progress.toml, new-project.toml) — a marker that
   ships in every GSD Gemini install (minimal mode included) and is
   structurally impossible to produce by accident.

2. **tests/bug-3037-...:59 (Minor)** — beforeEach overwrites
   process.env.USERPROFILE but afterEach only restores HOME, leaking
   the temp home into later tests on Windows or any code path that
   reads USERPROFILE. Added save/restore symmetric with HOME.

Plus added a 5th regression test covering the narrowed detection:
"local install when HOME has hand-dropped overrides UNDER commands/gsd/
(but no full GSD) still populates locally" — directly exercises the
edge case CR identified.
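
The narrowed detection from finding 1 can be sketched as a marker-file check. The three file names and the `userScopeHasGsd` name come from the commit message; the function signature and the `GSD_MARKER_FILES` constant are assumptions, and this sketch reads "at least 3 canonical files" as "all three named files are present".

```javascript
// Sketch under assumptions: signature and constant name are hypothetical.
const fs = require('node:fs');
const path = require('node:path');

const GSD_MARKER_FILES = ['help.toml', 'progress.toml', 'new-project.toml'];

/**
 * True only when every canonical marker file exists in `commandsDir`.
 * A directory holding a single hand-dropped override therefore does not
 * count as a GSD-managed install, and neither does a missing directory.
 */
function userScopeHasGsd(commandsDir) {
  return GSD_MARKER_FILES.every((name) => fs.existsSync(path.join(commandsDir, name)));
}
```

Compared with a bare `readdirSync(...).length > 0` check, this keeps the local install working for users who only placed their own overrides under `commands/gsd/`.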

5/5 targeted tests pass. 6910/6910 full suite pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-02 17:44:52 -04:00
Tom Boucher
7714b5244b fix(workflows,docs): scrub stale /gsd-code-review-fix and /gsd-plan-milestone-gaps refs (#3029, #3034) (#3038)
* fix(workflows,docs): scrub stale /gsd-code-review-fix and /gsd-plan-milestone-gaps refs (#3029, #3034)

#2790 consolidated /gsd-code-review-fix into /gsd-code-review --fix and
deleted /gsd-plan-milestone-gaps in favor of inline gap planning as part
of /gsd-audit-milestone's output. The deletion was propagated through
some surfaces (#2950 covered help/do/settings/discuss-phase/etc.) but
several user-facing surfaces still emitted the old forms:

#3029 — /gsd-code-review-fix references in:
- agents/gsd-code-fixer.md (description, "Spawned by", recovery prose)
- get-shit-done/workflows/code-review.md (offer text)
- get-shit-done/workflows/execute-phase.md (offer text)
- get-shit-done/workflows/code-review-fix.md (internal retry hints)
- docs/INVENTORY.md (agent + workflow rows)
- docs/CONFIGURATION.md (workflow.code_review row)
- docs/USER-GUIDE.md (3 occurrences in walkthrough)
- docs/AGENTS.md (gsd-code-fixer agent stub)
- docs/FEATURES.md (commands list + REQ-REVIEW-04)

All replaced with /gsd-code-review --fix. Internal retry hints in the
workflow file itself updated to point at the new form. Release notes
(docs/RELEASE-*.md) and gsd-ns-review's "absorbed by" deletion note
left unchanged — historical/explanatory content.

#3034 — /gsd-plan-milestone-gaps references in:
- get-shit-done/workflows/audit-milestone.md (<offer_next> blocks for
  gaps_found and tech_debt: lines 281, 323)
- commands/gsd/complete-milestone.md (gaps_found pre-flight: lines 46, 57)

Replaced with inline closure path:
  /gsd-phase --insert <N> "Close gap: <REQ-ID> ..."
  /gsd-discuss-phase <N>
  /gsd-plan-phase <N>
  /gsd-execute-phase <N>

Plus a Nyquist-coverage hint pointing at /gsd-validate-phase /
/gsd-secure-phase for retroactive audit-chain hygiene gaps. The
gsd-ns-project SKILL.md "deleted by #2790" note is preserved
(it's the canonical pointer for future readers asking what
happened to the command).

Tests:
- tests/bug-3029-3034-stale-command-routes.test.cjs — parser-based
  assertions per fixed surface, plus a structural cross-check that
  gsd-ns-project keeps the deletion note. 15 tests, all green.
- 6905/6905 full suite passes.

Closes #3029
Closes #3034

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: address CR feedback on PR #3038 — argument order, structural tests, agent count

CR findings on PR #3038:

1. **docs/USER-GUIDE.md (Major)** — `--fix` examples used flag-first form
   (`/gsd-code-review --fix 3`), but the supported CLI grammar is
   phase-first (`/gsd-code-review 3 --fix`). The original sed-based
   replacement preserved the position of the `gsd-code-review-fix`
   token, producing the wrong order. Fixed in USER-GUIDE.md (3
   occurrences) and the same drift in the workflow surfaces:
   - get-shit-done/workflows/code-review-fix.md (2 retry hints)
   - get-shit-done/workflows/code-review.md (offer text)
   - get-shit-done/workflows/execute-phase.md (offer text)

2. **docs/AGENTS.md (Minor)** — internal count drift: line 483 said
   "Ten additional agents" but line 725 said "12 advanced/specialized".
   Filesystem reality: 33 agents total, 21 primary, 12 specialized
   (count of `### ` stubs in the Advanced and Specialized section).
   Updated lines 3, 13, 483 to use 12/33 and added the two missing
   names (doc-classifier, doc-synthesizer) to the inline list at
   line 13.

3. **tests:94 (Major refactor suggestion)** — `.includes()` token checks
   were source-grep style. Refactored to a typed-IR pattern: extract
   the SET of slash-command tokens via regex, assert membership on the
   parsed Set instead of substring scanning the raw file text. Added
   the `allow-test-rule` comment explaining the IR-build vs
   IR-assertion split per scripts/lint-no-source-grep.cjs convention.

4. **tests:130 (Major)** — replacement-path assertion was file-wide and
   could false-pass on generic mentions of "inline" elsewhere in the
   file. Refactored: `extractOfferBlocks(content)` returns the typed
   list of `<offer_next>` and "Pre-flight" blocks where the deleted
   command previously lived, and the assertion runs against those
   blocks specifically. Now requires `/gsd-phase --insert` or
   inline-audit prose to appear in the same offer block, not just
   somewhere in the file.
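
Findings 3 and 4 both move from substring scanning to assertions on parsed structure. A rough sketch of that split follows; it handles only `<offer_next>` blocks (the commit message says the real `extractOfferBlocks` also returns "Pre-flight" blocks), and `slashTokens` and `everyBlockOffers` are hypothetical helpers.

```javascript
// Hypothetical sketch of the IR-build vs IR-assertion split.

/** IR build: return the inner text of every <offer_next> block. */
function extractOfferBlocks(content) {
  const blocks = [];
  for (const match of content.matchAll(/<offer_next>([\s\S]*?)<\/offer_next>/g)) {
    blocks.push(match[1]);
  }
  return blocks;
}

/** IR build: the set of slash-command tokens in a piece of text. */
function slashTokens(text) {
  return new Set(text.match(/\/gsd-[a-z0-9][a-z0-9-]*/g) ?? []);
}

/**
 * IR assertion: the replacement token must appear inside every offer block,
 * so a mention elsewhere in the file cannot produce a false pass.
 */
function everyBlockOffers(content, token) {
  const blocks = extractOfferBlocks(content);
  return blocks.length > 0 && blocks.every((block) => slashTokens(block).has(token));
}
```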

15/15 targeted tests pass. 6905/6905 full suite pass. Lints clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-02 17:23:44 -04:00
Tom Boucher
117b3ec009 docs: add issue-driven orchestration guide (#2840) (#3036)
* docs: add issue-driven orchestration guide (#2840)

Adds docs/issue-driven-orchestration.md — a recipe for driving GSD from a
GitHub / Linear / Jira issue using existing primitives. Maps Symphony-style
orchestration concepts onto GSD commands without vendoring code, adding a
daemon, or introducing tracker integration.

Concept mapping covers:
- WORKFLOW.md → ROADMAP.md / STATE.md / phase CONTEXT.md / phase PLAN.md
- isolated agent workspace → /gsd-new-workspace --strategy worktree
- agent dispatch → /gsd-manager (interactive), /gsd-autonomous (unattended)
- per-phase steps → /gsd-discuss-phase → /gsd-plan-phase → /gsd-execute-phase
- proof-of-work → /gsd-verify-work (UAT.md persists across /clear)
- adversarial review → /gsd-review (cross-AI peer review)
- human merge gate → /gsd-ship
- follow-up capture → /gsd-note, /gsd-plant-seed, /gsd-new-milestone

End-to-end flow walks through 7 numbered steps from picking the tracker
issue to capturing follow-ups. Safety boundaries (isolated worktrees,
explicit human review, no automatic public posting, verification before
ship) and non-goals (no vendoring, no daemon, no mandatory tracker, no
gate bypass, no command-surface expansion) are spelled out explicitly so
the doc cannot drift into "let's just add one more flag".

Cross-linked from docs/README.md (Documentation Index) and
docs/USER-GUIDE.md (Table of Contents preamble).

Tests: tests/feat-2840-issue-driven-orchestration-guide.test.cjs — 9
structural-IR tests parse the guide into a typed record and assert on
flags (commandsPresent, conceptPairs, nonGoalFlags, safetyFlags,
numberedSteps). Fence-language MD040 check enforced. Cross-link
presence enforced. No raw-text assertions on prose.

6890/6890 tests pass. Lint:tests clean (allow-test-rule comment justifies
the doc-shape parser per scripts/lint-no-source-grep.cjs escape hatch).
Lint:changeset clean.

Closes #2840

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(test): guard USER-GUIDE.md existsSync before read (CR #3036)

CR Minor: cross-linked-from-USER-GUIDE.md test called fs.readFileSync
directly without first asserting fs.existsSync, asymmetric with the
README.md test above. A missing USER-GUIDE.md would throw ENOENT instead
of producing a meaningful assertion message. Mirror the null-guard
pattern.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-02 16:57:42 -04:00
Tom Boucher
95d2bc20f8 feat(hooks): opt-in SessionStart update banner for non-statusline users (#2795) (#3035)
* feat(hooks): opt-in SessionStart update banner for non-statusline users (#2795)

When a user declines (or keeps a non-GSD) statusline at install time, the
installer now offers an opt-in SessionStart banner that surfaces GSD update
availability. The banner reads the existing
~/.cache/gsd/gsd-update-check.json cache (written by
gsd-check-update-worker.js) and emits a single systemMessage line only when
update_available is true:

    GSD update available: <installed> → <latest>. Run /gsd-update.

It is silent when up-to-date and rate-limits "check failed" diagnostics to
once per 24h via a sentinel file so a corrupt cache doesn't nag every
session. `npx get-shit-done-cc --uninstall` removes it cleanly, stripping
both the script and the SessionStart entry. The banner is never offered when
GSD's statusline is being installed (statusline already surfaces update
info, so re-prompting would be noise).

Implementation:
- hooks/gsd-update-banner.js — pure functions buildBannerOutput,
  shouldSuppressFailureWarning, readCache; thin main() wires them.
- bin/install.js — handleUpdateBanner() prompt, parseUpdateBannerInput(),
  buildUpdateBannerHookEntry(), buildUpdateBannerPromptText(); chained into
  installAllRuntimes() so finalize() receives both flags. updateBannerCommand
  computed alongside the other JS-hook commands; finishInstall() registers
  the SessionStart entry only when shouldInstallBanner === true and the
  hook file is present at the target.
- Hook ships in scripts/build-hooks.js HOOKS_TO_COPY, listed in
  MANAGED_HOOKS for stale-detection in gsd-check-update-worker.js, in the
  uninstall hook-removal lists in install.js, and in the
  rewriteLegacyManagedNodeHookCommands allowlist.
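
Illustrative sketch of the banner helpers described above. This is not the
shipped hooks/gsd-update-banner.js: the function names come from this commit
message, but the signatures, the cache field names `installed` and `latest`,
and the silent handling of missing version fields are assumptions.

```javascript
// Hypothetical sketch; signatures and cache field names are assumptions.
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns a hook envelope when an update is available, otherwise null (silent).
function buildBannerOutput(cache) {
  if (!cache || cache.update_available !== true) return null;
  // Assumption: stay silent when either version field is missing.
  if (!cache.installed || !cache.latest) return null;
  return {
    systemMessage: `GSD update available: ${cache.installed} → ${cache.latest}. Run /gsd-update.`,
  };
}

// Suppresses "check failed" diagnostics when one was already emitted within 24h.
// lastWarnedAtMs stands in for the sentinel file's timestamp (null if absent).
function shouldSuppressFailureWarning(lastWarnedAtMs, nowMs) {
  if (typeof lastWarnedAtMs !== 'number') return false;
  return nowMs - lastWarnedAtMs < DAY_MS;
}
```

Keeping both helpers free of file I/O is what lets the tests assert on parsed
envelopes instead of matching raw output.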

Tests:
- tests/feat-2795-update-banner.test.cjs — 22 tests, structural-IR
  assertions on parsed JSON envelopes (no raw-text matching). Covers
  pure-function branches (cache present/absent, parseError, rate-limit
  suppression, missing version fields), end-to-end hook invocation against
  fixture cache states, and install.js wiring (prompt text, input parsing,
  hook entry shape).
- tests/trae-install.test.cjs — updated install() return-shape assertion to
  include updateBannerCommand: null for the no-settings runtime.
- 6881/6881 tests pass.

Docs (bundled in same commit per the bundle-docs-with-code skill):
- docs/USER-GUIDE.md — new "Surface GSD Update Notifications Without GSD's
  Statusline" task section with opt-in/opt-out instructions.
- docs/FEATURES.md — REQ-HOOK-08 added; "Update Banner" subsection under
  the Hook System feature with cache flow + removal path.
- docs/INVENTORY.md — hook count 11 → 12, new row for gsd-update-banner.js.
- docs/INVENTORY-MANIFEST.json — regenerated.

Closes #2795

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(install): gate banner prompt on actual installability (CR #3035)

CodeRabbit findings on PR #3035:

- bin/install.js (Major): continueAfterStatusline gated banner prompt on
  the raw `shouldInstallStatusline` flag from handleStatusline. But
  finishInstall later silently skips the statusline write on local
  installs unless --force-statusline is set (#2248). Two consequences:
    1. Interactive local Claude/Gemini installs got neither a statusline
       nor a banner offer.
    2. Codex/Cursor/Copilot/Windsurf/Trae/Cline-only installs (where
       every result.updateBannerCommand is null) still got prompted even
       though the choice was silently ignored.
  Fix: derive willInstallStatusline = shouldInstallStatusline &&
  (isGlobal || forceStatusline), and gate the banner prompt on a
  canInstallBanner precondition computed from results[].updateBannerCommand.
  Pass the raw shouldInstallStatusline through to finalize unchanged so
  per-runtime statusline gating in finishInstall is unaffected.

- tests/feat-2795-update-banner.test.cjs (Minor): rate-limit suppression
  test parsed r1.stdout without first asserting r1.status === 0. Other
  e2e tests in this file (lines 210, 241) do this. A non-zero exit would
  surface as a cryptic SyntaxError instead of a status assertion failure.
  Fix applied verbatim.
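
Illustrative sketch of the gating from the first finding. The variable names
willInstallStatusline and canInstallBanner come from the text above; the
wrapper function and its parameter object are hypothetical, not the actual
bin/install.js code.

```javascript
// Hypothetical wrapper; only the two derived flags are named in the commit.
function shouldOfferBanner({ shouldInstallStatusline, isGlobal, forceStatusline, results }) {
  // The statusline is only written on global installs or with --force-statusline.
  const willInstallStatusline =
    Boolean(shouldInstallStatusline) && Boolean(isGlobal || forceStatusline);
  // At least one selected runtime must support the banner hook.
  const canInstallBanner = results.some((r) => r.updateBannerCommand != null);
  return canInstallBanner && !willInstallStatusline;
}
```

Under these assumptions a local Claude install without --force-statusline is
offered the banner, while an install whose runtimes all report a null
updateBannerCommand is never prompted.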

6881/6881 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-02 16:33:16 -04:00
Tom Boucher
35fffe7f31 docs(out-of-scope): record #2758 agent-template-rendering decision
Closed on the technical merits: the determinism claim is theoretical (no
observed misinterpretation), token waste is small and unmeasured, and PR
#2279's orchestrator-embedding path already serves the deterministic-gating
need without a parallel templating subsystem.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-02 15:56:24 -04:00
Tom Boucher
d137ce86ec docs(out-of-scope): record #2756 temporal-context decision
Reporter did not return to clarify the actual ask after the
narrowing-then-retraction in the comment thread. Closing as wontfix per
.out-of-scope/temporal-context.md with re-open criteria spelled out.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-02 15:53:08 -04:00
Tom Boucher
8c43ba7301 docs(#3025): MCP tool schema as a context-budget concern (#3032)
* docs(#3025): MCP tool schema as a context-budget concern

Adds documentation covering the largest GSD cost lever that GSD
itself does not own: MCP tool schema injection. Every enabled MCP
server adds its schema to every turn (often 20k+ tokens for
heavyweight servers like browser/playwright, mac-tools, etc.),
which can dwarf whatever `model_profile` tuning saves.

Two doc surfaces (per the bundle-docs-with-code skill depth gradient):

1. get-shit-done/references/context-budget.md
   - New "MCP Tool Schema Cost (Harness Concern)" section.
   - Explains schemas-per-turn cost framing.
   - Names enabledMcpjsonServers / disabledMcpjsonServers and
     .claude/settings.json explicitly.
   - Pre-phase audit checklist: browser/playwright, platform-specific,
     cross-project/stale, duplicate/shadow.
   - Explicit "GSD does not manage MCP enablement — harness concern"
     statement so users don't hunt for a GSD setting.
   - Links to Anthropic Claude Code MCP docs as canonical reference.
   - Notes compounding interaction with model_profile (additive levers).

2. docs/USER-GUIDE.md
   - New task-oriented "Trim MCP servers to reduce per-turn cost"
     section above "Using Non-Claude Runtimes".
   - Same checklist condensed.
   - Cross-link to context-budget.md for the full reference.

Tests:
- tests/feat-3025-mcp-token-budget-docs.test.cjs (12 cases) parses
  both docs into typed semantic-flag records and asserts behavioral
  invariants (mentions key, includes audit, names harness, etc.)
  rather than substring-matching prose. Adheres to CONTRIBUTING.md
  no-source-grep — section can be reworded freely as long as the
  required semantics survive.
- Markdownlint pre-flight tests (MD040 fence language, MD056 table
  column count) per the bundle-docs-with-code skill so CR can't
  ratchet on prose nitpicks across multiple review rounds.

Verification:
- 12/12 pass on regression test
- 6857/6857 full suite (12 net new)
- lint-no-source-grep clean (377 test files)

Companion to #3023 (per-phase-type model map) and #3024 (dynamic
routing). Together they cover the three biggest cost levers users
ask about; this issue covers the one GSD does not own.

Closes #3025

* docs(#3025): batch 3 CR fixes — pr id, relative link, named flag

CodeRabbit on PR #3032 (3 minor — 2 inline + 1 nitpick), all in one
push per the bundle-docs-with-code skill (avoid per-round nitpick
ratchet):

1. Inline (Minor) — .changeset/mcp-token-budget-docs.md:3
   `pr: TBD` → `pr: 3032` so changeset tooling can link the entry.

2. Inline (Minor) — docs/USER-GUIDE.md:1101
   Used a hardcoded `https://github.com/.../blob/main/...` URL for the
   cross-link to `context-budget.md`. Rest of USER-GUIDE.md uses
   relative links. Switched to
   `../get-shit-done/references/context-budget.md#mcp-tool-schema-cost-harness-concern`
   so feature-branch work shows the right content and rename-resilience
   is preserved.

3. Nitpick — tests/feat-3025-mcp-token-budget-docs.test.cjs:234
   The cross-link assertion used an inline `/context-budget/i.test(...)`
   while every other invariant in the file lived as a named flag in
   `parseMcpBudgetSection`. Per CONTRIBUTING.md no-source-grep, added
   `crossLinksContextBudget` to the parser and asserted on
   `parsed.crossLinksContextBudget` so the cross-link rule sits next
   to its siblings.

Verification:
- 12/12 pass on regression test (no count change; refactor only)
- No source code changes, only docs + tests

* test(#3025): strip inline markdown before phrase-match (CR nitpick)

CodeRabbit caught that the `explainsHarnessNotGsd` primary regex
branch couldn't match "GSD does **not** manage" in
context-budget.md because the markdown bold markers (`**`) sit
between contiguous words. The test passed today only via the
fallback `harness (concern|setting|controlled)` branch — the
primary branch was effectively dead code.

Fix: strip inline markdown emphasis (`**`, `*`, `~~`) and inline-
code backticks before any phrase-matching in `parseMcpBudgetSection`.
All seven flag computations now run against the stripped text so
markdown formatting can't silently invalidate any invariant.

Underscores are intentionally NOT stripped — `model_profile` and
other snake_case identifiers must survive intact for the
mentionsModelProfileInteraction check to find them.
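
Illustrative sketch of the stripping step. The helper name is hypothetical
(the real logic lives inside parseMcpBudgetSection) and the exact regexes are
an assumption; only the set of stripped markers comes from the text above.

```javascript
// Hypothetical helper: removes inline emphasis and code markers before
// phrase-matching. Underscores are deliberately kept so snake_case
// identifiers such as model_profile survive intact.
function stripInlineMarkdown(text) {
  return text
    .replace(/\*\*|~~/g, '') // bold and strikethrough markers
    .replace(/\*/g, '')      // remaining single-asterisk emphasis
    .replace(/`/g, '');      // inline-code backticks
}
```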

Verification: 12/12 still pass; primary branches now fire on
real markdown content rather than relying on fallbacks.

* test(#3025): guard markdownlint tests against null section (CR nitpick)

CodeRabbit caught that the MD040 and MD056 markdownlint pre-flight
tests called `section.match(...)` and `section.split('\n')`
directly on the value returned by `extractSection`, which returns
null when no matching header is found. If the MCP section is ever
removed (regression), both tests would throw `TypeError: Cannot
read properties of null` instead of producing a clean assertion
failure naming the actual problem.

The semantic tests above are protected because parseMcpBudgetSection
short-circuits to a typed-falsy record on null input. The
markdownlint tests bypassed that guard since they need raw section
text, not parsed flags. Added `assert.ok(section, ...)` preconditions
to both so a missing section produces a meaningful failure message.

No content changes; defensive programming only.

Verification: 12/12 still pass.
2026-05-02 15:24:26 -04:00
Tom Boucher
e1d661ece0 feat(#3024): dynamic routing with failure-tier escalation (#3031)
* feat(#3024): dynamic routing with failure-tier escalation

Adds a `dynamic_routing` block to .planning/config.json that lets
the resolver start agents on a cheap tier and escalate one tier up
when the orchestrator detects a soft failure (verification
inconclusive, plan-check FLAG, etc.). Solves the "pay Opus rates as
insurance" anti-pattern by making escalation observed-quality-driven.

Architecture:
- AGENT_DEFAULT_TIERS map (light/standard/heavy) — every agent in
  MODEL_PROFILES declares a default tier; tests assert coverage
  so adding a new agent without updating the map fails CI.
- nextTier(currentTier) helper — light → standard → heavy → heavy
  (heavy stays at heavy; can't go further).
- resolveModelForTier(cwd, agentType, attempt) — new resolver. The
  orchestrator tracks the attempt counter and passes 0 for the
  first spawn, 1+ on escalation. The resolver caps internally at
  max_escalations so the orchestrator can blindly bump the counter.
- Schema validation: dynamic_routing.enabled / escalate_on_failure /
  max_escalations / tier_models.<light|standard|heavy>. Unknown
  tiers and unknown sub-keys rejected at config-set time.
- SDK schema mirror updated to keep CJS/SDK in lockstep (#2653).

Resolution precedence (highest → lowest):
  1. model_overrides[<agent>]              (full IDs accepted)
  2. dynamic_routing.tier_models[<tier>]   (NEW; escalation-aware)
  3. models[<phase_type>]                  (#3023 phase-type map)
  4. model_profile                         (per-agent column)
  5. Runtime default

Backward compatibility: dynamic_routing is disabled by default
(enabled: false or block omitted). resolveModelForTier short-
circuits to resolveModelInternal in that case, so callers can
adopt unconditionally without breaking existing behavior.
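
Illustrative sketch of the tier helper and attempt capping. This is not the
core.cjs implementation: the real resolveModelForTier takes
(cwd, agentType, attempt) and reads .planning/config.json, whereas this
hypothetical resolveTierModel receives the parsed config and the agent's
default tier directly. The tier_models values used with it are placeholders.

```javascript
// Hypothetical sketch; only nextTier's name and behavior come from the commit.
const TIERS = ['light', 'standard', 'heavy'];

// light -> standard -> heavy -> heavy; null on invalid input.
function nextTier(currentTier) {
  const i = TIERS.indexOf(currentTier);
  if (i === -1) return null;
  return TIERS[Math.min(i + 1, TIERS.length - 1)];
}

// Returns the model for the default tier, escalated once per attempt and
// capped at max_escalations (default 1). Returns null when dynamic routing
// is disabled so the caller can fall through to the existing resolver.
function resolveTierModel(config, defaultTier, attempt) {
  const dr = config.dynamic_routing;
  if (!dr || dr.enabled !== true) return null;
  const max = Number.isInteger(dr.max_escalations) ? dr.max_escalations : 1;
  const steps = Math.min(Math.max(attempt, 0), max);
  let tier = defaultTier;
  for (let i = 0; i < steps; i++) tier = nextTier(tier);
  return (dr.tier_models && dr.tier_models[tier]) || null;
}
```

Capping inside the resolver is what allows the orchestrator to bump the
attempt counter blindly on every retry.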

This PR delivers the JS-layer infrastructure: schema + tier map +
resolver. Orchestrator adoption (workflow markdown updates that
detect soft failures and call resolveModelForTier with attempt+1)
is incremental follow-up — verifier / plan-checker / integration-
checker each adopt the protocol when ready.

Tests (23 cases, all structural-IR — no stdout grep):
- Schema invariants: AGENT_DEFAULT_TIERS coverage, VALID_AGENT_TIERS
  exact match, every assignment uses a valid tier
- nextTier helper: light→standard→heavy→heavy, null on invalid input
- Disabled mode: no block + enabled:false both no-op (back-compat)
- Enabled mode: attempt=0 returns default tier model, attempt=1
  escalates, beyond max_escalations caps, heavy agents stay heavy,
  default max_escalations=1 when omitted
- Precedence: per-agent override beats dynamic_routing,
  dynamic_routing beats phase-type models
- Validation: every settings key accepted, unknown tiers/sub-keys
  rejected, bare `dynamic_routing` rejected as config-set target

Documentation:
- get-shit-done/references/model-profiles.md — full reference section
- docs/CONFIGURATION.md — full settings table + escalation flow
- docs/USER-GUIDE.md — task-oriented "Cheap-by-default" section
- docs/FEATURES.md — config row cross-link

Verification:
- 23/23 pass on regression test
- 6843/6843 full suite (23 net new from 6820)
- lint-no-source-grep clean (376 test files)
- SDK schema mirror keeps CJS/SDK in sync per #2653 parity test

Closes #3024

* fix(#3024): honor escalate_on_failure:false + 3 CR follow-ups

CodeRabbit on PR #3031 (4 findings — 1 Major + 2 Minor + 1 Nitpick):

1. **Major (inline)** — get-shit-done/bin/lib/core.cjs:1668
   resolveModelForTier ignored dynamic_routing.escalate_on_failure.
   When the user set it to false, escalation should be disabled, but
   the resolver only checked attempt/max_escalations. An orchestrator
   that always passes attempt+1 on retry would silently escalate
   despite the user opting out.
   Fix: gate effectiveAttempt on `dr.escalate_on_failure !== false`
   so false short-circuits every attempt back to the default tier.

2. **Minor (inline)** — docs/CONFIGURATION.md:123-126
   The dynamic_routing rows in the Core Settings table had 4 cells
   instead of 5 (missing the Options column), breaking the table
   structure. Added explicit Options values for enabled / escalate_on_failure
   / max_escalations rows.

3. **Minor (outside-diff)** — references/model-profiles.md:179-195
   "Resolution Logic" sketch was pre-#3024 and didn't include
   dynamic_routing in the precedence ladder. Updated to a 6-step
   block with dynamic_routing at step 3 (between override and
   phase-type).

4. **Nitpick** — tests/feat-3024-dynamic-routing.test.cjs:189+
   Tests used `if (lightAgent) { ... }` guards that silent-pass
   when AGENT_DEFAULT_TIERS drifts. Replaced all 5 conditional
   skips with `assert.ok(lightAgent, '...')` preconditions so a
   tier-mapping change surfaces as a test failure.

Plus: 2 new regression tests for the Major fix:
- escalate_on_failure:false caps every attempt at default tier
- escalate_on_failure:true (explicit) still escalates normally
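
Illustrative sketch of the gate from finding 1. The name effectiveAttempt and
the `!== false` check come from the text above; the standalone function and
its signature are hypothetical.

```javascript
// Hypothetical sketch; `dr` is the parsed dynamic_routing block.
function effectiveAttempt(dr, attempt) {
  // Only an explicit false disables escalation; an omitted key leaves it on.
  if (dr.escalate_on_failure === false) return 0;
  const max = Number.isInteger(dr.max_escalations) ? dr.max_escalations : 1;
  return Math.min(Math.max(attempt, 0), max);
}
```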

Verification:
- 25/25 pass on regression test (23 prior + 2 escalate_on_failure)
- 6845/6845 full suite (2 net new)
- lint-no-source-grep clean

* docs(#3024): align precedence + add fence language tags (CR follow-up)

CodeRabbit (3 minor):

1. docs/CONFIGURATION.md:691 — "Per-Phase-Type Models → Resolution
   precedence" was a 4-step block written pre-#3024; readers got
   contradictory rules between the per-phase-type section and the
   later dynamic_routing section. Updated to the same 5-step ladder
   with dynamic_routing at step 2, and noted that dynamic_routing
   is disabled by default so this section's behavior is unchanged
   when the kill-switch is off.

2. docs/CONFIGURATION.md:770 — escalation-flow code fence missing
   language tag (MD040). Added `text`.

3. references/model-profiles.md:184 — resolution-ladder code fence
   missing language tag (MD040). Added `text`.

No code changes; docs only. Verification: regression test still 25/25.

* docs(#3024): clarify precedence prose — five layers, not four (CR nitpick)

CodeRabbit nitpick: the "Per-Phase-Type Models → Resolution
precedence" prose said "The four layers compose..." but the ladder
above lists five (including Runtime default). Also "dynamic_routing
escalates per-attempt above all of them" misreads as suggesting
dynamic_routing wins over model_overrides — actually overrides still
win at step 1.

Reworded top-down so the precedence direction is unambiguous:
  - model_profile = base
  - models = phase-level override
  - dynamic_routing = per-attempt escalation
  - model_overrides = per-agent exception (top)
  - runtime default = fallback

No code changes; docs only.

* docs(#3024): note escalate_on_failure:false in escalation-flow diagram (CR)

CodeRabbit nitpick: the escalation-flow diagram in
docs/CONFIGURATION.md described the soft-failure → respawn →
tier_models[next_tier_up] path, but didn't surface the
`dynamic_routing.escalate_on_failure: false` kill-switch right next
to it. Users reading the flow diagram (which is the canonical place
to understand attempt behavior) wouldn't see that the kill-switch
overrides the soft-failure branch.

Added a one-paragraph note immediately after the flow listing,
before the tier-sequence example, so the kill-switch is visible
exactly where users decide whether escalation will happen.

No code changes; docs only.
2026-05-02 14:26:35 -04:00
Tom Boucher
d812c66020 feat(#3023): per-phase-type model map in .planning/config.json (#3030)
* feat(#3023): per-phase-type model map in .planning/config.json

Adds a new `models` block to .planning/config.json with six phase-type
slots (planning / discuss / research / execution / verification /
completion). Lets users express coarse tuning ("Opus for planning,
Sonnet for the rest") without learning the agent taxonomy.

Resolution precedence (highest → lowest):
  1. Per-agent `model_overrides[agent]`     (full IDs; targeted exception)
  2. Phase-type `models[phase_type]`         (NEW; tier alias)
  3. Profile table (`model_profile`)          (per-agent column)
  4. Runtime default

The three layers compose: `models` defaults a phase, `model_overrides`
carves an exception. Phase-type values are tier aliases (opus/sonnet/
haiku/inherit) so the runtime-resolution chain (#2517) stays correct
end-to-end without further branching.

Implementation:
- model-profiles.cjs: new AGENT_TO_PHASE_TYPE map + VALID_PHASE_TYPES
  set. Each agent in MODEL_PROFILES gets one phase-type assignment;
  tests assert coverage so adding a new agent without updating the
  table fails CI.
- core.cjs (resolveModelInternal): inserts phase-type tier lookup
  between per-agent override and profile-derived tier. Skips runtime
  resolution when the resolved tier is 'inherit' (was previously gated
  only on profile === 'inherit'; phase-type can now produce inherit
  independently).
- core.cjs (loadConfig): pass `parsed.models` through both code paths
  so resolveModelInternal can read it.
- config-schema.cjs + sdk/src/query/config-schema.ts: dynamic-pattern
  validator accepts only the six known phase-types; unknown slots
  rejected at config-set time.

Backward compat: configs without `models` behave exactly as today.
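
Illustrative sketch of the precedence ladder above. This is not
resolveModelInternal: the function name, its parameters, and the maps passed
in are hypothetical stand-ins for AGENT_TO_PHASE_TYPE and the profile table.

```javascript
// Hypothetical sketch of the four-step precedence.
const VALID_TIERS = new Set(['opus', 'sonnet', 'haiku', 'inherit']);

function resolveTier(config, agent, agentToPhaseType, profileTier) {
  // 1. Per-agent override wins (may be a full model ID).
  const override = config.model_overrides && config.model_overrides[agent];
  if (override) return override;
  // 2. Phase-type slot, honored only when it is a known tier alias.
  const phaseType = agentToPhaseType[agent];
  const phaseTier = config.models && config.models[phaseType];
  if (VALID_TIERS.has(phaseTier)) return phaseTier;
  // 3. Profile-derived tier; undefined stands in for the runtime default (4).
  return profileTier;
}
```

With `models: { execution: 'opus' }` and a profile tier of sonnet, an agent
mapped to the execution phase resolves to opus, and a config without `models`
resolves exactly as before.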

Tests (15 cases, all structural-IR — no stdout grep):
- Schema: AGENT_TO_PHASE_TYPE coverage, VALID_PHASE_TYPES exact match
- Resolver: phase-type alone; per-agent override beats phase-type;
  phase-type beats profile; issue's full example; "inherit"; empty
  block is no-op; no block is no-op
- Validation: each of the 6 slots accepted; unknown slot rejected;
  bare `models` (no slot) rejected

Verification:
- 15/15 pass on new regression test
- 6808/6808 full suite (5 net new), 0 fail
- lint-no-source-grep clean across 375 test files

Closes #3023

* docs(#3023): document `models` per-phase-type config in user-facing docs

Adds `models` block coverage to the three user-facing docs that ship
with each release:

1. docs/CONFIGURATION.md
   - New "Per-Phase-Type Models" section between "Per-Agent Overrides"
     and "Non-Claude Runtimes" with:
       * full example mixing models + model_overrides
       * phase-type → agent mapping table
       * resolution-precedence pseudocode
       * accepted values (tier alias only)
       * "When to use which" decision matrix
       * validation behavior + example error
   - Added `"models": {}` to the Full Schema snippet
   - Added a row for `models.<phase_type>` to the config keys table
     (next to model_profile_overrides for adjacency)

2. docs/FEATURES.md
   - Added a row for models.<phase_type> in the Configurable Settings
     table (right under model_profile)
   - Cross-link to CONFIGURATION.md for the full surface

3. docs/USER-GUIDE.md
   - New task-oriented "Tuning model cost by phase" section above
     "Using Non-Claude Runtimes" — leads with the concrete config
     and shows the override pattern (one-shot phase + targeted exception)
   - Cross-link to CONFIGURATION.md

Verification:
- 29/29 pass on config-schema-docs-parity + docs-update + new feature
  test (parity-check passes, so the config-schema entry I added in the
  feature commit is now matched by a docs row)
- 6808/6808 full suite pass
- lint-no-source-grep clean

Doc style follows the same pattern used by the existing model_profile,
model_overrides, and model_profile_overrides sections — example-led,
table-backed, cross-referenced. Each doc surfaces the feature at the
right depth (reference / settings table / task guide).

* fix(#3023): mirror phase-type tier in resolveReasoningEffortInternal (CR Major)

CodeRabbit caught a real Codex correctness bug + 3 minor docs/test issues:

1. **Major (outside-diff)** — resolveReasoningEffortInternal in core.cjs
   derived its tier exclusively from the profile table, ignoring the
   models.<phase_type> override added in #3023. Failure mode on Codex:

     Config: model_profile=balanced, models.execution=opus, agent=gsd-executor
     resolveModelInternal:           tier=opus    → gpt-5.4
     resolveReasoningEffortInternal: tier=sonnet  → reasoning_effort=medium
                                                     ↑
                                                  WRONG — should be xhigh
                                                  (opus tier on Codex)

   The runtime received a mismatched (model, effort) pair. Mirrored the
   phase-type lookup from resolveModelInternal so both functions derive
   from the same tier source. 'inherit' phase-type returns null effort
   (no runtime entry maps to 'inherit'; let runtime decide).

2. Minor — .changeset/per-phase-type-models.md `pr: TBD` → `pr: 3030`.

3. Minor (outside-diff) — model-profiles.md "Resolution Logic" section
   omitted the new phase-type tier. Updated the 4-step block to a 5-step
   block including `models[phase_type]` between override and profile,
   plus a paragraph noting that `model` and `reasoning_effort` derive
   from the same tier source.

4. Nitpick — added 2 typo-safety tests:
   - models.research = "haiku3" (typo) → falls through to profile
   - models.research = "openai/gpt-5" (full ID) → falls through to profile
   Plus 5 new reasoning_effort tests covering the Major fix:
   - exported correctly
   - phase-type override flips both model AND effort to same tier
   - inherit phase-type returns null effort
   - per-agent override still bypasses phase-type for effort
   - claude runtime ignores models.* (no effort propagation)

Verification:
- 24/24 pass on regression test (15 original + 2 typo-safety + 5 effort + 2 outside-diff related)
- 6815/6815 full suite (7 net new from 6808)
- lint-no-source-grep clean

The reasoning_effort tests are written semantically (phase-type override
must produce the SAME effort as a profile-only opus config) rather than
hard-coding tier-specific effort strings, so changes to the runtime tier
map don't break them.

* fix(#3023): phase-type override beats profile=inherit (CR Major round 2)

CodeRabbit caught another precedence inversion: when
  { model_profile: 'inherit', models: { execution: 'opus' } }
both resolvers short-circuited on `profile === 'inherit'` BEFORE the
phase-type override could be honored. Result: model returned 'inherit'
and reasoning_effort returned null — both contradicting the documented
precedence where models[phase_type] wins over model_profile.

Fix in resolveModelInternal:
- Compute tier from phase-type FIRST. If phase-type is a valid alias,
  it wins. Otherwise, fall back to profile-derived tier OR 'inherit'
  (when profile === 'inherit').
- Gate the runtime-resolution branch on `tier !== 'inherit'` (was
  `profile !== 'inherit'`) so phase-type=opus can flip runtime mapping
  on even when profile=inherit.
- Gate the inherit-return on `tier === 'inherit'` (was
  `profile === 'inherit'`).

Fix in resolveReasoningEffortInternal:
- Remove the `if (profile === 'inherit') return null;` early-return.
- Compute tier from phase-type first, fall back to profile. If
  phase-type is explicitly 'inherit' OR the resolved tier is 'inherit',
  return null (no runtime entry maps to inherit).
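
Illustrative sketch of the corrected ordering. Function names, parameters,
and the effort map are hypothetical; only the rule (phase-type first, gate on
the resolved tier rather than on the profile) comes from the text above.

```javascript
// Hypothetical sketch; not the core.cjs implementation.
const TIER_ALIASES = new Set(['opus', 'sonnet', 'haiku', 'inherit']);

function resolveTierWithInherit(profile, phaseTier, profileTierForAgent) {
  // Phase-type is consulted first, so it can override profile === 'inherit'.
  if (TIER_ALIASES.has(phaseTier)) return phaseTier;
  return profile === 'inherit' ? 'inherit' : profileTierForAgent;
}

// Model and reasoning effort derive from the same tier; 'inherit' yields
// no effort so the runtime decides.
function resolveEffort(tier, effortByTier) {
  return tier === 'inherit' ? null : (effortByTier[tier] ?? null);
}
```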

Tests added (5 new):
- model: phase-type wins over profile=inherit (with explicit opus, with
  haiku for one phase + planner-without-slot still inheriting)
- model: profile=inherit + no models block → all agents inherit (no
  regression on existing inherit semantics)
- model: profile=inherit + models block but agent has no slot → that
  agent inherits, agents with slots get phase-type tier
- effort: phase-type opus + profile=inherit → produces opus-tier
  effort, NOT null (the original bug)

Verification:
- 27/27 pass on regression test (24 prior + 3 model + 1 effort)
- 6820/6820 full suite (5 net new)
- lint-no-source-grep clean

The effort test reads the expected value by running a profile-only opus
config and comparing — semantic check, not hard-coded effort string. So
runtime tier map changes don't break the test.
2026-05-02 13:19:15 -04:00
Tom Boucher
c9f5b7daac fix(#3020): probe user shell PATH at install-time, not just process.env.PATH (#3028)
* fix(#3020): probe user shell PATH at install-time, not just process.env.PATH

The installer's "✓ GSD SDK ready" message was a false positive whenever
the install subprocess's process.env.PATH contained the gsd-sdk shim
but the user's later interactive shells did not. Three known sources
of mismatch on POSIX:

- ~/.local/bin: install subprocess inherits npm/npx-injected PATH;
  user's login shell may not add ~/.local/bin if .profile/.bashrc/
  .zshrc don't.
- nvm/fnm/volta: node version managers shim PATH per-shell, so
  `npm prefix -g` from inside the install subprocess can resolve to
  a different bin dir than the user's interactive shell sees.
- npm-prefix tooling: some installers inject extra PATH entries that
  vanish in fresh sessions.

Result reported on #3011 by @x0rk and @stefanoginella: install prints
✓, but every workflow invocation later fails with
"bash: gsd-sdk: command not found".

Fix:

- isGsdSdkOnPath(pathString?) — now accepts an explicit PATH string.
  Zero-arg form preserves existing behavior (reads process.env.PATH).
  Pure walk, no spawn. Lets callers verify against any PATH source.

- getUserShellPath() — new helper. Probes the user's login shell via
  `$SHELL -lc 'printf %s "$PATH"'` (POSIX). 2-second timeout so a
  misconfigured rc file can't hang the install. Returns null on
  Windows (cross-shell PATH probing requires a different strategy
  per Git Bash / PowerShell / cmd.exe — tracked separately) or when
  the probe fails; callers fall back to process.env.PATH in that case.

- installSdkIfNeeded() — after the existing isGsdSdkOnPath() check
  passes, also verify the shim is reachable from getUserShellPath()
  on POSIX. If install-PATH and user-shell-PATH disagree, downgrade
  to the actionable ⚠ diagnostic from PR #3014 (which has the shim
  location, shell-specific PATH-update commands, and an npx fallback
  note). Routing affected users into PR #3014's diagnostic is the
  point — not silently green-then-red.
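
Illustrative sketch of the two helpers. The function names, the probe
command, and the 2-second timeout come from the text above; the shim
filename check, the executable test via fs.accessSync, and the use of
execFileSync are assumptions rather than the actual bin/install.js code.

```javascript
const fs = require('node:fs');
const path = require('node:path');
const { execFileSync } = require('node:child_process');

// Pure PATH walk, no spawn. Non-string input falls back to process.env.PATH.
function isGsdSdkOnPath(pathString) {
  const p = typeof pathString === 'string' ? pathString : (process.env.PATH || '');
  for (const dir of p.split(path.delimiter)) {
    if (!dir) continue;
    try {
      // Assumption: the shim is a file named gsd-sdk with the execute bit set.
      fs.accessSync(path.join(dir, 'gsd-sdk'), fs.constants.X_OK);
      return true;
    } catch {
      // Not in this directory; keep walking.
    }
  }
  return false;
}

// Probes the login shell's PATH. Returns null on Windows, when $SHELL is
// unset, or when the probe fails or times out.
function getUserShellPath() {
  if (process.platform === 'win32' || !process.env.SHELL) return null;
  try {
    const out = execFileSync(process.env.SHELL, ['-lc', 'printf %s "$PATH"'], {
      encoding: 'utf8',
      timeout: 2000,
      stdio: ['ignore', 'pipe', 'ignore'],
    });
    return out.trim() || null;
  } catch {
    return null;
  }
}
```

A caller would treat the install as verified only when the shim is found on
both process.env.PATH and the probed login-shell PATH, skipping the second
check when the probe returns null.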

Tests:
- bug-3020-install-shell-path-probe.test.cjs (10 tests, structural):
  - isGsdSdkOnPath accepts an explicit PATH (true/false on fixture
    PATH dirs with/without an executable shim)
  - zero-arg form returns a boolean
  - empty string PATH → false
  - getUserShellPath returns string-or-null
  - returns null on Windows
  - returns null when $SHELL unset on POSIX
  - cross-shell mismatch detection: install-PATH and user-PATH that
    differ produce different isGsdSdkOnPath results — the invariant
    the install-time check now exploits
- All assertions on structural records, not console output. Adheres
  to typed-IR / CONTRIBUTING.md "Prohibited: Raw Text Matching".

Verification:
- 10/10 pass on new regression test
- 6768/6768 pass on full suite (5 net-new tests)
- lint-no-source-grep clean

Windows cross-shell coverage (gsd-sdk.cmd resolves under PowerShell
but not Git Bash without a no-extension sibling) is tracked separately
— this PR is the POSIX-side fix and the Windows scaffolding (the
optional pathString arg on isGsdSdkOnPath) that a Windows fix can
build on.

Closes #3020

* fix(#3020): type-guard pathString, last-line PATH parse (CR)

CodeRabbit on PR #3028 (4 findings — 3 actionable + 1 nitpick):

1. .changeset/install-shell-path-probe.md (2 findings):
   - `pr: TBD` → `pr: 3028`
   - Doc said `echo $PATH` but impl uses `printf %s "$PATH"` (chosen
     to avoid shell-dependent echo behavior, e.g. interpreting `-n`).
     Aligned changeset prose with implementation.

2. bin/install.js:9176 — isGsdSdkOnPath(pathString) used
   `pathString !== undefined` to gate the explicit-PATH branch, but
   getUserShellPath() can return null and `null.split()` throws.
   Tightened to `typeof pathString === 'string'` so null / number /
   object inputs fall back to process.env.PATH. Added 2 regression
   tests covering the null and non-string cases.

3. bin/install.js:9232 — getUserShellPath trimmed entire stdout. A
   misconfigured rc file that prints a banner / motd / log line
   BEFORE the printf would pollute the result and incorrectly flip
   the cross-shell check to false. Take the LAST non-empty line
   (PATH itself is single-line) so noise can't hijack the probe.

4. Nitpick: the changeset PR placeholder — covered by (1).
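
Illustrative sketch of the last-line parse from finding 3. The helper name is
hypothetical; the rule (take the last non-empty line of the probe's stdout)
comes from the text above.

```javascript
// Hypothetical helper: rc-file banners printed before the printf output
// are ignored because only the final non-empty line is returned.
function parseProbeOutput(stdout) {
  const lines = String(stdout)
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter(Boolean);
  return lines.length > 0 ? lines[lines.length - 1] : null;
}
```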

Verification: 12/12 pass on regression test (10 original + 2 new
type-guard tests), 6770/6770 full suite, lint clean.

* docs(#3020): JSDoc references printf %s "$PATH", not echo $PATH (CR)

CodeRabbit caught two stale JSDoc references that still said
`$SHELL -lc 'echo $PATH'` while the implementation uses
`$SHELL -lc 'printf %s "$PATH"'`. echo is undesirable here because:

- POSIX echo's behavior with `-n` / backslash escapes varies across
  shells (bash builtin vs /bin/echo vs zsh) and can introduce
  trailing-newline pollution that the per-line trim now papers over.
- printf is portable and emits exactly the bytes given.

Synced both stale doc strings:
  - bin/install.js:9211 (getUserShellPath JSDoc)
  - tests/bug-3020-install-shell-path-probe.test.cjs:27 (header)

No behavior change — implementation already uses printf.
2026-05-02 11:45:39 -04:00
Tom Boucher
6df9b44297 fix(#3018): codex adapter must stop and ask, not silently default decisions (#3027)
* fix(#3018): codex adapter must stop and ask, not silently default decisions

@jon-hendry: running `$gsd-discuss-phase 81` in Codex Default mode
proceeded toward writing CONTEXT.md / DISCUSSION-LOG.md / checkpoint
artifacts without surfacing the discussion questions to the user. The
generated Codex skill adapter explicitly told it to do that:

  Execute mode fallback:
  - When `request_user_input` is rejected (Execute mode), present a
    plain-text numbered list and pick a reasonable default.

That instruction is wrong for any workflow whose contract is to
discuss with the user (most prominently `$gsd-discuss-phase`). The
fallback now requires the agent to:

1. STOP. Present the questions as a plain-text numbered list, then
   wait for the user's reply.
2. Only proceed without a user answer when one of these is true:
   (a) invocation included --auto / --all,
   (b) user explicitly approved a default for this question, or
   (c) workflow's documented contract permits autonomous defaults.
3. Do NOT write CONTEXT.md, DISCUSSION-LOG.md, PLAN.md, or checkpoint
   files until the user has answered or one of (a)-(c) above applies.

Tests:
- bug-3018-codex-discuss-fallback.test.cjs (5 tests, structural-IR):
  parses the generated header into sections via regex, asserts on
  the Execute-mode-fallback section's content (must contain stop/
  wait + plain-text directives, must NOT contain "pick a reasonable
  default", must name a permission path, must forbid artifact
  writing). No raw text snapshot — the assertions describe the
  behavioral invariant, so prose can be reworded without test churn.
- codex-config.test.cjs:128 still passes — section still mentions
  "Execute mode" as required.

Verification:
- 5/5 pass on new regression test
- 116/116 pass on bug-3018 + codex-config combined
- 6763/6763 pass on full suite
- lint-no-source-grep clean

Closes #3018

* test(#3018): parse fallback into typed semantic-flag record (CR)

CodeRabbit nitpick on PR #3027: the regression tests grepped the
generated header prose with regex, which is brittle and tests wording
rather than semantics, contrary to the CONTRIBUTING.md "no-source-grep" standard.

Refactored to a structural-IR shape:

- New `parseExecuteModeFallback(section)` walks the section text once
  and returns a typed record:
    {
      ok, sectionLength,
      instructsStop,                          // STOP/HALT/WAIT directive
      presentsPlainTextQuestions,             // plain-text / numbered list
      namesPermissionPath,                    // --auto / --all / explicit approval
      forbidsWritingArtifactsBeforeAnswer,    // write-ban + named artifact class
      silentlyPicksDefaults,                  // anti-pattern guard (must be false)
    }
- Each positive invariant gets its own test asserting on the parsed
  boolean, so a failure points at the exact invariant that broke.
- A final test does a single assert.deepStrictEqual against the full
  expected contract — gives a structured diff when any flag flips.
- The artifact-write ban now requires BOTH a "do not write" intent
  AND a named artifact class (was a single broad regex), so generic
  "do not write" prose elsewhere in the section can't satisfy it.

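One way the typed record above could be produced, as a rough sketch.
The field names come from this commit message; the individual patterns
are assumptions chosen for illustration and are probably not the ones
the real parseExecuteModeFallback uses.

```javascript
// Illustrative sketch only: the patterns below are assumed, not copied from the test file.
function parseExecuteModeFallback(section) {
  const text = typeof section === 'string' ? section : '';
  // The write ban needs BOTH an intent and a named artifact class.
  const hasWriteBan = /\bdo not write\b/i.test(text);
  const namesArtifact = /\b(CONTEXT|DISCUSSION-LOG|PLAN)\.md\b|\bcheckpoint\b/i.test(text);
  return {
    ok: text.length > 0,
    sectionLength: text.length,
    instructsStop: /\b(stop|halt|wait)\b/i.test(text),
    presentsPlainTextQuestions: /plain-text|numbered list/i.test(text),
    namesPermissionPath: /--auto|--all|explicitly approved/i.test(text),
    forbidsWritingArtifactsBeforeAnswer: hasWriteBan && namesArtifact,
    silentlyPicksDefaults: /pick a reasonable default/i.test(text),
  };
}

const fixedSection = [
  'STOP. Present the questions as a plain-text numbered list, then wait.',
  'Proceed only with --auto / --all or when the user explicitly approved a default.',
  'Do NOT write CONTEXT.md until the user has answered.',
].join('\n');

console.log(parseExecuteModeFallback(fixedSection));
```

Asserting on the whole record with assert.deepStrictEqual gives a
per-flag diff when any invariant flips.
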
Verification: 8/8 pass; lint-no-source-grep clean.
2026-05-02 11:45:36 -04:00
Tom Boucher
e3b64b39f8 fix(#3019): query --help reaches handler instead of short-circuiting (#3026)
* fix(#3019): query --help reaches handler instead of short-circuiting to top-level usage

The query argv parser in sdk/src/cli.ts harvested -h/--help as a global
flag and main() short-circuited dispatch when args.help was true. Net
effect: every `gsd-sdk query <anything> --help` printed top-level USAGE
instead of contextual subcommand help. There was no path for users to
discover what arguments a query subcommand accepts — they had to trigger
"required" errors by trial and error.

Two-layer fix:

1. sdk/src/cli.ts (parseCliArgsQueryPermissive)
   - Push -h / --help onto queryArgv instead of consuming them silently,
     so the registered handler / gsd-tools.cjs fallback gets to interpret
     the flag and render contextual help.
   - Only honor the global help flag when there is NO real subcommand to
     dispatch to (i.e. queryArgv contains only help flags). Preserves
     `gsd-sdk query --help` → top-level USAGE while letting
     `gsd-sdk query phase add --help` reach the handler.

2. get-shit-done/bin/gsd-tools.cjs
   - Render top-level usage on --help / -h / -? / --usage instead of
     erroring with "Unknown flag". The discovery hint in the usage text
     points users at the working method (run without args → error names
     required arguments) and references #3019 for tracking subcommand-
     level help printers.
   - --version remains rejected (no discovery use-case).
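
The parser rule in layer 1 can be sketched roughly as follows. This is
a simplified stand-in, not the actual parseCliArgsQueryPermissive; the
function name parseQueryArgs and its return shape are assumptions.

```javascript
// Simplified illustration; the real parser in sdk/src/cli.ts handles more flags.
const HELP_FLAGS = new Set(['-h', '--help']);

// tokens: everything that follows `query` on the command line.
function parseQueryArgs(tokens) {
  // Help flags are kept in queryArgv, in their original position.
  const queryArgv = [...tokens];
  const sawHelp = queryArgv.some((tok) => HELP_FLAGS.has(tok));
  const hasSubcommand = queryArgv.some((tok) => !HELP_FLAGS.has(tok));
  // Global help applies only when there is nothing to dispatch to.
  return { queryArgv, help: sawHelp && !hasSubcommand };
}

console.log(parseQueryArgs(['--help']).help); // true
console.log(parseQueryArgs(['phase', 'add', '--help']).help); // false
```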

#1818 anti-hallucination invariant preserved: the destructive command
NEVER executes when --help is present. The new shape returns success:true
+ usage on stdout instead of the old success:false + error on stderr —
both satisfy "destructive command did not run", and the new shape also
restores discoverability.

Tests:
- sdk/src/cli.test.ts: 4 new vitest cases covering #3019 — query argv
  parser keeps --help with subcommand, parses -h short flag, preserves
  bare `query --help` top-level behavior, preserves --help position when
  intermixed with other query flags.
- tests/bug-3019-help-passthrough.test.cjs: 5 node:test cases on the
  fallback — bare gsd-tools (no args) errors with usage; --help renders
  usage on stdout exit 0; -h same; subcommand --help renders usage; usage
  hint mentions discovery method (without prose substring matching —
  parses into typed sections).
- tests/bug-1818-unknown-flags.test.cjs: rewritten to assert the new
  invariant ("destructive command did not run" + "usage was rendered")
  instead of the old shape ("--help is rejected with non-zero exit").
  Each destructive test seeds a sentinel artifact (phase dir, slug
  output) and asserts it survives.

Verification:
- 47/47 vitest pass on sdk/src/cli.test.ts
- 5/5 pass on tests/bug-3019-help-passthrough.test.cjs
- 8/8 pass on tests/bug-1818-unknown-flags.test.cjs (rewritten)
- 6763/6763 pass on full node:test suite
- lint-no-source-grep clean (0 violations)

Closes #3019

* fix(#3019): SDK fallback forwards plain-text help, broader usage list (CR)

CodeRabbit on PR #3026 (4 findings — 1 Major outside-diff, 2 inline,
1 nitpick):

1. **Major outside-diff** — sdk/src/cli.ts:442-454. The fallback path
   that delegates to gsd-tools.cjs called parseCliQueryJsonOutput
   (JSON.parse) on stdout. Now that gsd-tools renders plain-text usage
   on --help, JSON.parse threw "Unexpected token 'U'". Wrapped the
   parse in try/catch — on parse failure, forward the plain stdout
   verbatim so subcommand help reaches the user. Regression test:
   tests/bug-3019-help-passthrough.test.cjs spawns the built SDK and
   asserts `gsd-sdk query phase --help` exits 0, stdout contains the
   gsd-tools usage, and stderr does NOT contain a JSON-parse error.

2. .changeset/help-passthrough.md:3 — `pr: TBD` → `pr: 3026`.

3. gsd-tools.cjs:346 (TOP_LEVEL_USAGE):
   - Removed self-referencing `#3019` link (immediately stale after
     this PR merges).
   - Expanded Commands list from 17 → all 47 dispatcher cases:
     agent-skills, audit-open, audit-uat, check-commit, commit, …
     phase, phases, roadmap, milestone, validate, progress, intel,
     graphify, learnings, etc. — the bulk of the surface that was
     previously unreachable via --help discovery.

4. Nitpick: `isUsageOutput` was duplicated in bug-1818 and
   bug-3019-help-passthrough tests. Moved to tests/helpers.cjs with
   structural-comment, removed both duplicates.
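
The parse-or-forward behavior from finding 1, as a small sketch. The
function name and the { kind, value } result shape are hypothetical;
only the try/catch around JSON.parse reflects the described fix.

```javascript
// Hypothetical helper: try JSON first, otherwise pass the text through untouched.
function interpretFallbackStdout(stdout) {
  try {
    return { kind: 'json', value: JSON.parse(stdout) };
  } catch {
    // Plain-text usage (e.g. from --help) is not JSON; forward it verbatim.
    return { kind: 'text', value: stdout };
  }
}

console.log(interpretFallbackStdout('{"updated":true}').kind); // json
console.log(interpretFallbackStdout('Usage: gsd-tools <command>').kind); // text
```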

Verification: 47/47 vitest pass, 14/14 regression tests pass,
6764/6764 full suite, lint clean.

* test(#3019): use t.skip() instead of bare return when SDK not built (CR)

CodeRabbit follow-up on PR #3026:

The integration test guarded against missing sdk/dist/cli.js with a
bare `return;` — node:test counts that as a passing test (0 assertions
exercised, 0 failures). On a CI checkout that hasn't run the SDK build,
the #3026 regression test silently green-lit and no signal ever
surfaced that the integration check was skipped.

Switched to `t.skip(...)` via the test context parameter so the
omission shows up in the test report. The unit-level fix
(sdk/src/cli.ts) is still covered by vitest, so the skip only affects
the end-to-end spawn-built-SDK check.

Verification: 6/6 pass when SDK is built; 5 pass + 1 skip when not.
2026-05-02 11:45:33 -04:00
Tom Boucher
8e25eb6546 fix(#3017): codex SessionStart hook uses absolute node, not bare 'node' (#3022)
* fix(#3017): codex SessionStart hook uses absolute node, not bare 'node'

PR #3002 fixed #2979 for settings.json-based managed JS hooks (Claude
Code, Gemini, Antigravity) by routing through buildHookCommand() →
resolveNodeRunner(), emitting the absolute Node binary path so hooks
resolve under GUI/minimal-PATH runtimes (/usr/bin:/bin:/usr/sbin:/sbin)
where nvm/Homebrew/Volta-installed node is not on PATH.

The Codex install path bypassed both helpers — line 7935 of bin/install.js
wrote `command = "node ${path}"` directly into config.toml. So Codex
SessionStart hook still failed with exit 127 ("node: command not found")
under the same minimal-PATH conditions PR #3002 was meant to close.

Fix:
- Add buildCodexHookBlock(targetDir, { absoluteRunner, eol }) — a pure
  helper that emits the toml hook block with the absolute runner. Returns
  null when absoluteRunner is null so the caller skips registration with
  a warning instead of writing a broken bare-node hook.
- Add rewriteLegacyCodexHookBlock(content, absoluteRunner) — mirror of
  rewriteLegacyManagedNodeHookCommands for the toml surface, so
  reinstall migrates a 1.39.x bare-node config.toml to the absolute form.
  Uses basename equality (CODEX_MANAGED_HOOK_BASENAMES set) so user-
  authored bare-node hooks are left alone.
- Replace the inline string-concat at line 7935 with a call to the new
  helper, threaded with the detected line ending so CRLF files stay CRLF.
- On the codex reinstall path, call rewriteLegacyCodexHookBlock first so
  existing bare-node entries get migrated before the new entry is added.
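
A reduced sketch of the null-runner contract. The real
buildCodexHookBlock takes (targetDir, { absoluteRunner, eol }) and emits
a full toml block whose layout is not shown in this message, so only a
single `command = ...` line is modelled here; the function name and the
quoting of the runner are assumptions.

```javascript
// Hypothetical, reduced model of the helper: one toml key/value line only.
function buildCodexHookCommandLine(hookPath, { absoluteRunner, eol = '\n' }) {
  // No absolute runner: return null so the caller can warn and skip
  // registration instead of writing a bare-node hook.
  if (!absoluteRunner) return null;
  const command = `"${absoluteRunner}" "${hookPath}"`;
  // For ordinary paths, JSON.stringify produces a double-quoted string whose
  // escapes are also valid inside a toml basic string.
  return `command = ${JSON.stringify(command)}${eol}`;
}

console.log(buildCodexHookCommandLine('/h/gsd-hook.js', { absoluteRunner: null })); // null
console.log(buildCodexHookCommandLine('/h/gsd-hook.js', { absoluteRunner: process.execPath }));
```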

Tests:
- bug-3017-codex-hook-absolute-node.test.cjs (9 tests, all typed-IR):
  - buildCodexHookBlock emits absolute runner, parses to expected fields
  - returns null on missing runner (caller skips)
  - integrates with resolveNodeRunner() in the live process
  - rewriteLegacyCodexHookBlock migrates managed bare-node entries
  - leaves user-authored bare-node hooks alone (basename allowlist)
  - leaves entries with absolute runner unchanged (idempotent)
  - returns content unchanged when absoluteRunner is null
- codex-config.test.cjs e2e expectation updated to match new shape:
  parsed.hooks.SessionStart[0].hooks[0].command now equals
  '"<process.execPath>" "<hookPath>"' instead of 'node <hookPath>'.

Verification:
- 9/9 pass on the new regression test
- 179/179 pass across all codex-touching test files
- 6767/6767 pass on full suite, lint-no-source-grep clean
- Adheres to typed-IR / CONTRIBUTING.md "Prohibited: Raw Text Matching":
  parseCodexHookBlock returns a typed record; assertions are on
  structured fields (runner, hookPath, type, hasMarker), not stdout regex.

Closes #3017

* test(#3017): tighten runner assertions to exact process.execPath (CR)

CodeRabbit on PR #3022 (3 findings, 2 actionable + 1 nitpick):

1. .changeset/codex-bare-node-fix.md:3 — replace `pr: TBD` with
   `pr: 3022` so changeset metadata is traceable.

2. tests/bug-3017-codex-hook-absolute-node.test.cjs:81-146 — the test
   asserted `parsed.runner !== 'node'` and `parsed.runner.includes('/node')`,
   which would false-positive on any absolute path containing '/node'
   (e.g. /Users/x/notnode/foo). Tightened to compare against the EXACT
   absolute path supplied by the caller (after stripping toml + JSON
   escape layers via a new unescapeRunner() helper). The live-process
   integration test now compares against process.execPath exactly. The
   rewriteLegacyCodexHookBlock test also uses exact-equality.

3. Nitpick (skipped): use repository's TOML parser for parsing instead
   of bespoke regex. The hand-rolled parser is small, scoped, and
   fully tested by these structural assertions; pulling in a TOML lib
   for tests would create a circular dependency on the SUT (the
   installer's own parser). Leaving as-is.

Verification: 9/9 pass on regression test, 6767/6767 full suite, lint clean.
2026-05-02 11:45:30 -04:00
Tom Boucher
f2decefede fix(#3010): post-install message and docs use /gsd-update --reapply (#3012)
* fix(#3010): post-install message and docs use /gsd-update --reapply

PR #2824 consolidated 86 skills into ~58, removing the standalone
/gsd-reapply-patches command and folding it into a flag on /gsd-update
(/gsd-update --reapply). The 1.39.1 hotfix (#2954) updated help.md
but missed three other surfaces that still recommended the dead form:

1. bin/install.js reportLocalPatches() — runtime emitter shown after
   every install with backed-up patches. All branches updated:
   - claude/opencode/kilo/copilot: /gsd-update --reapply
   - gemini: /gsd:update --reapply
   - codex: $gsd-update --reapply
   - cursor: gsd-update --reapply (mention the skill name)

2. get-shit-done/workflows/update.md — Step 4 prose and the
   check_local_patches block both referenced /gsd-reapply-patches.
   Replaced with /gsd-update --reapply (with backticks around the
   command per CR feedback for copy/paste UX).

3. Localized docs (en/ja-JP/ko-KR/zh-CN) — 14 files across
   ARCHITECTURE.md / COMMANDS.md / FEATURES.md / INVENTORY.md /
   USER-GUIDE.md / manual-update.md still listed the removed command.

Tests:
- bug-3010-reapply-patches-references.test.cjs (4 tests): scans
  bin/install.js's reportLocalPatches body, every workflow file, and
  every doc (excluding CHANGELOG history and help.md's deprecation
  notice) for the removed command form, and verifies each runtime
  branch emits the consolidated form via captured console output.
- tests/copilot-install.test.cjs:1081-1115 — stale assertions that
  hard-coded the removed string updated to assert /gsd-update --reapply.

Verification: 115/115 pass across both files.

Co-authored-by: Patrick Clery <patrick@patrickclery.com>
Closes #3010

* test(#3010): broaden dead-command scan + tighten runtime exact-match

CodeRabbit follow-up findings on #3012:

1. Workflow + docs scans only matched "/gsd-reapply-patches", missing
   the gemini ("/gsd:reapply-patches") and codex ("$gsd-reapply-patches")
   spellings. A regression that re-introduced either form in localized
   docs would have passed silently. Extracted a DEAD_COMMAND_PATTERNS
   array + findDeadCommands() helper used by both scans, so all three
   removed forms are checked uniformly. Match output also reports which
   spellings hit, for faster diagnosis.

2. reportLocalPatches runtime test asserted output.includes('update --reapply'),
   which is too loose — a malformed prefix like '/gsd:update --reapply' on
   the claude branch would have passed. Replaced with an exact
   {runtime → expected token} map covering all 7 branches:
     claude/opencode/kilo/copilot → /gsd-update --reapply
     gemini → /gsd:update --reapply
     codex → $gsd-update --reapply
     cursor → gsd-update --reapply
   Negative assertion also runs DEAD_COMMAND_PATTERNS against output for
   every runtime, so dead forms can't slip in regardless of branch.
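
A possible shape for the shared scan helper, shown as a sketch. The
names DEAD_COMMAND_PATTERNS and findDeadCommands are from this commit;
the record layout and the exact regexes are assumptions.

```javascript
// Assumed layout: one entry per removed spelling.
const DEAD_COMMAND_PATTERNS = [
  { spelling: '/gsd-reapply-patches', pattern: /\/gsd-reapply-patches\b/ },
  { spelling: '/gsd:reapply-patches', pattern: /\/gsd:reapply-patches\b/ },
  { spelling: '$gsd-reapply-patches', pattern: /\$gsd-reapply-patches\b/ },
];

// Return the removed spellings that appear in the text (empty array when clean).
function findDeadCommands(text) {
  return DEAD_COMMAND_PATTERNS
    .filter(({ pattern }) => pattern.test(text))
    .map(({ spelling }) => spelling);
}

console.log(findDeadCommands('Run /gsd-update --reapply')); // []
console.log(findDeadCommands('Run $gsd-reapply-patches')); // [ '$gsd-reapply-patches' ]
```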

Verification: 4/4 pass on bug-3010-reapply-patches-references.test.cjs.

* test(#3010): add prefix-absence guard for cursor runtime (CR follow-up)

CodeRabbit (Minor): the cursor expected token "gsd-update --reapply" is
a substring of every prefixed form ("/gsd-update --reapply" for claude/
opencode/kilo/copilot, "$gsd-update --reapply" for codex). The positive
output.includes(expectedToken) check therefore can't distinguish correct
cursor output from a regression where the installer emits a prefixed
form for cursor — both pass the substring check.

Add an explicit prefix-absence assertion for cursor that fails if any
of /, $, or : appears immediately before "gsd-update --reapply" in
output. The gemini form ("/gsd:update --reapply") doesn't share the
substring (gsd:update vs gsd-update) so it's already caught by the
positive includes failing on cursor's expected bare token.
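
The prefix-absence guard could look roughly like this. The helper name
cursorOutputIsBare is hypothetical; the rule it encodes is the one
described above.

```javascript
// Matches the token when it is immediately preceded by /, $ or :
const PREFIXED_FORM = /[\/$:]gsd-update --reapply/;

// True only when the bare token is present and no prefixed form appears.
function cursorOutputIsBare(output) {
  return output.includes('gsd-update --reapply') && !PREFIXED_FORM.test(output);
}

console.log(cursorOutputIsBare('Run gsd-update --reapply to restore patches')); // true
console.log(cursorOutputIsBare('Run /gsd-update --reapply to restore patches')); // false
```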

Verification: 4/4 pass.

---------

Co-authored-by: Patrick Clery <patrick@patrickclery.com>
2026-05-02 09:38:34 -04:00
Tom Boucher
a4e5cc7c24 fix(#3011): actionable SDK-not-on-PATH diagnostic with shim location and shell-specific commands (#3014)
* fix(#3011): actionable SDK-not-on-PATH diagnostic with shim location and shell-specific commands

The previous diagnostic was a generic 'GSD SDK files are present but
gsd-sdk is not on your PATH' message with no concrete path or
shell-specific PATH-export command. Windows users reported that they
couldn't find where the shim was written and didn't know how to add
it to PATH for each shell (PowerShell vs cmd.exe vs Git Bash vs WSL
all read PATH from different sources).

New formatSdkPathDiagnostic({ shimDir, platform, runDir }) helper
returns a typed IR:
- shimLocationLine: explicit 'Shim written to: <path>'
- actionLines: platform-specific PATH-export commands
  - Windows: 3 lines (PowerShell, cmd.exe, Git Bash with backslash->/
    translation for bash compatibility)
  - POSIX: 1 line (export PATH=...)
- npxNoteLines: 'you're running via npx ... npm install -g instead'
  when runDir is under an _npx cache segment (where the shim may be
  written to a temp dir that won't persist for the user's interactive
  shell)
- isNpx, isWin32: structured booleans for assertions
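
The npx detection could be sketched as a path-segment check that
accepts both separator conventions. The helper name isNpxRunDir is an
assumption; the real formatSdkPathDiagnostic may detect this
differently.

```javascript
// Hypothetical helper: true when any path segment is exactly "_npx".
function isNpxRunDir(runDir) {
  if (typeof runDir !== 'string') return false;
  // Split on forward slashes and backslashes so POSIX and Windows paths both work.
  return runDir.split(/[\\/]+/).includes('_npx');
}

console.log(isNpxRunDir('/home/u/.npm/_npx/abc123/node_modules/pkg')); // true
console.log(isNpxRunDir('C:\\Users\\u\\AppData\\Local\\npm-cache\\_npx\\abc123')); // true
console.log(isNpxRunDir('/home/u/my_npx_tools')); // false
```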

Renderer in install.js just emits each line. Tests assert on the
typed IR fields directly (no source-grep, no console-output parsing).

Tests: 12 cases across 5 suites covering Windows shell flavors
(PowerShell preserves backslashes, Git Bash translates to forward),
POSIX exports, null-shimDir fallback to npm install -g advice, npx
detection on both path-separator conventions, and IR shape contract.

Closes #3011

* fix(#3011): cmd.exe guidance uses powershell -Command, not setx

CodeRabbit flagged the cmd.exe action line as a Major Windows
correctness bug:

  setx PATH "${shimDir}; %PATH%"

Two failure modes:
1. setx silently truncates the registry value above 1024 chars,
   permanently storing the truncated PATH and breaking applications
   until restored from the registry backup or fixed manually.
2. %PATH% expands to its current literal value at the moment setx
   runs, and the result is written as REG_SZ instead of REG_EXPAND_SZ.
   Lazy references like %SystemRoot% are baked in as literals, so
   future changes to those variables stop propagating.

Replace with the same SetEnvironmentVariable call already used for
the PowerShell line, invoked through `powershell -Command` so cmd.exe
users get a safe command without us recommending two different APIs.

* fix(#3011): escape shimDir for PowerShell, bash, and POSIX export

CodeRabbit (Minor): a Windows username with a single quote (e.g.
"C:\Users\O'Neil\AppData\Roaming\npm") would interpolate raw into the
suggested commands, producing unparseable shell input the user can't
fix without understanding the bug.

Each shell context needs a different escape:

- PowerShell single-quoted strings: '' is the literal-quote escape.
  Apply to both the PowerShell line and the cmd.exe line (which
  delegates to PowerShell).

- Git Bash, where the path lives inside an outer single-quoted echo:
  '\'' (close-quote, escaped-quote, reopen-quote) embeds a literal
  single quote. The slash-conversion (\ → /) still applies first.

- POSIX export (Linux/macOS) inside double quotes: escape \, $, ",
  and backtick so the path is copied verbatim. $PATH lives outside
  the escape and still expands at paste time.
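
The three escapes can be sketched as small string helpers. The function
names are hypothetical, and each returns only the escaped path; the
surrounding command text is not reproduced here.

```javascript
// PowerShell single-quoted string: a literal ' is written as ''.
function escapeForPowerShellSingleQuoted(p) {
  return p.replace(/'/g, "''");
}

// Git Bash single-quoted string: convert backslashes to forward slashes first,
// then write a literal ' as '\'' (close quote, escaped quote, reopen quote).
function escapeForBashSingleQuoted(p) {
  return p.replace(/\\/g, '/').replace(/'/g, "'\\''");
}

// POSIX double-quoted string: backslash-escape \, $, " and backtick.
function escapeForPosixDoubleQuoted(p) {
  return p.replace(/[\\$"`]/g, '\\$&');
}

const shimDir = "C:\\Users\\O'Neil\\AppData\\Roaming\\npm";
console.log(escapeForPowerShellSingleQuoted(shimDir));
console.log(escapeForBashSingleQuoted(shimDir));
console.log(escapeForPosixDoubleQuoted('/opt/my $dir/bin'));
```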

Regression test: bug-3011-sdk-path-diagnostic.test.cjs locks in the
expected escape sequence for all three shell flavors.
2026-05-02 09:30:58 -04:00
Tom Boucher
f55069ecbf test(#2974): migrate 8 test files to typed-IR assertions (#3016)
* test(#2974): migrate 8 test files to typed-IR assertions

Replaces raw stdout/stderr substring matching with structured-field
assertions per CONTRIBUTING.md "Prohibited: Raw Text Matching on Test
Outputs". Adds shared infrastructure for typed error emission so this
pattern is the easy path going forward.

Shared infrastructure:
- core.cjs: ERROR_REASON frozen enum + setJsonErrorMode/getJsonErrorMode
- gsd-tools.cjs: --json-errors CLI flag, parsed before subcommand dispatch
- config.cjs: typed reasons at all 7 error sites
- graphify.cjs: GRAPHIFY_REASON enum + reason/timeout_ms in execGraphify result
- bin/install.js: pure buildSdkFailFastReport() IR builder + renderer
- hooks/gsd-session-state.sh, gsd-phase-boundary.sh: emit Claude Code
  hookSpecificOutput JSON envelope with typed state_present/config_mode/
  planning_modified/file_path fields (no-op when hooks.community is off)

Test migrations (all pass, 171 tests across the 8 files):
- bug-2649-sdk-fail-fast: assert on ir.reason / ir.context / ir.fix_command
- bug-2687-config-read-warning-parity: assert.equal stderr === ''
- bug-2796-arg-parsing-regression: assert on result.json.updated/.phase
- bug-2838-summary-rescue: parse rescue footer, assert mtime invariant
- bug-2943-config-get-context-window: parse JSON, assert ERROR_REASON.CONFIG_KEY_NOT_FOUND
- graphify: assert reason === GRAPHIFY_REASON.ENOENT/TIMEOUT
- hooks-opt-in: parse hookSpecificOutput, assert typed fields
- security-scan: reclassified as source-text-is-the-product (scan label
  output and CI workflow YAML ARE the deployed contract)

Verification: lint-no-source-grep clean (0 violations), full suite
6741/6741 pass.

Closes #2974

* test(#2974): address CR feedback — typed code field, robust idempotency

Two CodeRabbit findings on #3016 addressed:

1. tests/hooks-opt-in.test.cjs:355 (Minor, inline) —
   parsed.reason.includes('Conventional Commits') was still substring
   matching after the typed-IR migration. Fixed at the source: the
   gsd-validate-commit hook now emits a typed `code` field
   ('CONVENTIONAL_COMMITS_VIOLATION', 'COMMIT_SUBJECT_TOO_LONG')
   alongside the human-readable `reason`. Test asserts strictEqual
   on the code; the prose copy is no longer part of the test contract.

2. tests/bug-2838-summary-rescue-gitignored-planning.test.cjs:224-250
   (Outside-diff) — mtimeMs alone can stay unchanged on coarse-grained
   filesystems (HFS+, FAT) when two rewrites land within the same
   timestamp tick, falsely passing the idempotency assertion.
   Replaced with a full snapshot (mtimeMs, ctimeMs, size, ino, sha256
   of contents) compared via assert.deepStrictEqual — the hash
   catches any rewrite the timestamp would miss.
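
A sketch of the full-snapshot comparison from finding 2. The helper
name snapshotFile is an assumption; the fields are the ones listed
above.

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');

// Capture metadata plus a content hash, so a rewrite that lands within the
// same timestamp tick still changes the snapshot when the bytes differ.
function snapshotFile(filePath) {
  const stat = fs.statSync(filePath);
  const sha256 = crypto
    .createHash('sha256')
    .update(fs.readFileSync(filePath))
    .digest('hex');
  return {
    mtimeMs: stat.mtimeMs,
    ctimeMs: stat.ctimeMs,
    size: stat.size,
    ino: stat.ino,
    sha256,
  };
}
```

Taking one snapshot before the second run and one after, then comparing
them with assert.deepStrictEqual, fails on any field that moved.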

Verification: 30/30 pass on the two affected files; lint-no-source-grep
clean (0 violations across 368 test files).
2026-05-02 09:27:23 -04:00
Tom Boucher
de25400b70 fix(#2979): emit absolute node path in managed hooks for GUI/minimal-PATH runtimes (#3002)
* fix(#2979): emit absolute node path in managed hooks for GUI/minimal-PATH runtimes

Installer-emitted hook commands started with bare 'node' which works
under interactive shells (nvm/Homebrew/Volta on PATH) but fails in
GUI-launched runtimes that start with /usr/bin:/bin:/usr/sbin:/sbin.
Every managed JS hook (gsd-check-update, gsd-statusline, gsd-context-monitor,
gsd-prompt-guard, gsd-read-guard, gsd-read-injection-scanner,
gsd-workflow-guard) failed with /bin/sh: node: command not found —
silently disabling update checks, statusline, and security guards.

Fix: new resolveNodeRunner() helper returns process.execPath (the
absolute path of the Node binary running the installer) forward-slash-
normalized and double-quoted. Used in:
  - buildHookCommand() for global installs (.js runner)
  - local-install code paths for all 7 managed JS hooks

.sh hooks keep bare 'bash' — /bin/bash is in the POSIX standard PATH
and always resolves under minimal-PATH GUI launches.

Tests: bug-2979-hook-absolute-node.test.cjs parses emitted commands
into { runner, hookPath } records and asserts:
  - resolveNodeRunner returns quoted absolute forward-slash node path
  - .js hooks emit absolute runner (default and portableHooks modes)
  - .sh hooks still emit bare 'bash'

Closes #2979

* chore(#2979): add changeset fragment for PR #3002

* chore(#2979): add changeset fragment for PR #3002

* fix(#2979): resolveNodeRunner returns null on missing execPath; rewrite legacy bare-node managed hooks (CR feedback)

CodeRabbit on PR #3002 caught two issues:

1. resolveNodeRunner fell back to bare 'node' when process.execPath was
   empty -- recreating the exact #2979 bug. Now returns null. Callers
   (buildHookCommand and the local-install code paths) check for null
   and skip registration rather than emit a broken command.

2. The original #2979 fix only updated NEWLY registered hooks. Existing
   bare-node managed hook entries from pre-#2979 installs stayed
   broken across reinstalls. New rewriteLegacyManagedNodeHookCommands
   walks settings.hooks and rewrites any managed-hook entry that starts
   with bare 'node ' to use the absolute runner. Filename allowlist
   (gsd-check-update.js, gsd-statusline.js, gsd-context-monitor.js,
   gsd-prompt-guard.js, gsd-read-guard.js, gsd-read-injection-scanner.js,
   gsd-workflow-guard.js) ensures user-authored bare-node hooks are
   left untouched.
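
A minimal sketch of the null-returning runner and a caller that
respects it. The execPath parameter is an assumption added so the
behavior can be exercised without mutating process.execPath, and the
hookPath-only signature of buildHookCommand is likewise simplified.

```javascript
// Quoted, forward-slash-normalized absolute node path, or null when unavailable.
function resolveNodeRunner(execPath = process.execPath) {
  if (typeof execPath !== 'string' || execPath.trim() === '') return null;
  return `"${execPath.replace(/\\/g, '/')}"`;
}

// Simplified caller: a null runner means "skip registration", never bare node.
function buildHookCommand(hookPath, execPath = process.execPath) {
  const runner = resolveNodeRunner(execPath);
  return runner === null ? null : `${runner} "${hookPath}"`;
}

console.log(resolveNodeRunner('C:\\nodejs\\node.exe')); // "C:/nodejs/node.exe"
console.log(buildHookCommand('/h/gsd-statusline.js', '')); // null
```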

Tests: bug-2979-hook-absolute-node.test.cjs grows by 8 cases:
- 5 for the migration walker (rewrites managed entries, leaves quoted-
  runner entries alone, leaves user-authored entries alone, leaves .sh
  entries alone, no-ops on null runner).
- 2 for resolveNodeRunner returning null on empty execPath.
- 1 for buildHookCommand returning null when execPath unavailable.

* chore(#3002): drop direct CHANGELOG.md edit; release entry now lives in .changeset/

The changeset-fragment workflow (#2975) renders fragments into
CHANGELOG.md at release time. Direct edits to [Unreleased] on
each PR caused merge conflicts on every concurrent PR. This commit
restores CHANGELOG.md to match origin/main; the release entry for
this fix is preserved in the .changeset/*.md fragment(s) on this
branch, which the release workflow consolidates.

* fix(#2979): guard hook + statusline pushes against null commands (CR follow-up)

CodeRabbit on PR #3002 found an outside-diff issue: when
resolveNodeRunner() returns null, every dependent *Command becomes
null, but the registration sites still pushed { type: 'command',
command: null } entries onto settings.hooks. The runtime's hook
schema rejects null commands and the failure surfaces as a confusing
parse error.

Fix:
- One unified warning at the top of configureSettings when ANY JS-hook
  command resolves null (operator sees the cause once instead of per-hook).
- Each of the 6 managed JS hook registration if-clauses now guards on
  the *Command variable being truthy: && updateCheckCommand,
  && contextMonitorCommand, && promptGuardCommand, && readGuardCommand,
  && readInjectionScannerCommand, && workflowGuardCommand.
- Statusline registration adds an else-if (!statuslineCommand) clause
  with its own warn before the settings.statusLine write site.

Tests: bug-2979-hook-absolute-node.test.cjs grows by 7 cases
(6 per-hook structural assertions parsing install.js for the
`fs.existsSync(<file>) && <command>` shape, plus 1 statusline
guard-precedes-write test).

* fix(#2979): defense-in-depth validateHookFields before writeSettings (CR)

CodeRabbit on PR #3002 (post-fix-up review): replace source-grep
structural tests with behavioral assertions on the settings object.

The push-site `&& <command>` guards (commit ce696c64) prevent null
commands from being pushed in the first place. As a defense-in-depth
backstop, install.js now runs validateHookFields(settings) right
before writeSettings(); validateHookFields already filters
{type:'command', command: null} entries (line 5884), so even if my
push-site guards ever regress, no null-command entries reach disk.
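
The backstop behavior can be illustrated with a stand-in filter. This
is not the real validateHookFields; the function name, the `prompt`
field on the agent hook, and the choice to drop an event key once it
has no entries left are assumptions.

```javascript
// Stand-in for the filtering described above; not the installer's implementation.
function stripNullCommandHooks(settings) {
  const cleaned = {};
  for (const [event, entries] of Object.entries(settings.hooks || {})) {
    const kept = entries
      .map((entry) => ({
        ...entry,
        // Remove command hooks whose command is not a string (e.g. null).
        hooks: (entry.hooks || []).filter(
          (hook) => !(hook.type === 'command' && typeof hook.command !== 'string')
        ),
      }))
      // Drop an entry entirely once none of its hooks survive.
      .filter((entry) => entry.hooks.length > 0);
    if (kept.length > 0) cleaned[event] = kept;
  }
  return { ...settings, hooks: cleaned };
}

const result = stripNullCommandHooks({
  hooks: {
    PreToolUse: [
      { matcher: 'Read', hooks: [{ type: 'command', command: null }, { type: 'agent', prompt: 'x' }] },
      { matcher: 'Write', hooks: [{ type: 'command', command: null }] },
    ],
  },
});
console.log(JSON.stringify(result));
```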

Tests: replaced the 7 install.js source-grep tests with 8 truly
behavioral tests:
- validateHookFields strips null-command entries for each of the 6
  managed JS hook shapes (parameterized by event + matcher)
- validateHookFields drops the entry entirely when all its hooks are
  null-command
- validateHookFields preserves agent-type hooks while stripping
  null-command sibling hooks in the same entry

These tests exercise the actual function the production code uses,
not its source representation. They survive future refactors of the
registration call sites.

* fix(#2979): tighten managed-hook migration to basename equality (CR)

CodeRabbit on PR #3002 (post-fix-up review): the previous
`trimmed.includes(name)` matcher had a false-positive vector. A
user-authored hook whose path contained a managed filename as a
substring (e.g. /home/me/scripts/wraps-gsd-check-update.js-helper.js)
would be unconditionally rewritten with the GSD runner, replacing
the user's bare `node` with our absolute path -- silently mutating
their hook configuration.

Fix: parse the command into <runner> <script-token> with the
script-token allowed to be quoted (single or double) or bareword.
Extract the path inside quotes, take the basename (handles both
forward and backslash separators on Windows), and match against
MANAGED_HOOK_FILES via Set.has() — exact equality, not substring.
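
A sketch of the basename-equality matcher. MANAGED_HOOK_FILES uses the
filenames listed in the earlier CR-feedback commit; the function name
and the exact regex are assumptions.

```javascript
const MANAGED_HOOK_FILES = new Set([
  'gsd-check-update.js',
  'gsd-statusline.js',
  'gsd-context-monitor.js',
  'gsd-prompt-guard.js',
  'gsd-read-guard.js',
  'gsd-read-injection-scanner.js',
  'gsd-workflow-guard.js',
]);

// True when the command is `node <script>` with a managed script basename.
function isManagedBareNodeHook(command) {
  // Script token may be double-quoted, single-quoted, or a bareword.
  const match = /^node\s+(?:"([^"]+)"|'([^']+)'|(\S+))/.exec(command.trim());
  if (!match) return false;
  const scriptPath = match[1] ?? match[2] ?? match[3];
  // Basename with either separator, so Windows paths work too.
  const basename = scriptPath.split(/[\\/]/).pop();
  return MANAGED_HOOK_FILES.has(basename);
}

console.log(isManagedBareNodeHook('node /home/me/.claude/hooks/gsd-check-update.js')); // true
console.log(isManagedBareNodeHook('node /home/me/scripts/wraps-gsd-check-update.js-helper.js')); // false
```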

Tests: bug-2979 grows by 4 cases:
- user hook with managed-filename-as-substring is NOT rewritten
- single-quoted path: rewritten correctly
- bareword path: rewritten correctly
- Windows backslash path: basename extraction works
2026-05-02 00:40:09 -04:00
Tom Boucher
ca78b65de7 fix(#2973): /gsd-profile-user writes dev-preferences.md to skills/, not legacy commands/gsd/ (#3003)
* fix(#2973): /gsd-profile-user writes dev-preferences.md to skills/ not legacy commands/gsd/

v1.39.0's install summary claimed the legacy ~/.claude/commands/gsd/
directory had been removed in favor of skills-only architecture, but
the cmdGenerateDevPreferences writer at profile-output.cjs:781 still
defaulted to the legacy path. Every /gsd-profile-user --refresh
deterministically re-created the legacy directory.

Missed in PR #1540's migration because dev-preferences is a
runtime-generated user artifact, not a GSD-shipped command file.

Fix:
- Writer default: ~/.claude/skills/gsd-dev-preferences/SKILL.md
- profile-user.md Display message + artifact list reference new path
- New migrateLegacyDevPreferencesToSkill(targetDir, saved) installer
  helper. Called at all 5 skills-aware install branches. Copies
  preserved legacy dev-preferences.md into skills/gsd-dev-preferences/
  SKILL.md, but ONLY if no SKILL.md already exists -- never clobbers
  user-customized skill content.

Tests: bug-2973-profile-user-skills-path.test.cjs runs the writer in
a subprocess (core.cjs:output uses fs.writeSync(1, ...), which bypasses
in-process stubbing) and asserts that the writer's command_path field
is the skills location, that the file is on disk at that path, and that
the legacy path is NOT created. Tests for the migration helper assert
that it writes when no skill exists and skips when one does.

Closes #2973

* chore(#2973): add changeset fragment for PR #3003

* fix(#2973): rephrase comment to avoid cline-install leaked-path lint

The new comment at line 780 of profile-output.cjs literally contained
the string '~/.claude/commands/gsd/' which the cline-install
leaked-path regression test (tests/cline-install.test.cjs:175)
correctly flagged.

Cline transforms .claude/skills/ -> .cline/skills/ in installed .cjs
files but does not transform .claude/commands/. The new comment talks
about the legacy 'commands/gsd' subdirectory without the ~/.claude/
prefix, so the lint passes. The path semantics are unchanged -- the
runtime construction at line 787 still uses path.join(os.homedir(),
'.claude', 'skills', ...) which the lint regex does not match.

* test(#2973): add timeout to spawnSync to prevent CI hangs (CR feedback)

CodeRabbit on PR #3003: without a timeout, a regression that hangs the
writer or dispatcher would block CI indefinitely. Added a 30s timeout
(generous for what should complete in <1s) and an explicit signal
assertion so a timeout trip surfaces as a clear test failure with
context rather than a hung worker.
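
A rough illustration of the timeout-plus-signal pattern, assuming a stand-in child process (the current Node binary running an inline script) rather than the real writer. runWithDeadline is a hypothetical helper name; spawnSync's timeout option and the signal/status fields on its result are standard Node.js.

```javascript
const { spawnSync } = require('node:child_process');

// Run an inline script in a child Node process with a hard deadline
// and report how the child ended.
function runWithDeadline(script, timeoutMs) {
  const r = spawnSync(process.execPath, ['-e', script], {
    encoding: 'utf8',
    timeout: timeoutMs, // on expiry the child is killed (SIGTERM by default)
  });
  return {
    killedBySignal: r.signal !== null, // non-null signal: the child did not exit on its own
    signal: r.signal,
    status: r.status, // null when the child was killed by a signal
    stdout: r.stdout,
  };
}
```

A test can then assert on killedBySignal before looking at stdout, so a hang fails with a clear message instead of blocking the worker.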

* test(#2973): add allow-test-rule annotation for legitimate product-text parsing

The new var-binding lint from #2982/#2985 caught readFileSync(...).match()
and readFileSync(...).includes() calls in this test. Both are legitimate
structural assertions against the product workflow markdown, not source-grep:

- match() extracts the path from a structured Display: "..." line and
  asserts on the typed path value (same pattern as bug-2470's installer
  scanForLeakedPaths regex test).
- includes() asserts the absence of a legacy path literal.

profile-user.md IS the shipped workflow artifact, and its Display: line
IS what the user sees. Per the existing test-rigor convention, this is
the source-text-is-the-product justification category.

Annotated with allow-test-rule citing that category.

* chore(#3003): drop direct CHANGELOG.md edit; release entry now lives in .changeset/

The changeset-fragment workflow (#2975) renders fragments into
CHANGELOG.md at release time. Direct edits to [Unreleased] on
each PR caused merge conflicts on every concurrent PR. This commit
restores CHANGELOG.md to match origin/main; the release entry for
this fix is preserved in the .changeset/*.md fragment(s) on this
branch, which the release workflow consolidates.

* fix(#2973): preserve user-owned gsd-dev-preferences skill across wipe (CR)

CodeRabbit on PR #3003 caught a real bug: copyCommandsAsClaudeSkills()
wipes ALL gsd-* skill directories at the top of every install, then
reinstalls from the package source. Since gsd-dev-preferences is
user-generated (written by /gsd-profile-user --refresh) and NOT
shipped by the npm package, the wipe deletes the user's customized
SKILL.md with nothing to restore from.

Fix: USER_OWNED_SKILLS allow-list in copyCommandsAsClaudeSkills.
Snapshot files under skills/gsd-dev-preferences/ before the wipe,
restore after. Same preserve/restore pattern as PR #1924.
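
The preserve/restore pattern can be sketched roughly as below. This is an assumption-laden illustration, not the body of copyCommandsAsClaudeSkills: the three helper names are hypothetical, only flat files directly under each skill directory are handled, and wipeGsdSkills stands in for the installer's wipe.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Only the gsd-dev-preferences entry is named in this log; the Set shape is assumed.
const USER_OWNED_SKILLS = new Set(['gsd-dev-preferences']);

// Read every regular file of each user-owned skill dir into memory.
function snapshotUserOwnedSkills(skillsDir) {
  const saved = [];
  for (const name of USER_OWNED_SKILLS) {
    const dir = path.join(skillsDir, name);
    if (!fs.existsSync(dir)) continue;
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      if (!entry.isFile()) continue;
      const rel = path.join(name, entry.name);
      saved.push({ rel, data: fs.readFileSync(path.join(skillsDir, rel)) });
    }
  }
  return saved;
}

// Stand-in for the installer's wipe: remove every gsd-* skill directory.
function wipeGsdSkills(skillsDir) {
  for (const entry of fs.readdirSync(skillsDir, { withFileTypes: true })) {
    if (entry.isDirectory() && entry.name.startsWith('gsd-')) {
      fs.rmSync(path.join(skillsDir, entry.name), { recursive: true, force: true });
    }
  }
}

// Write the snapshot back, recreating parent directories as needed.
function restoreUserOwnedSkills(skillsDir, saved) {
  for (const { rel, data } of saved) {
    const dest = path.join(skillsDir, rel);
    fs.mkdirSync(path.dirname(dest), { recursive: true });
    fs.writeFileSync(dest, data);
  }
}
```

Calling snapshot, wipe, restore in that order keeps the user's SKILL.md while still removing every other gsd-* directory, which matches the opt-in behavior the two new tests describe.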

Tests: bug-2973 grows by 2 cases:
- user-customized SKILL.md survives the wipe
- non-user-owned gsd-* skills are still wiped (preservation is opt-in)
2026-05-02 00:29:45 -04:00
Tom Boucher
1a51ec5829 fix(#2990): gsd-code-fixer worktree attaches to a new branch, not the user-checked-out one (#3001)
* fix(#2990): gsd-code-fixer worktree attaches to a new branch, not the user-checked-out one

The agent's setup_worktree step ran 'git worktree add "$wt" "$branch"'
where $branch was the user's currently-checked-out branch in the main
repo. Git refuses to check out the same branch in two worktrees by
default, so the call failed before any review fix could be applied.

This is the next-layer failure after #2686 (foreground/background race)
and #2839 (transactional cleanup): the isolation strategy was correct
in design, blocked only by git's same-branch protection.

Fix:
- Create a new branch 'gsd-reviewfix/${padded_phase}-$$' from the
  current branch tip and attach the worktree to it via
  'git worktree add -b "$reviewfix_branch" "$wt" "$branch"'.
- Cleanup tail is now four steps:
  1. 'git -C "$main_repo" merge --ff-only "$reviewfix_branch"'
     -- captures the agent's commits on the user's branch. --ff-only
     fails loudly on divergence (concurrent commits to $branch); the
     temp branch is preserved for manual merge.
  2. 'git worktree remove "$wt" --force'.
  3. 'git -C "$main_repo" branch -D "$reviewfix_branch"' ONLY if
     ff-only succeeded.
  4. 'rm -f "$sentinel"' last (preserves #2839 transactional ordering).
- Recovery sentinel JSON now records reviewfix_branch alongside
  worktree_path so a re-run after interruption cleans both the orphan
  worktree and the orphan temp branch.

Regression test: tests/bug-2990-code-fixer-worktree-branch.test.cjs
parses the agent .md into structured 'git worktree add' invocation
records (skipping occurrences inside markdown inline-code or bash
comments -- those are citations of the OLD pattern, not executable)
and asserts the structural invariants on the new pattern.

Closes #2990

* chore(#2990): add changeset fragment for PR #3001

* fix(#2990): correct main_repo parsing and ff_status capture (CR feedback)

CodeRabbit on PR #3001 caught two real bugs in the cleanup tail:

1. `awk '/^worktree / { print $2 }'` truncates paths containing
   spaces. /path/with spaces/repo becomes /path/with. Replaced with
   `sub(/^worktree /, ''); print` which strips the prefix and
   preserves the full path.

2. `if ! git merge ...; then ff_status=$?` captures the exit status of
   the negated pipeline (always 0 inside the then-branch), not the
   merge command's exit code. Restructured to
   `if cmd; then ff_status=0; else ff_status=$?` so the else-branch
   captures the real merge exit code.

Tests still pass: bug-2990 structural assertions on the agent .md
content unchanged.

* fix(#2990): recovery extracts reviewfix_branch and deletes orphan branch (CR)

CodeRabbit on PR #3001 found two issues:

1. (Major) Recovery code only extracted worktree_path from the sentinel.
   If a prior run died after `git worktree remove` but before
   `git branch -D`, the orphan reviewfix branch survived forever. The
   sentinel records reviewfix_branch (line 272) and the docs claim
   recovery deletes it, but the code didn't.
   Fixed: emit BOTH worktree_path and reviewfix_branch from the parser
   (newline-separated), capture each into shell vars, and call
   `git branch -D "$prior_branch" 2>/dev/null || true` after worktree
   removal but before sentinel deletion.

2. (Quick win) The bug-2990 test used regex .test() against the raw
   markdown, which would have been satisfied by prose mentioning the
   token. Restructured to:
   - parseCleanupGitInvocations() returns ordered records with structured
     fields (verb, targetsReviewfixBranch, isMergeFfOnly, isBranchDelete)
   - assert exactly-one merge --ff-only AND exactly-one branch -D
   - assert merge precedes branch-delete in execution order
   - parse the sentinel JSON.stringify call to extract field names and
     assert reviewfix_branch is among them

   Added 2 new tests for the recovery-block invariant: parses the recovery
   node -e block and asserts it extracts parsed.reviewfix_branch alongside
   parsed.worktree_path; and asserts the recovery shell calls
   `git branch -D "$prior_branch"`.

* test(#2990): add allow-test-rule annotation for product-text parsing (CR follow-up)

The lint-tests CI check flagged md.match() in the new structural-IR
test suite. The .match() calls extract typed fields (cleanup-tail
git invocation records, sentinel JSON field names, recovery-block
node script content) from agents/gsd-code-fixer.md — which IS the
deployed agent product. Asserting on those typed fields tests the
runtime contract, not source code internals.

source-text-is-the-product is the correct classification per the
existing convention (matches thread-session-management.test.cjs and
the others reclassified in PR #2985's CR follow-up).

* chore(#3001): drop direct CHANGELOG.md edit; release entry now lives in .changeset/

The changeset-fragment workflow (#2975) renders fragments into
CHANGELOG.md at release time. Direct edits to [Unreleased] on
each PR caused merge conflicts on every concurrent PR. This commit
restores CHANGELOG.md to match origin/main; the release entry for
this fix is preserved in the .changeset/*.md fragment(s) on this
branch, which the release workflow consolidates.
2026-05-02 00:29:43 -04:00
Tom Boucher
4277f7d7e8 fix(#2994): move verify-reapply-patches.cjs to get-shit-done/bin/ so it ships to user installs (#3000)
* fix(#2994): move verify-reapply-patches.cjs to get-shit-done/bin/ so installer ships it

scripts/verify-reapply-patches.cjs (added in #2972 to close the
verified-yes-without-checking gap from #2969) shipped in the npm tarball
but never reached user installs: bin/install.js copies get-shit-done/
recursively but does not copy the top-level scripts/ directory.

Effect: every fresh install hit `Cannot find module …/scripts/verify-reapply-patches.cjs`
on Step 5 of /gsd-reapply-patches. The whole point of moving
verification out of LLM-driven prose into a deterministic script is
undone if the script does not resolve at runtime.

Fix: move the script to get-shit-done/bin/verify-reapply-patches.cjs
(same pattern as gsd-tools.cjs and other runtime bin scripts that the
installer ships) and update reapply-patches.md Step 5 to invoke
${GSD_HOME}/get-shit-done/bin/verify-reapply-patches.cjs.

Tests:
- bug-2969 SCRIPT path updated to the new location
- New bug-2994-verify-reapply-patches-installed-path.test.cjs parses
  reapply-patches.md into structured invocation records and asserts
  every node ${GSD_HOME}/... reference lives under get-shit-done/
  (the installed tree). Catches future regressions where someone moves
  a runtime-needed script back to scripts/.

Closes #2994

* chore(#2994): add changeset fragment for PR #3000

* docs(#2994): update verifier-script-location comment to reflect new path (CR)

CodeRabbit on PR #3000: the parenthetical at line 278 still said the
script ships under scripts/, but this PR moved it to get-shit-done/bin/.
Updated the prose to reference the new location and the installer
target path.

* chore(#3000): drop direct CHANGELOG.md edit; release entry now lives in .changeset/

The changeset-fragment workflow (#2975) renders fragments into
CHANGELOG.md at release time. Direct edits to [Unreleased] on
each PR caused merge conflicts on every concurrent PR. This commit
restores CHANGELOG.md to match origin/main; the release entry for
this fix is preserved in the .changeset/*.md fragment(s) on this
branch, which the release workflow consolidates.
2026-05-02 00:29:34 -04:00
Tom Boucher
cde793f1f0 fix(#2992): deterministic latest-version check — package name is a constant, not LLM choice (#2993)
* fix(#2992): deterministic latest-version check — package name is a constant, not LLM choice

The /gsd-update workflow's check_latest_version step was prescribed in
LLM-driven prose: "run `npm view get-shit-done-cc version`". The
executing model could and did shortcut the prescription and invent
npm queries against name-shaped guesses — `@get-shit-done/cli`,
`get-shit-done-cli`, `gsd` — all of which 404 or, worse, return an
unrelated typosquat (the 2016 `get-shit-done` timer package). Same
architectural anti-pattern as #2969 (Hunk Verification Gate where
the LLM filled `verified: yes` without checking).

Implementation built TDD per #2992:

  get-shit-done/bin/check-latest-version.cjs
    - PACKAGE_NAME = 'get-shit-done-cc' as a module constant; not
      parameterised, not exposed for override.
    - checkLatestVersion({ spawn? }) returns
      { ok: bool, version?: string, reason: CHECK_REASON.X, detail? }
      via a frozen enum: OK / FAIL_NPM_FAILED / FAIL_INVALID_OUTPUT.
    - --json mode emits the structured record on stdout for the
      workflow to parse via jq.
    - Windows-aware: uses { shell: process.platform === 'win32' }
      since npm is npm.cmd on Windows (same lesson as #2962).
    - Stored under get-shit-done/bin/ (not top-level scripts/) because
      that path IS in the user's installed config dir; top-level
      scripts/ ships in the npm tarball but is not copied into
      ~/.claude/ at install time.

  tests/bug-2992-check-latest-version.test.cjs
    - 7 tests, all assertions on the typed CHECK_REASON enum + the
      structured record. Injectable spawn function so no real npm
      process is invoked. Covers OK, npm-non-zero, invalid-output,
      empty-output, pre-release semver, PACKAGE_NAME constant lock,
      enum-shape lock.

  get-shit-done/workflows/update.md
    - check_latest_version step rewritten to call the script via
      `node "${GSD_HOME}/get-shit-done/bin/check-latest-version.cjs"
      --json` and parse the structured response with jq. Explicit
      "Do NOT run `npm view` or `npm search` directly" guidance
      cites #2992 so future contributors understand why.
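
A hedged sketch of the record shape described above, not the contents of check-latest-version.cjs. PACKAGE_NAME, the three CHECK_REASON members, the injectable spawn, and the Windows shell option come from this message; the enum's string values, the SEMVER_RE check, and the detail strings are assumptions. The 15s timeout and signal-first branch mirror later fix-ups in the same PR.

```javascript
const { spawnSync } = require('node:child_process');

const PACKAGE_NAME = 'get-shit-done-cc';
const CHECK_REASON = Object.freeze({
  OK: 'OK',
  FAIL_NPM_FAILED: 'FAIL_NPM_FAILED',
  FAIL_INVALID_OUTPUT: 'FAIL_INVALID_OUTPUT',
});

// Assumption: a loose semver shape check; the real validation is not shown in the log.
const SEMVER_RE = /^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?$/;

function checkLatestVersion({ spawn = spawnSync } = {}) {
  const r = spawn('npm', ['view', PACKAGE_NAME, 'version'], {
    encoding: 'utf8',
    shell: process.platform === 'win32', // npm is npm.cmd on Windows
    timeout: 15000,
  });
  // Check the signal first so a timeout is not reported as a plain npm failure.
  if (r.signal) {
    return { ok: false, reason: CHECK_REASON.FAIL_NPM_FAILED, detail: `npm timed out (signal: ${r.signal})` };
  }
  if (r.status !== 0) {
    const detail = (r.stderr || '').trim() || 'npm exited non-zero';
    return { ok: false, reason: CHECK_REASON.FAIL_NPM_FAILED, detail };
  }
  const version = (r.stdout || '').trim();
  if (!SEMVER_RE.test(version)) {
    return { ok: false, reason: CHECK_REASON.FAIL_INVALID_OUTPUT, detail: version };
  }
  return { ok: true, version, reason: CHECK_REASON.OK };
}
```

Because spawn is injectable, a test can pass a fake such as `() => ({ status: 0, signal: null, stdout: '1.2.3\n', stderr: '' })` and assert on the typed record without ever running npm.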

Closes #2992

* fix(#2992): trailing slash on GSD_HOME default to satisfy bare-path lint

The bug-2470 regression test scans update.md for bare `$HOME/.claude`
references (no trailing slash). The PR added one in the new
check_latest_version step. Fix: trailing slash on the default value
(`${GSD_HOME:-$HOME/.claude/}`). POSIX pathname resolution treats the
resulting double slash as a single slash; the lint pattern's negative
lookahead is now satisfied.

* fix(#2992): emit GSD_DIR from get_installed_version, use it in check_latest_version

Addresses CodeRabbit feedback: the previous `${GSD_HOME:-$HOME/.claude/}`
fallback hardcoded the Claude runtime path, which silently breaks for
non-Claude runtimes (gemini, codex, opencode, kilo).

Fix:
- get_installed_version now emits a 4th line with the resolved config
  dir ($LOCAL_DIR or $GLOBAL_DIR), captured by callers as GSD_DIR.
- check_latest_version uses $GSD_DIR/get-shit-done/bin/check-latest-version.cjs.
  Empty GSD_DIR (UNKNOWN scope) skips the version check and falls
  through to fresh-install path.

This keeps the package name deterministic (#2992) AND respects the
detected runtime, instead of assuming Claude.

* chore(#2992): add changeset fragment for PR #2993

* fix(#2992): consolidate LATEST_RESULT parsing inside the GSD_DIR guard

CodeRabbit on PR #2993: the previous structure separated the GSD_DIR
guard from the jq parsing, so when GSD_DIR was empty the parsing block
ran against an unset LATEST_RESULT and produced misleading 'couldn't
check for updates' diagnostics instead of clean 'no_install_detected'.

Move all field assignments inside the conditional so the skip path
seeds LATEST_OK=false, LATEST_VERSION='', LATEST_REASON='no_install_detected',
and LATEST_STATUS=0 atomically.

* fix(#2992): emit GSD_DIR in early-return; add code-block lang and spawnSync timeout (CR)

CodeRabbit on PR #2993 caught three issues:

1. (Major) The early-return path in get_installed_version (PREFERRED_CONFIG_DIR
   fast path) only echoed 3 lines, but PR #2993 changed the contract to 4
   (GSD_DIR is now line 4). Downstream check_latest_version misread valid
   installs as UNKNOWN. Added `echo "$PREFERRED_CONFIG_DIR"` before exit 0.

2. (Minor) Markdown MD040: fenced code block at line 310 was missing a
   language identifier. Added ```text.

3. (Quick win) spawnSync('npm view ...') had no timeout, so a hung network
   could block /gsd-update indefinitely. Added 15s timeout; on timeout
   spawnSync returns with signal !== null and the existing failure path
   emits FAIL_NPM_FAILED.

* fix(#3008): kill cross-process race in install-minimal:307 mid-copy test

Old shape compared listTmpStageDirs() snapshots before/after the
mid-copy throw. Under scripts/run-tests.cjs --test-concurrency=4,
tests/install-minimal-all-runtimes.test.cjs runs in a parallel
subprocess and also creates gsd-minimal-skills-* dirs in shared
os.tmpdir(). The parallel process's create/remove activity between
this test's two snapshots caused deterministic failure when timing
aligned -- presented as 'flaky' but is a real race.

CI failure data (PR #2993 run 25238555786):
  expected (before): ['gsd-minimal-skills-km1O1O']
  actual   (after):  []

Both processes behaved correctly in isolation. The test was wrong:
it observed a shared filesystem state across processes.

Fix: stub fs.mkdtempSync inside this test to record THIS call's
stage dir path. After the throw, assert fs.existsSync(stagedDir)
=== false. Direct observation of the function's own behavior; no
global tmpdir scan; no parallel-process interference.

Closes #3008

* fix(#2992): distinguish timeout from npm failure; guard empty LATEST_RESULT (CR)

CodeRabbit on PR #2993 (post-fix-up review) caught two improvements:

1. (Low value) check-latest-version.cjs:55-61 — when spawnSync times
   out, r.status is null and r.signal is set (e.g. 'SIGTERM'), but
   r.stderr is empty. Without a signal-first branch, timeouts and
   genuine npm failures both surfaced as 'npm exited non-zero' in
   detail, making logs ambiguous. Added an explicit signal-first branch:
   'npm timed out (signal: SIGTERM)'.

2. (Quick win) update.md:284-315 — when node is missing or the script
   doesn't exist, LATEST_RESULT is empty. Piping empty input to jq
   succeeds without error but leaves LATEST_OK / LATEST_REASON as empty
   strings, producing the user-visible diagnostic
   'Couldn\'t check for updates (reason: , exit: N)' with a blank
   reason. Added an explicit guard that sets LATEST_REASON to
   'script_not_found_or_node_unavailable' when LATEST_RESULT is empty,
   so operators see a meaningful failure message.

Tests: bug-2992 grows by 2 cases (timeout signal detail + empty
stderr fallback).
2026-05-02 00:29:31 -04:00
Tom Boucher
ffeeb92c14 fix(#2997): mask SECRET_CONFIG_KEYS in SDK config-set/get and init responses (#2999)
* fix(#2997): mask SECRET_CONFIG_KEYS in SDK config-set/get and init responses

The CJS→TS port at sdk/src/query/config-mutation.ts:240,243 and
config-query.ts:122,128,132 dropped the masking layer that secrets.cjs
spec defines for brave_search/firecrawl/exa_search. Result: the SDK
echoed plaintext API keys into machine-readable JSON output (stdout,
transcripts, CI logs).

Adjacent leak in init.ts:673-675 / init.cjs:728-730: the init bundle
passed config.brave_search through raw, leaking the API key whenever
the user had stored one.

Fix:
- New sdk/src/query/secrets.ts ports SECRET_CONFIG_KEYS, isSecretKey,
  maskSecret, maskIfSecret. Exact CJS parity (verified by 17 tests
  in secrets.test.ts that import secrets.cjs and compare).
- config-set masks value + previousValue in response; on-disk plaintext
  intact (key stays usable).
- config-get masks read response. --default flows through unmasked
  (user's own input, not stored secret).
- init.ts/init.cjs mask string values only; booleans (availability
  flags) pass through unchanged so the typed contract is preserved.
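
A minimal sketch of the masking layer, for illustration only. The four function names and the three secret keys come from this message; the masking convention itself (keep the last four characters, fully mask short values, report unset values as '(unset)') and the string-only guard in maskIfSecret are assumptions, since the log does not show the secrets.cjs implementation.

```javascript
const SECRET_CONFIG_KEYS = new Set(['brave_search', 'firecrawl', 'exa_search']);

function isSecretKey(key) {
  return SECRET_CONFIG_KEYS.has(key);
}

// Assumed convention: never reveal more than the last 4 characters.
function maskSecret(value) {
  if (value === null || value === undefined || value === '') return '(unset)';
  const s = String(value);
  return s.length < 8 ? '****' : `****${s.slice(-4)}`;
}

// Mask only string values stored under secret keys. Booleans (availability
// flags) and non-secret keys pass through unchanged.
function maskIfSecret(key, value) {
  if (!isSecretKey(key) || typeof value !== 'string') return value;
  return maskSecret(value);
}
```

Masking applies to the response object only; the value written to disk stays in plaintext so the key remains usable.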

Tests: 17 in secrets.test.ts (including CJS parity), 5 in
config-mutation.test.ts (#2997 block — covers on-disk-preserved,
previousValue masking, short-value, unset, non-secret pass-through),
4 in config-query.test.ts.

Closes #2997

* chore(#2997): add changeset fragment for PR #2999

* chore(#2999): drop direct CHANGELOG.md edit; release entry now lives in .changeset/

The changeset-fragment workflow (#2975) renders fragments into
CHANGELOG.md at release time. Direct edits to [Unreleased] on
each PR caused merge conflicts on every concurrent PR. This commit
restores CHANGELOG.md to match origin/main; the release entry for
this fix is preserved in the .changeset/*.md fragment(s) on this
branch, which the release workflow consolidates.
2026-05-02 00:17:45 -04:00
Tom Boucher
4e378d37d8 fix(#3008): kill cross-process race in install-minimal:307 mid-copy test (#3009)
Old shape compared listTmpStageDirs() snapshots before/after the
mid-copy throw. Under scripts/run-tests.cjs --test-concurrency=4,
tests/install-minimal-all-runtimes.test.cjs runs in a parallel
subprocess and also creates gsd-minimal-skills-* dirs in shared
os.tmpdir(). The parallel process's create/remove activity between
this test's two snapshots caused deterministic failure when timing
aligned -- presented as 'flaky' but is a real race.

CI failure data (PR #2993 run 25238555786):
  expected (before): ['gsd-minimal-skills-km1O1O']
  actual   (after):  []

Both processes behaved correctly in isolation. The test was wrong:
it observed a shared filesystem state across processes.

Fix: stub fs.mkdtempSync inside this test to record THIS call's
stage dir path. After the throw, assert fs.existsSync(stagedDir)
=== false. Direct observation of the function's own behavior; no
global tmpdir scan; no parallel-process interference.
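
The stubbing approach can be sketched as follows. stageAndCopy is a hypothetical stand-in for the function under test and withRecordedMkdtemp is a hypothetical helper; the technique (wrap fs.mkdtempSync, record the path this call created, then check that exact path) is what the fix describes.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical function under test: stages into a temp dir and removes
// that dir again if the copy step throws.
function stageAndCopy(copyFn) {
  const stageDir = fs.mkdtempSync(path.join(os.tmpdir(), 'gsd-minimal-skills-'));
  try {
    copyFn(stageDir);
    return stageDir;
  } catch (err) {
    fs.rmSync(stageDir, { recursive: true, force: true });
    throw err;
  }
}

// Wrap fs.mkdtempSync for the duration of run() and return the dirs that
// were created through it, so the test never scans the shared tmpdir.
function withRecordedMkdtemp(run) {
  const original = fs.mkdtempSync;
  const created = [];
  fs.mkdtempSync = (...args) => {
    const dir = original.apply(fs, args);
    created.push(dir);
    return dir;
  };
  try {
    run();
  } finally {
    fs.mkdtempSync = original; // always restore the real implementation
  }
  return created;
}
```

The assertion then becomes fs.existsSync(created[0]) === false, which depends only on this call's stage dir and is unaffected by other processes creating or removing similarly named directories.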

Closes #3008
2026-05-01 22:37:48 -04:00
Tom Boucher
9f09246f3b fix(#2998): populate gsd-pristine/ from install transform pipeline so verifier has a real baseline (#3004)
* fix(#2998): populate gsd-pristine/ from install transform pipeline so verifier has a real baseline

saveLocalPatches declared a pristineDir variable and JSDoc'd 'saves
pristine copies to gsd-pristine/' but no code ever wrote there. Effect:
/gsd-reapply-patches Step 5 verifier (#2972) silently fell back to its
over-broad heuristic ('every significant backup line') -- exactly the
silent-success-on-lost-content failure mode #2969 was designed to
prevent.

Fix: new populatePristineDir({...}) helper runs copyWithPathReplacement
(the install transform pipeline) into a tmp staging dir, then copies out
only the modified-file paths into gsd-pristine/. saveLocalPatches now
accepts a pristineCtx and calls the helper when local patches are
detected. Soft-fails on transform errors (logs warning, continues with
empty pristine -- no worse than pre-fix).

Pristine reflects the about-to-install version's content, which is the
right baseline for 'what would survive without the user's modifications'.

Tests: bug-2998-pristine-dir-populated.test.cjs asserts the helper is
exported, no-ops on empty input, writes one pristine file per source-
existing path, skips ghost paths, and produces deterministic output
(byte-identical across runs -- the property pristine_hashes depends on).

Closes #2998

* chore(#2998): add changeset fragment for PR #3004

* fix(#2998): expand pristine to all manifest install roots; clear stale pristine on populate (CR)

CodeRabbit on PR #3004 caught two issues:

1. populatePristineDir only staged packageSrc/get-shit-done/ but
   manifest.files records edits under several install roots (commands/,
   agents/, hooks/, skills/, root files like .clinerules). Modified
   paths outside get-shit-done/ were silently skipped, leaving the
   verifier with no baseline for those edits. Fixed by computing the
   set of top-level dirs from the modified set and staging each one
   that exists in source. Root-level files (no slash) bypass the
   transform pipeline and are copied directly.

2. populatePristineDir did not wipe pre-existing gsd-pristine/ before
   populating. A previous run's stale pristine could survive into the
   current run's diff baseline. Now wipe before populate AND in the
   catch path so soft-failures don't leave half-populated data on disk.
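
As a rough sketch of the first fix, the modified set can be split into install roots and root-level files like this. partitionModifiedPaths is a hypothetical name and the return shape is an assumption; the real helper also runs the transform pipeline per root, which is omitted here.

```javascript
// Split modified relative paths into top-level install roots (to be staged
// through the transform pipeline) and root-level files (copied directly).
function partitionModifiedPaths(modified) {
  const roots = new Set();
  const rootFiles = [];
  for (const rel of modified) {
    const normalized = rel.replace(/\\/g, '/'); // tolerate Windows separators
    const slash = normalized.indexOf('/');
    if (slash === -1) {
      rootFiles.push(normalized); // no slash: a root-level file such as .clinerules
    } else {
      roots.add(normalized.slice(0, slash));
    }
  }
  return { roots: [...roots].sort(), rootFiles };
}
```

Each entry in roots would then be staged only if it exists in the package source, so paths under agents/ or hooks/ get a baseline alongside get-shit-done/.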

Tests: bug-2998-pristine-dir-populated.test.cjs grows by 2 cases:
- agents/ paths are staged and copied (was silently skipped pre-fix)
- mixed get-shit-done/ + agents/ in same modified list both stage
2026-05-01 21:14:14 -04:00
Tom Boucher
c2ada7e799 feat(#2995): post-install path audit for workflow-invoked scripts (#2996)
* feat(#2995): post-install path audit for workflow-invoked scripts

Catches the gap class surfaced by #2994: a workflow references a script
via ${GSD_HOME}/<path> that ships in the npm tarball but is not copied
to the user's config dir at install time. Unit tests don't catch it
because they resolve the script via path.join(__dirname, '..', 'scripts',
…) — the source layout, not the deployed layout.

Implementation built TDD per #2995, vertical slices with structured-IR
assertions:

  scripts/audit-workflow-script-paths.cjs
    - Pure auditWorkflowScriptPaths({ workflowsDir, repoRoot,
      installedPrefixes }) returns { ok, findings: [{ workflow, path,
      kind }] } via the AUDIT_FINDING enum.
    - Two finding kinds: MISSING_FROM_REPO (typo / file deleted) and
      NOT_INSTALLED (#2994 class — first segment outside installed
      prefixes).
    - Tolerates ${GSD_HOME:-...} default-fallback syntax.

  tests/bug-2995-post-install-script-paths.test.cjs
    - 9 tests across 3 suites:
      • Pure-function pass and per-finding-kind detection (5 tests on
        synthetic fixtures).
      • Real workflow audit (2 tests asserting the actual repo's
        get-shit-done/workflows/ has no NEW gaps and KNOWN_GAPS stays
        consistent with audit findings).
      • Enum shape lock + extractReferences edge cases.
    - All assertions on typed AUDIT_FINDING enum / structured records;
      zero raw text matching.
    - KNOWN_GAPS is a Set keyed on `workflow|path|kind` strings;
      currently contains the #2994 entry. The companion test fails if
      a KNOWN_GAPS entry no longer matches a real finding (forces the
      allow-list to shrink as gaps fix).

The audit immediately catches #2994's gap on `reapply-patches.md`. The
allow-list contains exactly that entry; new gaps fail CI; #2994's fix
will remove the entry as part of the same PR.
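
A simplified sketch of the audit, under stated assumptions: the reference regex, the injected existsInRepo callback, the per-workflow auditWorkflow function, and the gapKey helper are illustrative and not taken from audit-workflow-script-paths.cjs. The two finding kinds and the `workflow|path|kind` key format come from this message. Both findings are emitted for the same reference, as in the later CR fix-up.

```javascript
const AUDIT_FINDING = Object.freeze({
  MISSING_FROM_REPO: 'MISSING_FROM_REPO',
  NOT_INSTALLED: 'NOT_INSTALLED',
});

// Matches ${GSD_HOME}/<path> and the ${GSD_HOME:-default}/<path> fallback form.
const REF_RE = /\$\{GSD_HOME(?::-[^}]*)?\}\/([A-Za-z0-9_.\-\/]+)/g;

function extractReferences(markdown) {
  return [...markdown.matchAll(REF_RE)].map((m) => m[1]);
}

// existsInRepo is injected so the function stays pure and testable.
function auditWorkflow({ workflow, markdown, installedPrefixes, existsInRepo }) {
  const findings = [];
  for (const ref of extractReferences(markdown)) {
    const firstSegment = ref.split('/')[0];
    if (!installedPrefixes.includes(firstSegment)) {
      findings.push({ workflow, path: ref, kind: AUDIT_FINDING.NOT_INSTALLED });
    }
    if (!existsInRepo(ref)) {
      findings.push({ workflow, path: ref, kind: AUDIT_FINDING.MISSING_FROM_REPO });
    }
  }
  return { ok: findings.length === 0, findings };
}

// Key format used to compare findings against a KNOWN_GAPS allow-list.
const gapKey = (f) => `${f.workflow}|${f.path}|${f.kind}`;
```

An allow-list test can then check in both directions: every finding's key is in KNOWN_GAPS, and every KNOWN_GAPS entry still corresponds to a real finding.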

Closes #2995
Refs #2994

* chore(#2995): add changeset fragment for PR #2996

* fix(#2995): emit both NOT_INSTALLED + MISSING_FROM_REPO; clean up fixture leak (CR)

CodeRabbit on PR #2996 found two issues:

1. (Low value) auditWorkflowScriptPaths short-circuited on NOT_INSTALLED,
   masking MISSING_FROM_REPO for the same ref. Removed the `continue` so
   both findings emit in one run; added a regression test.

2. (Low value) bug-2995 test created tmpRoot in before() but never wrote
   into it; per-fixture mkdtempSync dirs leaked. Rooted fixture repos
   under tmpRoot so the after() cleanup actually frees them.
2026-05-01 21:13:45 -04:00
Tom Boucher
55ae8e42d2 test(#2986): mutation-killer suite for config-schema.cjs (95 typed assertions) (#3005)
* test(#2986): mutation-killer suite for config-schema.cjs (95 typed assertions)

Stryker measured 4.62% mutation score on config-schema.cjs (6 killed,
124 survived). Surviving mutants documented that existing tests were
exercising paths without verifying outputs.

Adds tests/bug-2986-config-schema-mutation-killers.test.cjs (95 tests,
4 suites) targeting each surviving mutant class:

- M1/M4: parameterized isValidConfigKey(key) === true for every member
  of VALID_CONFIG_KEYS. Kills static-key-fast-path mutations
  (if (VALID_CONFIG_KEYS.has(...)) return true; -> if (false) return true;)
  because no static key matches any DYNAMIC_KEY_PATTERN by design.

- M2: representative dynamic-pattern keys (one per pattern). Each matches
  exactly one pattern. Kills .some -> .every mutation: with .every, no
  single key matches all patterns -> all dynamic keys would be rejected.

- M3: strictEqual against the literal boolean true/false (not assert.ok
  truthy checks). Kills polarity-flip mutations.

- Anchor-tightening: keys that differ from valid by one char beyond the
  documented shape (trailing dot-segment, empty agent name, non-enum tier,
  etc.). Kills regex-loosening mutations on ^, $, charset boundaries.

Tests assert on typed boolean return values from the lib's public surface.
Zero source-grep, zero raw-text matching.
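
To make the mutant classes concrete, here is a toy version of the function shape being tested. The key names and the two regexes are hypothetical stand-ins, not the contents of config-schema.cjs; only the static fast path followed by DYNAMIC_KEY_PATTERNS.some() is taken from this message.

```javascript
// Hypothetical stand-ins for the real static key set and dynamic patterns.
const VALID_CONFIG_KEYS = new Set(['model_profile', 'features.thinking_partner']);
const DYNAMIC_KEY_PATTERNS = [
  /^agent_skills\.[a-zA-Z0-9_-]+$/,
  /^features\.[a-zA-Z0-9_]+$/,
];

function isValidConfigKey(key) {
  // Static fast path: replacing this condition with `false` is one mutant class.
  if (VALID_CONFIG_KEYS.has(key)) return true;
  // Replacing .some with .every is another: no single key matches all patterns.
  return DYNAMIC_KEY_PATTERNS.some((re) => re.test(key));
}

// Representative dynamic keys, one per pattern. Each must NOT be a static
// key, otherwise the fast path answers and the pattern is never exercised.
const DYNAMIC_REPRESENTATIVES = ['agent_skills.gsd-planner', 'features.some_dynamic_feature'];
```

Asserting strict equality with true for each representative, and with false for near-miss keys such as a trailing dot-segment, is what distinguishes these tests from ones that merely execute the code path.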

* chore(#2986): add changeset fragment for PR #3005

* test(#2986): use dynamic-only rep key for features pattern (CR feedback)

CodeRabbit on PR #3005: features.thinking_partner is in the static
VALID_CONFIG_KEYS set, so the static fast-path returns true before
DYNAMIC_KEY_PATTERNS.some() is ever called. A Stryker mutant that
removed only the features entry from DYNAMIC_KEY_PATTERNS would
survive because the test only ever exercised the static path for
that key.

Replaced features.thinking_partner with features.some_dynamic_feature
which is NOT in static keys, so isValidConfigKey must reach the
dynamic path to return true. Added a per-rep invariant that asserts
each representative key is NOT a member of VALID_CONFIG_KEYS,
catching this class of mistake at test time on any future
representative-key change.
2026-05-01 21:13:25 -04:00
Tom Boucher
3657c4ea9e fix(#3006): retarget PR-template CHANGELOG checkboxes at the changeset workflow (#3007)
The three PR templates still asked contributors to tick `CHANGELOG.md
updated`, contradicting the post-#2978 rule (documented in
CONTRIBUTING.md and enforced by scripts/changeset/lint.cjs) that
`CHANGELOG.md` must not be edited directly.

Each checkbox now references `npm run changeset` with the appropriate
`--type` (Fixed/Changed/Added) and notes the `no-changelog` opt-out
label where applicable, so `gh pr create` users land in the correct
workflow by copy-paste.

Closes #3006

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 20:01:04 -04:00
Tom Boucher
918f987a19 feat(#2982): extend no-source-grep lint to catch var-binding readFileSync.includes() (#2985)
* feat(#2982): extend no-source-grep lint to catch var-binding readFileSync.includes()

The base lint (scripts/lint-no-source-grep.cjs) only catches
readFileSync(...).<text-method>() chained directly. The much more
common var-binding form escapes it:

  const src = fs.readFileSync(p, 'utf8');
  // 50 lines later
  if (src.includes('foo')) {}        // ← still grep, lint missed it

Scan of the test suite found ~141 files using this pattern.

Implementation built TDD per #2982 with structured-IR assertions:

  scripts/lint-no-source-grep-extras.cjs
    - detectVarBindingViolations(src) — pure detector, two passes:
      pass 1 collects vars bound from readFileSync, pass 2 finds any
      <var>.<includes|startsWith|endsWith|match|search>( on those vars.
    - detectWrappedAssertOkMatch(src) — flags
      assert.ok(<expr>.match(...)) which escapes the assert.match rule.
    - VIOLATION enum exposes stable codes for tests to assert on.

  scripts/lint-no-source-grep.cjs
    - Wires the new detectors into the existing per-file check; one
      additional violation row per file with the first 3 sample tokens.

  tests/bug-2982-lint-var-binding.test.cjs
    - 13 tests, all assertions on typed VIOLATION enum / structured
      records. Covers all 5 text-match methods, multi-var, no-bind,
      string literal (must NOT trigger), wrapped assert.ok(.match),
      and assert.match (must NOT double-flag).
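
The two-pass idea can be sketched with plain regular expressions, as below. This is an illustration under assumptions, not the code in lint-no-source-grep-extras.cjs: the VIOLATION member name and the record fields are invented, and unlike the real detector this sketch makes no attempt to ignore matches inside string literals or comments.

```javascript
// Hypothetical enum member; the real VIOLATION codes are not listed in this log.
const VIOLATION = Object.freeze({ VAR_BINDING_TEXT_MATCH: 'VAR_BINDING_TEXT_MATCH' });
const TEXT_METHODS = ['includes', 'startsWith', 'endsWith', 'match', 'search'];

function detectVarBindingViolations(src) {
  // Pass 1: collect variable names bound directly from a readFileSync(...) call.
  const bound = new Set();
  const bindRe = /\b(?:const|let|var)\s+([A-Za-z_][\w$]*)\s*=\s*(?:[\w$]+\.)?readFileSync\s*\(/g;
  for (const m of src.matchAll(bindRe)) bound.add(m[1]);

  // Pass 2: find <var>.<text-method>( calls on any of those names.
  const violations = [];
  const useRe = new RegExp(`\\b([A-Za-z_][\\w$]*)\\.(${TEXT_METHODS.join('|')})\\s*\\(`, 'g');
  for (const m of src.matchAll(useRe)) {
    if (bound.has(m[1])) {
      violations.push({ code: VIOLATION.VAR_BINDING_TEXT_MATCH, variable: m[1], method: m[2] });
    }
  }
  return violations;
}
```

Because the method name must be followed directly by an opening parenthesis, a call to .matchAll( is not reported as .match(.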

Migration backlog (#2974 expanded scope):

  - 42 files annotated `// allow-test-rule: source-text-is-the-product`
    (legitimate — they read .md/.json/.yml files whose deployed text
    IS the product)
  - 3 files annotated `// allow-test-rule: pending-migration-to-typed-ir [#2974]`
    (read .cjs/.js source — clear migration debt)
  - 95 files annotated `pending-migration-to-typed-ir [#2974]` with
    `Per-file review may reclassify as source-text-is-the-product
    during migration` (mixed — manual review under #2974)

After this lands the lint reports 0 violations on main; new
violations in PRs surface immediately.

Closes #2982
Refs #2974

* test(#2982): fix truncated test name per CR

The label ended with a bare '(' from a copy-paste mishap. Now reads
'does NOT flag .matchAll(...) — matchAll is not match, so
assert.ok(.matchAll(...)) is not flagged'.

* chore(#2982): add changeset fragment for PR #2985
2026-05-01 19:50:10 -04:00
Tom Boucher
17a4321bf5 docs(#2989): promote v1.39.1 hotfix entries from [Unreleased] to dated section (#2991)
Both v1.39.0 (stable, tagged 2026-05-01T03:05:33Z) and v1.39.1
(hotfix, tagged 2026-05-01T21:03:54Z) shipped to npm but the
CHANGELOG `[Unreleased]` link still pointed at `v1.38.5...HEAD` and
the entries that landed in v1.39.1 were still un-promoted.

Move the five v1.39.1 hotfix entries (#2917, #2949, #2954, #2962,
#2969) into a new `## [1.39.1] - 2026-05-01` section above
`## [1.38.5]`, with a one-line intro and install snippet matching
the conventions used in earlier dated sections.

Update the `[Unreleased]` link to point at `v1.39.1...HEAD`.

Out of scope (separate cleanup):
  - Backfilling a `## [1.39.0]` section. The CHANGELOG never had one;
    this PR doesn't make that worse but also doesn't try to invent
    release-note text from commit messages.
  - The eight v1.39.1 commits without `[Unreleased]` entries
    (#2942, #2944, #2924/#2941, #2940, #2947, #2950, #2948, #2957).
    These weren't in `[Unreleased]` to begin with; faithful
    promotion only moves what was already documented.
  - Adding a `docs/RELEASE-v1.39.1.md` file. The `docs/RELEASE-*.md`
    pattern in this repo is RC-only; stable patches historically
    don't have a counterpart.

The post-v1.39.1 hardening entries (#2980, #2983, #2987 from this
session, plus #2976 which was pre-skipped from the v1.39.1
cherry-pick set after #2980 landed) remain in the new
`[Unreleased]` section — they ship in the next release.

Closes #2989
2026-05-01 18:21:09 -04:00
Tom Boucher
9d5db87249 feat(#2975): adopt changeset-fragment workflow to eliminate CHANGELOG conflicts (#2978)
* feat(#2975): adopt changeset-fragment workflow to eliminate CHANGELOG conflicts

Two PRs that both edit `### Fixed` in CHANGELOG.md always conflict on merge.
This recently bit #2960/#2972 in the same session, costing a
fix-the-conflict-and-rebase round. Replace the shared-file model with per-PR
fragment files that never share lines.

Implementation built TDD per #2975, vertical slices with structured-IR
assertions throughout:

  scripts/changeset/parse.cjs       - fragment text → typed record + frozen
                                      FRAGMENT_ERROR enum (8 tests)
  scripts/changeset/render.cjs      - fragments → structured IR with
                                      Keep-a-Changelog section ordering
                                      (2 tests)
  scripts/changeset/serialize.cjs   - IR ↔ markdown round-trip pair
                                      (parse(serialize(ir)) === ir,
                                      3 tests)
  scripts/changeset/cli.cjs         - file-I/O wrapper with --json mode;
                                      reads .changeset/, folds into
                                      CHANGELOG.md, deletes consumed
                                      fragments. Idempotent. (1 test)
  scripts/changeset/lint.cjs        - pure verdict (changedFiles, labels)
                                      → { ok, reason } via LINT_REASON
                                      enum. Honors `no-changelog` label.
                                      (5 tests)
  scripts/changeset/new.cjs         - fragment scaffolder with random
                                      adjective-noun-noun filename. Tests
                                      assert via parseFragment round-trip.
                                      (3 tests)

Total: 22 tests, all assertions on typed structured fields. No regex on
text, no String#includes on file content. Lint clean across 356 test files.
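
To make the "fragment text → typed record + frozen enum" idea concrete, here is a minimal sketch. The fragment layout (a `---` frontmatter block with a single `type:` key), the error code names, and `serializeFragment` are all assumptions for illustration; the commit does not show the real format. The allowed types are the standard Keep a Changelog section names.

```javascript
'use strict';

// Keep a Changelog section names.
const ALLOWED_TYPES = Object.freeze([
  'Added', 'Changed', 'Deprecated', 'Removed', 'Fixed', 'Security',
]);

// Hypothetical codes; the real FRAGMENT_ERROR members are not shown in the commit.
const FRAGMENT_ERROR = Object.freeze({
  MISSING_FRONTMATTER: 'MISSING_FRONTMATTER',
  INVALID_TYPE: 'INVALID_TYPE',
  EMPTY_BODY: 'EMPTY_BODY',
});

// Assumed layout: "---\ntype: <Type>\n---\n<body>\n".
function parseFragment(text) {
  const m = /^---\n([\s\S]*?)\n---\n([\s\S]*)$/.exec(text);
  if (!m) return { ok: false, error: FRAGMENT_ERROR.MISSING_FRONTMATTER };

  const typeLine = m[1].split('\n').find((line) => line.startsWith('type:'));
  const type = typeLine ? typeLine.slice('type:'.length).trim() : '';
  if (!ALLOWED_TYPES.includes(type)) {
    return { ok: false, error: FRAGMENT_ERROR.INVALID_TYPE };
  }

  // Strip only one trailing newline so leading whitespace in the body survives;
  // trim() is used for the emptiness check alone.
  const body = m[2].endsWith('\n') ? m[2].slice(0, -1) : m[2];
  if (body.trim() === '') return { ok: false, error: FRAGMENT_ERROR.EMPTY_BODY };

  return { ok: true, fragment: { type, body } };
}

// Inverse of parseFragment for well-formed records.
function serializeFragment({ type, body }) {
  return `---\ntype: ${type}\n---\n${body}\n`;
}
```

Tests written against a parser like this can compare `result.error` to an enum member or deep-compare `result.fragment`, with no regex over the fragment text.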

Supporting:

  .changeset/README.md              - format spec + workflow docs
  .changeset/eager-hawks-rally.md   - dogfood fragment for THIS PR (will
                                      be the first thing the new release
                                      tool consumes)
  .github/workflows/changeset-required.yml
                                    - CI: every PR runs lint.cjs
  package.json                      - npm run changeset, changelog:render,
                                      lint:changeset
  CONTRIBUTING.md                   - new "CHANGELOG Entries — Drop a
                                      Fragment" section between PR
                                      Guidelines and Testing Standards

Closes #2975

* fix(#2975): address CodeRabbit findings on changeset workflow

7 valid findings (4 Major, 3 Minor); all addressed:

scripts/changeset/parse.cjs
  - Preserve fragment body verbatim. Previously body.trim() ate
    intentional leading whitespace (code blocks, etc.); now trim() is
    used only for the emptiness check, and a single trailing newline
    is stripped (the editor-added one) so well-formed fragments
    round-trip byte-for-byte. Added a regression test asserting a
    code-block-leading body is preserved.

scripts/changeset/cli.cjs
  - Validate flag values during argument parsing. parseArgs now returns
    { ok, opts | error }; rejects `--repo` etc. with no following value
    or with another flag as the value. main() surfaces the error
    message before exiting 2.
  - Handle post-write fragment-deletion failures. After CHANGELOG.md
    is written, any unlink failure is captured into a structured
    deleteFailures list with reason 'fail_fragment_delete'; cmdRender
    returns exitCode=1 with the partial-failure detail instead of
    leaving the changelog updated and fragments behind (which would
    cause double-consumption on rerun).

scripts/changeset/lint.cjs
  - Treat CHANGELOG.md as a linted user-facing path. Direct edits to
    CHANGELOG.md (the bypass route around the new workflow) now fail
    the lint with FAIL_MISSING_FRAGMENT. Added a regression test for
    that case.
  - Use cp.execFileSync instead of cp.execSync for the git diff call.
    Eliminates the shell-interpolation surface on GITHUB_BASE_REF;
    git's own arg parser remains the validator.

scripts/changeset/new.cjs
  - Atomic fragment creation. existsSync() + writeFileSync was racy
    under concurrent invocations. Now writeFileSync uses { flag: 'wx' }
    which fails EEXIST on collision; the random-name retry loop
    catches EEXIST and re-rolls. Throws explicitly after 16 attempts
    rather than silently overwriting.

.changeset/README.md
  - Add language tag `md` to the format example fence (markdownlint
    MD040).

All 25 changeset tests pass; lint clean (356 test files, 0 violations).
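
The atomic-creation fix for new.cjs relies on Node's exclusive-create flag. A minimal sketch of that retry loop, assuming a caller-supplied `nextName` generator (hypothetical; the real scaffolder picks a random adjective-noun-noun name) and the 16-attempt cap mentioned above:

```javascript
'use strict';
const fs = require('node:fs');
const path = require('node:path');

const MAX_ATTEMPTS = 16;

// Writes `content` to a fresh file in `dir`, never overwriting an existing one.
// `nextName` is a hypothetical callback returning a candidate base name.
function createFragmentAtomically(dir, content, nextName) {
  for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt += 1) {
    const file = path.join(dir, `${nextName()}.md`);
    try {
      // 'wx' opens for writing but fails with EEXIST if the path already exists,
      // so the existence check and the create happen in one step.
      fs.writeFileSync(file, content, { flag: 'wx' });
      return file;
    } catch (err) {
      if (err.code !== 'EEXIST') throw err; // unrelated failure: surface it
      // Name collision: loop and re-roll.
    }
  }
  throw new Error(
    `could not find an unused fragment name after ${MAX_ATTEMPTS} attempts`
  );
}
```

Compared with `existsSync()` followed by `writeFileSync()`, there is no window in which a concurrent process can create the same file between the check and the write.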

* fix(#2975): sanitize --type and validate flag values in new.cjs (CR fixes)

Two CR findings on scripts/changeset/new.cjs:

1. (Minor) `type` was embedded in frontmatter without sanitization. A
   newline in the value (e.g. `--type 'Fixed\ntype: Added'`) would
   corrupt the fragment. scaffoldFragment now validates `type` against
   the Keep-a-Changelog ALLOWED_TYPES set BEFORE writing — same set
   parse.cjs uses on consume. Throws with a typed error referencing
   the allowed values; tests cover the newline case + 4 other
   non-allowed values.

2. (Minor) `--repo` (and other value-taking flags) without a value
   silently set opts.repo to undefined, which produced a cryptic
   ERR_INVALID_ARG_TYPE deep inside path.join. parseArgs now mirrors
   the cli.cjs convention: returns { ok, opts | error }, validates
   that the next token exists and is not itself another flag, and
   surfaces a precise "missing value for --repo" message before exit.
   Added 3 tests: missing-trailing-value, flag-as-value, well-formed.

29 tests pass across the changeset suite (4 new regression tests).
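
The `{ ok, opts | error }` convention for parseArgs might look something like the sketch below. The flag set and the "unknown argument" branch are assumptions; only the "missing value for --repo" message and the missing-value and flag-as-value checks come from the commit text.

```javascript
'use strict';

// Hypothetical set of value-taking flags.
const VALUE_FLAGS = new Set(['--repo', '--type']);

function parseArgs(argv) {
  const opts = {};
  for (let i = 0; i < argv.length; i += 1) {
    const flag = argv[i];
    if (!VALUE_FLAGS.has(flag)) {
      return { ok: false, error: `unknown argument: ${flag}` };
    }
    const value = argv[i + 1];
    // Reject both a missing trailing value and another flag used as the value.
    if (value === undefined || value.startsWith('--')) {
      return { ok: false, error: `missing value for ${flag}` };
    }
    opts[flag.slice(2)] = value;
    i += 1; // skip the value we just consumed
  }
  return { ok: true, opts };
}
```

A caller can then print `result.error` and exit with a usage code instead of letting `undefined` travel into `path.join`.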
2026-05-01 18:12:20 -04:00
Tom Boucher
cb98a88139 fix(#2987): skip dry-run publish validation when version is already on npm (#2988)
The `Dry-run publish validation` step ran `npm publish --dry-run` with
no `if:` guard. `npm publish --dry-run` contacts the registry and
exits 1 with "You cannot publish over the previously published
versions" when the target version exists.

The earlier `Detect prior publish (reconciliation mode)` step already
discovers this case and sets steps.prior_publish.outputs.skip_publish=true.
The actual publish step (further down) is gated on that. The
rehearsal step was missing the gate, so any re-run of an
already-published hotfix blew up at the rehearsal before reaching
the reconciliation logic — exactly when an operator is trying to
recover from a later-step failure (merge-back, summary, etc.).

Add `if: ${{ steps.prior_publish.outputs.skip_publish != 'true' }}`
matching the publish step's gate. The rehearsal still runs on first
publishes where it has value.

Trigger: run 25233855236.

Closes #2987
2026-05-01 17:39:35 -04:00
Tom Boucher
fb92d1e596 fix(#2983): classifier exit-code discipline, base-tag staging, drop vestigial merge-back (#2984)
* fix(#2983): classifier exit-code discipline, base-tag staging, drop vestigial merge-back

Three issues surfaced by CodeRabbit's post-merge review of #2981 plus
a production failure on the v1.39.1 release run.

(1) Overloaded classifier exit code

scripts/diff-touches-shipped-paths.cjs reused exit 1 for both the
legitimate "no shipped paths" result and Node's default exit on
uncaught throw, so any classifier failure (corrupt package.json,
EPERM, etc.) was indistinguishable from a normal skip — the workflow's
`if ! ... ; then skip` idiom would silently drop the commit.

Distinct exit codes now:
  0  shipped       — at least one path is in the npm `files` whitelist
  1  not shipped   — CI / test / docs / planning only
  2  classifier error — workflow MUST fail-fast

uncaughtException + unhandledRejection + try/catch around fs/JSON
parsing all route to exit 2 with stderr context.
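
A minimal sketch of the three-way exit-code discipline, assuming a hypothetical `classify` function that takes the matching rule as a parameter (`isShippedPath`); the constant names are illustrative and may differ from what the script exports:

```javascript
'use strict';
const fs = require('node:fs');

// Distinct codes so "not shipped" can never be confused with a crash.
const EXIT = Object.freeze({ SHIPPED: 0, NOT_SHIPPED: 1, ERROR: 2 });

// `isShippedPath(files, changedPath)` is a hypothetical predicate supplied by the caller.
function classify(packageJsonPath, changedPaths, isShippedPath) {
  try {
    const pkg = JSON.parse(fs.readFileSync(packageJsonPath, 'utf8'));
    const files = Array.isArray(pkg.files) ? pkg.files : [];
    const shipped = changedPaths.some((p) => isShippedPath(files, p));
    return shipped ? EXIT.SHIPPED : EXIT.NOT_SHIPPED;
  } catch (err) {
    // Missing file, EPERM, malformed JSON: report and signal "classifier error".
    process.stderr.write(`classifier error: ${err.message}\n`);
    return EXIT.ERROR;
  }
}
```

A CLI wrapper would pass the return value to `process.exit` and, as the commit describes, also route `uncaughtException` and `unhandledRejection` to exit 2 so that Node's default exit 1 on a throw cannot masquerade as "not shipped".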

(2) Classifier missing at the base tag (CRITICAL)

`Prepare hotfix branch` runs `git checkout -b "$BRANCH" "$BASE_TAG"`
BEFORE the cherry-pick loop, replacing the working tree with the base
tag's contents. Base tags predating #2980 (notably v1.39.0, the most
likely next hotfix base) don't have scripts/diff-touches-shipped-paths.cjs
at all — `node <missing>` exits non-zero — `if !` skips every commit —
empty hotfix branch published. Strictly worse than the original #2980
push-rejection, which at least failed loudly.

Stage the classifier from the dispatched ref's working tree into
$RUNNER_TEMP at the top of the run script (before any working-tree-
mutating git command). The cherry-pick loop now references $CLASSIFIER
(staged) instead of the in-tree path. Sanity guards: refuse to start
if scripts/diff-touches-shipped-paths.cjs is missing in the dispatched
ref, refuse to proceed if cp didn't materialize $CLASSIFIER.

The cherry-pick loop captures node's exit via ${PIPESTATUS[1]} and
dispatches via explicit case:
  0  proceed with cherry-pick
  1  skip into NON_SHIPPED_SKIPPED
  *  emit ::error:: + exit "$CLASSIFIER_RC"

(3) Drop the merge-back PR step

Auto-cherry-pick only picks commits already on main (`git cherry HEAD
origin/main` outputs the unmerged ones; we filter fix:/chore: from
main). By construction every code commit on the hotfix branch is
already on main. The only hotfix-branch-only commit is `chore: bump
version to X.Y.Z for hotfix`, which either no-ops against main or
rewinds main's in-progress version. The merge-back PR was vestigial.

It also failed in production on run 25232968975 with `GitHub Actions
is not permitted to create or approve pull requests (createPullRequest)`
— org policy blocks PR creation from the workflow's GH_TOKEN. Even
without that block, the PR would have nothing useful to merge.

Step removed. The `pull-requests: write` permission granted solely
for the merge-back step has been dropped from the release job
(least-privilege).

Regression coverage

tests/bug-2983-classifier-exit-codes-and-base-tag-staging.test.cjs
adds 12 assertions across two describe blocks:

  - 5 classifier behavioral: exit 0/1 preserved, exit 2 on missing
    package.json, exit 2 on malformed JSON, exit-code constants
    exported.
  - 7 workflow contract: classifier staged before checkout, target
    is $RUNNER_TEMP, missing-source guard, missing-staged guard,
    PIPESTATUS-based dispatch, error branch fails workflow, loop uses
    staged path (not in-tree).

tests/bug-2980-hotfix-only-picks-shipping-changes.test.cjs updated
where it asserted the pre-#2983 `if ! ... ; then` shape: now accepts
the post-#2983 case-dispatch form. The test still proves the
classifier participates; bug-2983 enforces the specific shape.

Run summary references for the curious reviewer:

  - Run 25232010071 — original #2980 trigger (workflow-file push
    rejection)
  - Run 25232968975 — failed merge-back step that prompted the
    "is this even useful?" question that drove the removal

Closes #2983

* fix(#2983): address CodeRabbit findings on PR #2984

Two findings, both real, both fixed.

(1) [Critical] PIPESTATUS capture clobbered by `|| true`

Pre-fix shape:
  git diff-tree ... | node "$CLASSIFIER" || true
  CLASSIFIER_RC="${PIPESTATUS[1]}"

When the classifier exits 1 ("not shipped" — common case) or 2
(error), `|| true` triggers the right-hand side. `true` is a
one-command "pipeline" that overwrites PIPESTATUS to (0).
${PIPESTATUS[1]} on the next line is therefore unset (or stale
under set -u). The case dispatch then matched the empty string —
falling into `*)` and failing the workflow on every non-shipped
commit, OR matching `0)` after some shells default-init unset
to 0 and silently picking commits that don't ship.

Local repro confirms the issue:

  $ bash -c 'set -euo pipefail; false | sh -c "exit 7" || true; \
       echo "PIPESTATUS: ${PIPESTATUS[*]}"; \
       echo "[1]: ${PIPESTATUS[1]:-<unset>}"'
  PIPESTATUS: 0
  [1]: <unset>

Fix: bracket the pipeline in `set +e`/`set -e`, snapshot
PIPESTATUS into a local array on the very next line, then
dispatch on the snapshot:

  set +e
  git diff-tree ... | node "$CLASSIFIER"
  PIPE_RC=("${PIPESTATUS[@]}")
  set -e
  DIFFTREE_RC="${PIPE_RC[0]}"
  CLASSIFIER_RC="${PIPE_RC[1]}"

The snapshot must happen on the first line after the pipeline;
any intervening simple command resets PIPESTATUS. The array form
is invariant against that.

Bonus from the new shape: $DIFFTREE_RC is now also captured.
git diff-tree is unlikely to fail on a known-good $SHA, but if
it does, we no longer feed partial/empty input to the classifier
and call it "not shipped." A non-zero DIFFTREE_RC emits
::error::git diff-tree failed and exits.

(2) [Minor] Stale "Merge-back PR opened against main" summary line

The hotfix run summary still printed:
  echo "- Merge-back PR opened against main"

But the merge-back step itself was removed in the previous commit
on this branch. Operators reading the summary would expect a PR
that doesn't exist. Replaced with explicit non-action text:

  echo "- No merge-back PR (auto-picked commits are already on main)"

Test coverage

bug-2983 test file gains 3 assertions:
  - PIPE_RC array-snapshot pattern is required (regex matches the
    exact `PIPE_RC=("${PIPESTATUS[@]}")` form).
  - The `pipeline || true; ${PIPESTATUS[1]}` antipattern is
    explicitly forbidden via assert.doesNotMatch.
  - DIFFTREE_RC is captured from PIPE_RC[0] and a non-zero value
    triggers ::error::git diff-tree failed.
  - Run summary forbids `Merge-back PR opened against main` and
    requires the new non-action sentence.

bug-2964 test's loop-anchor window bumped 6 KB → 8 KB to
accommodate the additional pre-pick scaffolding (the test's own
comment had already anticipated this kind of growth, citing prior
precedents from #2970 and #2980).

Mark CodeRabbit comments resolved post-commit.

Refs CR finding ids 3175253571, 3175253578 on PR #2984.
2026-05-01 17:25:20 -04:00
Tom Boucher
7424271aa0 fix(#2980): hotfix cherry-pick only picks commits that change what ships (#2981)
* fix(#2980): pre-skip workflow-file cherry-picks in release-sdk hotfix loop

The default GITHUB_TOKEN issued to the release-sdk run lacks the
`workflow` scope, so the prepare job's `git push origin "$BRANCH"` is
rejected by GitHub when any cherry-picked commit modifies a file under
`.github/workflows/`:

  ! [remote rejected] hotfix/X.YY.Z -> hotfix/X.YY.Z
    (refusing to allow a GitHub App to create or update workflow ...
     without `workflows` permission)

Pre-#2980 behavior: the auto_cherry_pick loop happily picked
workflow-file commits, then the trailing push exploded with no clear
signal which commit was the culprit. v1.39.1 hit this on PR #2977
(run 25232010071) — earlier release-sdk fixes (#2965, #2967, #2970)
had been skipped on conflict so their workflow-file changes never
reached the push step, masking the bug; #2977 was the first
workflow-file commit to apply cleanly and the push immediately
exploded.

Fix: pre-pick guard in the cherry-pick loop. Inspect each candidate
commit's file list via `git diff-tree --no-commit-id --name-only -r`
BEFORE attempting the pick. If any path matches `^\.github/workflows/`,
skip the commit, emit a `::warning::` annotation naming the dropped
commit, and append to a new `WORKFLOW_SKIPPED` bucket. The run summary
surfaces this bucket in its own section, distinct from `CONFLICT_SKIPPED`
(real merge conflicts) and `POLICY_SKIPPED` (feat/refactor exclusions),
so operators reviewing the run never confuse the remediation paths.

The loud-warning piece is non-negotiable: silent drops were explicitly
rejected as a failure mode during the option-1/2/3 tradeoff discussion.
If a workflow-file fix genuinely needs to ship in a hotfix, the
operator applies it manually on the hotfix branch using a token with
`workflow` scope, or lands it on main and re-cuts the release.

Regression covered by tests/bug-2980-skip-workflow-file-cherrypicks.test.cjs
(5 assertions: pre-pick guard exists, uses `git diff-tree`, emits
`::warning::`, lands in dedicated bucket, surfaces in summary).

The bug-2964 test's 4 KB window after the cherry-pick-loop anchor was
nudged to 6 KB to accommodate the new pre-pick scaffolding — the test's
own comment had already anticipated this kind of growth (citing #2970's
merge-commit pre-skip as prior precedent).

Closes #2980

* refactor(#2980): replace workflow-file pre-skip with shipped-paths filter

The previous commit on this branch caught only the .github/workflows/*
subset of the bug, treating the symptom (push rejection on workflow-file
changes) rather than the root cause (the fix:/chore: filter is too broad
— it picks any commit with that conventional-commit type even when the
diff cannot affect the published npm package).

CI-only fixes (release-sdk.yml itself, hotfix tooling, test-only
commits) shouldn't flow through hotfix runs at all — they cannot change
what `npm install get-shit-done-cc@X.YY.Z` produces. The
.github/workflows/* push rejection is just the loudest of these
"shouldn't have been picked" cases; tests/, docs/, .planning/ commits
get picked silently with the same lack of effect on consumers.

Replace the workflow-file pre-skip with a shipped-paths filter:

- New scripts/diff-touches-shipped-paths.cjs reads package.json `files`,
  plus package.json itself (always-shipped per `npm pack` semantics),
  and exits 0 iff any input path is in the shipped set. Lockfile is
  not shipped (npm pack excludes it unless explicitly in `files`).
- Workflow loop now pipes `git diff-tree --no-commit-id --name-only -r`
  through the classifier; on exit 1 the commit is skipped and
  appended to a new NON_SHIPPED_SKIPPED bucket (replaces
  WORKFLOW_SKIPPED).
- Run summary surfaces NON_SHIPPED_SKIPPED as informational — no
  ::warning:: annotation. A non-shipping commit cannot affect the
  package, so a yellow alert would imply remediation is possible
  and would mislead operators.

The classifier lives in a separate .cjs file (rather than an inline bash
heredoc) so that its rules (directory-prefix vs exact-match,
package.json-always-shipped, lockfile-not-shipped) are unit-testable
in tests/bug-2980-hotfix-only-picks-shipping-changes.test.cjs (11 new
assertions: 4 static workflow + 6 classifier behavioral + 1 mixed-diff
edge case).
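
The matching rules named above (directory-prefix vs exact-match, package.json always shipped, lockfile not shipped) can be illustrated with a small sketch. Function names are hypothetical, and the sketch deliberately ignores glob patterns, which npm's `files` field also accepts.

```javascript
'use strict';

// Drop a leading "./" and any trailing slashes so "bin/" and "bin" compare equal.
function normalize(entry) {
  return entry.replace(/^\.\//, '').replace(/\/+$/, '');
}

function isShippedPath(filesField, changedPath) {
  // package.json is always included in the packed tarball.
  if (changedPath === 'package.json') return true;
  return filesField.map(normalize).some(
    // Exact match covers file entries; the "entry/" prefix covers directories
    // without letting "bin" match "binary/x".
    (entry) => changedPath === entry || changedPath.startsWith(`${entry}/`)
  );
}

// True if at least one changed path would end up in the published package.
function touchesShippedPaths(filesField, changedPaths) {
  return changedPaths.some((p) => isShippedPath(filesField, p));
}
```

Under these rules `package-lock.json` and `.github/workflows/release-sdk.yml` are classified as not shipped unless `files` lists them explicitly, so a workflow-only commit never reaches the cherry-pick.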

Why this dissolves the original push-rejection bug: workflow files
aren't in `files`, so workflow-only commits are skipped pre-pick.
The push step never sees them.

If a workflow-file fix genuinely needs to ship in a hotfix release
(extremely rare — the hotfix workflow is read from main's ref, not
the hotfix branch's), the operator applies it manually using a token
with `workflow` scope. The pre-skip puts that requirement in the run
summary explicitly.

Closes #2980
2026-05-01 16:59:49 -04:00
Tom Boucher
7a416b10d4 fix(#2976): allow same-version bump in release-sdk hotfix release job (#2977)
The release job's "Bump in-tree version (not committed)" step ran
`npm version "$VERSION" --no-git-tag-version` without --allow-same-version,
so on real hotfix runs it failed with `npm error Version not changed` —
because the prepare job had already committed the bump on the hotfix
branch (the release job checks out BRANCH on real runs vs BASE_TAG on
dry-runs, which is why dry-run never caught it).

Pass --allow-same-version to both bumps, matching release.yml:326.

Closes #2976
2026-05-01 16:32:18 -04:00
Tom Boucher
ef43f5161f fix(#2969): deterministic Step 5 verification gate for /gsd-reapply-patches (#2972)
* fix(#2969): deterministic Step 5 verification gate for /gsd-reapply-patches

The prior Step 5 "Hunk Verification Gate" was prescribed correctly in the
workflow text — but executed laxly by the LLM, which filled in `verified: yes`
without actually checking content presence. The reporter observed three
distinct files (skills/gsd-discuss-phase/SKILL.md, skills/gsd-autonomous/
SKILL.md, get-shit-done/workflows/new-project.md) where archives contained
substantive user-added blocks that did not survive into the merged result, yet
the gate reported clean.

Move verification from LLM-driven prose into a deterministic Node script the
workflow calls. The script can't be shortcut.

Changes:

- scripts/verify-reapply-patches.cjs (new): pure Node, no external deps.
  For each file in the patches dir, computes user-added significant lines as
  the line-set diff between backup and pristine baseline (when available;
  falls back to "every significant backup line" when no pristine — over-broad
  but the safe direction for this bug class). Asserts each line appears
  literally in the merged installed file via String.prototype.includes.
  Filters trivial lines (length < 12 chars, pure punctuation, decorative
  comments) so harmless drift doesn't trigger false failures. Exits 0 on
  pass, 1 on any miss with per-file diagnostic, 2 on usage error.
  Supports --json for workflow consumption.

- get-shit-done/workflows/reapply-patches.md: rewrite Step 5 to call the
  script and parse its JSON output. The Step 4 Hunk Verification Table
  remains as advisory Claude-readable summary, but the gate is now the
  script's exit code.

- tests/bug-2969-verify-reapply-patches.test.cjs (new): 6 tests covering
  (a) pass when every line survives, (b) fail when a line is missing,
  (c) fail when the merged file is deleted entirely, (d) --json structured
  report shape, (e) backup-meta.json is correctly skipped as metadata,
  (f) no-pristine-dir fallback exercises the safe over-broad path. All pass.
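
The core check the verifier performs can be sketched as below. This is an approximation under stated assumptions: the function names are hypothetical, `pristineText === null` stands in for "no pristine baseline available", and the significance filter covers only the length and pure-punctuation rules (the decorative-comment rule is omitted).

```javascript
'use strict';

const MIN_SIGNIFICANT_LENGTH = 12;

// A line counts as significant if it is long enough and contains
// at least one letter or digit (i.e. it is not pure punctuation).
function isSignificant(line) {
  const t = line.trim();
  if (t.length < MIN_SIGNIFICANT_LENGTH) return false;
  return /[A-Za-z0-9]/.test(t);
}

// Line-set diff: significant backup lines that are not in the pristine baseline.
// With no baseline, every significant backup line is treated as user-added.
function userAddedLines(backupText, pristineText) {
  const backupLines = backupText.split('\n').filter(isSignificant);
  if (pristineText === null) return backupLines;
  const pristine = new Set(pristineText.split('\n'));
  return backupLines.filter((line) => !pristine.has(line));
}

// Lines the user added that do not appear literally in the merged file.
function findMissingLines(backupText, pristineText, mergedText) {
  return userAddedLines(backupText, pristineText).filter(
    (line) => !mergedText.includes(line)
  );
}
```

An empty result corresponds to a passing file; any returned line would become part of the per-file diagnostic and drive a non-zero exit.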

Out of scope: the manifest-baseline tightening described in #2969 Failure 1
(saveLocalPatches comparing against the wrong baseline so prior silent wipes
poison subsequent updates). That's a separate, bigger architectural change
involving pristine-content infrastructure; this PR addresses the gate fidelity
half so users at least see the diagnostic when content goes missing.

Closes #2969 (partial — Failure 2 only)

* fix(#2969): preserve #1999 Hunk Verification Table assertions alongside new script gate

CI failure on PR #2972 surfaced that tests/reapply-patches.test.cjs (the
#1999 contract) asserts Step 5 references:
  - "Hunk Verification Table"
  - `verified: no` failure condition
  - explicit STOP/halt/abort directive
  - "table absent / missing" halt path

My initial Step 5 rewrite for #2969 substituted the deterministic script
for the table-based gate entirely, stripping those references. The script
is the strictly stronger gate, but the existing #1999 test enforces the
table-based safety net as a defense-in-depth contract.

Restore both gates as a layered Step 5:

  - 5a (binding): deterministic verifier script — script gate, exits
    non-zero on any miss, cannot be shortcut by the LLM
  - 5b (advisory): Hunk Verification Table review — preserved as
    redundant safety net for the case where the script has a bug or the
    pristine baseline is unavailable

Both gates must pass. Verified: tests/reapply-patches.test.cjs (5 tests
in the #1999 suite) and tests/bug-2969-verify-reapply-patches.test.cjs
(6 tests in the #2969 suite) all pass — 21/21 total in this fixture.

* fix(#2969): address CodeRabbit findings on workflow + script

Five CR findings on PR #2972, all valid; addressed in this commit:

1. (Major) Stderr was merged into VERIFY_OUTPUT via `2>&1`, so any Node
   warning, deprecation notice, or stack trace would corrupt the JSON
   parse downstream. Capture stdout only; stderr remains on the
   controlling terminal for operator visibility.

2. (Major) verifyFile() crashed with EISDIR/EACCES instead of producing
   a structured diagnostic when the installed path was a directory or
   unreadable. Wrap statSync/readFileSync in try/catch and emit a
   per-file fail row; the whole-run gate continues with structured
   output. Added test case asserting the directory-at-installed-path
   case fails with `not a regular file` diagnostic instead of crashing.

3. (Minor) PRISTINE_FLAG built as a single string + unquoted expansion
   would split paths with spaces. Switched to a bash array (VERIFY_ARGS)
   that preserves whitespace through expansion.

4. (Minor) Fenced code block missing language tag (markdownlint MD040).
   Added `text` tag to the error message block.

5. (Minor) Usage comment said pristine fallback was "backup-meta lookup"
   but the actual code path falls back to significant-line checks from
   backup content. Corrected the comment to match implementation.

Verified all 21 tests in tests/reapply-patches.test.cjs (#1999 contract)
+ tests/bug-2969-verify-reapply-patches.test.cjs (now 7 tests with the
new directory case) pass.

* test(#2969): structured JSON assertions, no substring matching on script output

Replace every assert.match(r.stdout, /pattern/) call with structured
assertions on the parsed JSON report from the script's own --json mode.
The script's --json contract IS the structured shape we test against —
the test author should never depend on the human-readable formatter
output, just as no test should depend on substring presence in source.

Changes:

  - All 7 tests now run the verifier with --json (via a runVerifier()
    helper) and parse the resulting JSON document into { status, report,
    stderr }. Diagnostic stderr is preserved as a separate channel for
    debug output but is not used for assertions.
  - Each previously substring-matched diagnostic ("Failures: 1",
    "not a regular file", "installed file missing after merge",
    file path, dropped line) is now a deepEqual / equal / Array.includes
    against typed report fields: report.failures, report.results[i].status,
    report.results[i].reason, report.results[i].file,
    report.results[i].missing[].
  - Added an explicit "documented shape" test asserting the JSON output
    has exactly the keys { file, missing, reason, status } per result —
    locks the public contract of the --json mode.
  - DRY'd up fixture reset into a resetFixture() helper since every test
    starts with a fresh patches/installed/pristine triple.

Linter: scripts/lint-no-source-grep.cjs reports 0 violations across 348
test files. Combined run of bug-2969-...test.cjs (7 tests) +
reapply-patches.test.cjs (5 tests in the #1999 suite) all pass —
22/22 in the relevant fixture.

* fix(#2969): typed REASON enum + raw-text-matching rule shipped repo-wide

This commit closes the loop on the no-source-grep discipline:

1. scripts/verify-reapply-patches.cjs:
   - Frozen REASON enum exposes the diagnostic surface as stable codes:
     OK_NO_USER_LINES_VS_PRISTINE, OK_NO_SIGNIFICANT_BACKUP_LINES,
     FAIL_INSTALLED_MISSING, FAIL_INSTALLED_NOT_REGULAR_FILE,
     FAIL_READ_ERROR, FAIL_USER_LINES_MISSING.
   - Each result.reason is now a code from this enum, not free text.
     Tests assert via REASON.X equality, not regex on prose.
   - REASON exported from module.exports.

2. tests/bug-2969-verify-reapply-patches.test.cjs:
   - Full rewrite. Every assertion on typed structured fields:
     report.results[0].status === 'fail',
     report.results[0].reason === REASON.FAIL_INSTALLED_NOT_REGULAR_FILE,
     report.results[0].missing.includes(droppedLine) (Array set membership,
     not String substring).
   - Locks the REASON enum surface via Object.keys(REASON).sort() deepEqual.
   - Locks the JSON report shape via Object.keys(report).sort() deepEqual.
   - Zero regex, zero String#includes, zero startsWith/endsWith on text.

3. CONTRIBUTING.md:
   - New section "Prohibited: Raw Text Matching on Test Outputs" with
     concrete BAD/GOOD examples (substring on file content; assert.match
     on stdout; "structured parser" hiding string ops; regex on free-form
     reason fields).
   - The rule statement: "Tests assert on typed structured values. If
     the code under test produces text, the code under test must also
     expose a structured intermediate representation, and the test must
     assert on that IR — never on the rendered text."
   - Required structured-surface table: file IR, --json mode, frozen
     enum, fs facts.
   - "Hiding grep behind a function is still grep" callout — the
     parser-wrapper anti-pattern.
   - New `pre-existing-text-matching` exemption category for the 8
     grandfathered files. Marked Transitional; new tests cannot use it.

4. scripts/lint-no-source-grep.cjs:
   - Three new patterns enforced (in addition to the existing .cjs-source
     readFileSync rule):
     - assert.match/doesNotMatch on .stdout/.stderr
     - .stdout/.stderr.<includes|startsWith|endsWith>(
     - readFileSync(...).<includes|startsWith|endsWith>(
   - Aggregated violations per file (multiple findings now report together).
   - Updated diagnostic message references both CONTRIBUTING.md sections.

5. 8 pre-existing tests annotated with `// allow-test-rule:
   pre-existing-text-matching` so the lint passes on this commit; each
   carries the prose "Tracked for migration to typed-IR assertions; do
   not copy this pattern." Files: bug-2649, bug-2687, bug-2796, bug-2838,
   bug-2943, graphify, hooks-opt-in, security-scan.

Verification: lint 0 violations across 348 test files; full suite passes.

* fix(#2969): rename exemption category to pending-migration-to-typed-ir + cite tracking issue

Per maintainer feedback:
1. "Grandfathered" / "legacy" framing is wrong — both terms imply
   permanent or condoned exemption. The 8 files are tracked for
   correction, not exempted.
2. Each annotated file must cite the tracking issue so the migration
   work is auditable.

Changes:
- CONTRIBUTING.md: rename exemption category from
  `pre-existing-text-matching` to `pending-migration-to-typed-ir`. Update
  prose to "Tracked for correction, not exempted" and require each
  annotation to cite the open migration issue (e.g.
  `// allow-test-rule: pending-migration-to-typed-ir [#NNNN]`).
- 8 test files: update annotation to cite #2974 (the tracking issue
  opened for migrating these files to typed-IR assertions).
2026-05-01 16:14:39 -04:00
Tom Boucher
e9a66da1e7 fix(#2962): write npm-style gsd-sdk shim on Windows under --sdk install (#2971)
* fix(#2962): write npm-style gsd-sdk shim on Windows under --sdk install

trySelfLinkGsdSdk previously contained `if (process.platform === 'win32')
return null;` — a missed gap from #2775's POSIX self-link rather than an
intentional design choice. As a result, `npx get-shit-done-cc@latest
--claude --global --sdk` on Windows left `gsd-sdk` off PATH despite the
installer reporting success, and the obvious recovery (`npm i -g
@gsd-build/sdk`) lands the stale 0.1.0 publication that lacks the `query`
subcommand the agents call ~40 times.

This PR addresses the shim half. The npm-publish half (publishing
@gsd-build/sdk at parity with the get-shit-done-cc version) requires
maintainer credentials and is left for separate action.

Changes:

- bin/install.js: replace the unconditional Windows return-null with
  dispatch to a new trySelfLinkGsdSdkWindows() that:
  * resolves npm's global bin via `execFileSync('npm', ['prefix', '-g'])`
    (no shell interpolation; npm is the only PATH-resolved binary)
  * verifies write access with a probe before producing partial state
  * writes the standard npm shim triple to npm's global bin:
    - gsd-sdk.cmd (cmd.exe; CRLF line endings)
    - gsd-sdk.ps1 (PowerShell)
    - gsd-sdk    (Bash wrapper for Cygwin/MSYS/Git-Bash)
  * each shim invokes `node "<absolute path to bin/gsd-sdk.js>"` with the
    passed args, decoupling shim location from SDK location — same logical
    structure as the POSIX wrapper-via-require() fallback above
  * unlinks any stale shims before writing so prior installs don't pin
    callers to a now-absent path
  * returns the .cmd path on success (the handle the existing onPath check
    looks for) or null on any failure, falling through to the existing
    "gsd-sdk is not on your PATH" warning at line 8704

- tests/bug-2962-windows-sdk-shim.test.cjs (new): 5 tests exercising
  trySelfLinkGsdSdkWindows directly with cp.execFileSync mocked to redirect
  npm prefix to a temp dir. Asserts shim contents reference the absolute
  path, .cmd uses CRLF, stale shims are replaced not appended, and null is
  returned when `npm prefix -g` fails.

- tests/no-unconditional-win32-skip.test.cjs (new): regression guard
  that fails CI if any future commit re-introduces
  `if (process.platform === 'win32') return null;` (or similar
  skip-only branches) in bin/install.js. Negative test verified by
  transiently re-introducing the bad pattern → guard fired → restored
  → passes.

Out of scope: publishing @gsd-build/sdk@<current> to npm so the natural
`npm i -g @gsd-build/sdk` recovery also lands a usable SDK. That requires
maintainer credentials and is the second half of the issue.

Closes #2962

* fix(#2962): address CodeRabbit findings — execSync for npm.cmd, behavior-based regression guard

CR finding 1 (🟠 Major): Node's child_process docs explicitly call out that
.cmd/.bat files cannot be spawned via execFile/execFileSync without a shell
("Spawning .bat and .cmd files on Windows" section). Since `npm` on Windows
is `npm.cmd`, my use of execFileSync('npm', ['prefix', '-g'], { shell: false })
would have failed on the very platform this PR is meant to fix.

Switched to cp.execSync('npm prefix -g', ...) — matching the existing
convention at line ~8718 which makes the same lookup. Args are static literals
so shell interpolation is not an injection vector.
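
A minimal sketch of the shell-based lookup described above. The helper name `resolveNpmPrefix` and its injectable `exec` parameter are assumptions for illustration, not the code in bin/install.js:

```javascript
const cp = require('node:child_process');

// Hypothetical helper: resolve npm's global prefix through a shell, so that
// `npm` can resolve to `npm.cmd` on Windows. `exec` is injectable for tests.
function resolveNpmPrefix(exec = cp.execSync) {
  try {
    // The command is a static literal, so no user input reaches the shell.
    const out = exec('npm prefix -g', {
      encoding: 'utf8',
      stdio: ['ignore', 'pipe', 'ignore'],
    });
    const prefix = String(out).trim();
    return prefix.length > 0 ? prefix : null;
  } catch {
    // npm missing or a non-zero exit: report "unknown" instead of throwing.
    return null;
  }
}
```

Injecting `exec` keeps the lookup testable without a real npm on PATH, which is roughly what mocking `cp.execSync` achieves in the tests.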

CR finding 2 (🟠 Major): the source-grep regression test in
tests/no-unconditional-win32-skip.test.cjs violated the repo's no-source-grep
testing standard (CONTRIBUTING.md). Replaced with a behavior-based test that:

  - overrides process.platform to 'win32' via Object.defineProperty
  - mocks cp.execSync to return a temp-dir as npm prefix
  - calls trySelfLinkGsdSdk(shimSrc) and asserts it returns non-null AND
    materializes gsd-sdk.cmd on disk

The behavior guard is strictly stronger than the regex version: it would
catch any equivalent skip pattern (e.g. os.platform() === 'win32', a
typeof-based guard, etc.), not just literal `if (process.platform === 'win32')`
text. Negative-tested by re-introducing the `return null` skip → test fails
with maintainer-quoted diagnostic "trySelfLinkGsdSdk must not silently
return null on Windows; a no-op skip is a missed-parity regression"; restored
→ passes.
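
The platform override used by the behavior guard can be sketched as follows. `withPlatform` is a hypothetical helper name; the actual test may inline the same steps:

```javascript
// Hypothetical test helper: run `fn` with process.platform overridden, then
// restore the original property descriptor even if `fn` throws.
function withPlatform(platform, fn) {
  const original = Object.getOwnPropertyDescriptor(process, 'platform');
  Object.defineProperty(process, 'platform', {
    value: platform,
    configurable: true,
    enumerable: true,
  });
  try {
    return fn();
  } finally {
    if (original) {
      Object.defineProperty(process, 'platform', original);
    } else {
      // Defensive: if the property was not an own property, drop the override.
      delete process.platform;
    }
  }
}
```

A guard test would then call the function under test inside `withPlatform('win32', ...)` and assert on its return value and on the files it created.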

Test for Windows shim materialization (bug-2962-windows-sdk-shim.test.cjs)
also updated to mock cp.execSync (matching the new production code path)
instead of cp.execFileSync.

Full suite: 6480/6480 pass.

* test(#2962): make Windows shim tests self-contained per CR

Each test now invokes trySelfLinkGsdSdkWindows() itself before reading
the shim files, so they don't implicitly depend on the earlier test's
side effects. Addresses CR's order-dependence finding.

* test(#2962): structured shim parsing — eliminate substring source-grep

CR found that even after the prior refactor, three tests in the suite
still used .includes()/.startsWith() against shim file content
(cmdContent.includes(`@node ${jsonQuoted} %*`) etc.). Substring matching
on file text is the same anti-pattern the no-source-grep standard
forbids — even when the file is one this test wrote — because it asserts
a literal exists rather than that the structured shape is correct.

Replace with three small parsers (parseCmdShim, parsePs1Invocation,
parseBashInvocation) that split each shim into header + invocation
tokens and assert via deepEqual on structured records. The assertions
now check that the .cmd has @ECHO OFF / @SETLOCAL / @node <abs> %* in
that order with exactly 3 meaningful lines, and that the .ps1 and bash
wrappers split into the expected (call, nodeCmd, target, argToken)
tuples.

The stale-shim replacement test was hardened the same way: instead of
proving the absence of a sentinel substring, it now proves the parsed
target equals the new shimSrc and != the old path.

Verified: scripts/lint-no-source-grep.cjs reports 0 violations across
348 test files. The 6-test windows-sdk-shim + win32-skip-guard suite
all pass.

* fix(#2962): expose pure shim IR + tests assert on typed fields, not rendered text

Earlier "structured parser" approach (parseCmdShim / parsePs1Invocation /
parseBashInvocation) was still raw-text manipulation behind a function
wrapper: split('\r\n'), trim().split(/\s+/), content.includes('\r\n').
Maintainer was right: hiding grep behind a parser is still grep.

Real fix: refactor production code to expose the structured intermediate
representation, and have tests assert on the IR fields directly.

Production:
- New buildWindowsShimTriple(shimSrc) — pure function, no fs/spawn.
  Returns { invocation: { interpreter, target }, eol: { cmd, ps1, sh },
  fileNames: { cmd, ps1, sh }, render: { cmd: () => string, ... } }.
  The IR is the contract; rendered text is an implementation detail of
  the renderers.
- trySelfLinkGsdSdkWindows now calls buildWindowsShimTriple, looks up
  filenames from triple.fileNames, and writes triple.render[kind]() to
  each target. Same observable behavior, structurally separated.
- buildWindowsShimTriple added to test-mode exports.

Tests (full rewrite — no shim file content is read at any point):
- Layer 1: pure-IR tests assert on triple.invocation.target,
  triple.eol === { cmd: '\r\n', ps1: '\n', sh: '\n' },
  triple.fileNames === { cmd: 'gsd-sdk.cmd', ... }, and the
  documented IR shape via Object.keys().sort() deepEqual.
- Layer 2: fs/spawn driver tests assert filesystem FACTS:
  - return value equals expected path
  - all three target files exist as regular non-empty files
  - rendered file byte length === Buffer.byteLength of the triple.render[kind]()
    output (proves the writer writes what the renderer produces: no
    mutation, no truncation, no double-write, and no content comparison)
  - mtime advances on rewrite (proves stale-replace behavior)
  - returns null when npm prefix -g throws

No more split, .includes, .startsWith, .endsWith, or substring matching
anywhere in the test suite. Lint clean. 10/10 tests pass.
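
A sketch of what a pure shim IR builder along these lines might look like. The top-level shape, the file names, and the three `.cmd` lines follow the commit text; the `.ps1` and sh renderer bodies are assumptions and may differ from bin/install.js:

```javascript
// Pure function: no fs, no spawn. Tests can assert on the returned fields
// instead of parsing rendered text.
function buildWindowsShimTriple(shimSrc) {
  const invocation = { interpreter: 'node', target: shimSrc };
  const eol = { cmd: '\r\n', ps1: '\n', sh: '\n' };
  const fileNames = { cmd: 'gsd-sdk.cmd', ps1: 'gsd-sdk.ps1', sh: 'gsd-sdk' };

  const { interpreter, target } = invocation;
  const lines = {
    cmd: ['@ECHO OFF', '@SETLOCAL', `@${interpreter} "${target}" %*`],
    // Assumed renderer bodies for PowerShell and POSIX-style shells.
    ps1: [`& ${interpreter} "${target}" $args`, 'exit $LASTEXITCODE'],
    sh: ['#!/bin/sh', `exec ${interpreter} "${target}" "$@"`],
  };

  // Each renderer joins its lines with the line ending recorded in the IR.
  const render = {};
  for (const kind of Object.keys(lines)) {
    render[kind] = () => lines[kind].join(eol[kind]) + eol[kind];
  }
  return { invocation, eol, fileNames, render };
}
```

A writer would iterate over `fileNames`, write `render[kind]()` to each path, and could compare the resulting file size against `Buffer.byteLength(render[kind]())`.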
2026-05-01 16:10:30 -04:00
Tom Boucher
b8d9bd69b2 fix(release-sdk): skip all cherry-pick conflicts in hotfix loop (full automation) (#2970)
* fix(release-sdk): skip all cherry-pick conflicts in hotfix loop

Full-automation policy: any conflict the cherry-pick can't auto-resolve
— context-missing (#2966) or real merge conflict — is now skipped, not
aborted. The hotfix run completes with whatever applies cleanly; the
SKIPPED list in the run summary becomes the operator's post-hoc review
queue.

Surfaced in run 25227493387 (1.39.1 dry-run): commit 0fb992d
("fix(git): add git.base_branch config") produced real conflicts in
config.cjs / ship.md / complete-milestone.md / tests/config.test.cjs.
v1.39.0 was tagged on the feat/hermes-runtime-2841 branch (#2920),
which restructured those files. 0fb992d was authored against the
pre-restructure shape, so cherry-pick can't auto-resolve.

Pre-#2968 behavior: the workflow distinguished context-missing (skip)
from real (abort + push partial + exit 1). Real conflicts blocked every
hotfix from a base tag whose lineage diverged from main — exactly the
v1.39.x situation. The user has called explicitly for full automation:
"this needs to be fully automated, no one is going to sit there and
tag fixes."

Behavior change:
  - Both classification branches now `git cherry-pick --skip` and
    append to SKIPPED with a reason category:
      * "context absent at base" — empty-HEAD markers (#2966)
      * "merge conflict — manual review" — non-empty HEAD (#2968)
  - Removed: `git cherry-pick --abort`, partial-state push,
    "Cherry-pick conflict" GITHUB_STEP_SUMMARY block, `exit 1`.
  - Operator's manual recovery path via `auto_cherry_pick=false`
    remains intact.

Trade-off (acknowledged in #2968): a critical fix can be silently
dropped if no one reviews the SKIPPED list. The release job's
install-smoke + full test suite still runs and would catch any
test-covered regression. Fixes that aren't test-covered could ship
missing — accepted cost of full automation per the issue.

Tests:
  - tests/bug-2968-cherry-pick-skip-on-any-conflict.test.cjs (new) —
    extracts the cherry-pick failure block via bash if/fi nesting walk
    (no raw-text grep) and asserts the abort path is removed, --skip
    is unconditional, and "merge conflict" + "context absent at base"
    annotations both exist.
  - tests/bug-2966-cherry-pick-context-missing.test.cjs (renamed
    describe + first test name) — assertions still valid since the
    classifier survives for skip-reason annotation.
  - tests/bug-2964-release-sdk-empty-cherry-pick.test.cjs — unchanged
    and still green.

Local: `node --test tests/bug-2964-...test.cjs tests/bug-2966-...test.cjs
tests/bug-2968-...test.cjs` → 8/8 pass.
Local: `npm run lint:tests` → 0 violations.

https://claude.ai/code/session_01LApueb9PVs2uSBhsLprVzG

* fix(release-sdk): split cherry-pick conflict skips from policy skips

CodeRabbit flagged on PR #2970 that conflict skips and policy skips
share the SKIPPED bucket. The run summary heading
"Skipped (feat/refactor/etc — not auto-included)" buries manual-review
conflicts (which the operator must triage) under the same list as
intentional policy exclusions (commits that don't match fix/chore by
design and need no action). Operators reviewing the summary can't
distinguish the two without reading every entry.

Split into two variables:
  - POLICY_SKIPPED — feat/refactor/docs/etc filtered out by the
    fix/chore regex (informational, no action needed)
  - CONFLICT_SKIPPED — fix/chore commits whose cherry-pick failed and
    were skipped per the full-automation policy (#2968) (manual
    review queue)

Run summary now emits two sections with distinct headings:
  - "Skipped — cherry-pick conflict (manual review)"
  - "Not auto-included (feat/refactor/docs/etc)"

The new bug-2968 test asserts both buckets are populated correctly:
  - failure path appends to CONFLICT_SKIPPED, not SKIPPED
  - both bucket variables are echoed in the summary
  - both section headings are present
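
The two-bucket split can be modelled roughly as below. This is a JavaScript sketch of logic that lives in a bash workflow step; the `FIX_OR_CHORE` pattern and the `tryPick` callback are assumptions, not the workflow's actual regex or cherry-pick call:

```javascript
// Assumed approximation of the workflow's fix/chore subject filter.
const FIX_OR_CHORE = /^(fix|chore)(\([^)]*\))?:/;

// `tryPick(commit)` is a hypothetical callback that returns { ok: true } or
// { ok: false, reason } in place of a real `git cherry-pick`.
function bucketCommits(commits, tryPick) {
  const picked = [];
  const policySkipped = [];   // informational, no action needed
  const conflictSkipped = []; // manual review queue
  for (const commit of commits) {
    if (!FIX_OR_CHORE.test(commit.subject)) {
      policySkipped.push(commit.sha);
      continue;
    }
    const result = tryPick(commit);
    if (result.ok) {
      picked.push(commit.sha);
    } else {
      conflictSkipped.push(`${commit.sha} (${result.reason})`);
    }
  }
  return { picked, policySkipped, conflictSkipped };
}
```

Keeping the reason next to each conflict entry is what lets the summary show why a commit needs review.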

Local: `node --test tests/bug-2964-...test.cjs tests/bug-2966-...test.cjs
tests/bug-2968-...test.cjs` → 9/9 pass.

https://claude.ai/code/session_01LApueb9PVs2uSBhsLprVzG

* fix(release-sdk): handle merge commits and guard cherry-pick --skip

CodeRabbit flagged a real major issue on PR #2970: merge commits with
fix:/chore: titles fail BEFORE entering cherry-pick state because they
need `-m <parent>` to specify the diff base. Without it, the cherry-pick
errors out and CHERRY_PICK_HEAD is never created. The unconditional
`git cherry-pick --skip` call that follows then fails too (no in-progress
cherry-pick to skip), bricking the loop — defeating the full-automation
policy this PR set out to deliver.

Two guards added:

1. Pre-skip merge commits before invoking cherry-pick. The loop checks
   parent count via `git rev-list --parents -n 1 "$SHA"`; if > 1, the
   commit goes straight to CONFLICT_SKIPPED with reason "merge commit —
   manual -m parent selection required". Operator decides which parent
   to keep when reviewing the run summary.

2. Guard `git cherry-pick --skip` with a CHERRY_PICK_HEAD existence
   check. Catches any other failure mode where the cherry-pick aborts
   before entering conflict state (unreadable commit, ref problems,
   etc.) so the loop still continues cleanly.
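
Both guards reduce to small predicates. The sketch below expresses them in JavaScript for illustration (the workflow itself is bash); the function names are hypothetical:

```javascript
const fs = require('node:fs');
const path = require('node:path');

// `git rev-list --parents -n 1 <sha>` prints "<sha> <parent1> [<parent2> ...]",
// so the parent count is the number of fields minus one.
function parentCount(revListLine) {
  const fields = revListLine.trim().split(/\s+/).filter(Boolean);
  return Math.max(fields.length - 1, 0);
}

function isMergeCommit(revListLine) {
  return parentCount(revListLine) > 1;
}

// Git keeps CHERRY_PICK_HEAD in the git directory while a cherry-pick is in
// progress; `--skip` is only meaningful when that file exists.
function cherryPickInProgress(gitDir) {
  return fs.existsSync(path.join(gitDir, 'CHERRY_PICK_HEAD'));
}
```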

Also bumped the bug-2964 test's regex slice window from 2000 to 4000
chars so the merge-commit pre-skip block doesn't push the cherry-pick
line out of the test's match range.

Tests added in tests/bug-2968-cherry-pick-skip-on-any-conflict.test.cjs:
  - merge-commit detection: workflow must call
    `git rev-list --parents -n 1 "$SHA"` before cherry-pick and annotate
    skips with the distinct "manual -m parent selection required"
    reason.
  - guard: failure block must check CHERRY_PICK_HEAD before --skip.

Local: `node --test tests/bug-2964-...test.cjs tests/bug-2966-...test.cjs
tests/bug-2968-...test.cjs` → 11/11 pass.

https://claude.ai/code/session_01LApueb9PVs2uSBhsLprVzG

* fix(release-sdk): guard awk classifier against degenerate unmerged paths

CodeRabbit raised two issues on PR #2970:

1. Major (workflow): the `awk` classifier runs under `set -euo pipefail`.
   If a CONFLICTED path is missing/unreadable, awk exits non-zero and
   terminates the entire step — bricking the loop on a degenerate file.
   Also, an unmerged path with no `<<<<<<< ` markers (path-level conflict
   or anomalous git state) was misclassified as "context absent at base"
   (the auto-skip path), letting potentially-real conflicts skip silently.

   Fix: before invoking awk, check `[ ! -r "$CONFLICTED" ]` and
   `grep -q '^<<<<<<< ' "$CONFLICTED"`. Either failure marks
   ALL_EMPTY_HEAD=false → REASON falls through to "merge conflict —
   manual review", landing the pick in the operator review queue.
   Also added `2>/dev/null || echo "real"` on the awk call so a
   transient awk failure can't slip into the auto-skip bucket.

2. Nitpick (tests): regex assertions on `failureBlock` could match
   commented lines (e.g. comment text mentioning "CONFLICT_SKIPPED"
   or "git cherry-pick --skip" satisfied the assertions without the
   real command being present).

   Fix: anchor with `^\s*...` + `m` flag so only executable shell lines
   count.
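
A small demonstration of why the anchor matters. The shell snippet is invented for illustration and is not taken from release-sdk.yml:

```javascript
// A failure block in which the command appears only inside a comment.
const commentOnly = [
  '# fall back to git cherry-pick --skip when a pick fails',
  'if [ -f "$GIT_DIR/CHERRY_PICK_HEAD" ]; then',
  '  echo "skipping"',
  'fi',
].join('\n');

// The same block with the real command added as an executable line.
const withCommand = commentOnly + '\n  git cherry-pick --skip\n';

const loose = /git cherry-pick --skip/;
const anchored = /^\s*git cherry-pick --skip/m;

console.log(loose.test(commentOnly));    // true: satisfied by the comment alone
console.log(anchored.test(commentOnly)); // false: no line starts with the command
console.log(anchored.test(withCommand)); // true: the executable line matches
```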

Plus a new test asserting all three workflow guards
(`[ ! -r "$CONFLICTED" ]`, `grep -q '^<<<<<<< '`, `awk ... || echo
"real"`) are present in the failure block.

Local: `node --test tests/bug-2964-...test.cjs tests/bug-2966-...test.cjs
tests/bug-2968-...test.cjs` → 12/12 pass.

https://claude.ai/code/session_01LApueb9PVs2uSBhsLprVzG

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-05-01 15:15:20 -04:00
Tom Boucher
0d25ef0c47 fix(release-sdk): skip cherry-picks whose target context is absent at base (#2967)
* fix(release-sdk): skip cherry-picks whose target context is absent at base

When auto_cherry_pick processed a fix:/chore: commit whose patch modified
code that didn't exist at the hotfix base tag — typically because the
surrounding infrastructure was added later in a feat/refactor commit
excluded by the filter — `git cherry-pick` failed with a conflict that
no operator could meaningfully resolve, and the loop bricked the run.

Discovered while re-running the 1.39.1 dry-run after #2965 merged: cherry-pick
of `a3467792` (the #2965 merge itself) failed because the auto_cherry_pick
block it modifies was added in #2956 ("Add automated cherry-pick + SDK-
bundle parity to hotfix flow") — an Add/feat commit, so the fix/chore
filter excludes it. v1.39.0 has no such block, so the patch had no
anchor.

The conflict is unmistakably distinguishable from a real content conflict:
git emits marker blocks where every `<<<<<<< HEAD ... =======` HEAD
section is empty (no anchor lines to reconcile against), while real
conflicts have content on both sides.

After cherry-pick fails:
  1. List unmerged paths via `git diff --diff-filter=U`.
  2. For each, scan conflict markers with awk. If every HEAD section is
     blank/whitespace-only across every block, classify as
     context-missing.
  3. Context-missing → `git cherry-pick --skip` and append to SKIPPED
     list with reason "(context absent at base)".
  4. Otherwise fall through to the existing abort/push-partial/error
     path that surfaces the conflict for operator resolution.
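
The classification in step 2 can be sketched in JavaScript as below. This is not the awk script from the workflow; it assumes the default "merge" conflict-marker style and, as a design choice of this sketch, treats a file with no conflict markers as "real":

```javascript
// Returns 'context-missing' when every conflict block has a blank or
// whitespace-only HEAD side, and 'real' otherwise.
function classifyConflict(text) {
  let inHead = false;
  let blocks = 0;
  let allHeadSidesEmpty = true;
  for (const line of text.split('\n')) {
    if (line.startsWith('<<<<<<< ')) {
      inHead = true; // start of the HEAD side
      blocks += 1;
      continue;
    }
    if (line.startsWith('=======')) {
      inHead = false; // HEAD side ends, incoming side begins
      continue;
    }
    if (inHead && line.trim() !== '') {
      allHeadSidesEmpty = false;
    }
  }
  return blocks > 0 && allHeadSidesEmpty ? 'context-missing' : 'real';
}
```

A mixed file, where one block has an empty HEAD side and another does not, classifies as 'real' because a single non-blank HEAD line clears the flag.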

Real conflicts still surface with the same workflow as before.

Tests in tests/bug-2966-cherry-pick-context-missing.test.cjs cover:
  - Static — extracts the "Prepare hotfix branch" run block via
    indentation-aware YAML parsing (no raw-text grep) and asserts the
    classification predicate, --skip call, and skipped-reason annotation
    are present.
  - Behavioral — synthetic repo reproducing the real shape of the
    failure, asserts cherry-pick exits non-zero and produces the
    empty-HEAD marker shape.
  - Predicate — pulls the awk script out of the deployed workflow and
    feeds it sample conflict shapes (empty-HEAD, real, mixed,
    whitespace-only); asserts each is classified the way the workflow
    will treat it.

Local: `node --test tests/bug-2966-...test.cjs` → 3/3 pass.
Local: `npm run lint:tests` → 0 violations.

https://claude.ai/code/session_01LApueb9PVs2uSBhsLprVzG

* fix(release-sdk): pin merge.conflictStyle=merge on hotfix cherry-pick

CodeRabbit flagged on #2967 that the awk classifier introduced for #2966
assumes default conflict-marker style (plain `<<<<<<< HEAD ... ======= ...
>>>>>>>`). If a runner has merge.conflictStyle=diff3 or zdiff3 set
(globally, in repo config, or through a change in git's defaults), the marker block gains an
extra `||||||| ancestor` section between HEAD and =======. The awk's
`in_head` mode would accumulate that ancestor content into the HEAD
buffer, and a context-missing conflict would misclassify as real —
sending the workflow into the abort path on a pick that should be
silently skipped.

Pass `-c merge.conflictStyle=merge` on the cherry-pick command itself
(scoped to that one git invocation; doesn't leak to other commands).
This guarantees marker shape regardless of the runner's git config.

Updated the existing static assertion in
tests/bug-2966-cherry-pick-context-missing.test.cjs to require the pin —
a future edit dropping it fails the test.

Local: `node --test tests/bug-2966-...test.cjs` → 3/3 pass.

https://claude.ai/code/session_01LApueb9PVs2uSBhsLprVzG

* test(#2964): allow git options between `git` and `cherry-pick`

The previous commit on this branch (d6530190) added
`git -c merge.conflictStyle=merge cherry-pick ...` to release-sdk.yml.
The bug-2964 static test's regex `/git cherry-pick[^\n]*"\$SHA"/`
required `cherry-pick` to be the literal next token after `git`, so it
no longer matched the line and CI failed on Node 22 / Node 24 / macOS.

Loosen to `/git\b[^\n]*?cherry-pick[^\n]*"\$SHA"/` so any options
between `git` and `cherry-pick` (e.g. `-c key=value`) are tolerated.
The flag assertions on the matched line still verify --allow-empty and
--keep-redundant-commits are present, which is what bug-2964 actually
guards.
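
The two patterns can be compared directly. The sample line is an assumed rendering of the workflow command; the real line may order its flags differently:

```javascript
// Assumed shape of the cherry-pick line after the conflictStyle pin.
const line =
  'git -c merge.conflictStyle=merge cherry-pick --allow-empty --keep-redundant-commits "$SHA"';

const strict = /git cherry-pick[^\n]*"\$SHA"/;
const loosened = /git\b[^\n]*?cherry-pick[^\n]*"\$SHA"/;

console.log(strict.test(line));   // false: `-c ...` sits between the two words
console.log(loosened.test(line)); // true

// The flag checks still run against the matched text.
const matched = loosened.exec(line)[0];
console.log(matched.includes('--allow-empty'));             // true
console.log(matched.includes('--keep-redundant-commits'));  // true
```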

Local: `node --test tests/bug-2964-...test.cjs tests/bug-2966-...test.cjs`
→ 5/5 pass.

https://claude.ai/code/session_01LApueb9PVs2uSBhsLprVzG

* test(#2966): pin merge.conflictStyle in test git wrapper, assert awk status

CodeRabbit raised two issues on PR #2967:

1. The synthetic-repo cherry-pick reproducer asserted `<<<<<<< HEAD ...`
   blocks have empty HEAD sections, but the cherry-pick itself didn't
   pin `merge.conflictStyle`. A developer or CI runner with global
   diff3/zdiff3 config would inject `||||||| ancestor` lines into the
   HEAD scan and the test would fail for environment reasons rather
   than the bug premise. Pin the style on the test's `git()` wrapper
   so every git operation in the test is deterministic regardless of
   user config.

2. `classify()` ran awk and consumed `r.stdout.trim()` without checking
   `r.status` or `r.error`. A failed awk invocation (missing binary,
   syntax error, signal) returns empty stdout, which would falsely
   classify as "context-missing" and the test would silently pass on
   broken predicates. Add `assert.ok(!r.error, ...)` and
   `assert.equal(r.status, 0, ...)` before reading stdout.
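
The second point generalizes to any `spawnSync` call whose stdout is interpreted. A minimal sketch, with `runChecked` as a hypothetical helper that throws where the test uses `assert.ok` and `assert.equal`:

```javascript
const { spawnSync } = require('node:child_process');

// Hypothetical helper: refuse to interpret stdout unless the child process
// actually started and exited with status 0.
function runChecked(cmd, args, input) {
  const r = spawnSync(cmd, args, { input, encoding: 'utf8' });
  if (r.error) {
    throw new Error(`${cmd} failed to spawn: ${r.error.message}`);
  }
  if (r.status !== 0) {
    throw new Error(`${cmd} exited with status ${r.status}: ${r.stderr}`);
  }
  return r.stdout.trim();
}
```

Without the two checks, a missing binary yields empty stdout, which is indistinguishable from a legitimately empty result.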

Local: `node --test tests/bug-2966-...test.cjs tests/bug-2964-...test.cjs`
→ 5/5 pass.

https://claude.ai/code/session_01LApueb9PVs2uSBhsLprVzG

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-05-01 14:35:18 -04:00
Tom Boucher
a346779213 fix(release-sdk): allow empty/redundant commits during hotfix cherry-pick (#2965) 2026-05-01 13:56:24 -04:00
Tom Boucher
0d6abb87ac fix(#2954): align help.md with post-#2824 skill consolidation (#2959) 2026-05-01 13:36:44 -04:00
Tom Boucher
c5dfdbe42e fix(#2957): claude+global post-install instructs restart and skill fallback (#2960)
* fix(#2957): claude+global post-install instructs restart and skill fallback

`npx get-shit-done-cc --claude --global` writes skills to
`~/.claude/skills/gsd-*/SKILL.md` (CC 2.1.88+ format) and removes the
legacy `~/.claude/commands/gsd/`. The post-install message still told
users to type `/gsd-new-project` without mentioning the required Claude
Code restart or the skill-name fallback. On configurations where CC
does not auto-surface skills in the slash menu, users hit "no commands
appear" and assumed the install failed.

Split the post-install message: the existing single-line instruction
stays for every non-Claude runtime and for `--claude --local`. For
`--claude --global` it now reads:

  Restart Claude Code, then in any directory either type
  /gsd-new-project or ask Claude to run the gsd-new-project skill.

This covers both invocation paths and surfaces the restart requirement.

Add tests/bug-2957-claude-global-postinstall-message.test.cjs as a
regression guard: captures the printed message for claude+global,
claude+local, and opencode+global; asserts content for each. Verified
the test fails on main (pre-fix) and passes after the fix.

Closes #2957

* test(#2957): assert legacy generic instruction is replaced not extended

CodeRabbit flagged that the test would still pass if the new restart/
fallback copy were printed *alongside* the old 'open a blank directory'
instruction. Adding a doesNotMatch assertion proves the claude+global
branch replaces the legacy line rather than appending to it.
2026-05-01 13:04:39 -04:00
javeroff
9d0d085a17 fix(query/agent-skills): emit raw <agent_skills> block instead of JSON-wrapped string (#2917)
* fix(query/agent-skills): emit raw <agent_skills> block instead of JSON-wrapped string

The CLI dispatcher (`cli.ts`) JSON-stringifies all query handler results via
`console.log(JSON.stringify(result.data, null, 2))`.  For the `agent-skills`
handler this produced a JSON-quoted string literal — e.g.
`"<agent_skills>\n…</agent_skills>"` — which workflows embedded verbatim via
`$(gsd-sdk query agent-skills gsd-planner)`, breaking all `<agent_skills>`
injection into spawned subagent prompts.

Fix: add an optional `format: 'json' | 'text'` field to `QueryResult`.  When a
handler returns `format: 'text'` and `--pick` is not active, the CLI writes the
string directly via `process.stdout.write` instead of JSON-stringifying it.
`agentSkills` sets `format: 'text'` for non-empty blocks.
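
The output branch might look roughly like this. It is a plain-JavaScript sketch of the TypeScript CLI path; `emitResult` and its injectable `write` and `log` options are assumptions for illustration:

```javascript
// `result` mirrors the QueryResult shape described above: { data, format? }.
// `write` and `log` are injectable so the sketch can run without a real CLI.
function emitResult(result, options = {}) {
  const {
    pick = false,
    write = (s) => process.stdout.write(s),
    log = console.log,
  } = options;

  if (result.format === 'text' && !pick && typeof result.data === 'string') {
    // Raw passthrough: no JSON quoting, no escaped newlines.
    write(result.data);
    return;
  }
  log(JSON.stringify(result.data, null, 2));
}
```

With this split, `$(gsd-sdk query agent-skills gsd-planner)` would receive the block itself and not a quoted string literal.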

Regression guard: two new CLI integration tests in `skills.test.ts` spawn the
CLI as a child process and assert that (a) a mapped agent type receives the raw
XML block on stdout and (b) an unmapped agent type produces the existing JSON
empty-string output.

Fixes #2914.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(changelog): add #2917 entry under Unreleased Fixed

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 12:21:06 -04:00
Tom Boucher
53cda93a01 Add automated cherry-pick + SDK-bundle parity to hotfix flow (#2956)
* feat(workflows): hotfix auto-cherry-pick + SDK-bundle parity (#2955)

hotfix.yml:
- create: auto-cherry-picks fix:/chore: commits from origin/main since
  BASE_TAG, oldest-first. Patch-equivalents skipped via git cherry.
  feat:/refactor: never auto-included. Conflicts halt with offending SHA.
- finalize: install-smoke gate, sdk-bundle/gsd-sdk.tgz parity with
  release-sdk.yml, tightened next dist-tag re-point, --latest on gh
  release create. SDK package.json bumped in lockstep.

release-sdk.yml:
- New action input (publish | hotfix) and auto_cherry_pick boolean.
- New prepare job branches hotfix/X.YY.Z from highest vX.YY.* tag,
  cherry-picks same logic as hotfix.yml, outputs effective ref.
- install-smoke and release consume prepare.outputs.ref.
- Hotfix mode forces tag=latest, opens merge-back PR. Idempotent if
  branch already exists.

VERSIONING.md: documents the cumulative-tag invariant
(vX.YY.Z anchors vX.YY.{Z+1}) and both workflow paths.

Closes #2955

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(code-review): wire --fix dispatch and update stale command references (#2947)

* fix(#2893): surface non-canonical plan filenames instead of silently returning zero plans

Reporter saw `plan_count: 0` from `/gsd:execute-phase` even though five
plan files existed on disk. Investigation showed the planner had written
files like `01-PLAN-01-foundation.md`, while `phase-plan-index`'s strict
filter (`f.endsWith('-PLAN.md') || f === 'PLAN.md'`) rejected them
silently — collapsing two distinct states into the same `plans: []`
return:

  - directory truly has no plans (legit empty)
  - directory has plans but the filter rejected them (user/agent error)

The canonical contract is documented in three places:
  - `agents/gsd-planner.md` write_phase_prompt step (lines 1063-1080)
  - `commands/gsd/plan-phase.md`
  - `references/universal-anti-patterns.md` (rule 26)

It mandates `{padded_phase}-{NN}-PLAN.md` and explicitly forbids
`PLAN-NN.md` / `01-PLAN-01.md` / `plan-NN.md` etc. The strict filter is
correct per that contract. The bug is that the executor never tells the
user when the contract was violated — they just see `plan_count: 0`
with no signal.

Fix: add a diagnostic helper `describeNonCanonicalPlans()` that scans
the phase directory for files matching `*PLAN*.md` (the diagnostic net)
that the canonical filter rejected, excluding legit derivatives like
`*-PLAN-OUTLINE.md` and `*-PLAN.pre-bounce.md`. When offenders exist,
return a `warning` field naming each one and citing the canonical
pattern so the user knows what to rename to.
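
A sketch of the diagnostic under simplifying assumptions: it takes a list of filenames where the real helper scans the phase directory, and the warning wording is invented. Only the canonical filter and the two excluded derivative suffixes come from the text above:

```javascript
// Canonical filter, as quoted in the commit message.
const isCanonicalPlan = (f) => f.endsWith('-PLAN.md') || f === 'PLAN.md';

// Legitimate derivatives that must not be reported.
const isDerivative = (f) =>
  f.endsWith('-PLAN-OUTLINE.md') || f.endsWith('-PLAN.pre-bounce.md');

// Returns a warning string naming each offender, or null when there is none.
function describeNonCanonicalPlans(files) {
  const offenders = files.filter(
    (f) =>
      f.includes('PLAN') &&
      f.endsWith('.md') &&
      !isCanonicalPlan(f) &&
      !isDerivative(f)
  );
  if (offenders.length === 0) return null;
  return (
    `Ignored non-canonical plan files: ${offenders.join(', ')}. ` +
    'Expected {padded_phase}-{NN}-PLAN.md.'
  );
}
```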

Wired into the three filter sites:
  - `phase-plan-index` (the executor's main entry point)
  - `phases list --type plans`
  - `find-phase`

The strict filter itself is unchanged — existing canonical plans behave
identically. This is purely a diagnostic that converts silent-empty
into loud-with-actionable-error.

Tests:
  - `phase-plan-index returns warning for reporter's exact filename
    pattern (`01-PLAN-01-foundation.md`)`
  - `truly empty dir does not emit a warning`
  - `canonical plans + outline + pre-bounce files do not emit a warning`

Closes #2893

* test(#2893): add parity tests for find-phase and phases list --type plans warnings

CodeRabbit's only finding on the prior commit: I wired the warning into
three filter sites (`phase-plan-index`, `find-phase`,
`phases list --type plans`) but only `phase-plan-index` had test
coverage for the warning shape. The other two paths could silently
diverge during future refactors — exactly the silent-drift class of bug
this fix exists to prevent.

Add four parity tests mirroring the existing two:

  - find-phase: non-canonical filenames produce a warning naming each
    offender + citing the canonical pattern.
  - find-phase: canonical plan + derivative files (PLAN-OUTLINE,
    pre-bounce) produce no warning.
  - phases list --type plans: same non-canonical case, but assert the
    warning is prefixed with `${dir}: ` (this path aggregates across
    phase directories so each offender is tagged with its dir).
  - phases list --type plans: canonical case, no warning.

`node --test tests/phase.test.cjs`: 98/98 pass (was 94, +4 new).

* docs(changelog): hotfix flow auto-cherry-pick + SDK bundle parity (#2955)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(workflows): address CodeRabbit findings on hotfix flow (#2955)

5 findings, all real:

1. BASE_TAG selection used lexicographic awk compare, breaking on
   multi-digit patches (v1.27.10 wrongly < v1.27.2). Fixed in both
   hotfix.yml and release-sdk.yml: append TARGET_TAG to candidate list,
   sort -V, take preceding entry. Semver-correct.

2,4. Cherry-pick conflict aborted locally with no remote branch to
   resolve from. Now the skeleton branch is pushed up-front (real runs);
   on conflict we abort, push the partial-pick state with
   --force-with-lease, and emit operator instructions in the run summary.

3. release-sdk.yml dry_run exited before cherry-pick, defeating the
   purpose. Now dry_run still applies cherry-picks locally (catches
   conflicts), just skips push. Downstream install-smoke runs against
   BASE_TAG; the cherry-pick verification itself is the dry-run signal.

5. release-sdk.yml release job missing pull-requests: write — gh pr
   create for the merge-back PR would have failed under restricted
   token defaults. Permission added.
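
Finding 1 can be illustrated with a numeric tag comparison. This is a JavaScript stand-in for the `sort -V` approach, assuming tags of the plain `vX.Y.Z` form; `baseTagFor` is a hypothetical name:

```javascript
// Parse "vX.Y.Z" into [X, Y, Z]; other tag shapes are rejected.
function parseTag(tag) {
  const m = /^v(\d+)\.(\d+)\.(\d+)$/.exec(tag);
  if (!m) throw new Error(`unsupported tag: ${tag}`);
  return m.slice(1).map(Number);
}

// Numeric, component-wise comparison (unlike a plain string compare).
function compareTags(a, b) {
  const pa = parseTag(a);
  const pb = parseTag(b);
  for (let i = 0; i < 3; i += 1) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

// Append the target to the candidates, sort, and take the preceding entry.
function baseTagFor(targetTag, existingTags) {
  const sorted = [...new Set([...existingTags, targetTag])].sort(compareTags);
  const idx = sorted.indexOf(targetTag);
  return idx > 0 ? sorted[idx - 1] : null;
}

console.log('v1.27.10' < 'v1.27.2');                  // true: the lexicographic trap
console.log(compareTags('v1.27.10', 'v1.27.2') > 0);  // true: numeric order
```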

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(workflows): CR round 2 — dry-run signal + post-publish reconciliation (#2955)

3 findings, all real:

6. hotfix.yml create dry_run skipped every step (branch creation,
   cherry-pick, version bump) — a green dry-run gave no signal at all.
   Now the local checkout/cherry-pick/bump always runs; only the git
   push calls are gated on dry_run. Conflicts surface in dry-run too.

7,8. "Refuse if version already on npm" preflight hard-failed reruns,
   so a transient failure between npm publish and a later step (tag
   push, GH release, merge-back PR, dist-tag re-point) left the release
   half-shipped with no path to reconcile. Replaced with a
   prior_publish detect step that warns and sets skip_publish=true; the
   publish step is gated on that flag, but tag/release/PR/dist-tag
   continue. GitHub Release create is now idempotent (edit --latest if
   already exists).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(workflows): CR round 3 — preserve dry-run cherry-pick history in conflict guidance (#2955)

Dry-run conflict path discarded successful picks with the runner, but
the message told operators to rerun with auto_cherry_pick=false — which
recreates the branch from BASE_TAG and silently loses every pick that
had succeeded before the conflict.

Updated both hotfix.yml and release-sdk.yml: dry-run conflict summary
now lists the lost SHAs and recommends re-running with
auto_cherry_pick=true (real, not dry-run) to materialize the partial
branch on origin. Real-run guidance unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 11:51:45 -04:00
Tom Boucher
ec07861228 fix(#2948): wire spike --wrap-up flag dispatch (#2951)
* fix(#2948): wire spike --wrap-up flag dispatch

Add dispatch block to commands/gsd/spike.md so that /gsd-spike --wrap-up
routes to the spike-wrap-up workflow instead of silently no-oping. Also
add spike-wrap-up.md to execution_context so the runtime can load it, and
update both companion references in workflows/spike.md from the deleted
/gsd-spike-wrap-up entry-point to /gsd-spike --wrap-up.

Fixes #2948

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(#2948): rewrite dispatch test using parseFrontmatter + section extraction

Replace raw fs.readFileSync + text.includes() / regex assertions with structural
parsing: parseFrontmatter extracts the YAML frontmatter fields and _body,
extractSection pulls named XML blocks, and parseExecutionContextRefs resolves
the @-prefixed workflow references. Assertions now target the argument-hint
frontmatter field, the execution_context @-ref list, and the routing text within
<context>/<process> sections — not arbitrary substrings in the raw file.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(#2948): tighten dispatch assertion to line-level rule check

Replace the co-occurrence check (dispatchText.includes('--wrap-up') &&
dispatchText.includes('spike-wrap-up')) with line-level assertions that parse
the <process> section's rules array, find the exact '- If it is `--wrap-up`:'
line, verify it includes 'strip the flag' and 'spike-wrap-up', and assert the
'- Otherwise:' fallback still routes to the spike workflow.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(#2948): anchor parseFrontmatter to line 0 to avoid mid-file --- delimiters

parseFrontmatter was scanning the whole file for the first two '---' lines,
which can match a mid-document horizontal rule as the opening delimiter.
Now requires lines[0].trim() === '---'; returns { _body: content } for files
with no frontmatter, and searches for the closing '---' from line 1 onward.
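
A minimal sketch of the anchored parsing described above, assuming a flat
`key: value` frontmatter. The field regex and the handling of an unterminated
block are illustrative assumptions, not the helper as committed.

```javascript
// Hypothetical sketch; the real test helper may differ in detail.
function parseFrontmatter(content) {
  const lines = content.split(/\r?\n/);

  // Frontmatter must open on the very first line, so a mid-document
  // horizontal rule can never be mistaken for the opening delimiter.
  if (lines[0].trim() !== '---') {
    return { _body: content };
  }

  // Search for the closing delimiter from line 1 onward.
  let close = -1;
  for (let i = 1; i < lines.length; i++) {
    if (lines[i].trim() === '---') {
      close = i;
      break;
    }
  }
  // Assumption: an unterminated block is treated as "no frontmatter".
  if (close === -1) return { _body: content };

  const fields = {};
  for (const line of lines.slice(1, close)) {
    // Assumption: simple single-line `key: value` pairs only.
    const m = /^([A-Za-z0-9_-]+):\s*(.*)$/.exec(line);
    if (m) fields[m[1]] = m[2].trim();
  }
  fields._body = lines.slice(close + 1).join('\n');
  return fields;
}
```

With this shape, a file that starts with a heading and later contains `---`
comes back as `{ _body: content }` unchanged.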

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 11:25:26 -04:00
Tom Boucher
3ba17e872e fix(#2950): update stale deleted-command references in workflow files (#2952)
* fix(#2950): update stale deleted-command references in workflow files

Eight workflow files (help.md, do.md, settings.md, discuss-phase.md,
new-project.md, plan-phase.md, spike.md, sketch.md) referenced command
names removed in #2790. Updated all occurrences to canonical new forms:
  /gsd-phase (--insert / --remove), /gsd-capture, /gsd-config (--profile
  / --integrations / --advanced), /gsd-spike --wrap-up,
  /gsd-sketch --wrap-up, /gsd-code-review --fix.
Adds regression test (124 assertions) in tests/bug-2950-stale-command-refs.test.cjs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(#2950): update pre-existing assertions to accept new consolidated command forms

gsd-settings-advanced.test.cjs and settings-integrations.test.cjs were checking
settings.md for the old micro-skill names (/gsd-settings-advanced,
/gsd-settings-integrations). Now that #2950 updates settings.md to use the
consolidated equivalents, broaden the assertions to accept both old and new forms.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(#2950): require canonical command forms and forbid legacy variants

The broadened OR assertions added to unblock CI were too permissive — they
could pass with legacy names still present. Now assert the canonical form is
present (gsd-config --advanced / gsd-config --integrations) AND the legacy
forms are absent (gsd-settings-advanced, gsd:settings-advanced,
/gsd-settings-integrations).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 11:25:10 -04:00
Tom Boucher
4d628b306a fix(#2949): wire sketch --wrap-up flag dispatch (#2953)
* fix(#2949): wire sketch --wrap-up flag dispatch

Add dispatch logic to commands/gsd/sketch.md so --wrap-up routes to the
sketch-wrap-up workflow instead of silently falling through to the normal
sketch workflow. Also adds sketch-wrap-up.md to execution_context and
updates companion references in workflows/sketch.md from the deleted
/gsd-sketch-wrap-up command to /gsd-sketch --wrap-up.

Fixes #2949

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(#2949): use exact-match "If it is" instead of "If it contains" for --wrap-up dispatch

Aligns with the established pattern across all consolidated commands
(workspace.md, update.md, progress.md) where the first-token check uses
"If it is `--flag`" for exact equality, not substring matching.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 11:06:24 -04:00
Tom Boucher
b328f3269f fix(code-review): wire --fix dispatch and update stale command references (#2947)
* fix(#2893): surface non-canonical plan filenames instead of silently returning zero plans

Reporter saw `plan_count: 0` from `/gsd:execute-phase` even though five
plan files existed on disk. Investigation showed the planner had written
files like `01-PLAN-01-foundation.md`, while `phase-plan-index`'s strict
filter (`f.endsWith('-PLAN.md') || f === 'PLAN.md'`) rejected them
silently — collapsing two distinct states into the same `plans: []`
return:

  - directory truly has no plans (legit empty)
  - directory has plans but the filter rejected them (user/agent error)

The canonical contract is documented in three places:
  - `agents/gsd-planner.md` write_phase_prompt step (lines 1063-1080)
  - `commands/gsd/plan-phase.md`
  - `references/universal-anti-patterns.md` (rule 26)

It mandates `{padded_phase}-{NN}-PLAN.md` and explicitly forbids
`PLAN-NN.md` / `01-PLAN-01.md` / `plan-NN.md` etc. The strict filter is
correct per that contract. The bug is that the executor never tells the
user when the contract was violated — they just see `plan_count: 0`
with no signal.

Fix: add a diagnostic helper `describeNonCanonicalPlans()` that scans
the phase directory for files matching `*PLAN*.md` (the diagnostic net)
that the canonical filter rejected, excluding legit derivatives like
`*-PLAN-OUTLINE.md` and `*-PLAN.pre-bounce.md`. When offenders exist,
return a `warning` field naming each one and citing the canonical
pattern so the user knows what to rename to.

Wired into the three filter sites:
  - `phase-plan-index` (the executor's main entry point)
  - `phases list --type plans`
  - `find-phase`

The strict filter itself is unchanged — existing canonical plans behave
identically. This is purely a diagnostic that converts silent-empty
into loud-with-actionable-error.
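
The diagnostic can be sketched roughly as follows. This is an illustration
under stated assumptions: it takes a list of filenames rather than scanning a
directory, and the warning wording and regex are hypothetical. Only the
canonical filter expression and the two excluded derivative suffixes come from
the description above.

```javascript
// Canonical filter, as quoted in the commit message.
const isCanonicalPlan = (f) => f.endsWith('-PLAN.md') || f === 'PLAN.md';

// Legit derivatives named in the message; matching by suffix is an assumption.
const isKnownDerivative = (f) =>
  f.endsWith('-PLAN-OUTLINE.md') || f.endsWith('-PLAN.pre-bounce.md');

// Hypothetical signature: the real helper scans the phase directory itself.
function describeNonCanonicalPlans(files) {
  const offenders = files.filter(
    (f) =>
      /PLAN.*\.md$/.test(f) && // the `*PLAN*.md` diagnostic net
      !isCanonicalPlan(f) &&
      !isKnownDerivative(f)
  );
  if (offenders.length === 0) return null; // truly empty or fully canonical

  // Hypothetical wording: names each offender and cites the canonical pattern.
  return (
    `Found non-canonical plan file(s): ${offenders.join(', ')}. ` +
    'Expected {padded_phase}-{NN}-PLAN.md.'
  );
}
```

Returning `null` for the clean case keeps the two states distinguishable: an
empty plan list with no warning means the directory really has no plans.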

Tests:
  - `phase-plan-index returns warning for reporter's exact filename
    pattern (`01-PLAN-01-foundation.md`)`
  - `truly empty dir does not emit a warning`
  - `canonical plans + outline + pre-bounce files do not emit a warning`

Closes #2893

* test(#2893): add parity tests for find-phase and phases list --type plans warnings

CodeRabbit's only finding on the prior commit: I wired the warning into
three filter sites (`phase-plan-index`, `find-phase`,
`phases list --type plans`) but only `phase-plan-index` had test
coverage for the warning shape. The other two paths could silently
diverge during future refactors — exactly the silent-drift class of bug
this fix exists to prevent.

Add four parity tests mirroring the existing two:

  - find-phase: non-canonical filenames produce a warning naming each
    offender + citing the canonical pattern.
  - find-phase: canonical plan + derivative files (PLAN-OUTLINE,
    pre-bounce) produce no warning.
  - phases list --type plans: same non-canonical case, but assert the
    warning is prefixed with `${dir}: ` (this path aggregates across
    phase directories so each offender is tagged with its dir).
  - phases list --type plans: canonical case, no warning.

`node --test tests/phase.test.cjs`: 98/98 pass (was 94, +4 new).
2026-05-01 10:28:05 -04:00
Tom Boucher
e2792536d9 feat(workflows): atomic Write+commit ordering for SUMMARY.md (#2806) (#2939)
* feat(workflows): add atomic Write+commit ordering directive for SUMMARY.md

Adds explicit prompt-ordering language to executor spawn prompts and
plan-execution steps so agents commit SUMMARY.md before emitting any
concluding narrative. Mitigates the truncation-between-Write-and-commit
failure mode that has made the #2070 rescue net load-bearing.

Refs #2806

* fix(workflows): condense REQUIRED ORDER blocks to fit XL budget

The two REQUIRED ORDER directives added in bd1956df pushed
execute-phase.md to 1712 lines, exceeding the 1700-line XL budget.

Collapse each 6-line block into a single line that preserves the
semantic intent (Write SUMMARY.md → commit → narration; no text
between Write and commit; #2070 rescue is not primary defense).

File is now exactly 1700 lines; workflow-size-budget test passes.

* fix(execute-plan): move self-check before commit to preserve atomic Write+commit (#2939)
2026-05-01 09:32:21 -04:00
Tom Boucher
7cc6358f91 fix(install): honour --minimal across every runtime + manifest fix for Claude local (#2940)
* fix(install): record commands/gsd in manifest for Claude local + per-runtime --minimal coverage

writeManifest recorded commands/gsd/ only for Gemini, leaving Claude
Code local installs with an incomplete manifest. An audit during the #2923
investigation showed every runtime adapter correctly honours --minimal
on disk (6 skills, 0 agents) — but Claude local manifest reported 0
skills, breaking saveLocalPatches() drift detection and any downstream
tooling that reads manifest.files for the installed surface.

Drop the isGemini gate so any runtime that writes commands/gsd/ has
those files hashed into the manifest.

Adds tests/install-minimal-all-runtimes.test.cjs: spawns the installer
end-to-end for all 14 supported runtimes in both --global and --local
modes, parses the manifest JSON, and asserts mode === 'minimal',
skill set equals MINIMAL_SKILL_ALLOWLIST, and zero gsd-* agents are
recorded. Cross-checks the manifest against on-disk skill files.

Closes #2923

* test(install): address CR feedback on bug-2923 minimal-runtime tests

- Assert installer exit status in runInstall() so failing installs do not
  produce misleading downstream artifact assertions; include stderr in the
  failure message for debuggability.
- Guard the on-disk vs manifest parity loop with assert.ok(manifest, ...)
  so the equality check cannot pass accidentally when the manifest is
  missing.
2026-05-01 09:23:20 -04:00
Tom Boucher
8de8acee46 fix(workflows): assert HEAD on per-agent branch before worktree commits (#2924) (#2941)
* fix(workflows): assert HEAD on per-agent branch before worktree commits

Worktree-mode setup could leave HEAD attached to a protected branch (master),
causing agent commits to land there. The previous response was a destructive
self-recovery via 'git update-ref refs/heads/master <sha>', which silently
rewinds the protected branch and destroys concurrent commits in multi-active
scenarios (parallel agents, user committing while agent runs).

- Reorder <worktree_branch_check> in execute-phase.md and quick.md to assert
  HEAD via 'git symbolic-ref' BEFORE any 'git reset --hard'. HALT with a
  blocker if HEAD is on main/master/develop/trunk/release/* or detached.
- Add a per-commit HEAD assertion (step 0) to gsd-executor.md
  <task_commit_protocol>; HEAD attachment can drift after 'git checkout <sha>'.
- Forbid 'git update-ref refs/heads/<protected>' in
  <destructive_git_prohibition>; surface the blocker rather than self-heal.
- Remove '--no-verify' as the worktree-mode default in execute-phase.md,
  execute-plan.md, quick.md, and references/git-integration.md. Hooks now
  run on every executor commit; opt out only via workflow.worktree_skip_hooks.
- Add regression test that parses the worktree_branch_check blocks structurally
  and asserts the symbolic-ref check precedes the reset --hard, no workflow
  performs update-ref on a protected ref, and --no-verify is no longer the
  default in any parallel-execution prompt.

* fix(#2924): address CodeRabbit review findings on worktree HEAD PR

- Add positive worktree-agent-* allow-list to <task_commit_protocol> step 0
  in gsd-executor.md and to <worktree_branch_check> in execute-phase.md and
  quick.md. The deny-list (main|master|develop|trunk|release/*) silently
  allowed feature/* and other arbitrary branches outside the agent namespace.
- Register workflow.worktree_skip_hooks in both config schemas
  (sdk/src/query/config-schema.ts and get-shit-done/bin/lib/config-schema.cjs)
  and document it in docs/CONFIGURATION.md so config-set accepts it.
- Fix stash lifecycle in execute-phase.md post-wave hook validation: stash
  under a named ref and pop after the hook run; warn on pop failure.
- Pre-dispatch PLAN.md commit in quick.md: gate on git diff --cached --quiet
  for idempotency and exit 1 with a clear error on commit failure (both the
  --no-verify and the normal branches) — no more swallowing real errors.
- Test fixes (tests/bug-2924-worktree-head-attachment.test.cjs):
  - Parse the protected-branch alternation structurally and require
    main, master, develop, trunk, release/.* (release/* was previously
    skipped by the \\b...\\b regex).
  - Use fs.readdirSync(dir, { recursive: true }) so workflows in nested
    subdirectories are also asserted against the update-ref ban.
  - Add allow-list assertions for execute-phase.md, quick.md, and
    gsd-executor.md to lock in the new positive namespace check.
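
The combined deny-list and allow-list check from the first bullet can be
sketched as a pure function. This is only an illustration: the actual check is
prose and shell inside the workflow prompts, and the function name, return
shape, and the idea of passing in the output of
`git symbolic-ref --short -q HEAD` are assumptions.

```javascript
// Protected branches listed in the commit message.
const PROTECTED = /^(main|master|develop|trunk|release\/.*)$/;
// Positive allow-list: only the per-agent namespace may receive commits.
const AGENT_NAMESPACE = /^worktree-agent-/;

// `ref` is assumed to be the short symbolic ref of HEAD,
// or an empty string when HEAD is detached.
function checkWorktreeHead(ref) {
  if (!ref) {
    return { ok: false, reason: 'detached HEAD' };
  }
  if (PROTECTED.test(ref)) {
    return { ok: false, reason: `protected branch: ${ref}` };
  }
  // Without this step, feature/* and other arbitrary branches slip through.
  if (!AGENT_NAMESPACE.test(ref)) {
    return { ok: false, reason: `outside agent namespace: ${ref}` };
  }
  return { ok: true };
}
```

The deny-list alone would accept `feature/foo`; the allow-list is what turns
the check from "not obviously wrong" into "known to be right".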

* test(#2924): assert sub-section end marker exists before slicing

* test(#2924): use section boundary instead of fixed window for parallel-agents slice
2026-05-01 09:23:02 -04:00
Tom Boucher
2cc8796265 fix(config-get): return schema default for context_window when absent (#2944)
* fix(config-get): return schema default for context_window when absent (#2943)

cmdConfigGet in bin/lib/config.cjs now consults a SCHEMA_DEFAULTS map before
emitting "Key not found", so context_window (and any future schema-defaulted
keys) return their default value (exit 0) when not set in config.json.
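
A rough sketch of the lookup order, assuming flat keys and a return object
instead of process exit codes. The default value shown for context_window is a
placeholder, not the real schema default, and the precedence of an explicit
--default over the schema default follows the follow-up test commit in this PR.

```javascript
// Placeholder value: the real schema default is not stated in this message.
const SCHEMA_DEFAULTS = { context_window: 200000 };

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Hypothetical signature; the real cmdConfigGet reads config.json and exits.
function configGet(config, key, explicitDefault) {
  // 1. A value set in config.json always wins.
  if (has(config, key)) {
    return { exitCode: 0, value: config[key] };
  }
  // 2. An explicit --default beats the schema default.
  if (explicitDefault !== undefined) {
    return { exitCode: 0, value: explicitDefault };
  }
  // 3. Fall back to the schema default before giving up.
  if (has(SCHEMA_DEFAULTS, key)) {
    return { exitCode: 0, value: SCHEMA_DEFAULTS[key] };
  }
  return { exitCode: 1, error: 'Key not found' };
}
```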

Also updates the stale subagent-timeout.test.cjs assertion that expected the
old broken behavior (exit 1 / "Key not found") to match the corrected behavior.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: use distinct sentinel to prove --default wins over schema default (#2943)

* docs: update CHANGELOG.md for #2943 fix

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 09:22:45 -04:00
Tom Boucher
faee0287a0 fix(detect-custom-files): add skills/ to GSD_MANAGED_DIRS (#2942) (#2945)
After v1.39.0 skill consolidation (#2790), skills/ became a GSD-managed
root that the installer wipes on update. GSD_MANAGED_DIRS in gsd-tools.cjs
was missing 'skills', so user-added skill directories (e.g.
skills/custom-skill/SKILL.md) were never walked and silently destroyed
during /gsd-update.

- Add 'skills' to GSD_MANAGED_DIRS so the directory is walked
- Add tests/bug-2942-detect-custom-skills.test.cjs with 5 targeted tests
- Update tests/update-custom-backup.test.cjs: replace the now-incorrect
  "skills/ must NOT be scanned" assertion (written pre-#2790) with a test
  that verifies custom skills ARE detected and GSD-owned skills are not
  falsely flagged

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 09:22:13 -04:00
Tom Boucher
7e9477bb30 docs(#2935): refresh README highlights for v1.39.0 across all languages (#2936)
Replaces stale v1.32/v1.37 highlight blocks with v1.39.0 highlights in
README.md and four translations, adds /gsd-edit-phase to phase-management
tables, documents workstream config inheritance, the post-merge build gate,
and per-runtime review.models.<cli> selection.

Closes #2935
2026-04-30 23:21:31 -04:00
Tom Boucher
5abf46ac1c Merge pull request #2920 from gsd-build/feat/hermes-runtime-2841
feat(install): add Hermes Agent runtime support
2026-04-30 23:02:15 -04:00
Tom Boucher
372d3453f5 fix(install): tokenize before ALL_RUNTIMES_OPTION check + isolate HERMES_HOME in test
Two CodeRabbit findings on PR #2920:

1. parseRuntimeInput previously only matched the bare "16" exactly for
   the all-runtimes shortcut. Inputs the prompt explicitly encourages —
   "16,", "16 1", "1,16" — fell through to per-token parsing and
   silently installed only Claude or a partial subset. Move the
   ALL_RUNTIMES_OPTION check after tokenization so any token equal to
   "16" expands. Added regression coverage in
   tests/multi-runtime-select.test.cjs for the four mixed-input forms.

2. The "maps Hermes to ~/.hermes for global installs" test invoked
   getGlobalDir('hermes') without isolating HERMES_HOME. On a developer
   machine that exports HERMES_HOME the assertion would fail even
   though getGlobalDir was behaving correctly. Save/clear/restore the
   env var around the assertion, mirroring the pattern the later
   describe block already uses.
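
The tokenize-first ordering from finding 1 can be sketched as below. The
option map is an illustrative subset: only '11' for Kilo, '10' for Hermes, and
'16' for "all" are stated elsewhere in this log, and '1' for Claude is an
assumption. Empty-input fallback behaviour is deliberately not modelled.

```javascript
const ALL_RUNTIMES_OPTION = '16';

// Illustrative subset of the real option map ('1' -> claude is assumed).
const runtimeMap = { '1': 'claude', '10': 'hermes', '11': 'kilo' };
const allRuntimes = Object.values(runtimeMap);

function parseRuntimeInput(input) {
  // Tokenize first: split on commas and/or whitespace, drop empty tokens.
  const tokens = input.split(/[\s,]+/).filter(Boolean);

  // Only then check for the all-runtimes shortcut, so "16,", "16 1",
  // and "1,16" all expand instead of falling through to per-token parsing.
  if (tokens.includes(ALL_RUNTIMES_OPTION)) {
    return [...allRuntimes];
  }

  // Per-token parsing with de-duplication; unknown tokens are ignored here.
  const picked = [];
  for (const token of tokens) {
    const runtime = runtimeMap[token];
    if (runtime && !picked.includes(runtime)) picked.push(runtime);
  }
  return picked;
}
```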

Full suite: 6128/6128 pass.
2026-04-30 22:48:08 -04:00
Tom Boucher
c9d6306981 fix(hermes): rewrite CLAUDE.md → HERMES.md (revert from .hermes.md per spec)
Per the issue spec for #2841 and CodeRabbit feedback on PR #2920, the
project-context filename rewrite should produce HERMES.md, not
.hermes.md. Reverts the earlier .hermes.md change at all 5 substitution
sites in bin/install.js and updates the corresponding regression test
in tests/hermes-install.test.cjs to assert HERMES.md.

Full suite: 6127/6127 pass.
2026-04-30 22:30:16 -04:00
Tom Boucher
1168e9f59a Merge pull request #2921 from gsd-build/fix/2916-handle-branching-default-base
fix(#2916): branch new phases off origin/HEAD instead of current HEAD
2026-04-30 22:25:03 -04:00
Tom Boucher
3ed8980519 fix(#2916): drop unreachable post-creation merge-base guard
CodeRabbit pointed out the post-creation guard is structurally
unreachable: immediately after `git checkout -b X origin/$DEFAULT_BRANCH`,
HEAD == origin/$DEFAULT_BRANCH, so both the merge-base form (`MB == DT`)
and the alternative "ahead-of" count form (`AHEAD == 0`) are sentinels
that always pass on a successful fresh checkout. With the explicit base
arg + fail-fast on the checkout, the guard cannot catch anything new.

Removing it (rather than swapping in another no-op that satisfies the
linter but adds no actual coverage) is the honest fix. Comment retained
to explain why no post-creation guard is needed: the explicit base
argument to `git checkout -b` is the single source of correctness for
#2916.

Same simplification mirrored in get-shit-done/workflows/quick.md.

Full suite: 6102/6102.
2026-04-30 22:18:34 -04:00
Tom Boucher
c3aef27aa6 fix(#2916): fail-fast on switch/checkout, gate fork-point warning to fresh branches
Two CodeRabbit findings on PR #2921 (review 4209533909 + comment
3171721073, both still unresolved):

A. Branch switch and create steps now abort on non-zero exit. Previously
   `git switch "$BRANCH_NAME"` and `git checkout -b "$BRANCH_NAME"
   "origin/$DEFAULT_BRANCH"` could fail (locked worktree, dirty tree
   refusing the checkout, etc.) and the workflow would silently continue
   on the wrong branch — sending the phase's later commits to the wrong
   place. Both calls now `|| { echo "ERROR: …" >&2; exit 1; }`.

B. The fork-point base-warning is now scoped to the creation arm of
   the if/else. Previously it ran for the resume path too, so a
   legitimate resumed branch where origin/$DEFAULT_BRANCH had advanced
   since first creation would falsely warn ("does not fork from
   origin/<DEFAULT_BRANCH>"). Moving the check inside the else arm
   means it only runs immediately after a fresh `git checkout -b`, when
   the merge-base check is meaningful.

Same fix mirrored in get-shit-done/workflows/quick.md.

execute-phase.md stays at the 1700-line XL budget. Full suite: 6102/6102.
2026-04-30 22:07:46 -04:00
Tom Boucher
ace61869d0 test(#2916): parameterize fixtures so both main and trunk are exercised
Two follow-ups on commit 80f14cac (which hardened quick-branching with a
trunk fixture):

1. quick-branching.test.cjs: add a `defaultBranch` parameter to
   setupFixture and run the "branches off origin/HEAD" assertion against
   both `main` and `trunk`. The wholesale switch to trunk in 80f14cac
   removed coverage of the conventional `main` path; parameterizing
   restores it without giving up the symbolic-ref guarantee.

2. bug-2916-handle-branching-default-base.test.cjs: apply the same
   parameterization here. handle_branching has the same default-branch
   detection logic as Step 2.5, so it deserves the same trunk regression
   guard. Previously this file only exercised `main`.

A regression that silently defaults to `main` instead of consulting
`git symbolic-ref refs/remotes/origin/HEAD` now fails the `trunk`
variant in both files.

Tests: 10/10 in the touched suites.
2026-04-30 21:57:27 -04:00
Tom Boucher
80f14cac1f test(#2916): scope branch_name scan to init step and harden fixture
- Restrict the "init parse list includes branch_name" assertion to
  the bash blocks inside Step 2 (Initialize) so an unrelated step
  that mentions branch_name cannot mask the contract.
- Switch the fixture's default branch from main to trunk so the
  symbolic-ref code path is locked in: a regression that silently
  defaults to "main" instead of consulting origin/HEAD now fails.

Addresses CodeRabbit review on PR #2921.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-30 21:48:43 -04:00
Tom Boucher
2256e4c9a3 fix(#2916): use fork-point detection for non-default-base warning
Replace the "ahead-of" heuristic with a structural check that compares
the HEAD↔origin/$DEFAULT_BRANCH merge-base to origin/$DEFAULT_BRANCH
itself. The previous count-based warning fired on legitimate WIP that
was simply ahead of the default branch — the correct signal is that
the branch did not fork from the default branch in the first place.

Addresses CodeRabbit review on PR #2921.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-30 21:48:36 -04:00
Tom Boucher
e5cd523e7b test(hermes): use parseFrontmatter for agent assertion (CR #2920)
2026-04-30 21:44:12 -04:00
Tom Boucher
b5777572f7 docs(readme): add Hermes uninstall examples (CR #2920)
2026-04-30 21:44:12 -04:00
Tom Boucher
861a7d972b test(install): replace source-grep prompt assertions with structured checks
Two test files were asserting installer prompt behavior by regex/.includes()
against bin/install.js source. Per CONTRIBUTING.md "no-source-grep"
testing standard, replace with structured assertions:

- tests/kilo-install.test.cjs: import runtimeMap and buildRuntimePromptText
  from the install module; assert runtimeMap['11'] === 'kilo' and that the
  rendered prompt lists Kilo above OpenCode without marketing copy.

- tests/multi-runtime-select.test.cjs: import runtimeMap, allRuntimes,
  parseRuntimeInput, buildRuntimePromptText. Assert exported runtimeMap
  matches the canonical option list, allRuntimes contains every runtime
  exactly once, prompt text lists Hermes (10), Qwen Code (13), Trae (14),
  All (16), and parser splits/dedupes by exercising parseRuntimeInput
  rather than regexing source code.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-30 21:30:48 -04:00
Tom Boucher
bd0511988b fix(hermes): nest GSD skills under skills/gsd/ category (#2841)
Per spec in #2841, all 86 GSD skills must collapse into a single "gsd"
category in Hermes' system prompt. Previous code passed skills/ as the
install root, producing a flat skills/gsd-*/ layout that inflated
Hermes' loader output to 86 top-level entries.

Changes:
- Install path now writes to skills/gsd/{DESCRIPTION.md, gsd-*/SKILL.md}
- Uninstall removes the entire skills/gsd/ category dir plus any leftover
  flat-layout gsd-*/ from older installs (graceful migration)
- writeManifest emits skills/gsd/<skill>/<file> paths for Hermes
- --skills-root hermes returns the nested category path so /gsd-sync-skills
  syncs into the right directory
- DESCRIPTION.md at category root carries name/version/description so
  Hermes' skill loader surfaces the GSD category in the system prompt

Also extracts promptRuntime's runtimeMap, allRuntimes, parseRuntimeInput,
and buildRuntimePromptText to module scope and exports them so tests can
assert structurally instead of grepping bin/install.js source.

Existing hermes-install tests updated to expect the nested layout and
to verify the category DESCRIPTION.md frontmatter (name, version,
description) using the shared parseFrontmatter helper.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-30 21:30:48 -04:00
Tom Boucher
4a5f36df5e Merge pull request #2919 from gsd-build/fix/2911-audit-open-output-references
fix(#2911): audit-open emits raw human report and parseable JSON
2026-04-30 21:23:30 -04:00
Tom Boucher
840f2b349e Merge pull request #2918 from gsd-build/worktree-agent-a4db9db3f3106d4d7
fix(progress): explicit context-authority directive in report step
2026-04-30 21:23:12 -04:00
Tom Boucher
140d334dab test(#2916): replace string-grep assertions with behavioral fixture test
CodeRabbit nitpick (per project policy `feedback_no_source_grep_tests`):
the prior `tests/quick-branching.test.cjs` asserted branching correctness
by `.includes()`-grepping the raw markdown content for literal command
substrings. Those assertions stayed green even when the underlying
behavior regressed (e.g. when `git checkout -b` was unconditionally run
from the wrong HEAD).

Replace with the same pattern as
`bug-2916-handle-branching-default-base.test.cjs`:
  - Structurally extract the Step 2.5 bash block from quick.md by
    walking the markdown for fenced ```bash blocks under the heading
    (no regex on prose).
  - Spin up a fixture git repo with a bare origin, a clone whose
    `origin/HEAD` points at `main`, and a checked-out previous-task
    branch carrying its own unmerged commit.
  - Execute the extracted bash block via `bash -c` and assert that
    the new branch's tip equals `origin/main` (0 commits inherited
    from the previous-task HEAD).
  - Add a reuse test that pre-creates the target branch with its own
    commit and verifies the script switches back to it without a
    rebase or reset.

The two informational tests (workflow file exists, branching runs
before task-directory creation) are retained, plus the `branch_name`
parsing assertion is rewritten to walk fenced bash blocks rather than
substring-grep arbitrary content.
2026-04-30 21:22:56 -04:00
Tom Boucher
6e4fad7acc Merge pull request #2933 from gsd-build/chore/2932-coderabbit-docstring-off
chore(ci): disable CodeRabbit docstring coverage check
2026-04-30 21:22:55 -04:00
Tom Boucher
4e2f1105d9 fix(#2916): pin new-branch base to origin/$DEFAULT_BRANCH explicitly
Address CodeRabbit HIGH findings on PR #2921. The previous fix had three
unconditional code paths where `git checkout -b "$BRANCH_NAME"` would run
from the *current* HEAD when the upstream sync failed silently:
  - the dirty-tree warn-and-continue path,
  - the clean path where `git switch` / `git merge --ff-only` errors were
    swallowed by `2>/dev/null` (still falling through to checkout -b),
  - any case where `git fetch` failed but the script continued.

This rewrites both `execute-phase.md` (handle_branching) and `quick.md`
(Step 2.5) to:
  1. Fetch origin/$DEFAULT_BRANCH; if fetch fails AND no local copy of
     origin/$DEFAULT_BRANCH exists, abort with a clear ERROR (exit 1)
     rather than create the branch off arbitrary HEAD.
  2. Always create the new branch with an explicit start point:
     `git checkout -b "$BRANCH_NAME" "origin/$DEFAULT_BRANCH"`. The base
     is now deterministic regardless of which branch is currently
     checked out, regardless of whether the optional local fast-forward
     succeeded, and regardless of dirty-tree state.
  3. Carry uncommitted changes onto the new (origin-pinned) branch
     instead of inheriting the previous-phase HEAD as a fallback base.

The post-creation INHERITED check now references origin/$DEFAULT_BRANCH
rather than the (possibly-stale) local default branch, so the warning
fires accurately even when the local fast-forward was skipped.
2026-04-30 21:22:44 -04:00
Tom Boucher
4ce72cdee7 fix(hermes): align with Hermes Agent conventions per docs review
Four fixes from review of hermes-agent.nousresearch.com docs:

1. SKILL.md frontmatter now declares `version` (required field per
   Hermes spec). Plumbed through `convertClaudeCommandToClaudeSkill`
   gated on runtime='hermes' so other runtimes' frontmatter is unchanged.

2. Project-context filename rewrite changed from `HERMES.md` (not
   discovered by Hermes) to `.hermes.md` (top of Hermes' discovery list:
   .hermes.md → AGENTS.md → CLAUDE.md → .cursorrules).

3. README + finishInstall now show `/gsd-help` and `/gsd-new-project`
   for Hermes; per docs, Hermes auto-exposes skills as slash commands.

4. Hermes tests now parse SKILL.md frontmatter structurally via the
   shared parseFrontmatter helper instead of substring-matching source
   text, and assert the version/name/description shape required by
   Hermes' skill_view().

Full suite: 6128/6128 pass (3 new structural assertions).
2026-04-30 21:22:36 -04:00
Tom Boucher
198022f58d chore(ci): disable CodeRabbit docstring coverage check (#2932)
The docstring coverage pre-merge check (default: warning at 80% threshold)
produces false-positive warnings on PRs whose new code is entirely test
files: it counts test(...) / beforeEach / afterEach arrow-function
callbacks as functions and reports 0% coverage because nothing has JSDoc.

CR's documented schema for reviews.pre_merge_checks.docstrings only
accepts `mode` and `threshold` — there is no per-check path filter that
would let us exclude tests/** while keeping the check active elsewhere.
The top-level path_filters approach would silence ALL CR review on test
files (security scans, out-of-scope checks, the substantive line-level
findings) which we want to keep.

Disabling the check entirely is the right call for this repo because:
  - GSD ships a CLI + agent runtime, not a documented public library
  - The internal helpers that warrant JSDoc already have it
  - The other CR pre-merge checks (out-of-scope, security, title) are
    meaningful for this codebase and stay enabled

Closes #2932
2026-04-30 21:13:55 -04:00
Tom Boucher
ac100ae17b test: assert reportStep present before extractBlockquotes (CR #2918)
Two existing tests called extractBlockquotes(reportStep) without first
asserting reportStep was non-null. If the workflow file ever loses its
`<step name="report">` block, the test would fail with a confusing
TypeError on the destructuring inside extractBlockquotes instead of a
clear "report step must exist" assertion.

Add assert.ok(reportStep, ...) guards at the two missing call sites
(lines 100 and 130). The other two call sites (lines 75-83) already
had guards.

Addresses CodeRabbit comment on PR #2918.
2026-04-30 21:08:26 -04:00
Tom Boucher
002db4dd2b Merge pull request #2931 from gsd-build/feat/2929-release-sdk-parity
ci(release-sdk): bring CI gates to parity with release.yml
2026-04-30 21:04:12 -04:00
Tom Boucher
0e0f6952c5 ci(release-sdk): bring CI gates to parity with release.yml (#2929)
Ports the pre-publish CI gates that release.yml applies into release-sdk.yml,
so the stopgap workflow ships releases at the same quality bar as the
canonical workflow (minus the @gsd-build/sdk publish and the
release-branch ceremony, both still intentionally omitted).

Changes (all mechanical copies of release.yml patterns):

  - install-smoke as needs: dependency. The reusable workflow at
    .github/workflows/install-smoke.yml runs the cross-platform install
    matrix (Ubuntu 22/24, macOS 24, packed-vs-unpacked). Publish job
    won't start until install-smoke passes for the dispatched ref.

  - npm test → npm run test:coverage. Full coverage gate, matching
    release.yml's pre-publish test step.

  - Tolerant tag-existence check. The previous upfront "refuse if tag
    exists" was too strict — operators re-running after a mid-flight
    publish-step failure would be blocked by the tag they successfully
    pushed last time. New behavior matches release.yml: skip the tag
    step if the tag points at HEAD; error only if it points elsewhere.

  - Tag-and-push step gets the same skip-if-at-HEAD pattern.

  - New "Re-point next dist-tag at the new latest" step, gated on
    tag=latest. Matches release.yml#finalize "Clean up next dist-tag" —
    keeps @next from going stale relative to @latest.

  - New "Create GitHub Release" step. Per-tag flag selection:
      tag=dev, tag=next  → --prerelease (won't be highlighted on repo home)
      tag=latest         → --latest (becomes the highlighted release)
    All use --generate-notes so the release body auto-fills from commits.

  - Summary updated to mention the GitHub Release and dist-tag re-point.

Out of scope per #2929:
  - canary.yml, release.yml unchanged (verified by file diff)
  - bin/install.js unchanged (install path already uses bundled SDK)
  - No @gsd-build/sdk publish anywhere
  - No release/X.Y.Z branch ceremony (this stopgap targets dispatched
    ref directly)
2026-04-30 20:59:37 -04:00
Tom Boucher
bdead2ee6a Merge pull request #2927 from gsd-build/feat/2925-release-sdk-main
feat(ci): release-sdk.yml stopgap workflow for dev/next/latest CC publishes
2026-04-30 20:51:11 -04:00
Tom Boucher
e107bb35d4 feat(ci): add release-sdk.yml stopgap workflow for dev/next/latest CC publishes (#2925)
Adds a workflow_dispatch-only release path that publishes get-shit-done-cc
to ONE chosen dist-tag per run (dev | next | latest), with the SDK
bundled inside the CC tarball both as the existing loose sdk/dist/ tree
and as a fresh sdk-bundle/gsd-sdk.tgz npm-installable artifact.

Why: @gsd-build/sdk publishes from canary.yml and release.yml fail because
the @gsd-build npm token is currently unavailable. CC users don't consume
@gsd-build/sdk directly — bin/gsd-sdk.js resolves sdk/dist/cli.js from
inside the installed CC package. This workflow ships only get-shit-done-cc
(which we hold the token for) and bundles the SDK two ways so any future
install path can pick whichever shape it needs.

The new sdk-bundle/ directory is added to the CC files whitelist in-tree
at build time only — never committed. Existing canary.yml and release.yml
are intentionally untouched; restore them to primary use once the
@gsd-build/sdk token is recovered.

Per-tag version derivation when the version input is empty:
  - dev    → <base>-dev.N (next sequential, scanning v<base>-dev.* tags)
  - next   → <base>-rc.N (matches release.yml convention)
  - latest → <base> (clean, no suffix)
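
The derivation rules above can be sketched as a small function. This is an illustration in JavaScript, not the workflow's actual shell step: `deriveVersion` and its parameters are hypothetical names, numbering is assumed to start at 1, and the rc path is assumed to scan tags the same way the dev path does.

```javascript
// Hypothetical sketch of the per-tag version derivation described above.
// Assumptions: N starts at 1, and `next` scans v<base>-rc.* tags the same
// way `dev` scans v<base>-dev.* tags.
function deriveVersion(base, distTag, existingTags) {
  if (distTag === 'latest') return base; // clean version, no suffix

  const suffix = { dev: 'dev', next: 'rc' }[distTag];
  if (!suffix) throw new Error(`unknown dist-tag: ${distTag}`);

  const prefix = `v${base}-${suffix}.`;
  let highest = 0;
  for (const tag of existingTags) {
    if (!tag.startsWith(prefix)) continue;
    const rest = tag.slice(prefix.length);
    // Only purely numeric suffixes count toward the sequence.
    if (/^\d+$/.test(rest)) highest = Math.max(highest, Number(rest));
  }
  return `${base}-${suffix}.${highest + 1}`;
}
```

With existing tags `v1.40.0-dev.1` and `v1.40.0-dev.2`, this sketch yields `1.40.0-dev.3` for `dev` and `1.40.0` for `latest`.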

Refuses to publish when the version already exists on npm or has an
existing git tag (no accidental overwrites). Verifies the publish landed
on the registry and the dist-tag resolves correctly before marking the
run successful.
2026-04-30 20:46:31 -04:00
Tom Boucher
294564b951 fix(#2916): branch new phases off origin/HEAD instead of current HEAD
handle_branching in execute-phase.md (and the equivalent step in quick.md)
created the per-phase branch from whatever branch happened to be checked
out — typically the previous phase's still-unmerged feature branch — so
consecutive phases compounded on top of each other and stayed unpushed.

Detect the default branch via git symbolic-ref refs/remotes/origin/HEAD,
fast-forward it from origin, and fork the new phase branch off that tip.
Existing branches are still reused as-is. Dirty working trees fall back
to current HEAD with a loud warning, and a post-creation guard reports
any inherited commits.

Regression test extracts the bash from the <step name="handle_branching">
block structurally and runs it against a fixture repo where HEAD sits on
a previous-phase branch with extra commits.
2026-04-30 17:30:52 -04:00
Tom Boucher
9a13d2fc0b fix(#2911): audit-open emits raw human report and parseable JSON
Two bugs in the audit-open dispatch case in bin/gsd-tools.cjs:

  1. Bare output(...) calls (only core.output is in scope) threw
     ReferenceError: output is not defined on every invocation,
     blocking the first step of /gsd-complete-milestone.
  2. Even after switching to core.output(formattedReport, raw), the
     human-readable branch JSON-stringified the formatted text because
     core.output only bypasses JSON encoding when called as
     core.output(null, true, rawValue).

Fix:
  - --json path:  core.output(result, raw)   — pass the object,
    let core.output JSON-stringify (don't pre-stringify).
  - text path:    core.output(null, true, formatAuditReport(result))
    — use the rawValue form to emit verbatim section dividers and
    item lists.
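
The difference between the two call shapes can be modelled with a toy function. `renderOutput` below is a hypothetical stand-in for core.output, written only from the behavior described in this message; the real function's internals (and how it writes to stdout) are not shown here.

```javascript
// Hypothetical model of the core.output contract described above:
// JSON encoding is bypassed only when raw is true AND a rawValue is given.
function renderOutput(result, raw, rawValue) {
  if (raw && rawValue !== undefined) {
    return String(rawValue); // emitted verbatim
  }
  return JSON.stringify(result, null, 2); // everything else is JSON-encoded
}

const report = 'Open items\n==========\n- item one';

// Buggy shape: the formatted text is JSON-stringified, so newlines become "\n".
const stringified = renderOutput(report, true);

// Fixed shape: the rawValue form keeps dividers and lists as real lines.
const verbatim = renderOutput(null, true, report);
```

In this model `stringified` is a single quoted line with escaped newlines, while `verbatim` splits into three lines.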

Adds tests/bug-2911-audit-open-output-shape.test.cjs which parses
both modes structurally — line-by-line for text mode (asserting the
report headers exist as standalone lines, not as escaped \n inside a
JSON quoted string), and JSON.parse + key-by-key shape assertions for
--json mode (matching the contract returned by auditOpenArtifacts).
2026-04-30 17:30:19 -04:00
Tom Boucher
d29822c1da fix(progress): add explicit context-authority directive to report step
The report step in workflows/progress.md had no directive establishing
PROJECT.md/STATE.md/ROADMAP.md as the authoritative sources for the
progress report. When init.progress returned project_exists: false (e.g.
invoked from a subdirectory without .planning/), the model fell back to
whatever was in its session context — including stale CLAUDE.md
## Project blocks — and produced routing output citing the wrong
milestone/phase.

Add a blockquote directive at the top of the report step that names
PROJECT.md, STATE.md, and ROADMAP.md as authoritative and forbids using
the CLAUDE.md ## Project block as a source for any progress report field.

Fixes #2912
2026-04-30 17:27:37 -04:00
teknium1
b126c0579a feat(install): add Hermes Agent runtime support (#2841)
Adds Hermes Agent as a supported installation target. Users can run
`npx get-shit-done-cc --hermes` to install all 86 GSD commands as
skills under `~/.hermes/skills/gsd-*/SKILL.md`, following the same
open skill standard as Claude Code 2.1.88+, Qwen Code, Antigravity,
Trae, Augment, and Codebuddy.

Hermes Agent is an open-source AI agent framework by Nous Research
(NousResearch/hermes-agent, MIT). Its skill loader accepts the Claude
skill format as-is: frontmatter parsed with PyYAML SafeLoader (unknown
keys like `allowed-tools` / `argument-hint` ignored), body XML tags
(`<objective>`, `<execution_context>`, `<process>`) passed directly
to the model. Compatibility proven end-to-end with all 86 GSD skills
loading cleanly, `skill_view()` returning full bodies, and
`build_skills_system_prompt()` emitting them into the agent system
prompt — zero Hermes code changes required.

Changes:
- `bin/install.js`: --hermes flag, getDirName/getGlobalDir/getConfigDirFromHome
  support, HERMES_HOME env var (native to Hermes — used for profile
  mode / Docker deploys), install/uninstall pipelines, interactive
  picker option 10 (alphabetical: between Gemini and Kilo), .hermes
  path replacements in copyCommandsAsClaudeSkills and
  copyWithPathReplacement, legacy commands/gsd cleanup, CLAUDE.md ->
  HERMES.md and "Claude Code" -> "Hermes Agent" content rewrites in
  skills/agents/hooks, runtime-appropriate finish message.
- `get-shit-done/bin/lib/core.cjs`: add hermes to KNOWN_RUNTIMES;
  add RUNTIME_PROFILE_MAP.hermes with OpenRouter-slug defaults
  (Hermes is provider-agnostic; these defaults resolve across
  OpenRouter, native Anthropic, and Copilot via Hermes' aggregator-
  aware resolver, and are overridable per-tier via
  model_profile_overrides.hermes.{opus,sonnet,haiku}).
- `README.md`: Hermes Agent in tagline, runtime list, verification
  command, install/uninstall examples, `--hermes` flag reference.
- `tests/hermes-install.test.cjs`: new, 14 tests covering directory
  mapping, HERMES_HOME env var precedence, install/uninstall
  lifecycle, user-skill preservation, engine cleanup.
- `tests/hermes-skills-migration.test.cjs`: new, 11 tests covering
  frontmatter conversion, path replacement (~/.claude/ ->
  $HERMES_HOME/skills/), CLAUDE.md -> HERMES.md, "Claude Code" ->
  "Hermes Agent", stale skill cleanup, SKILL.md format validation.
- `tests/multi-runtime-select.test.cjs`: updated for new option
  numbering (hermes=10, kilo=11, opencode=12, qwen=13, trae=14,
  windsurf=15, all=16).
- `tests/kilo-install.test.cjs`: updated assertions for Kilo having
  moved from option 10 to option 11.

Closes #2841

Implementation notes:
- Zero custom code paths: Hermes reuses copyCommandsAsClaudeSkills(),
  following the same pattern as Qwen Code / Antigravity.
- Path replacement: ~/.claude/, $HOME/.claude/, ./.claude/ ->
  .hermes equivalents in skill/agent/hook content.
- Config precedence: --config-dir > HERMES_HOME > ~/.hermes (matches
  how Hermes itself resolves its home directory).
- Legacy cleanup: removes commands/gsd/ if present from a prior
  install, preserving dev-preferences.md (same as Qwen).
- No external dependencies added.
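
The config precedence noted above can be sketched as a pure function. `resolveHermesDir` and its argument names are hypothetical and only illustrate the ordering; the installer's real helpers (getGlobalDir and friends) are not reproduced here.

```javascript
// Hypothetical sketch of the precedence: --config-dir > HERMES_HOME > ~/.hermes.
function resolveHermesDir({ configDir, env = {}, homeDir }) {
  if (configDir) return configDir;            // explicit --config-dir wins
  if (env.HERMES_HOME) return env.HERMES_HOME; // then the Hermes-native env var
  return `${homeDir}/.hermes`;                 // otherwise the default location
}
```

For example, with both `--config-dir /opt/h` and `HERMES_HOME=/srv/h` present, the sketch returns `/opt/h`.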

Testing: 5841 / 5841 tests pass (0 failures, 0 regressions)
- 14 new tests in hermes-install.test.cjs
- 11 new tests in hermes-skills-migration.test.cjs
- multi-runtime-select.test.cjs renumbered + 1 new test (single choice for hermes)
2026-04-30 17:24:53 -04:00
Tom Boucher
006cdafe8f ci(drift): enforce alias freshness checks in CI and contributor flow (#2910)
Merging alias-drift guardrails and local hook hardening.
2026-04-30 14:19:46 -04:00
Tom Boucher
8051bc4fd8 test(golden): expand phases/validate/roadmap parity matrix (#2909)
Merging parity-matrix expansion after stack foundation.
2026-04-30 14:10:28 -04:00
Tom Boucher
444db1714b refactor(query): manifest-backed routing seam + family adapters (#2908)
Merging validated command-seam foundation.
2026-04-30 14:04:50 -04:00
Tom Boucher
6dce1de4a7 fix: gap-analysis parses mixed requirement prefixes and skips table headers (#2902)
* fix: parse non-REQ IDs in gap-analysis and ignore table headers

* fix: parse requirement IDs from first traceability column only

---------

Co-authored-by: Tom Boucher <thomas.boucher@sas.com>
2026-04-30 12:13:55 -04:00
Tom Boucher
abb2cb63f6 refactor: extract planning-workspace seam from core.cjs (#2901)
* refactor: extract planning workspace seam from core

* docs: document planning-workspace module and inventory updates

* fix: harden planning lock timeout and preserve workstream set contract

---------

Co-authored-by: Tom Boucher <thomas.boucher@sas.com>
2026-04-30 11:38:13 -04:00
TÂCHES
8cbdbdd2de feat(sdk): add durable planning runtime (#2898)
2026-04-30 09:03:06 -06:00
Tom Boucher
951d5bf7c0 fix(#2893): surface non-canonical plan filenames instead of silently returning zero plans (#2896)
* fix(#2893): surface non-canonical plan filenames instead of silently returning zero plans

Reporter saw `plan_count: 0` from `/gsd:execute-phase` even though five
plan files existed on disk. Investigation showed the planner had written
files like `01-PLAN-01-foundation.md`, while `phase-plan-index`'s strict
filter (`f.endsWith('-PLAN.md') || f === 'PLAN.md'`) rejected them
silently — collapsing two distinct states into the same `plans: []`
return:

  - directory truly has no plans (legit empty)
  - directory has plans but the filter rejected them (user/agent error)

The canonical contract is documented in three places:
  - `agents/gsd-planner.md` write_phase_prompt step (lines 1063-1080)
  - `commands/gsd/plan-phase.md`
  - `references/universal-anti-patterns.md` (rule 26)

It mandates `{padded_phase}-{NN}-PLAN.md` and explicitly forbids
`PLAN-NN.md` / `01-PLAN-01.md` / `plan-NN.md` etc. The strict filter is
correct per that contract. The bug is that the executor never tells the
user when the contract was violated — they just see `plan_count: 0`
with no signal.

Fix: add a diagnostic helper `describeNonCanonicalPlans()` that scans
the phase directory for files matching `*PLAN*.md` (the diagnostic net)
that the canonical filter rejected, excluding legit derivatives like
`*-PLAN-OUTLINE.md` and `*-PLAN.pre-bounce.md`. When offenders exist,
return a `warning` field naming each one and citing the canonical
pattern so the user knows what to rename to.

Wired into the three filter sites:
  - `phase-plan-index` (the executor's main entry point)
  - `phases list --type plans`
  - `find-phase`

The strict filter itself is unchanged — existing canonical plans behave
identically. This is purely a diagnostic that converts silent-empty
into loud-with-actionable-error.
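
A minimal sketch of the diagnostic, under stated assumptions: it takes a list of filenames rather than reading the phase directory, matches `PLAN` case-sensitively, and returns a single warning string or null. The real `describeNonCanonicalPlans()` signature, return shape, and warning wording may differ.

```javascript
// The canonical filter quoted above, unchanged by the fix.
const isCanonicalPlan = (f) => f.endsWith('-PLAN.md') || f === 'PLAN.md';

// Legit derivatives that must not be reported as offenders.
const isKnownDerivative = (f) =>
  f.endsWith('-PLAN-OUTLINE.md') || f.endsWith('-PLAN.pre-bounce.md');

// Hypothetical sketch: takes filenames instead of scanning a directory.
function describeNonCanonicalPlans(files) {
  const offenders = files.filter(
    (f) =>
      f.endsWith('.md') &&
      f.includes('PLAN') && // the wide "*PLAN*.md" diagnostic net
      !isCanonicalPlan(f) &&
      !isKnownDerivative(f)
  );
  if (offenders.length === 0) return null; // truly empty or fully canonical
  return (
    `Non-canonical plan filenames: ${offenders.join(', ')}. ` +
    'Expected {padded_phase}-{NN}-PLAN.md.'
  );
}
```

Called with `['01-PLAN-01-foundation.md']`, the sketch returns a warning naming that file; called with canonical plans plus outline and pre-bounce files, it returns null.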

Tests:
  - `phase-plan-index returns warning for reporter's exact filename
    pattern (`01-PLAN-01-foundation.md`)`
  - `truly empty dir does not emit a warning`
  - `canonical plans + outline + pre-bounce files do not emit a warning`

Closes #2893

* test(#2893): add parity tests for find-phase and phases list --type plans warnings

CodeRabbit's only finding on the prior commit: I wired the warning into
three filter sites (`phase-plan-index`, `find-phase`,
`phases list --type plans`) but only `phase-plan-index` had test
coverage for the warning shape. The other two paths could silently
diverge during future refactors — exactly the silent-drift class of bug
this fix exists to prevent.

Add four parity tests mirroring the existing two:

  - find-phase: non-canonical filenames produce a warning naming each
    offender + citing the canonical pattern.
  - find-phase: canonical plan + derivative files (PLAN-OUTLINE,
    pre-bounce) produce no warning.
  - phases list --type plans: same non-canonical case, but assert the
    warning is prefixed with `${dir}: ` (this path aggregates across
    phase directories so each offender is tagged with its dir).
  - phases list --type plans: canonical case, no warning.

`node --test tests/phase.test.cjs`: 98/98 pass (was 94, +4 new).
2026-04-30 10:49:13 -04:00
Tom Boucher
ca88429bf8 docs(#2888): release notes for 1.40.0-rc.1 (#2889)
Add docs/RELEASE-v1.40.0-rc.1.md following the rc.7 format. Cover the
11 commits on main since v1.39.0-rc.7's release notes landed:

- #2790 — skill surface consolidated 86 → 59
- #2792 — namespace meta-skills + keyword-tag descriptions + context guard
- #2833 — phase-lifecycle status-line read-side
- #2876 — yamlQuote SKILL.md description (Copilot/Antigravity/Trae/CodeBuddy)
- #2768 — Gemini slash command namespace
- #2858 — gsd slash namespace drift cleanup
- #2851 — bare gsd-tools → absolute path
- #2866 — Codex installer trailing-newline preservation
- #2868 — canary publish moved from main to dev
- #2872 — auto-close PRs without issue link

Update CHANGELOG.md [Unreleased] with the same 1.40.0-rc.1 entries.

Closes #2888
2026-04-30 01:13:43 -04:00
Tom Boucher
5fdc950eb7 feat(#2792): namespace meta-skills + keyword-tag descriptions + context utilization guard (#2825)
* feat(#2792): namespace meta-skills retargeted at the post-#2790 surface

This branch is now based on #2790's HEAD (the consolidation PR) instead
of main, and every routing table targets the consolidated surface so a
user routed by a namespace meta-skill never lands at a deleted /
folded sub-skill.

Cross-PR inconsistencies the original PR #2825 carried (vs #2790):

  - ns-ideate routed to gsd-note / gsd-add-todo / gsd-add-backlog /
    gsd-plant-seed → all folded into gsd-capture by #2790. Now routes
    to gsd-capture (the parent picks the mode from the user's intent).
  - ns-context routed to gsd-scan and gsd-intel → folded into
    gsd-map-codebase --fast / --query by #2790. Now routes to those
    flag forms.
  - ns-manage routed all workspace intent to gsd-list-workspaces (a
    list-only entry) → CR also flagged the over-narrow target. #2790
    folds into gsd-workspace; routing now points there.
  - ns-workflow routed to gsd-research-phase → deleted outright by
    #2790. Removed.
  - ns-project routed to gsd-plan-milestone-gaps → deleted outright by
    #2790. Removed.
  - None of the namespaces previously surfaced #2790's new consolidated
    skills (gsd-capture, gsd-phase, gsd-config, gsd-workspace,
    gsd-progress). All five are now reachable through the routers.
  - extract_learnings → extract-learnings (canonicalized by #2858).

Defect fixes within the namespace skills:

  - Hyphen-form `name:` (gsd-workflow, …) per the canonical naming
    contract — the colon-form addressed CR's drift complaint.
  - `Skill` added to allowed-tools on every router. The body instructs
    "Invoke the matched skill directly using the Skill tool" — without
    Skill in the permission list the meta-skill cannot route at all.

New regression guard in tests/enh-2792-namespace-skills.test.cjs: every
gsd-* token in any namespace router's table column resolves to a
surviving commands/gsd/*.md file (or to a known consolidated parent for
flag-form targets like gsd-map-codebase --fast). This single test would
have caught every dead-end route the original PR shipped with.

Skill-count cap in tests/enh-2790-skill-consolidation.test.cjs now
filters out ns-*.md from its <= 63 cap. Namespace routers are
descriptor-only entries, not part of the consolidation surface that cap
is policing — they have their own contract in
tests/enh-2792-namespace-skills.test.cjs.

INVENTORY.md gains a "Namespace Meta-Skills" section with the 6 router
rows; INVENTORY-MANIFEST.json gains 6 entries; the headline count moves
59 → 65 to match.

Out of scope for this rebase: the gsd-health --context flag (PR #2825
advertised the contract but didn't implement it). That's a separate
feature concern and is left untouched here.

5908/5908 on `npm test`.

* feat(#2792): implement gsd-health --context utilization guard

The original PR #2825 advertised a `--context` flag on gsd-health with a
60%/70% utilization threshold table but never implemented the workflow
logic — CR caught it as a contract leak, the rebase deferred it. This
commit closes the gap with TDD red/green/refactor.

Math layer (pure):
  - get-shit-done/bin/lib/context-utilization.cjs
    classifyContextUtilization(tokensUsed, contextWindow) →
      { percent, state }
    State boundaries use the exact ratio:
      < 60% healthy / 60–70% warning / ≥ 70% critical (fracture point)
    Display percent rounded for humans. Throws TypeError on non-integer
    or out-of-range inputs.
  - STATES = Object.freeze({ HEALTHY, WARNING, CRITICAL }) exported
    so callers reference the names by symbol, not by literal string.
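
A minimal sketch of the math layer as described, not the shipped module: the string values behind STATES and the exact range check (here 0 <= tokensUsed <= contextWindow, contextWindow > 0) are assumptions. Integer cross-multiplication is used so the boundaries sit on the exact ratio rather than on the rounded display percent.

```javascript
// Assumed string values; the source only names the three keys.
const STATES = Object.freeze({
  HEALTHY: 'healthy',
  WARNING: 'warning',
  CRITICAL: 'critical',
});

function classifyContextUtilization(tokensUsed, contextWindow) {
  if (!Number.isInteger(tokensUsed) || !Number.isInteger(contextWindow)) {
    throw new TypeError('tokensUsed and contextWindow must be integers');
  }
  // Assumed range: a positive window and usage that fits inside it.
  if (contextWindow <= 0 || tokensUsed < 0 || tokensUsed > contextWindow) {
    throw new TypeError('inputs out of range');
  }

  // Compare exact ratios without floating point: used/window >= 7/10, 6/10.
  let state = STATES.HEALTHY;
  if (tokensUsed * 10 >= contextWindow * 7) state = STATES.CRITICAL;
  else if (tokensUsed * 10 >= contextWindow * 6) state = STATES.WARNING;

  // The percent is rounded for display only and never drives the state.
  const percent = Math.round((tokensUsed / contextWindow) * 100);
  return { percent, state };
}
```

Note the boundary case this design protects: 119,200 of 200,000 tokens displays as 60% but is still classified healthy, because the exact ratio is 59.6%.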

SDK CLI integration:
  - get-shit-done/bin/gsd-tools.cjs
    `validate context --tokens-used N --context-window M [--json]`
    routes to the classifier, owns the recommendation copy (the
    classifier intentionally does not — keeps the renderer free to
    evolve without touching the math layer or its tests), and uses
    core.output's rawValue path for the sync-flush guarantee.
  - sdk/src/query/validate.ts + sdk/src/query/index.ts
    TypeScript validateContext handler registered at 'validate.context'
    and 'validate context'. Mirrors the CJS classifier inline (15 lines
    of arithmetic; not worth a shared cross-language module).

User-facing wiring:
  - commands/gsd/health.md frontmatter advertises --context, body
    documents the three-state threshold table.
  - get-shit-done/workflows/health.md adds a `context_check` step
    that's reached only when --context is set. Step calls
    `gsd-sdk query validate.context` with self-reported tokensUsed and
    contextWindow, prints the SDK output verbatim, and ends. Includes
    a TEXT_MODE plain-text fallback for non-Claude runtimes per #2012.

Tests:
  - tests/context-utilization.test.cjs (17 tests) — pure-function
    contract: state thresholds at every boundary, percent rounding,
    input validation, return-shape (no recommendation field — that's
    the renderer's job).
  - tests/validate-context.test.cjs (9 tests) — SDK CLI plumbing:
    arg parsing errors, JSON vs human rendering, recommendation copy
    pinned per state.
  - tests/enh-2792-namespace-skills.test.cjs (4 new tests) — markdown
    contract: --context advertised in argument-hint, threshold table
    in command body, context_check step exists in workflow, step
    invokes gsd-sdk query validate.context with both flags.

Inventory bookkeeping:
  - docs/INVENTORY.md "CLI Modules" 31 → 32; new row for
    context-utilization.cjs.
  - docs/INVENTORY-MANIFEST.json mirror.

5939/5939 on `npm test`.
2026-04-30 01:04:41 -04:00
Tom Boucher
c72b893916 fix(test): unbreak gemini-namespacing test after #2790 skill consolidation (#2886)
Closes #2876 follow-up — CI on main fails because the punctuation test
in tests/gemini-namespacing.test.cjs hardcoded `/gsd-scan` as a known
command, but #2824 (consolidate 86 → 59 skills) removed scan.md from
commands/gsd/. The roster now correctly returns "scan is unknown, leave
unchanged" — the conversion is right, the test fixture is stale.

Swap `scan` for `health` in the punctuation test. Both are bedrock
commands; the test still exercises the original intent (period vs
exclamation handling on adjacent slash commands).

Note added so the next consolidation reviewer knows the swap pattern.

`npm test`: 5936/5936 pass.
2026-04-30 00:57:17 -04:00
hoptop
8fc1fa263c feat(#2833): phase-lifecycle status-line — read-side (parseStateMd + formatGsdState scenes + tests + docs) (#2884)
* feat(#2833): parseStateMd reads phase-lifecycle frontmatter fields

Extend parseStateMd() to parse 4 new STATE.md frontmatter fields that drive
the phase-lifecycle status-line proposed in #2833:

- active_phase   : phase number when orchestrator is in-flight, null when idle
- next_action    : recommended next command when idle
- next_phases    : YAML flow array of phase numbers for next_action
- progress       : nested block with completed_phases / total_phases / percent

All fields default to undefined when absent — formatGsdState() (next commit)
degrades gracefully so existing STATE.md files keep rendering as before.

YAML scope intentionally narrow:
- Only top-level scalar keys (status, milestone, active_phase, next_action)
- Only single-line flow array for next_phases ([...])
- progress block requires 2-space indent for nested keys

Block sequences (- item over multiple lines) and inline comments inside
nested blocks are NOT parsed — keeping the regex-based parser predictable.
Comments outside frontmatter or after the closing --- still work.
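
The narrow YAML scope above can be illustrated with a small regex-based reader. This is a sketch under assumptions, not the real parseStateMd(): the function name `parseLifecycleFields` is hypothetical, phase numbers are kept as strings, and nested progress values are coerced with Number().

```javascript
// Hypothetical sketch of a regex-based frontmatter reader with the same
// narrow scope: top-level scalars, a single-line flow array, and a
// 2-space-indented progress block. Frontmatter must start at file head.
function parseLifecycleFields(stateMd) {
  const match = /^---\n([\s\S]*?)\n---/.exec(stateMd);
  if (!match) return {};

  const out = {};
  let inProgress = false;
  for (const line of match[1].split('\n')) {
    const nested = /^ {2}(\w+):\s*(.+)$/.exec(line);
    if (inProgress && nested) {
      out.progress[nested[1]] = Number(nested[2].trim());
      continue;
    }
    inProgress = false;

    const top = /^(\w+):\s*(.*)$/.exec(line);
    if (!top) continue;
    const key = top[1];
    const value = top[2].trim();

    if (key === 'progress' && value === '') {
      out.progress = {};
      inProgress = true;
    } else if (key === 'next_phases') {
      const flow = /^\[(.*)\]$/.exec(value); // single-line flow array only
      if (flow) {
        out.next_phases = flow[1].split(',').map((s) => s.trim()).filter(Boolean);
      }
    } else if (key === 'active_phase' || key === 'next_action') {
      out[key] = value === 'null' ? null : value; // 'null' literal means idle
    }
  }
  return out;
}
```

Fields that are absent from the frontmatter are simply missing from the result, so a caller reading `result.next_action` gets undefined for legacy STATE.md files.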

Tests: all 27 existing tests still pass (no behavior change for STATE.md
files that don't carry the new fields).

Refs #2833

* feat(#2833): formatGsdState renders phase-lifecycle scenes + opt-in progress bar

Extend formatGsdState() with three lifecycle scenes that activate when the
new STATE.md frontmatter fields (added in the previous commit) are present.
Also append an opt-in progress bar to the milestone segment when
progress.percent is available.

Scenes (first match wins; falls through to the existing path otherwise):

  1. active_phase set     → 'v2.0 [██░] X% · Phase 4.5 executing'
                             (status field carries the lifecycle stage:
                              discussing / planning / executing / verifying)

  2. active_phase null +  → 'v2.0 [██░] X% · next execute-phase 4.5'
     next_action set         (idle state — surfaces what the user should
                              run next without opening STATE.md)

  3. percent=100 (or       → 'v2.0 [██████████] 100% · milestone complete'
     completed=total)

  4. (default fallback)    → 'v1.9 Code Quality · executing · ph (1/5)'
                             (existing rendering, byte-for-byte preserved
                              when none of the new fields are populated)

Backward compat is the design priority:
- STATE.md files without the new fields render identically to v1.38.x
- progress bar is opt-in (empty string when percent absent)
- Each new scene only activates when its specific fields are populated

A new helper renderProgressBar() generates the 10-segment bar that matches
the existing context meter style (so the two bars on the status-line are
visually consistent).
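
A rough sketch of what such a helper could look like. The glyphs and the 10-segment width follow the scene examples above; the rounding mode, the bracket wrapping, and the exact signature are assumptions rather than the shipped renderProgressBar().

```javascript
// Hypothetical sketch of a 10-segment progress bar.
// Opt-in: returns an empty string when percent is absent or not a number.
function renderProgressBar(percent, segments = 10) {
  if (typeof percent !== 'number' || Number.isNaN(percent)) return '';
  const clamped = Math.min(100, Math.max(0, percent)); // clamp to 0..100
  const filled = Math.round((clamped / 100) * segments);
  return `[${'█'.repeat(filled)}${'░'.repeat(segments - filled)}]`;
}
```

Returning an empty string for a missing percent is what keeps the bar opt-in: legacy STATE.md files contribute nothing to the milestone segment.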

Tests: 27/27 existing tests still pass.

Refs #2833

* test(#2833): cover parseStateMd lifecycle fields + formatGsdState scenes

26 new tests organized in 5 describe blocks, modeled after the existing
enh-2538-statusline-last-command.test.cjs convention:

  parseStateMd #2833 lifecycle fields (7 tests)
    - reads active_phase / next_action / next_phases / progress.percent
    - 'null' literal handled correctly
    - YAML flow array parsing (1 item, multiple items)
    - progress nested block (3 fields)
    - absent fields return undefined

  formatGsdState #2833 lifecycle scenes (6 tests)
    - Scene 1: active_phase set → 'Phase X.Y <stage>'
    - Scene 2: idle + next_action → 'next <action> <phases>' (1+ phases)
    - Scene 3: percent=100 OR completed=total → 'milestone complete'

  formatGsdState #2833 backward compatibility (4 tests) — CRITICAL
    - Legacy STATE.md (no new fields) renders byte-for-byte unchanged
    - Empty state, partial state, progress-bar-opt-in all preserved

  progress bar rendering (6 tests)
    - 0% / 50% / 100% / clamping / opt-in absence

  formatGsdState #2833 scene priority (3 tests)
    - active_phase wins over next_action when both populated
    - next_action wins over fallback when active_phase null
    - percent=100 wins over fallback even with phase set

Combined run: 53/53 tests pass (existing 27 + new 26).

Refs #2833

* docs(#2833): describe phase-lifecycle frontmatter fields and rendering scenes

Add docs/STATE-MD-LIFECYCLE.md as the canonical reference for the four new
STATE.md frontmatter fields and the four status-line rendering scenes
introduced by this proposal:

- Frontmatter field reference (active_phase / next_action / next_phases /
  progress.percent) with type and population semantics
- Why progress.percent is intentionally the phase dimension and not the
  plans dimension (plans dimension trends optimistic when future phases
  are unplanned)
- The four rendering scenes including their priority order
- Stage-label convention for Scene 1 (discussing / planning / executing /
  verifying matching the four phase orchestrators)
- Frontmatter parsing constraints — frontmatter must start at file head,
  no comments inside nested blocks, next_phases is single-line flow only
- Backward-compatibility guarantee (locked in by the test suite)
- Cross-links to the foundation issue #1989 and the read-side issues
  this proposal helps close

The document deliberately scopes itself to the read-side (what the hook
parses, what it renders). Write-side SDK and workflow changes that
auto-maintain the fields are out of scope for this PR so each piece can
be reviewed independently — see the issue thread for the full proposal.

Refs #2833

* test(#2833): simplify '0% renders 10 empty segments' assertion

Address CodeRabbit nitpick — drop the convoluted assert.equal that built
the expected value via .replace() and rely on the existing assert.ok
includes-check. The behavior under test is unchanged; the assertion is
just easier to read.

Refs #2884 review comment
2026-04-30 00:48:49 -04:00
Tom Boucher
87917131f2 refactor(#2790): consolidate 86 gsd-* skills to 59 — fold flags, delete dead skills (#2824)
* feat(#2790): consolidate 86 gsd-* skills to 59 — zero functional loss

Closes #2790

- `capture.md` — absorbs add-todo (default), note (--note), add-backlog
  (--backlog), plant-seed (--seed), check-todos (--list)
- `phase.md` — absorbs add-phase (default), insert-phase (--insert),
  remove-phase (--remove), edit-phase (--edit)
- `config.md` — absorbs settings-advanced (--advanced),
  settings-integrations (--integrations), set-profile (--profile);
  settings.md retained as-is
- `workspace.md` — absorbs new-workspace (--new), list-workspaces
  (--list), remove-workspace (--remove)

- `update.md` — adds --sync (absorbs sync-skills) and --reapply
  (absorbs reapply-patches)
- `sketch.md` — adds --wrap-up (absorbs sketch-wrap-up)
- `spike.md` — adds --wrap-up (absorbs spike-wrap-up)
- `map-codebase.md` — adds --fast (absorbs scan) and --query (absorbs
  intel)
- `code-review.md` — adds --fix (absorbs code-review-fix)
- `progress.md` — adds --next (absorbs next) and --do (absorbs do)

Deleted outright: join-discord, research-phase, session-report, from-gsd2,
analyze-dependencies, list-phase-assumptions, plan-milestone-gaps

autonomous.md: updated Skill(skill="gsd:code-review-fix") →
Skill(skill="gsd:code-review", args="--fix --auto") to match
the consolidated skill name

- New: tests/enh-2790-skill-consolidation.test.cjs (48 tests)
- Updated: 14 existing test files redirected from deleted command paths
  to their consolidated equivalents
- docs/INVENTORY.md: Commands count 86→59, ghost rows removed, new
  consolidated rows added
- docs/INVENTORY-MANIFEST.json: regenerated to match filesystem

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(#2790): add CHANGELOG entry for skill consolidation

* docs(#2790): update COMMANDS.md for 86→59 skill consolidation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(#2790): address CodeRabbit review findings

- CHANGELOG.md: add --next alongside --do in progress flag list
- config.md: remove trailing space from --profile code span (MD038)
- COMMANDS.md: add required descriptions to /gsd-phase examples;
  /gsd-phase without args errors, not interactive
- COMMANDS.md: add --next and --do to /gsd-progress flags table + examples
- test: convert content.includes('--reapply') to structural frontmatter
  parse; add allow-test-rule comment for workflow content assertions
- test: replace redundant existsSync duplicate with assertion that verifies
  the full consolidated flag surface (--sync | --reapply) in argument-hint

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(#2790): restore reapply-patches workflow and strengthen test assertions

- Create get-shit-done/workflows/reapply-patches.md: the #2790 consolidation
  deleted the 14K combined command+workflow file (reapply-patches.md) but
  update.md already referenced the workflow via execution_context_extended.
  Restoring it fixes a silent behavioral gap where --reapply had no workflow
  to load. Includes full three-way merge logic, hunk verification table
  (Step 4), and the Hunk Verification Gate (Step 5) that blocks cleanup
  until all user-added hunks are confirmed present in the merged output.

- Fix update.md: /gsd-reapply-patches → /gsd-update --reapply (stale ref)

- Fix reapply-verify-hunks.test.cjs: was checking existsSync(update.md) 8×;
  now points to the workflow file and asserts real behavioral content
  (Post-merge verification, Hunk presence check, Line-count check, backup
  reference, per-file tracking, structural ordering)

- Fix reapply-patches.test.cjs: replace content.includes() stubs with
  frontmatter-parsed argument-hint assertions; replace 4 existsSync(update.md)
  no-ops with real assertions against the workflow content

- Fix edit-phase.test.cjs: /gsd-edit-phase → /gsd-phase (COMMANDS.md now
  documents the consolidated command with --edit flag)

- Fix next-safety-gates.test.cjs: split OR predicates into independent
  assertions — --next in progress.md and --force in next.md workflow

- Fix workspace.test.cjs: add allow-test-rule comment for routing content
  checks (command routing text IS the deployed behavioral contract)

- Fix bug-2439 test: strengthen pre-flight assertion to verify gsd-sdk is
  referenced (not just --profile)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review findings (CR round 2)

- INVENTORY.md: update sync-skills.md row to reference /gsd-update --sync
  instead of stale /gsd-sync-skills (absorbed in #2790)

- enh-2380-sync-skills.test.cjs: align INVENTORY.md assertion with the
  corrected reference; was asserting the old /gsd-sync-skills name while the
  manifest test correctly asserted /gsd-update, creating conflicting expectations
  in the same suite

- reapply-verify-hunks.test.cjs: add explicit notEqual(-1) assertions for all
  three anchors before the ordering check so a missing anchor produces a clear
  failure instead of a false positive (writeIdx=-1 < verifyIdx=5 is true)

- bug-2439-set-profile-gsd-sdk-preflight.test.cjs: defer fs.readFileSync until
  after the existence assertion; eager describe-level read caused the suite to
  crash before the existence test could run, making it effectively dead code
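
The missing-anchor guard described for reapply-verify-hunks.test.cjs can be
sketched roughly as below. `anchorsInOrder` is a hypothetical helper name used
only for illustration; the real test asserts with notEqual(-1) directly.

```javascript
// Hypothetical helper: confirms every anchor exists before comparing positions.
// Without the presence check, a missing anchor yields index -1, and
// "-1 < someIndex" is true, so the ordering check would pass by accident.
function anchorsInOrder(content, anchors) {
  const indices = anchors.map((anchor) => content.indexOf(anchor));
  const missing = anchors.filter((_, i) => indices[i] === -1);
  if (missing.length > 0) {
    throw new Error(`missing anchor(s): ${missing.join(', ')}`);
  }
  // Each anchor must start strictly after the previous one.
  return indices.every((idx, i) => i === 0 || indices[i - 1] < idx);
}
```

A missing anchor now fails with a message naming the anchor rather than
silently satisfying the ordering comparison.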

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(#2790): address CR — INVENTORY routing + reapply test contract wording

Two unresolved CodeRabbit findings (Major):

- docs/INVENTORY.md: workflow-file table still pointed at obsolete
  /gsd-do, /gsd-next, /gsd-note, /gsd-add-todo, /gsd-add-backlog,
  /gsd-check-todos, /gsd-plant-seed slash commands. Re-route to the
  consolidated /gsd-progress (--next, --do) and /gsd-capture (--note,
  --backlog, --seed, --list) so the inventory is internally consistent.

- tests/reapply-verify-hunks.test.cjs: 'verification tracks per-file
  status' asserted on phrasing that doesn't appear in reapply-patches.md
  (the 'per-file' substring only matched accidentally via 'sequential
  integer per file'). Switch to the actual contract text — Hunk
  Verification Table, one row per hunk per file, verified column.

* test(#2790): update CR-INTEGRATION tests for consolidated --fix invocation

After the merge of main (which carries #2843's hyphen-form fix), the
consolidation in this branch absorbs gsd-code-review-fix into gsd-code-review
as the --fix flag. Update the two CR-INTEGRATION tests that previously
asserted on the standalone gsd-code-review-fix skill name to instead assert
on a gsd-code-review invocation carrying --fix in its arg tokens.

Tests still parse Skill() invocations structurally; only the asserted
skill-name + arg-token shape changed.

* test(#2790): scope success_criteria check to the <success_criteria> block

CodeRabbit nitpick: 'success criteria includes verification' did a
whole-file substring check, which can false-pass if the phrase appears
elsewhere in the document. Extract the <success_criteria>...</success_criteria>
block first via extractTagBlock() and assert against that scope only.
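
A minimal sketch of what a tag-scoped extraction could look like. The name
extractTagBlock() comes from the commit message, but the body below is an
assumption, not the project's implementation.

```javascript
// Assumed behaviour: return the text between <tag> and </tag>, or null when
// the block is absent or unclosed, so callers can fail loudly.
function extractTagBlock(content, tag) {
  const open = `<${tag}>`;
  const close = `</${tag}>`;
  const start = content.indexOf(open);
  if (start === -1) return null;
  const end = content.indexOf(close, start + open.length);
  if (end === -1) return null;
  return content.slice(start + open.length, end);
}
```

Asserting against the returned block means a phrase that only appears outside
<success_criteria> can no longer satisfy the check.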

* fix(#2790): post-rebase reconciliation with main

- INVENTORY.md/JSON: add reapply-patches workflow row + bump count to 85
- autonomous.md: switch consolidated --fix invocation to hyphen Skill name
- analyze-dependencies test: assert COMMANDS.md does NOT document the
  consolidated-away /gsd-analyze-dependencies entry (was: bare .includes())

* fix(#2790): address remaining CR findings — strengthen contract tests

Doc-fixes:
- INVENTORY.md: route transition.md & edit-phase.md rows to consolidated
  /gsd-progress --next and /gsd-phase --edit (was: deleted /gsd-next, /gsd-edit-phase)
- config.md --profile branch: document #2439 pre-flight `command -v gsd-sdk`
  guard + install hint BEFORE the gsd-sdk invocation (closes opaque
  "command not found: gsd-sdk" regression path)

Test discipline (no-source-grep contract):
- bug-2439: replace bare `content.includes('gsd-sdk')` with structured
  parse of <context> block + --profile branch; assert pre-flight token,
  install hint, #2439 citation, and ordering vs gsd-sdk invocation
- edit-phase: parse INVENTORY.md edit-phase.md row's "Invoked by" column
  and assert `/gsd-phase --edit` (not the deleted /gsd-edit-phase)
- next-safety-gates: tighten `--next` documentation contract — require
  --next AND --force AND completeness routing (was OR-based, passed when
  only --next present)
- reapply-patches: parse argument-hint flag list structurally; scan ALL
  <execution_context*> blocks for the @-include of reapply-patches.md;
  parse Hunk Verification Table header columns directly; locate Step 5
  via heading parsing then assert (i) table reference, (ii) verified=no
  gate, (iii) STOP/halt directive, (iv) explicit absent-table halt path
- workspace: parse frontmatter, tokenize argument-hint across multiple
  bracketed segments, parse @-include targets from <execution_context>
  rather than substring-matching the file body

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 00:43:47 -04:00
Tom Boucher
55298b2f70 fix(#2876): yamlQuote SKILL.md description for Copilot/Antigravity/Trae/CodeBuddy (#2881)
* fix(#2876): yamlQuote description in Copilot/Antigravity/Trae/CodeBuddy SKILL.md

A description starting with `[BETA]` (or any YAML flow indicator —
`{`, `*`, `&`, `!`, `|`, `>`, `%`, `@`, backtick) is parsed as a flow
sequence/mapping by YAML 1.2-strict loaders. gh-copilot's frontmatter
loader fails closed:

  ✖ ~/.copilot/skills/gsd-ultraplan-phase/SKILL.md: failed to parse YAML
    frontmatter: Unexpected scalar at node end at line 2, column 21:
    description: [BETA] Offload plan phase to Claude Code's ultraplan…

Six emission sites in `bin/install.js` re-wrote the description without
quoting, while the Claude variant (`convertClaudeCommandToClaudeSkill`)
already routed it through `yamlQuote`. Brought all six in line:

- convertClaudeCommandToCopilotSkill
- convertClaudeAgentToCopilotAgent
- convertClaudeCommandToAntigravitySkill
- convertClaudeAgentToAntigravityAgent
- convertClaudeCommandToTraeSkill
- convertClaudeCommandToCodebuddySkill

Each now wraps the value in `yamlQuote(...)` so any leading character is
parser-safe.
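
For illustration only, the quoting step might look like the sketch below. It
assumes `yamlQuote` emits a JSON-encoded double-quoted scalar, which is
consistent with the JSON.parse round-trip the regression test performs;
`emitDescription` is a hypothetical stand-in for the converter emission sites.

```javascript
// Assumption: JSON string encoding is used as the YAML double-quoted form.
// A leading "[" or "{" inside quotes is no longer read as a flow collection.
function yamlQuote(value) {
  return JSON.stringify(String(value));
}

// Hypothetical emission site: builds one frontmatter line.
function emitDescription(description) {
  return `description: ${yamlQuote(description)}`;
}
```

With this shape, a description beginning with `[BETA]` is emitted as a quoted
scalar instead of the start of a flow sequence.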

Regression test (tests/bug-2876-skill-frontmatter-quote.test.cjs) drives
the four command converters and two agent converters through the
reporter's exact "[BETA] …" description plus a grab-bag of YAML flow
indicators, asserting the emitted `description:` value is a quoted YAML
scalar. Also round-trips the value through `JSON.parse` for converters
that don't apply runtime-name substitution to confirm fidelity.

Updated 7 pre-existing substring assertions in copilot-install.test.cjs
and antigravity-install.test.cjs that hard-coded the unquoted form.
Full suite: 5893/5893 pass on `npm test`.

Closes #2876

* test(#2876): structurally parse frontmatter instead of substring-grep

Addresses CodeRabbit's two nitpicks on PR #2881: the pre-existing
substring assertions in copilot-install.test.cjs (4 sites) and
antigravity-install.test.cjs (3 sites) only got bumped from the unquoted
form (`description: Diagnose...`) to the quoted-prefix form
(`description: "Diagnose...`). Both are still raw-string checks against
rendered YAML and drift on any quoting/order change — exactly what the
project's CONTRIBUTING.md "no-source-grep" testing standard exists to
prevent.

Add `parseFrontmatter()` to tests/helpers.cjs — a small parser that
handles the YAML scalar forms the install converters emit (double-quoted
JSON, single-quoted with `''` escape, bare). Throws if the content has
no closed `---` block so a regression in the emitter shape fails loudly
rather than silently returning {}.

Refactor the 7 description-substring sites to compare on parsed values:
the assertion now reads as `fm.description === 'Diagnose planning
directory health'` rather than `result.includes('description: "Diagnose
planning directory health')`. Same coverage of the #2876 quoting
behavior, no coupling to byte-level quote style.

`npm test`: 5893/5893 pass.

Closes #2876

* test(#2876): make parseFrontmatter delimiter check CRLF/whitespace tolerant

CR nitpick on PR #2881 (review at 03:08:08Z): parseFrontmatter()
splits on '\n' and compares each line strictly to '---'. A
Windows-authored skill file (CRLF endings) leaves a trailing '\r'
on every line, so '---\r' fails the equality check, and the helper
throws "no closed --- block" on perfectly valid input. Same problem
with whitespace-padded delimiter lines.

Switch to splitting on /\r?\n/ and comparing the trimmed line.
Helper is used by tests/copilot-install.test.cjs and
tests/antigravity-install.test.cjs, so this also de-flakes those
suites on Windows runners.
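
A rough sketch of a CRLF-tolerant parseFrontmatter(), assuming only the three
scalar forms named above (double-quoted JSON, single-quoted with `''` escape,
bare). The real helper in tests/helpers.cjs may differ in detail.

```javascript
// Sketch under stated assumptions; not the project's actual helper.
function parseFrontmatter(content) {
  // Split on LF or CRLF so Windows-authored files leave no trailing "\r".
  const lines = content.split(/\r?\n/);
  if (lines[0].trim() !== '---') {
    throw new Error('no opening --- delimiter');
  }
  // Compare trimmed lines so whitespace-padded delimiters are accepted.
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === '---');
  if (end === -1) {
    throw new Error('no closed --- block');
  }
  const fm = {};
  for (const line of lines.slice(1, end)) {
    const m = /^([A-Za-z0-9_-]+):\s*(.*)$/.exec(line);
    if (!m) continue; // skip blank or non key/value lines
    const key = m[1];
    const raw = m[2].trim();
    if (raw.startsWith('"')) {
      fm[key] = JSON.parse(raw); // double-quoted JSON form
    } else if (raw.length >= 2 && raw.startsWith("'") && raw.endsWith("'")) {
      fm[key] = raw.slice(1, -1).replace(/''/g, "'"); // single-quoted form
    } else {
      fm[key] = raw; // bare scalar
    }
  }
  return fm;
}
```

Tests can then compare `fm.description` against the expected string without
depending on the emitted quote style or line endings.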

5893/5893 on `npm test`.
2026-04-29 23:27:27 -04:00
Jeremy McSpadden
4d394a249d fix(commands): normalize gsd slash namespace drift (#2858)
* fix(commands): normalize gsd slash namespace drift

* fix(#2855): address CodeRabbit findings on namespace drift PR

Three CR findings, all valid:

1. autonomous.md line 783 still had `gsd:discuss-phase` (the PR's own
   normalization missed this line). Switched to `gsd-discuss-phase` and
   updated the matching test in autonomous-interactive.test.cjs that was
   asserting the now-retired colon form.

2. tests/bug-2543-gsd-slash-namespace.test.cjs source-grepped the
   fix-slash-commands.cjs script with .includes() rather than driving
   its transform behaviour. Refactored fix-slash-commands.cjs to export
   a pure transformContent(src, cmdNames) function, kept the CLI behaviour
   unchanged via require.main, and replaced the source-grep block with
   five behavioural cases: rewrite, multi-occurrence, idempotence on
   canonical input, no-op on gsd-sdk/gsd-tools, and word-boundary safety.

3. tests/bug-2808-skill-hyphen-name.test.cjs matched `name:` anywhere
   in SKILL.md; a stray name: in the body could satisfy the assertion.
   Scoped the lookup to the YAML frontmatter block via the suggested
   diff (parse the leading --- ... --- region first, then find name:
   inside it).

Full suite: 5854/5854 passing.

* fix(#2855): address remaining CodeRabbit findings on PR #2858

Three structural concerns flagged on the namespace-drift fix PR:

1. scripts/fix-slash-commands.cjs:24 — `buildPattern([])` compiled
   `/gsd:()(?=[^a-zA-Z0-9_-]|$)/g`. The empty capture group still
   matches any `/gsd:` token followed by a non-word boundary
   (whitespace, EOL, punctuation), rewriting it to a stray `/gsd-`.
   Verified live: `transformContent("/gsd:", [])` → `"/gsd-"`. Added
   a guard returning null from `buildPattern` on empty input and
   updated `transformContent` and `processDir` to no-op when the
   pattern is null.

2. tests/autonomous-interactive.test.cjs:44-47 — assertion was
   `content.includes('gsd-discuss-phase') && content.includes('INTERACTIVE')`,
   which would false-pass on any unrelated co-occurrence (e.g.
   `INTERACTIVE=""` initialization plus a stray `gsd-discuss-phase`
   prose mention). Replaced with a structural extraction: locate the
   `**If \`INTERACTIVE\` is set:**` branch, bound it by the next
   `**If` / `<step>` boundary, and assert the
   `Skill(skill="gsd-discuss-phase", ...)` invocation lives inside
   that region. Tolerates whitespace around `(`, `skill`, and `=`.

3. tests/bug-2808-skill-hyphen-name.test.cjs:104 — colon-call regex
   was `Skill\(skill=...` and missed valid formatting like
   `Skill(skill = "gsd:cmd")` or `Skill( skill = ...)`. Loosened to
   `Skill\(\s*skill\s*=\s*...` so reformatting drift can't slip past
   the namespace guard.
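
The empty-roster guard from finding 1 could be sketched as follows. The
function names match the commit message, but the bodies are assumptions, and
the pattern is written here to match a literal `/gsd:` prefix.

```javascript
// Sketch only: returns null for an empty roster so callers can no-op instead
// of compiling a pattern whose empty alternation matches any "/gsd:" token.
function buildPattern(cmdNames) {
  if (!Array.isArray(cmdNames) || cmdNames.length === 0) return null;
  const escaped = cmdNames.map((n) => n.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  return new RegExp(`/gsd:(${escaped.join('|')})(?=[^a-zA-Z0-9_-]|$)`, 'g');
}

// Pure transform: rewrites known colon-form commands to the hyphen form.
function transformContent(src, cmdNames) {
  const pattern = buildPattern(cmdNames);
  if (pattern === null) return src; // nothing to rewrite against
  return src.replace(pattern, '/gsd-$1');
}
```

Because the transform is pure, the rewrite, idempotence, and word-boundary
cases can each be checked with a plain string comparison.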

Verification: 5854/5854 pass on `npm test` from the rebased branch.

* fix(#2855): drop pre-validation filter that hid namespace drift

CR finding on tests/bug-2808-skill-hyphen-name.test.cjs:128: the test
collected generated skill directories with
`.filter(entry => entry.isDirectory() && entry.name.startsWith('gsd-'))`,
then validated namespace invariants over that filtered list. Anything
that violated the prefix invariant — `gsd:extract-learnings` (colon
form), `extract_learnings` without prefix, `Gsd-foo` mis-cased — would
silently disappear from the iteration and the test would falsely pass.

Drop the `startsWith('gsd-')` filter so every generated directory shows
up. Add explicit assertions before the existing per-skill loop:
  - directory list is non-empty (catches a broken converter that
    produces nothing)
  - every directory begins with `gsd-`
  - every directory contains no `:`
  - every directory contains no `_`
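
As a sketch, the four assertions above amount to validating the unfiltered
directory list. `namespaceViolations` is a hypothetical name; the real test
uses individual assertions.

```javascript
// Hypothetical helper: collects every namespace problem in an UNFILTERED list
// of generated skill directory names. Filtering first would hide violations.
function namespaceViolations(dirNames) {
  const problems = [];
  if (dirNames.length === 0) {
    problems.push('no skill directories were generated');
  }
  for (const name of dirNames) {
    if (!name.startsWith('gsd-')) problems.push(`${name}: missing gsd- prefix`);
    if (name.includes(':')) problems.push(`${name}: contains ":"`);
    if (name.includes('_')) problems.push(`${name}: contains "_"`);
  }
  return problems;
}
```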

Re-audited the full PR diff for the same anti-pattern: only this one
site filtered before validating the namespace; bug-2643 and
commands-doc-parity also use `readdirSync().filter()` but only by file
extension, which is correct.

5854/5854 on `npm test`.

* fix(#2855): address remaining CR findings (1 active + 2 nitpicks)

Three findings on PR #2858, all the same root cause: input narrowing
before validation lets drift slip past the guards.

1. tests/bug-2808-...:104 (active) — `colonCallRe` captured local names
   with `[a-z0-9-]+`, which excluded the underscore. A drift like
   `Skill(skill="gsd:extract_learnings")` (deprecated colon syntax with
   the old underscore filename) silently slid through. Broadened the
   capture to `[^'"\s)]+` so any malformed local name is surfaced;
   surrounding pattern (whitespace tolerance, escape support, flags)
   unchanged.

2. tests/bug-2643-...:43 (nitpick) — `extractSkillNamesHyphen` and
   `extractSkillNamesColon` had the same over-strict capture plus
   relied on a single regex over raw bytes, which the project test-
   rigor memory bans (`feedback_no_source_grep_tests.md`). Replaced
   with `extractSkillCalls(content)` — a small structural extractor
   that walks `Skill(` openers, locates each call's matching `)`,
   parses the body's `skill = "..."` keyword argument with permissive
   whitespace + quoting + escape handling, and returns
   `{ name, raw }` records. The two namespace-form helpers become
   thin filters over the structured output. Tightened the body class
   to `[^'"\\]+` so a trailing escape `\` before the closing quote
   (as in `Skill(skill=\"gsd-foo\", …)` written inside another string
   context) doesn't get included in the captured name.

3. tests/bug-2543-...:44 (nitpick) — `DOC_SEARCH_FILES` was a hand-
   curated 7-entry array. Every doc added in the future would silently
   weaken drift detection until someone remembered to extend the list.
   Replaced with `discoverDocSearchFiles(ROOT)`: globs every `.md`
   under `docs/` and adds `README.md` if present. New docs are picked
   up automatically.

Re-audited the diff surface for similar narrowings; no other sites
filter or constrain before validating namespace invariants.

5854/5854 on `npm test`.

* fix(#2855): recurse docs/ tree so localized translations are scanned too

CR finding: discoverDocSearchFiles() stopped at docs/*.md, leaving
localized translation trees (docs/ja-JP/, docs/zh-CN/, docs/ko-KR/,
docs/pt-BR/) and other nested doc collections (docs/skills/,
docs/superpowers/) invisible to the namespace-drift invariant.

Verified the gap: docs/ has 6 nested directories with ~30 .md files
that the previous top-level-only scan was skipping. None contain
/gsd: references today, but a future translation update or new
doc subdir could leak drift.

Switch to an iterative stack walk so every .md under docs/ is scanned
regardless of depth. Stack form (rather than recursion) avoids the
risk of running into the call-stack limit on deep doc trees.
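
A minimal sketch of the iterative stack walk, assuming the helper takes the
repository root and returns absolute paths. The signature and return shape are
assumptions for illustration.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Sketch: collects every .md under <root>/docs at any depth, plus README.md.
// An explicit stack replaces recursion, so depth is bounded by memory only.
function discoverDocSearchFiles(root) {
  const found = [];
  const stack = [path.join(root, 'docs')];
  while (stack.length > 0) {
    const dir = stack.pop();
    let entries;
    try {
      entries = fs.readdirSync(dir, { withFileTypes: true });
    } catch {
      continue; // directory missing or unreadable: skip it
    }
    for (const entry of entries) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        stack.push(full);
      } else if (entry.isFile() && entry.name.endsWith('.md')) {
        found.push(full);
      }
    }
  }
  const readme = path.join(root, 'README.md');
  if (fs.existsSync(readme)) found.push(readme);
  return found.sort();
}
```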

5854/5854 on `npm test`.

---------

Co-authored-by: Tom Boucher <trekkie@nomorestars.com>
2026-04-29 22:56:59 -04:00
Oleksander Palian
73b9d1dac0 fix(install): use colon namespace for Gemini slash commands (#2768)
* fix(install): use colon namespace for Gemini slash commands and help reference

This fixes unexecutable command recommendations in Gemini CLI by correctly
namespacing slash commands (/gsd: instead of /gsd-) in all installed
artifacts (agents, commands, workflows).

- Implements a lazy command roster discovery to ensure 100% accurate
  conversion and protect file paths, URLs, and agent names.
- Adds isolated behavioral and unit tests covering all boundary cases.
- Fixes hardcoded command strings in banners and help output.

Closes #2783

* fix(install): close roster gaps in Gemini /gsd- → /gsd: conversion (#2783)

Addresses adversarial review findings on PR #2768:

- Restore regex boundaries (lookbehind + extension lookahead). Roster-only
  matching was insufficient: a URL like `https://example.com/gsd-plan-phase`
  ends in a known command and would be incorrectly converted. Boundaries +
  roster now agree before any conversion fires.
- Smarter trailing lookahead `(?!\.[a-z])` distinguishes file extensions
  (`.cjs`, `.md`) from sentence-ending punctuation (`.` at end of input or
  before whitespace), so `/gsd-help.` correctly converts.
- Fail loud on missing roster. `commands/gsd/` not found previously fell
  through to an empty Set, silently no-op'ing every conversion — exactly the
  bug this code exists to prevent. Now emits a one-shot console.warn (gated
  on GSD_TEST_MODE) before returning the empty set.
- Drop unnecessary `i` flag — GSD commands are always lowercase; matching
  uppercase tokens against a lowercase roster always misses anyway.
- Export `_resetGsdCommandRoster` for test isolation against the module-level
  cache.
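
The combination of regex boundaries and a roster check might be sketched like
this. The exact lookbehind class and the function name are assumptions; only
the `(?!\.[a-z])` extension lookahead and the roster idea come from the notes
above. The extra `[a-z0-9-]` lookahead term is added in this sketch to stop
the engine from backtracking to a shorter command name.

```javascript
// Sketch under stated assumptions; not the installer's actual pattern.
// - Lookbehind: skip tokens glued to a path or URL (e.g. "example.com/gsd-x").
// - Lookahead: skip file names ("/gsd-help.md") but allow sentence-ending ".".
// - No "i" flag: command names are lowercase, as is the roster.
const GSD_SLASH = /(?<![A-Za-z0-9_.:\/-])\/gsd-([a-z0-9-]+)(?![a-z0-9-]|\.[a-z])/g;

function convertSlashCommandsForGemini(text, roster) {
  // Boundaries and roster must both agree before a token is rewritten.
  return text.replace(GSD_SLASH, (match, name) =>
    roster.has(name) ? `/gsd:${name}` : match
  );
}
```

Passing the roster as a `Set` argument keeps the sketch free of module-level
state, which sidesteps the cache-reset concern in tests.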

Test additions pin the actual safety property of the roster check by using
KNOWN command names embedded in URLs and sub-paths — the cases the prior
tests didn't reach because they used `gsd-tools` (not in roster). Added a
roster-load assertion that fails loudly if the empty-Set fallback path
silently neutralises conversions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(install): centralize <sub> stripping and add structural test assertions

CodeRabbit findings on the prior commit:

- (actionable) Centralizing the Gemini conversion through
  convertClaudeToGeminiMarkdown dropped the stripSubTags() call that the
  inline command path used to make before TOML conversion. Move stripSubTags
  inside convertClaudeToGeminiMarkdown so command/agent/non-command Gemini
  outputs all have <sub> consistently stripped. Remove the now-redundant
  stripSubTags call in convertClaudeToGeminiAgent (single source of truth).
- (nitpick) Replace `.includes()` checks in the TOML test with structured
  parsing — JSON-decode each TOML value and assert on parsed fields, per
  the project's "tests parse, never grep" convention.
- (nitpick) Strengthen the install behavioral test to read a real installed
  artifact (.gemini/commands/gsd/plan-phase.toml), parse it, and assert the
  prompt body actually contains a /gsd: reference and no unconverted
  /gsd-plan-phase. A directory-only check would have passed even if every
  conversion silently no-op'd.
- Add a regression test that <sub> tags are stripped through the
  convertClaudeToGeminiMarkdown pipeline.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Tom Boucher <trekkie@nomorestars.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 22:37:57 -04:00
Tom Boucher
99af76b3ba fix(#2851): replace bare gsd-tools invocations with absolute path (#2869)
* fix(#2851): replace bare gsd-tools invocations with absolute path

`gsd-tools` is not a published bin entry — package.json declares only
get-shit-done-cc and gsd-sdk. The shipped invocation pattern is
`node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" <subcommand>`,
used by every other workflow file.

Two leaked bare invocations:

- get-shit-done/workflows/plan-phase.md §13e (gap-analysis)
  — reported in #2851; gap-analysis silently skipped on every plan-phase run
- get-shit-done/workflows/ingest-docs.md §finalize (commit)
  — caught by the new structural test; ingest-docs commit step was broken

Both updated to canonical absolute-path form.

Adds tests/bug-2851-workflow-bare-gsd-tools.test.cjs which parses every
markdown file under get-shit-done/workflows/, extracts shell-fenced code
blocks, tokenizes each line, and asserts no token in command position is
the bare string `gsd-tools` (the trailing `.cjs` is a different token).
The test also asserts plan-phase.md's gap-analysis call uses the canonical
`node …/gsd-tools.cjs` form.

Closes #2851

* fix(#2851): catch third bare gsd-tools call in ingest-docs.md init

After the first commit, the structural test was strengthened to detect
bare `gsd-tools` inside `$(...)` and backtick command-substitution forms.
The improved test surfaced a third leak:

  ingest-docs.md:55: INIT=$(gsd-tools init ingest-docs)

Fixed to canonical form
  INIT=$(node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" init ingest-docs)
plus the standard `@file:` handoff line that every other workflow uses
when capturing INIT (required by tests/windows-robustness.test.cjs).

Updated tests/bug-2801-ingest-docs-handler.test.cjs to match either
the bare `gsd-tools init ingest-docs` or canonical
`gsd-tools.cjs" init ingest-docs` form — the test's intent is to verify
the dispatch handler is wired, not to lock the bare-bin form that #2851
removes.

Closes #2851

* test(#2851): tighten ingest-docs and gap-analysis assertions to canonical form

CodeRabbit caught two soft assertions in the regression tests:

1. tests/bug-2801: the init-ingest-docs assertion accepted both the
   legacy bare `gsd-tools` form and the canonical node-path form.
   Since #2851 is the fix that removes the bare form, the test should
   only accept the canonical absolute-path invocation. Switched to
   parsed-bash-block extraction with an anchored regex on the full
   `node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs"` path.

2. tests/bug-2851: the gap-analysis assertion used two loose
   .includes()/word-boundary checks. Replaced with a single
   assert.match() against the full canonical path so non-canonical
   forms fail.

* test(#2851): env-assignment skip accepts lowercase identifiers too

CodeRabbit caught: the cmdIdx-skip regex /^[A-Z_][A-Z0-9_]*=/ only
matched uppercase variable names, so a line like `tmp=1 gsd-tools init`
would tokenize to ['tmp=1','gsd-tools','init'], the regex would fail on
'tmp=1', cmdIdx would stay at 0, and the command-position check would
compare 'tmp=1' against 'gsd-tools' — false negative.

POSIX shell variable names are [A-Za-z_][A-Za-z0-9_]*. Widen the regex
to match the actual lexical rule. Existing uppercase forms still work
(FOO=bar gsd-tools); now lowercase forms (tmp=1 gsd-tools) and mixed
case forms are also detected.
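
A small sketch of the command-position logic with the widened regex. The
whitespace tokenizer and helper names are simplifications assumed for
illustration; the real test also handles command substitution.

```javascript
// POSIX-style variable assignment prefix: NAME=value, any letter case.
const ENV_ASSIGNMENT = /^[A-Za-z_][A-Za-z0-9_]*=/;

// Returns the first token that is not a leading env assignment.
function commandToken(line) {
  const tokens = line.trim().split(/\s+/).filter(Boolean);
  let cmdIdx = 0;
  while (cmdIdx < tokens.length && ENV_ASSIGNMENT.test(tokens[cmdIdx])) {
    cmdIdx += 1;
  }
  return tokens[cmdIdx];
}

// True when the command position holds the bare, unpublished bin name.
function isBareGsdTools(line) {
  return commandToken(line) === 'gsd-tools';
}
```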
2026-04-29 21:52:20 -04:00
Tom Boucher
ef08a89241 fix(#2866): Codex installer strips legacy hooks at EOF without trailing newline (#2870)
* fix(#2866): Codex installer strips legacy hooks at end-of-file without trailing newline

The four shape-strip regexes in `bin/install.js` (Codex install path)
required `\r?\n` at end. A stale GSD hook block sitting at end-of-file
without a trailing newline (common — many editors strip them, and the
legacy installer never wrote one) failed every shape, the installer
saw `gsd-check-update` already present, skipped writing the new
Nested-AoT block, and Codex 0.125+ refused to load with
  invalid type: map, expected a sequence in `hooks`

Root cause + fix
================

Each shape's terminator changed from `\r?\n` to `(?:\r?\n|$)`, so
end-of-file is also a valid terminator.
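
To make the terminator change concrete, here is a simplified stand-in for one
shape. The block layout in SHAPE_BODY is an assumption for illustration, not
one of the four real patterns from `bin/install.js`.

```javascript
// Simplified, hypothetical shape: a flat [[hooks]] block with a GSD command.
const SHAPE_BODY = String.raw`\[\[hooks\]\]\r?\nevent = "SessionStart"\r?\ncommand = "[^"\n]*gsd-check-update\.js"`;

// Old terminator: a newline is mandatory, so a block at EOF never matches.
const requiresNewline = new RegExp(SHAPE_BODY + String.raw`\r?\n`);

// New terminator: newline OR end of input (no "m" flag, so $ means EOF only).
const allowsEof = new RegExp(SHAPE_BODY + String.raw`(?:\r?\n|$)`);

function stripStaleHook(content, pattern) {
  return content.replace(pattern, '');
}
```

With a config that ends immediately after the `command` line, only the second
pattern removes the block.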

Strip logic was lifted into a new pure helper
`stripStaleGsdHookBlocks(configContent)` that the install pipeline now
calls in place of the inline replace chain. The helper is exported via
the GSD_TEST_MODE module.exports for direct unit-test coverage.

Regression test
===============

`tests/bug-2866-codex-strip-no-trailing-newline.test.cjs` exercises
all four historical shapes (Shape 1 — pre-#1755 gsd-update-check;
Shape 2 — flat [[hooks]]+gsd-check-update; Shape 3 — single
[[hooks.SessionStart]] without nested .hooks; Shape 4 — correct
two-block nested) twice each: once with a trailing newline (regression
guard against the existing behavior) and once at end-of-file without a
trailing newline (the reporter's exact repro).

It also asserts:
- the helper is a no-op when no GSD reference is present, and
- Shape 4 strip does not leave an orphaned [[hooks.SessionStart]]
  header behind (the same ordering invariant the inline code relied on).

The helper is loaded via `package.json` `bin` field, not a hardcoded
path — `tests/bug-2866-codex-strip-no-trailing-newline.test.cjs`
parses package.json and resolves `pkg.bin['get-shit-done-cc']` to
require the installer.

Closes #2866

* test(#2866): assert TOML structure, not raw-text substrings

CodeRabbit caught the strip assertions using `.includes()` against
raw TOML output. Added a small line-structural parseTomlShape() helper
(table headers + dotted-path key/value map, comments stripped) and
rewrote the assertions to:

- Verify no [[hooks.* table header survives the strip
- Verify no key carries a stale gsd-(update|check)-(check|update) value
- Verify history.persistence is preserved as the parsed string "save-all"

Behaviour is unchanged (the strip function under test is not modified).
The assertions now check structural shape rather than substring presence,
which catches re-shaping regressions that text matching would miss.

No new dependencies — the parser is local to the test and handles only
the small well-formed TOML these tests construct.

* refactor(#2866): replace regex hook strip with TOML AST removal

Per CR feedback on PR #2870: the regex-driven `stripStaleGsdHookBlocks`
implementation was fragile to whitespace, indentation, and key-ordering
variations the regression test never exercised. Variations the regex
silently leaked (verified before the rewrite):

- Shape 4 with an extra blank line between parent/child tables
- Shape 2/3 with `command` ordered before `event`
- Shape 3 with an extra `timeout = 5000` key — worse than a leak: the
  regex matched only the command line, leaving `timeout = 5000`
  orphaned outside any TOML table (invalid TOML)
- Tight whitespace `event="SessionStart"` (no spaces around `=`)

The structural rewrite uses the TOML parser already present in this file
(`getTomlTableSections` + `getTomlLineRecords` + `parseTomlValue` +
`removeContentRanges` + `collapseTomlBlankLines`):

1. Find every section whose path is `hooks` or starts with `hooks.`.
2. For each, walk the section's line records and parse `command` values
   structurally — match by basename equality (`gsd-update-check.js` or
   `gsd-check-update.js`), never by regex on raw bytes.
3. Detect orphaned `[[hooks.SessionStart]]` parents: empty body and a
   stale child immediately follows → mark for removal.
4. Extend each removal range backward through any preceding
   `# GSD Hooks` marker line (detected via line records, not text scan).
5. Remove ranges atomically and collapse resulting blank-line runs.

Legacy hook basenames are hoisted to template-literal constants so the
existing `install-hooks-copy.test.cjs` quoted-literal guard continues to
catch accidental *registration* of the inverted filename, while strip
detection (which legitimately needs both names) bypasses it.

Test coverage added: 8 new sub-tests exercising the four whitespace/
ordering variations (with and without trailing newline) plus a
`[[hooks.UserPromptSubmit]]` user-authored hook to guarantee the strip
only touches GSD-managed sections. 20/20 in the file, 5867/5867 in the
full suite.
2026-04-29 21:51:58 -04:00
Tom Boucher
f2ada8500c chore(#2868): switch canary publish from main to dev branch (#2871)
* chore(#2868): switch canary publish from main to dev branch

Swaps the four `if:` guards in `.github/workflows/canary.yml` from
`refs/heads/main` to `refs/heads/dev` so the canary stream is owned
by the new long-lived integration branch. Adds a policy comment at
the top of the workflow documenting the branch->dist-tag mapping
(dev=@canary, main=@next/@latest, no overlap).

Closes #2868

* fix(#2868): summary block matches publish-step gate

CodeRabbit caught: the Summary step keyed off DRY_RUN only, so a
non-dry-run on main would falsely report "Published"/"Tagged" even
though all four publish steps were skipped by the new dev-only gate.

Add PUBLISH_ELIGIBLE env mirroring the publish-step `if:` expression
and a VALIDATION ONLY branch in the summary so non-dev runs report
honestly.
2026-04-29 17:43:30 -04:00
Tom Boucher
f6a6e43226 fix(#2872): auto-close PRs that omit the issue-link keyword (#2873)
The Require Issue Link workflow was posting a comment and failing the
status check, but never transitioning the PR to closed. PR templates
promise auto-close behavior; PR #2863 demonstrated the gap (opened
without a Closes #N, sat open until manually closed).

Adds a `pulls.update({state: 'closed'})` call after the existing
comment, updates the comment heading to 'PR auto-closed', and tells
the author how to reopen after fixing the body.

Closes #2872
2026-04-29 17:40:18 -04:00
Tom Boucher
107a83ebf7 docs(#2859): add release notes for 1.39.0-rc.7 (#2860)
rc.7 will be the first RC in the 1.39.0 train that actually rolls in
the post-rc.5 fixes from main (rc.6 was content-identical to rc.5 — see
#2856). Notes enumerate each fix with PR/issue link, recap rc.6 / rc.5 /
rc.4, and follow the established docs/RELEASE-v1.39.0-rc.X.md format.

No SDK-version pinning advice (consistent with the rc.6 doc cleanup).
Markdownlint-clean fenced code blocks.

Closes #2859
2026-04-29 08:58:16 -04:00
Tom Boucher
43a13217b7 docs(#2856): add docs/RELEASE-v1.39.0-rc.6.md (#2857)
* docs(#2856): add release notes for 1.39.0-rc.6

Documents what's actually in rc.6 (= rc.5 content + version-bump only —
release/1.39.0 was not synced with main before the bump) plus the known
SDK publish failure (@gsd-build/sdk@1.39.0-rc.6 is missing from npm after
the PUT returned a 404 error). Format mirrors RELEASE-v1.39.0-rc.5.md.

Closes #2856

* docs(#2856): drop SDK refs from rc.6 notes; tag git log fence

Per maintainer + CodeRabbit review:
- Strip the 'Known issue: split publish' section, the SDK pin Note, and
  the @gsd-build/sdk follow-up bullet. SDK publish failure is a known
  separate issue and shouldn't block the rc.6 docs.
- Add bash language tag to the git log fence (markdownlint MD040).
2026-04-29 08:43:39 -04:00
Tom Boucher
2498f5649d docs(release): backfill CHANGELOG with 17 RC-train entries before v1.39.0 final cut (#2854)
Adds [Unreleased] entries for PRs that landed between v1.39.0-rc.4 and
v1.39.0-rc.6 but were missing from CHANGELOG.md. One bullet per PR,
grouped Added (#2828) and Fixed (16 entries: #2788, #2791, #2794, #2796,
#2798, #2801, #2803, #2805, #2808, #2829, #2831, #2832, #2835, #2836,
#2838, #2839).

Closes #2853
2026-04-29 08:29:47 -04:00
Tom Boucher
e81592878e feat(#2789): trim skill description anti-patterns; enforce 100-char budget (#2823)
* feat(#2789): trim skill description anti-patterns; enforce 100-char budget

- Trim descriptions in all commands/gsd/*.md files over 100 chars
- Remove flag documentation from descriptions (belongs in argument-hint)
- Remove Triggers: keyword stuffing
- Add scripts/lint-descriptions.cjs — fails on descriptions > 100 chars
- Add npm script: lint:descriptions
- Add tests/enh-2789-description-budget.test.cjs

Closes #2789

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(#2789): add CHANGELOG entry for description budget lint

* docs(#2789): update COMMANDS.md descriptions; add skill description standards note

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 08:14:11 -04:00
Tom Boucher
4815b3c972 fix(#2838): SUMMARY rescue handles gitignored .planning (#2850)
* fix(#2838): SUMMARY rescue handles gitignored .planning explicitly

The pre-fix rescue used `git ls-files --modified --others --exclude-standard`
to detect uncommitted SUMMARY.md before worktree removal. When projects
gitignore .planning/, --exclude-standard filters out the very files the
rescue is meant to save, the rescue branch is skipped, and `git worktree
remove --force` permanently deletes the SUMMARY.

Replace both rescue blocks (quick.md, execute-phase.md) with a
filesystem-level find + cp rescue that bypasses gitignore entirely and
avoids the worktree↔main commit/merge cascade. cmp -s makes it idempotent.

Adds tests/bug-2838-summary-rescue-gitignored-planning.test.cjs which
extracts each rescue block, runs it against a real temp repo with a
gitignored .planning/, and asserts the SUMMARY survives worktree removal.

* test(#2838): assert rescue block exits 0 in idempotency test

CodeRabbit (Minor): the idempotency test pre-creates the destination
SUMMARY.md, so even a syntax/runtime error in the rescue block would
silently false-pass. Add an explicit r.status === 0 assertion.
2026-04-29 08:07:12 -04:00
Tom Boucher
f9ed47ac8b fix(#2832): gsd-sdk auto detects Codex runtime correctly (#2844)
* fix(#2832): gsd-sdk auto detects Codex runtime correctly

Two-part fix for #2832 (gsd-sdk auto silently routing non-Claude runtime
projects through the Claude Agent SDK):

1. Runtime gate at the `auto` entry point. New `runtime-gate.ts` exports
   `assertRuntimeSupportsAutoMode(config)` which throws an actionable error
   when `GSD_RUNTIME` / `config.runtime` resolves to a non-Claude runtime
   (codex, gemini, opencode, etc.). The autonomous orchestrator only knows
   how to drive `@anthropic-ai/claude-agent-sdk` today; failing fast with a
   clear pointer at the in-session slash commands beats the previous instant
   `[FAILED] $0.00 0.1s` flake. Wired into `cli.ts` before the GSD/InitRunner
   construction.

2. Runtime-aware `resolveModel()` in `session-runner.ts`. The profile -> id
   map (`balanced -> claude-sonnet-4-6`, etc.) was applied unconditionally,
   so even with `runtime: codex` and `resolve_model_ids: omit` the SDK
   forced a Claude id into `query()`. Now the profile map only fires when
   the runtime is Claude and the explicit `resolve_model_ids: "omit"` knob
   short-circuits to undefined, mirroring `query/config-query.ts`.
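
The precedence in item 2 can be sketched roughly as below. This is a minimal illustration, not the code in `session-runner.ts`: the `model_profile` key name, the function signatures, and the simplified `detectRuntime` are assumptions, and only the `balanced` mapping quoted above is included.

```javascript
// Hypothetical sketch of runtime-aware model resolution (not the SDK source).
// Only the profile -> id pair named in the commit message is listed here.
const PROFILE_TO_CLAUDE_ID = { balanced: 'claude-sonnet-4-6' };

// Assumed precedence: GSD_RUNTIME env, then config.runtime, then Claude.
function detectRuntime(config, env) {
  return env.GSD_RUNTIME || config.runtime || 'claude';
}

function resolveModel(config, env, options = {}) {
  // An explicit model wins on any runtime.
  if (options.model) return options.model;
  // The explicit knob short-circuits to undefined.
  if (config.resolve_model_ids === 'omit') return undefined;
  // The Claude profile map only fires when the runtime is Claude.
  if (detectRuntime(config, env) !== 'claude') return undefined;
  return PROFILE_TO_CLAUDE_ID[config.model_profile]; // key name is an assumption
}
```

Passing `env` as a parameter instead of reading `process.env` directly keeps the precedence easy to unit-test.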

Tests (vitest, sdk/src):
- runtime-gate.test.ts (8 cases): claude / unset / unknown pass; codex,
  gemini, opencode throw; GSD_RUNTIME wins over config.runtime; error
  message references #2832 and the slash-command workaround.
- session-runner.test.ts (4 new cases under "resolveModel runtime
  awareness (#2832)"): codex runtime + balanced profile -> no model
  injected; resolve_model_ids: omit -> no model; claude runtime still
  resolves to claude-sonnet-4-6 (no regression); explicit options.model
  wins on any runtime.

* fix(#2832): address CR — env-precedence in resolveModel + accurate source attribution

Two CodeRabbit findings on PR #2844:

1. session-runner.ts:resolveModel() (Major) — read runtime via detectRuntime()
   so GSD_RUNTIME env precedence is honored. Without this, a Codex run with
   a Claude-shaped config still fell into the Claude-only profile-id branch.

2. runtime-gate.ts:assertRuntimeSupportsAutoMode() (Minor) — when GSD_RUNTIME
   holds an unsupported value, detectRuntime() falls through to config but
   the source label still reported the discarded env value. Fix: validate
   env against SUPPORTED_RUNTIMES before attributing the source.

Tests added for both: env-precedence in session-runner, source attribution
in runtime-gate. 17/17 pass.
2026-04-29 08:03:32 -04:00
Tom Boucher
91194cdbff chore(#2828): add canary release workflow (#2830)
* chore(#2828): add canary release workflow (dev builds on push to main)

Publishes get-shit-done-cc@canary and @gsd-build/sdk@canary on every
push to main. Version format: {base}-canary.{N} where base strips any
pre-release suffix from package.json (1.39.0-rc.4 → 1.39.0-canary.1).

Sequential canary number is auto-detected from existing git tags so
reruns never collide. Concurrency group cancels stale in-flight canary
runs when commits land quickly.
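
As a rough illustration of the version scheme described above (the workflow itself does this in shell inside `canary.yml`), the computation might look like the following. The `v{version}` tag naming and the function name are assumptions made for the sketch.

```javascript
// Hypothetical sketch: derive the next canary version from package.json's
// version and the list of existing git tags. Tag format `v<version>` is assumed.
function nextCanaryVersion(packageVersion, existingTags) {
  // Strip any pre-release suffix: 1.39.0-rc.4 -> 1.39.0
  const base = packageVersion.split('-')[0];
  const prefix = `v${base}-canary.`;
  let highest = 0;
  for (const tag of existingTags) {
    if (!tag.startsWith(prefix)) continue;
    const n = Number(tag.slice(prefix.length));
    if (Number.isInteger(n) && n > highest) highest = n;
  }
  // The next sequential number never collides with an existing tag.
  return `${base}-canary.${highest + 1}`;
}
```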

Mirrors the structure and steps of release.yml: same checkout pins,
Node 24, npm-publish environment, build:sdk, tarball verification,
dry-run publish gate, and publish verification with sleep 10.

Closes #2828

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(#2828): address CodeRabbit review findings on canary.yml

- cancel-in-progress: false — was true, allowing a newer push to cancel a
  run mid-publish (after tag push but before SDK publish), leaving a partial
  release state that's unrecoverable since npm versions are immutable

- Guard tag/publish/verify steps with github.ref == 'refs/heads/main' so
  a manual workflow_dispatch from a feature branch (dry_run defaults false)
  cannot accidentally publish unmerged code under the shared canary dist-tag

- Replace fixed sleep 10 with exponential backoff retry loop (delays: 5 10
  20 30 45s); fixed sleep is flaky against normal npm CDN replication lag
  and a false failure forces a new canary number since the tag already exists
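
The retry loop from the last bullet lives in shell in the workflow; a JavaScript rendering of the same idea, assuming a caller-supplied `check` function that reports whether the published version is visible yet, could look like this. The function names and the "one final attempt after the last delay" behavior are assumptions.

```javascript
// Hypothetical sketch of a retry loop with a fixed delay schedule.
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function verifyWithBackoff(
  check,                                // async () => boolean, supplied by the caller
  delaysSeconds = [5, 10, 20, 30, 45],  // schedule quoted in the commit message
  sleep = defaultSleep                  // injectable so tests do not really wait
) {
  for (const delay of delaysSeconds) {
    if (await check()) return true;
    await sleep(delay * 1000);
  }
  // One last attempt after the final delay (an assumption of this sketch).
  return Boolean(await check());
}
```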

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(plan-phase): expose --mvp flag in command frontmatter

Adds --mvp to argument-hint and Flags doc. Workflow handler in next commit.

* chore(#2828): remove push:main trigger from canary workflow

Submission rate to main is too high to auto-publish a canary on every
merge. Restrict the workflow to manual workflow_dispatch only.

Closes #2828

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 08:02:59 -04:00
Tom Boucher
74b81379cf fix(#2836): audit-open quick SUMMARY filename + UAT terminal-status drift (#2847)
* fix(#2836): audit-open quick SUMMARY filename + UAT terminal-status drift

Fixes two convention drifts in bin/lib/audit.cjs that produced false-positive
"open" items at every milestone close:

1. scanQuickTasks: looked for bare `SUMMARY.md`, but workflows/quick.md
   mandates `${quick_id}-SUMMARY.md`. Now matches either filename so quick
   tasks created via the documented workflow are recognized.

2. scanUatGaps: only treated `status: complete` as terminal, but
   workflows/execute-phase.md uses `status: resolved` post-gap-closure.
   Now treats both `complete` and `resolved` as terminal, with `result:
   all_pass` as a fallback when status is absent.
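
A minimal sketch of the two relaxed checks, assuming simplified inputs (a file name plus quick id, and an already-parsed frontmatter object). The helper names are illustrative and are not taken from `bin/lib/audit.cjs`.

```javascript
// Hypothetical sketch of the two convention checks described above.
const TERMINAL_UAT_STATUSES = new Set(['complete', 'resolved']);

// Accept both the bare name and the `${quick_id}-SUMMARY.md` form.
function isQuickSummaryFile(fileName, quickId) {
  return fileName === 'SUMMARY.md' || fileName === `${quickId}-SUMMARY.md`;
}

// A UAT file is terminal when its status is terminal, or, when status is
// absent, when result is all_pass.
function isUatTerminal(frontmatter) {
  if (frontmatter.status) return TERMINAL_UAT_STATUSES.has(frontmatter.status);
  return frontmatter.result === 'all_pass';
}
```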

Also reconciles workflows/help.md one-liner that referenced bare
`SUMMARY.md` so docs match the authoritative quick.md workflow.

Adds tests/bug-2836-audit-open-summary-uat-drift.test.cjs with 6
structural regression tests covering both fixes plus no-regression cases.

* refactor(#2836): hoist TERMINAL_UAT_STATUSES outside scanUatGaps loop

Address CodeRabbit nitpick: the Set was being recreated on each UAT file
iteration. Hoist to module scope so it is constructed once.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 08:00:17 -04:00
Tom Boucher
12b6ba4e34 fix(#2829): gsd-sdk resolvable in local-mode installs (#2848)
* fix(#2829): gsd-sdk resolvable in local-mode installs

Local-mode installs previously short-circuited installSdkIfNeeded() the
moment opts.isLocal was true, leaving every `gsd-sdk query …` call site
unable to resolve the binary on PATH. The published tarball ships
sdk/dist/cli.js and bin/gsd-sdk.js regardless of mode, and the shim
resolves the CLI relative to its own __dirname — so the same self-link
strategy that powers npx-cache global installs (#2775) also works for
local installs. We now run the shared self-link path whenever the dist
is present, and only fall back to a non-fatal warning + early return
when the dist is genuinely missing (preserving the #2678 contract).

* test(#2829): correct precondition comment about ~/.local/bin

Address CodeRabbit feedback — the test does not create ~/.local/bin,
so reword the inline precondition to "any HOME bin candidate remains
off-PATH" to match what the test actually sets up.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 07:59:30 -04:00
Tom Boucher
f4412349f0 fix(#2835): align CR-INTEGRATION tests with hyphen-form skill names (#2843)
* fix(#2835): align CR-INTEGRATION tests with hyphen namespace

PR #2819 changed autonomous.md skill invocations from `gsd:code-review`
(colon) to `gsd-code-review` (hyphen). Tests still asserted the legacy
colon form against the user-installed plugin dir (which lags the repo).

Switch tests to:
- Read autonomous.md from the canonical repo WORKFLOWS_DIR (not the
  plugin install location, which can be stale)
- Parse `Skill(skill="...")` invocations structurally instead of
  substring matching, and assert the canonical hyphen form is present
  while explicitly rejecting the legacy colon form.

Closes #2835

* test(#2835): parse Skill() invocations structurally in CR-INTEGRATION tests

Replace raw-text regex/.includes() assertions with a proper parser that
walks autonomous.md, skips escaped string contexts, and yields
[{ skill, args }] objects. The three CR-INTEGRATION tests now assert
against parsed fields and tokenized args (not substring matches),
addressing CodeRabbit feedback on PR #2843.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 07:57:30 -04:00
Tom Boucher
a7f83ee663 fix(#2831): expand HOME in OpenCode @file references on all platforms (#2842)
* fix(#2831): expand HOME in OpenCode skill/template paths

OpenCode does not shell-expand $HOME in @file references on any platform —
the literal `@$HOME/...` path is resolved relative to the config command/
dir, producing `command/$HOME/...` (file not found). The previous fix for
#2376 only guarded Windows; extend to all platforms.

Closes #2831

* test(#2831): assert behavior via exported computePathPrefix, not source grep

Addresses CodeRabbit review on PR #2842:
- Extracts pathPrefix logic into a named, test-exported computePathPrefix
  helper in bin/install.js (no behavior change at the call site).
- Rewrites bug-2376 and bug-2831 regression tests to call the exported
  function directly instead of regex-matching install.js source text,
  per the repo's no-source-grep testing standard.
- Wraps temp-dir test setup in try/finally so cleanup runs on assertion
  failures (no leaked tmp dirs).
2026-04-29 07:56:51 -04:00
Tom Boucher
7fae804296 fix(#2839): transactional cleanup tail for /gsd-code-review-fix (#2846)
* fix(#2839): make /gsd-code-review-fix cleanup transactional

Cleanup tail in agents/gsd-code-fixer.md previously did 'git worktree
remove' without any recovery marker. If the process was killed between
fix commits and worktree removal, the orphan worktree + branch survived
with no resume path — the next run had no way to discover or finish
the cleanup.

Introduce a recovery sentinel at ${phase_dir}/.review-fix-recovery-pending.json
with strict ordering:
- Sentinel written AFTER 'git worktree add' succeeds (never points at a
  worktree that does not exist).
- Sentinel removed ONLY AFTER 'git worktree remove' returns successfully
  (interruption between commits and removal leaves a sentinel behind).
- New runs detect a pre-existing sentinel, force-remove the recorded
  orphan worktree, then drop the stale sentinel before continuing —
  making the agent self-healing after a crash.

Closes #2839

* fix(#2839): harden sentinel JSON parse and scope ordering assertion

Address CodeRabbit review feedback on PR #2846:

- agents/gsd-code-fixer.md: Guard the recovery-sentinel JSON parse with
  try/catch so a corrupted/truncated sentinel (a realistic crash artifact)
  emits a warning and yields an empty prior_wt instead of aborting setup.
  This preserves the self-healing recovery path even when the sentinel
  itself is the casualty of the original crash.

- tests/bug-2839-review-fix-transactional-cleanup.test.cjs: Scope the
  cleanup-ordering assertion to the cleanup-tail section of the
  setup_worktree step rather than first global occurrences. Previously
  the assertion could pass on pre-recovery references even if cleanup-tail
  ordering regressed. The regex also now accepts the shell-variable form
  (`rm -f "$sentinel"`) used in the cleanup tail.
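
The guarded parse from the first bullet can be sketched as follows. The `worktree_path` field name and the function name are hypothetical, since the commit message does not give the sentinel's schema; only the behavior (warn, then yield an empty prior worktree) comes from the text above.

```javascript
// Hypothetical sketch: tolerate a corrupted or truncated recovery sentinel.
function readPriorWorktree(sentinelText, warn = console.warn) {
  try {
    const parsed = JSON.parse(sentinelText);
    // `worktree_path` is an assumed field name for this illustration.
    return typeof parsed.worktree_path === 'string' ? parsed.worktree_path : '';
  } catch (err) {
    // A crash can leave a half-written sentinel; warn and continue setup.
    warn(`recovery sentinel unreadable, ignoring: ${err.message}`);
    return '';
  }
}
```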

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 07:56:32 -04:00
Tom Boucher
c3a42d66f9 Revert "feat(install): add Hermes Agent runtime support" (#2849) 2026-04-29 07:44:49 -04:00
Jeremy McSpadden
0acf1de88c Merge pull request #2845 from teknium1/feat/hermes-runtime
feat(install): add Hermes Agent runtime support
2026-04-29 06:38:13 -05:00
teknium1
5a636bc90a feat(install): add Hermes Agent runtime support (#2841)
Adds Hermes Agent as a supported installation target. Users can run
`npx get-shit-done-cc --hermes` to install all 86 GSD commands as
skills under `~/.hermes/skills/gsd-*/SKILL.md`, following the same
open skill standard as Claude Code 2.1.88+, Qwen Code, Antigravity,
Trae, Augment, and Codebuddy.

Hermes Agent is an open-source AI agent framework by Nous Research
(NousResearch/hermes-agent, MIT). Its skill loader accepts the Claude
skill format as-is: frontmatter parsed with PyYAML SafeLoader (unknown
keys like `allowed-tools` / `argument-hint` ignored), body XML tags
(`<objective>`, `<execution_context>`, `<process>`) passed directly
to the model. Compatibility proven end-to-end with all 86 GSD skills
loading cleanly, `skill_view()` returning full bodies, and
`build_skills_system_prompt()` emitting them into the agent system
prompt — zero Hermes code changes required.

Changes:
- `bin/install.js`: --hermes flag, getDirName/getGlobalDir/getConfigDirFromHome
  support, HERMES_HOME env var (native to Hermes — used for profile
  mode / Docker deploys), install/uninstall pipelines, interactive
  picker option 10 (alphabetical: between Gemini and Kilo), .hermes
  path replacements in copyCommandsAsClaudeSkills and
  copyWithPathReplacement, legacy commands/gsd cleanup, CLAUDE.md ->
  HERMES.md and "Claude Code" -> "Hermes Agent" content rewrites in
  skills/agents/hooks, runtime-appropriate finish message.
- `get-shit-done/bin/lib/core.cjs`: add hermes to KNOWN_RUNTIMES;
  add RUNTIME_PROFILE_MAP.hermes with OpenRouter-slug defaults
  (Hermes is provider-agnostic; these defaults resolve across
  OpenRouter, native Anthropic, and Copilot via Hermes' aggregator-
  aware resolver, and are overridable per-tier via
  model_profile_overrides.hermes.{opus,sonnet,haiku}).
- `README.md`: Hermes Agent in tagline, runtime list, verification
  command, install/uninstall examples, `--hermes` flag reference.
- `tests/hermes-install.test.cjs`: new, 14 tests covering directory
  mapping, HERMES_HOME env var precedence, install/uninstall
  lifecycle, user-skill preservation, engine cleanup.
- `tests/hermes-skills-migration.test.cjs`: new, 11 tests covering
  frontmatter conversion, path replacement (~/.claude/ ->
  $HERMES_HOME/skills/), CLAUDE.md -> HERMES.md, "Claude Code" ->
  "Hermes Agent", stale skill cleanup, SKILL.md format validation.
- `tests/multi-runtime-select.test.cjs`: updated for new option
  numbering (hermes=10, kilo=11, opencode=12, qwen=13, trae=14,
  windsurf=15, all=16).
- `tests/kilo-install.test.cjs`: updated assertions for Kilo having
  moved from option 10 to option 11.

Closes #2841

Implementation notes:
- Zero custom code paths: Hermes reuses copyCommandsAsClaudeSkills()
  identical to Qwen Code / Antigravity pattern.
- Path replacement: ~/.claude/, $HOME/.claude/, ./.claude/ ->
  .hermes equivalents in skill/agent/hook content.
- Config precedence: --config-dir > HERMES_HOME > ~/.hermes (matches
  how Hermes itself resolves its home directory).
- Legacy cleanup: removes commands/gsd/ if present from a prior
  install, preserving dev-preferences.md (same as Qwen).
- No external dependencies added.

Testing: 5841 / 5841 tests pass (0 failures, 0 regressions)
- 14 new tests in hermes-install.test.cjs
- 11 new tests in hermes-skills-migration.test.cjs
- multi-runtime-select.test.cjs renumbered + 1 new test (single choice for hermes)
2026-04-29 04:27:46 -07:00
Tom Boucher
eeaf9c556f fix(#2787): track fenced code blocks in extractCurrentMilestone (#2812)
* fix(#2787): track fenced code blocks in extractCurrentMilestone

The milestone-end search used a multiline regex against the raw
restContent string. Lines inside fenced code blocks (``` or ~~~)
that matched the milestone-heading pattern (e.g. `# note v1.0`)
prematurely set sectionEnd, hiding all phases after the block from
roadmap analyze, roadmap get-phase, and every downstream command.

Replace the regex match with a line-by-line scan that tracks fence
state. Lines inside an open fence are skipped regardless of content.
Adds three regression tests covering backtick fences, tilde fences,
and the roadmap get-phase code path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(#2787): track fence delimiter instead of toggling bare boolean

Replace the inFence boolean with fenceChar/fenceLen tracking so that
indented fences (up to 3 leading spaces) and mixed-delimiter content
(~~~ inside a backtick fence) are parsed correctly. A closing fence
is only recognised when it uses the same character as the opening
delimiter and has at least the same run length, matching the CommonMark
spec.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(#2787): require fence-only closing line — reject info-string lines as closers

A closing fence delimiter must contain only optional trailing whitespace.
A line like ```js inside an open fence has an info string and must not
close it. The previous regex /^\s{0,3}([`~]{3,})/ matched the opening of any
such line, so the closing check could toggle fenceChar off on an info-string
line and expose subsequent heading-like content to the milestone-end detector.

Fix: capture the trailing portion of every fence-candidate line and only clear
fenceChar when trailing matches /^\s*$/ (per CommonMark §4.5).

Adds a regression test covering the ```text / ```js nesting scenario.
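
The fence-aware scan described across these three commits can be sketched as below. This is an illustration only: the `MILESTONE_HEADING` pattern and the function name are assumptions, and the real `extractCurrentMilestone` may structure its loop differently. The rules it encodes (same delimiter character, run length at least as long as the opener, whitespace-only trailing text on the closing line) are the ones stated above.

```javascript
// Hypothetical heading pattern; the real milestone-heading regex may differ.
const MILESTONE_HEADING = /^#{1,3}\s+.*v\d+\.\d+/;
// Up to 3 spaces of indent, a run of 3+ backticks or tildes, then the rest.
const FENCE_CANDIDATE = /^\s{0,3}(`{3,}|~{3,})(.*)$/;

// Returns the index of the first milestone heading outside any fenced code
// block, or -1 when there is none.
function findMilestoneEnd(lines) {
  let fenceChar = null; // '`' or '~' while inside a fence
  let fenceLen = 0;     // length of the opening run
  for (const [i, line] of lines.entries()) {
    const fence = line.match(FENCE_CANDIDATE);
    if (fenceChar === null) {
      if (fence) {
        fenceChar = fence[1][0];
        fenceLen = fence[1].length;
      } else if (MILESTONE_HEADING.test(line)) {
        return i;
      }
      continue;
    }
    // Inside a fence: only a fence-only line with the same character and a
    // run at least as long as the opener closes it.
    const closes =
      fence &&
      fence[1][0] === fenceChar &&
      fence[1].length >= fenceLen &&
      /^\s*$/.test(fence[2]);
    if (closes) {
      fenceChar = null;
      fenceLen = 0;
    }
  }
  return -1;
}
```

Tracking the opening character and run length, rather than a bare boolean, is what keeps a tilde line inside a backtick fence from being read as a closer.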

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 20:37:47 -04:00
Tom Boucher
9e58c45ea1 fix(#2791): GSD_WORKSTREAM env var respected by gsd-sdk query + gsd-tools bin alias (#2821)
* fix(#2791): GSD_WORKSTREAM env var respected by gsd-sdk query + gsd-tools bin alias

Two fixes for gsd-sdk binary issues:

**Issue 1 — Binary name collision:**
Both `get-shit-done-cc` and `@gsd-build/sdk` declare `bin: { "gsd-sdk": ... }`.
Added `"gsd-tools": "bin/gsd-sdk.js"` to `package.json` bin so users with the
collision can invoke `gsd-tools query <cmd>` as a conflict-free alternative.

**Issue 2 — Query registry not workstream-aware:**
`gsd-sdk query` commands ignored `GSD_WORKSTREAM` env var, always reading from
the root `.planning/` even when a workstream was active. `gsd-tools.cjs` reads
`GSD_WORKSTREAM` via `planningDir()`, so all ~35 `gsd-sdk query` call sites in
workflow files were broken in workstream-scoped projects.

Fix: added env var fallback in `sdk/src/cli.ts` — when `--ws` is not provided,
`GSD_WORKSTREAM` is used (with name validation; invalid values are silently
ignored, matching CJS behaviour).
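
A minimal sketch of the fallback order, assuming a simple name-validation rule. The regex and function name are hypothetical; the commit message only says that invalid values are silently ignored.

```javascript
// Hypothetical validation rule; the real name check may be stricter or looser.
const VALID_WORKSTREAM_NAME = /^[A-Za-z0-9][A-Za-z0-9._-]*$/;

// --ws wins; otherwise fall back to GSD_WORKSTREAM when it is a valid name.
function resolveWorkstream(wsFlag, env) {
  if (wsFlag) return wsFlag;
  const fromEnv = env.GSD_WORKSTREAM;
  if (fromEnv && VALID_WORKSTREAM_NAME.test(fromEnv)) return fromEnv;
  return undefined; // invalid or unset: use the root .planning/ directory
}
```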

Regression test: `tests/bug-2791-sdk-workstream-env.test.cjs`

Closes #2791

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(#2791): address CodeRabbit — precedence test, invalid env fallback assertion, bash fence

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 20:23:32 -04:00
Tom Boucher
897cff6051 fix(#2805): find-phase returns null phase_dir for archived phases (not archive path) (#2818)
* fix(#2805): add regression test — archived phase fallback already fixed in source

getPhaseInfoWithFallback already discards archived disk matches when the
current ROADMAP lists the phase (line 133: phaseInfo?.archived &&
roadmapPhase?.found). The regression test confirms this behavior and
prevents the bug from being reintroduced by future refactors.

Regression test: tests/bug-2805-archived-phase-fallback.test.cjs
(3 tests: phase_dir null, phase_found true, phase_name from ROADMAP)

* fix(#2805): address CodeRabbit — exact phase_name assertion, bash fence
2026-04-28 20:23:29 -04:00
Tom Boucher
a4e15d5616 fix(#2788): audit-uat reads human_verification items from frontmatter (#2814)
* fix(#2788): audit-uat reads frontmatter human_verification array

parseVerificationItems only searched the body for a '## Human Verification'
section. gsd-verifier writes items to the frontmatter human_verification:
YAML array, so audit-uat returned total_items: 0 for all such files.

Two fixes:
1. Read frontmatter human_verification: array first (via extractFrontmatter);
   return those items if present (primary path for gsd-verifier output).
2. Relax the body-section heading regex to accept underscore separators and
   parenthetical suffixes (e.g. '## human_verification (action required)').

Regression test: tests/bug-2788-audit-uat-frontmatter.test.cjs

* fix(#2788): address CodeRabbit — trim whitespace entries, support hyphenated headings, bash fence
2026-04-28 20:22:59 -04:00
Tom Boucher
eddb2a205b fix(#2801): add ingest-docs handler to gsd-tools init dispatch (#2820)
* fix(#2801): add ingest-docs handler to gsd-tools init dispatch

The `/gsd-ingest-docs` workflow was broken because `workflows/ingest-docs.md`
called `gsd-sdk query init.ingest-docs` but the installed binary is `gsd-tools`,
and `gsd-tools init` had no `ingest-docs` case in its dispatch switch.

- Added `cmdInitIngestDocs` function to `init.cjs` and exported it; returns
  `project_exists`, `planning_exists`, `has_git`, `project_path`, `commit_docs`
- Added `case 'ingest-docs'` to the `init` switch in `gsd-tools.cjs`
- Updated `workflows/ingest-docs.md` to call `gsd-tools init ingest-docs`
  (line 55) and `gsd-tools commit` (line 292) instead of `gsd-sdk query ...`
- Regression test: `tests/bug-2801-ingest-docs-handler.test.cjs`

Closes #2801

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(#2801): address CodeRabbit — commit_docs assertion, broader gsd-sdk detection, bash fence

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 20:22:40 -04:00
Tom Boucher
5fe1f00a0d fix(#2808): SKILL.md files use hyphen name form (gsd-cmd not gsd:cmd) (#2819)
* fix(#2808): SKILL.md name uses hyphen form for Claude Code autocomplete

skillFrontmatterName() was converting gsd-<cmd> to gsd:<cmd> (colon) so
installed SKILL.md files had name: gsd:add-phase etc. Claude Code surfaces
this name in autocomplete, showing the deprecated colon form to users even
though the hyphen form is canonical everywhere else.

Root cause: the colon form was needed because workflows called
Skill(skill="gsd:<cmd>"). All 4 remaining colon-form Skill() calls in
autonomous.md and execute-phase.md are updated to hyphen form.

skillFrontmatterName() now returns the hyphen dir name unchanged.
Updated 4 existing tests that asserted colon form.

Regression test: tests/bug-2808-skill-hyphen-name.test.cjs

* fix(#2808): address CodeRabbit — bash/text fences, structured test assertions, fail-loud on errors
2026-04-28 20:22:37 -04:00
Tom Boucher
fa78692167 fix(#2796): roadmap-update-plan-progress accepts --phase flag form (#2815)
* fix(#2796): roadmap update-plan-progress accepts --phase flag form

roadmap-update-plan-progress used positional-only arg parsing: args[0].
When execute-phase.md:228 calls it with --phase <N>, args[0] was the
literal string "--phase", which findPhase received as the phase number.
findPhase returned found:false, causing updated:false with no write.
ROADMAP.md plan checkboxes silently never advanced.

Fix: check for --phase <value> first; fall back to the first non-flag
positional argument for backward-compatible direct calls.
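
The parsing order can be illustrated with a small helper. The function name is hypothetical; the guard against flag-like values reflects the CodeRabbit follow-up listed in this PR.

```javascript
// Hypothetical sketch: prefer `--phase <value>`, else the first non-flag arg.
function parsePhaseArg(args) {
  const flagIndex = args.indexOf('--phase');
  if (flagIndex !== -1) {
    const value = args[flagIndex + 1];
    // Reject a missing or flag-like value such as `--phase --force`.
    return value !== undefined && !value.startsWith('-') ? value : null;
  }
  // Backward-compatible positional form: `update-plan-progress 3`.
  return args.find((arg) => !arg.startsWith('-')) ?? null;
}
```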

Regression test: tests/bug-2796-arg-parsing-regression.test.cjs

* fix(#2796): address CodeRabbit — guard --phase against flag-like values, bash fence
2026-04-28 20:22:30 -04:00
Tom Boucher
b959b1844f fix(#2803): config-get supports --default <value> fallback for missing keys (#2817)
* fix(#2803): honor --default flag in SDK config-get handler

The gsd-sdk query config-get handler ignored the --default <value> flag.
Missing keys always threw 'Key not found' (exit 1), making 8 workflow
sites that rely on config-get --default fall through to error paths.

The CJS path (gsd-tools.cjs) honored --default since #1893; this ports
that behavior to the SDK configGet handler.
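
In outline, the ported behavior might look like the following. This is a sketch under assumptions: dot-separated key paths and the `options.default` shape are illustrative, not the SDK handler's actual signature.

```javascript
// Hypothetical sketch of config-get with a --default fallback.
function configGet(config, keyPath, options = {}) {
  let value = config;
  for (const part of keyPath.split('.')) {
    const missing = value === null || typeof value !== 'object' || !(part in value);
    if (missing) {
      // Only fall back when a default was actually supplied.
      if ('default' in options) return options.default;
      throw new Error(`Key not found: ${keyPath}`);
    }
    value = value[part];
  }
  return value;
}
```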

Regression test: tests/bug-2803-config-get-default-flag.test.cjs

* fix(#2803): address CodeRabbit — require --default value, keep missing config.json as error, bash fence
2026-04-28 20:21:48 -04:00
Tom Boucher
7616309a32 fix(#2798): add context_window to VALID_CONFIG_KEYS allowlist (#2816)
* fix(#2798): add regression test — context_window key already in VALID_CONFIG_KEYS

context_window was already added to both VALID_CONFIG_KEYS allowlists
(CJS and SDK) in a prior fix. The regression test confirms it stays there
and that config-set context_window succeeds end-to-end.

Regression test: tests/bug-2798-context-window-config-key.test.cjs

* fix(#2798): address CodeRabbit — add bash language to release notes fence
2026-04-28 20:21:44 -04:00
Tom Boucher
d46efb4790 fix(#2784): clear shared ~/.cache/gsd/ update-check cache in update workflow (#2813)
* fix(#2784): clear shared ~/.cache/gsd/ cache in update workflow

The SessionStart hook (hooks/gsd-check-update.js) writes update-check
results to $HOME/.cache/gsd/gsd-update-check.json (shared, tool-agnostic).
The update.md run_update step only cleared per-runtime paths like
~/.claude/cache/gsd-update-check.json, so the statusline kept showing the
stale upgrade indicator after a successful update.

Fix: add rm -f "$HOME/.cache/gsd/gsd-update-check.json" to the
cache-clear block in the run_update step.

Regression test: tests/bug-2784-update-cache-clear-path.test.cjs

* fix(#2784): address CodeRabbit review — four edge-cases count, bash fence, structured test assertions
2026-04-28 20:21:41 -04:00
Tom Boucher
055b43054f fix(#2794): embed model_profile_overrides.opencode.<tier> into generated OpenCode agents (#2822)
* docs: add CHANGELOG entry and rc.5 release notes for #2809 Codex hooks migrator fixes

Covers the five correctness findings addressed in the round-5 CR of PR #2809:
parseHooksBody key parser (hyphenated/quoted keys), buildNestedBlock empty-handler
guard, legacyMapSections segment-count filter, quoted-dot regression test, and
strengthened command path assertion.

Closes #2810

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(#2794): embed model_profile_overrides.opencode.<tier> into generated OpenCode agents

OpenCode agent files were missing `model:` frontmatter when the user configured
tier-based model resolution via `model_profile_overrides.opencode.*`. Only
explicit `model_overrides[agent]` was consulted; the runtime profile resolver
(used by the Codex path since #2517) was never called for OpenCode agents.

Added a tier-resolver fallback in the OpenCode agent conversion block in
`bin/install.js`. Precedence (matching Codex behavior):
  model_overrides[agent] > model_profile_overrides.opencode.<tier> > omit
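
The precedence chain can be sketched as a small resolver. How an agent is mapped to a tier is not described here, so the sketch takes the tier as a parameter; the function name is hypothetical.

```javascript
// Hypothetical sketch of the precedence:
//   model_overrides[agent] > model_profile_overrides.opencode.<tier> > omit
function resolveOpenCodeModel(config, agentName, tier) {
  const explicit = config.model_overrides?.[agentName];
  if (explicit) return explicit;
  const byTier = config.model_profile_overrides?.opencode?.[tier];
  // undefined means: emit no `model:` frontmatter at all.
  return byTier || undefined;
}
```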

Regression test: `tests/bug-2794-opencode-model-profile-overrides.test.cjs`

Closes #2794

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 20:16:27 -04:00
Tom Boucher
06de427b09 docs: add CHANGELOG entry and rc.5 release notes for #2809 Codex hooks migrator fixes (#2811)
Covers the five correctness findings addressed in the round-5 CR of PR #2809:
parseHooksBody key parser (hyphenated/quoted keys), buildNestedBlock empty-handler
guard, legacyMapSections segment-count filter, quoted-dot regression test, and
strengthened command path assertion.

Closes #2810

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 18:35:59 -04:00
Tom Boucher
3c03a153a5 fix(#2773): emit correct Codex 0.124.0+ two-level nested hooks schema (#2809)
* fix(#2773): emit correct Codex 0.124.0+ two-level nested hooks schema

Codex 0.124.0's stable spec requires:

  [[hooks.SessionStart]]          ← event entry (optional matcher)

  [[hooks.SessionStart.hooks]]    ← handler sub-table
  type = "command"
  command = "node ..."

Previous GSD versions wrote the flat [[hooks]] + event = "SessionStart"
form (#2637) or a single-block [[hooks.SessionStart]] without the nested
.hooks sub-table (#2760). Both are rejected by Codex 0.124.0+ at launch.

Changes:

bin/install.js
  - Hook block emission now always writes the two-level nested AoT form.
  - migrateCodexHooksMapFormat extended to also migrate flat [[hooks]]
    array-of-tables entries (event = "..." key → [[hooks.<EVENT>]] form).
    Flat [[hooks]] and [[hooks.<EVENT>]] are mutually exclusive TOML types;
    any pre-existing flat entries must be promoted before GSD appends its
    own namespaced hooks.
  - Migrated flat AoT blocks are inserted BEFORE the GSD marker so they
    stay in the "user" portion of the file and survive stripGsdFromCodexConfig.
  - stripCodexGsd* regexes cover all four historical block shapes.
  - validateCodexConfigSchema no longer rejects flat [[hooks]] at the root
    level (removing the false-positive that blocked install when users had
    their own AfterCommand hooks). The validator still enforces the nested
    [[hooks.<EVENT>.hooks]] shape for entries that have a .hooks sub-table.

tests/
  - bug-2760-codex-install-defensive.test.cjs: 29/29 passing.
    Added 5 new regression cases for fresh install, upgrade from each
    legacy shape, idempotent reinstall, and user hook preservation.
  - codex-config.test.cjs: 106/106 passing.
    All migration tests updated to assert [[hooks.<TYPE>.hooks]] sub-table
    (command now in handler level, not event-entry level).
    New tests: flat [[hooks]] migration (SessionStart, AfterCommand),
    install+uninstall preserves non-GSD AfterCommand hook.

Closes #2773

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review + CI regression in bug-2698-crlf-install

CI regression (#2698 tests):
  Strip GSD-managed hook blocks BEFORE running migrateCodexHooksMapFormat.
  The previous order let migration convert the stale [[hooks]] + event =
  "SessionStart" + gsd-update-check.js block to [[hooks.SessionStart]] form
  before Shape 1 strip regex could match it; Shape 1 only matches the flat
  [[hooks]] form, so the stale block survived reinstall. Swapping to
  strip-then-migrate ensures only user-authored hooks reach the migration step.
  Shape 3/4 regexes also extended to match both gsd-check-update.js and the
  legacy gsd-update-check.js filename so no variant slips through.

CodeRabbit actionable (major):
  migrateCodexHooksMapFormat now accepts single-quoted TOML event values
  (event = 'SessionStart') in the flat [[hooks]] filter and event-name
  extractor. TOML spec allows single-quoted literal strings; double-quote-only
  regexes silently skipped them, leaving the block unmigrated and triggering
  the hard-fail validator.

CodeRabbit nitpicks:
  tests/codex-config.test.cjs: replace indexOf('[[hooks.AfterCommand]]')
  ordering check with parseTomlToObject structural assertions (no-source-grep
  rule).
  tests/bug-2760-codex-install-defensive.test.cjs: replace three
  content.match(/…/g).length raw-text counts with parseTomlToObject structural
  assertions for single-handler and single-event-entry invariants.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review #2 — extractFlatHookEventName helper + type assertions

- bin/install.js: consolidate TOML_QUOTED_STRING + TOML_EVENT_CAPTURE into a
  single extractFlatHookEventName() helper that rejects empty-string event values
  (event = "" or event = ''); previously two independent regexes had to be kept
  in sync and neither guarded against a blank event name producing a [[hooks.]]
  header

- tests/bug-2760-codex-install-defensive.test.cjs: add comments explaining why
  the e.command fallback is retained in both allSessionStartCommands and
  afterToolCommands collectors — migration only upgrades [hooks.TYPE] map-format
  sections, not existing [[hooks.TYPE]] namespaced AoT entries authored with
  command at event-entry level; removing the fallback causes false failures for
  preserved user entries

- tests/codex-config.test.cjs: add type = "command" assertions to all migration
  tests that verify .command but were missing .type checks; buildNestedBlock
  injects type = "command" when the source body has no explicit type key, so
  every migrated handler must carry it per the Codex 0.124.0+ schema

138 tests pass, 0 fail.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: CR round 3 + proactive audit — TOML quoting, stale AoT migration, strict validator

Three real issues from CodeRabbit round 3, plus the collateral improvements they
enable:

bin/install.js — tomlBareKey() helper (#2773 CR6a)
  buildNestedBlock interpolated the raw event name into [[hooks.${type}]] and
  [[hooks.${type}.hooks]] headers without TOML escaping. An event name containing
  spaces or punctuation (e.g. "Before Tool") would produce invalid TOML that
  parseTomlToObject would subsequently reject. Added tomlBareKey(), which wraps
  the key in a double-quoted TOML string when it contains any character outside
  the bare-key set ([A-Za-z0-9_-]).
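
A minimal sketch of the quoting rule described above. The body is an assumption written from this description, not the code in bin/install.js, and it does not escape control characters.

```javascript
// Hypothetical stand-in for a tomlBareKey()-style helper.
const BARE_KEY = /^[A-Za-z0-9_-]+$/;

function tomlBareKey(key) {
  if (BARE_KEY.test(key)) return key; // already a valid bare key
  // Escape backslashes and double quotes, then wrap as a TOML basic string.
  const escaped = key.replace(/\\/g, '\\\\').replace(/"/g, '\\"');
  return `"${escaped}"`;
}

console.log(`[[hooks.${tomlBareKey('SessionStart')}]]`);
console.log(`[[hooks.${tomlBareKey('Before Tool')}.hooks]]`);
```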

bin/install.js — staleNamespacedAotSections migration path (#2773 CR6b)
  migrateCodexHooksMapFormat handled [hooks.TYPE] (map-format) and flat [[hooks]]
  with event = "..." but ignored [[hooks.TYPE]] AoT entries that carried handler
  fields (command, type, timeout, statusMessage) at event-entry level without a
  nested [[hooks.TYPE.hooks]] sub-table. This is the pre-#2773 single-block shape
  that Codex 0.124.0+ rejects. Added staleNamespacedAotSections as the third
  migration category: detected by STALE_HANDLER_FIELD_PATTERN + absence of a
  [[hooks.TYPE.hooks]] sub-table in the same file; promoted to the two-level
  nested form by buildNestedBlock. Matcher-only entries (no handler fields) are
  intentionally skipped.

bin/install.js — validator now rejects event-level handler fields (#2773 CR6c)
  With migration covering the stale AoT shape, validateCodexConfigSchema can be
  strict: entries that have handler fields at event-entry level but no .hooks
  sub-array return ok: false instead of silently passing. Matcher-only entries
  (no handler fields and no .hooks) remain valid as event filters.
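
The strict rule can be sketched as follows. The handler field names come from the commit text; the function name, input shape, and return shape are assumptions for illustration only.

```javascript
// Illustrative check, not validateCodexConfigSchema itself.
const HANDLER_FIELDS = ['command', 'type', 'timeout', 'statusMessage'];

function checkHookEntries(hooks) {
  for (const [eventName, entries] of Object.entries(hooks)) {
    for (const entry of entries) {
      const hasHandlerField = HANDLER_FIELDS.some((field) => field in entry);
      // Handler fields are only legal inside the nested .hooks sub-array.
      if (hasHandlerField && !Array.isArray(entry.hooks)) {
        return { ok: false, reason: `hooks.${eventName}: handler fields at event-entry level` };
      }
    }
  }
  return { ok: true };
}

console.log(checkHookEntries({ SessionStart: [{ command: 'node hook.js' }] }));
console.log(checkHookEntries({ SessionStart: [{ matcher: 'startup' }] }));
```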

tests/codex-config.test.cjs — four new migration tests + missing type assertion
  Four tests cover the new stale AoT migration path: single-entry promotion,
  already-nested entry is left untouched (no double-wrap), multiple event types,
  and matcher-only entry is skipped. Added the missing type = "command" assertion
  to the CRLF migration test (the one miss from CR round 2).

tests/bug-2760-codex-install-defensive.test.cjs — strict .hooks-only collectors
  With stale AoT entries now migrated, the entry.command fallbacks in
  allSessionStartCommands and afterToolCommands are dead code. Replaced with
  strict entry.hooks-only collection guarded by an every(Array.isArray(e.hooks))
  pre-assertion, so any future regression that leaves handler fields at event
  level produces an explicit test failure rather than silently collecting them.

142 tests pass, 0 fail.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: CR round 4 — segment-safe quoted-key detection + structural test assertions

bin/install.js — getTomlTableSections now exposes segments (#2773 CR7a)
  The staleNamespacedAotSections filter used section.path.split('.').length > 2
  to skip [[hooks.TYPE.hooks]] sub-table entries. That check misclassifies quoted
  event names containing dots: [[hooks."before.tool"]] has path hooks.before.tool
  (3 dot-parts) but only 2 true parsed segments, so it was incorrectly excluded
  from migration. Fixed by adding segments to the getTomlTableSections return
  shape (already available on record.tableHeader.segments) and replacing the
  split-based check with section.segments.length !== 2, which uses the true
  parsed key count regardless of dots inside quoted names.
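
To illustrate why counting dot-parts differs from counting parsed segments, here is a hypothetical quote-aware splitter. It ignores escape sequences inside quoted keys and is not the parser used by getTomlTableSections.

```javascript
// Splits a TOML table path on dots that sit outside quoted keys.
function splitTomlKeySegments(path) {
  const segments = [];
  let current = '';
  let quote = null; // the active quote character, if any
  for (let i = 0; i < path.length; i++) {
    const ch = path[i];
    if (quote) {
      if (ch === quote) quote = null; // closing quote
      else current += ch; // dots inside quotes stay in the segment
    } else if (ch === '"' || ch === "'") {
      quote = ch;
    } else if (ch === '.') {
      segments.push(current);
      current = '';
    } else if (ch !== ' ' && ch !== '\t') {
      current += ch; // whitespace around dots is skipped
    }
  }
  segments.push(current);
  return segments;
}

const header = 'hooks."before.tool"';
console.log(header.split('.').length, splitTomlKeySegments(header).length);
```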

tests/codex-config.test.cjs — replace raw-equality assertions (#2773 CR7b)
  The two new no-op migration tests (already-nested and matcher-only) used
  assert.strictEqual(result, content) — raw string equality that conflicts with
  the repo no-source-grep testing standard. Replaced with structural assertions
  using parseTomlToObject: the already-nested test verifies the handler stays
  under .hooks[0] and no double-wrap occurs; the matcher-only test verifies the
  matcher key is preserved and no .hooks sub-array is added.

142 tests pass, 0 fail.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: CR round 5 — parseHooksBody key parser, empty-handler guard, segment-safe legacyMap filter, stronger test assertions

- parseHooksBody: replace /^([\w.]+)\s*=/ regex with parseTomlKey() so
  hyphenated keys (status-message) and quoted keys are not silently dropped
- buildNestedBlock: guard against handlerEntries.length === 0 — do not
  synthesise [[hooks.TYPE.hooks]] with type="command" but no command for
  matcher-only or otherwise handler-empty stale sections
- legacyMapSections filter: use section.segments.length === 2 (same fix
  applied to staleNamespacedAotSections in round 4) to prevent [hooks.X.Y]
  3-segment tables from being misclassified as event entries
- tests: add regression test for [[hooks."before.tool"]] quoted-dot event
  names; strengthen command path assertion to exact absolute path comparison

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 18:28:53 -04:00
Tom Boucher
c0730fffde docs(changelog): expand [Unreleased] with all 1.39.0-rc.4 changes since v1.38.5 (#2799)
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 22:56:03 -04:00
Tom Boucher
f983c95ffc Release/1.39.0-rc.4 (#2797)
* chore: bump version to 1.39.0 for release

* chore: bump to 1.39.0-rc.1

* chore: bump to 1.39.0-rc.2

* chore: bump to 1.39.0-rc.3

* chore: bump to 1.39.0-rc.4

* docs: add v1.39.0-rc.4 release notes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 22:32:14 -04:00
Tom Boucher
b44482cf03 fix(#2760): defensive Codex install — strip legacy agents blocks, default hooks to AoT, validate post-write schema (#2785)
* fix(#2760): defensive Codex install — strip legacy agents blocks, default hooks to AoT, validate post-write schema

Three defects, three defensive fixes shipped together. Issue reporter
never returned with the requested diagnostic backup, but four additional
users have since confirmed the same Codex breakage and ZakAnun confirmed
manual cleanup is the only working workaround — defensive triple ships
without the original backup grep, justified by the corroborating reports.

Fix 1 (defect 3 — confirmed real). The Codex hooks emit path always
appended a top-level `[[hooks]]` AoT block, which collides with users
who already use the namespaced AoT form `[[hooks.SessionStart]]`. New
helper `hasUserNamespacedAotHooks()` detects the user's preferred shape
on parse and the install emits the GSD-managed hook in that same shape
when present. Default for fresh configs stays at top-level `[[hooks]]`
so status-quo behavior is preserved.

Fix 2 (defects 1+2 — defensive). `stripLeakedGsdCodexSections()` (the
install-time stripper) now always purges bare `[agents]` single-bracket
tables and `[[agents]]` sequence tables regardless of GSD marker
presence — both forms are invalid in current Codex schema and produce
"invalid type: ..., expected struct AgentsToml". Previously gated on
GSD-name lookup which missed marker-stripped configs and third-party
authored entries. The uninstall-time stripper (`stripCodexGsdAgentSections`)
keeps its old conservative behavior so user-authored entries survive
uninstall.

Fix 3 (defensive). Post-write schema validation parses the bytes about
to be committed and asserts no bare `[agents]`, no `[[agents]]`, and no
bare `[hooks.<Event>]` tables remain. On failure the install restores
the pre-install backup of config.toml and aborts loudly so the user is
never left with a Codex CLI that refuses to load. Pre-install snapshot
is captured before installCodexConfig runs (not after) so restore
returns the file to its true pre-GSD state.

Tests added (10 new, 1 updated):
  - bug-2760-codex-install-defensive.test.cjs (10 new tests across 4
    describes: hooks AoT preservation, strip robustness for both
    [agents] and [[agents]] without marker, schema validator behavior,
    abort+restore via test seam)
  - codex-config.test.cjs "case 2 ..." updated to reflect new defensive
    bare-[agents] purge

Full suite: 5747 pass / 0 fail.

Closes #2760

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(#2760): normalize Codex hooks emit field name across migration and managed paths

The migrateCodexHooksMapFormat path emitted `type = "<TYPE>"` for legacy
[hooks.TYPE] sections, while the GSD-managed Codex install emitted
`event = "SessionStart"` — same target [[hooks]] schema, two different
field names. Codex currently tolerates both via permissive parsing, but
the moment one path tightens this becomes a silent #2760-class regression.

Normalize both call sites on `event` (the existing GSD-managed
convention). Update migration emit, docstring, and existing migration
assertions to match. Add a parity regression test that drives both code
paths and asserts the [[hooks]] field key is identical.

* test(#2153): fix test isolation by building hooks/dist on demand

The "Codex install copies hook file (#2153)" regression depends on
hooks/dist/ being populated, but that directory is gitignored and only
built by `npm run build:hooks`. The npm pretest chain runs `build:sdk`
but not `build:hooks`, so when this file is run in isolation
(`node --test tests/codex-config.test.cjs`) the hook copy step skips
silently and the regression test fails on a stale-environment artifact
rather than a real bug.

Add a top-level before() hook that runs scripts/build-hooks.js when
hooks/dist/ is missing or empty. Matches the pattern already used by
bug-1834-sh-hooks-installed and other install integration tests, so the
suite passes regardless of runner ordering or which tests are targeted.

* fix(#2760): structural TOML validation, atomic writes, and behavioral test rewrites

Addresses CodeRabbit review on PR #2785 plus source-grep violations the
maintainer flagged in the regression test.

Fix 1 (CR 3149606220) — validateCodexConfigSchema now parses the TOML into
a structured object first via the new parseTomlToObject helper, then
runs schema-shape checks against both the parsed structure and the table
section headers. Malformed TOML with valid-looking headers no longer
slips past validation.

Fix 2 (CR 3149606224) — Replaced the four source-grep assertions in
tests/bug-2760-codex-install-defensive.test.cjs (lines 109, 125, 169,
201) with structural assertions against the parsed TOML object via the
exported parseTomlToObject helper. Tests now verify behavior (the file
parses and contains the expected structure) instead of literal byte
patterns. Robust to formatting changes — exactly what the regex-loosening
suggestion was reaching for, done correctly. Confirmed clean by
`npm run lint:tests` (0 violations).

Fix 3 (CR 3149606234) — The describe block that mutates
installModule.__codexSchemaValidator now runs with concurrency: false
so the test seam mutation cannot leak into sibling suites that also
call runCodexInstall.

Fix 4 (CR outside-diff) — Approach (b): atomic temp-file + renameSync.
Added atomicWriteFileSync helper used by mergeCodexConfig and the final
hooks-write. A mid-write failure touches only the .tmp-<pid>-<n> sibling
(which is cleaned up immediately) and never truncates the original
config.toml. Paired with try/catch wrapping around the entire
post-snapshot mutation sequence so any unexpected throw also triggers
restoreCodexSnapshot. Two layers of defense: atomic write prevents the
corruption window, snapshot restore handles non-atomic write paths.
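
A sketch of the temp-file + rename pattern, assuming a process-local counter for the `<n>` suffix. The real atomicWriteFileSync may differ in naming and cleanup details.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

let counter = 0; // assumed source of the <n> suffix

function atomicWriteFileSync(targetPath, data) {
  const tmpPath = `${targetPath}.tmp-${process.pid}-${counter++}`;
  try {
    fs.writeFileSync(tmpPath, data);
    fs.renameSync(tmpPath, targetPath); // replaces the target in one step on the same filesystem
  } catch (err) {
    fs.rmSync(tmpPath, { force: true }); // remove the temp sibling, keep the original intact
    throw err;
  }
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'atomic-'));
const configPath = path.join(dir, 'config.toml');
atomicWriteFileSync(configPath, '[model]\nname = "example"\n');
console.log(fs.readFileSync(configPath, 'utf8'));
```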

Added behavioral test for fix 4: stubs fs.renameSync to throw on the
configPath rename, asserts the on-disk bytes match the pre-install
snapshot byte-for-byte, asserts the parsed structure is still the
user's [model] section (no half-written GSD agents block), and asserts
no stray .tmp-* files remain. Marked concurrency: false because it
monkey-patches a global.

Test results: 5749/5749 pass, 0 fail. lint:tests clean.

* test(#2760): TOML-parse based assertions for bare-agents purge and hook-field parity (CodeRabbit follow-up)

* fix(#2760): treat write failures as fatal, strip legacy hooks before guard, tighten TOML parser (CR4)

CR4 finding 1 (MAJOR) — Write failures silently succeeded. The inner catch
around atomicWriteFileSync restored the snapshot then re-threw, but the outer
catch only matched 'post-write Codex schema validation failed' and downgraded
everything else to a warn-and-continue. Install finished with "Done!" while
Codex had no GSD agents configured. Fix: wrap writeErr with a `post-write
Codex install failed:` prefix and broaden the outer guard to `.startsWith(
'post-write')` so both schema-validation and write failures abort install.

CR4 finding 2 (MAJOR) — Legacy flat [[hooks]] block prevented namespaced AoT
upgrade. The `!configContent.includes('gsd-check-update')` guard short-
circuited the new namespaced emit when an existing install had the legacy
flat [[hooks]] block, leaving users stuck in the mixed layout this fix is
designed to eliminate. Fix: strip ALL existing managed gsd-check-update
hook blocks (top-level [[hooks]] AND namespaced [[hooks.SessionStart]])
BEFORE evaluating the includes guard, so every install converges on the
right shape regardless of prior state.

CR4 finding 3 (MAJOR) — Homegrown TOML parser silently accepted malformed
input. parseTomlValue happily consumed the `0` prefix of `timeout = 0.5`
and parseTomlToObject did not verify the full RHS was consumed, so
`key = "x" junk` and date/time literals slipped through. Per CONTRIBUTING
("No external dependencies in core"), option (b) was chosen over adding
@iarna/toml. Two changes implement it: (1) parseTomlValue rejects any
integer immediately followed by `.`, `e`, `E`, `:`, `-`, `T`, or `Z`
(floats / dates / times); (2) parseTomlToObject scans from parsed.end to
the next newline and throws `trailing bytes after value` if anything other
than whitespace + optional `# comment` is present.
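
Both checks can be sketched on a single right-hand side. This simplified parser handles only integers and basic strings without escapes; its name and error text for unsupported literals are assumptions.

```javascript
function parseScalar(rhs) {
  const text = rhs.trimStart();
  let value;
  let end;
  const int = /^[+-]?\d+/.exec(text);
  if (int) {
    end = int[0].length;
    // An integer followed by one of these is really a float, date, or time.
    if (/[.eE:\-TZ]/.test(text[end] ?? '')) {
      throw new Error('unsupported float/date/time literal');
    }
    value = Number(int[0]);
  } else if (text[0] === '"') {
    const close = text.indexOf('"', 1); // no escape handling in this sketch
    if (close === -1) throw new Error('unterminated string');
    value = text.slice(1, close);
    end = close + 1;
  } else {
    throw new Error('unsupported value');
  }
  // Only whitespace plus an optional comment may follow the value.
  if (!/^[ \t]*(#.*)?$/.test(text.slice(end))) {
    throw new Error('trailing bytes after value');
  }
  return value;
}

console.log(parseScalar('30 # seconds'));
```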

* test(#2760): add CR4 regression tests + scope GSD_TEST_MODE + rename rename-fault test

CR4 finding 5 (NIT) — GSD_TEST_MODE leak. Saved previous value, set '1' for
the require, then restored (delete if undefined). No more test-only env var
leaking to siblings in the same node process.
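
A sketch of the save/set/restore pattern described here. The helper name `withEnv` and the variable name in the usage line are hypothetical; the test itself may inline the same steps.

```javascript
function withEnv(name, value, fn) {
  const previous = process.env[name];
  process.env[name] = value;
  try {
    return fn();
  } finally {
    // Assigning undefined would store the string "undefined", so delete instead.
    if (previous === undefined) delete process.env[name];
    else process.env[name] = previous;
  }
}

const seen = withEnv('GSD_SKETCH_VAR', '1', () => process.env.GSD_SKETCH_VAR);
console.log(seen, 'GSD_SKETCH_VAR' in process.env);
```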

CR4 finding 4 (NIT) — Renamed the existing fix-4 test from 'fs.writeFileSync'
to 'fs.renameSync' (the only call actually faulted) and added a sibling test
that stubs fs.writeFileSync to throw on the .tmp- target — exercising the
pre-rename branch of atomicWriteFileSync that was previously untested. Both
serialize via concurrency: false on the existing describe block.

CR4 finding 1 (MAJOR test) — New behavioral test asserts install throws with
a `post-write Codex install failed` message AND never prints "Done!" when
the hook-block atomic rename fails. Captures stdout via console.log stub,
asserts byte equality of restored snapshot. Faults only the rename whose
temp source contains gsd-check-update so earlier mergeCodexConfig writes
are not collateral damage.

CR4 finding 2 (MAJOR test) — New TOML-parsed behavioral test for the
legacy-hook upgrade path: pre-install has [[hooks.SessionStart]] (user) +
legacy flat [[hooks]] managed gsd-check-update entry; post-install must
have hooks.SessionStart as Array-of-tables with both user hook and GSD
entry, and no top-level [[hooks]] AoT remaining. Also asserts exactly one
gsd-check-update entry (no duplicates).

CR4 finding 3 (MAJOR test) — parseTomlToObject regression suite: rejects
floats (timeout = 0.5), dates (created = 1979-05-27), trailing garbage
(key = "x" junk), and accepts trailing whitespace + # comment.

* fix(#2760): CR5 — pre-write fatal, TOML duplicate-key/header rejection, namespaced AoT migration

Address all five CodeRabbit round-5 findings on PR #2785:

Finding 1 (MAJOR) — Pre-write failures in the Codex hook configuration
catch (around bin/install.js:7002) used to fall through to console.warn
even though restoreCodexSnapshot() had already run. This produced "Done!"
output with no Codex hooks configured. Now wraps the original error with
a "(pre-write)" prefix and rethrows so install aborts loudly. Same defect
class as CR4 finding 1, different layer.

Finding 2 (MAJOR) — parseTomlToObject silently reused existing tables and
overwrote duplicate keys. Real TOML 1.0 rejects:
  - duplicate scalar key in same table ([a]\nx=1\nx=2)
  - re-declared [a] header (two [a] sections)
  - [[arr]] then [arr] for same path (shape mismatch)
Tracks pathShape, declaredHeaders, and per-table-instance key sets;
throws "duplicate or shape-mismatched table header at <path>" or
"duplicate key <name> in <path>".

Finding 3 (MAJOR) — migrateCodexHooksMapFormat used to emit flat
[[hooks]]\nevent="<TYPE>", which produced mixed flat+namespaced layouts
when the user already had [[hooks.<OTHER>]] entries. Now emits
[[hooks.<TYPE>]] directly (the namespace IS the event); managed-emit
detector hasUserNamespacedAotHooks fires correctly so the install
converges on a single namespaced layout regardless of pre-existing state.

Finding 4 (NIT) — tests/bug-2760-codex-install-defensive.test.cjs
rename-failure test tightened from "throw OR warn acceptable" to
assert.equal(threw, true), locking the contract Finding 1 establishes.

Finding 5 (NIT) — bug-2760 test suite snapshots and restores fs.renameSync
defensively in beforeEach/afterEach (symmetric with fs.writeFileSync),
removing the fragile per-test try/finally. Second test in the same
suite cleaned up to drop its try/finally.

Updates tests/codex-config.test.cjs to assert the new namespaced AoT
migration shape via parseTomlToObject (no source-grep). Existing field-
parity test reframed as shape-parity since both paths now emit
namespaced.

Tests: 5764 pass (+8 new). lint:tests: 0 violations.

* docs(#2760): add CHANGELOG entry for Codex install defensive triple

Adds the [Unreleased] Fixed entry for the Codex install fix landed in this
PR — defensive strip of legacy [agents]/[[agents]] blocks, namespaced AoT
hook detection across all events, atomic write + rollback, strict TOML
validation rejecting duplicate keys/repeated headers/trailing bytes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 17:11:59 -04:00
Tom Boucher
936cf26706 fix(#2769): tolerate Requirements header with colon inside bold delimiters (#2782)
* fix(#2769): tolerate Requirements header with colon inside bold delimiters

extractReqIds in sdk/src/query/init.ts and the legacy init.cjs port only
matched `**Requirements**:` (colon outside bold), so phases declared with
the equally-valid markdown form `**Requirements:**` (colon inside bold,
which is what the project's own templates emit) returned phase_req_ids:
null for both `init plan-phase` and `init execute-phase`.

The mirror-image bug in `phase complete`'s REQUIREMENTS.md traceability
sweep at get-shit-done/bin/lib/phase.cjs:871 only matched the inside-bold
form, silently skipping the REQ-ID checkbox flips for any roadmap that
used the outside-bold form. Both parsers now share the same canonical
regex that accepts all three rendered-identical variants:

  **Requirements:**     (colon inside bold)
  **Requirements**:     (colon outside bold)
  **Requirements** :    (space before outside colon)
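
One pattern that accepts these three variants is sketched below. The actual definition of the shared regex is not shown in this log, so the expression here is an assumption.

```javascript
// Colon inside the bold, or bold closed first with optional whitespace before the colon.
const REQUIREMENTS_HEADER_RE = /\*\*Requirements(?::\*\*|\*\*\s*:)/;

for (const header of ['**Requirements:**', '**Requirements**:', '**Requirements** :']) {
  console.log(header, REQUIREMENTS_HEADER_RE.test(header));
}
```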

Tests:
- tests/init.test.cjs — parameterized over the three header variants for
  both init plan-phase and init execute-phase (6 new behavioral cases).
- sdk/src/query/init.test.ts — describe.each over the same variants
  exercising initPlanPhase through the SDK.
- tests/bug-2769-requirements-header-variants.test.cjs — phase complete
  flips REQ-001 in REQUIREMENTS.md across all three header variants.

Closes #2769

* refactor(#2769): centralize REQUIREMENTS_HEADER_RE constant per CodeRabbit
2026-04-27 12:31:49 -04:00
Tom Boucher
54e6da3126 fix(#2767): pass paths via --files to gsd-sdk query commit + lint guard (#2781)
* fix(#2767): pass paths via --files to gsd-sdk query commit + lint guard

Workflows, agents, commands, and references passed file paths positionally
to `gsd-sdk query commit`, which silently appended them to the commit
subject and triggered the `.planning/` wholesale-stage fallback in
sdk/src/query/commit.ts:136. Regression of #733/#798.

Inserted `--files` before the path list at every site (81 invocations
across 50 files). Added tests/bug-2767-gsd-sdk-commit-files-flag.test.cjs
as a permanent lint that scans every shipped .md file and asserts each
`gsd-sdk query commit[-to-subrepo]` invocation either uses `--files` or
carries no path arguments.

Closes #2767

* test(#2767): replace source-grep with behavioral SDK test

The original test walked every shipped .md file and regex-tokenized
`gsd-sdk query commit` invocations to assert `--files` was present.
CONTRIBUTING.md prohibits this source-grep pattern.

Rewrite as behavioral SDK tests against `sdk/dist/cli.js` over a real
tmp git project (createTempGitProject helper). Cover both the
well-formed (`--files <paths>`) form — clean subject, exactly-staged
files, .planning/ left untouched — and the buggy positional form,
asserting the documented misbehavior (paths leak into subject + the
`.planning/` wholesale-stage fallback at commit.ts:136). Also asserts
`commit-to-subrepo` rejects when `--files` is omitted (commit.ts:258).

The doc-lint is retained as a supplementary defense-in-depth guard
since agent-prompt markdown invocations cannot be exercised end-to-end
— but it is no longer the primary contract.

* docs(#2767): correct contradictory --files guidance in zh-CN/en docs + fix test docstring
2026-04-27 12:31:43 -04:00
Tom Boucher
3ac3a2ae70 fix(#2770): coerce non-string depends_on YAML values to preserve dependencies (#2780)
* fix(#2770): coerce non-string truths to preserve cross-cutting constraints

`cmdRoadmapAnnotateDependencies` skipped non-string truth entries via
`if (typeof t !== 'string') continue`. That avoided the TypeError reported
in #2770 but silently dropped legitimate constraints — numeric YAML scalars
(`- 3`) and kv-shaped truths from parseMustHavesBlock's continuation-kv
path (#2757) — from the cross-cutting analysis, leaving ROADMAP.md
under-annotated.

Replace the skip-guard with a `coerceTruthToString` helper that:
  * passes strings through
  * `String()`-coerces numbers, booleans, bigints
  * extracts a string field (title, text, name, rule, path, provides) from
    object-shaped items
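
A sketch of the helper as described. The field order follows the list above; returning null for values that cannot be coerced is an assumption.

```javascript
const STRING_FIELDS = ['title', 'text', 'name', 'rule', 'path', 'provides'];

function coerceTruthToString(t) {
  if (typeof t === 'string') return t;
  if (['number', 'boolean', 'bigint'].includes(typeof t)) return String(t);
  if (t && typeof t === 'object') {
    // Use the first string-valued field, in the order listed above.
    for (const field of STRING_FIELDS) {
      if (typeof t[field] === 'string') return t[field];
    }
  }
  return null; // assumption: callers skip values that cannot be coerced
}

console.log(coerceTruthToString(3), coerceTruthToString({ title: 'No network calls' }));
```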

Composes cleanly with #2757 (objects from kv continuation lines now
contribute their title rather than being dropped) and the existing
`splitInlineArray` quote-aware parser.

Tests: tests/bug-2770-annotate-deps-int-coerce.test.cjs
  - numeric scalar truth shared across plans surfaces as constraint
  - kv-shaped truth surfaces via title field
  - bare-int depends_on regression guards on extractFrontmatter

Full suite: 5678 pass, 0 fail.

Closes #2770

* test(#2770): use array join() for multi-line fixtures per CONTRIBUTING

* refactor(#2770): cache trim() and avoid no-op truthCounts.set in aggregation
2026-04-27 12:31:38 -04:00
Tom Boucher
8b6c44433f fix(#2772): only disable worktree isolation when planned paths touch submodules (#2779)
* fix(#2772): only disable worktree isolation when planned paths touch submodules

The previous guard in execute-phase.md and quick.md unconditionally set
USE_WORKTREES=false whenever .gitmodules existed, penalising every plan in
a submodule project even when no plan touched a submodule path.

Replace with submodule-path parsing + per-plan path intersection:

- Parse SUBMODULE_PATHS once from .gitmodules via
  `git config --file .gitmodules --get-regexp '^submodule\..*\.path$'`.
- In execute-phase.md, intersect SUBMODULE_PATHS with each plan's
  files_modified frontmatter; disable worktree isolation only for plans
  with non-empty intersection. Fall back to safe-disable for that plan
  when files_modified is missing/unparseable, with a log line explaining
  why.
- In quick.md (no pre-declared paths), keep submodule-path parsing and
  document a fail-loud commit-time guard so the executor aborts only when
  it actually stages a submodule path.

Add tests/bug-2772-gitmodules-path-intersection.test.cjs covering both
files: no unconditional disable, submodule paths are parsed, intersection
logic exists in execute-phase, fallback path is documented.

Full suite: 5680 / 5680 pass.

Closes #2772

* test(#2772): replace source-grep with behavioral test of submodule path intersection

* fix(#2772): wire USE_WORKTREES_FOR_PLAN into dispatch + fix glob matcher + add quick.md commit guard

Address CodeRabbit review on PR #2779 — the original fix computed
USE_WORKTREES_FOR_PLAN but never read it, so the per-plan submodule
intersection was dead code. Dispatch sites still branched on the
project-level USE_WORKTREES.

Changes:

1. execute-phase.md (CRITICAL — dispatch wiring): Move per-plan
   computation into execute_waves as sub-step 2.5, run it for each plan
   before its dispatch, and gate all four dispatch sites on
   USE_WORKTREES_FOR_PLAN: worktree-mode header, sequential-mode header,
   "worktrees disabled" sequential rule, and post-wave cleanup. Document
   PLAN_FILES extraction via jq from the phase-plan-index JSON. Track
   WAVE_WORKTREE_PLANS so post-wave cleanup only runs when at least one
   plan in the wave actually used worktrees.

2. Per-plan gate matcher (MAJOR — glob safety): Strip leading "./" and
   trailing "/" from both submodule and planned paths. Match
   bidirectionally (pf inside sm AND sm inside pf). Handle globby
   planned paths like "vendor/**/*.c" by extracting the literal prefix
   before the first glob metachar and re-checking. Wrap the iteration
   in set -f / set +f so glob expansion does not corrupt patterns.
   Extracted the gate (~92 lines) into
   workflows/execute-phase/steps/per-plan-worktree-gate.md to keep
   execute-phase.md under the 1700-line XL budget.

3. quick.md (CRITICAL — fail-loud guard): Inject SUBMODULE_PATHS into
   the executor Task prompt and add a <submodule_commit_guard> bash
   block the executor must run before every git commit. The guard
   inspects staged paths via `git diff --cached --name-only`, normalizes
   paths, and aborts with a clear ABORT message + recovery instruction
   ("re-run with workflow.use_worktrees=false") when any staged path
   falls inside a submodule.

4. tests/bug-2772-gitmodules-path-intersection.test.cjs: 25 tests total.
   Updated GATE_SNIPPET to match the new bash matcher. Added
   normalization tests (./ prefix, trailing /, glob "vendor/**/*.c",
   parent directory, ./ in .gitmodules). Added workflow-markdown
   wiring assertions for all 4 dispatch sites + per-plan gate file
   extraction. Added quick.md guard tests: prompt injection assertion +
   behavioral fixture-repo tests that stage a submodule path and assert
   the guard exits non-zero with the ABORT message.
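
The matcher rules from item 2 can be rendered in JavaScript for illustration. The shipped gate is a bash snippet; the function names here are hypothetical, and treating an empty glob prefix as no match is an assumption.

```javascript
function normalize(p) {
  return p.replace(/^\.\//, '').replace(/\/+$/, ''); // strip leading "./" and trailing "/"
}

function literalPrefix(p) {
  const i = p.search(/[*?\[]/); // first glob metacharacter
  return i === -1 ? p : normalize(p.slice(0, i));
}

function isInside(child, parent) {
  return child === parent || child.startsWith(parent + '/');
}

function touchesSubmodule(plannedPaths, submodulePaths) {
  return plannedPaths.some((raw) => {
    const pf = literalPrefix(normalize(raw));
    return submodulePaths.some((rawSm) => {
      const sm = normalize(rawSm);
      return isInside(pf, sm) || isInside(sm, pf); // bidirectional match
    });
  });
}

console.log(touchesSubmodule(['vendor/**/*.c'], ['./vendor/lib/']));
console.log(touchesSubmodule(['src/app.js'], ['vendor/lib']));
```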

Test count: 5701 pass / 0 fail (was 5698/1 before).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 12:31:32 -04:00
Tom Boucher
77d929429f fix(#2774): inclusion-based worktree cleanup to protect workspace .git (#2778)
* fix(#2774): inclusion-based worktree cleanup to protect workspace .git

The cleanup blocks in execute-phase.md and quick.md used an exclusion
filter (`grep -v "$(pwd)$"`) to skip the current worktree before calling
`git worktree remove --force` on everything else. The exclusion fails
whenever the current workspace is itself a worktree of an upstream repo:

- multi-workspace setups where `git worktree list` reports the registry
  path as a different absolute path than `$(pwd)`
- the cross-drive Windows case where the registry reports `E:/...` while
  `$(pwd)` resolves to `C:/...` — the equality test never holds, every
  other worktree (including the workspace itself) is removed, and the
  workspace's `.git` pointer file is destroyed.

Switches both cleanup blocks to an inclusion-based filter that targets
only agent-spawned worktrees under `.claude/worktrees/agent-`, the
namespace Claude Code's `isolation="worktree"` always uses for executor
worktrees. The workspace path can never collide with that prefix.
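
The inclusion filter can be sketched in JavaScript against `git worktree list --porcelain` output. The workflows themselves use a shell pipeline; the sample paths and branch names below are made up.

```javascript
const AGENT_MARKER = '.claude/worktrees/agent-';

function agentWorktrees(porcelain) {
  return porcelain
    .split('\n')
    .filter((line) => line.startsWith('worktree '))
    .map((line) => line.slice('worktree '.length))
    .filter((p) => p.replace(/\\/g, '/').includes(AGENT_MARKER)); // include, never exclude
}

const sample = [
  'worktree E:/work/My Workspace',
  'HEAD 1111111111111111111111111111111111111111',
  'branch refs/heads/main',
  '',
  'worktree E:/work/My Workspace/.claude/worktrees/agent-a1',
  'HEAD 2222222222222222222222222222222222222222',
  'branch refs/heads/agent-a1',
  '',
].join('\n');

console.log(agentWorktrees(sample));
```

Because the workspace path is never selected, a mismatch between the registry path and `$(pwd)` cannot cause it to be removed.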

Adds tests/bug-2774-worktree-cleanup-workspace-safety.test.cjs covering:
- both workflow files use the inclusion filter
- neither falls back to the broken `grep -v "$(pwd)$"` guard
- end-to-end simulation of porcelain output with workspace + agent
  worktrees yields only the agent worktree

Closes #2774

* test(#2774): replace source-grep with behavioral test of cleanup pipeline

* fix(#2774): whitespace-safe worktree iteration with while/read

CodeRabbit review on PR #2778 flagged that `for WT in $WORKTREES` splits
on whitespace. Any agent worktree path containing a space (e.g. a workspace
under '/Users/dev/My Workspace/') would be torn into broken half-paths,
`git -C` would fail on each fragment, and the executor branch would never
be deleted.

Switch both cleanup blocks (quick.md and execute-phase.md) to:

  while IFS= read -r WT; do
    [ -z "$WT" ] && continue
    ...
  done < <(git worktree list --porcelain | grep ... | sed ...)

Process substitution feeds the pipeline output line-by-line — IFS= and -r
preserve every byte of the path including embedded spaces.

Also rename the misleading `makeBareTempGitRepo` helper to
`makeTempUpstreamRepo` (it does not pass --bare; it inits a normal repo
with an initial commit so worktree-add works).

Add two new behavioral tests:
- discovery pipeline yields whitespace paths intact on a single line
- the actual while/read loop iterates each whitespace-bearing path
  exactly once (would fail with the previous `for WT in` form)

Tests: 5681 pass, 0 fail.
2026-04-27 12:31:26 -04:00
Tom Boucher
6a293cfc2a fix(#2775): verify gsd-sdk on PATH before reporting SDK ready (#2777)
* fix(#2775): verify gsd-sdk on PATH before reporting SDK ready

`npx get-shit-done-cc@latest` printed `✓ GSD SDK ready` even though
`gsd-sdk` was not callable. Root cause: npx only links the package's
primary bin (`get-shit-done-cc`); secondary bins like `gsd-sdk` are not
materialized into a PATH directory. The installer asserted the weaker
invariant "sdk/dist/cli.js exists on disk" and treated it as proof of
the stronger invariant "command -v gsd-sdk resolves" — they aren't the
same.

Fix tightens the gate in installSdkIfNeeded:

  1. After confirming the dist is present, walk PATH for an executable
     `gsd-sdk` shim (isGsdSdkOnPath, no spawn).
  2. If absent, attempt to materialize the shim via symlink at
     `~/.local/bin/gsd-sdk` (or the first HOME-rooted PATH dir we can
     write to), falling back to a copy on filesystems that reject
     symlinks (trySelfLinkGsdSdk).
  3. Re-probe PATH after linking. Only print `✓ GSD SDK ready` when the
     probe succeeds; otherwise emit a clear ⚠ + remediation.
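
Step 1 can be sketched as a spawn-free PATH walk. The function below is a hypothetical stand-in for isGsdSdkOnPath and omits Windows PATHEXT handling.

```javascript
const fs = require('node:fs');
const path = require('node:path');

function isOnPath(name, envPath = process.env.PATH || '') {
  for (const dir of envPath.split(path.delimiter)) {
    if (!dir) continue;
    const candidate = path.join(dir, name);
    try {
      fs.accessSync(candidate, fs.constants.X_OK); // throws if missing or not executable
      if (fs.statSync(candidate).isFile()) return true;
    } catch {
      // not usable in this directory; keep walking
    }
  }
  return false;
}

console.log(isOnPath('gsd-sdk'));
```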

Also strips the misleading "or `npx get-shit-done-cc`" clause from the
shim header (it never linked the secondary bin).

Closes #2775

* test(#2775): use centralized helpers from helpers.cjs per CONTRIBUTING

* fix(#2775): wrapper script in symlink fallback to preserve __dirname resolution

CodeRabbit follow-up on PR #2777. The previous symlink-fallback in
trySelfLinkGsdSdk used fs.copyFileSync(shimSrc, target), but
bin/gsd-sdk.js resolves the CLI via path.resolve(__dirname, '..',
'sdk', 'dist', 'cli.js'). After a copy, __dirname becomes the link
directory (e.g. ~/.local/bin), so the resolved CLI path was broken
(~/.local/sdk/dist/cli.js) — and isGsdSdkOnPath() only checked file
existence + execute bit, so the success line still printed over a
broken install.

Replace the copy with a tiny wrapper script that require()s the real
shim by absolute path. This preserves __dirname inside bin/gsd-sdk.js
because the require runs against shimSrc's own location.
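
A sketch of how such a wrapper could be generated. The exact wrapper text in the installer is not shown here, so the shebang line and function name are assumptions, and the path in the usage line is made up.

```javascript
function buildWrapperScript(shimSrc) {
  // JSON.stringify yields a valid JS string literal, including for Windows backslashes.
  return `#!/usr/bin/env node\nrequire(${JSON.stringify(shimSrc)});\n`;
}

console.log(buildWrapperScript('/opt/gsd/bin/gsd-sdk.js'));
```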

Also fixes the PATH restoration nit in the regression test (was
coercing undefined to the string "undefined" if PATH was unset).

Adds a behavioral fallback test that mocks fs.symlinkSync to throw,
exercises the fallback path, and asserts the resulting target is a
require()-wrapper (not a verbatim copy) and is executable.

* fix(#2775): PATH-backed dir ordering + tighten captureConsole + drop tautological assertion (CodeRabbit follow-up)
2026-04-27 12:31:21 -04:00
Tom Boucher
290c8b2909 fix(#2771): unify user-owned-artifacts list to suppress false patches warning (#2776)
* fix(#2771): unify user-owned-artifacts list to suppress false patches warning

USER-PROFILE.md was both preserved across reinstalls (correctly) AND tracked in
gsd-file-manifest.json (incorrectly). On the next install, saveLocalPatches()
hashed the on-disk file, found it differed from the stale manifest hash (because
/gsd-profile-user --refresh regenerated it), and reported it as a "locally
modified GSD file" — a spurious warning every time the profile refreshed.

A file is either distribution (manifest-tracked, diff'd against manifest) or
user artifact (preserved across installs, never diff'd). Never both. This
extracts USER_OWNED_ARTIFACTS as a single source of truth, referenced by both
the preserveUserArtifacts call site and writeManifest, so the invariant cannot
drift again.
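
A rough sketch of the "never both" invariant, assuming a flat list of relative paths. Only the `USER_OWNED_ARTIFACTS` name and `USER-PROFILE.md` come from the message above; the helper functions are hypothetical.

```javascript
// Single source of truth for files the user owns.
const USER_OWNED_ARTIFACTS = Object.freeze(['USER-PROFILE.md']);

const isUserOwned = (relPath) => USER_OWNED_ARTIFACTS.includes(relPath);

// Manifest side (hypothetical): user-owned files are never hashed or tracked.
function manifestEntries(files) {
  return files.filter((f) => !isUserOwned(f));
}

// Preserve side (hypothetical): only user-owned files survive a reinstall as-is.
function preservedEntries(files) {
  return files.filter((f) => isUserOwned(f));
}
```

Because both sides consult the same list, a file cannot end up in the manifest and in the preserved set at once.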

Adds a regression test that exercises the full reproduction path: install,
create USER-PROFILE.md, reinstall, refresh USER-PROFILE.md, reinstall, assert
no patch backup and no warning text.

Closes #2771

* test(#2771): use centralized helpers from helpers.cjs per CONTRIBUTING

* fix(#2771): normalize legacy USER_OWNED_ARTIFACTS entries from manifest + tighten test
2026-04-27 12:31:15 -04:00
Tom Boucher
dc9b712967 refactor(state): drop unused args + lift currentPhase in cmdStateCompletePhase (#2761)
* refactor(state): drop unused args param and lift currentPhase in cmdStateCompletePhase

Two cleanup items surfaced by CodeRabbit review of PR #2759:

1. cmdStateCompletePhase(cwd, args, raw) — args is never read inside the
   function. All sibling state subcommands use the leaner (cwd, raw) shape.
   Remove the unused parameter and update the dispatch call in gsd-tools.cjs.

2. output() at line 1754 called fs.readFileSync(statePath) after
   readModifyWriteStateMd had already released the lock, re-extracting
   Current Phase via an extra fs read. The closure already computed
   currentPhase at line 1704; lifting resolvedPhase into outer scope and
   capturing it in the callback eliminates the post-lock read and closes the
   small race window.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(#2761): apply CodeRabbit nitpicks with regression tests

Two CodeRabbit nitpicks from PR #2761 review, each landed with a
regression test so a future refactor can't unwind them.

1. tests/dispatcher.test.cjs — pin the enumerated subcommand list:
   the 'state unknown subcommand errors' test now also asserts that
   the dispatcher's error string includes 'complete-phase'. Without
   this, a future reformat of the available-subcommands enumeration
   could silently drop entries and the existing
   'Unknown state subcommand' substring check would still pass.

2. get-shit-done/bin/lib/state.cjs — tighten the Phase fallback in
   cmdStateCompletePhase: when STATE.md is missing the canonical
   '**Current Phase:**' field and the only phase signal is the
   decorated body line under '## Current Position' (e.g.
   'Phase: 01 (Foo) — EXECUTING'), the previous fallback returned
   the entire decorated string, producing messy downstream output:
     Status: Phase 01 (Foo) — EXECUTING complete
     Phase: 01 (Foo) — EXECUTING — COMPLETE
   The fallback now strips everything past the leading
   numeric/decimal token via /^\s*([\w.-]+)/ so degraded inputs
   produce clean output identical to the canonical path.

3. tests/state.test.cjs — two new tests in a dedicated describe block:
   - decorated Phase line writes clean Phase identifier
   - canonical Current Phase wins over Current Position decoration
   Both run real `gsd state complete-phase` against synthetic
   STATE.md fixtures and assert on the rendered Status field.
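
As an illustration of the tightened fallback in item 2, the sketch below applies the quoted pattern (with its backslashes read as single regex escapes) to the value that follows the `Phase:` label. The helper name is hypothetical.

```javascript
// Hypothetical helper: keep only the leading phase token of a decorated value.
function leadingPhaseToken(value) {
  const m = /^\s*([\w.-]+)/.exec(value);
  return m ? m[1] : null;
}

console.log(leadingPhaseToken('01 (Foo) — EXECUTING')); // 01
```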

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 09:03:36 -04:00
Tom Boucher
9472f343db feat(#2762): --minimal install profile (≥94% cold-start token reduction) (#2764)
* feat(#2762): add --minimal install profile to cut cold-start token cost

Eager system-prompt load from 86 gsd-* skill descriptions plus 33
subagent descriptions costs ~12k tokens per turn even in directories
with no .planning/. Frontier models (Sonnet 4.6 / Opus 4.7) with 200K-1M
context don't feel it; local LLMs with 32K-128K do.

--minimal (alias --core-only) installs only the main GSD loop:
new-project, discuss-phase, plan-phase, execute-phase, plus help/update.
Zero gsd-* subagents are written. Re-running gsd update without
--minimal expands to the full surface. Default install behavior is
unchanged.

DRY: a single stageSkillsForMode() helper filters the source dir; all
13 runtime-specific copy fns are unchanged because they recurse the
staged dir. Allowlist + helpers live in get-shit-done/bin/lib/install-
profiles.cjs as the single source of truth.

Manifest now records mode: 'minimal' | 'full' so future commands can
detect install profile.

Tested end-to-end: --minimal yields 6 skill folders + 0 agents; default
yields 86 + 33 (unchanged).

* docs(#2762): document --minimal install in README

Adds a collapsible 'Minimal Install' section under Getting Started
covering: who it's for (local LLMs, token-billed APIs), what you get
(6 skills, 0 subagents, ~700 token floor vs ~12k), and the critical
caveat that re-installing without --minimal restores the full surface
and erases the savings. Includes a comparison table, the manifest
inspection one-liner, and the use-case decision matrix.

* fix(#2762): address CodeRabbit review + CI failures

CodeRabbit findings:
1. Temp dir leak (Minor): stageSkillsForMode created tmp dirs that were
   never cleaned up. Added a module-level Set tracking every staged dir
   plus a process.on('exit') handler that rm -rf's them. Also wrap the
   copy loop in try/catch to remove a partially-populated tmp dir on
   mid-flight failure. Verified end-to-end: 0 leaked dirs in /tmp after
   a real install.

2. Codex full -> minimal stale state (Major): a previous full Codex
   install left agents/gsd-*.toml files plus [agents.gsd-*] sections in
   config.toml. The original cleanup only removed .md files, so a switch
   to --minimal would leave Codex still advertising the full agent
   surface. Cleanup now also handles .toml under isCodex, and minimal
   mode strips GSD sections from config.toml via the existing
   stripGsdFromCodexConfig helper (same path used by --uninstall).

3. Nitpick — Codex downgrade regression test: added a spawnSync-based
   end-to-end test that fakes a previous full install (stale gsd-*.md +
   gsd-*.toml + GSD-marked config.toml + a user-owned agent/setting),
   runs install.js --codex --minimal, and asserts stale GSD files +
   sections are gone while user content is preserved.

CI failures (inventory parity):
- docs/INVENTORY.md CLI Modules table now lists install-profiles.cjs
  with the correct headline count (30 -> 31).
- docs/INVENTORY-MANIFEST.json regenerated via gen-inventory-manifest.cjs.

Test count: 149 pass (was 116 in last commit; +14 new install-minimal +
all previously-failing inventory tests now green).

* test(#2762): expand install-minimal test coverage for future-proofing

Each new test pins a specific guarantee that closes off a future
regression class — turning every CodeRabbit finding (including the
nitpicky one) into a permanent guard.

cleanupStagedSkills suite (+3 tests):
- 'full mode does not register a staged dir' — catches a future
  regression where someone forgets the early-return in stageSkillsForMode
  and starts polluting STAGED_DIRS in default installs.
- 'exit handler registers exactly once across many calls' — catches
  removal of the exitHandlerRegistered guard. install.js has 13
  dispatch sites, so a missing guard would attach 13 listeners.
- 'mid-copy failure removes partial staged dir and re-throws' —
  intercepts fs.copyFileSync to throw mid-loop and asserts the staged
  dir count in /tmp is unchanged after the throw. Pins the exact
  CodeRabbit-flagged leak.

Claude full -> minimal downgrade (+1 test):
- Mirrors the Codex downgrade test for the .md-only path that the
  other 12 runtimes share. Asserts user-owned agents are preserved.

Manifest mode round-trip (+3 tests):
- Default install -> mode: 'full' with >6 skills and >0 agents
- --minimal -> mode: 'minimal' with exactly 6 skills and 0 agents
- --core-only alias produces identical manifest to --minimal

Allowlist scope guards (+3 tests):
- Every main-loop command IS in allowlist (positive)
- Off-loop commands (autonomous, ship, do, progress, next, fast,
  quick, debug, code-review, verify-work) are NOT (guards against
  silent scope creep — future contributor adds 'autonomous' to core
  and the floor erodes)
- Unknown mode strings fall through to full behavior — pre-emptive
  guard for future 'compact'/'tier2' modes that might forget to
  update the predicate.

Total: 25 tests in this file (was 15), 159/159 passing across the
install + inventory suites.

* fix(#2762): clean up staged tmp dirs on SIGINT/SIGTERM/SIGHUP

CodeRabbit follow-up review on c727bf5f flagged that process.on('exit')
does not fire on signal-driven termination. An installer is exactly
the kind of process users abort mid-run with Ctrl+C, so without
explicit signal handlers the staged tmp dirs in STAGED_DIRS would be
left behind until the OS reaps tmpdir.

Fix: ensureExitCleanup now also registers process.once handlers for
SIGINT, SIGTERM, SIGHUP. Each handler runs cleanupStagedSkills then
re-raises the same signal via process.kill(pid, sig) so the OS-default
handler takes over and the parent shell sees the correct exit code
(130 for SIGINT, etc.) — CI scripts and interactive users see the
abort the way they expect.
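
The cleanup pattern described here can be sketched roughly as below. The names `STAGED_DIRS`, `cleanupStagedSkills`, and `ensureExitCleanup` appear in the messages above, but the bodies are assumptions, and `stageDir` is a hypothetical stand-in for the staging step.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

const STAGED_DIRS = new Set();
let cleanupRegistered = false;

function cleanupStagedSkills() {
  for (const dir of STAGED_DIRS) {
    fs.rmSync(dir, { recursive: true, force: true });
    STAGED_DIRS.delete(dir);
  }
}

function ensureExitCleanup() {
  if (cleanupRegistered) return; // register handlers exactly once
  cleanupRegistered = true;
  process.on('exit', cleanupStagedSkills);
  for (const sig of ['SIGINT', 'SIGTERM', 'SIGHUP']) {
    process.once(sig, () => {
      cleanupStagedSkills();
      // The once-listener is already removed, so re-raising the signal
      // lets the default disposition terminate the process.
      process.kill(process.pid, sig);
    });
  }
}

// Hypothetical staging step: create a tmp dir and track it for cleanup.
function stageDir(prefix = 'gsd-staged-') {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), prefix));
  STAGED_DIRS.add(dir);
  ensureExitCleanup();
  return dir;
}
```

Re-raising the signal, rather than calling `process.exit()`, is what lets the parent shell observe a signal-driven exit.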

Test: spawns a child that stages a tmp dir then blocks; parent
captures the staged path from stdout, sends SIGINT, asserts (a) the
staged dir is gone after child exit, (b) child exits via the signal
not via code 0. Skipped on Windows (signal semantics differ; the
natural-exit cleanup test covers the Windows CI matrix).

Total: 26 tests in install-minimal.test.cjs (was 25).
2026-04-27 00:13:20 -04:00
Tom Boucher
ab5ad6c8bc fix(#2757): unquoted truths with colons crash annotate-dependencies (#2759)
parseMustHavesBlock dispatched on `includes(':')` to detect key-value pairs,
but unquoted YAML strings like `GET /foo/:id resolves...` and
`Class::Method is idempotent` also contain colons. When the KV regex failed
to match, `current` was left as `{}` (the empty object initialized before
the branch), which then caused `t.trim()` in roadmap.cjs to throw
`TypeError: t.trim is not a function`.

Two fixes:
- frontmatter.cjs: tighten the KV regex to require at least one space after
  the colon (`\s+` instead of `\s*`), matching YAML convention. When the
  regex still fails to match, fall back to treating the item as a plain
  string instead of leaving `current` as `{}`.
- roadmap.cjs: add `typeof t !== 'string'` guard before `.trim()` as a
  cheap safety net against any future parser anomaly.
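
A simplified sketch of the classification rule, not the real parseMustHavesBlock: the function name and the exact key pattern are assumptions. It also folds in the quoted-string check from the earlier #2734 fix.

```javascript
// Hypothetical sketch: decide whether a list item is a plain string or a key-value pair.
function parseListItem(text) {
  const item = text.trim();
  // Fully quoted items are plain strings, whatever they contain.
  const quoted = /^(["'])(.*)\1$/.exec(item);
  if (quoted) return quoted[2];
  // A key-value pair needs at least one whitespace character after the colon.
  const kv = /^([\w-]+):\s+(.+)$/.exec(item);
  if (kv) return { [kv[1]]: kv[2] };
  // Anything else, colons included, falls back to a plain string.
  return item;
}
```

With this rule, `GET /foo/:id resolves...` and `Class::Method is idempotent` both stay strings, so a later `.trim()` call has something to work on.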

Closes #2757

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 21:40:38 -04:00
Tom Boucher
1a230e69aa perf: convert discuss-phase SKILL.md @file imports to lazy per-branch reads (#2752)
* perf: convert discuss-phase @file imports to lazy per-branch reads

Replace eager @file directives in <execution_context> with on-demand
Read calls gated behind mode routing. discuss-phase-assumptions.md is
now only read when DISCUSS_MODE=assumptions; discuss-phase.md is only
read for the default discuss mode; discuss-phase-power.md and
templates/context.md are removed from the entry point entirely
(power mode is handled inside discuss-phase.md's lazy mode dispatch;
context.md is loaded at the write_context step).

Reduces tokens loaded at skill entry from ~13k to near zero.

Closes #2606
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(discuss-phase): use contiguous 'Read and execute' phrase in process block

The test at tests/discuss-mode.test.cjs:45 asserts that the <process>
block contains 'Read and execute' as a literal substring. The prior
wording split the instruction across two lines (Read(...) / Then execute),
so the substring match failed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(discuss-phase): restore discuss-phase-power reference in process block

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 13:37:45 -04:00
Tom Boucher
1405728292 fix: parseMustHavesBlock quoted strings + gsd state complete-phase (#2744)
* fix: parseMustHavesBlock quoted strings + gsd state complete-phase

Bug #2734: parseMustHavesBlock dropped quoted truths containing ':' because
fully-quoted strings like `"App-side UUIDv4: generated locally"` fell into
the kv-parse branch, the regex failed (value starts with '"'), and current
stayed as empty {}. Fix: detect fully-quoted strings before the ':' check
and extract them directly. Two regression tests added to frontmatter.test.cjs.

Bug #2735: `gsd state complete-phase` subcommand was missing — unknown
subcommands fell through to cmdStateLoad. Added cmdStateCompletePhase to
state.cjs (updates Status, Last Activity, and Current Position to COMPLETE),
exported it, and wired it into the case 'state': dispatch in gsd-tools.cjs.

Closes #2734
Closes #2735

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(state): unknown subcommand returns explicit error instead of silent fallthrough

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 13:37:43 -04:00
Tom Boucher
3246810876 feat: extend RUNTIME_PROFILE_MAP to gemini/qwen/opencode/copilot + settings-advanced UI (#2754)
Closes #2612

- Add gemini, qwen, opencode, and copilot entries to RUNTIME_PROFILE_MAP in core.cjs
- Group B runtimes (kilo, cline, cursor, windsurf, augment, trae, codebuddy, antigravity) intentionally have no built-in map and fall through to the existing unknown-runtime fallback
- Add 40 new tests to tests/issue-2517-runtime-aware-profiles.test.cjs covering each new runtime's three tiers, Group B fall-through, and partial override merge semantics
- Add Section 7 "Runtime Model Tiers" to settings-advanced.md with interactive UI to view and override built-in tier defaults per runtime
- Update docs/CONFIGURATION.md built-in tier table to include all four new runtimes

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 13:33:54 -04:00
Tom Boucher
e0b4561fa9 feat: add /gsd-edit-phase command to modify roadmap phases in place (#2753)
Adds a new slash command that lets developers modify any field of an
existing phase in ROADMAP.md without affecting phase number or position.

- commands/gsd/edit-phase.md: command file with --force flag support
- get-shit-done/workflows/edit-phase.md: full workflow with status guard,
  depends_on validation, diff+confirmation, and STATE.md update
- tests/edit-phase.test.cjs: 32 tests covering all acceptance criteria
- docs/INVENTORY.md, INVENTORY-MANIFEST.json, COMMANDS.md: registered

Closes #2617

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 13:33:52 -04:00
Tom Boucher
8788ab2381 feat: post-merge build & test gate — Build step, iOS/Xcode, serial mode (#2751)
* feat: post-merge build & test gate — Build step, iOS/Xcode, serial mode

Step 5.6 of execute-phase is extended per #2720:
- Renamed from "Post-merge test gate (parallel mode only)" to "Post-merge build & test gate"
- Gate now runs in both parallel mode (after worktree merge) and serial mode (after last plan)
- Added Step A: Build gate resolving BUILD_CMD from workflow.build_command config key, then auto-detecting via priority: config override → Xcode (.xcodeproj) → Makefile build: → Justfile → Cargo/Go/Python/npm. Xcode uses xcodebuild -list -json to get first scheme, then xcodebuild build -scheme ... -destination 'platform=iOS Simulator,name=iPhone 16'. Build failure increments WAVE_FAILURE_COUNT.
- Added Xcode/iOS detection to Step B (Test gate): when *.xcodeproj present and no workflow.test_command configured, uses xcodebuild test instead of the previous "no test runner detected" skip. Scheme reused from Step A when available.
- Documented workflow.build_command and workflow.test_command in docs/CONFIGURATION.md (table + JSON schema)

Closes #2720

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(execute-phase): extract Step 5.6 body to post-merge-gate.md sub-file

Moves the build-detection logic and xcodebuild commands from the inline
Step 5.6 body into execute-phase/steps/post-merge-gate.md, replacing it
with a single Read() reference. Reduces execute-phase.md from 1755 to
1647 lines, satisfying the ≤1700 XL-tier budget enforced by
tests/workflow-size-budget.test.cjs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 13:33:50 -04:00
Tom Boucher
71a3f86fbe docs: add end-to-end workflow walkthrough to USER-GUIDE.md (#2749)
Adds a concrete single-phase walkthrough (webhook validator project)
showing ROADMAP.md, CONTEXT.md, PLAN.md, SUMMARY.md, and STATE.md
excerpts and how each command consumes the previous step's output.
Also adds links to the walkthrough from README.md's nav bar and
How It Works section.

Closes #2359

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 13:33:47 -04:00
Tom Boucher
bf73cbe1a4 test(#2688): add review.models.claude tests for per-runtime review model config (#2748)
Adds two tests to review-model-config.test.cjs:
- isValidConfigKey accepts review.models.claude (schema validation)
- round-trip: config-set then config-get for review.models.claude

The dynamic key pattern (^review\.models\.[a-zA-Z0-9_-]+$), the workflow
model-read logic in review.md, and the CONFIGURATION.md docs were already
in place. Only the claude-specific test coverage was missing.

Closes #2688

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 13:33:45 -04:00
Tom Boucher
d5cd64dde5 fix(#2637): migrate legacy Codex [hooks] map format to [[hooks]] array on install (#2747)
Codex 0.124.0 changed the required config.toml hooks format from the old
map-style ([hooks.shell]) to array-of-tables ([[hooks]]). Old GSD installs
that wrote the legacy format now cause a startup parse error on upgrade.

Add migrateCodexHooksMapFormat() which detects non-array [hooks] and
[hooks.TYPE] sections and rewrites them to [[hooks]] entries with an
injected type = "TYPE" key. The migration runs at the start of every Codex
install so affected configs self-heal on the next `gsd install --codex`.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 13:33:43 -04:00
Tom Boucher
8f2ec0e8f7 fix: add explicit wait-for-subagent instructions in orchestration workflows (#2755)
Adds ORCHESTRATOR RULE blockquotes immediately after every Task() spawn
in 26 GSD workflow files, instructing the parent orchestrator to stop
working on the task while the subagent is active. This prevents the
parallel-work anti-pattern on Codex runtime where the parent continues
reading files and producing duplicate/conflicting output after spawning.

Rules are placed inline at each spawn point (not as generic headers)
so they are adjacent to and unambiguously associated with each Task()
call. Background Task() spawns get a variant noting not to return to
the spawning context until the subagent reports back.

Closes #2729

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 13:33:35 -04:00
Tom Boucher
7255539ff9 fix: validate LM Studio model identity in review workflow (#2746)
* fix: validate LM Studio model identity in review workflow

Capture the full API response before extracting content, then compare
the top-level `.model` field against the configured LM_STUDIO_MODEL.
Emits a warning to stderr if LM Studio served a different model than
requested, while still proceeding with the review response.

Closes #2721

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(review): skip LM Studio review file when content is empty instead of writing error text

Also applies the same fix to llama.cpp which had the identical pattern of writing
a literal error string into the review temp file when content was empty/null.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 13:33:28 -04:00
Tom Boucher
b8bbc74192 fix(sdk): preserve nested keys from globalDefaults in configNewProject (#2745)
When building nested config sections (workflow, git, hooks, agent_skills,
features), the deep merge was missing globalDefaults for those sections,
causing user values from ~/.gsd/defaults.json to be silently dropped.

Added globalDefaults spread at the correct precedence level (hardcoded <
globalDefaults < userChoices) for all five nested keys, and added three
test cases verifying the merge works end-to-end via HOME env var override.
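
The precedence order can be illustrated with object spread. This is a sketch under assumptions: the function name and argument shapes are hypothetical; only the five section names and the ordering come from the message above.

```javascript
const NESTED_KEYS = ['workflow', 'git', 'hooks', 'agent_skills', 'features'];

// Later spreads win: hardcoded < globalDefaults < userChoices.
function buildNestedSections(hardcoded, globalDefaults, userChoices) {
  const out = {};
  for (const key of NESTED_KEYS) {
    out[key] = {
      ...(hardcoded[key] || {}),
      ...(globalDefaults[key] || {}),
      ...(userChoices[key] || {}),
    };
  }
  return out;
}
```

Leaving out the middle spread reproduces the bug: values from the global defaults file never reach the result.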

Closes #2673

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 13:33:26 -04:00
Tom Boucher
2b95ccbddd Merge pull request #2743 from gsd-build/fix/2714-workstream-config-inheritance
feat(config): workstream config.json inherits from root .planning/config.json
2026-04-26 11:49:30 -04:00
Tom Boucher
4a05283bc8 Merge pull request #2742 from gsd-build/fix/2731-workstream-query-handler-threadthrough
fix(query): thread workstream through all query handlers
2026-04-26 11:49:25 -04:00
Tom Boucher
80f4c9063f Merge pull request #2741 from gsd-build/fix/2727-codex-agents-toml-format
fix(installer): revert Codex agents section to [agents.<name>] struct format
2026-04-26 11:49:19 -04:00
Tom Boucher
41787e361f Merge pull request #2740 from gsd-build/fix/2641-details-summary-milestone-detection
fix(phase-lifecycle): skip <details>-wrapped sections in milestone detection
2026-04-26 11:49:14 -04:00
Tom Boucher
0eef943f0a Merge pull request #2739 from gsd-build/fix/2732-graphify-cli-invocation
fix(graphify): update CLI invocation from legacy flag form to subcommand
2026-04-26 11:49:09 -04:00
Tom Boucher
bbf33b608e Merge pull request #2738 from gsd-build/fix/2644-milestone-complete-version-arg
test(query): Vitest regression guard for milestone.complete version arg (fix #2644)
2026-04-26 11:49:03 -04:00
Tom Boucher
9e63d14709 Merge pull request #2737 from gsd-build/fix/2726-phase-add-bullet-bold-roadmap
fix(phase-lifecycle): detect phases in bullet/bold ROADMAP formats for phase.add
2026-04-26 11:48:58 -04:00
Tom Boucher
5b4a239ead Merge pull request #2736 from gsd-build/fix/2728-phase-complete-roadmap-corruption
fix(phase-lifecycle): prevent plan-line corruption when **Plans:** has no inline value
2026-04-26 11:48:52 -04:00
Tom Boucher
cb149383c1 fix(config): bump MODEL_ALIAS_MAP to claude-opus-4-7 (#2733)
* fix(config): bump MODEL_ALIAS_MAP and RUNTIME_PROFILE_MAP to claude-opus-4-7

Opus 4.7 shipped Q1 2026 but MODEL_ALIAS_MAP and RUNTIME_PROFILE_MAP.claude.opus
were still pinned to claude-opus-4-6. Users with resolve_model_ids: true received
stale model IDs in logs and agent-tool calls.

Also adds a resolve_model_ids: true test suite — this path had zero coverage,
which is why the stale ID survived undetected.

Closes #2712

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(config): derive RUNTIME_PROFILE_MAP.claude from MODEL_ALIAS_MAP (coderabbit)

RUNTIME_PROFILE_MAP.claude was duplicating model IDs that MODEL_ALIAS_MAP
already owns. Future model bumps now only require updating MODEL_ALIAS_MAP.
Also fixes stale test assertion (claude-opus-4-6 → claude-opus-4-7).

* fix(tests): update stale claude-opus-4-6 refs to claude-opus-4-7; DRY: derive RUNTIME_PROFILE_MAP.claude from MODEL_ALIAS_MAP

- Update 3 hardcoded `claude-opus-4-6` assertions in tests/issue-2517-runtime-aware-profiles.test.cjs to `claude-opus-4-7`
- Update comment on line 128 that referenced the old model ID
- Replace manual per-tier expansion of RUNTIME_PROFILE_MAP.claude with Object.fromEntries so future alias bumps only require updating MODEL_ALIAS_MAP
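
A minimal sketch of the derivation, assuming each tier maps to a `{ model }` object. Only `claude-opus-4-7` comes from the message above; the other IDs are placeholders and the entry shape is a guess.

```javascript
// Placeholder IDs except for opus, which is the value named in the commit.
const MODEL_ALIAS_MAP = {
  opus: 'claude-opus-4-7',
  sonnet: 'placeholder-sonnet-id',
  haiku: 'placeholder-haiku-id',
};

// Derive the per-tier entries instead of repeating the IDs by hand.
const RUNTIME_PROFILE_MAP = {
  claude: Object.fromEntries(
    Object.entries(MODEL_ALIAS_MAP).map(([tier, id]) => [tier, { model: id }]),
  ),
};
```

A future model bump then touches one map only.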

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 11:48:47 -04:00
Tom Boucher
274fc524cd fix(tests): derive STATE.md phase number from phaseDir in setupProject fixture
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 11:46:57 -04:00
Tom Boucher
b7b6f89776 fix(phase-lifecycle): fast-path replaceInCurrentMilestone only when pattern matches after
The previous guard `if (after.trim().length > 0)` incorrectly triggered
when `after` contained only footer text (e.g. `---\n*Last updated*`).
In that case `after.replace(pattern, replacement)` is a no-op and the
function returned unchanged content instead of falling through to the
slow path that searches inside the last `<details>` block.

Fix: capture the replaced string first, then only take the fast path
when the replacement actually changed `after`.
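
The guard change can be sketched like this. It is an illustration, not the real replaceInCurrentMilestone: the function name is hypothetical and `slowPath` stands in for the search inside the last `<details>` block.

```javascript
// Hypothetical sketch of the fast-path decision.
function replaceAfterLastDetails(content, pattern, replacement, slowPath) {
  const marker = '</details>';
  const idx = content.lastIndexOf(marker);
  if (idx === -1) return content.replace(pattern, replacement);
  const cut = idx + marker.length;
  const before = content.slice(0, cut);
  const after = content.slice(cut);
  const replaced = after.replace(pattern, replacement);
  // Take the fast path only when the replacement changed the tail;
  // a footer-only tail leaves `after` untouched and falls through.
  if (replaced !== after) return before + replaced;
  return slowPath(content, pattern, replacement);
}
```

One caveat of this comparison: a replacement identical to the matched text also falls through to the slow path.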

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 11:46:40 -04:00
Tom Boucher
76f1d20d80 test(phase-lifecycle): add regression for active <details> with trailing footer
Adds a case where the active milestone is the last <details> block but
footer text (--- / *Last updated*) follows </details>, triggering the
fast-path to replace in the footer instead of inside the block.

Closes #2743
2026-04-26 11:36:46 -04:00
Tom Boucher
022b577922 test(graphify): tighten version-check assertions per CodeRabbit nitpick
- Success path: add explicit python3Calls.length === 0 assertion so "no
  fallback" is stated directly rather than implied by calls.length === 1
- Fallback path: add explicit calls[0].cmd === 'graphify' assertion so
  "graphify precedes python3" is verified by name, not just argument
2026-04-26 11:36:14 -04:00
Tom Boucher
493e251bab refactor(tests): extract assertMilestoneSuccess helper in decomposed-handlers.test.ts
Centralizes repeated result.data cast + common success-field checks
per coderabbit review suggestion on PR #2738.
2026-04-26 11:24:31 -04:00
Tom Boucher
cfc79d211f fix(config-query): thread workstream through resolveModel handler
resolveModel ignored _workstream, unlike configGet/configPath which both
forward it to planningPaths/loadConfig. Different workstreams may have
different model_profile settings.

Addresses coderabbit finding on PR #2742.
2026-04-26 11:23:51 -04:00
Tom Boucher
7c08a155ea test(graphify): tighten call-sequence assertions per coderabbit review
Adds explicit call-count and ordering assertions to version-check tests:
- Success path: exactly 1 spawnSync (graphify --version only, no python fallback)
- Failure path: graphify --version attempted first, python3 fallback second

Addresses coderabbit nitpick on PR #2739.
2026-04-26 11:21:54 -04:00
Tom Boucher
8270f17773 fix(phase-lifecycle): handle project-code-prefixed dirs in phaseAdd fallback scan
Filesystem fallback regex /^(\d+)-/ missed directories like CK-45-foundation
when project_code is configured. Updated to /^(?:[A-Z][A-Z0-9]*-)?(\d+)-/i.
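
For illustration, the updated pattern behaves as follows on prefixed and unprefixed directory names. The regex is the one quoted above; the helper name is hypothetical.

```javascript
const PHASE_DIR_RE = /^(?:[A-Z][A-Z0-9]*-)?(\d+)-/i;

// Hypothetical helper: extract the phase number from a phase directory name.
function phaseNumberFromDir(name) {
  const m = PHASE_DIR_RE.exec(name);
  return m ? Number.parseInt(m[1], 10) : null;
}

console.log(phaseNumberFromDir('CK-45-foundation')); // 45
console.log(phaseNumberFromDir('03-setup'));         // 3
```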

Addresses coderabbit finding on PR #2737.
2026-04-26 11:21:33 -04:00
Tom Boucher
c85e65ec03 fix(roadmap-update-plan-progress): propagate Plans** regex hardening from phase-lifecycle
Applies the same [ \t]* + section-boundary lookahead fix that was applied
to planCountPattern in phase-lifecycle.ts. roadmap.update-plan-progress
shared the same corruption vector via \s* crossing newlines.

Addresses coderabbit finding on PR #2736.
2026-04-26 11:19:34 -04:00
Tom Boucher
1cb4bebcf5 feat(config): workstream config.json inherits from root .planning/config.json
- Add _deepMergeConfig() with correct null-override semantics
- loadConfig() reads root config.json when GSD_WORKSTREAM is set, then
  deep-merges with workstream config (workstream wins on conflict)
- Workstream without config.json falls back to root config entirely
- Migrations and disk writes operate on fileData (on-disk content) only,
  never on the merged result, to prevent workstream pollution
- Fixes null-override bug from PR #2717: explicit null in workstream now
  correctly overrides root value instead of falling back to root
- Tests: inherit root model_overrides, workstream override, nested
  workflow.* deep merge, explicit null override, missing workstream config
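
A rough sketch of a deep merge with the null-override semantics listed above. This is not the project's `_deepMergeConfig`; the function name and the config keys used in the usage note are only examples.

```javascript
const isPlainObject = (v) => v !== null && typeof v === 'object' && !Array.isArray(v);

// Hypothetical sketch: the workstream wins on conflict, and an explicit null
// in the workstream overrides the root value instead of falling back to it.
function deepMergeConfig(root, workstream) {
  const out = { ...root };
  for (const [key, value] of Object.entries(workstream)) {
    if (value === undefined) continue; // absent key: inherit from root
    out[key] = isPlainObject(value) && isPlainObject(root[key])
      ? deepMergeConfig(root[key], value)
      : value; // scalars, arrays, and explicit null replace the root value
  }
  return out;
}
```

For example, merging `{ workflow: { x: 1, y: 2 } }` with `{ workflow: { y: 3 } }` keeps `x` from the root and takes `y` from the workstream.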

Closes #2714
2026-04-26 11:15:16 -04:00
Tom Boucher
3a623b1117 fix(query): thread workstream through all query handlers
All 18+ query handlers accepted _workstream but never forwarded it to
planningPaths/loadConfig/getMilestoneInfo. Remove _ prefix and pass
workstream to all internal helper calls so --ws flag actually scopes
path resolution.

Affected handlers: initNewProject, initProgress, initManager, configGet,
configPath, configSet, configSetModelProfile, configNewProject,
configEnsureSection, validateHealth, commit, checkCommit, commitToSubrepo.

Also fixes validate.ts to use paths.* fields from planningPaths instead
of hardcoded join(projectDir, '.planning') paths.

Closes #2731
2026-04-26 10:53:17 -04:00
Tom Boucher
f6cddc5b2f test(query): add failing tests for workstream path threading in init-complex
Demonstrates that initProgress and initManager ignore the workstream
parameter, reading from root .planning/ instead of the workstream
subdirectory.

Closes #2731
2026-04-26 10:38:10 -04:00
Tom Boucher
7924abec0c fix(installer): revert Codex agents section to [agents.<name>] struct format
[[agents]] sequence format (introduced in #2645) is rejected by
codex-cli 0.124.0 with "invalid type: sequence, expected struct AgentsToml".
Revert to [agents.<name>] struct format which is correct for 0.120.0+.

stripCodexGsdAgentSections already handles both formats for self-healing
configs written by previous GSD versions using [[agents]].

Closes #2727
2026-04-26 10:35:30 -04:00
Tom Boucher
8f7f43abaa test(installer): add failing test for [agents.<name>] struct format (Codex 0.124.0+)
Adds test asserting generateCodexConfigBlock emits [agents.<name>] struct
format, not [[agents]] sequence format. Confirms RED phase for #2727 fix.
2026-04-26 10:35:23 -04:00
Tom Boucher
0f17cfc71d fix(phase-lifecycle): skip <details>-wrapped sections in milestone detection
replaceInCurrentMilestone's lastIndexOf('</details>') heuristic fails
when the active milestone itself is wrapped in a <details> block — the
after-slice is empty so the replacement is silently dropped.

Fix detects this case (after.trim().length === 0) and falls back to
locating the last complete <details>…</details> span and applying the
replacement only inside it, leaving all earlier archived-milestone
blocks untouched.

Closes #2641
2026-04-26 10:28:48 -04:00
Tom Boucher
a7d3bb948b test(phase-lifecycle): add failing test for active milestone in <details> block
Closes #2641
2026-04-26 10:23:41 -04:00
Tom Boucher
f89a56eb55 fix(graphify): update CLI invocation from legacy flag form to subcommand
graphify . --update was removed in favor of graphify update . in v0.4.x.
Also improves version detection to try `graphify --version` before
falling back to python3 importlib query.

Closes #2732
2026-04-26 10:20:41 -04:00
Tom Boucher
f8a0e6f145 test(query): add Vitest regression tests for milestone.complete version arg (fix #2644)
milestoneComplete was imported in decomposed-handlers.test.ts but had zero
test coverage. The original defect (6f79b1d) called phasesArchive([], ...)
instead of forwarding the positional version arg; the wrapping try/catch
swallowed the GSDError into { completed: false, reason: String(err) },
masking a programming error as a legitimate negative answer.

Add five Vitest tests that lock in the correct contract:
- positional version arg is extracted from args[0] and echoed in response
- missing version throws GSDError (not masked as completed: false)
- --archive-phases flag is processed
- --name flag sets milestone name
- response shape has version/date/phases/milestones_updated fields

Closes #2644

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 10:18:08 -04:00
Tom Boucher
54cbc2ad96 fix(phase-lifecycle): detect phases in bullet/bold ROADMAP formats
phaseAdd's phase-number regex only matched heading format (## Phase N:),
missing bullet checklist (- [x] Phase N:) and bold (**Phase N:**) entries.
When the regex matched nothing, newPhaseId defaulted to 1.

Fix: broaden regex to match all three formats, and add filesystem fallback
scanning .planning/phases/ when ROADMAP scan finds nothing.
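
A minimal sketch of the broadened detection, for illustration only. The names PHASE_LINE and nextPhaseId, the exact pattern, and passing directory names in as an array (instead of reading .planning/phases/ from disk) are assumptions, not the project's actual code:

```typescript
// Hypothetical pattern: accepts "## Phase N:", "- [x] Phase N:" and "**Phase N:**",
// including a bold phase name inside a checklist bullet.
const PHASE_LINE =
  /^[ \t]*(?:#{2,4}[ \t]+|-[ \t]+\[[ xX]\][ \t]+)?(?:\*\*)?Phase[ \t]+(\d+)[ \t]*:/;

function maxPhaseInRoadmap(roadmap: string): number {
  // Fresh global/multiline copy so no lastIndex state leaks between calls.
  const re = new RegExp(PHASE_LINE.source, "gm");
  let max = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(roadmap)) !== null) {
    max = Math.max(max, Number(m[1]));
  }
  return max;
}

// phaseDirNames stands in for a directory listing such as ["01-setup", "02-auth"].
function nextPhaseId(roadmap: string, phaseDirNames: string[] = []): number {
  let max = maxPhaseInRoadmap(roadmap);
  if (max === 0) {
    // Fallback: the ROADMAP scan found nothing, so use numeric directory prefixes.
    for (const dir of phaseDirNames) {
      const m = /^(\d+)/.exec(dir);
      if (m) max = Math.max(max, Number(m[1]));
    }
  }
  return max + 1;
}
```

With this shape, a ROADMAP that only uses checklist bullets no longer yields a new phase ID of 1.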

Closes #2726
2026-04-26 10:00:39 -04:00
Tom Boucher
a9f49f8f9d fix(phase-lifecycle): prevent plan-line overwrite when **Plans:** is bare
\s* after **Plans:** matches newlines, causing [^\n]+ to consume the first
plan checkbox when the **Plans:** field has no value on the same line.
Additionally, the lazy [\s\S]*? could cross section boundaries when the
current section had no **Plans:** value, corrupting a later section.

Fix 1: replace \s* with [ \t]* to restrict post-colon match to horizontal
whitespace only.
Fix 2: replace [\s\S]*? with (?:(?!\n#{2,4})[\s\S])*? to prevent the
pattern from crossing into a new section heading.
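
The effect of Fix 1 can be reproduced with a reduced pattern. This is a sketch only: the two regexes below are simplified stand-ins for planCountPattern (they omit the section-scoping part covered by Fix 2), and setPlanCount is a hypothetical helper:

```typescript
// Simplified stand-ins for the pattern before and after Fix 1.
const BUGGY_PLAN_COUNT = /\*\*Plans:\*\*\s*[^\n]+/; // \s* can swallow the newline
const FIXED_PLAN_COUNT = /\*\*Plans:\*\*[ \t]*[^\n]+/; // horizontal whitespace only

// Hypothetical helper: rewrites the value that follows **Plans:**.
function setPlanCount(section: string, pattern: RegExp, summary: string): string {
  return section.replace(pattern, () => `**Plans:** ${summary}`);
}

const bareSection = "**Plans:**\n- [ ] 01-01-PLAN.md\n- [ ] 01-02-PLAN.md\n";

// Buggy: \s* matches "\n", so [^\n]+ consumes the first checkbox line.
const overwritten = setPlanCount(bareSection, BUGGY_PLAN_COUNT, "1/2 plans complete");

// Fixed: nothing follows the colon on that line, so there is no match
// and the section is returned unchanged.
const preserved = setPlanCount(bareSection, FIXED_PLAN_COUNT, "1/2 plans complete");
```

In this reduced form the fixed pattern simply declines to match a bare **Plans:** line; how the real handler then fills in the count is outside the sketch.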

Closes #2728
2026-04-26 09:46:57 -04:00
Tom Boucher
394403ae06 test(phase-lifecycle): add failing regression test for #2728
When **Plans:** appears on its own line with no inline value, the
planCountPattern regex crosses the newline and destroys the first
plan checkbox line by replacing it with the literal "N/N plans complete"
string.

This test documents the expected correct behavior and will fail until
the planCountPattern regex is fixed.
2026-04-26 09:46:51 -04:00
Lex Christopherson
f3685d9173 1.38.5 2026-04-25 17:56:06 -06:00
Lex Christopherson
22b73f548d docs: update changelog for v1.38.5 2026-04-25 17:56:06 -06:00
Lex Christopherson
2fafbd2753 fix(sdk): pass phaseDir to executor prompt so SUMMARY.md lands in .planning/
The SDK's buildExecutorPrompt told executors to "Create a SUMMARY.md file"
with no directory path, causing them to write it in cwd (project root)
instead of .planning/phases/{phase}/. Thread phaseDir from PhaseRunner
through PromptFactory and into the completion instructions so the executor
gets an explicit path like `.planning/phases/01-auth/01-01-SUMMARY.md`.

Backward compatible — buildExecutorPrompt still accepts a plain string
(agentDef) for existing callers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-25 17:56:06 -06:00
Tom Boucher
470c1a0bff fix(#2722): forensics gh commands pin --repo gsd-build/get-shit-done (#2723)
* fix(#2722): forensics gh commands pin --repo gsd-build/get-shit-done

gh issue create and gh label list both defaulted to the repo inferred
from $PWD, causing issues to be submitted to the user's current project
instead of this repo. Added --repo gsd-build/get-shit-done to both
commands. Added two regression tests covering both gh calls.

Closes #2722

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(#2723): scope forensics tests to specific gh commands, not whole file

CodeRabbit found that the gh issue create test searched the whole
workflow file, so it would pass even if gh issue create lacked --repo
(because gh label list already contains the repo string elsewhere).
Also replaced the brittle 200-char slice in the label-list test with
a regex. Both tests now use assert.match() with command-scoped regexes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 16:47:55 -04:00
Tom Boucher
caf6974bbf fix(ci): remove stale SDK-variant tests for files deleted in 377a6d2 (#2725)
377a6d2 deleted sdk/prompts/agents/ and sdk/prompts/workflows/ (13 files)
but did not update 3 test files that reference them, causing ENOENT
failures on every CI run (main and all PRs) since that commit.

Removed:
- sdk/prompts/agents variants describe block (enh-2427-sycophancy-hardening)
- PLAN_PHASE_SDK_PATH constant and headless plan-phase test (post-planning-gaps-2493)
- sdk/prompts/workflows/verify-phase.md describe block (verifier-deferred-items)

The underlying behaviour is covered by the existing main agent/workflow
tests; the SDK variant tests are moot now that the SDK loads installed
files instead of bundled stripped-down copies.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 16:26:15 -04:00
Lex Christopherson
a5a2d44121 1.38.4 2026-04-25 14:13:49 -06:00
Lex Christopherson
73abae60f0 1.38.3 2026-04-25 14:11:46 -06:00
Lex Christopherson
efab0545c7 docs: update changelog for v1.38.3 2026-04-25 14:11:33 -06:00
Lex Christopherson
f0953dec0c fix(sdk): prevent interactive tool calls in headless self-discuss mode
The discuss step loaded the full interactive workflow prompt which
instructs the agent to use AskUserQuestion, Skill(), and area selection
UIs. In headless auto mode, the agent followed these instructions and
tried to interact with a non-existent user.

Fix: prepend a mandatory headless override BEFORE the workflow prompt
that explicitly forbids interactive tools and instructs the agent to
make all decisions autonomously. Prepending (not appending) ensures
the override takes priority over conflicting instructions later in
the prompt.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-25 14:08:05 -06:00
Lex Christopherson
25d9763878 fix(sdk): fix executor plan loading, plan ID derivation, and verification outcome parsing
Bug 1: phasePlanIndex derived an empty planId for bare PLAN.md files.
Fixed to use 'PLAN' as the ID, with matching SUMMARY.md detection.

Bug 2: executeSinglePlan passed null to buildPrompt instead of the
actual parsed plan. The executor needs the plan content (tasks,
objectives) to know what to build. Now loads and parses the plan
file before building the prompt.

Bug 3: parseVerificationOutcome checked session exit code, not what
the verifier wrote. A session that runs without errors but writes
status: gaps_found to VERIFICATION.md was treated as 'passed'. Now
queries check.verification-status to read the actual VERIFICATION.md
frontmatter status field.
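
For Bug 3, the idea of trusting the frontmatter over the exit code might look like the sketch below. The function names are hypothetical, the real code goes through the check.verification-status query instead of parsing inline, and "passed" as the success value is an assumption (only gaps_found is named above):

```typescript
// Hypothetical helper: pull the status field out of a leading frontmatter block.
function readVerificationStatus(markdown: string): string | null {
  const frontmatter = /^---\r?\n([\s\S]*?)\r?\n---/.exec(markdown);
  if (!frontmatter) return null;
  const field = /^status:[ \t]*(.+?)[ \t]*$/m.exec(frontmatter[1] ?? "");
  if (!field) return null;
  // Strip optional surrounding quotes.
  return (field[1] ?? "").replace(/^["']|["']$/g, "");
}

// The session exit code is deliberately ignored here: a clean exit says nothing
// about what the verifier wrote. "passed" is an assumed success value.
function verificationPassed(verificationMd: string): boolean {
  return readVerificationStatus(verificationMd) === "passed";
}
```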

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-25 14:08:05 -06:00
Lex Christopherson
377a6d2c6e fix(sdk): use installed agent/workflow prompts instead of stripped-down bundled copies
The SDK bundled its own agents and workflows at ~17% the size of the real
ones, missing critical instructions like file naming conventions, scope
reduction rules, discovery protocols, and TDD integration. This caused
the planner to create a single PLAN.md instead of properly named
per-plan files (01-01-PLAN.md, 01-02-PLAN.md), breaking wave-based
parallel execution.

- Invert load priority: installed GSD agents/workflows first, SDK
  bundled as last-resort fallback
- Replace @-reference stripping with resolution (read + inline content)
- Use full agent definitions instead of extracting only the <role> block
- Delete sdk/prompts/agents/ and sdk/prompts/workflows/ (13 files)
- Delete headless-prompts.test.ts (validated deleted files)
- Thread projectDir through sanitizePrompt for @-reference resolution

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-25 14:08:05 -06:00
Tom Boucher
1068223439 feat(#2500): enrich gsd-codebase-mapper arch-focus ARCHITECTURE.md with ASCII diagrams, data flow traces, and constraints (#2715)
* feat(#2500): enrich gsd-codebase-mapper arch-focus ARCHITECTURE.md template

The codebase mapper's arch-focus template was a sparse structural inventory.
After major refactors, the research/ARCHITECTURE.md (created at /gsd-new-project
and never refreshable) went stale while the refreshable codebase version lacked
the visual richness that makes architecture docs useful for planning.

Add to the ARCHITECTURE.md template:
- <!-- refreshed: {date} --> marker at the top (maintainer request)
- ASCII system overview diagram with component boxes and flow arrows
- Component responsibility table (Component / Responsibility / File)
- Primary request path traces with numbered steps and code references
- Architectural constraints section (threading, global state, circular imports)
- Anti-patterns section with codebase-specific patterns and correct alternatives

All existing sections (Pattern Overview, Layers, Key Abstractions, Entry Points,
Error Handling, Cross-Cutting Concerns) are preserved.

7 new tests in tests/enh-2500-codebase-mapper-arch-rich-format.test.cjs verify
each required section is present in the deployed template.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(#2500): resolve CodeRabbit review findings

- Add 'text' language tag to bare ASCII diagram fenced block (markdownlint MD040)
- Tighten data flow test: require '### Primary Request Path' heading, 3+
  numbered steps, and file:line reference pattern — prevents loose-match
  false positives
- Tighten constraints test: require '## Architectural Constraints' heading
  AND Threading / Global state / Circular imports tokens — prevents broad
  keyword matches masking regressions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 14:23:40 -04:00
Tom Boucher
b40110111d feat(#2306): plan-review-convergence v2 — CYCLE_SUMMARY contract, config gate, local model reviewers (#2718)
* feat(#2306): plan-review-convergence v2 — CYCLE_SUMMARY contract, config gate, local model reviewers

Fixes the false-stall detection bug in the plan→review→replan convergence
loop. REVIEWS.md accumulates history across cycles so raw grep inflated
HIGH counts; HIGH count now comes from a per-cycle CYCLE_SUMMARY contract
emitted in the review agent's return message.

Key changes:
- workflow.plan_review_convergence config gate (disabled by default, same
  pattern as workflow.code_review / workflow.nyquist_validation)
- Review agent prompt defines CYCLE_SUMMARY: current_high=<N> contract with
  PARTIALLY RESOLVED / FULLY RESOLVED counting rules
- Orchestrator aborts on absent/malformed CYCLE_SUMMARY (distinguishes both)
- Warns when HIGH_COUNT > 0 but ## Current HIGH Concerns section is missing
- Stall detection and --ws forwarding preserved and tested
- Local model reviewers: --ollama, --lm-studio, --llama-cpp flags added to
  convergence workflow and review workflow; all three use OpenAI-compatible
  /v1/chat/completions endpoint with jq --rawfile for safe JSON encoding
- review.ollama_host / review.lm_studio_host / review.llama_cpp_host config
  keys registered and documented (default to localhost:11434/1234/8080)
- review.models.ollama / .lm_studio / .llama_cpp model-name config support
- 58 tests (up from 29 in PR #2339), all passing
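
A rough sketch of the CYCLE_SUMMARY contract on the consuming side. The workflow itself is shell-based; this TypeScript version, the type names, and the function name are illustrative assumptions. It takes the first CYCLE_SUMMARY line only and keeps "absent" and "malformed" distinct:

```typescript
type CycleSummary =
  | { kind: "ok"; currentHigh: number }
  | { kind: "absent" }
  | { kind: "malformed"; line: string };

// Hypothetical parser for the review agent's return message.
function parseCycleSummary(returnMessage: string): CycleSummary {
  // First matching line wins, so a summary quoted back later is ignored.
  const line = returnMessage
    .split(/\r?\n/)
    .find((l) => l.includes("CYCLE_SUMMARY:"));
  if (line === undefined) return { kind: "absent" };

  const m = /CYCLE_SUMMARY:\s*current_high=(\d+)\b/.exec(line);
  if (!m) return { kind: "malformed", line };
  return { kind: "ok", currentHigh: Number(m[1]) };
}
```

Because the count comes from the current cycle's message, history accumulated in REVIEWS.md cannot inflate it.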

Closes #2306
Closes #2339
Co-authored-by: Tom Boucher <trekkie@nomorestars.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): sync sdk/src/query/config-schema.ts with CJS schema (#2306)

Add workflow.plan_review_convergence, review.ollama_host,
review.lm_studio_host, and review.llama_cpp_host to the SDK-side
TypeScript mirror — required by the CJS↔SDK parity test (#2653).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(#2306): resolve CodeRabbit review findings

- Anchor HIGH_COUNT extraction with head -1 to prevent multi-match when
  agent return message contains multiple CYCLE_SUMMARY lines (e.g. quoted
  back from prompt context)
- Replace hardcoded reviewers list in REVIEWS.md frontmatter template with
  runtime-derived placeholder — the static list did not reflect which
  reviewers were actually invoked
- Broaden workflow.plan_review_convergence docs to include local reviewers
  (Ollama, LM Studio, llama.cpp) alongside cloud reviewers

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): restore reviewers frontmatter list with runtime note

The cursor-reviewer.test.cjs (and equivalent per-reviewer tests) assert
that each supported reviewer appears on the reviewers: line — these are
wiring tests that catch when a new reviewer is added to invocation but
not to the REVIEWS.md template. Replacing the list with a placeholder
broke those tests.

Restore the full static list and add an inline comment clarifying that
the actual committed frontmatter should be filtered to only the reviewers
invoked that run — satisfying both the per-reviewer tests and the
CodeRabbit correctness note.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 14:18:34 -04:00
Tom Boucher
3da9420a38 fix(#2698,#2678): CRLF agent-block strip regex + local install skips SDK check (#2710)
Fixes #2698 — The two separate LF/CRLF .replace() calls in the Codex hooks
migration could not handle mixed line endings (e.g. header in LF, body in
CRLF), leaving stale gsd-update-check blocks after reinstall. Consolidated to
a single \r?\n-aware regex with gm flags that handles LF, CRLF, and mixed
content in one pass.

Fixes #2678 — installSdkIfNeeded() called process.exit(1) unconditionally when
sdk/dist/cli.js was missing, even during --local installs where users cannot
write to global node_modules. Added isLocal option: when true, prints a warning
and returns instead of exiting.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 12:16:04 -04:00
Tom Boucher
3fe5759d7c fix(#2686): review-fix agent uses isolated git worktree, prevents main-tree race (#2705)
* fix(#2686): review-fix agent now uses git worktree for isolation

The gsd-code-fixer agent operated directly against the main working tree,
racing any concurrent foreground session for HEAD, the index, and on-disk
files. Added a setup_worktree step (git worktree add /tmp/sv-N-reviewfix
HEAD) as the first action before any file operations, with unconditional
git worktree remove cleanup on exit. Mirrors the pattern used by all other
GSD per-issue agents.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(#2686): address CodeRabbit review — mktemp unique path, branch-aware worktree, tighten test assertions

- Use mktemp -d for unique worktree path (prevents concurrent-run collision)
- Resolve branch via git branch --show-current before worktree add (prevents detached HEAD)
- Error-and-exit on worktree add failure instead of force-removing shared path
- Test: use .exec().index for checkout position (not indexOf on match string)
- Test: match gsd-sdk query commit as well as git commit for ordering assertion
- Test: tighten /tmp path assertion to require actual /tmp/sv- assignment

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 12:15:39 -04:00
Tom Boucher
8e21c9b1b7 fix(#2684,#2676): milestone.complete arg forwarding + parallel milestone phase routing (#2708)
* test(#2692): add behavioral --wave N test, annotate source-text assertions

Adds two behavioral tests for wave filtering via phase-plan-index:
- Verifies plans with wave frontmatter are correctly grouped by wave number
- Verifies plans with no wave field default to wave 1

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(#2684,#2676): milestone.complete version validation + parallel milestone phase routing

#2684: Confirms milestone.complete correctly validates and uses its version
argument end-to-end. The inline archive path in milestoneComplete already
forwarded version correctly; regression tests lock in that contract.

#2676: phase.complete applied getMilestonePhaseFilter unconditionally, using
STATE.md's primary milestone to scope the candidate set. When the completed
phase belongs to a parallel (secondary) milestone, the filter excluded all
phases from that milestone, leaving an empty candidate set and incorrectly
returning is_last_phase: true / next_phase: null.

Fix: before applying the milestone filter in Step E, check whether the
completed phase itself appears in the filtered set. If not, skip the filter
for both the directory scan and the ROADMAP.md fallback so phases from the
secondary milestone remain visible for next-phase detection.
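
The conditional filter can be sketched as follows. This is an assumption-laden illustration: findNextPhase and MilestoneFilter are hypothetical names, phase IDs are treated as plain integers, and the directory scan and ROADMAP.md fallback are collapsed into one list:

```typescript
type MilestoneFilter = (phaseId: string) => boolean;

interface NextPhaseResult {
  next_phase: string | null;
  is_last_phase: boolean;
}

function findNextPhase(
  allPhases: string[],
  completed: string,
  inPrimaryMilestone: MilestoneFilter,
): NextPhaseResult {
  // Apply the milestone filter only if the completed phase itself passes it.
  // Otherwise the phase belongs to a parallel milestone and filtering would
  // leave an empty candidate set.
  const candidates = inPrimaryMilestone(completed)
    ? allPhases.filter(inPrimaryMilestone)
    : allPhases;

  const later = candidates
    .filter((id) => Number(id) > Number(completed))
    .sort((a, b) => Number(a) - Number(b));

  const next = later[0] ?? null;
  return { next_phase: next, is_last_phase: next === null };
}
```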

Closes #2684
Closes #2676

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 12:10:01 -04:00
Tom Boucher
8393f4b355 fix(#2601): config-set-model-profile now accepts 'inherit' value (#2707)
VALID_PROFILES was derived solely from Object.keys(MODEL_PROFILES['gsd-planner']),
which only contained the named tiers (quality/balanced/budget/adaptive). The
cmdConfigSetModelProfile validator rejected 'inherit' even though the runtime has
supported it since #1829. Fix: append 'inherit' to VALID_PROFILES and handle it
in getAgentToModelMapForProfile so the agent→model table shows 'inherit' instead
of undefined.
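
In outline, the change amounts to something like the sketch below. The model names are placeholders, the table holds a single agent for brevity, and agentToModelMap is a simplified stand-in for getAgentToModelMapForProfile:

```typescript
// Placeholder table: real model names and the full agent list are not shown here.
const MODEL_PROFILES: Record<string, Record<string, string>> = {
  "gsd-planner": {
    quality: "model-a",
    balanced: "model-b",
    budget: "model-c",
    adaptive: "model-d",
  },
};

// Named tiers come from the table; 'inherit' is appended explicitly.
const VALID_PROFILES: string[] = [
  ...Object.keys(MODEL_PROFILES["gsd-planner"] ?? {}),
  "inherit",
];

function agentToModelMap(profile: string): Record<string, string> {
  if (!VALID_PROFILES.includes(profile)) {
    throw new Error(`Invalid profile: ${profile}`);
  }
  const result: Record<string, string> = {};
  for (const agent of Object.keys(MODEL_PROFILES)) {
    const tiers = MODEL_PROFILES[agent] ?? {};
    // 'inherit' has no table entry, so report it literally instead of undefined.
    result[agent] = profile === "inherit" ? "inherit" : tiers[profile] ?? "";
  }
  return result;
}
```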

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 12:09:27 -04:00
Tom Boucher
b7ff14fe51 fix(#2687): derive KNOWN_TOP_LEVEL from DYNAMIC_KEY_PATTERNS to eliminate read-side drift (#2706)
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 12:08:59 -04:00
Tom Boucher
94f8e895c0 test(#2692): add behavioral --wave N test, annotate source-text assertions (#2704)
Adds two behavioral tests for wave filtering via phase-plan-index:
- Verifies plans with wave frontmatter are correctly grouped by wave number
- Verifies plans with no wave field default to wave 1

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 12:04:16 -04:00
Tom Boucher
70f01e0c57 test(#2695): replace CR-CONFIG source-grep + config bypass tests with behavioral config-set assertions (#2702)
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 12:03:39 -04:00
Tom Boucher
56ae7a73f5 test(#2694): eliminate shared mutable content state — move readFileSync to describe scope (#2703)
Fixes #2694

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 12:02:58 -04:00
Tom Boucher
aeef87de7f docs(test-standards): enforce no-source-grep rule with CI linter + CONTRIBUTING.md (#2700)
* docs(test-standards): enforce no-source-grep rule with CI linter + update CONTRIBUTING.md

Adds scripts/lint-no-source-grep.cjs — a static linter that detects readFileSync
on .cjs source files in tests without an allow-test-rule annotation. Wires it
into CI as a new lint-tests job in test.yml and as npm run lint:tests.

Resolves all 9 existing violations across the test suite:
- Rewrites workspace routing tests (3) as behavioral runGsdTools calls that
  verify each command is router-recognized (exit != "Unknown init workflow")
- Adds allow-test-rule annotations with explanatory comments to 7 legitimate
  structural tests: architectural invariants (locking, orphan-worktree),
  structural regression guards (milestone-regex-global), docs-parity
  (config-field-docs), integration-test-input (copilot-install), and
  structural-implementation-guards (bug-1891, discuss-mode)

Updates CONTRIBUTING.md Testing Standards section with:
- "Prohibited: Source-Grep Tests" section with the before/after pattern,
  root cause analysis of why it breaks (commit 990c3e64), and CI reference
- allow-test-rule exemption table (6 recognized categories with when-to-use)
- "CI Test Quality Checks" table showing lint-tests job and local run command

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: resolve CodeRabbit findings on PR #2700

- CONTRIBUTING.md: "four recognized categories" → "six" (table has 6 rows)
- workspace.test.cjs: use positional args in routing tests (no --name flag)
- lint-no-source-grep.cjs: add source-dir guard to READ_WITH_INLINE_CJS_RE
  (mirrors CJS_PATH_CONST_RE's protection against false positives on temp files)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lint): tighten allow-test-rule and add recursive test discovery

- ALLOW_ANNOTATION now requires at least one non-whitespace char after the
  colon so bare '// allow-test-rule:' cannot bypass the lint gate
- findTestFiles() recurses into subdirectories so nested *.test.cjs files
  are covered if the tests/ tree ever grows subdirs
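
The tightened annotation check could be expressed roughly like this. ALLOW_ANNOTATION is the name used above, but the exact pattern here is an assumption; it uses [ \t]* so the required non-whitespace character must sit on the same line as the colon:

```typescript
// Assumed pattern: a // line comment, the rule marker, and at least one
// non-whitespace character of justification on the same line.
const ALLOW_ANNOTATION = /\/\/[ \t]*allow-test-rule:[ \t]*\S/;

// Hypothetical helper for the linter's per-file check.
function hasAllowAnnotation(source: string): boolean {
  return ALLOW_ANNOTATION.test(source);
}
```

A bare `// allow-test-rule:` followed by a newline does not pass, even when the next line has content.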

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 11:34:55 -04:00
Tom Boucher
b1a670e662 fix(#2697): replace retired /gsd: prefix with /gsd- in all user-facing text (#2699)
All workflow, command, reference, template, and tool-output files that
surfaced /gsd:<cmd> as a user-typed slash command have been updated to
use /gsd-<cmd>, matching the Claude Code skill directory name.

Closes #2697

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 10:59:33 -04:00
Tom Boucher
7c6f8005f3 test: destroy 9 config-schema.cjs/core.cjs source-grep tests, replace with behavioral config-set (#2696)
* test: destroy 9 config-schema.cjs/core.cjs source-grep tests, add behavioral config-set tests (#2691, #2693)

Replace source-grep theater with config-set behavioral tests:
- execute-phase-wave: config-set workflow.use_worktrees replaces VALID_CONFIG_KEYS grep
- inline-plan-threshold: delete redundant source-grep (behavioral test at L36 already covered it)
- plan-bounce: config-set for plan_bounce / plan_bounce_script / plan_bounce_passes replaces 3 key-presence greps
- code-review: config-set for code_review / code_review_depth replaces 2 greps; removes CONFIG_PATH constant
- thinking-partner: config-set features.thinking_partner replaces two greps (config-schema.cjs AND core.cjs)

Behavioral tests survive refactors (no path constants, no file reads). The config-schema.cjs →
core.cjs migration commit 990c3e64 happened because these tests grepped source paths.

Add allow-test-rule: source-text-is-the-product annotations to legitimate product-content tests:
autonomous-allowed-tools, agent-frontmatter, agent-skills-awareness, bug-2334, bug-2346,
execute-phase-wave (MD reads), plan-bounce (workflow reads). Annotations explain WHY text
inspection is the right level of testing for AI instruction files.

Closes #2691
Closes #2693

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: address CodeRabbit findings on #2696

- agent-frontmatter.test.cjs: move allow-test-rule annotation from block comment
  to standalone // line comment so rule scanners can detect it
- thinking-partner.test.cjs: strengthen config-set test with config-get read-back
  assertion to verify the value was persisted, not just accepted (exit 0)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: tighten thinking_partner config assertion per CodeRabbit (#2696)

Replace config-get output substring check (includes('true') false-positive
risk) with a direct JSON read of .planning/config.json, asserting the
exact persisted value via strictEqual. This also validates the config file
was created, catching silent key-acceptance without persistence.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 10:50:54 -04:00
Tom Boucher
cd05725576 fix(#2661): unconditional plan-checkbox sync in execute-plan (#2682)
* fix(#2661): unconditional plan-checkbox sync in execute-plan

Checkpoint A in execute-plan.md was wrapped in a "Skip in parallel mode"
guard that also short-circuited the parallelization-without-worktrees
case. With `parallelization: true, use_worktrees: false`, only
Checkpoint C (phase.complete) then remained, and any interruption
between the final SUMMARY write and phase complete left ROADMAP.md
plan checkboxes stale.

Remove the guard: `roadmap update-plan-progress` is idempotent and
atomically serialized via readModifyWriteRoadmapMd's lockfile, so
concurrent invocations from parallel plans converge safely.

Checkpoint B (worktree-merge post-step) and Checkpoint C
(phase.complete) become redundant after A is unconditional; their
removal is deferred to a follow-up per the RCA.

Closes #2661

* fix(#2661): gate ROADMAP sync on use_worktrees=false to preserve single-writer contract

Adversarial review of PR #2682 found that unconditionally removing the
IS_WORKTREE guard violates the single-writer contract for shared
ROADMAP.md established by commit dcb50396 (PR #1486). The lockfile only
serializes within a single working tree; separate worktrees have
separate ROADMAP.md files that diverge.

Restore the worktree guard but document its intent explicitly: the
in-handler sync runs only when use_worktrees=false (the actual #2661
reproducer). Worktree mode relies on the orchestrator's post-merge
update at execute-phase.md lines 815-834, which is the documented
single-writer for shared tracking files.

Update tests to assert both branches of the gate:
- use_worktrees: false mode runs the sync (the #2661 case)
- use_worktrees: true mode does NOT run the in-handler sync
- handler-level idempotence and lockfile contention tests retained,
  scope clarified to within-tree concurrency only
2026-04-24 20:27:59 -04:00
Tom Boucher
c811792967 fix(#2660): capture prose after labeled bold in extractOneLinerFromBody (#2679)
* fix(#2660): capture prose after label in extractOneLinerFromBody

The regex `\*\*([^*]+)\*\*` matched the first bold span, so for the new
SUMMARY template `**One-liner:** Real prose here.` it captured the label
`One-liner:` instead of the prose. MILESTONES.md then wrote bullets like
`- One-liner:` with no content.

Handle both template forms:
- Labeled:  `**One-liner:** prose`  → prose
- Bare:     `**prose**`             → prose (legacy)

Empty prose after a label returns null so no bogus bullets are emitted.

Note: existing MILESTONES.md entries generated under the bug are not
regenerated here — that is a follow-up.

Closes #2660

* fix(#2660): normalize CRLF before one-liner extraction

Windows-authored SUMMARY files use CRLF line endings; the LF-only regex
in extractOneLinerFromBody would fail to match. Normalize \r\n and \r
to \n before stripping frontmatter and matching the one-liner pattern.

Adds test case (h) covering CRLF input.
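
Taken together, the two fixes in this PR suggest a shape like the sketch below. It is not the actual extractOneLinerFromBody: the function name and regexes are assumptions, and frontmatter stripping is left out:

```typescript
// Hypothetical re-implementation of the behaviour described above.
function extractOneLinerSketch(body: string): string | null {
  // Normalize CRLF and lone CR so the line-based patterns below apply.
  const text = body.replace(/\r\n?/g, "\n");

  // Labeled form: **One-liner:** prose
  const labeled = /^\*\*[^*\n]+:\*\*[ \t]*(.*)$/m.exec(text);
  if (labeled) {
    const prose = (labeled[1] ?? "").trim();
    // An empty value after the label yields null, not a bogus bullet.
    return prose.length > 0 ? prose : null;
  }

  // Bare legacy form: **prose**
  const bare = /\*\*([^*\n]+)\*\*/.exec(text);
  return bare ? (bare[1] ?? "").trim() : null;
}
```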
2026-04-24 20:22:29 -04:00
Tom Boucher
34b39f0a37 test(#2659): regression guard against bare output() in audit-open handler (#2680)
* fix(#2659): qualify bare output() calls in audit-open handler

The audit-open dispatch case in bin/gsd-tools.cjs previously called bare
output() on both --json and text branches, which crashed with
ReferenceError: output is not defined. The core module is imported as
`const core`, so every other case uses core.output(). HEAD already
qualifies the calls correctly; this commit adds a regression test that
invokes `audit-open` and `audit-open --json` through runGsdTools and
asserts a clean exit plus non-empty stdout (and an explicit check that
the failure mode is not ReferenceError). The test fails on any revision
where either call reverts to bare output().

Closes #2659

* test(#2659): assert valid JSON output in --json mode

CodeRabbit nit: tighten --json regression coverage by parsing stdout
and asserting the result is a JSON object/array, not just non-empty.
2026-04-24 20:22:17 -04:00
Tom Boucher
b1278f6fc3 fix(#2674): align initProgress with initManager ROADMAP [x] precedence (#2681)
initProgress computed phase status purely from disk (PLAN/SUMMARY counts),
consulting the ROADMAP `- [x] Phase N` checkbox only for phases with no
directory. initManager, by contrast, applied an explicit override: a
ROADMAP `[x]` forces status to `complete` regardless of disk state.

Result: a phase with a stub directory (no SUMMARY.md) and a ticked
ROADMAP checkbox reported `complete` from /gsd-manager and `pending`
from /gsd-progress — same data, different answer.

Apply ROADMAP-[x]-wins as the unified policy inside initProgress, mirroring
initManager's override. A user who typed `- [x] Phase 3` has made an
explicit assertion; a leftover stub dir is the weaker signal.
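
The unified policy reduces to a small precedence rule, sketched here. The type and function names are hypothetical, and the disk-derived rules (in particular "in_progress") are assumed for illustration; only the ROADMAP-[x]-wins override is taken from the description above:

```typescript
type PhaseStatus = "pending" | "in_progress" | "complete";

interface PhaseFacts {
  roadmapChecked: boolean; // ROADMAP has "- [x] Phase N"
  planCount: number; // PLAN files on disk
  summaryCount: number; // SUMMARY files on disk
}

// Assumed disk-only rules; the real derivation may differ.
function statusFromDisk(facts: PhaseFacts): PhaseStatus {
  if (facts.planCount > 0 && facts.summaryCount >= facts.planCount) return "complete";
  if (facts.planCount > 0 || facts.summaryCount > 0) return "in_progress";
  return "pending";
}

function resolvePhaseStatus(facts: PhaseFacts): PhaseStatus {
  // A ticked ROADMAP checkbox overrides whatever the directory suggests.
  return facts.roadmapChecked ? "complete" : statusFromDisk(facts);
}
```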

Adds sdk/src/query/init-progress-precedence.test.ts covering six cases
(stub dir + [x], full dir + [x], full dir + [ ], stub dir + [ ],
ROADMAP-only + [x], and completed_count parity). Pre-fix: cases 1 and 6
failed. Post-fix: all six pass. No existing tests were modified.

Closes #2674
2026-04-24 20:20:11 -04:00
Tom Boucher
303fd26b45 fix(#2662): add state.add-roadmap-evolution SDK handler; insert-phase uses it (#2683)
/gsd-insert-phase step 4 instructed the agent to directly Edit/Write
.planning/STATE.md to append a Roadmap Evolution entry. Projects that
ship a protect-files.sh PreToolUse hook (a recommended hardening
pattern) blocked the raw write, silently leaving STATE.md out of sync
with ROADMAP.md.

Adds a dedicated SDK handler state.add-roadmap-evolution (plus space
alias) that:

  - Reads STATE.md through the shared readModifyWriteStateMd lockfile
    path (matches sibling mutation handlers — atomic against
    concurrent writers).
  - Locates ### Roadmap Evolution under ## Accumulated Context, or
    creates both sections as needed.
  - Dedupes on exact-line match so idempotent retries are no-ops
    ({ added: false, reason: "duplicate" }).
  - Validates --phase / --action presence and action membership,
    throwing GSDError(Validation) for bad input (no silent
    { ok: false } swallow).
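
The section-locating and dedupe behaviour listed above might look like this as a pure string transform. It is a sketch under several assumptions: the function and type names are hypothetical, locking and argument validation are omitted, input is assumed to use LF line endings, and a newly created subsection is placed directly under the parent heading:

```typescript
interface EvolutionResult {
  content: string;
  added: boolean;
  reason?: "duplicate";
}

const PARENT_HEADING = "## Accumulated Context";
const SUB_HEADING = "### Roadmap Evolution";

function addRoadmapEvolutionSketch(stateMd: string, entry: string): EvolutionResult {
  const line = entry.trim();
  const lines = stateMd.replace(/\n+$/, "").split("\n");

  // Exact-line dedupe makes retries a no-op.
  if (lines.some((l) => l.trim() === line)) {
    return { content: stateMd, added: false, reason: "duplicate" };
  }

  let sub = lines.findIndex((l) => l.trim() === SUB_HEADING);
  if (sub === -1) {
    let parent = lines.findIndex((l) => l.trim() === PARENT_HEADING);
    if (parent === -1) {
      lines.push("", PARENT_HEADING);
      parent = lines.length - 1;
    }
    lines.splice(parent + 1, 0, "", SUB_HEADING);
    sub = parent + 2;
  }

  // The subsection ends at the next heading of level 1-3, or at end of file.
  let end = lines.length;
  for (let i = sub + 1; i < lines.length; i++) {
    if (/^#{1,3} /.test(lines[i] ?? "")) {
      end = i;
      break;
    }
  }
  // Step back over blank lines so the entry follows the last existing one.
  let insertAt = end;
  while (insertAt > sub + 1 && (lines[insertAt - 1] ?? "").trim() === "") insertAt--;

  lines.splice(insertAt, 0, line);
  return { content: lines.join("\n") + "\n", added: true };
}
```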

Workflow change (insert-phase.md step 4):

  - Replaces the raw Edit/Write instructions for STATE.md with
    gsd-sdk query state.patch (for the next-phase pointer) and
    gsd-sdk query state.add-roadmap-evolution (for the evolution
    log).
  - Updates success criteria to check handler responses.
  - Drops "Write" from commands/gsd/insert-phase.md allowed-tools
    (no step in the workflow needs it any more).

Tests (vitest, sdk/src/query/state-mutation.test.ts): subsection
creation when missing; append-preserving-order when present;
duplicate -> reason=duplicate; idempotence over two calls; three
validation cases covering missing --phase, missing --action, and
invalid action.

This is the first SDK handler dedicated to STATE.md Roadmap
Evolution mutations. Other workflows with similar raw STATE.md
edits (/gsd-pause-work, /gsd-resume-work, /gsd-new-project,
/gsd-complete-milestone, /gsd-add-phase) remain on raw Edit/Write
and will need follow-up issues to migrate — out of scope for this
fix.

Closes #2662
2026-04-24 20:20:02 -04:00
Tom Boucher
7b470f2625 fix(#2633): ROADMAP.md is the authority for current-milestone phase counts (#2665)
* fix(#2633): use ROADMAP.md as authority for current-milestone phase counts

initMilestoneOp (SDK + CJS) derives phase_count and completed_phases from
the current milestone section of ROADMAP.md instead of counting on-disk
`.planning/phases/` directories. After `phases clear` at the start of a new
milestone the on-disk set is a subset of the roadmap, causing premature
`all_phases_complete: true`.

validateHealth W002 now unions ROADMAP.md phase declarations (all milestones
— current, shipped, backlog) with on-disk dirs when checking STATE.md phase
refs. Eliminates false positives for future-phase refs in the current
milestone and history-phase refs from shipped milestones.

Falls back to legacy on-disk counting when ROADMAP.md is missing or
unparseable so no-roadmap fixtures still work.

Adds vitest regressions for both handlers; all 66 SDK + 118 CJS tests pass.

* fix(#2633): preserve full phase tokens in W002 + completion lookup

CodeRabbit flagged that the parseInt-based normalization collapses distinct
phase IDs (3, 3A, 3.1) into the same integer bucket, masking real
STATE/ROADMAP mismatches and miscounting completions in milestones with
inserted/sub-phases.

Index disk dirs and validate STATE.md refs by canonical full phase token —
strip leading zeros from the integer head only, preserve [A-Z] suffix and
dotted segments, and accept just the leading-zero variant of the integer
prefix as a tolerated alias. 3A and 3 never share a bucket.

Also widens the disk and STATE.md regexes to accept [A-Z]? suffix tokens.
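
A small sketch of the canonical-token idea, assuming tokens of the form integer head, optional single [A-Z] suffix, optional dotted segments. The function name and regex are illustrative, not the project's code:

```typescript
// Hypothetical normalizer: strips leading zeros from the integer head only,
// keeping the letter suffix and dotted segments intact.
function canonicalPhaseToken(raw: string): string | null {
  const m = /^0*(\d+)([A-Z]?)((?:\.\d+)*)$/.exec(raw.trim());
  if (!m) return null;
  return `${m[1] ?? ""}${m[2] ?? ""}${m[3] ?? ""}`;
}
```

Unlike parseInt, this keeps 3, 3A, and 3.1 in separate buckets while still treating 03 as an alias of 3.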
2026-04-24 18:11:12 -04:00
Tom Boucher
c8ae6b3b4f fix(#2636): surface gsd-sdk query failures and add workflow↔handler parity check (#2656)
* fix(#2636): surface gsd-sdk query failures and add workflow↔handler parity check

Root cause: workflows invoked `gsd-sdk query agent-skills <slug>` with a
trailing `2>/dev/null`, swallowing stderr and exit code. When the installed
`@gsd-build/sdk` npm was stale (pre-query), the call resolved to an empty
string and `agent_skills.<slug>` config was never injected into spawn
prompts — silently. The handler exists on main (sdk/src/query/skills.ts),
so this is a publish-drift + silent-fallback bug, not a missing handler.

Fix:
- Remove bare `2>/dev/null` from every `gsd-sdk query agent-skills …`
  invocation in workflows so SDK failures surface to stderr.
- Apply the same rule to other no-fallback calls (audit-open, write-profile,
  generate-* profile handlers, frontmatter.get in commands). Best-effort
  cleanup calls (config-set workflow._auto_chain_active false) keep
  exit-code forgiveness via `|| true` but no longer suppress stderr.

Parity tests:
- New: tests/bug-2636-gsd-sdk-query-silent-swallow.test.cjs — fails if any
  `gsd-sdk query agent-skills … 2>/dev/null` is reintroduced.
- Existing: tests/gsd-sdk-query-registry-integration.test.cjs already
  asserts every workflow noun resolves to a registered handler; confirmed
  passing post-change.

Note: npm republish of @gsd-build/sdk is a separate release concern and is
not included in this PR.

* fix(#2636): address review — restore broken markdown fences and shell syntax

The previous commit's mass removal of '2>/dev/null' suffixes also
collapsed adjacent closing code fences and 'fi' tokens onto the
command line, producing malformed markdown blocks and 'truefi' /
'true   fi' shell syntax errors in the workflows.

Repaired sites:
- commands/gsd/quick.md, thread.md (frontmatter.get fences)
- workflows/complete-milestone.md (audit-open fence)
- workflows/profile-user.md (write-profile + generate-* fences)
- workflows/verify-work.md (audit-open --json fence)
- workflows/execute-phase.md (truefi -> true / fi)
- workflows/plan-phase.md, discuss-phase-assumptions.md,
  discuss-phase/modes/chain.md (true   fi -> true / fi)

All 5450 tests pass.
2026-04-24 18:10:45 -04:00
Tom Boucher
7ed05c8811 fix(#2645): emit [[agents]] array-of-tables in Codex config.toml (#2664)
* fix(#2645): emit [[agents]] array-of-tables in Codex config.toml

Codex ≥0.116 rejects `[agents.<name>]` map tables with `invalid type:
map, expected a sequence`. Switch generateCodexConfigBlock to emit
`[[agents]]` array-of-tables with an explicit `name` field per entry.
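
A minimal sketch of the new output shape, assuming only the `name` field
(the other per-agent fields are not shown in this message, so they are left
out; `emitAgentsArrayOfTables` is a hypothetical name, not
generateCodexConfigBlock itself):

```javascript
// Emits one [[agents]] array-of-tables entry per agent name.
// Legacy shape, for comparison:  [agents.gsd-plan-checker]
// New shape:                     [[agents]] + name = "gsd-plan-checker"
function emitAgentsArrayOfTables(agentNames) {
  return agentNames
    // JSON.stringify yields a double-quoted string, which is a valid TOML
    // basic string for simple ASCII names like these.
    .map((name) => `[[agents]]\nname = ${JSON.stringify(name)}\n`)
    .join('\n');
}
```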

Strip + merge paths now self-heal on reinstall — both the legacy
`[agents.gsd-*]` map shape (pre-#2645 configs) and the new
`[[agents]]` with `name = "gsd-*"` shape are recognized and replaced,
while user-authored `[[agents]]` entries are preserved.

Fixes #2645

* fix(#2645): use TOML-aware parser to strip managed [[agents]] sections

CodeRabbit flagged that the prior regex-based stripper for [[agents]]
array-of-tables only matched headers at column 0 and stopped at any line
beginning with `[`. An indented [[agents]] header would not terminate the
preceding match, so a managed `gsd-*` block could absorb a following
user-authored agent and silently delete it.

Replace the ad-hoc regex with the existing TOML-aware section parser
(getTomlTableSections + removeContentRanges) so section boundaries are
authoritative regardless of indentation. Same logic applies to legacy
[agents.gsd-*] map sections.

Add a comprehensive mixed-shape test covering multiple GSD entries (both
legacy map and new array-of-tables, double- and single-quoted names)
interleaved with multiple user-authored agents in both shapes — verifies
all GSD entries are stripped and every user entry is preserved.
2026-04-24 18:09:01 -04:00
Tom Boucher
0f8f7537da fix(#2652): layer ~/.gsd/defaults.json over built-ins in SDK loadConfig (#2663)
* fix(#2652): layer ~/.gsd/defaults.json over built-ins in SDK loadConfig

SDK loadConfig only merged built-in CONFIG_DEFAULTS, so pre-project init
queries (e.g. resolveModel in Codex installs) ignored user-level knobs like
resolve_model_ids: "omit" and emitted Claude model aliases from MODEL_PROFILES.

Port the user-defaults layer from get-shit-done/bin/lib/config.cjs:65 to the
TS loader. CJS parity: user defaults only apply when no .planning/config.json
exists (buildNewProjectConfig already bakes them in at /gsd:new-project time).
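
The layering rule can be sketched as follows. This is an assumption-laden
illustration, not the real loadConfig: `resolveConfigSketch` is a
hypothetical name, and a shallow merge stands in for whatever merge the
loader actually performs.

```javascript
// builtIns:      stand-in for CONFIG_DEFAULTS
// userDefaults:  parsed ~/.gsd/defaults.json, or null when absent
// projectConfig: parsed .planning/config.json, or null when absent
function resolveConfigSketch(builtIns, userDefaults, projectConfig) {
  if (projectConfig) {
    // A project config exists: user defaults were already baked in when the
    // project was created, so the user layer is not applied again.
    return { ...builtIns, ...projectConfig };
  }
  // Pre-project: user-level defaults sit on top of the built-ins.
  return { ...builtIns, ...(userDefaults || {}) };
}
```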

Fixes #2652

* fix(#2652): isolate GSD_HOME in test, refresh loadConfig JSDoc (CodeRabbit)
2026-04-24 18:08:07 -04:00
Tom Boucher
709f0382bf fix(#2639): route Codex TOML emit through full Claude→Codex neutralization pipeline (#2657)
installCodexConfig() applied a narrow path-only regex pass before
generateCodexAgentToml(), skipping the convertClaudeToCodexMarkdown() +
neutralizeAgentReferences(..., 'AGENTS.md') pipeline used on the .md emit
path. Result: emitted Codex agent TOMLs carried stale Claude-specific
references (CLAUDE.md, .claude/skills/, .claude/commands/, .claude/agents/,
.claudeignore, bare "Claude" agent-name mentions).

Route the TOML path through convertClaudeToCodexMarkdown and extend that
pipeline to cover bare .claude/<subdir>/ references and .claudeignore
(both previously unhandled on the .md path too). The $HOME/.claude/
get-shit-done prefix substitution still runs first so the absolute Codex
install path is preserved before the generic .claude → .codex rewrite.

Regression test: tests/issue-2639-codex-toml-neutralization.test.cjs —
drives installCodexConfig against a fixture containing every flagged
marker and asserts the emitted TOML contains zero CLAUDE.md / .claude/
/ .claudeignore occurrences and that Claude Code / Claude Opus product
names survive.

Fixes #2639
2026-04-24 18:06:13 -04:00
Tom Boucher
a6e692f789 fix(#2646): honor ROADMAP [x] checkboxes when no phases/ directory exists (#2669)
initProgress (and its CJS twin) hardcoded `not_started` for ROADMAP-only
phases, so `completed_count` stayed at 0 even when the ROADMAP showed
`- [x] Phase N`. Extract ROADMAP checkbox states into a shared helper
and use `- [x]` as the completion signal when no phase directory is
present. Disk status continues to win when both exist.
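
Rough sketch of the shared helper idea, under stated assumptions: the
function names, the regex, and the `complete` status string are invented
for illustration (only `not_started` appears in this message).

```javascript
// Collect checkbox state per phase from ROADMAP text, e.g. "- [x] Phase 3".
function roadmapCheckboxStates(roadmapText) {
  const states = new Map();
  const re = /^\s*-\s*\[( |x|X)\]\s*\**Phase\s+(\d+[A-Z]?(?:\.\d+)*)/gm;
  let m;
  while ((m = re.exec(roadmapText)) !== null) {
    states.set(m[2], m[1].toLowerCase() === 'x');
  }
  return states;
}

// Disk status wins when present; otherwise fall back to the checkbox.
function phaseStatus(phaseId, diskStatus, checkboxStates) {
  if (diskStatus) return diskStatus;
  return checkboxStates.get(phaseId) ? 'complete' : 'not_started';
}
```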

Adds a regression test that reproduces the bug with no phases/ dir and
one `[x]` / one `[ ]` phase, asserting completed_count===1.

Fixes #2646
2026-04-24 18:05:41 -04:00
Tom Boucher
b67ab38098 fix(#2643): align skill frontmatter name with workflow gsd: emission (#2672)
Flat-skills installs write SKILL.md files under gsd-<cmd>/ dirs, but
Claude Code resolves skills by their frontmatter `name:`, not directory
name. PR #2595 normalized every `/gsd-<cmd>` to `/gsd:<cmd>` across
workflows — including inside `Skill(skill="...")` args — but the
installer still emitted `name: gsd-<cmd>`, so every Skill() call on a
flat-skills install resolved to nothing.

Fix: emit `name: gsd:<cmd>` (colon form) in
`convertClaudeCommandToClaudeSkill`. Keep the hyphen-form directory
name for Windows path safety.

Codex stays on hyphen form: its adapter invokes skills as `$gsd-<cmd>`
(shell-var syntax) and a colon would terminate the variable name.
`convertClaudeCommandToCodexSkill` uses `yamlQuote(skillName)` directly
and is untouched.
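
For illustration, the directory-name to frontmatter-name mapping could look
like the sketch below. The helper name comes from this change, but the body
is an assumption, not a copy of the installer code:

```javascript
// Directory keeps the hyphen form (gsd-<cmd>) for Windows path safety;
// the emitted frontmatter name uses the colon form (gsd:<cmd>).
function skillFrontmatterName(dirName) {
  // Only the leading "gsd-" prefix is rewritten; later hyphens are kept.
  return dirName.replace(/^gsd-/, 'gsd:');
}
```

Under this sketch `gsd-plan-phase` maps to `gsd:plan-phase`, which is the
form `Skill(skill="gsd:<cmd>")` calls expect.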

- Extract `skillFrontmatterName(dirName)` helper (exported for tests).
- Update claude-skills-migration and qwen-skills-migration assertions
  that encoded the old hyphen emission.
- Add `tests/bug-2643-skill-frontmatter-name.test.cjs` asserting every
  `Skill(skill="gsd:<cmd>")` reference in workflows resolves to an
  emitted frontmatter name.

Full suite: 5452/5452 passing.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 18:05:40 -04:00
Tom Boucher
06463860e4 fix(#2638): write sub_repos to canonical planning.sub_repos (#2668)
loadConfig's multiRepo migration and filesystem-sync writers targeted the
top-level parsed.sub_repos, but KNOWN_TOP_LEVEL (the unknown-key validator's
allowlist) only recognizes planning.sub_repos (canonical per #2561). Each
migration/sync therefore persisted a key the next loadConfig call warned was
unknown.

Redirect both writers to parsed.planning.sub_repos, ensuring parsed.planning
is initialized first. Also self-heal legacy/buggy installs by stripping any
stale top-level sub_repos on load, preserving its value as the
planning.sub_repos seed if that slot is empty.
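
A sketch of the self-heal step, assuming "empty" means missing or a
zero-length array (`healSubReposSketch` is a hypothetical name; the real
logic lives inside loadConfig):

```javascript
// Moves a stale top-level sub_repos into planning.sub_repos when that slot
// is empty, and always removes the top-level key.
function healSubReposSketch(parsed) {
  const out = { ...parsed, planning: { ...(parsed.planning || {}) } };
  if (Object.prototype.hasOwnProperty.call(out, 'sub_repos')) {
    const stale = out.sub_repos;
    delete out.sub_repos;
    const current = out.planning.sub_repos;
    const slotEmpty = !Array.isArray(current) || current.length === 0;
    if (slotEmpty && Array.isArray(stale) && stale.length > 0) {
      out.planning.sub_repos = stale; // preserve the value as the seed
    }
  }
  return out;
}
```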

Tests cover: (a) canonical planning.sub_repos emits no warning, (b) multiRepo
migration writes to planning.sub_repos with no top-level residue,
(c) filesystem sync relocates to planning.sub_repos, (d) stale top-level
sub_repos from older buggy installs is stripped on load.

Closes #2638
2026-04-24 18:05:33 -04:00
Tom Boucher
259c1d07d3 fix(#2647): guard tarball ships sdk/dist so gsd-sdk query works (#2671)
v1.38.3 shipped without sdk/dist/ because the outer `files` whitelist
and `prepublishOnly` chain had drifted. The `gsd-sdk` bin shim then
fell through to a stale @gsd-build/sdk@0.1.0 (pre-`query`), breaking
every workflow that called `gsd-sdk query <noun>` on fresh installs.

Current package.json already restores `sdk/dist` + `build:sdk`
prepublish; this PR locks the fix in with:

- tests/bug-2647-outer-tarball-sdk-dist.test.cjs — asserts `files`
  includes `sdk/dist`, `prepublishOnly` invokes `build:sdk`, the
  shim resolves sdk/dist/cli.js, `npm pack --dry-run` lists
  sdk/dist/cli.js, and the built CLI exposes a `query` subcommand.
- scripts/verify-tarball-sdk-dist.sh — packs, extracts, installs
  prod deps, and runs `node sdk/dist/cli.js query --help` against
  the real tarball output.
- .github/workflows/release.yml — runs the verify script in both
  next and stable release jobs before `npm publish`.

Partial fix for #2649 (same root cause on the sibling sdk package).

Fixes #2647
2026-04-24 18:05:18 -04:00
Tom Boucher
387c8a1f9c fix(#2653): eliminate SDK↔CJS config-schema drift (#2670)
The SDK's config-set kept its own hand-maintained allowlist (28-key
drift vs. get-shit-done/bin/lib/config-schema.cjs), so documented
keys accepted by the CJS config-set — planning.sub_repos,
workflow.code_review_command, workflow.security_*, review.models.*,
model_profile_overrides.*, etc. — were rejected with
"Unknown config key" when routed through the SDK.

Changes:
- New sdk/src/query/config-schema.ts mirrors the CJS schema exactly
  (exact-match keys + dynamic regex sources).
- config-mutation.ts imports VALID_CONFIG_KEYS / DYNAMIC_KEY_PATTERNS
  from the shared module instead of rolling its own set and regex
  branches.
- Drop hand-coded agent_skills.* / features.* regex branches —
  now schema-driven so claude_md_assembly.blocks.*, review.models.*,
  and model_profile_overrides.<runtime>.<tier> are also accepted.
- Add tests/config-schema-sdk-parity.test.cjs (node:test) as the
  CI drift guard: asserts CJS VALID_CONFIG_KEYS set-equals the
  literal set parsed from config-schema.ts, and that every CJS
  dynamic pattern source has an identical counterpart in the SDK.
  Parallel to the CJS↔docs parity added in #2479.
- Vitest #2653 specs iterate every CJS key through the SDK
  validator, spot-check each dynamic pattern, and lock in
  planning.sub_repos.
- While here: add workflow.context_coverage_gate to the CJS schema
  (already in docs and SDK; CJS previously rejected it) and sync
  the missing curated typo-suggestions (review.model, sub_repos,
  plan_checker, workflow.review_command) into the SDK.

Fixes #2653.
2026-04-24 18:05:16 -04:00
Tom Boucher
e973ff4cb6 fix(#2630): reset STATE.md frontmatter atomically on milestone switch (#2666)
The /gsd:new-milestone workflow Step 5 rewrote STATE.md's Current Position
body but never touched the YAML frontmatter, so every downstream reader
(state.json, getMilestoneInfo, progress bars) kept reporting the stale
milestone until the first phase advance forced a resync. Asymmetric with
milestone.complete, which uses readModifyWriteStateMdFull.

Add a new `state milestone-switch` handler (both SDK and CJS) that atomically:
- Stomps frontmatter milestone/milestone_name with caller-supplied values
- Resets status to 'planning' and progress counters to zero
- Rewrites the ## Current Position section to the new-milestone template
- Preserves Accumulated Context (decisions, blockers, todos)

Wire the workflow Step 5 to invoke `state.milestone-switch` instead of the
manual body rewrite. Note the flag is `--milestone` not `--version`:
gsd-tools reserves `--version` as a globally-invalid help flag.

Red vitest in sdk/src/query/state-mutation.test.ts asserts the frontmatter
reset. Regression guard via node:test in tests/bug-2630-*.test.cjs runs
through gsd-tools end-to-end.

Fixes #2630

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 18:05:10 -04:00
Tom Boucher
8caa7d4c3a fix(#2649): installer fail-fast when sdk/dist missing in npx cache (#2667)
Root cause shared with #2647: a broken 1.38.3 tarball shipped without
sdk/dist/. The pre-#2441-decouple installer reacted by running
spawnSync('npm.cmd', ['install'], { cwd: sdkDir }) inside the npx cache
on Windows, where the cache is read-only, producing the misleading
"Failed to npm install in sdk/" error.

Defensive changes here (user-facing behavior only; packaging fix lives
in the sibling PR for #2647):

- Classify the install context (classifySdkInstall): detect npx cache
  paths, node_modules-based installs, and dev clones via path heuristics
  plus a side-effect-free write probe. Exported for test.
- Rewrite the dist-missing error to branch on context:
    tarball + npxCache -> "don't touch npx cache; npm i -g ...@latest"
    tarball (other)    -> upgrade path + clone-build escape hatch
    dev-clone          -> keep existing cd sdk && npm install && npm run build
- Preserve the invariant that the installer never shells out to
  npm install itself — users always drive that.
- Add tests/bug-2649-sdk-fail-fast.test.cjs covering the classifier and
  both failure messages, with spawnSync/execSync interceptors that
  assert no nested npm install is attempted.

Cross-ref: #2647 (packaging).

Fixes #2649

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 18:05:04 -04:00
forfrossen
a72bebb379 fix(workflows): agent-skills query keys must match subagent_type (follow-up to #2555) (#2616)
* fix(workflows): agent-skills query keys must match subagent_type

Eight workflow files called `gsd-sdk query agent-skills <KEY>` with
a key that did not match any `subagent_type` Task() spawns in the
same workflow (or any existing `agents/<KEY>.md`):

- research-phase.md:45 — gsd-researcher    → gsd-phase-researcher
- plan-phase.md:36     — gsd-researcher    → gsd-phase-researcher
- plan-phase.md:38     — gsd-checker       → gsd-plan-checker
- quick.md:145         — gsd-checker       → gsd-plan-checker
- verify-work.md:36    — gsd-checker       → gsd-plan-checker
- new-milestone.md:207 — gsd-synthesizer   → gsd-research-synthesizer
- new-project.md:63    — gsd-synthesizer   → gsd-research-synthesizer
- ui-review.md:21      — gsd-ui-reviewer   → gsd-ui-auditor
- discuss-phase.md:114 — gsd-advisor       → gsd-advisor-researcher

Effect before this fix: users configuring `agent_skills.<correct-type>`
in .planning/config.json got no injection on these paths because the
workflow asked the SDK for a different (non-existent) key. The SDK
correctly returned "" for the unknown key, which then interpolated as
an empty string into the Task() prompt. Silent no-op.

The discuss-phase advisor case is a subtle variant — the spawn site
uses `subagent_type="general-purpose"` and loads the agent role via
`Read(~/.claude/agents/gsd-advisor-researcher.md)`. The injection key
must follow the agent identity (gsd-advisor-researcher), not the
technical spawn type.

This is a follow-up to #2555 — the SDK-side fix in that PR (#2587)
only becomes fully effective once the call sites use the right keys.

Adds `sdk/src/workflow-agent-skills-consistency.test.ts` as a
contract test: every `agent-skills <slug>` invocation in
`get-shit-done/workflows/**/*.md` must reference an existing
`agents/<slug>.md`. Fails loudly on future key typos.

Closes #2615

* test: harden workflow agent-skills regex per review feedback

Review (#2616): CodeRabbit flagged the `agent-skills <slug>` pattern
as too permissive (can match prose mentions of the string) and the
per-line scan as brittle (misses commands wrapped across lines).

- Require full `gsd-sdk query agent-skills` prefix before capture
  + `\b` around the pattern so prose references no longer match.
- Scan each file's full content (not line-by-line) so `\s+` can span
  newlines; resolve 1-based line number from match index.
- Add JSDoc on helpers and on QUERY_KEY_PATTERN.
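
The hardened scan can be sketched like this. QUERY_KEY_PATTERN is the name
used in the test, but the regex body and `findAgentSkillKeys` here are
assumptions written for illustration:

```javascript
// Requires the full command prefix, so prose mentions of "agent-skills"
// alone do not match; \s+ may span newlines for wrapped commands.
const QUERY_KEY_PATTERN = /\bgsd-sdk\s+query\s+agent-skills\s+([a-z][a-z0-9-]*)\b/g;

function findAgentSkillKeys(content) {
  const hits = [];
  for (const m of content.matchAll(QUERY_KEY_PATTERN)) {
    // 1-based line number of the match start: count newlines before it.
    const line = content.slice(0, m.index).split('\n').length;
    hits.push({ key: m[1], line });
  }
  return hits;
}
```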

Verified: RED against base (`f30da83`) produces the same 9 violations
as before; GREEN on fixed tree.

---------

Co-authored-by: forfrossen <forfrossensvart@gmail.com>
2026-04-23 12:40:56 -04:00
Tom Boucher
31569c8cc8 ci: explicit rebase check + fail-fast SDK typecheck in install-smoke (#2631)
* ci: explicit rebase check + fail-fast SDK typecheck in install-smoke

Stale-base regression guard. Root cause: GitHub's `refs/pull/N/merge`
is cached against the PR's recorded merge-base, not current main. When
main advances after a PR is opened, the cache stays stale and CI runs
against the pre-advance tree. PRs hit this whenever a type error lands
on main and gets patched shortly after (e.g. #2611 + #2622) — stale
branches replay the broken intermediate state and report confusing
downstream failures for hours.

Observed failure mode: install-smoke's "Assert gsd-sdk resolves on PATH"
step fires with "installSdkIfNeeded() regression" even when the real
cause is `npm run build` failing in sdk/ due to a TypeScript cast
mismatch already fixed on main.

Fix:
- Explicit `git merge origin/main` step in both `install-smoke.yml` and
  `test.yml`. If the merge conflicts, emit a clear "rebase onto main"
  diagnostic and fail early, rather than let conflicts produce unrelated
  downstream errors.
- Dedicated `npm run build:sdk` typecheck step in install-smoke with a
  remediation hint ("rebase onto main — the error may already be fixed
  on trunk"). Fails fast with the actual tsc output instead of masking
  it behind a PATH assertion.
- Drop the `|| true` on `get-shit-done-cc --claude --local` so installer
  failures surface at the install step with install.js's own error
  message, not at the downstream PATH assertion where the message
  misleadingly blames "shim regression".
- `fetch-depth: 0` on checkout so the merge-base check has history.

* ci: address CodeRabbit — add rebase check to smoke-unpacked, fix fetch flag

Two findings from CodeRabbit's review on #2631:

1. `smoke-unpacked` job was missing the same rebase check applied to the
   `smoke` job. It ran on the cached `refs/pull/N/merge` and could hit
   the same stale-base failure mode the PR was designed to prevent. Added
   the identical rebase-check step.

2. `git fetch origin main --depth=0` is an invalid flag — git rejects it
   with "depth 0 is not a positive number". The intent was "fetch with
   full depth", but the right way is just `git fetch origin main` (no
   --depth). Removed the invalid flag and the `||` fallback that was
   papering over the error.
2026-04-23 12:40:16 -04:00
Tom Boucher
eba0c99698 fix(#2623): resolve parent .planning root for sub_repos workspaces in SDK query dispatch (#2629)
* fix(#2623): resolve parent .planning root for sub_repos workspaces in SDK query dispatch

When `gsd-sdk query` is invoked from inside a `sub_repos`-listed child repo,
`projectDir` defaulted to `process.cwd()` which pointed at the child repo,
not the parent workspace that owns `.planning/`. Handlers then directly
checked `${projectDir}/.planning` and reported `project_exists: false`.

The legacy `gsd-tools.cjs` CLI does not have this gap — it calls
`findProjectRoot(cwd)` from `bin/lib/core.cjs`, which walks up from the
starting directory checking each ancestor's `.planning/config.json` for a
`sub_repos` entry that lists the starting directory's top-level segment.

This change ports that walk-up as a new `findProjectRoot` helper in
`sdk/src/query/helpers.ts` and applies it once in `cli.ts:main()` before
dispatching `query`, `run`, `init`, or `auto`. Resolution is idempotent:
if `projectDir` already owns `.planning/` (including an explicit
`--project-dir` pointing at the workspace root), the helper returns it
unchanged. The walk is capped at 10 parent levels and never crosses
`$HOME`. All filesystem errors are swallowed.
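
A simplified sketch of the walk-up follows. It is not the ported helper:
the name, signature, and the reading of "never crosses `$HOME`" (stop once
the current directory is the home directory) are assumptions, and the
heuristic fallback and legacy `multiRepo` handling are left out.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

function findProjectRootSketch(startDir, homeDir = os.homedir()) {
  const start = path.resolve(startDir);
  const home = path.resolve(homeDir);
  // Idempotent guard: a directory that owns .planning/ is returned as-is.
  if (fs.existsSync(path.join(start, '.planning'))) return start;

  let dir = start;
  for (let level = 0; level < 10; level++) { // capped at 10 parent levels
    const parent = path.dirname(dir);
    // Stop at the filesystem root, and do not climb above the home directory.
    if (parent === dir || dir === home) break;
    dir = parent;
    try {
      const cfg = JSON.parse(
        fs.readFileSync(path.join(dir, '.planning', 'config.json'), 'utf8')
      );
      const subRepos = (cfg.planning && cfg.planning.sub_repos) || cfg.sub_repos || [];
      // Top-level path segment of the start dir, relative to this ancestor.
      const topSegment = path.relative(dir, start).split(path.sep)[0];
      if (Array.isArray(subRepos) && subRepos.includes(topSegment)) return dir;
    } catch {
      // Missing or unparseable config: ignore and keep walking.
    }
  }
  return start; // no owning workspace found
}
```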

Regression coverage:
- `helpers.test.ts` — 8 unit tests covering own-`.planning` guard (#1362),
  sub_repos match, nested-path match, `planning.sub_repos` shape,
  heuristic fallback, unparseable config, legacy `multiRepo: true`.
- `sub-repos-root.integration.test.ts` — end-to-end baseline (reproduces
  the bug without the walk-up) and fixed behavior (walk-up + dispatch of
  `init.new-milestone` reports `project_exists: true` with the parent
  workspace as `project_root`).

sdk vitest: 1511 pass / 24 fail (all 24 failures pre-existing on main,
baseline is 26 failing — `comm -23` against baseline produces zero new
failures). CJS: 5410 pass / 0 fail.

Closes #2623

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(#2623): remove stray .planing typo from integration test setup

Address CodeRabbit nitpick: the mkdir('.planing') call on line 23 was
dead code from a typo, with errors silently swallowed via .catch(() => {}).
The test already creates '.planning' correctly on the next line.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 11:58:23 -04:00
Tom Boucher
5a8a6fb511 fix(#2256): pass per-agent model overrides through Codex/OpenCode transport (#2628)
The Codex and OpenCode install paths read `model_overrides` only from
`~/.gsd/defaults.json` (global). A per-project override set in
`.planning/config.json` — the reporter's exact setup for
`gsd-codebase-mapper` — was silently dropped, so the child agent inherited
the runtime's default model regardless of `model_overrides`.

Neither runtime has an inline `model` parameter on its spawn API
(Codex `spawn_agent(agent_type, message)`, OpenCode `task(description,
prompt, subagent_type, task_id, command)`), so the per-agent model must
reach the child via the static config GSD writes at install time. That
config was being populated from the wrong source.

Fix: add `readGsdEffectiveModelOverrides(targetDir)` which merges
`~/.gsd/defaults.json` with per-project `.planning/config.json`, with
per-project keys winning on conflict. Both install sites now call it and
walk up from the install root to locate `.planning/` — matching the
precedence `readGsdRuntimeProfileResolver` already uses for #2517.
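
The precedence can be sketched as a plain merge. This is an illustration
only: `mergeModelOverridesSketch` is a hypothetical name, both inputs are
assumed to be already-parsed JSON, and the model IDs in any usage are
placeholders.

```javascript
// globalDefaults: parsed ~/.gsd/defaults.json (or null)
// projectConfig:  parsed .planning/config.json (or null)
function mergeModelOverridesSketch(globalDefaults, projectConfig) {
  return {
    ...((globalDefaults && globalDefaults.model_overrides) || {}),
    // Spread last, so per-project keys win on conflict.
    ...((projectConfig && projectConfig.model_overrides) || {}),
  };
}
```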

Also update the Codex Task()->spawn_agent mapping block so it no longer
says "omit" without context: it now documents that per-agent overrides
are embedded in the agent TOML and notes the restriction that Codex
only permits `spawn_agent` when the user explicitly requested sub-agents
(do the work inline otherwise).

Regression tests (`tests/bug-2256-model-overrides-transport.test.cjs`)
cover: global-only, project-only, project-wins-on-conflict, walking up
from a nested `targetDir`, Codex TOML `model =` emission, and OpenCode
frontmatter `model:` emission.

Closes #2256

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 11:58:06 -04:00
Tom Boucher
bdba40cc3d fix(#2618): thread --ws through query dispatch and sync root STATE.md on workstream.set (#2627)
* fix(#2618): thread --ws through query dispatch for state and init handlers

Gap 1 of #2618: the query dispatcher already accepts a workstream via
registry.dispatch(cmd, args, projectDir, ws), but several handlers drop it
before reaching planningPaths() / getMilestoneInfo() / findPhase() — so
stateJson and the init.* handlers return root-scoped results even when --ws
is provided.

Changes:

- sdk/src/query/state.ts: forward workstream into getMilestoneInfo() and
  extractCurrentMilestone() so buildStateFrontmatter resolves milestone data
  from the workstream ROADMAP/STATE instead of the root mirror.
- sdk/src/query/init.ts: thread workstream through initExecutePhase,
  initPlanPhase, initPhaseOp, and getPhaseInfoWithFallback (which fans out
  to findPhase() and roadmapGetPhase()). Also switch hardcoded
  join(projectDir, '.planning') to relPlanningPath(workstream) so returned
  state_path/roadmap_path/config_path reflect the workstream layout.

Regression test: stateJson with --ws workstream reads STATE.md from
.planning/workstreams/<name>/ when workstream is provided.

Closes #2618 (gap 1)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(#2618): sync root .planning/STATE.md mirror on workstream.set

Gap 2 of #2618: setActiveWorkstream only flips the active-workstream
pointer file; the root .planning/STATE.md mirror stays stale. Downstream
consumers (statusline, gsd-sdk query progress, any tool that reads the
root STATE.md) continue to see the previous workstream's state.

After setActiveWorkstream(), copy .planning/workstreams/<name>/STATE.md
verbatim to .planning/STATE.md via writeFileSync. The workstream STATE.md
is authoritative; the root file is a pass-through mirror. Missing source
STATE.md is a no-op rather than an error — a freshly created workstream
with no STATE.md yet should still activate cleanly.

The response now includes `mirror_synced: boolean` so callers can
observe whether the root mirror was updated.
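
A minimal sketch of the mirror step, assuming the paths described above
(`syncRootStateMirrorSketch` is a hypothetical name; the real handler also
flips the active-workstream pointer first):

```javascript
const fs = require('node:fs');
const path = require('node:path');

function syncRootStateMirrorSketch(projectDir, workstream) {
  const src = path.join(projectDir, '.planning', 'workstreams', workstream, 'STATE.md');
  const dest = path.join(projectDir, '.planning', 'STATE.md');
  // A workstream without STATE.md yet is a no-op, not an error.
  if (!fs.existsSync(src)) return { mirror_synced: false };
  // Verbatim copy: the workstream file is authoritative.
  fs.writeFileSync(dest, fs.readFileSync(src));
  return { mirror_synced: true };
}
```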

Regression test: workstreamSet root STATE.md mirror sync — switches
from a stale root mirror to a workstream STATE.md with different
frontmatter and asserts the root file now matches.

Closes #2618 (gap 2)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 11:54:34 -04:00
Tom Boucher
df0ab0c0c9 fix(#2410): emit wave + plan checkpoint heartbeats to prevent stream idle timeout (#2626)
/gsd:manager's background execute-phase Task fails with
"Stream idle timeout - partial response received" on multi-plan phases
(Claude Code + Opus 4.7 at ~200K+ cache_read) because the long subagent
never emits tokens fast enough between large tool_results — the SSE layer
times out mid-assistant-turn and the harness retries hit the same TTFT
wall after prompt cache TTL expires.

Root cause: no orchestrator-level activity at wave/plan boundaries.

Fix (maintainer-approved A+B):
- A (wave boundary): execute-phase.md now emits a `[checkpoint]`
  heartbeat before each wave spawns and after each wave completes.
- B (plan boundary): also emit `[checkpoint]` before each Task()
  dispatch and after each executor returns (complete/failed/checkpoint).
  Heartbeats are literal assistant-text lines (no tool call) with a
  monotonic `{P}/{Q} plans done` counter so partial-transcript recovery
  tools can grep progress even when a run dies mid-phase.

Docs: COMMANDS.md /gsd-manager section documents the marker format.
Tests: tests/bug-2410-stream-checkpoint-heartbeats.test.cjs (12 cases)
asserts the heartbeats exist at every boundary and in the right workflow
step. Full suite: 5422 node:test cases pass. Pre-existing vitest
failures on main are unrelated to this change.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 11:54:11 -04:00
Tom Boucher
807db75d55 fix(#2620): detect HOME-relative PATH entries before suggesting absolute export (#2625)
* fix(#2620): detect HOME-relative PATH entries before suggesting absolute export

When the installer reported `gsd-sdk` not on PATH and suggested
appending an absolute `export PATH="/home/user/.npm-global/bin:$PATH"`
line to the user's rc file, a user who had the equivalent
`export PATH="$HOME/.npm-global/bin:$PATH"` already in their shell
profile would get a duplicate entry — the installer only compared the
absolute form.

Add `homePathCoveredByRc(globalBin, homeDir, rcFileNames?)` to
`bin/install.js` and export it for test-mode callers. The helper scans
`~/.zshrc`, `~/.bashrc`, `~/.bash_profile`, `~/.profile`, grepping each
file for `export PATH=` / bare `PATH=` lines and substituting the
common HOME forms ($HOME, ${HOME}, leading ~/) with the real home
directory before comparing each resolved PATH segment against
globalBin. Trailing slashes are normalised so `.npm-global/bin/`
matches `.npm-global/bin`. Missing / unreadable / malformed rc files
are swallowed — the caller falls back to the existing absolute
suggestion.

Tests cover $HOME, ${HOME}, and ~/ forms, absolute match,
trailing-slash match, commented-out lines, missing rc files, and
unreadable rc files (directory where a file is expected).

Closes #2620

* fix(#2620): skip relative PATH segments in homePathCoveredByRc

CodeRabbit flagged that the helper unconditionally resolved every
non-$-containing segment against homeAbs via path.resolve(homeAbs, …),
which silently turns a bare relative segment like `bin` or
`node_modules/.bin` into `$HOME/bin` / `$HOME/node_modules/.bin`. That
is wrong: bare PATH segments depend on the shell's cwd at lookup time,
not on $HOME — so the helper was returning true for rc files that do
not actually cover globalBin.

Guard the compare with path.isAbsolute(expanded) after HOME expansion.
Only segments that are absolute on their own (or that became absolute
via $HOME / ${HOME} / ~ substitution) are compared against targetAbs.
Relative segments are skipped.
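
The segment check can be sketched on rc-file text directly, assuming POSIX
paths. `rcTextCoversDir` is a hypothetical, file-free stand-in for
homePathCoveredByRc, and the line-matching regex is an assumption:

```javascript
const path = require('node:path');

function rcTextCoversDir(rcText, globalBin, homeDir) {
  const target = path.resolve(globalBin); // resolve() drops a trailing slash
  for (const rawLine of rcText.split('\n')) {
    const line = rawLine.trim();
    if (line.startsWith('#')) continue; // commented-out lines do not count
    const m = /^(?:export\s+)?PATH=(.*)$/.exec(line);
    if (!m) continue;
    const value = m[1].replace(/^["']|["']$/g, ''); // strip outer quotes
    for (const seg of value.split(':')) {
      const expanded = seg
        .replace(/^~(?=\/|$)/, () => homeDir)
        .replace(/\$\{HOME\}|\$HOME\b/g, () => homeDir);
      if (expanded.includes('$')) continue; // unresolved vars such as $PATH
      // Bare relative segments depend on the shell's cwd, not on $HOME.
      if (!path.isAbsolute(expanded)) continue;
      if (path.resolve(expanded) === target) return true;
    }
  }
  return false;
}
```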

Add two regression tests covering a bare `bin` segment and a nested
`node_modules/.bin` segment; both previously returned true when home
happened to contain a matching subdirectory and now correctly return
false.

Closes #2620 (CodeRabbit follow-up)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(#2620): wire homePathCoveredByRc into installer suggestion path

CodeRabbit flagged that homePathCoveredByRc was added in the previous
commit but never called from the installer, so the user-facing PATH
warning stayed unchanged — users with `export PATH="$HOME/.npm-global/bin:$PATH"`
in their rc would still get a duplicate absolute-path suggestion.

Add `maybeSuggestPathExport(globalBin, homeDir)` that:
- skips silently when globalBin is already on process.env.PATH;
- prints a "try reopening your shell" diagnostic when homePathCoveredByRc
  returns true (the directory IS on PATH via an rc entry — just not in
  the current shell);
- otherwise falls through to the absolute-path
  `echo 'export PATH="…:$PATH"' >> ~/.zshrc` suggestion.

Call it from installSdkIfNeeded after the sdk/dist check succeeds,
resolving globalBin via `npm prefix -g` (plus `/bin` on POSIX). Swallow
any exec failure so the installer keeps working when `npm prefix -g` fails.

Export maybeSuggestPathExport for tests. Add three new regression tests
(installer-flow coverage per CodeRabbit nitpick):
- rc covers globalBin via $HOME form → no absolute suggestion emitted
- rc covers only an unrelated directory → absolute suggestion emitted
- globalBin already on process.env.PATH → no output at all

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 11:53:51 -04:00
Tom Boucher
74da61fb4a fix(#2619): prevent extractCurrentMilestone from truncating on phase-vX.Y headings (#2624)
* fix(#2619): prevent extractCurrentMilestone from truncating on phase-vX.Y headings

extractCurrentMilestone sliced ROADMAP.md to the current milestone by
looking for the next milestone heading with a greedy regex:

    ^#{1,N}\s+(?:.*v\d+\.\d+||📋|🚧)

Any heading that mentioned a version literal matched — including phase
headings like "### Phase 12: v1.0 Tech-Debt Closure". When the current
milestone was at the same heading level as the phases (### 🚧 v1.1 …),
the slice terminated at the first such phase, hiding every phase that
followed from phase.insert, validate.health W007, and other SDK commands.

Fix: add a `(?!Phase\s+\S)` negative lookahead so phase headings can
never be treated as milestone boundaries. Phase headings always start
with the literal `Phase `, so this is a clean exclusion.

Applied to:
- get-shit-done/bin/lib/core.cjs (extractCurrentMilestone)
- sdk/src/query/roadmap.ts (extractCurrentMilestone + extractNextMilestoneSection)

Regression tests:
- tests/roadmap-phase-fallback.test.cjs: extractCurrentMilestone does not
  truncate on phase heading containing vX.Y (#2619)
- sdk/src/query/roadmap.test.ts: extractCurrentMilestone bug-2619: does
  not truncate at a phase heading containing vX.Y

Closes #2619

* fix(#2619): make milestone-boundary Phase lookahead case-insensitive

CodeRabbit follow-up on #2619: the negative lookahead `(?!Phase\s+\S)`
in the SDK milestone-boundary regex was case-sensitive, so headings like
`### PHASE 12: v1.0 Tech-Debt` or `### phase 12: …` still truncated the
milestone slice. Add the `i` flag (now `gmi`).

The sibling CJS regex in get-shit-done/bin/lib/core.cjs already uses the
`mi` flag, so it is already case-insensitive; added a regression test to
lock that in.
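
A minimal sketch of the boundary test with both the lookahead and the
`i` flag applied. The heading depth (1-3) and the two emoji alternatives
are assumptions made for the example; the real pattern is built inside
extractCurrentMilestone and may differ.

```javascript
// Sketch only: heading depth (1-3) and the emoji alternatives are assumptions.
// (?!Phase\s+\S) rejects any heading whose text starts with "Phase <something>".
const MILESTONE_BOUNDARY = /^#{1,3}\s+(?!Phase\s+\S)(?:.*v\d+\.\d+|📋|🚧)/im;

function isMilestoneBoundary(line) {
  // No `g` flag here, so test() carries no lastIndex state between calls.
  return MILESTONE_BOUNDARY.test(line);
}
```

With this shape, `### Phase 12: v1.0 Tech-Debt Closure` and its
upper-case variant are skipped, while `### 🚧 v1.1 …` still ends the
slice.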

- sdk/src/query/roadmap.ts: change flags from `gm` → `gmi`
- sdk/src/query/roadmap.test.ts: add PHASE/phase regression test
- tests/roadmap-phase-fallback.test.cjs: add PHASE/phase regression test

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 11:53:20 -04:00
Jeremy McSpadden
0a049149e1 fix(sdk): decouple from build-from-source install, close #2441 #2453 (#2457)
* fix(sdk): decouple SDK from build-from-source install path, close #2441 and #2453

Ship sdk/dist prebuilt in the tarball and replace the npm-install-g
sub-install with a parent-package bin shim (bin/gsd-sdk.js). npm chmods
bin entries from a packed tarball correctly, eliminating the mode-644
failure (#2453) and the full class of NPM_CONFIG_PREFIX/ignore-scripts/
corepack/air-gapped failure modes that caused #2439 and #2441.

Changes:
- sdk/package.json: prepublishOnly runs `rm -rf dist && tsc && chmod +x
  dist/cli.js` (stale-build guard + execute-bit fix at publish time)
- package.json: add "gsd-sdk": "bin/gsd-sdk.js" bin entry; add sdk/dist
  to files so the prebuilt CLI ships in the tarball
- bin/gsd-sdk.js: new back-compat shim — resolves sdk/dist/cli.js relative
  to the package root and delegates via `node`, so all existing PATH call
  sites (slash commands, agents, hooks) continue to work unchanged (S1 shim)
- bin/install.js: replace installSdkIfNeeded() build-from-source + global-
  install dance with a dist-verify + chmod-in-place guard; delete
  resolveGsdSdk(), detectShellRc(), emitSdkFatal() helpers now unused
- .github/workflows/install-smoke.yml: add smoke-unpacked job that strips
  execute bit from sdk/dist/cli.js before install to reproduce the exact
  #2453 failure mode
- tests/bug-2441-sdk-decouple.test.cjs: new regression tests asserting all
  invariants (no npm install -g from sdk/, shim exists, sdk/dist in files,
  prepublishOnly has rm -rf + chmod)
- tests/bugs-1656-1657.test.cjs: update stale assertions that required
  build-from-source behavior (now asserts new prebuilt-dist invariants)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore(release): bump to 1.38.2, wire release.yml to build SDK dist

- Bump version 1.38.1 -> 1.38.2 for the #2441/#2453 fix shipped in 0f6903d.
- Add `build:sdk` script (`cd sdk && npm ci && npm run build`).
- `prepublishOnly` now runs hooks + SDK builds as a safety net.
- release.yml (rc + finalize): build SDK dist before `npm publish` so the
  published tarball always ships fresh `sdk/dist/` (kept gitignored).
- CHANGELOG: document 1.38.2 entry and `--sdk` flag semantics change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: build SDK dist before tests and smoke jobs

sdk/dist/ is gitignored (built fresh at publish time via release.yml),
but both the test suite and install-smoke jobs run `bin/install.js`
or `npm pack` against the checked-out tree where dist doesn't exist yet.

- test.yml: `npm run build:sdk` before `npm run test:coverage`, so tests
  that spawn `bin/install.js` don't hit `installSdkIfNeeded()`'s fatal
  missing-dist check.
- install-smoke.yml (both smoke and smoke-unpacked): build SDK before
  pack/chmod so the published tarball contains dist and the unpacked
  install has a file to strip exec-bit from.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(sdk): lift SDK runtime deps to parent so tarball install can resolve them

The SDK's runtime deps (ws, @anthropic-ai/claude-agent-sdk) live in
sdk/package.json, but sdk/node_modules is NOT shipped in the parent
tarball — only sdk/dist, sdk/src, sdk/prompts, and sdk/package.json are.
When a user runs `npm install -g get-shit-done-cc`, npm installs the
parent's node_modules but never runs `npm install` inside the nested
sdk/ directory.

Result: `node sdk/dist/cli.js` fails with ERR_MODULE_NOT_FOUND for 'ws'.
The smoke tarball job caught this; the unpacked variant masked it
because `npm install -g <dir>` copies the entire workspace including
sdk/node_modules (left over from `npm run build:sdk`).

Fix: declare the same deps in the parent package.json so they land in
<pkg>/node_modules, which Node's resolution walks up to from
<pkg>/sdk/dist/cli.js. Keep them declared in sdk/package.json too so
the SDK remains a self-contained package for standalone dev.
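
To make the walk-up concrete, here is a rough sketch of the directories
Node consults for a bare specifier such as `ws`. It approximates the
documented node_modules lookup; `nodeModulesCandidates` is a hypothetical
helper and `/pkg` is a placeholder install root.

```javascript
const path = require('node:path');

// Sketch: list the node_modules directories Node checks, nearest first,
// when resolving a bare specifier from `fromFile` (POSIX paths assumed).
function nodeModulesCandidates(fromFile) {
  const dirs = [];
  let dir = path.posix.dirname(fromFile);
  while (true) {
    // Node does not append node_modules to a directory already named so.
    if (path.posix.basename(dir) !== 'node_modules') {
      dirs.push(path.posix.join(dir, 'node_modules'));
    }
    const parent = path.posix.dirname(dir);
    if (parent === dir) break; // reached the filesystem root
    dir = parent;
  }
  return dirs;
}
```

For `/pkg/sdk/dist/cli.js` the list includes `/pkg/sdk/node_modules`
(absent from the tarball) followed by `/pkg/node_modules`, which is where
the lifted deps land.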

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(lockfile): regenerate package-lock.json cleanly

The previous `npm install` run left the lockfile internally inconsistent
(resolved esbuild@0.27.7 referenced but not fully written), causing
`npm ci` to fail in CI with "Missing from lock file" errors.

Clean regen via rm + npm install fixes all three failed jobs
(test, smoke, smoke-unpacked), which were all hitting the same
`npm ci` sync check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(deps): remove unused esbuild + vitest from root devDependencies

Both were declared but never imported anywhere in the root package
(confirmed via grep of bin/, scripts/, tests/). They lived in sdk/
already, which is the only place they're actually used.

The transitive tree they pulled in (vitest → vite → esbuild 0.28 →
@esbuild/openharmony-arm64) was the root of the CI npm ci failures:
the openharmony platform package's `optional: true` flag was not being
applied correctly by npm 10 on Linux runners, causing EBADPLATFORM.

After removal: 800+ transitive packages → 155. Lockfile regenerated
cleanly. All 4170 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(sdk): pretest:coverage builds sdk; tighten shim test assertions

Add "pretest:coverage": "npm run build:sdk" so npm run test:coverage
works in clean checkouts where sdk/dist/ hasn't been built yet.

Tighten the two loose shim assertions in bug-2441-sdk-decouple.test.cjs:
- forwards-to test now asserts path.resolve() is called with the
  'sdk','dist','cli.js' path segments, not just substring presence
- node-invocation test now asserts spawnSync(process.execPath, [...])
  pattern, ruling out matches in comments or the shebang line

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: address PR review — pretest:coverage + tighten shim tests

Review feedback from trek-e on PR 2457:

1. pretest:coverage + pretest hooks now run `npm run build:sdk` so
   `npm run test[:coverage]` in a clean checkout produces the required
   sdk/dist/ artifacts before running the installer-dependent tests.
   CI already does this explicitly; local contributors benefit.

2. Shim tests in bug-2441-sdk-decouple.test.cjs tightened from loose
   substring matches (which would pass on comments/shebangs alone) to
   regex assertions on the actual path.resolve call, spawnSync with
   process.execPath, process.argv.slice(2), and process.exit pattern.
   These now provide real regression protection for #2453-class bugs.
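
For readers unfamiliar with the shim shape those assertions target, a
hedged sketch follows. It is not the shipped bin/gsd-sdk.js;
`shimInvocation` is a hypothetical helper that only builds the launch
plan so it can be checked without spawning anything.

```javascript
const path = require('node:path');

// Hypothetical sketch of the shim's launch plan; not the shipped bin/gsd-sdk.js.
function shimInvocation(packageRoot, argv) {
  const cli = path.resolve(packageRoot, 'sdk', 'dist', 'cli.js');
  // process.execPath is the running node binary, so no PATH lookup or shell.
  return { command: process.execPath, args: [cli, ...argv.slice(2)] };
}

// A real shim would then do roughly:
//   const { spawnSync } = require('node:child_process');
//   const r = spawnSync(command, args, { stdio: 'inherit' });
//   process.exit(r.status ?? 1);
```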

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: correct CHANGELOG entry and add [1.38.2] reference link

Two issues in the 1.38.2 CHANGELOG entry:
- installSdkIfNeeded() was described as deleted but it still exists in
  bin/install.js (repurposed to verify sdk/dist/cli.js and fix execute bit).
  Corrected the description to say 'repurposes' rather than 'deletes'.
- The reference-link block at the bottom of the file was missing a [1.38.2]
  compare URL and [Unreleased] still pointed to v1.37.1...HEAD. Added the
  [1.38.2] link and updated [Unreleased] to compare/v1.38.2...HEAD.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(sdk): double-cast WorkflowConfig to Record for strict tsc build

TypeScript error on main (introduced in #2611) blocks `npm run build`
in sdk/, which now runs as part of this PR's tarball build path. Apply
the double-cast via `unknown` as the compiler suggests.

Same fix as #2622; can be dropped if that lands first.

* test: remove bug-2598 test obsoleted by SDK decoupling

The bug-2598 test guards the Windows CVE-2024-27980 fix in the old
build-from-source path (npm spawnSync with shell:true + formatSpawnFailure
diagnostics). This PR removes that entire code path — installSdkIfNeeded
no longer spawns npm, it just verifies the prebuilt sdk/dist/cli.js
shipped in the tarball.

The test asserts `installSdkIfNeeded.toString()` contains a
formatSpawnFailure helper. After decoupling, no such helper exists
(nothing to format — there's no spawn). Keeping the test would assert
invariants of the rejected architecture.

The original #2598 defect (silent failure of npm spawn on Windows) is
structurally impossible in the shim path: bin/gsd-sdk.js invokes
`node sdk/dist/cli.js` directly via child_process.spawn with an
explicit argv array. No .cmd wrapper, no shell delegation.

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Tom Boucher <trekkie@nomorestars.com>
2026-04-23 08:36:03 -04:00
Tom Boucher
a56707a07b fix(#2613): preserve STATE.md frontmatter on write path (option 2) (#2622)
* fix(#2613): preserve STATE.md frontmatter on write path (option 2)

`readModifyWriteStateMd` strips frontmatter before invoking the modifier,
so `syncStateFrontmatter` received body-only content and `existingFm`
was always `{}`. The preservation branch never fired, and every mutation
re-derived `status` (to `'unknown'` when body had no `Status:` line) and
`progress.*` (to 0/0 when the shipped milestone's phase directories were
archived), silently overwriting authoritative frontmatter values.

Option 2 — write-side analogue of #2495 READ fix: `buildStateFrontmatter`
reads the current STATE.md frontmatter from disk as a preservation
backstop. Status preserved when derived is `'unknown'` and existing is
non-unknown. Progress preserved when disk scan returns all zeros AND
existing has non-zero counts. Legitimate body-driven status changes and
non-zero disk counts still win.
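
The preservation rules can be sketched as a small merge. This is an
illustration under assumptions: `preserveFrontmatter` is a hypothetical
name, and the `progress` field names are placeholders rather than the
real STATE.md schema.

```javascript
// Sketch under assumptions: field names (`status`, `progress.total`,
// `progress.completed`) are illustrative, not the real STATE.md schema.
function preserveFrontmatter(derived, existing) {
  const out = { ...derived };
  // Keep the on-disk status only when derivation produced nothing useful.
  if (derived.status === 'unknown' && existing.status && existing.status !== 'unknown') {
    out.status = existing.status;
  }
  const allZero = (p) => !p || Object.values(p).every((n) => n === 0);
  // Keep on-disk progress only when the disk scan found nothing at all.
  if (allZero(derived.progress) && !allZero(existing.progress)) {
    out.progress = existing.progress;
  }
  return out;
}
```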

Milestone/milestone_name already preserved via `getMilestoneInfo`'s
#2495 fix — regression test added to lock that in.

Adds 5 regression tests covering status preservation, progress
preservation, milestone preservation, legitimate status updates, and
disk-scan-wins-when-non-zero.

Closes #2613

* fix(sdk): double-cast WorkflowConfig to Record in loadGateConfig

TypeScript error on main (introduced in #2611) blocks the install-smoke
CI job: `WorkflowConfig` has no string index signature, so the direct
cast to `Record<string, unknown>` fails type-check. The SDK build fails,
`installSdkIfNeeded()` cannot install `gsd-sdk` from source, and the
smoke job reports a false-positive installer regression.

  src/query/check-decision-coverage.ts(236,16): error TS2352:
  Conversion of type 'WorkflowConfig' to type 'Record<string, unknown>'
  may be a mistake because neither type sufficiently overlaps with the
  other.

Apply the double-cast via `unknown` as the compiler suggests. Behavior
is unchanged — this was already a cast.
2026-04-23 08:22:42 -04:00
Tom Boucher
f30da8326a feat: add gates ensuring discuss-phase decisions are translated to plans and verified (closes #2492) (#2611)
* feat(#2492): add gates ensuring discuss-phase decisions are translated and verified

Two gates close the loop between CONTEXT.md `<decisions>` and downstream
work, fixing #2492:

- Plan-phase **translation gate** (BLOCKING). After requirements
  coverage, refuses to mark a phase planned when a trackable decision
  is not cited (by id `D-NN` or by 6+-word phrase) in any plan's
  `must_haves`, `truths`, or body. Failure message names each missed
  decision with id, category, text, and remediation paths.

- Verify-phase **validation gate** (NON-BLOCKING). Searches plans,
  SUMMARY.md, files modified, and recent commit subjects for each
  trackable decision. Misses are written to VERIFICATION.md as a
  warning section but do not change verification status. Asymmetry is
  deliberate — fuzzy-match miss should not fail an otherwise green
  phase.

Shared helper `parseDecisions()` lives in `sdk/src/query/decisions.ts`
so #2493 can consume the same parser.

Decisions opt out of both gates via `### Claude's Discretion` heading
or `[informational]` / `[folded]` / `[deferred]` tags.

Both gates skip silently when `workflow.context_coverage_gate=false`
(default `true`).
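
A rough sketch of the citation rule (id `D-NN` or a 6+-word phrase).
The normalization and the choice of the first six words as the phrase
are assumptions for the example; `isDecisionCited` is a hypothetical
name, not the SDK's.

```javascript
// Sketch: normalization rules here are assumptions, not the SDK's.
const normalize = (s) => s.toLowerCase().replace(/[^a-z0-9]+/g, ' ').trim();

function isDecisionCited(decision, planText) {
  // Word-boundary match so D-03 does not match D-030.
  const idPattern = new RegExp(`\\b${decision.id}\\b`);
  if (idPattern.test(planText)) return true;
  const words = normalize(decision.text).split(' ').filter(Boolean);
  if (words.length < 6) return false; // too short to soft-match safely
  const phrase = words.slice(0, 6).join(' ');
  return normalize(planText).includes(phrase);
}
```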

Closes #2492

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(#2492): make plan-phase decision gate actually block (review F1, F8, F9, F10, F15)

- F1: replace `${context_path}` with `${CONTEXT_PATH}` in the plan-phase
  gate snippet so the BLOCKING gate receives a non-empty path. The
  variable was defined in Step 4 (`CONTEXT_PATH=$(_gsd_field "$INIT" ...)`)
  and the gate snippet referenced the lowercase form, leaving the gate to
  run with an empty path argument and silently skip.
- F15: wrap the SDK call with `jq -e '.data.passed == true' || exit 1` so
  failure halts the workflow instead of being printed and ignored. The
  verify-phase counterpart deliberately keeps no exit-1 (non-blocking by
  design) and now carries an inline note documenting the asymmetry.
- F10: tag the JSON example fence as `json` and the options-list fence as
  `text` (MD040).
- F8/F9: anchor the heading-presence test regexes to `^## 13[a-z]?\\.` so
  prose substrings like "Requirements Coverage Gate" mentioned in body
  text cannot satisfy the assertion. Added two new regression tests
  (variable-name match, exit-1 guard) so a future revert is caught.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(#2492): tighten decision-coverage gates against false positives and config drift (review F3,F4,F5,F6,F7,F16,F18,F19)

- F3: forward `workstream` arg through both gate handlers so workstream-scoped
  `workflow.context_coverage_gate=false` actually skips. Added negative test
  that creates a workstream config disabling the gate while the root config
  has it enabled and asserts the workstream call is skipped.
- F4: restrict the plan-phase haystack to designated sections — front-matter
  `must_haves` / `truths` / `objective` plus body sections under headings
  matching `must_haves|truths|tasks|objective`. HTML comments and fenced
  code blocks are stripped before extraction so a commented-out citation or
  a literal example never counts as coverage. Verify-phase keeps the broader
  artifact-wide haystack by design (non-blocking).
- F5: reject decisions with fewer than 6 normalized words from soft-matching
  (previously only rejected when the resulting phrase was under 12 chars
  AFTER slicing — too lenient). Short decisions now require an explicit
  `D-NN` citation, with regression tests for the boundary.
- F6: walk every `*-SUMMARY.md` independently and use `matchAll` with the
  `/g` flag so multiple `files_modified:` blocks across multiple summaries
  are all aggregated. Previously only the first block in the concatenated
  string was parsed, silently dropping later plans' files.
- F7: validate every `files_modified` path stays inside `projectDir` after
  resolution (rejects absolute paths, `../` traversal). Cap each file read
  at 256 KB. Skipped paths emit a stderr warning naming the entry.
- F16: validate `workflow.context_coverage_gate` is boolean in
  `loadGateConfig`; warn loudly on numeric or other-shaped values and
  default to ON. Mirrors the schema-vs-loadConfig validation gap from
  #2609.
- F18: bump verify-phase `git log -n` cap from 50 to 200 so longer-running
  phases are not undercounted. Documented as a precision-vs-recall tradeoff
  appropriate for a non-blocking gate.
- F19: tighten `QueryResult` / `QueryHandler` to be parameterized
  (`<T = unknown>`). Drops the `as unknown as Record<string, unknown>`
  casts in the gate handlers and surfaces shape mismatches at compile time
  for callers that pass a typed `data` value.
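
As an illustration of the F7 containment check, a minimal sketch using
only `node:path`. `resolveInsideProject` is a hypothetical helper; the
256 KB read cap and the stderr warning are left out.

```javascript
const path = require('node:path');

// Sketch: accept only relative entries that stay inside projectDir.
function resolveInsideProject(projectDir, entry) {
  if (path.isAbsolute(entry)) return null; // absolute paths are rejected outright
  const root = path.resolve(projectDir);
  const resolved = path.resolve(root, entry);
  const rel = path.relative(root, resolved);
  // A relative path that starts with ".." has climbed out of the root.
  const escapes = rel === '..' || rel.startsWith('..' + path.sep) || path.isAbsolute(rel);
  return escapes ? null : resolved;
}
```

Comparing via `path.relative` after resolution catches traversal hidden
mid-path (for example `src/../../x`), which a plain prefix check on the
raw string would miss.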

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(#2492): harden decisions parser and verify-phase glob (review F11,F12,F13,F14,F17,F20)

- F11: strip fenced code blocks from CONTEXT.md before searching for
  `<decisions>` so an example block inside ``` ``` is not mis-parsed.
- F12: accept tab-indented continuation lines (previously required a leading
  space) so decisions split with `\t` continue cleanly.
- F13: parse EVERY `<decisions>` block in the file via `matchAll`, not just
  the first. CONTEXT.md may legitimately carry more than one block.
- F14: `decisions.parse` handler now resolves a relative path against
  `projectDir` — symmetric with the gate handlers — and still accepts
  absolute paths.
- F17: replace `ls "${PHASE_DIR}"/*-CONTEXT.md | head -1` in verify-phase.md
  with a glob loop (ShellCheck SC2012 fix). Also avoids spawning an extra
  subprocess and survives filenames with whitespace.
- F20: extend the unicode quote-stripping in the discretion-heading match
  to cover U+2018/2019/201A/201B and the U+201C-F double-quote variants
  plus backtick, so any rendering of "Claude's Discretion" collapses to
  the same key.
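
F11 through F13 together amount to roughly the following. This is a
sketch, not the parser in decisions.ts: `extractDecisions` is a
hypothetical name and the `- D-NN: text` entry format is an assumption.

```javascript
// Sketch only: the `- D-NN: text` entry format is an assumption.
function extractDecisions(markdown) {
  // F11: drop fenced code first so example blocks are never parsed.
  const withoutFences = markdown.replace(/^```[\s\S]*?^```[^\n]*$/gm, '');
  const decisions = [];
  // F13: matchAll visits every <decisions> block, not just the first.
  for (const block of withoutFences.matchAll(/<decisions>([\s\S]*?)<\/decisions>/g)) {
    // F12: \s* accepts tab as well as space indentation.
    for (const m of block[1].matchAll(/^\s*-\s*(D-\d+):\s*(.+)$/gm)) {
      decisions.push({ id: m[1], text: m[2].trim() });
    }
  }
  return decisions;
}
```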

Each fix has a regression test in `decisions.test.ts`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 00:26:53 -04:00
Tom Boucher
1a3d953767 feat: add unified post-planning gap checker (closes #2493) (#2610)
* feat: add unified post-planning gap checker (closes #2493)

Adds a unified post-planning gap checker as Step 13e of plan-phase.md.
After all plans are generated and committed, scans REQUIREMENTS.md and
CONTEXT.md <decisions> against every PLAN.md in the phase directory and
emits a single Source | Item | Status table.

Why
- The existing Requirements Coverage Gate (§13) blocks/re-plans on REQ
  gaps but emits two separate per-source signals. Issue #2493 asks for
  one unified report after planning so that requirements AND
  discuss-phase decisions slipping through are surfaced in one place
  before execution starts.

What
- New workflow.post_planning_gaps boolean config key, default true,
  added to VALID_CONFIG_KEYS, CONFIG_DEFAULTS, hardcoded.workflow, and
  cmdConfigSet (boolean validation).
- New get-shit-done/bin/lib/decisions.cjs — shared parser for
  CONTEXT.md <decisions> blocks (D-NN entries). Designed for reuse by
  the related #2492 plan/verify decision gates.
- New get-shit-done/bin/lib/gap-checker.cjs — parses REQUIREMENTS.md
  (checkbox + traceability table forms), reads CONTEXT.md decisions,
  walks PHASE_DIR/*-PLAN.md, runs word-boundary coverage detection
  (REQ-1 must not match REQ-10), formats a sorted report.
- New gsd-tools gap-analysis CLI command wired through gsd-tools.cjs.
- workflows/plan-phase.md gains §13e between §13d (commit plans) and
  §14 (Present Final Status). Existing §13 gate preserved — §13e is
  additive and non-blocking.
- sdk/prompts/workflows/plan-phase.md gets an equivalent
  post_planning_gaps step for headless mode.
- Docs: CONFIGURATION.md, references/planning-config.md, INVENTORY.md,
  INVENTORY-MANIFEST.json all updated.

Tests
- tests/post-planning-gaps-2493.test.cjs: 30 test cases covering step
  insertion position, decisions parser, gap detector behavior
  (covered/not-covered, false-positive guard, missing-file
  resilience, malformed-input resilience, gate on/off, deterministic
  natural sort), and full config integration.
- Full suite: 5234 / 5234 pass.

Design decisions
- Numbered §13e (sub-step), not §14 — §14 already exists (Present
  Final Status); inserting before it preserves downstream auto-advance
  step numbers.
- Existing §13 gate kept, not replaced — §13 blocks/re-plans on
  REQ gaps; §13e is the unified post-hoc report. Per spec: "default
  behavior MUST be backward compatible."
- Word-boundary ID matching avoids REQ-1 matching REQ-10 and avoids
  brittle semantic/substring matching.
- Shared decisions.cjs parser so #2492 can reuse the same regex.
- Natural-sort keys (REQ-02 before REQ-10) for deterministic output.
- Boolean validation in cmdConfigSet rejects non-boolean values,
  matching the precedent set by drift_threshold/drift_action.
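
A small sketch of the two matching/sorting choices above. Both helper
names are hypothetical, and IDs are assumed to look like PREFIX-NUMBER;
gap-checker.cjs may implement these differently.

```javascript
// Sketch: IDs are assumed to look like PREFIX-NUMBER (REQ-1, D-07).
function isCovered(id, planText) {
  // Escape regex metacharacters, then require word boundaries on both sides.
  const escaped = id.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp(`\\b${escaped}\\b`).test(planText);
}

function naturalCompare(a, b) {
  // Split "REQ-10" into ["REQ-", 10] so numbers compare numerically.
  const split = (id) => {
    const m = /^(.*?)(\d+)$/.exec(id);
    return m ? [m[1], Number(m[2])] : [id, 0];
  };
  const [pa, na] = split(a);
  const [pb, nb] = split(b);
  if (pa !== pb) return pa < pb ? -1 : 1;
  return na - nb;
}
```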

Closes #2493

* fix(#2493): expose post_planning_gaps in loadConfig() + sync schema example

Address CodeRabbit review on PR #2610:

- core.cjs loadConfig(): return post_planning_gaps from both the
  config.json branch and the global ~/.gsd/defaults.json fallback so
  callers can rely on config.post_planning_gaps regardless of whether
  the key is present (comment 3127977404, Major).
- docs/CONFIGURATION.md: add workflow.post_planning_gaps to the Full
  Schema JSON example so copy/paste users see the new toggle alongside
  security_block_on (comment 3127977392, Minor).
- tests/post-planning-gaps-2493.test.cjs: regression coverage for
  loadConfig() — default true when key absent, honors explicit
  true/false from workflow.post_planning_gaps.
2026-04-22 23:03:59 -04:00
Tom Boucher
cc17886c51 feat: make model profiles runtime-aware for Codex/non-Claude runtimes (closes #2517) (#2609)
* feat: make model profiles runtime-aware for Codex/non-Claude runtimes (closes #2517)

Adds an optional top-level `runtime` config key plus a
`model_profile_overrides[runtime][tier]` map. When `runtime` is set,
profile tiers (opus/sonnet/haiku) resolve to runtime-native model IDs
(and reasoning_effort where supported) instead of bare Claude aliases.

Codex defaults from the spec:
  opus   -> gpt-5.4        reasoning_effort: xhigh
  sonnet -> gpt-5.3-codex  reasoning_effort: medium
  haiku  -> gpt-5.4-mini   reasoning_effort: medium

Claude defaults mirror MODEL_ALIAS_MAP. Unknown runtimes fall back to
the Claude-alias safe default rather than emit IDs the runtime cannot
accept. reasoning_effort is only emitted into Codex install paths;
never returned from resolveModelInternal and never written to Claude
agent frontmatter.

Backwards compatible: any user without `runtime` set sees identical
behavior — the new branch is gated on `config.runtime != null`.

Precedence (highest to lowest):
  1. per-agent model_overrides
  2. runtime-aware tier resolution (when `runtime` is set)
  3. resolve_model_ids: "omit"
  4. Claude-native default
  5. inherit (literal passthrough)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(#2517): address adversarial review of #2609 (findings 1-16)

Addresses all 16 findings from the adversarial review of PR #2609.
Each finding is enumerated below with its resolution.

CRITICAL
- F1: readGsdRuntimeProfileResolver(targetDir) now probes per-project
  .planning/config.json AND ~/.gsd/defaults.json with per-project winning,
  so the PR's headline claim ("set runtime in project config and Codex
  TOML emit picks it up") actually holds end-to-end.
- F2: resolveTierEntry field-merges user overrides with built-in defaults.
  The CONFIGURATION.md string-shorthand example
    `{ codex: { opus: "gpt-5-pro" } }`
  now keeps reasoning_effort from the built-in entry. Partial-object
  overrides like `{ opus: { reasoning_effort: 'low' } }` keep the
  built-in model. Both paths regression-tested.
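
  The F2 field-merge can be pictured like this. A sketch only: the Codex
  tier values are the ones listed in the original commit message, while
  `mergeTierEntry` and the object shapes are illustrative assumptions,
  not the resolveTierEntry implementation.

```javascript
// Built-in Codex tiers as listed in the commit message; everything else
// (function name, shapes) is an illustrative assumption.
const CODEX_DEFAULTS = {
  opus: { model: 'gpt-5.4', reasoning_effort: 'xhigh' },
  sonnet: { model: 'gpt-5.3-codex', reasoning_effort: 'medium' },
  haiku: { model: 'gpt-5.4-mini', reasoning_effort: 'medium' },
};

function mergeTierEntry(tier, override) {
  const base = CODEX_DEFAULTS[tier];
  if (override == null) return { ...base };
  // String shorthand replaces only the model ID.
  const patch = typeof override === 'string' ? { model: override } : override;
  // Field-level merge: anything the override omits keeps its built-in value.
  return { ...base, ...patch };
}
```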

MAJOR
- F3: resolveReasoningEffortInternal gates strictly on the
  RUNTIMES_WITH_REASONING_EFFORT allowlist regardless of override
  presence. Override + unknown-runtime no longer leaks reasoning_effort.
- F4: runtime:"claude" is now a no-op for resolution (it is the implicit
  default). It no longer hijacks resolve_model_ids:"omit". Existing
  tests for `runtime:"claude"` returning Claude IDs were rewritten to
  reflect the no-op semantics; new test asserts the omit case returns "".
- F5: _readGsdConfigFile in install.js writes a stderr warning on JSON
  parse failure instead of silently returning null. Read failure and
  parse failure are warned separately. Library require is hoisted to top
  of install.js so it is not co-mingled with config-read failure modes.
- F6: install.js requires for core.cjs / model-profiles.cjs are hoisted
  to the top of the file with __dirname-based absolute paths so global
  npm install works regardless of cwd. Test asserts both lib paths exist
  relative to install.js __dirname.
- F7: docs/CONFIGURATION.md `runtime` row no longer lists `opencode` as
  a valid runtime — install-path emission for non-Codex runtimes is
  explicitly out of scope per #2517 / #2612, and the doc now points at
  #2612 for the follow-on work. resolveModelInternal still accepts any
  runtime string (back-compat) and falls back safely for unknown values.
- F8: Tests now isolate HOME (and GSD_HOME) to a per-test tmpdir so the
  developer's real ~/.gsd/defaults.json cannot bleed into assertions.
  Same pattern CodeRabbit caught on PRs #2603 / #2604.
- F9: `runtime` and `model_profile_overrides` documented as flat-only
  in core.cjs comments — not routed through `get()` because they are
  top-level keys per docs/CONFIGURATION.md and introducing nested
  resolution for two new keys was not worth the edge-case surface.
- F10/F13: loadConfig now invokes _warnUnknownProfileOverrides on the
  raw parsed config so direct .planning/config.json edits surface
  unknown runtime values (e.g. typo `runtime: "codx"`) and unknown
  tier values (e.g. `model_profile_overrides.codex.banana`) at read
  time. Warnings only — preserves back-compat for runtimes added
  later. Per-process warning cache prevents log spam across repeated
  loadConfig calls.

MINOR / NIT
- F11: Removed dead `tier || 'sonnet'` defensive shortcut. The local
  is now `const alias = tier;` with a comment explaining why `tier`
  is guaranteed truthy at that point (every MODEL_PROFILES entry
  defines `balanced`, the fallback profile).
- F12: Extracted resolveTierEntry() in core.cjs as the single source
  of truth for runtime-aware tier resolution. core.cjs and bin/install.js
  both consume it — no duplicated lookup logic between the two files.
- F14: Added regression tests for findings #1, #2, #3, #4, #6, #10, #13
  in tests/issue-2517-runtime-aware-profiles.test.cjs. Each must-fix
  path has a corresponding test that fails against the pre-fix code
  and passes against the post-fix code.
- F15: docs/CONFIGURATION.md `model_profile` row cross-references
  #1713 / #1806 next to the `adaptive` enum value.
- F16: RUNTIME_PROFILE_MAP remains in core.cjs as the single source of
  truth; install.js imports it through the exported resolveTierEntry
  helper rather than carrying its own copy. Doc files (CONFIGURATION.md,
  USER-GUIDE.md, settings.md) intentionally still embed the IDs as text
  — code comment in core.cjs flags that those doc files must be updated
  whenever the constant changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 23:00:37 -04:00
Tom Boucher
41dc475c46 refactor(workflows): extract discuss-phase modes/templates/advisor for progressive disclosure (closes #2551) (#2607)
* refactor(workflows): extract discuss-phase modes/templates/advisor for progressive disclosure (closes #2551)

Splits 1,347-line workflows/discuss-phase.md into a 495-line dispatcher plus
per-mode files in workflows/discuss-phase/modes/ and templates in
workflows/discuss-phase/templates/. Mirrors the progressive-disclosure
pattern that #2361 enforced for agents.

- Per-mode files: power, all, auto, chain, text, batch, analyze, default, advisor
- Templates lazy-loaded at the step that produces the artifact (CONTEXT.md
  template at write_context, DISCUSSION-LOG.md template at git_commit,
  checkpoint.json schema when checkpointing)
- Advisor mode gated behind `[ -f $HOME/.claude/get-shit-done/USER-PROFILE.md ]`
  — inverse of #2174's --advisor flag (don't pay the cost when unused)
- scout_codebase phase-type→map selection table extracted to
  references/scout-codebase.md
- New tests/workflow-size-budget.test.cjs enforces tiered budgets across
  all workflows/*.md (XL=1700 / LARGE=1500 / DEFAULT=1000) plus the
  explicit <500 ceiling for discuss-phase.md per #2551
- Existing tests updated to read from the new file locations after the
  split (functional equivalence preserved — content moved, not removed)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(#2607): align modes/auto.md check_existing with parent (Update it, not Skip)

CodeRabbit flagged drift between the parent step (which auto-selects "Update
it") and modes/auto.md (which documented "Skip"). The pre-refactor file had
both — line 182 said "Skip" in the overview, line 250 said "Update it" in the
actual step. The step is authoritative. Fix the new mode file to match.

Refs: PR #2607 review comment 3127783430

* test(#2607): harden discuss-phase regression tests after #2551 split

CodeRabbit identified four test smells where the split weakened coverage:

- workflow-size-budget: assertion was unreachable (entered if-block on match,
  then asserted occurrences === 0 — always failed). Now unconditional.
- bug-2549-2550-2552: bounded-read assertion checked concatenated source, so
  src.includes('3') was satisfied by unrelated content in scout-codebase.md
  (e.g., "3-5 most relevant files"). Now reads parent only with a stricter
  regex. Also asserts SCOUT_REF exists.
- chain-flag-plan-phase: filter(existsSync) silently skipped a missing
  modes/chain.md. Now fails loudly via explicit asserts.
- discuss-checkpoint: same silent-filter pattern across three sources. Now
  asserts each required path before reading.

Refs: PR #2607 review comments 3127783457, 3127783452, plus nitpicks for
chain-flag-plan-phase.test.cjs:21-24 and discuss-checkpoint.test.cjs:22-27

* docs(#2607): fix INVENTORY count, context.md placeholders, scout grep portability

- INVENTORY.md: subdirectory note said "50 top-level references" but the
  section header now says 51. Updated to 51.
- templates/context.md: footer hardcoded XX-name instead of declared
  placeholders [X]/[Name], which would leak sample text into generated
  CONTEXT.md files. Now uses the declared placeholders.
- references/scout-codebase.md: no-maps fallback used grep -rl with
  "\\|" alternation (GNU grep only — silent on BSD/macOS grep). Switched
  to grep -rlE with extended regex for portability.

Refs: PR #2607 review comments 3127783404, 3127783448, plus nitpick for
scout-codebase.md:32-40

* docs(#2607): label fenced examples + clarify overlay/advisor precedence

- analyze.md / text.md / default.md: add language tags (markdown/text) to
  fenced example blocks to silence markdownlint MD040 warnings flagged by
  CodeRabbit (one fence in analyze.md, two in text.md, five in default.md).
- discuss-phase.md: document overlay stacking rules in discuss_areas — fixed
  outer→inner order --analyze → --batch → --text, with a pointer to each
  overlay file for mode-specific precedence.
- advisor.md: add tie-breaker rules for NON_TECHNICAL_OWNER signals — explicit
  technical_background overrides inferred signals; otherwise OR-aggregate;
  contradictory explanation_depth values resolve by most-recent-wins.

Refs: PR #2607 review comments 3127783415, 3127783437, plus nitpicks for
default.md:24, discuss-phase.md:345-365, and advisor.md:51-56

* fix(#2607): extract codebase_drift_gate body to keep execute-phase under XL budget

PR #2605 added 80 lines to execute-phase.md (1622 -> 1702), pushing it over
the XL_BUDGET=1700 line cap enforced by tests/workflow-size-budget.test.cjs
(introduced by this PR). Per the test's own remediation hint and #2551's
progressive-disclosure pattern, extract the codebase_drift_gate step body to
get-shit-done/workflows/execute-phase/steps/codebase-drift-gate.md and leave
a brief pointer in the workflow. execute-phase.md is now 1633 lines.

Budget is NOT relaxed; the offending workflow is tightened.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 21:57:24 -04:00
Tom Boucher
220da8e487 feat: /gsd-settings-integrations — configure third-party search and review integrations (closes #2529) (#2604)
* feat(#2529): /gsd-settings-integrations — third-party integrations command

Adds /gsd-settings-integrations for configuring API keys, code-review CLI
routing, and agent-skill injection. Distinct from /gsd-settings (workflow
toggles) because these are connectivity, not pipeline shape.

Three sections:
- Search Integrations: brave_search / firecrawl / exa_search API keys,
  plus search_gitignored toggle.
- Code Review CLI Routing: review.models.{claude,codex,gemini,opencode}
  shell-command strings.
- Agent Skills Injection: agent_skills.<agent-type> free-text input,
  validated against [a-zA-Z0-9_-]+.

Security:
- New secrets.cjs module with ****<last-4> masking convention.
- cmdConfigSet now masks value/previousValue in CLI output for secret keys.
- Plaintext is written only to .planning/config.json; never echoed to
  stdout/stderr, never written to audit/log files by this flow.
- Slug validators reject path separators, whitespace, shell metacharacters.
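
The masking convention above could look roughly like the sketch below. It is not the actual secrets.cjs: the helper names and signatures are assumptions, the key list only mirrors the three search API keys named in this PR, and fully masking values of four characters or fewer is an illustrative choice.

```javascript
// Hypothetical sketch of a ****<last-4> masking helper; not the real secrets.cjs.
const SECRET_KEYS = new Set(['brave_search', 'firecrawl', 'exa_search']);

function isSecretKey(key) {
  return SECRET_KEYS.has(key);
}

function maskSecret(value) {
  // Non-string or empty values are passed through untouched.
  if (typeof value !== 'string' || value.length === 0) return value;
  // Assumption: values of four characters or fewer are fully masked.
  if (value.length <= 4) return '****';
  return '****' + value.slice(-4);
}

// What a CLI would print for a given key/value pair.
function displayValue(key, value) {
  return isSecretKey(key) ? maskSecret(value) : value;
}
```

Only the display path is masked in this sketch; the value written to disk would stay untouched.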

Tests (tests/settings-integrations.test.cjs — 25 cases):
- Artifact presence / frontmatter.
- Field round-trips via gsd-tools config-set for all four search keys,
  review.models.<cli>, agent_skills.<agent-type>.
- Config-merge safety: unrelated keys preserved across writes.
- Masking: config-set output never contains plaintext sentinel.
- Logging containment: plaintext secret sentinel appears only in
  config.json under .planning/, nowhere else on disk.
- Negative: path-traversal, shell-metachar, and empty-slug rejected.
- /gsd:settings workflow mentions /gsd:settings-integrations.

Docs:
- docs/COMMANDS.md: new command entry with security note.
- docs/CONFIGURATION.md: integration settings section (keys, routing,
  skills injection) with masking documentation.
- docs/CLI-TOOLS.md: reviewer CLI routing and secret-handling sections.
- docs/INVENTORY.md + INVENTORY-MANIFEST.json regenerated.

Closes #2529

* fix(#2529): mask secrets in config-get; address CodeRabbit review

cmdConfigGet was emitting plaintext for brave_search/firecrawl/exa_search.
Apply the same isSecretKey/maskSecret treatment used by config-set so the
CLI surface never echoes raw API keys; plaintext still lives only in
config.json on disk.

Also addresses CodeRabbit review items in the same PR area:
- #3127146188: config-get plaintext leak (root fix above)
- #3127146211: rename test sentinels to concat-built markers so secret
  scanners stop flagging the test file. Behavior preserved.
- #3127146207: add explicit 'text' language to fenced code blocks (MD040).
- nitpick: unify masked-value wording in read_current legend
  ('****<last-4>' instead of '**** already set').
- nitpick: extend round-trip test to cover search_gitignored toggle.

New regression test 'config-get masks secrets and never echoes plaintext'
verifies the fix for all three secret keys.

* docs(#2529): bump INVENTORY counts post-rebase (commands 84→85, workflows 82→83)

* fix(test): bump CLI Modules count 27→28 after rebase onto main (CI #24811455435)

PR #2604 was rebased onto main before #2605 (drift.cjs) merged. The
pull_request CI runs against the merge ref (refs/pull/2604/merge),
which now contains 28 .cjs files in get-shit-done/bin/lib/, but
docs/INVENTORY.md headline still said "(27 shipped)".

inventory-counts.test.cjs failed with:
  AssertionError: docs/INVENTORY.md "CLI Modules (27 shipped)" disagrees
  with get-shit-done/bin/lib/ file count (28)

Rebased branch onto current origin/main (picks up drift.cjs row, which
was already added by #2605) and bumped the headline to 28.

Full suite: 5200/5200 pass.
2026-04-22 21:41:00 -04:00
Tom Boucher
c90081176d fix(#2598): pass shell: true to npm spawnSync on Windows (#2600)
* fix(#2598): pass shell: true to npm spawnSync on Windows

Since Node's CVE-2024-27980 fix (>= 18.20.2 / >= 20.12.2 / >= 21.7.3),
spawnSync refuses to launch .cmd/.bat files on Windows without
`shell: true`. installSdkIfNeeded picks npmCmd='npm.cmd' on win32 and
then calls spawnSync five times — every one returns
{ status: null, error: EINVAL } before npm ever runs. The installer
checks `status !== 0`, trips the failure path, and emits a bare
"Failed to `npm install` in sdk/." with zero diagnostic output because
`stdio: 'inherit'` never had a child to stream.

Every fresh install on Windows has failed at the SDK build step on any
supported Node version for the life of the post-CVE bin/install.js.

Introduce a local `spawnNpm(args, opts)` helper inside
installSdkIfNeeded that injects `shell: process.platform === 'win32'`
when the caller doesn't override it. Route all five npm invocations
through it: `npm install`, `npm run build`, `npm install -g .`, and
both `npm config get prefix` calls.
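
A minimal sketch of the wrapper idea, assuming a free-standing helper rather than the one nested inside installSdkIfNeeded. `withShellDefault` is a hypothetical name introduced here so the option merging can be checked without spawning npm.

```javascript
const { spawnSync } = require('node:child_process');

// Merge a platform-dependent `shell` default into spawn options.
// A `shell` value supplied by the caller takes precedence.
function withShellDefault(opts = {}, platform = process.platform) {
  return { shell: platform === 'win32', ...opts };
}

// Hypothetical stand-in for the local helper described above.
function spawnNpm(args, opts) {
  const npmCmd = process.platform === 'win32' ? 'npm.cmd' : 'npm';
  return spawnSync(npmCmd, args, withShellDefault(opts));
}
```

A call such as `spawnNpm(['run', 'build'], { cwd: 'sdk', stdio: 'inherit' })` would then pick up `shell: true` only on Windows.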

Adds a static regression test that parses installSdkIfNeeded and
asserts no bare `spawnSync(npmCmd, ...)` remains, a shell-aware
wrapper exists, and at least five invocations go through it.

Closes #2598

* fix(#2598): surface spawnSync diagnostics in SDK install fatal paths

Thread result.error / result.signal / result.status into emitSdkFatal for
the three npm failure branches (install, run build, install -g .) via a
formatSpawnFailure helper. The root cause of #2598 went silent precisely
because `{ status: null, error: EINVAL }` was reduced to a generic
"Failed to `npm install` in sdk/." with no diagnostic — stdio: 'inherit'
had no child process to stream and result.error was swallowed. Any future
regression in the same area (EINVAL, ENOENT, signal termination) now
prints its real cause in the red fatal banner.

Also strengthen the regression test so it cannot pass with only four
real npm call sites: the previous `spawnSync(npmCmd, ..., shell)` regex
double-counted the spawnNpm helper's own body when a helper existed.
Separate arrow-form vs function-form helper detection and exclude the
wrapper body from explicitShellNpm so the `>= 5` assertion reflects real
invocations only. Add a new test that asserts all three fatal branches
now reference formatSpawnFailure / result.error / signal / status.

Addresses CodeRabbit review comments on PR #2600:
- r3126987409 (bin/install.js): surface underlying spawnSync failure
- r3126987419 (test): explicitShellNpm overcounts by one via helper def
2026-04-22 21:23:44 -04:00
Tom Boucher
1a694fcac3 feat: auto-remap codebase after significant phase execution (closes #2003) (#2605)
* feat: auto-remap codebase after significant phase execution (#2003)

Adds a post-phase structural drift detector that compares the committed tree
against `.planning/codebase/STRUCTURE.md` and either warns or auto-remaps
the affected subtrees when drift exceeds a configurable threshold.

## Summary
- New `bin/lib/drift.cjs` — pure detector covering four drift categories:
  new directories outside mapped paths, new barrel exports at
  `(packages|apps)/*/src/index.*`, new migration files, and new route
  modules. Prioritizes the most-specific category per file.
- New `verify codebase-drift` CLI subcommand + SDK handler, registered as
  `gsd-sdk query verify.codebase-drift`.
- New `codebase_drift_gate` step in `execute-phase` between
  `schema_drift_gate` and `verify_phase_goal`. Non-blocking by contract —
  any error logs and the phase continues.
- Two new config keys: `workflow.drift_threshold` (int, default 3) and
  `workflow.drift_action` (`warn` | `auto-remap`, default `warn`), with
  enum/integer validation in `config-set`.
- `gsd-codebase-mapper` learns an optional `--paths <p1,p2,...>` scope hint
  for incremental remapping; agent/workflow docs updated.
- `last_mapped_commit` lives in YAML frontmatter on each
  `.planning/codebase/*.md` file; `readMappedCommit`/`writeMappedCommit`
  round-trip helpers ship in `drift.cjs`.

## Tests
- 55 new tests in `tests/drift-detection.test.cjs` covering:
  classification, threshold gating at 2/3/4 elements, warn vs. auto-remap
  routing, affected-path scoping, `--paths` sanitization (traversal,
  absolute, shell metacharacter rejection), frontmatter round-trip,
  defensive paths (missing STRUCTURE.md, malformed input, non-git repos),
  CLI JSON output, and documentation parity.
- Full suite: 5044 pass / 0 fail.

## Documentation
- `docs/CONFIGURATION.md` — rows for both new keys.
- `docs/ARCHITECTURE.md` — section on the post-execute drift gate.
- `docs/AGENTS.md` — `--paths` flag on `gsd-codebase-mapper`.
- `docs/USER-GUIDE.md` — user-facing behavior note + toggle commands.
- `docs/FEATURES.md` — new 27a section with REQ-DRIFT-01..06.
- `docs/INVENTORY.md` + `docs/INVENTORY-MANIFEST.json` — drift.cjs listed.
- `get-shit-done/workflows/execute-phase.md` — `codebase_drift_gate` step.
- `get-shit-done/workflows/map-codebase.md` — `parse_paths_flag` step.
- `agents/gsd-codebase-mapper.md` — `--paths` directive under parse_focus.

## Design decisions
- **Frontmatter over sidecar JSON** for `last_mapped_commit`: keeps the
  baseline attached to the file, survives git moves, survives per-doc
  regeneration, no extra file lifecycle.
- **Substring match against STRUCTURE.md** for `isPathMapped`: the map is
  free-form markdown, not a structured manifest; any mention of a path
  prefix counts as "mapped territory". Cheap, no parser, zero false
  negatives on reasonable maps.
- **Category priority migration > route > barrel > new_dir** so a file
  matching multiple rules counts exactly once at the most specific level.
- **Empty-tree SHA fallback** (`4b825dc6…`) when `last_mapped_commit` is
  absent — semantically correct (no baseline means everything is drift)
  and deterministic across repos.
- **Four layers of non-blocking** — detector try/catch, CLI try/catch, SDK
  handler try/catch, and workflow `|| echo` shell fallback. Any single
  layer failing still returns a valid skipped result.
- **SDK handler delegates to `gsd-tools.cjs`** rather than re-porting the
  detector to TypeScript, keeping drift logic in one canonical place.

Closes #2003

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(mapper): tag --paths fenced block as text (CodeRabbit MD040)

Comment 3127255172.

* docs(config): use /gsd- dash command syntax in drift_action row (CodeRabbit)

Comment 3127255180. Matches the convention used by every other command
reference in docs/CONFIGURATION.md.

* fix(execute-phase): initialize AGENT_SKILLS_MAPPER + tag fenced blocks

Two CodeRabbit findings on the auto-remap branch of the drift gate:

- 3127255186 (must-fix): the mapper Task prompt referenced
  ${AGENT_SKILLS_MAPPER} but only AGENT_SKILLS (for gsd-executor) is
  loaded at init_context (line 72). Without this fix the literal
  placeholder string would leak into the spawned mapper's prompt.
  Add an explicit gsd-sdk query agent-skills gsd-codebase-mapper step
  right before the Task spawn.
- 3127255183: tag the warn-message and Task() fenced code blocks as
  text to satisfy markdownlint MD040.

* docs(map-codebase): wire PATH_SCOPE_HINT through every mapper prompt

CodeRabbit (review id 4158286952, comment 3127255190) flagged that the
parse_paths_flag step defined incremental-remap semantics but did not
inject a normalized variable into the spawn_agents and sequential_mapping
mapper prompts, so incremental remap could silently regress to a
whole-repo scan.

- Define SCOPED_PATHS / PATH_SCOPE_HINT in parse_paths_flag.
- Inject ${PATH_SCOPE_HINT} into all four spawn_agents Task prompts.
- Document the same scope contract for sequential_mapping mode.

* fix(drift): writeMappedCommit tolerates missing target file

CodeRabbit (review id 4158286952, drift.cjs:349-355 nitpick) noted that
readMappedCommit returns null on ENOENT but writeMappedCommit threw — an
asymmetry that breaks first-time stamping of a freshly produced doc that
the caller has not yet written.

- Catch ENOENT on the read; treat absent file as empty content.
- Add a regression test that calls writeMappedCommit on a non-existent
  path and asserts the file is created with correct frontmatter.
  Test was authored to fail before the fix (ENOENT) and passes after.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 21:21:44 -04:00
Tom Boucher
9c0a153a5f feat: /gsd-settings-advanced — power-user config tuning command (closes #2528) (#2603)
* feat: /gsd-settings-advanced — power-user config tuning command (closes #2528)

Adds a second-tier interactive configuration command covering the power-user
knobs that don't belong in the common-case /gsd-settings prompt. Six sectioned
AskUserQuestion batches cover planning, execution, discussion, cross-AI, git,
and runtime settings (19 config keys total). Current values are pre-selected;
numeric fields reject non-numeric input; writes route through
gsd-sdk query config-set so unrelated keys are preserved.

- commands/gsd/settings-advanced.md — command entry
- get-shit-done/workflows/settings-advanced.md — six-section workflow
- get-shit-done/workflows/settings.md — advertise advanced command
- get-shit-done/bin/lib/config-schema.cjs — add context_window to VALID_CONFIG_KEYS
- docs/COMMANDS.md, docs/CONFIGURATION.md, docs/INVENTORY.md — docs + inventory
- tests/gsd-settings-advanced.test.cjs — 81 tests (files, frontmatter,
  field coverage, pre-selection, merge-preserves-siblings, VALID_CONFIG_KEYS
  membership, confirmation table, /gsd-settings cross-link, negative scenarios)

All 5073 tests pass; coverage 88.66% (>= 70% threshold).

* docs(settings-advanced): clarify per-field numeric bounds and label fenced blocks

Addresses CodeRabbit review on PR #2603:
- Numeric-input rule now states min is field-specific: plan_bounce_passes
  and max_discuss_passes require >= 1; other numeric fields accept >= 0.
  Resolves the inconsistency between the global rule and the field-level
  prompts (CodeRabbit comment 3127136557).
- Adds 'text' fence language to seven previously unlabeled code blocks in
  the workflow (six AskUserQuestion sections plus the confirmation banner)
  to satisfy markdownlint MD040 (CodeRabbit comment 3127136561).

* test(settings-advanced): tighten section assertion, fix misleading test name, add executable numeric-input coverage

Addresses CodeRabbit review on PR #2603:
- Required section list now asserts the full 'Runtime / Output' heading
  rather than the looser 'Runtime' substring (comment 3127136564).
- Renames the subagent_timeout coercion test to match the actual key
  under test (was titled 'context_window' but exercised
  workflow.subagent_timeout — comment 3127136573).
- Adds two executable behavioral tests at the config-set boundary
  (comment 3127136579):
  * Non-numeric input on a numeric key currently lands as a string —
    locks in that the workflow's AskUserQuestion re-prompt loop is the
    layer responsible for type rejection. If a future change adds CLI-side
    numeric validation, the assertion flips and the test surfaces it.
  * Numeric string on workflow.max_discuss_passes is coerced to Number —
    locks in the parser invariant for a second numeric key.
2026-04-22 20:50:15 -04:00
Tom Boucher
86c5863afb feat: add settings layers to /gsd-settings (Group A toggles) (closes #2527) (#2602)
* feat(#2527): add settings layers to /gsd:settings (Group A toggles)

Expand /gsd:settings from 14 to 22 settings, grouped into six visual
sections: Planning, Execution, Docs & Output, Features, Model & Pipeline,
Misc. Adds 8 new toggles:

  workflow.pattern_mapper, workflow.tdd_mode, workflow.code_review,
  workflow.code_review_depth (conditional on code_review=on),
  workflow.ui_review, commit_docs, intel.enabled, graphify.enabled

All 8 keys already existed in VALID_CONFIG_KEYS and docs/CONFIGURATION.md;
this wires them into the interactive flow, update_config write step,
~/.gsd/defaults.json persistence, and confirmation table.

Closes #2527

* test(#2527): tighten leaf-collision and rename mismatched negative test

Addresses CodeRabbit findings on PR #2602:

- comment 3127100796: leaf-only matching collapsed `intel.enabled` and
  `graphify.enabled` to a single `enabled` token, so one occurrence
  could satisfy both assertions. Replace with hasPathLike(), which
  requires each dotted segment to appear in order within a bounded
  window. Applied to both update_config and save_as_defaults blocks.

- comment 3127100798: the negative-test description claimed to verify
  invalid `code_review_depth` value rejection but actually exercised an
  unknown key path. Split into two suites with accurate names: one
  asserts settings.md constrains the depth options, the other asserts
  config-set rejects an unknown key path.

* docs(#2527): clarify resolved config path for /gsd-settings

Addresses CodeRabbit comment 3127100790 on PR #2602: the original line
implied a single `.planning/config.json` target, but settings updates
route to `.planning/workstreams/<active>/config.json` when a workstream
is active. Document both resolved paths so the merge target is
unambiguous.
2026-04-22 20:49:52 -04:00
Tom Boucher
1f2850c1a8 fix(#2597): expand dotted query tokens with trailing args (#2599)
resolveQueryArgv only expanded `init.execute-phase` → `init execute-phase`
when the tokens array had length 1. Argv like `init.execute-phase 1` has
length 2, skipped the expansion, and resolved to no registered handler.

All 50+ workflow files use the dotted form with arguments, so this broke
every query route that takes arguments (`init.execute-phase`, `state.update`,
`phase.add`, `milestone.complete`, etc.) at runtime.

Rename `expandSingleDottedToken` → `expandFirstDottedToken`: split only
the first token on its dots (guarding against `--` flags) and preserve
the tail as positional args. Identity comparison at the call site still
detects "no expansion" because the input array is returned unchanged
whenever nothing is expanded.

Adds regression tests for the three failure patterns reported:
`init.execute-phase 1`, `state.update status X`, `phase.add desc`.
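
The behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the SDK source: the real function may handle more edge cases, and the empty-array guard is added here for safety.

```javascript
// Split only the first token on dots and keep the tail as positional args.
// Returns the same array instance when nothing is expanded, so callers can
// detect "no expansion" with an identity check.
function expandFirstDottedToken(tokens) {
  if (tokens.length === 0) return tokens;
  const [first, ...rest] = tokens;
  // Leave `--` flags and undotted commands alone.
  if (first.startsWith('--') || !first.includes('.')) return tokens;
  return [...first.split('.'), ...rest];
}
```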

Closes #2597
2026-04-22 17:30:08 -04:00
Tom Boucher
b35fdd51f3 Revert "feat(#2473): ship refuses to open PR when HANDOFF.json declares in-pr…" (#2596)
This reverts commit 7212cfd4de.
2026-04-22 12:57:12 -04:00
Fernando Castillo
7212cfd4de feat(#2473): ship refuses to open PR when HANDOFF.json declares in-progress work (#2553)
* feat(#2473): ship refuses to open PR when HANDOFF.json declares in-progress work

Add a preflight step to /gsd-ship that parses .planning/HANDOFF.json and
refuses to run git push + gh pr create when any remaining_tasks[].status
is not in the terminal set {done, cancelled, deferred_to_backend, wont_fix}.

Refusal names each blocking task and lists four resolutions (finish, mark
terminal, delete stale file, --force). Missing HANDOFF.json is a no-op so
projects that do not use /gsd-pause-work see no behavior change.
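
The terminal-status check can be sketched as below. `findBlockingTasks` is a hypothetical name, and treating a missing `remaining_tasks` array as "no tasks" is an assumption; the actual preflight runs inline in the ship workflow.

```javascript
const TERMINAL_STATUSES = new Set(['done', 'cancelled', 'deferred_to_backend', 'wont_fix']);

// Returns the tasks whose status is outside the terminal set.
// `handoff` is the already-parsed content of HANDOFF.json.
function findBlockingTasks(handoff) {
  const tasks = Array.isArray(handoff?.remaining_tasks) ? handoff.remaining_tasks : [];
  return tasks.filter((task) => !TERMINAL_STATUSES.has(task?.status));
}
```

An empty result would let the ship step proceed; a non-empty one supplies the task list for the refusal message.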

Also documents the terminal-statuses contract in references/artifact-types.md
and adds tests/ship-handoff-preflight.test.cjs to lock in the contract.

Closes #2473

* fix(#2473): capture node exit from $() so malformed HANDOFF.json hard-stops

Command substitution BLOCKING=$(node -e "...") discards the inner process
exit code, so a corrupted HANDOFF.json that fails JSON.parse would yield
empty BLOCKING and fall through silently to push_branch — the opposite of
what preflight is supposed to do.

Capture node's exit into HANDOFF_EXIT via $? immediately after the
assignment and branch on it. A non-zero exit is now a hard refusal with
the parser error printed on the preceding stderr line. --force does not
bypass this branch: if the file exists and can't be parsed, something is
wrong and the user should fix it (option 3 in the refusal message —
"Delete HANDOFF.json if it's stale" — still applies).

Verified with a tmp-dir simulation: captured exit 2, hard-stop fires
correctly on malformed JSON. Added a test case asserting the capture
($?) + branch (-ne 0) + parser exit (process.exit(2)) are all present,
so a future refactor can't silently reintroduce the bug.

Reported by @coderabbitai on PR #2553.
2026-04-22 12:11:31 -04:00
Tom Boucher
2b5c35cdb1 test(#2519): add regression test for sdk tarball dist inclusion (#2586)
* test(#2519): add regression test verifying sdk/package.json has files + prepublishOnly

Guards the sdk/package.json fix for #2519 (tarball shipped without dist/)
so future edits can't silently drop either the `files` whitelist or the
`prepublishOnly` build hook. Asserts:

- `files` is a non-empty array
- `files` includes "dist" (so compiled CLI ships in tarball)
- `scripts.prepublishOnly` runs a build (npm run build / tsc)
- `bin` target lives under dist/ (sanity tie-in)

Closes #2519

* test(#2519): accept valid npm glob variants for dist in files matcher

Addresses CodeRabbit nitpick: the previous equality check on 'dist' / 'dist/' /
'dist/**' would false-fail on other valid npm packaging forms like './dist',
'dist/**/*', or backslash-separated paths. Normalize each entry and use a
regex that accepts all common dist path variants.
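
One way the normalize-then-match step might look. `isDistEntry` and the exact regex are assumptions for illustration, not the test's actual code.

```javascript
// Hypothetical matcher: normalize a `files` entry, then test it against a dist pattern.
function isDistEntry(entry) {
  const normalized = String(entry)
    .replace(/\\/g, '/')   // backslash-separated paths -> forward slashes
    .replace(/^\.\//, ''); // drop a leading "./"
  // Accept "dist" itself or anything beneath "dist/".
  return /^dist(\/.*)?$/.test(normalized);
}
```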
2026-04-22 12:09:12 -04:00
Tom Boucher
73c1af5168 fix(#2543): replace legacy /gsd-<cmd> syntax with /gsd:<cmd> across all source files (#2595)
Commands are now installed as commands/gsd/<name>.md and invoked as
/gsd:<name> in Claude Code. The old hyphen form /gsd-<name> was still
hardcoded in hundreds of places across workflows, references, templates,
lib modules, and command files — causing "Unknown command" errors
whenever GSD suggested a command to the user.

Replace all /gsd-<cmd> occurrences where <cmd> is a known command name
(derived at runtime from commands/gsd/*.md) using a targeted Node.js
script. Agent names, tool names (gsd-sdk, gsd-tools), directory names,
and path fragments are not touched.

Adds regression test tests/bug-2543-gsd-slash-namespace.test.cjs that
enforces zero legacy occurrences going forward. Removes inverted
tests/stale-colon-refs.test.cjs (bug #1748) which enforced the now-obsolete
hyphen form; the new bug-2543 test supersedes it. Updates 5 assertion
tests that hardcoded the old hyphen form to accept the new colon form.

Closes #2543

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 12:04:25 -04:00
Tom Boucher
533973700c feat(#2538): add last: /cmd suffix to statusline (opt-in) (#2594)
Adds a `statusline.show_last_command` config toggle (default: false) that
appends ` │ last: /<cmd>` to the statusline, showing the most recently
invoked slash command in the current session.

The suffix is derived by tailing the active Claude Code transcript
(provided as transcript_path in the hook input) and extracting the last
<command-name> tag. Reads only the final 256 KiB to stay cheap per render.
Graceful degradation: missing transcript, no recorded command, unreadable
config, or parse errors all silently omit the suffix without breaking the
statusline.
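
A rough sketch of the bounded tail read, assuming the transcript contains literal `<command-name>…</command-name>` tags whose inner text is the command. The function name and regex are illustrative; the real hook may parse the transcript differently.

```javascript
const fs = require('node:fs');

const TAIL_BYTES = 256 * 1024; // read at most the final 256 KiB

// Returns the inner text of the last <command-name> tag found in the
// transcript tail, or null on any failure (missing file, no tag, read error).
function lastCommandFromTranscript(transcriptPath) {
  let fd;
  try {
    fd = fs.openSync(transcriptPath, 'r');
    const { size } = fs.fstatSync(fd);
    const length = Math.min(size, TAIL_BYTES);
    const buffer = Buffer.alloc(length);
    fs.readSync(fd, buffer, 0, length, size - length);
    const tail = buffer.toString('utf8');
    const matches = [...tail.matchAll(/<command-name>([^<]+)<\/command-name>/g)];
    return matches.length > 0 ? matches[matches.length - 1][1].trim() : null;
  } catch {
    // Graceful degradation: the caller simply omits the suffix.
    return null;
  } finally {
    if (fd !== undefined) fs.closeSync(fd);
  }
}
```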

Closes #2538
2026-04-22 12:04:21 -04:00
Tom Boucher
349daf7e6a fix(#2545): use word boundary in path replacement to catch ~/.claude without trailing slash (#2592)
The Copilot content converter only replaced `~/.claude` and
`$HOME/.claude` when followed by a literal `/`. Bare references
(e.g. `configDir = ~/.claude` at end of line) slipped through and
triggered the post-install "Found N unreplaced .claude path reference(s)"
warning, since the leak scanner uses `(?:~|$HOME)/\.claude\b`.

Switched both replacements to a `(\/|\b)` capture group so trailing-slash
and bare forms are handled in a single pass — matching the pattern
already used by Antigravity, OpenCode, Kilo, and Codex converters.
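
A minimal sketch of the `(\/|\b)` capture approach. The replacement directory `.example-runtime` is a placeholder, since the converter's actual target is not shown here, and `replaceClaudePaths` is a hypothetical name.

```javascript
// Placeholder target; the real converter's replacement directory is not shown here.
const TARGET_DIR = '.example-runtime';

// Replace ~/.claude and $HOME/.claude whether or not a slash follows.
// The captured group ("/" or an empty word-boundary match) is re-emitted as $1.
function replaceClaudePaths(content) {
  return content
    .replace(/~\/\.claude(\/|\b)/g, `~/${TARGET_DIR}$1`)
    .replace(/\$HOME\/\.claude(\/|\b)/g, `$$HOME/${TARGET_DIR}$1`);
}
```

The `$$` in the second replacement string emits a literal `$`, so `$HOME` survives intact.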

Closes #2545
2026-04-22 12:04:17 -04:00
Tom Boucher
6b7b5c15a5 fix(#2559): remove stale year injection from research agent web search instructions (#2591)
The gsd-phase-researcher and gsd-project-researcher agents instructed
WebSearch queries to always include 'current year' (e.g., 2024). As
time passes, a hardcoded year biases search results toward outdated
content: users saw 2024-tagged queries producing stale blog
references in 2026.

Remove the year-injection guidance. Instead, rely on checking
publication dates on the returned sources. Query templates and
success criteria updated accordingly.

Closes #2559
2026-04-22 12:04:13 -04:00
Tom Boucher
67a9550720 fix(#2549,#2550,#2552): bound discuss-phase context reads, add phase-type map selection, prohibit split reads (#2590)
#2549: load_prior_context was reading every prior *-CONTEXT.md file,
growing linearly with project phase count. Cap to the 3 most recent
phases. If .planning/DECISIONS-INDEX.md exists, read that instead.

#2550: scout_codebase claimed to select maps "based on phase type" but
had no classifier — agents read all 7 maps. Replace with an explicit
phase-type-to-maps table (2–3 maps per phase type) with a Mixed fallback.

#2552: Add explicit instruction not to split-read the same file at two
different offsets. Split reads break prompt cache reuse and cost more
than a single full read.

Closes #2549
Closes #2550
Closes #2552

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 12:04:10 -04:00
Tom Boucher
fba040c72c fix(#2557): Gemini/Antigravity local hook commands use relative paths, not $CLAUDE_PROJECT_DIR (#2589)
$CLAUDE_PROJECT_DIR is Claude Code-specific. Gemini CLI doesn't set it, and on
Windows its path-join logic doubled the value producing unresolvable paths like
D:\Projects\GSD\'D:\Projects\GSD'. Gemini runs project hooks with project root
as cwd, so bare relative paths (e.g. node .gemini/hooks/gsd-check-update.js)
are cross-platform and correct. Claude Code and others still use the env var.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 12:04:06 -04:00
Tom Boucher
7032f44633 fix(#2544): exit 1 on missing key in config-get (#2588)
The configGet query handler previously threw GSDError with
ErrorClassification.Validation, which maps to exit code 10. Callers
using `if ! gsd-sdk query config-get key; then fallback; fi` could
not tell a missing key apart from a usage or schema error through the
exit code alone: exit 10 still counts as a failure, but the intended
behavior (and the UNIX convention, cf. `git config --get`) is exit 1
for an absent key.

Change the classification for the two 'Key not found' throw sites to
ErrorClassification.Execution so the CLI exits 1 on missing key.
Usage/schema errors (no key argument, malformed JSON, missing
config.json) remain Validation.

Closes #2544
2026-04-22 12:04:03 -04:00
Tom Boucher
2404b40a15 fix(#2555): SDK agent-skills reads config.agent_skills and returns <agent_skills> block (#2587)
The SDK query handler `agent-skills` previously scanned every skill
directory on the filesystem and returned a flat JSON list, ignoring
`config.agent_skills[agentType]` entirely. Workflows that interpolate
$(gsd-sdk query agent-skills <type>) into Task() prompts got a JSON
dump of all skills instead of the documented <agent_skills> block.

Port `buildAgentSkillsBlock` semantics from
get-shit-done/bin/lib/init.cjs into the SDK handler:

- Read config.agent_skills[agentType] via loadConfig()
- Support single-string and array forms
- Validate each project-relative path stays inside the project root
  (symlink-aware, mirrors security.cjs#validatePath)
- Support `global:<name>` prefix for ~/.claude/skills/<name>/
- Skip entries whose SKILL.md is missing, with a stderr warning
- Return the exact string block workflows embed:
  <agent_skills>\nRead these user-configured skills:\n- @.../SKILL.md\n</agent_skills>
- Empty string when no agent type, no config, or nothing valid — matches
  gsd-tools.cjs cmdAgentSkills output.
2026-04-22 12:03:59 -04:00
Tom Boucher
0d6349a6c1 fix(#2554): preserve leading zero in getMilestonePhaseFilter (#2585)
The normalization `replace(/^0+/, '')` over-stripped decimal phase IDs:
`"00.1"` collapsed to `".1"`, while the disk-side extractor yielded
`"0.1"` from `"00.1-<slug>"`. Set membership failed and inserted decimal
phases were silently excluded from every disk scan inside
`buildStateFrontmatter`, causing `state update` to rewind progress
counters.

Strip leading zeros only when followed by a digit
(`replace(/^0+(?=\d)/, '')`), preserving the zero before the decimal
point while keeping existing behavior for zero-padded integer IDs.
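
The difference between the two regexes is easy to see in isolation. The wrapper names below are illustrative only; the two `replace` calls are the ones quoted above.

```javascript
// Old normalization: strips every leading zero, including the one
// that sits directly before the decimal point.
const stripAllLeadingZeros = (id) => id.replace(/^0+/, '');

// Fixed normalization: strips leading zeros only when a digit follows,
// so the zero before "." is kept.
const stripPaddingZeros = (id) => id.replace(/^0+(?=\d)/, '');
```

With input `"00.1"` the old form yields `".1"` while the fixed form yields `"0.1"`; a zero-padded integer such as `"007"` becomes `"7"` under both.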

Closes #2554
2026-04-22 12:03:56 -04:00
Tom Boucher
c47a6a2164 fix: correct VALID_CONFIG_KEYS — remove internal state key, add missing public keys, migration hints (#2561)
* fix(#2530-2535): correct VALID_CONFIG_KEYS set — remove internal state key, add missing public keys, add migration hints

- Remove workflow._auto_chain_active from VALID_CONFIG_KEYS (internal runtime state, not user-settable) (#2530)
- Add hooks.workflow_guard to VALID_CONFIG_KEYS (read by gsd-workflow-guard.js hook, already documented) (#2531)
- Add workflow.ui_review to VALID_CONFIG_KEYS (read in autonomous.md via config-get) (#2532)
- Add workflow.max_discuss_passes to VALID_CONFIG_KEYS (read in discuss-phase.md via config-get) (#2533)
- Add CONFIG_KEY_SUGGESTIONS entries for sub_repos → planning.sub_repos and plan_checker → workflow.plan_check (#2535)
- Document workflow.ui_review and workflow.max_discuss_passes in docs/CONFIGURATION.md
- Clear INTERNAL_KEYS exemption in parity test (workflow._auto_chain_active removed from schema entirely)
- Add regression test file tests/bug-2530-valid-config-keys.test.cjs covering all 6 bugs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: align SDK VALID_CONFIG_KEYS with CJS — remove internal key, add missing public keys

- Remove workflow._auto_chain_active from SDK (internal runtime state, not user-settable)
- Add workflow.ui_review, workflow.max_discuss_passes, hooks.workflow_guard to SDK
- Add ui_review and max_discuss_passes to Full Schema example in CONFIGURATION.md

Resolves CodeRabbit review on #2561.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 11:28:25 -04:00
forfrossen
af2dba2328 fix(hooks): detect Claude Code via stdin session_id (closes #2520) (#2521)
* fix(hooks): detect Claude Code via stdin session_id, not filtered env (#2520)

The #2344 fix assumed `CLAUDECODE` would propagate to hook subprocesses.
On Claude Code v2.1.116 it doesn't — Claude Code applies a separate env
filter to PreToolUse hook commands that drops bare CLAUDECODE and
CLAUDE_SESSION_ID, keeping only CLAUDE_CODE_*-prefixed vars plus
CLAUDE_PROJECT_DIR. As a result every Edit/Write on an existing file
produced a redundant READ-BEFORE-EDIT advisory inside Claude Code.

Use `data.session_id` from the hook's stdin JSON as the primary Claude
Code signal (it's part of Claude Code's documented PreToolUse hook-input
schema). Keep CLAUDE_CODE_ENTRYPOINT / CLAUDE_CODE_SSE_PORT env checks
as propagation-verified fallbacks, and keep the legacy
CLAUDE_SESSION_ID / CLAUDECODE checks for back-compat and
future-proofing.
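
The detection order above might be sketched like this. `isClaudeCodeHost` is a hypothetical name, and the truthiness checks on the env vars are an assumption about how the hook tests them.

```javascript
// `data` is the parsed JSON read from the hook's stdin; `env` defaults to process.env.
function isClaudeCodeHost(data, env = process.env) {
  // Primary signal: session_id on the hook's stdin payload.
  if (data && typeof data.session_id === 'string' && data.session_id !== '') return true;
  // Fallbacks that were verified to propagate to hook subprocesses.
  if (env.CLAUDE_CODE_ENTRYPOINT || env.CLAUDE_CODE_SSE_PORT) return true;
  // Legacy checks kept for back-compat.
  return Boolean(env.CLAUDE_SESSION_ID || env.CLAUDECODE);
}
```

Passing `env` explicitly keeps the function testable from inside a Claude Code session, where the real environment would otherwise leak into the result.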

Add tests/bug-2520-read-guard-hook-subprocess-env.test.cjs, which spawns
the hook with an env mirroring the actual Claude Code hook-subprocess
filter. Extend the legacy test harnesses to also strip the
propagation-verified CLAUDE_CODE_* vars so positive-path tests keep
passing when the suite itself runs inside a Claude Code session (same
class of leak as #2370 / PR #2375, now covering the new detection
signals).

Non-Claude-host behavior (OpenCode / MiniMax) is unchanged: with no
`session_id` on stdin and no CLAUDE_CODE_* env var, the advisory still
fires.

Closes #2520

* test(2520): isolate session_id signal from env fallbacks in regression test

Per reviewer feedback (Copilot + CodeRabbit on #2521): the session_id
isolation test used the helper's default CLAUDE_CODE_ENTRYPOINT /
CLAUDE_CODE_SSE_PORT values, so the env fallback would rescue the skip
even if the primary `data.session_id` check regressed. Pass an explicit
env override that clears those fallbacks, so only the stdin `session_id`
signal can trigger the skip.

Other cases (env-only fallback, negative / non-Claude host) already
override env appropriately.

---------

Co-authored-by: forfrossen <forfrossensvart@gmail.com>
2026-04-22 10:41:58 -04:00
elfstrob
9b5397a30f feat(sdk): add queued_phases to init.manager (closes #2497) (#2514)
* feat(sdk): add queued_phases to init.manager (closes #2497)

Surfaces the milestone immediately AFTER the active one so the
/gsd-manager dashboard can preview upcoming phases without mixing
them into the active phases grid.

Changes:
- roadmap.ts: exports two new helpers
  - extractPhasesFromSection(section): parses phase number / name /
    goal / depends_on using the same pattern initManager uses for
    the active milestone, so queued phases have identical shape.
  - extractNextMilestoneSection(content, projectDir): resolves the
    current milestone via the STATE-first path (matching upstream
    PR #2508) then scans for the next ## milestone heading. Shipped
    milestones are stripped first so they can't shadow the real
    next. Returns null when the active milestone is the last one.
- init-complex.ts: initManager now exposes
  - queued_phases: Array<{ number, name, display_name, goal,
    depends_on, dep_phases, deps_display }>
  - queued_milestone_version: string | null
  - queued_milestone_name: string | null
  Existing phases array is unchanged — callers that only care about
  the active milestone see no behavior difference.

Scope note: PR #2508 (merged upstream 2026-04-21) superseded the
#2495 + #2496 portions of this branch's original submission. This
commit is the rebased remainder contributing only #2497 on top of
upstream's new helpers.

Test coverage (9 new tests, all passing):
- roadmap.test.ts: +5 tests
  - extractPhasesFromSection parses multiple phases with goal + deps
  - extractPhasesFromSection returns [] when no phase headings
  - extractNextMilestoneSection returns the milestone after the
    STATE-resolved active one
  - extractNextMilestoneSection returns null when active is last
  - extractNextMilestoneSection returns null when no version found
- init-complex.test.ts: +4 tests under `queued_phases (#2497)`
  - surfaces next milestone with version + name metadata
  - queued entries carry name / deps_display / display_name
  - queued phases are NOT mixed into active phases list
  - returns [] + nulls when active is the last milestone

All 51 tests in roadmap.test.ts + init-complex.test.ts pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(workflows): render queued_phases section in /gsd-manager dashboard

Surfaces the new `queued_phases` / `queued_milestone_version` /
`queued_milestone_name` fields from init.manager (SDK #2497) in a
compact preview section directly below the main active-milestone
table.

Changes to workflows/manager.md:
- Initialize step: parse the optional trio
  (queued_milestone_version, queued_milestone_name, queued_phases)
  alongside the existing init.manager fields. Treat missing as
  empty for backward compatibility with older SDK versions.
- Dashboard step: new "Queued section (next milestone preview)"
  rendered between the main active-milestone grid and the
  Recommendations section. Renders only when queued_phases is
  non-empty; skipped entirely when absent or empty (e.g. active
  milestone is the last one).
- Queued rows render without D/P/E columns since the phases haven't
  been discussed yet — just number, display_name, deps_display,
  and a fixed "· Queued" status.
- Success criterion added: queued section renders when non-empty
  and is skipped when absent.

Queued phases are deliberately NOT eligible for the Continue action
menu; they live in a future milestone. The preview exists for
situational awareness only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 10:41:37 -04:00
Tom Boucher
7397f580a5 fix(#2516): resolve executor_model inherit literal passthrough; add regression test (#2537)
When model_profile is "inherit", execute-phase was passing the literal string
"inherit" to Task(model=), causing fallback to the default model. The workflow
now documents that executor_model=="inherit" requires omitting the model= parameter
entirely so Claude Code inherits the orchestrator model automatically.
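
A rough sketch of the rule, assuming Task() arguments are assembled as a plain object (the helper name `buildTaskArgs` and the argument shape are hypothetical; the workflow itself is prose instructions):

```javascript
// Hypothetical helper: builds the argument object for a Task() call.
function buildTaskArgs(prompt, executorModel) {
  const args = { prompt };
  // "inherit" is a sentinel, not a model name: leave `model` unset
  // so the orchestrator's model is inherited.
  if (executorModel && executorModel !== 'inherit') {
    args.model = executorModel;
  }
  return args;
}
```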

Closes #2516
2026-04-21 21:35:22 -04:00
Tom Boucher
9a67e350b3 fix(#2504): auto-pass UAT for infrastructure/foundation phases with no user-facing elements (#2541)
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 21:20:27 -04:00
Tom Boucher
98d92d7570 fix(#2526): warn about REQ-IDs in body missing from Traceability table (#2539)
Scan REQUIREMENTS.md body for all **REQ-ID** patterns during phase
complete and emit a warning for any IDs absent from the Traceability
table, regardless of whether the roadmap has a Requirements: line.
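
One way such a scan might look, as a sketch only. The ID shape (bold tokens like `**AUTH-01**`), the function name, and passing the Traceability table as raw text are all assumptions; the real pattern used by the phase-complete handler is not shown here.

```javascript
// Assumed ID shape: bold tokens such as **AUTH-01**; the real pattern may differ.
const REQ_ID_PATTERN = /\*\*([A-Z][A-Z0-9]*-\d+)\*\*/g;

function findUntracedReqIds(bodyText, traceabilityText) {
  const ids = new Set();
  for (const match of bodyText.matchAll(REQ_ID_PATTERN)) {
    ids.add(match[1]);
  }
  // Report IDs that never appear as a whole token in the Traceability text.
  return [...ids].filter((id) => !new RegExp(`\\b${id}\\b`).test(traceabilityText));
}
```

The whole-token check keeps `AUTH-01` from being treated as present just because `AUTH-010` is in the table.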

Closes #2526
2026-04-21 21:18:58 -04:00
Tom Boucher
8eeaa20791 fix(install): chmod dist/cli.js 0o755 after npm install -g; add regression test (closes #2525) (#2536)
Use process.platform !== 'win32' guard in catch instead of a comment, and add
regression test for bug #2525 (gsd-sdk bin symlink points at non-executable file).

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 21:18:34 -04:00
Tom Boucher
f32ffc9fb8 fix(quick): include deferred-items.md in final commit file list (closes #2523) (#2542)
Step 8 file list omitted deferred-items.md, leaving executor out-of-scope
findings untracked after final commit even with commit_docs: true.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 20:33:43 -04:00
Tom Boucher
5676e2e4ef fix(sdk): forward --ws workstream flag through query dispatch (#2546)
* fix(sdk): forward --ws workstream flag through query dispatch (closes #2524)

- cli.ts: pass args.ws as workstream to registry.dispatch()
- registry.ts: add workstream? param to dispatch(), thread to handler
- utils.ts: add optional workstream? to QueryHandler type signature
- helpers.ts: planningPaths() accepts workstream? and uses relPlanningPath()
- All ~26 query handlers updated to receive and pass workstream to planningPaths()
- Config/commit/intel handlers use _workstream (project-global, not scoped)
- Add failing-then-passing test: tests/bug-2524-sdk-query-ws-flag.test.cjs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(sdk): forward workstream to all downstream query helpers

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): rewrite #2524 test as static source assertions — no sdk/dist build in CI

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 20:33:24 -04:00
Lex Christopherson
7bb6b6452a fix: spike workflow defaults to interactive UI demos, not stdout
Flips the bias in step 8b: build a simple HTML page/web UI by default,
fall back to stdout only for pure fact-checking (binary yes/no, benchmarks).
Mirrors upstream spike-idea skill constraint #3 update.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 09:19:04 -06:00
Lex Christopherson
43ea92578b Merge remote-tracking branch 'origin/main' into hotfix/1.38.2
# Conflicts:
#	CHANGELOG.md
#	bin/install.js
#	sdk/src/query/init.ts
2026-04-21 09:16:24 -06:00
Lex Christopherson
a42d5db742 1.38.2 2026-04-21 09:14:52 -06:00
Lex Christopherson
c86ca1b3eb fix: sync spike/sketch workflows with upstream skill v2 improvements
Spike workflow:
- Add frontier mode (no-arg or "frontier" proposes integration + frontier spikes)
- Add depth-over-speed principle — follow surprising findings, test edge cases,
  document investigation trail not just verdict
- Add CONVENTIONS.md awareness — follow established patterns, update after session
- Add Requirements section in MANIFEST — track design decisions as they emerge
- Add re-ground step before each spike to prevent drift in long sessions
- Add Investigation Trail section to README template
- Restructured prior context loading with priority ordering
- Research step now runs per-spike with briefing and approach comparison table

Sketch workflow:
- Add frontier mode (no-arg or "frontier" proposes consistency + frontier sketches)
- Add spike context loading — ground mockups in real data shapes, requirements,
  and conventions from spike findings

Spike wrap-up workflow:
- Add CONVENTIONS.md generation step (recurring stack/structure/pattern choices)
- Reference files now use implementation blueprint format (Requirements, How to
  Build It, What to Avoid, Constraints)
- SKILL.md now includes requirements section from MANIFEST
- Next-steps route to /gsd-spike frontier mode instead of inline analysis

Sketch wrap-up workflow:
- Next-steps route to /gsd-sketch frontier mode

Commands updated with frontier mode in descriptions and argument hints.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 09:14:32 -06:00
github-actions[bot]
337e052aa9 chore: bump version to 1.38.2 for hotfix 2026-04-21 15:13:56 +00:00
Lex Christopherson
969ee38ee5 fix: sync spike/sketch workflows with upstream skill v2 improvements
Spike workflow:
- Add frontier mode (no-arg or "frontier" proposes integration + frontier spikes)
- Add depth-over-speed principle — follow surprising findings, test edge cases,
  document investigation trail not just verdict
- Add CONVENTIONS.md awareness — follow established patterns, update after session
- Add Requirements section in MANIFEST — track design decisions as they emerge
- Add re-ground step before each spike to prevent drift in long sessions
- Add Investigation Trail section to README template
- Restructured prior context loading with priority ordering
- Research step now runs per-spike with briefing and approach comparison table

Sketch workflow:
- Add frontier mode (no-arg or "frontier" proposes consistency + frontier sketches)
- Add spike context loading — ground mockups in real data shapes, requirements,
  and conventions from spike findings

Spike wrap-up workflow:
- Add CONVENTIONS.md generation step (recurring stack/structure/pattern choices)
- Reference files now use implementation blueprint format (Requirements, How to
  Build It, What to Avoid, Constraints)
- SKILL.md now includes requirements section from MANIFEST
- Next-steps route to /gsd-spike frontier mode instead of inline analysis

Sketch wrap-up workflow:
- Next-steps route to /gsd-sketch frontier mode

Commands updated with frontier mode in descriptions and argument hints.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 09:05:47 -06:00
Tom Boucher
2980f0ec48 fix(sdk): stripShippedMilestones handles inline SHIPPED headings; getMilestoneInfo prefers STATE.md (#2508)
* fix(sdk): stripShippedMilestones handles inline SHIPPED headings; getMilestoneInfo prefers STATE.md

Fixes two compounding bugs:

- #2496: stripShippedMilestones only stripped <details> blocks, ignoring
  '## Heading —  SHIPPED ...' inline markers. Shipped milestone sections
  were leaking into downstream parsers.

- #2495: getMilestoneInfo checked STATE.md frontmatter only as a last-resort
  fallback, so it returned the first heading match (often a leaked shipped
  milestone) rather than the current milestone. Moved STATE.md check to
  priority 1, consistent with extractCurrentMilestone.

Closes #2495
Closes #2496

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(roadmap): handle ### SHIPPED headings and STATE.md version-only case

Two follow-up fixes from CodeRabbit review of #2508:

1. stripShippedMilestones only split on ## boundaries; ### headings marked
   SHIPPED were not stripped, leaking into fallback parsers. Expanded
   the split/filter regex to #{2,3} to align with extractCurrentMilestone.

2. getMilestoneInfo's early-return on parseMilestoneFromState discarded the
   real milestone name from ROADMAP.md when STATE.md had only `milestone:`
   (no `milestone_name:`), returning the placeholder name 'milestone'.
   Now only short-circuits when STATE.md provides a real name; otherwise
   falls through to ROADMAP for the name while using stateVersion to
   override the version in every ROADMAP-derived return path.
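
The heading-level change in item 1 can be illustrated with a simplified stand-in. This is not the SDK's implementation: the function name is hypothetical, and the sketch drops only the text up to the next level-2 or level-3 heading, so nested sections under a shipped heading would need extra handling.

```javascript
// Hypothetical re-implementation; the SDK's actual function body is not shown here.
function stripShippedSections(content) {
  // Split before every level-2 or level-3 heading, keeping headings attached.
  const sections = content.split(/^(?=#{2,3} )/m);
  return sections
    .filter((section) => {
      const heading = section.split('\n', 1)[0];
      const isHeading = /^#{2,3} /.test(heading);
      // Drop a section only when its own heading line carries SHIPPED.
      return !(isHeading && /\bSHIPPED\b/.test(heading));
    })
    .join('');
}
```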

Tests: +2 new cases (### SHIPPED heading, version-only STATE.md).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 10:41:35 -04:00
Tom Boucher
8789211038 fix(insert-phase): update STATE.md next-phase recommendation after phase insertion (#2509)
* fix(insert-phase): update STATE.md next-phase recommendation after inserting a phase

Closes #2502

* fix(insert-phase): update all STATE.md pointers; tighten test scope

Two follow-up fixes from CodeRabbit review of #2509:

1. The update_project_state instruction only said to find "the line" for
   the next-phase recommendation. STATE.md can have multiple pointers
   (structured current_phase: field AND prose recommendation text).
   Updated wording to explicitly require updating all of them in the same
   edit.

2. The regression test for the next-phase pointer update scanned the
   entire file, so a match anywhere would pass even if update_project_state
   itself was missing the instruction. Scoped the assertion to only the
   content inside <step name="update_project_state"> to prevent false
   positives.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 10:10:45 -04:00
Tom Boucher
57bbfe652b fix: exclude non-wiped dirs from custom-file scan; warn on non-Claude model profiles (#2511)
* fix(detect-custom-files): exclude skills and command dirs not wiped by installer (closes #2505)

GSD_MANAGED_DIRS included 'skills' and 'command' directories, but the
installer never wipes those paths. Users with third-party skills installed
(40+ files, none in GSD's manifest) had every skill flagged as a "custom
file" requiring backup, producing noisy false-positive reports on every
/gsd-update run.

Removes 'skills' and 'command' from both gsd-tools.cjs and the SDK's
detect-custom-files.ts. Adds two regression tests confirming neither
directory is scanned.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(settings): warn that model profiles are no-ops on non-Claude runtimes (closes #2506)

settings.md presented Quality/Balanced/Budget model profiles without any
indication that these tiers map to Claude models (Opus/Sonnet/Haiku) and
have no effect on non-Claude runtimes (Codex, Gemini CLI, OpenRouter).
Users on Codex saw the profile chooser as if it would meaningfully select
models, but all agents silently used the runtime default regardless.

Adds a non-Claude runtime note before the profile question (shown in
TEXT_MODE, the path all non-Claude runtimes take) explaining the profiles
are no-ops and directing users to either choose Inherit or configure
model_overrides manually. Also updates the Inherit option description to
explicitly name the runtimes where it is the correct choice.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 10:10:10 -04:00
Tom Boucher
a4764c5611 fix(execute-phase): resurrection-detection must check git history before deleting new .planning/ files (#2510)
The guard at the worktree-merge resurrection block was inverting the
intended logic: it deleted any .planning/ file absent from PRE_MERGE_FILES,
which includes brand-new files (e.g. SUMMARY.md just created by the
executor). A genuine resurrection is a file that was previously tracked on
main, deliberately removed, and then re-introduced by the merge. Detecting
that requires a git history check — not just tree membership.

Fix: replace the PRE_MERGE_FILES grep guard with a `git log --follow
--diff-filter=D` check that only removes the file if it has a deletion
event in main's ancestry.

Closes #2501
2026-04-21 09:46:01 -04:00
Tom Boucher
b2534e8a05 feat(plan-phase): chunked mode + filesystem fallback for Windows stdio hang (#2499)
* feat(plan-phase): chunked mode + filesystem fallback for Windows stdio hang (#2310)

Addresses the 2026-04-16 Windows incident where gsd-planner wrote all 5
PLAN.md files to disk but Task() never returned, hanging the orchestrator
for 30+ minutes. Two mitigations:

1. Filesystem fallback (steps 9a, 11a): when Task() returns with an
   empty/truncated response but PLAN.md files exist on disk, surface a
   recoverable prompt (Accept plans / Retry planner / Stop) instead of
   silently failing. Directly addresses the post-restart recovery path.

2. Chunked mode (--chunked flag / workflow.plan_chunked config): splits the
   single long-lived planner Task into a short outline Task (~2 min) followed
   by N short per-plan Tasks (~3-5 min each). Each plan is committed
   individually for crash resilience. A hang loses one plan, not all of them.
   Resume detection skips plans already on disk on re-run.

RCA confirmed: task state mtime 14:29 vs PLAN.md writes 14:32-14:52 =
subagent completed normally, IPC return was dropped by Windows stdio deadlock.
Neither mitigation fixes the root cause (requires upstream Task() timeout
support); both bound damage and enable recovery.

New reference file planner-chunked.md keeps OUTLINE COMPLETE / PLAN COMPLETE
return formats out of gsd-planner.md (which sits at 46K near its size limit).

Closes #2310

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(plan-phase): address CodeRabbit review comments on #2499

- docs/CONFIGURATION.md: add workflow.plan_chunked to full JSON schema example
- plan-phase.md step 8.5.1: validate PLAN-OUTLINE.md with grep for OUTLINE
  COMPLETE marker before reusing (not just file existence)
- plan-phase.md step 8.5.2: validate per-plan PLAN.md has YAML frontmatter
  (head -1 grep for ---) before skipping in resume path
- plan-phase.md: add language tags (text/javascript/bash) to bare fenced
  code blocks in steps 8.5, 9a, 11a (markdownlint MD040)
- Rejected: commit_docs gate on per-plan commits (gsd-sdk query commit
  already respects commit_docs internally — comment was a false positive)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(plan-phase): route Accept-plans through step 9 PLANNING COMPLETE handling

Honors --skip-verify / plan_checker_enabled=false in 9a fallback path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 08:40:39 -04:00
Tom Boucher
d1b56febcb fix(execute-phase): post-merge deletion audit for bulk file deletions (closes #2384) (#2483)
* fix(execute-phase): post-merge deletion audit for bulk file deletions (closes #2384)

Two data-loss incidents were caused by worktree merges bringing in bulk
file deletions silently. The pre-merge check (HEAD...WT_BRANCH) catches
deletions on the worktree branch, but files deleted during the merge
itself (e.g., from merge conflict resolution or stale branch state) were
not audited post-merge.

Adds a post-merge audit immediately after git merge --no-ff succeeds:
- Counts files deleted outside .planning/ in the merge commit
- If count > 5 and ALLOW_BULK_DELETE!=1: reverts the merge with
  git reset --hard HEAD~1 and continues to the next worktree
- Logs the full file list and an escape-hatch instruction
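
The gate above reduces to a small predicate. In this sketch the threshold and the ALLOW_BULK_DELETE override come from the commit text, while the function name and the idea of passing the deleted paths as an array are assumptions (the workflow itself uses shell commands).

```javascript
// Threshold and override name come from the commit text; function name is hypothetical.
const BULK_DELETE_THRESHOLD = 5;

function shouldRevertMerge(deletedPaths, env) {
  // Only deletions outside .planning/ count toward the threshold.
  const counted = deletedPaths.filter((p) => !p.startsWith('.planning/'));
  const overridden = env.ALLOW_BULK_DELETE === '1';
  return counted.length > BULK_DELETE_THRESHOLD && !overridden;
}
```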

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): tighten post-merge deletion audit assertions (CodeRabbit #2483)

Replace loose substring checks with exact regex assertions:
- assert.match against 'git diff --diff-filter=D --name-only HEAD~1 HEAD'
- assert.match against threshold gate + ALLOW_BULK_DELETE override condition
- assert.match against git reset --hard HEAD~1 revert
- assert.match against MERGE_DEL_COUNT grep -vc for non-.planning count

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(inventory): update workflow count to 81 (graduation.md added in #2490)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 18:37:42 -04:00
Tom Boucher
1657321eb0 fix(install): remove bare ~/.claude reference in update.md (closes #2470) (#2482)
* fix(install): remove bare ~/.claude reference in update.md (closes #2470)

The installer's copyWithPathReplacement() replaces ~/\.claude\/ (with
trailing slash) but not ~/\.claude (bare, no trailing slash). A comment
on line 398 of update.md used the bare form, which scanForLeakedPaths()
correctly flagged for every non-Claude runtime install.

Replaced the example in the comment with a non-Claude runtime path so
the file passes the scanner for all runtimes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): align regex with installer's word-boundary semantics (CodeRabbit #2482)

Replace negative lookahead (?!\/) with \b word boundary to match the
installer's scanForLeakedPaths() pattern. The lookahead would incorrectly
flag ~/.claude_suffix whereas \b correctly excludes it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): revert \b regex — (?!\/) was intentionally scoped to bare refs

The installer's scanForLeakedPaths uses \b but the test is specifically
checking for bare ~/.claude without trailing slash that the replacer misses.
~/.claude/ (with slash) at line 359 of update.md is expected and handled.
\b would flag it as a false positive.
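
The difference between the two patterns is easy to check directly. The regexes below are reconstructions from the fragments quoted in these messages, not copies of the test or installer source.

```javascript
// Reconstructed patterns (assumed, based on the fragments quoted above).
const lookahead = /~\/\.claude(?!\/)/; // matches unless a slash follows
const boundary = /~\/\.claude\b/;      // matches at any word boundary

const samples = ['~/.claude', '~/.claude/', '~/.claude_suffix'];
const results = samples.map((sample) => ({
  sample,
  lookahead: lookahead.test(sample),
  boundary: boundary.test(sample),
}));
// '~/.claude'        -> lookahead: true,  boundary: true
// '~/.claude/'       -> lookahead: false, boundary: true
// '~/.claude_suffix' -> lookahead: true,  boundary: false
```

Only the lookahead form leaves `~/.claude/` alone, which is why the test keeps it even though it also matches `~/.claude_suffix`.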

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(inventory): update workflow count to 81 (graduation.md added in #2490)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 18:37:32 -04:00
Tom Boucher
2b494407e5 feat(assembly): add link mode for CLAUDE.md @-reference sections (#2484)
* feat(assembly): add link mode for CLAUDE.md @-reference sections (#2415)

Adds `claude_md_assembly.mode: "link"` config option that writes
`@.planning/<source>` instead of inlining content between GSD markers,
reducing typical CLAUDE.md size by ~65%. Per-block overrides available
via `claude_md_assembly.blocks.<section>`. Falls back to embed for
sections without a real source file (workflow, fallbacks).
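
A minimal sketch of how mode resolution might work under these rules. The section shape (`name`, `source`, `content`), the function name, embed as the default mode, and per-block overrides being mode strings are all assumptions for illustration.

```javascript
// Hypothetical shapes: `section` = { name, source, content }, where `source`
// is a path relative to .planning/ or null when no real source file exists.
function renderSection(section, assemblyConfig) {
  const blocks = assemblyConfig.blocks || {};
  // A per-block override wins over the global mode; embed is assumed as default.
  const mode = blocks[section.name] || assemblyConfig.mode || 'embed';
  if (mode === 'link' && section.source) {
    return `@.planning/${section.source}`;
  }
  // Sections without a source file (workflow, fallbacks) are always embedded.
  return section.content;
}
```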

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): add positive assertion for embedded workflow content (CodeRabbit #2484)

The negative assertion only confirmed @GSD defaults wasn't written.
Add assert.ok(content.includes('GSD Workflow Enforcement')) to verify
the workflow section is actually embedded inline when link mode falls back.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 18:27:55 -04:00
Tom Boucher
d0f4340807 feat(workflows): link pending todos to roadmap phases in new-milestone (#2433) (#2485)
Adds step 10.5 to gsd-new-milestone that scans pending todos against the
approved roadmap and tags matches with `resolves_phase: N` in their YAML
frontmatter. Adds a `close_phase_todos` step to execute-phase that moves
tagged todos to `completed/` when the phase completes — closing the loop
automatically with no manual cleanup.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 18:25:24 -04:00
Tom Boucher
280eed93bc feat(cli): add /gsd-sync-skills for cross-runtime managed skill sync (#2491)
* fix(tests): update 5 source-text tests to read config-schema.cjs

VALID_CONFIG_KEYS moved from config.cjs to config-schema.cjs in the
drift-prevention companion PR. Tests that read config.cjs source text
and checked for key literal includes() now point to the correct file.

Closes #2480

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cli): add /gsd-sync-skills for cross-runtime managed skill sync (#2380)

Adds /gsd-sync-skills command so multi-runtime users can keep gsd-* skill
directories aligned across runtime roots after updating one runtime with gsd-update.

Changes:
- bin/install.js: add --skills-root <runtime> flag that prints the skills root
  path for any supported runtime, reusing the existing getGlobalDir() table.
  Banner is suppressed when --skills-root is used (machine-readable output).
- commands/gsd/sync-skills.md: slash command definition
- get-shit-done/workflows/sync-skills.md: full workflow spec covering argument
  parsing, path resolution via --skills-root, diff computation (CREATE/UPDATE/
  REMOVE/SKIP), dry-run report (default), apply execution, idempotency guarantee,
  and safety rules (only gsd-* touched, dry-run performs no writes).

Safety rules: only gsd-* directories are ever created/updated/removed; non-GSD
skills in destination roots are never touched; --dry-run is the default.
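
The diff step and the gsd-* safety rule could look roughly like this. The inputs (maps of skill directory name to a content hash), the function name, and the exact meaning of each action label are assumptions; the workflow spec itself is not reproduced in this log.

```javascript
// Hypothetical inputs: maps of skill directory name -> content hash.
function planSkillSync(source, destination) {
  const isManaged = (name) => name.startsWith('gsd-');
  const plan = [];
  for (const [name, hash] of Object.entries(source)) {
    if (!isManaged(name)) continue; // never copy non-GSD skills
    if (!(name in destination)) plan.push({ name, action: 'CREATE' });
    else if (destination[name] !== hash) plan.push({ name, action: 'UPDATE' });
    else plan.push({ name, action: 'SKIP' });
  }
  for (const name of Object.keys(destination)) {
    // Only managed skills missing from the source are removed.
    if (isManaged(name) && !(name in source)) plan.push({ name, action: 'REMOVE' });
  }
  return plan;
}
```

Because the function only returns a plan, a dry run can print it without writing anything.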

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 18:21:43 -04:00
Tom Boucher
b432d4a726 feat(workflows): close LEARNINGS.md consumption-and-graduation loop (#2490)
* fix(tests): update 5 source-text tests to read config-schema.cjs

VALID_CONFIG_KEYS moved from config.cjs to config-schema.cjs in the
drift-prevention companion PR. Tests that read config.cjs source text
and checked for key literal includes() now point to the correct file.

Closes #2480

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(workflows): close LEARNINGS.md consumption-and-graduation loop (#2430)

Part A — Consumption: extend plan-phase.md cross-phase context load to include
LEARNINGS.md files from the 3 most recent prior phases (same recency gate as
CONTEXT.md + SUMMARY.md: CONTEXT_WINDOW >= 500000 only). Also loads LEARNINGS.md
from any phases in the Depends-on chain. Silent skip if absent; 15% context
budget cap with oldest-first truncation; [from Phase N LEARNINGS] attribution.

Part B — Graduation: add graduation_scan step to transition.md (after
evolve_project) that delegates to new graduation.md helper workflow. The helper
clusters recurring items across the last N phases (default window=5, threshold=3)
using Jaccard lexical similarity, surfaces HITL Promote/Defer/Dismiss prompts,
routes promotions to PROJECT.md or PATTERNS.md by category, annotates graduated
items with `graduated:` field, and persists dismissed/deferred clusters in
STATE.md graduation_backlog. Always non-blocking; silently no-ops on first phase
or when data is insufficient.
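
For reference, Jaccard similarity over word sets is |A ∩ B| / |A ∪ B|. A small sketch, assuming lowercase alphanumeric tokenization (the tokenization and function names are not specified by graduation.md as described here):

```javascript
// Assumed tokenization: lowercase alphanumeric runs.
function tokenize(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) || []);
}

function jaccard(a, b) {
  const setA = tokenize(a);
  const setB = tokenize(b);
  if (setA.size === 0 && setB.size === 0) return 0;
  let shared = 0;
  for (const token of setA) {
    if (setB.has(token)) shared += 1;
  }
  // Union size = |A| + |B| - |A ∩ B|.
  return shared / (setA.size + setB.size - shared);
}
```

For example, "validate config keys" and "validate config schema" share 2 of 4 distinct tokens, giving 0.5.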

Also: adds optional `graduated:` annotation docs to extract_learnings.md schema.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(graduation): address CodeRabbit review findings on PR #2490

- graduation.md: unify insufficient-data guard to silent-skip (remove
  contradictory [no-op] print path)
- graduation.md: add TEXT_MODE fallback for HITL cluster prompts
- graduation.md: add A (defer-all) to accepted actions [P/D/X/A]
- graduation.md: tag untyped code fences with text language (MD040)
- transition.md: tag untyped graduation.md fence with text language

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(graduation): rephrase TEXT_MODE line to avoid prompt-injection scanner false positive

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 18:21:35 -04:00
Tom Boucher
cfe4dc76fd feat(health): canonical artifact registry and W019 unrecognized-file lint (#2448) (#2488)
Adds artifacts.cjs with canonical .planning/ root file names, W019 warning
in gsd-health that flags unrecognized .md files at the .planning/ root, and
templates/README.md as the authoritative artifact index for agents and humans.

Closes #2448
2026-04-20 18:21:23 -04:00
Tom Boucher
f19d0327b2 feat(agents): sycophancy hardening for 9 audit-class agents (#2489)
* fix(tests): update 5 source-text tests to read config-schema.cjs

VALID_CONFIG_KEYS moved from config.cjs to config-schema.cjs in the
drift-prevention companion PR. Tests that read config.cjs source text
and checked for key literal includes() now point to the correct file.

Closes #2480

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(agents): sycophancy hardening for 9 audit-class agents (#2427)

Add adversarial reviewer posture to gsd-plan-checker, gsd-code-reviewer,
gsd-security-auditor, gsd-verifier, gsd-eval-auditor, gsd-nyquist-auditor,
gsd-ui-auditor, gsd-integration-checker, and gsd-doc-verifier.

Four changes per agent:
- Third-person framing: <role> opens with submission framing, not "You are a GSD X"
- FORCE stance: explicit starting hypothesis that the submission is flawed
- Failure modes: agent-specific list of how each reviewer type goes soft
- BLOCKER/WARNING classification: every finding must carry an explicit severity

Also applies to sdk/prompts/agents variants of gsd-plan-checker and gsd-verifier.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 18:20:08 -04:00
Tom Boucher
bd27d4fabe feat(roadmap): surface wave dependencies and cross-cutting constraints (#2487)
* feat(roadmap): surface wave dependencies and cross-cutting constraints (#2447)

Adds roadmap.annotate-dependencies command that post-processes a phase's
ROADMAP plan list to insert wave dependency notes and surface must_haves.truths
entries shared across 2+ plans as cross-cutting constraints. Operation is
idempotent and purely derived from existing PLAN frontmatter.
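
The "shared across 2+ plans" rule can be sketched as a grouping pass. The input shape (an array of plans with an `id` and parsed `must_haves.truths`) and the function name are hypothetical; only the rule itself comes from this message.

```javascript
// Hypothetical input shape: [{ id, must_haves: { truths: [...] } }, ...]
function findCrossCuttingTruths(plans) {
  const plansByTruth = new Map();
  for (const plan of plans) {
    const truths = (plan.must_haves && plan.must_haves.truths) || [];
    // De-duplicate within a plan so one plan cannot count twice.
    for (const truth of new Set(truths)) {
      if (!plansByTruth.has(truth)) plansByTruth.set(truth, []);
      plansByTruth.get(truth).push(plan.id);
    }
  }
  return [...plansByTruth]
    .filter(([, ids]) => ids.length >= 2)
    .map(([truth, ids]) => ({ truth, plans: ids }));
}
```

Since the result depends only on the plan frontmatter, running it twice on the same plans yields the same list.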

Closes #2447

* fix(roadmap): address CodeRabbit review findings on PR #2487

- roadmap.cjs: expand idempotency guard to also check for existing
  cross-cutting constraints header, preventing duplicate injection on
  re-runs; add content equality check before writing to preserve
  true idempotency for single-wave phases
- plan-phase.md: move ROADMAP annotation (13d) before docs commit (13c)
  so annotated ROADMAP.md is included in the commit rather than left dirty;
  include .planning/ROADMAP.md in committed files list
- sdk/src/query/index.ts: add annotate-dependencies aliases to
  QUERY_MUTATION_COMMANDS so the mutation is properly event-wired
- sdk/src/query/roadmap.ts: add timeout (15s) and maxBuffer to spawnSync;
  check result.error before result.status to handle spawn/timeout failures

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 18:19:21 -04:00
Tom Boucher
e8ec42082d feat(health): detect MILESTONES.md drift from archived snapshots (#2446) (#2486)
Adds W018 warning when .planning/milestones/vX.Y-ROADMAP.md snapshots
exist without a corresponding entry in MILESTONES.md. Introduces
--backfill flag to synthesize missing entries from snapshot titles.

Closes #2446
2026-04-20 18:19:14 -04:00
Rezolv
86fb9c85c3 docs(sdk): registry docs and gsd-sdk query call sites (#2302 Track B) (#2340)
* feat(sdk): golden parity harness and query handler CJS alignment (#2302 Track A)

Golden/read-only parity tests and registry alignment, query handler fixes
(check-completion, state-mutation, commit, validate, summary, etc.), and
WAITING.json dual-write for .gsd/.planning readers.

Refs gsd-build/get-shit-done#2341

* fix(sdk): getMilestoneInfo matches GSD ROADMAP (🟡, last bold, STATE fallback)

- Recognize in-flight 🟡 milestone bullets like 🚧.
- Derive from last **vX.Y Title** before ## Phases when emoji absent.
- Fall back to STATE.md milestone when ROADMAP is missing; use last bare vX.Y
  in cleaned text instead of first (avoids v1.0 from shipped list).
- Fixes init.execute-phase milestone_version and buildStateFrontmatter after
  state.begin-phase (syncStateFrontmatter).

* feat(sdk): phase list, plan task structure, requirements extract handlers

- Register phase.list-plans, phase.list-artifacts, plan.task-structure,
  requirements.extract-from-plans (SDK-only; golden-policy exceptions).
- Add unit tests; document in QUERY-HANDLERS.md.
- writeProfile: honor --output, render dimensions, return profile_path and dimensions_scored.

* feat(sdk): centralize getGsdAgentsDir in query helpers

Extract agent directory resolution to helpers (GSD_AGENTS_DIR, primary
~/.claude/agents, legacy path). Use from init and docs-init init bundles.

docs(15): add 15-CONTEXT for autonomous phase-15 run.

* feat(sdk): query CLI CJS fallback and session correlation

- createRegistry(eventStream, sessionId) threads correlation into mutation events
- gsd-sdk query falls back to gsd-tools.cjs when no native handler matches
  (disable with GSD_QUERY_FALLBACK=off); stderr bridge warnings
- Export createRegistry from @gsd-build/sdk; add sdk/README.md
- Update QUERY-HANDLERS.md and registry module docs for fallback + sessionId
- Agents: prefer node dist/cli.js query over cat/grep for STATE and plans

* fix(sdk): init phase_found parity, docs-init agents path, state field extract

- Normalize findPhase not-found to null before roadmap fallback (matches findPhaseInternal)

- docs-init: use detectRuntime + resolveAgentsDir for checkAgentsInstalled

- state.cjs stateExtractField: horizontal whitespace only after colon (YAML progress guard)

- Tests: commit_docs default true; config-get golden uses temp config; golden integration green

Refs: #2302

* refactor(sdk): share SessionJsonlRecord in profile-extract-messages

CodeRabbit nit: dedupe JSONL record shape for isGenuineUserMessage and streamExtractMessages.

* fix(sdk): address CodeRabbit major threads (paths, gates, audit, verify)

- Resolve @file: and CLI JSON indirection relative to projectDir; guard empty normalized query command

- plan.task-structure + intel extract/patch-meta: resolvePathUnderProject containment

- check.config-gates: safe string booleans; plan_checker alias precedence over plan_check default

- state.validate/sync: phaseTokenMatches + comparePhaseNum ordering

- verify.schema-drift: token match phase dirs; files_modified from parsed frontmatter

- audit-open: has_scan_errors, unreadable rows, human report when scans fail

- requirements PLANNED key PLAN for root PLAN.md; gsd-tools timeout note

- ingest-docs: repo-root path containment; classifier output slug-hash

Golden parity test strips has_scan_errors until CJS adds field.

* fix: Resolve CodeRabbit security and quality findings
- Secure intel.ts and cli.ts against path traversal
- Catch and validate git add status in commit.ts
- Expand roadmap milestone marker extraction
- Fix parsing array-of-objects in frontmatter YAML
- Fix unhandled config evaluations
- Improve coverage test parity mapping

* docs(sdk): registry docs and gsd-sdk query call sites (#2302 Track B)

Update CHANGELOG, architecture and user guides, workflow call sites, and read-guard tests for gsd-sdk query; sync ARCHITECTURE.md command/workflow counts and directory-tree totals with the repo (80 commands, 77 workflows).

Address CodeRabbit: fix markdown tables and emphasis; align CLI-TOOLS GSDTools and state.read docs with implementation; correct roadmap handler name in universal-anti-patterns; resolve settings workflow config path without relying on config_path from state.load.

Refs gsd-build/get-shit-done#2340

* test: raise planner character extraction limit to 48K

* fix(sdk): resolve build TS error and doc conflict markers
2026-04-20 18:09:21 -04:00
Rezolv
c5b1445529 feat(sdk): golden parity harness and query handler CJS alignment (#2302 Track A) (#2341)
* feat(sdk): golden parity harness and query handler CJS alignment (#2302 Track A)

Golden/read-only parity tests and registry alignment, query handler fixes
(check-completion, state-mutation, commit, validate, summary, etc.), and
WAITING.json dual-write for .gsd/.planning readers.

Refs gsd-build/get-shit-done#2341

* fix(sdk): getMilestoneInfo matches GSD ROADMAP (🟡, last bold, STATE fallback)

- Recognize in-flight 🟡 milestone bullets like 🚧.
- Derive from last **vX.Y Title** before ## Phases when emoji absent.
- Fall back to STATE.md milestone when ROADMAP is missing; use last bare vX.Y
  in cleaned text instead of first (avoids v1.0 from shipped list).
- Fixes init.execute-phase milestone_version and buildStateFrontmatter after
  state.begin-phase (syncStateFrontmatter).

* feat(sdk): phase list, plan task structure, requirements extract handlers

- Register phase.list-plans, phase.list-artifacts, plan.task-structure,
  requirements.extract-from-plans (SDK-only; golden-policy exceptions).
- Add unit tests; document in QUERY-HANDLERS.md.
- writeProfile: honor --output, render dimensions, return profile_path and dimensions_scored.

* feat(sdk): centralize getGsdAgentsDir in query helpers

Extract agent directory resolution to helpers (GSD_AGENTS_DIR, primary
~/.claude/agents, legacy path). Use from init and docs-init init bundles.

docs(15): add 15-CONTEXT for autonomous phase-15 run.

* feat(sdk): query CLI CJS fallback and session correlation

- createRegistry(eventStream, sessionId) threads correlation into mutation events
- gsd-sdk query falls back to gsd-tools.cjs when no native handler matches
  (disable with GSD_QUERY_FALLBACK=off); stderr bridge warnings
- Export createRegistry from @gsd-build/sdk; add sdk/README.md
- Update QUERY-HANDLERS.md and registry module docs for fallback + sessionId
- Agents: prefer node dist/cli.js query over cat/grep for STATE and plans

* fix(sdk): init phase_found parity, docs-init agents path, state field extract

- Normalize findPhase not-found to null before roadmap fallback (matches findPhaseInternal)

- docs-init: use detectRuntime + resolveAgentsDir for checkAgentsInstalled

- state.cjs stateExtractField: horizontal whitespace only after colon (YAML progress guard)

- Tests: commit_docs default true; config-get golden uses temp config; golden integration green

Refs: #2302

* refactor(sdk): share SessionJsonlRecord in profile-extract-messages

CodeRabbit nit: dedupe JSONL record shape for isGenuineUserMessage and streamExtractMessages.

* fix(sdk): address CodeRabbit major threads (paths, gates, audit, verify)

- Resolve @file: and CLI JSON indirection relative to projectDir; guard empty normalized query command

- plan.task-structure + intel extract/patch-meta: resolvePathUnderProject containment

- check.config-gates: safe string booleans; plan_checker alias precedence over plan_check default

- state.validate/sync: phaseTokenMatches + comparePhaseNum ordering

- verify.schema-drift: token match phase dirs; files_modified from parsed frontmatter

- audit-open: has_scan_errors, unreadable rows, human report when scans fail

- requirements PLANNED key PLAN for root PLAN.md; gsd-tools timeout note

- ingest-docs: repo-root path containment; classifier output slug-hash

Golden parity test strips has_scan_errors until CJS adds field.

* fix: Resolve CodeRabbit security and quality findings
- Secure intel.ts and cli.ts against path traversal
- Catch and validate git add status in commit.ts
- Expand roadmap milestone marker extraction
- Fix parsing array-of-objects in frontmatter YAML
- Fix unhandled config evaluations
- Improve coverage test parity mapping

* test: raise planner character extraction limit to 48K

* fix(sdk): resolve TS build error in docs-init passing config
2026-04-20 18:09:02 -04:00
TÂCHES
c8807e38d7 Merge pull request #2481 from gsd-build/hotfix/1.37.1
chore: merge hotfix v1.37.1 back to main
2026-04-20 14:23:58 -06:00
Lex Christopherson
2b4446e2f9 chore: resolve merge conflict — take main's INVENTORY.md references
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 14:23:49 -06:00
Lex Christopherson
ef4ce7d6f9 1.37.1
2026-04-20 14:16:09 -06:00
Tom Boucher
12d38b2da0 fix(ci): update ARCHITECTURE.md counts and add TEXT_MODE fallback to sketch workflow (#2377)
* fix(tests): clear CLAUDECODE env var in read-guard test runner

The hook skips its advisory on two env vars: CLAUDE_SESSION_ID and
CLAUDECODE. runHook() cleared CLAUDE_SESSION_ID but inherited CLAUDECODE
from process.env, so tests run inside a Claude Code session silently
no-oped and produced no stdout, causing JSON.parse to throw.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): update ARCHITECTURE.md counts and add TEXT_MODE fallback to sketch workflow

Four new spike/sketch files were added in 1.37.0 but two housekeeping
items were missed: ARCHITECTURE.md component counts (75→79 commands,
72→76 workflows) and the required TEXT_MODE fallback in sketch.md for
non-Claude runtimes (#2012).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): update directory-tree slash command count in ARCHITECTURE.md

Missed the second count in the directory tree (# 75 slash commands → 79).
The prose "Total commands" was updated but the tree annotation was not,
causing command-count-sync.test.cjs to fail.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 14:12:21 -06:00
Lex Christopherson
e7a6d9ef2e fix: sync spike/sketch workflows with upstream skill improvements
Spike workflow:
- Add prior spike check — skips already-validated questions
- Add comparison spikes (NNN-a/NNN-b) for head-to-head evaluation
- Add research-before-building step (context7 + web search)
- Add forensic logging/observability for runtime-interactive spikes
- Add Type column to MANIFEST, type/Research/Observability to README

Sketch workflow:
- Add research-the-target-stack step — check component availability,
  framework constraints, and idiomatic patterns before building

Spike wrap-up workflow:
- Replace per-spike curation with auto-include-all (every spike carries
  signal: VALIDATED=patterns, PARTIAL=constraints, INVALIDATED=landmines)
- Add Step 10 intelligent routing — integration spike candidates,
  frontier spike candidates, and standard next-step options

Commands updated with context7/WebSearch tools and --text flag.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 14:05:29 -06:00
github-actions[bot]
beb3ac247b chore: bump version to 1.37.1 for hotfix
2026-04-20 20:05:07 +00:00
Lex Christopherson
a95cabaedb fix: sync spike/sketch workflows with upstream skill improvements
Spike workflow:
- Add prior spike check — skips already-validated questions
- Add comparison spikes (NNN-a/NNN-b) for head-to-head evaluation
- Add research-before-building step (context7 + web search)
- Add forensic logging/observability for runtime-interactive spikes
- Add Type column to MANIFEST, type/Research/Observability to README

Sketch workflow:
- Add research-the-target-stack step — check component availability,
  framework constraints, and idiomatic patterns before building

Spike wrap-up workflow:
- Replace per-spike curation with auto-include-all (every spike carries
  signal: VALIDATED=patterns, PARTIAL=constraints, INVALIDATED=landmines)
- Add Step 10 intelligent routing — integration spike candidates,
  frontier spike candidates, and standard next-step options

Commands updated with context7/WebSearch tools and --text flag.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 14:04:31 -06:00
Tom Boucher
9d55d531a4 fix(#2432,#2424): pre-dispatch PLAN.md commit + reapply-patches baseline detection; docs(#2397): config schema drift (#2469)
- quick.md Step 5.6: commit PLAN.md to base branch before worktree executor
  spawn when USE_WORKTREES is active, preventing CC #36182 path-resolution
  drift that caused silent writes to main repo instead of worktree
- reapply-patches.md Option A: replace first-add commit heuristic with
  pristine_hashes SHA-256 matching from backup-meta.json so baseline detection
  works correctly on multi-cycle repos; first-add fallback kept for older
  installers without pristine_hashes
- CONFIGURATION.md: move security_enforcement/security_asvs_level/security_block_on
  to workflow.* (matches templates/config.json and workflow readers); rename
  context_profile → context (matches VALID_CONFIG_KEYS in config.cjs); add
  planning.sub_repos to schema example
- universal-anti-patterns.md + context-budget.md: fix context_window_tokens →
  context_window (the actual key name in config.cjs)

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 10:11:00 -04:00
Tom Boucher
5f419c0238 fix(bugs): resolve issues #2388, #2431, #2396, #2376 (#2467)
#2388 (plan-phase silently renames feature branch): add explicit Git
Branch Invariant section to plan-phase.md prohibiting branch
creation/rename/switch during planning; phase slug changes are
plan-level only and must not affect the git branch.

#2431 (worktree teardown silently swallows errors): replace
`git worktree remove --force 2>/dev/null || true` with a lock-aware
block in quick.md and execute-phase.md that detects locked worktrees,
attempts unlock+retry, and surfaces a user-visible recovery message
when removal still fails.

#2396 (hardcoded test commands bypass Makefile): add a three-tier
test command resolver (project config → Makefile/Justfile → language
sniff) in execute-phase.md, verify-phase.md, and audit-fix.md.
Makefile with a `test:` target now takes priority over npm/cargo/go.
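
A rough sketch of the three-tier order described above, written as a pure function over an in-memory file map. The function name, the `configCommand` parameter, the file names checked, and the `just test` and `go test ./...` commands are illustrative assumptions; the workflows themselves are markdown instructions, not this code.

```javascript
// Hypothetical resolver. `files` maps a file name to its text content.
function resolveTestCommand({ configCommand, files = {} }) {
  // Tier 1: an explicit command from project config wins.
  if (configCommand) return configCommand;

  // Tier 2: a Makefile or Justfile that declares a test target.
  const makefile = files['Makefile'];
  if (makefile !== undefined && /^test:/m.test(makefile)) return 'make test';
  const justfile = files['Justfile'] ?? files['justfile'];
  if (justfile !== undefined && /^test\b.*:/m.test(justfile)) return 'just test';

  // Tier 3: sniff the language from well-known manifest files.
  if ('package.json' in files) return 'npm test';
  if ('Cargo.toml' in files) return 'cargo test';
  if ('go.mod' in files) return 'go test ./...';
  return null; // nothing recognized; the caller decides what to do
}
```

With both a Makefile containing a `test:` target and a package.json present, this returns `make test`, which is the priority the fix describes.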

#2376 (OpenCode @$HOME not mapped on Windows): add platform guard in
bin/install.js so OpenCode on win32 uses the absolute path instead of
`$HOME/...`, which OpenCode does not expand in @file references on
Windows.

Tests: 29 new assertions across 4 regression test files (all passing).

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 10:10:16 -04:00
Tom Boucher
dfa1ecce99 fix(#2418,#2399,#2419,#2421): four workflow and installer bug fixes (#2462)
- #2418: convertClaudeToAntigravityContent now replaces bare ~/.claude and
  $HOME/.claude (no trailing slash) for both global and local installs,
  eliminating the "unreplaced .claude path reference" warnings in
  gsd-debugger.md and update.md during Antigravity installs.

- #2399: plan-phase workflow gains step 13c that commits PLAN.md files
  and STATE.md via gsd-sdk query commit when commit_docs is true.
  Previously commit_docs:true was read but never acted on in plan-phase.

- #2419: new-project.md and new-milestone.md now parse agents_installed
  and missing_agents from the init JSON and warn users clearly when GSD
  agents are not installed, rather than silently failing with "agent type
  not found" when trying to spawn gsd-project-researcher subagents.

- #2421: gsd-planner.md gains a "Grep gate hygiene" rule immediately after
  the Nyquist Rule explaining the self-invalidating grep gate anti-pattern
  and providing comment-stripping alternatives (grep -v, ast-grep).

Tests: 4 new test files (30 tests) all passing.

Closes #2418
Closes #2399
Closes #2419
Closes #2421

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 10:09:33 -04:00
Tom Boucher
4cd890b252 fix(phase): guard backlog dirs and YYYY-MM dates in integer phase removal (#2466)
* fix(phase): guard backlog dirs and YYYY-MM dates in integer phase removal

Closes #2435
Closes #2434

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(phase): extend date-collision guard to hyphen-adjacent context

The lookbehind `(?<!\d)` in renameIntegerPhases only excluded
digit-prefixed matches; a YYYY-MM-DD date like 2026-05-14 has a hyphen
before the month digits, which passed the original guard and caused
date corruption when renumbering a phase whose zero-padded number
matched the month. Replace with `(?<![0-9-])` lookbehind and
`(?![0-9-])` lookahead to exclude both digit- and hyphen-adjacent
contexts. Adds a regression test for the hyphen-adjacent case.
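
The difference between the two guards can be reproduced in isolation. The sketch below is not renameIntegerPhases itself: the function names are hypothetical and the phase token is assumed to be digits only. Only the two lookaround patterns are taken from the message above.

```javascript
// Original guard: only a preceding digit blocks the match.
function renumberWithDigitGuard(text, oldNum, newNum) {
  return text.replace(new RegExp(`(?<!\\d)${oldNum}`, 'g'), newNum);
}

// Extended guard: a digit or hyphen on either side blocks the match, so the
// month in a YYYY-MM-DD date is left alone.
function renumberWithHyphenGuard(text, oldNum, newNum) {
  const pattern = new RegExp(`(?<![0-9-])${oldNum}(?![0-9-])`, 'g');
  return text.replace(pattern, newNum);
}

const sample = 'Phase 05 due 2026-05-14';
console.log(renumberWithDigitGuard(sample, '05', '06'));  // date is corrupted
console.log(renumberWithHyphenGuard(sample, '05', '06')); // date is preserved
```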

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 10:08:52 -04:00
Tom Boucher
d117c1045a test: add --no-sdk to copilot-install E2E runners + static guard (#2461) (#2463)
Four execFileSync installer calls in copilot-install.test.cjs deleted
GSD_TEST_MODE but omitted --no-sdk, triggering the fatal installSdkIfNeeded()
path in test.yml CI where npm global bin is not on PATH.

Partial fix in e213ce0 patched three hook-deployment tests but missed
runCopilotInstall, runCopilotUninstall, runClaudeInstall, runClaudeUninstall.

Also adds tests/sdk-no-sdk-guard.test.cjs: a static analysis guard that
scans test files for subprocess installer calls missing --no-sdk, so this
class of regression is caught automatically in future.

Closes #2461

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 10:08:49 -04:00
Tom Boucher
0ea443cbcf fix(install): chmod sdk dist/cli.js executable; fix context monitor over-reporting (#2460)
Bug #2453: After tsc builds sdk/dist/cli.js, npm install -g from a local
directory does not chmod the bin-script target (unlike tarball extraction).
The file lands at mode 644, the gsd-sdk symlink points at a non-executable
file, and command -v gsd-sdk fails on every first install. Fix: explicitly
chmodSync(cliPath, 0o755) immediately after npm install -g completes,
mirroring the pattern used for hook files throughout the installer.
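
A minimal sketch of the chmod step, assuming a POSIX filesystem (mode bits behave differently on Windows). The helper name `ensureExecutable` is hypothetical; only `chmodSync(cliPath, 0o755)` comes from the message above.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: force rwxr-xr-x on a bin-script target and report
// whether any execute bit is set afterwards.
function ensureExecutable(filePath) {
  fs.chmodSync(filePath, 0o755);
  return (fs.statSync(filePath).mode & 0o111) !== 0;
}

// Usage: simulate a build output that landed at mode 644.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'chmod-sketch-'));
const demoCli = path.join(demoDir, 'cli.js');
fs.writeFileSync(demoCli, '#!/usr/bin/env node\n', { mode: 0o644 });
console.log(ensureExecutable(demoCli));
```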

Bug #2451: gsd-context-monitor warning messages over-reported usage by ~13
percentage points vs CC native /context. Root cause: gsd-statusline.js
wrote a buffer-normalized used_pct (accounting for the 16.5% autocompact
reserve) to the bridge file, inflating values. The bridge used_pct is now
raw (Math.round(100 - remaining_percentage)), consistent with what CC's
native /context command reports. The statusline progress bar continues to
display the normalized value; only the bridge value changes. Updated the
existing #2219 tests to check the normalized display via hook stdout rather
than bridge.used_pct, and added a new assertion that bridge.used_pct is raw.
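
The gap between the two values can be illustrated with a short sketch. The raw formula is the one quoted above. The normalized formula is an assumption (used share divided by the usable window after subtracting the 16.5% reserve) and may not match gsd-statusline.js exactly; the function names are hypothetical.

```javascript
const AUTOCOMPACT_RESERVE_PCT = 16.5; // reserve figure from the commit message

// Raw value, as now written to the bridge file.
function rawUsedPct(remainingPercentage) {
  return Math.round(100 - remainingPercentage);
}

// Assumed normalization for the statusline display: usage measured against
// the window that is left after the autocompact reserve.
function normalizedUsedPct(remainingPercentage) {
  const usable = 100 - AUTOCOMPACT_RESERVE_PCT;
  const used = 100 - remainingPercentage;
  return Math.min(100, Math.round((used / usable) * 100));
}

console.log(rawUsedPct(35), normalizedUsedPct(35));
```

Under this assumed formula, 35% remaining gives 65 raw and 78 normalized, a 13-point gap of the kind the bug report describes.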

Closes #2453
Closes #2451

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 10:08:46 -04:00
Tom Boucher
53b9fba324 fix: stale phase dirs corrupt phase counts; stopped_at overwritten by historical prose (#2459)
* fix(sdk): extractCurrentMilestone Backlog leak + state.begin-phase flag parsing

Closes #2422
Closes #2420

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(#2444,#2445): scope stopped_at extraction to Session section; filter stale phase dirs

- buildStateFrontmatter now extracts stopped_at only from the ## Session
  section when one exists, preventing historical prose elsewhere in the
  body (e.g. "Stopped at: Phase 5 complete" in old notes) from overwriting
  the current value in frontmatter (bug #2444)
- buildStateFrontmatter de-duplicates phase dirs by normalized phase number
  before computing plan/phase counts, so stale phase dirs from a prior
  milestone with the same phase numbers as the new milestone don't inflate
  totals (bug #2445)
- cmdInitNewMilestone now filters phase dirs through getMilestonePhaseFilter
  so phase_dir_count excludes stale prior-milestone dirs (bug #2445)
- Tests: 4 tests in state.test.cjs covering both bugs
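
The section-scoped extraction from the first bullet might look roughly like this. It is a sketch, not buildStateFrontmatter: the function name and the exact `Stopped at:` line format are assumptions based on the example quoted above.

```javascript
// Hypothetical helper: read "Stopped at:" from the "## Session" section when
// one exists, and only fall back to the whole body when it does not.
function extractStoppedAt(body) {
  const lines = body.split('\n');
  const start = lines.findIndex((line) => /^##\s+Session\b/.test(line));

  let scope = lines;
  if (start !== -1) {
    const rest = lines.slice(start + 1);
    // The section ends at the next level-2 heading, if any.
    const end = rest.findIndex((line) => /^##\s/.test(line));
    scope = end === -1 ? rest : rest.slice(0, end);
  }

  for (const line of scope) {
    const match = line.match(/Stopped at:\**\s*(.+)$/i);
    if (match) return match[1].trim();
  }
  return null;
}
```

Historical prose such as "Stopped at: Phase 5 complete" in an older notes section is ignored as long as a Session section is present.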

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 10:08:43 -04:00
Tom Boucher
5afcd5577e fix: zero-padded phase numbers bypass archived-phase guard; stale current_milestone (#2458)
* fix(sdk): extractCurrentMilestone Backlog leak + state.begin-phase flag parsing

Closes #2422
Closes #2420

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(sdk): skip stateVersion early-return for shipped milestones

When STATE.md has a stale `milestone: v1.0` entry but v1.0 is already
shipped (its ROADMAP.md heading contains the shipped-marker emoji), the
stateVersion early-return path in getMilestoneInfo was returning v1.0
instead of detecting the new active milestone.

Two-part fix:
1. In the stateVersion block: skip the early-return when the matched
   heading line includes the shipped marker. Fall through to normal
   detection instead.
2. In the heading-format fallback regex: add a negative lookahead
   `(?!.*<marker>)` so the regex never matches a shipped heading
   regardless of whether stateVersion was present. This handles the
   no-STATE.md case and ensures fallthrough from part 1 actually finds
   the next milestone.

Adds two regression tests covering both marker-suffix
(`## v1.0 <marker> Name`) and marker-prefix (`## <marker> v1.0 Name`)
heading formats, where `<marker>` stands for the shipped-marker emoji.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(core): allow padded-and-unpadded phase headings in getRoadmapPhaseInternal

The zero-strip normalization (01→1) fixed the archived-phase guard but
broke lookup against ROADMAP headings that still use zero-padded numbers
like "Phase 01:". Change the regex to use 0*<normalized> so both formats
match, making the fix robust regardless of ROADMAP heading style.
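
A small sketch of the `0*<normalized>` idea, under the assumption that phase headings look like `## Phase 01: Title`. The function names and the heading prefix are hypothetical; the real pattern in getRoadmapPhaseInternal may differ.

```javascript
// Escape regex metacharacters so decimal phases like "2.1" match literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical builder: strip leading zeros from the requested phase, then
// allow any number of leading zeros in the heading.
function phaseHeadingRegex(phaseNum) {
  const normalized = String(phaseNum).replace(/^0+(?=\d)/, '');
  return new RegExp(`^#{2,4}\\s*Phase\\s+0*${escapeRegExp(normalized)}:`, 'mi');
}

const re = phaseHeadingRegex('01');
console.log(re.test('## Phase 01: Setup'), re.test('## Phase 1: Setup'));
```

The trailing colon keeps a lookup for phase 1 from matching phase 10 or 11.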

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 10:08:40 -04:00
Tom Boucher
9f79cdc40a fix(security): neutralize spaced+closing injection markers; fix audit-uat resolved status (#2456)
* fix(security): neutralize spaced+closing injection markers; fix audit-uat resolved status

scanForInjection now recognizes <user> tags, whitespace-padded tags
(e.g. <user >), closing [/SYSTEM]/[/INST] markers, and closing <</SYS>>
markers. Five new regression tests confirm each gap is closed.

The audit-uat fix skips items whose result column reads PASS or resolved,
so items that were already confirmed do not appear as outstanding in
audit-uat --raw. Two new regression tests cover item-level PASS and
file-level status: passed.
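
The injection-marker shapes listed above (spaced tags, closing bracket markers, closing `<</SYS>>`) can be matched with patterns along these lines. This is a hedged sketch, not the project's scanner or sanitizer: the function name, the pattern list, and the `[removed-marker]` replacement text are all assumptions.

```javascript
// Hypothetical patterns: each tolerates optional whitespace and an optional
// closing slash inside the delimiters.
const MARKER_PATTERNS = [
  /<<\s*\/?\s*SYS\s*>>/gi,              // <<SYS>> and <</SYS>>
  /\[\s*\/?\s*(?:SYSTEM|INST)\s*\]/gi,  // [SYSTEM], [/SYSTEM], [INST], [/INST]
  /<\s*\/?\s*user\s*>/gi,               // <user>, <user >, </user >
];

function neutralizeMarkers(text) {
  return MARKER_PATTERNS.reduce(
    (out, pattern) => out.replace(pattern, '[removed-marker]'),
    text
  );
}

console.log(neutralizeMarkers('<user >hi</user > [/INST] <</SYS>>'));
```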

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: add closing-tag assertion for spaced <user > sanitization

The test for 'neutralizes spaced tags like <user >' only asserted that the
opening token '<user' was removed. A spaced closing tag '</user >' could
survive sanitization undetected. Added assert.ok(!result.includes('</user'))
to the same test block so both sides of the tag are verified.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 10:08:18 -04:00
Tom Boucher
59cfbbba6a fix(sdk): extractCurrentMilestone Backlog leak + state.begin-phase flag parsing (#2455)
* fix(sdk): extractCurrentMilestone Backlog leak + state.begin-phase flag parsing

Closes #2422
Closes #2420

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: patch-version semver in milestone boundary regex + flag-parser validation

Two follow-on correctness issues identified in code review:

1. roadmap.ts: currentVersionMatch and nextMilestoneRegex only captured
   major.minor (v(\d+\.\d+)), collapsing v2.0.1 to "2.0". A sub-heading
   "## v2.0.2 Phase Details" would match the same prefix and be incorrectly
   skipped. Both patterns updated to v(\d+(?:\.\d+)+) to capture full semver.

2. state-mutation.ts: pair-wise flag parsing loop advanced i by 2 unconditionally,
   so a missing flag value caused the next flag token to be assigned as the value
   (e.g. flags['phase'] = '--name'). Fix: iterate with i++ and validate that the
   candidate value exists and does not start with '--' before assigning; throw
   GSDError('missing value for --<key>') on invalid input. Added regression test.
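
Both fixes above can be sketched in a few lines. The two version patterns are quoted from the message; `parseFlags` is a hypothetical stand-in for the loop in state-mutation.ts, and a plain `Error` is used here in place of GSDError.

```javascript
// Fix 1: major.minor only versus full dotted version.
const TWO_PART_VERSION = /v(\d+\.\d+)/;
const FULL_VERSION = /v(\d+(?:\.\d+)+)/;

console.log('v2.0.1'.match(TWO_PART_VERSION)[1]); // collapses to major.minor
console.log('v2.0.1'.match(FULL_VERSION)[1]);     // keeps the patch component

// Fix 2: advance one token at a time and validate the candidate value
// before assigning it, instead of stepping by 2 unconditionally.
function parseFlags(args) {
  const flags = {};
  for (let i = 0; i < args.length; i++) {
    const token = args[i];
    if (!token.startsWith('--')) continue;
    const key = token.slice(2);
    const value = args[i + 1];
    if (value === undefined || value.startsWith('--')) {
      throw new Error(`missing value for --${key}`);
    }
    flags[key] = value;
    i++; // skip the value that was just consumed
  }
  return flags;
}
```

With the old pair-wise stepping, `['--phase', '--name', 'x']` would have produced `flags.phase === '--name'`; here it throws instead.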

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 10:08:14 -04:00
Tom Boucher
990c3e648d fix(tests): update 5 source-text tests to read config-schema.cjs (#2480)
VALID_CONFIG_KEYS moved from config.cjs to config-schema.cjs in the
drift-prevention companion PR. Tests that read config.cjs source text
and checked for key literal includes() now point to the correct file.

Closes #2480

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 09:54:35 -04:00
Tom Boucher
62eaa8dd7b docs: close doc drift vectors — bidirectional parity, manifest, schema-driven config (#2479)
Option A — ghost-entry guard (INVENTORY ⊆ actual):
  tests/inventory-source-parity.test.cjs parses every declared row in
  INVENTORY.md and asserts the source file exists. Catches deletions and
  renames that leave ghost entries behind.

Option B — auto-generated structural manifest:
  scripts/gen-inventory-manifest.cjs walks all six family dirs and emits
  docs/INVENTORY-MANIFEST.json. tests/inventory-manifest-sync.test.cjs
  fails CI when a new surface ships without a manifest update, surfacing
  exactly which entries are missing.

Option C — schema-driven config validation + docs parity:
  get-shit-done/bin/lib/config-schema.cjs extracted from config.cjs as
  the single source of truth for VALID_CONFIG_KEYS and dynamic patterns.
  config.cjs now imports from it. tests/config-schema-docs-parity.test.cjs
  asserts every exact-match key appears in docs/CONFIGURATION.md, surfacing
  14 previously undocumented keys (planning.sub_repos, workflow.ai_integration_phase,
  git.base_branch, learnings.max_inject, and 10 others) — all now documented
  in their appropriate sections.
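
The core of the Option C parity check can be approximated as below. This is an illustrative sketch rather than tests/config-schema-docs-parity.test.cjs: the function names are hypothetical, and the token-boundary rule is an assumption chosen so that `workflow.tdd` is not satisfied by a mention of `workflow.tdd_mode`.

```javascript
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical check: return every schema key that never appears in the
// docs text as a standalone dotted token.
function findUndocumentedKeys(validKeys, docsText) {
  return [...validKeys]
    .filter((key) => {
      const token = new RegExp(`(?<![\\w.])${escapeRegExp(key)}(?!\\w|\\.\\w)`);
      return !token.test(docsText);
    })
    .sort();
}

const docs = 'Set `git.base_branch` to pick the base. See also `workflow.tdd_mode`.';
console.log(findUndocumentedKeys(['git.base_branch', 'planning.sub_repos'], docs));
```

A test built on this would assert that the returned list is empty and print the missing keys on failure.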

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 09:39:05 -04:00
Logan
fbf30792f3 docs: authoritative shipped-surface inventory with filesystem-backed parity tests (#2390)
* docs: finish trust-bug fixes in user guide and commands

Correct load-bearing defects in the v1.36.0 docs corpus so readers stop
acting on wrong defaults and stale exhaustiveness claims.

- README.md: drop "Complete feature"/"Every command"/"All 18 agents"
  exhaustiveness claims; replace version-pinned "What's new in v1.32"
  bullet with a CHANGELOG pointer.
- CONFIGURATION.md: fix `claude_md_path` default (null/none -> `./CLAUDE.md`)
  in both Full Schema and core settings table; correct `workflow.tdd_mode`
  provenance from "Added in v1.37" to "Added in v1.36".
- USER-GUIDE.md: fix `workflow.discuss_mode` default (`standard` ->
  `discuss`) in the workflow-toggles table AND in the abbreviated Full
  Schema JSON block above it; align the Options cell with the shipped
  enum.
- COMMANDS.md: drop "Complete command syntax" subtitle overclaim to
  match the README posture.
- AGENTS.md: weaken "All 21 specialized agents" header to reflect that
  the `agents/` filesystem is authoritative (shipped roster is 31).

Part 1 of a stacked docs refresh series (PR 1/4).

* docs: refresh shipped surface coverage for v1.36

Close the v1.36.0 shipped-surface gaps in the docs corpus.

- COMMANDS.md: add /gsd-graphify section (build/query/status/diff) and
  its config gate; expand /gsd-quick with --validate flag and list/
  status/resume subcommands; expand /gsd-thread with list --open, list
  --resolved, close <slug>, status <slug>.
- CLI-TOOLS.md: replace the hardcoded "15 domain modules" count with a
  pointer to the Module Architecture table; add a graphify verb-family
  section (build/query/status/diff/snapshot); add Graphify and Learnings
  rows to the Module Architecture table.
- FEATURES.md: add TOC entries for #116 TDD Pipeline Mode and #117
  Knowledge Graph Integration; add the #117 body with REQ-GRAPH-01..05.
- CONFIGURATION.md: move security_enforcement / security_asvs_level /
  security_block_on from root into `workflow.*` in Full Schema to match
  templates/config.json and the gsd-sdk runtime reads; update Security
  Settings table to use the workflow.* prefix; add planning.sub_repos
  to Full Schema and description table; add a Graphify Settings section
  documenting graphify.enabled and graphify.build_timeout.

Note: VALID_CONFIG_KEYS in bin/lib/config.cjs does not yet include
workflow.security_* or planning.sub_repos, so config-set currently
rejects them. That is a pre-existing validator gap that this PR does
not attempt to fix; the docs now correctly describe where these keys
live per the shipped template and runtime reads.

Part 2 of a stacked docs refresh series (PR 2/5), based on PR 1.

* docs: make inventory authoritative and reconcile architecture

Upgrade docs/INVENTORY.md from "complete for agents, selective for others"
to authoritative across all six shipped-surface families, and reconcile
docs/ARCHITECTURE.md against the new inventory so the PR that introduces
INVENTORY does not also introduce an INVENTORY/ARCHITECTURE contradiction.

- docs/AGENTS.md: weaken "21 specialized agents" header to 21 primary +
  10 advanced (31 shipped); add new "Advanced and Specialized Agents"
  section with concise role cards for the 10 previously-omitted shipped
  agents (pattern-mapper, debug-session-manager, code-reviewer,
  code-fixer, ai-researcher, domain-researcher, eval-planner,
  eval-auditor, framework-selector, intel-updater); footnote the Agent
  Tool Permissions Summary as primary-agents-only so it no longer
  misleads.

- docs/INVENTORY.md (rewritten to be authoritative):
  * Full 31-agent roster with one-line role + spawner + primary-doc
    status per agent (unchanged from prior partial work).
  * Commands: full 75-row enumeration grouped by Core Workflow, Phase &
    Milestone Management, Session & Navigation, Codebase Intelligence,
    Review/Debug/Recovery, and Docs/Profile/Utilities — each row
    carries a one-line role derived from the command's frontmatter and
    a link to the source file.
  * Workflows: full 72-row enumeration covering every
    get-shit-done/workflows/*.md, with a one-line role per workflow and
    a column naming the user-facing command (or internal orchestrator)
    that invokes it.
  * References: full 41-row enumeration grouped by Core, Workflow,
    Thinking-Model clusters, and the Modular Planner decomposition,
    matching the groupings docs/ARCHITECTURE.md already uses; notes
    the few-shot-examples subdirectory separately.
  * CLI Modules and Hooks: unchanged — already full rosters.
  * Maintenance section rewritten to describe the drift-guard test
    suite that will land in PR4 (inventory-counts, commands-doc-parity,
    agents-doc-parity, cli-modules-doc-parity, hooks-doc-parity).

- docs/ARCHITECTURE.md reconciled against INVENTORY:
  * References block: drop the stale "(35 total)" count; point at
    INVENTORY.md#references-41-shipped for the authoritative count.
  * CLI Tools block: drop the stale "19 domain modules" count; point
    at INVENTORY.md#cli-modules-24-shipped for the authoritative roster.
  * Agent Spawn Categories: relabel as "Primary Agent Spawn Categories"
    and add a footer naming the 10 advanced agents and pointing at
    INVENTORY.md#agents-31-shipped for the full 31-agent roster.

- docs/CONFIGURATION.md: preserve the six model-profile rows added in
  the prior partial work, and tighten the fallback note so it names the
  13 shipped agents without an explicit profile row, documents
  model_overrides as the escape hatch, and points at INVENTORY.md for
  the authoritative 31-agent roster.

Part 3 of a stacked docs refresh series (PR 3/4). Remaining consistency
work (USER-GUIDE config-section delete-and-link, FEATURES.md TOC
reorder, ARCHITECTURE.md Hook-table expansion + installation-layout
collapse, CLI-TOOLS.md module-row additions, workflow-discuss-mode
invocation normalization, and the five doc-parity tests) lands in PR4.

* test(docs): add consistency guards and remove duplicate refs

Consolidates and normalizes the docs:

- USER-GUIDE.md: replace the command/config duplicates with pointers to
  COMMANDS.md and CONFIGURATION.md (kills a ghost `resolve_model_ids` key
  and a stale `discuss_mode: standard` default).
- FEATURES.md: reorder the TOC chronologically so v1.32 precedes
  v1.34/1.35/1.36.
- ARCHITECTURE.md: expand the Hook table to the 11 shipped hooks
  (including gsd-read-injection-scanner and gsd-check-update-worker) and
  collapse the installation-layout hook enumeration to the *.js/*.sh
  pattern form.
- CLI-TOOLS.md: add audit/gsd2-import/intel rows and the state signal-*,
  audit-open, and from-gsd2 verbs.
- workflow-discuss-mode.md: normalize invocations to
  `node gsd-tools.cjs config-set`.

Adds five drift guards anchored on docs/INVENTORY.md as the
authoritative roster: inventory-counts (all six families), plus
commands/agents/cli-modules/hooks parity checks verifying that every
shipped surface has a row somewhere.

* fix(convergence): thread --ws to review agent; add stall and max-cycles behavioral tests

- Thread GSD_WS through to review agent spawn in plan-review-convergence
  workflow (step 5a) so --ws scoping is symmetric with planning step
- Add behavioral stall detection test: asserts workflow compares
  HIGH_COUNT >= prev_high_count and emits a stall warning
- Add behavioral --max-cycles 1 test: asserts workflow reaches escalation
  gate when cycle >= MAX_CYCLES with HIGH > 0 after a single cycle
- Include original PR files (commands, workflow, tests) as the branch
  predated the PR commits
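
As a rough illustration of the two gates these tests assert, the decision can be modeled as below. This is a sketch only: the real logic lives in the plan-review-convergence workflow file, and `nextAction` with its field names is hypothetical.

```javascript
// Hypothetical model of the convergence gate; names are illustrative only.
function nextAction({ cycle, maxCycles, highCount, prevHighCount }) {
  if (highCount === 0) return { action: 'converged', stalled: false };
  // Stall: the HIGH count did not drop relative to the previous cycle.
  // prevHighCount is null on the first cycle, so no stall is reported then.
  const stalled = prevHighCount !== null && highCount >= prevHighCount;
  // Escalate once the cycle budget is spent while HIGH findings remain.
  if (cycle >= maxCycles) return { action: 'escalate', stalled };
  return { action: 'replan', stalled };
}

// With --max-cycles 1, a single cycle that leaves HIGH > 0 reaches the escalation gate.
console.log(nextAction({ cycle: 1, maxCycles: 1, highCount: 2, prevHighCount: null }));
```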

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(docs,config): PR #2390 review — security_* config keys and REQ-GRAPH-02 scope

Addresses trek-e's review items that don't require rebase:

- config.cjs: add workflow.security_enforcement, workflow.security_asvs_level,
  workflow.security_block_on to VALID_CONFIG_KEYS so gsd-sdk config-set accepts
  them (closed the gap where docs/CONFIGURATION.md listed keys the validator
  rejected).
- core.cjs: add matching CONFIG_DEFAULTS entries (true / 1 / 'high') so the
  canonical defaults table matches the documented values.
- config.cjs: wire the three keys into the new-project workflow defaults so
  fresh configs inherit them.
- planning-config.md: document the three keys in the Workflow Fields table,
  keeping the CONFIG_DEFAULTS ↔ doc parity test happy.
- config-field-docs.test.cjs: extend NAMESPACE_MAP so the flat keys in
  CONFIG_DEFAULTS resolve to their workflow.* doc rows.
- FEATURES.md REQ-GRAPH-02: split the slash-command surface (build|query|
  status|diff) from the CLI surface which additionally exposes `snapshot`
  (invoked automatically at the tail of `graphify build`). The prior text
  overstated the slash-command surface.
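
The parity idea behind the config changes above could look roughly like this. It is a sketch under assumptions: the real `VALID_CONFIG_KEYS`, `CONFIG_DEFAULTS`, and `NAMESPACE_MAP` hold many more entries, their exact shapes are not shown in this commit message, and `unresolvedDefaults` is a hypothetical helper.

```javascript
// Assumed shapes; only the three keys named in this commit are listed.
const VALID_CONFIG_KEYS = new Set([
  'workflow.security_enforcement',
  'workflow.security_asvs_level',
  'workflow.security_block_on',
]);

// Defaults as stated in the commit message: true / 1 / 'high'.
const CONFIG_DEFAULTS = {
  security_enforcement: true,
  security_asvs_level: 1,
  security_block_on: 'high',
};

// Maps flat default keys to their namespaced workflow.* rows.
const NAMESPACE_MAP = {
  security_enforcement: 'workflow.security_enforcement',
  security_asvs_level: 'workflow.security_asvs_level',
  security_block_on: 'workflow.security_block_on',
};

// Returns every default whose (namespaced) key the validator would reject.
function unresolvedDefaults(defaults, namespaceMap, validKeys) {
  return Object.keys(defaults).filter((key) => !validKeys.has(namespaceMap[key] ?? key));
}

console.log(unresolvedDefaults(CONFIG_DEFAULTS, NAMESPACE_MAP, VALID_CONFIG_KEYS)); // → []
```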

* docs(inventory): refresh rosters and counts for post-rebase drift

origin/main accumulated surfaces since this PR was authored:

- Agents: 31 → 33 (+ gsd-doc-classifier, gsd-doc-synthesizer)
- Commands: 76 → 82 (+ ingest-docs, ultraplan-phase, spike, spike-wrap-up,
  sketch, sketch-wrap-up)
- Workflows: 73 → 79 (same 6 names)
- References: 41 → 49 (+ debugger-philosophy, doc-conflict-engine,
  mandatory-initial-read, project-skills-discovery, sketch-interactivity,
  sketch-theme-system, sketch-tooling, sketch-variant-patterns)

Adds rows in the existing sub-groupings, introduces a Sketch References
subsection, and bumps all four headline counts. Roles are pulled from
source frontmatter / purpose blocks for each file. All 5 parity tests
(inventory-counts, agents-doc-parity, commands-doc-parity,
cli-modules-doc-parity, hooks-doc-parity) pass against this state —
156 assertions, 0 failures.

Also updates the 'Coverage note' advanced-agent count 10 → 12 and the
few-shot-examples footnote "41 top-level references" → "49" to keep the
file internally consistent.

* docs(agents): add advanced stubs for gsd-doc-classifier and gsd-doc-synthesizer

Both agents ship on main (spawned by /gsd-ingest-docs) but had no
coverage in docs/AGENTS.md. Adds the "advanced stub" entries (Role,
property table, Key behaviors) following the template used by the other
10 advanced/specialized agents in the same section.

Also updates the Agent Tool Permissions Summary scope note from
"10 advanced/specialized agents" to 12 to reflect the two new stubs.

* docs(commands): add entries for ingest-docs, ultraplan-phase, plan-review-convergence

These three commands ship on main (plan-review-convergence via trek-e's
4b452d29 commit on this branch) but had no user-facing section in
docs/COMMANDS.md — they lived only in INVENTORY.md. The commands-doc-parity
test already passes via INVENTORY, but the user-facing doc was missing
canonical explanations, argument tables, and examples.

- /gsd-plan-review-convergence → Core Workflow (after /gsd-plan-phase)
- /gsd-ultraplan-phase → Core Workflow (after plan-review-convergence)
- /gsd-ingest-docs → Brownfield (after /gsd-import, since both consume
  the references/doc-conflict-engine.md contract)

Content pulled from each command's frontmatter and workflow purpose block.

* test: remove redundant ARCHITECTURE.md count tests

tests/architecture-counts.test.cjs and tests/command-count-sync.test.cjs
were added when docs/ARCHITECTURE.md carried hardcoded counts for commands/
workflows/agents. With the PR #2390 cleanup, ARCHITECTURE.md no longer
owns those numbers — docs/INVENTORY.md does, enforced by
tests/inventory-counts.test.cjs (scans the same filesystem directories
with the same readdirSync filter).

Keeping these ARCHITECTURE-specific tests would re-introduce the hardcoded
counts they guard, defeating trek-e's review point. The single-source-of-
truth parity tests already catch the same drift scenarios.

Related: #2257 (the regression this replaced).

---------

Co-authored-by: Tom Boucher <trekkie@nomorestars.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 09:31:34 -04:00
alanshurafa
3d6c2bea4b docs: clarify capture_thought is an optional convention (#1873) (#2379)
* docs: clarify capture_thought is an optional convention (#1873)

Issue #1873 merged /gsd:extract-learnings with an optional
capture_thought hook, but the docs never explained what the tool is
or where it comes from — readers couldn't tell whether it was a
bundled GSD tool, a required dependency, or something they had to
install. This surfaced in a user question on that issue's thread.

Clarify in docs/FEATURES.md §112 and the workflow file that
capture_thought is a convention — any MCP server exposing a tool
with that name will be used; if none is present, LEARNINGS.md
remains the primary output and the step is a silent no-op.

No behavioral change. All 23 extract-learnings tests still pass.

* fix(security): add human to detection message; test [/INST] closing form neutralization

- Detection message now lists <human> alongside <system>/<assistant>/<user>
- Sanitizer regex extended to cover [/INST] closing form (was only [INST])
- Detection pattern extended to cover [/INST] closing form
- New sanitizeForPrompt test asserts [/INST] is neutralized
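
A minimal sketch of covering both the opening and closing marker with one pattern follows. The replacement text and the helper name `neutralizeInstMarkers` are assumptions; the real `sanitizeForPrompt` handles more patterns and may rewrite the markers differently.

```javascript
// Matches "[INST]" and "[/INST]" (case-insensitive); the optional slash is captured.
const INST_MARKER = /\[(\/?)INST\]/gi;

// Hypothetical neutralizer: swaps the brackets for parentheses so the
// marker no longer reads as an instruction delimiter.
function neutralizeInstMarkers(text) {
  return text.replace(INST_MARKER, (_match, slash) => `(${slash}INST)`);
}

console.log(neutralizeInstMarkers('a [INST] b [/INST]')); // → a (INST) b (/INST)
```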

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(config): add workflow.security_* keys to VALID_CONFIG_KEYS

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: add language tag to fenced code block in FEATURES.md

Fixes MD040 lint finding in PR #2379 — the capture_thought tool
signature example was missing a javascript language identifier.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Tom Boucher <trekkie@nomorestars.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 09:04:21 -04:00
Jeremy McSpadden
30433368a0 fix(install): template bare .claude hook paths for non-Claude runtimes 2026-04-19 18:42:30 -05:00
Jeremy McSpadden
04fab926b5 test: add --no-sdk to hook-deployment installer tests
Tests #1834, #1924, #2136 exercise hook/artifact deployment and don't
care about SDK install. Now that installSdkIfNeeded() failures are
fatal, these tests fail on any CI runner without gsd-sdk pre-built
because the sdk/ tsc build path runs and can fail in CI env.

Pass --no-sdk so each test focuses on its actual subject. SDK install
path has dedicated end-to-end coverage in install-smoke.yml.
2026-04-19 18:39:32 -05:00
Jeremy McSpadden
f98ef1e460 fix(install): fatal SDK install failures + CI smoke gate (#2439)
## Why
#2386 added `installSdkIfNeeded()` to build @gsd-build/sdk from bundled
source and `npm install -g .`, because the npm-published @gsd-build/sdk
is intentionally frozen and version-mismatched with get-shit-done-cc.

But every failure path in that function was warning-only — including
the final `which gsd-sdk` verification. When npm's global bin is off a
user's PATH (common on macOS), the installer printed a yellow warning
then exited 0. Users saw "install complete" and then every `/gsd-*`
command crashed with `command not found: gsd-sdk` (the #2439 symptom).

No CI job executed the install path, so this class of regression could
ship undetected — existing "install" tests only read bin/install.js as
a string.

## What changed

**bin/install.js — installSdkIfNeeded() is now transactional**
- All build/install failures exit non-zero (not just warn).
- Post-install `which gsd-sdk` check is fatal: if the binary landed
  globally but is off PATH, we exit 1 with a red banner showing the
  resolved npm bin dir, the user's shell, the target rc file, and the
  exact `export PATH=…` line to add.
- Escape hatch: `GSD_ALLOW_OFF_PATH=1` downgrades off-PATH to exit 2
  for users with intentionally restricted PATH who will wire up the
  binary manually.
- Resolver uses POSIX `command -v` via `sh -c` (replaces `which`) so
  behavior is consistent across sh/bash/zsh/fish.
- Factored `resolveGsdSdk()`, `detectShellRc()`, `emitSdkFatal()`.
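
For readers who want the shape of the resolver and exit policy, here is a small POSIX-only sketch. `resolveOnPath` and `exitCodeFor` are hypothetical stand-ins, not the factored functions named above; the exit codes follow this commit message (0 when resolved, 2 with `GSD_ALLOW_OFF_PATH=1`, otherwise 1).

```javascript
const { spawnSync } = require('node:child_process');

// Hypothetical stand-in for the resolver: look a binary up with POSIX `command -v`.
function resolveOnPath(binary) {
  // The name is interpolated into a shell command, so accept plain command names only.
  if (!/^[A-Za-z0-9._-]+$/.test(binary)) return null;
  const result = spawnSync('sh', ['-c', `command -v ${binary}`], { encoding: 'utf8' });
  if (result.status !== 0) return null; // not found, or `sh` itself could not be spawned
  const resolved = result.stdout.trim();
  return resolved === '' ? null : resolved;
}

// Assumed exit policy: success, downgraded off-PATH exit, or fatal.
function exitCodeFor(resolved, env) {
  if (resolved) return 0;
  return env.GSD_ALLOW_OFF_PATH === '1' ? 2 : 1;
}

const found = resolveOnPath('gsd-sdk');
console.log(found ?? 'gsd-sdk is not on PATH', exitCodeFor(found, process.env));
```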

**.github/workflows/install-smoke.yml (new)**
- Executes the real install path: `npm pack` → `npm install -g <tgz>`
  → run installer non-interactively → `command -v gsd-sdk` → run
  `gsd-sdk --version`.
- PRs: path-filtered to installer-adjacent files, ubuntu + Node 22 only.
- main/release branches: full matrix (ubuntu+macos × Node 22+24).
- Reusable via workflow_call with `ref` input for release gating.

**.github/workflows/release.yml — pre-publish gate**
- New `install-smoke-rc` and `install-smoke-finalize` jobs invoke the
  reusable workflow against the release branch. `rc` and `finalize`
  now `needs: [validate-version, install-smoke-*]`, so a broken SDK
  install blocks `npm publish`.

## Test plan
- Local full suite: 4154/4154 pass
- install-smoke.yml will self-validate on this PR (ubuntu+Node22 only)

Addresses root cause of #2439 (the per-command pre-flight in #2440 is
the complementary defensive layer).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 18:39:32 -05:00
Jeremy McSpadden
d0565e95c1 fix(set-profile): use hyphenated /gsd-set-profile in pre-flight message
Project convention (#1748) requires /gsd-<cmd> hyphen form everywhere
except designated test inputs. Fix the colon references in the
pre-flight error and its regression test to satisfy stale-colon-refs.
2026-04-19 18:39:32 -05:00
Jeremy McSpadden
4ef6275e86 fix(set-profile): guard gsd-sdk invocation with command -v pre-flight (#2439)
/gsd:set-profile crashed with `command not found: gsd-sdk` when gsd-sdk
was not on PATH. The command invoked `gsd-sdk query` directly in a `!`
backtick with no guard, so a missing binary produced an opaque shell
error with exit 127.

Add a `command -v gsd-sdk` pre-flight that prints the install/update
hint and exits 1 when absent, mirroring the #2334 fix on /gsd-quick.
The auto-install in #2386 still runs at install time; this guard is the
defensive layer for users whose npm global bin is off-PATH (install.js
warns but does not fail in that case).

Closes #2439
2026-04-19 18:39:32 -05:00
Jeremy McSpadden
6c50490766 fix(sdk): register init.ingest-docs handler and add registry drift guard (#2442)
The ingest-docs workflow called `gsd-sdk query init.ingest-docs` with a
fallback to `init.default` — neither was registered in createRegistry(),
so the workflow proceeded with `{}` and tried to parse project_exists,
planning_exists, has_git, and project_path from empty.

- Add initIngestDocs handler; register dotted + space aliases
- Simplify workflow call; drop broken fallback
- Repo-wide drift guard scans commands/, agents/, get-shit-done/,
  hooks/, bin/, scripts/, docs/ for `gsd-sdk query <cmd>` and fails
  on any reference with no registered handler (file:line citations)
- Unit tests for the new handler
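
The drift guard described above might be approximated like this. It is a sketch: `findUnregistered` is hypothetical, it takes file contents as an in-memory map instead of walking the listed directories, and the command-name pattern is an assumption.

```javascript
// Captures the command token after `gsd-sdk query`, e.g. "init.ingest-docs".
const QUERY_REF = /gsd-sdk query ([a-z][\w.-]*)/g;

// Returns "file:line command" for every reference with no registered handler.
function findUnregistered(files, registeredCommands) {
  const problems = [];
  for (const [file, text] of Object.entries(files)) {
    text.split('\n').forEach((line, index) => {
      for (const match of line.matchAll(QUERY_REF)) {
        if (!registeredCommands.has(match[1])) {
          problems.push(`${file}:${index + 1} ${match[1]}`);
        }
      }
    });
  }
  return problems;
}

const sample = { 'workflow.md': 'intro\nrun `gsd-sdk query init.ingest-docs` first\ngsd-sdk query init.default' };
console.log(findUnregistered(sample, new Set(['init.ingest-docs']))); // → [ 'workflow.md:3 init.default' ]
```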

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 18:39:20 -05:00
Jeremy McSpadden
4cbebfe78c docs(readme): add /gsd-ingest-docs to Brownfield commands
Surfaces the new ingest-docs command from the Unreleased changelog in
the README Commands section so users discover it without digging.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 18:39:20 -05:00
Jeremy McSpadden
9e87d43831 fix(build): include gsd-read-injection-scanner in hooks/dist (#2406)
The scanner was added in #2201 but never added to the HOOKS_TO_COPY
allowlist in scripts/build-hooks.js, so it never landed in hooks/dist/.
install.js reads from hooks/dist/, so every install on 1.37.0/1.37.1
emitted "Skipped read injection scanner hook — not found at target"
and the read-time prompt-injection scanner was silently disabled.

- Add gsd-read-injection-scanner.js to HOOKS_TO_COPY
- Add it to EXPECTED_ALL_HOOKS regression test in install-hooks-copy

Fixes #2406

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 18:39:20 -05:00
github-actions[bot]
29ea90bc83 chore: bump version to 1.38.1 for hotfix 2026-04-19 23:37:15 +00:00
github-actions[bot]
0c6172bfad chore: finalize v1.38.0 2026-04-18 03:45:59 +00:00
Jeremy McSpadden
e3bd06c9fd fix(release): make merge-back PR step non-fatal
Repos that disable "Allow GitHub Actions to create and approve pull
requests" (org-level policy or repo-level setting) cause the "Create PR
to merge release back to main" step to fail with a GraphQL 403. That
failure cascades: Tag and push, npm publish, GitHub Release creation
are all skipped, and the entire release aborts.

The merge-back PR is a convenience — it's re-openable manually after
the release. Making it non-fatal with continue-on-error lets the rest
of the release complete. The step now emits ::warning:: annotations
pointing at the manual-recovery command when it fails.

Shell pipelines also fall through with `|| echo "::warning::..."` so
transient gh CLI failures don't mask the underlying policy issue.

Covers the failure mode seen on run 24596079637 where dry-run publish
validation passed but the release halted at the PR-creation step.
2026-04-17 22:45:22 -05:00
github-actions[bot]
c69ecd975a chore: bump to 1.38.0-rc.1 2026-04-18 03:05:35 +00:00
Jeremy McSpadden
06c4ded4ec docs(changelog): promote Unreleased to [1.38.0] + add ultraplan entry 2026-04-17 22:03:26 -05:00
github-actions[bot]
341bb941c6 chore: bump version to 1.38.0 for release 2026-04-18 03:02:41 +00:00
922 changed files with 85646 additions and 13968 deletions

44
.changeset/README.md Normal file

@@ -0,0 +1,44 @@
# Changeset Fragments
This directory holds **per-PR CHANGELOG fragments**. Every PR with user-facing changes drops one (or more) `<random-name>.md` files here describing its CHANGELOG entry. Fragments are consolidated into the top-level `CHANGELOG.md` at release time.
## Why
Two PRs that both edit the `### Fixed` block of `CHANGELOG.md` always conflict on merge — git can't pick a serialization order without human input. Two PRs that each add a fresh `.changeset/<unique-name>.md` never conflict because they don't share lines.
See [#2975](https://github.com/gsd-build/get-shit-done/issues/2975) for the full rationale.
## Adding a fragment
```bash
node scripts/changeset/new.cjs \
--type Fixed \
--pr 1234 \
--body "fix the thing — explain the user-visible change in one sentence"
```
This writes `.changeset/<adjective>-<noun>-<noun>.md` with frontmatter and a body. Three random words → concurrent PRs don't collide.
## Format
```md
---
type: Fixed
pr: 1234
---
**`/gsd-foo` no longer drops trailing slashes** — explain the user-visible change.
```
Allowed `type:` values follow [Keep a Changelog](https://keepachangelog.com/): `Added`, `Changed`, `Deprecated`, `Removed`, `Fixed`, `Security`.
## Opting out
PRs that legitimately have no user-facing impact can add the `no-changelog` label. CI honors it. When unsure, add the fragment.
## At release time
```bash
node scripts/changeset/cli.cjs render --version vX.Y.Z --date YYYY-MM-DD
```
Reads every fragment, groups bullets by `type:`, replaces `## [Unreleased]` with a new `## [vX.Y.Z] - YYYY-MM-DD` block, opens a fresh `## [Unreleased]` above, deletes consumed fragments. Idempotent.
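
To make the fragment format and the grouping step concrete, here is a minimal sketch. It is not the code in `scripts/changeset/`; `parseFragment` and `renderSections` are hypothetical, and the sketch assumes LF line endings and flat `key: value` frontmatter.

```javascript
const ALLOWED_TYPES = ['Added', 'Changed', 'Deprecated', 'Removed', 'Fixed', 'Security'];

// Split a fragment into frontmatter fields and body, rejecting unknown types.
function parseFragment(text) {
  const match = /^---\n([\s\S]*?)\n---\n([\s\S]*)$/.exec(text);
  if (!match) throw new Error('missing frontmatter');
  const meta = {};
  for (const line of match[1].split('\n')) {
    const idx = line.indexOf(':');
    if (idx > 0) meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  if (!ALLOWED_TYPES.includes(meta.type)) throw new Error(`invalid type: ${meta.type}`);
  return { type: meta.type, pr: meta.pr, body: match[2].trim() };
}

// Group fragment bodies into "### <type>" sections, in ALLOWED_TYPES order.
function renderSections(fragments) {
  return ALLOWED_TYPES
    .filter((type) => fragments.some((f) => f.type === type))
    .map((type) => {
      const bullets = fragments.filter((f) => f.type === type).map((f) => `- ${f.body}`);
      return [`### ${type}`, ...bullets].join('\n');
    })
    .join('\n\n');
}

const fragment = parseFragment('---\ntype: Fixed\npr: 1234\n---\nfix the thing\n');
console.log(renderSections([fragment]));
```

Because each fragment is its own file, two PRs only meet at render time, which is why the grouping can be a pure function over the parsed list.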

@@ -0,0 +1,5 @@
---
type: Changed
---
**Query command dispatch deepened with Command Topology Module** — query dispatch now consumes a single topology seam that resolves command tokens, binds native handler adapters, and returns structured no-match diagnosis, improving locality and reducing dispatch seam drift.

@@ -0,0 +1,5 @@
---
type: Fixed
pr: 3058
---
**GSD transport raw-mode handling and timeout fallback hardened** — fixes undefined raw formatting edge case and adds raw-path coverage to prevent regressions.

@@ -0,0 +1,8 @@
---
type: Changed
pr: 3069
---
**query command metadata now flows through a canonical Command Definition Module seam** — registry assembly, mutation semantics, and alias generation consume one Interface (`family`, `canonical`, `aliases`, `mutation`, `output_mode`, `handler_key`) to improve locality and reduce drift.
**query fallback error mapping cleanup** — the CJS fallback catch path now passes original `err` to `mapFallbackDispatchError` (follow-up to prior review feedback missed in PR #3066).

@@ -0,0 +1,6 @@
---
type: Changed
pr: 3075
---
**query architecture deepening pass** — extracted Query Runtime Context, Native Dispatch Adapter, and Query CLI Output Modules so dispatch policy, runtime context policy, and CLI projection logic each live behind focused seams with higher locality and leverage.

@@ -0,0 +1,5 @@
---
type: Fixed
pr: 2990
---
gsd-code-fixer worktree no longer fails on the same-branch checkout — the agent now creates a new gsd-reviewfix/ branch via git worktree add -b and fast-forwards the user's branch on cleanup. See #2990.

@@ -0,0 +1,5 @@
---
type: Changed
pr: 2986
---
Test suite for config-schema.cjs is now mutation-resistant — 95 typed assertions kill the 124 surviving Stryker mutants from the 4.62% baseline. Tests target static-key fast path, dynamic-pattern .some semantics, polarity, and regex-anchor tightening. See #2986.

@@ -0,0 +1,5 @@
---
type: Fixed
pr: 3008
---
**`tests/install-minimal.test.cjs:307` no longer races on shared `os.tmpdir()` under parallel CI** — the previous shape compared `listTmpStageDirs()` snapshots before and after the throw. Under `scripts/run-tests.cjs --test-concurrency=4`, `tests/install-minimal-all-runtimes.test.cjs` runs in a parallel process and creates/removes `gsd-minimal-skills-*` dirs in the shared OS tmpdir between snapshots, so `deepStrictEqual` failed deterministically when the parallel process happened to have a live stage dir during the snapshot window. Fix: stub `fs.mkdtempSync` to record THIS call's stage dir, then assert that exact path no longer exists after the throw — no global filesystem snapshot, no race. (#3008)
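
The stub-and-assert pattern described in this fragment can be sketched as follows. `stageThenFail` is a hypothetical stand-in for the code under test and `captureStageDir` is an illustrative helper; the actual test in the repository may be structured differently.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical unit under test: stages into a temp dir, cleans up, then rethrows.
function stageThenFail() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'gsd-minimal-skills-'));
  try {
    throw new Error('simulated install failure');
  } finally {
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

// Wrap fs.mkdtempSync so only the directory created by THIS call is recorded.
function captureStageDir(fn) {
  const original = fs.mkdtempSync;
  let recorded = null;
  fs.mkdtempSync = (...args) => (recorded = original.apply(fs, args));
  let error = null;
  try {
    fn();
  } catch (err) {
    error = err;
  } finally {
    fs.mkdtempSync = original; // always restore the real implementation
  }
  return { recorded, error };
}

const { recorded, error } = captureStageDir(stageThenFail);
// Assert on the exact path, not on a before/after listing of the shared tmpdir.
console.log(error.message, fs.existsSync(recorded));
```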

@@ -0,0 +1,5 @@
---
type: Fixed
pr: 3022
---
**Codex SessionStart hook now uses absolute Node binary path** — closes the gap left after #3002. The Codex install path wrote `command = "node ${path}"` directly into config.toml, bypassing `resolveNodeRunner()`. Under GUI/minimal-PATH runtimes (`/usr/bin:/bin:/usr/sbin:/sbin`), bare `node` failed to resolve, exit 127. Now routed through new `buildCodexHookBlock()` helper. Reinstall path migrates legacy bare-node entries via new `rewriteLegacyCodexHookBlock()`. See #3017.

@@ -0,0 +1,5 @@
---
type: Fixed
pr: TBD
---
**Codex skill adapter no longer instructs the agent to silently default discuss-phase decisions.** When `request_user_input` was rejected (Default mode), the generated adapter said "pick a reasonable default" — so `$gsd-discuss-phase` proceeded toward writing CONTEXT.md / DISCUSSION-LOG.md / checkpoints without ever asking the user. Adapter prose now requires the agent to STOP, present plain-text questions, and wait, with explicit named exceptions (`--auto`/`--all`/explicit user approval). See #3018.

@@ -0,0 +1,6 @@
---
type: Changed
pr: 3074
---
**query CLI path extracted into a dedicated Query CLI Adapter Module.** `sdk/src/cli.ts` now delegates query-specific dispatch, error mapping, and output/exit handling to `sdk/src/query/query-cli-adapter.ts` for better locality and testability.

@@ -0,0 +1,5 @@
---
type: Fixed
pr: 3012
---
**Post-install message and update.md no longer recommend the removed `/gsd-reapply-patches` command** — after PR #2824 consolidated 86 skills into ~58, `/gsd-reapply-patches` was folded into a flag (`/gsd-update --reapply`). The 1.39.1 hotfix (#2954) updated `help.md` but missed `bin/install.js`'s `reportLocalPatches` runtime emitter, `get-shit-done/workflows/update.md` Step 4, and the English + zh-CN/ja-JP/ko-KR doc set. Users hit "Unknown command" after every install with backed-up patches. All runtime branches in `reportLocalPatches` (claude, opencode, kilo, copilot, gemini, codex, cursor) now emit the consolidated form. Regression: `tests/bug-3010-reapply-patches-references.test.cjs` scans `bin/install.js`, every workflow file, and every doc (excluding CHANGELOG history and help.md's deprecation notice) for stale recommendations. See #3010.

@@ -0,0 +1,5 @@
---
type: Changed
pr: 0
---
**Documentation refreshed for v1.40.0** — full audit of `docs/` against the 1.40.0-rc.1 release surface. Updates command lists, walkthroughs, and inventory rows for the 86→59 skill consolidation (#2790), the six namespace meta-skills with two-stage routing (#2792), the `/gsd-health --context` guard, the phase-lifecycle status-line read-side (#2833), and the Gemini colon-form / non-Gemini hyphen-form slash-command split. Translations in ja-JP/ko-KR/zh-CN/pt-BR mirror the structural changes; new English prose is marked with `<!-- TODO i18n -->` for human translator follow-up. CHANGELOG.md `[Unreleased]` section regrouped under Feature/Enhancement/Fix headers.

@@ -0,0 +1,5 @@
---
type: Added
pr: TBD
---
**`dynamic_routing` block in `.planning/config.json` for failure-tier escalation (#3024).** Each agent declares a default tier (`light` / `standard` / `heavy`); when `dynamic_routing.enabled: true`, the resolver picks `tier_models[default_tier]` for the first spawn and escalates one tier up on orchestrator-detected soft failure (capped by `max_escalations`). Disabled by default — fully backward compatible. Composes with `model_overrides` (higher precedence) and `models.<phase_type>` (lower) for full cost-control flexibility. Adds new resolver `resolveModelForTier(cwd, agent, attempt)` to `core.cjs` for orchestrator integration.
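
A rough sketch of the escalation rule follows, under stated assumptions: `pickModel` is hypothetical (the real `resolveModelForTier(cwd, agent, attempt)` reads config from disk), `attempt` is taken to be 0 for the first spawn, and the model names in the example config are placeholders.

```javascript
const TIERS = ['light', 'standard', 'heavy'];

// Pick the model for a given attempt: start at the agent's default tier and
// move one tier up per failed attempt, capped by max_escalations and by 'heavy'.
function pickModel(config, defaultTier, attempt) {
  const routing = config.dynamic_routing;
  if (!routing || !routing.enabled) return null; // disabled: other layers decide
  const steps = Math.min(attempt, routing.max_escalations);
  const index = Math.min(TIERS.indexOf(defaultTier) + steps, TIERS.length - 1);
  return routing.tier_models[TIERS[index]];
}

const config = {
  dynamic_routing: {
    enabled: true,
    max_escalations: 1,
    tier_models: { light: 'haiku', standard: 'sonnet', heavy: 'opus' },
  },
};
console.log(pickModel(config, 'light', 0), pickModel(config, 'light', 1)); // → haiku sonnet
```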

@@ -0,0 +1,5 @@
---
type: Added
pr: 2975
---
**Changeset-fragment workflow** — eliminates CHANGELOG.md merge conflicts. Each PR drops `.changeset/<random-name>.md` with frontmatter (`type:`, `pr:`) plus a markdown body; the release-time `npm run changelog:render` consolidates fragments into `CHANGELOG.md` and deletes them. CI lint (`npm run lint:changeset`) requires a fragment on any PR touching user-facing files (`bin/`, `get-shit-done/`, `agents/`, `commands/`, `hooks/`, `sdk/src/`); contributors can opt out via the `no-changelog` label for purely internal changes. See [.changeset/README.md](.changeset/README.md) and CONTRIBUTING.md for the workflow.

@@ -0,0 +1,5 @@
---
type: Fixed
pr: 3037
---
**Gemini local install no longer duplicates `/gsd:*` commands across user and workspace scopes** — when GSD is already installed at the user scope (`~/.gemini/commands/gsd/`) and you run `npx get-shit-done-cc --gemini --local` in a project, the installer now skips writing `commands/gsd/` to `<project>/.gemini/` and prints a one-line warning explaining why. Previously, both scopes received the same 65 command files, and Gemini's conflict detector renamed every `/gsd:*` command to `/workspace.gsd:*` and `/user.gsd:*`, breaking the documented namespace. Closes #3037.

@@ -0,0 +1,5 @@
---
type: Fixed
pr: 2994
---
/gsd-reapply-patches Step 5 verifier now resolves at runtime — moved scripts/verify-reapply-patches.cjs to get-shit-done/bin/ which is shipped by the installer. The legacy scripts/ directory is not copied to user installs. See #2994.

@@ -0,0 +1,5 @@
---
type: Changed
pr: 3060
---
**Query mutation event mapping moved to dedicated module** — preserves event payloads while improving registry locality and test surface.

@@ -0,0 +1,5 @@
---
type: Fixed
pr: 3026
---
**`gsd-sdk query <subcommand> --help` now reaches the handler instead of returning top-level usage.** The query argv parser harvested `--help` as a global flag and `main()` short-circuited dispatch — there was no path to discover what arguments a query subcommand accepts. The parser now leaves `--help` in `queryArgv` so the handler/fallback can render contextual help. The `gsd-tools.cjs` fallback now renders top-level usage on `--help` (instead of erroring), preserving #1818's anti-hallucination invariant by NOT executing the destructive command. See #3019.
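
The parsing change can be pictured with a toy splitter. This is an assumption-laden sketch, not the SDK parser: `splitArgv` is hypothetical and only models the one rule described here, namely that `--help` after `query` stays in `queryArgv` instead of being harvested as a global flag.

```javascript
// Global --help is recognized only before the `query` token; everything after
// `query` is passed through untouched so the handler can render contextual help.
function splitArgv(argv) {
  const queryIndex = argv.indexOf('query');
  if (queryIndex === -1) {
    return { help: argv.includes('--help'), queryArgv: null };
  }
  const before = argv.slice(0, queryIndex);
  return { help: before.includes('--help'), queryArgv: argv.slice(queryIndex + 1) };
}

console.log(splitArgv(['query', 'init.ingest-docs', '--help']));
```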

@@ -0,0 +1,5 @@
---
type: Changed
pr: 3060
---
**Alias-family handler maps moved to dedicated catalog module** — keeps command keys/order while reducing createRegistry coupling and improving family-level locality.

@@ -0,0 +1,5 @@
---
type: Fixed
pr: 3028
---
**Installer no longer prints `✓ GSD SDK ready` when the shim is unreachable from the user's runtime shells.** The previous check used `process.env.PATH` from the install subprocess, which often differs from the user's later interactive shells (POSIX `~/.local/bin` not in login shell, node-version-manager PATH shims). Added `getUserShellPath()` helper that probes `$SHELL -lc 'printf %s "$PATH"'` and `isGsdSdkOnPath(pathString?)` overload that accepts an explicit PATH; the install-time check now downgrades to the actionable `⚠` diagnostic from PR #3014 when install-PATH and user-shell-PATH disagree. Windows cross-shell support tracked separately. See #3020.
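
A POSIX-only sketch of the two pieces described here is shown below. The function bodies are assumptions written for illustration: the real `getUserShellPath()` and `isGsdSdkOnPath(pathString?)` may differ, and `isOnPath` is a hypothetical generalization that takes the binary name as an argument.

```javascript
const fs = require('node:fs');
const path = require('node:path');
const { spawnSync } = require('node:child_process');

// Ask the user's login shell for its PATH; returns null when the probe fails.
// Shell startup files that print banners could pollute stdout, which this sketch ignores.
function getUserShellPath(shell = process.env.SHELL) {
  if (!shell) return null;
  const result = spawnSync(shell, ['-lc', 'printf %s "$PATH"'], { encoding: 'utf8', timeout: 5000 });
  return result.status === 0 && result.stdout ? result.stdout : null;
}

// Check an explicit PATH string instead of the install subprocess's process.env.PATH.
function isOnPath(binary, pathString = process.env.PATH || '') {
  return pathString
    .split(path.delimiter)
    .filter(Boolean)
    .some((dir) => {
      try {
        fs.accessSync(path.join(dir, binary), fs.constants.X_OK);
        return true;
      } catch {
        return false;
      }
    });
}

const userPath = getUserShellPath();
console.log(userPath === null ? 'probe failed' : isOnPath('gsd-sdk', userPath));
```

Comparing `isOnPath('gsd-sdk')` against `isOnPath('gsd-sdk', userPath)` is one way to detect the disagreement between install-time PATH and user-shell PATH that the fragment mentions.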

@@ -0,0 +1,5 @@
---
type: Added
pr: 2840
---
**`docs/issue-driven-orchestration.md` — recipe for driving GSD from a tracker issue** — new guide that maps Symphony-style orchestration concepts (workflow, isolated agent workspace, proof-of-work, human review gate, follow-up capture) onto existing GSD primitives (`/gsd-new-workspace`, `/gsd-manager`, `/gsd-autonomous`, `/gsd-verify-work`, `/gsd-review`, `/gsd-ship`, `STATE.md`, phase artifacts). Documentation only — no new commands, no daemon, no tracker integration.

@@ -0,0 +1,5 @@
---
type: Fixed
pr: 2994
---
/gsd-reapply-patches Step 5 verifier now resolves at runtime — moved scripts/verify-reapply-patches.cjs to get-shit-done/bin/ which is shipped by the installer. The legacy scripts/ directory is not copied to user installs. See #2994.

@@ -0,0 +1,5 @@
---
type: Fixed
pr: 2979
---
Managed JS hooks now resolve under GUI/minimal-PATH runtimes — installer emits process.execPath (absolute, quoted, forward-slash-normalized) as the runner for every .js hook command instead of bare node. See #2979.
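
As an illustration of "absolute, quoted, forward-slash-normalized", a helper along these lines would produce such a command string. `buildHookCommand` is hypothetical, and quoting the hook path as well as the runner is an assumption of this sketch.

```javascript
// Build a hook command that does not rely on PATH lookup of a bare `node`.
function buildHookCommand(hookPath, execPath = process.execPath) {
  const normalize = (p) => p.replace(/\\/g, '/'); // backslashes to forward slashes
  return `"${normalize(execPath)}" "${normalize(hookPath)}"`;
}

console.log(buildHookCommand('C:\\hooks\\a.js', 'C:\\Program Files\\nodejs\\node.exe'));
// → "C:/Program Files/nodejs/node.exe" "C:/hooks/a.js"
```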

@@ -0,0 +1,5 @@
---
type: Added
pr: 2995
---
Post-install path smoke test for workflow-invoked scripts — audits every node ${GSD_HOME}/...cjs invocation in workflows resolves at the runtime-installed path. See #2995.

@@ -0,0 +1,5 @@
---
type: Fixed
pr: 3011
---
**Actionable diagnostic when `gsd-sdk` is not on PATH after install** — Windows users (and others on multi-shell setups) reported that the previous "GSD SDK files are present but `gsd-sdk` is not on your PATH" warning gave them no way to fix it: no path to look at, no shell-specific commands, no mention of the npx-cache caveat. New `formatSdkPathDiagnostic({ shimDir, platform, runDir })` helper returns a typed IR with the resolved shim location, platform-specific PATH-export commands (PowerShell / cmd.exe / Git Bash on Windows; `export PATH` on POSIX), and an npx-specific note when running under an `_npx` cache segment (where the shim may be written to a temp dir that won't persist). The console renderer in `bin/install.js` emits the lines from the IR; tests assert on the typed fields directly. (#3011)

@@ -0,0 +1,5 @@
---
type: Added
pr: 3032
---
**Documentation: MCP tool schema as a context-budget concern (#3025).** Adds new sections to `get-shit-done/references/context-budget.md` and `docs/USER-GUIDE.md` explaining that every enabled MCP server injects its tool schema into every turn — heavyweight servers (browser/playwright, Mac-tools, Windows-tools) can cost 20k+ tokens each, often dwarfing what `model_profile` tuning saves. The toggle lives in `.claude/settings.json` (`enabledMcpjsonServers` / `disabledMcpjsonServers`) and is a Claude Code harness concern, not a GSD concern. Includes a pre-phase audit checklist (browser, platform-specific, cross-project, duplicates) and notes the multiplier interaction with `model_profile`. Companion to #3023 (per-phase-type model map) and #3024 (dynamic routing); together they cover the three biggest cost levers.

@@ -0,0 +1,5 @@
---
type: Fixed
pr: 2997
---
SDK config-set/config-get and init responses no longer echo plaintext API keys. New sdk/src/query/secrets.ts ports SECRET_CONFIG_KEYS masking from CJS; init bundles only mask string values to preserve the boolean availability-flag contract. See #2997.
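
The string-only masking rule can be sketched like this. The key name, mask format, and helper names are all assumptions made for illustration; the real `SECRET_CONFIG_KEYS` list and masking output live in the GSD source.

```javascript
// Hypothetical entry; the real key set is defined in the GSD source.
const SECRET_CONFIG_KEYS = new Set(['example_api_key']);

// Mask strings only, so boolean availability flags pass through unchanged.
function maskValue(value) {
  if (typeof value !== 'string' || value === '') return value;
  return value.length <= 8 ? '****' : `****${value.slice(-4)}`;
}

// Apply masking to secret keys and leave every other entry untouched.
function maskConfig(config) {
  return Object.fromEntries(
    Object.entries(config).map(([key, value]) =>
      [key, SECRET_CONFIG_KEYS.has(key) ? maskValue(value) : value])
  );
}

console.log(maskConfig({ example_api_key: 'sk-1234567890abcd', mode: 'yolo' }));
```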

@@ -0,0 +1,5 @@
---
type: Fixed
pr: 2992
---
/gsd-update no longer queries the wrong npm package names. The package name now lives in a deterministic check-latest-version.cjs script, and the workflow uses ${GSD_DIR} from get_installed_version. See #2992.

@@ -0,0 +1,5 @@
---
type: Fixed
pr: 3007
---
**PR templates now point at the changeset workflow** — the `Fix`, `Enhancement`, and `Feature` PR templates previously asked contributors to tick `CHANGELOG.md updated`, which contradicted the post-#2978 rule that `CHANGELOG.md` must not be edited directly. Each checkbox now references `npm run changeset` (and the `no-changelog` opt-out where applicable).

@@ -0,0 +1,5 @@
---
type: Changed
pr: 3060
---
**CLI query CJS fallback execution extracted to dedicated adapter module** — preserves logs/help passthrough behavior while improving fallback locality and testability.

@@ -0,0 +1,5 @@
---
type: Changed
pr: 3060
---
**Query mutation event emission now uses a dedicated decorator seam** — preserves fire-and-forget behavior while reducing registry coupling and improving testability.

@@ -0,0 +1,5 @@
---
type: Added
pr: 3030
---
**`models` block in `.planning/config.json` for per-phase-type model selection (#3023).** A new resolution layer between per-agent `model_overrides` and the `model_profile` tier table. Six named slots (`planning` / `discuss` / `research` / `execution` / `verification` / `completion`) accept tier aliases (`opus` / `sonnet` / `haiku` / `inherit`). Lets you express "Opus for planning, Sonnet for the rest" in two lines without learning the agent taxonomy. Fully backward compatible — configs without `models` behave exactly as today.
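
The layering described above (per-agent override, then per-phase-type slot, then profile default) might be sketched as follows. `resolveModel` is hypothetical, the agent name is a placeholder, the profile default is passed in directly instead of being looked up in the tier table, and the handling of `inherit` is left out because its semantics are not stated here.

```javascript
// Resolve a model by walking the three layers from most to least specific.
function resolveModel(config, agent, phaseType, profileDefault) {
  const override = config.model_overrides?.[agent];
  if (override) return override; // 1. per-agent override wins
  const slot = config.models?.[phaseType];
  if (slot) return slot;         // 2. per-phase-type slot
  return profileDefault;         // 3. fall back to the model_profile result
}

// "Opus for planning, Sonnet for the rest" with a sonnet profile default.
const config = { models: { planning: 'opus' } };
console.log(resolveModel(config, 'example-agent', 'planning', 'sonnet'));  // → opus
console.log(resolveModel(config, 'example-agent', 'execution', 'sonnet')); // → sonnet
```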

@@ -0,0 +1,5 @@
---
type: Fixed
pr: 2998
---
gsd-pristine/ is now populated by the installer when local patches are detected — saveLocalPatches calls a new populatePristineDir helper that runs the install transform pipeline into a tmp staging dir and copies modified files into pristineDir. The reapply-patches Step 5 verifier no longer falls back to its over-broad heuristic. See #2998.

@@ -0,0 +1,5 @@
---
type: Fixed
pr: 2997
---
SDK config-set/config-get and init responses no longer echo plaintext API keys. New sdk/src/query/secrets.ts ports SECRET_CONFIG_KEYS masking from CJS; init bundles only mask string values to preserve the boolean availability-flag contract. See #2997.

@@ -0,0 +1,5 @@
---
type: Added
pr: 2995
---
Post-install path smoke test for workflow-invoked scripts: verifies that every node ${GSD_HOME}/...cjs invocation in workflows resolves at the runtime-installed path. See #2995.

@@ -0,0 +1,5 @@
---
type: Changed
pr: 3060
---
**Query fallback orchestration now shared** — CLI and SDK query dispatch now use one planning seam for native vs CJS fallback decisions with behavior parity preserved.

@@ -0,0 +1,5 @@
---
type: Changed
pr: 3060
---
**Query/transport policy data now converged in shared module** — mutation and raw-output policy wiring now share one source of truth to reduce drift.

@@ -0,0 +1,5 @@
---
type: Changed
pr: 3042
---
**`/gsd-research-phase` consolidated into `/gsd-plan-phase --research-phase <N>`** — the standalone research command's slash-command stub was never registered (#3042). Rather than restore the orphan, the research-only capability now lives as a flag on `/gsd-plan-phase`. New modifiers: `--view` prints existing `RESEARCH.md` to stdout without spawning, `--research` forces refresh, otherwise prompts `update / view / skip` when `RESEARCH.md` already exists. Also scrubs four other stale slash-command references (`/gsd-check-todos`, `/gsd-new-workspace`, `/gsd-status`, residual `/gsd-plan-milestone-gaps`) across English + 4 localized doc sets (#3044). Closes #3042 and #3044.

@@ -0,0 +1,5 @@
---
type: Fixed
pr: 3029
---
**`/gsd-code-review-fix` and `/gsd-plan-milestone-gaps` no longer surface as "Unknown command"** — both were consolidated by #2790 (`/gsd-code-review --fix` and inline gap planning in `/gsd-audit-milestone` respectively), but several user-facing surfaces still emitted the old slash forms in their offer text. Fixed audit-milestone offer blocks, gsd-complete-milestone routing, code-review/execute-phase offer text, gsd-code-fixer agent role card, and the doc surfaces (USER-GUIDE, FEATURES, INVENTORY, AGENTS, CONFIGURATION). Closes #3029, closes #3034.

@@ -0,0 +1,5 @@
---
type: Fixed
pr: 2990
---
gsd-code-fixer worktree no longer fails on the same-branch checkout — the agent now creates a new gsd-reviewfix/ branch via git worktree add -b and fast-forwards the user's branch on cleanup. See #2990.

@@ -0,0 +1,5 @@
---
type: Added
pr: 2982
---
Extended no-source-grep lint to catch var-binding readFileSync.includes() pattern. Tests now fail when source-grep is hidden behind a parser wrapper. See #2982.

@@ -0,0 +1,6 @@
---
type: Changed
pr: 3065
---
**Dispatch policy seam now returns a structured result contract** across native and fallback query execution paths (`ok`, typed error `kind`, `details`, and final `exit_code`), with CLI consuming the unified result instead of mixed throw/result handling.

@@ -0,0 +1,5 @@
---
type: Changed
pr: 3060
---
**Query static command registrations now split into domain catalog modules** — preserves command order/strings while improving registry locality and maintenance.

@@ -0,0 +1,5 @@
---
type: Changed
pr: 3085
---
**`GSDTools` query execution internals now use deep Module seams** — refactors runtime composition, native/subprocess adapters, and output projection behind stable public interfaces for better locality and testability.

@@ -0,0 +1,5 @@
---
type: Changed
pr: 2974
---
Migrated 8 test files from raw text matching (`stdout.includes(...)`, `assert.match(stderr, ...)`) to typed-IR assertions per CONTRIBUTING.md. Adds shared `ERROR_REASON` enum and `--json-errors` flag in `core.cjs`, typed `GRAPHIFY_REASON` in `graphify.cjs`, pure `buildSdkFailFastReport()` IR builder in `bin/install.js`, and Claude Code JSON envelope output (`hookSpecificOutput` with typed fields) for `gsd-session-state.sh` and `gsd-phase-boundary.sh`. Tests now assert on structured fields (`reason`, `context`, `state_present`, `planning_modified`, etc.) instead of substring matching. See #2974.

@@ -0,0 +1,5 @@
---
type: Added
pr: 2795
---
**Optional update banner for non-GSD statusline users** — when the installer detects you've declined or kept a non-GSD statusline, it now offers an opt-in `SessionStart` banner that surfaces update availability via the existing `~/.cache/gsd/gsd-update-check.json` cache. Silent when up-to-date, rate-limits failure diagnostics to once per 24h, removed cleanly by `npx get-shit-done-cc --uninstall`.

@@ -0,0 +1,5 @@
---
type: Fixed
pr: 2973
---
/gsd-profile-user --refresh writes dev-preferences.md to ~/.claude/skills/gsd-dev-preferences/SKILL.md instead of the legacy commands/gsd/ directory. Installer migrates any preserved legacy file to the new location. See #2973.

@@ -0,0 +1,5 @@
---
type: Fixed
pr: 2992
---
/gsd-update no longer queries the wrong npm package names: the package name now lives in a deterministic check-latest-version.cjs script, and the workflow uses ${GSD_DIR} from get_installed_version. See #2992.

@@ -0,0 +1,5 @@
---
type: Fixed
pr: 2979
---
Managed JS hooks now resolve under GUI/minimal-PATH runtimes — installer emits process.execPath (absolute, quoted, forward-slash-normalized) as the runner for every .js hook command instead of bare node. See #2979.

@@ -0,0 +1,5 @@
---
type: Added
pr: 2982
---
Extended no-source-grep lint to catch var-binding readFileSync.includes() pattern. Tests now fail when source-grep is hidden behind a parser wrapper. See #2982.

.coderabbit.yaml Normal file

@@ -0,0 +1,26 @@
# CodeRabbit configuration — gsd-build/get-shit-done
#
# Schema: https://docs.coderabbit.ai/reference/yaml-template/
#
# Project context: GSD ships a CLI tool + an agent runtime, not a documented
# public library. We carry rich JSDoc on internal helpers that warrant it
# (see bin/install.js, get-shit-done/bin/lib/*.cjs) but we do not enforce a
# blanket docstring coverage bar — see issue #2932 for rationale.
reviews:
pre_merge_checks:
# Disable docstring coverage check.
#
# The check produces false-positive warnings on PRs whose new code is
# entirely test files: it counts test(...) / beforeEach / afterEach
# arrow-function callbacks as functions and then reports 0% coverage
# because nothing has JSDoc. There is no per-check path filter in CR's
# documented schema that would let us exclude tests/** while keeping
# the check active elsewhere, and the top-level path_filters approach
# would silence ALL CR review on tests (security scans, out-of-scope
# checks, line-level findings) which we want to keep.
#
# All other CR pre-merge checks (out-of-scope, security, title) remain
# at their defaults.
docstrings:
mode: off

.githooks/pre-commit Executable file

@@ -0,0 +1,6 @@
#!/usr/bin/env bash
set -euo pipefail
if git diff --cached --name-only | grep -Eq "^sdk/src/query/command-manifest\.|^sdk/src/query/command-aliases\.generated\.ts$|^get-shit-done/bin/lib/command-aliases\.generated\.cjs$|^sdk/scripts/gen-command-aliases\.ts$"; then
npm run check:alias-drift
fi
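
The path filter in the pre-commit hook above can be exercised on its own. The following is a minimal sketch, assuming staged paths are supplied on stdin rather than read from `git diff --cached`. The regex is copied from the hook; the function name `touches_alias_sources` and the sample paths are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a list of staged paths should trigger the alias drift check.
# The regex is copied from the hook; the function name and sample paths are hypothetical.
alias_paths_regex='^sdk/src/query/command-manifest\.|^sdk/src/query/command-aliases\.generated\.ts$|^get-shit-done/bin/lib/command-aliases\.generated\.cjs$|^sdk/scripts/gen-command-aliases\.ts$'

touches_alias_sources() {
  # Reads newline-separated paths on stdin; succeeds if any path matches.
  grep -Eq "$alias_paths_regex"
}

if printf '%s\n' 'README.md' 'sdk/src/query/command-manifest.ts' | touches_alias_sources; then
  echo "would run: npm run check:alias-drift"
fi

if ! printf '%s\n' 'README.md' 'docs/USER-GUIDE.md' | touches_alias_sources; then
  echo "would skip the drift check"
fi
```

Only the first alternative is left unanchored at the end, so any file whose name starts with `command-manifest.` under `sdk/src/query/` triggers the check, while the generated alias files and the generator script must match exactly.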

.githooks/pre-push Executable file

@@ -0,0 +1,48 @@
#!/usr/bin/env bash
set -euo pipefail
zero_sha='0000000000000000000000000000000000000000'
blocked_regex="${GSD_BLOCKED_AUTHOR_REGEX:-}"
# Local-only guard: no-op unless the developer opts in via env var, e.g.
# export GSD_BLOCKED_AUTHOR_REGEX='@example-corp\.com$'
if [[ -z "$blocked_regex" ]]; then
exit 0
fi
violations=()
while read -r local_ref local_sha remote_ref remote_sha; do
# branch/tag deletion
if [[ "$local_sha" == "$zero_sha" ]]; then
continue
fi
if [[ "$remote_sha" == "$zero_sha" ]]; then
# New remote ref: inspect commits not already on any remote
commit_list=$(git rev-list "$local_sha" --not --remotes)
else
commit_list=$(git rev-list "$remote_sha..$local_sha")
fi
while read -r commit; do
[[ -z "$commit" ]] && continue
author_email=$(git show -s --format='%ae' "$commit")
lower_email=$(printf '%s' "$author_email" | tr '[:upper:]' '[:lower:]')
if printf '%s' "$lower_email" | grep -Eq "$blocked_regex"; then
violations+=("$commit <$author_email>")
fi
done <<< "$commit_list"
done
if [[ ${#violations[@]} -gt 0 ]]; then
{
echo "Push blocked: commit author email matched local blocked regex ($blocked_regex)."
echo "Rewrite author info before pushing these commits:"
for v in "${violations[@]}"; do
echo " - $v"
done
echo "Suggested fix: git rebase -i <base> --exec \"git commit --amend --no-edit --author='Your Name <non-enterprise@email>'\""
} >&2
exit 1
fi
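
The matching rule inside the pre-push hook above is small enough to try in isolation. This sketch assumes only the two steps the hook itself performs (lowercase the author email, then apply the extended regex). The function name `email_is_blocked` and the sample addresses are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the matching rule only: lowercase the email, then apply the extended regex.
# email_is_blocked is a hypothetical name; the hook inlines this logic.
email_is_blocked() {
  local email="$1" regex="$2" lower
  lower=$(printf '%s' "$email" | tr '[:upper:]' '[:lower:]')
  printf '%s\n' "$lower" | grep -Eq "$regex"
}

regex='@example-corp\.com$'   # example value from the hook's own comment
for email in 'Dev@Example-Corp.com' 'dev@users.noreply.github.com'; do
  if email_is_blocked "$email" "$regex"; then
    echo "blocked: $email"
  else
    echo "allowed: $email"
  fi
done
```

Because the email is lowercased before matching, a value for `GSD_BLOCKED_AUTHOR_REGEX` that contains uppercase letters would never match.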

@@ -73,7 +73,7 @@ Closes #
- [ ] Changes are scoped to the approved enhancement — nothing extra included
- [ ] All existing tests pass (`npm test`)
- [ ] New or updated tests cover the enhanced behavior
- [ ] CHANGELOG.md updated
- [ ] `.changeset/` fragment added (`npm run changeset -- --type Changed --pr <NNN> --body "..."`) — or `no-changelog` label applied if not user-facing
- [ ] Documentation updated if behavior or output changed
- [ ] No unnecessary dependencies added

@@ -94,7 +94,7 @@ Closes #
- [ ] Implementation scope matches the approved spec exactly
- [ ] All existing tests pass (`npm test`)
- [ ] New tests cover the happy path, error cases, and edge cases
- [ ] CHANGELOG.md updated with a user-facing description of the feature
- [ ] `.changeset/` fragment added with a user-facing description of the feature (`npm run changeset -- --type Added --pr <NNN> --body "..."`)
- [ ] Documentation updated — commands, workflows, references, README if applicable
- [ ] No unnecessary external dependencies added
- [ ] Works on Windows (backslash paths handled)

@@ -63,7 +63,7 @@ Fixes #
- [ ] Fix is scoped to the reported bug — no unrelated changes included
- [ ] Regression test added (or explained why not)
- [ ] All existing tests pass (`npm test`)
- [ ] CHANGELOG.md updated if this is a user-facing fix
- [ ] `.changeset/` fragment added if this is a user-facing fix (`npm run changeset -- --type Fixed --pr <NNN> --body "..."`) — or `no-changelog` label applied
- [ ] No unnecessary dependencies added
## Breaking changes

.github/workflows/canary.yml vendored Normal file

@@ -0,0 +1,157 @@
# Release stream policy:
# dev → @canary (this workflow — preview builds for the long-lived integration branch)
# main → @next (RC train, see release.yml)
# main → @latest (stable cuts, see release.yml)
#
# Streams do not mix. The publish/tag steps below gate on `refs/heads/dev` so a
# workflow_dispatch run on any other branch (including main) completes the
# build/test/dry-run validation but does not publish or tag.
name: Canary
on:
workflow_dispatch:
inputs:
dry_run:
description: 'Dry run (skip npm publish, tagging, and push)'
required: false
type: boolean
default: false
concurrency:
group: canary
cancel-in-progress: false
env:
NODE_VERSION: 24
jobs:
canary:
runs-on: ubuntu-latest
timeout-minutes: 10
permissions:
contents: write
id-token: write
environment: npm-publish
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
- uses: actions/setup-node@53b83947a5a98c8d113130e565377fae1a50d02f # v6.3.0
with:
node-version: ${{ env.NODE_VERSION }}
registry-url: 'https://registry.npmjs.org'
cache: 'npm'
- name: Determine canary version
id: canary
run: |
# Strip any pre-release suffix from package.json version to get base (e.g. 1.39.0-rc.4 → 1.39.0)
RAW=$(node -p "require('./package.json').version")
BASE=$(echo "$RAW" | sed 's/-.*//')
# Find next sequential canary number from existing tags
N=1
while git tag -l "v${BASE}-canary.${N}" | grep -q .; do
N=$((N + 1))
done
CANARY_VERSION="${BASE}-canary.${N}"
echo "canary_version=$CANARY_VERSION" >> "$GITHUB_OUTPUT"
- name: Configure git identity
run: |
git config user.name "github-actions[bot]"
git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
- name: Bump to canary version
env:
CANARY_VERSION: ${{ steps.canary.outputs.canary_version }}
run: |
npm version "$CANARY_VERSION" --no-git-tag-version
cd sdk && npm version "$CANARY_VERSION" --no-git-tag-version && cd ..
- name: Install and test
run: |
npm ci
npm test
- name: Build SDK dist for tarball
run: npm run build:sdk
- name: Verify tarball ships sdk/dist/cli.js (bug #2647)
run: bash scripts/verify-tarball-sdk-dist.sh
- name: Dry-run publish validation
run: |
npm publish --dry-run --tag canary
cd sdk && npm publish --dry-run --tag canary
env:
NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
- name: Tag and push
if: ${{ github.ref == 'refs/heads/dev' && !inputs.dry_run }}
env:
CANARY_VERSION: ${{ steps.canary.outputs.canary_version }}
run: |
git tag "v${CANARY_VERSION}"
git push origin "v${CANARY_VERSION}"
- name: Publish to npm (canary)
if: ${{ github.ref == 'refs/heads/dev' && !inputs.dry_run }}
run: npm publish --provenance --access public --tag canary
env:
NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
- name: Publish SDK to npm (canary)
if: ${{ github.ref == 'refs/heads/dev' && !inputs.dry_run }}
run: cd sdk && npm publish --provenance --access public --tag canary
env:
NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
- name: Verify publish
if: ${{ github.ref == 'refs/heads/dev' && !inputs.dry_run }}
env:
CANARY_VERSION: ${{ steps.canary.outputs.canary_version }}
run: |
PUBLISHED="NOT_FOUND"
SDK_PUBLISHED="NOT_FOUND"
for delay in 5 10 20 30 45; do
PUBLISHED=$(npm view get-shit-done-cc@"$CANARY_VERSION" version 2>/dev/null || echo "NOT_FOUND")
SDK_PUBLISHED=$(npm view @gsd-build/sdk@"$CANARY_VERSION" version 2>/dev/null || echo "NOT_FOUND")
if [ "$PUBLISHED" = "$CANARY_VERSION" ] && [ "$SDK_PUBLISHED" = "$CANARY_VERSION" ]; then
break
fi
echo "Not yet live (sleeping ${delay}s)..."
sleep "$delay"
done
if [ "$PUBLISHED" != "$CANARY_VERSION" ]; then
echo "::error::Published version verification failed. Expected $CANARY_VERSION, got $PUBLISHED"
exit 1
fi
echo "Verified: get-shit-done-cc@$CANARY_VERSION is live on npm"
if [ "$SDK_PUBLISHED" != "$CANARY_VERSION" ]; then
echo "::error::SDK version verification failed. Expected $CANARY_VERSION, got $SDK_PUBLISHED"
exit 1
fi
echo "Verified: @gsd-build/sdk@$CANARY_VERSION is live on npm"
CANARY_TAG=$(npm dist-tag ls get-shit-done-cc 2>/dev/null | grep "canary:" | awk '{print $2}')
echo "canary dist-tag points to: $CANARY_TAG"
- name: Summary
env:
CANARY_VERSION: ${{ steps.canary.outputs.canary_version }}
DRY_RUN: ${{ inputs.dry_run }}
PUBLISH_ELIGIBLE: ${{ github.ref == 'refs/heads/dev' && !inputs.dry_run }}
BRANCH_REF: ${{ github.ref }}
run: |
echo "## Canary v${CANARY_VERSION}" >> "$GITHUB_STEP_SUMMARY"
if [ "$DRY_RUN" = "true" ]; then
echo "**DRY RUN** — npm publish, tagging, and push skipped" >> "$GITHUB_STEP_SUMMARY"
elif [ "$PUBLISH_ELIGIBLE" != "true" ]; then
echo "**VALIDATION ONLY** — publish/tag skipped for \`${BRANCH_REF}\`; canary publish is gated to \`refs/heads/dev\`." >> "$GITHUB_STEP_SUMMARY"
else
echo "- Published to npm as \`canary\`" >> "$GITHUB_STEP_SUMMARY"
echo "- SDK also published: \`@gsd-build/sdk@${CANARY_VERSION}\` on \`canary\`" >> "$GITHUB_STEP_SUMMARY"
echo "- Tagged \`v${CANARY_VERSION}\`" >> "$GITHUB_STEP_SUMMARY"
echo "- Install: \`npx get-shit-done-cc@canary\`" >> "$GITHUB_STEP_SUMMARY"
fi
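
The "Determine canary version" step in the workflow above strips any pre-release suffix and then probes tags until it finds a free sequence number. A minimal sketch of the same arithmetic follows, assuming the existing tags are passed as arguments instead of being read from `git tag -l`, so it runs outside a repository. `next_canary_version` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: same arithmetic as the "Determine canary version" step, but existing
# tags are passed as arguments (an assumption made so this runs without a repo).
next_canary_version() {
  local raw="$1"; shift
  local base="${raw%%-*}"      # 1.39.0-rc.4 -> 1.39.0
  local n=1
  # Probe v<base>-canary.<n> until an unused number is found.
  while printf '%s\n' "$@" | grep -qxF "v${base}-canary.${n}"; do
    n=$((n + 1))
  done
  echo "${base}-canary.${n}"
}

next_canary_version '1.39.0-rc.4' 'v1.39.0-canary.1' 'v1.39.0-canary.2'
# prints 1.39.0-canary.3
```

The probe stops at the first missing number, so a gap in the tag sequence would be refilled rather than skipped.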

@@ -0,0 +1,24 @@
name: Changeset Required
on:
pull_request:
types: [opened, synchronize, reopened, labeled, unlabeled]
permissions:
contents: read
pull-requests: read
jobs:
changeset-lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- uses: actions/setup-node@v4
with:
node-version: '24'
- name: Run changeset lint
env:
GITHUB_BASE_REF: ${{ github.base_ref }}
run: node scripts/changeset/lint.cjs
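
The fragments this workflow lints share a small layout: a `---` delimited header with `type` and `pr`, followed by the entry text. The sketch below reads those two fields. It is not `scripts/changeset/lint.cjs`, whose rules are not shown here; it only mirrors the layout of the fragments in this diff, and `fragment_field` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: read the `type` and `pr` fields from a changeset fragment's header.
# This is NOT scripts/changeset/lint.cjs; it only mirrors the fragment layout
# visible in this diff. fragment_field is a hypothetical helper.
fragment_field() {
  local file="$1" key="$2"
  # Only look at lines between the first pair of `---` delimiter lines.
  awk -v key="$key" '
    /^---$/ { fence++; next }
    fence == 1 && index($0, key ": ") == 1 { print substr($0, length(key) + 3); exit }
  ' "$file"
}

fragment=$(mktemp)
cat > "$fragment" <<'EOF'
---
type: Fixed
pr: 2997
---
SDK config-set/config-get and init responses no longer echo plaintext API keys.
EOF

echo "type=$(fragment_field "$fragment" type) pr=$(fragment_field "$fragment" pr)"
rm -f "$fragment"
```

Only `Added`, `Changed`, and `Fixed` appear as `type` values in this diff; whether the real lint accepts others is not stated here.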

@@ -1,5 +1,27 @@
name: Hotfix Release
# Hotfix flow for X.YY.Z patch releases (Z > 0).
#
# create:
# - Branches hotfix/X.YY.Z from the highest existing vX.YY.* tag (1.27.2 from
# v1.27.1, 1.27.1 from v1.27.0). The base IS the cumulative-fix anchor for
# the previous patch.
# - Auto-cherry-picks every fix:/chore: commit on origin/main that isn't
# already in the base, oldest-first. Patch-equivalents (already applied)
# are skipped via `git cherry`. feat:/refactor: are NEVER auto-included.
# - Conflicts fail the workflow with the offending SHA so the operator can
# resolve manually on the branch and re-run finalize with auto_cherry_pick=false.
# - Step summary lists every included SHA so the eventual vX.YY.Z tag
# self-documents what shipped.
#
# finalize:
# - install-smoke gate (cross-platform, parity with release.yml/release-sdk.yml)
# - Bundles SDK as both loose tree (sdk/dist/cli.js) and recoverable tarball
# (sdk-bundle/gsd-sdk.tgz) — parity with release-sdk.yml so a hotfix shipped
# during the @gsd-build-token outage carries the same payload shape.
# - Publishes to @latest, tags vX.YY.Z, re-points @next → vX.YY.Z, opens
# merge-back PR.
on:
workflow_dispatch:
inputs:
@@ -14,6 +36,11 @@ on:
description: 'Patch version (e.g., 1.27.1)'
required: true
type: string
auto_cherry_pick:
description: 'Auto-cherry-pick fix:/chore: commits from origin/main since base tag (create only)'
required: false
type: boolean
default: true
dry_run:
description: 'Dry run (skip npm publish, tagging, and push)'
required: false
@@ -54,10 +81,13 @@ jobs:
MAJOR_MINOR=$(echo "$VERSION" | cut -d. -f1-2)
TARGET_TAG="v${VERSION}"
BRANCH="hotfix/${VERSION}"
BASE_TAG=$(git tag -l "v${MAJOR_MINOR}.*" \
| grep -E "^v[0-9]+\.[0-9]+\.[0-9]+$" \
# Append TARGET_TAG to the candidate list, then sort -V, then walk the
# sorted list and print whatever immediately precedes TARGET_TAG. This
# is semver-correct for multi-digit patches (v1.27.10 > v1.27.9) where
# a plain `awk '$1 < target'` lexicographic compare would mis-order.
BASE_TAG=$( ( git tag -l "v${MAJOR_MINOR}.*" | grep -E "^v[0-9]+\.[0-9]+\.[0-9]+$"; echo "$TARGET_TAG" ) \
| sort -V \
| awk -v target="$TARGET_TAG" '$1 < target { last=$1 } END { if (last != "") print last }')
| awk -v target="$TARGET_TAG" '$1 == target { print prev; exit } { prev = $1 }')
if [ -z "$BASE_TAG" ]; then
echo "::error::No prior stable tag found for ${MAJOR_MINOR}.x before $TARGET_TAG"
exit 1
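
The base-tag selection in this hunk depends on version-aware ordering rather than string comparison. Below is a minimal sketch of the "append target, sort, print predecessor" walk, assuming tags are supplied as arguments (the workflow reads them from `git tag -l`) and that `sort -V` is available. `previous_tag` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: pick the stable tag that immediately precedes the target in version order.
# Tags arrive as arguments here (hypothetical interface); the workflow reads them from git.
previous_tag() {
  local target="$1"; shift
  { printf '%s\n' "$@" | grep -E '^v[0-9]+\.[0-9]+\.[0-9]+$'; echo "$target"; } \
    | sort -V \
    | awk -v target="$target" '$1 == target { print prev; exit } { prev = $1 }'
}

# Pre-release tags are filtered out; v1.27.10 sorts after v1.27.9.
previous_tag 'v1.27.11' 'v1.27.0' 'v1.27.9' 'v1.27.10' 'v1.27.2-rc.1'
# prints v1.27.10
```

A string comparison would rank `v1.27.9` above `v1.27.10`, which is the mis-ordering the comment in the hunk describes. When no earlier stable tag exists the function prints an empty line, which corresponds to the `-z "$BASE_TAG"` error branch.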
@@ -95,29 +125,160 @@ jobs:
git config user.name "github-actions[bot]"
git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
- name: Create hotfix branch
if: inputs.dry_run != 'true'
- name: Create hotfix branch from base tag and push (skeleton)
env:
BRANCH: ${{ needs.validate-version.outputs.branch }}
BASE_TAG: ${{ needs.validate-version.outputs.base_tag }}
DRY_RUN: ${{ inputs.dry_run }}
run: |
set -euo pipefail
git checkout -b "$BRANCH" "$BASE_TAG"
# Push the skeleton branch up-front so any subsequent cherry-pick
# conflict leaves a remote artefact the operator can fetch, resolve,
# and re-push. Skipped on dry-run — local checkout still exercises
# the same cherry-pick + bump flow so conflicts are caught.
if [ "$DRY_RUN" != "true" ]; then
git push -u origin "$BRANCH"
fi
- name: Cherry-pick fix/chore commits from origin/main since base tag
if: ${{ inputs.auto_cherry_pick }}
env:
BRANCH: ${{ needs.validate-version.outputs.branch }}
BASE_TAG: ${{ needs.validate-version.outputs.base_tag }}
DRY_RUN: ${{ inputs.dry_run }}
run: |
set -euo pipefail
git fetch origin main:refs/remotes/origin/main
# `git cherry $BASE_TAG origin/main` lists every commit on main not
# patch-equivalent in BASE_TAG. + means needs picking, - means
# already applied (skipped silently).
CANDIDATES=$(git cherry "$BASE_TAG" origin/main | awk '/^\+ / {print $2}')
if [ -z "$CANDIDATES" ]; then
echo "No commits on origin/main beyond $BASE_TAG."
echo "## Cherry-pick summary" >> "$GITHUB_STEP_SUMMARY"
echo "" >> "$GITHUB_STEP_SUMMARY"
echo "Base: \`$BASE_TAG\` — no commits to consider." >> "$GITHUB_STEP_SUMMARY"
exit 0
fi
# Re-order chronologically (oldest first) for predictable application.
ORDERED=$(git log --reverse --format='%H' "$BASE_TAG..origin/main" \
| grep -F -f <(echo "$CANDIDATES") || true)
INCLUDED=""
SKIPPED=""
while IFS= read -r SHA; do
[ -z "$SHA" ] && continue
SUBJECT=$(git log -1 --format='%s' "$SHA")
# fix: or chore:, optional scope, optional ! breaking marker
if echo "$SUBJECT" | grep -qE '^(fix|chore)(\([^)]+\))?!?: '; then
echo "→ cherry-picking $SHA $SUBJECT"
if ! git cherry-pick -x "$SHA"; then
# Abort restores HEAD to the last successful pick. On real
# runs, push that state so the operator can fetch, resolve
# $SHA manually, and finalize with auto_cherry_pick=false.
git cherry-pick --abort || true
if [ "$DRY_RUN" != "true" ]; then
git push --force-with-lease origin "$BRANCH" || git push origin "$BRANCH" || true
fi
{
echo "## Cherry-pick conflict"
echo ""
echo "Failed at: \`${SHA}\` — \`${SUBJECT}\`"
echo ""
if [ "$DRY_RUN" = "true" ]; then
echo "**Dry run:** branch was not pushed, so the picks below were discarded with the runner."
if [ -n "$INCLUDED" ]; then
echo ""
echo "Already-applied picks (lost — must be re-applied before resolving \`${SHA}\`):"
echo ""
echo "$INCLUDED"
fi
echo ""
echo "**To resolve:** re-run \`create\` with \`auto_cherry_pick=true\` (real, not dry-run) to materialize the partial branch on origin, then resolve \`${SHA}\` manually. Re-running with \`auto_cherry_pick=false\` would recreate the branch from \`${BASE_TAG}\` and lose every pick listed above."
else
echo "Branch \`${BRANCH}\` was pushed with picks applied up to (but not including) the conflicting commit."
echo ""
echo "**To resolve:** \`git fetch origin && git checkout ${BRANCH} && git cherry-pick -x ${SHA}\`, fix the conflict, push, then re-run \`finalize\` with \`auto_cherry_pick=false\`."
fi
} >> "$GITHUB_STEP_SUMMARY"
echo "::error::Cherry-pick of $SHA failed. See summary."
exit 1
fi
INCLUDED="${INCLUDED}- \`${SHA}\` ${SUBJECT}"$'\n'
else
echo " skip $SHA $SUBJECT (not fix/chore)"
SKIPPED="${SKIPPED}- \`${SHA}\` ${SUBJECT}"$'\n'
fi
done <<< "$ORDERED"
{
echo "## Cherry-pick summary"
echo ""
echo "Base: \`$BASE_TAG\`"
echo ""
if [ -n "$INCLUDED" ]; then
echo "### Included (fix/chore)"
echo ""
echo "$INCLUDED"
else
echo "_No fix/chore commits to include._"
echo ""
fi
if [ -n "$SKIPPED" ]; then
echo "### Skipped (feat/refactor/etc — not auto-included)"
echo ""
echo "$SKIPPED"
fi
} >> "$GITHUB_STEP_SUMMARY"
- name: Bump version and push
env:
BRANCH: ${{ needs.validate-version.outputs.branch }}
BASE_TAG: ${{ needs.validate-version.outputs.base_tag }}
VERSION: ${{ inputs.version }}
DRY_RUN: ${{ inputs.dry_run }}
run: |
git checkout -b "$BRANCH" "$BASE_TAG"
# Bump version in package.json
set -euo pipefail
npm version "$VERSION" --no-git-tag-version
git add package.json package-lock.json
# Keep sdk/package.json in lockstep (parity with release-sdk.yml).
if [ -f sdk/package.json ]; then
(cd sdk && npm version "$VERSION" --no-git-tag-version)
git add sdk/package.json
[ -f sdk/package-lock.json ] && git add sdk/package-lock.json
fi
git commit -m "chore: bump version to $VERSION for hotfix"
git push origin "$BRANCH"
echo "## Hotfix branch created" >> "$GITHUB_STEP_SUMMARY"
echo "- Branch: \`$BRANCH\`" >> "$GITHUB_STEP_SUMMARY"
echo "- Based on: \`$BASE_TAG\`" >> "$GITHUB_STEP_SUMMARY"
echo "- Apply your fix, push, then run this workflow again with \`finalize\`" >> "$GITHUB_STEP_SUMMARY"
if [ "$DRY_RUN" != "true" ]; then
git push origin "$BRANCH"
else
echo "DRY RUN — branch not pushed. Local checkout exercised the cherry-pick and bump flow."
fi
{
echo "## Hotfix branch created"
echo ""
echo "- Branch: \`$BRANCH\`"
echo "- Based on: \`$BASE_TAG\`"
echo "- Apply additional manual fixes if needed, then run \`finalize\`."
} >> "$GITHUB_STEP_SUMMARY"
finalize:
install-smoke:
needs: validate-version
if: inputs.action == 'finalize'
permissions:
contents: read
uses: ./.github/workflows/install-smoke.yml
with:
ref: ${{ needs.validate-version.outputs.branch }}
finalize:
needs: [validate-version, install-smoke]
if: inputs.action == 'finalize'
runs-on: ubuntu-latest
timeout-minutes: 10
timeout-minutes: 15
permissions:
contents: write
pull-requests: write
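
The cherry-pick step in this hunk includes a commit only when its subject starts with a conventional `fix` or `chore` prefix. The sketch below applies that filter to a few subjects. The regex is taken from the step; the function name `is_hotfix_candidate` and the sample subjects are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: classify a commit subject the way the cherry-pick step does.
# The regex is taken from the step; the function name and subjects are hypothetical.
is_hotfix_candidate() {
  printf '%s\n' "$1" | grep -qE '^(fix|chore)(\([^)]+\))?!?: '
}

for subject in \
  'fix: handle empty config' \
  'chore(release)!: drop legacy flag' \
  'feat: add models block' \
  'fix:missing space after colon'; do
  if is_hotfix_candidate "$subject"; then
    echo "pick  $subject"
  else
    echo "skip  $subject"
  fi
done
```

The trailing space in the pattern is significant: a subject such as `fix:missing space after colon` is skipped, so commits need the standard `type: summary` spacing to be picked up automatically.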
@@ -140,31 +301,83 @@ jobs:
git config user.name "github-actions[bot]"
git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
- name: Detect prior publish (reconciliation mode)
id: prior_publish
env:
VERSION: ${{ inputs.version }}
run: |
EXISTING=$(npm view get-shit-done-cc@"$VERSION" version 2>/dev/null || true)
if [ -n "$EXISTING" ]; then
echo "::warning::get-shit-done-cc@${VERSION} is already on the registry — entering reconciliation mode (skip publish, continue with tag/release/PR/dist-tag)."
echo "skip_publish=true" >> "$GITHUB_OUTPUT"
else
echo "skip_publish=false" >> "$GITHUB_OUTPUT"
fi
- name: Install and test
run: |
npm ci
npm run test:coverage
- name: Create PR to merge hotfix back to main
if: ${{ !inputs.dry_run }}
- name: Build SDK dist for tarball
run: npm run build:sdk
- name: Verify CC tarball ships sdk/dist/cli.js (bug #2647 guard)
run: bash scripts/verify-tarball-sdk-dist.sh
- name: Pack SDK as tarball and bundle into CC source tree
env:
GH_TOKEN: ${{ github.token }}
BRANCH: ${{ needs.validate-version.outputs.branch }}
VERSION: ${{ inputs.version }}
run: |
EXISTING_PR=$(gh pr list --base main --head "$BRANCH" --state open --json number --jq '.[0].number')
if [ -n "$EXISTING_PR" ]; then
echo "PR #$EXISTING_PR already exists; updating"
gh pr edit "$EXISTING_PR" \
--title "chore: merge hotfix v${VERSION} back to main" \
--body "Merge hotfix changes back to main after v${VERSION} release."
else
gh pr create \
--base main \
--head "$BRANCH" \
--title "chore: merge hotfix v${VERSION} back to main" \
--body "Merge hotfix changes back to main after v${VERSION} release."
set -e
cd sdk
npm pack
TARBALL="gsd-build-sdk-${VERSION}.tgz"
if [ ! -f "$TARBALL" ]; then
echo "::error::Expected $TARBALL but npm pack did not produce it."
ls -la
exit 1
fi
mkdir -p ../sdk-bundle
mv "$TARBALL" ../sdk-bundle/gsd-sdk.tgz
cd ..
ls -la sdk-bundle/
- name: Add sdk-bundle to CC files whitelist (in-tree, not committed)
run: |
node <<'NODE'
const fs = require('fs');
const pkg = JSON.parse(fs.readFileSync('package.json', 'utf8'));
if (!Array.isArray(pkg.files)) {
console.error('::error::package.json files is not an array');
process.exit(1);
}
if (!pkg.files.includes('sdk-bundle')) {
pkg.files.push('sdk-bundle');
fs.writeFileSync('package.json', JSON.stringify(pkg, null, 2) + '\n');
console.log('Added sdk-bundle/ to package.json files whitelist');
}
NODE
- name: Verify CC tarball will contain sdk-bundle/gsd-sdk.tgz
run: |
set -e
TARBALL=$(npm pack --ignore-scripts 2>/dev/null | tail -1)
if [ -z "$TARBALL" ] || [ ! -f "$TARBALL" ]; then
echo "::error::npm pack produced no tarball"
exit 1
fi
if ! tar -tzf "$TARBALL" | grep -q "package/sdk-bundle/gsd-sdk.tgz"; then
echo "::error::CC tarball is missing package/sdk-bundle/gsd-sdk.tgz"
exit 1
fi
echo "✅ CC tarball contains sdk-bundle/gsd-sdk.tgz"
rm -f "$TARBALL"
- name: Dry-run publish validation
env:
NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
run: npm publish --dry-run --tag latest
- name: Tag and push
if: ${{ !inputs.dry_run }}
@@ -185,55 +398,98 @@ jobs:
fi
- name: Publish to npm (latest)
if: ${{ !inputs.dry_run }}
run: npm publish --provenance --access public
if: ${{ !inputs.dry_run && steps.prior_publish.outputs.skip_publish != 'true' }}
env:
NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
run: npm publish --provenance --access public --tag latest
- name: Create GitHub Release
- name: Re-point next dist-tag at this hotfix
if: ${{ !inputs.dry_run }}
env:
VERSION: ${{ inputs.version }}
NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
run: |
npm dist-tag add "get-shit-done-cc@${VERSION}" next
echo "✅ next dist-tag re-pointed to v${VERSION} (matches latest)"
- name: Create GitHub Release (idempotent)
if: ${{ !inputs.dry_run }}
env:
GH_TOKEN: ${{ github.token }}
VERSION: ${{ inputs.version }}
run: |
gh release create "v${VERSION}" \
--title "v${VERSION} (hotfix)" \
--generate-notes
if gh release view "v${VERSION}" >/dev/null 2>&1; then
echo "GitHub Release v${VERSION} already exists; ensuring --latest flag is set"
gh release edit "v${VERSION}" --latest || true
else
gh release create "v${VERSION}" \
--title "v${VERSION} (hotfix)" \
--generate-notes \
--latest
fi
- name: Clean up next dist-tag
- name: Create PR to merge hotfix back to main
if: ${{ !inputs.dry_run }}
env:
GH_TOKEN: ${{ github.token }}
BRANCH: ${{ needs.validate-version.outputs.branch }}
VERSION: ${{ inputs.version }}
NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
run: |
# Point next to the stable release so @next never returns something
# older than @latest. This prevents stale pre-release installs.
npm dist-tag add "get-shit-done-cc@${VERSION}" next 2>/dev/null || true
echo "✓ next dist-tag updated to v${VERSION}"
EXISTING_PR=$(gh pr list --base main --head "$BRANCH" --state open --json number --jq '.[0].number')
if [ -n "$EXISTING_PR" ]; then
gh pr edit "$EXISTING_PR" \
--title "chore: merge hotfix v${VERSION} back to main" \
--body "Merge hotfix changes back to main after v${VERSION} release."
else
gh pr create \
--base main \
--head "$BRANCH" \
--title "chore: merge hotfix v${VERSION} back to main" \
--body "Merge hotfix changes back to main after v${VERSION} release."
fi
- name: Verify publish
- name: Verify publish landed on registry
if: ${{ !inputs.dry_run }}
env:
VERSION: ${{ inputs.version }}
run: |
sleep 10
PUBLISHED=$(npm view get-shit-done-cc@"$VERSION" version 2>/dev/null || echo "NOT_FOUND")
PUBLISHED="NOT_FOUND"
for delay in 5 10 20 30 45; do
PUBLISHED=$(npm view get-shit-done-cc@"$VERSION" version 2>/dev/null || echo "NOT_FOUND")
if [ "$PUBLISHED" = "$VERSION" ]; then
break
fi
echo "Waiting ${delay}s for registry to catch up (saw: $PUBLISHED)..."
sleep "$delay"
done
if [ "$PUBLISHED" != "$VERSION" ]; then
echo "::error::Published version verification failed. Expected $VERSION, got $PUBLISHED"
echo "::error::Version $VERSION did not appear on the registry within timeout"
exit 1
fi
echo "✓ Verified: get-shit-done-cc@$VERSION is live on npm"
LATEST_VER=$(npm view get-shit-done-cc dist-tags.latest 2>/dev/null || echo "NOT_FOUND")
if [ "$LATEST_VER" != "$VERSION" ]; then
echo "::error::dist-tag 'latest' resolves to '$LATEST_VER', expected '$VERSION'"
exit 1
fi
echo "✓ Verified: get-shit-done-cc@$VERSION is live on @latest"
- name: Summary
env:
VERSION: ${{ inputs.version }}
BASE_TAG: ${{ needs.validate-version.outputs.base_tag }}
DRY_RUN: ${{ inputs.dry_run }}
run: |
echo "## Hotfix v${VERSION}" >> "$GITHUB_STEP_SUMMARY"
if [ "$DRY_RUN" = "true" ]; then
echo "**DRY RUN** — npm publish, tagging, and push skipped" >> "$GITHUB_STEP_SUMMARY"
else
echo "- Published to npm as \`latest\`" >> "$GITHUB_STEP_SUMMARY"
echo "- Tagged \`v${VERSION}\`" >> "$GITHUB_STEP_SUMMARY"
echo "- PR created to merge back to main" >> "$GITHUB_STEP_SUMMARY"
fi
{
echo "## Hotfix v${VERSION}"
echo ""
echo "- Base (cumulative-fix anchor): \`${BASE_TAG}\`"
if [ "$DRY_RUN" = "true" ]; then
echo "- **DRY RUN** — npm publish, tagging, and push skipped"
else
echo "- Published to npm as \`latest\`"
echo "- \`next\` dist-tag re-pointed to v${VERSION}"
echo "- Tagged \`v${VERSION}\` (anchor for the next hotfix's cherry-pick base)"
echo "- SDK bundled at \`sdk-bundle/gsd-sdk.tgz\` inside CC tarball"
echo "- Merge-back PR opened against main"
fi
} >> "$GITHUB_STEP_SUMMARY"
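
The registry verification in the workflow above polls with escalating delays instead of a single fixed sleep. A minimal sketch of that loop follows. The command passed in stands in for the `npm view ... version` lookup, and `SLEEP_CMD` is a hypothetical hook added only so the example can run without real waits.

```shell
#!/usr/bin/env bash
# Sketch: poll a lookup command with the same delay schedule as the verify step.
# poll_until_equal and SLEEP_CMD are hypothetical; the workflow inlines this loop.
poll_until_equal() {
  local expected="$1"; shift
  local seen="NOT_FOUND" delay
  for delay in 5 10 20 30 45; do
    seen=$("$@" 2>/dev/null || echo "NOT_FOUND")
    if [ "$seen" = "$expected" ]; then
      break
    fi
    "${SLEEP_CMD:-sleep}" "$delay"
  done
  [ "$seen" = "$expected" ]
}

# Demo: a fake lookup that succeeds on its third call, with sleeps stubbed out.
counter_file=$(mktemp)
echo 0 > "$counter_file"
fake_lookup() {
  local n
  n=$(( $(cat "$counter_file") + 1 ))
  echo "$n" > "$counter_file"
  if [ "$n" -ge 3 ]; then echo "1.27.2"; else return 1; fi
}

if SLEEP_CMD=true poll_until_equal "1.27.2" fake_lookup; then
  echo "live after $(cat "$counter_file") checks"
fi
rm -f "$counter_file"
```

As in the workflow, the lookup runs five times at most, and the final 45-second sleep is not followed by another check before the result is evaluated.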

@@ -1,10 +1,13 @@
name: Install Smoke
# Exercises the real install path: `npm pack` → `npm install -g <tarball>`
# → run `bin/install.js` → assert `gsd-sdk` is on PATH.
# Exercises the real install paths:
# tarball: `npm pack` → `npm install -g <tarball>` → assert gsd-sdk on PATH
# unpacked: `npm install -g <dir>` (no pack) → assert gsd-sdk on PATH + executable
#
# Closes the CI gap that let #2439 ship: the rest of the suite only reads
# `bin/install.js` as a string and never executes it.
# The tarball path is the canonical ship path. The unpacked path reproduces the
# mode-644 failure class (issue #2453): npm does NOT chmod bin targets when
# installing from an unpacked local directory, so any stale tsc output lacking
# execute bits will be caught by the unpacked job before release.
#
# - PRs: path-filtered, minimal runner (ubuntu + Node LTS) for fast signal.
# - Push to release branches / main: full matrix.
@@ -16,6 +19,7 @@ on:
- main
paths:
- 'bin/install.js'
- 'bin/gsd-sdk.js'
- 'sdk/**'
- 'package.json'
- 'package-lock.json'
@@ -40,6 +44,9 @@ concurrency:
cancel-in-progress: true
jobs:
# ---------------------------------------------------------------------------
# Job 1: tarball install (existing canonical path)
# ---------------------------------------------------------------------------
smoke:
runs-on: ${{ matrix.os }}
timeout-minutes: 12
@@ -78,6 +85,31 @@ jobs:
if: steps.skip.outputs.skip != 'true'
with:
ref: ${{ inputs.ref || github.ref }}
# Need enough history to merge origin/main for stale-base detection.
fetch-depth: 0
# The default `refs/pull/N/merge` ref GitHub produces for PRs is cached
# against the recorded merge-base, not current main. When main advances
# after the PR was opened, the merge ref stays stale and CI can fail on
# issues that were already fixed upstream. Explicitly merge current
# origin/main into the PR head so smoke always tests the PR against the
# latest trunk. If the merge conflicts, emit a clear "rebase onto main"
# diagnostic instead of a downstream build error that looks unrelated.
- name: Rebase check — merge origin/main into PR head
if: steps.skip.outputs.skip != 'true' && github.event_name == 'pull_request'
shell: bash
run: |
set -euo pipefail
git config user.email "ci@gsd-build"
git config user.name "CI Rebase Check"
git fetch origin main
if ! git merge --no-edit --no-ff origin/main; then
echo "::error::This PR cannot cleanly merge origin/main. Rebase your branch onto current main and push again."
echo "::error::Conflicting files:"
git diff --name-only --diff-filter=U
git merge --abort
exit 1
fi
- name: Set up Node.js ${{ matrix.node-version }}
if: steps.skip.outputs.skip != 'true'
@@ -90,6 +122,23 @@ jobs:
if: steps.skip.outputs.skip != 'true'
run: npm ci
# Isolated SDK typecheck — if the build fails, emit a clear "stale base
# or real type error" diagnostic instead of letting the failure cascade
# into the tarball install step, where the downstream PATH assertion
# misreports it as "gsd-sdk not on PATH — installSdkIfNeeded regression".
- name: SDK typecheck (fails fast on type regressions)
if: steps.skip.outputs.skip != 'true'
shell: bash
run: |
set -euo pipefail
if ! npm run build:sdk; then
echo "::error::SDK build (npm run build:sdk) failed."
echo "::error::Common cause: your PR base is behind main and picks up intermediate type errors that are already fixed on trunk."
echo "::error::Fix: git fetch origin main && git rebase origin/main && git push --force-with-lease"
echo "::error::If the error persists on a fresh rebase, the type error is real — fix it in sdk/src/ and push."
exit 1
fi
- name: Pack root tarball
if: steps.skip.outputs.skip != 'true'
id: pack
@@ -109,7 +158,7 @@ jobs:
echo "$NPM_BIN" >> "$GITHUB_PATH"
echo "npm global bin: $NPM_BIN"
- name: Install tarball globally (runs bin/install.js → installSdkIfNeeded)
- name: Install tarball globally
if: steps.skip.outputs.skip != 'true'
shell: bash
env:
@@ -121,13 +170,14 @@ jobs:
cd "$TMPDIR_ROOT"
npm install -g "$WORKSPACE/$TARBALL"
command -v get-shit-done-cc
# `--claude --local` is the non-interactive code path (see
# install.js main block: when both a runtime and location are set,
# installAllRuntimes runs with isInteractive=false, no prompts).
# We tolerate non-zero here because the authoritative assertion is
# the next step: gsd-sdk must land on PATH. Some runtime targets
# may exit before the SDK step for unrelated reasons on CI.
get-shit-done-cc --claude --local || true
# `--claude --local` is the non-interactive code path. Don't swallow
# non-zero exit — if the installer fails, that IS the CI failure, and
# its own error message is more useful than the downstream "shim
# regression" assertion masking the real cause.
if ! get-shit-done-cc --claude --local; then
echo "::error::get-shit-done-cc --claude --local failed. See the install.js output above for the real error (SDK build, PATH resolution, chmod, etc.)."
exit 1
fi
- name: Assert gsd-sdk resolves on PATH
if: steps.skip.outputs.skip != 'true'
@@ -135,7 +185,7 @@ jobs:
run: |
set -euo pipefail
if ! command -v gsd-sdk >/dev/null 2>&1; then
echo "::error::gsd-sdk is not on PATH after install: installSdkIfNeeded() regression"
echo "::error::gsd-sdk is not on PATH after tarball install — shim regression"
NPM_BIN="$(npm config get prefix)/bin"
echo "npm global bin: $NPM_BIN"
ls -la "$NPM_BIN" | grep -i gsd || true
@@ -150,3 +200,99 @@ jobs:
set -euo pipefail
gsd-sdk --version || gsd-sdk --help
echo "✓ gsd-sdk is executable"
# ---------------------------------------------------------------------------
# Job 2: unpacked-dir install — reproduces the mode-644 failure class (#2453)
#
# `npm install -g <directory>` does NOT chmod bin targets when the source
# file was produced by a build script (tsc emits 0o644). This job catches
# regressions where sdk/dist/cli.js loses its execute bit before publish.
# ---------------------------------------------------------------------------
smoke-unpacked:
runs-on: ubuntu-latest
timeout-minutes: 10
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.ref || github.ref }}
fetch-depth: 0
# See the `smoke` job above for rationale — refs/pull/N/merge is cached
# against the recorded merge-base, not current main. Explicitly merge
# origin/main so smoke-unpacked also runs against the latest trunk.
- name: Rebase check — merge origin/main into PR head
if: github.event_name == 'pull_request'
shell: bash
run: |
set -euo pipefail
git config user.email "ci@gsd-build"
git config user.name "CI Rebase Check"
git fetch origin main
if ! git merge --no-edit --no-ff origin/main; then
echo "::error::This PR cannot cleanly merge origin/main. Rebase your branch onto current main and push again."
echo "::error::Conflicting files:"
git diff --name-only --diff-filter=U
git merge --abort
exit 1
fi
- name: Set up Node.js 22
uses: actions/setup-node@53b83947a5a98c8d113130e565377fae1a50d02f # v6.3.0
with:
node-version: 22
cache: 'npm'
- name: Install root deps
run: npm ci
- name: Build SDK dist (sdk/dist is gitignored — must build for unpacked install)
run: npm run build:sdk
- name: Ensure npm global bin is on PATH
shell: bash
run: |
NPM_BIN="$(npm config get prefix)/bin"
echo "$NPM_BIN" >> "$GITHUB_PATH"
echo "npm global bin: $NPM_BIN"
- name: Strip execute bit from sdk/dist/cli.js to simulate tsc-fresh output
shell: bash
run: |
set -euo pipefail
# Simulate the exact state tsc produces: cli.js at mode 644.
chmod 644 sdk/dist/cli.js
echo "Stripped execute bit: $(stat -c '%a' sdk/dist/cli.js 2>/dev/null || stat -f '%p' sdk/dist/cli.js)"
- name: Install from unpacked directory (no npm pack)
shell: bash
run: |
set -euo pipefail
TMPDIR_ROOT=$(mktemp -d)
cd "$TMPDIR_ROOT"
npm install -g "$GITHUB_WORKSPACE"
command -v get-shit-done-cc
get-shit-done-cc --claude --local || true
- name: Assert gsd-sdk resolves on PATH after unpacked install
shell: bash
run: |
set -euo pipefail
if ! command -v gsd-sdk >/dev/null 2>&1; then
echo "::error::gsd-sdk is not on PATH after unpacked install — #2453 regression"
NPM_BIN="$(npm config get prefix)/bin"
ls -la "$NPM_BIN" | grep -i gsd || true
exit 1
fi
echo "✓ gsd-sdk resolves at: $(command -v gsd-sdk)"
- name: Assert gsd-sdk is executable after unpacked install (#2453)
shell: bash
run: |
set -euo pipefail
# This is the exact check that would have caught #2453 before release.
# The shim (bin/gsd-sdk.js) invokes sdk/dist/cli.js via `node`, so
# the execute bit on cli.js is not needed for the shim path. However,
# installSdkIfNeeded() also chmods cli.js in place as a safety net.
gsd-sdk --version || gsd-sdk --help
echo "✓ gsd-sdk is executable after unpacked install"
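The comment in the final step notes that the shim runs `sdk/dist/cli.js` through `node`, so the execute bit is not required on that path. A minimal sketch of that distinction follows. It uses `sh` as a stand-in interpreter and a throwaway script in place of `cli.js`; both are assumptions made for illustration and are not part of the workflow.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical stand-in for sdk/dist/cli.js: a script emitted without execute bits.
WORKDIR=$(mktemp -d)
SCRIPT="$WORKDIR/cli.sh"
printf '%s\n' '#!/bin/sh' 'echo "cli ran"' > "$SCRIPT"
chmod 644 "$SCRIPT"

# Direct execution depends on the execute bit...
if [ -x "$SCRIPT" ]; then
  DIRECT="executable"
else
  DIRECT="not executable"
fi

# ...but handing the file to an interpreter only needs read permission.
VIA_INTERPRETER=$(sh "$SCRIPT")

echo "direct: $DIRECT"
echo "via interpreter: $VIA_INTERPRETER"

rm -rf "$WORKDIR"
```

This is why the job asserts on `gsd-sdk --version` rather than on the file mode alone: the shim path can succeed even when `cli.js` is left at mode 644.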

790
.github/workflows/release-sdk.yml vendored Normal file

@@ -0,0 +1,790 @@
# Release SDK Bundle
#
# Stopgap workflow_dispatch publish path: builds get-shit-done-cc with the
# compiled SDK and the SDK .tgz bundled inside the CC tarball, then
# publishes the CC package to ONE chosen dist-tag (dev | next | latest)
# per run.
#
# Why this exists: @gsd-build/sdk publishes from canary.yml and release.yml
# fail because the @gsd-build npm token is currently unavailable. CC users
# do not consume @gsd-build/sdk directly — bin/gsd-sdk.js resolves
# sdk/dist/cli.js from inside the installed CC package, so the bundled
# copy is sufficient for full functionality. This workflow ships CC alone
# (no separate @gsd-build/sdk publish attempt) and additionally bakes the
# packed SDK tarball (gsd-build-sdk-<version>.tgz, stored as
# sdk-bundle/gsd-sdk.tgz) into the CC tarball as a recoverable
# npm-installable artifact.
#
# Existing canary.yml and release.yml are intentionally untouched. They
# remain the canonical two-package publish path; restore them to primary
# use once @gsd-build/sdk ownership is recovered.
#
# Tracking issues: #2925 (initial workflow), #2929 (CI-gate parity with release.yml)
name: Release SDK Bundle
on:
workflow_dispatch:
inputs:
action:
description: 'publish = normal dev/next/latest publish; hotfix = create hotfix/X.YY.Z branch from latest vX.YY.* tag, cherry-pick fix:/chore: from main, publish to @latest'
required: true
type: choice
default: publish
options:
- publish
- hotfix
tag:
description: 'npm dist-tag (publish action only; hotfix forces latest)'
required: false
type: choice
default: latest
options:
- dev
- next
- latest
version:
description: 'Version. publish: explicit (e.g. 1.50.0-dev.3) or empty to derive. hotfix: REQUIRED patch (e.g. 1.27.1, Z>0).'
required: false
type: string
ref:
description: 'Branch or ref to build from. Ignored for hotfix (workflow uses hotfix/X.YY.Z).'
required: false
type: string
auto_cherry_pick:
description: 'Hotfix only: auto-cherry-pick fix:/chore: commits from origin/main since base tag.'
required: false
type: boolean
default: true
dry_run:
description: 'Dry run (skip npm publish, git tag, and push). Hotfix branch creation/push also skipped.'
required: false
type: boolean
default: false
# Per stream (dist-tag for publish, version for hotfix) — no concurrent publishes for the same stream.
concurrency:
group: release-sdk-${{ inputs.action == 'hotfix' && format('hotfix-{0}', inputs.version) || inputs.tag }}
cancel-in-progress: false
env:
NODE_VERSION: 24
jobs:
# Resolves the effective git ref for this run.
#
# action=publish → outputs inputs.ref verbatim (may be empty = workflow ref)
# action=hotfix → branches hotfix/X.YY.Z from highest existing vX.YY.* tag,
# auto-cherry-picks fix:/chore: from origin/main, pushes,
# and outputs the new branch as ref. Idempotent: if branch
# already exists (operator pre-prepared it via hotfix.yml),
# we just check it out and re-run; the cherry-pick step
# no-ops since `git cherry` will report nothing new.
prepare:
runs-on: ubuntu-latest
timeout-minutes: 10
permissions:
contents: write
outputs:
ref: ${{ steps.out.outputs.ref }}
base_tag: ${{ steps.hotfix.outputs.base_tag }}
steps:
- name: Validate hotfix inputs
if: inputs.action == 'hotfix'
env:
VERSION: ${{ inputs.version }}
run: |
if [ -z "$VERSION" ]; then
echo "::error::action=hotfix requires the 'version' input (e.g. 1.27.1)"
exit 1
fi
if ! echo "$VERSION" | grep -qE '^[0-9]+\.[0-9]+\.[1-9][0-9]*$'; then
echo "::error::Hotfix version must match X.YY.Z with Z>0 (got: $VERSION)"
exit 1
fi
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
if: inputs.action == 'hotfix'
with:
fetch-depth: 0
- name: Configure git identity
if: inputs.action == 'hotfix'
run: |
git config user.name "github-actions[bot]"
git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
- name: Prepare hotfix branch
id: hotfix
if: inputs.action == 'hotfix'
env:
VERSION: ${{ inputs.version }}
AUTO_CHERRY_PICK: ${{ inputs.auto_cherry_pick }}
DRY_RUN: ${{ inputs.dry_run }}
run: |
set -euo pipefail
# Stash the shipped-paths classifier from the dispatched ref's
# working tree BEFORE `git checkout -b ... "$BASE_TAG"` below
# overwrites it. Base tags predating #2980 don't have the
# classifier in their tree, so the loop must reference a
# location that survives the working-tree swap. Bug #2983.
CLASSIFIER_SRC="scripts/diff-touches-shipped-paths.cjs"
if [ ! -f "$CLASSIFIER_SRC" ]; then
echo "::error::shipped-paths classifier not found at $CLASSIFIER_SRC in dispatched ref — refusing to run"
exit 1
fi
CLASSIFIER="${RUNNER_TEMP}/diff-touches-shipped-paths.cjs"
cp "$CLASSIFIER_SRC" "$CLASSIFIER"
if [ ! -f "$CLASSIFIER" ]; then
echo "::error::failed to stage classifier at $CLASSIFIER"
exit 1
fi
MAJOR_MINOR=$(echo "$VERSION" | cut -d. -f1-2)
TARGET_TAG="v${VERSION}"
BRANCH="hotfix/${VERSION}"
# Semver-correct selection: append TARGET_TAG, sort -V, take preceding entry.
# Plain lexicographic compare mis-orders multi-digit patches (v1.27.10 vs v1.27.9).
BASE_TAG=$( ( git tag -l "v${MAJOR_MINOR}.*" | grep -E "^v[0-9]+\.[0-9]+\.[0-9]+$"; echo "$TARGET_TAG" ) \
| sort -V \
| awk -v target="$TARGET_TAG" '$1 == target { print prev; exit } { prev = $1 }')
if [ -z "$BASE_TAG" ]; then
echo "::error::No prior stable tag found for ${MAJOR_MINOR}.x before $TARGET_TAG"
exit 1
fi
echo "base_tag=$BASE_TAG" >> "$GITHUB_OUTPUT"
echo "branch=$BRANCH" >> "$GITHUB_OUTPUT"
# Idempotent branch creation — operator may have pre-prepared via hotfix.yml.
git fetch origin main:refs/remotes/origin/main
if git ls-remote --exit-code origin "refs/heads/$BRANCH" >/dev/null 2>&1; then
echo "Branch $BRANCH already exists on origin; checking out"
git fetch origin "$BRANCH"
git checkout "$BRANCH"
BRANCH_PRE_EXISTED=1
else
git checkout -b "$BRANCH" "$BASE_TAG"
BRANCH_PRE_EXISTED=0
# Push the skeleton up-front (real runs only) so cherry-pick conflicts
# leave a remote artefact the operator can resolve. Dry-run keeps
# everything local — no orphan branch created on origin.
if [ "$DRY_RUN" != "true" ]; then
git push -u origin "$BRANCH"
fi
fi
if [ "$AUTO_CHERRY_PICK" = "true" ]; then
CANDIDATES=$(git cherry HEAD origin/main | awk '/^\+ / {print $2}')
if [ -n "$CANDIDATES" ]; then
ORDERED=$(git log --reverse --format='%H' "${BASE_TAG}..origin/main" \
| grep -F -f <(echo "$CANDIDATES") || true)
INCLUDED=""
# POLICY_SKIPPED — commits intentionally not picked because they
# don't match the fix/chore filter (feat/refactor/docs/etc).
# CONFLICT_SKIPPED — fix/chore commits whose cherry-pick failed
# and were skipped per the full-automation policy (#2968).
# NON_SHIPPED_SKIPPED — fix/chore commits whose diff doesn't
# touch any path in the npm tarball's `files` whitelist
# (CI / test / docs / planning-only changes). They can't
# affect the published package's behavior, so picking them
# into a hotfix is meaningless — and picking workflow-file
# changes specifically would also fail the push step because
# the default GITHUB_TOKEN lacks the `workflow` scope. The
# shipped-paths filter is the precise root cause: bug #2980.
# Operators reviewing the run summary need these distinct so
# the manual-review queue (CONFLICT_SKIPPED) isn't buried in
# the noise from the other two buckets.
POLICY_SKIPPED=""
CONFLICT_SKIPPED=""
NON_SHIPPED_SKIPPED=""
while IFS= read -r SHA; do
[ -z "$SHA" ] && continue
SUBJECT=$(git log -1 --format='%s' "$SHA")
if echo "$SUBJECT" | grep -qE '^(fix|chore)(\([^)]+\))?!?: '; then
# Merge commits with fix:/chore: titles can't be cherry-picked
# without `-m <parent>` and we can't pick the parent
# automatically. They fail BEFORE entering cherry-pick state
# (no CHERRY_PICK_HEAD), so an unconditional `--skip` would
# then fail and brick the loop. Skip them upfront with a
# distinct reason. Bug #2968 / CodeRabbit on PR #2970.
PARENT_COUNT=$(git rev-list --parents -n 1 "$SHA" | awk '{print NF - 1}')
if [ "$PARENT_COUNT" -gt 1 ]; then
REASON="merge commit — manual -m parent selection required"
echo "↷ skipping $SHA — $REASON"
CONFLICT_SKIPPED="${CONFLICT_SKIPPED}- \`${SHA}\` ${SUBJECT} ($REASON)"$'\n'
continue
fi
# Pre-pick guard: a hotfix release can only be affected
# by commits whose diff intersects the npm tarball's
# shipped paths (package.json `files` whitelist plus
# package.json itself, which `npm pack` always
# includes). Commits that touch only CI workflows,
# tests, docs, or planning artifacts cannot change what
# ships, so picking them into a hotfix is meaningless.
# As a side benefit, this excludes
# `.github/workflows/*` changes whose push would
# otherwise be rejected by GitHub because the default
# GITHUB_TOKEN lacks the `workflow` scope. The filter
# is implemented in
# scripts/diff-touches-shipped-paths.cjs rather than
# inline so the rules (read package.json `files`,
# treat entries as file-OR-directory prefix, the
# `package.json`-always-shipped rule) are
# unit-testable. Bug #2980.
#
# Use $CLASSIFIER (staged at workflow-start, before
# `git checkout -b ... "$BASE_TAG"` swapped the working
# tree) rather than `scripts/...` directly — base tags
# older than #2980 don't have the classifier in their
# tree. Capture the exit code via PIPESTATUS and
# dispatch on it: 0 = shipped, 1 = not shipped, 2+ =
# classifier error → fail-fast (don't silently treat
# tooling errors as informational skips). Bug #2983.
#
# PIPESTATUS capture must happen IMMEDIATELY after the
# pipeline — the previous form (`pipeline || true; RC=
# ${PIPESTATUS[1]}`) had a subtle bug: when the
# pipeline fails (exit 1 or 2 — exactly the cases we
# care about), `|| true` runs `true` as a one-command
# pipeline, overwriting PIPESTATUS to (0). The fix is
# to wrap the pipeline in `set +e`/`set -e` and snapshot
# PIPESTATUS into a local array on the very next line.
# CodeRabbit on PR #2984.
set +e
git diff-tree --no-commit-id --name-only -r "$SHA" \
| node "$CLASSIFIER"
PIPE_RC=("${PIPESTATUS[@]}")
set -e
DIFFTREE_RC="${PIPE_RC[0]}"
CLASSIFIER_RC="${PIPE_RC[1]}"
if [ "$DIFFTREE_RC" -ne 0 ]; then
echo "::error::git diff-tree failed for $SHA (exit $DIFFTREE_RC) — refusing to classify on incomplete input."
exit "$DIFFTREE_RC"
fi
case "$CLASSIFIER_RC" in
0) ;;
1)
REASON="touches no shipped paths (CI / test / docs / planning only)"
echo "↷ skipping $SHA — $REASON"
NON_SHIPPED_SKIPPED="${NON_SHIPPED_SKIPPED}- \`${SHA}\` ${SUBJECT}"$'\n'
continue
;;
*)
echo "::error::shipped-paths classifier failed for $SHA (exit $CLASSIFIER_RC). Refusing to silently skip — bug #2983."
exit "$CLASSIFIER_RC"
;;
esac
echo "→ cherry-picking $SHA $SUBJECT"
# Pin merge.conflictStyle=merge on the cherry-pick so the
# awk classifier below sees deterministic marker shapes —
# diff3/zdiff3 would inject `||||||| ancestor` lines into
# the HEAD section and cause context-missing conflicts to
# misclassify as real. Bug #2966.
if ! git -c merge.conflictStyle=merge cherry-pick -x --allow-empty --keep-redundant-commits "$SHA"; then
# Full automation policy (bug #2968): any conflict the
# cherry-pick can't auto-resolve is skipped, not aborted.
# The hotfix run completes with whatever applies cleanly;
# the CONFLICT_SKIPPED list below becomes the operator's
# review queue (see "Cherry-pick summary" in the run
# summary).
#
# Classify the conflict for the skip reason (operator-
# facing diagnostic — doesn't change control flow):
# - context absent at base: HEAD section in every
# conflict marker is empty (the picked commit modifies
# code that doesn't exist at the base). Bug #2966.
# - merge conflict: HEAD section has content (both base
# and patch want different content for the same
# region). Typical when the base tag was cut from a
# branch that has diverged from main. Bug #2968.
UNMERGED=$(git diff --name-only --diff-filter=U)
REASON="merge conflict — manual review"
if [ -n "$UNMERGED" ]; then
ALL_EMPTY_HEAD=true
while IFS= read -r CONFLICTED; do
[ -z "$CONFLICTED" ] && continue
# Guard the classifier against degenerate cases that
# would otherwise skew toward "context absent" (the
# auto-skip path) when they're actually unsafe to skip:
# - file missing or unreadable: don't pretend the
# conflict is benign; treat as real.
# - file listed as unmerged but no conflict markers
# present: anomalous git state; treat as real so
# the pick goes to the manual-review queue.
# CodeRabbit on PR #2970.
if [ ! -r "$CONFLICTED" ] || ! grep -q '^<<<<<<< ' "$CONFLICTED" 2>/dev/null; then
ALL_EMPTY_HEAD=false
break
fi
REAL=$(awk '
/^<<<<<<< / { in_head=1; head=""; next }
/^=======$/ && in_head { in_head=0; next }
/^>>>>>>> / {
if (head ~ /[^[:space:]]/) { print "real"; exit }
head=""
next
}
in_head { head = head $0 "\n" }
' "$CONFLICTED" 2>/dev/null || echo "real")
if [ "$REAL" = "real" ]; then
ALL_EMPTY_HEAD=false
break
fi
done <<< "$UNMERGED"
if [ "$ALL_EMPTY_HEAD" = "true" ]; then
REASON="context absent at base"
fi
fi
echo "↷ skipping $SHA — $REASON"
# Guard `--skip`: cherry-pick can fail before entering the
# conflict state (e.g. unreadable commit, empty-without-
# --allow-empty edge cases the flag misses). Calling
# `--skip` outside an in-progress cherry-pick exits non-
# zero and would brick the loop. CodeRabbit on PR #2970.
if git rev-parse -q --verify CHERRY_PICK_HEAD >/dev/null 2>&1; then
git cherry-pick --skip
fi
CONFLICT_SKIPPED="${CONFLICT_SKIPPED}- \`${SHA}\` ${SUBJECT} ($REASON)"$'\n'
continue
fi
INCLUDED="${INCLUDED}- \`${SHA}\` ${SUBJECT}"$'\n'
else
POLICY_SKIPPED="${POLICY_SKIPPED}- \`${SHA}\` ${SUBJECT}"$'\n'
fi
done <<< "$ORDERED"
{
echo "## Cherry-pick summary"
echo ""
echo "Base: \`$BASE_TAG\` → Branch: \`$BRANCH\`$([ "$DRY_RUN" = "true" ] && echo " (DRY RUN — local only)")"
echo ""
if [ -n "$INCLUDED" ]; then
echo "### Included (fix/chore)"
echo ""
echo "$INCLUDED"
else
echo "_No fix/chore commits to include._"
fi
if [ -n "$NON_SHIPPED_SKIPPED" ]; then
echo "### Skipped — touches no shipped paths (informational)"
echo ""
echo "These fix/chore commits don't touch any path in the npm tarball's \`files\` whitelist (or \`package.json\`), so they cannot change the published package's behavior. CI / test / docs / planning-only changes belong on \`main\`, not in a hotfix. No action needed."
echo ""
echo "$NON_SHIPPED_SKIPPED"
fi
if [ -n "$CONFLICT_SKIPPED" ]; then
echo "### Skipped — cherry-pick conflict (manual review)"
echo ""
echo "$CONFLICT_SKIPPED"
fi
if [ -n "$POLICY_SKIPPED" ]; then
echo "### Not auto-included (feat/refactor/docs/etc)"
echo ""
echo "$POLICY_SKIPPED"
fi
} >> "$GITHUB_STEP_SUMMARY"
fi
fi
# Bump version on the branch (committed) so downstream install-smoke +
# release jobs build the correct version. The release job's own in-tree
# bump becomes a no-op when the file already has the right version.
CURRENT=$(node -p "require('./package.json').version")
if [ "$CURRENT" != "$VERSION" ]; then
npm version "$VERSION" --no-git-tag-version
git add package.json package-lock.json
if [ -f sdk/package.json ]; then
(cd sdk && npm version "$VERSION" --no-git-tag-version)
git add sdk/package.json
[ -f sdk/package-lock.json ] && git add sdk/package-lock.json
fi
git commit -m "chore: bump version to $VERSION for hotfix"
fi
if [ "$DRY_RUN" != "true" ]; then
git push origin "$BRANCH"
else
echo "DRY RUN — cherry-picks applied locally; branch not pushed. Downstream install-smoke will run against \`$BASE_TAG\` (the cherry-pick verification above is the dry-run signal)."
fi
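The base-tag selection in the step above (append the target tag, `sort -V`, take the preceding entry) can be exercised in isolation. The sketch below wraps the same pipeline in a hypothetical `pick_base_tag` function that reads tags from stdin instead of `git tag -l`; the tag list is made up for illustration.

```shell
#!/usr/bin/env bash
set -eu

# pick_base_tag TARGET: reads candidate tags on stdin and prints the stable tag
# that sorts immediately before TARGET in version order (empty if none).
pick_base_tag() {
  target="$1"
  { grep -E '^v[0-9]+\.[0-9]+\.[0-9]+$' || true; echo "$target"; } \
    | sort -V \
    | awk -v target="$target" '$1 == target { print prev; exit } { prev = $1 }'
}

# Made-up tag list; the pre-release tag is dropped by the grep filter.
TAGS='v1.27.0
v1.27.9
v1.27.10
v1.27.10-rc.1'

BASE=$(printf '%s\n' "$TAGS" | pick_base_tag v1.27.11)
echo "base for v1.27.11: $BASE"
```

With these tags the selected base is `v1.27.10`. A plain lexicographic sort would place `v1.27.10` before `v1.27.9` and pick `v1.27.9` instead, which is the mis-ordering the workflow comment warns about.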
- name: Determine effective ref
id: out
env:
ACTION: ${{ inputs.action }}
INPUT_REF: ${{ inputs.ref }}
DRY_RUN: ${{ inputs.dry_run }}
BASE_TAG: ${{ steps.hotfix.outputs.base_tag }}
BRANCH: ${{ steps.hotfix.outputs.branch }}
run: |
if [ "$ACTION" = "hotfix" ]; then
if [ "$DRY_RUN" = "true" ]; then
echo "ref=$BASE_TAG" >> "$GITHUB_OUTPUT"
else
echo "ref=$BRANCH" >> "$GITHUB_OUTPUT"
fi
else
echo "ref=$INPUT_REF" >> "$GITHUB_OUTPUT"
fi
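The `prepare` job's comments describe why `PIPESTATUS` must be captured on the line immediately after the pipeline, and why `|| true` loses it. The following bash-only sketch reproduces both forms. The `producer` and `classifier` functions are hypothetical stand-ins for `git diff-tree` and `node "$CLASSIFIER"`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-ins for `git diff-tree ...` and `node "$CLASSIFIER"`.
producer() { printf 'docs/README.md\n'; return 0; }
classifier() { cat >/dev/null; return 1; }  # 1 = "not shipped" in the workflow's convention

# Broken form: `|| true` is itself a command, so PIPESTATUS now describes `true`.
producer | classifier || true
LOST_RC="${PIPESTATUS[1]:-unset}"

# Working form: suspend errexit, run the pipeline, snapshot PIPESTATUS at once.
set +e
producer | classifier
PIPE_RC=("${PIPESTATUS[@]}")
set -e

echo "after '|| true': ${LOST_RC}"
echo "snapshot: producer=${PIPE_RC[0]} classifier=${PIPE_RC[1]}"
```

Copying the array into `PIPE_RC` matters because any later command, including `set -e`, replaces `PIPESTATUS` with its own status.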
# Cross-platform install validation gate (parity with release.yml).
install-smoke:
needs: prepare
permissions:
contents: read
uses: ./.github/workflows/install-smoke.yml
with:
ref: ${{ needs.prepare.outputs.ref }}
release:
needs: [prepare, install-smoke]
runs-on: ubuntu-latest
timeout-minutes: 15
permissions:
contents: write # tag + push + GitHub Release
id-token: write # provenance
# The merge-back PR step (and the pull-request scope it required)
# was removed in #2983 — auto-cherry-pick hotfix flow only picks
# commits already on main, so there's nothing to merge back.
environment: npm-publish
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
ref: ${{ needs.prepare.outputs.ref }}
- uses: actions/setup-node@53b83947a5a98c8d113130e565377fae1a50d02f # v6.3.0
with:
node-version: ${{ env.NODE_VERSION }}
registry-url: 'https://registry.npmjs.org'
cache: 'npm'
- name: Determine version
id: ver
env:
ACTION: ${{ inputs.action }}
INPUT_TAG: ${{ inputs.tag }}
INPUT_OVERRIDE: ${{ inputs.version }}
run: |
set -e
# Hotfix forces version=inputs.version and dist-tag=latest.
if [ "$ACTION" = "hotfix" ]; then
if [ -z "$INPUT_OVERRIDE" ]; then
echo "::error::action=hotfix requires the 'version' input"
exit 1
fi
VERSION="$INPUT_OVERRIDE"
EFFECTIVE_TAG="latest"
echo "version=$VERSION" >> "$GITHUB_OUTPUT"
echo "tag=$EFFECTIVE_TAG" >> "$GITHUB_OUTPUT"
echo "→ Hotfix: will publish v${VERSION} to dist-tag '${EFFECTIVE_TAG}'"
exit 0
fi
RAW=$(node -p "require('./package.json').version")
BASE=$(echo "$RAW" | sed 's/-.*//')
if [ -n "$INPUT_OVERRIDE" ]; then
VERSION="$INPUT_OVERRIDE"
else
case "$INPUT_TAG" in
dev)
N=1
while git tag -l "v${BASE}-dev.${N}" | grep -q .; do
N=$((N + 1))
done
VERSION="${BASE}-dev.${N}"
;;
next)
N=1
while git tag -l "v${BASE}-rc.${N}" | grep -q .; do
N=$((N + 1))
done
VERSION="${BASE}-rc.${N}"
;;
latest)
VERSION="$BASE"
;;
*)
echo "::error::Unknown tag '$INPUT_TAG' (expected dev|next|latest)"
exit 1
;;
esac
fi
echo "version=$VERSION" >> "$GITHUB_OUTPUT"
echo "tag=$INPUT_TAG" >> "$GITHUB_OUTPUT"
echo "→ Will publish v${VERSION} to dist-tag '${INPUT_TAG}'"
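The version derivation in the step above strips any pre-release suffix from the `package.json` version, then probes `-dev.N` or `-rc.N` until it finds an unused tag. A rough sketch of that loop follows, assuming a made-up tag list and a hypothetical `tag_exists` helper in place of `git tag -l ... | grep -q .`.

```shell
#!/usr/bin/env bash
set -eu

# Made-up tags standing in for the repository's `git tag -l` output.
EXISTING_TAGS='v1.50.0-dev.1
v1.50.0-dev.2
v1.50.0-rc.1'

# Hypothetical helper: succeeds when the exact tag is already taken.
tag_exists() {
  printf '%s\n' "$EXISTING_TAGS" | grep -qxF "$1"
}

# next_version RAW SUFFIX: strip any pre-release part from RAW, then probe
# SUFFIX.1, SUFFIX.2, ... until an unused tag is found.
next_version() {
  base=$(echo "$1" | sed 's/-.*//')
  n=1
  while tag_exists "v${base}-$2.${n}"; do
    n=$((n + 1))
  done
  echo "${base}-$2.${n}"
}

echo "dev:  $(next_version 1.50.0-dev.2 dev)"
echo "next: $(next_version 1.50.0 rc)"
```

Because the probe starts at 1 and stops at the first free number, a deleted tag in the middle of the sequence would be reused; the workflow shares that behavior.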
# Reconciliation mode: if version is already on npm (a prior run
# published successfully but a downstream step failed), don't hard-fail.
# Set a flag and skip the publish step below; tag/release/dist-tag
# steps still execute so the rerun can finish reconciling state.
- name: Detect prior publish (reconciliation mode)
id: prior_publish
env:
VERSION: ${{ steps.ver.outputs.version }}
run: |
EXISTING=$(npm view get-shit-done-cc@"$VERSION" version 2>/dev/null || true)
if [ -n "$EXISTING" ]; then
echo "::warning::get-shit-done-cc@${VERSION} is already on the registry — entering reconciliation mode (skip publish, continue with tag/release/PR/dist-tag)."
echo "skip_publish=true" >> "$GITHUB_OUTPUT"
else
echo "skip_publish=false" >> "$GITHUB_OUTPUT"
fi
# Tolerant tag-existence check (matches release.yml pattern). An
# operator re-running after a mid-flight publish-step failure should
# not be blocked just because the tag step succeeded last time. Only
# error if the existing tag points at a different commit than HEAD.
- name: Check git tag (skip if matches HEAD, error if mismatched)
env:
VERSION: ${{ steps.ver.outputs.version }}
run: |
if git rev-parse -q --verify "refs/tags/v${VERSION}" >/dev/null; then
EXISTING_SHA=$(git rev-parse "refs/tags/v${VERSION}")
HEAD_SHA=$(git rev-parse HEAD)
if [ "$EXISTING_SHA" != "$HEAD_SHA" ]; then
echo "::error::git tag v${VERSION} already exists pointing at ${EXISTING_SHA}, but HEAD is ${HEAD_SHA}"
exit 1
fi
echo "::notice::tag v${VERSION} already exists at HEAD; tag step will skip"
fi
- name: Configure git identity
run: |
git config user.name "github-actions[bot]"
git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
- name: Bump in-tree version (not committed)
env:
VERSION: ${{ steps.ver.outputs.version }}
run: |
# --allow-same-version: prepare may have already committed this bump
# on the hotfix branch (release checks out BRANCH in real runs,
# BASE_TAG in dry-runs — only the latter has the older version).
npm version "$VERSION" --no-git-tag-version --allow-same-version
cd sdk && npm version "$VERSION" --no-git-tag-version --allow-same-version
- name: Install dependencies
run: npm ci
- name: Run full test suite with coverage (parity with release.yml)
run: npm run test:coverage
- name: Build SDK dist for tarball
run: npm run build:sdk
- name: Verify CC tarball ships sdk/dist/cli.js (bug #2647 guard)
run: bash scripts/verify-tarball-sdk-dist.sh
- name: Pack SDK as tarball and bundle into CC source tree
env:
VERSION: ${{ steps.ver.outputs.version }}
run: |
set -e
cd sdk
npm pack
# npm pack emits gsd-build-sdk-<version>.tgz in the cwd
TARBALL="gsd-build-sdk-${VERSION}.tgz"
if [ ! -f "$TARBALL" ]; then
echo "::error::Expected $TARBALL but npm pack did not produce it. Listing sdk/:"
ls -la
exit 1
fi
mkdir -p ../sdk-bundle
mv "$TARBALL" ../sdk-bundle/gsd-sdk.tgz
cd ..
ls -la sdk-bundle/
- name: Add sdk-bundle to CC files whitelist (in-tree, not committed)
run: |
node <<'NODE'
const fs = require('fs');
const pkg = JSON.parse(fs.readFileSync('package.json', 'utf8'));
if (!Array.isArray(pkg.files)) {
console.error('::error::package.json files is not an array');
process.exit(1);
}
if (!pkg.files.includes('sdk-bundle')) {
pkg.files.push('sdk-bundle');
fs.writeFileSync('package.json', JSON.stringify(pkg, null, 2) + '\n');
console.log('Added sdk-bundle/ to package.json files whitelist');
} else {
console.log('sdk-bundle/ already in files whitelist');
}
NODE
- name: Verify CC tarball will contain sdk-bundle/gsd-sdk.tgz
run: |
set -e
TARBALL=$(npm pack --ignore-scripts 2>/dev/null | tail -1)
if [ -z "$TARBALL" ] || [ ! -f "$TARBALL" ]; then
echo "::error::npm pack produced no tarball"
exit 1
fi
echo "Inspecting $TARBALL for sdk-bundle/gsd-sdk.tgz:"
if ! tar -tzf "$TARBALL" | grep -q "package/sdk-bundle/gsd-sdk.tgz"; then
echo "::error::CC tarball is missing package/sdk-bundle/gsd-sdk.tgz"
tar -tzf "$TARBALL" | grep -E "sdk-bundle|sdk/dist" | head -20
exit 1
fi
echo "✅ CC tarball contains sdk-bundle/gsd-sdk.tgz"
rm -f "$TARBALL"
- name: Dry-run publish validation
# Skip the rehearsal when the version is already on npm
# (reconciliation mode). `npm publish --dry-run` contacts the
# registry and fails with "You cannot publish over the
# previously published versions" if the version exists, even
# though no actual publish would be attempted. The real publish
# step (further down) is gated on the same condition; gate the
# rehearsal too so re-runs of an already-published hotfix don't
# fail here on a check that doesn't apply. Bug #2987.
if: ${{ steps.prior_publish.outputs.skip_publish != 'true' }}
env:
TAG: ${{ steps.ver.outputs.tag }}
NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
run: npm publish --dry-run --tag "$TAG"
- name: Tag and push
if: ${{ !inputs.dry_run }}
env:
VERSION: ${{ steps.ver.outputs.version }}
run: |
if git rev-parse -q --verify "refs/tags/v${VERSION}" >/dev/null; then
echo "Tag v${VERSION} already exists at HEAD (per pre-flight check); skipping git tag step"
else
git tag "v${VERSION}"
fi
git push origin "v${VERSION}"
- name: Publish to npm (CC bundle, SDK included as both loose tree and .tgz)
if: ${{ !inputs.dry_run && steps.prior_publish.outputs.skip_publish != 'true' }}
env:
TAG: ${{ steps.ver.outputs.tag }}
NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
run: npm publish --provenance --access public --tag "$TAG"
# Keep `next` from going stale relative to `latest`. When publishing a
# stable release, also point `next` at it so users on `@next` don't
# get stuck on an older pre-release than what's now stable. Parity
# with release.yml#finalize "Clean up next dist-tag" step.
- name: Re-point next dist-tag at the new latest (only when tag=latest)
if: ${{ !inputs.dry_run && steps.ver.outputs.tag == 'latest' }}
env:
VERSION: ${{ steps.ver.outputs.version }}
NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
run: |
npm dist-tag add "get-shit-done-cc@${VERSION}" next
echo "✅ next dist-tag re-pointed to v${VERSION} (matches latest)"
- name: Create GitHub Release (idempotent)
if: ${{ !inputs.dry_run }}
env:
GH_TOKEN: ${{ github.token }}
VERSION: ${{ steps.ver.outputs.version }}
TAG: ${{ steps.ver.outputs.tag }}
run: |
# Per-tag release flags:
# dev, next → --prerelease (won't be highlighted as the latest release on the repo page)
# latest → --latest (becomes the highlighted release)
# Idempotent: if release already exists (rerun after a transient
# downstream failure), edit the latest flag instead of failing.
if gh release view "v${VERSION}" >/dev/null 2>&1; then
echo "GitHub Release v${VERSION} already exists; reconciling --latest flag"
if [ "$TAG" = "latest" ]; then
gh release edit "v${VERSION}" --latest || true
fi
elif [ "$TAG" = "latest" ]; then
gh release create "v${VERSION}" \
--title "v${VERSION}" \
--generate-notes \
--latest
else
gh release create "v${VERSION}" \
--title "v${VERSION}" \
--generate-notes \
--prerelease
fi
echo "✅ GitHub Release v${VERSION} ready"
# Merge-back PR step removed — bug #2983.
#
# The auto-cherry-pick hotfix flow only picks commits already on
# main (`git cherry HEAD origin/main` outputs unmerged commits;
# we filter to fix:/chore: from main). By construction every code
# commit on the hotfix branch is already on main. The only
# hotfix-branch-only commit is `chore: bump version to X.Y.Z for
# hotfix`, which would either no-op against main (already past
# X.Y.Z) or rewind main's in-progress version — strictly
# counterproductive in either case.
#
# The original merge-back step also failed in production with
# `GitHub Actions is not permitted to create or approve pull
# requests (createPullRequest)` (org policy), but even if the
# policy were lifted the PR would have nothing useful to merge.
# Run 25232968975 was the trigger for removal.
- name: Verify publish landed on registry
if: ${{ !inputs.dry_run }}
env:
VERSION: ${{ steps.ver.outputs.version }}
TAG: ${{ steps.ver.outputs.tag }}
run: |
PUBLISHED="NOT_FOUND"
for delay in 5 10 20 30 45; do
PUBLISHED=$(npm view get-shit-done-cc@"$VERSION" version 2>/dev/null || echo "NOT_FOUND")
if [ "$PUBLISHED" = "$VERSION" ]; then
break
fi
echo "Waiting ${delay}s for registry to catch up (saw: $PUBLISHED)..."
sleep "$delay"
done
if [ "$PUBLISHED" != "$VERSION" ]; then
echo "::error::Version $VERSION did not appear on the registry within timeout"
exit 1
fi
TAG_VERSION=$(npm view get-shit-done-cc dist-tags."$TAG" 2>/dev/null || echo "NOT_FOUND")
if [ "$TAG_VERSION" != "$VERSION" ]; then
echo "::error::dist-tag '$TAG' resolves to '$TAG_VERSION', expected '$VERSION'"
exit 1
fi
echo "✅ get-shit-done-cc@${VERSION} live on dist-tag '${TAG}'"
- name: Summary
env:
ACTION: ${{ inputs.action }}
VERSION: ${{ steps.ver.outputs.version }}
TAG: ${{ steps.ver.outputs.tag }}
BASE_TAG: ${{ needs.prepare.outputs.base_tag }}
BRANCH: ${{ needs.prepare.outputs.ref }}
DRY_RUN: ${{ inputs.dry_run }}
run: |
{
if [ "$ACTION" = "hotfix" ]; then
echo "## Release SDK Bundle (hotfix): v${VERSION} → @${TAG}"
echo ""
echo "- Base (cumulative-fix anchor): \`${BASE_TAG}\`"
echo "- Branch: \`${BRANCH}\`"
else
echo "## Release SDK Bundle: v${VERSION} → @${TAG}"
fi
echo ""
if [ "$DRY_RUN" = "true" ]; then
echo "**DRY RUN** — npm publish, git tag, push, and GitHub Release were skipped."
else
echo "- Published \`get-shit-done-cc@${VERSION}\` to dist-tag \`${TAG}\`"
echo "- SDK bundled inside the CC tarball at:"
echo " - \`sdk/dist/cli.js\` (loose tree, consumed by \`bin/gsd-sdk.js\` shim)"
echo " - \`sdk-bundle/gsd-sdk.tgz\` (npm-installable artifact)"
echo "- Git tag \`v${VERSION}\` pushed"
echo "- GitHub Release \`v${VERSION}\` created"
if [ "$TAG" = "latest" ]; then
echo "- \`next\` dist-tag re-pointed at \`v${VERSION}\` (kept current with \`latest\`)"
fi
if [ "$ACTION" = "hotfix" ]; then
# Auto-cherry-pick hotfixes only pick commits already on
# main, so there's nothing to merge back. The merge-back
# PR step was removed in #2983; this line surfaces the
# explicit non-action so operators don't expect a PR
# that was never opened.
echo "- No merge-back PR (auto-picked commits are already on main)"
fi
echo "- Install: \`npm install -g get-shit-done-cc@${TAG}\`"
fi
} >> "$GITHUB_STEP_SUMMARY"


@@ -189,8 +189,11 @@ jobs:
git add package.json package-lock.json sdk/package.json
git commit -m "chore: bump to ${PRE_VERSION}"
- name: Build SDK
run: cd sdk && npm ci && npm run build
- name: Build SDK dist for tarball
run: npm run build:sdk
- name: Verify tarball ships sdk/dist/cli.js (bug #2647)
run: bash scripts/verify-tarball-sdk-dist.sh
- name: Dry-run publish validation
run: |
@@ -330,8 +333,11 @@ jobs:
npm ci
npm run test:coverage
- name: Build SDK
run: cd sdk && npm ci && npm run build
- name: Build SDK dist for tarball
run: npm run build:sdk
- name: Verify tarball ships sdk/dist/cli.js (bug #2647)
run: bash scripts/verify-tarball-sdk-dist.sh
- name: Dry-run publish validation
run: |
@@ -342,23 +348,32 @@ jobs:
- name: Create PR to merge release back to main
if: ${{ !inputs.dry_run }}
continue-on-error: true
env:
GH_TOKEN: ${{ github.token }}
BRANCH: ${{ needs.validate-version.outputs.branch }}
VERSION: ${{ inputs.version }}
run: |
EXISTING_PR=$(gh pr list --base main --head "$BRANCH" --state open --json number --jq '.[0].number')
# Non-fatal: repos that disable "Allow GitHub Actions to create and
# approve pull requests" cause this step to fail with GraphQL 403.
# The release itself (tag + npm publish + GitHub Release) must still
# proceed. Open the merge-back PR manually afterwards with:
# gh pr create --base main --head release/${VERSION} \
# --title "chore: merge release v${VERSION} to main"
EXISTING_PR=$(gh pr list --base main --head "$BRANCH" --state open --json number --jq '.[0].number' 2>/dev/null || echo "")
if [ -n "$EXISTING_PR" ]; then
echo "PR #$EXISTING_PR already exists; updating"
gh pr edit "$EXISTING_PR" \
--title "chore: merge release v${VERSION} to main" \
--body "Merge release branch back to main after v${VERSION} stable release."
--body "Merge release branch back to main after v${VERSION} stable release." \
|| echo "::warning::Could not update merge-back PR (likely PR-creation policy disabled). Open it manually after release."
else
gh pr create \
--base main \
--head "$BRANCH" \
--title "chore: merge release v${VERSION} to main" \
--body "Merge release branch back to main after v${VERSION} stable release."
--body "Merge release branch back to main after v${VERSION} stable release." \
|| echo "::warning::Could not create merge-back PR (likely PR-creation policy disabled). Open it manually after release."
fi
- name: Tag and push


@@ -24,19 +24,20 @@ jobs:
echo "found=false" >> "$GITHUB_OUTPUT"
fi
- name: Comment and fail if no issue link
- name: Comment, close, and fail if no issue link
if: steps.check.outputs.found == 'false'
uses: actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3 # v9.0.0
with:
# Uses GitHub API SDK — no shell string interpolation of untrusted input
script: |
const repoUrl = `https://github.com/${context.repo.owner}/${context.repo.repo}`;
const prNumber = context.payload.pull_request.number;
await github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: context.payload.pull_request.number,
issue_number: prNumber,
body: [
'## Missing issue link',
'## Missing issue link — PR auto-closed',
'',
'This PR does not reference an issue. **All PRs must link to an open issue** using a closing keyword in the PR body:',
'',
@@ -46,7 +47,13 @@ jobs:
'',
`If no issue exists for this change, [open one first](${repoUrl}/issues/new/choose), then update this PR body with the reference.`,
'',
'This PR will remain blocked until a valid `Closes #NNN`, `Fixes #NNN`, or `Resolves #NNN` line is present in the description.',
'To resume work after fixing the body: edit the PR description to add a valid `Closes #NNN`, `Fixes #NNN`, or `Resolves #NNN` line, then click **Reopen pull request**. The workflow will re-evaluate on reopen.',
].join('\n')
});
core.setFailed('PR body must contain a closing issue reference (e.g. "Closes #123")');
await github.rest.pulls.update({
owner: context.repo.owner,
repo: context.repo.repo,
pull_number: prNumber,
state: 'closed',
});
core.setFailed('PR body must contain a closing issue reference (e.g. "Closes #123") — PR closed.');


@@ -16,6 +16,21 @@ concurrency:
cancel-in-progress: true
jobs:
# Static lint: no source-grep tests in the test suite.
# Runs once (not per matrix node version) since it is a file-content check.
lint-tests:
runs-on: ubuntu-latest
timeout-minutes: 2
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Set up Node.js
uses: actions/setup-node@53b83947a5a98c8d113130e565377fae1a50d02f # v6.3.0
with:
node-version: 24
- name: Lint — no source-grep tests
shell: bash
run: node scripts/lint-no-source-grep.cjs
test:
runs-on: ${{ matrix.os }}
timeout-minutes: 10
@@ -35,6 +50,31 @@ jobs:
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
# Fetch full history so we can merge origin/main for stale-base detection.
fetch-depth: 0
# GitHub's `refs/pull/N/merge` is cached against the recorded merge-base.
# When main advances after a PR is opened, the cache stays stale and CI
# runs against the pre-advance state — hiding bugs that are already fixed
# on trunk and surfacing type errors that were introduced and then patched
# on main in between. Explicitly merge current origin/main here so tests
# always run against the latest trunk.
- name: Rebase check — merge origin/main into PR head
if: github.event_name == 'pull_request'
shell: bash
run: |
set -euo pipefail
git config user.email "ci@gsd-build"
git config user.name "CI Rebase Check"
git fetch origin main
if ! git merge --no-edit --no-ff origin/main; then
echo "::error::This PR cannot cleanly merge origin/main. Rebase your branch onto current main and push again."
echo "::error::Conflicting files:"
git diff --name-only --diff-filter=U
git merge --abort
exit 1
fi
- name: Set up Node.js ${{ matrix.node-version }}
uses: actions/setup-node@53b83947a5a98c8d113130e565377fae1a50d02f # v6.3.0
@@ -45,6 +85,21 @@ jobs:
- name: Install dependencies
run: npm ci
- name: Build SDK dist (required by installer)
run: npm run build:sdk
# Seam contract gate: keep manifest -> generated aliases -> registry/CJS adapters aligned.
# Run once per workflow on the primary Linux node to avoid redundant matrix cost.
- name: SDK seam coverage tests
if: matrix.os == 'ubuntu-latest' && matrix.node-version == 24
shell: bash
run: cd sdk && npx vitest run src/query/command-seam-coverage.test.ts
- name: SDK generated alias artifact drift check
if: matrix.os == 'ubuntu-latest' && matrix.node-version == 24
shell: bash
run: node sdk/scripts/check-command-aliases-fresh.mjs
- name: Run tests with coverage
shell: bash
run: npm run test:coverage


@@ -0,0 +1,104 @@
# Render agent definitions from templates at install/config-change time
**Source:** [#2758](https://github.com/gsd-build/get-shit-done/issues/2758)
**Decision:** wontfix — closed on the technical merits
**Date:** 2026-05-02
## Proposal summary
Move config-gated prose out of `agents/*.md` into `agents/templates/*.md.tmpl`,
rendered at install time and after `.planning/config.json` writes via a new
`gsd-sdk agents render` subcommand. Conditional branches resolve at render time
(deterministic code) instead of at inference time (LLM interpretation).
Three named benefits:
1. Token reduction proportional to disabled features.
2. Deterministic feature gating (impossible-by-construction vs. test-for).
3. Single source of truth for contributor-facing gating.
Cites PR #2279 (Codex/OpenCode model embedding at install time) as direct
precedent for compile-time embedding.
## Why GSD does not own this
### 1. The determinism claim is theoretical, not observed
The proposal's strongest argument is that config-gated branches in agent prose
are a determinism failure surface. The actual patterns in the codebase today are
already heavily mitigated:
- The `use_worktrees` branch in `gsd-executor` is resolved deterministically via
`gsd-sdk query config-get` in bash — it is not LLM-interpreted.
- "Skip if `workflow.X` is `false`" prose patterns are short, stable, and
follow a uniform "missing key = enabled" convention. There is no documented
history of LLMs running disabled checks or skipping enabled ones because of
this prose.
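
The "missing key = enabled" convention described above can be sketched in a few lines. This is only an illustration under stated assumptions: `isWorkflowEnabled` and the config shape are hypothetical names, not the project's actual implementation.

```javascript
// Hypothetical helper illustrating the "missing key = enabled" convention.
// Only an explicit `false` disables a workflow step; an absent key, or an
// absent `workflow` block, counts as enabled.
function isWorkflowEnabled(config, key) {
  const workflow = (config && config.workflow) || {};
  return workflow[key] !== false;
}

const config = { workflow: { verifier: false } };
console.log(isWorkflowEnabled(config, 'verifier'));    // false (explicitly disabled)
console.log(isWorkflowEnabled(config, 'plan_bounce')); // true  (key missing)
console.log(isWorkflowEnabled({}, 'verifier'));        // true  (no workflow block)
```

Because the rule is a single comparison, it can be resolved in code before the agent prose is ever read, which is the same direction as the `gsd-sdk query config-get` approach mentioned for `use_worktrees`.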
A theoretical failure surface should not be traded for a real, high-risk
patch-migration surface (`gsd-local-patches/` rebase logic, by the reporter's own
admission "the highest-risk piece of the change"). The reporter was asked for
documented evidence; none was provided.
### 2. Token waste is small and bounded
The codebase has roughly 5 `workflow.*` toggle references in agent files and
~20 "Skip if" conditional-prose patterns total — most 1–2 sentences. The
"real spend across multi-phase milestones" claim was not measured against
`gsd-context-monitor` output despite being asked. Without a measured baseline,
the token-savings argument is asserted rather than demonstrated, and the savings
ceiling on ~20 short conditionals is small enough that it does not justify a new
template-and-rendering subsystem with a CI-enforced template/generated split.
### 3. The deterministic-gating need is already served
PR #2279 established orchestrator-time config embedding for the cases that
genuinely need deterministic resolution (model selection, reasoning effort,
worktree mode). That mechanism is the right layer for orchestration-time
decisions and can be extended toggle-by-toggle along the existing path without
introducing a parallel templating subsystem. The proposal's own "Alternative #1"
(continue the orchestrator-embedding pattern) was rejected on the grounds that
agent-internal conditionals belong in the agent layer, but the asks behind the
proposal — determinism, lower token cost — are equally satisfied by extending
PR #2279 incrementally without a second mechanism.
Adding a templating layer alongside orchestrator-embedding means two mechanisms
own the same problem. The proposal does not specify a partition rule, and the
reporter did not respond when asked for one.
### 4. Patch-migration risk is disproportionate to benefit
The `/gsd-reapply-patches` three-way-merge migration for `gsd-local-patches/`
is, in the proposal's own words, the highest-risk piece of the change. It exists
solely to absorb a contributor-workflow shift — the user-facing surface is
unchanged. Risk that flows entirely from internal restructuring, where the
benefit is unmeasured token savings and a theoretical determinism gain, is the
wrong trade.
The reduced-scope variant (Alternative #5: fresh installs only, defer the
migration) avoids that specific risk but still ships a parallel mechanism for
benefits that remain unmeasured and that PR #2279's path can absorb.
## Re-open criteria
This may be revisited if a contributor:
- Provides measured token deltas via `gsd-context-monitor` against a
representative all-toggles-off config, and the delta is materially larger
than what extending PR #2279's orchestrator-embedding path one toggle at a
time would produce.
- Documents a real LLM misinterpretation of an existing toggle conditional
(executor ignored `workflow.use_worktrees: false`, verifier ran when
`workflow.verifier: false`, etc.) — not a projected failure mode.
- Proposes a clear partition rule between orchestrator-time embedding (PR #2279)
and any new install-time templating layer, so the two mechanisms do not
overlap.
## Related
- PR #2279 — Codex/OpenCode model embedding at install time (the established
precedent for deterministic compile-time embedding into agent files)
- v1.37.0 release notes — shared-boilerplate extraction (reference files for
mandatory-initial-read, project-skills-discovery)
- `get-shit-done/workflows/` — workflow-level config embedding before subagent
spawn (the path of least friction for incremental deterministic gating)


@@ -0,0 +1,56 @@
# Temporal context as a first-class GSD signal
**Source:** [#2756](https://github.com/gsd-build/get-shit-done/issues/2756)
**Decision:** wontfix — closed without further engagement
**Date:** 2026-05-02
## Proposal summary
Reporter proposed treating idle-time-between-turns as a first-class context signal in
GSD. Three flavors floated across the issue:
1. **Passive** — block at session resume injecting "you've been idle Nh, here's what was
open" into the orchestrator prompt.
2. **Active** — `/resume-context` slash command.
3. **Retrospective** — `HANDOFF.json` written at session end, read at next start.
Framed initially as a `claude-inject-idle-time` plugin, with a request that GSD treat
the pattern as core.
## Why GSD does not own this
- **Subagent gap unsolved.** Passive injection lands in the orchestrator's context
only. Subagents (the workers that actually do GSD's planning, execution, verification)
spawn fresh and never see the temporal signal. The proposal does not solve this, and
any GSD-core integration would inherit the gap. Until the subagent boundary is
addressed, "first-class temporal context" is at best a partial feature.
- **`HANDOFF.json` duplicates existing artifacts.** GSD already persists session
continuity through `.planning/state/*` and per-phase artifacts (PLAN.md, RESEARCH.md,
REVIEW.md, VERIFICATION.md). A separate handoff file would either drift from those or
redundantly mirror them. The right primitive for "what was I doing" already exists.
- **Statusline / TUI re-entry is platform-level, not GSD-level.** A statusline showing
idle time belongs in Claude Code itself or in a thin user plugin, not in GSD's phase
machinery.
- **Scope is unstable.** Reporter agreed with the narrowed minimum ask ("doc mention
only, rest opt-in"), then partially retracted it in a follow-up comment ("very
integral to myself"). The maintainer asked which version of the ask should move
forward; reporter did not respond.
## Re-open criteria
This may be revisited if a reporter:
- Engages with the subagent-gap problem and proposes a concrete mechanism for
temporal context to reach subagents (not just the orchestrator).
- Demonstrates a use case `.planning/state/*` provably cannot serve.
- Commits to a single stable scope (doc mention OR core integration OR plugin
reference) rather than oscillating between them mid-thread.
A drive-by enhancement request that the author does not return to engage with after
maintainer questions is not actionable. Future proposers: please plan to participate
through to a triage decision rather than dropping an issue and moving on.
## Related
- `.planning/state/` — existing session-continuity artifacts
- `get-shit-done/references/` — where any future plugin-interface doc would live


CONTEXT.md

@@ -0,0 +1,41 @@
# Context
## Domain terms
### Dispatch Policy Module
Module owning dispatch error mapping, fallback policy, timeout classification, and CLI exit mapping contract.
Canonical error kind set:
- `unknown_command`
- `native_failure`
- `native_timeout`
- `fallback_failure`
- `validation_error`
- `internal_error`
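
A minimal sketch of how the canonical error kind set could be pinned down in code. The six string values come from the list above; the constant name `DISPATCH_ERROR_KIND` and the `normalizeErrorKind` helper are assumptions made for illustration and may not match the SDK's real exports.

```javascript
// The six canonical kinds from CONTEXT.md, frozen so callers cannot extend
// or mutate the set at runtime. The constant name is hypothetical.
const DISPATCH_ERROR_KIND = Object.freeze({
  UNKNOWN_COMMAND: 'unknown_command',
  NATIVE_FAILURE: 'native_failure',
  NATIVE_TIMEOUT: 'native_timeout',
  FALLBACK_FAILURE: 'fallback_failure',
  VALIDATION_ERROR: 'validation_error',
  INTERNAL_ERROR: 'internal_error',
});

const KNOWN_KINDS = new Set(Object.values(DISPATCH_ERROR_KIND));

// Hypothetical normalizer: anything outside the canonical set collapses to
// `internal_error`, so downstream code only ever sees a known kind.
function normalizeErrorKind(kind) {
  return KNOWN_KINDS.has(kind) ? kind : DISPATCH_ERROR_KIND.INTERNAL_ERROR;
}

console.log(normalizeErrorKind('native_timeout')); // native_timeout
console.log(normalizeErrorKind('something_else')); // internal_error
```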
### Command Definition Module
Canonical command metadata Interface powering alias, catalog, and semantics generation.
### Query Runtime Context Module
Module owning query-time context resolution for `projectDir` and `ws`, including precedence and validation policy used by query adapters.
### Native Dispatch Adapter Module
Adapter Module that satisfies native query dispatch at the Dispatch Policy seam, so policy modules consume a focused dispatch Interface instead of closure-wired call sites.
### Query CLI Output Module
Module owning projection from dispatch results/errors to CLI `{ exitCode, stdoutChunks, stderrLines }` output contract.
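
The output contract above can be illustrated with a small projection function. Only the `{ exitCode, stdoutChunks, stderrLines }` shape is taken from the text; the input shape (`ok`, `data`, `error.kind`, `error.message`), the function name, and the exit code values are assumptions for this sketch.

```javascript
// Hypothetical projection from a dispatch result to the CLI output contract.
// Assumed input: { ok: true, data } on success,
//                { ok: false, error: { kind, message } } on failure.
function toCliOutput(result) {
  if (result.ok) {
    return {
      exitCode: 0,
      stdoutChunks: [JSON.stringify(result.data)],
      stderrLines: [],
    };
  }
  const { kind, message } = result.error;
  return {
    exitCode: 1, // assumed: a real mapping might vary the code per error kind
    stdoutChunks: [],
    stderrLines: [`${kind}: ${message}`],
  };
}

console.log(toCliOutput({ ok: true, data: { phase: 3 } }));
console.log(toCliOutput({ ok: false, error: { kind: 'unknown_command', message: 'no such command' } }));
```

Keeping the projection in one function means callers can assert on the returned object rather than on whatever text eventually reaches the terminal.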
### Query Execution Policy Module
Module owning query transport routing policy projection (`preferNative`, fallback policy, workstream subprocess forcing) at execution seam.
### Query Subprocess Adapter Module
Adapter Module owning subprocess execution contract for query commands (JSON/raw invocation, `@file:` indirection parsing, timeout/exit error projection).
### Query Command Resolution Module
Canonical command normalization and resolution Interface (`query-command-resolution-strategy`) used by internal query/transport paths after dead-wrapper convergence.
### Command Topology Module
Module owning command resolution, policy projection (`mutation`, `output_mode`), unknown-command diagnosis, and handler Adapter binding at one seam for query dispatch.
### Query Pre-Project Config Policy Module
Module policy that defines query-time behavior when `.planning/config.json` is absent: use built-in defaults for parity-sensitive query Interfaces, and emit parity-aligned empty model ids for pre-project model resolution surfaces.
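
As a rough sketch of the pre-project policy, assuming a hypothetical `resolveQueryConfig` function, a placeholder defaults object, and a hypothetical `model` key: when `.planning/config.json` is absent the caller gets built-in defaults and an empty model id instead of an error.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Placeholder defaults; the real built-in defaults are not shown in this document.
const BUILT_IN_DEFAULTS = Object.freeze({ workflow: {} });

// Hypothetical resolver: falls back to defaults when no project config exists.
function resolveQueryConfig(projectDir) {
  const configPath = path.join(projectDir, '.planning', 'config.json');
  if (!fs.existsSync(configPath)) {
    // Pre-project: built-in defaults and an empty model id.
    return { source: 'defaults', config: BUILT_IN_DEFAULTS, modelId: '' };
  }
  const config = JSON.parse(fs.readFileSync(configPath, 'utf-8'));
  // `model` is an assumed key name used only for this illustration.
  return { source: 'file', config, modelId: config.model ?? '' };
}

// A fresh temp directory has no .planning/config.json, so defaults apply.
const emptyProject = fs.mkdtempSync(path.join(os.tmpdir(), 'gsd-preproject-'));
console.log(resolveQueryConfig(emptyProject).source); // defaults
```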


@@ -81,6 +81,20 @@ PRs that arrive without a properly-labeled linked issue are closed automatically
## Pull Request Guidelines
### Architecture & Domain Standards (Maintainer-Defined)
The following files are maintainer-owned coding standards and must be treated as canonical when contributing:
- `CONTEXT.md` — domain language and module naming standards
- `docs/adr/` — Architecture Decision Records (ADRs) for accepted architectural decisions
Contributor requirements:
- Read `CONTEXT.md` before naming or refactoring modules/interfaces/seams.
- Use `CONTEXT.md` vocabulary consistently in code comments, tests, issue/PR text, and docs for the touched area.
- Check relevant ADRs in `docs/adr/` before proposing or implementing architectural changes.
- If a change intentionally revisits an ADR decision, call it out explicitly in the linked issue and PR rationale.
- Do not rewrite maintainer intent in `CONTEXT.md`/ADRs as part of drive-by cleanup; propose focused updates tied to approved scope.
**Every PR must link to an approved issue.** PRs without a linked issue are closed without review, no exceptions.
- **No draft PRs** — draft PRs are automatically closed. Only open a PR when it is complete, tested, and ready for review. If your work is not finished, keep it on your local branch until it is.
@@ -91,6 +105,23 @@ PRs that arrive without a properly-labeled linked issue are closed automatically
- **CI must pass** — all matrix jobs (Ubuntu × Node 22, 24; macOS × Node 24) must be green
- **Scope matches the approved issue** — if your PR does more than the issue describes, you will be asked to remove the extra changes or move them to a new issue
## CHANGELOG Entries — Drop a Fragment
**Do not edit `CHANGELOG.md` directly.** Two PRs that both append to a `### Fixed` block always conflict on merge — git can't pick a serialization order without a human. Instead, every PR with user-facing changes drops a fragment file in `.changeset/`.
```bash
npm run changeset -- --type Fixed --pr <YOUR_PR_NUMBER> \
--body "**\`/gsd-foo\` no longer drops trailing slashes** — explain the user-visible change."
```
This writes `.changeset/<adjective>-<noun>-<noun>.md`. Three random words → concurrent PRs never collide. Allowed `type:` values follow [Keep a Changelog](https://keepachangelog.com/): `Added`, `Changed`, `Deprecated`, `Removed`, `Fixed`, `Security`.
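
To show why random three-word names avoid merge conflicts, here is a minimal sketch of such a name generator. The word lists and the `changesetFileName` function are hypothetical; the real `npm run changeset` script may pick words differently.

```javascript
const crypto = require('node:crypto');

// Tiny illustrative word lists; a real generator would use much larger ones.
const ADJECTIVES = ['brave', 'calm', 'eager', 'quiet'];
const NOUNS = ['otter', 'maple', 'comet', 'harbor'];

// crypto.randomInt(max) returns an integer in [0, max).
function pick(words) {
  return words[crypto.randomInt(words.length)];
}

// Produces names of the form <adjective>-<noun>-<noun>.md
function changesetFileName() {
  return `${pick(ADJECTIVES)}-${pick(NOUNS)}-${pick(NOUNS)}.md`;
}

console.log(changesetFileName());
```

Each PR adds a new file rather than editing a shared one, so two PRs conflict only if they happen to draw the same three words. How unlikely that is depends on the size of the word lists.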
Fragments are consolidated into `CHANGELOG.md` at release time by the release workflow. See [`.changeset/README.md`](.changeset/README.md) for the format spec and [#2975](https://github.com/gsd-build/get-shit-done/issues/2975) for the rationale.
**CI enforcement:** the `Changeset Required` workflow (`scripts/changeset/lint.cjs`) fails any PR that touches `bin/`, `get-shit-done/`, `agents/`, `commands/`, `hooks/`, or `sdk/src/` without a `.changeset/*.md` fragment.
**Opt-out:** PRs with no user-facing impact (test refactors, lint config changes, CI tweaks, formatting-only changes) can add the `no-changelog` label. The lint honors it. When unsure whether a change is user-facing, **add the fragment**.
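
The enforcement and opt-out rules above amount to a small predicate. The sketch below is an approximation under stated assumptions, not the contents of `scripts/changeset/lint.cjs`: the function name and its inputs (a list of changed paths and a list of PR labels) are hypothetical.

```javascript
// Directories named in the CI enforcement rule above.
const WATCHED_PREFIXES = ['bin/', 'get-shit-done/', 'agents/', 'commands/', 'hooks/', 'sdk/src/'];

// Hypothetical check: returns true when the PR should fail the lint.
function isMissingChangeset(changedFiles, labels = []) {
  // The `no-changelog` label opts the PR out entirely.
  if (labels.includes('no-changelog')) return false;

  const touchesWatched = changedFiles.some((file) =>
    WATCHED_PREFIXES.some((prefix) => file.startsWith(prefix))
  );
  // A fragment is any .md directly under .changeset/, excluding the README
  // (treating the README as a non-fragment is an assumption).
  const hasFragment = changedFiles.some(
    (file) => /^\.changeset\/[^/]+\.md$/.test(file) && file !== '.changeset/README.md'
  );
  return touchesWatched && !hasFragment;
}

console.log(isMissingChangeset(['bin/install.js']));                                   // true
console.log(isMissingChangeset(['bin/install.js', '.changeset/brave-otter-comet.md'])); // false
console.log(isMissingChangeset(['bin/install.js'], ['no-changelog']));                  // false
```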
## Testing Standards
All tests use Node.js built-in test runner (`node:test`) and assertion library (`node:assert`). **Do not use Jest, Mocha, Chai, or any external test framework.**
@@ -229,6 +260,136 @@ const content = `
`;
```
### Prohibited: Source-Grep Tests
**Never read source-code `.cjs` files with `readFileSync` to assert that strings exist within them.** This is source-grep theater: it proves a literal is present in a file, not that the feature works at runtime.
```javascript
// BAD — source-grep theater
const configSrc = fs.readFileSync(
path.join(GSD_ROOT, 'bin', 'lib', 'config-schema.cjs'), 'utf-8'
);
assert.ok(
configSrc.includes("'workflow.plan_bounce'"),
'VALID_CONFIG_KEYS should contain workflow.plan_bounce'
);
```
This test passes even if `workflow.plan_bounce` is present but misspelled in the schema, removed from the validation path, or moved to a different file under a different name. It survives every behavioral regression and fails only on trivial renames.
The correct pattern for config key tests — use the CLI:
```javascript
// GOOD — behavioral test via the CLI
test('config-set accepts workflow.plan_bounce', (t) => {
const tmpDir = createTempProject();
t.after(() => cleanup(tmpDir));
const result = runGsdTools('config-set workflow.plan_bounce true', tmpDir);
assert.ok(result.success, `config-set should accept workflow.plan_bounce: ${result.error}`);
const configPath = path.join(tmpDir, '.planning', 'config.json');
const config = JSON.parse(fs.readFileSync(configPath, 'utf-8'));
assert.strictEqual(config.workflow?.plan_bounce, true, 'value must be persisted');
});
```
This single test covers key registration in `VALID_CONFIG_KEYS`, the key's namespace resolution in `KNOWN_TOP_LEVEL`, and value persistence — all behaviors that the source-grep test could not touch.
**Why this pattern broke at scale:** Commit `990c3e64` in this repo updated 5 source-grep tests in one pass when `VALID_CONFIG_KEYS` moved between files. Zero of those tests were testing behavior. If they had been behavioral tests, the migration would have been invisible.
**CI enforcement:** A linter (`scripts/lint-no-source-grep.cjs`, run as `npm run lint:tests`) detects violations. Any test file that calls `readFileSync` on a `.cjs` path in a source directory without the exemption annotation below will fail the `lint-tests` CI job.
### Exception: `allow-test-rule: <reason>`
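Some tests legitimately read source files. There are six recognized exemption categories, plus a seventh tracking-only annotation (`pending-migration-to-typed-ir`) that marks a test for correction rather than exempting it: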
| Reason | When to use |
|--------|-------------|
| `source-text-is-the-product` | Agent `.md`, workflow `.md`, command `.md` files — their text IS what the runtime loads. Testing text content tests the deployed contract. |
| `architectural-invariant` | Implementation must use a specific primitive (e.g., `Atomics.wait`, atomic file writes) that cannot be tested by observing outputs. |
| `structural-regression-guard` | A specific code pattern must (or must not) exist to prevent a class of bug (e.g., regex global-state misuse). Behavioral tests cannot distinguish which pattern was used. |
| `docs-parity` | A reference doc must stay in sync with source-defined constants (e.g., `CONFIG_DEFAULTS`). The source is the canonical list; there is no runtime API to enumerate it. |
| `integration-test-input` | A source file is used as a real fixture input to a transformation function under test — the file is not inspected for strings but passed as data. |
| `structural-implementation-guard` | A feature's interception or wiring point is not reachable end-to-end via `runGsdTools`. Used temporarily until a behavioral path exists. |
| `pending-migration-to-typed-ir` | **Tracked for correction, not exempted.** Test was identified by the lint as carrying a raw-text-matching pattern that contradicts the rule above. Each annotated file MUST cite the open migration issue (e.g. `// allow-test-rule: pending-migration-to-typed-ir [#NNNN]`) so the tracking is auditable. New tests cannot use this category — they must refactor production to expose typed IR. The annotation is removed when the test is corrected. |
Annotate with a standalone `//` comment before the file's opening block comment:
```javascript
// allow-test-rule: architectural-invariant
// state.cjs locking must use Atomics.wait(), not a spin-loop. Behavioral tests
// cannot observe which sleep primitive was chosen — only source inspection can.
/**
* Regression tests for locking bugs #1909...
*/
```
The annotation **must** be a standalone `// allow-test-rule:` line, not inside a `/** */` block comment — the CI linter scans for the pattern `// allow-test-rule:`.
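
A simplified sketch of the detection and exemption logic described above. It is not the code in `scripts/lint-no-source-grep.cjs`; the function name and the exact patterns are assumptions chosen to show why the annotation has to be a standalone `//` line.

```javascript
// Hypothetical, simplified lint check for a single test file's source text.
function lintTestSource(source) {
  // Flag files that call readFileSync and mention a quoted `.cjs` path.
  const readsCjsSource =
    source.includes('readFileSync') && /\.cjs['"`]/.test(source);

  // The exemption only counts when a line *starts* with the annotation,
  // so the same text inside a /** ... */ block (prefixed by " * ") is ignored.
  const exempt = source
    .split(/\r?\n/)
    .some((line) => /^\/\/ allow-test-rule: \S+/.test(line));

  return { violation: readsCjsSource && !exempt };
}

const offending =
  "const src = fs.readFileSync(path.join(ROOT, 'bin', 'lib', 'config-schema.cjs'), 'utf-8');";
console.log(lintTestSource(offending).violation);                                                    // true
console.log(lintTestSource('// allow-test-rule: architectural-invariant\n' + offending).violation); // false
```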
### Prohibited: Raw Text Matching on Test Outputs (file content, stdout, stderr)
**Source-grep is not just `readFileSync` of a `.cjs` file.** The same anti-pattern shows up wherever a test pattern-matches against text that a system-under-test produced, regardless of whether that text came from a source file, a rendered shim, a child process's stdout, or a free-form `reason` string. **All forms are forbidden.**
The following are all violations of the same rule:
```javascript
// BAD — substring match on text written by the code under test
const cmdContent = fs.readFileSync(path.join(tmpDir, 'gsd-sdk.cmd'), 'utf8');
assert.ok(cmdContent.includes(`@node ${jsonQuoted} %*`), '.cmd embeds shim path');
// BAD — regex match on a child process's human-readable stdout formatter
const r = cp.spawnSync(SCRIPT, ['--patches-dir', dir]);
assert.match(r.stdout, /Failures: 1/);
assert.match(r.stdout, /not a regular file/);
// BAD — "structured parser" that hides string ops behind a function wrapper
function parseCmdShim(content) {
const lines = content.split('\r\n').filter((l) => l.length > 0);
return { header: lines[0], usesCRLF: content.includes('\r\n') };
}
// BAD — assert.match on a free-form `reason` string from a JSON report
assert.ok(/not a regular file/.test(report.results[0].reason));
```
Each of these passes on accidental near-matches (a comment containing `@node` somewhere, a stack trace that happens to say `Failures: 1`, a mis-typed reason that still contains the substring you're matching) and fails on harmless reformatting (changing `Failures: 1` to `1 failure`, swapping CRLF rendering style, rewording the error prose).
#### The rule
> **Tests assert on typed structured values. If the code under test produces text, the code under test must also expose a structured intermediate representation, and the test must assert on that IR — never on the rendered text.**
Concretely: for any system-under-test that produces text output (a file renderer, a CLI formatter, an error-message builder), the production code MUST expose a typed alternative that the test consumes:
| Output kind | Required structured surface | What the test asserts on |
|---|---|---|
| Rendered file (shim, template, generated code) | A pure builder function returning the IR (`{ invocation, eol, fileNames, render }`) | `triple.invocation.target === expected`, `triple.eol.cmd === '\r\n'` |
| CLI human-formatter output | A `--json` mode that emits the same data structurally | `report.results[0].reason === REASON.FAIL_INSTALLED_NOT_REGULAR_FILE` |
| Error / status / reason | A frozen enum (`Object.freeze({ FAIL_X: 'fail_x', ... })`) | `assert.equal(result.reason, REASON.FAIL_X)` |
| File presence after a write | `fs.statSync().isFile()`, `.size > 0`, `.mtimeMs` advances | Filesystem facts; never read the file content back |
#### Concrete examples from this repo
`buildWindowsShimTriple(shimSrc)` in `bin/install.js` is the canonical IR pattern: pure function, no I/O, returns `{ invocation, eol, fileNames, render }`. `trySelfLinkGsdSdkWindows` calls it and writes `triple.render[kind]()` to disk. Tests assert on `triple.invocation.target`, `triple.eol.cmd`, `Object.keys(triple).sort()` — never on the rendered text. Filesystem-level tests assert `fs.statSync(target).size === Buffer.byteLength(triple.render.cmd())` to prove the writer writes what the renderer produces, **without comparing content**.
`scripts/verify-reapply-patches.cjs` exposes a frozen `REASON` enum and emits it through `--json`. Tests assert `report.results[0].reason === REASON.FAIL_USER_LINES_MISSING`. The human formatter exists for operator console output only — tests must not depend on its prose. Adding a new reason code requires updating the `REASON` enum, the `--json` output, AND the test that locks `Object.keys(REASON).sort()` — three coordinated changes that prevent the code surface from drifting from the test surface.
#### Hiding grep behind a function is still grep
`parseCmdShim`, `parsePs1Invocation`, etc. that internally do `content.split(...)`, `lines[1].trim()`, `content.includes(...)` are still string manipulation. The fact that the entry point looks like a parser doesn't change what's happening underneath — the test is still asserting on the lexical shape of rendered text. The fix is not "wrap the grep in a function with a typed-looking return value." The fix is to **eliminate the rendered text from the test path entirely** by surfacing the IR.
#### When you cannot eliminate text matching
There are exactly two cases where text content is the legitimate object of a test, both already covered by the existing exemption matrix:
1. `source-text-is-the-product` — workflow `.md` / agent `.md` / command `.md` files where the deployed text IS what the runtime loads.
2. `docs-parity` — a reference doc must mirror source-defined constants and there is no runtime enumeration API.
For everything else, if a test reaches for `.includes()` / `.startsWith()` / `assert.match(text, /…/)`, the production code is missing a typed surface. **Add the typed surface; do not work around it.**
**CI enforcement:** `scripts/lint-no-source-grep.cjs` is being extended (see issue tracker for the latest scope) to flag `String#includes`/`String#startsWith`/`String#endsWith`/`assert.match` on `readFileSync` results and on `cp.spawnSync` stdout/stderr in test files, with the same `// allow-test-rule:` exemption mechanism.
### Node.js Version Compatibility
**Node 22 is the minimum supported version.** Node 24 is the primary CI target. All tests must pass on both.
@@ -278,8 +439,93 @@ node --test tests/core.test.cjs
npm run test:coverage
```
### Pre-PR Seam Checks (Manifest/Alias Routing)
If you touched any of the command-manifest or generated alias files, run:
```bash
npm run check:alias-drift
```
This verifies generated alias artifacts are in sync with manifest source-of-truth.
Optional local pre-commit hook entry (Git-native):
```bash
# one-time setup
mkdir -p .githooks
cat > .githooks/pre-commit <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
# run the drift check only when a manifest or generated alias file is staged
if git diff --cached --name-only | grep -Eq "^sdk/src/query/command-manifest\.|^sdk/src/query/command-aliases\.generated\.ts$|^get-shit-done/bin/lib/command-aliases\.generated\.cjs$|^sdk/scripts/gen-command-aliases\.ts$"; then
  npm run check:alias-drift
fi
EOF
chmod +x .githooks/pre-commit
git config core.hooksPath .githooks
```
Optional local pre-push hook to block a private author-email pattern:
```bash
# set locally in your shell profile (example)
# single quotes keep the backslash literal, so `\.` matches a literal dot
export GSD_BLOCKED_AUTHOR_REGEX='@example-corp\.com$'
# assumes .githooks/ exists and core.hooksPath is set (see the pre-commit example)
cat > .githooks/pre-push <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
zero_sha='0000000000000000000000000000000000000000'
blocked_regex="${GSD_BLOCKED_AUTHOR_REGEX:-}"
# no pattern configured: nothing to enforce
[[ -z "$blocked_regex" ]] && exit 0
violations=()
# git passes one line per pushed ref on stdin
while read -r local_ref local_sha remote_ref remote_sha; do
  # ref deletion: nothing to inspect
  [[ "$local_sha" == "$zero_sha" ]] && continue
  if [[ "$remote_sha" == "$zero_sha" ]]; then
    # new remote ref: inspect commits not yet on any remote
    commits=$(git rev-list "$local_sha" --not --remotes)
  else
    commits=$(git rev-list "$remote_sha..$local_sha")
  fi
  while read -r commit; do
    [[ -z "$commit" ]] && continue
    # compare author emails case-insensitively
    email=$(git show -s --format='%ae' "$commit" | tr '[:upper:]' '[:lower:]')
    if printf '%s' "$email" | grep -Eq "$blocked_regex"; then
      violations+=("$commit <$email>")
    fi
  done <<< "$commits"
done
if [[ ${#violations[@]} -gt 0 ]]; then
  echo "Push blocked: commit author email matched local blocked regex ($blocked_regex)." >&2
  printf ' - %s\n' "${violations[@]}" >&2
  exit 1
fi
EOF
chmod +x .githooks/pre-push
```
### CI Test Quality Checks
The following checks run on every PR in addition to the test suite:
| Job | What it checks | How to pass |
|-----|----------------|-------------|
| `lint-tests` | No source-grep tests (see above) | Replace with `runGsdTools()` behavioral tests, or add `// allow-test-rule: <reason>` |
Run locally before pushing: `npm run lint:tests`
### Test Requirements by Contribution Type
### Architecture-Aware Testing Requirements
When work touches architecture, routing, policy, registry assembly, or command semantics:
- Write tests against module **interfaces** and seam behavior, not implementation trivia.
- Prefer invariant/contract tests that protect ADR-backed behavior and `CONTEXT.md` terminology.
- Ensure tests validate canonical behavior through the defined seam (for example: structured result contracts, canonical command metadata, and adapter parity), not source-text coupling.
- If ADRs define expected behavior, tests should assert those expectations directly.
The required tests differ depending on what you are contributing:
**Bug Fix:** A regression test is required. Write the test first — it must demonstrate the original failure before your fix is applied, then pass after the fix. A PR that fixes a bug without a regression test will be asked to add one. "Tests pass" does not prove correctness; it proves the bug isn't present in the tests that exist.
@@ -314,6 +560,15 @@ bin/install.js — Installer (multi-runtime)
get-shit-done/
bin/lib/ — Core library modules (.cjs)
workflows/ — Workflow definitions (.md)
Large workflows split per progressive-disclosure
pattern: workflows/<name>/modes/*.md +
workflows/<name>/templates/*. Parent dispatches
to mode files. See workflows/discuss-phase/ as
the canonical example (#2551). New modes for
discuss-phase land in
workflows/discuss-phase/modes/<mode>.md.
Per-file budgets enforced by
tests/workflow-size-budget.test.cjs.
references/ — Reference documentation (.md)
templates/ — File templates
agents/ — Agent definitions (.md) — CANONICAL SOURCE


@@ -75,15 +75,17 @@ GSDはそれを解決します。Claude Codeを信頼性の高いものにする
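Built-in quality gates catch real problems: schema drift detection flags ORM changes that are missing migrations, security enforcement ties verification to threat models, and scope reduction detection keeps the planner from silently dropping requirements.
### v1.32.0 Highlights
### v1.39.0 Highlights
- **STATE.md consistency gates** — `state validate` detects drift between STATE.md and the filesystem; `state sync` rebuilds it from the actual project state
- **`--to N` flag** — stops autonomous execution after a specific phase completes
- **Research gate** — blocks planning when RESEARCH.md has unresolved questions
- **Verification milestone scope filtering** — gaps that will be addressed in later phases are marked "deferred" rather than "gap"
- **Read-before-edit guard** — advisory hook that prevents infinite retry loops on non-Claude runtimes
- **Context reduction** — Markdown truncation and cache-friendly prompt ordering reduce token usage
- **4 new runtimes** — Trae, Kilo, Augment, and Cline (12 runtimes in total)
See the [v1.39.0 release notes](https://github.com/gsd-build/get-shit-done/releases/tag/v1.39.0) for the full list.
- **`--minimal` install profile** — alias `--core-only`. Installs only the six main-loop skills (`new-project`, `discuss-phase`, `plan-phase`, `execute-phase`, `help`, `update`) and zero `gsd-*` subagents. Cuts cold-start system-prompt overhead from ~12k tokens to ~700 tokens (≥94% reduction). Useful for local LLMs with 32K–128K context and for token-billed APIs.
- **`/gsd-edit-phase`** — edits any field of an existing phase in `ROADMAP.md` in place (its number and position do not change). `--force` skips the confirmation diff; `depends_on` references are validated and `STATE.md` is also updated on write.
- **Post-merge build & test gate** — step 5.6 of `execute-phase` auto-detects the `workflow.build_command` setting and, if it is absent, falls back to Xcode (`.xcodeproj`), Makefile, Justfile, Cargo, Go, Python, and npm, in that order. Xcode/iOS projects automatically run `xcodebuild build` and `xcodebuild test`. Works in both parallel and serial modes.
- **Per-runtime review-model selection** — `review.models.<cli>` lets you specify the model each external review CLI (codex, gemini, etc.) uses, independently of the planner/executor profile.
- **Workstream config inheritance** — when `GSD_WORKSTREAM` is set, the root `.planning/config.json` is loaded first and the workstream config is deep-merged over it (the workstream wins on conflict). An explicit `null` in the workstream config can override the root value.
- **Manual canary release workflow** — `.github/workflows/canary.yml` manually publishes `{base}-canary.{N}` builds from the `dev` branch to the `@canary` dist-tag via `workflow_dispatch` (`get-shit-done-cc` and `@gsd-build/sdk`).
- **Skill consolidation: 86 → 59** — four new grouped skills (`capture`, `phase`, `config`, `workspace`) absorb 31 micro-skills. Six existing parent skills turn wrap-up and sub-operations into flags: `update --sync/--reapply`, `sketch --wrap-up`, `spike --wrap-up`, `map-codebase --fast/--query`, `code-review --fix`, `progress --do/--next`. No loss of functionality.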
---
@@ -597,6 +599,7 @@ lmn012o feat(08-02): create registration endpoint
|---------|--------------|
| `/gsd-add-phase` | Add a phase to the roadmap |
| `/gsd-insert-phase [N]` | Insert urgent work between phases |
| `/gsd-edit-phase [N] [--force]` | Edit any field of an existing phase in place — number and position are unchanged |
| `/gsd-remove-phase [N]` | Remove a future phase and renumber |
| `/gsd-list-phase-assumptions [N]` | Review Claude's intended approach before planning |
| `/gsd-plan-milestone-gaps` | Create phases to close gaps found by the audit |


@@ -75,15 +75,17 @@ GSD가 그걸 고칩니다. Claude Code를 신뢰할 수 있게 만드는 컨텍
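Built-in quality gates catch real problems: schema drift detection flags ORM changes with missing migrations, security enforcement anchors verification to threat models, and scope reduction detection keeps the planner from quietly dropping requirements.
### v1.32.0 Highlights
### v1.39.0 Highlights
- **STATE.md consistency gates** — `state validate` detects drift between STATE.md and the filesystem; `state sync` rebuilds it from the actual project state
- **`--to N` flag** — stops autonomous execution after a specific phase completes
- **Research gate** — blocks planning when RESEARCH.md has unresolved questions
- **Verification milestone scope filtering** — gaps that will be handled in later phases are marked "deferred" rather than "gap"
- **Read-before-edit guard** — advisory hook that prevents infinite retry loops on non-Claude runtimes
- **Context reduction** — Markdown truncation and cache-friendly prompt ordering cut token usage
- **4 new runtimes** — Trae, Kilo, Augment, Cline (12 runtimes in total)
See the [v1.39.0 release notes](https://github.com/gsd-build/get-shit-done/releases/tag/v1.39.0) for the full list.
- **`--minimal` install profile** — alias `--core-only`. Installs only the six main-loop skills (`new-project`, `discuss-phase`, `plan-phase`, `execute-phase`, `help`, `update`) and no `gsd-*` subagents. Shrinks cold-start system-prompt overhead from ~12k tokens to ~700 tokens (≥94% reduction). Useful for local LLMs with 32K–128K context and for token-billed APIs.
- **`/gsd-edit-phase`** — modifies any field of an existing phase in `ROADMAP.md` in place (its number and position do not change). `--force` skips the confirmation diff; `depends_on` references are validated and `STATE.md` is also refreshed on write.
- **Post-merge build & test gate** — step 5.6 of `execute-phase` first auto-detects the `workflow.build_command` setting and otherwise falls back to Xcode (`.xcodeproj`), Makefile, Justfile, Cargo, Go, Python, and npm, in that order. Xcode/iOS projects automatically run `xcodebuild build` and `xcodebuild test`. Works in both parallel and serial modes.
- **Per-runtime review-model selection** — `review.models.<cli>` lets each external review CLI (codex, gemini, etc.) pick its own model independently of the planner/executor profile.
- **Workstream config inheritance** — when `GSD_WORKSTREAM` is set, the root `.planning/config.json` is loaded first and the workstream config is then deep-merged (the workstream wins on conflict). An explicit `null` in the workstream config overrides the root value.
- **Manual canary release workflow** — `.github/workflows/canary.yml` manually publishes `{base}-canary.{N}` builds from the `dev` branch to the `@canary` dist-tag via `workflow_dispatch` (`get-shit-done-cc` and `@gsd-build/sdk`).
- **Skill consolidation: 86 → 59** — four new grouped skills (`capture`, `phase`, `config`, `workspace`) absorb 31 micro-skills. Six existing parent skills absorb wrap-up and sub-operations as flags: `update --sync/--reapply`, `sketch --wrap-up`, `spike --wrap-up`, `map-codebase --fast/--query`, `code-review --fix`, `progress --do/--next`. No loss of functionality.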
---
@@ -594,6 +596,7 @@ lmn012o feat(08-02): create registration endpoint
|---------|------------|
| `/gsd-add-phase` | Add a phase to the roadmap |
| `/gsd-insert-phase [N]` | Insert urgent work between phases |
| `/gsd-edit-phase [N] [--force]` | Modify any field of an existing phase in place — number and position stay the same |
| `/gsd-remove-phase [N]` | Remove a future phase and renumber |
| `/gsd-list-phase-assumptions [N]` | Check Claude's intended approach before planning |
| `/gsd-plan-milestone-gaps` | Create phases to close gaps found in the audit |


@@ -4,7 +4,7 @@
**English** · [Português](README.pt-BR.md) · [简体中文](README.zh-CN.md) · [日本語](README.ja-JP.md) · [한국어](README.ko-KR.md)
**A light-weight and powerful meta-prompting, context engineering and spec-driven development system for Claude Code, OpenCode, Gemini CLI, Kilo, Codex, Copilot, Cursor, Windsurf, Antigravity, Augment, Trae, Qwen Code, Cline, and CodeBuddy.**
**A light-weight and powerful meta-prompting, context engineering and spec-driven development system for Claude Code, OpenCode, Gemini CLI, Kilo, Codex, Copilot, Cursor, Windsurf, Antigravity, Augment, Trae, Qwen Code, Hermes Agent, Cline, and CodeBuddy.**
**Solves context rot — the quality degradation that happens as Claude fills its context window.**
@@ -41,7 +41,7 @@ npx get-shit-done-cc@latest
**Trusted by engineers at Amazon, Google, Shopify, and Webflow.**
[Why I Built This](#why-i-built-this) · [How It Works](#how-it-works) · [Commands](#commands) · [Why It Works](#why-it-works) · [User Guide](docs/USER-GUIDE.md)
[Why I Built This](#why-i-built-this) · [How It Works](#how-it-works) · [Commands](#commands) · [Why It Works](#why-it-works) · [User Guide](docs/USER-GUIDE.md) · [Walkthrough](docs/USER-GUIDE.md#end-to-end-walkthrough)
</div>
@@ -89,11 +89,17 @@ People who want to describe what they want and have it built correctly — witho
Built-in quality gates catch real problems: schema drift detection flags ORM changes missing migrations, security enforcement anchors verification to threat models, and scope reduction detection prevents the planner from silently dropping your requirements.
### v1.37.0 Highlights
### v1.39.0 Highlights
- **Spiking & sketching** — `/gsd-spike` runs 2–5 focused experiments with Given/When/Then verdicts; `/gsd-sketch` produces 2–3 interactive HTML mockup variants per design question — both store artifacts in `.planning/` and pair with wrap-up commands to package findings into project-local skills
- **Agent size-budget enforcement** — Tiered line-count limits (XL: 1 600, Large: 1 000, Default: 500) keep agent prompts lean; violations surface in CI
- **Shared boilerplate extraction** — Mandatory-initial-read and project-skills-discovery logic extracted to reference files, reducing duplication across a dozen agents
See the [v1.39.0 release notes](https://github.com/gsd-build/get-shit-done/releases/tag/v1.39.0) for the full list.
- **`--minimal` install profile** — alias `--core-only`, writes only the six main-loop skills (`new-project`, `discuss-phase`, `plan-phase`, `execute-phase`, `help`, `update`) and zero `gsd-*` subagents. Cuts cold-start system-prompt overhead from ~12k tokens to ~700 (≥94% reduction). Useful for local LLMs with 32K–128K context and token-billed APIs.
- **`/gsd-edit-phase`** — modify any field of an existing phase in `ROADMAP.md` in place, without changing its number or position. `--force` skips the confirmation diff; `depends_on` references are validated and `STATE.md` is updated on write.
- **Post-merge build & test gate** — `execute-phase` step 5.6 now auto-detects the build command from `workflow.build_command`, then falls back to Xcode (`.xcodeproj`), Makefile, Justfile, Cargo, Go, Python, or npm. Xcode/iOS projects get `xcodebuild build` + `xcodebuild test` automatically. Runs in both parallel and serial mode.
- **Per-runtime review-model selection** — `review.models.<cli>` lets each external review CLI (codex, gemini, etc.) pick its own model independently of the planner/executor profile.
- **Workstream config inheritance** — when `GSD_WORKSTREAM` is set, the root `.planning/config.json` is loaded first and deep-merged with the workstream config (workstream wins on conflict). Explicit `null` in a workstream config now correctly overrides a root value.
- **Manual canary release workflow** — `.github/workflows/canary.yml` publishes `{base}-canary.{N}` builds of `get-shit-done-cc` and `@gsd-build/sdk` to the `@canary` dist-tag from `dev` on demand via `workflow_dispatch`.
- **Skill consolidation: 86 → 59** — four new grouped skills (`capture`, `phase`, `config`, `workspace`) absorb 31 micro-skills. Six existing parents absorb wrap-up and sub-operations as flags: `update --sync/--reapply`, `sketch --wrap-up`, `spike --wrap-up`, `map-codebase --fast/--query`, `code-review --fix`, `progress --do/--next`. Zero functional loss.
---
@@ -104,11 +110,11 @@ npx get-shit-done-cc@latest
```
The installer prompts you to choose:
1. **Runtime** — Claude Code, OpenCode, Gemini, Kilo, Codex, Copilot, Cursor, Windsurf, Antigravity, Augment, Trae, Qwen Code, CodeBuddy, Cline, or all (interactive multi-select — pick multiple runtimes in a single install session)
1. **Runtime** — Claude Code, OpenCode, Gemini, Kilo, Codex, Copilot, Cursor, Windsurf, Antigravity, Augment, Trae, Qwen Code, Hermes Agent, CodeBuddy, Cline, or all (interactive multi-select — pick multiple runtimes in a single install session)
2. **Location** — Global (all projects) or local (current project only)
Verify with:
- Claude Code / Gemini / Copilot / Antigravity / Qwen Code: `/gsd-help`
- Claude Code / Gemini / Copilot / Antigravity / Qwen Code / Hermes Agent: `/gsd-help`
- OpenCode / Kilo / Augment / Trae / CodeBuddy: `/gsd-help`
- Codex: `$gsd-help`
- Cline: GSD installs via `.clinerules` — verify by checking `.clinerules` exists
@@ -179,6 +185,10 @@ npx get-shit-done-cc --trae --local # Install to ./.trae/
npx get-shit-done-cc --qwen --global # Install to ~/.qwen/
npx get-shit-done-cc --qwen --local # Install to ./.qwen/
# Hermes Agent
npx get-shit-done-cc --hermes --global # Install to ~/.hermes/ (honors $HERMES_HOME)
npx get-shit-done-cc --hermes --local # Install to ./.hermes/
# CodeBuddy
npx get-shit-done-cc --codebuddy --global # Install to ~/.codebuddy/
npx get-shit-done-cc --codebuddy --local # Install to ./.codebuddy/
@@ -192,11 +202,62 @@ npx get-shit-done-cc --all --global # Install to all directories
```
Use `--global` (`-g`) or `--local` (`-l`) to skip the location prompt.
Use `--claude`, `--opencode`, `--gemini`, `--kilo`, `--codex`, `--copilot`, `--cursor`, `--windsurf`, `--antigravity`, `--augment`, `--trae`, `--qwen`, `--codebuddy`, `--cline`, or `--all` to skip the runtime prompt.
Use `--claude`, `--opencode`, `--gemini`, `--kilo`, `--codex`, `--copilot`, `--cursor`, `--windsurf`, `--antigravity`, `--augment`, `--trae`, `--qwen`, `--hermes`, `--codebuddy`, `--cline`, or `--all` to skip the runtime prompt.
The GSD SDK CLI (`gsd-sdk`) is installed automatically (required by `/gsd-*` commands). Pass `--no-sdk` to skip the SDK install, or `--sdk` to force a reinstall.
</details>
<details>
<summary><strong>Minimal Install (local LLMs and token-billed APIs)</strong></summary>
GSD ships 86 skills and 33 subagents. Every runtime (Claude Code, OpenCode, etc.) eagerly enumerates skill descriptions and subagent descriptions into the system prompt on **every turn** — roughly **12k tokens** of fixed overhead before you've typed anything. Frontier models with large context (Sonnet 4.6, Opus 4.7 — 200K to 1M ctx) absorb that without a noticeable hit. **Local LLMs with 32K–128K context, and any model where you're paying per token, will feel it.**
Pass `--minimal` (alias `--core-only`) to install only the **main GSD loop**:
```bash
npx get-shit-done-cc --claude --global --minimal
# or any other runtime — works the same
npx get-shit-done-cc --opencode --global --minimal
```
What you get:
| Surface | Default install | `--minimal` install |
|---|---|---|
| Skills | 86 (`new-project`, `discuss-phase`, `plan-phase`, `execute-phase`, …82 more) | **6** (`new-project`, `discuss-phase`, `plan-phase`, `execute-phase`, `help`, `update`) |
| Subagents | 33 `gsd-*` agents | **0** |
| Cold-start system-prompt overhead | ~12k tokens | **~700 tokens** (≥94% reduction) |
| Manifest mode field | `"full"` | `"minimal"` |
The 6 core skills are exactly the ones you need to drive a project from zero: `new-project` to bootstrap, then the `discuss → plan → execute` loop, plus `help` for discovery and `update` to upgrade later.
**This is a hard floor, not a ceiling.** Each `/gsd-*` command you start using and each subagent it dispatches loads its body content into the conversation for that turn — that's normal token use, not eager overhead. But:
> [!IMPORTANT]
> **The savings disappear the moment you re-install without `--minimal`.** Running `npx get-shit-done-cc@latest` (or `gsd update` from inside a session) without the flag puts the full 86-skill / 33-agent surface back on disk, and every subsequent session pays the full ~12k-token floor again. If you want to stay minimal, **always pass `--minimal` when updating**:
>
> ```bash
> npx get-shit-done-cc@latest --claude --global --minimal
> ```
>
> Need a specific skill that isn't in the core set (e.g., `gsd-autonomous`, `gsd-ship`, `gsd-debug`)? You have two options:
> 1. **Permanent expand:** re-install without `--minimal` to get the full surface (and the full token floor).
> 2. **One-shot:** run the slash command's underlying logic by reading the source from `commands/gsd/<name>.md` in the GSD package and executing it manually — no install change needed.
>
> Tip: `cat ~/.claude/get-shit-done/.gsd-manifest.json | jq .mode` (or `gsd-file-manifest.json` depending on layout) confirms which mode you're in.
When to use `--minimal`:
- Local model with 32K–128K context (Qwen3, Llama, Mistral, etc.)
- Token-metered API where every turn matters
- Throwaway directory or non-GSD project where you want `/gsd-new-project` available without paying for the rest
- CI runners or ephemeral containers where install footprint matters
When **not** to use `--minimal`:
- Active GSD project where you regularly invoke the broader command set (`autonomous`, `ship`, `code-review`, `debug`, etc.) — re-installing each time is friction without payoff.
- Frontier models with 200K–1M context — the savings are noise.
</details>
<details>
<summary><strong>Development Installation</strong></summary>
@@ -263,6 +324,8 @@ If you prefer not to use that flag, add this to your project's `.claude/settings
## How It Works
> **New to GSD?** See the [end-to-end walkthrough](docs/USER-GUIDE.md#end-to-end-walkthrough) in the User Guide — it shows a complete project from `/gsd-new-project` through `/gsd-verify-work` with concrete example outputs.
> **Already have code?** Run `/gsd-map-codebase` first. It spawns parallel agents to analyze your stack, architecture, conventions, and concerns. Then `/gsd-new-project` knows your codebase — questions focus on what you're adding, and planning automatically loads your patterns.
### 1. Initialize Project
@@ -588,7 +651,7 @@ You're never locked in. The system adapts.
| Command | What it does |
|---------|--------------|
| `/gsd-new-workspace` | Create isolated workspace with repo copies (worktrees or clones) |
| `/gsd-workspace --new` | Create isolated workspace with repo copies (worktrees or clones) |
| `/gsd-list-workspaces` | Show all GSD workspaces and their status |
| `/gsd-remove-workspace` | Remove workspace and clean up worktrees |
@@ -632,9 +695,9 @@ You're never locked in. The system adapts.
|---------|--------------|
| `/gsd-add-phase` | Append phase to roadmap |
| `/gsd-insert-phase [N]` | Insert urgent work between phases |
| `/gsd-edit-phase [N] [--force]` | Modify any field of an existing phase in place — number and position unchanged |
| `/gsd-remove-phase [N]` | Remove future phase, renumber |
| `/gsd-list-phase-assumptions [N]` | See Claude's intended approach before planning |
| `/gsd-plan-milestone-gaps` | Create phases to close gaps from audit |
### Session
@@ -676,7 +739,7 @@ You're never locked in. The system adapts.
| `/gsd-settings` | Configure model profile and workflow agents |
| `/gsd-set-profile <profile>` | Switch model profile (quality/balanced/budget/inherit) |
| `/gsd-add-todo [desc]` | Capture idea for later |
| `/gsd-check-todos` | List pending todos |
| `/gsd-capture --list` | List pending todos |
| `/gsd-debug [desc]` | Systematic debugging with persistent state |
| `/gsd-do <text>` | Route freeform text to the right GSD command automatically |
| `/gsd-note <text>` | Zero-friction idea capture — append, list, or promote notes to todos |
@@ -693,6 +756,8 @@ You're never locked in. The system adapts.
GSD stores project settings in `.planning/config.json`. Configure during `/gsd-new-project` or update later with `/gsd-settings`. For the full config schema, workflow toggles, git branching options, and per-agent model breakdown, see the [User Guide](docs/USER-GUIDE.md#configuration-reference).
When `GSD_WORKSTREAM` is set, GSD loads the root `.planning/config.json` first and deep-merges the workstream's `config.json` on top — workstream values win on conflict, and an explicit `null` in a workstream config overrides a root value.
### Core Settings
| Setting | Options | Default | What it controls |
@@ -721,6 +786,8 @@ Use `inherit` when using non-Anthropic providers (OpenRouter, local models) or t
Or configure via `/gsd-settings`.
Per-runtime review-model overrides live under `review.models.<cli>` (e.g. `review.models.codex`, `review.models.gemini`) and let each external review CLI pick its own model independently of the planner/executor profile.
### Workflow Agents
These spawn additional agents during planning/execution. They improve quality but add tokens and time.
@@ -736,6 +803,7 @@ These spawn additional agents during planning/execution. They improve quality bu
| `workflow.skip_discuss` | `false` | Skip discuss-phase in autonomous mode |
| `workflow.text_mode` | `false` | Text-only mode for remote sessions (no TUI menus) |
| `workflow.use_worktrees` | `true` | Toggle worktree isolation for execution |
| `workflow.build_command` | _(auto-detect)_ | Override the post-merge build gate command. Falls back to Xcode (`.xcodeproj`), Makefile, Justfile, Cargo, Go, Python, or npm; Xcode/iOS projects also run `xcodebuild test`. |
Use `/gsd-settings` to toggle these, or override per-invocation:
- `/gsd-plan-phase --skip-research`
@@ -866,6 +934,7 @@ npx get-shit-done-cc --antigravity --global --uninstall
npx get-shit-done-cc --augment --global --uninstall
npx get-shit-done-cc --trae --global --uninstall
npx get-shit-done-cc --qwen --global --uninstall
npx get-shit-done-cc --hermes --global --uninstall
npx get-shit-done-cc --codebuddy --global --uninstall
npx get-shit-done-cc --cline --global --uninstall
@@ -882,6 +951,7 @@ npx get-shit-done-cc --antigravity --local --uninstall
npx get-shit-done-cc --augment --local --uninstall
npx get-shit-done-cc --trae --local --uninstall
npx get-shit-done-cc --qwen --local --uninstall
npx get-shit-done-cc --hermes --local --uninstall
npx get-shit-done-cc --codebuddy --local --uninstall
npx get-shit-done-cc --cline --local --uninstall
```


@@ -73,15 +73,17 @@ Para quem quer descrever o que precisa e receber isso construído do jeito certo
Built-in quality gates catch real problems: schema drift detection flags ORM changes without migrations, security anchors verification to threat models, and scope reduction detection prevents the planner from silently discarding requirements.
### v1.32.0 Highlights
### v1.39.0 Highlights
- **STATE.md consistency gates** — `state validate` detects divergence between STATE.md and the filesystem; `state sync` rebuilds it from the actual project state
- **`--to N` flag** — Stops autonomous execution after completing a specific phase
- **Research gate** — Blocks planning when RESEARCH.md has unresolved open questions
- **Verifier scope filter** — Gaps addressed in later phases are marked as "deferred", not as gaps
- **Read-before-edit guard** — Advisory hook prevents infinite retry loops on non-Claude runtimes
- **Context reduction** — Markdown truncation and cache-friendly prompt ordering for lower token usage
- **4 new runtimes** — Trae, Kilo, Augment, and Cline (12 runtimes in total)
Full list in the [v1.39.0 release notes](https://github.com/gsd-build/get-shit-done/releases/tag/v1.39.0).
- **`--minimal` install profile** — alias `--core-only`. Installs only the 6 main-loop skills (`new-project`, `discuss-phase`, `plan-phase`, `execute-phase`, `help`, `update`) and no `gsd-*` subagents. Reduces cold-start system-prompt overhead from ~12k to ~700 tokens (≥94% reduction). Useful for local LLMs with 32K–128K context and token-billed APIs.
- **`/gsd-edit-phase`** — edits any field of an existing phase in `ROADMAP.md` in place, without changing its number or position. `--force` skips the confirmation diff; `depends_on` references are validated and `STATE.md` is updated on write.
- **Post-merge build & test gate** — step 5.6 of `execute-phase` now auto-detects the build command from `workflow.build_command`, with fallback to Xcode (`.xcodeproj`), Makefile, Justfile, Cargo, Go, Python, or npm. Xcode/iOS projects run `xcodebuild build` and `xcodebuild test` automatically. Works in parallel and serial mode.
- **Per-runtime review model** — `review.models.<cli>` lets each external review CLI (codex, gemini, etc.) pick its own model, independent of the planner/executor profile.
- **Workstream config inheritance** — when `GSD_WORKSTREAM` is set, the root `.planning/config.json` is loaded first and deep-merged with the workstream config (workstream wins on conflict). An explicit `null` in the workstream config correctly overrides the root value.
- **Manual canary release workflow** — `.github/workflows/canary.yml` publishes `{base}-canary.{N}` builds of `get-shit-done-cc` and `@gsd-build/sdk` to the `@canary` dist-tag from `dev`, on demand via `workflow_dispatch`.
- **Skill consolidation: 86 → 59** — 4 new grouped skills (`capture`, `phase`, `config`, `workspace`) absorb 31 micro-skills. 6 existing parent skills absorb wrap-up and sub-operations as flags: `update --sync/--reapply`, `sketch --wrap-up`, `spike --wrap-up`, `map-codebase --fast/--query`, `code-review --fix`, `progress --do/--next`. No functional loss.
---


@@ -73,15 +73,17 @@ GSD 解决的就是这个问题。它是让 Claude Code 变得可靠的上下文
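For people who want to state their requirements clearly and have the system build them correctly, not for people pretending to run a 50-person engineering organization.
### v1.32.0 Highlights
### v1.39.0 Highlights
- **STATE.md consistency checks** — `state validate` detects drift between STATE.md and the filesystem; `state sync` rebuilds it from the actual project state
- **`--to N` flag** — stops autonomous execution after a specific phase completes
- **Research gate** — blocks planning when RESEARCH.md has unresolved open questions
- **Verification milestone scope filtering** — gaps that later phases will handle are marked "deferred" rather than gaps
- **Read-before-edit guard** — advisory hook that prevents infinite retry loops on non-Claude runtimes
- **Context reduction** — Markdown truncation and cache-friendly prompt ordering lower token usage
- **4 new runtimes** — Trae, Kilo, Augment, and Cline (12 runtimes in total)
See the [v1.39.0 release notes](https://github.com/gsd-build/get-shit-done/releases/tag/v1.39.0) for the full list.
- **`--minimal` install profile** — alias `--core-only`. Installs only the 6 core main-loop skills (`new-project`, `discuss-phase`, `plan-phase`, `execute-phase`, `help`, `update`) and no `gsd-*` subagents. Lowers cold-start system-prompt overhead from ~12k tokens to ~700 tokens (≥94% reduction). Suited to local LLMs with 32K–128K context and token-billed APIs.
- **`/gsd-edit-phase`** — modifies any field of an existing phase in `ROADMAP.md` in place, without changing its number or position. `--force` skips the confirmation diff; `depends_on` references are validated and `STATE.md` is updated on write.
- **Post-merge build and test gate** — step 5.6 of `execute-phase` first auto-detects the `workflow.build_command` setting and otherwise falls back to Xcode (`.xcodeproj`), Makefile, Justfile, Cargo, Go, Python, and npm, in that order. Xcode/iOS projects automatically run `xcodebuild build` and `xcodebuild test`. Takes effect in both parallel and serial modes.
- **Per-runtime review-model selection** — `review.models.<cli>` lets each external review CLI (codex, gemini, etc.) choose its own model independently of the planner/executor profile.
- **Workstream config inheritance** — once `GSD_WORKSTREAM` is set, the root `.planning/config.json` is loaded first and then deep-merged with the workstream's config (the workstream wins on conflict). An explicit `null` in the workstream config overrides the root value.
- **Manual canary release workflow** — `.github/workflows/canary.yml` publishes `{base}-canary.{N}` builds (`get-shit-done-cc` and `@gsd-build/sdk`) from the `dev` branch to the `@canary` dist-tag on demand via `workflow_dispatch`.
- **Skill consolidation: 86 → 59** — 4 new grouped skills (`capture`, `phase`, `config`, `workspace`) absorb 31 micro-skills. 6 existing parent skills fold wrap-up and sub-operations into flags: `update --sync/--reapply`, `sketch --wrap-up`, `spike --wrap-up`, `map-codebase --fast/--query`, `code-review --fix`, `progress --do/--next`. No functionality is lost.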
---
@@ -589,6 +591,7 @@ lmn012o feat(08-02): create registration endpoint
|------|------|
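| `/gsd-add-phase` | Append a phase to the end of the roadmap |
| `/gsd-insert-phase [N]` | Insert urgent work between phases |
| `/gsd-edit-phase [N] [--force]` | Modify any field of an existing phase in place — number and position stay unchanged |
| `/gsd-remove-phase [N]` | Remove a future phase and renumber |
| `/gsd-list-phase-assumptions [N]` | See the approach Claude intends to take before planning |
| `/gsd-plan-milestone-gaps` | Create phases for gaps found by the audit |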


@@ -67,15 +67,38 @@ main ← stable, always deployable
### Patch Release (Hotfix)
For critical bugs that can't wait for the next minor release.
For fixes that need to ship without waiting for the next minor.
1. Trigger `hotfix.yml` with version (e.g., `1.27.1`)
2. Workflow creates `hotfix/1.27.1` branch from the latest patch tag for that minor version (e.g., `v1.27.0` or `v1.27.1`)
3. Cherry-pick or apply fix on the hotfix branch
4. Push — CI runs tests automatically
5. Trigger `hotfix.yml` finalize action
6. Workflow runs full test suite, bumps version, tags, publishes to `latest`
7. Merge hotfix branch back to main
A hotfix `vX.YY.Z` cumulatively includes everything in `vX.YY.{Z-1}` plus every `fix:`/`chore:` commit landed on `main` since that base. The base tag is the anchor — `git cherry $BASE_TAG main` reveals exactly which commits are still unshipped, and the new `vX.YY.Z` tag becomes the next hotfix's base, so the cycle is self-documenting.
#### Two paths
**Path A — `hotfix.yml` (canonical, two-step):**
1. Trigger `hotfix.yml` with `action=create`, `version=1.27.1`, `auto_cherry_pick=true` (default).
- Workflow detects `BASE_TAG` = highest `v1.27.*` < `v1.27.1` (so `1.27.1` branches from `v1.27.0`; `1.27.2` would branch from `v1.27.1`).
- Branches `hotfix/1.27.1` from `BASE_TAG`.
- Auto-cherry-picks every `fix:`/`chore:` commit on `origin/main` not already in the base, oldest-first. Patch-equivalents are skipped via `git cherry`. `feat:`/`refactor:` are **never** auto-included.
- On conflict the workflow halts with the offending SHA. Resolve manually on the branch, then re-run finalize with `auto_cherry_pick=false`.
- Bumps `package.json` (and `sdk/package.json`), pushes the branch, and lists every included SHA in the run summary.
2. (Optional) push additional manual commits to `hotfix/1.27.1`.
3. Trigger `hotfix.yml` with `action=finalize`. The workflow:
- Runs `install-smoke` cross-platform gate.
- Runs full test suite + coverage.
- Builds SDK, bundles `sdk-bundle/gsd-sdk.tgz` inside the CC tarball (parity with `release-sdk.yml`).
- Tags `v1.27.1`, publishes to `@latest`, re-points `@next → v1.27.1`.
- Opens merge-back PR against `main`.
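The `BASE_TAG` rule from step 1 (highest `vX.YY.*` strictly below the requested version) can be sketched as a small filter over tag names. The function name `base_tag_for` is hypothetical, and skipping tags with a pre-release suffix is an assumption about how the workflow treats them.

```bash
# Hypothetical helper: read tag names on stdin and print the highest
# vMAJOR.MINOR.PATCH tag whose patch number is strictly below the target's.
base_tag_for() {
  target="$1"            # e.g. 1.27.2
  minor="${target%.*}"   # 1.27
  patch="${target##*.}"  # 2
  best=""
  best_patch=-1
  while IFS= read -r tag; do
    rest="${tag#"v${minor}."}"
    [ "$rest" != "$tag" ] || continue               # not in this minor series
    case "$rest" in ''|*[!0-9]*) continue ;; esac   # skip suffixed or malformed tags
    if [ "$rest" -lt "$patch" ] && [ "$rest" -gt "$best_patch" ]; then
      best="$tag"
      best_patch="$rest"
    fi
  done
  printf '%s\n' "$best"
}
```

In a clone this would be fed from git, for example `git tag --list 'v1.27.*' | base_tag_for 1.27.1`. The comparison is numeric, so `v1.27.10` sorts above `v1.27.9`.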
**Path B — `release-sdk.yml` (stopgap, one-shot):**
Active while the `@gsd-build/sdk` npm token is unavailable; bundles the SDK inside the CC tarball.
1. Trigger `release-sdk.yml` with `action=hotfix`, `version=1.27.1`, `auto_cherry_pick=true`.
- The `prepare` job creates the branch and cherry-picks (same logic as Path A).
- `install-smoke` runs against the new branch.
- The `release` job tags, publishes to `@latest`, re-points `@next`, opens merge-back PR.
- Idempotent: if `hotfix/1.27.1` already exists (e.g. you ran `hotfix.yml create` first), the prepare job checks it out and re-runs cherry-pick as a no-op.
2. `dry_run=true` exercises the full pipeline without pushing the branch or publishing.
### Minor Release (Standard Cycle)


@@ -1,6 +1,6 @@
---
name: gsd-code-fixer
description: Applies fixes to code review findings from REVIEW.md. Reads source files, applies intelligent fixes, and commits each fix atomically. Spawned by /gsd-code-review-fix.
description: Applies fixes to code review findings from REVIEW.md. Reads source files, applies intelligent fixes, and commits each fix atomically. Spawned by /gsd-code-review --fix.
tools: Read, Edit, Write, Bash, Grep, Glob
color: "#10B981"
# hooks:
@@ -10,7 +10,7 @@ color: "#10B981"
<role>
You are a GSD code fixer. You apply fixes to issues found by the gsd-code-reviewer agent.
Spawned by `/gsd-code-review-fix` workflow. You produce REVIEW-FIX.md artifact in the phase directory.
Spawned by `/gsd-code-review --fix` workflow. You produce REVIEW-FIX.md artifact in the phase directory.
Your job: Read REVIEW.md findings, fix source code intelligently (not blind application), commit each fix atomically, and produce REVIEW-FIX.md report.
@@ -209,6 +209,152 @@ If a finding references multiple files (in Fix section or Issue section):
<execution_flow>
<step name="setup_worktree">
**Isolation: create a dedicated git worktree BEFORE touching any files.**
This agent runs as a background process that makes commits. Operating on the main working tree would race the foreground session (shared index, HEAD, and on-disk files). Instead, every instance runs in its own isolated worktree.
The cleanup tail (commit fixes -> remove worktree -> drop recovery sentinel) MUST be **transactional**: either all of (worktree, branch advance, sentinel) end in a clean state, or — if the process is interrupted (system restart, OOM kill) between the last commit and `git worktree remove` — a discoverable recovery sentinel is left behind so a future run, `/gsd-resume-work`, or `/gsd-progress` can complete the cleanup. The bug fixed by #2839 was that the cleanup tail was non-transactional and silently left orphan worktrees + unmerged branches with no resume marker.
```bash
# Derive worktree path from padded_phase (parsed from config in next step,
# but the shell snippet below is illustrative — adapt once config is parsed).
# In practice: parse padded_phase from config first, then run:
branch=$(git branch --show-current)
test -n "$branch" || { echo "Detached HEAD is not supported for review-fix (#2686)"; exit 1; }
# Recovery-sentinel handling (#2839):
# Path is ${phase_dir}/.review-fix-recovery-pending.json. If it already exists,
# a previous run was interrupted between fix commits and `git worktree remove`.
# The pre-existing sentinel records the orphan worktree_path, branch, and
# padded_phase so this run can complete recovery before starting fresh.
sentinel="${phase_dir}/.review-fix-recovery-pending.json"
if [ -f "$sentinel" ]; then
  echo "Detected pre-existing recovery sentinel from a prior interrupted run: $sentinel"
  # Recovery must extract BOTH worktree_path AND reviewfix_branch (#3001 CR):
  # if a prior run died after `git worktree remove` but before
  # `git branch -D`, the orphan branch survives and clutters `git branch`
  # output forever. Emit both fields newline-separated so we can read them
  # independently.
  prior_recovery=$(node -e '
    const fs = require("fs");
    try {
      const parsed = JSON.parse(fs.readFileSync(process.argv[1], "utf-8"));
      process.stdout.write((parsed.worktree_path || "") + "\n" + (parsed.reviewfix_branch || ""));
    } catch (err) {
      process.stderr.write(`Warning: malformed recovery sentinel ${process.argv[1]}: ${err.message}\n`);
      process.stdout.write("\n");
    }
  ' "$sentinel")
  prior_wt="$(printf '%s' "$prior_recovery" | sed -n '1p')"
  prior_branch="$(printf '%s' "$prior_recovery" | sed -n '2p')"
  # grep -Fx matches the path literally and as a whole line, so regex
  # metacharacters in the worktree path cannot cause a false match or miss.
  if [ -n "$prior_wt" ] && git worktree list --porcelain | grep -Fxq "worktree $prior_wt"; then
    echo "Removing orphan worktree from prior run: $prior_wt"
    git worktree remove "$prior_wt" --force || true
  fi
  if [ -n "$prior_branch" ]; then
    # Best-effort: branch may already be gone (cleaned by an earlier
    # partial recovery, or never created if `git worktree add -b` itself
    # failed). `|| true` keeps recovery non-fatal.
    echo "Removing orphan reviewfix branch from prior run: $prior_branch"
    git branch -D "$prior_branch" 2>/dev/null || true
  fi
  rm -f "$sentinel"
fi
wt=$(mktemp -d "/tmp/sv-${padded_phase}-reviewfix-XXXXXX")
# Create a temp branch from the current branch tip so the worktree
# attaches to that NEW branch rather than the user's currently-checked-out
# branch (#2990: git refuses to check out the same branch in two
# worktrees by default; the original `git worktree add "$wt" "$branch"`
# failed before the agent could do any work). The temp branch shares
# history with $branch up to the moment of creation, so commits made
# inside the worktree fast-forward $branch on cleanup.
reviewfix_branch="gsd-reviewfix/${padded_phase}-$$"
git worktree add -b "$reviewfix_branch" "$wt" "$branch"
# Write the recovery sentinel ONLY AFTER `git worktree add` succeeds.
# Writing it before would leave a sentinel pointing at a worktree that does
# not exist if `git worktree add` itself failed.
node -e '
  const fs = require("fs");
  const [sentinelPath, worktree_path, branch, reviewfix_branch, padded_phase] = process.argv.slice(1);
  fs.writeFileSync(sentinelPath, JSON.stringify({
    worktree_path,
    branch,
    reviewfix_branch,
    padded_phase,
    started_at: new Date().toISOString()
  }, null, 2));
' "$sentinel" "$wt" "$branch" "$reviewfix_branch" "$padded_phase"
cd "$wt"
```
Concrete steps:
1. Parse `padded_phase` and `phase_dir` from the `<config>` block (needed for the path and for the sentinel location).
2. Resolve the current branch: `branch=$(git branch --show-current)`. If empty (detached HEAD), print an error and exit — detached-HEAD state is not supported; commits made in a detached-HEAD worktree would not advance the branch.
3. **Recovery check (#2839, #2990):** If `${phase_dir}/.review-fix-recovery-pending.json` already exists, a prior run was interrupted. Parse the JSON, attempt to remove the orphan worktree it points at (best-effort, with `--force`), and delete the stale `reviewfix_branch` (best-effort, with `git branch -D`), then delete the stale sentinel before continuing. This makes a re-run of `/gsd-code-review --fix` self-healing.
4. Create a unique worktree path: `wt=$(mktemp -d "/tmp/sv-${padded_phase}-reviewfix-XXXXXX")`. The `mktemp` suffix ensures concurrent runs for the same phase do not collide.
5. Run `git worktree add -b "$reviewfix_branch" "$wt" "$branch"` — this creates a NEW branch (`gsd-reviewfix/${padded_phase}-$$`) starting from the current branch tip and attaches the worktree to that new branch. Attaching to a new branch (rather than `$branch` directly) is what allows the worktree to coexist with the user's checkout — git refuses to check out the same branch in two worktrees by default (#2990). Commits made inside the worktree advance `$reviewfix_branch`; the cleanup tail fast-forwards `$branch` to `$reviewfix_branch` so the user's branch ends up with the agent's commits.
6. **Write the recovery sentinel** at `${phase_dir}/.review-fix-recovery-pending.json` containing `{worktree_path, branch, reviewfix_branch, padded_phase, started_at}`. Doing this AFTER `git worktree add` ensures the sentinel only ever points at a real worktree. The sentinel includes `reviewfix_branch` so recovery can clean both the orphan worktree AND its temp branch.
7. All subsequent file reads, edits, and commits happen inside `$wt` (which is on `$reviewfix_branch`, not `$branch`).
**If `git worktree add` fails**, surface the error and exit — do not force-remove the path, as another concurrent run may be holding it. Do not write the sentinel (the worktree does not exist). Do not delete `$reviewfix_branch` either; if `-b` failed, no temp branch was created.
**Cleanup tail (transactional, ALWAYS — even on failure):** After writing REVIEW-FIX.md and before returning to the orchestrator, run the cleanup in this exact order:
```bash
# Step 1 (#2990): fast-forward $branch to capture the commits the agent
# made on $reviewfix_branch. Run from the main repo (not $wt) — the user's
# checkout owns $branch. --ff-only ensures we never silently drop or
# rewrite history if the user committed to $branch concurrently; on
# divergence, this fails loudly and the temp branch is left for the
# user to inspect/merge manually. We deliberately resolve the main repo
# path via `git worktree list --porcelain` rather than assuming $PWD,
# because the agent ran inside $wt.
# Strip the literal "worktree " prefix and print the rest of the line, then
# exit on the first match. This preserves paths that contain spaces
# (awk '$2' would truncate "/path/with spaces/repo" to "/path/with").
main_repo="$(git worktree list --porcelain | awk '/^worktree / { sub(/^worktree /, ""); print; exit }')"
ff_status=0
# Capture the exit code of `git merge` directly. With `if ! cmd; then ff_status=$?`,
# `$?` inside the then-branch is the status of the negated command (always 0
# when the inner cmd failed), which masks the real merge exit code. Use the
# success/else split instead so $? in the else-branch is the merge command's
# exit code.
if git -C "$main_repo" merge --ff-only "$reviewfix_branch" 2>&1; then
  ff_status=0
else
  ff_status=$?
  echo "WARN: could not fast-forward $branch to $reviewfix_branch (exit $ff_status)."
  echo " The temp branch $reviewfix_branch is preserved for manual merge."
fi
# Step 2: drop the worktree. If this succeeds and the process is then
# killed, the next run finds a sentinel pointing at a worktree that no
# longer exists — the recovery branch handles this gracefully (best-effort
# remove + sentinel delete). If we reversed the order (sentinel removed
# first, then worktree remove), an interruption between the two steps
# would leave NO sentinel and an orphan worktree — exactly the bug from
# #2839.
git worktree remove "$wt" --force
wt_remove_status=$?
# Step 3: delete the temp branch ONLY if the fast-forward succeeded. If
# it didn't, leaving the branch lets the user inspect/merge manually.
if [ "$ff_status" -eq 0 ]; then
  git -C "$main_repo" branch -D "$reviewfix_branch" || true
fi
# Step 4: drop the recovery sentinel ONLY after `git worktree remove`
# returns successfully. This atomic-ish ordering is what makes the
# cleanup tail transactional from the orchestrator's perspective.
if [ "$wt_remove_status" -eq 0 ]; then
  rm -f "$sentinel"
fi
```
This cleanup is unconditional — register it mentally as a finally-block obligation. If the agent exits early (config error, no findings, etc.), still run the cleanup tail in order (fast-forward → worktree remove → temp branch delete → sentinel rm) before exit. The sentinel must NEVER be removed before `git worktree remove` succeeds. The temp branch must NEVER be deleted while the fast-forward is in a diverged state.
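One way to get finally-block behavior in shell is an `EXIT` trap. The sketch below is an assumption about how the obligation could be enforced, not part of the documented flow; `cleanup_tail` is a stand-in that only echoes the step names, and `run_review_fix` is a hypothetical wrapper.

```bash
# Stand-in for the real cleanup tail; it only prints the required order.
cleanup_tail() {
  echo "fast-forward"
  echo "worktree remove"
  echo "temp branch delete"
  echo "sentinel rm"
}

# Run the work in a subshell so the EXIT trap fires however the work ends.
run_review_fix() (
  trap cleanup_tail EXIT
  echo "apply fixes"
  exit 2   # simulate an early exit (config error, no findings, ...)
)
```

A trap covers early exits and most signals, but not `SIGKILL` or a machine restart, which is the case the recovery sentinel exists for.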
</step>
<step name="load_context">
**1. Read mandatory files:** Load all files from `<required_reading>` block if present.
@@ -312,6 +458,7 @@ Use `gsd-sdk query commit` with conventional format (message first, then every s
```bash
gsd-sdk query commit \
"fix({padded_phase}): {finding_id} {short_description}" \
--files \
{all_modified_files}
```
@@ -321,7 +468,7 @@ Examples:
**Multiple files:** List ALL modified files after the message (space-separated):
```bash
gsd-sdk query commit "fix(02): CR-01 ..." \
gsd-sdk query commit "fix(02): CR-01 ..." --files \
src/api/auth.ts src/types/user.ts tests/auth.test.ts
```
@@ -437,6 +584,10 @@ _Iteration: {N}_
<critical_rules>
**ALWAYS run inside the isolated worktree** — set up via `branch=$(git branch --show-current)` + `wt=$(mktemp -d "/tmp/sv-${padded_phase}-reviewfix-XXXXXX")` + `git worktree add -b "$reviewfix_branch" "$wt" "$branch"` at the very start (see `setup_worktree` step). Using `mktemp` ensures concurrent runs do not collide. Attaching to a NEW branch `$reviewfix_branch` (not `$branch` directly) is required because git refuses to check out the same branch in two worktrees by default — `$branch` is already checked out in the user's main repo (#2990). Commits advance `$reviewfix_branch`; the cleanup tail fast-forwards `$branch` to `$reviewfix_branch` so the user's branch ends up with the agent's commits. Every file read, edit, and commit must happen inside `$wt`. Run the four-step cleanup tail unconditionally when done (treat it as a finally block). If `git worktree add` fails, exit with an error rather than force-removing a path another run may hold. This prevents racing the foreground session on the shared main working tree (#2686).
**ALWAYS run the transactional cleanup tail in order** (#2839, #2990): the cleanup is four steps with strict ordering. (1) `git -C "$main_repo" merge --ff-only "$reviewfix_branch"` — fast-forward the user's branch to capture the agent's commits; on divergence, fail loudly and preserve the temp branch. (2) `git worktree remove "$wt" --force`. (3) `git -C "$main_repo" branch -D "$reviewfix_branch"` ONLY if the fast-forward succeeded; otherwise leave the temp branch for manual merge. (4) `rm -f "$sentinel"` (the recovery sentinel at `${phase_dir}/.review-fix-recovery-pending.json`). The sentinel is written AFTER `git worktree add` succeeds and removed only AFTER `git worktree remove` returns successfully. The temp branch is deleted only when the fast-forward succeeded. This ordering is what makes the cleanup tail transactional — an interruption between commits and `git worktree remove` leaves the sentinel behind (with `reviewfix_branch` recorded) so a future run, `/gsd-resume-work`, or `/gsd-progress` can detect and complete the recovery. Reversing the order recreates the orphan-worktree bug.
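The `$main_repo` lookup used in step (1) can be checked without a repository by feeding the `awk` filter sample porcelain output. The function name `first_worktree_path` and the sample paths are illustrative only.

```bash
# Print the path from the first "worktree <path>" line on stdin, keeping
# any spaces in the path intact (the main working tree is listed first).
first_worktree_path() {
  awk '/^worktree / { sub(/^worktree /, ""); print; exit }'
}

# Sample input shaped like `git worktree list --porcelain` output.
sample_porcelain() {
  printf 'worktree /path/with spaces/repo\n'
  printf 'HEAD 0000000000000000000000000000000000000000\n'
  printf 'branch refs/heads/main\n\n'
  printf 'worktree /tmp/sv-02-reviewfix-abc123\n'
}
```

Piping `sample_porcelain` into `first_worktree_path` yields the full first path, spaces included, whereas printing `$2` would stop at the first space.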
**ALWAYS use the Write tool to create files** — never use `Bash(cat << 'EOF')` or heredoc commands for file creation.
**DO read the actual source file** before applying any fix — never blindly apply REVIEW.md suggestions without understanding current code state.


@@ -8,7 +8,7 @@ color: "#F59E0B"
---
<role>
You are a GSD code reviewer. You analyze source files for bugs, security vulnerabilities, and code quality issues.
Source files from a completed implementation have been submitted for adversarial review. Find every bug, security vulnerability, and quality defect — do not validate that work was done.
Spawned by `/gsd-code-review` workflow. You produce REVIEW.md artifact in the phase directory.
@@ -16,6 +16,22 @@ Spawned by `/gsd-code-review` workflow. You produce REVIEW.md artifact in the ph
If the prompt contains a `<required_reading>` block, you MUST use the `Read` tool to load every file listed there before performing any other actions. This is your primary context.
</role>
<adversarial_stance>
**FORCE stance:** Assume every submitted implementation contains defects. Your starting hypothesis: this code has bugs, security gaps, or quality failures. Surface what you can prove.
**Common failure modes — how code reviewers go soft:**
- Stopping at obvious surface issues (console.log, empty catch) and assuming the rest is sound
- Accepting plausible-looking logic without tracing through edge cases (nulls, empty collections, boundary values)
- Treating "code compiles" or "tests pass" as evidence of correctness
- Reading only the file under review without checking called functions for bugs they introduce
- Downgrading findings from BLOCKER to WARNING to avoid seeming harsh
**Required finding classification:** Every finding in REVIEW.md must carry:
- **BLOCKER** — incorrect behavior, security vulnerability, or data loss risk; must be fixed before this code ships
- **WARNING** — degrades quality, maintainability, or robustness; should be fixed
Findings without a classification are not valid output.
</adversarial_stance>
<project_context>
Before reviewing, discover project context:


@@ -94,6 +94,19 @@ Based on focus, determine which documents you'll write:
- `arch` → ARCHITECTURE.md, STRUCTURE.md
- `quality` → CONVENTIONS.md, TESTING.md
- `concerns` → CONCERNS.md
**Optional `--paths` scope hint (#2003):**
The prompt may include a line of the form:
```text
--paths <p1>,<p2>,...
```
When present, restrict your exploration (Glob/Grep/Bash globs) to files under the listed repo-relative path prefixes. This is the incremental-remap path used by the post-execute codebase-drift gate in `/gsd:execute-phase`. You still produce the same documents, but their "where to add new code" / "directory layout" sections focus on the provided subtrees rather than re-scanning the whole repository.
**Path validation:** Reject any `--paths` value containing `..`, starting with `/`, or containing shell metacharacters (`;`, `` ` ``, `$`, `&`, `|`, `<`, `>`). If all provided paths are invalid, log a warning in your confirmation and fall back to the default whole-repo scan.
If no `--paths` hint is provided, behave exactly as before.
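The validation rule can be expressed as a small shell filter. This is a sketch under assumptions: the helper names are hypothetical, and rejecting an empty entry is an added choice that the rule above does not state.

```bash
# Hypothetical helper: succeed only for a repo-relative prefix with no "..",
# no leading "/", and none of the listed shell metacharacters.
is_safe_path_prefix() {
  case "$1" in
    ''|/*|*..*) return 1 ;;
    *';'*|*'`'*|*'$'*|*'&'*|*'|'*|*'<'*|*'>'*) return 1 ;;
  esac
  return 0
}

# Print the valid entries of a comma-separated --paths value, one per line.
# Runs in a subshell so the IFS and noglob changes do not leak to the caller.
valid_paths() (
  IFS=','
  set -f
  for p in $1; do
    if is_safe_path_prefix "$p"; then
      printf '%s\n' "$p"
    fi
  done
)
```

An empty result from `valid_paths` corresponds to the "all provided paths are invalid" case, where the agent falls back to the whole-repo scan.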
</step>
<step name="explore_codebase">
@@ -326,10 +339,42 @@ Ready for orchestrator summary.
## ARCHITECTURE.md Template (arch focus)
```markdown
<!-- refreshed: [YYYY-MM-DD] -->
# Architecture
**Analysis Date:** [YYYY-MM-DD]
## System Overview
```text
┌─────────────────────────────────────────────────────────────┐
│ [Top Layer Name] │
├──────────────────┬──────────────────┬───────────────────────┤
│ [Component A] │ [Component B] │ [Component C] │
│ `[path/to/a]` │ `[path/to/b]` │ `[path/to/c]` │
└────────┬─────────┴────────┬─────────┴──────────┬────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────┐
│ [Middle Layer Name] │
│ `[path/to/layer]` │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ [Store / Output / External] │
│ `[path/to/store]` │
└─────────────────────────────────────────────────────────────┘
```
## Component Responsibilities
| Component | Responsibility | File |
|-----------|----------------|------|
| [Name] | [What it owns] | `[path]` |
| [Name] | [What it owns] | `[path]` |
| [Name] | [What it owns] | `[path]` |
## Pattern Overview
**Overall:** [Pattern name]
@@ -350,7 +395,13 @@ Ready for orchestrator summary.
## Data Flow
**[Flow Name]:**
### Primary Request Path
1. [Step 1 — entry point] (`[file:line]`)
2. [Step 2 — processing] (`[file:line]`)
3. [Step 3 — output/response] (`[file:line]`)
### [Secondary Flow Name]
1. [Step 1]
2. [Step 2]
@@ -373,6 +424,27 @@ Ready for orchestrator summary.
- Triggers: [What invokes it]
- Responsibilities: [What it does]
## Architectural Constraints
- **Threading:** [Threading model — e.g., single-threaded event loop, worker threads used for X]
- **Global state:** [Any module-level singletons or shared mutable state — list files]
- **Circular imports:** [Known circular dependency chains, if any]
- **[Other constraint]:** [Description]
## Anti-Patterns
### [Anti-Pattern Name]
**What happens:** [The incorrect pattern observed in this codebase]
**Why it's wrong:** [The problem it causes here]
**Do this instead:** [The correct pattern with file reference]
### [Anti-Pattern Name]
**What happens:** [The incorrect pattern observed in this codebase]
**Why it's wrong:** [The problem it causes here]
**Do this instead:** [The correct pattern with file reference]
## Error Handling
**Strategy:** [Approach]


@@ -1168,7 +1168,7 @@ Root cause: {root_cause}"
Then commit planning docs via CLI (respects `commit_docs` config automatically):
```bash
gsd-sdk query commit "docs: resolve debug {slug}" .planning/debug/resolved/{slug}.md
gsd-sdk query commit "docs: resolve debug {slug}" --files .planning/debug/resolved/{slug}.md
```
**Append to knowledge base:**
@@ -1199,7 +1199,7 @@ Then append the entry:
Commit the knowledge base update alongside the resolved session:
```bash
gsd-sdk query commit "docs: update debug knowledge base with {slug}" .planning/debug/knowledge-base.md
gsd-sdk query commit "docs: update debug knowledge base with {slug}" --files .planning/debug/knowledge-base.md
```
Report completion and offer next steps.


@@ -110,7 +110,7 @@ Regardless of type, extract:
</step>
<step name="write_output">
Write to `{OUTPUT_DIR}/{slug}.json` where `slug` is the filename without extension (replace non-alphanumerics with `-`).
Write to `{OUTPUT_DIR}/{slug}-{source_hash}.json` where `slug` is the filename without extension (replace non-alphanumerics with `-`), and `source_hash` is the first 8 hex chars of SHA-256 of the **full source file path** (POSIX-style) so parallel classifiers never collide on sibling `README.md` files.
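A minimal sketch of the naming rule, assuming the hash is taken over the path bytes with no trailing newline and that `sha256sum` (or `shasum` as a fallback) is available; the function name `output_name` is hypothetical.

```bash
# Hypothetical helper: derive the output filename for one source file.
output_name() {
  src="$1"             # full source path, POSIX-style
  base="${src##*/}"    # e.g. README.md
  stem="${base%.*}"    # drop the extension -> README
  # Replace every non-alphanumeric character in the stem with "-".
  slug="$(printf '%s' "$stem" | tr -c 'A-Za-z0-9' '-')"
  # First 8 hex characters of the SHA-256 of the full path.
  if command -v sha256sum >/dev/null 2>&1; then
    hash="$(printf '%s' "$src" | sha256sum | cut -c1-8)"
  else
    hash="$(printf '%s' "$src" | shasum -a 256 | cut -c1-8)"
  fi
  printf '%s-%s.json\n' "$slug" "$hash"
}
```

Because the hash covers the whole path, `docs/README.md` and `sdk/README.md` share the `README-` prefix but end in different suffixes.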
JSON schema:


@@ -12,18 +12,34 @@ color: orange
---
<role>
You are a GSD doc verifier. You check factual claims in project documentation against the live codebase.
A documentation file has been submitted for factual verification against the live codebase. Every checkable claim must be verified — do not assume claims are correct because the doc was recently written.
You are spawned by the `/gsd-docs-update` workflow. Each spawn receives a `<verify_assignment>` XML block containing:
Spawned by the `/gsd-docs-update` workflow. Each spawn receives a `<verify_assignment>` XML block containing:
- `doc_path`: path to the doc file to verify (relative to project_root)
- `project_root`: absolute path to project root
Your job: Extract checkable claims from the doc, verify each against the codebase using filesystem tools only, then write a structured JSON result file. Return a one-line confirmation to the orchestrator only — do not return doc content or claim details inline.
Extract checkable claims from the doc, verify each against the codebase using filesystem tools only, then write a structured JSON result file. Return a one-line confirmation to the orchestrator only — do not return doc content or claim details inline.
**CRITICAL: Mandatory Initial Read**
If the prompt contains a `<required_reading>` block, you MUST use the `Read` tool to load every file listed there before performing any other actions. This is your primary context.
</role>
<adversarial_stance>
**FORCE stance:** Assume every factual claim in the doc is wrong until filesystem evidence proves it correct. Your starting hypothesis: the documentation has drifted from the code. Surface every false claim.
**Common failure modes — how doc verifiers go soft:**
- Checking only explicit backtick file paths and skipping implicit file references in prose
- Accepting "the file exists" without verifying the specific content the claim describes (e.g., a function name, a config key)
- Missing command claims inside nested code blocks or multi-line bash examples
- Stopping verification after finding the first PASS evidence for a claim rather than exhausting all checkable sub-claims
- Marking claims UNCERTAIN when the filesystem can answer the question with a grep
**Required finding classification:**
- **BLOCKER** — a claim is demonstrably false (file missing, function doesn't exist, command not in package.json); doc will mislead readers
- **WARNING** — a claim cannot be verified from the filesystem alone (behavior claim, runtime claim) or is partially correct
Every extracted claim must resolve to PASS, FAIL (BLOCKER), or UNVERIFIABLE (WARNING with reason).
</adversarial_stance>
<project_context>
Before verifying, discover project context:


@@ -12,10 +12,26 @@ color: "#EF4444"
---
<role>
You are a GSD eval auditor. Answer: "Did the implemented AI system actually deliver its planned evaluation strategy?"
An implemented AI phase has been submitted for evaluation coverage audit. Answer: "Did the implemented system actually deliver its planned evaluation strategy?" — not whether it looks like it might.
Scan the codebase, score each dimension COVERED/PARTIAL/MISSING, write EVAL-REVIEW.md.
</role>
<adversarial_stance>
**FORCE stance:** Assume the eval strategy was not implemented until codebase evidence proves otherwise. Your starting hypothesis: AI-SPEC.md documents intent; the code does something different or less. Surface every gap.
**Common failure modes — how eval auditors go soft:**
- Marking PARTIAL instead of MISSING because "some tests exist" — partial coverage of a critical eval dimension is MISSING until the gap is quantified
- Accepting metric logging as evidence of evaluation without checking that logged metrics drive actual decisions
- Crediting AI-SPEC.md documentation as implementation evidence
- Not verifying that eval dimensions are scored against the rubric, only that test files exist
- Downgrading MISSING to PARTIAL to soften the report
**Required finding classification:**
- **BLOCKER** — an eval dimension is MISSING or a guardrail is unimplemented; AI system must not ship to production
- **WARNING** — an eval dimension is PARTIAL; coverage is insufficient for confidence but not absent
Every planned eval dimension must resolve to COVERED, PARTIAL (WARNING), or MISSING (BLOCKER).
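The status-to-severity mapping is small enough to state as a lookup. The function below is only an illustration of the rule; its name and the `none` label for COVERED are assumptions.

```bash
# Hypothetical helper: map a dimension status to its required classification.
severity_for() {
  case "$1" in
    COVERED) echo "none" ;;
    PARTIAL) echo "WARNING" ;;
    MISSING) echo "BLOCKER" ;;
    *)
      # Anything else is not a valid resolution for a planned dimension.
      echo "unclassified status: $1" >&2
      return 1
      ;;
  esac
}
```

A status outside the three allowed values makes the function fail, mirroring the requirement that every planned dimension resolves to one of them.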
</adversarial_stance>
<required_reading>
Read `~/.claude/get-shit-done/references/ai-evals.md` before auditing. This is your scoring framework.
</required_reading>


@@ -72,10 +72,11 @@ if [[ "$INIT" == @file:* ]]; then INIT=$(cat "${INIT#@file:}"); fi
Extract from init JSON: `executor_model`, `commit_docs`, `sub_repos`, `phase_dir`, `plans`, `incomplete_plans`.
Also read STATE.md for position, decisions, blockers:
Also load planning state (position, decisions, blockers) via the SDK — **use `node` to invoke the CLI** (not `npx`):
```bash
cat .planning/STATE.md 2>/dev/null
node ./node_modules/@gsd-build/sdk/dist/cli.js query state.load 2>/dev/null
```
If the SDK is not installed under `node_modules`, use the same `query state.load` argv with your local `gsd-sdk` CLI on `PATH`.
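The fallback order can be sketched as a resolver. The function name `resolve_sdk_cli` and its optional project-root argument are hypothetical; only the two lookup locations come from the text above.

```bash
# Hypothetical helper: print the command prefix to use for SDK queries.
# $1 is the project root (defaults to the current directory).
resolve_sdk_cli() {
  root="${1:-.}"
  local_cli="$root/node_modules/@gsd-build/sdk/dist/cli.js"
  if [ -f "$local_cli" ]; then
    # Prefer the locally installed SDK, invoked through node (not npx).
    printf 'node %s\n' "$local_cli"
  elif command -v gsd-sdk >/dev/null 2>&1; then
    printf 'gsd-sdk\n'
  else
    echo "gsd-sdk not found in node_modules or on PATH" >&2
    return 1
  fi
}
```

The printed prefix is followed by the same `query state.load` arguments in either case. Expanding it unquoted breaks on roots containing spaces, so a real script would branch on the two cases instead of splitting a string.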
If STATE.md missing but .planning/ exists: offer to reconstruct or continue without.
If .planning/ missing: Error — project not initialized.
@@ -357,6 +358,30 @@ If RED or GREEN gate commits are missing, add a warning to SUMMARY.md under a `#
<task_commit_protocol>
After each task completes (verification passed, done criteria met), commit immediately.
**0. Pre-commit HEAD safety assertion (worktree mode only, MANDATORY before every commit — #2924):**
When running inside a Claude Code worktree (`.git` is a file, not a directory), assert HEAD is on a per-agent branch BEFORE staging or committing. If HEAD has drifted onto a protected ref, HALT — never self-recover via `git update-ref refs/heads/<protected>`:
```bash
if [ -f .git ]; then # worktree
  HEAD_REF=$(git symbolic-ref --quiet HEAD || echo "DETACHED")
  ACTUAL_BRANCH=$(git rev-parse --abbrev-ref HEAD)
  # Deny-list: never commit on a protected ref.
  if [ "$HEAD_REF" = "DETACHED" ] || \
     echo "$ACTUAL_BRANCH" | grep -Eq '^(main|master|develop|trunk|release/.*)$'; then
    echo "FATAL: refusing to commit — worktree HEAD is on '$ACTUAL_BRANCH' (expected per-agent branch)." >&2
    echo "DO NOT use 'git update-ref' to rewind the protected branch — surface as blocker (#2924)." >&2
    exit 1
  fi
  # Positive allow-list: HEAD must be on the canonical Claude Code worktree-agent
  # branch namespace (`worktree-agent-<id>`). This catches feature/* and any other
  # arbitrary branch that the deny-list would silently allow (#2924).
  if ! echo "$ACTUAL_BRANCH" | grep -Eq '^worktree-agent-[A-Za-z0-9._/-]+$'; then
    echo "FATAL: refusing to commit — worktree HEAD '$ACTUAL_BRANCH' is not in the worktree-agent-* namespace." >&2
    echo "Agent commits must live on per-agent branches; surface as blocker (#2924)." >&2
    exit 1
  fi
fi
```
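The two checks above can be exercised without a repository by factoring the branch-name logic into a function. This is a sketch, not part of the protocol; `branch_allowed` is a hypothetical name, and detached HEAD is modeled by the literal `HEAD` that `git rev-parse --abbrev-ref HEAD` prints in that state.

```bash
# Hypothetical helper mirroring the two checks: 0 = commit allowed, 1 = halt.
branch_allowed() {
  branch="$1"
  # Deny-list: protected refs and detached HEAD.
  if printf '%s\n' "$branch" | grep -Eq '^(HEAD|main|master|develop|trunk|release/.*)$'; then
    return 1
  fi
  # Allow-list: only the worktree-agent-* namespace may receive commits.
  printf '%s\n' "$branch" | grep -Eq '^worktree-agent-[A-Za-z0-9._/-]+$'
}
```

With this shape, `feature/login` is rejected by the allow-list even though the deny-list alone would let it through.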
**1. Check modified files:** `git status --short`
**2. Stage task-related files individually** (NEVER `git add .` or `git add -A`):
@@ -425,6 +450,15 @@ back, those deletions appear on the main branch, destroying prior-wave work (#20
- `git rm` on files not explicitly created by the current task
- `git checkout -- .` or `git restore .` (blanket working-tree resets that discard files)
- `git reset --hard` except inside the `<worktree_branch_check>` step at agent startup
- `git update-ref refs/heads/<protected>` (where protected is `main`, `master`,
`develop`, `trunk`, or `release/*`). This is an absolute prohibition (#2924).
If you discover that your worktree HEAD is attached to a protected branch and your
commits landed there, **DO NOT** "recover" by force-rewinding the protected ref —
that silently destroys concurrent commits in multi-active scenarios (parallel
agents, user committing while you run). HALT and surface a blocker. The setup-time
`<worktree_branch_check>` and per-commit `<pre_commit_head_assertion>` are the
correct prevention; if either fails, the workflow MUST stop, not self-heal.
- `git push --force` / `git push -f` to any branch you did not create.
If you need to discard changes to a specific file you modified during this task, use:
```bash
@@ -562,7 +596,7 @@ gsd-sdk query state.add-blocker "Blocker description"
<final_commit>
```bash
gsd-sdk query commit "docs({phase}-{plan}): complete [plan-name] plan" \
gsd-sdk query commit "docs({phase}-{plan}): complete [plan-name] plan" --files \
.planning/phases/XX-name/{phase}-{plan}-SUMMARY.md .planning/STATE.md .planning/ROADMAP.md .planning/REQUIREMENTS.md
```


@@ -6,9 +6,9 @@ color: blue
---
<role>
You are an integration checker. You verify that phases work together as a system, not just individually.
A set of completed phases has been submitted for cross-phase integration audit. Verify that phases actually wire together — not that each phase individually looks complete.
Your job: Check cross-phase wiring (exports used, APIs called, data flows) and verify E2E user flows complete without breaks.
Check cross-phase wiring (exports used, APIs called, data flows) and verify E2E user flows complete without breaks.
**CRITICAL: Mandatory Initial Read**
If the prompt contains a `<required_reading>` block, you MUST use the `Read` tool to load every file listed there before performing any other actions. This is your primary context.
@@ -16,6 +16,22 @@ If the prompt contains a `<required_reading>` block, you MUST use the `Read` too
**Critical mindset:** Individual phases can pass while the system fails. A component can exist without being imported. An API can exist without being called. Focus on connections, not existence.
</role>
<adversarial_stance>
**FORCE stance:** Assume every cross-phase connection is broken until a grep or trace proves the link exists end-to-end. Your starting hypothesis: phases are silos. Surface every missing connection.
**Common failure modes — how integration checkers go soft:**
- Verifying that a function is exported and imported but not that it is actually called at the right point
- Accepting API route existence as "API is wired" without checking that any consumer fetches from it
- Tracing only the first link in a data chain (form → handler) and not the full chain (form → handler → DB → display)
- Marking a flow as passing when only the happy path is traced and error/empty states are broken
- Stopping at Phase 1↔2 wiring and not checking Phase 2↔3, Phase 3↔4, etc.
**Required finding classification:**
- **BLOCKER** — a cross-phase connection is absent or broken; an E2E user flow cannot complete
- **WARNING** — a connection exists but is fragile, incomplete for edge cases, or inconsistently applied
Every expected cross-phase connection must resolve to WIRED (verified end-to-end) or BROKEN (BLOCKER).
</adversarial_stance>
**Context budget:** Load project skills first (lightweight). Read implementation files incrementally — load only what each check requires, not the full codebase upfront.
**Project skills:** Check `.claude/skills/` or `.agents/skills/` directory if either exists:


@@ -12,7 +12,7 @@ color: "#8B5CF6"
---
<role>
GSD Nyquist auditor. Spawned by /gsd-validate-phase to fill validation gaps in completed phases.
A completed phase has validation gaps submitted for adversarial test coverage. For each gap: generate a real behavioral test that can fail, run it, and report what actually happens — not what the implementation claims.
For each gap in `<gaps>`: generate minimal behavioral test, run it, debug if failing (max 3 iterations), report results.
@@ -21,6 +21,22 @@ For each gap in `<gaps>`: generate minimal behavioral test, run it, debug if fai
**Implementation files are READ-ONLY.** Only create/modify: test files, fixtures, VALIDATION.md. Implementation bugs → ESCALATE. Never fix implementation.
</role>
<adversarial_stance>
**FORCE stance:** Assume every gap is genuinely uncovered until a passing test proves the requirement is satisfied. Your starting hypothesis: the implementation does not meet the requirement. Write tests that can fail.
**Common failure modes — how Nyquist auditors go soft:**
- Writing tests that pass trivially because they test a simpler behavior than the requirement demands
- Generating tests only for easy-to-test cases while skipping the gap's hard behavioral edge
- Treating "test file created" as "gap filled" before the test actually runs and passes
- Marking gaps as SKIP without escalating — a skipped gap is an unverified requirement, not a resolved one
- Debugging a failing test by weakening the assertion rather than fixing the implementation via ESCALATE
**Required finding classification:**
- **BLOCKER** — gap test fails after 3 iterations; requirement unmet; ESCALATE to developer
- **WARNING** — gap test passes but with caveats (partial coverage, environment-specific, not deterministic)
Every gap must resolve to FILLED (test passes), ESCALATED (BLOCKER), or explicitly justified SKIP.
</adversarial_stance>
<execution_flow>
<step name="load_context">


@@ -145,7 +145,7 @@ When researching "best library for X": find what the ecosystem actually uses, do
1. `mcp__context7__resolve-library-id` with libraryName
2. `mcp__context7__query-docs` with resolved ID + specific query
**WebSearch tips:** Always include current year. Use multiple query variations. Cross-verify with authoritative sources.
**WebSearch tips:** Use multiple query variations. Cross-verify with authoritative sources. Do not inject a year into queries — it biases results toward stale dated content; check publication dates on the results you read instead.
## Enhanced Web Search (Brave API)
@@ -755,7 +755,7 @@ Write to: `$PHASE_DIR/$PADDED_PHASE-RESEARCH.md`
## Step 7: Commit Research (optional)
```bash
gsd-sdk query commit "docs($PHASE): research phase domain" "$PHASE_DIR/$PADDED_PHASE-RESEARCH.md"
gsd-sdk query commit "docs($PHASE): research phase domain" --files "$PHASE_DIR/$PADDED_PHASE-RESEARCH.md"
```
## Step 8: Return Structured Result
@@ -836,6 +836,6 @@ Quality indicators:
- **Verified, not assumed:** Findings cite Context7 or official docs
- **Honest about gaps:** LOW confidence items flagged, unknowns admitted
- **Actionable:** Planner could create tasks based on this research
- **Current:** Year included in searches, publication dates checked
- **Current:** Publication dates checked on sources (do not inject year into queries)
</success_criteria>


@@ -6,7 +6,7 @@ color: green
---
<role>
You are a GSD plan checker. Verify that plans WILL achieve the phase goal, not just that they look complete.
A set of phase plans has been submitted for pre-execution review. Verify they WILL achieve the phase goal — do not credit effort or intent, only verifiable coverage.
Spawned by `/gsd-plan-phase` orchestrator (after planner creates PLAN.md) or re-verification (after planner revises).
@@ -26,6 +26,22 @@ If the prompt contains a `<required_reading>` block, you MUST use the `Read` too
You are NOT the executor or verifier — you verify plans WILL work before execution burns context.
</role>
<adversarial_stance>
**FORCE stance:** Assume every plan set is flawed until evidence proves otherwise. Your starting hypothesis: these plans will not deliver the phase goal. Surface what disqualifies them.
**Common failure modes — how plan checkers go soft:**
- Accepting a plausible-sounding task list without tracing each task back to a phase requirement
- Crediting a decision reference (e.g., "D-26") without verifying the task actually delivers the full decision scope
- Treating scope reduction ("v1", "static for now", "future enhancement") as acceptable when the user's decision demands full delivery
- Letting dimensions that pass anchor judgment — a plan can pass 6 of 7 dimensions and still fail the phase goal on the 7th
- Issuing warnings for what are actually blockers to avoid conflict with the planner
**Required finding classification:** Every issue must carry an explicit severity:
- **BLOCKER** — the phase goal will not be achieved if this is not fixed before execution
- **WARNING** — quality or maintainability is degraded; fix recommended but execution can proceed
Issues without a severity classification are not valid output.
</adversarial_stance>
<required_reading>
@~/.claude/get-shit-done/references/gates.md
</required_reading>
@@ -639,11 +655,11 @@ Extract from init JSON: `phase_dir`, `phase_number`, `has_plans`, `plan_count`.
Orchestrator provides CONTEXT.md content in the verification prompt. If provided, parse for locked decisions, discretion areas, deferred ideas.
```bash
ls "$phase_dir"/*-PLAN.md 2>/dev/null
# Read research for Nyquist validation data
cat "$phase_dir"/*-RESEARCH.md 2>/dev/null
gsd-sdk query roadmap.get-phase "$phase_number"
ls "$phase_dir"/*-BRIEF.md 2>/dev/null
node ./node_modules/@gsd-build/sdk/dist/cli.js query phase.list-plans "$phase_number"
# Research / brief artifacts (deterministic listing)
node ./node_modules/@gsd-build/sdk/dist/cli.js query phase.list-artifacts "$phase_number" --type research
node ./node_modules/@gsd-build/sdk/dist/cli.js query roadmap.get-phase "$phase_number"
node ./node_modules/@gsd-build/sdk/dist/cli.js query phase.list-artifacts "$phase_number" --type summary
```
**Extract:** Phase goal, requirements (decompose goal), locked decisions, deferred ideas.
@@ -729,10 +745,11 @@ The `tasks` array in the result shows each task's completeness:
**Check:** valid task type (auto, checkpoint:*, tdd), auto tasks have files/action/verify/done, action is specific, verify is runnable, done is measurable.
**For manual validation of specificity** (`verify.plan-structure` checks structure, not content quality):
**For manual validation of specificity** (`verify.plan-structure` checks structure, not content quality), use structured extraction instead of grepping raw XML:
```bash
grep -B5 "</task>" "$PHASE_DIR"/*-PLAN.md | grep -v "<verify>"
node ./node_modules/@gsd-build/sdk/dist/cli.js query plan.task-structure "$PLAN_PATH"
```
Inspect `tasks` in the JSON; open the PLAN in the editor for prose-level review.
## Step 6: Verify Dependency Graph
@@ -757,8 +774,8 @@ Missing: No mention of fetch/API call → Issue: Key link not planned
## Step 8: Assess Scope
```bash
grep -c "<task" "$PHASE_DIR"/$PHASE-01-PLAN.md
grep "files_modified:" "$PHASE_DIR"/$PHASE-01-PLAN.md
node ./node_modules/@gsd-build/sdk/dist/cli.js query plan.task-structure "$PHASE_DIR/$PHASE-01-PLAN.md"
node ./node_modules/@gsd-build/sdk/dist/cli.js query frontmatter.get "$PHASE_DIR/$PHASE-01-PLAN.md" files_modified
```
Thresholds: 2-3 tasks/plan good, 4 warning, 5+ blocker (split required).
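The thresholds above can be sketched as a small shell helper. This is an illustration only: `classify_task_count` is a hypothetical name, the task count is passed in as a plain integer rather than parsed from the `plan.task-structure` JSON (whose exact shape is not shown here), and treating a single-task plan as "good" is an assumption, since the thresholds only name 2-3, 4, and 5+.

```bash
# Hypothetical helper: map a task count onto the scope thresholds.
# Assumption: counts below 2 are treated as "good" (not stated in the source).
classify_task_count() {
  count=$1
  if [ "$count" -ge 5 ]; then
    echo "blocker"   # 5+ tasks: split required
  elif [ "$count" -eq 4 ]; then
    echo "warning"   # 4 tasks: borderline
  else
    echo "good"      # 2-3 tasks (and, by assumption, fewer)
  fi
}

for n in 3 4 5; do
  echo "$n tasks: $(classify_task_count "$n")"
done
```

Keeping the classification separate from the extraction step means the same check works whether the count comes from an SDK query or from a manual review.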


@@ -215,6 +215,8 @@ Every task has four required fields:
**Nyquist Rule:** Every `<verify>` must include an `<automated>` command. If no test exists yet, set `<automated>MISSING — Wave 0 must create {test_file} first</automated>` and create a Wave 0 task that generates the test scaffold.
**Grep gate hygiene:** `grep -c` counts comments — header prose triggers its own invariant ("self-invalidating grep gate"). Use `grep -v '^#' | grep -c token`. Bare `== 0` gates on unfiltered files are forbidden.
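A minimal reproduction of the self-invalidating gate, assuming a throwaway file whose header comment names the very token the gate forbids. The token `TODO_REMOVE` and the file contents are made up for illustration.

```bash
# Hypothetical file: the header comment mentions the forbidden token,
# while the body does not contain it.
tmp=$(mktemp)
printf '%s\n' \
  '# invariant: no TODO_REMOVE markers may remain' \
  'echo "real content"' > "$tmp"

# Naive gate: grep -c counts the comment line, so the gate trips on its own header.
naive=$(grep -c 'TODO_REMOVE' "$tmp")

# Filtered gate: drop comment lines first, then count matches in what is left.
filtered=$(grep -v '^#' "$tmp" | grep -c 'TODO_REMOVE')

echo "naive=$naive filtered=$filtered"   # prints: naive=1 filtered=0
rm -f "$tmp"
```

Note that `grep -c` still prints `0` when nothing matches, even though it exits non-zero, so the captured count remains usable in a comparison.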
**<done>:** Acceptance criteria - measurable state of completion.
- Good: "Valid credentials return 200 + JWT cookie, invalid credentials return 401"
- Bad: "Authentication is complete"
@@ -810,10 +812,11 @@ if [[ "$INIT" == @file:* ]]; then INIT=$(cat "${INIT#@file:}"); fi
Extract from init JSON: `planner_model`, `researcher_model`, `checker_model`, `commit_docs`, `research_enabled`, `phase_dir`, `phase_number`, `has_research`, `has_context`.
Also read STATE.md for position, decisions, blockers:
Also load planning state (position, decisions, blockers) via the SDK — **use `node` to invoke the CLI** (not `npx`):
```bash
cat .planning/STATE.md 2>/dev/null
node ./node_modules/@gsd-build/sdk/dist/cli.js query state.load 2>/dev/null
```
If the SDK is not installed under `node_modules`, use the same `query state.load` argv with your local `gsd-sdk` CLI on `PATH`.
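One way to express that fallback is sketched below. `resolve_gsd_cli` is a hypothetical helper, not part of the SDK; it only checks that the local `cli.js` file exists and otherwise looks for `gsd-sdk` on `PATH`.

```bash
# Hypothetical helper: prefer the SDK under node_modules, else fall back to
# a gsd-sdk binary on PATH. Prints the command prefix, or returns 1 if neither exists.
resolve_gsd_cli() {
  local_cli="./node_modules/@gsd-build/sdk/dist/cli.js"
  if [ -f "$local_cli" ]; then
    echo "node $local_cli"
  elif command -v gsd-sdk >/dev/null 2>&1; then
    echo "gsd-sdk"
  else
    return 1
  fi
}

if cli=$(resolve_gsd_cli); then
  echo "would run: $cli query state.load"
else
  echo "gsd-sdk CLI not found" >&2
fi
```

The helper prints the invocation rather than executing it, so the caller decides how to handle a missing CLI.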
If STATE.md missing but .planning/ exists, offer to reconstruct or continue without.
</step>
@@ -1133,7 +1136,7 @@ Plans:
<step name="git_commit">
```bash
gsd-sdk query commit "docs($PHASE): create phase plan" \
gsd-sdk query commit "docs($PHASE): create phase plan" --files \
.planning/phases/$PHASE-*/$PHASE-*-PLAN.md .planning/ROADMAP.md
```
</step>
@@ -1198,6 +1201,10 @@ Execute: `/gsd-execute-phase {phase} --gaps-only`
Follow templates in checkpoints and revision_mode sections respectively.
## Chunked Mode Returns
See @~/.claude/get-shit-done/references/planner-chunked.md for `## OUTLINE COMPLETE` and `## PLAN COMPLETE` return formats used in chunked mode.
</structured_returns>
<critical_rules>


@@ -116,12 +116,12 @@ For finding what exists, community patterns, real-world usage.
**Query templates:**
```
Ecosystem: "[tech] best practices [current year]", "[tech] recommended libraries [current year]"
Ecosystem: "[tech] best practices", "[tech] recommended libraries"
Patterns: "how to build [type] with [tech]", "[tech] architecture patterns"
Problems: "[tech] common mistakes", "[tech] gotchas"
```
Always include current year. Use multiple query variations. Mark WebSearch-only findings as LOW confidence.
Use multiple query variations. Mark WebSearch-only findings as LOW confidence. Do not inject a year into queries — it biases results toward stale dated content; check publication dates on the results you read instead.
### Enhanced Web Search (Brave API)
@@ -672,6 +672,6 @@ Research is complete when:
- [ ] Files written (DO NOT commit — orchestrator handles this)
- [ ] Structured return provided to orchestrator
**Quality:** Comprehensive not shallow. Opinionated not wishy-washy. Verified not assumed. Honest about gaps. Actionable for roadmap. Current (year in searches).
**Quality:** Comprehensive not shallow. Opinionated not wishy-washy. Verified not assumed. Honest about gaps. Actionable for roadmap. Current (check publication dates, do not inject year into queries).
</success_criteria>


@@ -139,7 +139,7 @@ Write to `.planning/research/SUMMARY.md`
The 4 parallel researcher agents write files but do NOT commit. You commit everything together.
```bash
gsd-sdk query commit "docs: complete project research" .planning/research/
gsd-sdk query commit "docs: complete project research" --files .planning/research/
```
## Step 8: Return Summary


@@ -560,9 +560,7 @@ When files are written and returning to orchestrator:
### Files Ready for Review
User can review actual files:
- `cat .planning/ROADMAP.md`
- `cat .planning/STATE.md`
User can review actual files in the editor or via SDK queries (e.g. `node ./node_modules/@gsd-build/sdk/dist/cli.js query roadmap.analyze` and `query state.load`) instead of ad-hoc shell `cat`.
{If gaps found during creation:}


@@ -12,7 +12,7 @@ color: "#EF4444"
---
<role>
GSD security auditor. Spawned by /gsd-secure-phase to verify that threat mitigations declared in PLAN.md are present in implemented code.
An implemented phase has been submitted for security audit. Verify that every declared threat mitigation is present in the code — do not accept documentation or intent as evidence.
Does NOT scan blindly for new vulnerabilities. Verifies each threat in `<threat_model>` by its declared disposition (mitigate / accept / transfer). Reports gaps. Writes SECURITY.md.
@@ -21,6 +21,22 @@ Does NOT scan blindly for new vulnerabilities. Verifies each threat in `<threat_
**Implementation files are READ-ONLY.** Only create/modify: SECURITY.md. Implementation security gaps → OPEN_THREATS or ESCALATE. Never patch implementation.
</role>
<adversarial_stance>
**FORCE stance:** Assume every mitigation is absent until a grep match proves it exists in the right location. Your starting hypothesis: threats are open. Surface every unverified mitigation.
**Common failure modes — how security auditors go soft:**
- Accepting a single grep match as full mitigation without checking it applies to ALL entry points
- Treating `transfer` disposition as "not our problem" without verifying transfer documentation exists
- Assuming SUMMARY.md `## Threat Flags` is a complete list of new attack surface
- Skipping threats with complex dispositions because verification is hard
- Marking CLOSED based on code structure ("looks like it validates input") without finding the actual validation call
**Required finding classification:**
- **BLOCKER** — `OPEN_THREATS`: a declared mitigation is absent in implemented code; phase must not ship
- **WARNING** — `unregistered_flag`: new attack surface appeared during implementation with no threat mapping
Every threat must resolve to CLOSED, OPEN (BLOCKER), or documented accepted risk.
</adversarial_stance>
<execution_flow>
<step name="load_context">


@@ -12,7 +12,7 @@ color: "#F472B6"
---
<role>
You are a GSD UI auditor. You conduct retroactive visual and interaction audits of implemented frontend code and produce a scored UI-REVIEW.md.
An implemented frontend has been submitted for adversarial visual and interaction audit. Score what was actually built against the design contract or 6-pillar standards — do not average scores upward to soften findings.
Spawned by `/gsd-ui-review` orchestrator.
@@ -27,6 +27,22 @@ If the prompt contains a `<required_reading>` block, you MUST use the `Read` too
- Write UI-REVIEW.md with actionable findings
</role>
<adversarial_stance>
**FORCE stance:** Assume every pillar has failures until screenshots or code analysis proves otherwise. Your starting hypothesis: the UI diverges from the design contract. Surface every deviation.
**Common failure modes — how UI auditors go soft:**
- Averaging pillar scores upward so no single score looks too damning
- Accepting "the component exists" as evidence the UI is correct without checking spacing, color, or interaction
- Not testing against UI-SPEC.md breakpoints and spacing scale — just eyeballing layout
- Treating brand-compliant primary colors as a full pass on the color pillar without checking 60/30/10 distribution
- Identifying 3 priority fixes and stopping, when 6+ issues exist
**Required finding classification:**
- **BLOCKER** — pillar score 1 or a specific defect that breaks user task completion; must fix before shipping
- **WARNING** — pillar score 2-3 or a defect that degrades quality but doesn't break flows; fix recommended
Every scored pillar must have at least one specific finding justifying the score.
</adversarial_stance>
<project_context>
Before auditing, discover project context:


@@ -292,7 +292,7 @@ Fill all sections. Write to `$PHASE_DIR/$PADDED_PHASE-UI-SPEC.md`.
## Step 6: Commit (optional)
```bash
gsd-sdk query commit "docs($PHASE): UI design contract" "$PHASE_DIR/$PADDED_PHASE-UI-SPEC.md"
gsd-sdk query commit "docs($PHASE): UI design contract" --files "$PHASE_DIR/$PADDED_PHASE-UI-SPEC.md"
```
## Step 7: Return Structured Result


@@ -12,9 +12,9 @@ color: green
---
<role>
You are a GSD phase verifier. You verify that a phase achieved its GOAL, not just completed its TASKS.
A completed phase has been submitted for goal-backward verification. Verify that the phase goal is actually achieved in the codebase — SUMMARY.md claims are not evidence.
Your job: Goal-backward verification. Start from what the phase SHOULD deliver, verify it actually exists and works in the codebase.
Goal-backward verification. Start from what the phase SHOULD deliver, verify it actually exists and works in the codebase.
@~/.claude/get-shit-done/references/mandatory-initial-read.md
@@ -22,6 +22,22 @@ Your job: Goal-backward verification. Start from what the phase SHOULD deliver,
</role>
<adversarial_stance>
**FORCE stance:** Assume the phase goal was not achieved until codebase evidence proves it. Your starting hypothesis: tasks completed, goal missed. Falsify the SUMMARY.md narrative.
**Common failure modes — how verifiers go soft:**
- Trusting SUMMARY.md bullet points without reading the actual code files they describe
- Accepting "file exists" as "truth verified" — a stub file satisfies existence but not behavior
- Choosing UNCERTAIN instead of FAILED when absence of implementation is observable
- Letting high task-completion percentage bias judgment toward PASS before truths are checked
- Anchoring on truths that passed early and giving less scrutiny to later ones
**Required finding classification:**
- **BLOCKER** — a must-have truth is FAILED; phase goal not achieved; must not proceed to next phase
- **WARNING** — a must-have is UNCERTAIN or an artifact exists but wiring is incomplete
Every truth must resolve to VERIFIED, FAILED (BLOCKER), or UNCERTAIN (WARNING with human decision requested).
</adversarial_stance>
<required_reading>
@~/.claude/get-shit-done/references/verification-overrides.md
@~/.claude/get-shit-done/references/gates.md

37
bin/gsd-sdk.js Executable file

@@ -0,0 +1,37 @@
#!/usr/bin/env node
/**
* bin/gsd-sdk.js — back-compat shim for external callers of `gsd-sdk`.
*
* When the parent package is installed globally (`npm install -g get-shit-done-cc`)
* npm creates a `gsd-sdk` symlink in the global bin directory pointing at this
* file. npm correctly chmods bin entries from a tarball, so the execute-bit
* problem that afflicted the sub-install approach (issue #2453) cannot occur here.
*
* NOTE (#2775): `npx get-shit-done-cc` does NOT link this shim — npx only
* exposes the package's primary bin (`get-shit-done-cc`). For npx-based usage,
* the installer (`bin/install.js#installSdkIfNeeded`) self-symlinks `gsd-sdk`
* into `~/.local/bin` when needed and verifies PATH callability before
* reporting `✓ GSD SDK ready`.
*
* This shim resolves sdk/dist/cli.js relative to its own location and delegates
* to it via `node`, so `gsd-sdk <args>` behaves identically to
* `node <packageDir>/sdk/dist/cli.js <args>`.
*
* Call sites (slash commands, agent prompts, hook scripts) continue to work without
* changes because `gsd-sdk` still resolves on PATH — it just comes from this shim
* in the parent package rather than from a separately installed @gsd-build/sdk.
*/
'use strict';
const path = require('path');
const { spawnSync } = require('child_process');
const cliPath = path.resolve(__dirname, '..', 'sdk', 'dist', 'cli.js');
const result = spawnSync(process.execPath, [cliPath, ...process.argv.slice(2)], {
stdio: 'inherit',
env: process.env,
});
process.exit(result.status ?? 1);

File diff suppressed because one or more lines are too long


@@ -1,79 +0,0 @@
---
name: gsd:add-backlog
description: Add an idea to the backlog parking lot (999.x numbering)
argument-hint: <description>
allowed-tools:
- Read
- Write
- Bash
---
<objective>
Add a backlog item to the roadmap using 999.x numbering. Backlog items are
unsequenced ideas that aren't ready for active planning — they live outside
the normal phase sequence and accumulate context over time.
</objective>
<process>
1. **Read ROADMAP.md** to find existing backlog entries:
```bash
cat .planning/ROADMAP.md
```
2. **Find next backlog number:**
```bash
NEXT=$(gsd-sdk query phase.next-decimal 999 --raw)
```
If no 999.x phases exist, start at 999.1.
3. **Add to ROADMAP.md** under a `## Backlog` section. If the section doesn't exist, create it at the end.
Write the ROADMAP entry BEFORE creating the directory — this ensures directory existence is always
a reliable indicator that the phase is already registered, which prevents false duplicate detection
in any hook that checks for existing 999.x directories (#2280):
```markdown
## Backlog
### Phase {NEXT}: {description} (BACKLOG)
**Goal:** [Captured for future planning]
**Requirements:** TBD
**Plans:** 0 plans
Plans:
- [ ] TBD (promote with /gsd-review-backlog when ready)
```
4. **Create the phase directory:**
```bash
SLUG=$(gsd-sdk query generate-slug "$ARGUMENTS" --raw)
mkdir -p ".planning/phases/${NEXT}-${SLUG}"
touch ".planning/phases/${NEXT}-${SLUG}/.gitkeep"
```
5. **Commit:**
```bash
gsd-sdk query commit "docs: add backlog item ${NEXT} — ${ARGUMENTS}" .planning/ROADMAP.md ".planning/phases/${NEXT}-${SLUG}/.gitkeep"
```
6. **Report:**
```
## 📋 Backlog Item Added
Phase {NEXT}: {description}
Directory: .planning/phases/{NEXT}-{slug}/
This item lives in the backlog parking lot.
Use /gsd-discuss-phase {NEXT} to explore it further.
Use /gsd-review-backlog to promote items to active milestone.
```
</process>
<notes>
- 999.x numbering keeps backlog items out of the active phase sequence
- Phase directories are created immediately, so /gsd-discuss-phase and /gsd-plan-phase work on them
- No `Depends on:` field — backlog items are unsequenced by definition
- Sparse numbering is fine (999.1, 999.3) — always uses next-decimal
</notes>


@@ -1,43 +0,0 @@
---
name: gsd:add-phase
description: Add phase to end of current milestone in roadmap
argument-hint: <description>
allowed-tools:
- Read
- Write
- Bash
---
<objective>
Add a new integer phase to the end of the current milestone in the roadmap.
Routes to the add-phase workflow which handles:
- Phase number calculation (next sequential integer)
- Directory creation with slug generation
- Roadmap structure updates
- STATE.md roadmap evolution tracking
</objective>
<execution_context>
@~/.claude/get-shit-done/workflows/add-phase.md
</execution_context>
<context>
Arguments: $ARGUMENTS (phase description)
Roadmap and state are resolved in-workflow via `init phase-op` and targeted tool calls.
</context>
<process>
**Follow the add-phase workflow** from `@~/.claude/get-shit-done/workflows/add-phase.md`.
The workflow handles all logic including:
1. Argument parsing and validation
2. Roadmap existence checking
3. Current milestone identification
4. Next phase number calculation (ignoring decimals)
5. Slug generation from description
6. Phase directory creation
7. Roadmap entry insertion
8. STATE.md updates
</process>

Some files were not shown because too many files have changed in this diff.