Commit Graph

2035 Commits

Author SHA1 Message Date
Jeremy McSpadden
0a049149e1 fix(sdk): decouple from build-from-source install, close #2441 #2453 (#2457)
* fix(sdk): decouple SDK from build-from-source install path, close #2441 and #2453

Ship sdk/dist prebuilt in the tarball and replace the npm-install-g
sub-install with a parent-package bin shim (bin/gsd-sdk.js). npm chmods
bin entries from a packed tarball correctly, eliminating the mode-644
failure (#2453) and the full class of NPM_CONFIG_PREFIX/ignore-scripts/
corepack/air-gapped failure modes that caused #2439 and #2441.

Changes:
- sdk/package.json: prepublishOnly runs `rm -rf dist && tsc && chmod +x
  dist/cli.js` (stale-build guard + execute-bit fix at publish time)
- package.json: add "gsd-sdk": "bin/gsd-sdk.js" bin entry; add sdk/dist
  to files so the prebuilt CLI ships in the tarball
- bin/gsd-sdk.js: new back-compat shim — resolves sdk/dist/cli.js relative
  to the package root and delegates via `node`, so all existing PATH call
  sites (slash commands, agents, hooks) continue to work unchanged (S1 shim)
- bin/install.js: replace installSdkIfNeeded() build-from-source + global-
  install dance with a dist-verify + chmod-in-place guard; delete
  resolveGsdSdk(), detectShellRc(), emitSdkFatal() helpers now unused
- .github/workflows/install-smoke.yml: add smoke-unpacked job that strips
  execute bit from sdk/dist/cli.js before install to reproduce the exact
  #2453 failure mode
- tests/bug-2441-sdk-decouple.test.cjs: new regression tests asserting all
  invariants (no npm install -g from sdk/, shim exists, sdk/dist in files,
  prepublishOnly has rm -rf + chmod)
- tests/bugs-1656-1657.test.cjs: update stale assertions that required
  build-from-source behavior (now asserts new prebuilt-dist invariants)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore(release): bump to 1.38.2, wire release.yml to build SDK dist

- Bump version 1.38.1 -> 1.38.2 for the #2441/#2453 fix shipped in 0f6903d.
- Add `build:sdk` script (`cd sdk && npm ci && npm run build`).
- `prepublishOnly` now runs hooks + SDK builds as a safety net.
- release.yml (rc + finalize): build SDK dist before `npm publish` so the
  published tarball always ships fresh `sdk/dist/` (kept gitignored).
- CHANGELOG: document 1.38.2 entry and `--sdk` flag semantics change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: build SDK dist before tests and smoke jobs

sdk/dist/ is gitignored (built fresh at publish time via release.yml),
but both the test suite and install-smoke jobs run `bin/install.js`
or `npm pack` against the checked-out tree where dist doesn't exist yet.

- test.yml: `npm run build:sdk` before `npm run test:coverage`, so tests
  that spawn `bin/install.js` don't hit `installSdkIfNeeded()`'s fatal
  missing-dist check.
- install-smoke.yml (both smoke and smoke-unpacked): build SDK before
  pack/chmod so the published tarball contains dist and the unpacked
  install has a file to strip exec-bit from.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(sdk): lift SDK runtime deps to parent so tarball install can resolve them

The SDK's runtime deps (ws, @anthropic-ai/claude-agent-sdk) live in
sdk/package.json, but sdk/node_modules is NOT shipped in the parent
tarball — only sdk/dist, sdk/src, sdk/prompts, and sdk/package.json are.
When a user runs `npm install -g get-shit-done-cc`, npm installs the
parent's node_modules but never runs `npm install` inside the nested
sdk/ directory.

Result: `node sdk/dist/cli.js` fails with ERR_MODULE_NOT_FOUND for 'ws'.
The smoke tarball job caught this; the unpacked variant masked it
because `npm install -g <dir>` copies the entire workspace including
sdk/node_modules (left over from `npm run build:sdk`).

Fix: declare the same deps in the parent package.json so they land in
<pkg>/node_modules, which Node's resolution walks up to from
<pkg>/sdk/dist/cli.js. Keep them declared in sdk/package.json too so
the SDK remains a self-contained package for standalone dev.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(lockfile): regenerate package-lock.json cleanly

The previous `npm install` run left the lockfile internally inconsistent
(resolved esbuild@0.27.7 referenced but not fully written), causing
`npm ci` to fail in CI with "Missing from lock file" errors.

Clean regen via rm + npm install fixes all three failed jobs
(test, smoke, smoke-unpacked), which were all hitting the same
`npm ci` sync check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(deps): remove unused esbuild + vitest from root devDependencies

Both were declared but never imported anywhere in the root package
(confirmed via grep of bin/, scripts/, tests/). They lived in sdk/
already, which is the only place they're actually used.

The transitive tree they pulled in (vitest → vite → esbuild 0.28 →
@esbuild/openharmony-arm64) was the root of the CI npm ci failures:
the openharmony platform package's `optional: true` flag was not being
applied correctly by npm 10 on Linux runners, causing EBADPLATFORM.

After removal: 800+ transitive packages → 155. Lockfile regenerated
cleanly. All 4170 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(sdk): pretest:coverage builds sdk; tighten shim test assertions

Add "pretest:coverage": "npm run build:sdk" so npm run test:coverage
works in clean checkouts where sdk/dist/ hasn't been built yet.

Tighten the two loose shim assertions in bug-2441-sdk-decouple.test.cjs:
- forwards-to test now asserts path.resolve() is called with the
  'sdk','dist','cli.js' path segments, not just substring presence
- node-invocation test now asserts spawnSync(process.execPath, [...])
  pattern, ruling out matches in comments or the shebang line

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: address PR review — pretest:coverage + tighten shim tests

Review feedback from trek-e on PR 2457:

1. pretest:coverage + pretest hooks now run `npm run build:sdk` so
   `npm run test[:coverage]` in a clean checkout produces the required
   sdk/dist/ artifacts before running the installer-dependent tests.
   CI already does this explicitly; local contributors benefit.

2. Shim tests in bug-2441-sdk-decouple.test.cjs tightened from loose
   substring matches (which would pass on comments/shebangs alone) to
   regex assertions on the actual path.resolve call, spawnSync with
   process.execPath, process.argv.slice(2), and process.exit pattern.
   These now provide real regression protection for #2453-class bugs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: correct CHANGELOG entry and add [1.38.2] reference link

Two issues in the 1.38.2 CHANGELOG entry:
- installSdkIfNeeded() was described as deleted but it still exists in
  bin/install.js (repurposed to verify sdk/dist/cli.js and fix execute bit).
  Corrected the description to say 'repurposes' rather than 'deletes'.
- The reference-link block at the bottom of the file was missing a [1.38.2]
  compare URL and [Unreleased] still pointed to v1.37.1...HEAD. Added the
  [1.38.2] link and updated [Unreleased] to compare/v1.38.2...HEAD.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(sdk): double-cast WorkflowConfig to Record for strict tsc build

TypeScript error on main (introduced in #2611) blocks `npm run build`
in sdk/, which now runs as part of this PR's tarball build path. Apply
the double-cast via `unknown` as the compiler suggests.

Same fix as #2622; can be dropped if that lands first.

* test: remove bug-2598 test obsoleted by SDK decoupling

The bug-2598 test guards the Windows CVE-2024-27980 fix in the old
build-from-source path (npm spawnSync with shell:true + formatSpawnFailure
diagnostics). This PR removes that entire code path — installSdkIfNeeded
no longer spawns npm, it just verifies the prebuilt sdk/dist/cli.js
shipped in the tarball.

The test asserts `installSdkIfNeeded.toString()` contains a
formatSpawnFailure helper. After decoupling, no such helper exists
(nothing to format — there's no spawn). Keeping the test would assert
invariants of the rejected architecture.

The original #2598 defect (silent failure of npm spawn on Windows) is
structurally impossible in the shim path: bin/gsd-sdk.js invokes
`node sdk/dist/cli.js` directly via child_process.spawn with an
explicit argv array. No .cmd wrapper, no shell delegation.

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Tom Boucher <trekkie@nomorestars.com>
2026-04-23 08:36:03 -04:00
Tom Boucher
a56707a07b fix(#2613): preserve STATE.md frontmatter on write path (option 2) (#2622)
* fix(#2613): preserve STATE.md frontmatter on write path (option 2)

`readModifyWriteStateMd` strips frontmatter before invoking the modifier,
so `syncStateFrontmatter` received body-only content and `existingFm`
was always `{}`. The preservation branch never fired, and every mutation
re-derived `status` (to `'unknown'` when body had no `Status:` line) and
`progress.*` (to 0/0 when the shipped milestone's phase directories were
archived), silently overwriting authoritative frontmatter values.

Option 2 — write-side analogue of #2495 READ fix: `buildStateFrontmatter`
reads the current STATE.md frontmatter from disk as a preservation
backstop. Status preserved when derived is `'unknown'` and existing is
non-unknown. Progress preserved when disk scan returns all zeros AND
existing has non-zero counts. Legitimate body-driven status changes and
non-zero disk counts still win.

Milestone/milestone_name already preserved via `getMilestoneInfo`'s
#2495 fix — regression test added to lock that in.

Adds 5 regression tests covering status preservation, progress
preservation, milestone preservation, legitimate status updates, and
disk-scan-wins-when-non-zero.

Closes #2613

* fix(sdk): double-cast WorkflowConfig to Record in loadGateConfig

TypeScript error on main (introduced in #2611) blocks the install-smoke
CI job: `WorkflowConfig` has no string index signature, so the direct
cast to `Record<string, unknown>` fails type-check. The SDK build fails,
`installSdkIfNeeded()` cannot install `gsd-sdk` from source, and the
smoke job reports a false-positive installer regression.

  src/query/check-decision-coverage.ts(236,16): error TS2352:
  Conversion of type 'WorkflowConfig' to type 'Record<string, unknown>'
  may be a mistake because neither type sufficiently overlaps with the
  other.

Apply the double-cast via `unknown` as the compiler suggests. Behavior
is unchanged — this was already a cast.
2026-04-23 08:22:42 -04:00
Tom Boucher
f30da8326a feat: add gates ensuring discuss-phase decisions are translated to plans and verified (closes #2492) (#2611)
* feat(#2492): add gates ensuring discuss-phase decisions are translated and verified

Two gates close the loop between CONTEXT.md `<decisions>` and downstream
work, fixing #2492:

- Plan-phase **translation gate** (BLOCKING). After requirements
  coverage, refuses to mark a phase planned when a trackable decision
  is not cited (by id `D-NN` or by 6+-word phrase) in any plan's
  `must_haves`, `truths`, or body. Failure message names each missed
  decision with id, category, text, and remediation paths.

- Verify-phase **validation gate** (NON-BLOCKING). Searches plans,
  SUMMARY.md, files modified, and recent commit subjects for each
  trackable decision. Misses are written to VERIFICATION.md as a
  warning section but do not change verification status. Asymmetry is
  deliberate — fuzzy-match miss should not fail an otherwise green
  phase.

Shared helper `parseDecisions()` lives in `sdk/src/query/decisions.ts`
so #2493 can consume the same parser.

Decisions opt out of both gates via `### Claude's Discretion` heading
or `[informational]` / `[folded]` / `[deferred]` tags.

Both gates skip silently when `workflow.context_coverage_gate=false`
(default `true`).

Closes #2492

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(#2492): make plan-phase decision gate actually block (review F1, F8, F9, F10, F15)

- F1: replace `${context_path}` with `${CONTEXT_PATH}` in the plan-phase
  gate snippet so the BLOCKING gate receives a non-empty path. The
  variable was defined in Step 4 (`CONTEXT_PATH=$(_gsd_field "$INIT" ...)`)
  and the gate snippet referenced the lowercase form, leaving the gate to
  run with an empty path argument and silently skip.
- F15: wrap the SDK call with `jq -e '.data.passed == true' || exit 1` so
  failure halts the workflow instead of being printed and ignored. The
  verify-phase counterpart deliberately keeps no exit-1 (non-blocking by
  design) and now carries an inline note documenting the asymmetry.
- F10: tag the JSON example fence as `json` and the options-list fence as
  `text` (MD040).
- F8/F9: anchor the heading-presence test regexes to `^## 13[a-z]?\\.` so
  prose substrings like "Requirements Coverage Gate" mentioned in body
  text cannot satisfy the assertion. Added two new regression tests
  (variable-name match, exit-1 guard) so a future revert is caught.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(#2492): tighten decision-coverage gates against false positives and config drift (review F3,F4,F5,F6,F7,F16,F18,F19)

- F3: forward `workstream` arg through both gate handlers so workstream-scoped
  `workflow.context_coverage_gate=false` actually skips. Added negative test
  that creates a workstream config disabling the gate while the root config
  has it enabled and asserts the workstream call is skipped.
- F4: restrict the plan-phase haystack to designated sections — front-matter
  `must_haves` / `truths` / `objective` plus body sections under headings
  matching `must_haves|truths|tasks|objective`. HTML comments and fenced
  code blocks are stripped before extraction so a commented-out citation or
  a literal example never counts as coverage. Verify-phase keeps the broader
  artifact-wide haystack by design (non-blocking).
- F5: reject decisions with fewer than 6 normalized words from soft-matching
  (previously only rejected when the resulting phrase was under 12 chars
  AFTER slicing — too lenient). Short decisions now require an explicit
  `D-NN` citation, with regression tests for the boundary.
- F6: walk every `*-SUMMARY.md` independently and use `matchAll` with the
  `/g` flag so multiple `files_modified:` blocks across multiple summaries
  are all aggregated. Previously only the first block in the concatenated
  string was parsed, silently dropping later plans' files.
- F7: validate every `files_modified` path stays inside `projectDir` after
  resolution (rejects absolute paths, `../` traversal). Cap each file read
  at 256 KB. Skipped paths emit a stderr warning naming the entry.
- F16: validate `workflow.context_coverage_gate` is boolean in
  `loadGateConfig`; warn loudly on numeric or other-shaped values and
  default to ON. Mirrors the schema-vs-loadConfig validation gap from
  #2609.
- F18: bump verify-phase `git log -n` cap from 50 to 200 so longer-running
  phases are not undercounted. Documented as a precision-vs-recall tradeoff
  appropriate for a non-blocking gate.
- F19: tighten `QueryResult` / `QueryHandler` to be parameterized
  (`<T = unknown>`). Drops the `as unknown as Record<string, unknown>`
  casts in the gate handlers and surfaces shape mismatches at compile time
  for callers that pass a typed `data` value.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(#2492): harden decisions parser and verify-phase glob (review F11,F12,F13,F14,F17,F20)

- F11: strip fenced code blocks from CONTEXT.md before searching for
  `<decisions>` so an example block inside ``` ``` is not mis-parsed.
- F12: accept tab-indented continuation lines (previously required a leading
  space) so decisions split with `\t` continue cleanly.
- F13: parse EVERY `<decisions>` block in the file via `matchAll`, not just
  the first. CONTEXT.md may legitimately carry more than one block.
- F14: `decisions.parse` handler now resolves a relative path against
  `projectDir` — symmetric with the gate handlers — and still accepts
  absolute paths.
- F17: replace `ls "${PHASE_DIR}"/*-CONTEXT.md | head -1` in verify-phase.md
  with a glob loop (ShellCheck SC2012 fix). Also avoids spawning an extra
  subprocess and survives filenames with whitespace.
- F20: extend the unicode quote-stripping in the discretion-heading match
  to cover U+2018/2019/201A/201B and the U+201C-F double-quote variants
  plus backtick, so any rendering of "Claude's Discretion" collapses to
  the same key.

Each fix has a regression test in `decisions.test.ts`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 00:26:53 -04:00
Tom Boucher
1a3d953767 feat: add unified post-planning gap checker (closes #2493) (#2610)
* feat: add unified post-planning gap checker (closes #2493)

Adds a unified post-planning gap checker as Step 13e of plan-phase.md.
After all plans are generated and committed, scans REQUIREMENTS.md and
CONTEXT.md <decisions> against every PLAN.md in the phase directory and
emits a single Source | Item | Status table.

Why
- The existing Requirements Coverage Gate (§13) blocks/re-plans on REQ
  gaps but emits two separate per-source signals. Issue #2493 asks for
  one unified report after planning so that requirements AND
  discuss-phase decisions slipping through are surfaced in one place
  before execution starts.

What
- New workflow.post_planning_gaps boolean config key, default true,
  added to VALID_CONFIG_KEYS, CONFIG_DEFAULTS, hardcoded.workflow, and
  cmdConfigSet (boolean validation).
- New get-shit-done/bin/lib/decisions.cjs — shared parser for
  CONTEXT.md <decisions> blocks (D-NN entries). Designed for reuse by
  the related #2492 plan/verify decision gates.
- New get-shit-done/bin/lib/gap-checker.cjs — parses REQUIREMENTS.md
  (checkbox + traceability table forms), reads CONTEXT.md decisions,
  walks PHASE_DIR/*-PLAN.md, runs word-boundary coverage detection
  (REQ-1 must not match REQ-10), formats a sorted report.
- New gsd-tools gap-analysis CLI command wired through gsd-tools.cjs.
- workflows/plan-phase.md gains §13e between §13d (commit plans) and
  §14 (Present Final Status). Existing §13 gate preserved — §13e is
  additive and non-blocking.
- sdk/prompts/workflows/plan-phase.md gets an equivalent
  post_planning_gaps step for headless mode.
- Docs: CONFIGURATION.md, references/planning-config.md, INVENTORY.md,
  INVENTORY-MANIFEST.json all updated.

Tests
- tests/post-planning-gaps-2493.test.cjs: 30 test cases covering step
  insertion position, decisions parser, gap detector behavior
  (covered/not-covered, false-positive guard, missing-file
  resilience, malformed-input resilience, gate on/off, deterministic
  natural sort), and full config integration.
- Full suite: 5234 / 5234 pass.

Design decisions
- Numbered §13e (sub-step), not §14 — §14 already exists (Present
  Final Status); inserting before it preserves downstream auto-advance
  step numbers.
- Existing §13 gate kept, not replaced — §13 blocks/re-plans on
  REQ gaps; §13e is the unified post-hoc report. Per spec: "default
  behavior MUST be backward compatible."
- Word-boundary ID matching avoids REQ-1 matching REQ-10 and avoids
  brittle semantic/substring matching.
- Shared decisions.cjs parser so #2492 can reuse the same regex.
- Natural-sort keys (REQ-02 before REQ-10) for deterministic output.
- Boolean validation in cmdConfigSet rejects non-boolean values
  matches the precedent set by drift_threshold/drift_action.

Closes #2493

* fix(#2493): expose post_planning_gaps in loadConfig() + sync schema example

Address CodeRabbit review on PR #2610:

- core.cjs loadConfig(): return post_planning_gaps from both the
  config.json branch and the global ~/.gsd/defaults.json fallback so
  callers can rely on config.post_planning_gaps regardless of whether
  the key is present (comment 3127977404, Major).
- docs/CONFIGURATION.md: add workflow.post_planning_gaps to the Full
  Schema JSON example so copy/paste users see the new toggle alongside
  security_block_on (comment 3127977392, Minor).
- tests/post-planning-gaps-2493.test.cjs: regression coverage for
  loadConfig() — default true when key absent, honors explicit
  true/false from workflow.post_planning_gaps.
2026-04-22 23:03:59 -04:00
Tom Boucher
cc17886c51 feat: make model profiles runtime-aware for Codex/non-Claude runtimes (closes #2517) (#2609)
* feat: make model profiles runtime-aware for Codex/non-Claude runtimes (closes #2517)

Adds an optional top-level `runtime` config key plus a
`model_profile_overrides[runtime][tier]` map. When `runtime` is set,
profile tiers (opus/sonnet/haiku) resolve to runtime-native model IDs
(and reasoning_effort where supported) instead of bare Claude aliases.

Codex defaults from the spec:
  opus   -> gpt-5.4        reasoning_effort: xhigh
  sonnet -> gpt-5.3-codex  reasoning_effort: medium
  haiku  -> gpt-5.4-mini   reasoning_effort: medium

Claude defaults mirror MODEL_ALIAS_MAP. Unknown runtimes fall back to
the Claude-alias safe default rather than emit IDs the runtime cannot
accept. reasoning_effort is only emitted into Codex install paths;
never returned from resolveModelInternal and never written to Claude
agent frontmatter.

Backwards compatible: any user without `runtime` set sees identical
behavior — the new branch is gated on `config.runtime != null`.

Precedence (highest to lowest):
  1. per-agent model_overrides
  2. runtime-aware tier resolution (when `runtime` is set)
  3. resolve_model_ids: "omit"
  4. Claude-native default
  5. inherit (literal passthrough)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(#2517): address adversarial review of #2609 (findings 1-16)

Addresses all 16 findings from the adversarial review of PR #2609.
Each finding is enumerated below with its resolution.

CRITICAL
- F1: readGsdRuntimeProfileResolver(targetDir) now probes per-project
  .planning/config.json AND ~/.gsd/defaults.json with per-project winning,
  so the PR's headline claim ("set runtime in project config and Codex
  TOML emit picks it up") actually holds end-to-end.
- F2: resolveTierEntry field-merges user overrides with built-in defaults.
  The CONFIGURATION.md string-shorthand example
    `{ codex: { opus: "gpt-5-pro" } }`
  now keeps reasoning_effort from the built-in entry. Partial-object
  overrides like `{ opus: { reasoning_effort: 'low' } }` keep the
  built-in model. Both paths regression-tested.

MAJOR
- F3: resolveReasoningEffortInternal gates strictly on the
  RUNTIMES_WITH_REASONING_EFFORT allowlist regardless of override
  presence. Override + unknown-runtime no longer leaks reasoning_effort.
- F4: runtime:"claude" is now a no-op for resolution (it is the implicit
  default). It no longer hijacks resolve_model_ids:"omit". Existing
  tests for `runtime:"claude"` returning Claude IDs were rewritten to
  reflect the no-op semantics; new test asserts the omit case returns "".
- F5: _readGsdConfigFile in install.js writes a stderr warning on JSON
  parse failure instead of silently returning null. Read failure and
  parse failure are warned separately. Library require is hoisted to top
  of install.js so it is not co-mingled with config-read failure modes.
- F6: install.js requires for core.cjs / model-profiles.cjs are hoisted
  to the top of the file with __dirname-based absolute paths so global
  npm install works regardless of cwd. Test asserts both lib paths exist
  relative to install.js __dirname.
- F7: docs/CONFIGURATION.md `runtime` row no longer lists `opencode` as
  a valid runtime — install-path emission for non-Codex runtimes is
  explicitly out of scope per #2517 / #2612, and the doc now points at
  #2612 for the follow-on work. resolveModelInternal still accepts any
  runtime string (back-compat) and falls back safely for unknown values.
- F8: Tests now isolate HOME (and GSD_HOME) to a per-test tmpdir so the
  developer's real ~/.gsd/defaults.json cannot bleed into assertions.
  Same pattern CodeRabbit caught on PRs #2603 / #2604.
- F9: `runtime` and `model_profile_overrides` documented as flat-only
  in core.cjs comments — not routed through `get()` because they are
  top-level keys per docs/CONFIGURATION.md and introducing nested
  resolution for two new keys was not worth the edge-case surface.
- F10/F13: loadConfig now invokes _warnUnknownProfileOverrides on the
  raw parsed config so direct .planning/config.json edits surface
  unknown runtime values (e.g. typo `runtime: "codx"`) and unknown
  tier values (e.g. `model_profile_overrides.codex.banana`) at read
  time. Warnings only — preserves back-compat for runtimes added
  later. Per-process warning cache prevents log spam across repeated
  loadConfig calls.

MINOR / NIT
- F11: Removed dead `tier || 'sonnet'` defensive shortcut. The local
  is now `const alias = tier;` with a comment explaining why `tier`
  is guaranteed truthy at that point (every MODEL_PROFILES entry
  defines `balanced`, the fallback profile).
- F12: Extracted resolveTierEntry() in core.cjs as the single source
  of truth for runtime-aware tier resolution. core.cjs and bin/install.js
  both consume it — no duplicated lookup logic between the two files.
- F14: Added regression tests for findings #1, #2, #3, #4, #6, #10, #13
  in tests/issue-2517-runtime-aware-profiles.test.cjs. Each must-fix
  path has a corresponding test that fails against the pre-fix code
  and passes against the post-fix code.
- F15: docs/CONFIGURATION.md `model_profile` row cross-references
  #1713 / #1806 next to the `adaptive` enum value.
- F16: RUNTIME_PROFILE_MAP remains in core.cjs as the single source of
  truth; install.js imports it through the exported resolveTierEntry
  helper rather than carrying its own copy. Doc files (CONFIGURATION.md,
  USER-GUIDE.md, settings.md) intentionally still embed the IDs as text
  — code comment in core.cjs flags that those doc files must be updated
  whenever the constant changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 23:00:37 -04:00
Tom Boucher
41dc475c46 refactor(workflows): extract discuss-phase modes/templates/advisor for progressive disclosure (closes #2551) (#2607)
* refactor(workflows): extract discuss-phase modes/templates/advisor for progressive disclosure (closes #2551)

Splits 1,347-line workflows/discuss-phase.md into a 495-line dispatcher plus
per-mode files in workflows/discuss-phase/modes/ and templates in
workflows/discuss-phase/templates/. Mirrors the progressive-disclosure
pattern that #2361 enforced for agents.

- Per-mode files: power, all, auto, chain, text, batch, analyze, default, advisor
- Templates lazy-loaded at the step that produces the artifact (CONTEXT.md
  template at write_context, DISCUSSION-LOG.md template at git_commit,
  checkpoint.json schema when checkpointing)
- Advisor mode gated behind `[ -f $HOME/.claude/get-shit-done/USER-PROFILE.md ]`
  — inverse of #2174's --advisor flag (don't pay the cost when unused)
- scout_codebase phase-type→map selection table extracted to
  references/scout-codebase.md
- New tests/workflow-size-budget.test.cjs enforces tiered budgets across
  all workflows/*.md (XL=1700 / LARGE=1500 / DEFAULT=1000) plus the
  explicit <500 ceiling for discuss-phase.md per #2551
- Existing tests updated to read from the new file locations after the
  split (functional equivalence preserved — content moved, not removed)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(#2607): align modes/auto.md check_existing with parent (Update it, not Skip)

CodeRabbit flagged drift between the parent step (which auto-selects "Update
it") and modes/auto.md (which documented "Skip"). The pre-refactor file had
both — line 182 said "Skip" in the overview, line 250 said "Update it" in the
actual step. The step is authoritative. Fix the new mode file to match.

Refs: PR #2607 review comment 3127783430

* test(#2607): harden discuss-phase regression tests after #2551 split

CodeRabbit identified four test smells where the split weakened coverage:

- workflow-size-budget: assertion was unreachable (entered if-block on match,
  then asserted occurrences === 0 — always failed). Now unconditional.
- bug-2549-2550-2552: bounded-read assertion checked concatenated source, so
  src.includes('3') was satisfied by unrelated content in scout-codebase.md
  (e.g., "3-5 most relevant files"). Now reads parent only with a stricter
  regex. Also asserts SCOUT_REF exists.
- chain-flag-plan-phase: filter(existsSync) silently skipped a missing
  modes/chain.md. Now fails loudly via explicit asserts.
- discuss-checkpoint: same silent-filter pattern across three sources. Now
  asserts each required path before reading.

Refs: PR #2607 review comments 3127783457, 3127783452, plus nitpicks for
chain-flag-plan-phase.test.cjs:21-24 and discuss-checkpoint.test.cjs:22-27

* docs(#2607): fix INVENTORY count, context.md placeholders, scout grep portability

- INVENTORY.md: subdirectory note said "50 top-level references" but the
  section header now says 51. Updated to 51.
- templates/context.md: footer hardcoded XX-name instead of declared
  placeholders [X]/[Name], which would leak sample text into generated
  CONTEXT.md files. Now uses the declared placeholders.
- references/scout-codebase.md: no-maps fallback used grep -rl with
  "\\|" alternation (GNU grep only — silent on BSD/macOS grep). Switched
  to grep -rlE with extended regex for portability.

Refs: PR #2607 review comments 3127783404, 3127783448, plus nitpick for
scout-codebase.md:32-40

* docs(#2607): label fenced examples + clarify overlay/advisor precedence

- analyze.md / text.md / default.md: add language tags (markdown/text) to
  fenced example blocks to silence markdownlint MD040 warnings flagged by
  CodeRabbit (one fence in analyze.md, two in text.md, five in default.md).
- discuss-phase.md: document overlay stacking rules in discuss_areas — fixed
  outer→inner order --analyze → --batch → --text, with a pointer to each
  overlay file for mode-specific precedence.
- advisor.md: add tie-breaker rules for NON_TECHNICAL_OWNER signals — explicit
  technical_background overrides inferred signals; otherwise OR-aggregate;
  contradictory explanation_depth values resolve by most-recent-wins.

Refs: PR #2607 review comments 3127783415, 3127783437, plus nitpicks for
default.md:24, discuss-phase.md:345-365, and advisor.md:51-56

* fix(#2607): extract codebase_drift_gate body to keep execute-phase under XL budget

PR #2605 added 80 lines to execute-phase.md (1622 -> 1702), pushing it over
the XL_BUDGET=1700 line cap enforced by tests/workflow-size-budget.test.cjs
(introduced by this PR). Per the test's own remediation hint and #2551's
progressive-disclosure pattern, extract the codebase_drift_gate step body to
get-shit-done/workflows/execute-phase/steps/codebase-drift-gate.md and leave
a brief pointer in the workflow. execute-phase.md is now 1633 lines.

Budget is NOT relaxed; the offending workflow is tightened.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 21:57:24 -04:00
Tom Boucher
220da8e487 feat: /gsd-settings-integrations — configure third-party search and review integrations (closes #2529) (#2604)
* feat(#2529): /gsd-settings-integrations — third-party integrations command

Adds /gsd-settings-integrations for configuring API keys, code-review CLI
routing, and agent-skill injection. Distinct from /gsd-settings (workflow
toggles) because these are connectivity, not pipeline shape.

Three sections:
- Search Integrations: brave_search / firecrawl / exa_search API keys,
  plus search_gitignored toggle.
- Code Review CLI Routing: review.models.{claude,codex,gemini,opencode}
  shell-command strings.
- Agent Skills Injection: agent_skills.<agent-type> free-text input,
  validated against [a-zA-Z0-9_-]+.

Security:
- New secrets.cjs module with ****<last-4> masking convention.
- cmdConfigSet now masks value/previousValue in CLI output for secret keys.
- Plaintext is written only to .planning/config.json; never echoed to
  stdout/stderr, never written to audit/log files by this flow.
- Slug validators reject path separators, whitespace, shell metacharacters.

Tests (tests/settings-integrations.test.cjs — 25 cases):
- Artifact presence / frontmatter.
- Field round-trips via gsd-tools config-set for all four search keys,
  review.models.<cli>, agent_skills.<agent-type>.
- Config-merge safety: unrelated keys preserved across writes.
- Masking: config-set output never contains plaintext sentinel.
- Logging containment: plaintext secret sentinel appears only in
  config.json under .planning/, nowhere else on disk.
- Negative: path-traversal, shell-metachar, and empty-slug rejected.
- /gsd:settings workflow mentions /gsd:settings-integrations.

Docs:
- docs/COMMANDS.md: new command entry with security note.
- docs/CONFIGURATION.md: integration settings section (keys, routing,
  skills injection) with masking documentation.
- docs/CLI-TOOLS.md: reviewer CLI routing and secret-handling sections.
- docs/INVENTORY.md + INVENTORY-MANIFEST.json regenerated.

Closes #2529

* fix(#2529): mask secrets in config-get; address CodeRabbit review

cmdConfigGet was emitting plaintext for brave_search/firecrawl/exa_search.
Apply the same isSecretKey/maskSecret treatment used by config-set so the
CLI surface never echoes raw API keys; plaintext still lives only in
config.json on disk.

Also addresses CodeRabbit review items in the same PR area:
- #3127146188: config-get plaintext leak (root fix above)
- #3127146211: rename test sentinels to concat-built markers so secret
  scanners stop flagging the test file. Behavior preserved.
- #3127146207: add explicit 'text' language to fenced code blocks (MD040).
- nitpick: unify masked-value wording in read_current legend
  ('****<last-4>' instead of '**** already set').
- nitpick: extend round-trip test to cover search_gitignored toggle.

New regression test 'config-get masks secrets and never echoes plaintext'
verifies the fix for all three secret keys.

* docs(#2529): bump INVENTORY counts post-rebase (commands 84→85, workflows 82→83)

* fix(test): bump CLI Modules count 27→28 after rebase onto main (CI #24811455435)

PR #2604 was rebased onto main before #2605 (drift.cjs) merged. The
pull_request CI runs against the merge ref (refs/pull/2604/merge),
which now contains 28 .cjs files in get-shit-done/bin/lib/, but
docs/INVENTORY.md headline still said "(27 shipped)".

inventory-counts.test.cjs failed with:
  AssertionError: docs/INVENTORY.md "CLI Modules (27 shipped)" disagrees
  with get-shit-done/bin/lib/ file count (28)

Rebased branch onto current origin/main (picks up drift.cjs row, which
was already added by #2605) and bumped the headline to 28.

Full suite: 5200/5200 pass.
2026-04-22 21:41:00 -04:00
Tom Boucher
c90081176d fix(#2598): pass shell: true to npm spawnSync on Windows (#2600)
* fix(#2598): pass shell: true to npm spawnSync on Windows

Since Node's CVE-2024-27980 fix (>= 18.20.2 / >= 20.12.2 / >= 21.7.3),
spawnSync refuses to launch .cmd/.bat files on Windows without
`shell: true`. installSdkIfNeeded picks npmCmd='npm.cmd' on win32 and
then calls spawnSync five times — every one returns
{ status: null, error: EINVAL } before npm ever runs. The installer
checks `status !== 0`, trips the failure path, and emits a bare
"Failed to `npm install` in sdk/." with zero diagnostic output because
`stdio: 'inherit'` never had a child to stream.

Every fresh install on Windows has failed at the SDK build step on any
supported Node version for the life of the post-CVE bin/install.js.

Introduce a local `spawnNpm(args, opts)` helper inside
installSdkIfNeeded that injects `shell: process.platform === 'win32'`
when the caller doesn't override it. Route all five npm invocations
through it: `npm install`, `npm run build`, `npm install -g .`, and
both `npm config get prefix` calls.

Adds a static regression test that parses installSdkIfNeeded and
asserts no bare `spawnSync(npmCmd, ...)` remains, a shell-aware
wrapper exists, and at least five invocations go through it.

Closes #2598

* fix(#2598): surface spawnSync diagnostics in SDK install fatal paths

Thread result.error / result.signal / result.status into emitSdkFatal for
the three npm failure branches (install, run build, install -g .) via a
formatSpawnFailure helper. The root cause of #2598 went silent precisely
because `{ status: null, error: EINVAL }` was reduced to a generic
"Failed to `npm install` in sdk/." with no diagnostic — stdio: 'inherit'
had no child process to stream and result.error was swallowed. Any future
regression in the same area (EINVAL, ENOENT, signal termination) now
prints its real cause in the red fatal banner.

Also strengthen the regression test so it cannot pass with only four
real npm call sites: the previous `spawnSync(npmCmd, ..., shell)` regex
double-counted the spawnNpm helper's own body when a helper existed.
Separate arrow-form vs function-form helper detection and exclude the
wrapper body from explicitShellNpm so the `>= 5` assertion reflects real
invocations only. Add a new test that asserts all three fatal branches
now reference formatSpawnFailure / result.error / signal / status.

Addresses CodeRabbit review comments on PR #2600:
- r3126987409 (bin/install.js): surface underlying spawnSync failure
- r3126987419 (test): explicitShellNpm overcounts by one via helper def
2026-04-22 21:23:44 -04:00
Tom Boucher
1a694fcac3 feat: auto-remap codebase after significant phase execution (closes #2003) (#2605)
* feat: auto-remap codebase after significant phase execution (#2003)

Adds a post-phase structural drift detector that compares the committed tree
against `.planning/codebase/STRUCTURE.md` and either warns or auto-remaps
the affected subtrees when drift exceeds a configurable threshold.

## Summary
- New `bin/lib/drift.cjs` — pure detector covering four drift categories:
  new directories outside mapped paths, new barrel exports at
  `(packages|apps)/*/src/index.*`, new migration files, and new route
  modules. Prioritizes the most-specific category per file.
- New `verify codebase-drift` CLI subcommand + SDK handler, registered as
  `gsd-sdk query verify.codebase-drift`.
- New `codebase_drift_gate` step in `execute-phase` between
  `schema_drift_gate` and `verify_phase_goal`. Non-blocking by contract —
  any error logs and the phase continues.
- Two new config keys: `workflow.drift_threshold` (int, default 3) and
  `workflow.drift_action` (`warn` | `auto-remap`, default `warn`), with
  enum/integer validation in `config-set`.
- `gsd-codebase-mapper` learns an optional `--paths <p1,p2,...>` scope hint
  for incremental remapping; agent/workflow docs updated.
- `last_mapped_commit` lives in YAML frontmatter on each
  `.planning/codebase/*.md` file; `readMappedCommit`/`writeMappedCommit`
  round-trip helpers ship in `drift.cjs`.

## Tests
- 55 new tests in `tests/drift-detection.test.cjs` covering:
  classification, threshold gating at 2/3/4 elements, warn vs. auto-remap
  routing, affected-path scoping, `--paths` sanitization (traversal,
  absolute, shell metacharacter rejection), frontmatter round-trip,
  defensive paths (missing STRUCTURE.md, malformed input, non-git repos),
  CLI JSON output, and documentation parity.
- Full suite: 5044 pass / 0 fail.

## Documentation
- `docs/CONFIGURATION.md` — rows for both new keys.
- `docs/ARCHITECTURE.md` — section on the post-execute drift gate.
- `docs/AGENTS.md` — `--paths` flag on `gsd-codebase-mapper`.
- `docs/USER-GUIDE.md` — user-facing behavior note + toggle commands.
- `docs/FEATURES.md` — new 27a section with REQ-DRIFT-01..06.
- `docs/INVENTORY.md` + `docs/INVENTORY-MANIFEST.json` — drift.cjs listed.
- `get-shit-done/workflows/execute-phase.md` — `codebase_drift_gate` step.
- `get-shit-done/workflows/map-codebase.md` — `parse_paths_flag` step.
- `agents/gsd-codebase-mapper.md` — `--paths` directive under parse_focus.

## Design decisions
- **Frontmatter over sidecar JSON** for `last_mapped_commit`: keeps the
  baseline attached to the file, survives git moves, survives per-doc
  regeneration, no extra file lifecycle.
- **Substring match against STRUCTURE.md** for `isPathMapped`: the map is
  free-form markdown, not a structured manifest; any mention of a path
  prefix counts as "mapped territory". Cheap, no parser, zero false
  negatives on reasonable maps.
- **Category priority migration > route > barrel > new_dir** so a file
  matching multiple rules counts exactly once at the most specific level.
- **Empty-tree SHA fallback** (`4b825dc6…`) when `last_mapped_commit` is
  absent — semantically correct (no baseline means everything is drift)
  and deterministic across repos.
- **Four layers of non-blocking** — detector try/catch, CLI try/catch, SDK
  handler try/catch, and workflow `|| echo` shell fallback. Any single
  layer failing still returns a valid skipped result.
- **SDK handler delegates to `gsd-tools.cjs`** rather than re-porting the
  detector to TypeScript, keeping drift logic in one canonical place.

Closes #2003

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(mapper): tag --paths fenced block as text (CodeRabbit MD040)

Comment 3127255172.

* docs(config): use /gsd- dash command syntax in drift_action row (CodeRabbit)

Comment 3127255180. Matches the convention used by every other command
reference in docs/CONFIGURATION.md.

* fix(execute-phase): initialize AGENT_SKILLS_MAPPER + tag fenced blocks

Two CodeRabbit findings on the auto-remap branch of the drift gate:

- 3127255186 (must-fix): the mapper Task prompt referenced
  ${AGENT_SKILLS_MAPPER} but only AGENT_SKILLS (for gsd-executor) is
  loaded at init_context (line 72). Without this fix the literal
  placeholder string would leak into the spawned mapper's prompt.
  Add an explicit gsd-sdk query agent-skills gsd-codebase-mapper step
  right before the Task spawn.
- 3127255183: tag the warn-message and Task() fenced code blocks as
  text to satisfy markdownlint MD040.

* docs(map-codebase): wire PATH_SCOPE_HINT through every mapper prompt

CodeRabbit (review id 4158286952, comment 3127255190) flagged that the
parse_paths_flag step defined incremental-remap semantics but did not
inject a normalized variable into the spawn_agents and sequential_mapping
mapper prompts, so incremental remap could silently regress to a
whole-repo scan.

- Define SCOPED_PATHS / PATH_SCOPE_HINT in parse_paths_flag.
- Inject ${PATH_SCOPE_HINT} into all four spawn_agents Task prompts.
- Document the same scope contract for sequential_mapping mode.

* fix(drift): writeMappedCommit tolerates missing target file

CodeRabbit (review id 4158286952, drift.cjs:349-355 nitpick) noted that
readMappedCommit returns null on ENOENT but writeMappedCommit threw — an
asymmetry that breaks first-time stamping of a freshly produced doc that
the caller has not yet written.

- Catch ENOENT on the read; treat absent file as empty content.
- Add a regression test that calls writeMappedCommit on a non-existent
  path and asserts the file is created with correct frontmatter.
  Test was authored to fail before the fix (ENOENT) and passes after.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 21:21:44 -04:00
Tom Boucher
9c0a153a5f feat: /gsd-settings-advanced — power-user config tuning command (closes #2528) (#2603)
* feat: /gsd-settings-advanced — power-user config tuning command (closes #2528)

Adds a second-tier interactive configuration command covering the power-user
knobs that don't belong in the common-case /gsd-settings prompt. Six sectioned
AskUserQuestion batches cover planning, execution, discussion, cross-AI, git,
and runtime settings (19 config keys total). Current values are pre-selected;
numeric fields reject non-numeric input; writes route through
gsd-sdk query config-set so unrelated keys are preserved.

- commands/gsd/settings-advanced.md — command entry
- get-shit-done/workflows/settings-advanced.md — six-section workflow
- get-shit-done/workflows/settings.md — advertise advanced command
- get-shit-done/bin/lib/config-schema.cjs — add context_window to VALID_CONFIG_KEYS
- docs/COMMANDS.md, docs/CONFIGURATION.md, docs/INVENTORY.md — docs + inventory
- tests/gsd-settings-advanced.test.cjs — 81 tests (files, frontmatter,
  field coverage, pre-selection, merge-preserves-siblings, VALID_CONFIG_KEYS
  membership, confirmation table, /gsd-settings cross-link, negative scenarios)

All 5073 tests pass; coverage 88.66% (>= 70% threshold).

* docs(settings-advanced): clarify per-field numeric bounds and label fenced blocks

Addresses CodeRabbit review on PR #2603:
- Numeric-input rule now states min is field-specific: plan_bounce_passes
  and max_discuss_passes require >= 1; other numeric fields accept >= 0.
  Resolves the inconsistency between the global rule and the field-level
  prompts (CodeRabbit comment 3127136557).
- Adds 'text' fence language to seven previously unlabeled code blocks in
  the workflow (six AskUserQuestion sections plus the confirmation banner)
  to satisfy markdownlint MD040 (CodeRabbit comment 3127136561).

* test(settings-advanced): tighten section assertion, fix misleading test name, add executable numeric-input coverage

Addresses CodeRabbit review on PR #2603:
- Required section list now asserts the full 'Runtime / Output' heading
  rather than the looser 'Runtime' substring (comment 3127136564).
- Renames the subagent_timeout coercion test to match the actual key
  under test (was titled 'context_window' but exercised
  workflow.subagent_timeout — comment 3127136573).
- Adds two executable behavioral tests at the config-set boundary
  (comment 3127136579):
  * Non-numeric input on a numeric key currently lands as a string —
    locks in that the workflow's AskUserQuestion re-prompt loop is the
    layer responsible for type rejection. If a future change adds CLI-side
    numeric validation, the assertion flips and the test surfaces it.
  * Numeric string on workflow.max_discuss_passes is coerced to Number —
    locks in the parser invariant for a second numeric key.
2026-04-22 20:50:15 -04:00
Tom Boucher
86c5863afb feat: add settings layers to /gsd-settings (Group A toggles) (closes #2527) (#2602)
* feat(#2527): add settings layers to /gsd:settings (Group A toggles)

Expand /gsd:settings from 14 to 22 settings, grouped into six visual
sections: Planning, Execution, Docs & Output, Features, Model & Pipeline,
Misc. Adds 8 new toggles:

  workflow.pattern_mapper, workflow.tdd_mode, workflow.code_review,
  workflow.code_review_depth (conditional on code_review=on),
  workflow.ui_review, commit_docs, intel.enabled, graphify.enabled

All 8 keys already existed in VALID_CONFIG_KEYS and docs/CONFIGURATION.md;
this wires them into the interactive flow, update_config write step,
~/.gsd/defaults.json persistence, and confirmation table.

Closes #2527

* test(#2527): tighten leaf-collision and rename mismatched negative test

Addresses CodeRabbit findings on PR #2602:

- comment 3127100796: leaf-only matching collapsed `intel.enabled` and
  `graphify.enabled` to a single `enabled` token, so one occurrence
  could satisfy both assertions. Replace with hasPathLike(), which
  requires each dotted segment to appear in order within a bounded
  window. Applied to both update_config and save_as_defaults blocks.

- comment 3127100798: the negative-test description claimed to verify
  invalid `code_review_depth` value rejection but actually exercised an
  unknown key path. Split into two suites with accurate names: one
  asserts settings.md constrains the depth options, the other asserts
  config-set rejects an unknown key path.

* docs(#2527): clarify resolved config path for /gsd-settings

Addresses CodeRabbit comment 3127100790 on PR #2602: the original line
implied a single `.planning/config.json` target, but settings updates
route to `.planning/workstreams/<active>/config.json` when a workstream
is active. Document both resolved paths so the merge target is
unambiguous.
2026-04-22 20:49:52 -04:00
Tom Boucher
1f2850c1a8 fix(#2597): expand dotted query tokens with trailing args (#2599)
resolveQueryArgv only expanded `init.execute-phase` → `init execute-phase`
when the tokens array had length 1. Argv like `init.execute-phase 1` has
length 2, skipped the expansion, and resolved to no registered handler.

All 50+ workflow files use the dotted form with arguments, so this broke
every non-argless query route (`init.execute-phase`, `state.update`,
`phase.add`, `milestone.complete`, etc.) at runtime.

Rename `expandSingleDottedToken` → `expandFirstDottedToken`: split only
the first token on its dots (guarding against `--` flags) and preserve
the tail as positional args. Identity comparison at the call site still
detects "no expansion" since we return the input array unchanged.

Adds regression tests for the three failure patterns reported:
`init.execute-phase 1`, `state.update status X`, `phase.add desc`.

Closes #2597
2026-04-22 17:30:08 -04:00
Tom Boucher
b35fdd51f3 Revert "feat(#2473): ship refuses to open PR when HANDOFF.json declares in-pr…" (#2596)
This reverts commit 7212cfd4de.
2026-04-22 12:57:12 -04:00
Fernando Castillo
7212cfd4de feat(#2473): ship refuses to open PR when HANDOFF.json declares in-progress work (#2553)
* feat(#2473): ship refuses to open PR when HANDOFF.json declares in-progress work

Add a preflight step to /gsd-ship that parses .planning/HANDOFF.json and
refuses to run git push + gh pr create when any remaining_tasks[].status
is not in the terminal set {done, cancelled, deferred_to_backend, wont_fix}.

Refusal names each blocking task and lists four resolutions (finish, mark
terminal, delete stale file, --force). Missing HANDOFF.json is a no-op so
projects that do not use /gsd-pause-work see no behavior change.

Also documents the terminal-statuses contract in references/artifact-types.md
and adds tests/ship-handoff-preflight.test.cjs to lock in the contract.

Closes #2473

* fix(#2473): capture node exit from $() so malformed HANDOFF.json hard-stops

Command substitution BLOCKING=$(node -e "...") discards the inner process
exit code, so a corrupted HANDOFF.json that fails JSON.parse would yield
empty BLOCKING and fall through silently to push_branch — the opposite of
what preflight is supposed to do.

Capture node's exit into HANDOFF_EXIT via $? immediately after the
assignment and branch on it. A non-zero exit is now a hard refusal with
the parser error printed on the preceding stderr line. --force does not
bypass this branch: if the file exists and can't be parsed, something is
wrong and the user should fix it (option 3 in the refusal message —
"Delete HANDOFF.json if it's stale" — still applies).

Verified with a tmp-dir simulation: captured exit 2, hard-stop fires
correctly on malformed JSON. Added a test case asserting the capture
($?) + branch (-ne 0) + parser exit (process.exit(2)) are all present,
so a future refactor can't silently reintroduce the bug.

Reported by @coderabbitai on PR #2553.
2026-04-22 12:11:31 -04:00
Tom Boucher
2b5c35cdb1 test(#2519): add regression test for sdk tarball dist inclusion (#2586)
* test(#2519): add regression test verifying sdk/package.json has files + prepublishOnly

Guards the sdk/package.json fix for #2519 (tarball shipped without dist/)
so future edits can't silently drop either the `files` whitelist or the
`prepublishOnly` build hook. Asserts:

- `files` is a non-empty array
- `files` includes "dist" (so compiled CLI ships in tarball)
- `scripts.prepublishOnly` runs a build (npm run build / tsc)
- `bin` target lives under dist/ (sanity tie-in)

Closes #2519

* test(#2519): accept valid npm glob variants for dist in files matcher

Addresses CodeRabbit nitpick: the previous equality check on 'dist' / 'dist/' /
'dist/**' would false-fail on other valid npm packaging forms like './dist',
'dist/**/*', or backslash-separated paths. Normalize each entry and use a
regex that accepts all common dist path variants.
2026-04-22 12:09:12 -04:00
Tom Boucher
73c1af5168 fix(#2543): replace legacy /gsd-<cmd> syntax with /gsd:<cmd> across all source files (#2595)
Commands are now installed as commands/gsd/<name>.md and invoked as
/gsd:<name> in Claude Code. The old hyphen form /gsd-<name> was still
hardcoded in hundreds of places across workflows, references, templates,
lib modules, and command files — causing "Unknown command" errors
whenever GSD suggested a command to the user.

Replace all /gsd-<cmd> occurrences where <cmd> is a known command name
(derived at runtime from commands/gsd/*.md) using a targeted Node.js
script. Agent names, tool names (gsd-sdk, gsd-tools), directory names,
and path fragments are not touched.

Adds regression test tests/bug-2543-gsd-slash-namespace.test.cjs that
enforces zero legacy occurrences going forward. Removes inverted
tests/stale-colon-refs.test.cjs (bug #1748) which enforced the now-obsolete
hyphen form; the new bug-2543 test supersedes it. Updates 5 assertion
tests that hardcoded the old hyphen form to accept the new colon form.

Closes #2543

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 12:04:25 -04:00
Tom Boucher
533973700c feat(#2538): add last: /cmd suffix to statusline (opt-in) (#2594)
Adds a `statusline.show_last_command` config toggle (default: false) that
appends ` │ last: /<cmd>` to the statusline, showing the most recently
invoked slash command in the current session.

The suffix is derived by tailing the active Claude Code transcript
(provided as transcript_path in the hook input) and extracting the last
<command-name> tag. Reads only the final 256 KiB to stay cheap per render.
Graceful degradation: missing transcript, no recorded command, unreadable
config, or parse errors all silently omit the suffix without breaking the
statusline.

Closes #2538
2026-04-22 12:04:21 -04:00
Tom Boucher
349daf7e6a fix(#2545): use word boundary in path replacement to catch ~/.claude without trailing slash (#2592)
The Copilot content converter only replaced `~/.claude/` and
`$HOME/.claude/` when followed by a literal `/`. Bare references
(e.g. `configDir = ~/.claude` at end of line) slipped through and
triggered the post-install "Found N unreplaced .claude path reference(s)"
warning, since the leak scanner uses `(?:~|$HOME)/\.claude\b`.

Switched both replacements to a `(\/|\b)` capture group so trailing-slash
and bare forms are handled in a single pass — matching the pattern
already used by Antigravity, OpenCode, Kilo, and Codex converters.

Closes #2545
2026-04-22 12:04:17 -04:00
Tom Boucher
6b7b5c15a5 fix(#2559): remove stale year injection from research agent web search instructions (#2591)
The gsd-phase-researcher and gsd-project-researcher agents instructed
WebSearch queries to always include 'current year' (e.g., 2024). As
time passes, a hardcoded year biases search results toward stale
dated content — users saw 2024-tagged queries producing stale blog
references in 2026.

Remove the year-injection guidance. Instead, rely on checking
publication dates on the returned sources. Query templates and
success criteria updated accordingly.

Closes #2559
2026-04-22 12:04:13 -04:00
Tom Boucher
67a9550720 fix(#2549,#2550,#2552): bound discuss-phase context reads, add phase-type map selection, prohibit split reads (#2590)
#2549: load_prior_context was reading every prior *-CONTEXT.md file,
growing linearly with project phase count. Cap to the 3 most recent
phases. If .planning/DECISIONS-INDEX.md exists, read that instead.

#2550: scout_codebase claimed to select maps "based on phase type" but
had no classifier — agents read all 7 maps. Replace with an explicit
phase-type-to-maps table (2–3 maps per phase type) with a Mixed fallback.

#2552: Add explicit instruction not to split-read the same file at two
different offsets. Split reads break prompt cache reuse and cost more
than a single full read.

Closes #2549
Closes #2550
Closes #2552

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 12:04:10 -04:00
Tom Boucher
fba040c72c fix(#2557): Gemini/Antigravity local hook commands use relative paths, not \$CLAUDE_PROJECT_DIR (#2589)
\$CLAUDE_PROJECT_DIR is Claude Code-specific. Gemini CLI doesn't set it, and on
Windows its path-join logic doubled the value producing unresolvable paths like
D:\Projects\GSD\'D:\Projects\GSD'. Gemini runs project hooks with project root
as cwd, so bare relative paths (e.g. node .gemini/hooks/gsd-check-update.js)
are cross-platform and correct. Claude Code and others still use the env var.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 12:04:06 -04:00
Tom Boucher
7032f44633 fix(#2544): exit 1 on missing key in config-get (#2588)
The configGet query handler previously threw GSDError with
ErrorClassification.Validation, which maps to exit code 10. Callers
using `if ! gsd-sdk query config-get key; then fallback; fi` could
not detect missing keys through the exit code alone, because exit 10
is still truthy-failure but the intent (and documented UNIX
convention — cf. `git config --get`) is exit 1 for absent key.

Change the classification for the two 'Key not found' throw sites to
ErrorClassification.Execution so the CLI exits 1 on missing key.
Usage/schema errors (no key argument, malformed JSON, missing
config.json) remain Validation.

Closes #2544
2026-04-22 12:04:03 -04:00
Tom Boucher
2404b40a15 fix(#2555): SDK agent-skills reads config.agent_skills and returns <agent_skills> block (#2587)
The SDK query handler `agent-skills` previously scanned every skill
directory on the filesystem and returned a flat JSON list, ignoring
`config.agent_skills[agentType]` entirely. Workflows that interpolate
$(gsd-sdk query agent-skills <type>) into Task() prompts got a JSON
dump of all skills instead of the documented <agent_skills> block.

Port `buildAgentSkillsBlock` semantics from
get-shit-done/bin/lib/init.cjs into the SDK handler:

- Read config.agent_skills[agentType] via loadConfig()
- Support single-string and array forms
- Validate each project-relative path stays inside the project root
  (symlink-aware, mirrors security.cjs#validatePath)
- Support `global:<name>` prefix for ~/.claude/skills/<name>/
- Skip entries whose SKILL.md is missing, with a stderr warning
- Return the exact string block workflows embed:
  <agent_skills>\nRead these user-configured skills:\n- @.../SKILL.md\n</agent_skills>
- Empty string when no agent type, no config, or nothing valid — matches
  gsd-tools.cjs cmdAgentSkills output.
2026-04-22 12:03:59 -04:00
Tom Boucher
0d6349a6c1 fix(#2554): preserve leading zero in getMilestonePhaseFilter (#2585)
The normalization `replace(/^0+/, '')` over-stripped decimal phase IDs:
`"00.1"` collapsed to `".1"`, while the disk-side extractor yielded
`"0.1"` from `"00.1-<slug>"`. Set membership failed and inserted decimal
phases were silently excluded from every disk scan inside
`buildStateFrontmatter`, causing `state update` to rewind progress
counters.

Strip leading zeros only when followed by a digit
(`replace(/^0+(?=\d)/, '')`), preserving the zero before the decimal
point while keeping existing behavior for zero-padded integer IDs.

Closes #2554
2026-04-22 12:03:56 -04:00
Tom Boucher
c47a6a2164 fix: correct VALID_CONFIG_KEYS — remove internal state key, add missing public keys, migration hints (#2561)
* fix(#2530-2535): correct VALID_CONFIG_KEYS set — remove internal state key, add missing public keys, add migration hints

- Remove workflow._auto_chain_active from VALID_CONFIG_KEYS (internal runtime state, not user-settable) (#2530)
- Add hooks.workflow_guard to VALID_CONFIG_KEYS (read by gsd-workflow-guard.js hook, already documented) (#2531)
- Add workflow.ui_review to VALID_CONFIG_KEYS (read in autonomous.md via config-get) (#2532)
- Add workflow.max_discuss_passes to VALID_CONFIG_KEYS (read in discuss-phase.md via config-get) (#2533)
- Add CONFIG_KEY_SUGGESTIONS entries for sub_repos → planning.sub_repos and plan_checker → workflow.plan_check (#2535)
- Document workflow.ui_review and workflow.max_discuss_passes in docs/CONFIGURATION.md
- Clear INTERNAL_KEYS exemption in parity test (workflow._auto_chain_active removed from schema entirely)
- Add regression test file tests/bug-2530-valid-config-keys.test.cjs covering all 6 bugs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: align SDK VALID_CONFIG_KEYS with CJS — remove internal key, add missing public keys

- Remove workflow._auto_chain_active from SDK (internal runtime state, not user-settable)
- Add workflow.ui_review, workflow.max_discuss_passes, hooks.workflow_guard to SDK
- Add ui_review and max_discuss_passes to Full Schema example in CONFIGURATION.md

Resolves CodeRabbit review on #2561.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 11:28:25 -04:00
forfrossen
af2dba2328 fix(hooks): detect Claude Code via stdin session_id (closes #2520) (#2521)
* fix(hooks): detect Claude Code via stdin session_id, not filtered env (#2520)

The #2344 fix assumed `CLAUDECODE` would propagate to hook subprocesses.
On Claude Code v2.1.116 it doesn't — Claude Code applies a separate env
filter to PreToolUse hook commands that drops bare CLAUDECODE and
CLAUDE_SESSION_ID, keeping only CLAUDE_CODE_*-prefixed vars plus
CLAUDE_PROJECT_DIR. As a result every Edit/Write on an existing file
produced a redundant READ-BEFORE-EDIT advisory inside Claude Code.

Use `data.session_id` from the hook's stdin JSON as the primary Claude
Code signal (it's part of Claude Code's documented PreToolUse hook-input
schema). Keep CLAUDE_CODE_ENTRYPOINT / CLAUDE_CODE_SSE_PORT env checks
as propagation-verified fallbacks, and keep the legacy
CLAUDE_SESSION_ID / CLAUDECODE checks for back-compat and
future-proofing.

Add tests/bug-2520-read-guard-hook-subprocess-env.test.cjs, which spawns
the hook with an env mirroring the actual Claude Code hook-subprocess
filter. Extend the legacy test harnesses to also strip the
propagation-verified CLAUDE_CODE_* vars so positive-path tests keep
passing when the suite itself runs inside a Claude Code session (same
class of leak as #2370 / PR #2375, now covering the new detection
signals).

Non-Claude-host behavior (OpenCode / MiniMax) is unchanged: with no
`session_id` on stdin and no CLAUDE_CODE_* env var, the advisory still
fires.

Closes #2520

* test(2520): isolate session_id signal from env fallbacks in regression test

Per reviewer feedback (Copilot + CodeRabbit on #2521): the session_id
isolation test used the helper's default CLAUDE_CODE_ENTRYPOINT /
CLAUDE_CODE_SSE_PORT values, so the env fallback would rescue the skip
even if the primary `data.session_id` check regressed. Pass an explicit
env override that clears those fallbacks, so only the stdin `session_id`
signal can trigger the skip.

Other cases (env-only fallback, negative / non-Claude host) already
override env appropriately.

---------

Co-authored-by: forfrossen <forfrossensvart@gmail.com>
2026-04-22 10:41:58 -04:00
elfstrob
9b5397a30f feat(sdk): add queued_phases to init.manager (closes #2497) (#2514)
* feat(sdk): add queued_phases to init.manager (closes #2497)

Surfaces the milestone immediately AFTER the active one so the
/gsd-manager dashboard can preview upcoming phases without mixing
them into the active phases grid.

Changes:
- roadmap.ts: exports two new helpers
  - extractPhasesFromSection(section): parses phase number / name /
    goal / depends_on using the same pattern initManager uses for
    the active milestone, so queued phases have identical shape.
  - extractNextMilestoneSection(content, projectDir): resolves the
    current milestone via the STATE-first path (matching upstream
    PR #2508) then scans for the next ## milestone heading. Shipped
    milestones are stripped first so they can't shadow the real
    next. Returns null when the active milestone is the last one.
- init-complex.ts: initManager now exposes
  - queued_phases: Array<{ number, name, display_name, goal,
    depends_on, dep_phases, deps_display }>
  - queued_milestone_version: string | null
  - queued_milestone_name: string | null
  Existing phases array is unchanged — callers that only care about
  the active milestone see no behavior difference.

Scope note: PR #2508 (merged upstream 2026-04-21) superseded the
#2495 + #2496 portions of this branch's original submission. This
commit is the rebased remainder contributing only #2497 on top of
upstream's new helpers.

Test coverage (7 new tests, all passing):
- roadmap.test.ts: +5 tests
  - extractPhasesFromSection parses multiple phases with goal + deps
  - extractPhasesFromSection returns [] when no phase headings
  - extractNextMilestoneSection returns the milestone after the
    STATE-resolved active one
  - extractNextMilestoneSection returns null when active is last
  - extractNextMilestoneSection returns null when no version found
- init-complex.test.ts: +4 tests under `queued_phases (#2497)`
  - surfaces next milestone with version + name metadata
  - queued entries carry name / deps_display / display_name
  - queued phases are NOT mixed into active phases list
  - returns [] + nulls when active is the last milestone

All 51 tests in roadmap.test.ts + init-complex.test.ts pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(workflows): render queued_phases section in /gsd-manager dashboard

Surfaces the new `queued_phases` / `queued_milestone_version` /
`queued_milestone_name` fields from init.manager (SDK #2497) in a
compact preview section directly below the main active-milestone
table.

Changes to workflows/manager.md:
- Initialize step: parse the optional trio
  (queued_milestone_version, queued_milestone_name, queued_phases)
  alongside the existing init.manager fields. Treat missing as
  empty for backward compatibility with older SDK versions.
- Dashboard step: new "Queued section (next milestone preview)"
  rendered between the main active-milestone grid and the
  Recommendations section. Renders only when queued_phases is
  non-empty; skipped entirely when absent or empty (e.g. active
  milestone is the last one).
- Queued rows render without D/P/E columns since the phases haven't
  been discussed yet — just number, display_name, deps_display,
  and a fixed "· Queued" status.
- Success criterion added: queued section renders when non-empty
  and is skipped when absent.

Queued phases are deliberately NOT eligible for the Continue action
menu; they live in a future milestone. The preview exists for
situational awareness only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 10:41:37 -04:00
Tom Boucher
7397f580a5 fix(#2516): resolve executor_model inherit literal passthrough; add regression test (#2537)
When model_profile is "inherit", execute-phase was passing the literal string
"inherit" to Task(model=), causing fallback to the default model. The workflow
now documents that executor_model=="inherit" requires omitting the model= parameter
entirely so Claude Code inherits the orchestrator model automatically.

Closes #2516
2026-04-21 21:35:22 -04:00
Tom Boucher
9a67e350b3 fix(#2504): auto-pass UAT for infrastructure/foundation phases with no user-facing elements (#2541)
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 21:20:27 -04:00
Tom Boucher
98d92d7570 fix(#2526): warn about REQ-IDs in body missing from Traceability table (#2539)
Scan REQUIREMENTS.md body for all **REQ-ID** patterns during phase
complete and emit a warning for any IDs absent from the Traceability
table, regardless of whether the roadmap has a Requirements: line.

Closes #2526
2026-04-21 21:18:58 -04:00
Tom Boucher
8eeaa20791 fix(install): chmod dist/cli.js 0o755 after npm install -g; add regression test (closes #2525) (#2536)
Use process.platform !== 'win32' guard in catch instead of a comment, and add
regression test for bug #2525 (gsd-sdk bin symlink points at non-executable file).

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 21:18:34 -04:00
Tom Boucher
f32ffc9fb8 fix(quick): include deferred-items.md in final commit file list (closes #2523) (#2542)
Step 8 file list omitted deferred-items.md, leaving executor out-of-scope
findings untracked after final commit even with commit_docs: true.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 20:33:43 -04:00
Tom Boucher
5676e2e4ef fix(sdk): forward --ws workstream flag through query dispatch (#2546)
* fix(sdk): forward --ws workstream flag through query dispatch (closes #2524)

- cli.ts: pass args.ws as workstream to registry.dispatch()
- registry.ts: add workstream? param to dispatch(), thread to handler
- utils.ts: add optional workstream? to QueryHandler type signature
- helpers.ts: planningPaths() accepts workstream? and uses relPlanningPath()
- All ~26 query handlers updated to receive and pass workstream to planningPaths()
- Config/commit/intel handlers use _workstream (project-global, not scoped)
- Add failing-then-passing test: tests/bug-2524-sdk-query-ws-flag.test.cjs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(sdk): forward workstream to all downstream query helpers

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): rewrite #2524 test as static source assertions — no sdk/dist build in CI

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 20:33:24 -04:00
Lex Christopherson
7bb6b6452a fix: spike workflow defaults to interactive UI demos, not stdout
Flips the bias in step 8b: build a simple HTML page/web UI by default,
fall back to stdout only for pure fact-checking (binary yes/no, benchmarks).
Mirrors upstream spike-idea skill constraint #3 update.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 09:19:04 -06:00
Lex Christopherson
43ea92578b Merge remote-tracking branch 'origin/main' into hotfix/1.38.2
# Conflicts:
#	CHANGELOG.md
#	bin/install.js
#	sdk/src/query/init.ts
2026-04-21 09:16:24 -06:00
Lex Christopherson
a42d5db742 1.38.2 v1.38.2 2026-04-21 09:14:52 -06:00
Lex Christopherson
c86ca1b3eb fix: sync spike/sketch workflows with upstream skill v2 improvements
Spike workflow:
- Add frontier mode (no-arg or "frontier" proposes integration + frontier spikes)
- Add depth-over-speed principle — follow surprising findings, test edge cases,
  document investigation trail not just verdict
- Add CONVENTIONS.md awareness — follow established patterns, update after session
- Add Requirements section in MANIFEST — track design decisions as they emerge
- Add re-ground step before each spike to prevent drift in long sessions
- Add Investigation Trail section to README template
- Restructured prior context loading with priority ordering
- Research step now runs per-spike with briefing and approach comparison table

Sketch workflow:
- Add frontier mode (no-arg or "frontier" proposes consistency + frontier sketches)
- Add spike context loading — ground mockups in real data shapes, requirements,
  and conventions from spike findings

Spike wrap-up workflow:
- Add CONVENTIONS.md generation step (recurring stack/structure/pattern choices)
- Reference files now use implementation blueprint format (Requirements, How to
  Build It, What to Avoid, Constraints)
- SKILL.md now includes requirements section from MANIFEST
- Next-steps route to /gsd-spike frontier mode instead of inline analysis

Sketch wrap-up workflow:
- Next-steps route to /gsd-sketch frontier mode

Commands updated with frontier mode in descriptions and argument hints.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 09:14:32 -06:00
github-actions[bot]
337e052aa9 chore: bump version to 1.38.2 for hotfix 2026-04-21 15:13:56 +00:00
Lex Christopherson
969ee38ee5 fix: sync spike/sketch workflows with upstream skill v2 improvements
Spike workflow:
- Add frontier mode (no-arg or "frontier" proposes integration + frontier spikes)
- Add depth-over-speed principle — follow surprising findings, test edge cases,
  document investigation trail not just verdict
- Add CONVENTIONS.md awareness — follow established patterns, update after session
- Add Requirements section in MANIFEST — track design decisions as they emerge
- Add re-ground step before each spike to prevent drift in long sessions
- Add Investigation Trail section to README template
- Restructured prior context loading with priority ordering
- Research step now runs per-spike with briefing and approach comparison table

Sketch workflow:
- Add frontier mode (no-arg or "frontier" proposes consistency + frontier sketches)
- Add spike context loading — ground mockups in real data shapes, requirements,
  and conventions from spike findings

Spike wrap-up workflow:
- Add CONVENTIONS.md generation step (recurring stack/structure/pattern choices)
- Reference files now use implementation blueprint format (Requirements, How to
  Build It, What to Avoid, Constraints)
- SKILL.md now includes requirements section from MANIFEST
- Next-steps route to /gsd-spike frontier mode instead of inline analysis

Sketch wrap-up workflow:
- Next-steps route to /gsd-sketch frontier mode

Commands updated with frontier mode in descriptions and argument hints.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 09:05:47 -06:00
Tom Boucher
2980f0ec48 fix(sdk): stripShippedMilestones handles inline SHIPPED headings; getMilestoneInfo prefers STATE.md (#2508)
* fix(sdk): stripShippedMilestones handles inline SHIPPED headings; getMilestoneInfo prefers STATE.md

Fixes two compounding bugs:

- #2496: stripShippedMilestones only stripped <details> blocks, ignoring
  '## Heading —  SHIPPED ...' inline markers. Shipped milestone sections
  were leaking into downstream parsers.

- #2495: getMilestoneInfo checked STATE.md frontmatter only as a last-resort
  fallback, so it returned the first heading match (often a leaked shipped
  milestone) rather than the current milestone. Moved STATE.md check to
  priority 1, consistent with extractCurrentMilestone.

Closes #2495
Closes #2496

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(roadmap): handle ### SHIPPED headings and STATE.md version-only case

Two follow-up fixes from CodeRabbit review of #2508:

1. stripShippedMilestones only split on ## boundaries; ### headings marked
    SHIPPED were not stripped, leaking into fallback parsers. Expanded
   the split/filter regex to #{2,3} to align with extractCurrentMilestone.

2. getMilestoneInfo's early-return on parseMilestoneFromState discarded the
   real milestone name from ROADMAP.md when STATE.md had only `milestone:`
   (no `milestone_name:`), returning the placeholder name 'milestone'.
   Now only short-circuits when STATE.md provides a real name; otherwise
   falls through to ROADMAP for the name while using stateVersion to
   override the version in every ROADMAP-derived return path.

Tests: +2 new cases (### SHIPPED heading, version-only STATE.md).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 10:41:35 -04:00
Tom Boucher
8789211038 fix(insert-phase): update STATE.md next-phase recommendation after phase insertion (#2509)
* fix(insert-phase): update STATE.md next-phase recommendation after inserting a phase

Closes #2502

* fix(insert-phase): update all STATE.md pointers; tighten test scope

Two follow-up fixes from CodeRabbit review of #2509:

1. The update_project_state instruction only said to find "the line" for
   the next-phase recommendation. STATE.md can have multiple pointers
   (structured current_phase: field AND prose recommendation text).
   Updated wording to explicitly require updating all of them in the same
   edit.

2. The regression test for the next-phase pointer update scanned the
   entire file, so a match anywhere would pass even if update_project_state
   itself was missing the instruction. Scoped the assertion to only the
   content inside <step name="update_project_state"> to prevent false
   positives.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 10:10:45 -04:00
Tom Boucher
57bbfe652b fix: exclude non-wiped dirs from custom-file scan; warn on non-Claude model profiles (#2511)
* fix(detect-custom-files): exclude skills and command dirs not wiped by installer (closes #2505)

GSD_MANAGED_DIRS included 'skills' and 'command' directories, but the
installer never wipes those paths. Users with third-party skills installed
(40+ files, none in GSD's manifest) had every skill flagged as a "custom
file" requiring backup, producing noisy false-positive reports on every
/gsd-update run.

Removes 'skills' and 'command' from both gsd-tools.cjs and the SDK's
detect-custom-files.ts. Adds two regression tests confirming neither
directory is scanned.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(settings): warn that model profiles are no-ops on non-Claude runtimes (closes #2506)

settings.md presented Quality/Balanced/Budget model profiles without any
indication that these tiers map to Claude models (Opus/Sonnet/Haiku) and
have no effect on non-Claude runtimes (Codex, Gemini CLI, OpenRouter).
Users on Codex saw the profile chooser as if it would meaningfully select
models, but all agents silently used the runtime default regardless.

Adds a non-Claude runtime note before the profile question (shown in
TEXT_MODE, the path all non-Claude runtimes take) explaining the profiles
are no-ops and directing users to either choose Inherit or configure
model_overrides manually. Also updates the Inherit option description to
explicitly name the runtimes where it is the correct choice.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 10:10:10 -04:00
Tom Boucher
a4764c5611 fix(execute-phase): resurrection-detection must check git history before deleting new .planning/ files (#2510)
The guard at the worktree-merge resurrection block was inverting the
intended logic: it deleted any .planning/ file absent from PRE_MERGE_FILES,
which includes brand-new files (e.g. SUMMARY.md just created by the
executor). A genuine resurrection is a file that was previously tracked on
main, deliberately removed, and then re-introduced by the merge. Detecting
that requires a git history check — not just tree membership.

Fix: replace the PRE_MERGE_FILES grep guard with a `git log --follow
--diff-filter=D` check that only removes the file if it has a deletion
event in main's ancestry.

Closes #2501
2026-04-21 09:46:01 -04:00
Tom Boucher
b2534e8a05 feat(plan-phase): chunked mode + filesystem fallback for Windows stdio hang (#2499)
* feat(plan-phase): chunked mode + filesystem fallback for Windows stdio hang (#2310)

Addresses the 2026-04-16 Windows incident where gsd-planner wrote all 5
PLAN.md files to disk but Task() never returned, hanging the orchestrator
for 30+ minutes. Two mitigations:

1. Filesystem fallback (steps 9a, 11a): when Task() returns with an
   empty/truncated response but PLAN.md files exist on disk, surface a
   recoverable prompt (Accept plans / Retry planner / Stop) instead of
   silently failing. Directly addresses the post-restart recovery path.

2. Chunked mode (--chunked flag / workflow.plan_chunked config): splits the
   single long-lived planner Task into a short outline Task (~2 min) followed
   by N short per-plan Tasks (~3-5 min each). Each plan is committed
   individually for crash resilience. A hang loses one plan, not all of them.
   Resume detection skips plans already on disk on re-run.

RCA confirmed: task state mtime 14:29 vs PLAN.md writes 14:32-14:52 =
subagent completed normally, IPC return was dropped by Windows stdio deadlock.
Neither mitigation fixes the root cause (requires upstream Task() timeout
support); both bound damage and enable recovery.

New reference file planner-chunked.md keeps OUTLINE COMPLETE / PLAN COMPLETE
return formats out of gsd-planner.md (which sits at 46K near its size limit).

Closes #2310

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(plan-phase): address CodeRabbit review comments on #2499

- docs/CONFIGURATION.md: add workflow.plan_chunked to full JSON schema example
- plan-phase.md step 8.5.1: validate PLAN-OUTLINE.md with grep for OUTLINE
  COMPLETE marker before reusing (not just file existence)
- plan-phase.md step 8.5.2: validate per-plan PLAN.md has YAML frontmatter
  (head -1 grep for ---) before skipping in resume path
- plan-phase.md: add language tags (text/javascript/bash) to bare fenced
  code blocks in steps 8.5, 9a, 11a (markdownlint MD040)
- Rejected: commit_docs gate on per-plan commits (gsd-sdk query commit
  already respects commit_docs internally — comment was a false positive)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(plan-phase): route Accept-plans through step 9 PLANNING COMPLETE handling

Honors --skip-verify / plan_checker_enabled=false in 9a fallback path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 08:40:39 -04:00
Tom Boucher
d1b56febcb fix(execute-phase): post-merge deletion audit for bulk file deletions (closes #2384) (#2483)
* fix(execute-phase): post-merge deletion audit for bulk file deletions (closes #2384)

Two data-loss incidents were caused by worktree merges bringing in bulk
file deletions silently. The pre-merge check (HEAD...WT_BRANCH) catches
deletions on the worktree branch, but files deleted during the merge
itself (e.g., from merge conflict resolution or stale branch state) were
not audited post-merge.

Adds a post-merge audit immediately after git merge --no-ff succeeds:
- Counts files deleted outside .planning/ in the merge commit
- If count > 5 and ALLOW_BULK_DELETE!=1: reverts the merge with
  git reset --hard HEAD~1 and continues to the next worktree
- Logs the full file list and an escape-hatch instruction

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): tighten post-merge deletion audit assertions (CodeRabbit #2483)

Replace loose substring checks with exact regex assertions:
- assert.match against 'git diff --diff-filter=D --name-only HEAD~1 HEAD'
- assert.match against threshold gate + ALLOW_BULK_DELETE override condition
- assert.match against git reset --hard HEAD~1 revert
- assert.match against MERGE_DEL_COUNT grep -vc for non-.planning count

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(inventory): update workflow count to 81 (graduation.md added in #2490)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 18:37:42 -04:00
Tom Boucher
1657321eb0 fix(install): remove bare ~/.claude reference in update.md (closes #2470) (#2482)
* fix(install): remove bare ~/.claude reference in update.md (closes #2470)

The installer's copyWithPathReplacement() replaces ~/\.claude\/ (with
trailing slash) but not ~/\.claude (bare, no trailing slash). A comment
on line 398 of update.md used the bare form, which scanForLeakedPaths()
correctly flagged for every non-Claude runtime install.

Replaced the example in the comment with a non-Claude runtime path so
the file passes the scanner for all runtimes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): align regex with installer's word-boundary semantics (CodeRabbit #2482)

Replace negative lookahead (?!\/) with \b word boundary to match the
installer's scanForLeakedPaths() pattern. The lookahead would incorrectly
flag ~/.claude_suffix whereas \b correctly excludes it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): revert \b regex — (?!\/) was intentionally scoped to bare refs

The installer's scanForLeakedPaths uses \b but the test is specifically
checking for bare ~/.claude without trailing slash that the replacer misses.
~/.claude/ (with slash) at line 359 of update.md is expected and handled.
\b would flag it as a false positive.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(inventory): update workflow count to 81 (graduation.md added in #2490)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 18:37:32 -04:00
Tom Boucher
2b494407e5 feat(assembly): add link mode for CLAUDE.md @-reference sections (#2484)
* feat(assembly): add link mode for CLAUDE.md @-reference sections (#2415)

Adds `claude_md_assembly.mode: "link"` config option that writes
`@.planning/<source>` instead of inlining content between GSD markers,
reducing typical CLAUDE.md size by ~65%. Per-block overrides available
via `claude_md_assembly.blocks.<section>`. Falls back to embed for
sections without a real source file (workflow, fallbacks).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): add positive assertion for embedded workflow content (CodeRabbit #2484)

The negative assertion only confirmed @GSD defaults wasn't written.
Add assert.ok(content.includes('GSD Workflow Enforcement')) to verify
the workflow section is actually embedded inline when link mode falls back.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 18:27:55 -04:00
Tom Boucher
d0f4340807 feat(workflows): link pending todos to roadmap phases in new-milestone (#2433) (#2485)
Adds step 10.5 to gsd-new-milestone that scans pending todos against the
approved roadmap and tags matches with `resolves_phase: N` in their YAML
frontmatter. Adds a `close_phase_todos` step to execute-phase that moves
tagged todos to `completed/` when the phase completes — closing the loop
automatically with no manual cleanup.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 18:25:24 -04:00
Tom Boucher
280eed93bc feat(cli): add /gsd-sync-skills for cross-runtime managed skill sync (#2491)
* fix(tests): update 5 source-text tests to read config-schema.cjs

VALID_CONFIG_KEYS moved from config.cjs to config-schema.cjs in the
drift-prevention companion PR. Tests that read config.cjs source text
and checked for key literal includes() now point to the correct file.

Closes #2480

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cli): add /gsd-sync-skills for cross-runtime managed skill sync (#2380)

Adds /gsd-sync-skills command so multi-runtime users can keep gsd-* skill
directories aligned across runtime roots after updating one runtime with gsd-update.

Changes:
- bin/install.js: add --skills-root <runtime> flag that prints the skills root
  path for any supported runtime, reusing the existing getGlobalDir() table.
  Banner is suppressed when --skills-root is used (machine-readable output).
- commands/gsd/sync-skills.md: slash command definition
- get-shit-done/workflows/sync-skills.md: full workflow spec covering argument
  parsing, path resolution via --skills-root, diff computation (CREATE/UPDATE/
  REMOVE/SKIP), dry-run report (default), apply execution, idempotency guarantee,
  and safety rules (only gsd-* touched, dry-run performs no writes).

Safety rules: only gsd-* directories are ever created/updated/removed; non-GSD
skills in destination roots are never touched; --dry-run is the default.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 18:21:43 -04:00
Tom Boucher
b432d4a726 feat(workflows): close LEARNINGS.md consumption-and-graduation loop (#2490)
* fix(tests): update 5 source-text tests to read config-schema.cjs

VALID_CONFIG_KEYS moved from config.cjs to config-schema.cjs in the
drift-prevention companion PR. Tests that read config.cjs source text
and checked for key literal includes() now point to the correct file.

Closes #2480

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(workflows): close LEARNINGS.md consumption-and-graduation loop (#2430)

Part A — Consumption: extend plan-phase.md cross-phase context load to include
LEARNINGS.md files from the 3 most recent prior phases (same recency gate as
CONTEXT.md + SUMMARY.md: CONTEXT_WINDOW >= 500000 only). Also loads LEARNINGS.md
from any phases in the Depends-on chain. Silent skip if absent; 15% context
budget cap with oldest-first truncation; [from Phase N LEARNINGS] attribution.

Part B — Graduation: add graduation_scan step to transition.md (after
evolve_project) that delegates to new graduation.md helper workflow. The helper
clusters recurring items across the last N phases (default window=5, threshold=3)
using Jaccard lexical similarity, surfaces HITL Promote/Defer/Dismiss prompts,
routes promotions to PROJECT.md or PATTERNS.md by category, annotates graduated
items with `graduated:` field, and persists dismissed/deferred clusters in
STATE.md graduation_backlog. Always non-blocking; silently no-ops on first phase
or when data is insufficient.

Also: adds optional `graduated:` annotation docs to extract_learnings.md schema.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(graduation): address CodeRabbit review findings on PR #2490

- graduation.md: unify insufficient-data guard to silent-skip (remove
  contradictory [no-op] print path)
- graduation.md: add TEXT_MODE fallback for HITL cluster prompts
- graduation.md: add A (defer-all) to accepted actions [P/D/X/A]
- graduation.md: tag untyped code fences with text language (MD040)
- transition.md: tag untyped graduation.md fence with text language

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(graduation): rephrase TEXT_MODE line to avoid prompt-injection scanner false positive

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 18:21:35 -04:00