* refactor(sdk): extract gsdtools transport seam with per-command policy
* fix(sdk): address CodeRabbit transport policy and timeout findings
* fix(sdk): harden raw transport formatting and raw-path coverage
* docs(en): update FEATURES/USER-GUIDE/COMMANDS for v1.40.0 surface
- FEATURES.md: append v1.40.0 section (#122 skill consolidation, #123
namespace meta-skills, #124 context-window guard, #125 phase-lifecycle
status-line read-side); add to TOC.
- USER-GUIDE.md: add slash-command form (hyphen vs colon) primer and
namespace routing primer; replace deleted slash forms in walkthroughs
(`/gsd-add-backlog`, `/gsd-plant-seed`, `/gsd-add-phase`,
`/gsd-set-profile`, `/gsd-list-workspaces`, etc.) with consolidated
forms (`/gsd-capture --backlog`, `/gsd-phase --insert`,
`/gsd-config --profile`, `/gsd-workspace --list`, etc.); fix
`/gsd-spike-wrap-up` and `/gsd-sketch-wrap-up` to flag form.
- COMMANDS.md: clarify Command Syntax (Gemini = colon form, others =
hyphen form); add Namespace Meta-Skills section with all six routers;
add `--context` to /gsd-health flag table.
Refs #3047
* docs(en): refresh INVENTORY/CLI-TOOLS/STATE-MD-LIFECYCLE for v1.40.0
- INVENTORY.md: workflow-row "Invoked by" column updated to point at
consolidated commands (`/gsd-phase` family, `/gsd-workspace --list`,
`/gsd-config --advanced/--integrations/--profile`,
`/gsd-sketch --wrap-up`, `/gsd-spike --wrap-up`); CLI-modules row for
`secrets.cjs` updated to `/gsd-config --integrations`. Command count
and namespace meta-skills section already reflect 65 shipped (= 59
consolidated sub-skills + 6 ns-* routers).
- CLI-TOOLS.md: add `validate context` row under Validation Commands
with the 60 %/70 % threshold envelope used by `/gsd-health --context`.
- STATE-MD-LIFECYCLE.md: flip status header from "proposed" to
"shipped in v1.40.0" since `parseStateMd()` and `formatGsdState()`
now read and render `active_phase`, `next_action`, `next_phases`,
and `progress`.
`docs/AGENTS.md` audited and verified clean — `gsd-code-fixer` row
already lists the correct `/gsd-code-review --fix` spawner; no
deleted-skill references found. `docs/INVENTORY-MANIFEST.json`
audited and verified clean — already enumerates the 65 commands
(including six ns-* routers) and contains no deleted slash forms.
Refs #3047
* docs(en): cleanup ARCHITECTURE/CONFIGURATION for v1.40.0
- ARCHITECTURE.md: split Commands install-target list to call out the
Gemini colon form (`/gsd:command-name`) vs hyphen form for every
other runtime. Add a new subsection covering two-stage hierarchical
routing via the six namespace meta-skills (#2792) and a paired note
on the MCP token-budget interaction so readers see the two big
per-turn cost levers in one place.
- CONFIGURATION.md: rewrite three references to the deleted
`/gsd-settings-advanced` and `/gsd-settings-integrations` slash
forms to use the consolidated `/gsd-config --advanced` /
`/gsd-config --integrations` invocations. Add a new "STATE.md
Frontmatter (Phase Lifecycle)" section documenting the four
optional fields (`active_phase`, `next_action`, `next_phases`,
`progress`) read by the v1.40 status-line, with a pointer to
STATE-MD-LIFECYCLE.md for the full reference.
`docs/manual-update.md` audited and verified clean — already documents
`/gsd-update --reapply` (the consolidated form), no reference to the
deleted `/gsd-reapply-patches`.
Refs #3047
* docs(i18n): mirror v1.40.0 slash-command rename into ja-JP/ko-KR/zh-CN/pt-BR
Mechanical token-level renames only — every reference to a deleted
micro-skill slash form is rewritten to the consolidated form on the
matching parent skill. No prose was machine-translated; new prose
sections (slash-form primer, namespace routing primer, v1.40 feature
entries, STATE.md frontmatter) were left for human translator
follow-up.
Renames applied uniformly across all four trees:
/gsd-add-todo, /gsd-add-note, /gsd-add-backlog,
/gsd-plant-seed, /gsd-check-todos → /gsd-capture[ --note|
--backlog|--seed|--list]
/gsd-add-phase, /gsd-insert-phase,
/gsd-remove-phase, /gsd-edit-phase → /gsd-phase[ --insert|
--remove|--edit]
/gsd-new-workspace, /gsd-list-workspaces,
/gsd-remove-workspace → /gsd-workspace[ --new|
--list|--remove]
/gsd-settings-advanced,
/gsd-settings-integrations,
/gsd-set-profile → /gsd-config[ --advanced|
--integrations|--profile]
/gsd-sketch-wrap-up → /gsd-sketch --wrap-up
/gsd-spike-wrap-up → /gsd-spike --wrap-up
/gsd-reapply-patches → /gsd-update --reapply
/gsd-code-review-fix → /gsd-code-review --fix
/gsd-plan-milestone-gaps → /gsd-audit-milestone
Refs #3047
* docs(changelog): regroup [Unreleased] under Feature/Enhancement/Fix
Replace the existing Keep-a-Changelog \`Added\` / \`Changed\` /
\`Performance\` / \`Removed\` / \`Fixed\` sub-headers in the [Unreleased]
block with the issue/PR template taxonomy:
Added → Feature
Changed / Performance → Enhancement
Removed → Enhancement
Fixed → Fix
Order within the release: Feature → Enhancement → Fix. Every bullet
preserved verbatim — only headers and grouping changed; the awkward
inline-versioned headers (\`### Added — 1.40.0-rc.1\`,
\`### Changed — 1.40.0-rc.1\`, \`### Fixed — 1.40.0-rc.1\`) folded into
the same buckets with the \`— 1.40.0-rc.1\` suffix dropped, since the
[Unreleased] block IS 1.40.0-rc.1.
The [1.39.2] hotfix block called out in #3047's spec does not yet
exist in CHANGELOG.md (the previously released hotfix is [1.39.1]),
so this commit only regroups [Unreleased]. Older release blocks
([1.39.1] and earlier) are frozen and untouched.
Refs #3047
* docs(changeset): add fragment for v1.40.0 doc audit
Refs #3047
* docs(en): strip leading / from deleted slash-command tokens in FEATURES
REQ-CONSOLIDATE-03 and REQ-CONSOLIDATE-04 listed deleted commands by
their `/gsd-foo` form for the historical record. The docs-parity tests
in bug-3010, bug-3029-3034, and bug-3042-3044 use the regex
`/\/gsd-[a-z0-9][a-z0-9-]*/g` to scan user-facing surfaces for any
remaining mention of removed slash forms — they cannot tell prose
about a deleted command from a live recommendation.
Strip the leading slash from the bare-name references (preserve the
historical text otherwise). Tests now require a `/` prefix to match,
so `gsd-add-todo` reads identically to a human but no longer trips
the parser.
Verified locally: 65/65 tests pass across the three docs-parity
suites that were red on CI run 25270072600.
Refs #3047
* docs(en): fix CR feedback + drop literal /gsd:plan-phase from USER-GUIDE
CI: tests/bug-2543-gsd-slash-namespace.test.cjs flagged
docs/USER-GUIDE.md:35 for embedding the literal `/gsd:plan-phase`
token in the parenthetical Gemini-form example. The test scans every
.md under docs/ for `/gsd:<live-cmd>` because non-Gemini surfaces must
not advertise the colon form. Replaced the literal example with a
prose substitution rule.
CR: docs/ARCHITECTURE.md:125 — the namespace meta-skills were listed
by file-prefix (`gsd-ns-workflow`) but the invocable frontmatter `name:`
is the bare form (`gsd-workflow`). Verified against the six
`commands/gsd/ns-*.md` files. Replaced with the canonical names and
noted the file/name disagreement in-line.
CR: docs/COMMANDS.md:723 — `v1.40` aligned to canonical `v1.40.0`.
CR: docs/FEATURES.md:2679 — REQ-CTX-GUARD-02 advertised the wrong
invocation (`gsd-tools validate context`). The shipped handler is
exposed via `gsd-sdk query validate.context` and requires explicit
`--tokens-used <int>` + `--context-window <int>` flags (verified
against sdk/src/query/validate.ts:849-882 and
get-shit-done/bin/lib/validate-command-router.cjs:19-36).
CR: docs/zh-CN/README.md:533 — added `inherit` to the profile-options
parenthetical to match the canonical set (verified against
model-profiles.cjs:29 `VALID_PROFILES = […MODEL_PROFILES['gsd-planner'], 'inherit']`).
Verified locally: 74/74 tests pass across the four docs-parity suites
that were red on CI runs 25270072600 and 25270182903.
Refs #3047
* feat(plan-phase): --research-phase flag absorbs deleted /gsd-research-phase + scrub stale refs (#3042, #3044)
#3042 (orphaned research-phase): /gsd-research-phase had a workflow file
but no slash-command stub. Rather than restore the orphan, the research-
only capability is now a flag on /gsd-plan-phase:
/gsd-plan-phase --research-phase <N>
When set, the workflow scopes to phase N, runs the research step (Section
5 of the existing plan-phase workflow), then early-exits before the
planner/plan-checker/verifier chain.
Per RCA against the deleted standalone, the flag adds two modifiers to
fully cover the original surface (Option B from the RCA discussion):
- --view : print existing RESEARCH.md to stdout, no spawn. Cheapest mode
for the correction-without-replanning loop the issue reporter
explicitly called out. Errors with a clear hint if RESEARCH.md is
missing.
- --research : reuse the existing "force re-research" semantics. In
research-only mode this skips the existing-RESEARCH.md prompt and
re-spawns unconditionally.
- Neither flag, RESEARCH.md exists : prompt update/view/skip. Mirrors
the deleted standalone's existing-artifact menu (#3042 RCA).
#3044 (stale slash-command refs): scrubbed five deleted commands from
all user-facing surfaces, including English docs, 4 localized doc sets
(ja-JP, ko-KR, zh-CN, pt-BR), workflows, templates, and references.
/gsd-check-todos → /gsd-capture --list
/gsd-new-workspace → /gsd-workspace --new
/gsd-status → /gsd-progress
/gsd-plan-milestone-gaps → table rows / orphan sections removed
(PR #3038 only scrubbed workflows/agent;
missed the docs surfaces this PR covers)
/gsd-research-phase → /gsd-plan-phase --research-phase
Includes a fix to docs/issue-driven-orchestration.md (PR #3036)
which itself referenced /gsd-new-workspace 4 times — self-correction.
Removed:
- get-shit-done/workflows/research-phase.md (orphan, capability
absorbed into --research-phase flag)
Tests:
- tests/bug-3042-3044-research-flag-and-stale-refs.test.cjs — 46
structural-IR tests across both bugs:
- argument-hint advertises --research-phase + --view
- workflow parses --research-phase, sets RESEARCH_ONLY,
early-exits before planner
- --view prints RESEARCH.md without spawning
- --research forces refresh in research-only mode
- existing-RESEARCH.md prompt path with update/view/skip
- workflows/research-phase.md is removed
- 5 deleted slash-commands absent from 17 English user-facing
surfaces + 16 localized doc surfaces (4 locales × 4 docs each)
- replacement command tokens present where deleted ones lived
6950/6950 full suite pass. Lints clean.
Closes#3042Closes#3044
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix: address all 8 CR findings on PR #3045
Major (3):
- get-shit-done/workflows/plan-phase.md:344 — added explicit early-exit
guard at Section 5.1: "Skip if RESEARCH_ONLY=true". Without it, an LLM
could fall through "use existing, skip to step 6" → planner spawn,
violating the research-only contract. The guard makes the early-exit
unreachable from any non-research-only branch.
- get-shit-done/references/continuation-format.md (3 examples) +
zh-CN/.../continuation-format.md (3 examples) — pointed to
`/gsd-plan-phase --research-phase` but docs/COMMANDS.md didn't
document the flag. Added a full --research-phase + --view + --research
modifier section to the /gsd-plan-phase flag table in COMMANDS.md so
the canonical reference matches the continuation examples.
Minor (5):
- docs/FEATURES.md:1632 — `/gsd-plan-phase --research-phase` →
`/gsd-plan-phase --research-phase <N>` (include required arg).
- get-shit-done/templates/README.md:46 — NN-VALIDATION.md producer
reverted from `/gsd-plan-phase --research-phase` (Nyquist) to plain
`/gsd-plan-phase` (Nyquist). VALIDATION.md is created during normal
Nyquist flow, not research-only mode — the bulk replacement was
wrong for that line.
- get-shit-done/workflows/help.md:89 — signature line was missing
`--research`; added it alongside `--research-phase` and `--view`.
- tests/bug-3042-3044-...:197 — promptHasView/promptHasSkip were
tautological (matched anywhere in 1700-line workflow). Tightened
to a proximity check anchored on "RESEARCH.md already exists" prompt
header within a 600-char window. Updated workflow to emit that
literal phrase.
- tests/feat-2840-...:95 — workspace assertion used `/gsd-workspace`
but the documented replacement is `/gsd-workspace --new`. Tightened
to require both tokens (in 3 places: requiredCommands list, regex
in conceptPairs, error message).
6950/6950 full suite pass. Lint clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* fix(install): skip Gemini local commands/gsd when global GSD present (#3037)
Reporter showed that running `npx get-shit-done-cc --gemini --global`
followed by `--gemini --local` in a project creates the same 65 GSD
command files in both Gemini scopes:
- ~/.gemini/commands/gsd/ (user scope)
- <project>/.gemini/commands/gsd/ (workspace scope)
Gemini conflict-detects by command name across scopes and renames every
overlapping /gsd:* command to /workspace.gsd:* and /user.gsd:*, breaking
the documented /gsd:* namespace.
Fix: in bin/install.js, when handling --gemini --local, detect whether
~/.gemini/commands/gsd/ already exists with managed-shape content. If
so, skip the local copy and print a clear three-line warning explaining
the conflict avoidance. The user-scope install already provides the
same /gsd:* commands in this project; the local copy adds zero value.
Sibling fixes (test isolation):
- tests/install-minimal-all-runtimes.test.cjs: pass HOME/USERPROFILE
through the spawned installer's env so the developer's real
~/.gemini/commands/gsd/ doesn't trigger the new skip path during
test runs that want to assert the local-install populates
commands/gsd/.
- tests/gemini-namespacing.test.cjs: the "Gemini Install (Behavioral)"
describe block now creates an isolated tmpHome and points
process.env.HOME at it before calling install(false, 'gemini'),
with proper restore in afterEach.
Test:
- tests/bug-3037-gemini-duplicate-commands.test.cjs — 4 structural
tests:
1. global install populates HOME/.gemini/commands/gsd
2. local install AFTER global skips the local copy
3. local install with NO existing global still populates locally
(no-regression)
4. local install when HOME has .gemini/ but no GSD-managed
commands/gsd/ still populates locally (non-GSD-Gemini-user
no-regression)
6909/6909 full suite pass. Lints clean.
Closes#3037
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix: address CR feedback on PR #3041 — narrower detection + USERPROFILE restore
CR findings:
1. **bin/install.js (Major)** — userScopeHasGsd used
`fs.readdirSync(homeGeminiGsd).length > 0` which would skip the
local install for any non-empty directory, including a user who
hand-dropped a single override at ~/.gemini/commands/gsd/<thing>
.toml without ever running --gemini --global. Narrowed the
detection to require at least 3 canonical GSD command files
(help.toml, progress.toml, new-project.toml) — a marker that
ships in every GSD Gemini install (minimal mode included) and is
structurally impossible to produce by accident.
2. **tests/bug-3037-...:59 (Minor)** — beforeEach overwrites
process.env.USERPROFILE but afterEach only restores HOME, leaking
the temp home into later tests on Windows or any code path that
reads USERPROFILE. Added save/restore symmetric with HOME.
Plus added a 5th regression test covering the narrowed detection:
"local install when HOME has hand-dropped overrides UNDER commands/gsd/
(but no full GSD) still populates locally" — directly exercises the
edge case CR identified.
5/5 targeted tests pass. 6910/6910 full suite pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* fix(workflows,docs): scrub stale /gsd-code-review-fix and /gsd-plan-milestone-gaps refs (#3029, #3034)
#2790 consolidated /gsd-code-review-fix into /gsd-code-review --fix and
deleted /gsd-plan-milestone-gaps in favor of inline gap planning as part
of /gsd-audit-milestone's output. The deletion was propagated through
some surfaces (#2950 covered help/do/settings/discuss-phase/etc.) but
several user-facing surfaces still emitted the old forms:
#3029 — /gsd-code-review-fix references in:
- agents/gsd-code-fixer.md (description, "Spawned by", recovery prose)
- get-shit-done/workflows/code-review.md (offer text)
- get-shit-done/workflows/execute-phase.md (offer text)
- get-shit-done/workflows/code-review-fix.md (internal retry hints)
- docs/INVENTORY.md (agent + workflow rows)
- docs/CONFIGURATION.md (workflow.code_review row)
- docs/USER-GUIDE.md (3 occurrences in walkthrough)
- docs/AGENTS.md (gsd-code-fixer agent stub)
- docs/FEATURES.md (commands list + REQ-REVIEW-04)
All replaced with /gsd-code-review --fix. Internal retry hints in the
workflow file itself updated to point at the new form. Release notes
(docs/RELEASE-*.md) and gsd-ns-review's "absorbed by" deletion note
left unchanged — historical/explanatory content.
#3034 — /gsd-plan-milestone-gaps references in:
- get-shit-done/workflows/audit-milestone.md (<offer_next> blocks for
gaps_found and tech_debt: lines 281, 323)
- commands/gsd/complete-milestone.md (gaps_found pre-flight: lines 46, 57)
Replaced with inline closure path:
/gsd-phase --insert <N> "Close gap: <REQ-ID> ..."
/gsd-discuss-phase <N>
/gsd-plan-phase <N>
/gsd-execute-phase <N>
Plus a Nyquist-coverage hint pointing at /gsd-validate-phase /
/gsd-secure-phase for retroactive audit-chain hygiene gaps. The
gsd-ns-project SKILL.md "deleted by #2790" note is preserved
(it's the canonical pointer for future readers asking what
happened to the command).
Tests:
- tests/bug-3029-3034-stale-command-routes.test.cjs — parser-based
assertions per fixed surface, plus a structural cross-check that
gsd-ns-project keeps the deletion note. 15 tests, all green.
- 6905/6905 full suite passes.
Closes#3029Closes#3034
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix: address CR feedback on PR #3038 — argument order, structural tests, agent count
CR findings on PR #3038:
1. **docs/USER-GUIDE.md (Major)** — `--fix` examples used flag-first form
(`/gsd-code-review --fix 3`), but the supported CLI grammar is
phase-first (`/gsd-code-review 3 --fix`). The original sed-based
replacement preserved the position of the `gsd-code-review-fix`
token, producing the wrong order. Fixed in USER-GUIDE.md (3
occurrences) and the same drift in the workflow surfaces:
- get-shit-done/workflows/code-review-fix.md (2 retry hints)
- get-shit-done/workflows/code-review.md (offer text)
- get-shit-done/workflows/execute-phase.md (offer text)
2. **docs/AGENTS.md (Minor)** — internal count drift: line 483 said
"Ten additional agents" but line 725 said "12 advanced/specialized".
Filesystem reality: 33 agents total, 21 primary, 12 specialized
(count of `### ` stubs in the Advanced and Specialized section).
Updated lines 3, 13, 483 to use 12/33 and added the two missing
names (doc-classifier, doc-synthesizer) to the inline list at
line 13.
3. **tests:94 (Major refactor suggestion)** — `.includes()` token checks
were source-grep style. Refactored to a typed-IR pattern: extract
the SET of slash-command tokens via regex, assert membership on the
parsed Set instead of substring scanning the raw file text. Added
the `allow-test-rule` comment explaining the IR-build vs
IR-assertion split per scripts/lint-no-source-grep.cjs convention.
4. **tests:130 (Major)** — replacement-path assertion was file-wide and
could false-pass on generic mentions of "inline" elsewhere in the
file. Refactored: `extractOfferBlocks(content)` returns the typed
list of `<offer_next>` and "Pre-flight" blocks where the deleted
command previously lived, and the assertion runs against those
blocks specifically. Now requires `/gsd-phase --insert` or
inline-audit prose to appear in the same offer block, not just
somewhere in the file.
15/15 targeted tests pass. 6905/6905 full suite pass. Lints clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* docs: add issue-driven orchestration guide (#2840)
Adds docs/issue-driven-orchestration.md — a recipe for driving GSD from a
GitHub / Linear / Jira issue using existing primitives. Maps Symphony-style
orchestration concepts onto GSD commands without vendoring code, adding a
daemon, or introducing tracker integration.
Concept mapping covers:
- WORKFLOW.md → ROADMAP.md / STATE.md / phase CONTEXT.md / phase PLAN.md
- isolated agent workspace → /gsd-new-workspace --strategy worktree
- agent dispatch → /gsd-manager (interactive), /gsd-autonomous (unattended)
- per-phase steps → /gsd-discuss-phase → /gsd-plan-phase → /gsd-execute-phase
- proof-of-work → /gsd-verify-work (UAT.md persists across /clear)
- adversarial review → /gsd-review (cross-AI peer review)
- human merge gate → /gsd-ship
- follow-up capture → /gsd-note, /gsd-plant-seed, /gsd-new-milestone
End-to-end flow walks through 7 numbered steps from picking the tracker
issue to capturing follow-ups. Safety boundaries (isolated worktrees,
explicit human review, no automatic public posting, verification before
ship) and non-goals (no vendoring, no daemon, no mandatory tracker, no
gate bypass, no command-surface expansion) are spelled out explicitly so
the doc cannot drift into "let's just add one more flag".
Cross-linked from docs/README.md (Documentation Index) and
docs/USER-GUIDE.md (Table of Contents preamble).
Tests: tests/feat-2840-issue-driven-orchestration-guide.test.cjs — 9
structural-IR tests parse the guide into a typed record and assert on
flags (commandsPresent, conceptPairs, nonGoalFlags, safetyFlags,
numberedSteps). Fence-language MD040 check enforced. Cross-link
presence enforced. No raw-text assertions on prose.
6890/6890 tests pass. Lint:tests clean (allow-test-rule comment justifies
the doc-shape parser per scripts/lint-no-source-grep.cjs escape hatch).
Lint:changeset clean.
Closes#2840
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(test): guard USER-GUIDE.md existsSync before read (CR #3036)
CR Minor: cross-linked-from-USER-GUIDE.md test called fs.readFileSync
directly without first asserting fs.existsSync, asymmetric with the
README.md test above. A missing USER-GUIDE.md would throw ENOENT instead
of producing a meaningful assertion message. Mirror the null-guard
pattern.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* feat(hooks): opt-in SessionStart update banner for non-statusline users (#2795)
When a user declines (or keeps a non-GSD) statusline at install time, the
installer now offers an opt-in SessionStart banner that surfaces GSD update
availability. The banner reads the existing
~/.cache/gsd/gsd-update-check.json cache (written by
gsd-check-update-worker.js) and emits a single systemMessage line only when
update_available is true:
GSD update available: <installed> → <latest>. Run /gsd-update.
It is silent when up-to-date and rate-limits "check failed" diagnostics to
once per 24h via a sentinel file so a corrupt cache doesn't nag every
session. Removed cleanly by `npx get-shit-done-cc --uninstall` which strips
both the script and the SessionStart entry. The banner is never offered when
GSD's statusline is being installed (statusline already surfaces update
info, so re-prompting would be noise).
Implementation:
- hooks/gsd-update-banner.js — pure functions buildBannerOutput,
shouldSuppressFailureWarning, readCache; thin main() wires them.
- bin/install.js — handleUpdateBanner() prompt, parseUpdateBannerInput(),
buildUpdateBannerHookEntry(), buildUpdateBannerPromptText(); chained into
installAllRuntimes() so finalize() receives both flags. updateBannerCommand
computed alongside the other JS-hook commands; finishInstall() registers
the SessionStart entry only when shouldInstallBanner === true and the
hook file is present at the target.
- Hook ships in scripts/build-hooks.js HOOKS_TO_COPY, listed in
MANAGED_HOOKS for stale-detection in gsd-check-update-worker.js, in the
uninstall hook-removal lists in install.js, and in the
rewriteLegacyManagedNodeHookCommands allowlist.
Tests:
- tests/feat-2795-update-banner.test.cjs — 22 tests, structural-IR
assertions on parsed JSON envelopes (no raw-text matching). Covers
pure-function branches (cache present/absent, parseError, rate-limit
suppression, missing version fields), end-to-end hook invocation against
fixture cache states, and install.js wiring (prompt text, input parsing,
hook entry shape).
- tests/trae-install.test.cjs — updated install() return-shape assertion to
include updateBannerCommand: null for the no-settings runtime.
- 6881/6881 tests pass.
Docs (bundled in same commit per the bundle-docs-with-code skill):
- docs/USER-GUIDE.md — new "Surface GSD Update Notifications Without GSD's
Statusline" task section with opt-in/opt-out instructions.
- docs/FEATURES.md — REQ-HOOK-08 added; "Update Banner" subsection under
the Hook System feature with cache flow + removal path.
- docs/INVENTORY.md — hook count 11 → 12, new row for gsd-update-banner.js.
- docs/INVENTORY-MANIFEST.json — regenerated.
Closes#2795
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(install): gate banner prompt on actual installability (CR #3035)
CodeRabbit findings on PR #3035:
- bin/install.js (Major): continueAfterStatusline gated banner prompt on
the raw `shouldInstallStatusline` flag from handleStatusline. But
finishInstall later silently skips the statusline write on local
installs unless --force-statusline is set (#2248). Two consequences:
1. Interactive local Claude/Gemini installs got neither a statusline
nor a banner offer.
2. Codex/Cursor/Copilot/Windsurf/Trae/Cline-only installs (where
every result.updateBannerCommand is null) still got prompted even
though the choice was silently ignored.
Fix: derive willInstallStatusline = shouldInstallStatusline &&
(isGlobal || forceStatusline), and gate the banner prompt on a
canInstallBanner precondition computed from results[].updateBannerCommand.
Pass the raw shouldInstallStatusline through to finalize unchanged so
per-runtime statusline gating in finishInstall is unaffected.
- tests/feat-2795-update-banner.test.cjs (Minor): rate-limit suppression
test parsed r1.stdout without first asserting r1.status === 0. Other
e2e tests in this file (lines 210, 241) do this. A non-zero exit would
surface as a cryptic SyntaxError instead of a status assertion failure.
Fix applied verbatim.
6881/6881 tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Closed on the technical merits: the determinism claim is theoretical (no
observed misinterpretation), token waste is small and unmeasured, and PR
#2279's orchestrator-embedding path already serves the deterministic-gating
need without a parallel templating subsystem.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reporter did not return to clarify the actual ask after the narrowing-then-
retraction in the comment thread. Closing as wontfix per .out-of-scope/
temporal-context.md with re-open criteria spelled out.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* docs(#3025): MCP tool schema as a context-budget concern
Adds documentation covering the largest GSD cost lever that GSD
itself does not own: MCP tool schema injection. Every enabled MCP
server adds its schema to every turn (often 20k+ tokens for
heavyweight servers like browser/playwright, mac-tools, etc.),
which can dwarf whatever `model_profile` tuning saves.
Two doc surfaces (per the bundle-docs-with-code skill depth gradient):
1. get-shit-done/references/context-budget.md
- New "MCP Tool Schema Cost (Harness Concern)" section.
- Explains schemas-per-turn cost framing.
- Names enabledMcpjsonServers / disabledMcpjsonServers and
.claude/settings.json explicitly.
- Pre-phase audit checklist: browser/playwright, platform-specific,
cross-project/stale, duplicate/shadow.
- Explicit "GSD does not manage MCP enablement — harness concern"
statement so users don't hunt for a GSD setting.
- Links to Anthropic Claude Code MCP docs as canonical reference.
- Notes compounding interaction with model_profile (additive levers).
2. docs/USER-GUIDE.md
- New task-oriented "Trim MCP servers to reduce per-turn cost"
section above "Using Non-Claude Runtimes".
- Same checklist condensed.
- Cross-link to context-budget.md for the full reference.
Tests:
- tests/feat-3025-mcp-token-budget-docs.test.cjs (12 cases) parses
both docs into typed semantic-flag records and asserts behavioral
invariants (mentions key, includes audit, names harness, etc.)
rather than substring-matching prose. Adheres to CONTRIBUTING.md
no-source-grep — section can be reworded freely as long as the
required semantics survive.
- Markdownlint pre-flight tests (MD040 fence language, MD056 table
column count) per the bundle-docs-with-code skill so CR can't
ratchet on prose nitpicks across multiple review rounds.
Verification:
- 12/12 pass on regression test
- 6857/6857 full suite (12 net new)
- lint-no-source-grep clean (377 test files)
Companion to #3023 (per-phase-type model map) and #3024 (dynamic
routing). Together they cover the three biggest cost levers users
ask about; this issue covers the one GSD does not own.
Closes#3025
* docs(#3025): batch 3 CR fixes — pr id, relative link, named flag
CodeRabbit on PR #3032 (3 minor — 2 inline + 1 nitpick), all in one
push per the bundle-docs-with-code skill (avoid per-round nitpick
ratchet):
1. Inline (Minor) — .changeset/mcp-token-budget-docs.md:3
`pr: TBD` → `pr: 3032` so changeset tooling can link the entry.
2. Inline (Minor) — docs/USER-GUIDE.md:1101
Used a hardcoded `https://github.com/.../blob/main/...` URL for the
cross-link to `context-budget.md`. Rest of USER-GUIDE.md uses
relative links. Switched to `../get-shit-done/references/context-
budget.md#mcp-tool-schema-cost-harness-concern` so feature-branch
work shows the right content and rename-resilience is preserved.
3. Nitpick — tests/feat-3025-mcp-token-budget-docs.test.cjs:234
The cross-link assertion used an inline `/context-budget/i.test(...)`
while every other invariant in the file lived as a named flag in
`parseMcpBudgetSection`. Per CONTRIBUTING.md no-source-grep, added
`crossLinksContextBudget` to the parser and asserted on
`parsed.crossLinksContextBudget` so the cross-link rule sits next
to its siblings.
Verification:
- 12/12 pass on regression test (no count change; refactor only)
- No source code changes, only docs + tests
* test(#3025): strip inline markdown before phrase-match (CR nitpick)
CodeRabbit caught that the `explainsHarnessNotGsd` primary regex
branch couldn't match "GSD does **not** manage" in
context-budget.md because the markdown bold markers (`**`) sit
between contiguous words. The test passed today only via the
fallback `harness (concern|setting|controlled)` branch — the
primary branch was effectively dead code.
Fix: strip inline markdown emphasis (`**`, `*`, `~~`) and inline-
code backticks before any phrase-matching in `parseMcpBudgetSection`.
All seven flag computations now run against the stripped text so
markdown formatting can't silently invalidate any invariant.
Underscores are intentionally NOT stripped — `model_profile` and
other snake_case identifiers must survive intact for the
mentionsModelProfileInteraction check to find them.
Verification: 12/12 still pass; primary branches now fire on
real markdown content rather than relying on fallbacks.
* test(#3025): guard markdownlint tests against null section (CR nitpick)
CodeRabbit caught that the MD040 and MD056 markdownlint pre-flight
tests called `section.match(...)` and `section.split('\n')`
directly on the value returned by `extractSection`, which returns
null when no matching header is found. If the MCP section is ever
removed (regression), both tests would throw `TypeError: Cannot
read properties of null` instead of producing a clean assertion
failure naming the actual problem.
The semantic tests above are protected because parseMcpBudgetSection
short-circuits to a typed-falsy record on null input. The
markdownlint tests bypassed that guard since they need raw section
text, not parsed flags. Added `assert.ok(section, ...)` preconditions
to both so a missing section produces a meaningful failure message.
No content changes; defensive programming only.
Verification: 12/12 still pass.
* feat(#3024): dynamic routing with failure-tier escalation
Adds a `dynamic_routing` block to .planning/config.json that lets
the resolver start agents on a cheap tier and escalate one tier up
when the orchestrator detects a soft failure (verification
inconclusive, plan-check FLAG, etc.). Solves the "pay Opus rates as
insurance" anti-pattern by making escalation observed-quality-driven.
Architecture:
- AGENT_DEFAULT_TIERS map (light/standard/heavy) — every agent in
MODEL_PROFILES declares a default tier; tests assert coverage
so adding a new agent without updating the map fails CI.
- nextTier(currentTier) helper — light → standard → heavy → heavy
(heavy stays at heavy; can't go further).
- resolveModelForTier(cwd, agentType, attempt) — new resolver. The
orchestrator tracks the attempt counter and passes 0 for the
first spawn, 1+ on escalation. The resolver caps internally at
max_escalations so the orchestrator can blindly bump the counter.
- Schema validation: dynamic_routing.enabled / escalate_on_failure /
max_escalations / tier_models.<light|standard|heavy>. Unknown
tiers and unknown sub-keys rejected at config-set time.
- SDK schema mirror updated to keep CJS/SDK in lockstep (#2653).
Resolution precedence (highest → lowest):
1. model_overrides[<agent>] (full IDs accepted)
2. dynamic_routing.tier_models[<tier>] (NEW; escalation-aware)
3. models[<phase_type>] (#3023 phase-type map)
4. model_profile (per-agent column)
5. Runtime default
Backward compatibility: dynamic_routing is disabled by default
(enabled: false or block omitted). resolveModelForTier short-
circuits to resolveModelInternal in that case, so callers can
adopt unconditionally without breaking existing behavior.
This PR delivers the JS-layer infrastructure: schema + tier map +
resolver. Orchestrator adoption (workflow markdown updates that
detect soft failures and call resolveModelForTier with attempt+1)
is incremental follow-up — verifier / plan-checker / integration-
checker each adopt the protocol when ready.
Tests (23 cases, all structural-IR — no stdout grep):
- Schema invariants: AGENT_DEFAULT_TIERS coverage, VALID_AGENT_TIERS
exact match, every assignment uses a valid tier
- nextTier helper: light→standard→heavy→heavy, null on invalid input
- Disabled mode: no block + enabled:false both no-op (back-compat)
- Enabled mode: attempt=0 returns default tier model, attempt=1
escalates, beyond max_escalations caps, heavy agents stay heavy,
default max_escalations=1 when omitted
- Precedence: per-agent override beats dynamic_routing,
dynamic_routing beats phase-type models
- Validation: every settings key accepted, unknown tiers/sub-keys
rejected, bare `dynamic_routing` rejected as config-set target
Documentation:
- get-shit-done/references/model-profiles.md — full reference section
- docs/CONFIGURATION.md — full settings table + escalation flow
- docs/USER-GUIDE.md — task-oriented "Cheap-by-default" section
- docs/FEATURES.md — config row cross-link
Verification:
- 23/23 pass on regression test
- 6843/6843 full suite (23 net new from 6820)
- lint-no-source-grep clean (376 test files)
- SDK schema mirror keeps CJS/SDK in sync per #2653 parity test
Closes#3024
* fix(#3024): honor escalate_on_failure:false + 3 CR follow-ups
CodeRabbit on PR #3031 (4 findings — 1 Major + 2 Minor + 1 Nitpick):
1. **Major (inline)** — get-shit-done/bin/lib/core.cjs:1668
resolveModelForTier ignored dynamic_routing.escalate_on_failure.
When the user set it to false, escalation should be disabled, but
the resolver only checked attempt/max_escalations. An orchestrator
that always passes attempt+1 on retry would silently escalate
despite the user opting out.
Fix: gate effectiveAttempt on `dr.escalate_on_failure !== false`
so false short-circuits every attempt back to the default tier.
2. **Minor (inline)** — docs/CONFIGURATION.md:123-126
The dynamic_routing rows in the Core Settings table had 4 cells
instead of 5 (missing the Options column), breaking the table
structure. Added explicit Options values for enabled / escalate_on_failure
/ max_escalations rows.
3. **Minor (outside-diff)** — references/model-profiles.md:179-195
"Resolution Logic" sketch was pre-#3024 and didn't include
dynamic_routing in the precedence ladder. Updated to a 6-step
block with dynamic_routing at step 3 (between override and
phase-type).
4. **Nitpick** — tests/feat-3024-dynamic-routing.test.cjs:189+
Tests used `if (lightAgent) { ... }` guards that silent-pass
when AGENT_DEFAULT_TIERS drifts. Replaced all 5 conditional
skips with `assert.ok(lightAgent, '...')` preconditions so a
tier-mapping change surfaces as a test failure.
Plus: 2 new regression tests for the Major fix:
- escalate_on_failure:false caps every attempt at default tier
- escalate_on_failure:true (explicit) still escalates normally
Verification:
- 25/25 pass on regression test (23 prior + 2 escalate_on_failure)
- 6845/6845 full suite (2 net new)
- lint-no-source-grep clean
* docs(#3024): align precedence + add fence language tags (CR follow-up)
CodeRabbit (3 minor):
1. docs/CONFIGURATION.md:691 — "Per-Phase-Type Models → Resolution
precedence" was a 4-step block written pre-#3024; readers got
contradictory rules between the per-phase-type section and the
later dynamic_routing section. Updated to the same 5-step ladder
with dynamic_routing at step 2, and noted that dynamic_routing
is disabled by default so this section's behavior is unchanged
when the kill-switch is off.
2. docs/CONFIGURATION.md:770 — escalation-flow code fence missing
language tag (MD040). Added `text`.
3. references/model-profiles.md:184 — resolution-ladder code fence
missing language tag (MD040). Added `text`.
No code changes; docs only. Verification: regression test still 25/25.
* docs(#3024): clarify precedence prose — five layers, not four (CR nitpick)
CodeRabbit nitpick: the "Per-Phase-Type Models → Resolution
precedence" prose said "The four layers compose..." but the ladder
above lists five (including Runtime default). Also "dynamic_routing
escalates per-attempt above all of them" misreads as suggesting
dynamic_routing wins over model_overrides — actually overrides still
win at step 1.
Reworded top-down so the precedence direction is unambiguous:
- model_profile = base
- models = phase-level override
- dynamic_routing = per-attempt escalation
- model_overrides = per-agent exception (top)
- runtime default = fallback
No code changes; docs only.
* docs(#3024): note escalate_on_failure:false in escalation-flow diagram (CR)
CodeRabbit nitpick: the escalation-flow diagram in
docs/CONFIGURATION.md described the soft-failure → respawn →
tier_models[next_tier_up] path, but didn't surface the
`dynamic_routing.escalate_on_failure: false` kill-switch right next
to it. Users reading the flow diagram (which is the canonical place
to understand attempt behavior) wouldn't see that the kill-switch
overrides the soft-failure branch.
Added a one-paragraph note immediately after the flow listing,
before the tier-sequence example, so the kill-switch is visible
exactly where users decide whether escalation will happen.
No code changes; docs only.
* feat(#3023): per-phase-type model map in .planning/config.json
Adds a new `models` block to .planning/config.json with six phase-type
slots (planning / discuss / research / execution / verification /
completion). Lets users express coarse tuning ("Opus for planning,
Sonnet for the rest") without learning the agent taxonomy.
Resolution precedence (highest → lowest):
1. Per-agent `model_overrides[agent]` (full IDs; targeted exception)
2. Phase-type `models[phase_type]` (NEW; tier alias)
3. Profile table (`model_profile`) (per-agent column)
4. Runtime default
The three layers compose: `models` defaults a phase, `model_overrides`
carves an exception. Phase-type values are tier aliases (opus/sonnet/
haiku/inherit) so the runtime-resolution chain (#2517) stays correct
end-to-end without further branching.
Implementation:
- model-profiles.cjs: new AGENT_TO_PHASE_TYPE map + VALID_PHASE_TYPES
set. Each agent in MODEL_PROFILES gets one phase-type assignment;
tests assert coverage so adding a new agent without updating the
table fails CI.
- core.cjs (resolveModelInternal): inserts phase-type tier lookup
between per-agent override and profile-derived tier. Skips runtime
resolution when the resolved tier is 'inherit' (was previously gated
only on profile === 'inherit'; phase-type can now produce inherit
independently).
- core.cjs (loadConfig): pass `parsed.models` through both code paths
so resolveModelInternal can read it.
- config-schema.cjs + sdk/src/query/config-schema.ts: dynamic-pattern
validator accepts only the six known phase-types; unknown slots
rejected at config-set time.
Backward compat: configs without `models` behave exactly as today.
Tests (15 cases, all structural-IR — no stdout grep):
- Schema: AGENT_TO_PHASE_TYPE coverage, VALID_PHASE_TYPES exact match
- Resolver: phase-type alone; per-agent override beats phase-type;
phase-type beats profile; issue's full example; "inherit"; empty
block is no-op; no block is no-op
- Validation: each of the 6 slots accepted; unknown slot rejected;
bare `models` (no slot) rejected
Verification:
- 15/15 pass on new regression test
- 6808/6808 full suite (5 net new), 0 fail
- lint-no-source-grep clean across 375 test files
Closes#3023
* docs(#3023): document `models` per-phase-type config in user-facing docs
Adds `models` block coverage to the three user-facing docs that ship
with each release:
1. docs/CONFIGURATION.md
- New "Per-Phase-Type Models" section between "Per-Agent Overrides"
and "Non-Claude Runtimes" with:
* full example mixing models + model_overrides
* phase-type → agent mapping table
* resolution-precedence pseudocode
* accepted values (tier alias only)
* "When to use which" decision matrix
* validation behavior + example error
- Added `"models": {}` to the Full Schema snippet
- Added a row for `models.<phase_type>` to the config keys table
(next to model_profile_overrides for adjacency)
2. docs/FEATURES.md
- Added a row for models.<phase_type> in the Configurable Settings
table (right under model_profile)
- Cross-link to CONFIGURATION.md for the full surface
3. docs/USER-GUIDE.md
- New task-oriented "Tuning model cost by phase" section above
"Using Non-Claude Runtimes" — leads with the concrete config
and shows the override pattern (one-shot phase + targeted exception)
- Cross-link to CONFIGURATION.md
Verification:
- 29/29 pass on config-schema-docs-parity + docs-update + new feature
test (parity-check passes, so the config-schema entry I added in the
feature commit is now matched by a docs row)
- 6808/6808 full suite pass
- lint-no-source-grep clean
Doc style follows the same pattern used by the existing model_profile,
model_overrides, and model_profile_overrides sections — example-led,
table-backed, cross-referenced. Each doc surfaces the feature at the
right depth (reference / settings table / task guide).
* fix(#3023): mirror phase-type tier in resolveReasoningEffortInternal (CR Major)
CodeRabbit caught a real Codex correctness bug + 3 minor docs/test issues:
1. **Major (outside-diff)** — resolveReasoningEffortInternal in core.cjs
derived its tier exclusively from the profile table, ignoring the
models.<phase_type> override added in #3023. Failure mode on Codex:
Config: model_profile=balanced, models.execution=opus, agent=gsd-executor
resolveModelInternal: tier=opus → gpt-5.4
resolveReasoningEffortInternal: tier=sonnet → reasoning_effort=medium
↑
WRONG — should be xhigh
(opus tier on Codex)
The runtime received a mismatched (model, effort) pair. Mirrored the
phase-type lookup from resolveModelInternal so both functions derive
from the same tier source. 'inherit' phase-type returns null effort
(no runtime entry maps to 'inherit'; let runtime decide).
2. Minor — .changeset/per-phase-type-models.md `pr: TBD` → `pr: 3030`.
3. Minor (outside-diff) — model-profiles.md "Resolution Logic" section
omitted the new phase-type tier. Updated the 4-step block to a 5-step
block including `models[phase_type]` between override and profile,
plus a paragraph noting that `model` and `reasoning_effort` derive
from the same tier source.
4. Nitpick — added 2 typo-safety tests:
- models.research = "haiku3" (typo) → falls through to profile
- models.research = "openai/gpt-5" (full ID) → falls through to profile
Plus 5 new reasoning_effort tests covering the Major fix:
- exported correctly
- phase-type override flips both model AND effort to same tier
- inherit phase-type returns null effort
- per-agent override still bypasses phase-type for effort
- claude runtime ignores models.* (no effort propagation)
Verification:
- 24/24 pass on regression test (15 original + 2 typo-safety + 5 effort + 2 outside-diff related)
- 6815/6815 full suite (7 net new from 6808)
- lint-no-source-grep clean
The reasoning_effort tests are written semantically (phase-type override
must produce the SAME effort as a profile-only opus config) rather than
hard-coding tier-specific effort strings, so changes to the runtime tier
map don't break them.
* fix(#3023): phase-type override beats profile=inherit (CR Major round 2)
CodeRabbit caught another precedence inversion: when
{ model_profile: 'inherit', models: { execution: 'opus' } }
both resolvers short-circuited on `profile === 'inherit'` BEFORE the
phase-type override could be honored. Result: model returned 'inherit'
and reasoning_effort returned null — both contradicting the documented
precedence where models[phase_type] wins over model_profile.
Fix in resolveModelInternal:
- Compute tier from phase-type FIRST. If phase-type is a valid alias,
it wins. Otherwise, fall back to profile-derived tier OR 'inherit'
(when profile === 'inherit').
- Gate the runtime-resolution branch on `tier !== 'inherit'` (was
`profile !== 'inherit'`) so phase-type=opus can flip runtime mapping
on even when profile=inherit.
- Gate the inherit-return on `tier === 'inherit'` (was
`profile === 'inherit'`).
Fix in resolveReasoningEffortInternal:
- Remove the `if (profile === 'inherit') return null;` early-return.
- Compute tier from phase-type first, fall back to profile. If
phase-type is explicitly 'inherit' OR the resolved tier is 'inherit',
return null (no runtime entry maps to inherit).
Tests added (5 new):
- model: phase-type wins over profile=inherit (with explicit opus, with
haiku for one phase + planner-without-slot still inheriting)
- model: profile=inherit + no models block → all agents inherit (no
regression on existing inherit semantics)
- model: profile=inherit + models block but agent has no slot → that
agent inherits, agents with slots get phase-type tier
- effort: phase-type opus + profile=inherit → produces opus-tier
effort, NOT null (the original bug)
Verification:
- 27/27 pass on regression test (24 prior + 3 model + 1 effort)
- 6820/6820 full suite (5 net new)
- lint-no-source-grep clean
The effort test reads the expected value by running a profile-only opus
config and comparing — semantic check, not hard-coded effort string. So
runtime tier map changes don't break the test.
* fix(#3020): probe user shell PATH at install-time, not just process.env.PATH
The installer's "✓ GSD SDK ready" message was a false positive whenever
the install subprocess's process.env.PATH contained the gsd-sdk shim
but the user's later interactive shells did not. Three known sources
of mismatch on POSIX:
- ~/.local/bin: install subprocess inherits npm/npx-injected PATH;
user's login shell may not add ~/.local/bin if .profile/.bashrc/
.zshrc don't.
- nvm/fnm/volta: node version managers shim PATH per-shell, so
`npm prefix -g` from inside the install subprocess can resolve to
a different bin dir than the user's interactive shell sees.
- npm-prefix tooling: some installers inject extra PATH entries that
vanish in fresh sessions.
Result reported on #3011 by @x0rk and @stefanoginella: install prints
✓, but every workflow invocation later fails with
"bash: gsd-sdk: command not found".
Fix:
- isGsdSdkOnPath(pathString?) — now accepts an explicit PATH string.
Zero-arg form preserves existing behavior (reads process.env.PATH).
Pure walk, no spawn. Lets callers verify against any PATH source.
- getUserShellPath() — new helper. Probes the user's login shell via
`$SHELL -lc 'printf %s "$PATH"'` (POSIX). 2-second timeout so a
misconfigured rc file can't hang the install. Returns null on
Windows (cross-shell PATH probing requires a different strategy
per Git Bash / PowerShell / cmd.exe — tracked separately) or when
the probe fails; callers fall back to process.env.PATH in that case.
- installSdkIfNeeded() — after the existing isGsdSdkOnPath() check
passes, also verify the shim is reachable from getUserShellPath()
on POSIX. If install-PATH and user-shell-PATH disagree, downgrade
to the actionable ⚠ diagnostic from PR #3014 (which has the shim
location, shell-specific PATH-update commands, and an npx fallback
note). Routing affected users into PR #3014's diagnostic is the
point — not silently green-then-red.
Tests:
- bug-3020-install-shell-path-probe.test.cjs (10 tests, structural):
- isGsdSdkOnPath accepts an explicit PATH (true/false on fixture
PATH dirs with/without an executable shim)
- zero-arg form returns a boolean
- empty string PATH → false
- getUserShellPath returns string-or-null
- returns null on Windows
- returns null when $SHELL unset on POSIX
- cross-shell mismatch detection: install-PATH and user-PATH that
differ produce different isGsdSdkOnPath results — the invariant
the install-time check now exploits
- All assertions on structural records, not console output. Adheres
to typed-IR / CONTRIBUTING.md "Prohibited: Raw Text Matching".
Verification:
- 10/10 pass on new regression test
- 6768/6768 pass on full suite (5 net-new tests)
- lint-no-source-grep clean
Windows cross-shell coverage (gsd-sdk.cmd resolves under PowerShell
but not Git Bash without a no-extension sibling) is tracked separately
— this PR is the POSIX-side fix and the Windows scaffolding (the
optional pathString arg on isGsdSdkOnPath) that a Windows fix can
build on.
Closes#3020
* fix(#3020): type-guard pathString, last-line PATH parse (CR)
CodeRabbit on PR #3028 (4 findings — 3 actionable + 1 nitpick):
1. .changeset/install-shell-path-probe.md (2 findings):
- `pr: TBD` → `pr: 3028`
- Doc said `echo $PATH` but impl uses `printf %s "$PATH"` (chosen
to avoid shell-dependent echo behavior, e.g. interpreting `-n`).
Aligned changeset prose with implementation.
2. bin/install.js:9176 — isGsdSdkOnPath(pathString) used
`pathString !== undefined` to gate the explicit-PATH branch, but
getUserShellPath() can return null and `null.split()` throws.
Tightened to `typeof pathString === 'string'` so null / number /
object inputs fall back to process.env.PATH. Added 2 regression
tests covering the null and non-string cases.
3. bin/install.js:9232 — getUserShellPath trimmed entire stdout. A
misconfigured rc file that prints a banner / motd / log line
BEFORE the printf would pollute the result and incorrectly flip
the cross-shell check to false. Take the LAST non-empty line
(PATH itself is single-line) so noise can't hijack the probe.
4. Nitpick: the changeset PR placeholder — covered by (1).
Verification: 12/12 pass on regression test (10 original + 2 new
type-guard tests), 6770/6770 full suite, lint clean.
* docs(#3020): JSDoc references printf %s "$PATH", not echo $PATH (CR)
CodeRabbit caught two stale JSDoc references that still said
`$SHELL -lc 'echo $PATH'` while the implementation uses
`$SHELL -lc 'printf %s "$PATH"'`. echo is undesirable here because:
- POSIX echo's behavior with `-n` / backslash escapes varies across
shells (bash builtin vs /bin/echo vs zsh) and can introduce
trailing-newline pollution that the per-line trim now papers over.
- printf is portable and emits exactly the bytes given.
Synced both stale doc strings:
- bin/install.js:9211 (getUserShellPath JSDoc)
- tests/bug-3020-install-shell-path-probe.test.cjs:27 (header)
No behavior change — implementation already uses printf.
* fix(#3018): codex adapter must stop and ask, not silently default decisions
@jon-hendry: running `\$gsd-discuss-phase 81` in Codex Default mode
proceeded toward writing CONTEXT.md / DISCUSSION-LOG.md / checkpoint
artifacts without surfacing the discussion questions to the user. The
generated Codex skill adapter explicitly told it to do that:
Execute mode fallback:
- When `request_user_input` is rejected (Execute mode), present a
plain-text numbered list and pick a reasonable default.
That instruction is wrong for any workflow whose contract is to
discuss with the user (most prominently `$gsd-discuss-phase`). The
fallback now requires the agent to:
1. STOP. Present the questions as a plain-text numbered list, then
wait for the user's reply.
2. Only proceed without a user answer when one of these is true:
(a) invocation included --auto / --all,
(b) user explicitly approved a default for this question, or
(c) workflow's documented contract permits autonomous defaults.
3. Do NOT write CONTEXT.md, DISCUSSION-LOG.md, PLAN.md, or checkpoint
files until the user has answered or one of (a)-(c) above applies.
Tests:
- bug-3018-codex-discuss-fallback.test.cjs (5 tests, structural-IR):
parses the generated header into sections via regex, asserts on
the Execute-mode-fallback section's content (must contain stop/
wait + plain-text directives, must NOT contain "pick a reasonable
default", must name a permission path, must forbid artifact
writing). No raw text snapshot — the assertions describe the
behavioral invariant, so prose can be reworded without test churn.
- codex-config.test.cjs:128 still passes — section still mentions
"Execute mode" as required.
Verification:
- 5/5 pass on new regression test
- 116/116 pass on bug-3018 + codex-config combined
- 6763/6763 pass on full suite
- lint-no-source-grep clean
Closes#3018
* test(#3018): parse fallback into typed semantic-flag record (CR)
CodeRabbit nitpick on PR #3027: the regression tests grepped the
generated header prose with regex, which is brittle and tests wording
rather than semantics. Per CONTRIBUTING.md "no-source-grep" standard.
Refactored to a structural-IR shape:
- New `parseExecuteModeFallback(section)` walks the section text once
and returns a typed record:
{
ok, sectionLength,
instructsStop, // STOP/HALT/WAIT directive
presentsPlainTextQuestions, // plain-text / numbered list
namesPermissionPath, // --auto / --all / explicit approval
forbidsWritingArtifactsBeforeAnswer, // write-ban + named artifact class
silentlyPicksDefaults, // anti-pattern guard (must be false)
}
- Each positive invariant gets its own test asserting on the parsed
boolean, so a failure points at the exact invariant that broke.
- A final test does a single assert.deepStrictEqual against the full
expected contract — gives a structured diff when any flag flips.
- The artifact-write ban now requires BOTH a "do not write" intent
AND a named artifact class (was a single broad regex), so generic
"do not write" prose elsewhere in the section can't satisfy it.
Verification: 8/8 pass; lint-no-source-grep clean.
* fix(#3019): query --help reaches handler instead of short-circuiting to top-level usage
The query argv parser in sdk/src/cli.ts harvested -h/--help as a global
flag and main() short-circuited dispatch when args.help was true. Net
effect: every `gsd-sdk query <anything> --help` printed top-level USAGE
instead of contextual subcommand help. There was no path for users to
discover what arguments a query subcommand accepts — they had to trigger
"required" errors by trial and error.
Two-layer fix:
1. sdk/src/cli.ts (parseCliArgsQueryPermissive)
- Push -h / --help onto queryArgv instead of consuming them silently,
so the registered handler / gsd-tools.cjs fallback gets to interpret
the flag and render contextual help.
- Only honor the global help flag when there is NO real subcommand to
dispatch to (i.e. queryArgv contains only help flags). Preserves
`gsd-sdk query --help` → top-level USAGE while letting
`gsd-sdk query phase add --help` reach the handler.
2. get-shit-done/bin/gsd-tools.cjs
- Render top-level usage on --help / -h / -? / --usage instead of
erroring with "Unknown flag". The discovery hint in the usage text
points users at the working method (run without args → error names
required arguments) and references #3019 for tracking subcommand-
level help printers.
- --version remains rejected (no discovery use-case).
#1818 anti-hallucination invariant preserved: the destructive command
NEVER executes when --help is present. The new shape returns success:true
+ usage on stdout instead of the old success:false + error on stderr —
both satisfy "destructive command did not run", and the new shape also
restores discoverability.
Tests:
- sdk/src/cli.test.ts: 4 new vitest cases covering #3019 — query argv
parser keeps --help with subcommand, parses -h short flag, preserves
bare `query --help` top-level behavior, preserves --help position when
intermixed with other query flags.
- tests/bug-3019-help-passthrough.test.cjs: 5 node:test cases on the
fallback — bare gsd-tools (no args) errors with usage; --help renders
usage on stdout exit 0; -h same; subcommand --help renders usage; usage
hint mentions discovery method (without prose substring matching —
parses into typed sections).
- tests/bug-1818-unknown-flags.test.cjs: rewritten to assert the new
invariant ("destructive command did not run" + "usage was rendered")
instead of the old shape ("--help is rejected with non-zero exit").
Each destructive test seeds a sentinel artifact (phase dir, slug
output) and asserts it survives.
Verification:
- 47/47 vitest pass on sdk/src/cli.test.ts
- 5/5 pass on tests/bug-3019-help-passthrough.test.cjs
- 8/8 pass on tests/bug-1818-unknown-flags.test.cjs (rewritten)
- 6763/6763 pass on full node:test suite
- lint-no-source-grep clean (0 violations)
Closes#3019
* fix(#3019): SDK fallback forwards plain-text help, broader usage list (CR)
CodeRabbit on PR #3026 (4 findings — 1 Major outside-diff, 2 inline,
1 nitpick):
1. **Major outside-diff** — sdk/src/cli.ts:442-454. The fallback path
that delegates to gsd-tools.cjs called parseCliQueryJsonOutput
(JSON.parse) on stdout. Now that gsd-tools renders plain-text usage
on --help, JSON.parse threw "Unexpected token 'U'". Wrapped the
parse in try/catch — on parse failure, forward the plain stdout
verbatim so subcommand help reaches the user. Regression test:
tests/bug-3019-help-passthrough.test.cjs spawns the built SDK and
asserts `gsd-sdk query phase --help` exits 0, stdout contains the
gsd-tools usage, and stderr does NOT contain a JSON-parse error.
2. .changeset/help-passthrough.md:3 — `pr: TBD` → `pr: 3026`.
3. gsd-tools.cjs:346 (TOP_LEVEL_USAGE):
- Removed self-referencing `#3019` link (immediately stale after
this PR merges).
- Expanded Commands list from 17 → all 47 dispatcher cases:
agent-skills, audit-open, audit-uat, check-commit, commit, …
phase, phases, roadmap, milestone, validate, progress, intel,
graphify, learnings, etc. — the bulk of the surface that was
previously unreachable via --help discovery.
4. Nitpick: `isUsageOutput` was duplicated in bug-1818 and
bug-3019-help-passthrough tests. Moved to tests/helpers.cjs with
structural-comment, removed both duplicates.
Verification: 47/47 vitest pass, 14/14 regression tests pass,
6764/6764 full suite, lint clean.
* test(#3019): use t.skip() instead of bare return when SDK not built (CR)
CodeRabbit follow-up on PR #3026:
The integration test guarded against missing sdk/dist/cli.js with a
bare `return;` — node:test counts that as a passing test (0 assertions
exercised, 0 failures). On a CI checkout that hasn't run the SDK build,
the #3026 regression test silently green-lit and no signal ever
surfaced that the integration check was skipped.
Switched to `t.skip(...)` via the test context parameter so the
omission shows up in the test report. The unit-level fix
(sdk/src/cli.ts) is still covered by vitest, so the skip only affects
the end-to-end spawn-built-SDK check.
Verification: 6/6 pass when SDK is built; 5 pass + 1 skip when not.
* fix(#3017): codex SessionStart hook uses absolute node, not bare 'node'
PR #3002fixed#2979 for settings.json-based managed JS hooks (Claude
Code, Gemini, Antigravity) by routing through buildHookCommand() →
resolveNodeRunner(), emitting the absolute Node binary path so hooks
resolve under GUI/minimal-PATH runtimes (/usr/bin:/bin:/usr/sbin:/sbin)
where nvm/Homebrew/Volta-installed node is not on PATH.
The Codex install path bypassed both helpers — line 7935 of bin/install.js
wrote `command = "node ${path}"` directly into config.toml. So Codex
SessionStart hook still failed with exit 127 ("node: command not found")
under the same minimal-PATH conditions PR #3002 was meant to close.
Fix:
- Add buildCodexHookBlock(targetDir, { absoluteRunner, eol }) — a pure
helper that emits the toml hook block with the absolute runner. Returns
null when absoluteRunner is null so the caller skips registration with
a warning instead of writing a broken bare-node hook.
- Add rewriteLegacyCodexHookBlock(content, absoluteRunner) — mirror of
rewriteLegacyManagedNodeHookCommands for the toml surface, so
reinstall migrates a 1.39.x bare-node config.toml to the absolute form.
Uses basename equality (CODEX_MANAGED_HOOK_BASENAMES set) so user-
authored bare-node hooks are left alone.
- Replace the inline string-concat at line 7935 with a call to the new
helper, threaded with the detected line ending so CRLF files stay CRLF.
- On the codex reinstall path, call rewriteLegacyCodexHookBlock first so
existing bare-node entries get migrated before the new entry is added.
Tests:
- bug-3017-codex-hook-absolute-node.test.cjs (9 tests, all typed-IR):
- buildCodexHookBlock emits absolute runner, parses to expected fields
- returns null on missing runner (caller skips)
- integrates with resolveNodeRunner() in the live process
- rewriteLegacyCodexHookBlock migrates managed bare-node entries
- leaves user-authored bare-node hooks alone (basename allowlist)
- leaves entries with absolute runner unchanged (idempotent)
- returns content unchanged when absoluteRunner is null
- codex-config.test.cjs e2e expectation updated to match new shape:
parsed.hooks.SessionStart[0].hooks[0].command now equals
'"<process.execPath>" "<hookPath>"' instead of 'node <hookPath>'.
Verification:
- 9/9 pass on the new regression test
- 179/179 pass across all codex-touching test files
- 6767/6767 pass on full suite, lint-no-source-grep clean
- Adheres to typed-IR / CONTRIBUTING.md "Prohibited: Raw Text Matching":
parseCodexHookBlock returns a typed record; assertions are on
structured fields (runner, hookPath, type, hasMarker), not stdout regex.
Closes#3017
* test(#3017): tighten runner assertions to exact process.execPath (CR)
CodeRabbit on PR #3022 (3 findings, 2 actionable + 1 nitpick):
1. .changeset/codex-bare-node-fix.md:3 — replace `pr: TBD` with
`pr: 3022` so changeset metadata is traceable.
2. tests/bug-3017-codex-hook-absolute-node.test.cjs:81-146 — the test
asserted `parsed.runner !== 'node'` and `parsed.runner.includes('/node')`,
which would false-positive on any absolute path containing '/node'
(e.g. /Users/x/notnode/foo). Tightened to compare against the EXACT
absolute path supplied by the caller (after stripping toml + JSON
escape layers via a new unescapeRunner() helper). The live-process
integration test now compares against process.execPath exactly. The
rewriteLegacyCodexHookBlock test also uses exact-equality.
3. Nitpick (skipped): use repository's TOML parser for parsing instead
of bespoke regex. The hand-rolled parser is small, scoped, and
fully tested by these structural assertions; pulling in a TOML lib
for tests would create a circular dependency on the SUT (the
installer's own parser). Leaving as-is.
Verification: 9/9 pass on regression test, 6767/6767 full suite, lint clean.
* fix(#3010): post-install message and docs use /gsd-update --reapply
PR #2824 consolidated 86 skills into ~58, removing the standalone
/gsd-reapply-patches command and folding it into a flag on /gsd-update
(/gsd-update --reapply). The 1.39.1 hotfix (#2954) updated help.md
but missed three other surfaces that still recommended the dead form:
1. bin/install.js reportLocalPatches() — runtime emitter shown after
every install with backed-up patches. All branches updated:
- claude/opencode/kilo/copilot: /gsd-update --reapply
- gemini: /gsd:update --reapply
- codex: $gsd-update --reapply
- cursor: gsd-update --reapply (mention the skill name)
2. get-shit-done/workflows/update.md — Step 4 prose and the
check_local_patches block both referenced /gsd-reapply-patches.
Replaced with /gsd-update --reapply (with backticks around the
command per CR feedback for copy/paste UX).
3. Localized docs (en/ja-JP/ko-KR/zh-CN) — 14 files across
ARCHITECTURE.md / COMMANDS.md / FEATURES.md / INVENTORY.md /
USER-GUIDE.md / manual-update.md still listed the removed command.
Tests:
- bug-3010-reapply-patches-references.test.cjs (4 tests): scans
bin/install.js's reportLocalPatches body, every workflow file, and
every doc (excluding CHANGELOG history and help.md's deprecation
notice) for the removed command form, and verifies each runtime
branch emits the consolidated form via captured console output.
- tests/copilot-install.test.cjs:1081-1115 — stale assertions that
hard-coded the removed string updated to assert /gsd-update --reapply.
Verification: 115/115 pass across both files.
Co-authored-by: Patrick Clery <patrick@patrickclery.com>
Closes#3010
* test(#3010): broaden dead-command scan + tighten runtime exact-match
CodeRabbit follow-up findings on #3012:
1. Workflow + docs scans only matched "/gsd-reapply-patches", missing
the gemini ("/gsd:reapply-patches") and codex ("$gsd-reapply-patches")
spellings. A regression that re-introduced either form in localized
docs would have passed silently. Extracted a DEAD_COMMAND_PATTERNS
array + findDeadCommands() helper used by both scans, so all three
removed forms are checked uniformly. Match output also reports which
spellings hit, for faster diagnosis.
2. reportLocalPatches runtime test asserted output.includes('update --reapply'),
which is too loose — a malformed prefix like '/gsd:update --reapply' on
the claude branch would have passed. Replaced with an exact
{runtime → expected token} map covering all 7 branches:
claude/opencode/kilo/copilot → /gsd-update --reapply
gemini → /gsd:update --reapply
codex → $gsd-update --reapply
cursor → gsd-update --reapply
Negative assertion also runs DEAD_COMMAND_PATTERNS against output for
every runtime, so dead forms can't slip in regardless of branch.
Verification: 4/4 pass on bug-3010-reapply-patches-references.test.cjs.
* test(#3010): add prefix-absence guard for cursor runtime (CR follow-up)
CodeRabbit (Minor): the cursor expected token "gsd-update --reapply" is
a substring of every prefixed form ("/gsd-update --reapply" for claude/
opencode/kilo/copilot, "\$gsd-update --reapply" for codex). The positive
output.includes(expectedToken) check therefore can't distinguish correct
cursor output from a regression where the installer emits a prefixed
form for cursor — both pass the substring check.
Add an explicit prefix-absence assertion for cursor that fails if any
of /, \$, or : appears immediately before "gsd-update --reapply" in
output. The gemini form ("/gsd:update --reapply") doesn't share the
substring (gsd:update vs gsd-update) so it's already caught by the
positive includes failing on cursor's expected bare token.
Verification: 4/4 pass.
---------
Co-authored-by: Patrick Clery <patrick@patrickclery.com>
* fix(#3011): actionable SDK-not-on-PATH diagnostic with shim location and shell-specific commands
The previous diagnostic was a generic 'GSD SDK files are present but
gsd-sdk is not on your PATH' message with no concrete path or
shell-specific PATH-export command. Windows users reported that they
couldn't find where the shim was written and didn't know how to add
it to PATH for each shell (PowerShell vs cmd.exe vs Git Bash vs WSL
all read PATH from different sources).
New formatSdkPathDiagnostic({ shimDir, platform, runDir }) helper
returns a typed IR:
- shimLocationLine: explicit 'Shim written to: <path>'
- actionLines: platform-specific PATH-export commands
- Windows: 3 lines (PowerShell, cmd.exe, Git Bash with backslash->/
translation for bash compatibility)
- POSIX: 1 line (export PATH=...)
- npxNoteLines: 'you're running via npx ... npm install -g instead'
when runDir is under an _npx cache segment (where the shim may be
written to a temp dir that won't persist for the user's interactive
shell)
- isNpx, isWin32: structured booleans for assertions
Renderer in install.js just emits each line. Tests assert on the
typed IR fields directly (no source-grep, no console-output parsing).
Tests: 12 cases across 5 suites covering Windows shell flavors
(PowerShell preserves backslashes, Git Bash translates to forward),
POSIX exports, null-shimDir fallback to npm install -g advice, npx
detection on both path-separator conventions, and IR shape contract.
Closes#3011
* fix(#3011): cmd.exe guidance uses powershell -Command, not setx
CodeRabbit flagged the cmd.exe action line as a Major Windows
correctness bug:
setx PATH "${shimDir}; %PATH%"
Two failure modes:
1. setx silently truncates the registry value above 1024 chars,
permanently storing the truncated PATH and breaking applications
until restored from the registry backup or fixed manually.
2. %PATH% expands to its current literal value at the moment setx
runs, and the result is written as REG_SZ instead of REG_EXPAND_SZ.
Lazy references like %SystemRoot% are baked in as literals, so
future changes to those variables stop propagating.
Replace with the same SetEnvironmentVariable call already used for
the PowerShell line, invoked through `powershell -Command` so cmd.exe
users get a safe command without us recommending two different APIs.
* fix(#3011): escape shimDir for PowerShell, bash, and POSIX export
CodeRabbit (Minor): a Windows username with a single quote (e.g.
"C:\Users\O'Neil\AppData\Roaming\npm") would interpolate raw into the
suggested commands, producing unparseable shell input the user can't
fix without understanding the bug.
Each shell context needs a different escape:
- PowerShell single-quoted strings: '' is the literal-quote escape.
Apply to both the PowerShell line and the cmd.exe line (which
delegates to PowerShell).
- Git Bash, where the path lives inside an outer single-quoted echo:
'\'' (close-quote, escaped-quote, reopen-quote) embeds a literal
single quote. The slash-conversion (\\ → /) still applies first.
- POSIX export (Linux/macOS) inside double quotes: escape \, $, ",
and backtick so the path is copied verbatim. $PATH lives outside
the escape and still expands at paste time.
Regression test: bug-3011-sdk-path-diagnostic.test.cjs locks in the
expected escape sequence for all three shell flavors.
* test(#2974): migrate 8 test files to typed-IR assertions
Replaces raw stdout/stderr substring matching with structured-field
assertions per CONTRIBUTING.md "Prohibited: Raw Text Matching on Test
Outputs". Adds shared infrastructure for typed error emission so this
pattern is the easy path going forward.
Shared infrastructure:
- core.cjs: ERROR_REASON frozen enum + setJsonErrorMode/getJsonErrorMode
- gsd-tools.cjs: --json-errors CLI flag, parsed before subcommand dispatch
- config.cjs: typed reasons at all 7 error sites
- graphify.cjs: GRAPHIFY_REASON enum + reason/timeout_ms in execGraphify result
- bin/install.js: pure buildSdkFailFastReport() IR builder + renderer
- hooks/gsd-session-state.sh, gsd-phase-boundary.sh: emit Claude Code
hookSpecificOutput JSON envelope with typed state_present/config_mode/
planning_modified/file_path fields (no-op when hooks.community is off)
Test migrations (all pass, 171 tests across the 8 files):
- bug-2649-sdk-fail-fast: assert on ir.reason / ir.context / ir.fix_command
- bug-2687-config-read-warning-parity: assert.equal stderr === ''
- bug-2796-arg-parsing-regression: assert on result.json.updated/.phase
- bug-2838-summary-rescue: parse rescue footer, assert mtime invariant
- bug-2943-config-get-context-window: parse JSON, assert ERROR_REASON.CONFIG_KEY_NOT_FOUND
- graphify: assert reason === GRAPHIFY_REASON.ENOENT/TIMEOUT
- hooks-opt-in: parse hookSpecificOutput, assert typed fields
- security-scan: reclassified as source-text-is-the-product (scan label
output and CI workflow YAML ARE the deployed contract)
Verification: lint-no-source-grep clean (0 violations), full suite
6741/6741 pass.
Closes#2974
* test(#2974): address CR feedback — typed code field, robust idempotency
Two CodeRabbit findings on #3016 addressed:
1. tests/hooks-opt-in.test.cjs:355 (Minor, inline) —
parsed.reason.includes('Conventional Commits') was still substring
matching after the typed-IR migration. Fixed at the source: the
gsd-validate-commit hook now emits a typed `code` field
('CONVENTIONAL_COMMITS_VIOLATION', 'COMMIT_SUBJECT_TOO_LONG')
alongside the human-readable `reason`. Test asserts strictEqual
on the code; the prose copy is no longer part of the test contract.
2. tests/bug-2838-summary-rescue-gitignored-planning.test.cjs:224-250
(Outside-diff) — mtimeMs alone can stay unchanged on coarse-grained
filesystems (HFS+, FAT) when two rewrites land within the same
timestamp tick, falsely passing the idempotency assertion.
Replaced with a full snapshot (mtimeMs, ctimeMs, size, ino, sha256
of contents) compared via assert.deepStrictEqual — the hash
catches any rewrite the timestamp would miss.
Verification: 30/30 pass on the two affected files; lint-no-source-grep
clean (0 violations across 368 test files).
* fix(#2979): emit absolute node path in managed hooks for GUI/minimal-PATH runtimes
Installer-emitted hook commands started with bare 'node' which works
under interactive shells (nvm/Homebrew/Volta on PATH) but fails in
GUI-launched runtimes that start with /usr/bin:/bin:/usr/sbin:/sbin.
Every managed JS hook (gsd-check-update, gsd-statusline, gsd-context-monitor,
gsd-prompt-guard, gsd-read-guard, gsd-read-injection-scanner,
gsd-workflow-guard) failed with /bin/sh: node: command not found —
silently disabling update checks, statusline, and security guards.
Fix: new resolveNodeRunner() helper returns process.execPath (the
absolute path of the Node binary running the installer) forward-slash-
normalized and double-quoted. Used in:
- buildHookCommand() for global installs (.js runner)
- local-install code paths for all 7 managed JS hooks
.sh hooks keep bare 'bash' — /bin/bash is in the POSIX standard PATH
and always resolves under minimal-PATH GUI launches.
Tests: bug-2979-hook-absolute-node.test.cjs parses emitted commands
into { runner, hookPath } records and asserts:
- resolveNodeRunner returns quoted absolute forward-slash node path
- .js hooks emit absolute runner (default and portableHooks modes)
- .sh hooks still emit bare 'bash'
Closes#2979
* chore(#2979): add changeset fragment for PR #3002
* chore(#2979): add changeset fragment for PR #3002
* fix(#2979): resolveNodeRunner returns null on missing execPath; rewrite legacy bare-node managed hooks (CR feedback)
CodeRabbit on PR #3002 caught two issues:
1. resolveNodeRunner fell back to bare 'node' when process.execPath was
empty -- recreating the exact #2979 bug. Now returns null. Callers
(buildHookCommand and the local-install code paths) check for null
and skip registration rather than emit a broken command.
2. The original #2979 fix only updated NEWLY registered hooks. Existing
bare-node managed hook entries from pre-#2979 installs stayed
broken across reinstalls. New rewriteLegacyManagedNodeHookCommands
walks settings.hooks and rewrites any managed-hook entry that starts
with bare 'node ' to use the absolute runner. Filename allowlist
(gsd-check-update.js, gsd-statusline.js, gsd-context-monitor.js,
gsd-prompt-guard.js, gsd-read-guard.js, gsd-read-injection-scanner.js,
gsd-workflow-guard.js) ensures user-authored bare-node hooks are
left untouched.
Tests: bug-2979-hook-absolute-node.test.cjs grows by 8 cases:
- 5 for the migration walker (rewrites managed entries, leaves quoted-
runner entries alone, leaves user-authored entries alone, leaves .sh
entries alone, no-ops on null runner).
- 2 for resolveNodeRunner returning null on empty execPath.
- 1 for buildHookCommand returning null when execPath unavailable.
* chore(#3002): drop direct CHANGELOG.md edit; release entry now lives in .changeset/
The changeset-fragment workflow (#2975) renders fragments into
CHANGELOG.md at release time. Direct edits to [Unreleased] on
each PR caused merge conflicts on every concurrent PR. This commit
restores CHANGELOG.md to match origin/main; the release entry for
this fix is preserved in the .changeset/*.md fragment(s) on this
branch, which the release workflow consolidates.
* fix(#2979): guard hook + statusline pushes against null commands (CR follow-up)
CodeRabbit on PR #3002 found an outside-diff issue: when
resolveNodeRunner() returns null, every dependent *Command becomes
null, but the registration sites still pushed { type: 'command',
command: null } entries onto settings.hooks. The runtime's hook
schema rejects null commands and the failure surfaces as a confusing
parse error.
Fix:
- One unified warning at the top of configureSettings when ANY JS-hook
command resolves null (operator sees the cause once instead of per-hook).
- Each of the 6 managed JS hook registration if-clauses now guards on
the *Command variable being truthy: && updateCheckCommand,
&& contextMonitorCommand, && promptGuardCommand, && readGuardCommand,
&& readInjectionScannerCommand, && workflowGuardCommand.
- Statusline registration adds an else-if (!statuslineCommand) clause
with its own warn before the settings.statusLine write site.
Tests: bug-2979-hook-absolute-node.test.cjs grows by 7 cases
(6 per-hook structural assertions parsing install.js for the
`fs.existsSync(<file>) && <command>` shape, plus 1 statusline
guard-precedes-write test).
* fix(#2979): defense-in-depth validateHookFields before writeSettings (CR)
CodeRabbit on PR #3002 (post-fix-up review): replace source-grep
structural tests with behavioral assertions on the settings object.
The push-site `&& <command>` guards (commit ce696c64) prevent null
commands from being pushed in the first place. As a defense-in-depth
backstop, install.js now runs validateHookFields(settings) right
before writeSettings(); validateHookFields already filters
{type:'command', command: null} entries (line 5884), so even if my
push-site guards ever regress, no null-command entries reach disk.
Tests: replaced the 7 install.js source-grep tests with 8 truly
behavioral tests:
- validateHookFields strips null-command entries for each of the 6
managed JS hook shapes (parameterized by event + matcher)
- validateHookFields drops the entry entirely when all its hooks are
null-command
- validateHookFields preserves agent-type hooks while stripping
null-command sibling hooks in the same entry
These tests exercise the actual function the production code uses,
not its source representation. They survive future refactors of the
registration call sites.
* fix(#2979): tighten managed-hook migration to basename equality (CR)
CodeRabbit on PR #3002 (post-fix-up review): the previous
`trimmed.includes(name)` matcher had a false-positive vector. A
user-authored hook whose path contained a managed filename as a
substring (e.g. /home/me/scripts/wraps-gsd-check-update.js-helper.js)
would be unconditionally rewritten with the GSD runner, replacing
the user's bare `node` with our absolute path -- silently mutating
their hook configuration.
Fix: parse the command into <runner> <script-token> with the
script-token allowed to be quoted (single or double) or bareword.
Extract the path inside quotes, take the basename (handles both
forward and backslash separators on Windows), and match against
MANAGED_HOOK_FILES via Set.has() — exact equality, not substring.
Tests: bug-2979 grows by 4 cases:
- user hook with managed-filename-as-substring is NOT rewritten
- single-quoted path: rewritten correctly
- bareword path: rewritten correctly
- Windows backslash path: basename extraction works
* fix(#2973): /gsd-profile-user writes dev-preferences.md to skills/ not legacy commands/gsd/
v1.39.0's install summary claimed the legacy ~/.claude/commands/gsd/
directory had been removed in favor of skills-only architecture, but
the cmdGenerateDevPreferences writer at profile-output.cjs:781 still
defaulted to the legacy path. Every /gsd-profile-user --refresh
deterministically re-created the legacy directory.
Missed in PR #1540's migration because dev-preferences is a
runtime-generated user artifact, not a GSD-shipped command file.
Fix:
- Writer default: ~/.claude/skills/gsd-dev-preferences/SKILL.md
- profile-user.md Display message + artifact list reference new path
- New migrateLegacyDevPreferencesToSkill(targetDir, saved) installer
helper. Called at all 5 skills-aware install branches. Copies
preserved legacy dev-preferences.md into skills/gsd-dev-preferences/
SKILL.md, but ONLY if no SKILL.md already exists -- never clobbers
user-customized skill content.
Tests: bug-2973-profile-user-skills-path.test.cjs runs the writer in
a subprocess (core.cjs:output uses fs.writeSync(1, ...) which bypasses
in-process stubbing), asserts the writer's command_path field is the
skills location, the file is on disk at that path, the legacy path is
NOT created. Tests for migration helper assert it writes when no skill
exists and skips when one does.
Closes#2973
* chore(#2973): add changeset fragment for PR #3003
* fix(#2973): rephrase comment to avoid cline-install leaked-path lint
The new comment at line 780 of profile-output.cjs literally contained
the string '~/.claude/commands/gsd/' which the cline-install
leaked-path regression test (tests/cline-install.test.cjs:175)
correctly flagged.
Cline transforms .claude/skills/ -> .cline/skills/ in installed .cjs
files but does not transform .claude/commands/. The new comment talks
about the legacy 'commands/gsd' subdirectory without the ~/.claude/
prefix, so the lint passes. The path semantics are unchanged -- the
runtime construction at line 787 still uses path.join(os.homedir(),
'.claude', 'skills', ...) which the lint regex does not match.
* test(#2973): add timeout to spawnSync to prevent CI hangs (CR feedback)
CodeRabbit on PR #3003: without a timeout, a regression that hangs the
writer or dispatcher would block CI indefinitely. Added a 30s timeout
(generous for what should complete in <1s) and an explicit signal
assertion so a timeout trip surfaces as a clear test failure with
context rather than a hung worker.
* test(#2973): add allow-test-rule annotation for legitimate product-text parsing
The new var-binding lint from #2982/#2985 caught readFileSync(...).match()
and readFileSync(...).includes() calls in this test. Both are legitimate
structural assertions against the product workflow markdown, not source-grep:
- match() extracts the path from a structured Display: "..." line and
asserts on the typed path value (same pattern as bug-2470's installer
scanForLeakedPaths regex test).
- includes() asserts the absence of a legacy path literal.
profile-user.md IS the shipped workflow artifact, and its Display: line
IS what the user sees. Per the existing test-rigor convention, this is
the source-text-is-the-product justification category.
Annotated with allow-test-rule citing that category.
* chore(#3003): drop direct CHANGELOG.md edit; release entry now lives in .changeset/
The changeset-fragment workflow (#2975) renders fragments into
CHANGELOG.md at release time. Direct edits to [Unreleased] on
each PR caused merge conflicts on every concurrent PR. This commit
restores CHANGELOG.md to match origin/main; the release entry for
this fix is preserved in the .changeset/*.md fragment(s) on this
branch, which the release workflow consolidates.
* fix(#2973): preserve user-owned gsd-dev-preferences skill across wipe (CR)
CodeRabbit on PR #3003 caught a real bug: copyCommandsAsClaudeSkills()
wipes ALL gsd-* skill directories at the top of every install, then
reinstalls from the package source. Since gsd-dev-preferences is
user-generated (written by /gsd-profile-user --refresh) and NOT
shipped by the npm package, the wipe deletes the user's customized
SKILL.md with nothing to restore from.
Fix: USER_OWNED_SKILLS allow-list in copyCommandsAsClaudeSkills.
Snapshot files under skills/gsd-dev-preferences/ before the wipe,
restore after. Same preserve/restore pattern as PR #1924.
Tests: bug-2973 grows by 2 cases:
- user-customized SKILL.md survives the wipe
- non-user-owned gsd-* skills are still wiped (preservation is opt-in)
* fix(#2990): gsd-code-fixer worktree attaches to a new branch, not the user-checked-out one
The agent's setup_worktree step ran 'git worktree add "$wt" "$branch"'
where $branch was the user's currently-checked-out branch in the main
repo. Git refuses to check out the same branch in two worktrees by
default, so the call failed before any review fix could be applied.
This is the next-layer failure after #2686 (foreground/background race)
and #2839 (transactional cleanup): the isolation strategy was correct
in design, blocked only by git's same-branch protection.
Fix:
- Create a new branch 'gsd-reviewfix/${padded_phase}-$$' from the
current branch tip and attach the worktree to it via
'git worktree add -b "$reviewfix_branch" "$wt" "$branch"'.
- Cleanup tail is now four steps:
1. 'git -C "$main_repo" merge --ff-only "$reviewfix_branch"'
-- captures the agent's commits on the user's branch. --ff-only
fails loudly on divergence (concurrent commits to $branch); the
temp branch is preserved for manual merge.
2. 'git worktree remove "$wt" --force'.
3. 'git -C "$main_repo" branch -D "$reviewfix_branch"' ONLY if
ff-only succeeded.
4. 'rm -f "$sentinel"' last (preserves #2839 transactional ordering).
- Recovery sentinel JSON now records reviewfix_branch alongside
worktree_path so a re-run after interruption cleans both the orphan
worktree and the orphan temp branch.
Regression test: tests/bug-2990-code-fixer-worktree-branch.test.cjs
parses the agent .md into structured 'git worktree add' invocation
records (skipping occurrences inside markdown inline-code or bash
comments -- those are citations of the OLD pattern, not executable)
and asserts the structural invariants on the new pattern.
Closes#2990
* chore(#2990): add changeset fragment for PR #3001
* chore(#2990): add changeset fragment for PR #3001
* fix(#2990): correct main_repo parsing and ff_status capture (CR feedback)
CodeRabbit on PR #3001 caught two real bugs in the cleanup tail:
1. `awk '/^worktree / { print $2 }'` truncates paths containing
spaces. /path/with spaces/repo becomes /path/with. Replaced with
`sub(/^worktree /, ''); print` which strips the prefix and
preserves the full path.
2. `if ! git merge ...; then ff_status=$?` captures the exit of the
`!` operator (always 1 on failure), not the merge command's exit
code. Restructured to `if cmd; then ff_status=0; else ff_status=$?`
so the else-branch captures the real merge exit code.
Tests still pass: bug-2990 structural assertions on the agent .md
content unchanged.
* fix(#2990): recovery extracts reviewfix_branch and deletes orphan branch (CR)
CodeRabbit on PR #3001 found two issues:
1. (Major) Recovery code only extracted worktree_path from the sentinel.
If a prior run died after `git worktree remove` but before
`git branch -D`, the orphan reviewfix branch survived forever. The
sentinel records reviewfix_branch (line 272) and the docs claim
recovery deletes it, but the code didn't.
Fixed: emit BOTH worktree_path and reviewfix_branch from the parser
(newline-separated), capture each into shell vars, and call
`git branch -D "$prior_branch" 2>/dev/null || true` after worktree
removal but before sentinel deletion.
2. (Quick win) The bug-2990 test used regex .test() against the raw
markdown, which would have been satisfied by prose mentioning the
token. Restructured to:
- parseCleanupGitInvocations() returns ordered records with structured
fields (verb, targetsReviewfixBranch, isMergeFfOnly, isBranchDelete)
- assert exactly-one merge --ff-only AND exactly-one branch -D
- assert merge precedes branch-delete in execution order
- parse the sentinel JSON.stringify call to extract field names and
assert reviewfix_branch is among them
Added 2 new tests for the recovery-block invariant: parses the recovery
node -e block and asserts it extracts parsed.reviewfix_branch alongside
parsed.worktree_path; and asserts the recovery shell calls
`git branch -D "$prior_branch"`.
* test(#2990): add allow-test-rule annotation for product-text parsing (CR follow-up)
The lint-tests CI catch flagged md.match() in the new structural-IR
test suite. The .match() calls extract typed fields (cleanup-tail
git invocation records, sentinel JSON field names, recovery-block
node script content) from agents/gsd-code-fixer.md — which IS the
deployed agent product. Asserting on those typed fields tests the
runtime contract, not source code internals.
source-text-is-the-product is the correct classification per the
existing convention (matches thread-session-management.test.cjs and
the others reclassified in PR #2985's CR follow-up).
* chore(#3001): drop direct CHANGELOG.md edit; release entry now lives in .changeset/
The changeset-fragment workflow (#2975) renders fragments into
CHANGELOG.md at release time. Direct edits to [Unreleased] on
each PR caused merge conflicts on every concurrent PR. This commit
restores CHANGELOG.md to match origin/main; the release entry for
this fix is preserved in the .changeset/*.md fragment(s) on this
branch, which the release workflow consolidates.
* fix(#2994): move verify-reapply-patches.cjs to get-shit-done/bin/ so installer ships it
scripts/verify-reapply-patches.cjs (added in #2972 to close the
verified-yes-without-checking gap from #2969) shipped in the npm tarball
but never reached user installs: bin/install.js copies get-shit-done/
recursively but does not copy the top-level scripts/ directory.
Effect: every fresh install hit `Cannot find module …/scripts/verify-reapply-patches.cjs`
on Step 5 of /gsd-reapply-patches. The whole point of moving
verification out of LLM-driven prose into a deterministic script is
undone if the script does not resolve at runtime.
Fix: move the script to get-shit-done/bin/verify-reapply-patches.cjs
(same pattern as gsd-tools.cjs and other runtime bin scripts that the
installer ships) and update reapply-patches.md Step 5 to invoke
${GSD_HOME}/get-shit-done/bin/verify-reapply-patches.cjs.
Tests:
- bug-2969 SCRIPT path updated to the new location
- New bug-2994-verify-reapply-patches-installed-path.test.cjs parses
reapply-patches.md into structured invocation records and asserts
every node ${GSD_HOME}/... reference lives under get-shit-done/
(the installed tree). Catches future regressions where someone moves
a runtime-needed script back to scripts/.
Closes#2994
* chore(#2994): add changeset fragment for PR #3000
* chore(#2994): add changeset fragment for PR #3000
* docs(#2994): update verifier-script-location comment to reflect new path (CR)
CodeRabbit on PR #3000: the parenthetical at line 278 still said the
script ships under scripts/, but this PR moved it to get-shit-done/bin/.
Updated the prose to reference the new location and the installer
target path.
* chore(#3000): drop direct CHANGELOG.md edit; release entry now lives in .changeset/
The changeset-fragment workflow (#2975) renders fragments into
CHANGELOG.md at release time. Direct edits to [Unreleased] on
each PR caused merge conflicts on every concurrent PR. This commit
restores CHANGELOG.md to match origin/main; the release entry for
this fix is preserved in the .changeset/*.md fragment(s) on this
branch, which the release workflow consolidates.
* fix(#2992): deterministic latest-version check — package name is a constant, not LLM choice
The /gsd-update workflow's check_latest_version step was prescribed in
LLM-driven prose: "run `npm view get-shit-done-cc version`". The
executing model could and did shortcut the prescription and invent
npm queries against name-shaped guesses — `@get-shit-done/cli`,
`get-shit-done-cli`, `gsd` — all of which 404 or, worse, return an
unrelated typosquat (the 2016 `get-shit-done` timer package). Same
architectural anti-pattern as #2969 (Hunk Verification Gate where
the LLM filled `verified: yes` without checking).
Implementation built TDD per #2992:
get-shit-done/bin/check-latest-version.cjs
- PACKAGE_NAME = 'get-shit-done-cc' as a module constant; not
parameterised, not exposed for override.
- checkLatestVersion({ spawn? }) returns
{ ok: bool, version?: string, reason: CHECK_REASON.X, detail? }
via a frozen enum: OK / FAIL_NPM_FAILED / FAIL_INVALID_OUTPUT.
- --json mode emits the structured record on stdout for the
workflow to parse via jq.
- Windows-aware: uses { shell: process.platform === 'win32' }
since npm is npm.cmd on Windows (same lesson as #2962).
- Stored under get-shit-done/bin/ (not top-level scripts/) because
that path IS in the user's installed config dir; top-level
scripts/ ships in the npm tarball but is not copied into
~/.claude/ at install time.
tests/bug-2992-check-latest-version.test.cjs
- 7 tests, all assertions on the typed CHECK_REASON enum + the
structured record. Injectable spawn function so no real npm
process is invoked. Covers OK, npm-non-zero, invalid-output,
empty-output, pre-release semver, PACKAGE_NAME constant lock,
enum-shape lock.
get-shit-done/workflows/update.md
- check_latest_version step rewritten to call the script via
`node "${GSD_HOME}/get-shit-done/bin/check-latest-version.cjs"
--json` and parse the structured response with jq. Explicit
"Do NOT run `npm view` or `npm search` directly" guidance
cites #2992 so future contributors understand why.
Closes#2992
* fix(#2992): trailing slash on GSD_HOME default to satisfy bare-path lint
The bug-2470 regression test scans update.md for bare `$HOME/.claude`
references (no trailing slash). The PR added one in the new
check_latest_version step. Fix: trailing slash on the default value
(`${GSD_HOME:-$HOME/.claude/}`). Bash POSIX collapses the resulting
double slash; the lint pattern's negative lookahead is now satisfied.
* fix(#2992): emit GSD_DIR from get_installed_version, use it in check_latest_version
Addresses CodeRabbit feedback: the previous `${GSD_HOME:-$HOME/.claude/}`
fallback hardcoded the Claude runtime path, which silently breaks for
non-Claude runtimes (gemini, codex, opencode, kilo).
Fix:
- get_installed_version now emits a 4th line with the resolved config
dir ($LOCAL_DIR or $GLOBAL_DIR), captured by callers as GSD_DIR.
- check_latest_version uses $GSD_DIR/get-shit-done/bin/check-latest-version.cjs.
Empty GSD_DIR (UNKNOWN scope) skips the version check and falls
through to fresh-install path.
This keeps the package name deterministic (#2992) AND respects the
detected runtime, instead of assuming Claude.
* chore(#2992): add changeset fragment for PR #2993
* chore(#2992): add changeset fragment for PR #2993
* fix(#2992): consolidate LATEST_RESULT parsing inside the GSD_DIR guard
CodeRabbit on PR #2993: the previous structure separated the GSD_DIR
guard from the jq parsing, so when GSD_DIR was empty the parsing block
ran against an unset LATEST_RESULT and produced misleading 'couldn't
check for updates' diagnostics instead of clean 'no_install_detected'.
Move all field assignments inside the conditional so the skip path
seeds LATEST_OK=false, LATEST_VERSION='', LATEST_REASON='no_install_detected',
and LATEST_STATUS=0 atomically.
* fix(#2992): emit GSD_DIR in early-return; add code-block lang and spawnSync timeout (CR)
CodeRabbit on PR #2993 caught three issues:
1. (Major) The early-return path in get_installed_version (PREFERRED_CONFIG_DIR
fast path) only echoed 3 lines, but PR #2993 changed the contract to 4
(GSD_DIR is now line 4). Downstream check_latest_version misread valid
installs as UNKNOWN. Added `echo "$PREFERRED_CONFIG_DIR"` before exit 0.
2. (Minor) Markdown MD040: fenced code block at line 310 was missing a
language identifier. Added ```text.
3. (Quick win) spawnSync('npm view ...') had no timeout, so a hung network
could block /gsd-update indefinitely. Added 15s timeout; on timeout
spawnSync returns with signal !== null and the existing failure path
emits FAIL_NPM_FAILED.
* fix(#3008): kill cross-process race in install-minimal:307 mid-copy test
Old shape compared listTmpStageDirs() snapshots before/after the
mid-copy throw. Under scripts/run-tests.cjs --test-concurrency=4,
tests/install-minimal-all-runtimes.test.cjs runs in a parallel
subprocess and also creates gsd-minimal-skills-* dirs in shared
os.tmpdir(). The parallel process's create/remove activity between
this test's two snapshots caused deterministic failure when timing
aligned -- presented as 'flaky' but is a real race.
CI failure data (PR #2993 run 25238555786):
expected (before): ['gsd-minimal-skills-km1O1O']
actual (after): []
Both processes behaved correctly in isolation. The test was wrong:
it observed a shared filesystem state across processes.
Fix: stub fs.mkdtempSync inside this test to record THIS call's
stage dir path. After the throw, assert fs.existsSync(stagedDir)
=== false. Direct observation of the function's own behavior; no
global tmpdir scan; no parallel-process interference.
Closes#3008
* fix(#2992): distinguish timeout from npm failure; guard empty LATEST_RESULT (CR)
CodeRabbit on PR #2993 (post-fix-up review) caught two improvements:
1. (Low value) check-latest-version.cjs:55-61 — when spawnSync times
out, r.status is null and r.signal is set (e.g. 'SIGTERM'), but
r.stderr is empty. Without the signal-first branch, both timeouts
and genuine npm failures shaped as 'npm exited non-zero' in detail,
making logs ambiguous. Added explicit signal-first branch:
'npm timed out (signal: SIGTERM)'.
2. (Quick win) update.md:284-315 — when node is missing or the script
doesn't exist, LATEST_RESULT is empty. Piping empty to jq parses
without error but leaves LATEST_OK / LATEST_REASON as empty
strings, producing the user-visible diagnostic
'Couldn\'t check for updates (reason: , exit: N)' with a blank
reason. Added an explicit guard that sets LATEST_REASON to
'script_not_found_or_node_unavailable' when LATEST_RESULT is empty,
so operators see a meaningful failure message.
Tests: bug-2992 grows by 2 cases (timeout signal detail + empty
stderr fallback).
* fix(#2997): mask SECRET_CONFIG_KEYS in SDK config-set/get and init responses
The CJS→TS port at sdk/src/query/config-mutation.ts:240,243 and
config-query.ts:122,128,132 dropped the masking layer that secrets.cjs
spec defines for brave_search/firecrawl/exa_search. Result: the SDK
echoed plaintext API keys into machine-readable JSON output (stdout,
transcripts, CI logs).
Adjacent leak in init.ts:673-675 / init.cjs:728-730: the init bundle
passed config.brave_search through raw, leaking the API key whenever
the user had stored one.
Fix:
- New sdk/src/query/secrets.ts ports SECRET_CONFIG_KEYS, isSecretKey,
maskSecret, maskIfSecret. Exact CJS parity (verified by 17 tests
in secrets.test.ts that import secrets.cjs and compare).
- config-set masks value + previousValue in response; on-disk plaintext
intact (key stays usable).
- config-get masks read response. --default flows through unmasked
(user's own input, not stored secret).
- init.ts/init.cjs mask string values only; booleans (availability
flags) pass through unchanged so the typed contract is preserved.
Tests: 17 in secrets.test.ts (including CJS parity), 5 in
config-mutation.test.ts (#2997 block — covers on-disk-preserved,
previousValue masking, short-value, unset, non-secret pass-through),
4 in config-query.test.ts.
Closes#2997
* chore(#2997): add changeset fragment for PR #2999
* chore(#2997): add changeset fragment for PR #2999
* chore(#2999): drop direct CHANGELOG.md edit; release entry now lives in .changeset/
The changeset-fragment workflow (#2975) renders fragments into
CHANGELOG.md at release time. Direct edits to [Unreleased] on
each PR caused merge conflicts on every concurrent PR. This commit
restores CHANGELOG.md to match origin/main; the release entry for
this fix is preserved in the .changeset/*.md fragment(s) on this
branch, which the release workflow consolidates.
Old shape compared listTmpStageDirs() snapshots before/after the
mid-copy throw. Under scripts/run-tests.cjs --test-concurrency=4,
tests/install-minimal-all-runtimes.test.cjs runs in a parallel
subprocess and also creates gsd-minimal-skills-* dirs in shared
os.tmpdir(). The parallel process's create/remove activity between
this test's two snapshots caused deterministic failure when timing
aligned -- presented as 'flaky' but is a real race.
CI failure data (PR #2993 run 25238555786):
expected (before): ['gsd-minimal-skills-km1O1O']
actual (after): []
Both processes behaved correctly in isolation. The test was wrong:
it observed a shared filesystem state across processes.
Fix: stub fs.mkdtempSync inside this test to record THIS call's
stage dir path. After the throw, assert fs.existsSync(stagedDir)
=== false. Direct observation of the function's own behavior; no
global tmpdir scan; no parallel-process interference.
Closes#3008
* fix(#2998): populate gsd-pristine/ from install transform pipeline so verifier has a real baseline
saveLocalPatches declared a pristineDir variable and JSDoc'd 'saves
pristine copies to gsd-pristine/' but no code ever wrote there. Effect:
/gsd-reapply-patches Step 5 verifier (#2972) silently fell back to its
over-broad heuristic ('every significant backup line') -- exactly the
silent-success-on-lost-content failure mode #2969 was designed to
prevent.
Fix: new populatePristineDir({...}) helper runs copyWithPathReplacement
(the install transform pipeline) into a tmp staging dir, then copies out
only the modified-file paths into gsd-pristine/. saveLocalPatches now
accepts a pristineCtx and calls the helper when local patches are
detected. Soft-fails on transform errors (logs warning, continues with
empty pristine -- no worse than pre-fix).
Pristine reflects the about-to-install version's content, which is the
right baseline for 'what would survive without the user's modifications'.
Tests: bug-2998-pristine-dir-populated.test.cjs asserts the helper is
exported, no-ops on empty input, writes one pristine file per source-
existing path, skips ghost paths, and produces deterministic output
(byte-identical across runs -- the property pristine_hashes depends on).
Closes#2998
* chore(#2998): add changeset fragment for PR #3004
* fix(#2998): expand pristine to all manifest install roots; clear stale pristine on populate (CR)
CodeRabbit on PR #3004 caught two issues:
1. populatePristineDir only staged packageSrc/get-shit-done/ but
manifest.files records edits under several install roots (commands/,
agents/, hooks/, skills/, root files like .clinerules). Modified
paths outside get-shit-done/ were silently skipped, leaving the
verifier with no baseline for those edits. Fixed by computing the
set of top-level dirs from the modified set and staging each one
that exists in source. Root-level files (no slash) bypass the
transform pipeline and are copied directly.
2. populatePristineDir did not wipe pre-existing gsd-pristine/ before
populating. A previous run's stale pristine could survive into the
current run's diff baseline. Now wipe before populate AND in the
catch path so soft-failures don't leave half-populated data on disk.
Tests: bug-2998-pristine-dir-populated.test.cjs grows by 2 cases:
- agents/ paths are staged and copied (was silently skipped pre-fix)
- mixed get-shit-done/ + agents/ in same modified list both stage
* feat(#2995): post-install path audit for workflow-invoked scripts
Catches the gap class surfaced by #2994: a workflow references a script
via ${GSD_HOME}/<path> that ships in the npm tarball but is not copied
to the user's config dir at install time. Unit tests don't catch it
because they resolve the script via path.join(__dirname, '..', 'scripts',
…) — the source layout, not the deployed layout.
Implementation built TDD per #2995, vertical slices with structured-IR
assertions:
scripts/audit-workflow-script-paths.cjs
- Pure auditWorkflowScriptPaths({ workflowsDir, repoRoot,
installedPrefixes }) returns { ok, findings: [{ workflow, path,
kind }] } via the AUDIT_FINDING enum.
- Two finding kinds: MISSING_FROM_REPO (typo / file deleted) and
NOT_INSTALLED (#2994 class — first segment outside installed
prefixes).
- Tolerates ${GSD_HOME:-...} default-fallback syntax.
tests/bug-2995-post-install-script-paths.test.cjs
- 9 tests across 3 suites:
• Pure-function pass and per-finding-kind detection (5 tests on
synthetic fixtures).
• Real workflow audit (2 tests asserting the actual repo's
get-shit-done/workflows/ has no NEW gaps and KNOWN_GAPS stays
consistent with audit findings).
• Enum shape lock + extractReferences edge cases.
- All assertions on typed AUDIT_FINDING enum / structured records;
zero raw text matching.
- KNOWN_GAPS is a Set keyed on `workflow|path|kind` strings;
currently contains the #2994 entry. The companion test fails if
a KNOWN_GAPS entry no longer matches a real finding (forces the
allow-list to shrink as gaps fix).
The audit immediately catches #2994's gap on `reapply-patches.md`. The
allow-list contains exactly that entry; new gaps fail CI; #2994's fix
will remove the entry as part of the same PR.
Closes#2995
Refs #2994
* chore(#2995): add changeset fragment for PR #2996
* chore(#2995): add changeset fragment for PR #2996
* fix(#2995): emit both NOT_INSTALLED + MISSING_FROM_REPO; clean up fixture leak (CR)
CodeRabbit on PR #2996 found two issues:
1. (Low value) auditWorkflowScriptPaths short-circuited on NOT_INSTALLED,
masking MISSING_FROM_REPO for the same ref. Removed the `continue` so
both findings emit in one run; added a regression test.
2. (Low value) bug-2995 test created tmpRoot in before() but never wrote
into it; per-fixture mkdtempSync dirs leaked. Rooted fixture repos
under tmpRoot so the after() cleanup actually frees them.
* test(#2986): mutation-killer suite for config-schema.cjs (95 typed assertions)
Stryker measured 4.62% mutation score on config-schema.cjs (6 killed,
124 survived). Surviving mutants documented that existing tests were
exercising paths without verifying outputs.
Adds tests/bug-2986-config-schema-mutation-killers.test.cjs (95 tests,
4 suites) targeting each surviving mutant class:
- M1/M4: parameterized isValidConfigKey(key) === true for every member
of VALID_CONFIG_KEYS. Kills static-key-fast-path mutations
(if (VALID_CONFIG_KEYS.has(...)) return true; -> if (false) return true;)
because no static key matches any DYNAMIC_KEY_PATTERN by design.
- M2: representative dynamic-pattern keys (one per pattern). Each matches
exactly one pattern. Kills .some -> .every mutation: with .every, no
single key matches all patterns -> all dynamic keys would be rejected.
- M3: strictEqual against the literal boolean true/false (not assert.ok
truthy checks). Kills polarity-flip mutations.
- Anchor-tightening: keys that differ from valid by one char beyond the
documented shape (trailing dot-segment, empty agent name, non-enum tier,
etc.). Kills regex-loosening mutations on ^, $, charset boundaries.
Tests assert on typed boolean return values from the lib's public surface.
Zero source-grep, zero raw-text matching.
* chore(#2986): add changeset fragment for PR #3005
* test(#2986): use dynamic-only rep key for features pattern (CR feedback)
CodeRabbit on PR #3005: features.thinking_partner is in the static
VALID_CONFIG_KEYS set, so the static fast-path returns true before
DYNAMIC_KEY_PATTERNS.some() is ever called. A Stryker mutant that
removed only the features entry from DYNAMIC_KEY_PATTERNS would
survive because the test only ever exercised the static path for
that key.
Replaced features.thinking_partner with features.some_dynamic_feature
which is NOT in static keys, so isValidConfigKey must reach the
dynamic path to return true. Added a per-rep invariant that asserts
each representative key is NOT a member of VALID_CONFIG_KEYS,
catching this class of mistake at test time on any future
representative-key change.
The three PR templates still asked contributors to tick `CHANGELOG.md
updated`, contradicting the post-#2978 rule (documented in
CONTRIBUTING.md and enforced by scripts/changeset/lint.cjs) that
`CHANGELOG.md` must not be edited directly.
Each checkbox now references `npm run changeset` with the appropriate
`--type` (Fixed/Changed/Added) and notes the `no-changelog` opt-out
label where applicable, so `gh pr create` users land in the correct
workflow by copy-paste.
Closes#3006
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* feat(#2982): extend no-source-grep lint to catch var-binding readFileSync.includes()
The base lint (scripts/lint-no-source-grep.cjs) only catches
readFileSync(...).<text-method>() chained directly. The much more
common var-binding form escapes it:
const src = fs.readFileSync(p, 'utf8');
// 50 lines later
if (src.includes('foo')) {} // ← still grep, lint missed it
Scan of the test suite found ~141 files using this pattern.
Implementation built TDD per #2982 with structured-IR assertions:
scripts/lint-no-source-grep-extras.cjs
- detectVarBindingViolations(src) — pure detector, two passes:
pass 1 collects vars bound from readFileSync, pass 2 finds any
<var>.<includes|startsWith|endsWith|match|search>( on those vars.
- detectWrappedAssertOkMatch(src) — flags
assert.ok(<expr>.match(...)) which escapes the assert.match rule.
- VIOLATION enum exposes stable codes for tests to assert on.
scripts/lint-no-source-grep.cjs
- Wires the new detectors into the existing per-file check; one
additional violation row per file with the first 3 sample tokens.
tests/bug-2982-lint-var-binding.test.cjs
- 13 tests, all assertions on typed VIOLATION enum / structured
records. Covers all 5 text-match methods, multi-var, no-bind,
string literal (must NOT trigger), wrapped assert.ok(.match),
and assert.match (must NOT double-flag).
Migration backlog (#2974 expanded scope):
- 42 files annotated `// allow-test-rule: source-text-is-the-product`
(legitimate — they read .md/.json/.yml files whose deployed text
IS the product)
- 3 files annotated `// allow-test-rule: pending-migration-to-typed-ir [#2974]`
(read .cjs/.js source — clear migration debt)
- 95 files annotated `pending-migration-to-typed-ir [#2974]` with
`Per-file review may reclassify as source-text-is-the-product
during migration` (mixed — manual review under #2974)
After this lands the lint reports 0 violations on main; new
violations in PRs surface immediately.
Closes#2982
Refs #2974
* test(#2982): fix truncated test name per CR
The label ended with a bare '(' from a copy-paste mishap. Now reads
'does NOT flag .matchAll(...) — matchAll is not match, so
assert.ok(.matchAll(...)) is not flagged'.
* chore(#2982): add changeset fragment for PR #2985
* chore(#2982): add changeset fragment for PR #2985
Both v1.39.0 (stable, tagged 2026-05-01T03:05:33Z) and v1.39.1
(hotfix, tagged 2026-05-01T21:03:54Z) shipped to npm but the
CHANGELOG `[Unreleased]` link still pointed at `v1.38.5...HEAD` and
the entries that landed in v1.39.1 were still un-promoted.
Move the five v1.39.1 hotfix entries (#2917, #2949, #2954, #2962,
#2969) into a new `## [1.39.1] - 2026-05-01` section above
`## [1.38.5]`, with a one-line intro and install snippet matching
the conventions used in earlier dated sections.
Update the `[Unreleased]` link to point at `v1.39.1...HEAD`.
Out of scope (separate cleanup):
- Backfilling a `## [1.39.0]` section. The CHANGELOG never had one;
this PR doesn't make that worse but also doesn't try to invent
release-note text from commit messages.
- The eight v1.39.1 commits without `[Unreleased]` entries
(#2942, #2944, #2924/#2941, #2940, #2947, #2950, #2948, #2957).
These weren't in `[Unreleased]` to begin with; faithful
promotion only moves what was already documented.
- Adding a `docs/RELEASE-v1.39.1.md` file. The `docs/RELEASE-*.md`
pattern in this repo is RC-only; stable patches historically
don't have a counterpart.
The post-v1.39.1 hardening entries (#2980, #2983, #2987 from this
session, plus #2976 which was pre-skipped from the v1.39.1
cherry-pick set after #2980 landed) remain in the new
`[Unreleased]` section — they ship in the next release.
Closes#2989
* feat(#2975): adopt changeset-fragment workflow to eliminate CHANGELOG conflicts
Two PRs that both edit `### Fixed` in CHANGELOG.md always conflict on merge.
Recently bit on #2960/#2972 in the same session — fix-the-conflict-and-rebase
tax. Replace the shared-file model with per-PR fragment files that never
share lines.
Implementation built TDD per #2975, vertical slices with structured-IR
assertions throughout:
scripts/changeset/parse.cjs - fragment text → typed record + frozen
FRAGMENT_ERROR enum (8 tests)
scripts/changeset/render.cjs - fragments → structured IR with
Keep-a-Changelog section ordering
(2 tests)
scripts/changeset/serialize.cjs - IR ↔ markdown round-trip pair
(parse(serialize(ir)) === ir,
3 tests)
scripts/changeset/cli.cjs - file-I/O wrapper with --json mode;
reads .changeset/, folds into
CHANGELOG.md, deletes consumed
fragments. Idempotent. (1 test)
scripts/changeset/lint.cjs - pure verdict (changedFiles, labels)
→ { ok, reason } via LINT_REASON
enum. Honors `no-changelog` label.
(5 tests)
scripts/changeset/new.cjs - fragment scaffolder with random
adjective-noun-noun filename. Tests
assert via parseFragment round-trip.
(3 tests)
Total: 22 tests, all assertions on typed structured fields. No regex on
text, no String#includes on file content. Lint clean across 356 test files.
Supporting:
.changeset/README.md - format spec + workflow docs
.changeset/eager-hawks-rally.md - dogfood fragment for THIS PR (will
be the first thing the new release
tool consumes)
.github/workflows/changeset-required.yml
- CI: every PR runs lint.cjs
package.json - npm run changeset, changelog:render,
lint:changeset
CONTRIBUTING.md - new "CHANGELOG Entries — Drop a
Fragment" section between PR
Guidelines and Testing Standards
Closes#2975
* fix(#2975): address CodeRabbit findings on changeset workflow
7 valid findings (4 Major, 3 Minor); all addressed:
scripts/changeset/parse.cjs
- Preserve fragment body verbatim. Previously body.trim() ate
intentional leading whitespace (code blocks, etc.); now trim() is
used only for the emptiness check, and a single trailing newline
is stripped (the editor-added one) so well-formed fragments
round-trip byte-for-byte. Added a regression test asserting a
code-block-leading body is preserved.
scripts/changeset/cli.cjs
- Validate flag values during argument parsing. parseArgs now returns
{ ok, opts | error }; rejects `--repo` etc. with no following value
or with another flag as the value. main() surfaces the error
message before exiting 2.
- Handle post-write fragment-deletion failures. After CHANGELOG.md
is written, any unlink failure is captured into a structured
deleteFailures list with reason 'fail_fragment_delete'; cmdRender
returns exitCode=1 with the partial-failure detail instead of
leaving the changelog updated and fragments behind (which would
cause double-consumption on rerun).
scripts/changeset/lint.cjs
- Treat CHANGELOG.md as a linted user-facing path. Direct edits to
CHANGELOG.md (the bypass route around the new workflow) now fail
the lint with FAIL_MISSING_FRAGMENT. Added a regression test for
that case.
- Use cp.execFileSync instead of cp.execSync for the git diff call.
Eliminates the shell-interpolation surface on GITHUB_BASE_REF;
git's own arg parser remains the validator.
scripts/changeset/new.cjs
- Atomic fragment creation. existsSync() + writeFileSync was racy
under concurrent invocations. Now writeFileSync uses { flag: 'wx' }
which fails EEXIST on collision; the random-name retry loop
catches EEXIST and re-rolls. Throws explicitly after 16 attempts
rather than silently overwriting.
.changeset/README.md
- Add language tag `md` to the format example fence (markdownlint
MD040).
All 25 changeset tests pass; lint clean (356 test files, 0 violations).
* fix(#2975): sanitize --type and validate flag values in new.cjs (CR fixes)
Two CR findings on scripts/changeset/new.cjs:
1. (Minor) `type` was embedded in frontmatter without sanitization. A
newline in the value (e.g. `--type 'Fixed\ntype: Added'`) would
corrupt the fragment. scaffoldFragment now validates `type` against
the Keep-a-Changelog ALLOWED_TYPES set BEFORE writing — same set
parse.cjs uses on consume. Throws with a typed error referencing
the allowed values; tests cover the newline case + 4 other
non-allowed values.
2. (Minor) `--repo` (and other value-taking flags) without a value
silently set opts.repo to undefined, which produced a cryptic
ERR_INVALID_ARG_TYPE deep inside path.join. parseArgs now mirrors
the cli.cjs convention: returns { ok, opts | error }, validates
that the next token exists and is not itself another flag, and
surfaces a precise "missing value for --repo" message before exit.
Added 3 tests: missing-trailing-value, flag-as-value, well-formed.
29 tests pass across the changeset suite (4 new regression tests).
The `Dry-run publish validation` step ran `npm publish --dry-run` with
no `if:` guard. `npm publish --dry-run` contacts the registry and
exits 1 with "You cannot publish over the previously published
versions" when the target version exists.
The earlier `Detect prior publish (reconciliation mode)` step already
discovers this case and sets steps.prior_publish.outputs.skip_publish=true.
The actual publish step (further down) is gated on that. The
rehearsal step was missing the gate, so any re-run of an
already-published hotfix blew up at the rehearsal before reaching
the reconciliation logic — exactly when an operator is trying to
recover from a later-step failure (merge-back, summary, etc.).
Add `if: ${{ steps.prior_publish.outputs.skip_publish != 'true' }}`
matching the publish step's gate. The rehearsal still runs on first
publishes where it has value.
Trigger: run 25233855236.
Closes#2987
* fix(#2983): classifier exit-code discipline, base-tag staging, drop vestigial merge-back
Three issues surfaced by CodeRabbit's post-merge review of #2981 plus
a production failure on the v1.39.1 release run.
(1) Overloaded classifier exit code
scripts/diff-touches-shipped-paths.cjs reused exit 1 for both the
legitimate "no shipped paths" result and Node's default exit on
uncaught throw, so any classifier failure (corrupt package.json,
EPERM, etc.) was indistinguishable from a normal skip — the workflow's
`if ! ... ; then skip` idiom would silently drop the commit.
Distinct exit codes now:
0 shipped — at least one path is in the npm `files` whitelist
1 not shipped — CI / test / docs / planning only
2 classifier error — workflow MUST fail-fast
uncaughtException + unhandledRejection + try/catch around fs/JSON
parsing all route to exit 2 with stderr context.
(2) Classifier missing at the base tag (CRITICAL)
`Prepare hotfix branch` runs `git checkout -b "$BRANCH" "$BASE_TAG"`
BEFORE the cherry-pick loop, replacing the working tree with the base
tag's contents. Base tags predating #2980 (notably v1.39.0, the most
likely next hotfix base) don't have scripts/diff-touches-shipped-paths.cjs
at all — `node <missing>` exits non-zero — `if !` skips every commit —
empty hotfix branch published. Strictly worse than the original #2980
push-rejection, which at least failed loudly.
Stage the classifier from the dispatched ref's working tree into
$RUNNER_TEMP at the top of the run script (before any working-tree-
mutating git command). The cherry-pick loop now references $CLASSIFIER
(staged) instead of the in-tree path. Sanity guards: refuse to start
if scripts/diff-touches-shipped-paths.cjs is missing in the dispatched
ref, refuse to proceed if cp didn't materialize $CLASSIFIER.
The cherry-pick loop captures node's exit via ${PIPESTATUS[1]} and
dispatches via explicit case:
0 proceed with cherry-pick
1 skip into NON_SHIPPED_SKIPPED
* emit ::error:: + exit "$CLASSIFIER_RC"
(3) Drop the merge-back PR step
Auto-cherry-pick only picks commits already on main (`git cherry HEAD
origin/main` outputs the unmerged ones; we filter fix:/chore: from
main). By construction every code commit on the hotfix branch is
already on main. The only hotfix-branch-only commit is `chore: bump
version to X.Y.Z for hotfix`, which either no-ops against main or
rewinds main's in-progress version. The merge-back PR was vestigial.
It also failed in production on run 25232968975 with `GitHub Actions
is not permitted to create or approve pull requests (createPullRequest)`
— org policy blocks PR creation from the workflow's GH_TOKEN. Even
without that block, the PR would have nothing useful to merge.
Step removed. The `pull-requests: write` permission granted solely
for the merge-back step has been dropped from the release job
(least-privilege).
Regression coverage
tests/bug-2983-classifier-exit-codes-and-base-tag-staging.test.cjs
adds 12 assertions across two describe blocks:
- 5 classifier behavioral: exit 0/1 preserved, exit 2 on missing
package.json, exit 2 on malformed JSON, exit-code constants
exported.
- 7 workflow contract: classifier staged before checkout, target
is $RUNNER_TEMP, missing-source guard, missing-staged guard,
PIPESTATUS-based dispatch, error branch fails workflow, loop uses
staged path (not in-tree).
tests/bug-2980-hotfix-only-picks-shipping-changes.test.cjs updated
where it asserted the pre-#2983 `if ! ... ; then` shape: now accepts
the post-#2983 case-dispatch form. The test still proves the
classifier participates; bug-2983 enforces the specific shape.
Run summary references for the curious reviewer:
- Run 25232010071 — original #2980 trigger (workflow-file push
rejection)
- Run 25232968975 — failed merge-back step that prompted the
"is this even useful?" question that drove the removal
Closes#2983
* fix(#2983): address CodeRabbit findings on PR #2984
Two findings, both real, both fixed.
(1) [Critical] PIPESTATUS capture clobbered by `|| true`
Pre-fix shape:
git diff-tree ... | node "$CLASSIFIER" || true
CLASSIFIER_RC="${PIPESTATUS[1]}"
When the classifier exits 1 ("not shipped" — common case) or 2
(error), `|| true` triggers the right-hand side. `true` is a
one-command "pipeline" that overwrites PIPESTATUS to (0).
${PIPESTATUS[1]} on the next line is therefore unset (or stale
under set -u). The case dispatch then matched the empty string —
falling into `*)` and failing the workflow on every non-shipped
commit, OR matching `0)` after some shells default-init unset
to 0 and silently picking commits that don't ship.
Local repro confirms the issue:
$ bash -c 'set -euo pipefail; false | sh -c "exit 7" || true; \
echo "PIPESTATUS: ${PIPESTATUS[*]}"; \
echo "[1]: ${PIPESTATUS[1]:-<unset>}"'
PIPESTATUS: 0
[1]: <unset>
Fix: bracket the pipeline in `set +e`/`set -e`, snapshot
PIPESTATUS into a local array on the very next line, then
dispatch on the snapshot:
set +e
git diff-tree ... | node "$CLASSIFIER"
PIPE_RC=("${PIPESTATUS[@]}")
set -e
DIFFTREE_RC="${PIPE_RC[0]}"
CLASSIFIER_RC="${PIPE_RC[1]}"
The snapshot must happen on the first line after the pipeline;
any intervening simple command resets PIPESTATUS. The array form
is invariant against that.
Bonus from the new shape: $DIFFTREE_RC is now also captured.
git diff-tree is unlikely to fail on a known-good $SHA, but if
it does, we no longer feed partial/empty input to the classifier
and call it "not shipped." A non-zero DIFFTREE_RC emits
::error::git diff-tree failed and exits.
(2) [Minor] Stale "Merge-back PR opened against main" summary line
The hotfix run summary still printed:
echo "- Merge-back PR opened against main"
But the merge-back step itself was removed in the previous commit
on this branch. Operators reading the summary would expect a PR
that doesn't exist. Replaced with explicit non-action text:
echo "- No merge-back PR (auto-picked commits are already on main)"
Test coverage
bug-2983 test file gains 3 assertions:
- PIPE_RC array-snapshot pattern is required (regex matches the
exact `PIPE_RC=("${PIPESTATUS[@]}")` form).
- The `pipeline || true; ${PIPESTATUS[1]}` antipattern is
explicitly forbidden via assert.doesNotMatch.
- DIFFTREE_RC is captured from PIPE_RC[0] and a non-zero value
triggers ::error::git diff-tree failed.
- Run summary forbids `Merge-back PR opened against main` and
requires the new non-action sentence.
bug-2964 test's loop-anchor window bumped 6 KB → 8 KB to
accommodate the additional pre-pick scaffolding (the test's own
comment had already anticipated this kind of growth, citing prior
precedents from #2970 and #2980).
Mark CodeRabbit comments resolved post-commit.
Refs CR finding ids 3175253571, 3175253578 on PR #2984.
* fix(#2980): pre-skip workflow-file cherry-picks in release-sdk hotfix loop
The default GITHUB_TOKEN issued to the release-sdk run lacks the
`workflow` scope, so the prepare job's `git push origin "$BRANCH"` is
rejected by GitHub when any cherry-picked commit modifies a file under
`.github/workflows/`:
! [remote rejected] hotfix/X.YY.Z -> hotfix/X.YY.Z
(refusing to allow a GitHub App to create or update workflow ...
without `workflows` permission)
Pre-#2980 behavior: the auto_cherry_pick loop happily picked
workflow-file commits, then the trailing push exploded with no clear
signal which commit was the culprit. v1.39.1 hit this on PR #2977
(run 25232010071) — earlier release-sdk fixes (#2965, #2967, #2970)
had been skipped on conflict so their workflow-file changes never
reached the push step, masking the bug; #2977 was the first
workflow-file commit to apply cleanly and the push immediately
exploded.
Fix: pre-pick guard in the cherry-pick loop. Inspect each candidate
commit's file list via `git diff-tree --no-commit-id --name-only -r`
BEFORE attempting the pick. If any path matches `^\.github/workflows/`,
skip the commit, emit a `::warning::` annotation naming the dropped
commit, and append to a new `WORKFLOW_SKIPPED` bucket. The run summary
surfaces this bucket in its own section, distinct from `CONFLICT_SKIPPED`
(real merge conflicts) and `POLICY_SKIPPED` (feat/refactor exclusions),
so operators reviewing the run never confuse the remediation paths.
The loud-warning piece is non-negotiable: silent drops were explicitly
rejected as a failure mode during the option-1/2/3 tradeoff discussion.
If a workflow-file fix genuinely needs to ship in a hotfix, the
operator applies it manually on the hotfix branch using a token with
`workflow` scope, or lands it on main and re-cuts the release.
Regression covered by tests/bug-2980-skip-workflow-file-cherrypicks.test.cjs
(5 assertions: pre-pick guard exists, uses `git diff-tree`, emits
`::warning::`, lands in dedicated bucket, surfaces in summary).
The bug-2964 test's 4 KB window after the cherry-pick-loop anchor was
nudged to 6 KB to accommodate the new pre-pick scaffolding — the test's
own comment had already anticipated this kind of growth (citing #2970's
merge-commit pre-skip as prior precedent).
Closes#2980
* refactor(#2980): replace workflow-file pre-skip with shipped-paths filter
The previous commit on this branch caught only the .github/workflows/*
subset of the bug, treating the symptom (push rejection on workflow-file
changes) rather than the root cause (the fix:/chore: filter is too broad
— it picks any commit with that conventional-commit type even when the
diff cannot affect the published npm package).
CI-only fixes (release-sdk.yml itself, hotfix tooling, test-only
commits) shouldn't flow through hotfix runs at all — they cannot change
what `npm install get-shit-done-cc@X.YY.Z` produces. The
.github/workflows/* push rejection is just the loudest of these
"shouldn't have been picked" cases; tests/, docs/, .planning/ commits
get picked silently with the same lack of effect on consumers.
Replace the workflow-file pre-skip with a shipped-paths filter:
- New scripts/diff-touches-shipped-paths.cjs reads package.json `files`,
plus package.json itself (always-shipped per `npm pack` semantics),
and exits 0 iff any input path is in the shipped set. Lockfile is
not shipped (npm pack excludes it unless explicitly in `files`).
- Workflow loop now pipes `git diff-tree --no-commit-id --name-only -r`
through the classifier; on exit 1 the commit is skipped and
appended to a new NON_SHIPPED_SKIPPED bucket (replaces
WORKFLOW_SKIPPED).
- Run summary surfaces NON_SHIPPED_SKIPPED as informational — no
::warning:: annotation. A non-shipping commit cannot affect the
package, so a yellow alert would imply remediation is possible
and would mislead operators.
The classifier in a separate .cjs file (rather than inline bash
heredoc) is so its rules — directory-prefix vs exact-match,
package.json-always-shipped, lockfile-not-shipped — are unit-testable
in tests/bug-2980-hotfix-only-picks-shipping-changes.test.cjs (11 new
assertions: 4 static workflow + 6 classifier behavioral + 1 mixed-
diff edge case).
Why this dissolves the original push-rejection bug: workflow files
aren't in `files`, so workflow-only commits are skipped pre-pick.
The push step never sees them.
If a workflow-file fix genuinely needs to ship in a hotfix release
(extremely rare — the hotfix workflow is read from main's ref, not
the hotfix branch's), the operator applies it manually using a token
with `workflow` scope. The pre-skip puts that requirement in the run
summary explicitly.
Closes#2980
The release job's "Bump in-tree version (not committed)" step ran
`npm version "$VERSION" --no-git-tag-version` without --allow-same-version,
so on real hotfix runs it failed with `npm error Version not changed` —
because the prepare job had already committed the bump on the hotfix
branch (the release job checks out BRANCH on real runs vs BASE_TAG on
dry-runs, which is why dry-run never caught it).
Pass --allow-same-version to both bumps, matching release.yml:326.
Closes#2976
* fix(#2969): deterministic Step 5 verification gate for /gsd-reapply-patches
The prior Step 5 "Hunk Verification Gate" was prescribed correctly in the
workflow text — but executed laxly by the LLM, which filled in `verified: yes`
without actually checking content presence. The reporter observed three
distinct files (skills/gsd-discuss-phase/SKILL.md, skills/gsd-autonomous/
SKILL.md, get-shit-done/workflows/new-project.md) where archives contained
substantive user-added blocks that did not survive into the merged result, yet
the gate reported clean.
Move verification from LLM-driven prose into a deterministic Node script the
workflow calls. The script can't be shortcut.
Changes:
- scripts/verify-reapply-patches.cjs (new): pure Node, no external deps.
For each file in the patches dir, computes user-added significant lines as
the line-set diff between backup and pristine baseline (when available;
falls back to "every significant backup line" when no pristine — over-broad
but the safe direction for this bug class). Asserts each line appears
literally in the merged installed file via String.prototype.includes.
Filters trivial lines (length < 12 chars, pure punctuation, decorative
comments) so harmless drift doesn't trigger false failures. Exits 0 on
pass, 1 on any miss with per-file diagnostic, 2 on usage error.
Supports --json for workflow consumption.
- get-shit-done/workflows/reapply-patches.md: rewrite Step 5 to call the
script and parse its JSON output. The Step 4 Hunk Verification Table
remains as advisory Claude-readable summary, but the gate is now the
script's exit code.
- tests/bug-2969-verify-reapply-patches.test.cjs (new): 6 tests covering
(a) pass when every line survives, (b) fail when a line is missing,
(c) fail when the merged file is deleted entirely, (d) --json structured
report shape, (e) backup-meta.json is correctly skipped as metadata,
(f) no-pristine-dir fallback exercises the safe over-broad path. All pass.
Out of scope: the manifest-baseline tightening described in #2969 Failure 1
(saveLocalPatches comparing against the wrong baseline so prior silent wipes
poison subsequent updates). That's a separate, bigger architectural change
involving pristine-content infrastructure; this PR addresses the gate fidelity
half so users at least see the diagnostic when content goes missing.
Closes#2969 (partial — Failure 2 only)
* fix(#2969): preserve #1999 Hunk Verification Table assertions alongside new script gate
CI failure on PR #2972 surfaced that tests/reapply-patches.test.cjs (the
#1999 contract) asserts Step 5 references:
- "Hunk Verification Table"
- `verified: no` failure condition
- explicit STOP/halt/abort directive
- "table absent / missing" halt path
My initial Step 5 rewrite for #2969 substituted the deterministic script
for the table-based gate entirely, stripping those references. The script
is the strictly stronger gate, but the existing #1999 test enforces the
table-based safety net as a defense-in-depth contract.
Restore both gates as a layered Step 5:
- 5a (binding): deterministic verifier script — script gate, exits
non-zero on any miss, cannot be shortcut by the LLM
- 5b (advisory): Hunk Verification Table review — preserved as
redundant safety net for the case where the script has a bug or the
pristine baseline is unavailable
Both gates must pass. Verified: tests/reapply-patches.test.cjs (5 tests
in the #1999 suite) and tests/bug-2969-verify-reapply-patches.test.cjs
(6 tests in the #2969 suite) all pass — 21/21 total in this fixture.
* fix(#2969): address CodeRabbit findings on workflow + script
Five CR findings on PR #2972, all valid; addressed in this commit:
1. (Major) Stderr was merged into VERIFY_OUTPUT via `2>&1`, so any Node
warning, deprecation notice, or stack trace would corrupt the JSON
parse downstream. Capture stdout only; stderr remains on the
controlling terminal for operator visibility.
2. (Major) verifyFile() crashed with EISDIR/EACCES instead of producing
a structured diagnostic when the installed path was a directory or
unreadable. Wrap statSync/readFileSync in try/catch and emit a
per-file fail row; the whole-run gate continues with structured
output. Added test case asserting the directory-at-installed-path
case fails with `not a regular file` diagnostic instead of crashing.
3. (Minor) PRISTINE_FLAG built as a single string + unquoted expansion
would split paths with spaces. Switched to a bash array (VERIFY_ARGS)
that preserves whitespace through expansion.
4. (Minor) Fenced code block missing language tag (markdownlint MD040).
Added `text` tag to the error message block.
5. (Minor) Usage comment said pristine fallback was "backup-meta lookup"
but the actual code path falls back to significant-line checks from
backup content. Corrected the comment to match implementation.
Verified all 21 tests in tests/reapply-patches.test.cjs (#1999 contract)
+ tests/bug-2969-verify-reapply-patches.test.cjs (now 7 tests with the
new directory case) pass.
* test(#2969): structured JSON assertions, no substring matching on script output
Replace every assert.match(r.stdout, /pattern/) call with structured
assertions on the parsed JSON report from the script's own --json mode.
The script's --json contract IS the structured shape we test against —
the test author should never depend on the human-readable formatter
output, just as no test should depend on substring presence in source.
Changes:
- All 7 tests now run the verifier with --json (via a runVerifier()
helper) and parse the resulting JSON document into { status, report,
stderr }. Diagnostic stderr is preserved as a separate channel for
debug output but is not used for assertions.
- Each previously substring-matched diagnostic ("Failures: 1",
"not a regular file", "installed file missing after merge",
file path, dropped line) is now a deepEqual / equal / Array.includes
against typed report fields: report.failures, report.results[i].status,
report.results[i].reason, report.results[i].file,
report.results[i].missing[].
- Added an explicit "documented shape" test asserting the JSON output
has exactly the keys { file, missing, reason, status } per result —
locks the public contract of the --json mode.
- DRY'd up fixture reset into a resetFixture() helper since every test
starts with a fresh patches/installed/pristine triple.
Linter: scripts/lint-no-source-grep.cjs reports 0 violations across 348
test files. Combined run of bug-2969-...test.cjs (7 tests) +
reapply-patches.test.cjs (5 tests in the #1999 suite) all pass —
22/22 in the relevant fixture.
* fix(#2969): typed REASON enum + raw-text-matching rule shipped repo-wide
This commit closes the loop on the no-source-grep discipline:
1. scripts/verify-reapply-patches.cjs:
- Frozen REASON enum exposes the diagnostic surface as stable codes:
OK_NO_USER_LINES_VS_PRISTINE, OK_NO_SIGNIFICANT_BACKUP_LINES,
FAIL_INSTALLED_MISSING, FAIL_INSTALLED_NOT_REGULAR_FILE,
FAIL_READ_ERROR, FAIL_USER_LINES_MISSING.
- Each result.reason is now a code from this enum, not free text.
Tests assert via REASON.X equality, not regex on prose.
- REASON exported from module.exports.
2. tests/bug-2969-verify-reapply-patches.test.cjs:
- Full rewrite. Every assertion on typed structured fields:
report.results[0].status === 'fail',
report.results[0].reason === REASON.FAIL_INSTALLED_NOT_REGULAR_FILE,
report.results[0].missing.includes(droppedLine) (Array set membership,
not String substring).
- Locks the REASON enum surface via Object.keys(REASON).sort() deepEqual.
- Locks the JSON report shape via Object.keys(report).sort() deepEqual.
- Zero regex, zero String#includes, zero startsWith/endsWith on text.
3. CONTRIBUTING.md:
- New section "Prohibited: Raw Text Matching on Test Outputs" with
concrete BAD/GOOD examples (substring on file content; assert.match
on stdout; "structured parser" hiding string ops; regex on free-form
reason fields).
- The rule statement: "Tests assert on typed structured values. If
the code under test produces text, the code under test must also
expose a structured intermediate representation, and the test must
assert on that IR — never on the rendered text."
- Required structured-surface table: file IR, --json mode, frozen
enum, fs facts.
- "Hiding grep behind a function is still grep" callout — the
parser-wrapper anti-pattern.
- New `pre-existing-text-matching` exemption category for the 8
grandfathered files. Marked Transitional; new tests cannot use it.
4. scripts/lint-no-source-grep.cjs:
- Three new patterns enforced (in addition to the existing .cjs-source
readFileSync rule):
- assert.match/doesNotMatch on .stdout/.stderr
- .stdout/.stderr.<includes|startsWith|endsWith>(
- readFileSync(...).<includes|startsWith|endsWith>(
- Aggregated violations per file (multiple findings now report together).
- Updated diagnostic message references both CONTRIBUTING.md sections.
5. 8 pre-existing tests annotated with `// allow-test-rule:
pre-existing-text-matching` so the lint passes on this commit; each
carries the prose "Tracked for migration to typed-IR assertions; do
not copy this pattern." Files: bug-2649, bug-2687, bug-2796, bug-2838,
bug-2943, graphify, hooks-opt-in, security-scan.
Verification: lint 0 violations across 348 test files; full suite passes.
* fix(#2969): rename exemption category to pending-migration-to-typed-ir + cite tracking issue
Per maintainer feedback:
1. "Grandfathered" / "legacy" framing is wrong — both terms imply
permanent or condoned exemption. The 8 files are tracked for
correction, not exempted.
2. Each annotated file must cite the tracking issue so the migration
work is auditable.
Changes:
- CONTRIBUTING.md: rename exemption category from
`pre-existing-text-matching` to `pending-migration-to-typed-ir`. Update
prose to "Tracked for correction, not exempted" and require each
annotation to cite the open migration issue (e.g.
`// allow-test-rule: pending-migration-to-typed-ir [#NNNN]`).
- 8 test files: update annotation to cite #2974 (the tracking issue
opened for migrating these files to typed-IR assertions).
* fix(#2962): write npm-style gsd-sdk shim on Windows under --sdk install
trySelfLinkGsdSdk previously contained `if (process.platform === 'win32')
return null;` — a missed gap from #2775's POSIX self-link rather than an
intentional design choice. As a result, `npx get-shit-done-cc@latest
--claude --global --sdk` on Windows left `gsd-sdk` off PATH despite the
installer reporting success, and the obvious recovery (`npm i -g
@gsd-build/sdk`) lands the stale 0.1.0 publication that lacks the `query`
subcommand the agents call ~40 times.
This PR addresses the shim half. The npm-publish half (publishing
@gsd-build/sdk at parity with the get-shit-done-cc version) requires
maintainer credentials and is left for separate action.
Changes:
- bin/install.js: replace the unconditional Windows return-null with
dispatch to a new trySelfLinkGsdSdkWindows() that:
* resolves npm's global bin via `execFileSync('npm', ['prefix', '-g'])`
(no shell interpolation; npm is the only PATH-resolved binary)
* verifies write access with a probe before producing partial state
* writes the standard npm shim triple to npm's global bin:
- gsd-sdk.cmd (cmd.exe; CRLF line endings)
- gsd-sdk.ps1 (PowerShell)
- gsd-sdk (Bash wrapper for Cygwin/MSYS/Git-Bash)
* each shim invokes `node "<absolute path to bin/gsd-sdk.js>"` with the
passed args, decoupling shim location from SDK location — same logical
structure as the POSIX wrapper-via-require() fallback above
* unlinks any stale shims before writing so prior installs don't pin
callers to a now-absent path
* returns the .cmd path on success (handle the existing onPath check
looks for) or null on any failure, falling through to the existing
"gsd-sdk is not on your PATH" warning at line 8704
- tests/bug-2962-windows-sdk-shim.test.cjs (new): 5 tests exercising
trySelfLinkGsdSdkWindows directly with cp.execFileSync mocked to redirect
npm prefix to a temp dir. Asserts shim contents reference the absolute
path, .cmd uses CRLF, stale shims are replaced not appended, and null is
returned when `npm prefix -g` fails.
- tests/no-unconditional-win32-skip.test.cjs (new): regression guard
that fails CI if any future commit re-introduces
`if (process.platform === 'win32') return null;` (or similar
skip-only branches) in bin/install.js. Negative test verified by
transiently re-introducing the bad pattern → guard fired → restored
→ passes.
Out of scope: publishing @gsd-build/sdk@<current> to npm so the natural
`npm i -g @gsd-build/sdk` recovery also lands a usable SDK. That requires
maintainer credentials and is the second half of the issue.
Closes#2962
* fix(#2962): address CodeRabbit findings — execSync for npm.cmd, behavior-based regression guard
CR finding 1 (🟠 Major): Node's child_process docs explicitly call out that
.cmd/.bat files cannot be spawned via execFile/execFileSync without a shell
("Spawning .bat and .cmd files on Windows" section). Since `npm` on Windows
is `npm.cmd`, my use of execFileSync('npm', ['prefix', '-g'], { shell: false })
would have failed on the very platform this PR is meant to fix.
Switched to cp.execSync('npm prefix -g', ...) — matching the existing
convention at line ~8718 which makes the same lookup. Args are static literals
so shell interpolation is not an injection vector.
CR finding 2 (🟠 Major): the source-grep regression test in
tests/no-unconditional-win32-skip.test.cjs violated the repo's no-source-grep
testing standard (CONTRIBUTING.md). Replaced with a behavior-based test that:
- overrides process.platform to 'win32' via Object.defineProperty
- mocks cp.execSync to return a temp-dir as npm prefix
- calls trySelfLinkGsdSdk(shimSrc) and asserts it returns non-null AND
materializes gsd-sdk.cmd on disk
The behavior guard is strictly stronger than the regex version: it would
catch any equivalent skip pattern (e.g. os.platform() === 'win32', a
typeof-based guard, etc.), not just literal `if (process.platform === 'win32')`
text. Negative-tested by re-introducing the `return null` skip → test fails
with maintainer-quoted diagnostic "trySelfLinkGsdSdk must not silently
return null on Windows; a no-op skip is a missed-parity regression"; restored
→ passes.
Test for Windows shim materialization (bug-2962-windows-sdk-shim.test.cjs)
also updated to mock cp.execSync (matching the new production code path)
instead of cp.execFileSync.
Full suite: 6480/6480 pass.
* test(#2962): make Windows shim tests self-contained per CR
Each test now invokes trySelfLinkGsdSdkWindows() itself before reading
the shim files, so they don't implicitly depend on the earlier test's
side effects. Addresses CR's order-dependence finding.
* test(#2962): structured shim parsing — eliminate substring source-grep
CR found that even after the prior refactor, three tests in the suite
still used .includes()/.startsWith() against shim file content
(cmdContent.includes(\`@node ${jsonQuoted} %*\`) etc.). Substring matching
on file text is the same anti-pattern the no-source-grep standard
forbids — even when the file is one this test wrote — because it asserts
a literal exists rather than that the structured shape is correct.
Replace with three small parsers (parseCmdShim, parsePs1Invocation,
parseBashInvocation) that split each shim into header + invocation
tokens and assert via deepEqual on structured records. The assertions
now check that the .cmd has @ECHO OFF / @SETLOCAL / @node <abs> %* in
that order with exactly 3 meaningful lines, and that the .ps1 and bash
wrappers split into the expected (call, nodeCmd, target, argToken)
tuples.
The stale-shim replacement test was hardened the same way: instead of
proving the absence of a sentinel substring, it now proves the parsed
target equals the new shimSrc and != the old path.
Verified: scripts/lint-no-source-grep.cjs reports 0 violations across
348 test files. The 6-test windows-sdk-shim + win32-skip-guard suite
all pass.
* fix(#2962): expose pure shim IR + tests assert on typed fields, not rendered text
Earlier "structured parser" approach (parseCmdShim / parsePs1Invocation /
parseBashInvocation) was still raw-text manipulation behind a function
wrapper — split('\\r\\n'), trim().split(/\\s+/), content.includes('\\r\\n').
Maintainer was right: hiding grep behind a parser is still grep.
Real fix: refactor production code to expose the structured intermediate
representation, and have tests assert on the IR fields directly.
Production:
- New buildWindowsShimTriple(shimSrc) — pure function, no fs/spawn.
Returns { invocation: { interpreter, target }, eol: { cmd, ps1, sh },
fileNames: { cmd, ps1, sh }, render: { cmd: () => string, ... } }.
The IR is the contract; rendered text is an implementation detail of
the renderers.
- trySelfLinkGsdSdkWindows now calls buildWindowsShimTriple, looks up
filenames from triple.fileNames, and writes triple.render[kind]() to
each target. Same observable behavior, structurally separated.
- buildWindowsShimTriple added to test-mode exports.
Tests (full rewrite — no shim file content is read at any point):
- Layer 1: pure-IR tests assert on triple.invocation.target,
triple.eol === { cmd: '\\r\\n', ps1: '\\n', sh: '\\n' },
triple.fileNames === { cmd: 'gsd-sdk.cmd', ... }, and the
documented IR shape via Object.keys().sort() deepEqual.
- Layer 2: fs/spawn driver tests assert filesystem FACTS:
- return value equals expected path
- all three target files exist as regular non-empty files
- rendered file byte length === Buffer.byteLength of triple.render(kind)
output (proves the writer writes what the renderer produces, no
mutation, no truncation, no double-write — without comparing content)
- mtime advances on rewrite (proves stale-replace behavior)
- returns null when npm prefix -g throws
No more split, .includes, .startsWith, .endsWith, or substring matching
anywhere in the test suite. Lint clean. 10/10 tests pass.
* fix(release-sdk): skip all cherry-pick conflicts in hotfix loop
Full-automation policy: any conflict the cherry-pick can't auto-resolve
— context-missing (#2966) or real merge conflict — is now skipped, not
aborted. The hotfix run completes with whatever applies cleanly; the
SKIPPED list in the run summary becomes the operator's post-hoc review
queue.
Surfaced in run 25227493387 (1.39.1 dry-run): commit 0fb992d
("fix(git): add git.base_branch config") produced real conflicts in
config.cjs / ship.md / complete-milestone.md / tests/config.test.cjs.
v1.39.0 was tagged on the feat/hermes-runtime-2841 branch (#2920),
which restructured those files. 0fb992d was authored against the
pre-restructure shape, so cherry-pick can't auto-resolve.
Pre-#2968 behavior: the workflow distinguished context-missing (skip)
from real (abort + push partial + exit 1). Real conflicts blocked every
hotfix from a base tag whose lineage diverged from main — exactly the
v1.39.x situation. The user has called explicitly for full automation:
"this needs to be fully automated, no one is going to sit there and
tag fixes."
Behavior change:
- Both classification branches now `git cherry-pick --skip` and
append to SKIPPED with a reason category:
* "context absent at base" — empty-HEAD markers (#2966)
* "merge conflict — manual review" — non-empty HEAD (#2968)
- Removed: `git cherry-pick --abort`, partial-state push,
"Cherry-pick conflict" GITHUB_STEP_SUMMARY block, `exit 1`.
- Operator's manual recovery path via `auto_cherry_pick=false`
remains intact.
Trade-off (acknowledged in #2968): a critical fix can be silently
dropped if no one reviews the SKIPPED list. The release job's
install-smoke + full test suite still runs and would catch any
test-covered regression. Fixes that aren't test-covered could ship
missing — accepted cost of full automation per the issue.
Tests:
- tests/bug-2968-cherry-pick-skip-on-any-conflict.test.cjs (new) —
extracts the cherry-pick failure block via bash if/fi nesting walk
(no raw-text grep) and asserts the abort path is removed, --skip
is unconditional, and "merge conflict" + "context absent at base"
annotations both exist.
- tests/bug-2966-cherry-pick-context-missing.test.cjs (renamed
describe + first test name) — assertions still valid since the
classifier survives for skip-reason annotation.
- tests/bug-2964-release-sdk-empty-cherry-pick.test.cjs — unchanged
and still green.
Local: `node --test tests/bug-2964-...test.cjs tests/bug-2966-...test.cjs
tests/bug-2968-...test.cjs` → 8/8 pass.
Local: `npm run lint:tests` → 0 violations.
https://claude.ai/code/session_01LApueb9PVs2uSBhsLprVzG
* fix(release-sdk): split cherry-pick conflict skips from policy skips
CodeRabbit flagged on PR #2970 that conflict skips and policy skips
share the SKIPPED bucket. The run summary heading
"Skipped (feat/refactor/etc — not auto-included)" buries manual-review
conflicts (which the operator must triage) under the same list as
intentional policy exclusions (commits that don't match fix/chore by
design and need no action). Operators reviewing the summary can't
distinguish the two without reading every entry.
Split into two variables:
- POLICY_SKIPPED — feat/refactor/docs/etc filtered out by the
fix/chore regex (informational, no action needed)
- CONFLICT_SKIPPED — fix/chore commits whose cherry-pick failed and
were skipped per the full-automation policy (#2968) (manual
review queue)
Run summary now emits two sections with distinct headings:
- "Skipped — cherry-pick conflict (manual review)"
- "Not auto-included (feat/refactor/docs/etc)"
The new bug-2968 test asserts both buckets are populated correctly:
- failure path appends to CONFLICT_SKIPPED, not SKIPPED
- both bucket variables are echoed in the summary
- both section headings are present
Local: `node --test tests/bug-2964-...test.cjs tests/bug-2966-...test.cjs
tests/bug-2968-...test.cjs` → 9/9 pass.
https://claude.ai/code/session_01LApueb9PVs2uSBhsLprVzG
* fix(release-sdk): handle merge commits and guard cherry-pick --skip
CodeRabbit flagged a real major issue on PR #2970: merge commits with
fix:/chore: titles fail BEFORE entering cherry-pick state because they
need `-m <parent>` to specify the diff base. Without it, the cherry-pick
errors out and CHERRY_PICK_HEAD is never created. The unconditional
`git cherry-pick --skip` call that follows then fails too (no in-progress
cherry-pick to skip), bricking the loop — defeating the full-automation
policy this PR set out to deliver.
Two guards added:
1. Pre-skip merge commits before invoking cherry-pick. The loop checks
parent count via `git rev-list --parents -n 1 "$SHA"`; if > 1, the
commit goes straight to CONFLICT_SKIPPED with reason "merge commit —
manual -m parent selection required". Operator decides which parent
to keep when reviewing the run summary.
2. Guard `git cherry-pick --skip` with a CHERRY_PICK_HEAD existence
check. Catches any other failure mode where the cherry-pick aborts
before entering conflict state (unreadable commit, ref problems,
etc.) so the loop still continues cleanly.
Also bumped the bug-2964 test's regex slice window from 2000 to 4000
chars so the merge-commit pre-skip block doesn't push the cherry-pick
line out of the test's match range.
Tests added in tests/bug-2968-cherry-pick-skip-on-any-conflict.test.cjs:
- merge-commit detection: workflow must call
`git rev-list --parents -n 1 "$SHA"` before cherry-pick and annotate
skips with the distinct "manual -m parent selection required"
reason.
- guard: failure block must check CHERRY_PICK_HEAD before --skip.
Local: `node --test tests/bug-2964-...test.cjs tests/bug-2966-...test.cjs
tests/bug-2968-...test.cjs` → 11/11 pass.
https://claude.ai/code/session_01LApueb9PVs2uSBhsLprVzG
* fix(release-sdk): guard awk classifier against degenerate unmerged paths
CodeRabbit raised two issues on PR #2970:
1. Major (workflow): the `awk` classifier runs under `set -euo pipefail`.
If a CONFLICTED path is missing/unreadable, awk exits non-zero and
terminates the entire step — bricking the loop on a degenerate file.
Also, an unmerged path with no `<<<<<<< ` markers (path-level conflict
or anomalous git state) was misclassified as "context absent at base"
(the auto-skip path), letting potentially-real conflicts skip silently.
Fix: before invoking awk, check `[ ! -r "$CONFLICTED" ]` and
`grep -q '^<<<<<<< ' "$CONFLICTED"`. Either failure marks
ALL_EMPTY_HEAD=false → REASON falls through to "merge conflict —
manual review", landing the pick in the operator review queue.
Also added `2>/dev/null || echo "real"` on the awk call so a
transient awk failure can't slip into the auto-skip bucket.
2. Nitpick (tests): regex assertions on `failureBlock` could match
commented lines (e.g. comment text mentioning "CONFLICT_SKIPPED"
or "git cherry-pick --skip" satisfied the assertions without the
real command being present).
Fix: anchor with `^\s*...` + `m` flag so only executable shell lines
count.
Plus a new test asserting all three workflow guards
(`[ ! -r "$CONFLICTED" ]`, `grep -q '^<<<<<<< '`, `awk ... || echo
"real"`) are present in the failure block.
Local: `node --test tests/bug-2964-...test.cjs tests/bug-2966-...test.cjs
tests/bug-2968-...test.cjs` → 12/12 pass.
https://claude.ai/code/session_01LApueb9PVs2uSBhsLprVzG
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* fix(release-sdk): skip cherry-picks whose target context is absent at base
When auto_cherry_pick processed a fix:/chore: commit whose patch modified
code that didn't exist at the hotfix base tag — typically because the
surrounding infrastructure was added later in a feat/refactor commit
excluded by the filter — `git cherry-pick` failed with a conflict that
no operator could meaningfully resolve, and the loop bricked the run.
Discovered re-running the 1.39.1 dry-run after #2965 merged: cherry-pick
of `a3467792` (the #2965 merge itself) failed because the auto_cherry_pick
block it modifies was added in #2956 ("Add automated cherry-pick + SDK-
bundle parity to hotfix flow") — an Add/feat commit, so the fix/chore
filter excludes it. v1.39.0 has no such block, so the patch had no
anchor.
The conflict is unmistakably distinguishable from a real content conflict:
git emits marker blocks where every `<<<<<<< HEAD ... =======` HEAD
section is empty (no anchor lines to reconcile against), while real
conflicts have content on both sides.
After cherry-pick fails:
1. List unmerged paths via `git diff --diff-filter=U`.
2. For each, scan conflict markers with awk. If every HEAD section is
blank/whitespace-only across every block, classify as
context-missing.
3. Context-missing → `git cherry-pick --skip` and append to SKIPPED
list with reason "(context absent at base)".
4. Otherwise fall through to the existing abort/push-partial/error
path that surfaces the conflict for operator resolution.
Real conflicts still surface with the same workflow as before.
Tests in tests/bug-2966-cherry-pick-context-missing.test.cjs cover:
- Static — extracts the "Prepare hotfix branch" run block via
indentation-aware YAML parsing (no raw-text grep) and asserts the
classification predicate, --skip call, and skipped-reason annotation
are present.
- Behavioral — synthetic repo reproducing the real shape of the
failure, asserts cherry-pick exits non-zero and produces the
empty-HEAD marker shape.
- Predicate — pulls the awk script out of the deployed workflow and
feeds it sample conflict shapes (empty-HEAD, real, mixed,
whitespace-only); asserts each is classified as the workflow will
behave.
Local: `node --test tests/bug-2966-...test.cjs` → 3/3 pass.
Local: `npm run lint:tests` → 0 violations.
https://claude.ai/code/session_01LApueb9PVs2uSBhsLprVzG
* fix(release-sdk): pin merge.conflictStyle=merge on hotfix cherry-pick
CodeRabbit flagged on #2967 that the awk classifier introduced for #2966
assumes default conflict-marker style (plain `<<<<<<< HEAD ... ======= ...
>>>>>>>`). If a runner has merge.conflictStyle=diff3 or zdiff3 set
(globally, repo-config, or via git defaults shift), the marker emits an
extra `||||||| ancestor` section between HEAD and =======. The awk's
`in_head` mode would accumulate that ancestor content into the HEAD
buffer, and a context-missing conflict would misclassify as real —
sending the workflow into the abort path on a pick that should be
silently skipped.
Pass `-c merge.conflictStyle=merge` on the cherry-pick command itself
(scoped to that one git invocation; doesn't leak to other commands).
This guarantees marker shape regardless of the runner's git config.
Updated the existing static assertion in
tests/bug-2966-cherry-pick-context-missing.test.cjs to require the pin —
a future edit dropping it fails the test.
Local: `node --test tests/bug-2966-...test.cjs` → 3/3 pass.
https://claude.ai/code/session_01LApueb9PVs2uSBhsLprVzG
* test(#2964): allow git options between `git` and `cherry-pick`
The previous commit on this branch (d6530190) added
`git -c merge.conflictStyle=merge cherry-pick ...` to release-sdk.yml.
The bug-2964 static test's regex `/git cherry-pick[^\n]*"\$SHA"/`
required `cherry-pick` to be the literal next token after `git`, so it
no longer matched the line and CI failed on Node 22 / Node 24 / macOS.
Loosen to `/git\b[^\n]*?cherry-pick[^\n]*"\$SHA"/` so any options
between `git` and `cherry-pick` (e.g. `-c key=value`) are tolerated.
The flag assertions on the matched line still verify --allow-empty and
--keep-redundant-commits are present, which is what bug-2964 actually
guards.
Local: `node --test tests/bug-2964-...test.cjs tests/bug-2966-...test.cjs`
→ 5/5 pass.
https://claude.ai/code/session_01LApueb9PVs2uSBhsLprVzG
* test(#2966): pin merge.conflictStyle in test git wrapper, assert awk status
CodeRabbit raised two issues on PR #2967:
1. The synthetic-repo cherry-pick reproducer asserted `<<<<<<< HEAD ...`
blocks have empty HEAD sections, but the cherry-pick itself didn't
pin `merge.conflictStyle`. A developer or CI runner with global
diff3/zdiff3 config would inject `||||||| ancestor` lines into the
HEAD scan and the test would fail for environment reasons rather
than the bug premise. Pin the style on the test's `git()` wrapper
so every git operation in the test is deterministic regardless of
user config.
2. `classify()` ran awk and consumed `r.stdout.trim()` without checking
`r.status` or `r.error`. A failed awk invocation (missing binary,
syntax error, signal) returns empty stdout, which would falsely
classify as "context-missing" and the test would silently pass on
broken predicates. Add `assert.ok(!r.error, ...)` and
`assert.equal(r.status, 0, ...)` before reading stdout.
Local: `node --test tests/bug-2966-...test.cjs tests/bug-2964-...test.cjs`
→ 5/5 pass.
https://claude.ai/code/session_01LApueb9PVs2uSBhsLprVzG
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* fix(#2957): claude+global post-install instructs restart and skill fallback
`npx get-shit-done-cc --claude --global` writes skills to
`~/.claude/skills/gsd-*/SKILL.md` (CC 2.1.88+ format) and removes the
legacy `~/.claude/commands/gsd/`. The post-install message still told
users to type `/gsd-new-project` without mentioning the required Claude
Code restart or the skill-name fallback. On configurations where CC
does not auto-surface skills in the slash menu, users hit "no commands
appear" and assumed the install failed.
Split the post-install message: the existing single-line instruction
stays for every non-Claude runtime and for `--claude --local`. For
`--claude --global` it now reads:
Restart Claude Code, then in any directory either type
/gsd-new-project or ask Claude to run the gsd-new-project skill.
This covers both invocation paths and surfaces the restart requirement.
Add tests/bug-2957-claude-global-postinstall-message.test.cjs as a
regression guard: captures the printed message for claude+global,
claude+local, and opencode+global; asserts content for each. Verified
the test fails on main (pre-fix) and passes after the fix.
Closes#2957
* test(#2957): assert legacy generic instruction is replaced not extended
CodeRabbit flagged that the test would still pass if the new restart/
fallback copy were printed *alongside* the old 'open a blank directory'
instruction. Adding a doesNotMatch assertion proves the claude+global
branch replaces the legacy line rather than appending to it.
* fix(query/agent-skills): emit raw <agent_skills> block instead of JSON-wrapped string
The CLI dispatcher (`cli.ts`) JSON-stringifies all query handler results via
`console.log(JSON.stringify(result.data, null, 2))`. For the `agent-skills`
handler this produced a JSON-quoted string literal — e.g.
`"<agent_skills>\n…</agent_skills>"` — which workflows embedded verbatim via
`$(gsd-sdk query agent-skills gsd-planner)`, breaking all `<agent_skills>`
injection into spawned subagent prompts.
Fix: add an optional `format: 'json' | 'text'` field to `QueryResult`. When a
handler returns `format: 'text'` and `--pick` is not active, the CLI writes the
string directly via `process.stdout.write` instead of JSON-stringifying it.
`agentSkills` sets `format: 'text'` for non-empty blocks.
Regression guard: two new CLI integration tests in `skills.test.ts` spawn the
CLI as a child process and assert that (a) a mapped agent type receives the raw
XML block on stdout and (b) an unmapped agent type produces the existing JSON
empty-string output.
Fixes#2914.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* docs(changelog): add #2917 entry under Unreleased Fixed
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* feat(workflows): hotfix auto-cherry-pick + SDK-bundle parity (#2955)
hotfix.yml:
- create: auto-cherry-picks fix:/chore: commits from origin/main since
BASE_TAG, oldest-first. Patch-equivalents skipped via git cherry.
feat:/refactor: never auto-included. Conflicts halt with offending SHA.
- finalize: install-smoke gate, sdk-bundle/gsd-sdk.tgz parity with
release-sdk.yml, tightened next dist-tag re-point, --latest on gh
release create. SDK package.json bumped in lockstep.
release-sdk.yml:
- New action input (publish | hotfix) and auto_cherry_pick boolean.
- New prepare job branches hotfix/X.YY.Z from highest vX.YY.* tag,
cherry-picks same logic as hotfix.yml, outputs effective ref.
- install-smoke and release consume prepare.outputs.ref.
- Hotfix mode forces tag=latest, opens merge-back PR. Idempotent if
branch already exists.
VERSIONING.md: documents the cumulative-tag invariant
(vX.YY.Z anchors vX.YY.{Z+1}) and both workflow paths.
Closes#2955
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(code-review): wire --fix dispatch and update stale command references (#2947)
* fix(#2893): surface non-canonical plan filenames instead of silently returning zero plans
Reporter saw `plan_count: 0` from `/gsd:execute-phase` even though five
plan files existed on disk. Investigation showed the planner had written
files like `01-PLAN-01-foundation.md`, while `phase-plan-index`'s strict
filter (`f.endsWith('-PLAN.md') || f === 'PLAN.md'`) rejected them
silently — collapsing two distinct states into the same `plans: []`
return:
- directory truly has no plans (legit empty)
- directory has plans but the filter rejected them (user/agent error)
The canonical contract is documented in three places:
- `agents/gsd-planner.md` write_phase_prompt step (lines 1063-1080)
- `commands/gsd/plan-phase.md`
- `references/universal-anti-patterns.md` (rule 26)
It mandates `{padded_phase}-{NN}-PLAN.md` and explicitly forbids
`PLAN-NN.md` / `01-PLAN-01.md` / `plan-NN.md` etc. The strict filter is
correct per that contract. The bug is that the executor never tells the
user when the contract was violated — they just see `plan_count: 0`
with no signal.
Fix: add a diagnostic helper `describeNonCanonicalPlans()` that scans
the phase directory for files matching `*PLAN*.md` (the diagnostic net)
that the canonical filter rejected, excluding legit derivatives like
`*-PLAN-OUTLINE.md` and `*-PLAN.pre-bounce.md`. When offenders exist,
return a `warning` field naming each one and citing the canonical
pattern so the user knows what to rename to.
Wired into the three filter sites:
- `phase-plan-index` (the executor's main entry point)
- `phases list --type plans`
- `find-phase`
The strict filter itself is unchanged — existing canonical plans behave
identically. This is purely a diagnostic that converts silent-empty
into loud-with-actionable-error.
Tests:
- `phase-plan-index returns warning for reporter's exact filename
pattern (`01-PLAN-01-foundation.md`)`
- `truly empty dir does not emit a warning`
- `canonical plans + outline + pre-bounce files do not emit a warning`
Closes#2893
* test(#2893): add parity tests for find-phase and phases list --type plans warnings
CodeRabbit's only finding on the prior commit: I wired the warning into
three filter sites (`phase-plan-index`, `find-phase`,
`phases list --type plans`) but only `phase-plan-index` had test
coverage for the warning shape. The other two paths could silently
diverge during future refactors — exactly the silent-drift class of bug
this fix exists to prevent.
Add four parity tests mirroring the existing two:
- find-phase: non-canonical filenames produce a warning naming each
offender + citing the canonical pattern.
- find-phase: canonical plan + derivative files (PLAN-OUTLINE,
pre-bounce) produce no warning.
- phases list --type plans: same non-canonical case, but assert the
warning is prefixed with `${dir}: ` (this path aggregates across
phase directories so each offender is tagged with its dir).
- phases list --type plans: canonical case, no warning.
`node --test tests/phase.test.cjs`: 98/98 pass (was 94, +4 new).
* docs(changelog): hotfix flow auto-cherry-pick + SDK bundle parity (#2955)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(workflows): address CodeRabbit findings on hotfix flow (#2955)
5 findings, all real:
1. BASE_TAG selection used lexicographic awk compare, breaking on
multi-digit patches (v1.27.10 wrongly < v1.27.2). Fixed in both
hotfix.yml and release-sdk.yml: append TARGET_TAG to candidate list,
sort -V, take preceding entry. Semver-correct.
2,4. Cherry-pick conflict aborted locally with no remote branch to
resolve from. Now the skeleton branch is pushed up-front (real runs);
on conflict we abort, push the partial-pick state with
--force-with-lease, and emit operator instructions in the run summary.
3. release-sdk.yml dry_run exited before cherry-pick, defeating the
purpose. Now dry_run still applies cherry-picks locally (catches
conflicts), just skips push. Downstream install-smoke runs against
BASE_TAG; the cherry-pick verification itself is the dry-run signal.
5. release-sdk.yml release job missing pull-requests: write — gh pr
create for the merge-back PR would have failed under restricted
token defaults. Permission added.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(workflows): CR round 2 — dry-run signal + post-publish reconciliation (#2955)
3 findings, all real:
6. hotfix.yml create dry_run skipped every step (branch creation,
cherry-pick, version bump) — a green dry-run gave no signal at all.
Now the local checkout/cherry-pick/bump always runs; only the git
push calls are gated on dry_run. Conflicts surface in dry-run too.
7,8. "Refuse if version already on npm" preflight hard-failed reruns,
so a transient failure between npm publish and a later step (tag
push, GH release, merge-back PR, dist-tag re-point) left the release
half-shipped with no path to reconcile. Replaced with a
prior_publish detect step that warns and sets skip_publish=true; the
publish step is gated on that flag, but tag/release/PR/dist-tag
continue. GitHub Release create is now idempotent (edit --latest if
already exists).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(workflows): CR round 3 — preserve dry-run cherry-pick history in conflict guidance (#2955)
Dry-run conflict path discarded successful picks with the runner, but
the message told operators to rerun with auto_cherry_pick=false — which
recreates the branch from BASE_TAG and silently loses every pick that
had succeeded before the conflict.
Updated both hotfix.yml and release-sdk.yml: dry-run conflict summary
now lists the lost SHAs and recommends re-running with
auto_cherry_pick=true (real, not dry-run) to materialize the partial
branch on origin. Real-run guidance unchanged.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* fix(#2948): wire spike --wrap-up flag dispatch
Add dispatch block to commands/gsd/spike.md so that /gsd-spike --wrap-up
routes to the spike-wrap-up workflow instead of silently no-oping. Also
add spike-wrap-up.md to execution_context so the runtime can load it, and
update both companion references in workflows/spike.md from the deleted
/gsd-spike-wrap-up entry-point to /gsd-spike --wrap-up.
Fixes#2948
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test(#2948): rewrite dispatch test using parseFrontmatter + section extraction
Replace raw fs.readFileSync + text.includes() / regex assertions with structural
parsing: parseFrontmatter extracts the YAML frontmatter fields and _body,
extractSection pulls named XML blocks, and parseExecutionContextRefs resolves
the @-prefixed workflow references. Assertions now target the argument-hint
frontmatter field, the execution_context @-ref list, and the routing text within
<context>/<process> sections — not arbitrary substrings in the raw file.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test(#2948): tighten dispatch assertion to line-level rule check
Replace the co-occurrence check (dispatchText.includes('--wrap-up') &&
dispatchText.includes('spike-wrap-up')) with line-level assertions that parse
the <process> section's rules array, find the exact '- If it is `--wrap-up`:'
line, verify it includes 'strip the flag' and 'spike-wrap-up', and assert the
'- Otherwise:' fallback still routes to the spike workflow.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test(#2948): anchor parseFrontmatter to line 0 to avoid mid-file --- delimiters
parseFrontmatter was scanning the whole file for the first two '---' lines,
which can match a mid-document horizontal rule as the opening delimiter.
Now requires lines[0].trim() === '---'; returns { _body: content } for files
with no frontmatter, and searches for the closing '---' from line 1 onward.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(#2950): update stale deleted-command references in workflow files
Eight workflow files (help.md, do.md, settings.md, discuss-phase.md,
new-project.md, plan-phase.md, spike.md, sketch.md) referenced command
names removed in #2790. Updated all occurrences to canonical new forms:
/gsd-phase (--insert / --remove), /gsd-capture, /gsd-config (--profile
/ --integrations / --advanced), /gsd-spike --wrap-up,
/gsd-sketch --wrap-up, /gsd-code-review --fix.
Adds regression test (124 assertions) in tests/bug-2950-stale-command-refs.test.cjs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test(#2950): update pre-existing assertions to accept new consolidated command forms
gsd-settings-advanced.test.cjs and settings-integrations.test.cjs were checking
settings.md for the old micro-skill names (/gsd-settings-advanced,
/gsd-settings-integrations). Now that #2950 updates settings.md to use the
consolidated equivalents, broaden the assertions to accept both old and new forms.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test(#2950): require canonical command forms and forbid legacy variants
The broadened OR assertions added to unblock CI were too permissive — they
could pass with legacy names still present. Now assert the canonical form is
present (gsd-config --advanced / gsd-config --integrations) AND the legacy
forms are absent (gsd-settings-advanced, gsd:settings-advanced,
/gsd-settings-integrations).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(#2949): wire sketch --wrap-up flag dispatch
Add dispatch logic to commands/gsd/sketch.md so --wrap-up routes to the
sketch-wrap-up workflow instead of silently falling through to the normal
sketch workflow. Also adds sketch-wrap-up.md to execution_context and
updates companion references in workflows/sketch.md from the deleted
/gsd-sketch-wrap-up command to /gsd-sketch --wrap-up.
Fixes#2949
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(#2949): use exact-match "If it is" instead of "If it contains" for --wrap-up dispatch
Aligns with the established pattern across all consolidated commands
(workspace.md, update.md, progress.md) where the first-token check uses
"If it is `--flag`" for exact equality, not substring matching.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(#2893): surface non-canonical plan filenames instead of silently returning zero plans
Reporter saw `plan_count: 0` from `/gsd:execute-phase` even though five
plan files existed on disk. Investigation showed the planner had written
files like `01-PLAN-01-foundation.md`, while `phase-plan-index`'s strict
filter (`f.endsWith('-PLAN.md') || f === 'PLAN.md'`) rejected them
silently — collapsing two distinct states into the same `plans: []`
return:
- directory truly has no plans (legit empty)
- directory has plans but the filter rejected them (user/agent error)
The canonical contract is documented in three places:
- `agents/gsd-planner.md` write_phase_prompt step (lines 1063-1080)
- `commands/gsd/plan-phase.md`
- `references/universal-anti-patterns.md` (rule 26)
It mandates `{padded_phase}-{NN}-PLAN.md` and explicitly forbids
`PLAN-NN.md` / `01-PLAN-01.md` / `plan-NN.md` etc. The strict filter is
correct per that contract. The bug is that the executor never tells the
user when the contract was violated — they just see `plan_count: 0`
with no signal.
Fix: add a diagnostic helper `describeNonCanonicalPlans()` that scans
the phase directory for files matching `*PLAN*.md` (the diagnostic net)
that the canonical filter rejected, excluding legit derivatives like
`*-PLAN-OUTLINE.md` and `*-PLAN.pre-bounce.md`. When offenders exist,
return a `warning` field naming each one and citing the canonical
pattern so the user knows what to rename to.
Wired into the three filter sites:
- `phase-plan-index` (the executor's main entry point)
- `phases list --type plans`
- `find-phase`
The strict filter itself is unchanged — existing canonical plans behave
identically. This is purely a diagnostic that converts silent-empty
into loud-with-actionable-error.
Tests:
- `phase-plan-index returns warning for reporter's exact filename
pattern (`01-PLAN-01-foundation.md`)`
- `truly empty dir does not emit a warning`
- `canonical plans + outline + pre-bounce files do not emit a warning`
Closes#2893
* test(#2893): add parity tests for find-phase and phases list --type plans warnings
CodeRabbit's only finding on the prior commit: I wired the warning into
three filter sites (`phase-plan-index`, `find-phase`,
`phases list --type plans`) but only `phase-plan-index` had test
coverage for the warning shape. The other two paths could silently
diverge during future refactors — exactly the silent-drift class of bug
this fix exists to prevent.
Add four parity tests mirroring the existing two:
- find-phase: non-canonical filenames produce a warning naming each
offender + citing the canonical pattern.
- find-phase: canonical plan + derivative files (PLAN-OUTLINE,
pre-bounce) produce no warning.
- phases list --type plans: same non-canonical case, but assert the
warning is prefixed with `${dir}: ` (this path aggregates across
phase directories so each offender is tagged with its dir).
- phases list --type plans: canonical case, no warning.
`node --test tests/phase.test.cjs`: 98/98 pass (was 94, +4 new).
* feat(workflows): add atomic Write+commit ordering directive for SUMMARY.md
Adds explicit prompt-ordering language to executor spawn prompts and
plan-execution steps so agents commit SUMMARY.md before emitting any
concluding narrative. Mitigates the truncation-between-Write-and-commit
failure mode that has made the #2070 rescue net load-bearing.
Refs #2806
* fix(workflows): condense REQUIRED ORDER blocks to fit XL budget
The two REQUIRED ORDER directives added in bd1956df pushed
execute-phase.md to 1712 lines, exceeding the 1700-line XL budget.
Collapse each 6-line block into a single line that preserves the
semantic intent (Write SUMMARY.md → commit → narration; no text
between Write and commit; #2070 rescue is not primary defense).
File is now exactly 1700 lines; workflow-size-budget test passes.
* fix(execute-plan): move self-check before commit to preserve atomic Write+commit (#2939)
* fix(install): record commands/gsd in manifest for Claude local + per-runtime --minimal coverage
writeManifest gated commands/gsd/ recording to Gemini, leaving Claude
Code local installs with an incomplete manifest. Audit during #2923
investigation showed every runtime adapter correctly honours --minimal
on disk (6 skills, 0 agents) — but Claude local manifest reported 0
skills, breaking saveLocalPatches() drift detection and any downstream
tooling that reads manifest.files for the installed surface.
Drop the isGemini gate so any runtime that writes commands/gsd/ has
those files hashed into the manifest.
Adds tests/install-minimal-all-runtimes.test.cjs: spawns the installer
end-to-end for all 14 supported runtimes in both --global and --local
modes, parses the manifest JSON, and asserts mode === 'minimal',
skill set equals MINIMAL_SKILL_ALLOWLIST, and zero gsd-* agents are
recorded. Cross-checks the manifest against on-disk skill files.
Closes#2923
* test(install): address CR feedback on bug-2923 minimal-runtime tests
- Assert installer exit status in runInstall() so failing installs do not
produce misleading downstream artifact assertions; include stderr in the
failure message for debuggability.
- Guard the on-disk vs manifest parity loop with assert.ok(manifest, ...)
so the equality check cannot pass accidentally when the manifest is
missing.
* fix(workflows): assert HEAD on per-agent branch before worktree commits
Worktree-mode setup could leave HEAD attached to a protected branch (master),
causing agent commits to land there. The previous response was a destructive
self-recovery via 'git update-ref refs/heads/master <sha>', which silently
rewinds the protected branch and destroys concurrent commits in multi-active
scenarios (parallel agents, user committing while agent runs).
- Reorder <worktree_branch_check> in execute-phase.md and quick.md to assert
HEAD via 'git symbolic-ref' BEFORE any 'git reset --hard'. HALT with a
blocker if HEAD is on main/master/develop/trunk/release/* or detached.
- Add a per-commit HEAD assertion (step 0) to gsd-executor.md
<task_commit_protocol>; HEAD attachment can drift after 'git checkout <sha>'.
- Forbid 'git update-ref refs/heads/<protected>' in
<destructive_git_prohibition>; surface the blocker rather than self-heal.
- Remove '--no-verify' as the worktree-mode default in execute-phase.md,
execute-plan.md, quick.md, and references/git-integration.md. Hooks now
run on every executor commit; opt out only via workflow.worktree_skip_hooks.
- Add regression test that parses the worktree_branch_check blocks structurally
and asserts the symbolic-ref check precedes the reset --hard, no workflow
performs update-ref on a protected ref, and --no-verify is no longer the
default in any parallel-execution prompt.
* fix(#2924): address CodeRabbit review findings on worktree HEAD PR
- Add positive worktree-agent-* allow-list to <task_commit_protocol> step 0
in gsd-executor.md and to <worktree_branch_check> in execute-phase.md and
quick.md. The deny-list (main|master|develop|trunk|release/*) silently
allowed feature/* and other arbitrary branches outside the agent namespace.
- Register workflow.worktree_skip_hooks in both config schemas
(sdk/src/query/config-schema.ts and get-shit-done/bin/lib/config-schema.cjs)
and document it in docs/CONFIGURATION.md so config-set accepts it.
- Fix stash lifecycle in execute-phase.md post-wave hook validation: stash
under a named ref and pop after the hook run; warn on pop failure.
- Pre-dispatch PLAN.md commit in quick.md: gate on git diff --cached --quiet
for idempotency and exit 1 with a clear error on commit failure (both the
--no-verify and the normal branches) — no more swallowing real errors.
- Test fixes (tests/bug-2924-worktree-head-attachment.test.cjs):
- Parse the protected-branch alternation structurally and require
main, master, develop, trunk, release/.* (release/* was previously
skipped by the \\b...\\b regex).
- Use fs.readdirSync(dir, { recursive: true }) so workflows in nested
subdirectories are also asserted against the update-ref ban.
- Add allow-list assertions for execute-phase.md, quick.md, and
gsd-executor.md to lock in the new positive namespace check.
* test(#2924): assert sub-section end marker exists before slicing
* test(#2924): use section boundary instead of fixed window for parallel-agents slice
* fix(config-get): return schema default for context_window when absent (#2943)
cmdConfigGet in bin/lib/config.cjs now consults a SCHEMA_DEFAULTS map before
emitting "Key not found", so context_window (and any future schema-defaulted
keys) return their default value (exit 0) when not set in config.json.
Also updates the stale subagent-timeout.test.cjs assertion that expected the
old broken behavior (exit 1 / "Key not found") to match the corrected behavior.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test: use distinct sentinel to prove --default wins over schema default (#2943)
* docs: update CHANGELOG.md for #2943 fix
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
After v1.39.0 skill consolidation (#2790), skills/ became a GSD-managed
root that the installer wipes on update. GSD_MANAGED_DIRS in gsd-tools.cjs
was missing 'skills', so user-added skill directories (e.g.
skills/custom-skill/SKILL.md) were never walked and silently destroyed
during /gsd-update.
- Add 'skills' to GSD_MANAGED_DIRS so the directory is walked
- Add tests/bug-2942-detect-custom-skills.test.cjs with 5 targeted tests
- Update tests/update-custom-backup.test.cjs: replace the now-incorrect
"skills/ must NOT be scanned" assertion (written pre-#2790) with a test
that verifies custom skills ARE detected and GSD-owned skills are not
falsely flagged
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces stale v1.32/v1.37 highlight blocks with v1.39.0 highlights in
README.md and four translations, adds /gsd-edit-phase to phase-management
tables, documents workstream config inheritance, the post-merge build gate,
and per-runtime review.models.<cli> selection.
Closes#2935
Two CodeRabbit findings on PR #2920:
1. parseRuntimeInput previously only matched the bare "16" exactly for
the all-runtimes shortcut. Inputs the prompt explicitly encourages —
"16,", "16 1", "1,16" — fell through to per-token parsing and
silently installed only Claude or a partial subset. Move the
ALL_RUNTIMES_OPTION check after tokenization so any token equal to
"16" expands. Added regression coverage in
tests/multi-runtime-select.test.cjs for the four mixed-input forms.
2. The "maps Hermes to ~/.hermes for global installs" test invoked
getGlobalDir('hermes') without isolating HERMES_HOME. On a developer
machine that exports HERMES_HOME the assertion would fail even
though getGlobalDir was behaving correctly. Save/clear/restore the
env var around the assertion, mirroring the pattern the later
describe block already uses.
Full suite: 6128/6128 pass.
Per the issue spec for #2841 and CodeRabbit feedback on PR #2920, the
project-context filename rewrite should produce HERMES.md, not
.hermes.md. Reverts the earlier .hermes.md change at all 5 substitution
sites in bin/install.js and updates the corresponding regression test
in tests/hermes-install.test.cjs to assert HERMES.md.
Full suite: 6127/6127 pass.
CodeRabbit pointed out the post-creation guard is structurally
unreachable: immediately after `git checkout -b X origin/$DEFAULT_BRANCH`,
HEAD == origin/$DEFAULT_BRANCH, so both the merge-base form (`MB == DT`)
and the alternative "ahead-of" count form (`AHEAD == 0`) are sentinels
that always pass on a successful fresh checkout. With the explicit base
arg + fail-fast on the checkout, the guard cannot catch anything new.
Removing it (rather than swapping in another no-op that satisfies the
linter but adds no actual coverage) is the honest fix. Comment retained
to explain why no post-creation guard is needed: the explicit base
argument to `git checkout -b` is the single source of correctness for
#2916.
Same simplification mirrored in get-shit-done/workflows/quick.md.
Full suite: 6102/6102.
Two CodeRabbit findings on PR #2921 (review 4209533909 + comment
3171721073, both still unresolved):
A. Branch switch and create steps now abort on non-zero exit. Previously
`git switch "$BRANCH_NAME"` and `git checkout -b "$BRANCH_NAME"
"origin/$DEFAULT_BRANCH"` could fail (locked worktree, dirty tree
refusing the checkout, etc.) and the workflow would silently continue
on the wrong branch — sending the phase's later commits to the wrong
place. Both calls now `|| { echo "ERROR: …" >&2; exit 1; }`.
B. The fork-point base-warning is now scoped to the creation arm of
the if/else. Previously it ran for the resume path too, so a
legitimate resumed branch where origin/$DEFAULT_BRANCH had advanced
since first creation would falsely warn ("does not fork from
origin/<DEFAULT_BRANCH>"). Moving the check inside the else arm
means it only runs immediately after a fresh `git checkout -b`, when
the merge-base check is meaningful.
Same fix mirrored in get-shit-done/workflows/quick.md.
execute-phase.md stays at the 1700-line XL budget. Full suite: 6102/6102.
Two follow-ups on commit 80f14cac (which hardened quick-branching with a
trunk fixture):
1. quick-branching.test.cjs: add a `defaultBranch` parameter to
setupFixture and run the "branches off origin/HEAD" assertion against
both `main` and `trunk`. The wholesale switch to trunk in 80f14cac
removed coverage of the conventional `main` path; parameterizing
restores it without giving up the symbolic-ref guarantee.
2. bug-2916-handle-branching-default-base.test.cjs: apply the same
parameterization here. handle_branching has the same default-branch
detection logic as Step 2.5, so it deserves the same trunk regression
guard. Previously this file only exercised `main`.
A regression that silently defaults to `main` instead of consulting
`git symbolic-ref refs/remotes/origin/HEAD` now fails the `trunk`
variant in both files.
Tests: 10/10 in the touched suites.
- Restrict the "init parse list includes branch_name" assertion to
the bash blocks inside Step 2 (Initialize) so an unrelated step
that mentions branch_name cannot mask the contract.
- Switch the fixture's default branch from main to trunk so the
symbolic-ref code path is locked in: a regression that silently
defaults to "main" instead of consulting origin/HEAD now fails.
Addresses CodeRabbit review on PR #2921.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace the "ahead-of" heuristic with a structural check that compares
the HEAD↔origin/$DEFAULT_BRANCH merge-base to origin/$DEFAULT_BRANCH
itself. The previous count-based warning fired on legitimate WIP that
was simply ahead of the default branch — the correct signal is that
the branch did not fork from the default branch in the first place.
Addresses CodeRabbit review on PR #2921.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two test files were asserting installer prompt behavior by regex/.includes()
against bin/install.js source. Per CONTRIBUTING.md "no-source-grep"
testing standard, replace with structured assertions:
- tests/kilo-install.test.cjs: import runtimeMap and buildRuntimePromptText
from the install module; assert runtimeMap['11'] === 'kilo' and that the
rendered prompt lists Kilo above OpenCode without marketing copy.
- tests/multi-runtime-select.test.cjs: import runtimeMap, allRuntimes,
parseRuntimeInput, buildRuntimePromptText. Assert exported runtimeMap
matches the canonical option list, allRuntimes contains every runtime
exactly once, prompt text lists Hermes (10), Qwen Code (13), Trae (14),
All (16), and parser splits/dedupes by exercising parseRuntimeInput
rather than regexing source code.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Per spec in #2841, all 86 GSD skills must collapse into a single "gsd"
category in Hermes' system prompt. Previous code passed skills/ as the
install root, producing a flat skills/gsd-*/ layout that inflated
Hermes' loader output to 86 top-level entries.
Changes:
- Install path now writes to skills/gsd/{DESCRIPTION.md, gsd-*/SKILL.md}
- Uninstall removes the entire skills/gsd/ category dir plus any leftover
flat-layout gsd-*/ from older installs (graceful migration)
- writeManifest emits skills/gsd/<skill>/<file> paths for Hermes
- --skills-root hermes returns the nested category path so /gsd-sync-skills
syncs into the right directory
- DESCRIPTION.md at category root carries name/version/description so
Hermes' skill loader surfaces the GSD category in the system prompt
Also extracts promptRuntime's runtimeMap, allRuntimes, parseRuntimeInput,
and buildRuntimePromptText to module scope and exports them so tests can
assert structurally instead of grepping bin/install.js source.
Existing hermes-install tests updated to expect the nested layout and
to verify the category DESCRIPTION.md frontmatter (name, version,
description) using the shared parseFrontmatter helper.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
CodeRabbit nitpick (per project policy `feedback_no_source_grep_tests`):
the prior `tests/quick-branching.test.cjs` asserted branching correctness
by `.includes()`-grepping the raw markdown content for literal command
substrings. Those assertions stayed green even when the underlying
behavior regressed (e.g. when `git checkout -b` was unconditionally run
from the wrong HEAD).
Replace with the same pattern as `bug-2916-handle-branching-default-base
.test.cjs`:
- Structurally extract the Step 2.5 bash block from quick.md by
walking the markdown for fenced ```bash blocks under the heading
(no regex on prose).
- Spin up a fixture git repo with a bare origin, a clone whose
`origin/HEAD` points at `main`, and a checked-out previous-task
branch carrying its own unmerged commit.
- Execute the extracted bash block via `bash -c` and assert that
the new branch's tip equals `origin/main` (0 commits inherited
from the previous-task HEAD).
- Add a reuse test that pre-creates the target branch with its own
commit and verifies the script switches back to it without a
rebase or reset.
The two informational tests (workflow file exists, branching runs
before task-directory creation) are retained, plus the `branch_name`
parsing assertion is rewritten to walk fenced bash blocks rather than
substring-grep arbitrary content.
Address CodeRabbit HIGH findings on PR #2921. The previous fix had three
unconditional code paths where `git checkout -b "$BRANCH_NAME"` would run
from the *current* HEAD when the upstream sync failed silently:
- the dirty-tree warn-and-continue path,
- the clean path where `git switch` / `git merge --ff-only` errors were
swallowed by `2>/dev/null` (still falling through to checkout -b),
- any case where `git fetch` failed but the script continued.
This rewrites both `execute-phase.md` (handle_branching) and `quick.md`
(Step 2.5) to:
1. Fetch origin/$DEFAULT_BRANCH; if fetch fails AND no local copy of
origin/$DEFAULT_BRANCH exists, abort with a clear ERROR (exit 1)
rather than create the branch off arbitrary HEAD.
2. Always create the new branch with an explicit start point:
`git checkout -b "$BRANCH_NAME" "origin/$DEFAULT_BRANCH"`. The base
is now deterministic regardless of which branch is currently
checked out, regardless of whether the optional local fast-forward
succeeded, and regardless of dirty-tree state.
3. Carry uncommitted changes onto the new (origin-pinned) branch
instead of inheriting the previous-phase HEAD as a fallback base.
The post-creation INHERITED check now references origin/$DEFAULT_BRANCH
rather than the (possibly-stale) local default branch, so the warning
fires accurately even when the local fast-forward was skipped.
Four fixes from review of hermes-agent.nousresearch.com docs:
1. SKILL.md frontmatter now declares `version` (required field per
Hermes spec). Plumbed through `convertClaudeCommandToClaudeSkill`
gated on runtime='hermes' so other runtimes' frontmatter is unchanged.
2. Project-context filename rewrite changed from `HERMES.md` (not
discovered by Hermes) to `.hermes.md` (top of Hermes' discovery list:
.hermes.md → AGENTS.md → CLAUDE.md → .cursorrules).
3. README + finishInstall now show `/gsd-help` and `/gsd-new-project`
for Hermes; per docs, Hermes auto-exposes skills as slash commands.
4. Hermes tests now parse SKILL.md frontmatter structurally via the
shared parseFrontmatter helper instead of substring-matching source
text, and assert the version/name/description shape required by
Hermes' skill_view().
Full suite: 6128/6128 pass (3 new structural assertions).
The docstring coverage pre-merge check (default: warning at 80% threshold)
produces false-positive warnings on PRs whose new code is entirely test
files: it counts test(...) / beforeEach / afterEach arrow-function
callbacks as functions and reports 0% coverage because nothing has JSDoc.
CR's documented schema for reviews.pre_merge_checks.docstrings only
accepts `mode` and `threshold` — there is no per-check path filter that
would let us exclude tests/** while keeping the check active elsewhere.
The top-level path_filters approach would silence ALL CR review on test
files (security scans, out-of-scope checks, the substantive line-level
findings) which we want to keep.
Disabling the check entirely is the right call for this repo because:
- GSD ships a CLI + agent runtime, not a documented public library
- The internal helpers that warrant JSDoc already have it
- The other CR pre-merge checks (out-of-scope, security, title) are
meaningful for this codebase and stay enabled
Closes#2932
Two existing tests called extractBlockquotes(reportStep) without first
asserting reportStep was non-null. If the workflow file ever loses its
`<step name="report">` block, the test would fail with a confusing
TypeError on the destructuring inside extractBlockquotes instead of a
clear "report step must exist" assertion.
Add assert.ok(reportStep, ...) guards at the two missing call sites
(lines 100 and 130). The other two call sites (lines 75-83) already
had guards.
Addresses CodeRabbit comment on PR #2918.
Ports the pre-publish CI gates that release.yml applies into release-sdk.yml,
so the stopgap workflow ships releases at the same quality bar as the
canonical workflow (minus the @gsd-build/sdk publish, still intentionally
omitted, and the release-branch ceremony, intentionally omitted).
Changes (all mechanical copies of release.yml patterns):
- install-smoke as needs: dependency. The reusable workflow at
.github/workflows/install-smoke.yml runs the cross-platform install
matrix (Ubuntu 22/24, macOS 24, packed-vs-unpacked). Publish job
won't start until install-smoke passes for the dispatched ref.
- npm test → npm run test:coverage. Full coverage gate, matching
release.yml's pre-publish test step.
- Tolerant tag-existence check. The previous upfront "refuse if tag
exists" was too strict — operators re-running after a mid-flight
publish-step failure would be blocked by the tag they successfully
pushed last time. New behavior matches release.yml: skip the tag
step if the tag points at HEAD; error only if it points elsewhere.
- Tag-and-push step gets the same skip-if-at-HEAD pattern.
- New "Re-point next dist-tag at the new latest" step, gated on
tag=latest. Matches release.yml#finalize "Clean up next dist-tag" —
keeps @next from going stale relative to @latest.
- New "Create GitHub Release" step. Per-tag flag selection:
tag=dev, tag=next → --prerelease (won't be highlighted on repo home)
tag=latest → --latest (becomes the highlighted release)
All use --generate-notes so the release body auto-fills from commits.
- Summary updated to mention the GitHub Release and dist-tag re-point.
Out of scope per #2929:
- canary.yml, release.yml unchanged (verified by file diff)
- bin/install.js unchanged (install path already uses bundled SDK)
- No @gsd-build/sdk publish anywhere
- No release/X.Y.Z branch ceremony (this stopgap targets dispatched
ref directly)
Adds a workflow_dispatch-only release path that publishes get-shit-done-cc
to ONE chosen dist-tag per run (dev | next | latest), with the SDK
bundled inside the CC tarball both as the existing loose sdk/dist/ tree
and as a fresh sdk-bundle/gsd-sdk.tgz npm-installable artifact.
Why: @gsd-build/sdk publishes from canary.yml and release.yml fail because
the @gsd-build npm token is currently unavailable. CC users don't consume
@gsd-build/sdk directly — bin/gsd-sdk.js resolves sdk/dist/cli.js from
inside the installed CC package. This workflow ships only get-shit-done-cc
(which we hold the token for) and bundles the SDK two ways so any future
install path can pick whichever shape it needs.
The new sdk-bundle/ directory is added to the CC files whitelist in-tree
at build time only — never committed. Existing canary.yml and release.yml
are intentionally untouched; restore them to primary use once the
@gsd-build/sdk token is recovered.
Per-tag version derivation when the version input is empty:
- dev → <base>-dev.N (next sequential, scanning v<base>-dev.* tags)
- next → <base>-rc.N (matches release.yml convention)
- latest → <base> (clean, no suffix)
Refuses to publish when the version already exists on npm or has an
existing git tag (no accidental overwrites). Verifies the publish landed
on the registry and the dist-tag resolves correctly before marking the
run successful.
handle_branching in execute-phase.md (and the equivalent step in quick.md)
created the per-phase branch from whatever branch happened to be checked
out — typically the previous phase's still-unmerged feature branch — so
consecutive phases compounded on top of each other and stayed unpushed.
Detect the default branch via git symbolic-ref refs/remotes/origin/HEAD,
fast-forward it from origin, and fork the new phase branch off that tip.
Existing branches are still reused as-is. Dirty working trees fall back
to current HEAD with a loud warning, and a post-creation guard reports
any inherited commits.
Regression test extracts the bash from the <step name="handle_branching">
block structurally and runs it against a fixture repo where HEAD sits on
a previous-phase branch with extra commits.
Two bugs in the audit-open dispatch case in bin/gsd-tools.cjs:
1. Bare output(...) calls (only core.output is in scope) threw
ReferenceError: output is not defined on every invocation,
blocking the first step of /gsd-complete-milestone.
2. Even after switching to core.output(formattedReport, raw), the
human-readable branch JSON-stringified the formatted text because
core.output only bypasses JSON encoding when called as
core.output(null, true, rawValue).
Fix:
- --json path: core.output(result, raw) — pass the object,
let core.output JSON-stringify (don't pre-stringify).
- text path: core.output(null, true, formatAuditReport(result))
— use the rawValue form to emit verbatim section dividers and
item lists.
Adds tests/bug-2911-audit-open-output-shape.test.cjs which parses
both modes structurally — line-by-line for text mode (asserting the
report headers exist as standalone lines, not as escaped \n inside a
JSON quoted string), and JSON.parse + key-by-key shape assertions for
--json mode (matching the contract returned by auditOpenArtifacts).
The report step in workflows/progress.md had no directive establishing
PROJECT.md/STATE.md/ROADMAP.md as the authoritative sources for the
progress report. When init.progress returned project_exists: false (e.g.
invoked from a subdirectory without .planning/), the model fell back to
whatever was in its session context — including stale CLAUDE.md
## Project blocks — and produced routing output citing the wrong
milestone/phase.
Add a blockquote directive at the top of the report step that names
PROJECT.md, STATE.md, and ROADMAP.md as authoritative and forbids using
the CLAUDE.md ## Project block as a source for any progress report field.
Fixes#2912
Adds Hermes Agent as a supported installation target. Users can run
\`npx get-shit-done-cc --hermes\` to install all 86 GSD commands as
skills under \`~/.hermes/skills/gsd-*/SKILL.md\`, following the same
open skill standard as Claude Code 2.1.88+, Qwen Code, Antigravity,
Trae, Augment, and Codebuddy.
Hermes Agent is an open-source AI agent framework by Nous Research
(NousResearch/hermes-agent, MIT). Its skill loader accepts the Claude
skill format as-is: frontmatter parsed with PyYAML SafeLoader (unknown
keys like \`allowed-tools\` / \`argument-hint\` ignored), body XML tags
(\`<objective>\`, \`<execution_context>\`, \`<process>\`) passed directly
to the model. Compatibility proven end-to-end with all 86 GSD skills
loading cleanly, \`skill_view()\` returning full bodies, and
\`build_skills_system_prompt()\` emitting them into the agent system
prompt — zero Hermes code changes required.
Changes:
- \`bin/install.js\`: --hermes flag, getDirName/getGlobalDir/getConfigDirFromHome
support, HERMES_HOME env var (native to Hermes — used for profile
mode / Docker deploys), install/uninstall pipelines, interactive
picker option 10 (alphabetical: between Gemini and Kilo), .hermes
path replacements in copyCommandsAsClaudeSkills and
copyWithPathReplacement, legacy commands/gsd cleanup, CLAUDE.md ->
HERMES.md and "Claude Code" -> "Hermes Agent" content rewrites in
skills/agents/hooks, runtime-appropriate finish message.
- \`get-shit-done/bin/lib/core.cjs\`: add hermes to KNOWN_RUNTIMES;
add RUNTIME_PROFILE_MAP.hermes with OpenRouter-slug defaults
(Hermes is provider-agnostic; these defaults resolve across
OpenRouter, native Anthropic, and Copilot via Hermes' aggregator-
aware resolver, and are overridable per-tier via
model_profile_overrides.hermes.{opus,sonnet,haiku}).
- \`README.md\`: Hermes Agent in tagline, runtime list, verification
command, install/uninstall examples, \`--hermes\` flag reference.
- \`tests/hermes-install.test.cjs\`: new, 14 tests covering directory
mapping, HERMES_HOME env var precedence, install/uninstall
lifecycle, user-skill preservation, engine cleanup.
- \`tests/hermes-skills-migration.test.cjs\`: new, 11 tests covering
frontmatter conversion, path replacement (~/.claude/ ->
\$HERMES_HOME/skills/), CLAUDE.md -> HERMES.md, "Claude Code" ->
"Hermes Agent", stale skill cleanup, SKILL.md format validation.
- \`tests/multi-runtime-select.test.cjs\`: updated for new option
numbering (hermes=10, kilo=11, opencode=12, qwen=13, trae=14,
windsurf=15, all=16).
- \`tests/kilo-install.test.cjs\`: updated assertions for Kilo having
moved from option 10 to option 11.
Closes#2841
Implementation notes:
- Zero custom code paths: Hermes reuses copyCommandsAsClaudeSkills()
identical to Qwen Code / Antigravity pattern.
- Path replacement: ~/.claude/, \$HOME/.claude/, ./.claude/ ->
.hermes equivalents in skill/agent/hook content.
- Config precedence: --config-dir > HERMES_HOME > ~/.hermes (matches
how Hermes itself resolves its home directory).
- Legacy cleanup: removes commands/gsd/ if present from a prior
install, preserving dev-preferences.md (same as Qwen).
- No external dependencies added.
Testing: 5841 / 5841 tests pass (0 failures, 0 regressions)
- 14 new tests in hermes-install.test.cjs
- 11 new tests in hermes-skills-migration.test.cjs
- multi-runtime-select.test.cjs renumbered + 1 new test (single choice for hermes)
* fix: parse non-REQ IDs in gap-analysis and ignore table headers
* fix: parse requirement IDs from first traceability column only
---------
Co-authored-by: Tom Boucher <thomas.boucher@sas.com>
* fix(#2893): surface non-canonical plan filenames instead of silently returning zero plans
Reporter saw `plan_count: 0` from `/gsd:execute-phase` even though five
plan files existed on disk. Investigation showed the planner had written
files like `01-PLAN-01-foundation.md`, while `phase-plan-index`'s strict
filter (`f.endsWith('-PLAN.md') || f === 'PLAN.md'`) rejected them
silently — collapsing two distinct states into the same `plans: []`
return:
- directory truly has no plans (legit empty)
- directory has plans but the filter rejected them (user/agent error)
The canonical contract is documented in three places:
- `agents/gsd-planner.md` write_phase_prompt step (lines 1063-1080)
- `commands/gsd/plan-phase.md`
- `references/universal-anti-patterns.md` (rule 26)
It mandates `{padded_phase}-{NN}-PLAN.md` and explicitly forbids
`PLAN-NN.md` / `01-PLAN-01.md` / `plan-NN.md` etc. The strict filter is
correct per that contract. The bug is that the executor never tells the
user when the contract was violated — they just see `plan_count: 0`
with no signal.
Fix: add a diagnostic helper `describeNonCanonicalPlans()` that scans
the phase directory for files matching `*PLAN*.md` (the diagnostic net)
that the canonical filter rejected, excluding legit derivatives like
`*-PLAN-OUTLINE.md` and `*-PLAN.pre-bounce.md`. When offenders exist,
return a `warning` field naming each one and citing the canonical
pattern so the user knows what to rename to.
Wired into the three filter sites:
- `phase-plan-index` (the executor's main entry point)
- `phases list --type plans`
- `find-phase`
The strict filter itself is unchanged — existing canonical plans behave
identically. This is purely a diagnostic that converts silent-empty
into loud-with-actionable-error.
Tests:
- `phase-plan-index returns warning for reporter's exact filename
pattern (`01-PLAN-01-foundation.md`)`
- `truly empty dir does not emit a warning`
- `canonical plans + outline + pre-bounce files do not emit a warning`
Closes#2893
* test(#2893): add parity tests for find-phase and phases list --type plans warnings
CodeRabbit's only finding on the prior commit: I wired the warning into
three filter sites (`phase-plan-index`, `find-phase`,
`phases list --type plans`) but only `phase-plan-index` had test
coverage for the warning shape. The other two paths could silently
diverge during future refactors — exactly the silent-drift class of bug
this fix exists to prevent.
Add four parity tests mirroring the existing two:
- find-phase: non-canonical filenames produce a warning naming each
offender + citing the canonical pattern.
- find-phase: canonical plan + derivative files (PLAN-OUTLINE,
pre-bounce) produce no warning.
- phases list --type plans: same non-canonical case, but assert the
warning is prefixed with `${dir}: ` (this path aggregates across
phase directories so each offender is tagged with its dir).
- phases list --type plans: canonical case, no warning.
`node --test tests/phase.test.cjs`: 98/98 pass (was 94, +4 new).
* feat(#2792): namespace meta-skills retargeted at the post-#2790 surface
This branch is now based on #2790's HEAD (the consolidation PR) instead
of main, and every routing table targets the consolidated surface so a
user routed by a namespace meta-skill never lands at a deleted /
folded sub-skill.
Cross-PR inconsistencies the original PR #2825 carried (vs #2790):
- ns-ideate routed to gsd-note / gsd-add-todo / gsd-add-backlog /
gsd-plant-seed → all folded into gsd-capture by #2790. Now routes
to gsd-capture (the parent picks the mode from the user's intent).
- ns-context routed to gsd-scan and gsd-intel → folded into
gsd-map-codebase --fast / --query by #2790. Now routes to those
flag forms.
- ns-manage routed all workspace intent to gsd-list-workspaces (a
list-only entry) → CR also flagged the over-narrow target. #2790
folds into gsd-workspace; routing now points there.
- ns-workflow routed to gsd-research-phase → deleted outright by
#2790. Removed.
- ns-project routed to gsd-plan-milestone-gaps → deleted outright by
#2790. Removed.
- None of the namespaces previously surfaced #2790's new consolidated
skills (gsd-capture, gsd-phase, gsd-config, gsd-workspace,
gsd-progress). All five are now reachable through the routers.
- extract_learnings → extract-learnings (canonicalized by #2858).
Defect fixes within the namespace skills:
- Hyphen-form `name:` (gsd-workflow, …) per the canonical naming
contract — the colon-form addressed CR's drift complaint.
- `Skill` added to allowed-tools on every router. The body instructs
"Invoke the matched skill directly using the Skill tool" — without
Skill in the permission list the meta-skill cannot route at all.
New regression guard in tests/enh-2792-namespace-skills.test.cjs: every
gsd-* token in any namespace router's table column resolves to a
surviving commands/gsd/*.md file (or to a known consolidated parent for
flag-form targets like gsd-map-codebase --fast). This single test would
have caught every dead-end route the original PR shipped with.
Skill-count cap in tests/enh-2790-skill-consolidation.test.cjs now
filters out ns-*.md from its <= 63 cap. Namespace routers are
descriptor-only entries, not part of the consolidation surface that cap
is policing — they have their own contract in
tests/enh-2792-namespace-skills.test.cjs.
INVENTORY.md gains a "Namespace Meta-Skills" section with the 6 router
rows; INVENTORY-MANIFEST.json gains 6 entries; the headline count moves
59 → 65 to match.
Out of scope for this rebase: the gsd-health --context flag (PR #2825
advertised the contract but didn't implement it). That's a separate
feature concern and is left untouched here.
5908/5908 on `npm test`.
* feat(#2792): implement gsd-health --context utilization guard
The original PR #2825 advertised a `--context` flag on gsd-health with a
60%/70% utilization threshold table but never implemented the workflow
logic — CR caught it as a contract leak, the rebase deferred it. This
commit closes the gap with TDD red/green/refactor.
Math layer (pure):
- get-shit-done/bin/lib/context-utilization.cjs
classifyContextUtilization(tokensUsed, contextWindow) →
{ percent, state }
State boundaries use the exact ratio:
< 60% healthy / 60–70% warning / ≥ 70% critical (fracture point)
Display percent rounded for humans. Throws TypeError on non-integer
or out-of-range inputs.
- STATES = Object.freeze({ HEALTHY, WARNING, CRITICAL }) exported
so callers reference the names by symbol, not by literal string.
SDK CLI integration:
- get-shit-done/bin/gsd-tools.cjs
`validate context --tokens-used N --context-window M [--json]`
routes to the classifier, owns the recommendation copy (the
classifier intentionally does not — keeps the renderer free to
evolve without touching the math layer or its tests), and uses
core.output's rawValue path for the sync-flush guarantee.
- sdk/src/query/validate.ts + sdk/src/query/index.ts
TypeScript validateContext handler registered at 'validate.context'
and 'validate context'. Mirrors the CJS classifier inline (15 lines
of arithmetic; not worth a shared cross-language module).
User-facing wiring:
- commands/gsd/health.md frontmatter advertises --context, body
documents the three-state threshold table.
- get-shit-done/workflows/health.md adds a `context_check` step
that's reached only when --context is set. Step calls
`gsd-sdk query validate.context` with self-reported tokensUsed and
contextWindow, prints the SDK output verbatim, and ends. Includes
a TEXT_MODE plain-text fallback for non-Claude runtimes per #2012.
Tests:
- tests/context-utilization.test.cjs (17 tests) — pure-function
contract: state thresholds at every boundary, percent rounding,
input validation, return-shape (no recommendation field — that's
the renderer's job).
- tests/validate-context.test.cjs (9 tests) — SDK CLI plumbing:
arg parsing errors, JSON vs human rendering, recommendation copy
pinned per state.
- tests/enh-2792-namespace-skills.test.cjs (4 new tests) — markdown
contract: --context advertised in argument-hint, threshold table
in command body, context_check step exists in workflow, step
invokes gsd-sdk query validate.context with both flags.
Inventory bookkeeping:
- docs/INVENTORY.md "CLI Modules" 31 → 32; new row for
context-utilization.cjs.
- docs/INVENTORY-MANIFEST.json mirror.
5939/5939 on `npm test`.
Closes#2876 follow-up — CI on main fails because the punctuation test
in tests/gemini-namespacing.test.cjs hardcoded `/gsd-scan` as a known
command, but #2824 (consolidate 86 → 59 skills) removed scan.md from
commands/gsd/. The roster now correctly returns "scan is unknown, leave
unchanged" — the conversion is right, the test fixture is stale.
Swap `scan` for `health` in the punctuation test. Both are bedrock
commands; the test still exercises the original intent (period vs
exclamation handling on adjacent slash commands).
Note added so the next consolidation reviewer knows the swap pattern.
`npm test`: 5936/5936 pass.
* feat(#2833): parseStateMd reads phase-lifecycle frontmatter fields
Extend parseStateMd() to parse 4 new STATE.md frontmatter fields that drive
the phase-lifecycle status-line proposed in #2833:
- active_phase : phase number when orchestrator is in-flight, null when idle
- next_action : recommended next command when idle
- next_phases : YAML flow array of phase numbers for next_action
- progress : nested block with completed_phases / total_phases / percent
All fields default to undefined when absent — formatGsdState() (next commit)
degrades gracefully so existing STATE.md files keep rendering as before.
YAML scope intentionally narrow:
- Only top-level scalar keys (status, milestone, active_phase, next_action)
- Only single-line flow array for next_phases ([...])
- progress block requires 2-space indent for nested keys
Block sequences (- item over multiple lines) and inline comments inside
nested blocks are NOT parsed — keeping the regex-based parser predictable.
Comments outside frontmatter or after the closing --- still work.
Tests: all 27 existing tests still pass (no behavior change for STATE.md
files that don't carry the new fields).
Refs #2833
* feat(#2833): formatGsdState renders phase-lifecycle scenes + opt-in progress bar
Extend formatGsdState() with three lifecycle scenes that activate when the
new STATE.md frontmatter fields (added in the previous commit) are present.
Also append an opt-in progress bar to the milestone segment when
progress.percent is available.
Scenes (first match wins; falls through to the existing path otherwise):
1. active_phase set → 'v2.0 [██░] X% · Phase 4.5 executing'
(status field carries the lifecycle stage:
discussing / planning / executing / verifying)
2. active_phase null + → 'v2.0 [██░] X% · next execute-phase 4.5'
next_action set (idle state — surfaces what the user should
run next without opening STATE.md)
3. percent=100 (or → 'v2.0 [██████████] 100% · milestone complete'
completed=total)
4. (default fallback) → 'v1.9 Code Quality · executing · ph (1/5)'
(existing rendering, byte-for-byte preserved
when none of the new fields are populated)
Backward compat is the design priority:
- STATE.md files without the new fields render identically to v1.38.x
- progress bar is opt-in (empty string when percent absent)
- Each new scene only activates when its specific fields are populated
A new helper renderProgressBar() generates the 10-segment bar that matches
the existing context meter style (so the two bars on the status-line are
visually consistent).
Tests: 27/27 existing tests still pass.
Refs #2833
* test(#2833): cover parseStateMd lifecycle fields + formatGsdState scenes
26 new tests organized in 5 describe blocks, modeled after the existing
enh-2538-statusline-last-command.test.cjs convention:
parseStateMd #2833 lifecycle fields (7 tests)
- reads active_phase / next_action / next_phases / progress.percent
- 'null' literal handled correctly
- YAML flow array parsing (1 item, multiple items)
- progress nested block (3 fields)
- absent fields return undefined
formatGsdState #2833 lifecycle scenes (6 tests)
- Scene 1: active_phase set → 'Phase X.Y <stage>'
- Scene 2: idle + next_action → 'next <action> <phases>' (1+ phases)
- Scene 3: percent=100 OR completed=total → 'milestone complete'
formatGsdState #2833 backward compatibility (4 tests) — CRITICAL
- Legacy STATE.md (no new fields) renders byte-for-byte unchanged
- Empty state, partial state, progress-bar-opt-in all preserved
progress bar rendering (6 tests)
- 0% / 50% / 100% / clamping / opt-in absence
formatGsdState #2833 scene priority (3 tests)
- active_phase wins over next_action when both populated
- next_action wins over fallback when active_phase null
- percent=100 wins over fallback even with phase set
Combined run: 53/53 tests pass (existing 27 + new 26).
Refs #2833
* docs(#2833): describe phase-lifecycle frontmatter fields and rendering scenes
Add docs/STATE-MD-LIFECYCLE.md as the canonical reference for the four new
STATE.md frontmatter fields and the four status-line rendering scenes
introduced by this proposal:
- Frontmatter field reference (active_phase / next_action / next_phases /
progress.percent) with type and population semantics
- Why progress.percent is intentionally the phase dimension and not the
plans dimension (plans dimension trends optimistic when future phases
are unplanned)
- The four rendering scenes including their priority order
- Stage-label convention for Scene 1 (discussing / planning / executing /
verifying matching the four phase orchestrators)
- Frontmatter parsing constraints — frontmatter must start at file head,
no comments inside nested blocks, next_phases is single-line flow only
- Backward-compatibility guarantee (locked in by the test suite)
- Cross-links to the foundation issue #1989 and the read-side issues
this proposal helps close
The document deliberately scopes itself to the read-side (what the hook
parses, what it renders). Write-side SDK and workflow changes that
auto-maintain the fields are out of scope for this PR so each piece can
be reviewed independently — see the issue thread for the full proposal.
Refs #2833
* test(#2833): simplify '0% renders 10 empty segments' assertion
Address CodeRabbit nitpick — drop the convoluted assert.equal that built
the expected value via .replace() and rely on the existing assert.ok
includes-check. The behavior under test is unchanged; the assertion is
just easier to read.
Refs #2884 review comment
* feat(#2790): consolidate 86 gsd-* skills to 59 — zero functional loss
Closes#2790
- `capture.md` — absorbs add-todo (default), note (--note), add-backlog
(--backlog), plant-seed (--seed), check-todos (--list)
- `phase.md` — absorbs add-phase (default), insert-phase (--insert),
remove-phase (--remove), edit-phase (--edit)
- `config.md` — absorbs settings-advanced (--advanced),
settings-integrations (--integrations), set-profile (--profile);
settings.md retained as-is
- `workspace.md` — absorbs new-workspace (--new), list-workspaces
(--list), remove-workspace (--remove)
- `update.md` — adds --sync (absorbs sync-skills) and --reapply
(absorbs reapply-patches)
- `sketch.md` — adds --wrap-up (absorbs sketch-wrap-up)
- `spike.md` — adds --wrap-up (absorbs spike-wrap-up)
- `map-codebase.md` — adds --fast (absorbs scan) and --query (absorbs
intel)
- `code-review.md` — adds --fix (absorbs code-review-fix)
- `progress.md` — adds --next (absorbs next) and --do (absorbs do)
join-discord, research-phase, session-report, from-gsd2,
analyze-dependencies, list-phase-assumptions, plan-milestone-gaps
autonomous.md: updated Skill(skill="gsd:code-review-fix") →
Skill(skill="gsd:code-review", args="--fix --auto") to match
the consolidated skill name
- New: tests/enh-2790-skill-consolidation.test.cjs (48 tests)
- Updated: 14 existing test files redirected from deleted command paths
to their consolidated equivalents
- docs/INVENTORY.md: Commands count 86→59, ghost rows removed, new
consolidated rows added
- docs/INVENTORY-MANIFEST.json: regenerated to match filesystem
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs(#2790): add CHANGELOG entry for skill consolidation
* docs(#2790): update COMMANDS.md for 86→59 skill consolidation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(#2790): address CodeRabbit review findings
- CHANGELOG.md: add --next alongside --do in progress flag list
- config.md: remove trailing space from --profile code span (MD038)
- COMMANDS.md: add required descriptions to /gsd-phase examples;
/gsd-phase without args errors, not interactive
- COMMANDS.md: add --next and --do to /gsd-progress flags table + examples
- test: convert content.includes('--reapply') to structural frontmatter
parse; add allow-test-rule comment for workflow content assertions
- test: replace redundant existsSync duplicate with assertion that verifies
the full consolidated flag surface (--sync | --reapply) in argument-hint
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(#2790): restore reapply-patches workflow and strengthen test assertions
- Create get-shit-done/workflows/reapply-patches.md: the #2790 consolidation
deleted the 14K combined command+workflow file (reapply-patches.md) but
update.md already referenced the workflow via execution_context_extended.
Restoring it fixes a silent behavioral gap where --reapply had no workflow
to load. Includes full three-way merge logic, hunk verification table
(Step 4), and the Hunk Verification Gate (Step 5) that blocks cleanup
until all user-added hunks are confirmed present in the merged output.
- Fix update.md: /gsd-reapply-patches → /gsd-update --reapply (stale ref)
- Fix reapply-verify-hunks.test.cjs: was checking existsSync(update.md) 8×;
now points to the workflow file and asserts real behavioral content
(Post-merge verification, Hunk presence check, Line-count check, backup
reference, per-file tracking, structural ordering)
- Fix reapply-patches.test.cjs: replace content.includes() stubs with
frontmatter-parsed argument-hint assertions; replace 4 existsSync(update.md)
no-ops with real assertions against the workflow content
- Fix edit-phase.test.cjs: /gsd-edit-phase → /gsd-phase (COMMANDS.md now
documents the consolidated command with --edit flag)
- Fix next-safety-gates.test.cjs: split OR predicates into independent
assertions — --next in progress.md and --force in next.md workflow
- Fix workspace.test.cjs: add allow-test-rule comment for routing content
checks (command routing text IS the deployed behavioral contract)
- Fix bug-2439 test: strengthen pre-flight assertion to verify gsd-sdk is
referenced (not just --profile)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: address CodeRabbit review findings (CR round 2)
- INVENTORY.md: update sync-skills.md row to reference /gsd-update --sync
instead of stale /gsd-sync-skills (absorbed in #2790)
- enh-2380-sync-skills.test.cjs: align INVENTORY.md assertion with the
corrected reference; was asserting the old /gsd-sync-skills name while the
manifest test correctly asserted /gsd-update, creating conflicting expectations
in the same suite
- reapply-verify-hunks.test.cjs: add explicit notEqual(-1) assertions for all
three anchors before the ordering check so a missing anchor produces a clear
failure instead of a false positive (writeIdx=-1 < verifyIdx=5 is true)
- bug-2439-set-profile-gsd-sdk-preflight.test.cjs: defer fs.readFileSync until
after the existence assertion; eager describe-level read caused the suite to
crash before the existence test could run, making it effectively dead code
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(#2790): address CR — INVENTORY routing + reapply test contract wording
Two unresolved CodeRabbit findings (Major):
- docs/INVENTORY.md: workflow-file table still pointed at obsolete
/gsd-do, /gsd-next, /gsd-note, /gsd-add-todo, /gsd-add-backlog,
/gsd-check-todos, /gsd-plant-seed slash commands. Re-route to the
consolidated /gsd-progress (--next, --do) and /gsd-capture (--note,
--backlog, --seed, --list) so the inventory is internally consistent.
- tests/reapply-verify-hunks.test.cjs: 'verification tracks per-file
status' asserted on phrasing that doesn't appear in reapply-patches.md
(the 'per-file' substring only matched accidentally via 'sequential
integer per file'). Switch to the actual contract text — Hunk
Verification Table, one row per hunk per file, verified column.
* test(#2790): update CR-INTEGRATION tests for consolidated --fix invocation
After the merge of main (which carries #2843's hyphen-form fix), the
consolidation in this branch absorbs gsd-code-review-fix into gsd-code-review
as the --fix flag. Update the two CR-INTEGRATION tests that previously
asserted on the standalone gsd-code-review-fix skill name to instead assert
on a gsd-code-review invocation carrying --fix in its arg tokens.
Tests still parse Skill() invocations structurally; only the asserted
skill-name + arg-token shape changed.
* test(#2790): scope success_criteria check to the <success_criteria> block
CodeRabbit nitpick: 'success criteria includes verification' did a
whole-file substring check, which can false-pass if the phrase appears
elsewhere in the document. Extract the <success_criteria>...</success_criteria>
block first via extractTagBlock() and assert against that scope only.
* fix(#2790): post-rebase reconciliation with main
- INVENTORY.md/JSON: add reapply-patches workflow row + bump count to 85
- autonomous.md: switch consolidated --fix invocation to hyphen Skill name
- analyze-dependencies test: assert COMMANDS.md does NOT document the
consolidated-away /gsd-analyze-dependencies entry (was: bare .includes())
* fix(#2790): address remaining CR findings — strengthen contract tests
Doc-fixes:
- INVENTORY.md: route transition.md & edit-phase.md rows to consolidated
/gsd-progress --next and /gsd-phase --edit (was: deleted /gsd-next, /gsd-edit-phase)
- config.md --profile branch: document #2439 pre-flight `command -v gsd-sdk`
guard + install hint BEFORE the gsd-sdk invocation (closes opaque
"command not found: gsd-sdk" regression path)
Test discipline (no-source-grep contract):
- bug-2439: replace bare `content.includes('gsd-sdk')` with structured
parse of <context> block + --profile branch; assert pre-flight token,
install hint, #2439 citation, and ordering vs gsd-sdk invocation
- edit-phase: parse INVENTORY.md edit-phase.md row's "Invoked by" column
and assert `/gsd-phase --edit` (not the deleted /gsd-edit-phase)
- next-safety-gates: tighten `--next` documentation contract — require
--next AND --force AND completeness routing (was OR-based, passed when
only --next present)
- reapply-patches: parse argument-hint flag list structurally; scan ALL
<execution_context*> blocks for the @-include of reapply-patches.md;
parse Hunk Verification Table header columns directly; locate Step 5
via heading parsing then assert (i) table reference, (ii) verified=no
gate, (iii) STOP/halt directive, (iv) explicit absent-table halt path
- workspace: parse frontmatter, tokenize argument-hint across multiple
bracketed segments, parse @-include targets from <execution_context>
rather than substring-matching the file body
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(#2876): yamlQuote description in Copilot/Antigravity/Trae/CodeBuddy SKILL.md
A description starting with `[BETA]` (or any YAML flow indicator —
`{`, `*`, `&`, `!`, `|`, `>`, `%`, `@`, backtick) is parsed as a flow
sequence/mapping by YAML 1.2-strict loaders. gh-copilot's frontmatter
loader fails closed:
✖ ~/.copilot/skills/gsd-ultraplan-phase/SKILL.md: failed to parse YAML
frontmatter: Unexpected scalar at node end at line 2, column 21:
description: [BETA] Offload plan phase to Claude Code's ultraplan…
Six emission sites in `bin/install.js` re-wrote the description without
quoting, while the Claude variant (`convertClaudeCommandToClaudeSkill`)
already routed it through `yamlQuote`. Brought all six in line:
- convertClaudeCommandToCopilotSkill
- convertClaudeAgentToCopilotAgent
- convertClaudeCommandToAntigravitySkill
- convertClaudeAgentToAntigravityAgent
- convertClaudeCommandToTraeSkill
- convertClaudeCommandToCodebuddySkill
Each now wraps the value in `yamlQuote(...)` so any leading character is
parser-safe.
Regression test (tests/bug-2876-skill-frontmatter-quote.test.cjs) drives
the four command converters and two agent converters through the
reporter's exact "[BETA] …" description plus a grab-bag of YAML flow
indicators, asserting the emitted `description:` value is a quoted YAML
scalar. Also round-trips the value through `JSON.parse` for converters
that don't apply runtime-name substitution to confirm fidelity.
Updated 7 pre-existing substring assertions in copilot-install.test.cjs
and antigravity-install.test.cjs that hard-coded the unquoted form.
Round trip: 5893/5893 pass on `npm test`.
Closes#2876
* test(#2876): structurally parse frontmatter instead of substring-grep
Addresses CodeRabbit's two nitpicks on PR #2881: the pre-existing
substring assertions in copilot-install.test.cjs (4 sites) and
antigravity-install.test.cjs (3 sites) only got bumped from the unquoted
form (`description: Diagnose...`) to the quoted-prefix form
(`description: "Diagnose...`). Both are still raw-string checks against
rendered YAML and drift on any quoting/order change — exactly what the
project's CONTRIBUTING.md "no-source-grep" testing standard exists to
prevent.
Add `parseFrontmatter()` to tests/helpers.cjs — a small parser that
handles the YAML scalar forms the install converters emit (double-quoted
JSON, single-quoted with `''` escape, bare). Throws if the content has
no closed `---` block so a regression in the emitter shape fails loudly
rather than silently returning {}.
Refactor the 7 description-substring sites to compare on parsed values:
the assertion now reads as `fm.description === 'Diagnose planning
directory health'` rather than `result.includes('description: "Diagnose
planning directory health')`. Same coverage of the #2876 quoting
behavior, no coupling to byte-level quote style.
`npm test`: 5893/5893 pass.
Closes#2876
* test(#2876): make parseFrontmatter delimiter check CRLF/whitespace tolerant
CR nitpick on PR #2881 (review at 03:08:08Z): parseFrontmatter()
splits on '\n' and compares each line strictly to '---'. A
Windows-authored skill file (CRLF endings) leaves a trailing '\r'
on every line, so '---\r' fails the equality check, and the helper
throws "no closed --- block" on perfectly valid input. Same problem
with whitespace-padded delimiter lines.
Switch to splitting on /\r?\n/ and comparing the trimmed line.
Helper is used by tests/copilot-install.test.cjs and
tests/antigravity-install.test.cjs, so this also de-flakes those
suites on Windows runners.
5893/5893 on `npm test`.
* fix(commands): normalize gsd slash namespace drift
* fix(#2855): address CodeRabbit findings on namespace drift PR
Three CR findings, all valid:
1. autonomous.md line 783 still had `gsd:discuss-phase` (the PR's own
normalization missed this line). Switched to `gsd-discuss-phase` and
updated the matching test in autonomous-interactive.test.cjs that was
asserting the now-retired colon form.
2. tests/bug-2543-gsd-slash-namespace.test.cjs source-grepped the
fix-slash-commands.cjs script with .includes() rather than driving
its transform behaviour. Refactored fix-slash-commands.cjs to export
a pure transformContent(src, cmdNames) function, kept the CLI behaviour
unchanged via require.main, and replaced the source-grep block with
five behavioural cases: rewrite, multi-occurrence, idempotence on
canonical input, no-op on gsd-sdk/gsd-tools, and word-boundary safety.
3. tests/bug-2808-skill-hyphen-name.test.cjs matched `name:` anywhere
in SKILL.md; a stray name: in the body could satisfy the assertion.
Scoped the lookup to the YAML frontmatter block via the suggested
diff (parse the leading --- ... --- region first, then find name:
inside it).
Full suite: 5854/5854 passing.
* fix(#2855): address remaining CodeRabbit findings on PR #2858
Three structural concerns flagged on the namespace-drift fix PR:
1. scripts/fix-slash-commands.cjs:24 — `buildPattern([])` compiled
`/gsd:()(?=[^a-zA-Z0-9_-]|$)/g`. The empty capture group still
matches any `/gsd:` token followed by a non-word boundary
(whitespace, EOL, punctuation), rewriting it to a stray `/gsd-`.
Verified live: `transformContent("/gsd:", [])` → `"/gsd-"`. Added
a guard returning null from `buildPattern` on empty input and
updated `transformContent` and `processDir` to no-op when the
pattern is null.
2. tests/autonomous-interactive.test.cjs:44-47 — assertion was
`content.includes('gsd-discuss-phase') && content.includes('INTERACTIVE')`,
which would false-pass on any unrelated co-occurrence (e.g.
`INTERACTIVE=""` initialization plus a stray `gsd-discuss-phase`
prose mention). Replaced with a structural extraction: locate the
`**If \`INTERACTIVE\` is set:**` branch, bound it by the next
`**If` / `<step>` boundary, and assert the
`Skill(skill="gsd-discuss-phase", ...)` invocation lives inside
that region. Tolerates whitespace around `(`, `skill`, and `=`.
3. tests/bug-2808-skill-hyphen-name.test.cjs:104 — colon-call regex
was `Skill\(skill=...` and missed valid formatting like
`Skill(skill = "gsd:cmd")` or `Skill( skill = ...)`. Loosened to
`Skill\(\s*skill\s*=\s*...` so reformatting drift can't slip past
the namespace guard.
Verification: 5854/5854 pass on `npm test` from the rebased branch.
* fix(#2855): drop pre-validation filter that hid namespace drift
CR finding on tests/bug-2808-skill-hyphen-name.test.cjs:128: the test
collected generated skill directories with
`.filter(entry => entry.isDirectory() && entry.name.startsWith('gsd-'))`,
then validated namespace invariants over that filtered list. Anything
that violated the prefix invariant — `gsd:extract-learnings` (colon
form), `extract_learnings` without prefix, `Gsd-foo` mis-cased — would
silently disappear from the iteration and the test would falsely pass.
Drop the `startsWith('gsd-')` filter so every generated directory shows
up. Add explicit assertions before the existing per-skill loop:
- directory list is non-empty (catches a broken converter that
produces nothing)
- every directory begins with `gsd-`
- every directory contains no `:`
- every directory contains no `_`
Re-audited the full PR diff for the same anti-pattern: only this one
site filtered before validating the namespace; bug-2643 and
commands-doc-parity also use `readdirSync().filter()` but only by file
extension, which is correct.
5854/5854 on `npm test`.
* fix(#2855): address remaining CR findings (1 active + 2 nitpicks)
Three findings on PR #2858, all the same root cause: input narrowing
before validation lets drift slip past the guards.
1. tests/bug-2808-...:104 (active) — `colonCallRe` captured local names
with `[a-z0-9-]+`, which excluded the underscore. A drift like
`Skill(skill="gsd:extract_learnings")` (deprecated colon syntax with
the old underscore filename) silently slid through. Broadened the
capture to `[^'"\s)]+` so any malformed local name is surfaced; surrounding pattern (whitespace tolerance, escape support, flags)
unchanged.
2. tests/bug-2643-...:43 (nitpick) — `extractSkillNamesHyphen` and
`extractSkillNamesColon` had the same over-strict capture plus
relied on a single regex over raw bytes, which the project test-
rigor memory bans (`feedback_no_source_grep_tests.md`). Replaced
with `extractSkillCalls(content)` — a small structural extractor
that walks `Skill(` openers, locates each call's matching `)`,
parses the body's `skill = "..."` keyword argument with permissive
whitespace + quoting + escape handling, and returns
`{ name, raw }` records. The two namespace-form helpers become
thin filters over the structured output. Tightened the body class
to `[^'"\\]+` so a trailing escape `\` before the closing quote
(as in `Skill(skill=\"gsd-foo\", …)` written inside another string
context) doesn't get included in the captured name.
3. tests/bug-2543-...:44 (nitpick) — `DOC_SEARCH_FILES` was a hand-
curated 7-entry array. Every doc added in the future would silently
weaken drift detection until someone remembered to extend the list.
Replaced with `discoverDocSearchFiles(ROOT)`: globs every `.md`
under `docs/` and adds `README.md` if present. New docs are picked
up automatically.
Re-audited the diff surface for similar narrowings; no other sites
filter or constrain before validating namespace invariants.
5854/5854 on `npm test`.
* fix(#2855): recurse docs/ tree so localized translations are scanned too
CR finding: discoverDocSearchFiles() stopped at docs/*.md, leaving
localized translation trees (docs/ja-JP/, docs/zh-CN/, docs/ko-KR/,
docs/pt-BR/) and other nested doc collections (docs/skills/,
docs/superpowers/) invisible to the namespace-drift invariant.
Verified the gap: docs/ has 6 nested directories with ~30 .md files
that the previous top-level-only scan was skipping. None contain
/gsd: references today, but a future translation update or new
doc subdir could leak drift.
Switch to an iterative stack walk so every .md under docs/ is scanned
regardless of depth. Stack form (rather than recursion) avoids the
risk of running into the call-stack limit on deep doc trees.
5854/5854 on `npm test`.
---------
Co-authored-by: Tom Boucher <trekkie@nomorestars.com>
* fix(install): use colon namespace for Gemini slash commands and help reference
This fixes unexecutable command recommendations in Gemini CLI by correctly
namespacing slash commands (/gsd: instead of /gsd-) in all installed
artifacts (agents, commands, workflows).
- Implements a lazy command roster discovery to ensure 100% accurate
conversion and protect file paths, URLs, and agent names.
- Adds isolated behavioral and unit tests covering all boundary cases.
- Fixes hardcoded command strings in banners and help output.
Closes#2783
* fix(install): close roster gaps in Gemini /gsd- → /gsd: conversion (#2783)
Addresses adversarial review findings on PR #2768:
- Restore regex boundaries (lookbehind + extension lookahead). Roster-only
matching was insufficient: a URL like `https://example.com/gsd-plan-phase`
ends in a known command and would be incorrectly converted. Boundaries +
roster now agree before any conversion fires.
- Smarter trailing lookahead `(?!\.[a-z])` distinguishes file extensions
(`.cjs`, `.md`) from sentence-ending punctuation (`.` at end of input or
before whitespace), so `/gsd-help.` correctly converts.
- Fail loud on missing roster. `commands/gsd/` not found previously fell
through to an empty Set, silently no-op'ing every conversion — exactly the
bug this code exists to prevent. Now emits a one-shot console.warn (gated
on GSD_TEST_MODE) before returning the empty set.
- Drop unnecessary `i` flag — GSD commands are always lowercase; matching
uppercase tokens against a lowercase roster always misses anyway.
- Export `_resetGsdCommandRoster` for test isolation against the module-level
cache.
Test additions pin the actual safety property of the roster check by using
KNOWN command names embedded in URLs and sub-paths — the cases the prior
tests didn't reach because they used `gsd-tools` (not in roster). Added a
roster-load assertion that fails loudly if the empty-Set fallback path
silently neutralises conversions.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(install): centralize <sub> stripping and add structural test assertions
CodeRabbit findings on the prior commit:
- (actionable) Centralizing the Gemini conversion through
convertClaudeToGeminiMarkdown dropped the stripSubTags() call that the
inline command path used to make before TOML conversion. Move stripSubTags
inside convertClaudeToGeminiMarkdown so command/agent/non-command Gemini
outputs all have <sub> consistently stripped. Remove the now-redundant
stripSubTags call in convertClaudeToGeminiAgent (single source of truth).
- (nitpick) Replace `.includes()` checks in the TOML test with structured
parsing — JSON-decode each TOML value and assert on parsed fields, per
the project's "tests parse, never grep" convention.
- (nitpick) Strengthen the install behavioral test to read a real installed
artifact (.gemini/commands/gsd/plan-phase.toml), parse it, and assert the
prompt body actually contains a /gsd: reference and no unconverted
/gsd-plan-phase. A directory-only check would have passed even if every
conversion silently no-op'd.
- Add a regression test that <sub> tags are stripped through the
convertClaudeToGeminiMarkdown pipeline.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Tom Boucher <trekkie@nomorestars.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* fix(#2851): replace bare gsd-tools invocations with absolute path
`gsd-tools` is not a published bin entry — package.json declares only
get-shit-done-cc and gsd-sdk. The shipped invocation pattern is
`node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" <subcommand>`,
used by every other workflow file.
Two leaked bare invocations:
- get-shit-done/workflows/plan-phase.md §13e (gap-analysis)
— reported in #2851; gap-analysis silently skipped on every plan-phase run
- get-shit-done/workflows/ingest-docs.md §finalize (commit)
— caught by the new structural test; ingest-docs commit step was broken
Both updated to canonical absolute-path form.
Adds tests/bug-2851-workflow-bare-gsd-tools.test.cjs which parses every
markdown file under get-shit-done/workflows/, extracts shell-fenced code
blocks, tokenizes each line, and asserts no token in command position is
the bare string `gsd-tools` (the trailing `.cjs` is a different token).
The test also asserts plan-phase.md's gap-analysis call uses the canonical
`node …/gsd-tools.cjs` form.
Closes#2851
* fix(#2851): catch third bare gsd-tools call in ingest-docs.md init
After the first commit, the structural test was strengthened to detect
bare `gsd-tools` inside `$(...)` and backtick command-substitution forms.
The improved test surfaced a third leak:
ingest-docs.md:55: INIT=$(gsd-tools init ingest-docs)
Fixed to canonical form
INIT=$(node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" init ingest-docs)
plus the standard `@file:` handoff line that every other workflow uses
when capturing INIT (required by tests/windows-robustness.test.cjs).
Updated tests/bug-2801-ingest-docs-handler.test.cjs to match either
the bare `gsd-tools init ingest-docs` or canonical
`gsd-tools.cjs" init ingest-docs` form — the test's intent is to verify
the dispatch handler is wired, not to lock the bare-bin form that #2851
removes.
Closes#2851
* test(#2851): tighten ingest-docs and gap-analysis assertions to canonical form
CodeRabbit caught two soft assertions in the regression tests:
1. tests/bug-2801: the init-ingest-docs assertion accepted both the
legacy bare `gsd-tools` form and the canonical node-path form.
Since #2851 is the fix that removes the bare form, the test should
only accept the canonical absolute-path invocation. Switched to
parsed-bash-block extraction with an anchored regex on the full
`node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs"` path.
2. tests/bug-2851: the gap-analysis assertion used two loose
.includes()/word-boundary checks. Replaced with a single
assert.match() against the full canonical path so non-canonical
forms fail.
* test(#2851): env-assignment skip accepts lowercase identifiers too
CodeRabbit caught: the cmdIdx-skip regex /^[A-Z_][A-Z0-9_]*=/ only
matched uppercase variable names, so a line like `tmp=1 gsd-tools init`
would tokenize to ['tmp=1','gsd-tools','init'], the regex would fail on
'tmp=1', cmdIdx would stay at 0, and the command-position check would
compare 'tmp=1' against 'gsd-tools' — false negative.
POSIX shell variable names are [A-Za-z_][A-Za-z0-9_]*. Widen the regex
to match the actual lexical rule. Existing uppercase forms still work
(FOO=bar gsd-tools); now lowercase forms (tmp=1 gsd-tools) and mixed
case forms are also detected.
* fix(#2866): Codex installer strips legacy hooks at end-of-file without trailing newline
The four shape-strip regexes in `bin/install.js` (Codex install path)
required `\r?\n` at end. A stale GSD hook block sitting at end-of-file
without a trailing newline (common — many editors strip them, and the
legacy installer never wrote one) failed every shape, the installer
saw `gsd-check-update` already present, skipped writing the new
Nested-AoT block, and Codex 0.125+ refused to load with
invalid type: map, expected a sequence in `hooks`
Root cause + fix
================
Each shape's terminator changed from `\r?\n` to `(?:\r?\n|$)`, so
end-of-file is also a valid terminator.
Strip logic was lifted into a new pure helper
`stripStaleGsdHookBlocks(configContent)` that the install pipeline now
calls in place of the inline replace chain. The helper is exported via
the GSD_TEST_MODE module.exports for direct unit-test coverage.
Regression test
===============
`tests/bug-2866-codex-strip-no-trailing-newline.test.cjs` exercises
all four historical shapes (Shape 1 — pre-#1755 gsd-update-check;
Shape 2 — flat [[hooks]]+gsd-check-update; Shape 3 — single
[[hooks.SessionStart]] without nested .hooks; Shape 4 — correct
two-block nested) twice each: once with a trailing newline (regression
guard against the existing behavior) and once at end-of-file without a
trailing newline (the reporter's exact repro).
It also asserts:
- the helper is a no-op when no GSD reference is present, and
- Shape 4 strip does not leave an orphaned [[hooks.SessionStart]]
header behind (the same ordering invariant the inline code relied on).
The helper is loaded via `package.json` `bin` field, not a hardcoded
path — `tests/bug-2866-codex-strip-no-trailing-newline.test.cjs`
parses package.json and resolves `pkg.bin['get-shit-done-cc']` to
require the installer.
Closes#2866
* test(#2866): assert TOML structure, not raw-text substrings
CodeRabbit caught the strip assertions using `.includes()` against
raw TOML output. Added a small line-structural parseTomlShape() helper
(table headers + dotted-path key/value map, comments stripped) and
rewrote the assertions to:
- Verify no [[hooks.* table header survives the strip
- Verify no key carries a stale gsd-(update|check)-(check|update) value
- Verify history.persistence is preserved as the parsed string "save-all"
Behaviour is unchanged (the strip function under test is not modified).
The assertions now check structural shape rather than substring presence,
which catches re-shaping regressions that text matching would miss.
No new dependencies — the parser is local to the test and handles only
the small well-formed TOML these tests construct.
* refactor(#2866): replace regex hook strip with TOML AST removal
Per CR feedback on PR #2870: the regex-driven `stripStaleGsdHookBlocks`
implementation was fragile to whitespace, indentation, and key-ordering
variations the regression test never exercised. Variations the regex
silently leaked (verified before the rewrite):
- Shape 4 with an extra blank line between parent/child tables
- Shape 2/3 with `command` ordered before `event`
- Shape 3 with an extra `timeout = 5000` key — worse than a leak: the
regex matched only the command line, leaving `timeout = 5000`
orphaned outside any TOML table (invalid TOML)
- Tight whitespace `event="SessionStart"` (no spaces around `=`)
The structural rewrite uses the TOML parser already present in this file
(`getTomlTableSections` + `getTomlLineRecords` + `parseTomlValue` +
`removeContentRanges` + `collapseTomlBlankLines`):
1. Find every section whose path is `hooks` or starts with `hooks.`.
2. For each, walk the section's line records and parse `command` values
structurally — match by basename equality (`gsd-update-check.js` or
`gsd-check-update.js`), never by regex on raw bytes.
3. Detect orphaned `[[hooks.SessionStart]]` parents: empty body and a
stale child immediately follows → mark for removal.
4. Extend each removal range backward through any preceding
`# GSD Hooks` marker line (detected via line records, not text scan).
5. Remove ranges atomically and collapse resulting blank-line runs.
Legacy hook basenames are hoisted to template-literal constants so the
existing `install-hooks-copy.test.cjs` quoted-literal guard continues to
catch accidental *registration* of the inverted filename, while strip
detection (which legitimately needs both names) bypasses it.
Test coverage added: 8 new sub-tests exercising the four whitespace/
ordering variations (with and without trailing newline) plus a
`[[hooks.UserPromptSubmit]]` user-authored hook to guarantee the strip
only touches GSD-managed sections. 20/20 in the file, 5867/5867 in the
full suite.
* chore(#2868): switch canary publish from main to dev branch
Swaps the four `if:` guards in `.github/workflows/canary.yml` from
`refs/heads/main` to `refs/heads/dev` so the canary stream is owned
by the new long-lived integration branch. Adds a policy comment at
the top of the workflow documenting the branch->dist-tag mapping
(dev=@canary, main=@next/@latest, no overlap).
Closes#2868
* fix(#2868): summary block matches publish-step gate
CodeRabbit caught: the Summary step keyed off DRY_RUN only, so a
non-dry-run on main would falsely report "Published"/"Tagged" even
though all four publish steps were skipped by the new dev-only gate.
Add PUBLISH_ELIGIBLE env mirroring the publish-step `if:` expression
and a VALIDATION ONLY branch in the summary so non-dev runs report
honestly.
The Require Issue Link workflow was posting a comment and failing the
status check, but never transitioning the PR to closed. PR templates
promise auto-close behavior; PR #2863 demonstrated the gap (opened
without a Closes #N, sat open until manually closed).
Adds a `pulls.update({state: 'closed'})` call after the existing
comment, updates the comment heading to 'PR auto-closed', and tells
the author how to reopen after fixing the body.
Closes#2872
rc.7 will be the first RC in the 1.39.0 train that actually rolls in
the post-rc.5 fixes from main (rc.6 was content-identical to rc.5 — see
#2856). Notes enumerate each fix with PR/issue link, recap rc.6 / rc.5 /
rc.4, and follow the established docs/RELEASE-v1.39.0-rc.X.md format.
No SDK-version pinning advice (consistent with the rc.6 doc cleanup).
Markdownlint-clean fenced code blocks.
Closes#2859
* docs(#2856): add release notes for 1.39.0-rc.6
Documents what's actually in rc.6 (= rc.5 content + version-bump only —
release/1.39.0 was not synced with main before the bump) plus the known
SDK publish failure (@gsd-build/sdk@1.39.0-rc.6 is missing from npm with
404 PUT error). Format mirrors RELEASE-v1.39.0-rc.5.md.
Closes#2856
* docs(#2856): drop SDK refs from rc.6 notes; tag git log fence
Per maintainer + CodeRabbit review:
- Strip the 'Known issue: split publish' section, the SDK pin Note, and
the @gsd-build/sdk follow-up bullet. SDK publish failure is a known
separate issue and shouldn't block the rc.6 docs.
- Add bash language tag to the git log fence (markdownlint MD040).
* fix(#2838): SUMMARY rescue handles gitignored .planning explicitly
The pre-fix rescue used `git ls-files --modified --others --exclude-standard`
to detect uncommitted SUMMARY.md before worktree removal. When projects
gitignore .planning/, --exclude-standard filters out the very files the
rescue is meant to save, the rescue branch is skipped, and `git worktree
remove --force` permanently deletes the SUMMARY.
Replace both rescue blocks (quick.md, execute-phase.md) with a
filesystem-level find + cp rescue that bypasses gitignore entirely and
avoids the worktree↔main commit/merge cascade. cmp -s makes it idempotent.
Adds tests/bug-2838-summary-rescue-gitignored-planning.test.cjs which
extracts each rescue block, runs it against a real temp repo with a
gitignored .planning/, and asserts the SUMMARY survives worktree removal.
* test(#2838): assert rescue block exits 0 in idempotency test
CodeRabbit (Minor): the idempotency test pre-creates the destination
SUMMARY.md, so even a syntax/runtime error in the rescue block would
silently false-pass. Add an explicit r.status === 0 assertion.
* fix(#2832): gsd-sdk auto detects Codex runtime correctly
Two-part fix for #2832 (gsd-sdk auto silently routing non-Claude runtime
projects through the Claude Agent SDK):
1. Runtime gate at the `auto` entry point. New `runtime-gate.ts` exports
`assertRuntimeSupportsAutoMode(config)` which throws an actionable error
when `GSD_RUNTIME` / `config.runtime` resolves to a non-Claude runtime
(codex, gemini, opencode, etc.). The autonomous orchestrator only knows
how to drive `@anthropic-ai/claude-agent-sdk` today; failing fast with a
clear pointer at the in-session slash commands beats the previous instant
`[FAILED] $0.00 0.1s` flake. Wired into `cli.ts` before the GSD/InitRunner
construction.
2. Runtime-aware `resolveModel()` in `session-runner.ts`. The profile -> id
map (`balanced -> claude-sonnet-4-6`, etc.) was applied unconditionally,
so even with `runtime: codex` and `resolve_model_ids: omit` the SDK
forced a Claude id into `query()`. Now the profile map only fires when
the runtime is Claude and the explicit `resolve_model_ids: "omit"` knob
short-circuits to undefined, mirroring `query/config-query.ts`.
Tests (vitest, sdk/src):
- runtime-gate.test.ts (8 cases): claude / unset / unknown pass; codex,
gemini, opencode throw; GSD_RUNTIME wins over config.runtime; error
message references #2832 and the slash-command workaround.
- session-runner.test.ts (4 new cases under "resolveModel runtime
awareness (#2832)"): codex runtime + balanced profile -> no model
injected; resolve_model_ids: omit -> no model; claude runtime still
resolves to claude-sonnet-4-6 (no regression); explicit options.model
wins on any runtime.
* fix(#2832): address CR — env-precedence in resolveModel + accurate source attribution
Two CodeRabbit findings on PR #2844:
1. session-runner.ts:resolveModel() (Major) — read runtime via detectRuntime()
so GSD_RUNTIME env precedence is honored. Without this, a Codex run with
a Claude-shaped config still fell into the Claude-only profile-id branch.
2. runtime-gate.ts:assertRuntimeSupportsAutoMode() (Minor) — when GSD_RUNTIME
holds an unsupported value, detectRuntime() falls through to config but
the source label still reported the discarded env value. Fix: validate
env against SUPPORTED_RUNTIMES before attributing the source.
Tests added for both: env-precedence in session-runner, source attribution
in runtime-gate. 17/17 pass.
* chore(#2828): add canary release workflow (dev builds on push to main)
Publishes get-shit-done-cc@canary and @gsd-build/sdk@canary on every
push to main. Version format: {base}-canary.{N} where base strips any
pre-release suffix from package.json (1.39.0-rc.4 → 1.39.0-canary.1).
Sequential canary number is auto-detected from existing git tags so
reruns never collide. Concurrency group cancels stale in-flight canary
runs when commits land quickly.
Mirrors the structure and steps of release.yml: same checkout pins,
Node 24, npm-publish environment, build:sdk, tarball verification,
dry-run publish gate, and publish verification with sleep 10.
Closes#2828
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(#2828): address CodeRabbit review findings on canary.yml
- cancel-in-progress: false — was true, allowing a newer push to cancel a
run mid-publish (after tag push but before SDK publish), leaving a partial
release state that's unrecoverable since npm versions are immutable
- Guard tag/publish/verify steps with github.ref == 'refs/heads/main' so
a manual workflow_dispatch from a feature branch (dry_run defaults false)
cannot accidentally publish unmerged code under the shared canary dist-tag
- Replace fixed sleep 10 with exponential backoff retry loop (delays: 5 10
20 30 45s); fixed sleep is flaky against normal npm CDN replication lag
and a false failure forces a new canary number since the tag already exists
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(plan-phase): expose --mvp flag in command frontmatter
Adds --mvp to argument-hint and Flags doc. Workflow handler in next commit.
* chore(#2828): remove push:main trigger from canary workflow
Submission rate to main is too high to auto-publish a canary on every
merge. Restrict the workflow to manual workflow_dispatch only.
Closes#2828
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(#2836): audit-open quick SUMMARY filename + UAT terminal-status drift
Fixes two convention drifts in bin/lib/audit.cjs that produced false-positive
"open" items at every milestone close:
1. scanQuickTasks: looked for bare `SUMMARY.md`, but workflows/quick.md
mandates `${quick_id}-SUMMARY.md`. Now matches either filename so quick
tasks created via the documented workflow are recognized.
2. scanUatGaps: only treated `status: complete` as terminal, but
workflows/execute-phase.md uses `status: resolved` post-gap-closure.
Now treats both `complete` and `resolved` as terminal, with `result:
all_pass` as a fallback when status is absent.
Also reconciles workflows/help.md one-liner that referenced bare
`SUMMARY.md` so docs match the authoritative quick.md workflow.
Adds tests/bug-2836-audit-open-summary-uat-drift.test.cjs with 6
structural regression tests covering both fixes plus no-regression cases.
* refactor(#2836): hoist TERMINAL_UAT_STATUSES outside scanUatGaps loop
Address CodeRabbit nitpick: the Set was being recreated on each UAT file
iteration. Hoist to module scope so it is constructed once.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* fix(#2829): gsd-sdk resolvable in local-mode installs
Local-mode installs previously short-circuited installSdkIfNeeded() the
moment opts.isLocal was true, leaving every `gsd-sdk query …` call site
unable to resolve the binary on PATH. The published tarball ships
sdk/dist/cli.js and bin/gsd-sdk.js regardless of mode, and the shim
resolves the CLI relative to its own __dirname — so the same self-link
strategy that powers npx-cache global installs (#2775) also works for
local installs. We now run the shared self-link path whenever the dist
is present, and only fall back to a non-fatal warning + early return
when the dist is genuinely missing (preserving the #2678 contract).
* test(#2829): correct precondition comment about ~/.local/bin
Address CodeRabbit feedback — the test does not create ~/.local/bin,
so reword the inline precondition to "any HOME bin candidate remains
off-PATH" to match what the test actually sets up.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* fix(#2835): align CR-INTEGRATION tests with hyphen namespace
PR #2819 changed autonomous.md skill invocations from `gsd:code-review`
(colon) to `gsd-code-review` (hyphen). Tests still asserted the legacy
colon form against the user-installed plugin dir (which lags the repo).
Switch tests to:
- Read autonomous.md from the canonical repo WORKFLOWS_DIR (not the
plugin install location, which can be stale)
- Parse `Skill(skill="...")` invocations structurally instead of
substring matching, and assert the canonical hyphen form is present
while explicitly rejecting the legacy colon form.
Closes#2835
* test(#2835): parse Skill() invocations structurally in CR-INTEGRATION tests
Replace raw-text regex/.includes() assertions with a proper parser that
walks autonomous.md, skips escaped string contexts, and yields
[{ skill, args }] objects. The three CR-INTEGRATION tests now assert
against parsed fields and tokenized args (not substring matches),
addressing CodeRabbit feedback on PR #2843.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* fix(#2831): expand HOME in OpenCode skill/template paths
OpenCode does not shell-expand $HOME in @file references on any platform —
the literal `@$HOME/...` path is resolved relative to the config command/
dir, producing `command/$HOME/...` (file not found). The previous fix for
#2376 only guarded Windows; extend to all platforms.
Closes#2831
* test(#2831): assert behavior via exported computePathPrefix, not source grep
Addresses CodeRabbit review on PR #2842:
- Extracts pathPrefix logic into a named, test-exported computePathPrefix
helper in bin/install.js (no behavior change at the call site).
- Rewrites bug-2376 and bug-2831 regression tests to call the exported
function directly instead of regex-matching install.js source text,
per the repo's no-source-grep testing standard.
- Wraps temp-dir test setup in try/finally so cleanup runs on assertion
failures (no leaked tmp dirs).
* fix(#2839): make /gsd-code-review-fix cleanup transactional
Cleanup tail in agents/gsd-code-fixer.md previously did 'git worktree
remove' without any recovery marker. If the process was killed between
fix commits and worktree removal, the orphan worktree + branch survived
with no resume path — the next run had no way to discover or finish
the cleanup.
Introduce a recovery sentinel at ${phase_dir}/.review-fix-recovery-pending.json
with strict ordering:
- Sentinel written AFTER 'git worktree add' succeeds (never points at a
worktree that does not exist).
- Sentinel removed ONLY AFTER 'git worktree remove' returns successfully
(interruption between commits and removal leaves a sentinel behind).
- New runs detect a pre-existing sentinel, force-remove the recorded
orphan worktree, then drop the stale sentinel before continuing —
making the agent self-healing after a crash.
Closes#2839
* fix(#2839): harden sentinel JSON parse and scope ordering assertion
Address CodeRabbit review feedback on PR #2846:
- agents/gsd-code-fixer.md: Guard the recovery-sentinel JSON parse with
try/catch so a corrupted/truncated sentinel (a realistic crash artifact)
emits a warning and yields an empty prior_wt instead of aborting setup.
This preserves the self-healing recovery path even when the sentinel
itself is the casualty of the original crash.
- tests/bug-2839-review-fix-transactional-cleanup.test.cjs: Scope the
cleanup-ordering assertion to the cleanup-tail section of the
setup_worktree step rather than first global occurrences. Previously
the assertion could pass on pre-recovery references even if cleanup-tail
ordering regressed. The regex also now accepts the shell-variable form
(\`rm -f \"\$sentinel\"\`) used in the cleanup tail.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Adds Hermes Agent as a supported installation target. Users can run
\`npx get-shit-done-cc --hermes\` to install all 86 GSD commands as
skills under \`~/.hermes/skills/gsd-*/SKILL.md\`, following the same
open skill standard as Claude Code 2.1.88+, Qwen Code, Antigravity,
Trae, Augment, and Codebuddy.
Hermes Agent is an open-source AI agent framework by Nous Research
(NousResearch/hermes-agent, MIT). Its skill loader accepts the Claude
skill format as-is: frontmatter parsed with PyYAML SafeLoader (unknown
keys like \`allowed-tools\` / \`argument-hint\` ignored), body XML tags
(\`<objective>\`, \`<execution_context>\`, \`<process>\`) passed directly
to the model. Compatibility proven end-to-end with all 86 GSD skills
loading cleanly, \`skill_view()\` returning full bodies, and
\`build_skills_system_prompt()\` emitting them into the agent system
prompt — zero Hermes code changes required.
Changes:
- \`bin/install.js\`: --hermes flag, getDirName/getGlobalDir/getConfigDirFromHome
support, HERMES_HOME env var (native to Hermes — used for profile
mode / Docker deploys), install/uninstall pipelines, interactive
picker option 10 (alphabetical: between Gemini and Kilo), .hermes
path replacements in copyCommandsAsClaudeSkills and
copyWithPathReplacement, legacy commands/gsd cleanup, CLAUDE.md ->
HERMES.md and "Claude Code" -> "Hermes Agent" content rewrites in
skills/agents/hooks, runtime-appropriate finish message.
- \`get-shit-done/bin/lib/core.cjs\`: add hermes to KNOWN_RUNTIMES;
add RUNTIME_PROFILE_MAP.hermes with OpenRouter-slug defaults
(Hermes is provider-agnostic; these defaults resolve across
OpenRouter, native Anthropic, and Copilot via Hermes' aggregator-
aware resolver, and are overridable per-tier via
model_profile_overrides.hermes.{opus,sonnet,haiku}).
- \`README.md\`: Hermes Agent in tagline, runtime list, verification
command, install/uninstall examples, \`--hermes\` flag reference.
- \`tests/hermes-install.test.cjs\`: new, 14 tests covering directory
mapping, HERMES_HOME env var precedence, install/uninstall
lifecycle, user-skill preservation, engine cleanup.
- \`tests/hermes-skills-migration.test.cjs\`: new, 11 tests covering
frontmatter conversion, path replacement (~/.claude/ ->
\$HERMES_HOME/skills/), CLAUDE.md -> HERMES.md, "Claude Code" ->
"Hermes Agent", stale skill cleanup, SKILL.md format validation.
- \`tests/multi-runtime-select.test.cjs\`: updated for new option
numbering (hermes=10, kilo=11, opencode=12, qwen=13, trae=14,
windsurf=15, all=16).
- \`tests/kilo-install.test.cjs\`: updated assertions for Kilo having
moved from option 10 to option 11.
Closes#2841
Implementation notes:
- Zero custom code paths: Hermes reuses copyCommandsAsClaudeSkills()
identical to Qwen Code / Antigravity pattern.
- Path replacement: ~/.claude/, \$HOME/.claude/, ./.claude/ ->
.hermes equivalents in skill/agent/hook content.
- Config precedence: --config-dir > HERMES_HOME > ~/.hermes (matches
how Hermes itself resolves its home directory).
- Legacy cleanup: removes commands/gsd/ if present from a prior
install, preserving dev-preferences.md (same as Qwen).
- No external dependencies added.
Testing: 5841 / 5841 tests pass (0 failures, 0 regressions)
- 14 new tests in hermes-install.test.cjs
- 11 new tests in hermes-skills-migration.test.cjs
- multi-runtime-select.test.cjs renumbered + 1 new test (single choice for hermes)
* fix(#2787): track fenced code blocks in extractCurrentMilestone
The milestone-end search used a multiline regex against the raw
restContent string. Lines inside fenced code blocks (``` or ~~~)
that matched the milestone-heading pattern (e.g. `# note v1.0`)
prematurely set sectionEnd, hiding all phases after the block from
roadmap analyze, roadmap get-phase, and every downstream command.
Replace the regex match with a line-by-line scan that tracks fence
state. Lines inside an open fence are skipped regardless of content.
Adds three regression tests covering backtick fences, tilde fences,
and the roadmap get-phase code path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(#2787): track fence delimiter instead of toggling bare boolean
Replace the inFence boolean with fenceChar/fenceLen tracking so that
indented fences (up to 3 leading spaces) and mixed-delimiter content
(~~~ inside a backtick fence) are parsed correctly. A closing fence
is only recognised when it uses the same character as the opening
delimiter and has at least the same run length, matching the CommonMark
spec.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(#2787): require fence-only closing line — reject info-string lines as closers
A closing fence delimiter must contain only optional trailing whitespace.
A line like \`\`\`js inside an open fence has an info string and must not
close it. The previous regex /^\s{0,3}([`~]{3,})/ matched the opening of any
such line, so the closing check could toggle fenceChar off on an info-string
line and expose subsequent heading-like content to the milestone-end detector.
Fix: capture the trailing portion of every fence-candidate line and only clear
fenceChar when trailing matches /^\s*$/ (per CommonMark §4.5).
Adds a regression test covering the ```text / ```js nesting scenario.
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(#2791): GSD_WORKSTREAM env var respected by gsd-sdk query + gsd-tools bin alias
Two fixes for gsd-sdk binary issues:
**Issue 1 — Binary name collision:**
Both `get-shit-done-cc` and `@gsd-build/sdk` declare `bin: { "gsd-sdk": ... }`.
Added `"gsd-tools": "bin/gsd-sdk.js"` to `package.json` bin so users with the
collision can invoke `gsd-tools query <cmd>` as a conflict-free alternative.
**Issue 2 — Query registry not workstream-aware:**
`gsd-sdk query` commands ignored `GSD_WORKSTREAM` env var, always reading from
the root `.planning/` even when a workstream was active. `gsd-tools.cjs` reads
`GSD_WORKSTREAM` via `planningDir()`, so all ~35 `gsd-sdk query` call sites in
workflow files were broken in workstream-scoped projects.
Fix: added env var fallback in `sdk/src/cli.ts` — when `--ws` is not provided,
`GSD_WORKSTREAM` is used (with name validation; invalid values are silently
ignored, matching CJS behaviour).
Regression test: `tests/bug-2791-sdk-workstream-env.test.cjs`
Closes#2791
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(#2791): address CodeRabbit — precedence test, invalid env fallback assertion, bash fence
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(#2805): add regression test — archived phase fallback already fixed in source
getPhaseInfoWithFallback already discards archived disk matches when the
current ROADMAP lists the phase (line 133: phaseInfo?.archived &&
roadmapPhase?.found). The regression test confirms this behavior and
prevents the bug from being reintroduced by future refactors.
Regression test: tests/bug-2805-archived-phase-fallback.test.cjs
(3 tests: phase_dir null, phase_found true, phase_name from ROADMAP)
* fix(#2805): address CodeRabbit — exact phase_name assertion, bash fence
* fix(#2788): audit-uat reads frontmatter human_verification array
parseVerificationItems only searched the body for a '## Human Verification'
section. gsd-verifier writes items to the frontmatter human_verification:
YAML array, so audit-uat returned total_items: 0 for all such files.
Two fixes:
1. Read frontmatter human_verification: array first (via extractFrontmatter);
return those items if present (primary path for gsd-verifier output).
2. Relax the body-section heading regex to accept underscore separators and
parenthetical suffixes (e.g. '## human_verification (action required)').
Regression test: tests/bug-2788-audit-uat-frontmatter.test.cjs
* fix(#2788): address CodeRabbit — trim whitespace entries, support hyphenated headings, bash fence
* fix(#2801): add ingest-docs handler to gsd-tools init dispatch
The `/gsd-ingest-docs` workflow was broken because `workflows/ingest-docs.md`
called `gsd-sdk query init.ingest-docs` but the installed binary is `gsd-tools`,
and `gsd-tools init` had no `ingest-docs` case in its dispatch switch.
- Added `cmdInitIngestDocs` function to `init.cjs` and exported it; returns
`project_exists`, `planning_exists`, `has_git`, `project_path`, `commit_docs`
- Added `case 'ingest-docs'` to the `init` switch in `gsd-tools.cjs`
- Updated `workflows/ingest-docs.md` to call `gsd-tools init ingest-docs`
(line 55) and `gsd-tools commit` (line 292) instead of `gsd-sdk query ...`
- Regression test: `tests/bug-2801-ingest-docs-handler.test.cjs`
Closes#2801
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(#2801): address CodeRabbit — commit_docs assertion, broader gsd-sdk detection, bash fence
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(#2808): SKILL.md name uses hyphen form for Claude Code autocomplete
skillFrontmatterName() was converting gsd-<cmd> to gsd:<cmd> (colon) so
installed SKILL.md files had name: gsd:add-phase etc. Claude Code surfaces
this name in autocomplete, showing the deprecated colon form to users even
though the hyphen form is canonical everywhere else.
Root cause: the colon form was needed because workflows called
Skill(skill="gsd:<cmd>"). All 4 remaining colon-form Skill() calls in
autonomous.md and execute-phase.md are updated to hyphen form.
skillFrontmatterName() now returns the hyphen dir name unchanged.
Updated 4 existing tests that asserted colon form.
Regression test: tests/bug-2808-skill-hyphen-name.test.cjs
* fix(#2808): address CodeRabbit — bash/text fences, structured test assertions, fail-loud on errors
* fix(#2796): roadmap update-plan-progress accepts --phase flag form
roadmap-update-plan-progress used positional-only arg parsing: args[0].
When execute-phase.md:228 calls it with --phase <N>, args[0] was the
literal string "--phase", which findPhase received as the phase number.
findPhase returned found:false, causing updated:false with no write.
ROADMAP.md plan checkboxes silently never advanced.
Fix: check for --phase <value> first; fall back to the first non-flag
positional argument for backward-compatible direct calls.
Regression test: tests/bug-2796-arg-parsing-regression.test.cjs
* fix(#2796): address CodeRabbit — guard --phase against flag-like values, bash fence
* fix(#2803): honor --default flag in SDK config-get handler
The gsd-sdk query config-get handler ignored the --default <value> flag.
Missing keys always threw 'Key not found' (exit 1), making 8 workflow
sites that rely on config-get --default fall through to error paths.
The CJS path (gsd-tools.cjs) honored --default since #1893; this ports
that behavior to the SDK configGet handler.
Regression test: tests/bug-2803-config-get-default-flag.test.cjs
* fix(#2803): address CodeRabbit — require --default value, keep missing config.json as error, bash fence
* fix(#2798): add regression test — context_window key already in VALID_CONFIG_KEYS
context_window was already added to both VALID_CONFIG_KEYS allowlists
(CJS and SDK) in a prior fix. The regression test confirms it stays there
and that config-set context_window succeeds end-to-end.
Regression test: tests/bug-2798-context-window-config-key.test.cjs
* fix(#2798): address CodeRabbit — add bash language to release notes fence
* fix(#2784): clear shared ~/.cache/gsd/ cache in update workflow
The SessionStart hook (hooks/gsd-check-update.js) writes update-check
results to $HOME/.cache/gsd/gsd-update-check.json (shared, tool-agnostic).
The update.md run_update step only cleared per-runtime paths like
~/.claude/cache/gsd-update-check.json, so the statusline kept showing the
stale upgrade indicator after a successful update.
Fix: add rm -f "$HOME/.cache/gsd/gsd-update-check.json" to the
cache-clear block in the run_update step.
Regression test: tests/bug-2784-update-cache-clear-path.test.cjs
* fix(#2784): address CodeRabbit review — four edge-cases count, bash fence, structured test assertions
* docs: add CHANGELOG entry and rc.5 release notes for #2809 Codex hooks migrator fixes
Covers the five correctness findings addressed in the round-5 CR of PR #2809:
parseHooksBody key parser (hyphenated/quoted keys), buildNestedBlock empty-handler
guard, legacyMapSections segment-count filter, quoted-dot regression test, and
strengthened command path assertion.
Closes#2810
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(#2794): embed model_profile_overrides.opencode.<tier> into generated OpenCode agents
OpenCode agent files were missing `model:` frontmatter when the user configured
tier-based model resolution via `model_profile_overrides.opencode.*`. Only
explicit `model_overrides[agent]` was consulted; the runtime profile resolver
(used by the Codex path since #2517) was never called for OpenCode agents.
Added a tier-resolver fallback in the OpenCode agent conversion block in
`bin/install.js`. Precedence (matching Codex behavior):
model_overrides[agent] > model_profile_overrides.opencode.<tier> > omit
Regression test: `tests/bug-2794-opencode-model-profile-overrides.test.cjs`
Closes#2794
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(#2773): emit correct Codex 0.124.0+ two-level nested hooks schema
Codex 0.124.0's stable spec requires:
[[hooks.SessionStart]] ← event entry (optional matcher)
[[hooks.SessionStart.hooks]] ← handler sub-table
type = "command"
command = "node ..."
Previous GSD versions wrote the flat [[hooks]] + event = "SessionStart"
form (#2637) or a single-block [[hooks.SessionStart]] without the nested
.hooks sub-table (#2760). Both are rejected by Codex 0.124.0+ at launch.
Changes:
bin/install.js
- Hook block emission now always writes the two-level nested AoT form.
- migrateCodexHooksMapFormat extended to also migrate flat [[hooks]]
array-of-tables entries (event = "..." key → [[hooks.<EVENT>]] form).
Flat [[hooks]] and [[hooks.<EVENT>]] are mutually exclusive TOML types;
any pre-existing flat entries must be promoted before GSD appends its
own namespaced hooks.
- Migrated flat AoT blocks are inserted BEFORE the GSD marker so they
stay in the "user" portion of the file and survive stripGsdFromCodexConfig.
- stripCodexGsd* regexes cover all four historical block shapes.
- validateCodexConfigSchema no longer rejects flat [[hooks]] at the root
level (removing the false-positive that blocked install when users had
their own AfterCommand hooks). The validator still enforces the nested
[[hooks.<EVENT>.hooks]] shape for entries that have a .hooks sub-table.
tests/
- bug-2760-codex-install-defensive.test.cjs: 29/29 passing.
Added 5 new regression cases for fresh install, upgrade from each
legacy shape, idempotent reinstall, and user hook preservation.
- codex-config.test.cjs: 106/106 passing.
All migration tests updated to assert [[hooks.<TYPE>.hooks]] sub-table
(command now in handler level, not event-entry level).
New tests: flat [[hooks]] migration (SessionStart, AfterCommand),
install+uninstall preserves non-GSD AfterCommand hook.
Closes#2773
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: address CodeRabbit review + CI regression in bug-2698-crlf-install
CI regression (#2698 tests):
Strip GSD-managed hook blocks BEFORE running migrateCodexHooksMapFormat.
The previous order let migration convert the stale [[hooks]] + event =
"SessionStart" + gsd-update-check.js block to [[hooks.SessionStart]] form
before Shape 1 strip regex could match it; Shape 1 only matches the flat
[[hooks]] form, so the stale block survived reinstall. Swapping to
strip-then-migrate ensures only user-authored hooks reach the migration step.
Shape 3/4 regexes also extended to match both gsd-check-update.js and the
legacy gsd-update-check.js filename so no variant slips through.
CodeRabbit actionable (major):
migrateCodexHooksMapFormat now accepts single-quoted TOML event values
(event = 'SessionStart') in the flat [[hooks]] filter and event-name
extractor. TOML spec allows single-quoted literal strings; double-quote-only
regexes silently skipped them, leaving the block unmigrated and triggering
the hard-fail validator.
CodeRabbit nitpicks:
tests/codex-config.test.cjs: replace indexOf('[[hooks.AfterCommand]]')
ordering check with parseTomlToObject structural assertions (no-source-grep
rule).
tests/bug-2760-codex-install-defensive.test.cjs: replace three
content.match(/…/g).length raw-text counts with parseTomlToObject structural
assertions for single-handler and single-event-entry invariants.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: address CodeRabbit review #2 — extractFlatHookEventName helper + type assertions
- bin/install.js: consolidate TOML_QUOTED_STRING + TOML_EVENT_CAPTURE into a
single extractFlatHookEventName() helper that rejects empty-string event values
(event = "" or event = ''); previously two independent regexes had to be kept
in sync and neither guarded against a blank event name producing a [[hooks.]]
header
- tests/bug-2760-codex-install-defensive.test.cjs: add comments explaining why
the e.command fallback is retained in both allSessionStartCommands and
afterToolCommands collectors — migration only upgrades [hooks.TYPE] map-format
sections, not existing [[hooks.TYPE]] namespaced AoT entries authored with
command at event-entry level; removing the fallback causes false failures for
preserved user entries
- tests/codex-config.test.cjs: add type = "command" assertions to all migration
tests that verify .command but were missing .type checks; buildNestedBlock
injects type = "command" when the source body has no explicit type key, so
every migrated handler must carry it per the Codex 0.124.0+ schema
138 tests pass, 0 fail.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: CR round 3 + proactive audit — TOML quoting, stale AoT migration, strict validator
Three real issues from CodeRabbit round 3, plus the collateral improvements they
enable:
bin/install.js — tomlBareKey() helper (#2773 CR6a)
buildNestedBlock interpolated the raw event name into [[hooks.${type}]] and
[[hooks.${type}.hooks]] headers without TOML escaping. An event name containing
spaces or punctuation (e.g. "Before Tool") would produce invalid TOML that
parseTomlToObject would subsequently reject. Added tomlBareKey() — wraps the
key in double-quoted TOML strings when it contains non-bare-key characters
([A-Za-z0-9_-]).
bin/install.js — staleNamespacedAotSections migration path (#2773 CR6b)
migrateCodexHooksMapFormat handled [hooks.TYPE] (map-format) and flat [[hooks]]
with event = "..." but ignored [[hooks.TYPE]] AoT entries that carried handler
fields (command, type, timeout, statusMessage) at event-entry level without a
nested [[hooks.TYPE.hooks]] sub-table. This is the pre-#2773 single-block shape
that Codex 0.124.0+ rejects. Added staleNamespacedAotSections as the third
migration category: detected by STALE_HANDLER_FIELD_PATTERN + absence of a
[[hooks.TYPE.hooks]] sub-table in the same file; promoted to the two-level
nested form by buildNestedBlock. Matcher-only entries (no handler fields) are
intentionally skipped.
bin/install.js — validator now rejects event-level handler fields (#2773 CR6c)
With migration covering the stale AoT shape, validateCodexConfigSchema can be
strict: entries that have handler fields at event-entry level but no .hooks
sub-array return ok: false instead of silently passing. Matcher-only entries
(no handler fields and no .hooks) remain valid as event filters.
tests/codex-config.test.cjs — four new migration tests + missing type assertion
Four tests cover the new stale AoT migration path: single-entry promotion,
already-nested entry is left untouched (no double-wrap), multiple event types,
and matcher-only entry is skipped. Added the missing type = "command" assertion
to the CRLF migration test (the one miss from CR round 2).
tests/bug-2760-codex-install-defensive.test.cjs — strict .hooks-only collectors
With stale AoT entries now migrated, the entry.command fallbacks in
allSessionStartCommands and afterToolCommands are dead code. Replaced with
strict entry.hooks-only collection guarded by an every(Array.isArray(e.hooks))
pre-assertion, so any future regression that leaves handler fields at event
level produces an explicit test failure rather than silently collecting them.
142 tests pass, 0 fail.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: CR round 4 — segment-safe quoted-key detection + structural test assertions
bin/install.js — getTomlTableSections now exposes segments (#2773 CR7a)
The staleNamespacedAotSections filter used section.path.split('.').length > 2
to skip [[hooks.TYPE.hooks]] sub-table entries. That check misclassifies quoted
event names containing dots: [[hooks."before.tool"]] has path hooks.before.tool
(3 dot-parts) but only 2 true parsed segments, so it was incorrectly excluded
from migration. Fixed by adding segments to the getTomlTableSections return
shape (already available on record.tableHeader.segments) and replacing the
split-based check with section.segments.length !== 2, which uses the true
parsed key count regardless of dots inside quoted names.
tests/codex-config.test.cjs — replace raw-equality assertions (#2773 CR7b)
The two new no-op migration tests (already-nested and matcher-only) used
assert.strictEqual(result, content) — raw string equality that conflicts with
the repo no-source-grep testing standard. Replaced with structural assertions
using parseTomlToObject: the already-nested test verifies the handler stays
under .hooks[0] and no double-wrap occurs; the matcher-only test verifies the
matcher key is preserved and no .hooks sub-array is added.
142 tests pass, 0 fail.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: CR round 5 — parseHooksBody key parser, empty-handler guard, segment-safe legacyMap filter, stronger test assertions
- parseHooksBody: replace /^([\w.]+)\s*=/ regex with parseTomlKey() so
hyphenated keys (status-message) and quoted keys are not silently dropped
- buildNestedBlock: guard against handlerEntries.length === 0 — do not
synthesise [[hooks.TYPE.hooks]] with type="command" but no command for
matcher-only or otherwise handler-empty stale sections
- legacyMapSections filter: use section.segments.length === 2 (same fix
applied to staleNamespacedAotSections in round 4) to prevent [hooks.X.Y]
3-segment tables from being misclassified as event entries
- tests: add regression test for [[hooks."before.tool"]] quoted-dot event
names; strengthen command path assertion to exact absolute path comparison
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
This directory holds **per-PR CHANGELOG fragments**. Every PR with user-facing changes drops one (or more) `<random-name>.md` files here describing its CHANGELOG entry. Fragments are consolidated into the top-level `CHANGELOG.md` at release time.
## Why
Two PRs that both edit the `### Fixed` block of `CHANGELOG.md` always conflict on merge — git can't pick a serialization order without human input. Two PRs that each add a fresh `.changeset/<unique-name>.md` never conflict because they don't share lines.
See [#2975](https://github.com/gsd-build/get-shit-done/issues/2975) for the full rationale.
## Adding a fragment
```bash
node scripts/changeset/new.cjs \
--type Fixed \
--pr 1234\
--body "fix the thing — explain the user-visible change in one sentence"
```
This writes `.changeset/<adjective>-<noun>-<noun>.md` with frontmatter and a body. Three random words → concurrent PRs don't collide.
## Format
```md
---
type: Fixed
pr: 1234
---
**`/gsd-foo` no longer drops trailing slashes** — explain the user-visible change.
Reads every fragment, groups bullets by `type:`, replaces `## [Unreleased]` with a new `## [vX.Y.Z] - YYYY-MM-DD` block, opens a fresh `## [Unreleased]` above, deletes consumed fragments. Idempotent.
**GSD transport raw-mode handling and timeout fallback hardened** — fixes undefined raw formatting edge case and adds raw-path coverage to prevent regressions.
**query command metadata now flows through a canonical Command Definition Module seam** — registry assembly, mutation semantics, and alias generation consume one Interface (`family`, `canonical`, `aliases`, `mutation`, `output_mode`, `handler_key`) to improve locality and reduce drift.
**query fallback error mapping cleanup** — the CJS fallback catch path now passes original `err` to `mapFallbackDispatchError` (follow-up to prior review feedback missed in PR #3066).
gsd-code-fixer worktree no longer fails on the same-branch checkout — the agent now creates a new gsd-reviewfix/ branch via git worktree add -b and fast-forwards the user's branch on cleanup. See #2990.
Test suite for config-schema.cjs is now mutation-resistant — 95 typed assertions kill the 124 surviving Stryker mutants from the 4.62% baseline. Tests target static-key fast path, dynamic-pattern .some semantics, polarity, and regex-anchor tightening. See #2986.
**`tests/install-minimal.test.cjs:307` no longer races on shared `os.tmpdir()` under parallel CI** — the previous shape compared `listTmpStageDirs()` snapshots before and after the throw. Under `scripts/run-tests.cjs --test-concurrency=4`, `tests/install-minimal-all-runtimes.test.cjs` runs in a parallel process and creates/removes `gsd-minimal-skills-*` dirs in the shared OS tmpdir between snapshots, so `deepStrictEqual` failed deterministically when the parallel process happened to have a live stage dir during the snapshot window. Fix: stub `fs.mkdtempSync` to record THIS call's stage dir, then assert that exact path no longer exists after the throw — no global filesystem snapshot, no race. (#3008)
**Codex SessionStart hook now uses absolute Node binary path** — closes the gap left after #3002. The Codex install path wrote `command = "node ${path}"` directly into config.toml, bypassing `resolveNodeRunner()`. Under GUI/minimal-PATH runtimes (`/usr/bin:/bin:/usr/sbin:/sbin`), bare `node` failed to resolve, exit 127. Now routed through new `buildCodexHookBlock()` helper. Reinstall path migrates legacy bare-node entries via new `rewriteLegacyCodexHookBlock()`. See #3017.
**Codex skill adapter no longer instructs the agent to silently default discuss-phase decisions.** When `request_user_input` was rejected (Default mode), the generated adapter said "pick a reasonable default" — so `$gsd-discuss-phase` proceeded toward writing CONTEXT.md / DISCUSSION-LOG.md / checkpoints without ever asking the user. Adapter prose now requires the agent to STOP, present plain-text questions, and wait, with explicit named exceptions (`--auto`/`--all`/explicit user approval). See #3018.
**query CLI path extracted into a dedicated Query CLI Adapter Module** — `sdk/src/cli.ts` now delegates query-specific dispatch, error mapping, and output/exit handling to `sdk/src/query/query-cli-adapter.ts` for better locality and testability.
**Post-install message and update.md no longer recommend the removed `/gsd-reapply-patches` command** — after PR #2824 consolidated 86 skills into ~58, `/gsd-reapply-patches` was folded into a flag (`/gsd-update --reapply`). The 1.39.1 hotfix (#2954) updated `help.md` but missed `bin/install.js`'s `reportLocalPatches` runtime emitter, `get-shit-done/workflows/update.md` Step 4, and the English + zh-CN/ja-JP/ko-KR doc set. Users hit "Unknown command" after every install with backed-up patches. All five runtime branches in `reportLocalPatches` (claude, opencode, kilo, copilot, gemini, codex, cursor) now emit the consolidated form. Regression: `tests/bug-3010-reapply-patches-references.test.cjs` scans `bin/install.js`, every workflow file, and every doc (excluding CHANGELOG history and help.md's deprecation notice) for stale recommendations. See #3010.
**Documentation refreshed for v1.40.0** — full audit of `docs/` against the 1.40.0-rc.1 release surface. Updates command lists, walkthroughs, and inventory rows for the 86→59 skill consolidation (#2790), the six namespace meta-skills with two-stage routing (#2792), the `/gsd-health --context` guard, the phase-lifecycle status-line read-side (#2833), and the Gemini colon-form / non-Gemini hyphen-form slash-command split. Translations in ja-JP/ko-KR/zh-CN/pt-BR mirror the structural changes; new English prose is marked with `<!-- TODO i18n -->` for human translator follow-up. CHANGELOG.md `[Unreleased]` section regrouped under Feature/Enhancement/Fix headers.
**`dynamic_routing` block in `.planning/config.json` for failure-tier escalation (#3024).** Each agent declares a default tier (`light` / `standard` / `heavy`); when `dynamic_routing.enabled: true`, the resolver picks `tier_models[default_tier]` for the first spawn and escalates one tier up on orchestrator-detected soft failure (capped by `max_escalations`). Disabled by default — fully backward compatible. Composes with `model_overrides` (higher precedence) and `models.<phase_type>` (lower) for full cost-control flexibility. Adds new resolver `resolveModelForTier(cwd, agent, attempt)` to `core.cjs` for orchestrator integration.
**Changeset-fragment workflow** — eliminates CHANGELOG.md merge conflicts. Each PR drops `.changeset/<random-name>.md` with frontmatter (`type:`, `pr:`) plus a markdown body; the release-time `npm run changelog:render` consolidates fragments into `CHANGELOG.md` and deletes them. CI lint (`npm run lint:changeset`) requires a fragment on any PR touching user-facing files (`bin/`, `get-shit-done/`, `agents/`, `commands/`, `hooks/`, `sdk/src/`); contributors can opt out via the `no-changelog` label for purely internal changes. See [.changeset/README.md](.changeset/README.md) and CONTRIBUTING.md for the workflow.
**Gemini local install no longer duplicates `/gsd:*` commands across user and workspace scopes** — when GSD is already installed at the user scope (`~/.gemini/commands/gsd/`) and you run `npx get-shit-done-cc --gemini --local` in a project, the installer now skips writing `commands/gsd/` to `<project>/.gemini/` and prints a one-line warning explaining why. Previously, both scopes received the same 65 command files, and Gemini's conflict detector renamed every `/gsd:*` command to `/workspace.gsd:*` and `/user.gsd:*`, breaking the documented namespace. Closes #3037.
/gsd-reapply-patches Step 5 verifier now resolves at runtime — moved scripts/verify-reapply-patches.cjs to get-shit-done/bin/ which is shipped by the installer. The legacy scripts/ directory is not copied to user installs. See #2994.
**`gsd-sdk query <subcommand> --help` now reaches the handler instead of returning top-level usage.** The query argv parser harvested `--help` as a global flag and `main()` short-circuited dispatch — there was no path to discover what arguments a query subcommand accepts. The parser now leaves `--help` in `queryArgv` so the handler/fallback can render contextual help. The `gsd-tools.cjs` fallback now renders top-level usage on `--help` (instead of erroring), preserving #1818's anti-hallucination invariant by NOT executing the destructive command. See #3019.
**Alias-family handler maps moved to dedicated catalog module** — keeps command keys/order while reducing createRegistry coupling and improving family-level locality.
**Installer no longer prints `✓ GSD SDK ready` when the shim is unreachable from the user's runtime shells.** The previous check used `process.env.PATH` from the install subprocess, which often differs from the user's later interactive shells (POSIX `~/.local/bin` not in login shell, node-version-manager PATH shims). Added `getUserShellPath()` helper that probes `$SHELL -lc 'printf %s "$PATH"'` and `isGsdSdkOnPath(pathString?)` overload that accepts an explicit PATH; the install-time check now downgrades to the actionable `⚠` diagnostic from PR #3014 when install-PATH and user-shell-PATH disagree. Windows cross-shell support tracked separately. See #3020.
**`docs/issue-driven-orchestration.md` — recipe for driving GSD from a tracker issue** — new guide that maps Symphony-style orchestration concepts (workflow, isolated agent workspace, proof-of-work, human review gate, follow-up capture) onto existing GSD primitives (`/gsd-new-workspace`, `/gsd-manager`, `/gsd-autonomous`, `/gsd-verify-work`, `/gsd-review`, `/gsd-ship`, `STATE.md`, phase artifacts). Documentation only — no new commands, no daemon, no tracker integration.
/gsd-reapply-patches Step 5 verifier now resolves at runtime — moved scripts/verify-reapply-patches.cjs to get-shit-done/bin/ which is shipped by the installer. The legacy scripts/ directory is not copied to user installs. See #2994.
Managed JS hooks now resolve under GUI/minimal-PATH runtimes — installer emits process.execPath (absolute, quoted, forward-slash-normalized) as the runner for every .js hook command instead of bare node. See #2979.
Post-install path smoke test for workflow-invoked scripts — audits every node ${GSD_HOME}/...cjs invocation in workflows resolves at the runtime-installed path. See #2995.
**Actionable diagnostic when `gsd-sdk` is not on PATH after install** — Windows users (and others on multi-shell setups) reported that the previous "GSD SDK files are present but `gsd-sdk` is not on your PATH" warning gave them no way to fix it: no path to look at, no shell-specific commands, no mention of the npx-cache caveat. New `formatSdkPathDiagnostic({ shimDir, platform, runDir })` helper returns a typed IR with the resolved shim location, platform-specific PATH-export commands (PowerShell / cmd.exe / Git Bash on Windows; `export PATH` on POSIX), and an npx-specific note when running under an `_npx` cache segment (where the shim may be written to a temp dir that won't persist). The console renderer in `bin/install.js` emits the lines from the IR; tests assert on the typed fields directly. (#3011)
**Documentation: MCP tool schema as a context-budget concern (#3025).** Adds new sections to `get-shit-done/references/context-budget.md` and `docs/USER-GUIDE.md` explaining that every enabled MCP server injects its tool schema into every turn — heavyweight servers (browser/playwright, Mac-tools, Windows-tools) can cost 20k+ tokens each, often dwarfing what `model_profile` tuning saves. The toggle lives in `.claude/settings.json` (`enabledMcpjsonServers` / `disabledMcpjsonServers`) and is a Claude Code harness concern, not a GSD concern. Includes a pre-phase audit checklist (browser, platform-specific, cross-project, duplicates) and notes the multiplier interaction with `model_profile`. Companion to #3023 (per-phase-type model map) and #3024 (dynamic routing); together they cover the three biggest cost levers.
SDK config-set/config-get and init responses no longer echo plaintext API keys. New sdk/src/query/secrets.ts ports SECRET_CONFIG_KEYS masking from CJS; init bundles only mask string values to preserve the boolean availability-flag contract. See #2997.
/gsd-update queries wrong npm package names — moved package name into a deterministic check-latest-version.cjs script and updated the workflow to use ${GSD_DIR} from get_installed_version. See #2992.
**PR templates now point at the changeset workflow** — the `Fix`, `Enhancement`, and `Feature` PR templates previously asked contributors to tick `CHANGELOG.md updated`, which contradicted the post-#2978 rule that `CHANGELOG.md` must not be edited directly. Each checkbox now references `npm run changeset` (and the `no-changelog` opt-out where applicable).
**`models` block in `.planning/config.json` for per-phase-type model selection (#3023).** A new resolution layer between per-agent `model_overrides` and the `model_profile` tier table. Six named slots (`planning` / `discuss` / `research` / `execution` / `verification` / `completion`) accept tier aliases (`opus` / `sonnet` / `haiku` / `inherit`). Lets you express "Opus for planning, Sonnet for the rest" in two lines without learning the agent taxonomy. Fully backward compatible — configs without `models` behave exactly as today.
gsd-pristine/ is now populated by the installer when local patches are detected — saveLocalPatches calls a new populatePristineDir helper that runs the install transform pipeline into a tmp staging dir and copies modified files into pristineDir. The reapply-patches Step 5 verifier no longer falls back to its over-broad heuristic. See #2998.
SDK config-set/config-get and init responses no longer echo plaintext API keys. New sdk/src/query/secrets.ts ports SECRET_CONFIG_KEYS masking from CJS; init bundles only mask string values to preserve the boolean availability-flag contract. See #2997.
Post-install path smoke test for workflow-invoked scripts — audits every node ${GSD_HOME}/...cjs invocation in workflows resolves at the runtime-installed path. See #2995.
**Query fallback orchestration now shared** — CLI and SDK query dispatch now use one planning seam for native vs CJS fallback decisions with behavior parity preserved.
**`/gsd-research-phase` consolidated into `/gsd-plan-phase --research-phase <N>`** — the standalone research command's slash-command stub was never registered (#3042). Rather than restore the orphan, the research-only capability now lives as a flag on `/gsd-plan-phase`. New modifiers: `--view` prints existing `RESEARCH.md` to stdout without spawning, `--research` forces refresh, otherwise prompts `update / view / skip` when `RESEARCH.md` already exists. Also scrubs four other stale slash-command references (`/gsd-check-todos`, `/gsd-new-workspace`, `/gsd-status`, residual `/gsd-plan-milestone-gaps`) across English + 4 localized doc sets (#3044). Closes #3042 and #3044.
**`/gsd-code-review-fix` and `/gsd-plan-milestone-gaps` no longer surface as "Unknown command"** — both were consolidated by #2790 (`/gsd-code-review --fix` and inline gap planning in `/gsd-audit-milestone` respectively), but several user-facing surfaces still emitted the old slash forms in their offer text. Fixed audit-milestone offer blocks, gsd-complete-milestone routing, code-review/execute-phase offer text, gsd-code-fixer agent role card, and the doc surfaces (USER-GUIDE, FEATURES, INVENTORY, AGENTS, CONFIGURATION). Closes #3029, closes #3034.
gsd-code-fixer worktree no longer fails on the same-branch checkout — the agent now creates a new gsd-reviewfix/ branch via git worktree add -b and fast-forwards the user's branch on cleanup. See #2990.
Extended no-source-grep lint to catch var-binding readFileSync.includes() pattern. Tests now fail when source-grep is hidden behind a parser wrapper. See #2982.
**Dispatch policy seam now returns a structured result contract** across native and fallback query execution paths (`ok`, typed error `kind`, `details`, and final `exit_code`), with CLI consuming the unified result instead of mixed throw/result handling.
**Query static command registrations now split into domain catalog modules** — preserves command order/strings while improving registry locality and maintenance.
**`GSDTools` query execution internals now use deep Module seams** — refactors runtime composition, native/subprocess adapters, and output projection behind stable public interfaces for better locality and testability.
Migrated 8 test files from raw text matching (`stdout.includes(...)`, `assert.match(stderr, ...)`) to typed-IR assertions per CONTRIBUTING.md. Adds shared `ERROR_REASON` enum and `--json-errors` flag in `core.cjs`, typed `GRAPHIFY_REASON` in `graphify.cjs`, pure `buildSdkFailFastReport()` IR builder in `bin/install.js`, and Claude Code JSON envelope output (`hookSpecificOutput` with typed fields) for `gsd-session-state.sh` and `gsd-phase-boundary.sh`. Tests now assert on structured fields (`reason`, `context`, `state_present`, `planning_modified`, etc.) instead of substring matching. See #2974.
**Optional update banner for non-GSD statusline users** — when the installer detects you've declined or kept a non-GSD statusline, it now offers an opt-in `SessionStart` banner that surfaces update availability via the existing `~/.cache/gsd/gsd-update-check.json` cache. Silent when up-to-date, rate-limits failure diagnostics to once per 24h, removed cleanly by `npx get-shit-done-cc --uninstall`.
/gsd-profile-user --refresh writes dev-preferences.md to ~/.claude/skills/gsd-dev-preferences/SKILL.md instead of the legacy commands/gsd/ directory. Installer migrates any preserved legacy file to the new location. See #2973.
/gsd-update queries wrong npm package names — moved package name into a deterministic check-latest-version.cjs script and updated the workflow to use ${GSD_DIR} from get_installed_version. See #2992.
Managed JS hooks now resolve under GUI/minimal-PATH runtimes — installer emits process.execPath (absolute, quoted, forward-slash-normalized) as the runner for every .js hook command instead of bare node. See #2979.
Extended no-source-grep lint to catch var-binding readFileSync.includes() pattern. Tests now fail when source-grep is hidden behind a parser wrapper. See #2982.
if git diff --cached --name-only | grep -Eq "^sdk/src/query/command-manifest\.|^sdk/src/query/command-aliases\.generated\.ts$|^get-shit-done/bin/lib/command-aliases\.generated\.cjs$|^sdk/scripts/gen-command-aliases\.ts$"; then
- [ ] Fix is scoped to the reported bug — no unrelated changes included
- [ ] Fix is scoped to the reported bug — no unrelated changes included
- [ ] Regression test added (or explained why not)
- [ ] Regression test added (or explained why not)
- [ ] All existing tests pass (`npm test`)
- [ ] All existing tests pass (`npm test`)
- [ ]CHANGELOG.md updated if this is a user-facing fix
- [ ]`.changeset/` fragment added if this is a user-facing fix (`npm run changeset -- --type Fixed --pr <NNN> --body "..."`) — or `no-changelog` label applied
echo "**Dry run:** branch was not pushed, so the picks below were discarded with the runner."
if [ -n "$INCLUDED" ]; then
echo ""
echo "Already-applied picks (lost — must be re-applied before resolving \`${SHA}\`):"
echo ""
echo "$INCLUDED"
fi
echo ""
echo "**To resolve:** re-run \`create\` with \`auto_cherry_pick=true\` (real, not dry-run) to materialize the partial branch on origin, then resolve \`${SHA}\` manually. Re-running with \`auto_cherry_pick=false\` would recreate the branch from \`${BASE_TAG}\` and lose every pick listed above."
else
echo "Branch \`${BRANCH}\` was pushed with picks applied up to (but not including) the conflicting commit."
echo ""
echo "**To resolve:** \`git fetch origin && git checkout ${BRANCH} && git cherry-pick -x ${SHA}\`, fix the conflict, push, then re-run \`finalize\` with \`auto_cherry_pick=false\`."
fi
} >> "$GITHUB_STEP_SUMMARY"
echo "::error::Cherry-pick of $SHA failed. See summary."
EXISTING=$(npm view get-shit-done-cc@"$VERSION" version 2>/dev/null || true)
if [ -n "$EXISTING" ]; then
echo "::warning::get-shit-done-cc@${VERSION} is already on the registry — entering reconciliation mode (skip publish, continue with tag/release/PR/dist-tag)."
echo "skip_publish=true" >> "$GITHUB_OUTPUT"
else
echo "skip_publish=false" >> "$GITHUB_OUTPUT"
fi
- name:Install and test
- name:Install and test
run:|
run:|
npm ci
npm ci
npm run test:coverage
npm run test:coverage
- name:Create PR to merge hotfix back to main
- name:Build SDK dist for tarball
if:${{ !inputs.dry_run }}
run:npm run build:sdk
- name:Verify CC tarball ships sdk/dist/cli.js (bug#2647 guard)
run:bash scripts/verify-tarball-sdk-dist.sh
- name:Pack SDK as tarball and bundle into CC source tree
echo "Base: \`$BASE_TAG\` → Branch: \`$BRANCH\`$([ "$DRY_RUN" = "true" ] && echo " (DRY RUN — local only)")"
echo ""
if [ -n "$INCLUDED" ]; then
echo "### Included (fix/chore)"
echo ""
echo "$INCLUDED"
else
echo "_No fix/chore commits to include._"
fi
if [ -n "$NON_SHIPPED_SKIPPED" ]; then
echo "### Skipped — touches no shipped paths (informational)"
echo ""
echo "These fix/chore commits don't touch any path in the npm tarball's \`files\` whitelist (or \`package.json\`), so they cannot change the published package's behavior. CI / test / docs / planning-only changes belong on \`main\`, not in a hotfix. No action needed."
git commit -m "chore: bump version to $VERSION for hotfix"
fi
if [ "$DRY_RUN" != "true" ]; then
git push origin "$BRANCH"
else
echo "DRY RUN — cherry-picks applied locally; branch not pushed. Downstream install-smoke will run against \`$BASE_TAG\` (the cherry-pick verification above is the dry-run signal)."
fi
- name:Determine effective ref
id:out
env:
ACTION:${{ inputs.action }}
INPUT_REF:${{ inputs.ref }}
DRY_RUN:${{ inputs.dry_run }}
BASE_TAG:${{ steps.hotfix.outputs.base_tag }}
BRANCH:${{ steps.hotfix.outputs.branch }}
run:|
if [ "$ACTION" = "hotfix" ]; then
if [ "$DRY_RUN" = "true" ]; then
echo "ref=$BASE_TAG" >> "$GITHUB_OUTPUT"
else
echo "ref=$BRANCH" >> "$GITHUB_OUTPUT"
fi
else
echo "ref=$INPUT_REF" >> "$GITHUB_OUTPUT"
fi
# Cross-platform install validation gate (parity with release.yml).
install-smoke:
needs:prepare
permissions:
contents:read
uses:./.github/workflows/install-smoke.yml
with:
ref:${{ needs.prepare.outputs.ref }}
release:
needs:[prepare, install-smoke]
runs-on:ubuntu-latest
timeout-minutes:15
permissions:
contents:write # tag + push + GitHub Release
id-token:write # provenance
# The merge-back PR step (and the pull-request scope it required)
# was removed in #2983 — auto-cherry-pick hotfix flow only picks
# commits already on main, so there's nothing to merge back.
while git tag -l "v${BASE}-dev.${N}" | grep -q .; do
N=$((N + 1))
done
VERSION="${BASE}-dev.${N}"
;;
next)
N=1
while git tag -l "v${BASE}-rc.${N}" | grep -q .; do
N=$((N + 1))
done
VERSION="${BASE}-rc.${N}"
;;
latest)
VERSION="$BASE"
;;
*)
echo "::error::Unknown tag '$INPUT_TAG' (expected dev|next|latest)"
exit 1
;;
esac
fi
echo "version=$VERSION" >> "$GITHUB_OUTPUT"
echo "tag=$INPUT_TAG" >> "$GITHUB_OUTPUT"
echo "→ Will publish v${VERSION} to dist-tag '${INPUT_TAG}'"
# Reconciliation mode: if version is already on npm (a prior run
# published successfully but a downstream step failed), don't hard-fail.
# Set a flag and skip the publish step below; tag/release/PR/dist-tag
# steps still execute so the rerun can finish reconciling state.
- name:Detect prior publish (reconciliation mode)
id:prior_publish
env:
VERSION:${{ steps.ver.outputs.version }}
run:|
EXISTING=$(npm view get-shit-done-cc@"$VERSION" version 2>/dev/null || true)
if [ -n "$EXISTING" ]; then
echo "::warning::get-shit-done-cc@${VERSION} is already on the registry — entering reconciliation mode (skip publish, continue with tag/release/PR/dist-tag)."
echo "skip_publish=true" >> "$GITHUB_OUTPUT"
else
echo "skip_publish=false" >> "$GITHUB_OUTPUT"
fi
# Tolerant tag-existence check (matches release.yml pattern). An
# operator re-running after a mid-flight publish-step failure should
# not be blocked just because the tag step succeeded last time. Only
# error if the existing tag points at a different commit than HEAD.
- name:Check git tag (skip if matches HEAD, error if mismatched)
env:
VERSION:${{ steps.ver.outputs.version }}
run:|
if git rev-parse -q --verify "refs/tags/v${VERSION}" >/dev/null; then
'This PR does not reference an issue. **All PRs must link to an open issue** using a closing keyword in the PR body:',
'This PR does not reference an issue. **All PRs must link to an open issue** using a closing keyword in the PR body:',
'',
'',
@@ -46,7 +47,13 @@ jobs:
'',
'',
`If no issue exists for this change, [open one first](${repoUrl}/issues/new/choose), then update this PR body with the reference.`,
`If no issue exists for this change, [open one first](${repoUrl}/issues/new/choose), then update this PR body with the reference.`,
'',
'',
'This PR will remain blocked until a valid `Closes #NNN`, `Fixes #NNN`, or `Resolves #NNN` line is present in the description.',
'To resume work after fixing the body: edit the PR description to add a valid `Closes #NNN`, `Fixes #NNN`, or `Resolves #NNN` line, then click **Reopen pull request**. The workflow will re-evaluate on reopen.',
].join('\n')
].join('\n')
});
});
core.setFailed('PR body must contain a closing issue reference (e.g. "Closes #123")');
await github.rest.pulls.update({
owner: context.repo.owner,
repo: context.repo.repo,
pull_number: prNumber,
state: 'closed',
});
core.setFailed('PR body must contain a closing issue reference (e.g. "Closes #123") — PR closed.');
`{base}-canary.{N}` builds of `get-shit-done-cc` and `@gsd-build/sdk` under the
`canary` dist-tag on demand via `workflow_dispatch` (manual trigger only — auto-publish
on every push to main was rejected because submission rate is too high). Includes an
optional `dry_run` boolean and the same publish-verification gate as `release.yml`. (#2828)
### Fixed
### Enhancement
- **Test suite for `config-schema.cjs` is now mutation-resistant** — Stryker measured a 4.62% mutation score on `get-shit-done/bin/lib/config-schema.cjs` (6 killed, 124 survived out of 130). Surviving mutants flagged that existing tests were exercising paths but not verifying outputs: a polarity flip (`return true` → `return false`), a predicate swap (`.some` → `.every`), or a guard removal (`if (VALID_CONFIG_KEYS.has(...)) return true;` → unguarded fallthrough) all passed every test. New `tests/bug-2986-config-schema-mutation-killers.test.cjs` adds 95 tests across four suites that target each surviving mutant class: (1) parameterized `isValidConfigKey('${key}') === true` for every member of `VALID_CONFIG_KEYS` (kills the static-key-fast-path mutation), (2) representative dynamic-pattern keys that match exactly one pattern (kills the `.some` → `.every` mutation, with an inline mutual-exclusivity invariant check), (3) `strictEqual` against the literal boolean `true`/`false` instead of `assert.ok` truthy checks (kills polarity-flip mutations), (4) anchor-tightening cases that differ from valid keys by one character beyond the documented shape (kills regex-loosening mutations on `^`, `$`, and character-class boundaries). Tests use the lib's public surface (typed boolean assertions on `isValidConfigKey` return values), no source-grep. (#2986)
- **Hotfix release flow now auto-incorporates fixes from `main` and bundles the SDK** — `hotfix.yml create` auto-cherry-picks every `fix:`/`chore:` commit on `origin/main` not yet shipped (oldest-first; patch-equivalents skipped via `git cherry`; `feat:`/`refactor:` excluded; conflicts halt with the offending SHA; run summary lists every included SHA). `hotfix.yml finalize` adds the `install-smoke` cross-platform gate, bundles `sdk-bundle/gsd-sdk.tgz` inside the CC tarball (parity with `release-sdk.yml`), tightens the `next` dist-tag re-point, and marks the GitHub Release `--latest`. `release-sdk.yml` gains `action: publish | hotfix` plus an `auto_cherry_pick` toggle, with a new `prepare` job that branches `hotfix/X.YY.Z` from the highest existing `vX.YY.*` tag and runs the same cherry-pick logic — idempotent if the branch was pre-prepared via `hotfix.yml`. Hotfix `vX.YY.Z` is now defined as everything in `vX.YY.{Z-1}` plus every `fix:`/`chore:` since that base, so each tag is the cumulative-fix anchor for the next. (#2955)
- **Planning workspace seam extracted from `core.cjs` into `planning-workspace.cjs`** — path/workstream/lock behavior now lives in a dedicated module (`planningDir`, `planningPaths`, `planningRoot`, active-workstream routing, `withPlanningLock`). `core.cjs` keeps compatibility re-exports while call-sites migrate to direct imports, improving locality and reducing coupling. (#2900)
- **Skill surface consolidated 86 → 59 `commands/gsd/*.md` entries** — four new
grouped skills (`capture`, `phase`, `config`, `workspace`) replace clusters of
micro-skills. Six existing parents absorb wrap-up and sub-operations as flags:
on-demand `Read()` calls gated behind mode routing. Tokens loaded at skill entry drop
from ~13k to near zero; only the branch actually invoked is loaded. (#2606)
### Fix
- **`gsd-pristine/` is now populated by the installer when local patches are detected** — `saveLocalPatches` declared a `pristineDir` variable and JSDoc'd "saves pristine copies (from manifest) to gsd-pristine/ to enable three-way merge during reapply-patches", but no code ever wrote to that directory. Effect: the `/gsd-reapply-patches` Step 5 verifier (#2972) silently degraded to its over-broad fallback heuristic ("every significant backup line"), exactly the silent-success-on-lost-content failure mode #2969 was designed to prevent. Fix: new `populatePristineDir({ packageSrc, pristineDir, modified, runtime, pathPrefix, isGlobal })` helper runs the install transform pipeline (`copyWithPathReplacement`) into a tmp staging dir, then copies out only the modified-file paths into `gsd-pristine/`. `saveLocalPatches` now accepts a `pristineCtx` and calls the helper when local patches are detected; the install entry point passes the package source root, runtime, pathPrefix, and isGlobal so transforms produce byte-identical output to what `copyWithPathReplacement` would have written under normal install. Soft-fails on transform errors (logs a warning, continues with empty pristine — no worse than pre-fix behavior). Pristine reflects the about-to-install version's content, which is what the verifier needs as the "what would survive without the user's modifications" baseline. Regression covered by `tests/bug-2998-pristine-dir-populated.test.cjs` (6 tests across two suites): asserts the helper is exported, returns 0 for empty modified list, writes one pristine file per source-existing path, skips ghost paths without corrupting pristine, and produces deterministic output (two runs with same inputs yield byte-identical pristine — the property `pristine_hashes` in `backup-meta.json` depends on). (#2998)
- **`release-sdk` hotfix re-run no longer fails at `Dry-run publish validation` when the version is already on npm** — the `Detect prior publish (reconciliation mode)` step sets `skip_publish=true` when the package version is already on the registry, and the actual publish step honors that gate. The `Dry-run publish validation` step was missing the same guard, so any operator re-run of an already-published hotfix (the typical recovery path when later steps fail mid-flight) hit `npm publish --dry-run` first and got `npm error You cannot publish over the previously published versions: X.Y.Z` — `npm publish --dry-run` contacts the registry and rejects existing-version targets even though it doesn't actually publish. The dry-run validation step is now gated on the same `steps.prior_publish.outputs.skip_publish != 'true'` condition as the publish step. The rehearsal still runs on first publishes (where it has value); it skips only in the specific reconciliation case where the publish itself would be skipped. Trigger run: [25233855236](https://github.com/gsd-build/get-shit-done/actions/runs/25233855236/job/73995605643). Regression covered by `tests/bug-2987-dry-run-validation-skip-on-reconciliation.test.cjs`. (#2987)
- **`release-sdk` hotfix flow hardened against silent classifier failures, missing-classifier-at-base-tag, and a vestigial merge-back PR step** — three issues surfaced by CodeRabbit's post-merge review of #2981 plus a production failure on the v1.39.1 release run. **(1)** `scripts/diff-touches-shipped-paths.cjs` reused exit code `1` for both the legitimate "no shipped paths" classifier result and Node's default uncaught-throw exit, so any tooling failure was indistinguishable from a normal skip. The script now uses `0` (shipped), `1` (not shipped), `2` (classifier error) with `try`/`catch` + `uncaughtException`/`unhandledRejection` handlers routing all failure paths to exit `2`. **(2)** The workflow's `git checkout -b "$BRANCH" "$BASE_TAG"` overwrote the working tree with the base tag's contents *before* the cherry-pick loop ran the classifier — but base tags predating the classifier's introduction (notably v1.39.0) don't have the file in their tree, so `node scripts/diff-touches-shipped-paths.cjs` would exit non-zero and silently drop every commit, producing an empty hotfix release. The classifier is now staged into `$RUNNER_TEMP` at the top of `Prepare hotfix branch` (before any working-tree-mutating git command), and the loop references that staged copy. The cherry-pick loop snapshots `$PIPESTATUS` into a local array (`PIPE_RC=("${PIPESTATUS[@]}")`) immediately after the classifier pipeline — under bracketed `set +e`/`set -e` — and dispatches via explicit `case`: `0` proceeds, `1` skips into `NON_SHIPPED_SKIPPED`, anything else emits `::error::shipped-paths classifier failed for $SHA (exit N)` and fails the workflow. CodeRabbit on PR #2984 caught a subtler bug in the first iteration: `pipeline \|\| true; RC=${PIPESTATUS[1]}` is broken because `\|\| true` runs `true` as its own one-command pipeline on the failure paths, overwriting `PIPESTATUS` to `(0)` and leaving `${PIPESTATUS[1]}` unset. The array-snapshot form is invariant against this. The same hardening also surfaces `git diff-tree`'s exit code (via `PIPE_RC[0]`); a non-zero diff-tree result now also fails the workflow rather than feeding partial input to the classifier. **(3)** Removed the `Open merge-back PR (hotfix only)` step. The auto-cherry-pick hotfix flow only picks commits already on main (`git cherry HEAD origin/main` outputs the unmerged ones), so by construction every code commit on the hotfix branch is already on main. The only hotfix-branch-only commit is the version-bump chore, which would either no-op against main or rewind main's in-progress version. The step also failed in production with `GitHub Actions is not permitted to create or approve pull requests (createPullRequest)` (org policy) on run [25232968975](https://github.com/gsd-build/get-shit-done/actions/runs/25232968975). The `pull-requests: write` permission previously granted to the release job has been dropped in line with least-privilege. The run-summary line that previously echoed `Merge-back PR opened against main` has been replaced with `No merge-back PR (auto-picked commits are already on main)` so operators reading the summary see an accurate non-action statement (CodeRabbit on PR #2984). Regression covered by `tests/bug-2983-classifier-exit-codes-and-base-tag-staging.test.cjs` (15 assertions across exit-code semantics, classifier staging, error dispatch, PIPESTATUS-snapshot hardening, diff-tree fail-fast, merge-back removal, and run-summary accuracy). (#2983)
- **`release-sdk` hotfix only cherry-picks commits that change what actually ships** — the `fix:`/`chore:` filter in `Prepare hotfix branch` was too broad: it picked any commit with that conventional-commit type regardless of whether the diff could affect the published npm package. CI-only fixes (release-sdk.yml itself, hotfix tooling, test-only commits) were getting cherry-picked into hotfix branches even though they cannot change the tarball — and the subset touching `.github/workflows/*` then caused the prepare job's `git push` to be rejected by GitHub because the default `GITHUB_TOKEN` lacks the `workflow` scope, aborting the run. v1.39.1 hit this on PR #2977 (run [25232010071](https://github.com/gsd-build/get-shit-done/actions/runs/25232010071)). The loop now pre-skips any candidate commit whose `git diff-tree` output doesn't intersect the npm tarball's shipped paths (entries in `package.json``files`, plus `package.json` itself, which `npm pack` always includes). Skipped commits land in a new `NON_SHIPPED_SKIPPED` summary bucket framed as informational — non-shipping commits cannot affect the package, so the skip needs no operator action. The shipped-paths classifier lives in `scripts/diff-touches-shipped-paths.cjs` so its rules (file-OR-directory prefix matching `npm pack` semantics, the always-shipped rule for `package.json`, the lockfile-not-shipped rule) are unit-testable. Regression covered by `tests/bug-2980-hotfix-only-picks-shipping-changes.test.cjs`. (#2980)
- **`release-sdk` hotfix workflow fails on real run with `npm error Version not changed`** — the `release` job's `Bump in-tree version (not committed)` step ran `npm version "$VERSION"` without `--allow-same-version`, so it errored on real (non-dry-run) hotfix runs because `prepare` had already committed the bump on the hotfix branch. The release job's checkout `ref` is asymmetric — `BRANCH` (already bumped) on real runs vs `BASE_TAG` (older version) on dry-runs — which is why dry-run never caught the bug. Both `npm version` calls in that step now pass `--allow-same-version`, matching the existing pattern in `release.yml:326`. (#2976)
- **Stale deleted command references updated across workflow files** — `help.md`, `do.md`, `settings.md`, `discuss-phase.md`, `new-project.md`, `plan-phase.md`, `spike.md`, and `sketch.md` referenced command names removed in #2790; updated to new consolidated equivalents. (#2950)
- **`spike --wrap-up` now dispatches correctly** — `/gsd-spike --wrap-up` was silently no-oping because the flag dispatch wiring was omitted when the micro-skill entry point was absorbed in #2790. (#2948)
- **`config-get context_window` returns `200000` when key absent** — querying an unset `context_window` previously exited 1 with "Key not found", surfacing a confusing error in planning logs even though the workflow fallback worked correctly. `cmdConfigGet` now consults a `SCHEMA_DEFAULTS` map and returns the documented default (`200000`, exit 0) for absent schema-defaulted keys; unknown absent keys still error as before. (#2943)
- **`gap-analysis` now parses non-`REQ-` requirement IDs and ignores traceability table headers** — `parseRequirements()` no longer hard-codes the `REQ-` prefix and now accepts uppercase prefixed IDs such as `TST-01`, `BACK-07`, and `INSP-04`; markdown table header rows (for example `| REQ-ID | ... |`) are excluded so header tokens are not reported as phantom uncovered requirements. Added regression coverage for mixed-prefix REQUIREMENTS files with traceability tables. (#2897)
- **Gemini slash commands namespaced as `/gsd:<cmd>` instead of `/gsd-<cmd>`** —
Gemini CLI namespaces commands under `gsd:`, so `/gsd-plan-phase` was unexecutable.
Body-text references in commands, agents, banners, and patch-reapply hints are now
converted via a roster-checked regex (boundary lookbehind + extension-aware
lookahead + roster lookup, defense-in-depth). The roster fail-loud guard prevents
silent no-op'ing if `commands/gsd/` is ever missing. (#2768, #2783)
sites now wrap descriptions in `yamlQuote(...)` (= `JSON.stringify`, a valid YAML
1.2 double-quoted scalar). (#2876)
- **`gsd-tools` invocations use the absolute installed path** — bare `gsd-tools …`
calls inside skill bodies relied on PATH resolution that is not guaranteed in every
runtime; replaced with the absolute path emitted at install time. (#2851)
- **Codex installer preserves trailing newline when stripping legacy hooks** — the
legacy-hook strip in the Codex installer ran against files with no terminating
newline at EOF and emitted a config that lost the newline, breaking downstream
parsers. (#2866)
- **GSD slash command namespace drift cleaned up across docs, workflows, and autocomplete** — remaining active `/gsd:<cmd>` references now use canonical `/gsd-<cmd>`, escaped workflow `Skill(skill=\"gsd:...\")` prompts now use hyphenated skill names, `scripts/fix-slash-commands.cjs` rewrites retired colon syntax to hyphen syntax, and the extract-learnings command file now uses `extract-learnings.md` so generated Claude/Qwen skill autocomplete exposes `gsd-extract-learnings` instead of `gsd-extract_learnings`. (#2855)
- **`extractCurrentMilestone` no longer truncates ROADMAP.md at heading-like lines inside fenced code blocks** — the milestone-end search now scans line-by-line while tracking ` ``` ` / `~~~` fence state, so a line like `# Ops runbook (v1.0 compat)` inside a code block no longer acts as a milestone boundary. Previously, any phase defined after such a block was invisible to `roadmap analyze`, `roadmap get-phase`, `/gsd-autonomous`, and all phase-number commands. (#2787)
- **Codex install no longer corrupts existing `~/.codex/config.toml`** — the installer
- **Codex install no longer corrupts existing `~/.codex/config.toml`** — the installer
now defensively strips legacy `[agents]` (single-bracket) and `[[agents]]` (sequence)
now defensively strips legacy `[agents]` (single-bracket) and `[[agents]]` (sequence)
blocks regardless of GSD marker presence (both invalid in current Codex schema), emits
blocks regardless of GSD marker presence (both invalid in current Codex schema), emits
@@ -26,6 +145,181 @@ Format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
values, and unsupported value types. Both pre-write helper failures and write-time
values, and unsupported value types. Both pre-write helper failures and write-time
failures restore the pre-install snapshot and abort with a clear error rather than
failures restore the pre-install snapshot and abort with a clear error rather than
warn-and-continue. (#2760)
warn-and-continue. (#2760)
- **Codex hooks migrator correctness hardening** — four edge-cases in the
`[[hooks.<Event>]]` → `[[hooks.<Event>.hooks]]` migration path fixed: (1) the TOML
key parser in hook-body classification now uses `parseTomlKey()` instead of a bare
regex, so hyphenated keys (e.g. `status-message`) and quoted keys are no longer
silently dropped; (2) `buildNestedBlock` no longer synthesises an empty
`[[hooks.TYPE.hooks]]` sub-table for matcher-only sections that carry no handler
fields — previously produced a broken entry with `type = "command"` but no
`command`; (3) the `legacyMapSections` filter now uses the parsed segment count
instead of dot-splitting the path string, preventing three-segment tables such as
`[hooks.SessionStart.hooks]` from being misclassified as event entries (same class
of bug fixed for `staleNamespacedAotSections` in the previous round); (4) regression
test added: `[[hooks."before.tool"]]` (a quoted key containing a dot) is correctly
treated as a two-segment namespace and not split on the inner dot. (#2809)
- **Codex `[[agents]]` reverted to `[agents.<name>]` struct format** — the sequence
format introduced in #2645 is rejected by codex-cli 0.124.0 with "invalid type:
sequence, expected struct AgentsToml". Reverted to struct format which is correct for
0.120.0+. The self-healing stripper handles both formats for configs written by prior
rescue blocks used `git ls-files --exclude-standard` which honoured `.gitignore`,
silently no-op'ing when `.planning/` was excluded; the worktree was then deleted with
the SUMMARY. Replaced with filesystem-level `find` + idempotent `cp` that bypasses git
entirely. (#2838)
- **`/gsd-code-review-fix` cleanup tail is transactional** — JSON recovery sentinel at
`${phase_dir}/.review-fix-recovery-pending.json` is written after `git worktree add`
succeeds and removed only after `git worktree remove` returns. A new run that finds a
pre-existing sentinel force-removes the orphan worktree before starting fresh, making
the agent self-healing across crashes. (#2839)
## [1.39.1] - 2026-05-01
Hotfix release. Cherry-picks user-facing fixes from `main` onto the v1.39.0 stable
line. Install: `npm install -g get-shit-done-cc@latest` (or `@1.39.1` to pin).
### Fixed
- **`gsd-sdk query agent-skills` emits raw `<agent_skills>` block instead of JSON-wrapped string** — workflows that embed via `$(gsd-sdk query agent-skills <agent>)` were receiving a JSON-quoted string literal mid-prompt (e.g. `"<agent_skills>\n…"`), silently breaking all `<agent_skills>` injection into spawned subagents. The CLI dispatcher now honors an opt-in `format: 'text'` field on `QueryResult` and writes such results raw via `process.stdout.write`; `--pick` always returns JSON regardless. (#2917)
- **`sketch --wrap-up` now dispatches correctly** — `/gsd-sketch --wrap-up` was silently no-oping because the flag dispatch wiring was omitted when the micro-skill entry point was absorbed in #2790. (#2949)
- **`help.md` no longer advertises eight slash commands removed by the #2824 consolidation** — `/gsd-do`, `/gsd-note`, `/gsd-check-todos`, `/gsd-plant-seed`, `/gsd-research-phase`, `/gsd-list-phase-assumptions`, `/gsd-plan-milestone-gaps`, and `/gsd-join-discord` were removed when 86 skills were folded into 59. `help.md` was not updated alongside, so users typing the documented commands hit *Unknown command*. Each entry is now either rewritten to the surviving flag-based dispatcher (e.g., `/gsd-do …` → `/gsd-progress --do "…"`, `/gsd-note` → `/gsd-capture --note`, `/gsd-plant-seed` → `/gsd-capture --seed`, `/gsd-check-todos` → `/gsd-capture --list`) or removed for skills with no replacement. A regression test now asserts every `/gsd-*` reference in `help.md` has a matching `commands/gsd/*.md` stub. (#2954)
- **`--sdk` install on Windows now writes a callable `gsd-sdk` shim** — `npx get-shit-done-cc@latest --claude --global --sdk` on Windows previously left `gsd-sdk` off PATH because `trySelfLinkGsdSdk` returned `null` unconditionally on `win32` (a missed gap from #2775's POSIX self-link, not an intentional deferral). The function now dispatches to a Windows counterpart that writes the standard npm shim triple (`gsd-sdk.cmd`, `gsd-sdk.ps1`, and a Bash wrapper) to npm's global bin, so `gsd-sdk` resolves in a fresh shell across cmd.exe, PowerShell, and Cygwin/MSYS/Git-Bash. A new regression guard in `tests/no-unconditional-win32-skip.test.cjs` blocks any future `if (process.platform === 'win32') return null;` skip-only branches in `bin/install.js`. (#2962)
- **`/gsd-reapply-patches` Step 5 gate is now deterministic — no more silent content drops** — the prior gate parsed a Claude-generated *Hunk Verification Table* whose `verified: yes` rows were filled in without actually checking content presence, leading to merged files that lost user-added blocks (e.g., a `<visual_companion>` section, an `--execute-only` flag block) while the workflow reported success. The gate now invokes a Node script (`scripts/verify-reapply-patches.cjs`) that diffs each backup against the pristine baseline, computes the user-added significant lines, and asserts each one is present in the merged file. Exits non-zero with a per-file diagnostic on any miss; the workflow halts and surfaces the JSON output to the user. The verifier ignores low-signal lines (too short, pure whitespace, decorative comments) so trivial differences don't trigger false failures. Out of scope here: the manifest-baseline tightening described in #2969 Failure 1 — that's separate work. (#2969)
Canonical command metadata Interface powering alias, catalog, and semantics generation.
### Query Runtime Context Module
Module owning query-time context resolution for `projectDir` and `ws`, including precedence and validation policy used by query adapters.
### Native Dispatch Adapter Module
Adapter Module that satisfies native query dispatch at the Dispatch Policy seam, so policy modules consume a focused dispatch Interface instead of closure-wired call sites.
### Query CLI Output Module
Module owning projection from dispatch results/errors to CLI `{ exitCode, stdoutChunks, stderrLines }` output contract.
### Query Execution Policy Module
Module owning query transport routing policy projection (`preferNative`, fallback policy, workstream subprocess forcing) at execution seam.
Canonical command normalization and resolution Interface (`query-command-resolution-strategy`) used by internal query/transport paths after dead-wrapper convergence.
### Command Topology Module
Module owning command resolution, policy projection (`mutation`, `output_mode`), unknown-command diagnosis, and handler Adapter binding at one seam for query dispatch.
### Query Pre-Project Config Policy Module
Module policy that defines query-time behavior when `.planning/config.json` is absent: use built-in defaults for parity-sensitive query Interfaces, and emit parity-aligned empty model ids for pre-project model resolution surfaces.
The following files are maintainer-owned coding standards and must be treated as canonical when contributing:
-`CONTEXT.md` — domain language and module naming standards
-`docs/adr/` — Architecture Decision Records (ADRs) for accepted architectural decisions
Contributor requirements:
- Read `CONTEXT.md` before naming or refactoring modules/interfaces/seams.
- Use `CONTEXT.md` vocabulary consistently in code comments, tests, issue/PR text, and docs for the touched area.
- Check relevant ADRs in `docs/adr/` before proposing or implementing architectural changes.
- If a change intentionally revisits an ADR decision, call it out explicitly in the linked issue and PR rationale.
- Do not rewrite maintainer intent in `CONTEXT.md`/ADRs as part of drive-by cleanup; propose focused updates tied to approved scope.
**Every PR must link to an approved issue.** PRs without a linked issue are closed without review, no exceptions.
**Every PR must link to an approved issue.** PRs without a linked issue are closed without review, no exceptions.
- **No draft PRs** — draft PRs are automatically closed. Only open a PR when it is complete, tested, and ready for review. If your work is not finished, keep it on your local branch until it is.
- **No draft PRs** — draft PRs are automatically closed. Only open a PR when it is complete, tested, and ready for review. If your work is not finished, keep it on your local branch until it is.
@@ -91,6 +105,23 @@ PRs that arrive without a properly-labeled linked issue are closed automatically
- **CI must pass** — all matrix jobs (Ubuntu × Node 22, 24; macOS × Node 24) must be green
- **CI must pass** — all matrix jobs (Ubuntu × Node 22, 24; macOS × Node 24) must be green
- **Scope matches the approved issue** — if your PR does more than what the issue describes, the extra changes will be asked to be removed or moved to a new issue
- **Scope matches the approved issue** — if your PR does more than what the issue describes, the extra changes will be asked to be removed or moved to a new issue
## CHANGELOG Entries — Drop a Fragment
**Do not edit `CHANGELOG.md` directly.** Two PRs that both append to a `### Fixed` block always conflict on merge — git can't pick a serialization order without a human. Instead, every PR with user-facing changes drops a fragment file in `.changeset/`.
```bash
npm run changeset -- --type Fixed --pr <YOUR_PR_NUMBER> \
--body "**\`/gsd-foo\` no longer drops trailing slashes** — explain the user-visible change."
```
This writes `.changeset/<adjective>-<noun>-<noun>.md`. Three random words → concurrent PRs never collide. Allowed `type:` values follow [Keep a Changelog](https://keepachangelog.com/): `Added`, `Changed`, `Deprecated`, `Removed`, `Fixed`, `Security`.
Fragments are consolidated into `CHANGELOG.md` at release time by the release workflow. See [`.changeset/README.md`](.changeset/README.md) for the format spec and [#2975](https://github.com/gsd-build/get-shit-done/issues/2975) for the rationale.
**CI enforcement:** the `Changeset Required` workflow (`scripts/changeset/lint.cjs`) fails any PR that touches `bin/`, `get-shit-done/`, `agents/`, `commands/`, `hooks/`, or `sdk/src/` without a `.changeset/*.md` fragment.
**Opt-out:** PRs with no user-facing impact (test refactors, lint config changes, CI tweaks, formatting-only changes) can add the `no-changelog` label. The lint honors it. When unsure whether a change is user-facing, **add the fragment**.
## Testing Standards
## Testing Standards
All tests use Node.js built-in test runner (`node:test`) and assertion library (`node:assert`). **Do not use Jest, Mocha, Chai, or any external test framework.**
All tests use Node.js built-in test runner (`node:test`) and assertion library (`node:assert`). **Do not use Jest, Mocha, Chai, or any external test framework.**
@@ -281,6 +312,7 @@ Some tests legitimately read source files. There are six recognized categories:
| `docs-parity` | A reference doc must stay in sync with source-defined constants (e.g., `CONFIG_DEFAULTS`). The source is the canonical list; there is no runtime API to enumerate it. |
| `docs-parity` | A reference doc must stay in sync with source-defined constants (e.g., `CONFIG_DEFAULTS`). The source is the canonical list; there is no runtime API to enumerate it. |
| `integration-test-input` | A source file is used as a real fixture input to a transformation function under test — the file is not inspected for strings but passed as data. |
| `integration-test-input` | A source file is used as a real fixture input to a transformation function under test — the file is not inspected for strings but passed as data. |
| `structural-implementation-guard` | A feature's interception or wiring point is not reachable end-to-end via `runGsdTools`. Used temporarily until a behavioral path exists. |
| `structural-implementation-guard` | A feature's interception or wiring point is not reachable end-to-end via `runGsdTools`. Used temporarily until a behavioral path exists. |
| `pending-migration-to-typed-ir` | **Tracked for correction, not exempted.** Test was identified by the lint as carrying a raw-text-matching pattern that contradicts the rule above. Each annotated file MUST cite the open migration issue (e.g. `// allow-test-rule: pending-migration-to-typed-ir [#NNNN]`) so the tracking is auditable. New tests cannot use this category — they must refactor production to expose typed IR. The annotation is removed when the test is corrected. |
Annotate with a standalone `//` comment before the file's opening block comment:
Annotate with a standalone `//` comment before the file's opening block comment:
@@ -296,6 +328,68 @@ Annotate with a standalone `//` comment before the file's opening block comment:
The annotation **must** be a standalone `// allow-test-rule:` line, not inside a `/** */` block comment — the CI linter scans for the pattern `// allow-test-rule:`.
The annotation **must** be a standalone `// allow-test-rule:` line, not inside a `/** */` block comment — the CI linter scans for the pattern `// allow-test-rule:`.
### Prohibited: Raw Text Matching on Test Outputs (file content, stdout, stderr)
**Source-grep is not just `readFileSync` of a `.cjs` file.** The same anti-pattern shows up wherever a test pattern-matches against text that a system-under-test produced, regardless of whether that text came from a source file, a rendered shim, a child process's stdout, or a free-form `reason` string. **All forms are forbidden.**
The following are all violations of the same rule:
```javascript
// BAD — substring match on text written by the code under test
// BAD — assert.match on a free-form `reason` string from a JSON report
assert.ok(/not a regular file/.test(report.results[0].reason));
```
Each of these passes on accidental near-matches (a comment containing `@node` somewhere, a stack trace that happens to say `Failures: 1`, a mis-typed reason that still contains the substring you're matching) and fails on harmless reformatting (changing `Failures: 1` to `1 failure`, swapping CRLF rendering style, rewording the error prose).
#### The rule
> **Tests assert on typed structured values. If the code under test produces text, the code under test must also expose a structured intermediate representation, and the test must assert on that IR — never on the rendered text.**
Concretely: for any system-under-test that produces text output (a file renderer, a CLI formatter, an error-message builder), the production code MUST expose a typed alternative that the test consumes:
| Output kind | Required structured surface | What the test asserts on |
|---|---|---|
| Rendered file (shim, template, generated code) | A pure builder function returning the IR (`{ invocation, eol, fileNames, render }`) | `triple.invocation.target === expected`, `triple.eol.cmd === '\r\n'` |
| CLI human-formatter output | A `--json` mode that emits the same data structurally | `report.results[0].reason === REASON.FAIL_INSTALLED_NOT_REGULAR_FILE` |
| Error / status / reason | A frozen enum (`Object.freeze({ FAIL_X: 'fail_x', ... })`) | `assert.equal(result.reason, REASON.FAIL_X)` |
| File presence after a write | `fs.statSync().isFile()`, `.size > 0`, `.mtimeMs` advances | Filesystem facts; never read the file content back |
#### Concrete examples from this repo
`buildWindowsShimTriple(shimSrc)` in `bin/install.js` is the canonical IR pattern: pure function, no I/O, returns `{ invocation, eol, fileNames, render }`. `trySelfLinkGsdSdkWindows` calls it and writes `triple.render[kind]()` to disk. Tests assert on `triple.invocation.target`, `triple.eol.cmd`, `Object.keys(triple).sort()` — never on the rendered text. Filesystem-level tests assert `fs.statSync(target).size === Buffer.byteLength(triple.render.cmd())` to prove the writer writes what the renderer produces, **without comparing content**.
`scripts/verify-reapply-patches.cjs` exposes a frozen `REASON` enum and emits it through `--json`. Tests assert `report.results[0].reason === REASON.FAIL_USER_LINES_MISSING`. The human formatter exists for operator console output only — tests must not depend on its prose. Adding a new reason code requires updating the `REASON` enum, the `--json` output, AND the test that locks `Object.keys(REASON).sort()` — three coordinated changes that prevent the code surface from drifting from the test surface.
#### Hiding grep behind a function is still grep
`parseCmdShim`, `parsePs1Invocation`, etc. that internally do `content.split(...)`, `lines[1].trim()`, `content.includes(...)` are still string manipulation. The fact that the entry point looks like a parser doesn't change what's happening underneath — the test is still asserting on the lexical shape of rendered text. The fix is not "wrap the grep in a function with a typed-looking return value." The fix is to **eliminate the rendered text from the test path entirely** by surfacing the IR.
#### When you cannot eliminate text matching
There are exactly two cases where text content is the legitimate object of a test, both already covered by the existing exemption matrix:
1.`source-text-is-the-product` — workflow `.md` / agent `.md` / command `.md` files where the deployed text IS what the runtime loads.
2.`docs-parity` — a reference doc must mirror source-defined constants and there is no runtime enumeration API.
For everything else, if a test reaches for `.includes()` / `.startsWith()` / `assert.match(text, /…/)`, the production code is missing a typed surface. **Add the typed surface; do not work around it.**
**CI enforcement:**`scripts/lint-no-source-grep.cjs` is being extended (see issue tracker for the latest scope) to flag `String#includes`/`String#startsWith`/`String#endsWith`/`assert.match` on `readFileSync` results and on `cp.spawnSync` stdout/stderr in test files, with the same `// allow-test-rule:` exemption mechanism.
### Node.js Version Compatibility
### Node.js Version Compatibility
**Node 22 is the minimum supported version.** Node 24 is the primary CI target. All tests must pass on both.
**Node 22 is the minimum supported version.** Node 24 is the primary CI target. All tests must pass on both.
If you touched any of the command-manifest or generated alias files, run:
```bash
npm run check:alias-drift
```
This verifies generated alias artifacts are in sync with manifest source-of-truth.
Optional local pre-commit hook entry (Git-native):
```bash
# one-time setup
mkdir -p .githooks
cat > .githooks/pre-commit <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
if git diff --cached --name-only | grep -Eq "^sdk/src/query/command-manifest\.|^sdk/src/query/command-aliases\.generated\.ts$|^get-shit-done/bin/lib/command-aliases\.generated\.cjs$|^sdk/scripts/gen-command-aliases\.ts$"; then
npm run check:alias-drift
fi
EOF
chmod +x .githooks/pre-commit
git config core.hooksPath .githooks
```
Optional local pre-push hook to block a private author-email pattern:
The following checks run on every PR in addition to the test suite:
The following checks run on every PR in addition to the test suite:
@@ -357,6 +518,14 @@ Run locally before pushing: `npm run lint:tests`
### Test Requirements by Contribution Type
### Test Requirements by Contribution Type
### Architecture-Aware Testing Requirements
When work touches architecture, routing, policy, registry assembly, or command semantics:
- Write tests against module **interfaces** and seam behavior, not implementation trivia.
- Prefer invariant/contract tests that protect ADR-backed behavior and `CONTEXT.md` terminology.
- Ensure tests validate canonical behavior through the defined seam (for example: structured result contracts, canonical command metadata, and adapter parity), not source-text coupling.
- If ADRs define expected behavior, tests should assert those expectations directly.
The required tests differ depending on what you are contributing:
The required tests differ depending on what you are contributing:
**Bug Fix:** A regression test is required. Write the test first — it must demonstrate the original failure before your fix is applied, then pass after the fix. A PR that fixes a bug without a regression test will be asked to add one. "Tests pass" does not prove correctness; it proves the bug isn't present in the tests that exist.
**Bug Fix:** A regression test is required. Write the test first — it must demonstrate the original failure before your fix is applied, then pass after the fix. A PR that fixes a bug without a regression test will be asked to add one. "Tests pass" does not prove correctness; it proves the bug isn't present in the tests that exist.
@@ -75,15 +75,17 @@ GSD가 그걸 고칩니다. Claude Code를 신뢰할 수 있게 만드는 컨텍
내장 품질 게이트가 실제 문제를 잡아냅니다: 스키마 드리프트 감지는 마이그레이션 누락된 ORM 변경을 플래그하고, 보안 강제는 검증을 위협 모델에 고정시키고, 스코프 축소 감지는 플래너가 요구사항을 몰래 빠뜨리는 걸 방지합니다.
내장 품질 게이트가 실제 문제를 잡아냅니다: 스키마 드리프트 감지는 마이그레이션 누락된 ORM 변경을 플래그하고, 보안 강제는 검증을 위협 모델에 고정시키고, 스코프 축소 감지는 플래너가 요구사항을 몰래 빠뜨리는 걸 방지합니다.
### v1.32.0 하이라이트
### v1.39.0 하이라이트
- **STATE.md 일관성 게이트** — `state validate`가 STATE.md와 파일시스템 간 드리프트를 감지, `state sync`가 실제 프로젝트 상태에서 재구성
전체 목록은 [v1.39.0 릴리스 노트](https://github.com/gsd-build/get-shit-done/releases/tag/v1.39.0)를 참고하세요.
- **`--to N` 플래그** — 자율 실행을 특정 단계 완료 후 중지
- **리서치 게이트** — RESEARCH.md에 미해결 질문이 있으면 기획을 차단
- **`--minimal` 설치 프로파일** — 별칭 `--core-only`. 메인 루프 6개 스킬(`new-project`, `discuss-phase`, `plan-phase`, `execute-phase`, `help`, `update`)만 설치하고 `gsd-*` 서브에이전트는 설치하지 않음. 콜드 스타트 시스템 프롬프트 오버헤드를 ~12k 토큰에서 ~700 토큰으로 축소(≥94% 감소). 32K–128K 컨텍스트의 로컬 LLM이나 토큰 과금 API에 유용.
- **검증 마일스톤 스코프 필터링** — 이후 단계에서 처리될 격차는 "격차"가 아닌 "지연됨"으로 표시
- **`/gsd-edit-phase`** — `ROADMAP.md`에 있는 기존 단계의 임의 필드를 그 자리에서 수정(번호와 위치는 변경되지 않음). `--force`는 확인 diff를 건너뛰고, `depends_on` 참조를 검증하며 쓰기 시 `STATE.md`도 갱신.
- **머지 후 빌드 & 테스트 게이트** — `execute-phase` 5.6 단계가 `workflow.build_command` 설정을 우선 자동 감지하고, 없으면 Xcode(`.xcodeproj`), Makefile, Justfile, Cargo, Go, Python, npm 순으로 폴백. Xcode/iOS 프로젝트는 `xcodebuild build` 및 `xcodebuild test`를 자동 실행. 병렬·직렬 모드 모두에서 동작.
- **컨텍스트 축소** — 마크다운 잘라내기 및 캐시 친화적 프롬프트 순서로 토큰 사용량 절감
- **런타임별 리뷰 모델 선택** — `review.models.<cli>`로 각 외부 리뷰 CLI(codex, gemini 등)가 플래너/실행 프로파일과 독립적으로 자체 모델을 선택할 수 있음.
- **4개의 새 런타임** — Trae, Kilo, Augment, Cline (총 12개 런타임)
- **워크스트림 설정 상속** — `GSD_WORKSTREAM`이 설정되면 루트 `.planning/config.json`을 먼저 로드한 뒤 워크스트림 설정을 딥 머지(충돌 시 워크스트림 우선). 워크스트림 설정에서 명시적 `null`은 루트 값을 덮어씀.
- **스킬 통합: 86 → 59** — 4개의 새로운 그룹 스킬(`capture`, `phase`, `config`, `workspace`)이 31개의 마이크로 스킬을 흡수. 기존 6개의 부모 스킬은 래퍼업/하위 동작을 플래그로 흡수: `update --sync/--reapply`, `sketch --wrap-up`, `spike --wrap-up`, `map-codebase --fast/--query`, `code-review --fix`, `progress --do/--next`. 기능 손실 없음.
**A light-weight and powerful meta-prompting, context engineering and spec-driven development system for Claude Code, OpenCode, Gemini CLI, Kilo, Codex, Copilot, Cursor, Windsurf, Antigravity, Augment, Trae, Qwen Code, Cline, and CodeBuddy.**
**A light-weight and powerful meta-prompting, context engineering and spec-driven development system for Claude Code, OpenCode, Gemini CLI, Kilo, Codex, Copilot, Cursor, Windsurf, Antigravity, Augment, Trae, Qwen Code, Hermes Agent, Cline, and CodeBuddy.**
**Solves context rot — the quality degradation that happens as Claude fills its context window.**
**Solves context rot — the quality degradation that happens as Claude fills its context window.**
@@ -89,11 +89,17 @@ People who want to describe what they want and have it built correctly — witho
Built-in quality gates catch real problems: schema drift detection flags ORM changes missing migrations, security enforcement anchors verification to threat models, and scope reduction detection prevents the planner from silently dropping your requirements.
Built-in quality gates catch real problems: schema drift detection flags ORM changes missing migrations, security enforcement anchors verification to threat models, and scope reduction detection prevents the planner from silently dropping your requirements.
### v1.37.0 Highlights
### v1.39.0 Highlights
- **Spiking & sketching** — `/gsd-spike` runs 2–5 focused experiments with Given/When/Then verdicts; `/gsd-sketch` produces 2–3 interactive HTML mockup variants per design question — both store artifacts in `.planning/` and pair with wrap-up commands to package findings into project-local skills
See the [v1.39.0 release notes](https://github.com/gsd-build/get-shit-done/releases/tag/v1.39.0) for the full list.
- **Shared boilerplate extraction** — Mandatory-initial-read and project-skills-discovery logic extracted to reference files, reducing duplication across a dozen agents
- **`--minimal` install profile** — alias `--core-only`, writes only the six main-loop skills (`new-project`, `discuss-phase`, `plan-phase`, `execute-phase`, `help`, `update`) and zero `gsd-*` subagents. Cuts cold-start system-prompt overhead from ~12k tokens to ~700 (≥94% reduction). Useful for local LLMs with 32K–128K context and token-billed APIs.
- **`/gsd-edit-phase`** — modify any field of an existing phase in `ROADMAP.md` in place, without changing its number or position. `--force` skips the confirmation diff; `depends_on` references are validated and `STATE.md` is updated on write.
- **Post-merge build & test gate** — `execute-phase` step 5.6 now auto-detects the build command from `workflow.build_command`, then falls back to Xcode (`.xcodeproj`), Makefile, Justfile, Cargo, Go, Python, or npm. Xcode/iOS projects get `xcodebuild build` + `xcodebuild test` automatically. Runs in both parallel and serial mode.
- **Per-runtime review-model selection** — `review.models.<cli>` lets each external review CLI (codex, gemini, etc.) pick its own model independently of the planner/executor profile.
- **Workstream config inheritance** — when `GSD_WORKSTREAM` is set, the root `.planning/config.json` is loaded first and deep-merged with the workstream config (workstream wins on conflict). Explicit `null` in a workstream config now correctly overrides a root value.
- **Manual canary release workflow** — `.github/workflows/canary.yml` publishes `{base}-canary.{N}` builds of `get-shit-done-cc` and `@gsd-build/sdk` to the `@canary` dist-tag from `dev` on demand via `workflow_dispatch`.
- **Skill consolidation: 86 → 59** — four new grouped skills (`capture`, `phase`, `config`, `workspace`) absorb 31 micro-skills. Six existing parents absorb wrap-up and sub-operations as flags: `update --sync/--reapply`, `sketch --wrap-up`, `spike --wrap-up`, `map-codebase --fast/--query`, `code-review --fix`, `progress --do/--next`. Zero functional loss.
---
---
@@ -104,11 +110,11 @@ npx get-shit-done-cc@latest
```
```
The installer prompts you to choose:
The installer prompts you to choose:
1.**Runtime** — Claude Code, OpenCode, Gemini, Kilo, Codex, Copilot, Cursor, Windsurf, Antigravity, Augment, Trae, Qwen Code, CodeBuddy, Cline, or all (interactive multi-select — pick multiple runtimes in a single install session)
1.**Runtime** — Claude Code, OpenCode, Gemini, Kilo, Codex, Copilot, Cursor, Windsurf, Antigravity, Augment, Trae, Qwen Code, Hermes Agent, CodeBuddy, Cline, or all (interactive multi-select — pick multiple runtimes in a single install session)
2.**Location** — Global (all projects) or local (current project only)
2.**Location** — Global (all projects) or local (current project only)
npx get-shit-done-cc --qwen --global # Install to ~/.qwen/
npx get-shit-done-cc --qwen --global # Install to ~/.qwen/
npx get-shit-done-cc --qwen --local # Install to ./.qwen/
npx get-shit-done-cc --qwen --local # Install to ./.qwen/
# Hermes Agent
npx get-shit-done-cc --hermes --global # Install to ~/.hermes/ (honors $HERMES_HOME)
npx get-shit-done-cc --hermes --local # Install to ./.hermes/
# CodeBuddy
# CodeBuddy
npx get-shit-done-cc --codebuddy --global # Install to ~/.codebuddy/
npx get-shit-done-cc --codebuddy --global # Install to ~/.codebuddy/
npx get-shit-done-cc --codebuddy --local # Install to ./.codebuddy/
npx get-shit-done-cc --codebuddy --local # Install to ./.codebuddy/
@@ -192,7 +202,7 @@ npx get-shit-done-cc --all --global # Install to all directories
```
```
Use `--global` (`-g`) or `--local` (`-l`) to skip the location prompt.
Use `--global` (`-g`) or `--local` (`-l`) to skip the location prompt.
Use `--claude`, `--opencode`, `--gemini`, `--kilo`, `--codex`, `--copilot`, `--cursor`, `--windsurf`, `--antigravity`, `--augment`, `--trae`, `--qwen`, `--codebuddy`, `--cline`, or `--all` to skip the runtime prompt.
Use `--claude`, `--opencode`, `--gemini`, `--kilo`, `--codex`, `--copilot`, `--cursor`, `--windsurf`, `--antigravity`, `--augment`, `--trae`, `--qwen`,`--hermes`,`--codebuddy`, `--cline`, or `--all` to skip the runtime prompt.
The GSD SDK CLI (`gsd-sdk`) is installed automatically (required by `/gsd-*` commands). Pass `--no-sdk` to skip the SDK install, or `--sdk` to force a reinstall.
The GSD SDK CLI (`gsd-sdk`) is installed automatically (required by `/gsd-*` commands). Pass `--no-sdk` to skip the SDK install, or `--sdk` to force a reinstall.
</details>
</details>
@@ -641,7 +651,7 @@ You're never locked in. The system adapts.
| Command | What it does |
| Command | What it does |
|---------|--------------|
|---------|--------------|
| `/gsd-new-workspace` | Create isolated workspace with repo copies (worktrees or clones) |
| `/gsd-workspace --new` | Create isolated workspace with repo copies (worktrees or clones) |
| `/gsd-list-workspaces` | Show all GSD workspaces and their status |
| `/gsd-list-workspaces` | Show all GSD workspaces and their status |
| `/gsd-remove-workspace` | Remove workspace and clean up worktrees |
| `/gsd-remove-workspace` | Remove workspace and clean up worktrees |
@@ -685,9 +695,9 @@ You're never locked in. The system adapts.
|---------|--------------|
|---------|--------------|
| `/gsd-add-phase` | Append phase to roadmap |
| `/gsd-add-phase` | Append phase to roadmap |
| `/gsd-insert-phase [N]` | Insert urgent work between phases |
| `/gsd-insert-phase [N]` | Insert urgent work between phases |
| `/gsd-edit-phase [N] [--force]` | Modify any field of an existing phase in place — number and position unchanged |
| `/gsd-list-phase-assumptions [N]` | See Claude's intended approach before planning |
| `/gsd-list-phase-assumptions [N]` | See Claude's intended approach before planning |
| `/gsd-plan-milestone-gaps` | Create phases to close gaps from audit |
### Session
### Session
@@ -729,7 +739,7 @@ You're never locked in. The system adapts.
| `/gsd-settings` | Configure model profile and workflow agents |
| `/gsd-settings` | Configure model profile and workflow agents |
| `/gsd-set-profile <profile>` | Switch model profile (quality/balanced/budget/inherit) |
| `/gsd-set-profile <profile>` | Switch model profile (quality/balanced/budget/inherit) |
| `/gsd-add-todo [desc]` | Capture idea for later |
| `/gsd-add-todo [desc]` | Capture idea for later |
| `/gsd-check-todos` | List pending todos |
| `/gsd-capture --list` | List pending todos |
| `/gsd-debug [desc]` | Systematic debugging with persistent state |
| `/gsd-debug [desc]` | Systematic debugging with persistent state |
| `/gsd-do <text>` | Route freeform text to the right GSD command automatically |
| `/gsd-do <text>` | Route freeform text to the right GSD command automatically |
| `/gsd-note <text>` | Zero-friction idea capture — append, list, or promote notes to todos |
| `/gsd-note <text>` | Zero-friction idea capture — append, list, or promote notes to todos |
@@ -746,6 +756,8 @@ You're never locked in. The system adapts.
GSD stores project settings in `.planning/config.json`. Configure during `/gsd-new-project` or update later with `/gsd-settings`. For the full config schema, workflow toggles, git branching options, and per-agent model breakdown, see the [User Guide](docs/USER-GUIDE.md#configuration-reference).
GSD stores project settings in `.planning/config.json`. Configure during `/gsd-new-project` or update later with `/gsd-settings`. For the full config schema, workflow toggles, git branching options, and per-agent model breakdown, see the [User Guide](docs/USER-GUIDE.md#configuration-reference).
When `GSD_WORKSTREAM` is set, GSD loads the root `.planning/config.json` first and deep-merges the workstream's `config.json` on top — workstream values win on conflict, and an explicit `null` in a workstream config overrides a root value.
### Core Settings
### Core Settings
| Setting | Options | Default | What it controls |
| Setting | Options | Default | What it controls |
@@ -774,6 +786,8 @@ Use `inherit` when using non-Anthropic providers (OpenRouter, local models) or t
Or configure via `/gsd-settings`.
Or configure via `/gsd-settings`.
Per-runtime review-model overrides live under `review.models.<cli>` (e.g. `review.models.codex`, `review.models.gemini`) and let each external review CLI pick its own model independently of the planner/executor profile.
### Workflow Agents
### Workflow Agents
These spawn additional agents during planning/execution. They improve quality but add tokens and time.
These spawn additional agents during planning/execution. They improve quality but add tokens and time.
@@ -789,6 +803,7 @@ These spawn additional agents during planning/execution. They improve quality bu
| `workflow.build_command` | _(auto-detect)_ | Override the post-merge build gate command. Falls back to Xcode (`.xcodeproj`), Makefile, Justfile, Cargo, Go, Python, or npm; Xcode/iOS projects also run `xcodebuild test`. |
Use `/gsd-settings` to toggle these, or override per-invocation:
Use `/gsd-settings` to toggle these, or override per-invocation:
@@ -73,15 +73,17 @@ Para quem quer descrever o que precisa e receber isso construído do jeito certo
Quality gates embutidos capturam problemas reais: detecção de schema drift sinaliza mudanças ORM sem migrations, segurança ancora verificação a modelos de ameaça, e detecção de redução de escopo impede o planner de descartar requisitos silenciosamente.
Quality gates embutidos capturam problemas reais: detecção de schema drift sinaliza mudanças ORM sem migrations, segurança ancora verificação a modelos de ameaça, e detecção de redução de escopo impede o planner de descartar requisitos silenciosamente.
### Destaques v1.32.0
### Destaques v1.39.0
- **Gates de consistência STATE.md** — `state validate` detecta divergência entre STATE.md e o filesystem; `state sync` reconstrói a partir do estado real do projeto
Lista completa nas [notas de release v1.39.0](https://github.com/gsd-build/get-shit-done/releases/tag/v1.39.0).
- **Flag `--to N`** — Para a execução autônoma após completar uma fase específica
- **Research gate** — Bloqueia planejamento quando RESEARCH.md tem perguntas abertas não resolvidas
- **Perfil de instalação `--minimal`** — alias `--core-only`. Instala apenas os 6 skills do loop principal (`new-project`, `discuss-phase`, `plan-phase`, `execute-phase`, `help`, `update`) e nenhum subagente `gsd-*`. Reduz o overhead do system prompt no cold-start de ~12k para ~700 tokens (≥94% de redução). Útil para LLMs locais com contexto de 32K–128K e APIs cobradas por token.
- **Filtro de escopo do verificador** — Lacunas abordadas em fases posteriores são marcadas como "adiadas", não como lacunas
- **`/gsd-edit-phase`** — edita qualquer campo de uma fase existente em `ROADMAP.md` no lugar, sem alterar o número ou a posição. `--force` pula o diff de confirmação; referências em `depends_on` são validadas e o `STATE.md` é atualizado na escrita.
- **Guard de leitura antes de edição** — Hook consultivo previne loops de retry infinitos em runtimes não-Claude
- **Build & test gate pós-merge** — o passo 5.6 de `execute-phase` agora detecta automaticamente o comando de build em `workflow.build_command`, com fallback para Xcode (`.xcodeproj`), Makefile, Justfile, Cargo, Go, Python ou npm. Projetos Xcode/iOS rodam `xcodebuild build` e `xcodebuild test` automaticamente. Funciona em modo paralelo e serial.
- **Redução de contexto** — Truncamento de Markdown e ordenação de prompts cache-friendly para menor uso de tokens
- **Modelo de review por runtime** — `review.models.<cli>` permite que cada CLI externa de review (codex, gemini, etc.) escolha seu próprio modelo, independente do perfil de planner/executor.
- **4 novos runtimes** — Trae, Kilo, Augment e Cline (12 runtimes no total)
- **Herança de configuração de workstream** — quando `GSD_WORKSTREAM` está definido, o `.planning/config.json` raiz é carregado primeiro e merge-deep com o config da workstream (workstream vence em conflito). Um `null` explícito no config da workstream sobrescreve corretamente o valor raiz.
- **Workflow manual de canary release** — `.github/workflows/canary.yml` publica builds `{base}-canary.{N}` de `get-shit-done-cc` e `@gsd-build/sdk` na dist-tag `@canary` a partir de `dev`, sob demanda via `workflow_dispatch`.
- **Consolidação de skills: 86 → 59** — 4 novos skills agrupados (`capture`, `phase`, `config`, `workspace`) absorvem 31 micro-skills. 6 skills pais existentes absorvem wrap-up e sub-operações como flags: `update --sync/--reapply`, `sketch --wrap-up`, `spike --wrap-up`, `map-codebase --fast/--query`, `code-review --fix`, `progress --do/--next`. Sem perda funcional.
@@ -67,15 +67,38 @@ main ← stable, always deployable
### Patch Release (Hotfix)
### Patch Release (Hotfix)
For critical bugs that can't wait for the next minor release.
For fixes that need to ship without waiting for the next minor.
1. Trigger `hotfix.yml` with version (e.g., `1.27.1`)
A hotfix `vX.YY.Z` cumulatively includes everything in `vX.YY.{Z-1}` plus every`fix:`/`chore:` commit landed on `main` since that base. The base tag is the anchor — `git cherry $BASE_TAG main` reveals exactly which commits are still unshipped, and the new `vX.YY.Z` tag becomes the next hotfix's base, so the cycle is self-documenting.
2. Workflow creates `hotfix/1.27.1` branch from the latest patch tag for that minor version (e.g., `v1.27.0` or `v1.27.1`)
3. Cherry-pick or apply fix on the hotfix branch
#### Two paths
4. Push — CI runs tests automatically
5. Trigger `hotfix.yml` finalize action
**Path A — `hotfix.yml` (canonical, two-step):**
6. Workflow runs full test suite, bumps version, tags, publishes to `latest`
7.Merge hotfix branch back to main
1.Trigger `hotfix.yml` with `action=create`, `version=1.27.1`, `auto_cherry_pick=true` (default).
- Workflow detects `BASE_TAG` = highest `v1.27.*` < `v1.27.1` (so `1.27.1` branches from `v1.27.0`; `1.27.2` would branch from `v1.27.1`).
- Branches `hotfix/1.27.1` from `BASE_TAG`.
- Auto-cherry-picks every `fix:`/`chore:` commit on `origin/main` not already in the base, oldest-first. Patch-equivalents are skipped via `git cherry`. `feat:`/`refactor:` are **never** auto-included.
- On conflict the workflow halts with the offending SHA. Resolve manually on the branch, then re-run finalize with `auto_cherry_pick=false`.
- Bumps `package.json` (and `sdk/package.json`), pushes the branch, and lists every included SHA in the run summary.
2. (Optional) push additional manual commits to `hotfix/1.27.1`.
3. Trigger `hotfix.yml` with `action=finalize`. The workflow:
- Runs `install-smoke` cross-platform gate.
- Runs full test suite + coverage.
- Builds SDK, bundles `sdk-bundle/gsd-sdk.tgz` inside the CC tarball (parity with `release-sdk.yml`).
- Tags `v1.27.1`, publishes to `@latest`, re-points `@next → v1.27.1`.
- Opens merge-back PR against `main`.
**Path B — `release-sdk.yml` (stopgap, one-shot):**
Active while the `@gsd-build/sdk` npm token is unavailable; bundles the SDK inside the CC tarball.
1. Trigger `release-sdk.yml` with `action=hotfix`, `version=1.27.1`, `auto_cherry_pick=true`.
- The `prepare` job creates the branch and cherry-picks (same logic as Path A).
-`install-smoke` runs against the new branch.
- The `release` job tags, publishes to `@latest`, re-points `@next`, opens merge-back PR.
- Idempotent: if `hotfix/1.27.1` already exists (e.g. you ran `hotfix.yml create` first), the prepare job checks it out and re-runs cherry-pick as a no-op.
2.`dry_run=true` exercises the full pipeline without pushing the branch or publishing.
description: Applies fixes to code review findings from REVIEW.md. Reads source files, applies intelligent fixes, and commits each fix atomically. Spawned by /gsd-code-review-fix.
description: Applies fixes to code review findings from REVIEW.md. Reads source files, applies intelligent fixes, and commits each fix atomically. Spawned by /gsd-code-review --fix.
tools: Read, Edit, Write, Bash, Grep, Glob
tools: Read, Edit, Write, Bash, Grep, Glob
color: "#10B981"
color: "#10B981"
# hooks:
# hooks:
@@ -10,7 +10,7 @@ color: "#10B981"
<role>
<role>
You are a GSD code fixer. You apply fixes to issues found by the gsd-code-reviewer agent.
You are a GSD code fixer. You apply fixes to issues found by the gsd-code-reviewer agent.
Spawned by `/gsd-code-review-fix` workflow. You produce REVIEW-FIX.md artifact in the phase directory.
Spawned by `/gsd-code-review --fix` workflow. You produce REVIEW-FIX.md artifact in the phase directory.
Your job: Read REVIEW.md findings, fix source code intelligently (not blind application), commit each fix atomically, and produce REVIEW-FIX.md report.
Your job: Read REVIEW.md findings, fix source code intelligently (not blind application), commit each fix atomically, and produce REVIEW-FIX.md report.
@@ -214,32 +214,145 @@ If a finding references multiple files (in Fix section or Issue section):
This agent runs as a background process that makes commits. Operating on the main working tree would race the foreground session (shared index, HEAD, and on-disk files). Instead, every instance runs in its own isolated worktree.
This agent runs as a background process that makes commits. Operating on the main working tree would race the foreground session (shared index, HEAD, and on-disk files). Instead, every instance runs in its own isolated worktree.
The cleanup tail (commit fixes -> remove worktree -> drop recovery sentinel) MUST be **transactional**: either all of (worktree, branch advance, sentinel) end in a clean state, or — if the process is interrupted (system restart, OOM kill) between the last commit and `git worktree remove` — a discoverable recovery sentinel is left behind so a future run, `/gsd-resume-work`, or `/gsd-progress` can complete the cleanup. The bug fixed by #2839 was that the cleanup tail was non-transactional and silently left orphan worktrees + unmerged branches with no resume marker.
```bash
```bash
# Derive worktree path from padded_phase (parsed from config in next step,
# Derive worktree path from padded_phase (parsed from config in next step,
# but the shell snippet below is illustrative — adapt once config is parsed).
# but the shell snippet below is illustrative — adapt once config is parsed).
# In practice: parse padded_phase from config first, then run:
# In practice: parse padded_phase from config first, then run:
branch=$(git branch --show-current)
branch=$(git branch --show-current)
test -n "$branch" || { echo "Detached HEAD is not supported for review-fix (#2686)"; exit 1; }
test -n "$branch" || { echo "Detached HEAD is not supported for review-fix (#2686)"; exit 1; }
# Recovery-sentinel handling (#2839):
# Path is ${phase_dir}/.review-fix-recovery-pending.json. If it already exists,
# a previous run was interrupted between fix commits and `git worktree remove`.
# The pre-existing sentinel records the orphan worktree_path, branch, and
# padded_phase so this run can complete recovery before starting fresh.
1. Parse `padded_phase` from the `<config>` block (needed for the path).
1. Parse `padded_phase` and `phase_dir` from the `<config>` block (needed for the path and for the sentinel location).
2. Resolve the current branch: `branch=$(git branch --show-current)`. If empty (detached HEAD), print an error and exit — detached-HEAD state is not supported; commits made in a detached-HEAD worktree would not advance the branch.
2. Resolve the current branch: `branch=$(git branch --show-current)`. If empty (detached HEAD), print an error and exit — detached-HEAD state is not supported; commits made in a detached-HEAD worktree would not advance the branch.
3. Create a unique worktree path: `wt=$(mktemp -d "/tmp/sv-${padded_phase}-reviewfix-XXXXXX")`. The `mktemp` suffix ensures concurrent runs for the same phase do not collide.
3. **Recovery check (#2839, #2990):** If `${phase_dir}/.review-fix-recovery-pending.json` already exists, a prior run was interrupted. Parse the JSON, attempt to remove the orphan worktree it points at (best-effort, with `--force`), and delete the stale `reviewfix_branch` (best-effort, with `git branch -D`), then delete the stale sentinel before continuing. This makes a re-run of `/gsd-code-review --fix` self-healing.
4. Run `git worktree add "$wt" "$branch"` — this attaches the worktree to the current branch so commits advance it.
4. Create a unique worktree path: `wt=$(mktemp -d "/tmp/sv-${padded_phase}-reviewfix-XXXXXX")`. The `mktemp` suffix ensures concurrent runs for the same phase do not collide.
5. All subsequent file reads, edits, and commits happen inside `$wt`.
5. Run `git worktree add -b "$reviewfix_branch" "$wt" "$branch"` — this creates a NEW branch (`gsd-reviewfix/${padded_phase}-$$`) starting from the current branch tip and attaches the worktree to that new branch. Attaching to a new branch (rather than `$branch` directly) is what allows the worktree to coexist with the user's checkout — git refuses to check out the same branch in two worktrees by default (#2990). Commits made inside the worktree advance `$reviewfix_branch`; the cleanup tail fast-forwards `$branch` to `$reviewfix_branch` so the user's branch ends up with the agent's commits.
6. **Write the recovery sentinel** at `${phase_dir}/.review-fix-recovery-pending.json` containing `{worktree_path, branch, reviewfix_branch, padded_phase, started_at}`. Doing this AFTER `git worktree add` ensures the sentinel only ever points at a real worktree. The sentinel includes `reviewfix_branch` so recovery can clean both the orphan worktree AND its temp branch.
7. All subsequent file reads, edits, and commits happen inside `$wt` (which is on `$reviewfix_branch`, not `$branch`).
**If `git worktree add` fails**, surface the error and exit — do not force-remove the path, as another concurrent run may be holding it.
**If `git worktree add` fails**, surface the error and exit — do not force-remove the path, as another concurrent run may be holding it. Do not write the sentinel (the worktree does not exist). Do not delete `$reviewfix_branch` either; if `-b` failed, no temp branch was created.
**Cleanup tail (transactional, ALWAYS — even on failure):** After writing REVIEW-FIX.md and before returning to the orchestrator, run the cleanup in this exact order:
**Cleanup (ALWAYS — even on failure):** After writing REVIEW-FIX.md and before returning to the orchestrator, run:
```bash
```bash
# Step 1 (#2990): fast-forward $branch to capture the commits the agent
# made on $reviewfix_branch. Run from the main repo (not $wt) — the user's
# checkout owns $branch. --ff-only ensures we never silently drop or
# rewrite history if the user committed to $branch concurrently; on
# divergence, this fails loudly and the temp branch is left for the
# user to inspect/merge manually. We deliberately resolve the main repo
# path via `git worktree list --porcelain` rather than assuming $PWD,
# because the agent ran inside $wt.
# Strip the literal "worktree " prefix and print the rest of the line, then
# exit on the first match. This preserves paths that contain spaces
# (awk '$2' would truncate "/path/with spaces/repo" to "/path/with").
# Step 4: drop the recovery sentinel ONLY after `git worktree remove`
# returns successfully. This atomic-ish ordering is what makes the
# cleanup tail transactional from the orchestrator's perspective.
rm -f "$sentinel"
```
```
This cleanup is unconditional — register it mentally as a finally-block obligation. If the agent exits early (config error, no findings, etc.), still run `git worktree remove "$wt" --force` before exit.
This cleanup is unconditional — register it mentally as a finally-block obligation. If the agent exits early (config error, no findings, etc.), still run the cleanup tail in order (fast-forward → worktree remove → temp branch delete → sentinel rm) before exit. The sentinel must NEVER be removed before `git worktree remove` succeeds. The temp branch must NEVER be deleted while the fast-forward is in a diverged state.
</step>
</step>
<step name="load_context">
<step name="load_context">
@@ -471,7 +584,9 @@ _Iteration: {N}_
<critical_rules>
<critical_rules>
**ALWAYS run inside the isolated worktree** — set up via `branch=$(git branch --show-current)` + `wt=$(mktemp -d "/tmp/sv-${padded_phase}-reviewfix-XXXXXX")` + `git worktree add "$wt" "$branch"` at the very start (see `setup_worktree` step). Using `mktemp` ensures concurrent runs do not collide. Attaching to `$branch` (not `HEAD`) ensures commits advance the branch. Every file read, edit, and commit must happen inside `$wt`. Run `git worktree remove "$wt" --force` unconditionally when done (treat it as a finally block). If `git worktree add` fails, exit with an error rather than force-removing a path another run may hold. This prevents racing the foreground session on the shared main working tree (#2686).
**ALWAYS run inside the isolated worktree** — set up via `branch=$(git branch --show-current)` + `wt=$(mktemp -d "/tmp/sv-${padded_phase}-reviewfix-XXXXXX")` + `git worktree add -b "$reviewfix_branch" "$wt" "$branch"` at the very start (see `setup_worktree` step). Using `mktemp` ensures concurrent runs do not collide. Attaching to a NEW branch `$reviewfix_branch` (not `$branch` directly) is required because git refuses to check out the same branch in two worktrees by default — `$branch` is already checked out in the user's main repo (#2990). Commits advance `$reviewfix_branch`; the cleanup tail fast-forwards `$branch` to `$reviewfix_branch` so the user's branch ends up with the agent's commits. Every file read, edit, and commit must happen inside `$wt`. Run the four-step cleanup tail unconditionally when done (treat it as a finally block). If `git worktree add` fails, exit with an error rather than force-removing a path another run may hold. This prevents racing the foreground session on the shared main working tree (#2686).
**ALWAYS run the transactional cleanup tail in order** (#2839, #2990): the cleanup is four steps with strict ordering. (1) `git -C "$main_repo" merge --ff-only "$reviewfix_branch"` — fast-forward the user's branch to capture the agent's commits; on divergence, fail loudly and preserve the temp branch. (2) `git worktree remove "$wt" --force`. (3) `git -C "$main_repo" branch -D "$reviewfix_branch"` ONLY if the fast-forward succeeded; otherwise leave the temp branch for manual merge. (4) `rm -f "$sentinel"` (the recovery sentinel at `${phase_dir}/.review-fix-recovery-pending.json`). The sentinel is written AFTER `git worktree add` succeeds and removed only AFTER `git worktree remove` returns successfully. The temp branch is deleted only when the fast-forward succeeded. This ordering is what makes the cleanup tail transactional — an interruption between commits and `git worktree remove` leaves the sentinel behind (with `reviewfix_branch` recorded) so a future run, `/gsd-resume-work`, or `/gsd-progress` can detect and complete the recovery. Reversing the order recreates the orphan-worktree bug.
**ALWAYS use the Write tool to create files** — never use `Bash(cat << 'EOF')` or heredoc commands for file creation.
**ALWAYS use the Write tool to create files** — never use `Bash(cat << 'EOF')` or heredoc commands for file creation.
@@ -358,6 +358,30 @@ If RED or GREEN gate commits are missing, add a warning to SUMMARY.md under a `#
<task_commit_protocol>
<task_commit_protocol>
After each task completes (verification passed, done criteria met), commit immediately.
After each task completes (verification passed, done criteria met), commit immediately.
**0. Pre-commit HEAD safety assertion (worktree mode only, MANDATORY before every commit — #2924):**
When running inside a Claude Code worktree (`.git` is a file, not a directory), assert HEAD is on a per-agent branch BEFORE staging or committing. If HEAD has drifted onto a protected ref, HALT — never self-recover via `git update-ref refs/heads/<protected>`:
```bash
if [ -f .git ]; then # worktree
HEAD_REF=$(git symbolic-ref --quiet HEAD || echo "DETACHED")
ACTUAL_BRANCH=$(git rev-parse --abbrev-ref HEAD)
# Deny-list: never commit on a protected ref.
if [ "$HEAD_REF" = "DETACHED" ] || \
echo "$ACTUAL_BRANCH" | grep -Eq '^(main|master|develop|trunk|release/.*)$'; then
echo "FATAL: refusing to commit — worktree HEAD is on '$ACTUAL_BRANCH' (expected per-agent branch)." >&2
echo "DO NOT use 'git update-ref' to rewind the protected branch — surface as blocker (#2924)." >&2
exit 1
fi
# Positive allow-list: HEAD must be on the canonical Claude Code worktree-agent
# branch namespace (`worktree-agent-<id>`). This catches feature/* and any other
# arbitrary branch that the deny-list would silently allow (#2924).
if ! echo "$ACTUAL_BRANCH" | grep -Eq '^worktree-agent-[A-Za-z0-9._/-]+$'; then
echo "FATAL: refusing to commit — worktree HEAD '$ACTUAL_BRANCH' is not in the worktree-agent-* namespace." >&2
echo "Agent commits must live on per-agent branches; surface as blocker (#2924)." >&2
description: Generate AI design contract (AI-SPEC.md) for phases that involve building AI systems — framework selection, implementation guidance from official docs, and evaluation strategy
description: Generate an AI-SPEC.md design contract for phases that involve building AI systems.
description: Auto-fix issues found by code review in REVIEW.md. Spawns fixer agent, commits each fix atomically, produces REVIEW-FIX.md summary.
argument-hint: "<phase-number> [--all] [--auto]"
allowed-tools:
- Read
- Bash
- Glob
- Grep
- Write
- Edit
- Task
---
<objective>
Auto-fix issues found by code review. Reads REVIEW.md from the specified phase, spawns gsd-code-fixer agent to apply fixes, and produces REVIEW-FIX.md summary.
Arguments:
- Phase number (required) — which phase's REVIEW.md to fix (e.g., "2" or "02")
-`--all` (optional) — include Info findings in fix scope (default: Critical + Warning only)
Phase: $ARGUMENTS (first positional argument is phase number)
Optional flags parsed from $ARGUMENTS:
-`--all` — Include Info findings in fix scope. Default behavior fixes Critical + Warning only.
-`--auto` — Enable fix + re-review iteration loop. After applying fixes, re-run code-review at same depth. If new issues found, iterate. Cap at 3 iterations total. Without this flag, single fix pass only.
Context files (CLAUDE.md, REVIEW.md, phase state) are resolved inside the workflow via `gsd-sdk query init.phase-op` and delegated to agent via config blocks.
</context>
<process>
This command is a thin dispatch layer. It parses arguments and delegates to the workflow.
Execute the code-review-fix workflow from @~/.claude/get-shit-done/workflows/code-review-fix.md end-to-end.
The workflow (not this command) enforces these gates:
- Phase validation (before config gate)
- Config gate check (workflow.code_review)
- REVIEW.md existence check (error if missing)
- REVIEW.md status check (skip if clean/skipped)
- Agent spawning (gsd-code-fixer)
- Iteration loop (if --auto, capped at 3 iterations)
- Result presentation (inline summary + next steps)
description: Gather phase context through adaptive questioning before planning. Use --all to skip area selection and discuss all gray areas interactively. Use --auto to skip interactive questions (Claude picks recommended defaults). Use --chain for interactive discuss followed by automatic plan+execute. Use --power for bulk question generation into a file-based UI (answer at your own pace).
description: Gather phase context through adaptive questioning before planning.
description: Route freeform text to the right GSD command automatically
argument-hint: "<description of what you want to do>"
allowed-tools:
- Read
- Bash
- AskUserQuestion
---
<objective>
Analyze freeform natural language input and dispatch to the most appropriate GSD command.
Acts as a smart dispatcher — never does the work itself. Matches intent to the best GSD command using routing rules, confirms the match, then hands off.
Use when you know what you want but don't know which `/gsd-*` command to run.
</objective>
<execution_context>
@~/.claude/get-shit-done/workflows/do.md
@~/.claude/get-shit-done/references/ui-brand.md
</execution_context>
<context>
$ARGUMENTS
</context>
<process>
Execute the do workflow from @~/.claude/get-shit-done/workflows/do.md end-to-end.
Route user intent to the best GSD command and invoke it.
description: Retroactively audit an executed AI phase's evaluation coverage — scores each eval dimension as COVERED/PARTIAL/MISSING and produces an actionable EVAL-REVIEW.md with remediation plan
description: Audit an executed AI phase's evaluation coverage and produce an EVAL-REVIEW.md remediation plan.
description: Import a GSD-2 (.gsd/) project back to GSD v1 (.planning/) format
argument-hint: "[--path <dir>] [--force]"
allowed-tools:
- Read
- Write
- Bash
type: prompt
---
<objective>
Reverse-migrate a GSD-2 project (`.gsd/` directory) back to GSD v1 (`.planning/`) format.
Maps the GSD-2 hierarchy (Milestone → Slice → Task) to the GSD v1 hierarchy (Milestone sections in ROADMAP.md → Phase → Plan), preserving completion state, research files, and summaries.
**CJS-only:**`from-gsd2` is not on the `gsd-sdk query` registry; call `gsd-tools.cjs` as shown below (see `docs/CLI-TOOLS.md`).
</objective>
<process>
1.**Locate the .gsd/ directory** — check the current working directory (or `--path` argument):
description: Diagnose planning directory health and optionally repair issues
description: Diagnose planning directory health and optionally repair issues
argument-hint: [--repair]
argument-hint: "[--repair] [--context]"
allowed-tools:
allowed-tools:
- Read
- Read
- Bash
- Bash
@@ -10,6 +10,14 @@ allowed-tools:
---
---
<objective>
<objective>
Validate `.planning/` directory integrity and report actionable issues. Checks for missing files, invalid configurations, inconsistent state, and orphaned plans.
Validate `.planning/` directory integrity and report actionable issues. Checks for missing files, invalid configurations, inconsistent state, and orphaned plans.
`--context` runs an orthogonal check: the running session's context utilization. The workflow asks for the model's tokensUsed + contextWindow, calls `gsd-sdk query validate.context`, and renders one of three states:
description: Scan a repo for mixed ADRs, PRDs, SPECs, and DOCs and bootstrap or merge the full .planning/ setup from them. Classifies each doc in parallel, synthesizes a consolidated context with a conflicts report, and routes to new-project or merge-milestone depending on whether .planning/ already exists.
description: Bootstrap or merge a .planning/ setup from existing ADRs, PRDs, SPECs, and docs in a repo.
**STOP -- DO NOT READ THIS FILE. You are already reading it. This prompt was injected into your context by Claude Code's command system. Using the Read tool on this file wastes tokens. Begin executing Step 0 immediately.**
## Step 0 -- Banner
**Before ANY tool calls**, display this banner:
```
GSD > INTEL
```
Then proceed to Step 1.
## Step 1 -- Config Gate
Check if intel is enabled by reading `.planning/config.json` directly using the Read tool.
**DO NOT use the gsd-tools config get-value command** -- it hard-exits on missing keys.
1. Read `.planning/config.json` using the Read tool
2. If the file does not exist: display the disabled message below and **STOP**
3. Parse the JSON content. Check if `config.intel && config.intel.enabled === true`
4. If `intel.enabled` is NOT explicitly `true`: display the disabled message below and **STOP**
5. If `intel.enabled` is `true`: proceed to Step 2
**Disabled message:**
```
GSD > INTEL
Intel system is disabled. To activate:
gsd-sdk query config-set intel.enabled true
Then run /gsd-intel refresh to build the initial index.
```
---
## Step 2 -- Parse Argument
Parse `$ARGUMENTS` to determine the operation mode:
3. Each file must have a _meta object with updated_at timestamp
4. Use `gsd-sdk query intel.extract-exports <file>` to analyze source files
5. Use `gsd-sdk query intel.patch-meta <file>` to update timestamps after writing
6. Use `gsd-sdk query intel.validate` to check your output
When complete, output: ## INTEL UPDATE COMPLETE
If something fails, output: ## INTEL UPDATE FAILED with details."
)
```
Wait for the agent to complete.
---
## Step 4 -- Post-Refresh Summary
After the agent completes, run:
```bash
gsd-sdk query intel.status
```
Display a summary showing:
- Which intel files were written or updated
- Last update timestamps
- Overall health of the intel index
---
## Anti-Patterns
1. DO NOT spawn an agent for query/status/diff operations -- these are inline CLI calls
2. DO NOT modify intel files directly -- the agent handles writes during refresh
3. DO NOT skip the config gate check
4. DO NOT use the gsd-tools config get-value CLI for the config gate -- it exits on missing keys
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.