Commit Graph

2508 Commits

Author SHA1 Message Date
Tom Boucher
a33cbe72f5 fix(worktree): bound git subprocesses with timeout + surface degraded health (#3281) (#3283)
* test: red — bounded git subprocess + structured worktree warnings (#3281)

Regression tests for #3281: worktree-related git subprocess calls have no
timeout bound, and timeout/error outcomes are not surfaced as structured signals.

Failing assertions:
- planWorktreePrune / listLinkedWorktreePaths / snapshotWorktreeInventory must
  return reason=git_timed_out (not generic git_list_failed) when execGit returns
  timedOut:true — enables callers to distinguish timeout from auth failure
- executeWorktreePrunePlan must include timedOut:true in result when the git
  prune call itself times out

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(worktree): bounded git subprocess + structured warning surfacing (#3281)

Root cause (PRED.k014): execGit / execGitDefault called spawnSync with no
timeout, so `git worktree list --porcelain` against a hung/locked repo
blocked the parent process indefinitely.  Downstream callers in core.cjs
and verify.cjs then swallowed any resulting failure silently via
catch { /* intentionally empty */ } (PRED.k302).

Fix:
- worktree-safety.cjs: execGitDefault now passes timeout:10000 to spawnSync.
  Detects SIGTERM+ETIMEDOUT and returns { timedOut:true } in the result shape.
  readWorktreeList maps timedOut:true -> reason:'git_timed_out' (distinct from
  generic git_list_failed) so callers can emit a structured warning.
  executeWorktreePrunePlan propagates timedOut:true as a first-class result field.
- core.cjs: execGit receives the same timeout+timedOut treatment (PRED.k014
  uniform-fix discipline).  pruneOrphanedWorktrees now emits a [gsd-tools]
  WARNING to stderr when the git prune call times out instead of silent-catch.
- verify.cjs: Check 11 branches on worktreeHealth.ok to surface W018 warning
  when the worktree list times out, instead of silent-catch on ok:false.

Backward-compatible: exitCode/stdout/stderr continue to work for all existing
callers; timedOut and error are additive new fields.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* changeset: pr=3283 for #3281

* fix(verify): rename W020 for worktree-timeout warning to avoid W018 collision

W018 is already used for milestone archive drift (Check 12). The new
worktree-health-degraded timeout warning was assigned W018, causing
warning-code ambiguity in triage. Rename to W020 (next available code).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 01:53:50 -04:00
Tom Boucher
3ce6a12f30 docs: add docs/RELEASE-v1.42.0-rc.1.md (new features only) (#3280)
Companion docs page for the v1.42.0-rc.1 release tag, scoped to the new
features in 1.42.0:

- Security: package legitimacy gate against slopsquatting (#3215) — three
  layers across researcher, planner, executor; plus npx --yes hardening
  and graceful degradation when slopcheck is unavailable
- Architecture: SDK package seam deepened; runtime-global skills policy
  converged into a single Module (#3238)
- Architecture: phase lifecycle seams deepened — extracts Phase Numbering
  Policy, Phase Filesystem Adapter, and Phase Roadmap Mutation modules
  from phase-lifecycle.ts (#3267)

Fix list is intentionally omitted — those fixes are rolled up from
v1.41.1 and listed on the v1.41.1 release page; this doc links out to
both v1.41.1 and v1.41.0 instead of restating them.

Format follows the established docs/RELEASE-v*.md pattern (compact
one-paragraph intro, categorized sections, install footer, link-out to
prior train).

Closes #3279
2026-05-09 01:10:31 -04:00
Tom Boucher
6180c01a57 docs(CONTEXT.md): codify release-notes formatting standard for AI agents (#3278)
Adds a RELEASE-NOTES.* namespace under the AI Ops Memory section so future
agents editing GitHub release notes have a machine-readable contract instead
of re-deriving the format from prior releases.

Mirrors the existing dot-namespaced backticked key=value pattern (WORKTREE.SEAM.*,
PLANNING.PATH.*). Covers:

- Scope and gates per release type (hotfix / rc / minor)
- Keep-a-Changelog 1.1.0 taxonomy, heading levels, bullet shape, subgroup canon
- Footers per dist-tag stream (@latest / @next / @canary)
- Sources & precedence (changeset > commit body > PR body > commit subject)
- Workflow commands (gh release edit --notes-file)
- Anti-patterns (raw "What's Changed" list, implementation-first bullets,
  risk commentary)
- Examples: v1.41.1 hotfix, v1.42.0-rc1 RC, v1.41.0 minor auto-acceptable
- Reproducible hotfix and RC templates

Closes #3277
2026-05-09 01:08:14 -04:00
Tom Boucher
8d5f509edf fix(3266): preserve wave 0 and bucket plans by depends_on DAG in phase-plan-index (#3276)
* fix(3266): preserve wave 0 and bucket plans by depends_on DAG in phase-plan-index

Fixes two cooperating bugs in the phase-plan-index builder:

1. Wave 0 collapse: `parseInt(...) || 1` coerced parsed value `0` to `1` due to
   JS falsy default. Fixed with `Number.isNaN` guard.
2. depends_on ignored: wave-bucketing used only the `wave:` frontmatter field.
   Now replaced with Kahn's topological-level algorithm over `depends_on`:
   source nodes (no in-phase deps) → lowest level; each plan's level = max(deps'
   levels) + 1.  Declared `wave:` that disagrees with computed level emits a
   non-fatal warning on the result.  Cycle detection throws GSDError.

`PlanInfo` gains `depends_on: string[]`. `PhasePlanIndex` gains `warnings?: string[]`.
Both TS (`sdk/src/query/phase.ts`) and CJS twin (`get-shit-done/bin/lib/phase.cjs`)
fixed identically.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore: add changeset for #3276

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(phase): resolve depends_on against canonical plan id (#3276 CR)

Build a secondary `canonicalToId` index alongside `planMap` so that a
dependency declared as '03-01' resolves to a descriptive plan stored
under '03-01-auth-hardening', preventing silent wave-ordering failures.
Applied at both DAG construction sites in phase.cjs and the SDK's
phase.ts (k014 parity). Regression test added.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
v1.42.0-rc1
2026-05-09 00:25:05 -04:00
Tom Boucher
8bc255c266 fix(workstream): normalize migration workstream names (#3269)
* fix(workstream): normalize migrate-name to valid slug

* docs(context): record workstream migrate-name slug invariant

* fix(catalog-cjs): balanced fallback for unknown profile (CR finding A)

profiles[profile] could return undefined for any profile key absent from
the catalog entry, causing downstream callers like formatAgentToModelMapAsTable
to crash on .length. Add ?? profiles.balanced fallback to match the SDK adapter.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(sdk): anchor path resolution on import.meta.url not cwd (CR finding B)

resolve(process.cwd(), '..') breaks when Vitest is invoked from the repo root
because cwd is already the repo root and '..' goes one level above. Replace
with a file-relative path using fileURLToPath(new URL('../../../', import.meta.url))
anchored at the test file's location (sdk/src/query/).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: derive Group B runtime list from catalog (CR finding C)

Hardcoded ['kilo', 'cline', ...] throws TypeError if a runtime name is
removed from the catalog. Derive group B dynamically via
Object.keys(catalog.runtimeTierDefaults).filter(r => !r.opus) so the
test never goes stale and auto-covers future Group B additions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(workflow): add hermes to Step B runtime options (CR finding D)

hermes appears in the Group A built-in defaults table but was missing from
the AskUserQuestion options in Step B, forcing users to manually type it via
'Other (Group B or custom)'. Add explicit hermes entry for UI consistency.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(config): refresh dynamic_routing tier table; fix stale L671 (findings E+F)

Finding E: tier table was missing 6 heavy-tier agents and 15 standard/light
agents added by this PR. Updated all three rows to match catalog routingTier
assignments (33 agents total).

Finding F: removed stale '18 of 31' claim and agent enumeration; replaced
with accurate note that all 33 agents have explicit catalog entries. Updated
authoritative source pointers to model-catalog.cjs / model-catalog.ts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(core): add profile-fallback unit tests for quality and budget (CR nitpick G)

The PR introduced quality→opus and budget→haiku unknown-agent fallbacks but
only balanced→sonnet and inherit→inherit were tested. Add two tests covering
the remaining two branches to complete coverage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* adr: define planning workspace and worktree seam

* refactor(worktree): extract worktree safety policy module

* refactor(workstream): extract active workstream pointer store seam

* test(worktree): cover policy branch paths and persist seam guardrails

* refactor(worktree): centralize health inventory seam for W017

* fix(workspace): align SDK project path policy with CJS planningDir

* refactor(query): unify SDK planning path projection seam

* refactor(init): route workspace projection through planningPaths seam

* docs(adr): add SDK architecture and planning path ADRs

* refactor(worktree): deepen name, pointer, inventory, and config seams

* docs(config): harmonize claude-opus-4-6 to 4-7 in resolve_model_ids example (CR finding 2)

* fix(sdk): return undefined for model_profile='inherit' sentinel (CR finding 3)

* docs(adr): renumber conflicting 0003-sdk-package-seam-module to 0007, update seam-map reference (CR finding 4)

* fix(workstream): align CJS and SDK name validation to accept dots, guard path traversal via includes('..') (CR finding 5)

* fix(sdk): guard writeActiveWorkstream against non-existent workstream directory, k014/k031 parity (CR finding 6)

* chore(changeset): add #3269 changeset (CR finding 1 — proper changeset for this PR)

* docs(inventory): register 3 new CLI modules in INVENTORY.md/MANIFEST (active-workstream-store, workstream-name-policy, worktree-safety)

* fix(sdk): use relPlanningPath(workstream) in planningPaths, fix setActiveWorkstream/getActiveWorkstream name errors in workstream.ts

* fix(sdk): validate GSD_WORKSTREAM in planningPaths before use (#3269 regression)

planningPaths() called resolveWorkspaceContext() which returned GSD_WORKSTREAM
raw (no validation). An invalid value like '../evil' was used as effectiveWorkstream,
constructing a bad path; roadmapAnalyze() caught the ENOENT and returned a
no-phase_count error object instead of the root ROADMAP result.

Fix: validate envCtx.workstream with validateWorkstreamName() in planningPaths()
before accepting it as effectiveWorkstream. Invalid env → null → root .planning/
fallback, preserving the bug-2791 contract: invalid GSD_WORKSTREAM is silently
ignored and falls back to the root context (phase_count: 0 for empty root ROADMAP).

The bug-2791 regression test now passes. No other call sites read GSD_WORKSTREAM
without validation: query-runtime-context.ts already validates; cli.ts already
validates; context-engine.ts takes a caller-validated workstream parameter.

Closes #3268 (regression introduced by #3269 workstream-name-policy work).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 00:15:04 -04:00
Tom Boucher
65abc4fc90 refactor(query): deepen phase lifecycle seams (#3267)
* refactor(query): extract phase lifecycle policy module

* refactor(query): extract phase fs and roadmap mutation adapters

* fix(sdk): propagate non-ENOENT readdir errors in phase-filesystem-adapter (CR finding 1)

Swallow only ENOENT in listDirectories; rethrow EACCES, EIO, and other
unexpected errors so callers surface real failures rather than silently
treating a permission-denied phases dir as empty.

Also adds regression test: EACCES from readdir now propagates as thrown
error instead of returning [].

* fix(sdk): propagate non-ENOENT readFile errors in phase-roadmap-mutation (CR finding 4)

readModifyWriteRoadmapMd now falls back to empty content only on ENOENT;
EACCES, EIO, and other errors are rethrown so a subsequent write cannot
clobber real roadmap content that is temporarily unreadable.

Regression tests: EACCES propagates; absent ROADMAP.md still starts empty.

* fix(sdk): omit Depends on: Phase 0 for first sequential phase; align prefix grammar (CR findings 2+3)

Finding 2: buildPhaseRoadmapEntry now omits the "Depends on" line when
phaseId == 1 (prevPhase would be 0, which is not a valid predecessor).
The guard is `prevPhase < 1` so future phase-0 configs are also safe.

Finding 3: collectDecimalSuffixesFromDirNames regex prefix pattern
updated from `[A-Z]{1,6}` to `[A-Z][A-Z0-9]*` (case-insensitive flag
added), matching the grammar used by scanSequentialMaxPhaseFromDirs.
Prevents k014 parity drift for alphanumeric project-code prefixes longer
than six characters or containing digits.

Regression tests for both fixes included.
2026-05-09 00:14:59 -04:00
Tom Boucher
d8a93ad12d fix(3264): document cross-wave-deviation cleanup tail in execute-phase step 5.5 (#3273)
* fix(3264): document cross-wave-deviation cleanup tail in execute-phase step 5.5

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore(changeset): add fragment for #3273

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 00:14:54 -04:00
Tom Boucher
ac51864621 fix(3263): harden code-review SUMMARY parser; accept BL-/blocker as Critical-tier across pipeline (#3274)
* fix(3263): harden code-review SUMMARY parser; accept BL-/blocker as Critical-tier across pipeline

Bug 1: compute_file_scope Node script used ^\s*\w+: boundary regex, which excluded
hyphens and left inSection sticky after key-decisions:/patterns-established:/
requirements-completed: blocks. Prose bullets were captured as file paths. Fixed
to [\w-]+ boundary and added em-dash/parenthetical stripping with a path validity
guard so only path-shaped strings are emitted.

Bug 2: present_results grep matched only critical: in frontmatter. When reviewer
emitted blocker:, CRITICAL was silently empty. Fixed grep to accept both keys via
-E "^\s*(critical |blocker):". Top-issues preview also missed BL-* headings; fixed
to include ### BL-\ in the grep pattern.

Bug 3: gsd-code-fixer finding_parser documented CR-\d+ only. BL-* findings from
a drifted reviewer were silently dropped from critical_warning scope. Updated ID
alphabet, severity description, filter sets, and sort order to treat BL-* as
Critical-tier-equivalent to CR-*.

Reviewer contract: gsd-code-reviewer write_review step now declares blocker:/BL-
as accepted tier-equivalent alternatives to critical:/CR-, so the contract
acknowledges the reality the workflow defenses accept.

Regression tests: tests/code-review-pipeline-regression.test.cjs (18 tests)
covers all three bugs behaviourally (pure-function parsers) plus docs-parity
assertions on the workflow and agent .md files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* changeset: add fragment for PR 3274 (fix(3263) code-review parser)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(workflow): use POSIX [[:space:]] instead of \s in grep -E (CR finding 1)

BSD grep on macOS does not support \s in ERE; replace with the POSIX
[[:space:]] character class so the critical/blocker grep works on both
GNU and BSD grep. Also update the corresponding docs-parity test assertion.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: tighten em-dash and grep docs-parity assertions (CR finding 2)

- Replace `includes('split(/\\s+')` with `includes('split(/\\s+—\\s')`
  so the assertion actually enforces the em-dash narrative strip and
  cannot be satisfied by a bare whitespace split.
- Update the present_results grep assertion to expect [[:space:]] after
  the workflow portability fix.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 23:53:32 -04:00
Tom Boucher
288b3b4170 fix(3259): non-mutating --help guard for native query handlers (#3272)
* fix(3259): non-mutating --help guard for native query handlers; reject --help as milestone version

Adds a dispatcher-level guard in query-dispatch.ts that short-circuits
to a non-mutating help stub whenever --help/-h appears in args destined
for a native mutating handler (fail-closed by default). Adds defense-
in-depth in milestoneComplete to reject --help/-h as a version value
before any disk write. Regression tests cover: per-handler --help guard,
registry-driven invariant across all mutating commands, handler-level
GSDError for both flags, and preservation of the #3019 CJS fallback
contract.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore: add changeset fragment for #3272

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 23:53:27 -04:00
Tom Boucher
ecd57e622c fix(3265): prefer YAML frontmatter for state-snapshot canonical fields (#3275)
* fix(3265): prefer YAML frontmatter for state-snapshot canonical fields

stateSnapshot in both sdk/src/query/state.ts and the CJS twin
(get-shit-done/bin/lib/state.cjs cmdStateSnapshot) passed the whole
STATE.md blob to stateExtractField, whose bold pattern (**Field:**)
has no line anchor.  A body table cell such as
"**Status:** to  COMPLETE" therefore silenced the correct YAML
frontmatter value.

Fix: extractFrontmatter(content) first; stripFrontmatter(content) for
the body passed to stateExtractField; for each canonical scalar field
prefer the non-empty frontmatter value, falling back to body extraction
when the key is absent or the file has no frontmatter block at all.

Regression tests added in sdk/src/query/state.test.ts (vitest) and
tests/state.test.cjs (node:test) covering:
- frontmatter status beats **Status:** inside a table cell
- frontmatter current_plan beats bold body value
- no-frontmatter files continue to extract from body
- field absent from frontmatter falls through to body extractor

Fixes #3265

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore: add changeset for #3275

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: reproduce fmStr drops non-string YAML scalars (#3275 CR finding)

Add tests/bug-3275-fmstr-non-string-scalars.test.cjs with 5 cases covering
CJS state-snapshot with numeric frontmatter scalars (current_phase: 19,
total_phases: 7, total_plans_in_phase: 5), string regression, and
no-frontmatter body fallback regression.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(state): fmStr accepts numeric/boolean YAML scalars (CR finding)

Rename `fmStr` to `fmScalar` in both state.cjs and sdk/src/query/state.ts
and broaden the type guard so that non-null number/boolean frontmatter values
are coerced to String(v) instead of being discarded.

The previous `typeof v === 'string'` check was a latent bug: if the YAML
parser ever returns typed scalars (e.g. `current_phase: 19` as the number 19),
the frontmatter value would be silently dropped and the stale body value used
instead.  Both files are updated identically (k014 parity).

Also adds three SDK vitest regression cases (numeric current_phase,
total_phases, total_plans_in_phase) in sdk/src/query/state.test.ts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 23:53:21 -04:00
Tom Boucher
96806003c5 fix(#3229): shared model catalog source of truth for agent profiles + runtime tier defaults (#3230)
* docs(adr): add ADR-0003 model catalog module

* fix(#3229): add shared model catalog as source of truth for agent profiles and runtime tier defaults

Research / design (ADR-0003):
- Existing drift came from 4 independent model truths:
  1. CJS model-profiles.cjs
  2. SDK config-query.ts stale copy (18 agents)
  3. settings-advanced.md runtime tier table
  4. session-runner Claude-only profile map
- New design: one machine-readable Model Catalog Module in sdk/shared/
  that both packages ship and consume.

Implementation:
- sdk/shared/model-catalog.json — canonical source of truth for:
  - full 33-agent registry
  - per-agent golden (quality) alias + balanced/budget aliases
  - adaptive derivation from routingTier
  - agent→phaseType map
  - agent→dynamic-routing default tier map
  - runtime tier defaults for all supported runtimes
- get-shit-done/bin/lib/model-catalog.cjs — CJS adapter over the catalog
- sdk/src/model-catalog.ts — SDK adapter over the same catalog
- CJS model-profiles.cjs now re-exports derived data from model-catalog.cjs
- SDK config-query.ts now re-exports MODEL_PROFILES/VALID_PROFILES from
  model-catalog.ts instead of maintaining its own list
- sdk/src/query/helpers.ts runtime list now comes from the catalog (fixes hermes drift)
- sdk/src/session-runner.ts Claude profile→model-id mapping now resolves via catalog
- docs/CONFIGURATION.md + settings-advanced.md runtime tables updated to match catalog

Behavior changes:
- resolve-model now covers every shipped agent file on disk (33 agents)
- unknown-agent fallback is profile-semantic, not hardcoded sonnet:
  quality→opus, budget→haiku, balanced/adaptive→sonnet, inherit→inherit
- Group B runtimes remain known runtimes but do not get built-in tier defaults

Tests (RED→GREEN):
- root tests: shipped agent files must equal MODEL_PROFILES keys
- sdk tests: shipped agent files must equal MODEL_PROFILES keys
- direct fix assertion: gsd-code-reviewer resolves to opus under quality with no unknown_agent
- runtime defaults parity test: settings-advanced.md + CONFIGURATION.md tables must match catalog
- helper tests: hermes included in SUPPORTED_RUNTIMES and getRuntimeConfigDir()

Closes #3229

* chore(changeset): update #3229 changeset pr field to 3230

* fix(ci): update inherit fallback expectations and inventory parity for model catalog
2026-05-08 21:25:37 -04:00
Tom Boucher
deeb6deb67 fix(install): accept Codex TOML floats; idempotent rollback (#3245) (#3254)
* test: reproduce extractFrontmatter LAST-block bug (#3240)

* test: reproduce state.update progress trampling and percent formula (#3242)

Two failing regression tests:
- Bug A: state.update "Last Activity" tramples curated progress.* frontmatter via readModifyWriteStateMd → syncStateFrontmatter
- Bug B: 12 declared ROADMAP phases / 6 realized / 6/6 plans done → percent: 100 instead of 50 (phase-fraction ignored)

* test: reproduce TOML float rejection and partial rollback (#3245)

Two failing regression tests:
1. parseTomlToObject rejects valid Codex TOML floats (tool_timeout_sec = 20.0)
2. Post-install validation failure leaves skills/, agents/, VERSION on disk
   despite restoring config.toml — hybrid state after abort

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(install): accept TOML floats; idempotent codex rollback (#3245)

Two fixes for the Codex install failure introduced by #2760 CR4 finding 3:

1. parseTomlValue now accepts TOML 1.0 float literals (decimals,
   exponents, underscore separators, signed). Codex CLI's serde schema
   requires f64 for tool_timeout_sec / startup_timeout_sec — the prior
   strict-integer-only check was the inverse of what Codex requires,
   causing every config with a float to trigger a fatal schema validation
   failure. Date/time separators (-/:T/Z) are still rejected.

2. restoreCodexSnapshot is extended into a unified idempotent rollback
   that reverts ALL Codex-specific mutations on failure:
   - config.toml (existing behavior)
   - skills/gsd-* directories (new)
   - agents/gsd-*.{md,toml} files (new)
   - get-shit-done/VERSION (new)
   - orphaned atomic-write temp files (new)
   Pre-install state is captured before the first Codex write so the
   rollback reflects the true pre-GSD state. Non-gsd-* user content is
   untouched. The rollback is safe to call multiple times and before any
   snapshots are captured.

Fixes #3245

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* changeset: pr=3254 for #3245

* test: fix source-grep lint violation in bug-3242 test (#3242)

Replace content.includes() check with line-by-line parse of STATE.md body.
The lint enforces structural assertions over raw text matching.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: mark #3242 RED tests as todo pending fix (#3242)

The three failing tests are intentional regression tests for bugs in
state.cjs that will be fixed in a separate PR. Mark them { todo: true }
so they don't block CI on this branch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(install): tighten TOML underscore placement validation (CR finding 1)

The float regex used [\d_]* which accepts invalid forms like 1__0, 1_.0,
and 1._0. TOML 1.0 §2 requires underscores only between digits. Switch
both the integer pre-check and the full float pattern to (?:_?\d)* so
consecutive underscores, leading underscores on a segment, and trailing
underscores on a segment are all rejected before replace(/_/g,'') can
silently normalize them into valid JS numbers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(install): restore pre-existing gsd-* content on rollback (CR finding 2)

The snapshot only recorded names of pre-existing skills/gsd-* dirs and
agents/gsd-* files. On a failed reinstall the rollback could delete
newly-created dirs but could not restore the bytes of dirs/files that
were overwritten, leaving the user in a hybrid state (old config.toml,
new skill files).

Now snapshot the full file tree of every pre-existing gsd-* skill dir
into codexPreInstallSkillContents (Map<name, Map<relPath, Buffer>>) and
every pre-existing agent file into codexPreInstallAgentContents
(Map<filename, Buffer>). restoreCodexSnapshot() uses these maps to
wipe-and-restore overwritten entries and only removes entries that had
no pre-install state, giving a true atomic rollback guarantee.
Reads are best-effort so a partial snapshot is still better than none.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(install): scope temp-file cleanup to installer-owned writes (CR finding 3)

_cleanTmpFiles() was deleting any *.tmp-<pid>-<n> file found under
targetDir. This is too broad: other tools in the user's Codex/home
directory may create temp files matching the same suffix pattern, and a
GSD install rollback would silently delete them.

Add __atomicWrittenTmps (a module-level Set<string>) populated by
atomicWriteFileSync for every temp path it creates. _cleanTmpFiles()
now checks __atomicWrittenTmps.has(full) before unlinking, so only temp
files this installer process actually wrote are eligible for cleanup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): remove no-op doesNotThrow wrapping try/catch (CR finding 4)

assert.doesNotThrow(() => { try { f(); } catch(_){} }) always passes
because the catch block swallows every exception before the outer
assertion can see it. This meant the rollback-idempotency guarantee was
never actually verified.

Replace with an explicit threw flag around runCodexInstall, assert that
the install did throw (validation failure is expected), and add a
post-rollback state assertion that skills/ was not created. This gives
a loud failure surface if runCodexInstall starts crashing from inside
the rollback path, matching the intent described in the test comment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): correct describe title for float-acceptance tests (CR nitpick 1)

The describe block title said 'rejects malformed input that previously
slipped through', but the test inside now asserts that TOML floats are
accepted (the #3245 inversion). This misled readers expecting every
sub-test to assert rejection. Update the title to reflect the mixed
behaviour: floats are accepted; dates and trailing-garbage are rejected.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): rename test to match what the assertion actually checks (CR nitpick 2)

The test name 'post-install config retains float literal form (20.0 not
truncated to 20)' promised a string-form invariant, but the assertion
uses numeric equality (assert.strictEqual(parsed.tool_timeout_sec, 20))
which cannot distinguish 20 from 20.0 in JS. Rename to 'post-install
config round-trips tool_timeout_sec as numeric 20' so the description
matches what the test actually verifies.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): replace raw text scan with state json assertion (CR nitpick 3)

The 'Last Activity updates the body field' test was reading STATE.md as
raw text, splitting on newlines, and using lines.find/startsWith to
locate the 'Last Activity:' line — the exact pattern-match-on-source
approach prohibited by the no-source-grep testing standard.

Replace with runGsdTools('state json', tmpDir) which surfaces the body-
extracted Last Activity value as fm.last_activity in its JSON output,
and assert against that structured field instead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): correct post-rollback state assertion for early-failure case

The previous assertion checked that skills/ didn't exist, but the
installer writes skills/ before the schema validator fires. Rollback
removes gsd-* dirs inside skills/, not skills/ itself. Update the
assertion to verify that no gsd-* skill dirs survive rollback, which
is the actual invariant the test name describes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* changeset: document full rollback scope (CR finding 1)

Adds config.toml restoration and orphaned atomic-write temp-file
cleanup to the changeset description — the previous text only listed
skills/, agents/, and VERSION.

* fix(install): wrap post-snapshot scope in rollback handler (CR finding 2)

Any throw between the pre-install snapshot capture and the Codex config
block (skills copy, agents copy, VERSION write, manifest write, leaked-
path scan, etc.) now triggers _codexPreConfigRollback() so the caller
is never left in a partially-installed state.  Previously only the later
config.toml mutation paths had rollback wired in.

Introduces _codexPreConfigRollback (defined right after snapshot capture)
and wraps the intervening operations in a try/catch that invokes it on
error for Codex installs; non-Codex paths are unaffected.

* test: assert threw=true to prevent vacuous pass (CR finding 4)

Two tests used bare try/catch without asserting threw === true, so they
would silently pass even if runCodexInstall never threw (k060 pattern).
Each bare catch block is replaced with a threw flag and a
strictEqual(threw, true, ...) assertion.

CR findings 2+3 are both addressed in the preceding install commit:
finding 3 (restore from snapshot manifest, not current FS state) lands
alongside the rollback-wrapper change as part of the restoreCodexSnapshot
refactor.

* fix(install): reject leading zeros in TOML float integer part per TOML 1.0 (CR finding round 4)

TOML 1.0 §2 disallows leading zeros in the integer part of numeric
literals — `01`, `00`, `01.5`, `00e2`, `+01.0`, `-01.0` are all invalid.
The pre-check and float regexes in parseTomlValue used `\d(?:_?\d)*` which
accepted any digit as the leading digit.

Both regexes are tightened to `(0|[1-9](?:_?\d)*)` for the integer part:
- `0` alone is valid
- a non-zero leading digit followed by optional underscored digits is valid
- `01`, `00`, and any variant with a leading zero and further digits is rejected

The "still rejects bare time (07:32:00)" test assertion is broadened from
`/unsupported TOML value/` to `/unsupported TOML value|trailing bytes/`
because the parser now stops at `0` and the remainder `7:32:00` is rejected
as trailing bytes — the invariant (time literals are not accepted) is unchanged.

25 new regression tests cover all rejection cases and valid TOML forms.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 10:25:59 -04:00
Tom Boucher
c4d3fe62a5 fix(install): require persistent SDK reachability before reporting ready (#3231) (#3249)
* test: reproduce false GSD SDK ready signals on Linux (#3231)

* fix(install): require persistent SDK reachability before reporting ready (#3231)

* changeset: pr=3249 for #3231

* fix(install): filter _npx from login-shell PATH probe (CR finding 1)

Apply filterNpxFromPath() to the getUserShellPath() result before passing
it to isGsdSdkOnPath(), mirroring the same filtering already applied to
process.env.PATH. Without this, a transient _npx entry in the login-shell
PATH can falsely satisfy the cross-shell reachability check and reintroduce
the false-ready condition this PR fixes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): unconditional legacy-shim replacement assertion (CR finding 2)

Replace readFileSync+includes source-grep check with isLegacyGsdSdkShim()
and add an else branch asserting that when sdkReady is false, a warning/error
was emitted. Previously the sdkReady===false path had no assertion at all,
allowing the test to pass without verifying any postcondition.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: replace text-grep assertions with structured ones (CR finding 2 + nitpick)

Finding 2: restructure the legacy-shim replacement assertion to branch on
isLegacyGsdSdkShim() state (a behavioral fact) rather than console output,
and add an unconditional postcondition for both branches.

Nitpick 3 (4 locations):
- lines 149-153: replace /GSD SDK ready/.test(combined) with
  isGsdSdkOnPath(filterNpxFromPath(PATH)) === false
- lines 167-169, 185-189: split filterNpxFromPath result into segments array
  and use array.includes() instead of string.includes() on the raw PATH string
- lines 375-377: replace /GSD SDK ready/.test(combined) with
  fs.existsSync(shimPath) + isGsdSdkOnPath(filterNpxFromPath(localBin))

All 8 tests pass. lint-no-source-grep: 0 violations.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(build-hooks): per-PID staging dir eliminates concurrent-cleanup TOCTOU race

When multiple test before() hooks spawned build-hooks.js concurrently
(--test-concurrency=4), a race existed: Process A would finish all copies,
call rmdirSync('.dist-staging/') in cleanup, then Process B — still in its
copy loop — would call copyFileSync(src, '.dist-staging/hook.pid.ts') and
get ENOENT because the staging directory was gone.

On macOS/Linux, copyFileSync reports the SOURCE path in ENOENT errors when
the destination directory is missing, making the failure appear to be a
missing source file (hooks/gsd-statusline.js) rather than a missing
destination directory. This misled the diagnosis.

Fix: make STAGE_DIR per-PID ('.dist-staging-<pid>/') so each builder owns
its own staging directory. No other process touches it, eliminating all
contention on staging-dir creation and cleanup. Update .gitignore to match
the new 'hooks/.dist-staging-*/' glob.

Reproduces as: CI test matrix (macos-24, ubuntu-22, ubuntu-24) all failing
with ENOENT on hooks/gsd-statusline.js in bug-2136 before() hook. The new
test file added in this PR (bug-3231) shifts the concurrency schedule just
enough to expose the race on every CI run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: assert on captured console output, not tautological PATH state (CR finding)

The two discarded `captureConsole()` return values in the bug-3231 test
were flagged by CodeRabbit as tautological assertions. Fix:

- Test 1 (transient _npx PATH): capture stdout/stderr and assert the
  installer does NOT emit "GSD SDK ready" (the false-positive the PR
  fixes), and that it does emit some diagnostic output instead.

- Test 3 (clean install): capture stdout/stderr and assert the installer
  DOES emit "GSD SDK ready" after successfully self-linking into a
  persistent PATH dir — confirming the positive path works correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 09:39:33 -04:00
Tom Boucher
75cc4fe660 fix(state): count nested plans/ files in buildStateFrontmatter (#3257) (#3261)
* test: reproduce nested plans/ undercount in buildStateFrontmatter (#3257)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(state): count nested plans/<N>-PLAN-<NN>-<slug>.md in buildStateFrontmatter (#3257)

`buildStateFrontmatter` did a flat `readdirSync` on each phase directory and
missed plan files inside the nested `plans/` subdirectory written by
gsd-plan-phase (post-#3139 / #3115). Every state mutation flowing through
`syncStateFrontmatter` overwrote the curated `progress.*` frontmatter block
with the under-counted disk scan.

The fix adds a `plans/` descent using the same regex shapes as
`roadmap.cjs:countPhasePlansAndSummaries` and `phase.cjs:looksLikePlanFile`
(#2893/#3128). Both the `{N}-PLAN-{NN}-{slug}.md` (agent-emitted) and
`PLAN-{NN}-{slug}.md` (bare-prefix) forms are now matched. Outline files
(`-PLAN-OUTLINE.md`) and pre-bounce files are excluded. Flat-layout repos
are unaffected.

Note: the same algorithm now lives in 4 places (state.cjs, roadmap.cjs,
init.cjs, phase.cjs). Shared-helper extraction per CONTEXT.md k014 is
tracked in the follow-on issue filed with this PR.

Sibling fix to #3115 / #3139 / #3191 — state.cjs was missed in the
post-#3139 migration that updated the other three files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* changeset: pr=3261 for #3257

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(changelog): add entry for #3257 nested plans/ fix (#3261)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(state): broaden PLAN_PRE_BOUNCE_RE to match bare PLAN- prefix (CR)

PLAN_PRE_BOUNCE_RE was /-PLAN.*\.pre-bounce\.md$/i, which missed bare-prefix
files like PLAN-01-foo.pre-bounce.md in the nested plans/ scan — those would
incorrectly count as real plans. Broadened to /\.pre-bounce\.md$/i to exclude
any .pre-bounce.md file regardless of prefix shape.

Adds regression test for this exclusion.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(state): extend nested plans scan to cmdStateValidate and cmdStateSync (CR finding)

`buildStateFrontmatter` already received the nested-aware scan in this PR, but
`cmdStateValidate` and `cmdStateSync` still did flat-only `readdirSync` on the
phase root, producing false plan-count drift warnings and under-counted totals
on `phases/<N>/plans/` repos. Extend the identical scan pattern to both sites
(regex byte-identical to the `buildStateFrontmatter` site, k014). Regression
tests added for all three commands.

Closes #3257

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(bug-3257): replace readFileSync+.includes() with structural dry-run idempotency check

The lint-no-source-grep rule flags readFileSync-bound variables used with
text-match methods (.includes, .match, etc.). Replace the afterContent.includes()
check with a structural idempotency assertion: run state sync --verify twice and
confirm the second run still reports a pending change, proving the first dry-run
did not mutate STATE.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(bug-3257): fix progress assertion to use min(plan,phase) formula (#3242)

After rebasing onto main, computeProgressPercent now applies
min(plan_fraction, phase_fraction) per #3242 Bug B. Update the
multi-phase sync test to assert 50% (min(3/5, 1/2)) instead of 60%.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 09:26:35 -04:00
Tom Boucher
b37c487325 feat(security): package legitimacy gate against slopsquatting (#3215)
* feat(security): package legitimacy gate against slopsquatting (#2827)

GSD's research → plan → execute pipeline had no install-time legitimacy
gate: a hallucinated package name that passes `npm view` could flow all
the way to `gsd-executor` running `npm install <malicious-pkg>` with no
human checkpoint. This PR closes that gap.

Changes:
- gsd-phase-researcher: runs slopcheck on every recommended package;
  emits `## Package Legitimacy Audit` table; strips [SLOP] packages;
  ecosystem-specific verification (pip/npm/cargo); WebSearch-sourced
  packages tagged [ASSUMED]; ctx7 fallback uses `command -v` guard
  instead of `npx --yes`
- gsd-planner: injects `checkpoint:human-verify` before [ASSUMED]/[SUS]
  installs; adds T-{phase}-SC STRIDE row to <threat_model> template;
  ctx7 fallback also uses `command -v` guard
- gsd-executor: RULE 3 excludes package installs from auto-fix; failed
  installs surface as checkpoints, never silent substitutions
- tests/package-legitimacy-gate.test.cjs: 24 structural assertions
  covering the full gate (node:test + node:assert, no raw .includes())
- docs: USER-GUIDE, COMMANDS, ARCHITECTURE updated with gate description
- .changeset: Security fragment for v1.51 release notes

Closes #2827

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: expand Package Legitimacy Gate documentation

Add full user-facing depth to the gate docs across USER-GUIDE,
COMMANDS, and ARCHITECTURE:

- USER-GUIDE: rewrite gate section with concrete RESEARCH.md/PLAN.md
  examples, slopcheck verdict table, [ASSUMED] WebSearch tagging
  explanation, slopcheck-unavailable troubleshooting, and graceful
  degradation behavior
- COMMANDS.md: expand /gsd-plan-phase gate note with verdict bullets;
  add install-failure checkpoint behavior to /gsd-execute-phase
- ARCHITECTURE.md: expand gate section with threat model rationale,
  layer table, claim provenance integration, ecosystem coverage, and
  graceful degradation semantics

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(security): harden package legitimacy checkpoint semantics

* fix(planner): satisfy size gates and tighten package gate wording

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 09:08:06 -04:00
Tom Boucher
397c34142a Deepen SDK package seam and converge runtime skills policy (#3238)
* Deepen SDK package seam and converge runtime skills policy

* fix(sdk): unified install-root resolution for workflows and agents (CR finding 1)

Use the already-resolved gsdInstallDir constant instead of calling
resolveLegacyInstallDir() again when computing agentsDir, ensuring
workflowsDir and agentsDir share the same install root.

* fix(sdk): tilde shortening requires path-boundary match (CR finding 2)

Both renderGlobalSkillsBaseDisplayPath and renderGlobalSkillDisplayPath
used startsWith(home) which could incorrectly shorten unrelated paths
sharing the same prefix. Now checks for home === base or
base.startsWith(home + sep) to ensure a real directory boundary.

* fix(sdk): validate loadConfig export before invocation (CR finding 3)

After requiring core.cjs, check typeof mod.loadConfig === 'function'
before calling it. Throws a classified GSDError with the module path
if the export is missing, rather than a generic TypeError.

* fix(test): guard root lookup before .path dereference (CR finding 4)

Added assert.ok() guards for claudeRoot and codexRoot after the .find()
calls so that a missing root produces an explicit assertion failure
rather than a TypeError on .path dereference.

* fix(ci): fail-safe on transient API errors in approval dismissal (CR finding 6)

resolveRole() returns 'unknown' for non-404 errors (rate limits, 5xx,
network blips). shouldDismissReviewer() now treats 'unknown' as
unresolvable and skips dismissal, preventing legitimate approvals from
being dismissed due to a transient API failure. Only 'none' (true 404)
is treated as a confirmed non-collaborator.

* changeset: pr=3238 SDK package seam and runtime skills convergence

* fix(sdk): harden resolveGlobalSkillDir against path traversal (CR finding 1)

Use resolve+relative to validate that skillName cannot escape the global
skills base directory. Values like "../../foo" or absolute paths now
return null instead of joining directly. All imports (resolve, relative,
isAbsolute) were already present in helpers.ts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(sdk): split skill-dir-resolution and skill-not-found warnings (CR finding 2)

After resolveGlobalSkillDir's hardening can return null for traversal
attempts, the old single-branch warning "Global skill not found at ..."
was misleading. Split into two distinct cases:
- skillDir === null → "Could not resolve global skill directory for ..."
- skillMd missing → "Global skill not found at ..."

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: lock skill path-traversal rejection in resolveGlobalSkillDir

Regression test verifying that traversal segments (../../foo, ../escape),
empty string, and absolute paths are all rejected (return null), while
a legitimate skill name resolves correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(sdk): align display-path contract + traversal coverage for resolveGlobalSkillMarkdownPath (CR nitpicks)

- renderGlobalSkillsBaseDisplayPath now returns a non-null string for
  unsupported runtimes (e.g. cline → "(cline does not use a skills directory)")
  matching the existing renderGlobalSkillDisplayPath contract; callers
  of both helpers no longer need null-checks for unsupported runtimes.
- Remove now-redundant ! non-null assertion on renderGlobalSkillsBaseDisplayPath
  calls in skill-manifest.ts (return type is string, not string | null).
- Extend the path-traversal test block to assert resolveGlobalSkillMarkdownPath
  also propagates null for ../../foo, ../escape, empty, and /abs/path inputs,
  locking the null-propagation contract against future refactors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 09:06:43 -04:00
Tom Boucher
924c697097 docs: replace retired /gsd-intel with /gsd-map-codebase --query (#3258) (#3260)
* test: forbid stale /gsd-intel references in workflow/reference docs (#3258)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: replace retired /gsd-intel with /gsd-map-codebase --query (#3258)

Fixes 5 stale references across the two primary source files called out in
the issue. PR #2790 folded /gsd-intel into /gsd-map-codebase --query; these
prose surfaces were not updated at that time.

Fixes #3258

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: fix additional stale /gsd-intel references found in adversarial sweep (#3258)

Sweep found 7 more occurrences in docs/INVENTORY.md (x2), docs/USER-GUIDE.md (x4),
docs/FEATURES.md (x2), and agents/gsd-intel-updater.md (x2). All replaced with
/gsd-map-codebase --query. The gsd-intel-updater agent name itself (without leading
slash) is intentionally preserved.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* changeset: pr=3260 for #3258

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: fail loudly on unreadable files in bug-3258 regression scan (CR finding)

Replace silent early-return on readFileSync failure with an explicit
throw so unreadable files surface as test failures rather than skipped
coverage gaps.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 09:06:37 -04:00
Tom Boucher
f5fe5bc063 fix(config): allow model_overrides.<agent-id> in config-set (#3227) (#3253)
* test: reproduce config-set rejecting model_overrides.<agent-id> (#3227)

* fix(config): allow model_overrides.<agent-id> in config-set (#3227)

* changeset: pr=3253 for #3227
2026-05-08 08:40:53 -04:00
Tom Boucher
6299b9181f fix(state): preserve curated progress on body-only updates; correct percent formula (#3242) (#3252)
* test: reproduce state.update progress trampling and percent formula (#3242)

Two failing regression tests:
- Bug A: state.update "Last Activity" tramples curated progress.* frontmatter via readModifyWriteStateMd → syncStateFrontmatter
- Bug B: 12 declared ROADMAP phases / 6 realized / 6/6 plans done → percent: 100 instead of 50 (phase-fraction ignored)

* fix(state): preserve curated progress on body-only updates; correct percent formula (#3242)

Bug A: readModifyWriteStateMd now accepts { resync: false } to preserve existing
frontmatter progress.* when only body text changes. cmdStateUpdate passes this flag
since it only replaces a body field and must not trample manually-curated
cross-milestone counters.

Bug B: extract computeProgressPercent() helper — shared by buildStateFrontmatter and
cmdStateSync — that applies min(plan_fraction, phase_fraction). When ROADMAP declares
more phases than are realized on disk, phase_fraction caps percent so 22/22 plans
done with only 6/12 phases gives 50%, not a false 100%.

* changeset: pr=3252 for #3242

* fix(test): replace content.includes with structured state json assertion (#3242)
2026-05-08 08:40:47 -04:00
Tom Boucher
985e0d5ea9 fix(capture): restore one-shot --seed contract (#3236) (#3250)
* test: lock one-shot --seed capture contract (#3236)

* fix(capture): restore one-shot --seed contract (#3236)

* changeset: pr=3250 for #3236

* fix(capture): define $KEYWORD from $IDEA in collect-breadcrumbs step

* fix(workflow): add MD040 language identifiers to plant-seed code blocks (CR finding)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(workflow): wire --enrich path to skip parse-idea and target resolved seed (CR findings)

- parse-idea now detects --enrich SEED-NNN in $ARGUMENTS, sets $ENRICH_TARGET
  and $SEED_FILE, and skips the interactive prompt + all capture steps entirely
- When $ARGUMENTS is non-empty but has no --enrich flag, uses it as $IDEA inline
- enrich-seed step derives $SEED_ID from $ENRICH_TARGET (already resolved by
  parse-idea) and falls back to most-recent seed if $SEED_FILE is empty
- Enrichment commit now uses ${SEED_ID} in message and "$SEED_FILE" as --files,
  targeting the resolved seed rather than the current capture-context path

Fixes CR findings on PR #3250 (Finding A lines 19-27, Finding B lines 132-133, 180-183)

* fix(workflow): add bash extraction for \$KEYWORD from \$IDEA (CR finding)

The collect-breadcrumbs step documented that \$KEYWORD should be derived
from \$IDEA, but provided no code to perform the extraction. Add a bash
block that lower-cases \$IDEA, strips punctuation, and picks the first
token longer than 2 characters, with a "seed" fallback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 08:40:41 -04:00
Tom Boucher
97bde8615f fix(cjs): accept dotted canonical command form (#3243) (#3248)
* test: reproduce CJS dispatcher rejecting dotted form (#3243)

runGsdTools assertions confirm that generate-slug.hello-world,
current-timestamp.date, validate.plan, roadmap.analyze, phases.list, and
check.decision-coverage-plan all fail with "Unknown command: <dotted>" —
the dispatcher switch only accepts the spaced form.

Edge cases (no dots unchanged, leading-dot rejected, unknown dotted form
suggests spaced equivalent) are also specified; those three pass already
because the shim is not yet implemented.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cjs): accept dotted canonical command form (#3243)

Add a shim at the top of main() in gsd-tools.cjs that splits args[0] on
the first dot when present, normalizing "state.update" → command='state'
args=['state','update',...] before the switch statement is reached.

Any caller that bypasses the SDK (stale npm-installed binary, workflow
shell-out, third-party script) can now use the canonical dotted form
natively without hitting "Unknown command: <domain>.<subcommand>".

The shim guards against empty head/rest so ".hidden" and bare "." args
are unchanged and fall through to the existing "Unknown command" path.

Also improves the default "Unknown command" error message to suggest
the spaced equivalent when a dotted form was passed — e.g. for "foo.bar"
the error now reads: Unknown command: foo — did you mean: "foo bar"?

Parallel to dottedCommandToCjsArgv in sdk/src/query/query-fallback-bridge-adapter.ts;
intentionally kept separate to avoid SDK coupling.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* changeset: pr=3248 for #3243

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: tighten dotted-form suggestion assertion (CR nitpick)

* fix(cjs): suggestion uses first-dot split (CR finding 1, multi-dot consistency)

The "did you mean" hint in the Unknown-command default case was replacing ALL
dots with spaces (state.update.foo → "state update foo"), but the dispatcher
shim only splits on the FIRST dot (state.update.foo → head=state, rest=update.foo).
Apply CR's exact patch to use indexOf+slice so suggestion matches dispatch
behavior. Add a multi-dot regression test (a.b.c must suggest "a b.c", not
"a b c").

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 08:40:36 -04:00
Tom Boucher
b2f0fdf250 fix(sdk): anchor extractFrontmatter at file start (#3240) (#3247)
* test: reproduce extractFrontmatter LAST-block bug (#3240)

* fix(sdk): anchor extractFrontmatter at file start (#3240)

* changeset: pr=3247 for #3240
2026-05-08 08:40:30 -04:00
Tom Boucher
447763411a fix(sdk): phase.add honors --dry-run; rejects unknown flags (#3226) (#3246)
* test: reproduce phase.add dry-run + flag validation gaps (#3226)

Add failing tests for:
- --dry-run silently absorbed into description (symptom A)
- Unknown --flag should return validation error (symptom C)
- ### Phase N: ROADMAP heading scan verification (symptom B)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(sdk): phase.add honors --dry-run; rejects unknown flags (#3226)

- Add flag parser to phaseAdd: strip recognized flags (--dry-run) from
  args before positional parsing so they never silently become description
  or customId values
- --dry-run computes the next phase number and roadmap_entry string but
  skips mkdir, writeFile, and readModifyWriteRoadmapMd; returns
  { dry_run: true, roadmap_entry } alongside normal fields
- Any unrecognized --flag throws a Validation GSDError naming the flag
- ROADMAP ### Phase N: heading scan for numbering (symptom B) was already
  correct; verified with new regression test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* changeset: pr=3246 for #3226

* fix(sdk): phase.add scans disk AND roadmap (union, not fallback)

Address CodeRabbit finding: the conditional `if (maxPhase === 0)` guard
around the filesystem scan meant that if ROADMAP had any phases but disk
was ahead (e.g. ROADMAP max=10, dirs include 12-*), phase.add would
pick 11 and collide with the existing directory.

Remove the guard: always scan on-disk phase directories and take the
max across both ROADMAP and filesystem (union semantics).

All 57 phase-lifecycle tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: reproduce phase.add concurrent ID collision (CR finding)

Two concurrent phase.add calls against the same project observe
maxPhase before the lock is held, producing duplicate phase IDs.
Adds a Promise.all regression test that asserts both calls succeed
with distinct phase numbers {11, 12}.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(sdk): compute phase number under roadmap lock (CR finding)

Move maxPhase/newPhaseId/dirName computation inside the
readModifyWriteRoadmapMd callback so the entire read → compute → write
cycle is serialised under the lock. Previously, two concurrent
phase.add calls could both observe maxPhase=N before either acquired
the lock, then both write with phase ID N+1 — producing duplicate IDs.

In dry-run mode (no write, no race) the computation still happens
outside the lock to avoid unnecessary contention.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 08:40:24 -04:00
Tom Boucher
73f7ad33e8 ci: limit unauthorized approval dismissal to open PRs 2026-05-07 14:10:52 -04:00
Tom Boucher
9ae2b2abae ci: batch unauthorized approval sweep 2026-05-07 14:01:05 -04:00
Tom Boucher
66e686d1fd ci: add workflow to dismiss unauthorized PR approvals 2026-05-07 13:50:41 -04:00
Tom Boucher
d385419ac4 docs: update CLAUDE.md agent skills block (was gitignored) 2026-05-07 09:12:32 -04:00
Tom Boucher
48b01e4c9f docs(agents): scaffold docs/agents/ skill config files
- docs/agents/issue-tracker.md — GitHub, gsd-build/get-shit-done, .envrc token required
- docs/agents/triage-labels.md — confirmed=AFK-ready, approved-*=human-ready, needs-reproduction=needs-info
- docs/agents/domain.md — single-context, CONTEXT.md sections explained
- CLAUDE.md — fix stale triage label (needs-maintainer-review doesn't exist),
  fix stale domain note ('neither exists yet'), add .envrc token reminder to issue tracker summary
2026-05-07 09:12:24 -04:00
Tom Boucher
e3b52c70bb fix(docs): replace deleted /gsd-new-workspace with /gsd-workspace --new in FEATURES.md (#3221)
Feature 129 (Issue-Driven Orchestration Guide) referenced the deleted command
/gsd-new-workspace. Replace with its v1.40.0 successor /gsd-workspace --new to
fix the stale-ref test introduced in tests/bug-3042-3044-research-flag-and-stale-refs.

Fixes #3220

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 00:26:24 -04:00
Tom Boucher
c0be29607a docs: v1.41.0 release documentation — CHANGELOG promotion, release notes, FEATURES update (#3219)
- Promote CHANGELOG [Unreleased] → [1.41.0] - 2026-05-07; add fresh [Unreleased] header
- Fix CONFIGURATION.md version labels: 'added in v1.40' → 'added in v1.41' for models and dynamic_routing
- Create docs/RELEASE-v1.41.0.md in compact v1.39.0 bullet format
- Rewrite docs/RELEASE-v1.40.0-rc.1.md to compact bullet format (removes wall-of-text entries)
- Add docs/FEATURES.md v1.41.0 section (features 126–131: per-phase models, dynamic routing, update banner, issue-driven orchestration, graphify staleness, MVP SDK verbs)
- Update docs/FEATURES.md TOC
- Trim README "Notable extras" table (highlight page, not a command menu)

Fixes #3218

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 00:19:26 -04:00
Tom Boucher
0ed360e652 Merge pull request #3216 from gsd-build/fix/ci-bug-2136-sh-hook-version
fix(build-hooks): atomic rename to prevent race with concurrent install reads
v1.41.0
2026-05-06 23:53:31 -04:00
Tom Boucher
f4c4ec6211 docs(build-hooks): correct staging-dir cleanup comment
The previous comment claimed "rmdir-on-non-empty is a no-op" — that is
factually wrong. fs.rmdirSync throws ENOTEMPTY on non-empty directories.
The actual race-safety mechanism is:
1. fs.readdirSync(STAGE_DIR) -> leftovers
2. fs.rmdirSync(STAGE_DIR) only when leftovers.length === 0
3. Outer try/catch swallows TOCTOU ENOTEMPTY (peer added a file
   between readdir and rmdir) and ENOENT (peer already cleaned up).

Comment now references the leftovers variable and both fs calls so a
future reader can map narrative to code without reverse-engineering it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 23:50:52 -04:00
Tom Boucher
c47c2c5def fix(build-hooks): handle Windows EPERM/EBUSY on rename, fall back to copy
POSIX rename(2) atomically replaces dest even when readers hold open
handles. Windows MoveFileEx (which fs.renameSync uses with
MOVEFILE_REPLACE_EXISTING) cannot — it throws EPERM/EBUSY when another
process has the destination open. Concurrent install.js readers and
antivirus scanners are realistic triggers; both release within ms.

renameAtomicWithRetry() preserves the bare renameSync call on POSIX
(no overhead) and on Windows retries up to 4 times with 10/30/90/270ms
backoff, then falls back to copyFileSync + unlinkSync. If even copy
fails because dest is hard-locked, log a non-fatal warning and leave
the prior dest in place — a subsequent build retries from a fresh
state. The build no longer crashes on Windows transient locking.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 23:48:31 -04:00
Tom Boucher
b54d986550 chore(changeset): add pr: 3216 to build-hooks-atomic-write changeset
The changeset parser hard-fails on fragments without a pr: field.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 23:35:16 -04:00
Tom Boucher
c4f11db5e9 fix(build-hooks): atomic rename to prevent race with concurrent install reads
scripts/build-hooks.js used fs.copyFileSync (truncate-then-write, non-atomic).
Under --test-concurrency=4, multiple builder invocations raced; a parallel
install.js subprocess could readFileSync between truncate and write and
observe an empty file, then write that emptiness into the install target.
Surfaced as the release-blocking bug-2136-sh-hook-version part 4 failure on
main even though the same SHA passed every install-smoke matrix entry.

Fix: stage outputs to hooks/.dist-staging/ then fs.renameSync into hooks/dist/.
POSIX rename(2) is atomic, so concurrent readers always observe a complete
file. The existing bug-2136 part 4 test locks the post-fix invariant.

Failing run: https://github.com/gsd-build/get-shit-done/actions/runs/25472202941/job/74738276687

Closes #3214

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 23:34:29 -04:00
Tom Boucher
304c1a1302 Merge pull request #3202 from gsd-build/fix/3181-node-cellar-path
fix(install): prefer stable Homebrew symlinks over versioned Cellar node paths
2026-05-06 22:09:47 -04:00
Tom Boucher
739b95ef80 fix(install): normalize Homebrew node@NN Cellar paths 2026-05-06 22:06:56 -04:00
Tom Boucher
69aa7ec04e fix(install): prefer stable Homebrew symlinks over versioned Cellar paths in node runner
process.execPath on Homebrew resolves symlinks and returns the versioned
Cellar path (e.g. /usr/local/Cellar/node/25.8.1/bin/node). After
brew upgrade node, the old Cellar binary fails with dyld: Library not
loaded because shared libraries have changed SOVERSION.

- Add normalizeNodePath() helper that maps Cellar paths to stable
  Homebrew symlinks (/usr/local/bin/node or /opt/homebrew/bin/node)
- resolveNodeRunner() now calls normalizeNodePath() before quoting
- rewriteLegacyManagedNodeHookCommands() also normalizes baked Cellar
  runner paths in existing hook commands so reinstall doesn't re-bake them
- Export normalizeNodePath for testability
- Add 22 tests covering all cases (Cellar paths, stable symlinks,
  NVM, system node, Windows, null/empty, both function surfaces)

Closes #3181

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 22:06:56 -04:00
Tom Boucher
ff832089bf Merge pull request #3207 from gsd-build/fix/3196-workstream-milestone-op
fix(query): workstream resolution in init.milestone-op and roadmap.analyze
2026-05-06 22:04:36 -04:00
Tom Boucher
8054959417 fix(query): workstream resolution in init.milestone-op and roadmap.analyze (#3196)
- initMilestoneOp now accepts and propagates the workstream parameter:
  relPlanningPath(workstream) replaces the hardcoded '.planning' dir,
  getMilestoneInfo gets workstream passed, extractCurrentMilestone gets
  workstream passed, archiveDir is derived from planningDir not root.

- resolveQueryRuntimeContext now reads .planning/active-workstream as a
  third-priority fallback after --ws flag and GSD_WORKSTREAM env var,
  completing the documented resolution chain for all query handlers.

Closes #3196

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 22:01:45 -04:00
Tom Boucher
608da536fd Merge pull request #3203 from gsd-build/fix/3033-sdk-install-wire
fix(install): wire --sdk flag into installSdkIfNeeded
2026-05-06 22:01:10 -04:00
Tom Boucher
6c321b0765 test(install): rethrow unexpected soft-skip errors 2026-05-06 21:55:45 -04:00
Tom Boucher
2bc49b0aec fix(install): wire --sdk flag into installSdkIfNeeded (#3033)
hasSdk was parsed in bin/install.js but never passed to
installSdkIfNeeded, so `npx get-shit-done-cc@latest --sdk` silently
skipped SDK deployment via the isLocal early-return and emitted a
misleading "✓ GSD SDK ready" message.

installSdkIfNeeded now accepts opts.forceSdk. When true (set from
hasSdk at the call site in installAllRuntimes), the local-install
soft-skip is bypassed so the full shim-link path runs regardless of
install mode. When dist is also missing with forceSdk=true, the
fail-fast diagnostic fires instead of silently returning.

The #2678 soft-skip (isLocal + missing dist + no --sdk) is preserved.

Closes #3033

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 21:55:45 -04:00
Tom Boucher
f4d0208abb fix(config): regression test and changeset for #3197 gsd-tools config-whitelist (#3208)
* fix(config): add regression test and changeset for #3197 CJS whitelist fix

The underlying fix (RUNTIME_STATE_KEYS in config-schema.cjs) was already
applied to main via #3162. This PR adds the regression test that would have
caught the drift had it been present — verifying the CJS path end-to-end —
and the changeset fragment to formally close #3197.

Closes #3197

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(config): isolate tmpDir per test for cleanup

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 21:51:42 -04:00
Tom Boucher
2d32ad82be fix(plan-phase): remove agent: directive that caused OpenCode subagent dispatch (#3156) (#3206)
* feat(roadmap): parse **Mode:** field on phase sections

Adds a 'mode' field to roadmap.get-phase and roadmap.analyze outputs.
Recognizes '**Mode:** mvp' lines in phase sections; lowercased + trimmed.
Forward-compat: unrecognized values preserved verbatim, no enum check.

Foundation for --mvp flag in plan-phase (PRD: vertical-mvp-slice).

* feat(plan-phase): parse --mvp flag and resolve MVP_MODE

Resolution order: CLI flag → ROADMAP **Mode:** field → workflow.mvp_mode
config → false. Walking Skeleton gate fires for new-project Phase 1.
Wires MVP_MODE + WALKING_SKELETON into gsd-planner subagent prompt.

Per PRD vertical-mvp-slice Phase 1 (Q1, Q2, Q4).

* docs(planner): add vertical-slice planning reference

New reference loaded by gsd-planner when MVP_MODE=true. Defines slice
ordering, Walking Skeleton rules, and anti-patterns. Referenced from
plan-phase workflow MVP_MODE wiring.

* docs(planner): add SKELETON.md template

Template emitted by gsd-planner under WALKING_SKELETON=true. Captures
architectural decisions and out-of-scope list for new-project Phase 1.

* chore(inventory): register new planner references

Added planner-mvp-mode.md and skeleton-template.md to INVENTORY.md and
INVENTORY-MANIFEST.json. References now: 53.

* feat(gsd-planner): add MVP Mode Detection section

Mode-switched branch in the existing planner agent (per Q4: single agent).
Vertical-slice decomposition rules, Walking Skeleton handling, and
TDD-mode compatibility. Heavy guidance lives in references/planner-mvp-mode.md.

* test(plan-phase): add --mvp resolution-chain integration cases

Validates roadmap.get-phase --pick mode and confirms workflow.mvp_mode
default is unset in fresh projects.

* docs(changelog): announce --mvp vertical-slice planning (#2826)

* feat(mvp-phase): add /gsd mvp-phase slash command

Standalone command for vertical MVP planning. Frontmatter only;
heavyweight workflow at get-shit-done/workflows/mvp-phase.md follows
in next commit. Mirrors discuss-phase/edit-phase command shape.

* docs(planner): add user-story-template reference

Defines the canonical 'As a / I want to / So that' format and the
ROADMAP.md / PLAN.md emit rules. Used by mvp-phase workflow and
gsd-planner agent under MVP_MODE.

* docs(planner): add SPIDR splitting reference

Defines size signals, the five SPIDR axes (Spike/Paths/Interfaces/Data/Rules),
the interactive workflow, and anti-patterns. Per PRD Q3 decision: full
interactive flow, not lightweight check. Used by mvp-phase workflow.

* fix(mvp-phase): trim description to fit 100-char budget

* feat(mvp-phase): add mvp-phase workflow

Standalone workflow: phase validation -> user story prompts (As a / I want to /
So that) -> SPIDR splitting check -> ROADMAP write (Mode + Goal) -> delegation
to plan-phase. Per PRD Phase 2 (Q3 full SPIDR; Phase-2-A/B/C/D decisions).

Plan-phase auto-detects MVP via Phase 1's resolution chain, so no flags
are needed when delegating.

* feat(gsd-planner): emit user-story header in PLAN.md under MVP mode

Extends the MVP Mode Detection section (added in Phase 1) so the planner
sources the user story from ROADMAP **Goal:** and emits the bolded
**As a** / **I want to** / **so that** form as the first content under
the phase header in PLAN.md. References user-story-template.md.

* test(mvp-phase): integration smoke test for ROADMAP mutation

Validates roadmap.get-phase output after a workflow-spec'd ROADMAP write:
mode=mvp and goal=full user story. Catches schema drift between workflow
emit and parser expectation. Includes a long-story case (>120 chars) to
confirm SPIDR-rejected stories still parse correctly.

* chore(inventory): register mvp-phase command + 2 new references

Adds /gsd mvp-phase to commands list, mvp-phase workflow to workflows list,
and user-story-template.md + spidr-splitting.md to references. References
count: 53 -> 55.

* docs(changelog): announce /gsd mvp-phase command (#2826)

* fix(mvp-phase): add TEXT_MODE plain-text fallback for non-Claude runtimes (#2012)

* docs(executor): add MVP+TDD gate reference

Defines the runtime gate semantics for execute-phase when both
MVP_MODE and TDD_MODE are true: pre-task verification of failing-test
commit, end-of-phase review escalation from advisory to blocking,
behavior-adding task definition. Loaded conditionally by
execute-phase workflow and gsd-executor agent.

* feat(execute-phase): MVP+TDD runtime gate + blocking review

Resolves MVP_MODE in Step 1 (CLI flag -> roadmap mode -> config -> false).
Adds per-task gate that halts before behavior-adding tasks run if no
failing-test commit exists for the plan. Escalates end-of-phase TDD
review from advisory to blocking when both MVP_MODE and TDD_MODE active.

Also updates INVENTORY-MANIFEST.json to register execute-mvp-tdd.md
(added by Task 1) so manifest-sync tests pass.

Per PRD vertical-mvp-slice Phase 3a (decisions Phase-3-A, Phase-3-Split).

* feat(gsd-executor): add MVP+TDD Gate section

Mirrors the planner's MVP Mode Detection pattern from Phase 1.
Instructs halt-and-report when the runtime gate trips, references
execute-mvp-tdd.md for full semantics. No agent changes outside the
new section.

* test(execute-phase): add MVP+TDD resolution-chain integration cases

Validates roadmap.get-phase --pick mode and confirms workflow.mvp_mode
default is unset in fresh projects. Mirrors the Phase 1 plan-phase
resolution-chain integration test.

* chore(inventory): register execute-mvp-tdd reference

Bumps References count 55 -> 56. Registers execute-mvp-tdd.md.
Adds "init" to PROSE_ALLOWLIST in registry integration test so
bare `gsd-sdk query init` prose examples in plan docs don't
trigger the unregistered-handler guard (real commands are all
init.<subcommand>).

* docs(changelog): announce MVP+TDD runtime gate in execute-phase (#2826)

* docs(verifier): add verify-mvp-mode reference

Defines UAT framing under MVP mode: user-flow walk-through first,
technical checks deferred, coverage check as goal-backward narrowing
to the user story's outcome clause. Loaded conditionally by
verify-work workflow and gsd-verifier agent.

* feat(verify-work): MVP-mode UAT framing — user flow first

Resolves MVP_MODE from phase mode field. Under MVP mode, generates UAT
in three ordered sections: user-flow walk-through (derived from user
story), technical checks (deferred), coverage check (goal-backward).
Falls back to standard UAT generation when mode is null/absent.
User-story-format guard refuses to verify a mode:mvp phase with a
non-user-story goal.

Also updates docs/INVENTORY.md (56 references) and
docs/INVENTORY-MANIFEST.json to register verify-mvp-mode.md added
in Task 1.

Per PRD vertical-mvp-slice Phase 3b (decisions Phase-3-B,
Phase-3-Verify-Structure).

* feat(gsd-verifier): add MVP Mode Verification section

Narrows goal-backward verification to the user-story [outcome] clause
when phase mode is mvp. References verify-mvp-mode.md. Preserves
existing goal-backward methodology for non-MVP phases. User-story-format
guard refuses to verify a mode:mvp phase with a non-user-story goal.

* docs(changelog): announce MVP-mode UAT framing in verify-work (#2826)

* feat(new-project): add Vertical MVP vs Horizontal Layers mode prompt

Asks user at project init how to structure the project. Vertical MVP
emits **Mode:** mvp on every initial roadmap phase (per-phase mode
preserved per PRD Q1). Horizontal Layers falls back to standard
template — no behavioral change for existing flows.

Per PRD vertical-mvp-slice Phase 4 (decision Phase-4-Persistence).

* feat(progress): add MVP-mode user-flow display

When phase has **Mode:** mvp, progress renders user-flow status from
PLAN.md task names alongside standard task progress. Tasks that aren't
user-flow-shaped (technical-sounding) are filtered out of the user-flow
sub-block. Falls back to standard display when mode is null/absent.

Per PRD vertical-mvp-slice Phase 4 (decision Phase-4-Progress).

* feat(stats): add MVP phase count summary

Reads roadmap.analyze (which surfaces mode per phase from Phase 1) and
emits 'Phases: N total | M MVP | K standard' summary line. Suppressed
when MVP_COUNT == 0 to avoid clutter on non-MVP projects.

Per PRD vertical-mvp-slice Phase 4.

* feat(graphify): add MVP-mode visual differentiation

MVP-mode phases render with #22c55e fill color AND ' (MVP)' label
suffix — two-channel signaling for color-blind and grayscale renders.
Standard phases unchanged.

Per PRD vertical-mvp-slice Phase 4 (PRD Q5: distinct visual treatment).

* docs(changelog): announce Phase 4 discovery & progress (#2826)

* chore(release): bump dev to 1.50.0-canary.0 for first 1.50.0 canary

Sets the base version that .github/workflows/canary.yml derives the canary
tag from (strips suffix → base 1.50.0 → next available v1.50.0-canary.N).

This kicks off the 1.50.0 release train, opened by the MVP/TDD/UAT vertical
slice landed across PRs #2867, #2874, #2878, #2880, #2883.

* docs: add CANARY stream README + v1.50.0-canary.1 release notes

- docs/CANARY.md — explains the dev→@canary stream policy, install/rollback
  paths, and when (not) to install canary builds
- docs/RELEASE-v1.50.0-canary.1.md — release notes for the first 1.50.0
  canary cut: vertical MVP/TDD/UAT slice (#2867 + #2874 + #2878 + #2880 +
  #2883), opening the 1.50.0 train under PRD #2826
- docs/README.md — index entry + quick link for the canary stream

* fix(ci/canary): publish gate checks dev branch, not main

Four publish-step `if:` conditions in .github/workflows/canary.yml were
checking `github.ref == 'refs/heads/main'`. Those steps (Tag and push,
Publish to npm, Publish SDK to npm, Verify publish) therefore always
skipped on every workflow_dispatch invocation since canary runs from dev,
never main.

The workflow's own header comment is unambiguous: `dev → @canary`. The
gate was a copy-paste from release.yml (which correctly targets main for
the @next/@latest streams) that was never corrected for the canary stream.

This is why the 1.50.0-canary.1 publish hadn't materialized despite three
green workflow runs. With the gate corrected, the next dispatch will
actually publish.

* ci(release-sdk): make release-sdk.yml dispatchable from the dev branch

The workflow lives on main only, so the GitHub Actions "Use workflow
from" dropdown doesn't list dev — meaning dev → @dev publishes can't be
triggered from the dev branch directly. Add the file to dev so an
operator can dispatch it with branch=dev and tag=dev.

Per project release-stream policy: dev branch publishes canary (@dev).
This is the stream that needs the file most, since main never publishes
@dev itself (main does @next / @latest).

File is byte-identical to main's release-sdk.yml — straight propagation,
no behavioral change. Tracking issues #2925, #2929.

* docs(mvp): canary-prep concept cleanup — CONTEXT.md, mvp-concepts index, --prd interaction (#3176)

* chore(mvp): concept cleanup + cross-ref index for v1.50.0-canary.2 prep

- CONTEXT.md gains 7 MVP domain terms (MVP Mode, User Story, Walking
  Skeleton, Vertical Slice, Behavior-Adding Task, MVP+TDD Gate, SPIDR
  Splitting) so the project glossary matches the shipped surface.
- New get-shit-done/references/mvp-concepts.md indexes the six MVP
  reference files and concept-to-file map so agents and contributors
  can find the right canonical doc without grepping.
- plan-phase.md Walking Skeleton block now documents that --mvp and
  --prd compose orthogonally on Phase 1; no precedence needed.
- INVENTORY/INVENTORY-MANIFEST refreshed for the new reference (58 -> 59).

No behavior change. Canary-prep cleanup ahead of v1.50.0-canary.2.

Surfaced for follow-up (not in this PR):
- MVP_MODE resolution shell block duplicated across plan-phase,
  execute-phase, verify-work workflows (needs a shared workflow-include
  mechanism; structural change).
- Behavior-Adding Task predicate is prose-only; no shared utility.
- User Story regex hardcoded in verify-work; would benefit from a
  central definition consumed by the verifier and the mvp-phase command.

* chore(changeset): set PR number for mvp concept cleanup

* feat(mvp): centralize resolution surfaces + fix SDK roadmap mode parity (#3178)

Three new SDK query verbs replace the architectural duplication surfaced by
the v1.50.0-canary.2 review against dev tip 12c4e565:

  phase.mvp-mode <N> [--cli-flag]
    Single canonical precedence resolver (CLI flag -> ROADMAP **Mode:** mvp
    -> workflow.mvp_mode config -> false). Replaces 4-8 lines of bash that
    were duplicated across plan-phase.md, execute-phase.md, verify-work.md,
    and progress.md. Returns {active, source, roadmap_mode, config_mvp_mode,
    cli_flag_present}.

  task.is-behavior-adding <plan-file> | --task-content <xml>
    Behavior-Adding Task predicate (tdd="true" + <behavior> block + non-test
    source files in <files>). Replaces prose-only specification in
    references/execute-mvp-tdd.md; gsd-executor agent now invokes the verb
    instead of re-inlining the three checks. Returns {is_behavior_adding,
    checks, reason}.

  user-story.validate <text> | --story <text>
    Owns the canonical User Story regex /^As a .+, I want to .+, so that .+\.$/
    previously hardcoded in verify-work.md prose. Consumed by gsd-verifier
    (phase-goal guard) and /gsd-mvp-phase (interactive-prompt validation).
    Returns {valid, slots: {role, capability, outcome}, errors[]}.

Bug fix bundled: sdk/src/query/roadmap.ts searchPhaseInContent now extracts
the mode field from **Mode:**, restoring parity with roadmap.cjs:120-123.
Without this, roadmap.get-phase --pick mode returned null on the native
dispatch path even when the phase had **Mode:** mvp set, causing MVP_MODE
to silently fall through to the config/false branch in every consuming
workflow. The original PRs Phase 1 (#2885) shipped the CJS parser but the
SDK port omitted the field; this fix brings them back to parity.

Workflows + agents updated to call the verbs:
  - plan-phase.md, execute-phase.md, verify-work.md, progress.md call
    phase.mvp-mode (one line replaces the duplicated bash chains).
  - execute-phase.md MVP+TDD gate calls task.is-behavior-adding.
  - verify-work.md goal guard calls user-story.validate.
  - mvp-phase.md interactive prompt validates via user-story.validate.
  - gsd-executor agent references task.is-behavior-adding instead of prose.
  - gsd-verifier agent references user-story.validate instead of inlined regex.

Tests: 24 new vitest tests in sdk/src/query/mvp.test.ts cover all three
verbs + the regression. Two existing contract tests (progress, verify)
updated to assert on the new verb shape. All 60 existing MVP contract
tests pass; golden integration suite (38 + 42 tests) passes.

Closes #3177

* fix(canary.2): unblock release gates for v1.50.0-canary.2

Run 25451329660 (Release SDK Bundle on dev, 2026-05-06T17:41) failed at the
test-suite step with 3 deterministic content/structure gate failures, all
attributable to the MVP umbrella integration in #3178 and the docs sweep
in #3180.

Failure 1: /gsd-mvp-phase undocumented in workflows/help.md
  - tests/bug-2954-help-md-slash-command-stubs.test.cjs requires every
    shipped commands/gsd/<X>.md to have a /gsd-<X> mention in help.md
  - PR #3180 updated docs/COMMANDS.md but missed help.md (which the AI
    agents load in-product)
  - Fix: add a /gsd-mvp-phase entry to help.md right before /gsd-plan-phase

Failures 2 + 3: execute-phase.md (1727) and plan-phase.md (1714) over XL budget (1700)
  - PR #3178 added MVP-mode verb calls (phase.mvp-mode, task.is-behavior-adding,
    user-story.validate) to both workflow files, pushing them past 1700 lines
  - Fix: bump XL_BUDGET 1700 -> 1800 with inline comment pointing at the
    structural follow-up (extract MVP bodies to <workflow>/modes/mvp.md per
    the discuss-phase/modes/ precedent)
  - The structural extract is the right long-term fix but is bigger than
    canary unblock scope; will land in a follow-up after canary cycles

Local verification:
  $ node --test tests/bug-2954-help-md-slash-command-stubs.test.cjs                 tests/workflow-size-budget.test.cjs
  tests 111  pass 111  fail 0

After this lands, re-trigger Release SDK Bundle on dev for v1.50.0-canary.2.

* chore(changeset): set PR number for canary.2 unblock

* fix(codex): generate-claude-md writes to AGENTS.md on Codex runtime

When config.runtime === 'codex' or GSD_RUNTIME=codex, override the
output target to AGENTS.md regardless of claude_md_path, so Codex
projects no longer have GSD sections written to CLAUDE.md by mistake.

Fixes both the CJS (gsd-tools) and SDK (profile-output.ts) paths.
Explicit --output flags are still honoured in both paths.

Closes #3163

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(plan-phase): remove agent: directive that caused OpenCode subagent dispatch

On OpenCode, any command with `agent: <name>` in its frontmatter is
auto-dispatched to a subagent context where the Agent tool is unavailable.
plan-phase.md and mvp-phase.md both carried `agent: gsd-planner`, causing
them to run inside gsd-planner's subagent context with no ability to spawn
researcher/planner/checker subagents — the orchestrator fell back to inline
execution for all three phases.

Fix: remove `agent: gsd-planner` from both command files so they run in the
main agent context. Also replace the stale `Task` tool in allowed-tools with
`Agent` (the correct dispatcher tool name post-#3168 rename).

Adds a structural regression test that parses YAML frontmatter of every
commands/gsd/*.md file and asserts no command carries an `agent:` directive.

Closes #3156

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(mvp): address CodeRabbit workflow and contract findings

* fix(execute-phase): use registered state.update query command

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 21:51:38 -04:00
Tom Boucher
a6beac40a2 fix(quick): port history-based resurrection guard from execute-phase.md (#3195) (#3201)
Replace the inverted PRE_MERGE_FILES grep in the worktree-merge cleanup
block with the git-log --diff-filter=D history check introduced for
execute-phase.md by PR #2510. The old form deleted any .planning/ file
absent from the pre-merge snapshot — including brand-new files such as
SUMMARY.md — rather than only files with a confirmed deletion event on
main. Remove the now-unused PRE_MERGE_FILES snapshot line. Adds a
drift-guard test (node:test) asserting both workflows use WAS_DELETED and
neither uses the bare PRE_MERGE_FILES grep form.

Closes #3195

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 21:51:32 -04:00
Tom Boucher
e9a55b4794 fix(artifacts): register RETROSPECTIVE.md as canonical planning artifact (#3200)
* fix(artifacts): register RETROSPECTIVE.md as canonical planning artifact

Adds RETROSPECTIVE.md to CANONICAL_EXACT in artifacts.cjs so gsd-health
no longer raises W019 after any /gsd-complete-milestone run. The file was
established as a living artifact in PR #644 but omitted from the W019
registry created in PR #2488.

Closes #3198

* chore(changeset): point pr metadata to #3200
2026-05-06 21:51:29 -04:00
Tom Boucher
d42f273838 Merge pull request #3199 from gsd-build/fix/3102-changeset-pr-field
fix(changeset): add missing pr field to windows-npm-shell-fix
2026-05-06 21:04:10 -04:00
Tom Boucher
46cbeb505e test: ignore comments in platform-gate regex assertion 2026-05-06 21:01:25 -04:00
Tom Boucher
ea37252f20 Merge pull request #3102 from fabiossj83/fix/windows-npm-execfilesync-shell-true
fix(hooks): gsd-check-update-worker — execFileSync 'npm' needs shell:true on Windows
2026-05-06 20:55:48 -04:00