196 Commits

Author SHA1 Message Date
Tom Boucher
8d5f509edf fix(3266): preserve wave 0 and bucket plans by depends_on DAG in phase-plan-index (#3276)
* fix(3266): preserve wave 0 and bucket plans by depends_on DAG in phase-plan-index

Fixes two cooperating bugs in the phase-plan-index builder:

1. Wave 0 collapse: `parseInt(...) || 1` coerced parsed value `0` to `1` due to
   JS falsy default. Fixed with `Number.isNaN` guard.
2. depends_on ignored: wave-bucketing used only the `wave:` frontmatter field.
   Now replaced with Kahn's topological-level algorithm over `depends_on`:
   source nodes (no in-phase deps) → lowest level; each plan's level = max(deps'
   levels) + 1.  Declared `wave:` that disagrees with computed level emits a
   non-fatal warning on the result.  Cycle detection throws GSDError.

`PlanInfo` gains `depends_on: string[]`. `PhasePlanIndex` gains `warnings?: string[]`.
Both TS (`sdk/src/query/phase.ts`) and CJS twin (`get-shit-done/bin/lib/phase.cjs`)
fixed identically.
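
Illustration only, not the code in `phase.ts`: a minimal sketch of the two fixes described above. The `Plan` shape and the names `parseWave` and `computeLevels` are assumptions made for this example. Source nodes are placed at level 0 here (the message only says "lowest level"), and a cycle throws a plain `Error` rather than `GSDError`.

```typescript
// Hypothetical plan shape; only `id` and `depends_on` matter for this sketch.
interface Plan {
  id: string;
  depends_on: string[];
}

// `parseInt(raw, 10) || 1` turns a parsed 0 into 1 because 0 is falsy.
// Guarding with Number.isNaN keeps wave 0 intact.
function parseWave(raw: string | undefined): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  return Number.isNaN(parsed) ? 1 : parsed;
}

// Kahn-style level assignment: sources get level 0 (an assumption),
// every other plan gets max(level of its deps) + 1.
function computeLevels(plans: Plan[]): Map<string, number> {
  const ids = new Set(plans.map((p) => p.id));
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const p of plans) {
    // Only dependencies that exist inside this phase count toward ordering.
    const deps = p.depends_on.filter((d) => ids.has(d));
    indegree.set(p.id, deps.length);
    for (const d of deps) {
      dependents.set(d, [...(dependents.get(d) ?? []), p.id]);
    }
  }

  const level = new Map<string, number>();
  const queue = plans.filter((p) => indegree.get(p.id) === 0).map((p) => p.id);
  for (const id of queue) level.set(id, 0);

  let processed = 0;
  while (queue.length > 0) {
    const id = queue.shift()!;
    processed += 1;
    for (const next of dependents.get(id) ?? []) {
      level.set(next, Math.max(level.get(next) ?? 0, (level.get(id) ?? 0) + 1));
      const remaining = (indegree.get(next) ?? 0) - 1;
      indegree.set(next, remaining);
      if (remaining === 0) queue.push(next);
    }
  }

  // Any plan never dequeued sits on a cycle.
  if (processed !== plans.length) {
    throw new Error("depends_on cycle detected");
  }
  return level;
}
```

In this sketch a plan whose declared `wave:` differs from its computed level would be the case that produces the non-fatal warning.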

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore: add changeset for #3276

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(phase): resolve depends_on against canonical plan id (#3276 CR)

Build a secondary `canonicalToId` index alongside `planMap` so that a
dependency declared as '03-01' resolves to a descriptive plan stored
under '03-01-auth-hardening', preventing silent wave-ordering failures.
Applied at both DAG construction sites in phase.cjs and the SDK's
phase.ts (k014 parity). Regression test added.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 00:25:05 -04:00
Tom Boucher
8bc255c266 fix(workstream): normalize migration workstream names (#3269)
* fix(workstream): normalize migrate-name to valid slug

* docs(context): record workstream migrate-name slug invariant

* fix(catalog-cjs): balanced fallback for unknown profile (CR finding A)

profiles[profile] could return undefined for any profile key absent from
the catalog entry, causing downstream callers like formatAgentToModelMapAsTable
to crash on .length. Add ?? profiles.balanced fallback to match the SDK adapter.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(sdk): anchor path resolution on import.meta.url not cwd (CR finding B)

resolve(process.cwd(), '..') breaks when Vitest is invoked from the repo root
because cwd is already the repo root and '..' goes one level above. Replace
with a file-relative path using fileURLToPath(new URL('../../../', import.meta.url))
anchored at the test file's location (sdk/src/query/).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: derive Group B runtime list from catalog (CR finding C)

Hardcoded ['kilo', 'cline', ...] throws TypeError if a runtime name is
removed from the catalog. Derive group B dynamically via
Object.keys(catalog.runtimeTierDefaults).filter(r => !r.opus) so the
test never goes stale and auto-covers future Group B additions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(workflow): add hermes to Step B runtime options (CR finding D)

hermes appears in the Group A built-in defaults table but was missing from
the AskUserQuestion options in Step B, forcing users to manually type it via
'Other (Group B or custom)'. Add explicit hermes entry for UI consistency.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(config): refresh dynamic_routing tier table; fix stale L671 (findings E+F)

Finding E: tier table was missing 6 heavy-tier agents and 15 standard/light
agents added by this PR. Updated all three rows to match catalog routingTier
assignments (33 agents total).

Finding F: removed stale '18 of 31' claim and agent enumeration; replaced
with accurate note that all 33 agents have explicit catalog entries. Updated
authoritative source pointers to model-catalog.cjs / model-catalog.ts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(core): add profile-fallback unit tests for quality and budget (CR nitpick G)

The PR introduced quality→opus and budget→haiku unknown-agent fallbacks but
only balanced→sonnet and inherit→inherit were tested. Add two tests covering
the remaining two branches to complete coverage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* adr: define planning workspace and worktree seam

* refactor(worktree): extract worktree safety policy module

* refactor(workstream): extract active workstream pointer store seam

* test(worktree): cover policy branch paths and persist seam guardrails

* refactor(worktree): centralize health inventory seam for W017

* fix(workspace): align SDK project path policy with CJS planningDir

* refactor(query): unify SDK planning path projection seam

* refactor(init): route workspace projection through planningPaths seam

* docs(adr): add SDK architecture and planning path ADRs

* refactor(worktree): deepen name, pointer, inventory, and config seams

* docs(config): harmonize claude-opus-4-6 to 4-7 in resolve_model_ids example (CR finding 2)

* fix(sdk): return undefined for model_profile='inherit' sentinel (CR finding 3)

* docs(adr): renumber conflicting 0003-sdk-package-seam-module to 0007, update seam-map reference (CR finding 4)

* fix(workstream): align CJS and SDK name validation to accept dots, guard path traversal via includes('..') (CR finding 5)

* fix(sdk): guard writeActiveWorkstream against non-existent workstream directory, k014/k031 parity (CR finding 6)

* chore(changeset): add #3269 changeset (CR finding 1 — proper changeset for this PR)

* docs(inventory): register 3 new CLI modules in INVENTORY.md/MANIFEST (active-workstream-store, workstream-name-policy, worktree-safety)

* fix(sdk): use relPlanningPath(workstream) in planningPaths, fix setActiveWorkstream/getActiveWorkstream name errors in workstream.ts

* fix(sdk): validate GSD_WORKSTREAM in planningPaths before use (#3269 regression)

planningPaths() called resolveWorkspaceContext() which returned GSD_WORKSTREAM
raw (no validation). An invalid value like '../evil' was used as effectiveWorkstream,
constructing a bad path; roadmapAnalyze() caught the ENOENT and returned a
no-phase_count error object instead of the root ROADMAP result.

Fix: validate envCtx.workstream with validateWorkstreamName() in planningPaths()
before accepting it as effectiveWorkstream. Invalid env → null → root .planning/
fallback, preserving the bug-2791 contract: invalid GSD_WORKSTREAM is silently
ignored and falls back to the root context (phase_count: 0 for empty root ROADMAP).

The bug-2791 regression test now passes. No other call sites read GSD_WORKSTREAM
without validation: query-runtime-context.ts already validates; cli.ts already
validates; context-engine.ts takes a caller-validated workstream parameter.

Closes #3268 (regression introduced by #3269 workstream-name-policy work).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 00:15:04 -04:00
Tom Boucher
65abc4fc90 refactor(query): deepen phase lifecycle seams (#3267)
* refactor(query): extract phase lifecycle policy module

* refactor(query): extract phase fs and roadmap mutation adapters

* fix(sdk): propagate non-ENOENT readdir errors in phase-filesystem-adapter (CR finding 1)

Swallow only ENOENT in listDirectories; rethrow EACCES, EIO, and other
unexpected errors so callers surface real failures rather than silently
treating a permission-denied phases dir as empty.

Also adds regression test: EACCES from readdir now propagates as thrown
error instead of returning [].
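
Illustration only: a minimal sketch of the "swallow only ENOENT" rule. The reader is injected as a hypothetical `readDirNames` callback so the example stays self-contained; the real adapter presumably calls `readdir` directly.

```typescript
// Error shape used by Node's fs APIs: a string `code` such as "ENOENT".
interface FsError extends Error {
  code?: string;
}

// `readDirNames` is a hypothetical injected reader standing in for readdir.
function listDirectories(
  readDirNames: (dir: string) => string[],
  dir: string,
): string[] {
  try {
    return readDirNames(dir);
  } catch (err) {
    // A missing directory is an expected "nothing here yet" state.
    if ((err as FsError).code === "ENOENT") return [];
    // EACCES, EIO and anything else are real failures: surface them.
    throw err;
  }
}
```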

* fix(sdk): propagate non-ENOENT readFile errors in phase-roadmap-mutation (CR finding 4)

readModifyWriteRoadmapMd now falls back to empty content only on ENOENT;
EACCES, EIO, and other errors are rethrown so a subsequent write cannot
clobber real roadmap content that is temporarily unreadable.

Regression tests: EACCES propagates; absent ROADMAP.md still starts empty.

* fix(sdk): omit Depends on: Phase 0 for first sequential phase; align prefix grammar (CR findings 2+3)

Finding 2: buildPhaseRoadmapEntry now omits the "Depends on" line when
phaseId == 1 (prevPhase would be 0, which is not a valid predecessor).
The guard is `prevPhase < 1` so future phase-0 configs are also safe.

Finding 3: collectDecimalSuffixesFromDirNames regex prefix pattern
updated from `[A-Z]{1,6}` to `[A-Z][A-Z0-9]*` (case-insensitive flag
added), matching the grammar used by scanSequentialMaxPhaseFromDirs.
Prevents k014 parity drift for alphanumeric project-code prefixes longer
than six characters or containing digits.

Regression tests for both fixes included.
2026-05-09 00:14:59 -04:00
Tom Boucher
288b3b4170 fix(3259): non-mutating --help guard for native query handlers (#3272)
* fix(3259): non-mutating --help guard for native query handlers; reject --help as milestone version

Adds a dispatcher-level guard in query-dispatch.ts that short-circuits
to a non-mutating help stub whenever --help/-h appears in args destined
for a native mutating handler (fail-closed by default). Adds defense-
in-depth in milestoneComplete to reject --help/-h as a version value
before any disk write. Regression tests cover: per-handler --help guard,
registry-driven invariant across all mutating commands, handler-level
GSDError for both flags, and preservation of the #3019 CJS fallback
contract.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore: add changeset fragment for #3272

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 23:53:27 -04:00
Tom Boucher
ecd57e622c fix(3265): prefer YAML frontmatter for state-snapshot canonical fields (#3275)
* fix(3265): prefer YAML frontmatter for state-snapshot canonical fields

stateSnapshot in both sdk/src/query/state.ts and the CJS twin
(get-shit-done/bin/lib/state.cjs cmdStateSnapshot) passed the whole
STATE.md blob to stateExtractField, whose bold pattern (**Field:**)
has no line anchor.  A body table cell such as
"**Status:** to  COMPLETE" therefore silenced the correct YAML
frontmatter value.

Fix: extractFrontmatter(content) first; stripFrontmatter(content) for
the body passed to stateExtractField; for each canonical scalar field
prefer the non-empty frontmatter value, falling back to body extraction
when the key is absent or the file has no frontmatter block at all.

Regression tests added in sdk/src/query/state.test.ts (vitest) and
tests/state.test.cjs (node:test) covering:
- frontmatter status beats **Status:** inside a table cell
- frontmatter current_plan beats bold body value
- no-frontmatter files continue to extract from body
- field absent from frontmatter falls through to body extractor

Fixes #3265

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore: add changeset for #3275

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: reproduce fmStr drops non-string YAML scalars (#3275 CR finding)

Add tests/bug-3275-fmstr-non-string-scalars.test.cjs with 5 cases covering
CJS state-snapshot with numeric frontmatter scalars (current_phase: 19,
total_phases: 7, total_plans_in_phase: 5), string regression, and
no-frontmatter body fallback regression.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(state): fmStr accepts numeric/boolean YAML scalars (CR finding)

Rename `fmStr` to `fmScalar` in both state.cjs and sdk/src/query/state.ts
and broaden the type guard so that non-null number/boolean frontmatter values
are coerced to String(v) instead of being discarded.

The previous `typeof v === 'string'` check was a latent bug: if the YAML
parser ever returns typed scalars (e.g. `current_phase: 19` as the number 19),
the frontmatter value would be silently dropped and the stale body value used
instead.  Both files are updated identically (k014 parity).
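
Illustration only: a sketch of the broadened guard, assuming `fmScalar` returns `undefined` for values it rejects. `preferFrontmatter` is a hypothetical helper added here to show the frontmatter-first fallback; it is not named in the commit.

```typescript
// Strings pass through, numbers and booleans are stringified, and
// everything else (null, undefined, objects, arrays) is dropped.
function fmScalar(v: unknown): string | undefined {
  if (typeof v === "string") return v;
  if (typeof v === "number" || typeof v === "boolean") return String(v);
  return undefined;
}

// Hypothetical helper: prefer a non-empty frontmatter value, else the body value.
function preferFrontmatter(fmValue: unknown, bodyValue: string | null): string | null {
  const fromFm = fmScalar(fmValue);
  return fromFm !== undefined && fromFm !== "" ? fromFm : bodyValue;
}
```

With the old `typeof v === 'string'` check, a typed `current_phase: 19` would have fallen through to the body value.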

Also adds three SDK vitest regression cases (numeric current_phase,
total_phases, total_plans_in_phase) in sdk/src/query/state.test.ts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 23:53:21 -04:00
Tom Boucher
96806003c5 fix(#3229): shared model catalog source of truth for agent profiles + runtime tier defaults (#3230)
* docs(adr): add ADR-0003 model catalog module

* fix(#3229): add shared model catalog as source of truth for agent profiles and runtime tier defaults

Research / design (ADR-0003):
- Existing drift came from 4 independent model truths:
  1. CJS model-profiles.cjs
  2. SDK config-query.ts stale copy (18 agents)
  3. settings-advanced.md runtime tier table
  4. session-runner Claude-only profile map
- New design: one machine-readable Model Catalog Module in sdk/shared/
  that both packages ship and consume.

Implementation:
- sdk/shared/model-catalog.json — canonical source of truth for:
  - full 33-agent registry
  - per-agent golden (quality) alias + balanced/budget aliases
  - adaptive derivation from routingTier
  - agent→phaseType map
  - agent→dynamic-routing default tier map
  - runtime tier defaults for all supported runtimes
- get-shit-done/bin/lib/model-catalog.cjs — CJS adapter over the catalog
- sdk/src/model-catalog.ts — SDK adapter over the same catalog
- CJS model-profiles.cjs now re-exports derived data from model-catalog.cjs
- SDK config-query.ts now re-exports MODEL_PROFILES/VALID_PROFILES from
  model-catalog.ts instead of maintaining its own list
- sdk/src/query/helpers.ts runtime list now comes from the catalog (fixes hermes drift)
- sdk/src/session-runner.ts Claude profile→model-id mapping now resolves via catalog
- docs/CONFIGURATION.md + settings-advanced.md runtime tables updated to match catalog

Behavior changes:
- resolve-model now covers every shipped agent file on disk (33 agents)
- unknown-agent fallback is profile-semantic, not hardcoded sonnet:
  quality→opus, budget→haiku, balanced/adaptive→sonnet, inherit→inherit
- Group B runtimes remain known runtimes but do not get built-in tier defaults
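
Illustration only: the profile-semantic fallback above, sketched as a lookup table. The `resolveModel` signature and the `agentModels` parameter are assumptions; the real catalog shape lives in `model-catalog.json`.

```typescript
// Profiles and aliases named in the commit message.
type Profile = "quality" | "balanced" | "budget" | "adaptive" | "inherit";

const UNKNOWN_AGENT_FALLBACK: Record<Profile, string> = {
  quality: "opus",
  balanced: "sonnet",
  adaptive: "sonnet",
  budget: "haiku",
  inherit: "inherit",
};

// Hypothetical resolver: `agentModels` stands in for one agent's catalog entry,
// or undefined when the agent is not in the catalog.
function resolveModel(
  agentModels: Partial<Record<Profile, string>> | undefined,
  profile: Profile,
): string {
  return agentModels?.[profile] ?? UNKNOWN_AGENT_FALLBACK[profile];
}
```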

Tests (RED→GREEN):
- root tests: shipped agent files must equal MODEL_PROFILES keys
- sdk tests: shipped agent files must equal MODEL_PROFILES keys
- direct fix assertion: gsd-code-reviewer resolves to opus under quality with no unknown_agent
- runtime defaults parity test: settings-advanced.md + CONFIGURATION.md tables must match catalog
- helper tests: hermes included in SUPPORTED_RUNTIMES and getRuntimeConfigDir()

Closes #3229

* chore(changeset): update #3229 changeset pr field to 3230

* fix(ci): update inherit fallback expectations and inventory parity for model catalog
2026-05-08 21:25:37 -04:00
Tom Boucher
deeb6deb67 fix(install): accept Codex TOML floats; idempotent rollback (#3245) (#3254)
* test: reproduce extractFrontmatter LAST-block bug (#3240)

* test: reproduce state.update progress trampling and percent formula (#3242)

Two failing regression tests:
- Bug A: state.update "Last Activity" tramples curated progress.* frontmatter via readModifyWriteStateMd → syncStateFrontmatter
- Bug B: 12 declared ROADMAP phases / 6 realized / 6/6 plans done → percent: 100 instead of 50 (phase-fraction ignored)

* test: reproduce TOML float rejection and partial rollback (#3245)

Two failing regression tests:
1. parseTomlToObject rejects valid Codex TOML floats (tool_timeout_sec = 20.0)
2. Post-install validation failure leaves skills/, agents/, VERSION on disk
   despite restoring config.toml — hybrid state after abort

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(install): accept TOML floats; idempotent codex rollback (#3245)

Two fixes for the Codex install failure introduced by #2760 CR4 finding 3:

1. parseTomlValue now accepts TOML 1.0 float literals (decimals,
   exponents, underscore separators, signed). Codex CLI's serde schema
   requires f64 for tool_timeout_sec / startup_timeout_sec — the prior
   strict-integer-only check was the inverse of what Codex requires,
   causing every config with a float to trigger a fatal schema validation
   failure. Date/time separators (-/:T/Z) are still rejected.

2. restoreCodexSnapshot is extended into a unified idempotent rollback
   that reverts ALL Codex-specific mutations on failure:
   - config.toml (existing behavior)
   - skills/gsd-* directories (new)
   - agents/gsd-*.{md,toml} files (new)
   - get-shit-done/VERSION (new)
   - orphaned atomic-write temp files (new)
   Pre-install state is captured before the first Codex write so the
   rollback reflects the true pre-GSD state. Non-gsd-* user content is
   untouched. The rollback is safe to call multiple times and before any
   snapshots are captured.

Fixes #3245

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* changeset: pr=3254 for #3245

* test: fix source-grep lint violation in bug-3242 test (#3242)

Replace content.includes() check with line-by-line parse of STATE.md body.
The lint enforces structural assertions over raw text matching.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: mark #3242 RED tests as todo pending fix (#3242)

The three failing tests are intentional regression tests for bugs in
state.cjs that will be fixed in a separate PR. Mark them { todo: true }
so they don't block CI on this branch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(install): tighten TOML underscore placement validation (CR finding 1)

The float regex used [\d_]* which accepts invalid forms like 1__0, 1_.0,
and 1._0. TOML 1.0 §2 requires underscores only between digits. Switch
both the integer pre-check and the full float pattern to (?:_?\d)* so
consecutive underscores, leading underscores on a segment, and trailing
underscores on a segment are all rejected before replace(/_/g,'') can
silently normalize them into valid JS numbers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(install): restore pre-existing gsd-* content on rollback (CR finding 2)

The snapshot only recorded names of pre-existing skills/gsd-* dirs and
agents/gsd-* files. On a failed reinstall the rollback could delete
newly-created dirs but could not restore the bytes of dirs/files that
were overwritten, leaving the user in a hybrid state (old config.toml,
new skill files).

Now snapshot the full file tree of every pre-existing gsd-* skill dir
into codexPreInstallSkillContents (Map<name, Map<relPath, Buffer>>) and
every pre-existing agent file into codexPreInstallAgentContents
(Map<filename, Buffer>). restoreCodexSnapshot() uses these maps to
wipe-and-restore overwritten entries and only removes entries that had
no pre-install state, giving a true atomic rollback guarantee.
Reads are best-effort so a partial snapshot is still better than none.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(install): scope temp-file cleanup to installer-owned writes (CR finding 3)

_cleanTmpFiles() was deleting any *.tmp-<pid>-<n> file found under
targetDir. This is too broad: other tools in the user's Codex/home
directory may create temp files matching the same suffix pattern, and a
GSD install rollback would silently delete them.

Add __atomicWrittenTmps (a module-level Set<string>) populated by
atomicWriteFileSync for every temp path it creates. _cleanTmpFiles()
now checks __atomicWrittenTmps.has(full) before unlinking, so only temp
files this installer process actually wrote are eligible for cleanup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): remove no-op doesNotThrow wrapping try/catch (CR finding 4)

assert.doesNotThrow(() => { try { f(); } catch(_){} }) always passes
because the catch block swallows every exception before the outer
assertion can see it. This meant the rollback-idempotency guarantee was
never actually verified.

Replace with an explicit threw flag around runCodexInstall, assert that
the install did throw (validation failure is expected), and add a
post-rollback state assertion that skills/ was not created. This gives
a loud failure surface if runCodexInstall starts crashing from inside
the rollback path, matching the intent described in the test comment.
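
Illustration only: why the wrapped form can never fail, and the explicit-flag alternative. `runInstallExpectingFailure` is a hypothetical stand-in for `runCodexInstall` on the validation-failure path.

```typescript
// Hypothetical stand-in for an install step that is expected to fail validation.
function runInstallExpectingFailure(): void {
  throw new Error("schema validation failed");
}

// Vacuous: the inner catch swallows everything, so this callback never throws
// and a "does not throw" assertion wrapped around it can only pass.
function vacuousCallback(): void {
  try {
    runInstallExpectingFailure();
  } catch (_) {
    /* swallowed */
  }
}

// Explicit: record whether the call threw, then assert on the flag.
function didThrow(fn: () => void): boolean {
  let threw = false;
  try {
    fn();
  } catch {
    threw = true;
  }
  return threw;
}
```

Asserting `didThrow(...) === true` fails loudly if the install stops throwing, which the wrapped `doesNotThrow` form could not detect.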

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): correct describe title for float-acceptance tests (CR nitpick 1)

The describe block title said 'rejects malformed input that previously
slipped through', but the test inside now asserts that TOML floats are
accepted (the #3245 inversion). This misled readers expecting every
sub-test to assert rejection. Update the title to reflect the mixed
behaviour: floats are accepted; dates and trailing-garbage are rejected.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): rename test to match what the assertion actually checks (CR nitpick 2)

The test name 'post-install config retains float literal form (20.0 not
truncated to 20)' promised a string-form invariant, but the assertion
uses numeric equality (assert.strictEqual(parsed.tool_timeout_sec, 20))
which cannot distinguish 20 from 20.0 in JS. Rename to 'post-install
config round-trips tool_timeout_sec as numeric 20' so the description
matches what the test actually verifies.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): replace raw text scan with state json assertion (CR nitpick 3)

The 'Last Activity updates the body field' test was reading STATE.md as
raw text, splitting on newlines, and using lines.find/startsWith to
locate the 'Last Activity:' line — the exact pattern-match-on-source
approach prohibited by the no-source-grep testing standard.

Replace with runGsdTools('state json', tmpDir) which surfaces the body-
extracted Last Activity value as fm.last_activity in its JSON output,
and assert against that structured field instead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): correct post-rollback state assertion for early-failure case

The previous assertion checked that skills/ didn't exist, but the
installer writes skills/ before the schema validator fires. Rollback
removes gsd-* dirs inside skills/, not skills/ itself. Update the
assertion to verify that no gsd-* skill dirs survive rollback, which
is the actual invariant the test name describes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* changeset: document full rollback scope (CR finding 1)

Adds config.toml restoration and orphaned atomic-write temp-file
cleanup to the changeset description — the previous text only listed
skills/, agents/, and VERSION.

* fix(install): wrap post-snapshot scope in rollback handler (CR finding 2)

Any throw between the pre-install snapshot capture and the Codex config
block (skills copy, agents copy, VERSION write, manifest write, leaked-
path scan, etc.) now triggers _codexPreConfigRollback() so the caller
is never left in a partially-installed state.  Previously only the later
config.toml mutation paths had rollback wired in.

Introduces _codexPreConfigRollback (defined right after snapshot capture)
and wraps the intervening operations in a try/catch that invokes it on
error for Codex installs; non-Codex paths are unaffected.

* test: assert threw=true to prevent vacuous pass (CR finding 4)

Two tests used bare try/catch without asserting threw === true, so they
would silently pass even if runCodexInstall never threw (k060 pattern).
Each bare catch block is replaced with a threw flag and a
strictEqual(threw, true, ...) assertion.

CR findings 2+3 are both addressed in the preceding install commit:
finding 3 (restore from snapshot manifest, not current FS state) lands
alongside the rollback-wrapper change as part of the restoreCodexSnapshot
refactor.

* fix(install): reject leading zeros in TOML float integer part per TOML 1.0 (CR finding round 4)

TOML 1.0 §2 disallows leading zeros in the integer part of numeric
literals — `01`, `00`, `01.5`, `00e2`, `+01.0`, `-01.0` are all invalid.
The pre-check and float regexes in parseTomlValue used `\d(?:_?\d)*` which
accepted any digit as the leading digit.

Both regexes are tightened to `(0|[1-9](?:_?\d)*)` for the integer part:
- `0` alone is valid
- a non-zero leading digit followed by optional underscored digits is valid
- `01`, `00`, and any variant with a leading zero and further digits is rejected

The "still rejects bare time (07:32:00)" test assertion is broadened from
`/unsupported TOML value/` to `/unsupported TOML value|trailing bytes/`
because the parser now stops at `0` and the remainder `7:32:00` is rejected
as trailing bytes — the invariant (time literals are not accepted) is unchanged.

25 new regression tests cover all rejection cases and valid TOML forms.
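
Illustration only: a sketch combining the underscore-placement and leading-zero rules into one pattern. The name `parseTomlNumber` is an assumption, and the sketch covers decimal integers and floats only (no `inf`/`nan`, hex, or dates); it is not the parser in the installer.

```typescript
// Integer part: "0" alone, or a non-zero digit followed by digits where each
// underscore must sit between two digits.
const INT_PART = String.raw`(?:0|[1-9](?:_?\d)*)`;
// Fraction and exponent digits may start with 0 but follow the same underscore rule.
const DIGITS = String.raw`\d(?:_?\d)*`;

const TOML_NUMBER = new RegExp(
  `^[+-]?${INT_PART}(?:\\.${DIGITS})?(?:[eE][+-]?${DIGITS})?$`,
);

// Returns the numeric value, or null when the literal is not accepted.
// Underscores are stripped only after the shape has been validated.
function parseTomlNumber(raw: string): number | null {
  if (!TOML_NUMBER.test(raw)) return null;
  return Number(raw.replace(/_/g, ""));
}
```

Validating before `replace(/_/g, '')` is the point: stripping first would turn `1__0` into the valid-looking `10`.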

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 10:25:59 -04:00
Tom Boucher
397c34142a Deepen SDK package seam and converge runtime skills policy (#3238)
* Deepen SDK package seam and converge runtime skills policy

* fix(sdk): unified install-root resolution for workflows and agents (CR finding 1)

Use the already-resolved gsdInstallDir constant instead of calling
resolveLegacyInstallDir() again when computing agentsDir, ensuring
workflowsDir and agentsDir share the same install root.

* fix(sdk): tilde shortening requires path-boundary match (CR finding 2)

Both renderGlobalSkillsBaseDisplayPath and renderGlobalSkillDisplayPath
used startsWith(home) which could incorrectly shorten unrelated paths
sharing the same prefix. Now checks for home === base or
base.startsWith(home + sep) to ensure a real directory boundary.
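
Illustration only: the boundary check sketched as a standalone function. `shortenHome` and its default `/` separator are assumptions for this example.

```typescript
// Replace a leading home directory with "~" only at a real path boundary.
// `sep` defaults to "/" for the sketch; real code would use the platform separator.
function shortenHome(p: string, home: string, sep = "/"): string {
  if (p === home) return "~";
  if (p.startsWith(home + sep)) return "~" + p.slice(home.length);
  return p;
}
```

A bare `startsWith(home)` would shorten `/home/tommy/x` against a home of `/home/tom`; the boundary check leaves it alone.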

* fix(sdk): validate loadConfig export before invocation (CR finding 3)

After requiring core.cjs, check typeof mod.loadConfig === 'function'
before calling it. Throws a classified GSDError with the module path
if the export is missing, rather than a generic TypeError.

* fix(test): guard root lookup before .path dereference (CR finding 4)

Added assert.ok() guards for claudeRoot and codexRoot after the .find()
calls so that a missing root produces an explicit assertion failure
rather than a TypeError on .path dereference.

* fix(ci): fail-safe on transient API errors in approval dismissal (CR finding 6)

resolveRole() returns 'unknown' for non-404 errors (rate limits, 5xx,
network blips). shouldDismissReviewer() now treats 'unknown' as
unresolvable and skips dismissal, preventing legitimate approvals from
being dismissed due to a transient API failure. Only 'none' (true 404)
is treated as a confirmed non-collaborator.

* changeset: pr=3238 SDK package seam and runtime skills convergence

* fix(sdk): harden resolveGlobalSkillDir against path traversal (CR finding 1)

Use resolve+relative to validate that skillName cannot escape the global
skills base directory. Values like "../../foo" or absolute paths now
return null instead of joining directly. All imports (resolve, relative,
isAbsolute) were already present in helpers.ts.
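
Illustration only: a sketch of a resolve+relative containment check using Node's `path` module. `resolveInsideBase` is a hypothetical name, and rejecting the base directory itself (an empty relative path) is an assumption of this sketch.

```typescript
import { isAbsolute, relative, resolve, sep } from "node:path";

// Returns the resolved directory, or null when `name` would land on or
// outside `baseDir`.
function resolveInsideBase(baseDir: string, name: string): string | null {
  if (name === "" || isAbsolute(name)) return null;
  const base = resolve(baseDir);
  const candidate = resolve(base, name);
  const rel = relative(base, candidate);
  // "" means the base itself; ".." or a "../" prefix means the path escaped;
  // an absolute result can occur across drives on Windows.
  const escapes =
    rel === "" || rel === ".." || rel.startsWith(".." + sep) || isAbsolute(rel);
  return escapes ? null : candidate;
}
```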

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(sdk): split skill-dir-resolution and skill-not-found warnings (CR finding 2)

After resolveGlobalSkillDir's hardening can return null for traversal
attempts, the old single-branch warning "Global skill not found at ..."
was misleading. Split into two distinct cases:
- skillDir === null → "Could not resolve global skill directory for ..."
- skillMd missing → "Global skill not found at ..."

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: lock skill path-traversal rejection in resolveGlobalSkillDir

Regression test verifying that traversal segments (../../foo, ../escape),
empty string, and absolute paths are all rejected (return null), while
a legitimate skill name resolves correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(sdk): align display-path contract + traversal coverage for resolveGlobalSkillMarkdownPath (CR nitpicks)

- renderGlobalSkillsBaseDisplayPath now returns a non-null string for
  unsupported runtimes (e.g. cline → "(cline does not use a skills directory)")
  matching the existing renderGlobalSkillDisplayPath contract; callers
  of both helpers no longer need null-checks for unsupported runtimes.
- Remove now-redundant ! non-null assertion on renderGlobalSkillsBaseDisplayPath
  calls in skill-manifest.ts (return type is string, not string | null).
- Extend the path-traversal test block to assert resolveGlobalSkillMarkdownPath
  also propagates null for ../../foo, ../escape, empty, and /abs/path inputs,
  locking the null-propagation contract against future refactors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 09:06:43 -04:00
Tom Boucher
f5fe5bc063 fix(config): allow model_overrides.<agent-id> in config-set (#3227) (#3253)
* test: reproduce config-set rejecting model_overrides.<agent-id> (#3227)

* fix(config): allow model_overrides.<agent-id> in config-set (#3227)

* changeset: pr=3253 for #3227
2026-05-08 08:40:53 -04:00
Tom Boucher
b2f0fdf250 fix(sdk): anchor extractFrontmatter at file start (#3240) (#3247)
* test: reproduce extractFrontmatter LAST-block bug (#3240)

* fix(sdk): anchor extractFrontmatter at file start (#3240)

* changeset: pr=3247 for #3240
2026-05-08 08:40:30 -04:00
Tom Boucher
447763411a fix(sdk): phase.add honors --dry-run; rejects unknown flags (#3226) (#3246)
* test: reproduce phase.add dry-run + flag validation gaps (#3226)

Add failing tests for:
- --dry-run silently absorbed into description (symptom A)
- Unknown --flag should return validation error (symptom C)
- ### Phase N: ROADMAP heading scan verification (symptom B)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(sdk): phase.add honors --dry-run; rejects unknown flags (#3226)

- Add flag parser to phaseAdd: strip recognized flags (--dry-run) from
  args before positional parsing so they never silently become description
  or customId values
- --dry-run computes the next phase number and roadmap_entry string but
  skips mkdir, writeFile, and readModifyWriteRoadmapMd; returns
  { dry_run: true, roadmap_entry } alongside normal fields
- Any unrecognized --flag throws a Validation GSDError naming the flag
- ROADMAP ### Phase N: heading scan for numbering (symptom B) was already
  correct; verified with new regression test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* changeset: pr=3246 for #3226

* fix(sdk): phase.add scans disk AND roadmap (union, not fallback)

Address CodeRabbit finding: the conditional `if (maxPhase === 0)` guard
around the filesystem scan meant that if ROADMAP had any phases but disk
was ahead (e.g. ROADMAP max=10, dirs include 12-*), phase.add would
pick 11 and collide with the existing directory.

Remove the guard: always scan on-disk phase directories and take the
max across both ROADMAP and filesystem (union semantics).
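
Illustration only: the guarded and union variants side by side. Both functions take hypothetical pre-parsed lists of integer phase numbers.

```typescript
// Guarded variant described above as buggy: disk is consulted only when
// ROADMAP yields no phases at all.
function nextPhaseFallback(roadmapPhases: number[], diskPhases: number[]): number {
  let maxPhase = Math.max(0, ...roadmapPhases);
  if (maxPhase === 0) maxPhase = Math.max(0, ...diskPhases);
  return maxPhase + 1;
}

// Union variant: always take the max across both sources.
function nextPhaseUnion(roadmapPhases: number[], diskPhases: number[]): number {
  return Math.max(0, ...roadmapPhases, ...diskPhases) + 1;
}
```

With ROADMAP max 10 and a `12-*` directory on disk, the guarded variant picks 11 while the union variant picks 13.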

All 57 phase-lifecycle tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: reproduce phase.add concurrent ID collision (CR finding)

Two concurrent phase.add calls against the same project observe
maxPhase before the lock is held, producing duplicate phase IDs.
Adds a Promise.all regression test that asserts both calls succeed
with distinct phase numbers {11, 12}.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(sdk): compute phase number under roadmap lock (CR finding)

Move maxPhase/newPhaseId/dirName computation inside the
readModifyWriteRoadmapMd callback so the entire read → compute → write
cycle is serialised under the lock. Previously, two concurrent
phase.add calls could both observe maxPhase=N before either acquired
the lock, then both write with phase ID N+1 — producing duplicate IDs.

In dry-run mode (no write, no race) the computation still happens
outside the lock to avoid unnecessary contention.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 08:40:24 -04:00
Tom Boucher
8054959417 fix(query): workstream resolution in init.milestone-op and roadmap.analyze (#3196)
- initMilestoneOp now accepts and propagates the workstream parameter:
  relPlanningPath(workstream) replaces the hardcoded '.planning' dir,
  getMilestoneInfo gets workstream passed, extractCurrentMilestone gets
  workstream passed, archiveDir is derived from planningDir not root.

- resolveQueryRuntimeContext now reads .planning/active-workstream as a
  third-priority fallback after --ws flag and GSD_WORKSTREAM env var,
  completing the documented resolution chain for all query handlers.
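
The three-step resolution chain can be sketched as below. The function and field names are hypothetical, and treating blank values as unset is an assumption; only the priority order (--ws flag, then GSD_WORKSTREAM, then .planning/active-workstream) comes from this commit.

```typescript
// Inputs are passed in explicitly so the sketch stays free of file and env access.
interface WorkstreamSources {
  wsFlag?: string;        // value of the --ws flag, if given
  envWorkstream?: string; // value of GSD_WORKSTREAM, if set
  activeFile?: string;    // contents of .planning/active-workstream, if present
}

function resolveWorkstream(src: WorkstreamSources): string | null {
  const byPriority = [src.wsFlag, src.envWorkstream, src.activeFile];
  for (const candidate of byPriority) {
    const trimmed = candidate?.trim();
    if (trimmed) return trimmed; // assumption: blank values count as unset
  }
  return null; // no workstream selected
}
```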

Closes #3196

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 22:01:45 -04:00
Tom Boucher
2d32ad82be fix(plan-phase): remove agent: directive that caused OpenCode subagent dispatch (#3156) (#3206)
* feat(roadmap): parse **Mode:** field on phase sections

Adds a 'mode' field to roadmap.get-phase and roadmap.analyze outputs.
Recognizes '**Mode:** mvp' lines in phase sections; lowercased + trimmed.
Forward-compat: unrecognized values preserved verbatim, no enum check.

Foundation for --mvp flag in plan-phase (PRD: vertical-mvp-slice).

* feat(plan-phase): parse --mvp flag and resolve MVP_MODE

Resolution order: CLI flag → ROADMAP **Mode:** field → workflow.mvp_mode
config → false. Walking Skeleton gate fires for new-project Phase 1.
Wires MVP_MODE + WALKING_SKELETON into gsd-planner subagent prompt.
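
The resolution order above, as a minimal sketch. The function name, the `source` labels, and the choice to let a non-`mvp` roadmap value fall through to config are assumptions made for illustration, not the workflow's actual implementation.

```typescript
type MvpSource = "cli" | "roadmap" | "config" | "default";

interface MvpResolution {
  active: boolean;
  source: MvpSource;
}

function resolveMvpMode(
  cliFlag: boolean,                   // --mvp was passed
  roadmapMode: string | null,         // value of the ROADMAP **Mode:** field, if any
  configMvpMode: boolean | undefined, // workflow.mvp_mode, unset in fresh projects
): MvpResolution {
  if (cliFlag) return { active: true, source: "cli" };
  if (roadmapMode !== null && roadmapMode.trim().toLowerCase() === "mvp") {
    return { active: true, source: "roadmap" };
  }
  if (configMvpMode === true) return { active: true, source: "config" };
  return { active: false, source: "default" };
}
```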

Per PRD vertical-mvp-slice Phase 1 (Q1, Q2, Q4).

* docs(planner): add vertical-slice planning reference

New reference loaded by gsd-planner when MVP_MODE=true. Defines slice
ordering, Walking Skeleton rules, and anti-patterns. Referenced from
plan-phase workflow MVP_MODE wiring.

* docs(planner): add SKELETON.md template

Template emitted by gsd-planner under WALKING_SKELETON=true. Captures
architectural decisions and out-of-scope list for new-project Phase 1.

* chore(inventory): register new planner references

Added planner-mvp-mode.md and skeleton-template.md to INVENTORY.md and
INVENTORY-MANIFEST.json. References now: 53.

* feat(gsd-planner): add MVP Mode Detection section

Mode-switched branch in the existing planner agent (per Q4: single agent).
Vertical-slice decomposition rules, Walking Skeleton handling, and
TDD-mode compatibility. Heavy guidance lives in references/planner-mvp-mode.md.

* test(plan-phase): add --mvp resolution-chain integration cases

Validates roadmap.get-phase --pick mode and confirms workflow.mvp_mode
default is unset in fresh projects.

* docs(changelog): announce --mvp vertical-slice planning (#2826)

* feat(mvp-phase): add /gsd mvp-phase slash command

Standalone command for vertical MVP planning. Frontmatter only;
heavyweight workflow at get-shit-done/workflows/mvp-phase.md follows
in next commit. Mirrors discuss-phase/edit-phase command shape.

* docs(planner): add user-story-template reference

Defines the canonical 'As a / I want to / So that' format and the
ROADMAP.md / PLAN.md emit rules. Used by mvp-phase workflow and
gsd-planner agent under MVP_MODE.

* docs(planner): add SPIDR splitting reference

Defines size signals, the five SPIDR axes (Spike/Paths/Interfaces/Data/Rules),
the interactive workflow, and anti-patterns. Per PRD Q3 decision: full
interactive flow, not lightweight check. Used by mvp-phase workflow.

* fix(mvp-phase): trim description to fit 100-char budget

* feat(mvp-phase): add mvp-phase workflow

Standalone workflow: phase validation -> user story prompts (As a / I want to /
So that) -> SPIDR splitting check -> ROADMAP write (Mode + Goal) -> delegation
to plan-phase. Per PRD Phase 2 (Q3 full SPIDR; Phase-2-A/B/C/D decisions).

Plan-phase auto-detects MVP via Phase 1's resolution chain, so no flags
are needed when delegating.

* feat(gsd-planner): emit user-story header in PLAN.md under MVP mode

Extends the MVP Mode Detection section (added in Phase 1) so the planner
sources the user story from ROADMAP **Goal:** and emits the bolded
**As a** / **I want to** / **so that** form as the first content under
the phase header in PLAN.md. References user-story-template.md.

* test(mvp-phase): integration smoke test for ROADMAP mutation

Validates roadmap.get-phase output after a workflow-spec'd ROADMAP write:
mode=mvp and goal=full user story. Catches schema drift between workflow
emit and parser expectation. Includes a long-story case (>120 chars) to
confirm SPIDR-rejected stories still parse correctly.

* chore(inventory): register mvp-phase command + 2 new references

Adds /gsd mvp-phase to commands list, mvp-phase workflow to workflows list,
and user-story-template.md + spidr-splitting.md to references. References
count: 53 -> 55.

* docs(changelog): announce /gsd mvp-phase command (#2826)

* fix(mvp-phase): add TEXT_MODE plain-text fallback for non-Claude runtimes (#2012)

* docs(executor): add MVP+TDD gate reference

Defines the runtime gate semantics for execute-phase when both
MVP_MODE and TDD_MODE are true: pre-task verification of failing-test
commit, end-of-phase review escalation from advisory to blocking,
behavior-adding task definition. Loaded conditionally by
execute-phase workflow and gsd-executor agent.

* feat(execute-phase): MVP+TDD runtime gate + blocking review

Resolves MVP_MODE in Step 1 (CLI flag -> roadmap mode -> config -> false).
Adds per-task gate that halts before behavior-adding tasks run if no
failing-test commit exists for the plan. Escalates end-of-phase TDD
review from advisory to blocking when both MVP_MODE and TDD_MODE active.

Also updates INVENTORY-MANIFEST.json to register execute-mvp-tdd.md
(added by Task 1) so manifest-sync tests pass.

Per PRD vertical-mvp-slice Phase 3a (decisions Phase-3-A, Phase-3-Split).

* feat(gsd-executor): add MVP+TDD Gate section

Mirrors the planner's MVP Mode Detection pattern from Phase 1.
Instructs halt-and-report when the runtime gate trips, references
execute-mvp-tdd.md for full semantics. No agent changes outside the
new section.

* test(execute-phase): add MVP+TDD resolution-chain integration cases

Validates roadmap.get-phase --pick mode and confirms workflow.mvp_mode
default is unset in fresh projects. Mirrors the Phase 1 plan-phase
resolution-chain integration test.

* chore(inventory): register execute-mvp-tdd reference

Bumps References count 55 -> 56. Registers execute-mvp-tdd.md.
Adds "init" to PROSE_ALLOWLIST in registry integration test so
bare `gsd-sdk query init` prose examples in plan docs don't
trigger the unregistered-handler guard (real commands are all
init.<subcommand>).

* docs(changelog): announce MVP+TDD runtime gate in execute-phase (#2826)

* docs(verifier): add verify-mvp-mode reference

Defines UAT framing under MVP mode: user-flow walk-through first,
technical checks deferred, coverage check as goal-backward narrowing
to the user story's outcome clause. Loaded conditionally by
verify-work workflow and gsd-verifier agent.

* feat(verify-work): MVP-mode UAT framing — user flow first

Resolves MVP_MODE from phase mode field. Under MVP mode, generates UAT
in three ordered sections: user-flow walk-through (derived from user
story), technical checks (deferred), coverage check (goal-backward).
Falls back to standard UAT generation when mode is null/absent.
User-story-format guard refuses to verify a mode:mvp phase with a
non-user-story goal.

Also updates docs/INVENTORY.md (56 references) and
docs/INVENTORY-MANIFEST.json to register verify-mvp-mode.md added
in Task 1.

Per PRD vertical-mvp-slice Phase 3b (decisions Phase-3-B,
Phase-3-Verify-Structure).

* feat(gsd-verifier): add MVP Mode Verification section

Narrows goal-backward verification to the user-story [outcome] clause
when phase mode is mvp. References verify-mvp-mode.md. Preserves
existing goal-backward methodology for non-MVP phases. User-story-format
guard refuses to verify a mode:mvp phase with a non-user-story goal.

* docs(changelog): announce MVP-mode UAT framing in verify-work (#2826)

* feat(new-project): add Vertical MVP vs Horizontal Layers mode prompt

Asks user at project init how to structure the project. Vertical MVP
emits **Mode:** mvp on every initial roadmap phase (per-phase mode
preserved per PRD Q1). Horizontal Layers falls back to standard
template — no behavioral change for existing flows.

Per PRD vertical-mvp-slice Phase 4 (decision Phase-4-Persistence).

* feat(progress): add MVP-mode user-flow display

When phase has **Mode:** mvp, progress renders user-flow status from
PLAN.md task names alongside standard task progress. Tasks that aren't
user-flow-shaped (technical-sounding) are filtered out of the user-flow
sub-block. Falls back to standard display when mode is null/absent.

Per PRD vertical-mvp-slice Phase 4 (decision Phase-4-Progress).

* feat(stats): add MVP phase count summary

Reads roadmap.analyze (which surfaces mode per phase from Phase 1) and
emits 'Phases: N total | M MVP | K standard' summary line. Suppressed
when MVP_COUNT == 0 to avoid clutter on non-MVP projects.

Per PRD vertical-mvp-slice Phase 4.

* feat(graphify): add MVP-mode visual differentiation

MVP-mode phases render with #22c55e fill color AND ' (MVP)' label
suffix — two-channel signaling for color-blind and grayscale renders.
Standard phases unchanged.

Per PRD vertical-mvp-slice Phase 4 (PRD Q5: distinct visual treatment).

* docs(changelog): announce Phase 4 discovery & progress (#2826)

* chore(release): bump dev to 1.50.0-canary.0 for first 1.50.0 canary

Sets the base version that .github/workflows/canary.yml derives the canary
tag from (strips suffix → base 1.50.0 → next available v1.50.0-canary.N).

This kicks off the 1.50.0 release train, opened by the MVP/TDD/UAT vertical
slice landed across PRs #2867, #2874, #2878, #2880, #2883.

* docs: add CANARY stream README + v1.50.0-canary.1 release notes

- docs/CANARY.md — explains the dev→@canary stream policy, install/rollback
  paths, and when (not) to install canary builds
- docs/RELEASE-v1.50.0-canary.1.md — release notes for the first 1.50.0
  canary cut: vertical MVP/TDD/UAT slice (#2867 + #2874 + #2878 + #2880 +
  #2883), opening the 1.50.0 train under PRD #2826
- docs/README.md — index entry + quick link for the canary stream

* fix(ci/canary): publish gate checks dev branch, not main

Four publish-step `if:` conditions in .github/workflows/canary.yml were
checking `github.ref == 'refs/heads/main'`. Those steps (Tag and push,
Publish to npm, Publish SDK to npm, Verify publish) therefore always
skipped on every workflow_dispatch invocation since canary runs from dev,
never main.

The workflow's own header comment is unambiguous: `dev → @canary`. The
gate was a copy-paste from release.yml (which correctly targets main for
the @next/@latest streams) that was never corrected for the canary stream.

This is why the 1.50.0-canary.1 publish hadn't materialized despite three
green workflow runs. With the gate corrected, the next dispatch will
actually publish.

* ci(release-sdk): make release-sdk.yml dispatchable from the dev branch

The workflow lives on main only, so the GitHub Actions "Use workflow
from" dropdown doesn't list dev — meaning dev → @dev publishes can't be
triggered from the dev branch directly. Add the file to dev so an
operator can dispatch it with branch=dev and tag=dev.

Per project release-stream policy: dev branch publishes canary (@dev).
This is the stream that needs the file most, since main never publishes
@dev itself (main does @next / @latest).

File is byte-identical to main's release-sdk.yml — straight propagation,
no behavioral change. Tracking issues #2925, #2929.

* docs(mvp): canary-prep concept cleanup — CONTEXT.md, mvp-concepts index, --prd interaction (#3176)

* chore(mvp): concept cleanup + cross-ref index for v1.50.0-canary.2 prep

- CONTEXT.md gains 7 MVP domain terms (MVP Mode, User Story, Walking
  Skeleton, Vertical Slice, Behavior-Adding Task, MVP+TDD Gate, SPIDR
  Splitting) so the project glossary matches the shipped surface.
- New get-shit-done/references/mvp-concepts.md indexes the six MVP
  reference files and concept-to-file map so agents and contributors
  can find the right canonical doc without grepping.
- plan-phase.md Walking Skeleton block now documents that --mvp and
  --prd compose orthogonally on Phase 1; no precedence needed.
- INVENTORY/INVENTORY-MANIFEST refreshed for the new reference (58 -> 59).

No behavior change. Canary-prep cleanup ahead of v1.50.0-canary.2.

Surfaced for follow-up (not in this PR):
- MVP_MODE resolution shell block duplicated across plan-phase,
  execute-phase, verify-work workflows (needs a shared workflow-include
  mechanism; structural change).
- Behavior-Adding Task predicate is prose-only; no shared utility.
- User Story regex hardcoded in verify-work; would benefit from a
  central definition consumed by the verifier and the mvp-phase command.

* chore(changeset): set PR number for mvp concept cleanup

* feat(mvp): centralize resolution surfaces + fix SDK roadmap mode parity (#3178)

Three new SDK query verbs replace the architectural duplication surfaced by
the v1.50.0-canary.2 review against dev tip 12c4e565:

  phase.mvp-mode <N> [--cli-flag]
    Single canonical precedence resolver (CLI flag -> ROADMAP **Mode:** mvp
    -> workflow.mvp_mode config -> false). Replaces 4-8 lines of bash that
    were duplicated across plan-phase.md, execute-phase.md, verify-work.md,
    and progress.md. Returns {active, source, roadmap_mode, config_mvp_mode,
    cli_flag_present}.

  task.is-behavior-adding <plan-file> | --task-content <xml>
    Behavior-Adding Task predicate (tdd="true" + <behavior> block + non-test
    source files in <files>). Replaces prose-only specification in
    references/execute-mvp-tdd.md; gsd-executor agent now invokes the verb
    instead of re-inlining the three checks. Returns {is_behavior_adding,
    checks, reason}.

  user-story.validate <text> | --story <text>
    Owns the canonical User Story regex /^As a .+, I want to .+, so that .+\.$/
    previously hardcoded in verify-work.md prose. Consumed by gsd-verifier
    (phase-goal guard) and /gsd-mvp-phase (interactive-prompt validation).
    Returns {valid, slots: {role, capability, outcome}, errors[]}.
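
A sketch of what a validator built on that regex might look like. The capture groups are added here to fill the three slots, and the function name, the `null` slots on failure, and the error text are all assumptions; only the overall pattern and the result field names come from the message above.

```typescript
// Same shape as the canonical regex, with capture groups added for the slots.
const USER_STORY = /^As a (.+), I want to (.+), so that (.+)\.$/;

interface StoryResult {
  valid: boolean;
  slots: { role: string; capability: string; outcome: string } | null;
  errors: string[];
}

function validateUserStory(text: string): StoryResult {
  const m = USER_STORY.exec(text.trim());
  if (!m) {
    return {
      valid: false,
      slots: null,
      errors: ["expected: As a <role>, I want to <capability>, so that <outcome>."],
    };
  }
  return { valid: true, slots: { role: m[1], capability: m[2], outcome: m[3] }, errors: [] };
}
```

Note that the trailing period is required by the pattern, so a story without it is rejected.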

Bug fix bundled: sdk/src/query/roadmap.ts searchPhaseInContent now extracts
the mode field from **Mode:**, restoring parity with roadmap.cjs:120-123.
Without this, roadmap.get-phase --pick mode returned null on the native
dispatch path even when the phase had **Mode:** mvp set, causing MVP_MODE
to silently fall through to the config/false branch in every consuming
workflow. The original Phase 1 PR (#2885) shipped the CJS parser but the
SDK port omitted the field; this fix brings them back to parity.

Workflows + agents updated to call the verbs:
  - plan-phase.md, execute-phase.md, verify-work.md, progress.md call
    phase.mvp-mode (one line replaces the duplicated bash chains).
  - execute-phase.md MVP+TDD gate calls task.is-behavior-adding.
  - verify-work.md goal guard calls user-story.validate.
  - mvp-phase.md interactive prompt validates via user-story.validate.
  - gsd-executor agent references task.is-behavior-adding instead of prose.
  - gsd-verifier agent references user-story.validate instead of inlined regex.

Tests: 24 new vitest tests in sdk/src/query/mvp.test.ts cover all three
verbs + the regression. Two existing contract tests (progress, verify)
updated to assert on the new verb shape. All 60 existing MVP contract
tests pass; golden integration suite (38 + 42 tests) passes.

Closes #3177

* fix(canary.2): unblock release gates for v1.50.0-canary.2

Run 25451329660 (Release SDK Bundle on dev, 2026-05-06T17:41) failed at the
test-suite step with 3 deterministic content/structure gate failures, all
attributable to the MVP umbrella integration in #3178 and the docs sweep
in #3180.

Failure 1: /gsd-mvp-phase undocumented in workflows/help.md
  - tests/bug-2954-help-md-slash-command-stubs.test.cjs requires every
    shipped commands/gsd/<X>.md to have a /gsd-<X> mention in help.md
  - PR #3180 updated docs/COMMANDS.md but missed help.md (which the AI
    agents load in-product)
  - Fix: add a /gsd-mvp-phase entry to help.md right before /gsd-plan-phase

Failures 2 + 3: execute-phase.md (1727) and plan-phase.md (1714) over XL budget (1700)
  - PR #3178 added MVP-mode verb calls (phase.mvp-mode, task.is-behavior-adding,
    user-story.validate) to both workflow files, pushing them past 1700 lines
  - Fix: bump XL_BUDGET 1700 -> 1800 with inline comment pointing at the
    structural follow-up (extract MVP bodies to <workflow>/modes/mvp.md per
    the discuss-phase/modes/ precedent)
  - The structural extract is the right long-term fix but is bigger than
    canary unblock scope; will land in a follow-up after canary cycles

Local verification:
  $ node --test tests/bug-2954-help-md-slash-command-stubs.test.cjs tests/workflow-size-budget.test.cjs
  tests 111  pass 111  fail 0

After this lands, re-trigger Release SDK Bundle on dev for v1.50.0-canary.2.

* chore(changeset): set PR number for canary.2 unblock

* fix(codex): generate-claude-md writes to AGENTS.md on Codex runtime

When config.runtime === 'codex' or GSD_RUNTIME=codex, override the
output target to AGENTS.md regardless of claude_md_path, so Codex
projects no longer have GSD sections written to CLAUDE.md by mistake.

Fixes both the CJS (gsd-tools) and SDK (profile-output.ts) paths.
Explicit --output flags are still honoured in both paths.
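
The target selection described above might reduce to something like this sketch. The function and option names are hypothetical, and the `CLAUDE.md` default when claude_md_path is unset is an assumption.

```typescript
interface OutputOptions {
  explicitOutput?: string; // value of --output, if given
  configRuntime?: string;  // config.runtime
  envRuntime?: string;     // GSD_RUNTIME
  claudeMdPath?: string;   // claude_md_path from config
}

function resolveOutputTarget(opts: OutputOptions): string {
  // An explicit --output is still honoured ahead of everything else.
  if (opts.explicitOutput) return opts.explicitOutput;
  // On the Codex runtime, write to AGENTS.md regardless of claude_md_path.
  if (opts.configRuntime === "codex" || opts.envRuntime === "codex") return "AGENTS.md";
  return opts.claudeMdPath ?? "CLAUDE.md"; // assumed default
}
```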

Closes #3163

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(plan-phase): remove agent: directive that caused OpenCode subagent dispatch

On OpenCode, any command with `agent: <name>` in its frontmatter is
auto-dispatched to a subagent context where the Agent tool is unavailable.
plan-phase.md and mvp-phase.md both carried `agent: gsd-planner`, causing
them to run inside gsd-planner's subagent context with no ability to spawn
researcher/planner/checker subagents — the orchestrator fell back to inline
execution for all three phases.

Fix: remove `agent: gsd-planner` from both command files so they run in the
main agent context. Also replace the stale `Task` tool in allowed-tools with
`Agent` (the correct dispatcher tool name post-#3168 rename).

Adds a structural regression test that parses YAML frontmatter of every
commands/gsd/*.md file and asserts no command carries an `agent:` directive.

Closes #3156

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(mvp): address CodeRabbit workflow and contract findings

* fix(execute-phase): use registered state.update query command

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 21:51:38 -04:00
Tom Boucher
4237b0d78e Merge pull request #3106 from nicholasferrer/fix/3061-commit-pathspec-leak
fix(commit): scope every commit call to its staged pathspec
2026-05-06 20:55:30 -04:00
Tom Boucher
d44fcee013 Merge pull request #3110 from patrickclery/fix/3100-search-dirs-colon-leaks
fix: replace stale /gsd: references in agents/, sdk/src/, and .clinerules
2026-05-06 20:52:43 -04:00
Tom Boucher
995e24431b Merge pull request #3193 from gsd-build/fix/2641-details-summary-milestone-anchor
fix(sdk): extractCurrentMilestone supports <details><summary> milestone headers (#2641)
2026-05-06 20:40:13 -04:00
Tom Boucher
ca148036d2 fix(roadmap): prevent milestone version substring matches 2026-05-06 20:37:18 -04:00
Ben Lamm
13bf56477a fix(#2641): symmetric attribute tolerance in stripShippedMilestones + lockdown tests
Address CodeRabbit follow-up review on PR #3046. One real bug + two lockdown
gaps + one defensive assertion.

REAL BUG — sibling-asymmetry in <details> attribute tolerance:
  extractCurrentMilestone's <details>-aware fallback uses <details\b[^>]*>
  to tolerate attributes (#2641 hardening commit). stripShippedMilestones
  still used literal <details>, so shipped content wrapped in
  `<details open>` (or any attributed tag) leaked through the strip.
  This is the failure mode trek-e's review almost caught with the
  "<details open>" / extended-attribute test gap I deferred — CodeRabbit
  caught the deeper issue: it's not just a test gap, it's an actual
  asymmetry between the two functions that handle <details> blocks.

  Fix: align stripShippedMilestones's regex with extractCurrentMilestone's
  <details\b[^>]*> form. Comment explicitly notes the symmetry contract so
  a future change to either function flags the other.

  Tests added in stripShippedMilestones describe block:
  - removes <details open> blocks
  - removes <details class="..." data-...="..."> blocks
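
A small sketch of the asymmetry, comparing the literal opening tag with the attribute-tolerant form. The `strip` helper is hypothetical and much simpler than stripShippedMilestones; only the two opening-tag patterns come from the message above.

```typescript
// Literal form: only matches a bare <details> opening tag.
const LITERAL_DETAILS = /<details>[\s\S]*?<\/details>/gi;
// Tolerant form: \b[^>]*> also accepts attributes such as "open" or class="...".
const TOLERANT_DETAILS = /<details\b[^>]*>[\s\S]*?<\/details>/gi;

// Hypothetical helper: remove every block matched by the given pattern.
function strip(content: string, pattern: RegExp): string {
  return content.replace(pattern, "");
}

const sample = "before<details open>shipped v0.8</details>after";
const leaked = strip(sample, LITERAL_DETAILS);    // block survives the strip
const stripped = strip(sample, TOLERANT_DETAILS); // block is removed
```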

LOCKDOWN — leading-# strip in synthesized heading:
  My existing inline-HTML test exercised tag-stripping but didn't directly
  exercise the leading-# strip path (`.replace(/^#+\s*/, '')`). Added a
  dedicated test with `<summary># v0.9 Hash-Prefixed</summary>` so a
  future refactor that drops the strip would fail loudly instead of
  producing `## # v0.9 …` (which downstream `#{2,4}` regex parses as a
  4-hash header).
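
A sketch of the heading synthesis this test locks down. The function name and the tag-stripping regex are assumptions; the leading-# strip `.replace(/^#+\s*/, '')` is the one quoted above.

```typescript
// Hypothetical helper: turn captured <summary> text into a "## " heading.
function synthesizeHeading(summary: string): string {
  const text = summary
    .replace(/<[^>]+>/g, "") // assumed tag strip for inline HTML in the summary
    .trim()
    .replace(/^#+\s*/, "");  // leading-# strip quoted in the commit message
  return `## ${text}`;
}
```

Without the second `replace`, a summary of `# v0.9 Hash-Prefixed` would be promoted to `## # v0.9 Hash-Prefixed`.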

DEFENSIVE — toBeDefined guard in roadmapAnalyze regression test:
  Added `expect(data.milestones).toBeDefined()` before casting and
  calling `.some()`. Failure now reports "expected undefined to be
  defined" instead of TypeError.

META: my prior adversarial pass missed the sibling-asymmetry because
the checklist's "sibling consistency" item only audited PARSERS for the
same INPUT field (STATE.md's `milestone:`), not ADJACENT FUNCTIONS that
process the same DATA SHAPE (<details> blocks). The latter is a wider
audit — every adjacent function that touches the data shape my new code
relies on. Will refine the learned rule.

Verification: 51/51 roadmap.test.ts pass (was 48; +3 tests).
FAMP smoke unchanged: roadmap.get-phase 3 returns active milestone phase.
2026-05-06 15:41:27 -04:00
Ben Lamm
19041b8824 test(#2641): lockdown tests from self-adversarial pass
Self-run adversarial pass on PR #3046 before next reviewer round-trip.
Three lockdown tests added — none uncovered new bugs, all lock current
behavior so a future change doesn't silently flip a convention.

1. Single-quote YAML version (`milestone: 'v0.9'`)
   Parity with the existing double-quote test. The strip pattern
   `/^["']|["']$/g` handles both — locked here so a future change to
   either character class doesn't silently regress one form.

2. Heading-anchor wins over <details> fallback (precedence lock)
   When a ROADMAP has BOTH `### v0.9` heading AND
   `<details><summary>v0.9</summary>` block, the heading-level lookup
   matches first and the fallback never fires. Test asserts the heading
   slice is returned starting at offset 0 AND the synthesized
   `## v0.9 ... details-anchored` heading is NOT prepended (proves
   fallback didn't run). Also documented in-test that the heading-anchor
   slice naturally includes downstream <details> blocks verbatim — a
   property of the heading path, not of this PR's fallback.

3. Multiple <details> blocks for same version → first match wins
   `content.match(detailsPattern)` (non-`g`) returns first match in
   document order. Locked so a future change to the matcher (e.g.
   switching to `matchAll` and picking last) doesn't silently change
   which block is treated as active.

Adversarial-checklist coverage on commit 781cc6f8:
- Boundary cases: empty / whitespace / single-char / single-quote /
  double-quote / digit-suffix (`v0.91`) / dot-suffix (`v0.9.0`) /
  hyphen-suffix (`v0.9-rc.1`, intentional same-milestone match per
  existing currentVersionStr convention) — all covered.
- Sibling consistency: parseMilestoneFromState, getMilestoneInfo,
  extractCurrentMilestone all strip quotes identically.
- Comment-vs-behavior: walked nested-guard, empty-guard, lookahead,
  tag-strip by hand against the regex; all comments accurate.
- Downstream consumers: roadmapAnalyze + roadmapGetPhase both verified
  end-to-end via tests + FAMP smoke.
- Failure-mode locality: all fall-through paths produce loud failures
  (empty arrays, `{found:false}`); no silent confident-wrong outputs.

48/48 roadmap.test.ts tests pass.
2026-05-06 15:41:27 -04:00
Ben Lamm
4b66ca5800 fix(#2641): harden <details> fallback per trek-e review
Address trek-e's adversarial review on PR #3046. Two critical merge-blockers
plus four hardening items, all now covered with tests.

CRITICAL #1 — substring-version trap:
  `[^<]*${escapedVersion}[^<]*` did substring containment, so
  `milestone: v0.1` matched <summary>v0.10 …</summary> and returned the
  v0.10 block's body as the active milestone — confidently-wrong content
  worse than the pre-PR fall-through. Add `(?![\d.])` non-version-character
  lookahead, mirroring the same boundary protection used by the existing
  `currentVersionStr` logic on the heading path. Test asserts v0.1 active
  with v0.10 sibling block returns v0.1's phases, not v0.10's.
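
The substring trap and the lookahead fix can be reproduced in isolation. This is a reduced sketch: the helper names are hypothetical and the pattern omits the surrounding `<summary>` anchoring; only the `(?![\d.])` boundary is taken from the fix described above.

```typescript
// Escape regex metacharacters so "v0.1" is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// bounded=false reproduces plain containment; bounded=true adds the lookahead.
function summaryMentionsVersion(summary: string, version: string, bounded: boolean): boolean {
  const boundary = bounded ? "(?![\\d.])" : "";
  return new RegExp(escapeRegExp(version) + boundary).test(summary);
}
```

With `bounded` set, `v0.1` no longer matches a summary of `v0.10 Next`, because the character after the version is a digit. It also stops matching `v0.1.0`, where the next character is a dot.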

CRITICAL #2 — nested <details> silent truncation:
  The lazy `[\s\S]*?</details>` terminates on the FIRST </details>, which
  is the inner closer when nesting is present. Prior comment claimed
  "would mis-anchor (acceptable; falls through)" — factually wrong: the
  match succeeds with truncated body and is returned with a confident
  `## ${summary}` heading. Future maintainer investigating a "missing
  phase" report would be misled. Add `!detailsMatch[2].includes('<details')`
  guard so nesting falls through to stripShippedMilestones (loud failure)
  instead of returning truncated content (silent failure). Test locks
  the contract: no synthesized v0.9 heading anchored to truncated body.
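
A reduced sketch of the nesting guard. The regex here is simplified (no version matching, plain-text summary only) and the function name is hypothetical; the `includes('<details')` check on the captured body mirrors the guard named above.

```typescript
// Simplified pattern: group 1 is the summary text, group 2 the lazily matched body.
const DETAILS_BLOCK =
  /<details\b[^>]*>\s*<summary\b[^>]*>([^<]*)<\/summary>([\s\S]*?)<\/details>/i;

function extractDetailsBody(content: string): string | null {
  const m = DETAILS_BLOCK.exec(content);
  if (!m) return null;
  const body = m[2];
  // The lazy match stops at the FIRST </details>. If the body contains another
  // opening tag, that closer belonged to the inner block and the body is truncated.
  if (body.includes("<details")) return null; // fall through instead of returning it
  return body;
}
```

Returning `null` here hands control back to the caller's fallback path, so a nested ROADMAP fails visibly rather than yielding a partial milestone.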

HARDENING:
  - Empty-body guard: `<details><summary>v0.9</summary></details>` would
    synthesize `## v0.9\n` (phantom milestone, zero phases, no error
    signal). Treat as no-match.
  - Inline-HTML in <summary>: rejected by `[^<]*` capture. Widen to
    `(?:(?!</summary>).)*?` (non-greedy until close tag) and strip tags
    + leading `#` from the captured summary before promoting to a `##`
    heading. Covers GitHub-rendered <em>(active)</em>, <code>v0.9</code>,
    <strong>...</strong> patterns.
  - JSDoc: rewrote to describe both anchoring strategies and the
    synthesized-heading contract; demoted stale "Port of core.cjs lines
    1102-1170" to historical context with the divergence list.
  - Comment block: rewrote in contract style ("any consumer scanning
    /##\s*.*vX.Y/ sees the active milestone") instead of coupling to
    specific call sites (roadmapAnalyze, "later in this file"). Adds
    explicit regex anatomy + hardening-guards section so future readers
    can audit each guard.

OUT OF SCOPE (per trek-e's "Recommended action" tier):
  - Debug logging on fall-through paths (Suggestion #10) — adds tracing
    surface to a function that doesn't currently use logger; appropriate
    for a follow-up if/when other extraction bugs surface.
  - Uppercase <DETAILS>/<SUMMARY> + extended attribute coverage
    (Test gap #7 last two rows) — already covered by the documented `i`
    flag and the existing <details open> test; adding redundant cases
    inflates the test set without locking new contracts.

Verification: 45/45 roadmap.test.ts tests pass (was 41/41; added 4
hardening tests). FAMP end-to-end smoke unchanged: roadmap.get-phase 3
returns "Claude Code integration polish", roadmap.analyze surfaces
v0.9 Local-First Bus in data.milestones with phase_count: 4.
2026-05-06 15:41:27 -04:00
Ben Lamm
c8239f67f8 fix(#2641): inject normalized ## heading from <details><summary> capture
Address CodeRabbit review on PR #3046: the prior commit returned only the
body inside <details>...</details>, which fixed the `roadmapGetPhase`
miss but left `roadmapAnalyze`'s downstream `data.milestones` scan
(`/##\s*(.*v(\d+(?:\.\d+)+)[^(\n]*)/gi` at the bottom of roadmap.ts)
without an active-milestone anchor in the returned slice.

Now capture the <summary> text and prepend it as a synthesized `##`
heading on the returned slice. This makes both `data.phases` (the
original bug) AND `data.milestones` (the downstream consumer) surface
the active milestone correctly for <details>-wrapped ROADMAPs.

Also widened the inner tag to `<summary\b[^>]*>` for symmetry with the
outer `<details\b[^>]*>` — both now tolerate attributes.

Verified end-to-end against FAMP's v0.9 ROADMAP:
- Before this commit (after PR #3046 base):
    milestones: [{heading: '# Phase 1: ... (v0.5.2 atomic bump)', version: 'v0.5.2'}]
- After this commit:
    milestones: [{heading: 'v0.9 Local-First Bus', version: 'v0.9'},
                 {heading: '# Phase 1: ... (v0.5.2 ...)', version: 'v0.5.2'}]

(The v0.5.2 entry is pre-existing noise from the loose `##\s*` regex
matching the `### Phase 1: famp-bus (v0.5.2 atomic bump)` body heading;
unrelated to this fix and out of scope for this PR.)

Tests:
- Updated the two `<details><summary>` tests to assert the synthesized
  `## v0.9 Local-First Bus` heading is present on the returned slice.
- Added a 4th regression test (`roadmapAnalyze`) confirming
  `data.milestones` now contains the active milestone for
  <details>-wrapped ROADMAPs.
- All 40 roadmap.test.ts tests pass.
2026-05-06 15:41:27 -04:00
Ben Lamm
ba6a3efc3e fix(#2641): strip YAML quotes from STATE.md milestone version
Address CodeRabbit review on PR #3046: extractCurrentMilestone read the
`milestone:` value from STATE.md frontmatter via `.trim()` only, while
parseMilestoneFromState() and getMilestoneInfo() both also strip
surrounding YAML quotes via `.replace(/^["']|["']$/g, '')`.

For projects whose STATE.md uses quoted YAML (`milestone: "v0.9"`),
`version` carried literal quotes, `escapedVersion` became `\"v0\.9\"`,
and neither the markdown-heading regex nor the new <details><summary>
fallback could match anything — falling through to
stripShippedMilestones() and reintroducing the same archived-milestone
misrouting this PR addresses.

Strip quotes for parity. Three-line addition + one new test.
All 41 roadmap.test.ts tests pass.
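
The quote-stripping step, shown on its own. The function name is hypothetical; the `.trim()` plus `.replace(/^["']|["']$/g, '')` combination is the one described above.

```typescript
// Normalise a frontmatter value such as `"v0.9"` or `'v0.9'` to `v0.9`.
function normalizeMilestoneVersion(raw: string): string {
  return raw.trim().replace(/^["']|["']$/g, "");
}
```

Only one quote character is removed from each end, so interior quotes are left alone.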
2026-05-06 15:41:27 -04:00
Ben Lamm
592b676414 fix(#2641): recognize <details><summary> as active-milestone anchor
`extractCurrentMilestone` only matched markdown headings (## v0.9, ### v0.9)
to find the active milestone slice. Projects that wrap their active
milestone's phase details inside `<details><summary>vX.Y …</summary>`
(a common GitHub-friendly collapse pattern, e.g. FAMP) fell through to
`stripShippedMilestones`, which strips ALL `<details>` blocks indiscriminately.

Net effect: `roadmapGetPhase` returned `{found:false}` for phases that ARE in
the active ROADMAP. The `init.phase-op` safety guard at `init.ts:133`
('drop archived disk match when phase is in current ROADMAP') depends on
`roadmapPhase.found`, so it didn't fire. `init.phase-op` then returned a
`phase_dir` pointing at an ARCHIVED milestone's same-numbered phase —
silently routing downstream workflows (e.g. /gsd-discuss-phase) into
completed phases.

Fix: when no markdown heading matches the active version, try matching
`<details\b[^>]*><summary>...vX.Y...</summary>`. Returns the inner content
of the matching block. Purely additive — `stripShippedMilestones` behavior
and its tests are unchanged.

The `\b[^>]*>` form tolerates attributes like `<details open>` or
`<details class="...">` (GitHub commonly emits `<details open>` for
default-expanded sections). Lazy `[\s\S]*?` matches up to the first
`</details>`; nested `<details>` inside the active milestone are not
expected and would mis-anchor (acceptable; falls through to the existing
`stripShippedMilestones` path with no regression vs. today's behavior).

Closes #2641. Distinct from the closed #2642 which bundled three orthogonal
changes (parser fix + checkbox-scan fix + STATE.md counting auth) into one
PR; this PR addresses only the parser anchoring bug, leaving
`stripShippedMilestones`, `roadmapAnalyze`, and `initMilestoneOp` untouched.

Tests added (3, all in `roadmap.test.ts`):
- `bug-2641: finds active milestone wrapped in <details><summary>vX.Y …</summary>`
- `bug-2641: finds active milestone in <details open><summary>vX.Y …</summary>`
- `bug-2641: returns found:true for phase inside <details>-wrapped active milestone` (end-to-end via `roadmapGetPhase`)

All existing `roadmap.test.ts` tests pass (39/39). Real-world repro
verified against an FAMP-style ROADMAP: before the fix,
`gsd-sdk query roadmap.get-phase 3` returned `{found:false}` despite the
phase being at line 113 of the active ROADMAP; after the fix, it returns
the correct phase metadata, and `init.phase-op 3` no longer returns the
v0.8 archived `phase_dir`.
2026-05-06 15:41:27 -04:00
Tom Boucher
a1a81eec90 fix(config): align SDK runtime-state key validation with CJS 2026-05-06 15:19:34 -04:00
Tom Boucher
96ce608ee6 fix(config): add resolve_model_ids to VALID_CONFIG_KEYS; accept workflow._auto_chain_active via RUNTIME_STATE_KEYS
Fixes #3162

`resolve_model_ids` is a documented top-level config key (CONFIGURATION.md)
read by core.cjs and session-runner.ts, but was missing from the CJS and SDK
VALID_CONFIG_KEYS allowlists — causing config-set to reject it with
"Unknown config key".

`workflow._auto_chain_active` is internal runtime state intentionally excluded
from VALID_CONFIG_KEYS by #2530, but plan-phase, execute-phase, discuss-phase,
transition, and new-project workflows all write it via `config-set`. Without
a valid write path these calls emit spurious errors (silenced with `|| true`
but noisy in logs). A new RUNTIME_STATE_KEYS set in config-schema.cjs holds
keys that isValidConfigKey() accepts without exposing them as user-settable
options — preserving the #2530 intent while fixing the runtime error.
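
The two-set arrangement can be sketched as below. This is an illustration under assumptions: the key lists are abbreviated to the two keys named in this message, and `listUserSettableKeys` is a hypothetical helper standing in for whatever surfaces options to users.

```typescript
// Abbreviated, assumed contents; the real allowlists hold many more keys.
const VALID_CONFIG_KEYS: ReadonlySet<string> = new Set(["resolve_model_ids"]);
const RUNTIME_STATE_KEYS: ReadonlySet<string> = new Set(["workflow._auto_chain_active"]);

// Accepts both user-settable keys and internal runtime-state keys, so
// workflow writes via config-set no longer fail with "Unknown config key".
function isValidConfigKey(key: string): boolean {
  return VALID_CONFIG_KEYS.has(key) || RUNTIME_STATE_KEYS.has(key);
}

// Hypothetical helper: only VALID_CONFIG_KEYS is advertised to users,
// so runtime-state keys stay out of the documented option list.
function listUserSettableKeys(): string[] {
  return Array.from(VALID_CONFIG_KEYS).sort();
}
```

Keeping the runtime keys in a separate set means validation and documentation can diverge on purpose: a key can be writable without being advertised.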

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 14:56:34 -04:00
Tom Boucher
3785c09307 fix(sdk): include hotpath fallback reason in bridge observability 2026-05-05 20:29:43 -04:00
Tom Boucher
fe16143e29 fix(sdk): align hotpath observability with actual dispatch mode 2026-05-05 20:22:03 -04:00
Tom Boucher
8ad2e3877f fix(sdk): address CodeRabbit runtime bridge and docs findings 2026-05-05 19:59:56 -04:00
Tom Boucher
51b809e8e9 feat(sdk): expose runtime bridge controls via GSD options 2026-05-05 19:36:47 -04:00
Tom Boucher
00ba404b60 test(sdk): enforce runtime bridge seam and explicit no-fallback behavior 2026-05-05 19:31:03 -04:00
Tom Boucher
54b06e653e docs(sdk): document runtime bridge seam, strict mode, and fallback policy 2026-05-05 19:29:59 -04:00
Tom Boucher
1bd11ab699 feat(sdk): emit runtime bridge dispatch observability events 2026-05-05 19:26:01 -04:00
Tom Boucher
0026065c7a feat(sdk): add strict mode and explicit fallback policy to runtime bridge 2026-05-05 19:23:39 -04:00
Tom Boucher
98dd9e4afb refactor(sdk): add runtime bridge seam for query dispatch 2026-05-05 19:21:31 -04:00
Tom Boucher
9811782e6d fix(#3121): implement commands verb in SDK native registry (#3146)
- Add commandsList handler — returns sorted JSON array of all registered
  verb strings; satisfies workstream-flag.md + agent tooling discoverability
- Register ['commands', commandsList] in DECISION_ROUTING_STATIC_CATALOG
- Add golden-policy exemption (SDK-only, no CJS mirror needed)
- check.decision-coverage-plan/verify were already registered; commands was the remaining gap
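
A handler of this shape might look like the sketch below. The `Handler` type, the `Map`-based registry, and `makeCommandsList` are assumptions for illustration; the actual catalog is `DECISION_ROUTING_STATIC_CATALOG`, whose structure is not shown here.

```typescript
// Assumed handler signature, for illustration only.
type Handler = (args: string[]) => unknown;

// Hypothetical factory: builds a handler that lists every registered verb,
// sorted so the output is stable across runs.
function makeCommandsList(registry: ReadonlyMap<string, Handler>): Handler {
  return () => Array.from(registry.keys()).sort();
}

// Stand-in registry with placeholder handlers.
const registry = new Map<string, Handler>();
registry.set("roadmap.get-phase", () => ({ found: false }));
registry.set("init.phase-op", () => ({}));
// The commands verb registers itself, so it appears in its own output.
registry.set("commands", makeCommandsList(registry));

const listed = registry.get("commands")?.([]) as string[];
console.log(JSON.stringify(listed));
```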

Closes #3121
2026-05-05 15:02:34 -04:00
Nicholas Ferrer
85ef9553d2 fix(commit): scope every commit call to its staged pathspec
The commit handler ran `git add <paths>` followed by `git commit`
without a pathspec, so anything pre-staged externally before the
handler ran was swept into the commit. #2767 fixed every call site
to use --files but left the handler emitting a pathspec-less commit,
so the bug persisted even when callers passed --files correctly.

Compute pathsToCommit once and pass `'--', ...pathsToCommit` to every
git commit invocation: regular, --amend, and commit-to-subrepo. The
staged-files check uses the same pathspec so "nothing staged" reflects
what would actually be committed, not unrelated index entries.

Two follow-up safeguards on the same surface:

* When `--files` is passed but every following token gets filtered out
  (e.g. `--files --no-verify`), reject with `--files requires at least
  one path` instead of silently falling back to .planning/.
* Both `git add` invocations now use the `--` separator so a path
  starting with `-` (e.g. a file literally named `-A.md`) is treated
  as a pathspec rather than a git option.
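
The argument handling described above can be sketched as follows. The function names are hypothetical, and the rule that tokens starting with `--` are filtered out of the `--files` list is an assumption inferred from the `--files --no-verify` example.

```typescript
// Hypothetical: resolve the paths to commit from handler arguments.
// Without --files, fall back to .planning/ (per the message above).
function resolvePathsToCommit(argv: string[]): string[] {
  const i = argv.indexOf("--files");
  if (i === -1) return [".planning/"];
  // Assumption: option-like tokens ("--…") are not paths.
  const paths = argv.slice(i + 1).filter((tok) => !tok.startsWith("--"));
  if (paths.length === 0) {
    throw new Error("--files requires at least one path");
  }
  return paths;
}

// Hypothetical: build git argument vectors scoped to the same pathspec.
// The "--" separator makes a path such as "-A.md" a pathspec, not an option.
function buildGitArgs(message: string, pathsToCommit: string[], amend = false): string[][] {
  const add = ["add", "--", ...pathsToCommit];
  const commit = [
    "commit",
    ...(amend ? ["--amend"] : []),
    "-m",
    message,
    "--",
    ...pathsToCommit,
  ];
  return [add, commit];
}

console.log(buildGitArgs("docs: update", resolvePathsToCommit(["--files", "-A.md"])));
```

Computing the path list once and reusing it for both `git add` and `git commit` is what keeps externally pre-staged files out of the commit.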

Adds five regression tests in `commit.test.ts`: three covering the
pathspec scope (`--files`, `.planning/` fallback, and `--amend` with
pre-staged unrelated changes), one covering the empty `--files`
rejection, and one covering the `-A.md` round-trip.

Closes #3061
2026-05-05 15:30:05 -03:00
Patrick Clery
f9c1f01971 fix: extend fix-slash-commands SEARCH_DIRS to agents/, sdk/src/, .clinerules
scripts/fix-slash-commands.cjs SEARCH_DIRS did not cover agents/, sdk/src/,
or top-level files, so 9 colon-form references survived in 6 files. The hit
at agents/gsd-codebase-mapper.md:105 propagated into ~/.claude/agents/ at
install time (the fixer is not wired into install) and produced unrunnable
/gsd:<cmd> suggestions in agent output on non-Gemini runtimes.

This commit includes Pass 1 (the 9 line edits) AND Pass 2 (extending the
fixer's SEARCH_DIRS so future regressions are auto-rewritten and caught by
the bug-2543 guard, which mirrors that list). The standalone bug-3100 test
added in the prior revision is removed in favor of the bug-2543 guard's
extended scan, per CONTRIBUTING.md test standards (no source-grep tests on
non-.md files).
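
For orientation, the kind of rewrite the fixer performs might resemble the sketch below. This is an assumption-heavy illustration: `rewriteColonCommands` is a hypothetical name, and the hyphenated target form (`/gsd-<cmd>`) is inferred from commands such as `/gsd-discuss-phase` rather than taken from `fix-slash-commands.cjs` itself.

```typescript
// Hypothetical: rewrite colon-form references (/gsd:<cmd>) to the
// hyphenated form (/gsd-<cmd>) and report how many were changed.
function rewriteColonCommands(text: string): { text: string; count: number } {
  let count = 0;
  const rewritten = text.replace(/\/gsd:([a-z0-9][a-z0-9-]*)/g, (_match, cmd: string) => {
    count += 1;
    return `/gsd-${cmd}`;
  });
  return { text: rewritten, count };
}

console.log(rewriteColonCommands("Run /gsd:discuss-phase, then /gsd:plan-phase."));
```

A guard test can apply the same pattern across the scanned directories and fail when the count is non-zero.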

Refs #3100
2026-05-05 13:19:10 -04:00
Tom Boucher
aa64638176 Merge pull request #3112 from gsd-build/fix/3101-plan-summary-matcher-in-core-cjs-reports
fix: canonicalize plan-summary matching for suffixless summaries
2026-05-04 23:35:34 -04:00
Tom Boucher
e7ecd46bbe Merge pull request #3115 from gsd-build/fix/3053-sdk-ignores-multi-plan-phase-layout-plan
fix: count nested plans/ layout in phase status indexing
2026-05-04 23:35:26 -04:00
Tom Boucher
083e813aea Merge pull request #3116 from gsd-build/fix/3055-bug-top-level-branching-strategy-in-plan
fix: normalize legacy top-level branching_strategy into git config
2026-05-04 23:33:28 -04:00
Tom Boucher
f01f6b76dd Merge pull request #3122 from gsd-build/fix/3088-gsd-complete-milestone-leaves-state-md-n
fix: normalize stale STATE narrative tails on milestone completion
2026-05-04 23:31:06 -04:00
Tom Boucher
67684626d8 fix(3088): append missing STATE narrative sections on milestone close 2026-05-04 23:29:45 -04:00
coderabbitai[bot]
2d25c97706 fix: apply CodeRabbit auto-fixes
Fixed 1 file(s) based on 2 unresolved review comments.

Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
2026-05-05 03:17:22 +00:00
Tom Boucher
2dcf374da0 fix(milestone): normalize STATE narrative after milestone completion 2026-05-04 23:17:00 -04:00
Tom Boucher
58062a64a0 fix(sdk-config): honor legacy top-level branching_strategy in init 2026-05-04 23:06:54 -04:00
Tom Boucher
65024683fd fix(init): count plans/ summaries from nested plans/ layout 2026-05-04 23:03:10 -04:00
Tom Boucher
c7886415c3 fix(phase): canonicalize plan-summary matching for suffixless summaries 2026-05-04 22:51:15 -04:00
Tom Boucher
40acf1f02e fix: address CodeRabbit findings on query/transport error handling 2026-05-04 21:49:41 -04:00
Tom Boucher
38718e9d4b fix: avoid unsafe Promise cast in execRaw 2026-05-04 21:49:40 -04:00
Tom Boucher
0500bdf619 refactor: deepen query architecture seams with compatibility shims 2026-05-04 21:49:40 -04:00