Compare commits

...

24 Commits

Author SHA1 Message Date
Lex Christopherson
7bb6b6452a fix: spike workflow defaults to interactive UI demos, not stdout
Flips the bias in step 8b: build a simple HTML page/web UI by default,
fall back to stdout only for pure fact-checking (binary yes/no, benchmarks).
Mirrors upstream spike-idea skill constraint #3 update.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 09:19:04 -06:00
Lex Christopherson
43ea92578b Merge remote-tracking branch 'origin/main' into hotfix/1.38.2
# Conflicts:
#	CHANGELOG.md
#	bin/install.js
#	sdk/src/query/init.ts
2026-04-21 09:16:24 -06:00
Lex Christopherson
a42d5db742 1.38.2 2026-04-21 09:14:52 -06:00
Lex Christopherson
c86ca1b3eb fix: sync spike/sketch workflows with upstream skill v2 improvements
Spike workflow:
- Add frontier mode (no-arg or "frontier" proposes integration + frontier spikes)
- Add depth-over-speed principle — follow surprising findings, test edge cases,
  document investigation trail not just verdict
- Add CONVENTIONS.md awareness — follow established patterns, update after session
- Add Requirements section in MANIFEST — track design decisions as they emerge
- Add re-ground step before each spike to prevent drift in long sessions
- Add Investigation Trail section to README template
- Restructured prior context loading with priority ordering
- Research step now runs per-spike with briefing and approach comparison table

Sketch workflow:
- Add frontier mode (no-arg or "frontier" proposes consistency + frontier sketches)
- Add spike context loading — ground mockups in real data shapes, requirements,
  and conventions from spike findings

Spike wrap-up workflow:
- Add CONVENTIONS.md generation step (recurring stack/structure/pattern choices)
- Reference files now use implementation blueprint format (Requirements, How to
  Build It, What to Avoid, Constraints)
- SKILL.md now includes requirements section from MANIFEST
- Next-steps route to /gsd-spike frontier mode instead of inline analysis

Sketch wrap-up workflow:
- Next-steps route to /gsd-sketch frontier mode

Commands updated with frontier mode in descriptions and argument hints.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 09:14:32 -06:00
github-actions[bot]
337e052aa9 chore: bump version to 1.38.2 for hotfix 2026-04-21 15:13:56 +00:00
Lex Christopherson
969ee38ee5 fix: sync spike/sketch workflows with upstream skill v2 improvements
Spike workflow:
- Add frontier mode (no-arg or "frontier" proposes integration + frontier spikes)
- Add depth-over-speed principle — follow surprising findings, test edge cases,
  document investigation trail not just verdict
- Add CONVENTIONS.md awareness — follow established patterns, update after session
- Add Requirements section in MANIFEST — track design decisions as they emerge
- Add re-ground step before each spike to prevent drift in long sessions
- Add Investigation Trail section to README template
- Restructured prior context loading with priority ordering
- Research step now runs per-spike with briefing and approach comparison table

Sketch workflow:
- Add frontier mode (no-arg or "frontier" proposes consistency + frontier sketches)
- Add spike context loading — ground mockups in real data shapes, requirements,
  and conventions from spike findings

Spike wrap-up workflow:
- Add CONVENTIONS.md generation step (recurring stack/structure/pattern choices)
- Reference files now use implementation blueprint format (Requirements, How to
  Build It, What to Avoid, Constraints)
- SKILL.md now includes requirements section from MANIFEST
- Next-steps route to /gsd-spike frontier mode instead of inline analysis

Sketch wrap-up workflow:
- Next-steps route to /gsd-sketch frontier mode

Commands updated with frontier mode in descriptions and argument hints.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 09:05:47 -06:00
Tom Boucher
2980f0ec48 fix(sdk): stripShippedMilestones handles inline SHIPPED headings; getMilestoneInfo prefers STATE.md (#2508)
* fix(sdk): stripShippedMilestones handles inline SHIPPED headings; getMilestoneInfo prefers STATE.md

Fixes two compounding bugs:

- #2496: stripShippedMilestones only stripped <details> blocks, ignoring
  '## Heading —  SHIPPED ...' inline markers. Shipped milestone sections
  were leaking into downstream parsers.

- #2495: getMilestoneInfo checked STATE.md frontmatter only as a last-resort
  fallback, so it returned the first heading match (often a leaked shipped
  milestone) rather than the current milestone. Moved STATE.md check to
  priority 1, consistent with extractCurrentMilestone.

Closes #2495
Closes #2496

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(roadmap): handle ### SHIPPED headings and STATE.md version-only case

Two follow-up fixes from CodeRabbit review of #2508:

1. stripShippedMilestones only split on ## boundaries; ### headings marked
   SHIPPED were not stripped, leaking into fallback parsers. Expanded
   the split/filter regex to #{2,3} to align with extractCurrentMilestone.

2. getMilestoneInfo's early-return on parseMilestoneFromState discarded the
   real milestone name from ROADMAP.md when STATE.md had only `milestone:`
   (no `milestone_name:`), returning the placeholder name 'milestone'.
   Now only short-circuits when STATE.md provides a real name; otherwise
   falls through to ROADMAP for the name while using stateVersion to
   override the version in every ROADMAP-derived return path.

Tests: +2 new cases (### SHIPPED heading, version-only STATE.md).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 10:41:35 -04:00
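The heading-marker stripping described in the commit above can be illustrated with a small shell sketch. This is a hedged illustration only: the real stripShippedMilestones lives in the TypeScript SDK, and the roadmap content below is invented.

```shell
# Toggle a skip flag at every ## or ### heading; drop sections whose heading
# carries a SHIPPED marker (mirrors the #{2,3} split the commit describes).
cat > /tmp/roadmap-demo.md <<'EOF'
## Milestone 1 - SHIPPED 2026-01-01
old details
### Milestone 1.1 - SHIPPED 2026-02-01
more old details
## Milestone 2 - In Progress
current work
EOF

awk '/^###? / { skip = ($0 ~ /SHIPPED/) } !skip { print }' /tmp/roadmap-demo.md
```

Only the "Milestone 2" section survives; both the ## and the ### SHIPPED sections are removed, which is exactly the gap the original ##-only split left open.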
Tom Boucher
8789211038 fix(insert-phase): update STATE.md next-phase recommendation after phase insertion (#2509)
* fix(insert-phase): update STATE.md next-phase recommendation after inserting a phase

Closes #2502

* fix(insert-phase): update all STATE.md pointers; tighten test scope

Two follow-up fixes from CodeRabbit review of #2509:

1. The update_project_state instruction only said to find "the line" for
   the next-phase recommendation. STATE.md can have multiple pointers
   (structured current_phase: field AND prose recommendation text).
   Updated wording to explicitly require updating all of them in the same
   edit.

2. The regression test for the next-phase pointer update scanned the
   entire file, so a match anywhere would pass even if update_project_state
   itself was missing the instruction. Scoped the assertion to only the
   content inside <step name="update_project_state"> to prevent false
   positives.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 10:10:45 -04:00
Tom Boucher
57bbfe652b fix: exclude non-wiped dirs from custom-file scan; warn on non-Claude model profiles (#2511)
* fix(detect-custom-files): exclude skills and command dirs not wiped by installer (closes #2505)

GSD_MANAGED_DIRS included 'skills' and 'command' directories, but the
installer never wipes those paths. Users with third-party skills installed
(40+ files, none in GSD's manifest) had every skill flagged as a "custom
file" requiring backup, producing noisy false-positive reports on every
/gsd-update run.

Removes 'skills' and 'command' from both gsd-tools.cjs and the SDK's
detect-custom-files.ts. Adds two regression tests confirming neither
directory is scanned.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(settings): warn that model profiles are no-ops on non-Claude runtimes (closes #2506)

settings.md presented Quality/Balanced/Budget model profiles without any
indication that these tiers map to Claude models (Opus/Sonnet/Haiku) and
have no effect on non-Claude runtimes (Codex, Gemini CLI, OpenRouter).
Users on Codex saw the profile chooser as if it would meaningfully select
models, but all agents silently used the runtime default regardless.

Adds a non-Claude runtime note before the profile question (shown in
TEXT_MODE, the path all non-Claude runtimes take) explaining the profiles
are no-ops and directing users to either choose Inherit or configure
model_overrides manually. Also updates the Inherit option description to
explicitly name the runtimes where it is the correct choice.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 10:10:10 -04:00
Tom Boucher
a4764c5611 fix(execute-phase): resurrection-detection must check git history before deleting new .planning/ files (#2510)
The guard at the worktree-merge resurrection block was inverting the
intended logic: it deleted any .planning/ file absent from PRE_MERGE_FILES,
which includes brand-new files (e.g. SUMMARY.md just created by the
executor). A genuine resurrection is a file that was previously tracked on
main, deliberately removed, and then re-introduced by the merge. Detecting
that requires a git history check — not just tree membership.

Fix: replace the PRE_MERGE_FILES grep guard with a `git log --follow
--diff-filter=D` check that only removes the file if it has a deletion
event in main's ancestry.

Closes #2501
2026-04-21 09:46:01 -04:00
Jeremy McSpadden
30433368a0 fix(install): template bare .claude hook paths for non-Claude runtimes 2026-04-19 18:42:30 -05:00
Jeremy McSpadden
04fab926b5 test: add --no-sdk to hook-deployment installer tests
Tests #1834, #1924, #2136 exercise hook/artifact deployment and don't
care about SDK install. Now that installSdkIfNeeded() failures are
fatal, these tests fail on any CI runner without gsd-sdk pre-built
because the sdk/ tsc build path runs and can fail in the CI environment.

Pass --no-sdk so each test focuses on its actual subject. SDK install
path has dedicated end-to-end coverage in install-smoke.yml.
2026-04-19 18:39:32 -05:00
Jeremy McSpadden
f98ef1e460 fix(install): fatal SDK install failures + CI smoke gate (#2439)
## Why
#2386 added `installSdkIfNeeded()` to build @gsd-build/sdk from bundled
source and `npm install -g .`, because the npm-published @gsd-build/sdk
is intentionally frozen and version-mismatched with get-shit-done-cc.

But every failure path in that function was warning-only — including
the final `which gsd-sdk` verification. When npm's global bin is off a
user's PATH (common on macOS), the installer printed a yellow warning
then exited 0. Users saw "install complete" and then every `/gsd-*`
command crashed with `command not found: gsd-sdk` (the #2439 symptom).

No CI job executed the install path, so this class of regression could
ship undetected — existing "install" tests only read bin/install.js as
a string.

## What changed

**bin/install.js — installSdkIfNeeded() is now transactional**
- All build/install failures exit non-zero (not just warn).
- Post-install `which gsd-sdk` check is fatal: if the binary landed
  globally but is off PATH, we exit 1 with a red banner showing the
  resolved npm bin dir, the user's shell, the target rc file, and the
  exact `export PATH=…` line to add.
- Escape hatch: `GSD_ALLOW_OFF_PATH=1` downgrades off-PATH to exit 2
  for users with intentionally restricted PATH who will wire up the
  binary manually.
- Resolver uses POSIX `command -v` via `sh -c` (replaces `which`) so
  behavior is consistent across sh/bash/zsh/fish.
- Factored `resolveGsdSdk()`, `detectShellRc()`, `emitSdkFatal()`.
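The `which`-to-`command -v` swap in the list above can be sketched as follows. `resolve_bin` is a hypothetical helper name, and `git` stands in for `gsd-sdk` so the sketch runs anywhere:

```shell
# Resolve a binary through POSIX `command -v` inside `sh -c`, so the result
# is the same regardless of the user's interactive shell (bash/zsh/fish/...).
resolve_bin() {
  sh -c 'command -v "$1"' resolver "$1"
}

if BIN=$(resolve_bin git); then
  echo "resolved: $BIN"
else
  echo "FATAL: binary not on PATH" >&2
fi
```

Because the lookup happens in a plain `sh`, shell-specific `which` replacements (fish's builtin, zsh aliases) can no longer change the answer.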

**.github/workflows/install-smoke.yml (new)**
- Executes the real install path: `npm pack` → `npm install -g <tgz>`
  → run installer non-interactively → `command -v gsd-sdk` → run
  `gsd-sdk --version`.
- PRs: path-filtered to installer-adjacent files, ubuntu + Node 22 only.
- main/release branches: full matrix (ubuntu+macos × Node 22+24).
- Reusable via workflow_call with `ref` input for release gating.

**.github/workflows/release.yml — pre-publish gate**
- New `install-smoke-rc` and `install-smoke-finalize` jobs invoke the
  reusable workflow against the release branch. `rc` and `finalize`
  now `needs: [validate-version, install-smoke-*]`, so a broken SDK
  install blocks `npm publish`.

## Test plan
- Local full suite: 4154/4154 pass
- install-smoke.yml will self-validate on this PR (ubuntu+Node22 only)

Addresses root cause of #2439 (the per-command pre-flight in #2440 is
the complementary defensive layer).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 18:39:32 -05:00
Jeremy McSpadden
d0565e95c1 fix(set-profile): use hyphenated /gsd-set-profile in pre-flight message
Project convention (#1748) requires /gsd-<cmd> hyphen form everywhere
except designated test inputs. Fix the colon references in the
pre-flight error and its regression test to satisfy stale-colon-refs.
2026-04-19 18:39:32 -05:00
Jeremy McSpadden
4ef6275e86 fix(set-profile): guard gsd-sdk invocation with command -v pre-flight (#2439)
/gsd:set-profile crashed with `command not found: gsd-sdk` when gsd-sdk
was not on PATH. The command invoked `gsd-sdk query` directly in a `!`
backtick with no guard, so a missing binary produced an opaque shell
error with exit 127.

Add a `command -v gsd-sdk` pre-flight that prints the install/update
hint and exits 1 when absent, mirroring the #2334 fix on /gsd-quick.
The auto-install in #2386 still runs at install time; this guard is the
defensive layer for users whose npm global bin is off-PATH (install.js
warns but does not fail in that case).

Closes #2439
2026-04-19 18:39:32 -05:00
Jeremy McSpadden
6c50490766 fix(sdk): register init.ingest-docs handler and add registry drift guard (#2442)
The ingest-docs workflow called `gsd-sdk query init.ingest-docs` with a
fallback to `init.default` — neither was registered in createRegistry(),
so the workflow proceeded with `{}` and tried to parse project_exists,
planning_exists, has_git, and project_path from empty.

- Add initIngestDocs handler; register dotted + space aliases
- Simplify workflow call; drop broken fallback
- Repo-wide drift guard scans commands/, agents/, get-shit-done/,
  hooks/, bin/, scripts/, docs/ for `gsd-sdk query <cmd>` and fails
  on any reference with no registered handler (file:line citations)
- Unit tests for the new handler

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 18:39:20 -05:00
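A minimal version of the drift guard described above can be sketched in shell. The scanned tree and the registered-handler list here are invented for illustration; the real guard covers many directories and reads the registry from the SDK.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/commands"
printf 'run: gsd-sdk query init.ingest-docs\n'    > "$tmp/commands/a.md"
printf 'run: gsd-sdk query init.not-registered\n' > "$tmp/commands/b.md"

REGISTERED='init.ingest-docs'   # stand-in for the real createRegistry() contents

# Emit file:line citations for any query reference with no registered handler.
grep -rnoE 'gsd-sdk query [A-Za-z0-9._-]+' "$tmp/commands" | while IFS= read -r hit; do
  cmd=${hit##* }                # last token of the match is the command name
  printf '%s\n' "$REGISTERED" | grep -qxF "$cmd" || echo "DRIFT: $hit"
done
```

Only the unregistered reference is reported, with its file and line, which is the failure shape the commit describes.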
Jeremy McSpadden
4cbebfe78c docs(readme): add /gsd-ingest-docs to Brownfield commands
Surfaces the new ingest-docs command from the Unreleased changelog in
the README Commands section so users discover it without digging.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 18:39:20 -05:00
Jeremy McSpadden
9e87d43831 fix(build): include gsd-read-injection-scanner in hooks/dist (#2406)
The scanner was added in #2201 but never added to the HOOKS_TO_COPY
allowlist in scripts/build-hooks.js, so it never landed in hooks/dist/.
install.js reads from hooks/dist/, so every install on 1.37.0/1.37.1
emitted "Skipped read injection scanner hook — not found at target"
and the read-time prompt-injection scanner was silently disabled.

- Add gsd-read-injection-scanner.js to HOOKS_TO_COPY
- Add it to EXPECTED_ALL_HOOKS regression test in install-hooks-copy

Fixes #2406

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 18:39:20 -05:00
github-actions[bot]
29ea90bc83 chore: bump version to 1.38.1 for hotfix 2026-04-19 23:37:15 +00:00
github-actions[bot]
0c6172bfad chore: finalize v1.38.0 2026-04-18 03:45:59 +00:00
Jeremy McSpadden
e3bd06c9fd fix(release): make merge-back PR step non-fatal
Repos that disable "Allow GitHub Actions to create and approve pull
requests" (org-level policy or repo-level setting) cause the "Create PR
to merge release back to main" step to fail with a GraphQL 403. That
failure cascades: Tag and push, npm publish, GitHub Release creation
are all skipped, and the entire release aborts.

The merge-back PR is a convenience — it's re-openable manually after
the release. Making it non-fatal with continue-on-error lets the rest
of the release complete. The step now emits ::warning:: annotations
pointing at the manual-recovery command when it fails.

Shell pipelines also fall through with `|| echo "::warning::..."` so
transient gh CLI failures don't mask the underlying policy issue.

Covers the failure mode seen on run 24596079637 where dry-run publish
validation passed but the release halted at the PR-creation step.
2026-04-17 22:45:22 -05:00
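The non-fatal pattern the commit describes amounts to appending a warning fallback to each pipeline. A minimal shell illustration, with the gh call simulated by `false`:

```shell
# Simulate a gh CLI failure (e.g. GraphQL 403 from a PR-creation policy);
# the || branch converts it into a ::warning:: annotation instead of
# failing the step and cascading into skipped release jobs.
run_step() {
  false   # stand-in for: gh pr create --base main --head "$BRANCH" ...
}

run_step || echo "::warning::Could not create merge-back PR. Open it manually after release."
echo "release continues"
```

Because `cmd || echo ...` succeeds whenever the echo succeeds, the job keeps going, and `continue-on-error: true` on the step covers anything the pipeline fallbacks miss.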
github-actions[bot]
c69ecd975a chore: bump to 1.38.0-rc.1 2026-04-18 03:05:35 +00:00
Jeremy McSpadden
06c4ded4ec docs(changelog): promote Unreleased to [1.38.0] + add ultraplan entry 2026-04-17 22:03:26 -05:00
github-actions[bot]
341bb941c6 chore: bump version to 1.38.0 for release 2026-04-18 03:02:41 +00:00
20 changed files with 864 additions and 251 deletions


@@ -342,23 +342,32 @@ jobs:
- name: Create PR to merge release back to main
if: ${{ !inputs.dry_run }}
continue-on-error: true
env:
GH_TOKEN: ${{ github.token }}
BRANCH: ${{ needs.validate-version.outputs.branch }}
VERSION: ${{ inputs.version }}
run: |
EXISTING_PR=$(gh pr list --base main --head "$BRANCH" --state open --json number --jq '.[0].number')
# Non-fatal: repos that disable "Allow GitHub Actions to create and
# approve pull requests" cause this step to fail with GraphQL 403.
# The release itself (tag + npm publish + GitHub Release) must still
# proceed. Open the merge-back PR manually afterwards with:
# gh pr create --base main --head release/${VERSION} \
# --title "chore: merge release v${VERSION} to main"
EXISTING_PR=$(gh pr list --base main --head "$BRANCH" --state open --json number --jq '.[0].number' 2>/dev/null || echo "")
if [ -n "$EXISTING_PR" ]; then
echo "PR #$EXISTING_PR already exists; updating"
gh pr edit "$EXISTING_PR" \
--title "chore: merge release v${VERSION} to main" \
--body "Merge release branch back to main after v${VERSION} stable release."
--body "Merge release branch back to main after v${VERSION} stable release." \
|| echo "::warning::Could not update merge-back PR (likely PR-creation policy disabled). Open it manually after release."
else
gh pr create \
--base main \
--head "$BRANCH" \
--title "chore: merge release v${VERSION} to main" \
--body "Merge release branch back to main after v${VERSION} stable release."
--body "Merge release branch back to main after v${VERSION} stable release." \
|| echo "::warning::Could not create merge-back PR (likely PR-creation policy disabled). Open it manually after release."
fi
- name: Tag and push


@@ -1,7 +1,7 @@
---
name: gsd:sketch
description: Rapidly sketch UI/design ideas using throwaway HTML mockups with multi-variant exploration
argument-hint: "<design idea to explore> [--quick] [--text]"
description: Sketch UI/design ideas with throwaway HTML mockups, or propose what to sketch next (frontier mode)
argument-hint: "[design idea to explore] [--quick] [--text] or [frontier]"
allowed-tools:
- Read
- Write
@@ -18,7 +18,12 @@ allowed-tools:
<objective>
Explore design directions through throwaway HTML mockups before committing to implementation.
Each sketch produces 2-3 variants for comparison. Sketches live in `.planning/sketches/` and
integrate with GSD commit patterns, state tracking, and handoff workflows.
integrate with GSD commit patterns, state tracking, and handoff workflows. Loads spike
findings to ground mockups in real data shapes and validated interaction patterns.
Two modes:
- **Idea mode** (default) — describe a design idea to sketch
- **Frontier mode** (no argument or "frontier") — analyzes existing sketch landscape and proposes consistency and frontier sketches
Does not require `/gsd-new-project` — auto-creates `.planning/sketches/` if needed.
</objective>


@@ -1,7 +1,7 @@
---
name: gsd:spike
description: Rapidly spike an idea with throwaway experiments to validate feasibility before planning
argument-hint: "<idea to validate> [--quick] [--text]"
description: Spike an idea through experiential exploration, or propose what to spike next (frontier mode)
argument-hint: "[idea to validate] [--quick] [--text] or [frontier]"
allowed-tools:
- Read
- Write
@@ -16,9 +16,14 @@ allowed-tools:
- mcp__context7__query-docs
---
<objective>
Rapid feasibility validation through focused, throwaway experiments. Each spike answers one
specific question with observable evidence. Spikes live in `.planning/spikes/` and integrate
with GSD commit patterns, state tracking, and handoff workflows.
Spike an idea through experiential exploration — build focused experiments to feel the pieces
of a future app, validate feasibility, and produce verified knowledge for the real build.
Spikes live in `.planning/spikes/` and integrate with GSD commit patterns, state tracking,
and handoff workflows.
Two modes:
- **Idea mode** (default) — describe an idea to spike
- **Frontier mode** (no argument or "frontier") — analyzes existing spike landscape and proposes integration and frontier spikes
Does not require `/gsd-new-project` — auto-creates `.planning/spikes/` if needed.
</objective>


@@ -1204,10 +1204,6 @@ async function runCommand(command, args, cwd, raw, defaultValue) {
'agents',
path.join('commands', 'gsd'),
'hooks',
// OpenCode/Kilo flat command dir
'command',
// Codex/Copilot skills dir
'skills',
];
function walkDir(dir, baseDir) {


@@ -649,10 +649,15 @@ Execute each selected wave in sequence. Within a wave: parallel if `PARALLELIZAT
# Detect files deleted on main but re-added by worktree merge
# (e.g., archived phase directories that were intentionally removed)
# A "resurrected" file must have a deletion event in main's ancestry —
# brand-new files (e.g. SUMMARY.md just created by the executor) have no
# such history and must NOT be removed (#2501).
DELETED_FILES=$(git diff --diff-filter=A --name-only HEAD~1 -- .planning/ 2>/dev/null || true)
for RESURRECTED in $DELETED_FILES; do
# Check if this file was NOT in main's pre-merge tree
if ! echo "$PRE_MERGE_FILES" | grep -qxF "$RESURRECTED"; then
# Only delete if this file was previously tracked on main and then
# deliberately removed (has a deletion event in git history).
WAS_DELETED=$(git log --follow --diff-filter=D --name-only --format="" HEAD~1 -- "$RESURRECTED" 2>/dev/null | grep -c . || true)
if [ "${WAS_DELETED:-0}" -gt 0 ]; then
git rm -f "$RESURRECTED" 2>/dev/null || true
fi
done
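The deletion-history check in the hunk above can be exercised in a throwaway repository. Everything below is a self-contained demo, not GSD code:

```shell
tmp=$(mktemp -d) && cd "$tmp" && git init -q
git config user.email demo@example.com && git config user.name demo

echo x > tracked.txt && git add tracked.txt && git commit -qm 'add tracked'
git rm -q tracked.txt && git commit -qm 'remove tracked'  # deletion event in history
echo y > brand-new.txt                                    # never committed at all

was_deleted() {
  # Non-zero count only when the file has a deletion event in this
  # branch's ancestry; brand-new files yield 0 and must not be removed.
  git log --follow --diff-filter=D --name-only --format="" -- "$1" 2>/dev/null | grep -c . || true
}

echo "tracked.txt:   $(was_deleted tracked.txt)"
echo "brand-new.txt: $(was_deleted brand-new.txt)"
```

The previously-deleted file reports a deletion event while the never-committed file reports none, which is the distinction the PRE_MERGE_FILES grep could not make.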


@@ -66,7 +66,11 @@ Extract from result: `phase_number`, `after_phase`, `name`, `slug`, `directory`.
Update STATE.md to reflect the inserted phase:
1. Read `.planning/STATE.md`
2. Under "## Accumulated Context" → "### Roadmap Evolution" add entry:
2. Update STATE.md's next-phase pointers to the newly inserted phase `{decimal_phase}`:
- Update structured field(s) used by tooling (e.g. `current_phase:`) to `{decimal_phase}`.
- Update human-readable recommendation text (e.g. `## Current Phase`, `Next recommended run:`) to `{decimal_phase}`.
- If multiple pointer locations exist, update all of them in the same edit.
3. Under "## Accumulated Context" → "### Roadmap Evolution" add entry:
```
- Phase {decimal_phase} inserted after Phase {after_phase}: {description} (URGENT)
```
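The "update all pointers in the same edit" instruction above can be sketched with sed. The STATE.md field names come from the text; the file content is invented for the demo:

```shell
tmp=$(mktemp -d)
cat > "$tmp/STATE.md" <<'EOF'
current_phase: 4
## Current Phase
Next recommended run: Phase 4
EOF

DECIMAL_PHASE='3.1'
# One pass over the file, so the structured field and the prose
# recommendation can never drift apart.
sed -e "s/^current_phase:.*/current_phase: ${DECIMAL_PHASE}/" \
    -e "s/^Next recommended run:.*/Next recommended run: Phase ${DECIMAL_PHASE}/" \
    "$tmp/STATE.md" > "$tmp/STATE.md.new" && mv "$tmp/STATE.md.new" "$tmp/STATE.md"

cat "$tmp/STATE.md"
```

Writing to a temp file and moving it back avoids the GNU/BSD `sed -i` portability split while keeping the update atomic from the reader's point of view.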


@@ -51,6 +51,17 @@ Parse current values (default to `true` if not present):
<step name="present_settings">
**Text mode (`workflow.text_mode: true` in config or `--text` flag):** Set `TEXT_MODE=true` if `--text` is present in `$ARGUMENTS` OR `text_mode` from init JSON is `true`. When TEXT_MODE is active, replace every `AskUserQuestion` call with a plain-text numbered list and ask the user to type their choice number. This is required for non-Claude runtimes (OpenAI Codex, Gemini CLI, etc.) where `AskUserQuestion` is not available.
**Non-Claude runtime note:** If `TEXT_MODE` is active (i.e. the runtime is non-Claude), prepend the following notice before the model profile question:
```
Note: Quality, Balanced, and Budget profiles select Claude model tiers (Opus/Sonnet/Haiku).
On non-Claude runtimes (Codex, Gemini CLI, etc.) these profiles have no effect on actual
model selection — GSD agents will use the runtime's default model.
Choose "Inherit" to use the session model for all agents, or configure model_overrides
manually in .planning/config.json to target specific models for this runtime.
```
Use AskUserQuestion with current values pre-selected:
```
@@ -60,10 +71,10 @@ AskUserQuestion([
header: "Model",
multiSelect: false,
options: [
{ label: "Quality", description: "Opus everywhere except verification (highest cost)" },
{ label: "Balanced (Recommended)", description: "Opus for planning, Sonnet for research/execution/verification" },
{ label: "Budget", description: "Sonnet for writing, Haiku for research/verification (lowest cost)" },
{ label: "Inherit", description: "Use current session model for all agents (best for OpenRouter, local models, or runtime model switching)" }
{ label: "Quality", description: "Opus everywhere except verification (highest cost) — Claude only" },
{ label: "Balanced (Recommended)", description: "Opus for planning, Sonnet for research/execution/verification — Claude only" },
{ label: "Budget", description: "Sonnet for writing, Haiku for research/verification (lowest cost) — Claude only" },
{ label: "Inherit", description: "Use current session model for all agents (required for non-Claude runtimes: Codex, Gemini CLI, OpenRouter, local models)" }
]
},
{


@@ -255,15 +255,16 @@ The sketch-findings skill will auto-load when building the UI.
## ▶ Next Up
**Start building** — implement the validated design
**Explore frontier sketches** — see what else is worth sketching based on what we've explored
`/gsd-plan-phase`
`/gsd-sketch` (run with no argument — its frontier mode analyzes the sketch landscape and proposes consistency and frontier sketches)
───────────────────────────────────────────────────────────────
**Also available:**
- `/gsd-plan-phase` — start building the real UI
- `/gsd-ui-phase` — generate a UI design contract for a frontend phase
- `/gsd-sketch` — sketch additional design areas
- `/gsd-sketch [idea]` — sketch a specific new design area
- `/gsd-explore` — continue exploring
───────────────────────────────────────────────────────────────
@@ -279,5 +280,6 @@ The sketch-findings skill will auto-load when building the UI.
- [ ] Reference files contain design decisions, CSS patterns, HTML structures, anti-patterns
- [ ] `.planning/sketches/WRAP-UP-SUMMARY.md` written for project history
- [ ] Project CLAUDE.md has auto-load routing line
- [ ] Summary presented with next-step routing
- [ ] Summary presented
- [ ] Next-step options presented (including frontier sketch exploration via `/gsd-sketch`)
</success_criteria>


@@ -2,6 +2,10 @@
Explore design directions through throwaway HTML mockups before committing to implementation.
Each sketch produces 2-3 variants for comparison. Saves artifacts to `.planning/sketches/`.
Companion to `/gsd-sketch-wrap-up`.
Supports two modes:
- **Idea mode** (default) — user describes a design idea to sketch
- **Frontier mode** — no argument or "frontier" / "what should I sketch?" — analyzes existing sketch landscape and proposes consistency and frontier sketches
</purpose>
<required_reading>
@@ -25,9 +29,60 @@ Read all files referenced by the invoking prompt's execution_context before star
Parse `$ARGUMENTS` for:
- `--quick` flag → set `QUICK_MODE=true`
- `--text` flag → set `TEXT_MODE=true`
- `frontier` or empty → set `FRONTIER_MODE=true`
- Remaining text → the design idea to sketch
**Text mode (`workflow.text_mode: true` in config or `--text` flag):** Set `TEXT_MODE=true` if `--text` is present in `$ARGUMENTS` OR `text_mode` from init JSON is `true`. When TEXT_MODE is active, replace every `AskUserQuestion` call with a plain-text numbered list and ask the user to type their choice number. This is required for non-Claude runtimes (OpenAI Codex, Gemini CLI, etc.) where `AskUserQuestion` is not available.
**Text mode:** If TEXT_MODE is enabled, replace AskUserQuestion calls with plain-text numbered lists.
</step>
<step name="route">
## Routing
- **FRONTIER_MODE is true** → Jump to `frontier_mode`
- **Otherwise** → Continue to `setup_directory`
</step>
<step name="frontier_mode">
## Frontier Mode — Propose What to Sketch Next
### Load the Sketch Landscape
If no `.planning/sketches/` directory exists, tell the user there's nothing to analyze and offer to start fresh with an idea instead.
Otherwise, load in this order:
**a. MANIFEST.md** — the design direction, reference points, and sketch table with winners.
**b. Findings skills** — glob `./.claude/skills/sketch-findings-*/SKILL.md` and read any that exist, plus their `references/*.md`. These contain curated design decisions from prior wrap-ups.
**c. All sketch READMEs** — read `.planning/sketches/*/README.md` for design questions, winners, and tags.
### Analyze for Consistency Sketches
Review winning variants across all sketches. Look for:
- **Visual consistency gaps:** Two sketches made independent design choices that haven't been tested together.
- **State combinations:** Individual states validated but not seen in sequence.
- **Responsive gaps:** Validated at one viewport but the real app needs multiple.
- **Theme coherence:** Individual components look good but haven't been composed into a full-page view.
If consistency risks exist, present them as concrete proposed sketches with names and design questions. If no meaningful gaps, say so and skip.
### Analyze for Frontier Sketches
Think laterally about the design direction from MANIFEST.md and what's been explored:
- **Unsketched screens:** UI surfaces assumed but unexplored.
- **Interaction patterns:** Static layouts validated but transitions, loading, drag-and-drop need feeling.
- **Edge case UI:** 0 items, 1000 items, errors, slow connections.
- **Alternative directions:** Fresh takes on "fine but not great" sketches.
- **Polish passes:** Typography, spacing, micro-interactions, empty states.
Present frontier sketches as concrete proposals numbered from the highest existing sketch number.
### Get Alignment and Execute
Present all consistency and frontier candidates, then ask which to run. When the user picks sketches, update `.planning/sketches/MANIFEST.md` and proceed directly to building them starting at `build_sketches`.
</step>
<step name="setup_directory">
@@ -49,25 +104,45 @@ COMMIT_DOCS=$(gsd-sdk query config-get commit_docs 2>/dev/null || echo "true")
</step>
<step name="mood_intake">
**If `QUICK_MODE` is true:** Skip mood intake. Use whatever the user provided in `$ARGUMENTS` as the design direction. Jump to `decompose`.
**If `QUICK_MODE` is true:** Skip mood intake. Use whatever the user provided in `$ARGUMENTS` as the design direction. Jump to `load_spike_context`.
**Otherwise:**
Before sketching anything, explore the design intent through conversation. Ask one question at a time using AskUserQuestion, with a paragraph of context and reasoning for each.
Before sketching anything, explore the design intent through conversation. Ask one question at a time using AskUserQuestion in normal mode, or a plain-text numbered list if TEXT_MODE is active.
**Questions to cover (adapt to what the user has already shared):**
1. **Feel:** "What should this feel like? Give me adjectives, emotions, or a vibe." (e.g., "clean and clinical", "warm and playful", "dense and powerful")
2. **References:** "What apps, sites, or products have a similar feel to what you're imagining?" (gives concrete visual anchors)
3. **Core action:** "What's the single most important thing a user does here?" (focuses the sketch on what matters)
1. **Feel:** "What should this feel like? Give me adjectives, emotions, or a vibe."
2. **References:** "What apps, sites, or products have a similar feel to what you're imagining?"
3. **Core action:** "What's the single most important thing a user does here?"
You may need more or fewer questions depending on how much the user shares upfront. After each answer, briefly reflect what you heard and how it shapes your thinking.
After each answer, briefly reflect what you heard and how it shapes your thinking.
When you have enough signal, ask: **"I think I have a good sense of the direction. Ready for me to sketch, or want to keep discussing?"**
Only proceed when the user says go.
</step>
<step name="load_spike_context">
## Load Spike Context
If spikes exist for this project, read them to ground the sketches in reality. Mockups are still pure HTML, but they should reflect what's actually been proven — real data shapes, real component names, real interaction patterns.
**a.** Glob for `./.claude/skills/spike-findings-*/SKILL.md` and read any that exist, plus their `references/*.md`. These contain validated patterns and requirements.
**b.** Read `.planning/spikes/MANIFEST.md` if it exists — check the Requirements section for non-negotiable design constraints (e.g., "must support streaming", "must render markdown"). These requirements should be visible in the mockup even though the mockup doesn't implement them for real.
**c.** Read `.planning/spikes/CONVENTIONS.md` if it exists — the established stack informs what's buildable and what interaction patterns are idiomatic.
**How spike context improves sketches:**
- Use real field names and data shapes from spike findings instead of generic placeholders
- Show realistic UI states that match what the spikes proved (e.g., if streaming was validated, show a streaming message state)
- Reference real component names and patterns from the target stack
- Include interaction states that reflect what the spikes discovered (loading, error, reconnection states)
**If no spikes exist**, skip this step.
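Checks (a)-(c) can be sketched as a small shell helper. The paths are the ones this workflow defines; the function only reports which context files exist, as a minimal sketch of the "skip if nothing found" rule:

```shell
# Sketch of checks (a)-(c); all paths are the ones defined by this workflow.
load_spike_context() {
  local found=0 f
  for f in ./.claude/skills/spike-findings-*/SKILL.md \
           .planning/spikes/MANIFEST.md \
           .planning/spikes/CONVENTIONS.md; do
    [ -e "$f" ] && { echo "read: $f"; found=1; }
  done
  # nothing matched: this step is a no-op
  [ "$found" -eq 0 ] && echo "no spikes — skip this step"
  return 0
}
```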
</step>
<step name="decompose">
Break the idea into 2-5 design questions. Present as a table:
@@ -98,18 +173,18 @@ Before sketching, ground the design in what's actually buildable. Sketches are H
**a. Identify the target stack.** Check for package.json, Cargo.toml, etc. If the user mentioned a framework (React, SwiftUI, Flutter, etc.), note it.
**b. Check component/pattern availability.** Use context7 (resolve-library-id → query-docs) or web search to answer:
- What layout primitives does the target framework provide? (grid systems, nav patterns, panel components)
- Are there existing component libraries in use? (shadcn, Material UI, etc.) What components are available?
- What interaction patterns are idiomatic? (e.g., sheet vs modal vs dialog in mobile)
- What layout primitives does the target framework provide?
- Are there existing component libraries in use? What components are available?
- What interaction patterns are idiomatic?
**c. Note constraints that affect design.** Some things that look great in HTML are painful or impossible in certain stacks:
**c. Note constraints that affect design:**
- Platform conventions (iOS nav patterns, desktop menu bars, terminal grid constraints)
- Framework limitations (what's easy vs requires custom work)
- Existing design tokens or theme systems already in the project
**d. Let research inform variants.** Use findings to make variants that are actually buildable — at least one variant should follow the path of least resistance for the target stack.
**d. Let research inform variants.** At least one variant should follow the path of least resistance for the target stack.
**Skip when unnecessary.** If it's a greenfield project with no stack chosen, or the user explicitly says "just explore visually, don't worry about implementation," skip this step entirely. The point is grounding, not gatekeeping.
**Skip when unnecessary.** Greenfield project with no stack, or user says "just explore visually." The point is grounding, not gatekeeping.
</step>
<step name="create_manifest">
@@ -144,26 +219,24 @@ Build each sketch in order.
### For Each Sketch:
**a.** Find next available number by checking existing `.planning/sketches/NNN-*/` directories.
Format: three-digit zero-padded + hyphenated descriptive name.
**a.** Find next available number. Format: three-digit zero-padded + hyphenated descriptive name.
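The numbering rule can be sketched in shell — the `.planning/sketches/NNN-name/` layout is the one this workflow uses:

```shell
# Sketch of step (a): next three-digit sketch number from existing directories.
next_sketch_number() {
  local max=0 n dir
  for dir in .planning/sketches/[0-9][0-9][0-9]-*/; do
    [ -d "$dir" ] || continue
    n=${dir#.planning/sketches/}   # "003-sidebar/"
    n=${n%%-*}                     # "003"
    n=$((10#$n))                   # force base 10 so "008" doesn't parse as octal
    [ "$n" -gt "$max" ] && max=$n
  done
  printf '%03d\n' $((max + 1))
}
```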
**b.** Create the sketch directory: `.planning/sketches/NNN-descriptive-name/`
**c.** Build `index.html` with 2-3 variants:
**First round — dramatic differences:** Build 2-3 meaningfully different approaches to the design question. Different layouts, different visual structures, different interaction models.
**Subsequent rounds — refinements:** Once the user has picked a direction or cherry-picked elements, build subtler variations within that direction.
**First round — dramatic differences:** 2-3 meaningfully different approaches.
**Subsequent rounds — refinements:** Subtler variations within the chosen direction.
Each variant is a page/tab in the same HTML file. Include:
- Tab navigation to switch between variants (see `sketch-variant-patterns.md`)
- Clear labels: "Variant A: Sidebar Layout", "Variant B: Top Nav", etc.
- The sketch toolbar (see `sketch-tooling.md`)
- All interactive elements functional (see `sketch-interactivity.md`)
- Real-ish content, not lorem ipsum
- Real-ish content, not lorem ipsum (use real field names from spike context if available)
- Link to `../themes/default.css` for shared theme variables
**All sketches are plain HTML with inline CSS and JS.** No build step, no npm, no framework. Opens instantly in a browser.
**All sketches are plain HTML with inline CSS and JS.** No build step, no npm, no framework.
**d.** Write `README.md`:
@@ -210,16 +283,16 @@ Compare: {what to look for between variants}
──────────────────────────────────────────────────────────────
**f.** Handle feedback:
- **Pick a direction:** "I like variant B" → mark winner in README, move to next sketch
- **Cherry-pick elements:** "Rounded edges from A, color treatment from C" → build a synthesis as a new variant, show again
- **Want more exploration:** "None of these feel right, try X instead" → build new variants
- **Pick a direction:** mark winner, move to next sketch
- **Cherry-pick elements:** build synthesis as new variant, show again
- **Want more exploration:** build new variants
Iterate until the user is satisfied with a direction for this sketch.
Iterate until satisfied.
**g.** Finalize:
1. Mark the winning variant in the README frontmatter (`winner: "B"`)
2. Add ★ indicator to the winning tab in the HTML
3. Update `.planning/sketches/MANIFEST.md` with the sketch row
1. Mark winning variant in README frontmatter (`winner: "B"`)
2. Add ★ indicator to winning tab in HTML
3. Update `.planning/sketches/MANIFEST.md`
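Finalize step 1 can be sketched as a one-liner helper. The `winner:` frontmatter key is from this workflow's README template; the `sed -i` form assumes GNU sed (BSD/macOS sed needs `sed -i ''`):

```shell
# Sketch: set the winning variant in README frontmatter (GNU sed assumed).
mark_winner() {  # usage: mark_winner path/to/README.md B
  if grep -q '^winner:' "$1"; then
    sed -i "s/^winner:.*/winner: \"$2\"/" "$1"
  else
    printf 'winner: "%s"\n' "$2" >> "$1"
  fi
}
```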
**h.** Commit (if `COMMIT_DOCS` is true):
```bash
@@ -235,7 +308,7 @@ gsd-sdk query commit "docs(sketch-NNN): [winning direction] — [key visual insi
</step>
<step name="report">
After all sketches complete, present the summary:
After all sketches complete:
```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
@@ -263,8 +336,8 @@ After all sketches complete, present the summary:
───────────────────────────────────────────────────────────────
**Also available:**
- `/gsd-sketch` — sketch more (or run with no argument for frontier mode)
- `/gsd-plan-phase` — start building the real UI
- `/gsd-explore` — continue exploring the concept
- `/gsd-spike` — spike technical feasibility of a design pattern
───────────────────────────────────────────────────────────────
@@ -275,8 +348,9 @@ After all sketches complete, present the summary:
<success_criteria>
- [ ] `.planning/sketches/` created (auto-created if needed, no project init required)
- [ ] Design direction explored conversationally before any code (unless --quick)
- [ ] Target stack researched — component availability, constraints, and idioms noted (unless greenfield/skipped)
- [ ] Each sketch has 2-3 variants for comparison (at least one follows path of least resistance for target stack)
- [ ] Spike context loaded — real data shapes, requirements, and conventions inform mockups
- [ ] Target stack researched — component availability, constraints, idioms (unless greenfield/skipped)
- [ ] Each sketch has 2-3 variants for comparison (at least one follows path of least resistance)
- [ ] User can open and interact with sketches in a browser
- [ ] Winning variant selected and marked for each sketch
- [ ] All variants preserved (winner marked, not others deleted)


@@ -1,8 +1,8 @@
<purpose>
Curate spike experiment findings and package them into a persistent project skill for future
build conversations. Reads from `.planning/spikes/`, writes skill to `./.claude/skills/spike-findings-[project]/`
(project-local) and summary to `.planning/spikes/WRAP-UP-SUMMARY.md`.
Companion to `/gsd-spike`.
Package spike experiment findings into a persistent project skill — an implementation blueprint
for future build conversations. Reads from `.planning/spikes/`, writes the skill to
`./.claude/skills/spike-findings-[project]/` (project-local) and a summary to
`.planning/spikes/WRAP-UP-SUMMARY.md`. Companion to `/gsd-spike`.
</purpose>
<required_reading>
@@ -22,7 +22,7 @@ Read all files referenced by the invoking prompt's execution_context before star
<step name="gather">
## Gather Spike Inventory
1. Read `.planning/spikes/MANIFEST.md` for the overall idea context
1. Read `.planning/spikes/MANIFEST.md` for the overall idea context and requirements
2. Glob `.planning/spikes/*/README.md` and parse YAML frontmatter from each
3. Check if `./.claude/skills/spike-findings-*/SKILL.md` exists for this project
- If yes: read its `processed_spikes` list from the metadata section and filter those out
@@ -93,21 +93,29 @@ For each included spike:
<step name="synthesize">
## Synthesize Reference Files
For each feature-area group, write a reference file at `references/[feature-area-name].md`:
For each feature-area group, write a reference file at `references/[feature-area-name].md` as an **implementation blueprint** — it should read like a recipe, not a research paper. A future build session should be able to follow this and build the feature correctly without re-spiking anything.
```markdown
# [Feature Area Name]
## Validated Patterns
[For each validated finding: describe the approach that works, include key code snippets extracted from the spike source, explain why it works]
## Requirements
## Landmines
[Things that look right but aren't. Gotchas. Anti-patterns discovered during spiking.]
[Non-negotiable design decisions from MANIFEST.md Requirements section that apply to this feature area. These MUST be honored in the real build. E.g., "Must use streaming JSON output", "Must support reconnection".]
## How to Build It
[Step-by-step: what to install, how to configure, what code pattern to use. Include key code snippets extracted from the spike source. This is the proven approach — not theory, but tested and working code.]
## What to Avoid
[Things that look right but aren't. Gotchas. Anti-patterns discovered during spiking. Dead ends that were tried and failed.]
## Constraints
[Hard facts: rate limits, library limitations, version requirements, incompatibilities]
## Origin
Synthesized from spikes: NNN, NNN, NNN
Source files available in: sources/NNN-spike-name/, sources/NNN-spike-name/
```
@@ -121,7 +129,7 @@ Create (or update) the generated skill's SKILL.md:
```markdown
---
name: spike-findings-[project-dir-name]
description: Validated patterns, constraints, and implementation knowledge from spike experiments. Auto-loaded during implementation work on [project-dir-name].
description: Implementation blueprint from spike experiments. Requirements, proven patterns, and verified knowledge for building [project-dir-name]. Auto-loaded during implementation work.
---
<context>
@@ -132,6 +140,15 @@ description: Validated patterns, constraints, and implementation knowledge from
Spike sessions wrapped: [date(s)]
</context>
<requirements>
## Requirements
[Copied directly from MANIFEST.md Requirements section. These are non-negotiable design decisions that emerged from the user's choices during spiking. Every feature area reference must honor these.]
- [requirement 1]
- [requirement 2]
</requirements>
<findings_index>
## Feature Areas
@@ -189,11 +206,47 @@ Add an auto-load routing line to the project's CLAUDE.md (create the file if it
If this routing line already exists (append mode), leave it as-is.
</step>
<step name="generate_conventions">
## Generate or Update CONVENTIONS.md
Analyze all processed spikes for recurring patterns and write `.planning/spikes/CONVENTIONS.md`. This file tells future spike sessions *how we spike* — the stack, structure, and patterns that have been established.
1. Read all spike source code and READMEs looking for:
- **Stack choices** — What language/framework/runtime appears across multiple spikes?
- **Structure patterns** — Common file layouts, port numbers, naming schemes
- **Recurring approaches** — How auth is handled, how styling is done, how data is served
- **Tools & libraries** — Packages that showed up repeatedly with versions that worked
2. Write or update `.planning/spikes/CONVENTIONS.md`:
```markdown
# Spike Conventions
Patterns and stack choices established across spike sessions. New spikes follow these unless the question requires otherwise.
## Stack
[What we use for frontend, backend, scripts, and why — derived from what repeated across spikes]
## Structure
[Common file layouts, port assignments, naming patterns]
## Patterns
[Recurring approaches: how we handle auth, how we style, how we serve, etc.]
## Tools & Libraries
[Preferred packages with versions that worked, and any to avoid]
```
3. Only include patterns that appeared in 2+ spikes or were explicitly chosen by the user.
4. If `CONVENTIONS.md` already exists (append mode), update sections with new patterns. Remove entries contradicted by newer spikes.
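Rule 3's "2+ spikes" threshold can be checked mechanically. A sketch, assuming the `.planning/spikes/NNN-name/` layout from this workflow; the keyword grep is a heuristic, not part of the spec:

```shell
# Sketch of rule 3: how many distinct spikes mention a candidate pattern?
# Include it in CONVENTIONS.md only when the count is 2 or more.
spikes_using() {  # usage: spikes_using express
  grep -rl "$1" .planning/spikes/*/ 2>/dev/null \
    | cut -d/ -f3 \
    | sort -u | wc -l
}
```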
</step>
<step name="commit">
Commit all artifacts (if `COMMIT_DOCS` is true):
```bash
gsd-sdk query commit "docs(spike-wrap-up): package [N] spike findings into project skill" .planning/spikes/WRAP-UP-SUMMARY.md
gsd-sdk query commit "docs(spike-wrap-up): package [N] spike findings into project skill" .planning/spikes/WRAP-UP-SUMMARY.md .planning/spikes/CONVENTIONS.md
```
</step>
@@ -206,6 +259,7 @@ gsd-sdk query commit "docs(spike-wrap-up): package [N] spike findings into proje
**Processed:** {N} spikes
**Feature areas:** {list}
**Skill:** `./.claude/skills/spike-findings-[project]/`
**Conventions:** `.planning/spikes/CONVENTIONS.md`
**Summary:** `.planning/spikes/WRAP-UP-SUMMARY.md`
**CLAUDE.md:** routing line added
@@ -214,56 +268,27 @@ The spike-findings skill will auto-load in future build conversations.
</step>
<step name="whats_next">
## What's Next — Intelligent Spike Routing
## What's Next
Analyze the full spike landscape (MANIFEST.md, all curated findings, feature-area groupings, validated/invalidated/partial verdicts) and present three categories of next-step options:
After the summary, present next-step options:
### Category A: Integration Spikes — "Do any validated spikes need to be tested together?"
───────────────────────────────────────────────────────────────
Review every pair and cluster of VALIDATED spikes. Look for:
## ▶ Next Up
- **Shared resources:** Two spikes that both touch the same API, database, state, or data format but were tested independently. Will they conflict, race, or step on each other?
- **Data handoffs:** Spike A produces output that Spike B consumes. The formats were assumed compatible but never proven.
- **Timing/ordering:** Spikes that work in isolation but have sequencing dependencies in the real flow (e.g., auth must complete before streaming starts).
- **Resource contention:** Spikes that individually work but may compete for connections, memory, rate limits, or tokens when combined.
**Explore frontier spikes** — see what else is worth spiking based on what we've learned
If integration risks exist, present them as concrete proposed spikes:
`/gsd-spike` (run with no argument — its frontier mode analyzes the spike landscape and proposes integration and frontier spikes)
> **Integration spike candidates:**
> - "Spikes 001 + 003 together: streaming through the authenticated connection" — these were tested separately but the real app needs both at once
> - "Spikes 002 + 005 data handoff: does the parser output match what the renderer expects?"
───────────────────────────────────────────────────────────────
If no meaningful integration risks exist, say so and skip this category.
### Category B: Frontier Spikes — "What else should we spike?"
Think laterally about the overall idea from MANIFEST.md and what's been proven so far. Consider:
- **Gaps in the vision:** What does the user's idea need that hasn't been spiked yet? Look at the MANIFEST.md idea description and identify capabilities that are assumed but unproven.
- **Discovered dependencies:** Findings from completed spikes that reveal new questions. A spike that validated "X works" may imply "but we'd also need Y" — surface those implied needs.
- **Alternative approaches:** If any spike was PARTIAL or INVALIDATED, suggest a different angle to achieve the same goal.
- **Adjacent capabilities:** Things that aren't strictly required but would meaningfully improve the idea if feasible — worth a quick spike to find out.
- **Comparison opportunities:** If a spike used one library/approach and it worked but felt heavy or awkward, suggest a comparison spike with an alternative.
Present frontier spikes as concrete proposals with names, validation questions (Given/When/Then), and risk-ordering:
> **Frontier spike candidates:**
> 1. `NNN-descriptive-name` — Given [X], when [Y], then [Z]. *Why now: [reason this is the logical next thing to explore]*
> 2. `NNN-descriptive-name` — Given [X], when [Y], then [Z]. *Why now: [reason]*
Number them continuing from the highest existing spike number.
### Category C: Standard Options
- `/gsd-plan-phase` — Start planning the real implementation
- `/gsd-add-phase` — Add a phase based on spike findings
- `/gsd-spike` — Spike additional ideas
- `/gsd-explore` — Continue exploring
**Also available:**
- `/gsd-plan-phase` — start planning the real implementation
- `/gsd-spike [idea]` — spike a specific new idea
- `/gsd-explore` — continue exploring
- Other
### Presenting the Options
Present all applicable categories, then ask the user which direction to go. If the user picks a frontier or integration spike, write the spike definitions directly into `.planning/spikes/MANIFEST.md` (appending to the existing table) and kick off `/gsd-spike` with those spikes pre-defined — the user shouldn't have to re-describe what was just proposed.
───────────────────────────────────────────────────────────────
</step>
</process>
@@ -271,11 +296,11 @@ Present all applicable categories, then ask the user which direction to go. If t
<success_criteria>
- [ ] All unprocessed spikes auto-included and processed
- [ ] Spikes grouped by feature area
- [ ] Spike-findings skill exists at `./.claude/skills/` with SKILL.md, references/, sources/
- [ ] Core source files from all spikes copied into sources/
- [ ] Reference files contain validated patterns, code snippets, landmines, constraints
- [ ] Spike-findings skill exists at `./.claude/skills/` with SKILL.md (including requirements), references/, sources/
- [ ] Reference files are implementation blueprints with Requirements, How to Build It, What to Avoid, Constraints
- [ ] `.planning/spikes/CONVENTIONS.md` created or updated with recurring stack/structure/pattern choices
- [ ] `.planning/spikes/WRAP-UP-SUMMARY.md` written for project history
- [ ] Project CLAUDE.md has auto-load routing line
- [ ] Summary presented
- [ ] Intelligent next-step analysis presented with integration spike candidates, frontier spike candidates, and standard options
- [ ] Next-step options presented (including frontier spike exploration via `/gsd-spike`)
</success_criteria>


@@ -1,7 +1,11 @@
<purpose>
Rapid feasibility validation through focused, throwaway experiments. Each spike answers one
specific question with observable evidence. Saves artifacts to `.planning/spikes/`.
Companion to `/gsd-spike-wrap-up`.
Spike an idea through experiential exploration — build focused experiments to feel the pieces
of a future app, validate feasibility, and produce verified knowledge for the real build.
Saves artifacts to `.planning/spikes/`. Companion to `/gsd-spike-wrap-up`.
Supports two modes:
- **Idea mode** (default) — user describes an idea to spike
- **Frontier mode** — no argument or "frontier" / "what should I spike?" — analyzes existing spike landscape and proposes integration and frontier spikes
</purpose>
<required_reading>
@@ -20,9 +24,62 @@ Read all files referenced by the invoking prompt's execution_context before star
Parse `$ARGUMENTS` for:
- `--quick` flag → set `QUICK_MODE=true`
- `--text` flag → set `TEXT_MODE=true`
- `frontier` or empty → set `FRONTIER_MODE=true`
- Remaining text → the idea to spike
**Text mode:** If TEXT_MODE is enabled, replace AskUserQuestion calls with plain-text numbered lists — emit the options and ask the user to type the number of their choice.
**Text mode:** If TEXT_MODE is enabled, replace AskUserQuestion calls with plain-text numbered lists.
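The `$ARGUMENTS` parsing rules above can be sketched as a shell function. Flag names and the frontier fallback are the workflow's own; the function name is illustrative:

```shell
# Sketch of the $ARGUMENTS parsing rules: flags first, remainder is the idea,
# and an empty idea (or the literal word "frontier") selects frontier mode.
parse_args() {
  QUICK_MODE=false TEXT_MODE=false FRONTIER_MODE=false IDEA=""
  for word in $1; do
    case "$word" in
      --quick) QUICK_MODE=true ;;
      --text)  TEXT_MODE=true ;;
      *)       IDEA="${IDEA:+$IDEA }$word" ;;
    esac
  done
  case "$IDEA" in ''|frontier) FRONTIER_MODE=true ;; esac
}
```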
</step>
<step name="route">
## Routing
- **FRONTIER_MODE is true** → Jump to `frontier_mode`
- **Otherwise** → Continue to `setup_directory`
</step>
<step name="frontier_mode">
## Frontier Mode — Propose What to Spike Next
### Load the Spike Landscape
If no `.planning/spikes/` directory exists, tell the user there's nothing to analyze and offer to start fresh with an idea instead.
Otherwise, load in this order:
**a. MANIFEST.md** — the overall idea, requirements, and spike table with verdicts.
**b. Findings skills** — glob `./.claude/skills/spike-findings-*/SKILL.md` and read any that exist, plus their `references/*.md`. These contain curated knowledge from prior wrap-ups.
**c. CONVENTIONS.md** — read `.planning/spikes/CONVENTIONS.md` if it exists. Established stack and patterns.
**d. All spike READMEs** — read `.planning/spikes/*/README.md` for verdicts, results, investigation trails, and tags.
### Analyze for Integration Spikes
Review every pair and cluster of VALIDATED spikes. Look for:
- **Shared resources:** Two spikes that both touch the same API, database, state, or data format but were tested independently.
- **Data handoffs:** Spike A produces output that Spike B consumes. The formats were assumed compatible but never proven.
- **Timing/ordering:** Spikes that work in isolation but have sequencing dependencies in the real flow.
- **Resource contention:** Spikes that individually work but may compete for connections, memory, rate limits, or tokens when combined.
If integration risks exist, present them as concrete proposed spikes with names and Given/When/Then validation questions. If no meaningful integration risks exist, say so and skip this category.
### Analyze for Frontier Spikes
Think laterally about the overall idea from MANIFEST.md and what's been proven so far. Consider:
- **Gaps in the vision:** Capabilities assumed but unproven.
- **Discovered dependencies:** Findings that reveal new questions.
- **Alternative approaches:** Different angles for PARTIAL or INVALIDATED spikes.
- **Adjacent capabilities:** Things that would meaningfully improve the idea if feasible.
- **Comparison opportunities:** Approaches that worked but felt heavy.
Present frontier spikes as concrete proposals, numbered continuing from the highest existing spike number, with Given/When/Then validation questions and risk ordering.
### Get Alignment and Execute
Present all integration and frontier candidates, then ask which to run. When the user picks spikes, write definitions into `.planning/spikes/MANIFEST.md` (appending to existing table) and proceed directly to building them starting at `research`.
</step>
<step name="setup_directory">
@@ -44,13 +101,16 @@ COMMIT_DOCS=$(gsd-sdk query config-get commit_docs 2>/dev/null || echo "true")
</step>
<step name="detect_stack">
Check for the project's tech stack to inform spike technology choices:
Check for the project's tech stack to inform spike technology choices.
**Check conventions first.** If `.planning/spikes/CONVENTIONS.md` exists, follow its stack and patterns — these represent validated choices the user expects to see continued.
**Then check the project stack:**
```bash
ls package.json pyproject.toml Cargo.toml go.mod 2>/dev/null
```
Use the project's language/framework by default. For greenfield projects with no existing stack, pick whatever gets to a runnable result fastest (Python, Node, Bash, single HTML file).
Use the project's language/framework by default. For greenfield projects with no conventions and no existing stack, pick whatever gets to a runnable result fastest.
Avoid unless the spike specifically requires it:
- Complex package management beyond `npm install` or `pip install`
@@ -59,22 +119,31 @@ Avoid unless the spike specifically requires it:
- Env files or config systems — hardcode everything
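The detection precedence above (conventions first, then project manifests, then the fastest-runnable fallback) can be sketched as:

```shell
# Sketch of the stack-detection order; file names are the ones listed above.
detect_stack() {
  [ -f .planning/spikes/CONVENTIONS.md ] && { echo "conventions"; return 0; }
  local f
  for f in package.json pyproject.toml Cargo.toml go.mod; do
    [ -f "$f" ] && { echo "$f"; return 0; }
  done
  echo "greenfield"   # no stack found: pick whatever runs fastest
}
```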
</step>
<step name="check_prior_spikes">
If `.planning/spikes/MANIFEST.md` exists, read it. Scan the verdicts, names, and validation questions of all prior spikes. When decomposing the new idea, cross-reference against this history:
<step name="load_prior_context">
If `.planning/spikes/` has existing content, load context in this priority order:
- **Skip already-validated questions.** If a prior spike proved "WebSocket streaming works" with a VALIDATED verdict, don't re-spike it. Note the prior spike number and move on.
- **Build on prior findings.** If a prior spike was INVALIDATED or PARTIAL, factor that into the new decomposition — don't repeat the same approach, and flag the constraint to the user.
- **Call out relevant prior art.** When presenting the decomposition, mention any prior spikes that overlap: "Spike 003 already validated X, so we can skip that and focus on Y."
**a. Conventions:** Read `.planning/spikes/CONVENTIONS.md` if it exists.
If no `.planning/spikes/MANIFEST.md` exists, skip this step.
**b. Findings skills:** Glob for `./.claude/skills/spike-findings-*/SKILL.md` and read any that exist, plus their `references/*.md` files.
**c. Manifest:** Read `.planning/spikes/MANIFEST.md` for the index of all spikes.
**d. Related READMEs:** Based on the new idea, identify which prior spikes are related by matching tags, names, technologies, or domain overlap. Read only those `.planning/spikes/*/README.md` files. Skip unrelated ones.
Cross-reference against this full body of prior work:
- **Skip already-validated questions.** Note the prior spike number and move on.
- **Build on prior findings.** Don't repeat failed approaches. Use their Research and Results sections.
- **Reuse prior research.** Carry findings forward rather than re-researching.
- **Follow established conventions.** Mention any deviation.
- **Call out relevant prior art** when presenting the decomposition.
If no `.planning/spikes/` exists, skip this step.
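Step (d)'s relatedness filter can be sketched against the MANIFEST table. The column layout is this workflow's template; matching a single keyword anywhere in the row is a simplifying assumption:

```shell
# Sketch of step (d): list prior spikes whose MANIFEST row mentions a keyword,
# so only their READMEs get read. Column 3 of the pipe table is the spike name.
related_spikes() {  # usage: related_spikes websocket
  awk -F'|' -v kw="$1" '
    /^\|/ && $0 ~ kw && $3 !~ /Name|---/ {
      gsub(/^[ \t]+|[ \t]+$/, "", $3); print $3
    }' .planning/spikes/MANIFEST.md
}
```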
</step>
<step name="decompose">
**If `QUICK_MODE` is true:** Skip decomposition and alignment. Take the user's idea as a single spike question. Assign it spike number `001` (or next available). Jump to `research`.
**If `QUICK_MODE` is true:** Skip decomposition and alignment. Take the user's idea as a single spike question. Assign it the next available number. Jump to `research`.
**Otherwise:**
Break the idea into 2-5 independent questions that each prove something specific. Frame each as an informal Given/When/Then. Present as a table:
Break the idea into 2-5 independent questions. Frame each as Given/When/Then. Present as a table:
```
| # | Spike | Type | Validates (Given/When/Then) | Risk |
@@ -86,30 +155,17 @@ Break the idea into 2-5 independent questions that each prove something specific
**Spike types:**
- **standard** — one approach answering one question
- **comparison** — same question, different approaches. Use a shared number with lettered variants: `NNN-a-name` and `NNN-b-name`. Both built back-to-back, then head-to-head comparison.
- **comparison** — same question, different approaches. Shared number with letter suffix.
Good spikes answer one specific feasibility question:
- "Can we parse X format and extract Y?" — script that does it on a sample file
- "How fast is X approach?" — benchmark with real-ish data
- "Can we get X and Y to talk to each other?" — thinnest integration
- "What does X feel like as a UI?" — minimal interactive prototype
- "Does X API actually support Y?" — script that calls it and shows the response
- "Should we use X or Y for this?" — **comparison spike**: same thin proof built with both
Good spikes: specific feasibility questions with observable output.
Bad spikes: too broad, no observable output, or just reading/planning.
Bad spikes are too broad or don't produce observable output:
- "Set up the project" — not a question, just busywork
- "Design the architecture" — planning, not spiking
- "Build the backend" — too broad, no specific question
- "Research best practices" — open-ended reading with no runnable output
Order by risk — the spike most likely to kill the idea runs first.
Order by risk — most likely to kill the idea runs first.
</step>
<step name="align">
**If `QUICK_MODE` is true:** Skip.
Present the ordered spike list and ask which to build:
╔══════════════════════════════════════════════════════════════╗
║ CHECKPOINT: Decision Required ║
╚══════════════════════════════════════════════════════════════╝
@@ -119,35 +175,33 @@ Present the ordered spike list and ask which to build:
──────────────────────────────────────────────────────────────
→ Build all in this order, or adjust the list?
──────────────────────────────────────────────────────────────
The user may reorder, merge, split, or skip spikes. Wait for alignment.
</step>
<step name="research">
## Research Before Building
## Research and Briefing Before Each Spike
Before writing any spike code, ground each spike in reality. This prevents building against outdated APIs, picking the wrong library, or discovering mid-spike that the approach is impossible.
This step runs **before each individual spike**, not once at the start.
For each spike about to be built:
**a. Present a spike briefing:**
**a. Identify unknowns.** What libraries, APIs, protocols, or techniques does this spike depend on? What assumptions are you making about how they work?
> **Spike NNN: Descriptive Name**
> [2-3 sentences: what this spike is, why it matters, key risk or unknown.]
**b. Check current docs.** Use context7 (resolve-library-id → query-docs) for any library or framework involved. Use web search for APIs, services, or techniques without a context7 entry. Read actual documentation — not training data, which may be stale.
**b. Research the current state of the art.** Use context7 (resolve-library-id → query-docs) for libraries/frameworks. Use web search for APIs/services without a context7 entry. Read actual documentation.
**c. Validate feasibility before coding.** Specifically check:
- Does the API/library actually support what the spike assumes? (Check endpoints, methods, capabilities)
- What's the current recommended approach? (The "right way" changes — what was learned in training may be deprecated)
- Are there version constraints, breaking changes, or migration gotchas?
- Are there rate limits, auth requirements, or platform restrictions that would block the spike?
**d. Pick the right tool.** If multiple libraries could solve the problem, surface the competing approaches as a table and compare them on current maintenance status, API fit for the specific spike question, and complexity. Pick the one that gets to a runnable answer fastest with the fewest surprises.
| Approach | Tool/Library | Pros | Cons | Status |
|----------|-------------|------|------|--------|
| ... | ... | ... | ... | ... |
**Chosen approach:** [which one and why]
If 2+ credible approaches exist, plan to build quick variants within the spike and compare them.
**e. Capture research findings.** Add a `## Research` section to the spike's README (before `## How to Run`) with:
- Which docs were checked and key findings
- The chosen approach and why
- Any gotchas or constraints discovered
**Skip research when unnecessary.** If the spike uses only well-known, stable tools already verified in this session, or if the entire spike is pure logic with no external dependencies, skip this step. The goal is grounding in reality, not busywork.
</step>
<step name="create_manifest">
Create or update `.planning/spikes/MANIFEST.md`:
```markdown
## Idea
[One paragraph describing the overall idea being explored]
## Requirements
[Design decisions that emerged from the user's choices during spiking. Non-negotiable for the real build. Updated as spikes progress.]
- [e.g., "Must use streaming JSON output, not single-response"]
- [e.g., "Must support reconnection on network failure"]
## Spikes
| # | Name | Type | Validates | Verdict | Tags |
|---|------|------|-----------|---------|------|
| 001 | websocket-streaming | standard | WS connections can stream LLM output | VALIDATED | websocket, real-time |
| 002a | pdf-parse-pdfjs | comparison | PDF table extraction | WINNER | pdf, parsing |
| 002b | pdf-parse-camelot | comparison | PDF table extraction | — | pdf, parsing |
```
If MANIFEST.md already exists, append new spikes to the existing table.
**Track requirements as they emerge.** When the user expresses a preference during spiking, add it to the Requirements section immediately.
</step>
<step name="reground">
## Re-Ground Before Each Spike
Before starting each spike (not just the first), re-read `.planning/spikes/MANIFEST.md` and `.planning/spikes/CONVENTIONS.md` to prevent drift within long sessions. Check the Requirements section — make sure the spike doesn't contradict any established requirements.
</step>
<step name="build_spikes">
## Build Each Spike Sequentially
Work through spikes one at a time, highest-risk first.
**Comparison spikes** use a shared number with lettered variants: `NNN-a-descriptive-name` and `NNN-b-descriptive-name`. Both answer the same question using different approaches. Build them back-to-back, then report a head-to-head comparison before moving on. Judge on criteria that matter for the real build: API ergonomics, output quality, complexity, performance, or whatever the user cares about. The comparison spike's verdict names the winner and why.
**Depth over speed.** The goal is genuine understanding, not a quick verdict. Never declare VALIDATED after a single happy-path test. Follow surprising findings. Test edge cases. Document the investigation trail, not just the conclusion.
### For Each Spike:
**a.** Find the next available number by checking existing `.planning/spikes/NNN-*/` directories, then create the spike directory: `.planning/spikes/NNN-descriptive-name/`
Format: three-digit zero-padded + hyphenated descriptive name. Comparison spikes: same number with letter suffix — `002a-pdf-parse-pdfjs`, `002b-pdf-parse-camelot`.
**b.** Default to giving the user something they can experience. The bias should be toward building a simple UI or interactive demo, not toward stdout that only Claude reads. The user wants to *feel* the spike working, not just be told it works.
**The default is: build something the user can interact with.** This could be:
- A simple HTML page that shows the result visually
- A web UI with a button that triggers the action and shows the response
- A page that displays data flowing through a pipeline
- A minimal interface where the user can try different inputs and see outputs
**Only fall back to stdout/CLI verification when the spike is genuinely about a fact, not a feeling:**
- Pure data transformation where the answer is "yes it parses correctly"
- Binary yes/no questions (does this API authenticate? does this library exist?)
- Benchmark numbers (how fast is X? how much memory does Y use?)
- Single-run scripts with deterministic stdout that Claude can run and check directly
When in doubt, build the UI. It takes a few extra minutes but produces a spike the user can actually demo and feel confident about.
Before writing code, also ask: can Claude fully verify this spike's outcome by running a command and reading stdout, or does it require human interaction with a runtime? Spikes that need runtime observability:
- **UI spikes** — anything with a browser, clicks, visual feedback
- **Streaming spikes** — WebSockets, SSE, real-time data flow
- **Multi-process spikes** — client/server, IPC, subprocess orchestration
- **Timing-sensitive spikes** — race conditions, debounce, polling, reconnection
- **External API spikes** — where the API response shape, latency, or error behavior matters for the verdict
**If the spike needs runtime observability,** build a forensic log layer into the spike:
1. **An event log array** at module level that captures every meaningful event with an ISO timestamp and a direction/category tag (e.g., `"user_input"`, `"api_response"`, `"sse_frame"`, `"error"`, `"state_change"`)
2. **A log export mechanism** appropriate to the spike's runtime:
- For server spikes: a `GET /api/export-log` endpoint returning downloadable JSON
- For CLI spikes: write `spike-log-{timestamp}.json` to the spike directory on exit or on signal
- For browser spikes: a visible "Export Log" button that triggers a JSON download
3. **A log summary** included in the export: total event counts by category, duration, errors detected, environment metadata
4. **Analysis helpers** if the event volume warrants it: a small script (bash/python) in the spike directory that extracts the signal from the log. Name it `analyze-log.sh` or similar.
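The log layer described above can be sketched in a few lines; this is a minimal illustration, and names like `logEvent` and `summarizeLog` are illustrative rather than part of the workflow:

```javascript
// Minimal forensic log layer: an array push per event, not a logging framework.
const spikeLog = [];

function logEvent(category, detail) {
  // ISO timestamp plus a category tag, e.g. "user_input", "api_response", "error"
  spikeLog.push({ ts: new Date().toISOString(), category, detail });
}

function summarizeLog() {
  // Summary suitable for the export: counts by category plus error total.
  const counts = {};
  for (const e of spikeLog) counts[e.category] = (counts[e.category] || 0) + 1;
  return {
    totalEvents: spikeLog.length,
    countsByCategory: counts,
    errors: spikeLog.filter((e) => e.category === 'error').length,
  };
}

logEvent('user_input', 'clicked start');
logEvent('api_response', '200 OK');
logEvent('error', 'timeout');
console.log(JSON.stringify(summarizeLog()));
```

The export mechanism (endpoint, file write, or download button) then serializes `spikeLog` plus this summary.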
Keep the logging lightweight — an array push per event, not a logging framework. Inline it in the spike code.
**c.** Build the minimum code that answers the spike's question (with the observability layer above if applicable). Start with the simplest version, then deepen. Every line must serve the question — nothing incidental.
**d.** Iterate when findings warrant it:
- **Surprising surface?** Write a follow-up test that isolates and explores it.
- **Answer feels shallow?** Probe edge cases — large inputs, concurrent requests, malformed data, network failures.
- **Assumption wrong?** Adjust. Note the pivot in the README.
Multiple files per spike are expected for complex questions (e.g., `test-basic.js`, `test-edge-cases.js`, `benchmark.js`).
**e.** Write `README.md` with YAML frontmatter:
```markdown
tags: [tag1, tag2]
---
# Spike NNN: Descriptive Name
## What This Validates
[The specific feasibility question, framed as Given/When/Then]
## Research
[Docs checked, key findings, approach comparison table, chosen approach and why, gotchas discovered. Omit if no external dependencies.]
## How to Run
[Single command or short sequence to run the spike]
## What to Expect
[Concrete observable outcomes: "When you click X, you should see Y within Z seconds"]
## Observability
[If this spike has a forensic log layer: describe what's captured, how to export the log, and how to analyze it. Omit for spikes without runtime observability.]
## Investigation Trail
[Updated as the spike progresses. Document each iteration: what was tried, what it revealed, what was tried next.]
## Results
[Filled in after running — verdict, evidence, surprises. If a forensic log was exported, include key findings from the log analysis here.]
```
**f.** Auto-link related spikes: read existing spike READMEs and infer relationships from tags, names, and descriptions. Write the `related` field silently.
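The relationship inference can be as simple as tag overlap; a hedged sketch, where the function name and data shapes are illustrative only:

```javascript
// Infer "related" spike IDs by shared tags from parsed README frontmatter.
function relatedSpikes(spike, others) {
  return others
    .filter((o) => o.id !== spike.id && o.tags.some((t) => spike.tags.includes(t)))
    .map((o) => o.id);
}

console.log(relatedSpikes(
  { id: '003', tags: ['pdf', 'parsing'] },
  [{ id: '002a', tags: ['pdf'] }, { id: '001', tags: ['websocket'] }]
)); // prints [ '002a' ]
```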
**g.** Run and verify:
- If self-verifiable: run it, check output, iterate if findings warrant deeper investigation, then update the README verdict and Results section
- If it needs human judgment: run it, present instructions using a checkpoint box:
╔══════════════════════════════════════════════════════════════╗
║ CHECKPOINT: Verification Required ║
╚══════════════════════════════════════════════════════════════╝
**Spike {NNN}: {name}**
**How to run:** {command}
**What to expect:** {concrete outcomes}
→ Does this match what you expected? Describe what you see.
──────────────────────────────────────────────────────────────
- If the spike has a forensic log layer: after verification, export the log and include key findings in the Results section. If something went wrong, ask the user to export the log and provide it for diagnosis.
**h.** Update the verdict to VALIDATED / INVALIDATED / PARTIAL (or WINNER for comparison spike winners). Update the Results section with evidence.
**i.** Update `.planning/spikes/MANIFEST.md` with the spike's row.
**j.** Commit (if `COMMIT_DOCS` is true):
```bash
gsd-sdk query commit "docs(spike-NNN): [VERDICT] — [key finding in one sentence]" .planning/spikes/NNN-descriptive-name/ .planning/spikes/MANIFEST.md
```
**k.** Report before moving to the next spike:
```
◆ Spike NNN: {name}
Verdict: {VALIDATED ✓ / INVALIDATED ✗ / PARTIAL ⚠}
Key findings: {not just the verdict — investigation trail, surprises, edge cases explored}
Impact: {effect on remaining spikes, if any}
```
Do not rush to a verdict. A spike that says "VALIDATED — it works" with no nuance is almost always incomplete.
**l.** If a spike invalidates a core assumption, stop and present:
╔══════════════════════════════════════════════════════════════╗
║ CHECKPOINT: Decision Required ║
╚══════════════════════════════════════════════════════════════╝
Core assumption invalidated by Spike {NNN}.
{what was invalidated and why}
──────────────────────────────────────────────────────────────
→ Continue with remaining spikes / Pivot approach / Abandon
──────────────────────────────────────────────────────────────
Only proceed if the user says to.
</step>
<step name="update_conventions">
## Update Conventions
After all spikes in this session are built, update `.planning/spikes/CONVENTIONS.md` with patterns that emerged or solidified.
```markdown
# Spike Conventions
Patterns and stack choices established across spike sessions. New spikes follow these unless the question requires otherwise.
## Stack
[What we use for frontend, backend, scripts, and why]
## Structure
[Common file layouts, port assignments, naming patterns]
## Patterns
[Recurring approaches: how we handle auth, how we style, how we serve]
## Tools & Libraries
[Preferred packages with versions that worked, and any to avoid]
```
Only include patterns that repeated across 2+ spikes or were explicitly chosen by the user. If `CONVENTIONS.md` already exists, update sections with new patterns from this session.
Commit (if `COMMIT_DOCS` is true):
```bash
gsd-sdk query commit "docs(spikes): update conventions" .planning/spikes/CONVENTIONS.md
```
</step>
<step name="report">
After all spikes complete, present the consolidated report:
```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
GSD ► SPIKE COMPLETE ✓
| # | Name | Type | Verdict |
|---|------|------|---------|
| 001 | {name} | standard | ✓ VALIDATED |
| 002a | {name} | comparison | ✓ WINNER |
| 002b | {name} | comparison | — |
## Key Discoveries
{surprises, gotchas, investigation trail highlights}
## Feasibility Assessment
{overall: is the idea viable?}
## Signal for the Build
{what the real implementation should use, avoid, or watch out for}
```
───────────────────────────────────────────────────────────────
## ▶ Next Up
**Package findings** — wrap spike knowledge into an implementation blueprint
`/gsd-spike-wrap-up`
───────────────────────────────────────────────────────────────
**Also available:**
- `/gsd-spike` — spike more ideas (or run with no argument for frontier mode)
- `/gsd-plan-phase` — start planning the real implementation
- `/gsd-explore` — continue exploring the idea
- `/gsd-add-phase` — add a phase to the roadmap based on findings
───────────────────────────────────────────────────────────────
</step>
<success_criteria>
- [ ] `.planning/spikes/` created (auto-creates if needed, no project init required)
- [ ] Prior spikes and findings skills consulted — already-validated questions skipped, prior findings factored in
- [ ] Conventions followed (or deviation documented)
- [ ] Research grounded each spike in current docs before coding (unless pure logic/no deps)
- [ ] Depth over speed — edge cases tested, surprising findings followed, investigation trail documented
- [ ] Comparison spikes built back-to-back with head-to-head verdict
- [ ] Spikes needing human interaction have forensic log layer (event capture, export, analysis)
- [ ] Each spike answers one specific question with observable evidence
- [ ] User verified each spike (self-verified or human checkpoint)
- [ ] Requirements tracked in MANIFEST.md as they emerge from user choices
- [ ] CONVENTIONS.md created or updated with patterns that emerged
- [ ] Each spike README has complete frontmatter (including type), run instructions, Investigation Trail, and Results
- [ ] MANIFEST.md is current (with Type column and Requirements section)
- [ ] Commits use `docs(spike-NNN): [VERDICT]` format
- [ ] Consolidated report presented with next-step routing
- [ ] If core assumption invalidated, execution stopped and user consulted
</success_criteria>

package-lock.json generated

{
"name": "get-shit-done-cc",
"version": "1.38.2",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "get-shit-done-cc",
"version": "1.38.2",
"license": "MIT",
"bin": {
"get-shit-done-cc": "bin/install.js"


{
"name": "get-shit-done-cc",
"version": "1.38.2",
"description": "A meta-prompting, context engineering and spec-driven development system for Claude Code, OpenCode, Gemini and Codex by TÂCHES.",
"bin": {
"get-shit-done-cc": "bin/install.js"


const GSD_MANAGED_DIRS = [
'agents',
join('commands', 'gsd'),
'hooks',
];
function walkDir(dir: string, baseDir: string): string[] {


describe('stripShippedMilestones', () => {
it('returns content unchanged when no details blocks', () => {
expect(stripShippedMilestones('no details here')).toBe('no details here');
});
// Bug #2496: inline ✅ SHIPPED heading sections must be stripped
it('strips ## heading sections marked ✅ SHIPPED', () => {
const content = [
'## Milestone v1.0: MVP — ✅ SHIPPED 2026-01-15',
'',
'Phase 1, Phase 2',
'',
'## Milestone v2.0: Current',
'',
'Phase 3',
].join('\n');
const stripped = stripShippedMilestones(content);
expect(stripped).not.toContain('MVP');
expect(stripped).not.toContain('v1.0');
expect(stripped).toContain('v2.0');
expect(stripped).toContain('Current');
});
it('strips multiple inline SHIPPED sections and leaves non-shipped content', () => {
const content = [
'## Milestone v1.0: Alpha — ✅ SHIPPED 2026-01-01',
'',
'Old content',
'',
'## Milestone v1.5: Beta — ✅ SHIPPED 2026-02-01',
'',
'More old content',
'',
'## Milestone v2.0: Gamma',
'',
'Current content',
].join('\n');
const stripped = stripShippedMilestones(content);
expect(stripped).not.toContain('Alpha');
expect(stripped).not.toContain('Beta');
expect(stripped).toContain('Gamma');
expect(stripped).toContain('Current content');
});
// Bug #2508 follow-up: ### headings must be stripped too
it('strips ### heading sections marked ✅ SHIPPED', () => {
const content = [
'### Milestone v1.0: MVP — ✅ SHIPPED 2026-01-15',
'',
'Phase 1, Phase 2',
'',
'### Milestone v2.0: Current',
'',
'Phase 3',
].join('\n');
const stripped = stripShippedMilestones(content);
expect(stripped).not.toContain('MVP');
expect(stripped).not.toContain('v1.0');
expect(stripped).toContain('v2.0');
expect(stripped).toContain('Current');
});
});
// ─── getMilestoneInfo ─────────────────────────────────────────────────────
describe('getMilestoneInfo', () => {
expect(info.version).toBe('v1.0');
expect(info.name).toBe('milestone');
});
// Bug #2495: STATE.md must take priority over ROADMAP heading matching
it('prefers STATE.md milestone over ROADMAP heading match', async () => {
const roadmap = [
'## Milestone v1.0: Shipped — ✅ SHIPPED 2026-01-01',
'',
'Phase 1',
'',
'## Milestone v2.0: Current Active',
'',
'Phase 2',
].join('\n');
await writeFile(join(tmpDir, '.planning', 'ROADMAP.md'), roadmap);
await writeFile(
join(tmpDir, '.planning', 'STATE.md'),
'---\nmilestone: v2.0\nmilestone_name: Current Active\n---\n',
);
const info = await getMilestoneInfo(tmpDir);
expect(info.version).toBe('v2.0');
expect(info.name).toBe('Current Active');
});
// Bug #2508 follow-up: STATE.md has milestone version but no milestone_name —
// should use ROADMAP for the real name, still prefer STATE.md for version.
it('uses ROADMAP name when STATE.md has milestone version but no milestone_name', async () => {
const roadmap = [
'## Milestone v2.0: Real Name From Roadmap',
'',
'Phase 2',
].join('\n');
await writeFile(join(tmpDir, '.planning', 'ROADMAP.md'), roadmap);
await writeFile(
join(tmpDir, '.planning', 'STATE.md'),
'---\nmilestone: v2.0\n---\n', // no milestone_name
);
const info = await getMilestoneInfo(tmpDir);
expect(info.version).toBe('v2.0');
expect(info.name).toBe('Real Name From Roadmap');
});
it('returns correct milestone from STATE.md even when ROADMAP inline-SHIPPED stripping would fix it', async () => {
// ROADMAP with an unstripped shipped milestone heading (pre-fix state)
const roadmap = [
'## Milestone v1.0: Old — ✅ SHIPPED 2026-01-01',
'',
'Old phases',
'',
'## Milestone v2.0: New',
'',
'New phases',
].join('\n');
await writeFile(join(tmpDir, '.planning', 'ROADMAP.md'), roadmap);
await writeFile(
join(tmpDir, '.planning', 'STATE.md'),
'---\nmilestone: v2.0\nmilestone_name: New\n---\n',
);
const info = await getMilestoneInfo(tmpDir);
expect(info.version).toBe('v2.0');
expect(info.name).toBe('New');
});
});
// ─── extractCurrentMilestone ──────────────────────────────────────────────


/**
 * Port of stripShippedMilestones from core.cjs line 1082-1084.
 */
export function stripShippedMilestones(content: string): string {
return content.replace(/<details>[\s\S]*?<\/details>/gi, '');
// Pattern 1: <details>...</details> blocks (explicit collapse)
let result = content.replace(/<details>[\s\S]*?<\/details>/gi, '');
// Pattern 2: inline milestone headings marked as shipped.
// Keep aligned with heading levels accepted by extractCurrentMilestone() (## and ###).
const sections = result.split(/(?=^#{2,3}\s)/m);
result = sections.filter(s => !/^#{2,3}\s[^\n]*✅\s*SHIPPED\b/im.test(s)).join('');
return result;
}
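The two-pattern strip can be exercised standalone; this sketch mirrors the function above so the filtering behavior is visible on sample ROADMAP content:

```javascript
// Standalone mirror of stripShippedMilestones: drop <details> blocks, then
// drop any ##/### section whose heading line is marked ✅ SHIPPED.
function strip(content) {
  let result = content.replace(/<details>[\s\S]*?<\/details>/gi, '');
  const sections = result.split(/(?=^#{2,3}\s)/m);
  return sections.filter((s) => !/^#{2,3}\s[^\n]*✅\s*SHIPPED\b/im.test(s)).join('');
}

const roadmap = [
  '## Milestone v1.0: MVP — ✅ SHIPPED 2026-01-15',
  'Phase 1',
  '## Milestone v2.0: Current',
  'Phase 3',
].join('\n');

console.log(strip(roadmap)); // only the v2.0 section survives
```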
/**
*/
export async function getMilestoneInfo(projectDir: string): Promise<{ version: string; name: string }> {
try {
// Priority 1: STATE.md frontmatter (authoritative for version; name only when real)
const fromState = await parseMilestoneFromState(projectDir);
const stateVersion = fromState?.version ?? null;
const stateName = fromState && fromState.name !== 'milestone' ? fromState.name : null;
if (stateVersion && stateName) {
return { version: stateVersion, name: stateName };
}
// STATE.md has a version but no real name — fall through to ROADMAP for the name,
// then override the version with the authoritative STATE.md value.
const roadmap = await readFile(planningPaths(projectDir).roadmap, 'utf-8');
// List-format: construction / blocked (legacy emoji)
const barricadeMatch = roadmap.match(/🚧\s*\*\*v(\d+(?:\.\d+)+)\s+([^*]+)\*\*/);
if (barricadeMatch) {
return { version: stateVersion ?? 'v' + barricadeMatch[1], name: barricadeMatch[2].trim() };
}
// List-format: in flight / active (GSD ROADMAP template uses 🟡 for current milestone)
const inFlightMatch = roadmap.match(/🟡\s*\*\*v(\d+(?:\.\d+)+)\s+([^*]+)\*\*/);
if (inFlightMatch) {
return { version: stateVersion ?? 'v' + inFlightMatch[1], name: inFlightMatch[2].trim() };
}
// Heading-format — strip shipped <details> blocks first
const cleaned = stripShippedMilestones(roadmap);
const headingMatch = cleaned.match(/##\s+.*v(\d+(?:\.\d+)+)[:\s]+([^\n(]+)/);
if (headingMatch) {
return { version: stateVersion ?? 'v' + headingMatch[1], name: headingMatch[2].trim() };
}
// Milestone bullet list (## Milestones … ## Phases): use last **vX.Y Title** — typically the current row
const boldMatches = [...beforePhases.matchAll(/\*\*v(\d+(?:\.\d+)+)\s+([^*]+)\*\*/g)];
if (boldMatches.length > 0) {
const last = boldMatches[boldMatches.length - 1];
return { version: stateVersion ?? 'v' + last[1], name: last[2].trim() };
}
const allBare = [...cleaned.matchAll(/\bv(\d+(?:\.\d+)+)\b/g)];
if (allBare.length > 0) {
const lastBare = allBare[allBare.length - 1];
return { version: stateVersion ?? lastBare[0], name: 'milestone' };
}
return { version: stateVersion ?? 'v1.0', name: 'milestone' };
} catch {
const fromState = await parseMilestoneFromState(projectDir);
if (fromState) return fromState;
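`parseMilestoneFromState` itself is outside this hunk; a hypothetical sketch of the frontmatter parse it implies, using only the `milestone` and `milestone_name` field names seen in the tests above:

```javascript
// Hypothetical parse of STATE.md frontmatter (not the actual implementation):
// returns { version, name } or null when no milestone is recorded.
function parseFrontmatterMilestone(stateContent) {
  const fm = stateContent.match(/^---\n([\s\S]*?)\n---/);
  if (!fm) return null;
  const get = (key) => {
    const m = fm[1].match(new RegExp('^' + key + ':\\s*(.+)$', 'm'));
    return m ? m[1].trim() : null;
  };
  const version = get('milestone');
  if (!version) return null;
  // Placeholder name when milestone_name is absent, matching the tests' fallback.
  return { version, name: get('milestone_name') ?? 'milestone' };
}

console.log(parseFrontmatterMilestone('---\nmilestone: v2.0\nmilestone_name: New\n---\n'));
```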


/**
* Tests for bug #2501: resurrection-detection block in execute-phase.md must
* check git history before deleting new .planning/ files.
*
* Root cause: the original logic deleted ANY .planning/ file that was absent
* from PRE_MERGE_FILES, which includes brand-new files (e.g. SUMMARY.md)
* that the executor just created. A true "resurrection" is a file that was
* previously tracked on main, deliberately deleted, and then re-introduced by
* a worktree merge. Detecting that requires a git history check, not just a
* pre-merge tree membership check.
*/
'use strict';
const { test, describe } = require('node:test');
const assert = require('node:assert/strict');
const fs = require('fs');
const path = require('path');
const EXECUTE_PHASE = path.join(
__dirname, '..', 'get-shit-done', 'workflows', 'execute-phase.md'
);
describe('execute-phase.md — resurrection-detection guard (#2501)', () => {
let content;
// Load once; each test reads from the cached string.
test('file is readable', () => {
content = fs.readFileSync(EXECUTE_PHASE, 'utf-8');
assert.ok(content.length > 0, 'execute-phase.md must not be empty');
});
test('resurrection block checks git history for a prior deletion event', () => {
if (!content) content = fs.readFileSync(EXECUTE_PHASE, 'utf-8');
// Scope check to the resurrection block only (up to 1200 chars from its heading).
const resurrectionStart = content.indexOf('# Detect files deleted on main');
assert.ok(resurrectionStart !== -1, 'resurrection comment must exist');
const window = content.slice(resurrectionStart, resurrectionStart + 1200);
// The fix must add a git log --diff-filter=D check inside this block so that
// only files with a deletion event in the main branch ancestry are removed.
const hasHistoryCheck =
window.includes('--diff-filter=D') &&
window.includes('git log');
assert.ok(
hasHistoryCheck,
'execute-phase.md resurrection block must use "git log ... --diff-filter=D" to verify a file was previously deleted before removing it'
);
});
test('resurrection block does not delete files solely because they are absent from PRE_MERGE_FILES', () => {
if (!content) content = fs.readFileSync(EXECUTE_PHASE, 'utf-8');
// Extract the resurrection section (between the "Detect files deleted on main"
// comment and the next empty line / next major comment block).
const resurrectionStart = content.indexOf('# Detect files deleted on main');
assert.ok(
resurrectionStart !== -1,
'execute-phase.md must contain the resurrection-detection comment block'
);
// Grab a window of text around the resurrection block (up to 1200 chars).
const window = content.slice(resurrectionStart, resurrectionStart + 1200);
// The ONLY deletion guard should be the history check.
// The buggy pattern: `if ! echo "$PRE_MERGE_FILES" | grep -qxF "$RESURRECTED"`
// with NO accompanying history check. After the fix the sole condition
// determining deletion must involve a git-log history lookup.
const hasBuggyStandaloneGuard =
/if\s*!\s*echo\s*"\$PRE_MERGE_FILES"\s*\|\s*grep\s+-qxF\s*"\$RESURRECTED"/.test(window) &&
!/git log/.test(window);
assert.ok(
!hasBuggyStandaloneGuard,
'resurrection block must NOT delete files based solely on absence from PRE_MERGE_FILES without a git-history check'
);
});
test('resurrection block still removes files that have a deletion history on main', () => {
if (!content) content = fs.readFileSync(EXECUTE_PHASE, 'utf-8');
// The fix must still call `git rm` for genuine resurrections.
const resurrectionStart = content.indexOf('# Detect files deleted on main');
assert.ok(resurrectionStart !== -1, 'resurrection comment must exist');
const window = content.slice(resurrectionStart, resurrectionStart + 1200);
assert.ok(
window.includes('git rm'),
'resurrection block must still call git rm to remove genuinely resurrected files'
);
});
});


/**
* Regression test for #2502: insert-phase does not update STATE.md's
* next-phase recommendation after inserting a decimal phase.
*
* Root cause: insert-phase.md's update_project_state step only added a
* "Roadmap Evolution" note to STATE.md, but never updated the "Current Phase"
* / next-run recommendation to point at the newly inserted phase.
*
* Fix: insert-phase.md must include a step that updates STATE.md's next-phase
* pointer (current_phase / next recommended run) to the newly inserted phase.
*/
'use strict';
const { describe, test } = require('node:test');
const assert = require('node:assert/strict');
const fs = require('fs');
const path = require('path');
const INSERT_PHASE_PATH = path.join(
__dirname, '..', 'get-shit-done', 'workflows', 'insert-phase.md'
);
describe('bug-2502: insert-phase must update STATE.md next-phase recommendation', () => {
test('insert-phase.md exists', () => {
assert.ok(fs.existsSync(INSERT_PHASE_PATH), 'insert-phase.md should exist');
});
test('insert-phase.md contains a STATE.md next-phase update instruction', () => {
const content = fs.readFileSync(INSERT_PHASE_PATH, 'utf-8');
// Must reference STATE.md and the concept of updating the next/current phase pointer
const mentionsStateUpdate = (
/STATE\.md.{0,200}(next.phase|current.phase|next.run|recommendation)/is.test(content) ||
/(next.phase|current.phase|next.run|recommendation).{0,200}STATE\.md/is.test(content)
);
assert.ok(
mentionsStateUpdate,
'insert-phase.md must instruct updating STATE.md\'s next-phase recommendation to point to the newly inserted phase'
);
});
test('insert-phase.md update_project_state step covers next-phase pointer', () => {
const content = fs.readFileSync(INSERT_PHASE_PATH, 'utf-8');
const stepMatch = content.match(/<step name="update_project_state">([\s\S]*?)<\/step>/i);
assert.ok(stepMatch, 'insert-phase.md must contain update_project_state step');
const stepContent = stepMatch[1];
const hasNextPhasePointerUpdate = (
/\bcurrent[_ -]?phase\b/i.test(stepContent) ||
/\bnext[_ -]?phase\b/i.test(stepContent) ||
/\bnext recommended run\b/i.test(stepContent)
);
assert.ok(
hasNextPhasePointerUpdate,
'insert-phase.md update_project_state step must update STATE.md\'s next-phase pointer (current_phase) to the inserted decimal phase'
);
});
});


/**
* Regression test for bug #2506
*
* /gsd-settings presents Quality/Balanced/Budget model profiles without any
* warning that on non-Claude runtimes (Codex, Gemini CLI, etc.) these profiles
* select Claude model tiers and have no effect on actual agent model selection.
*
* Fix: settings.md must include a non-Claude runtime note instructing users to
* use "Inherit" or configure model_overrides manually, and the Inherit option
* description must explicitly call out non-Claude runtimes.
*
* Closes: #2506
*/
'use strict';
const { describe, test, before } = require('node:test');
const assert = require('node:assert/strict');
const fs = require('fs');
const path = require('path');
const SETTINGS_PATH = path.join(__dirname, '..', 'get-shit-done', 'workflows', 'settings.md');
describe('bug #2506: settings.md non-Claude runtime warning for model profiles', () => {
let content;
before(() => {
content = fs.readFileSync(SETTINGS_PATH, 'utf-8');
});
test('settings.md contains a non-Claude runtime note for model profiles', () => {
assert.ok(
content.includes('non-Claude runtime') || content.includes('non-Claude runtimes'),
'settings.md must include a note about non-Claude runtimes and model profiles'
);
});
test('non-Claude note explains profiles are no-ops without model_overrides', () => {
assert.ok(
content.includes('model_overrides') || content.includes('no effect'),
'note must explain profiles have no effect on non-Claude runtimes without model_overrides'
);
});
test('Inherit option description explicitly mentions non-Claude runtimes', () => {
// The Inherit option in AskUserQuestion must call out non-Claude runtimes
const inheritOptionMatch = content.match(/label:\s*"Inherit"[^}]*description:\s*"([^"]+)"/s);
assert.ok(inheritOptionMatch, 'Inherit option with label/description must exist in settings.md');
const desc = inheritOptionMatch[1];
assert.ok(
desc.includes('non-Claude') || desc.includes('Codex') || desc.includes('Gemini'),
`Inherit option description must mention non-Claude runtimes; got: "${desc}"`
);
});
});


describe('detect-custom-files — update workflow backup detection (#1997)', () => {
`should detect custom reference; got: ${JSON.stringify(json.custom_files)}`
);
});
// #2505 — installer does NOT wipe skills/ or command/; scanning them produces
// false-positive "custom file" reports for every skill the user has installed
// from other packages.
test('does not scan skills/ directory (installer does not wipe it)', () => {
writeManifest(tmpDir, {
'get-shit-done/workflows/execute-phase.md': '# Execute Phase\n',
});
// Simulate user having third-party skills installed — none in manifest
const skillsDir = path.join(tmpDir, 'skills');
fs.mkdirSync(skillsDir, { recursive: true });
fs.writeFileSync(path.join(skillsDir, 'my-custom-skill.md'), '# My Skill\n');
fs.writeFileSync(path.join(skillsDir, 'another-plugin-skill.md'), '# Another\n');
const result = runGsdTools(
['detect-custom-files', '--config-dir', tmpDir],
tmpDir
);
assert.ok(result.success, `Command failed: ${result.error}`);
const json = JSON.parse(result.output);
const skillFiles = json.custom_files.filter(f => f.startsWith('skills/'));
assert.strictEqual(
skillFiles.length, 0,
`skills/ should not be scanned; got false positives: ${JSON.stringify(skillFiles)}`
);
});
test('does not scan command/ directory (installer does not wipe it)', () => {
writeManifest(tmpDir, {
'get-shit-done/workflows/execute-phase.md': '# Execute Phase\n',
});
// Simulate files in command/ dir not wiped by installer
const commandDir = path.join(tmpDir, 'command');
fs.mkdirSync(commandDir, { recursive: true });
fs.writeFileSync(path.join(commandDir, 'user-command.md'), '# User Command\n');
const result = runGsdTools(
['detect-custom-files', '--config-dir', tmpDir],
tmpDir
);
assert.ok(result.success, `Command failed: ${result.error}`);
const json = JSON.parse(result.output);
const commandFiles = json.custom_files.filter(f => f.startsWith('command/'));
assert.strictEqual(
commandFiles.length, 0,
`command/ should not be scanned; got false positives: ${JSON.stringify(commandFiles)}`
);
});
});