get-shit-done/tests/plan-review-convergence.test.cjs
Logan fbf30792f3 docs: authoritative shipped-surface inventory with filesystem-backed parity tests (#2390)
* docs: finish trust-bug fixes in user guide and commands

Correct load-bearing defects in the v1.36.0 docs corpus so readers stop
acting on wrong defaults and stale exhaustiveness claims.

- README.md: drop "Complete feature"/"Every command"/"All 18 agents"
  exhaustiveness claims; replace version-pinned "What's new in v1.32"
  bullet with a CHANGELOG pointer.
- CONFIGURATION.md: fix `claude_md_path` default (null/none -> `./CLAUDE.md`)
  in both Full Schema and core settings table; correct `workflow.tdd_mode`
  provenance from "Added in v1.37" to "Added in v1.36".
- USER-GUIDE.md: fix `workflow.discuss_mode` default (`standard` ->
  `discuss`) in the workflow-toggles table AND in the abbreviated Full
  Schema JSON block above it; align the Options cell with the shipped
  enum.
- COMMANDS.md: drop "Complete command syntax" subtitle overclaim to
  match the README posture.
- AGENTS.md: weaken "All 21 specialized agents" header to reflect that
  the `agents/` filesystem is authoritative (shipped roster is 31).

Part 1 of a stacked docs refresh series (PR 1/4).

* docs: refresh shipped surface coverage for v1.36

Close the v1.36.0 shipped-surface gaps in the docs corpus.

- COMMANDS.md: add /gsd-graphify section (build/query/status/diff) and
  its config gate; expand /gsd-quick with --validate flag and list/
  status/resume subcommands; expand /gsd-thread with list --open, list
  --resolved, close <slug>, status <slug>.
- CLI-TOOLS.md: replace the hardcoded "15 domain modules" count with a
  pointer to the Module Architecture table; add a graphify verb-family
  section (build/query/status/diff/snapshot); add Graphify and Learnings
  rows to the Module Architecture table.
- FEATURES.md: add TOC entries for #116 TDD Pipeline Mode and #117
  Knowledge Graph Integration; add the #117 body with REQ-GRAPH-01..05.
- CONFIGURATION.md: move security_enforcement / security_asvs_level /
  security_block_on from root into `workflow.*` in Full Schema to match
  templates/config.json and the gsd-sdk runtime reads; update Security
  Settings table to use the workflow.* prefix; add planning.sub_repos
  to Full Schema and description table; add a Graphify Settings section
  documenting graphify.enabled and graphify.build_timeout.

Note: VALID_CONFIG_KEYS in bin/lib/config.cjs does not yet include
workflow.security_* or planning.sub_repos, so config-set currently
rejects them. That is a pre-existing validator gap that this PR does
not attempt to fix; the docs now correctly describe where these keys
live per the shipped template and runtime reads.
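
The validator gap noted above can be illustrated with a minimal sketch. The allow-list contents and the `isValidConfigKey` helper are hypothetical stand-ins, not the actual code in bin/lib/config.cjs; only the four rejected key names come from this commit message.

```javascript
// Hypothetical stand-in for the allow-list in bin/lib/config.cjs.
// The real VALID_CONFIG_KEYS is larger; these entries are illustrative only.
const VALID_CONFIG_KEYS = new Set([
  'workflow.discuss_mode',
  'workflow.tdd_mode',
  'graphify.enabled',
]);

// Hypothetical helper: accept a dotted key only if the allow-list contains it.
function isValidConfigKey(key) {
  return VALID_CONFIG_KEYS.has(key);
}

// Keys the docs now describe, per the note above.
const documentedKeys = [
  'workflow.security_enforcement',
  'workflow.security_asvs_level',
  'workflow.security_block_on',
  'planning.sub_repos',
];

// Every documented key that the allow-list does not know about is rejected.
const rejected = documentedKeys.filter((key) => !isValidConfigKey(key));
console.log(rejected.length); // 4
```

Under these assumptions, a config-set call for any of the four keys would fail until the key is added to the allow-list.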

Part 2 of a stacked docs refresh series (PR 2/4), based on PR 1.

* docs: make inventory authoritative and reconcile architecture

Upgrade docs/INVENTORY.md from "complete for agents, selective for others"
to authoritative across all six shipped-surface families, and reconcile
docs/ARCHITECTURE.md against the new inventory so the PR that introduces
INVENTORY does not also introduce an INVENTORY/ARCHITECTURE contradiction.

- docs/AGENTS.md: weaken "21 specialized agents" header to 21 primary +
  10 advanced (31 shipped); add new "Advanced and Specialized Agents"
  section with concise role cards for the 10 previously-omitted shipped
  agents (pattern-mapper, debug-session-manager, code-reviewer,
  code-fixer, ai-researcher, domain-researcher, eval-planner,
  eval-auditor, framework-selector, intel-updater); footnote the Agent
  Tool Permissions Summary as primary-agents-only so it no longer
  misleads.

- docs/INVENTORY.md (rewritten to be authoritative):
  * Full 31-agent roster with one-line role + spawner + primary-doc
    status per agent (unchanged from prior partial work).
  * Commands: full 75-row enumeration grouped by Core Workflow, Phase &
    Milestone Management, Session & Navigation, Codebase Intelligence,
    Review/Debug/Recovery, and Docs/Profile/Utilities — each row
    carries a one-line role derived from the command's frontmatter and
    a link to the source file.
  * Workflows: full 72-row enumeration covering every
    get-shit-done/workflows/*.md, with a one-line role per workflow and
    a column naming the user-facing command (or internal orchestrator)
    that invokes it.
  * References: full 41-row enumeration grouped by Core, Workflow,
    Thinking-Model clusters, and the Modular Planner decomposition,
    matching the groupings docs/ARCHITECTURE.md already uses; notes
    the few-shot-examples subdirectory separately.
  * CLI Modules and Hooks: unchanged — already full rosters.
  * Maintenance section rewritten to describe the drift-guard test
    suite that will land in PR4 (inventory-counts, commands-doc-parity,
    agents-doc-parity, cli-modules-doc-parity, hooks-doc-parity).

- docs/ARCHITECTURE.md reconciled against INVENTORY:
  * References block: drop the stale "(35 total)" count; point at
    INVENTORY.md#references-41-shipped for the authoritative count.
  * CLI Tools block: drop the stale "19 domain modules" count; point
    at INVENTORY.md#cli-modules-24-shipped for the authoritative roster.
  * Agent Spawn Categories: relabel as "Primary Agent Spawn Categories"
    and add a footer naming the 10 advanced agents and pointing at
    INVENTORY.md#agents-31-shipped for the full 31-agent roster.

- docs/CONFIGURATION.md: preserve the six model-profile rows added in
  the prior partial work, and tighten the fallback note so it names the
  13 shipped agents without an explicit profile row, documents
  model_overrides as the escape hatch, and points at INVENTORY.md for
  the authoritative 31-agent roster.

Part 3 of a stacked docs refresh series (PR 3/4). Remaining consistency
work (USER-GUIDE config-section delete-and-link, FEATURES.md TOC
reorder, ARCHITECTURE.md Hook-table expansion + installation-layout
collapse, CLI-TOOLS.md module-row additions, workflow-discuss-mode
invocation normalization, and the five doc-parity tests) lands in PR4.

* test(docs): add consistency guards and remove duplicate refs

Consolidates USER-GUIDE.md's command/config duplicates into pointers to
COMMANDS.md and CONFIGURATION.md (kills a ghost `resolve_model_ids` key
and a stale `discuss_mode: standard` default); reorders FEATURES.md TOC
chronologically so v1.32 precedes v1.34/1.35/1.36; expands
ARCHITECTURE.md's Hook table to the 11 shipped hooks
(gsd-read-injection-scanner, gsd-check-update-worker) and collapses
the installation-layout hook enumeration to the *.js/*.sh pattern form;
adds audit/gsd2-import/intel rows and state signal-*, audit-open,
from-gsd2 verbs to CLI-TOOLS.md; normalizes workflow-discuss-mode.md
invocations to `node gsd-tools.cjs config-set`.

Adds five drift guards anchored on docs/INVENTORY.md as the
authoritative roster: inventory-counts (all six families),
commands/agents/cli-modules/hooks parity checks that every shipped
surface has a row somewhere.
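
A parity guard of the kind described above might look roughly like the following sketch. The helper names, the backtick-matching rule, and the sample inventory rows are assumptions for illustration; the shipped tests may locate rows differently.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// List shipped surfaces in a directory by basename.
// Assumption: one .md file per shipped surface.
function listShipped(dir) {
  return fs
    .readdirSync(dir)
    .filter((name) => name.endsWith('.md'))
    .map((name) => path.basename(name, '.md'))
    .sort();
}

// Return the shipped names that have no row in the inventory text.
// Assumption: a row counts as present if the name appears in backticks.
function findUndocumented(shippedNames, inventoryText) {
  return shippedNames.filter((name) => !inventoryText.includes('`' + name + '`'));
}

// Illustrative rows only; the real roster lives in docs/INVENTORY.md.
const inventory = [
  '| `pattern-mapper` | illustrative role |',
  '| `code-reviewer` | illustrative role |',
].join('\n');

const missing = findUndocumented(
  ['pattern-mapper', 'code-reviewer', 'gsd-doc-classifier'],
  inventory
);
console.log(missing); // [ 'gsd-doc-classifier' ]
```

In a real guard, `listShipped` would supply the names from the filesystem and the test would assert that the returned list is empty.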

* fix(convergence): thread --ws to review agent; add stall and max-cycles behavioral tests

- Thread GSD_WS through to review agent spawn in plan-review-convergence
  workflow (step 5a) so --ws scoping is symmetric with planning step
- Add behavioral stall detection test: asserts workflow compares
  HIGH_COUNT >= prev_high_count and emits a stall warning
- Add behavioral --max-cycles 1 test: asserts workflow reaches escalation
  gate when cycle >= MAX_CYCLES with HIGH > 0 after a single cycle
- Include original PR files (commands, workflow, tests) as the branch
  predated the PR commits
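
The loop control that these tests target can be sketched as a small simulation. This is a hypothetical helper, not the workflow itself (which is a markdown prompt); the ordering of the stall check and the escalation check is an assumption, and the default of 3 cycles is taken from the test file.

```javascript
// Simulates the convergence loop for a given sequence of per-cycle HIGH counts.
// Hypothetical helper for illustration; names do not come from the workflow.
function simulateConvergence(highCountsPerCycle, maxCycles = 3) {
  const warnings = [];
  let prevHighCount = null;

  for (let cycle = 1; cycle <= highCountsPerCycle.length; cycle += 1) {
    const highCount = highCountsPerCycle[cycle - 1];

    // Converged: no HIGH concerns remain.
    if (highCount === 0) {
      return { outcome: 'converged', cycle, warnings };
    }
    // Stall: HIGH count did not decrease since the previous cycle.
    if (prevHighCount !== null && highCount >= prevHighCount) {
      warnings.push(`cycle ${cycle}: HIGH count not decreasing (${prevHighCount} -> ${highCount})`);
    }
    // Escalation gate: cycle budget exhausted with HIGH concerns remaining.
    if (cycle >= maxCycles) {
      return { outcome: 'escalate', cycle, warnings };
    }
    prevHighCount = highCount;
  }
  // Ran out of simulated input before converging or escalating.
  return { outcome: 'incomplete', cycle: highCountsPerCycle.length, warnings };
}

console.log(simulateConvergence([2], 1).outcome);            // escalate
console.log(simulateConvergence([3, 3, 0]).warnings.length); // 1
console.log(simulateConvergence([3, 1, 0]).outcome);         // converged
```

The first call mirrors the --max-cycles 1 case: the gate fires on cycle 1 itself because `cycle >= maxCycles` already holds.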

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(docs,config): PR #2390 review — security_* config keys and REQ-GRAPH-02 scope

Addresses trek-e's review items that don't require rebase:

- config.cjs: add workflow.security_enforcement, workflow.security_asvs_level,
  workflow.security_block_on to VALID_CONFIG_KEYS so gsd-sdk config-set accepts
  them (closed the gap where docs/CONFIGURATION.md listed keys the validator
  rejected).
- core.cjs: add matching CONFIG_DEFAULTS entries (true / 1 / 'high') so the
  canonical defaults table matches the documented values.
- config.cjs: wire the three keys into the new-project workflow defaults so
  fresh configs inherit them.
- planning-config.md: document the three keys in the Workflow Fields table,
  keeping the CONFIG_DEFAULTS ↔ doc parity test happy.
- config-field-docs.test.cjs: extend NAMESPACE_MAP so the flat keys in
  CONFIG_DEFAULTS resolve to their workflow.* doc rows.
- FEATURES.md REQ-GRAPH-02: split the slash-command surface (build|query|
  status|diff) from the CLI surface which additionally exposes `snapshot`
  (invoked automatically at the tail of `graphify build`). The prior text
  overstated the slash-command surface.
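
The relationship between flat CONFIG_DEFAULTS keys and their workflow.* doc rows can be sketched as follows. The object shapes, the `docKeyFor` and `undocumentedDefaults` helpers, and the sample doc rows are assumptions; the pairing of the values true / 1 / 'high' with the three keys is inferred from the order in which they are listed above.

```javascript
// Flat defaults as described above (value-to-key pairing inferred from listing order).
const CONFIG_DEFAULTS = {
  security_enforcement: true,
  security_asvs_level: 1,
  security_block_on: 'high',
};

// Hypothetical shape for NAMESPACE_MAP: flat key -> documented dotted key.
const NAMESPACE_MAP = {
  security_enforcement: 'workflow.security_enforcement',
  security_asvs_level: 'workflow.security_asvs_level',
  security_block_on: 'workflow.security_block_on',
};

// Resolve a flat key to the key the docs are expected to use.
function docKeyFor(flatKey) {
  return NAMESPACE_MAP[flatKey] ?? flatKey;
}

// Return default keys whose resolved doc key has no backticked row in the doc text.
function undocumentedDefaults(defaults, docText) {
  return Object.keys(defaults).filter(
    (key) => !docText.includes('`' + docKeyFor(key) + '`')
  );
}

// Illustrative table rows only; not the actual planning-config.md content.
const docTable = [
  '| `workflow.security_enforcement` | `true` |',
  '| `workflow.security_asvs_level` | `1` |',
  '| `workflow.security_block_on` | `high` |',
].join('\n');

console.log(undocumentedDefaults(CONFIG_DEFAULTS, docTable)); // []
```

Without the map entries, the flat names would be looked up as-is and all three keys would be reported as undocumented.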

* docs(inventory): refresh rosters and counts for post-rebase drift

origin/main accumulated surfaces since this PR was authored:

- Agents: 31 → 33 (+ gsd-doc-classifier, gsd-doc-synthesizer)
- Commands: 76 → 82 (+ ingest-docs, ultraplan-phase, spike, spike-wrap-up,
  sketch, sketch-wrap-up)
- Workflows: 73 → 79 (same 6 names)
- References: 41 → 49 (+ debugger-philosophy, doc-conflict-engine,
  mandatory-initial-read, project-skills-discovery, sketch-interactivity,
  sketch-theme-system, sketch-tooling, sketch-variant-patterns)

Adds rows in the existing sub-groupings, introduces a Sketch References
subsection, and bumps all four headline counts. Roles are pulled from
source frontmatter / purpose blocks for each file. All 5 parity tests
(inventory-counts, agents-doc-parity, commands-doc-parity,
cli-modules-doc-parity, hooks-doc-parity) pass against this state —
156 assertions, 0 failures.

Also updates the 'Coverage note' advanced-agent count 10 → 12 and the
few-shot-examples footnote "41 top-level references" → "49" to keep the
file internally consistent.

* docs(agents): add advanced stubs for gsd-doc-classifier and gsd-doc-synthesizer

Both agents ship on main (spawned by /gsd-ingest-docs) but had no
coverage in docs/AGENTS.md. Adds the "advanced stub" entries (Role,
property table, Key behaviors) following the template used by the other
10 advanced/specialized agents in the same section.

Also updates the Agent Tool Permissions Summary scope note from
"10 advanced/specialized agents" to 12 to reflect the two new stubs.

* docs(commands): add entries for ingest-docs, ultraplan-phase, plan-review-convergence

These three commands ship on main (plan-review-convergence via trek-e's
4b452d29 commit on this branch) but had no user-facing section in
docs/COMMANDS.md — they lived only in INVENTORY.md. The commands-doc-parity
test already passes via INVENTORY, but the user-facing doc was missing
canonical explanations, argument tables, and examples.

- /gsd-plan-review-convergence → Core Workflow (after /gsd-plan-phase)
- /gsd-ultraplan-phase → Core Workflow (after plan-review-convergence)
- /gsd-ingest-docs → Brownfield (after /gsd-import, since both consume
  the references/doc-conflict-engine.md contract)

Content pulled from each command's frontmatter and workflow purpose block.

* test: remove redundant ARCHITECTURE.md count tests

tests/architecture-counts.test.cjs and tests/command-count-sync.test.cjs
were added when docs/ARCHITECTURE.md carried hardcoded counts for commands/
workflows/agents. With the PR #2390 cleanup, ARCHITECTURE.md no longer
owns those numbers — docs/INVENTORY.md does, enforced by
tests/inventory-counts.test.cjs (scans the same filesystem directories
with the same readdirSync filter).

Keeping these ARCHITECTURE-specific tests would re-introduce the hardcoded
counts they guard, defeating trek-e's review point. The single-source-of-
truth parity tests already catch the same drift scenarios.
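
A count guard anchored on the inventory rather than on ARCHITECTURE.md could be sketched like this. The heading format `## Agents (N shipped)` is an assumption inferred from anchors such as INVENTORY.md#agents-31-shipped, and `headlineCount` is a hypothetical helper that expects a plain-word family name.

```javascript
// Extract the headline count for a family from inventory text.
// Assumption: headings look like "## Agents (31 shipped)".
function headlineCount(inventoryText, family) {
  const pattern = new RegExp('^#+\\s*' + family + '\\s*\\((\\d+) shipped\\)', 'mi');
  const match = inventoryText.match(pattern);
  return match ? Number(match[1]) : null;
}

// Illustrative inventory text and file listing; counts are made up for the example.
const sampleInventory = '# Inventory\n\n## Agents (3 shipped)\n\n## Hooks (2 shipped)\n';
const shippedAgents = ['agent-a.md', 'agent-b.md', 'agent-c.md'];

// The guard passes only when the documented count equals the number of shipped files.
const documented = headlineCount(sampleInventory, 'Agents');
console.log(documented === shippedAgents.length); // true
```

In a real test the file list would come from readdirSync over the shipped directory, so the number lives in exactly one document.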

Related: #2257 (the regression this replaced).

---------

Co-authored-by: Tom Boucher <trekkie@nomorestars.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 09:31:34 -04:00

316 lines
13 KiB
JavaScript

/**
 * Tests for gsd:plan-review-convergence command (#2306)
 *
 * Validates that the command source and workflow contain the key structural
 * elements required for correct cross-AI plan convergence loop behavior:
 * initial planning gate, review agent spawning, HIGH count detection,
 * stall detection, escalation gate, and STATE.md update on convergence.
 */

const { test, describe } = require('node:test');
const assert = require('node:assert/strict');
const fs = require('fs');
const path = require('path');

const COMMAND_PATH = path.join(__dirname, '..', 'commands', 'gsd', 'plan-review-convergence.md');
const WORKFLOW_PATH = path.join(__dirname, '..', 'get-shit-done', 'workflows', 'plan-review-convergence.md');

// ─── Command source ────────────────────────────────────────────────────────

describe('plan-review-convergence command source (#2306)', () => {
  const command = fs.readFileSync(COMMAND_PATH, 'utf8');

  test('command name uses gsd: prefix (installer converts to gsd- on install)', () => {
    assert.ok(
      command.includes('name: gsd:plan-review-convergence'),
      'command name must use gsd: prefix so installer converts it to gsd-plan-review-convergence'
    );
  });

  test('command declares all reviewer flags in context', () => {
    assert.ok(command.includes('--codex'), 'must document --codex flag');
    assert.ok(command.includes('--gemini'), 'must document --gemini flag');
    assert.ok(command.includes('--claude'), 'must document --claude flag');
    assert.ok(command.includes('--opencode'), 'must document --opencode flag');
    assert.ok(command.includes('--all'), 'must document --all flag');
    assert.ok(command.includes('--max-cycles'), 'must document --max-cycles flag');
  });

  test('command references the workflow file via execution_context', () => {
    assert.ok(
      command.includes('@$HOME/.claude/get-shit-done/workflows/plan-review-convergence.md'),
      'execution_context must reference the workflow file'
    );
  });

  test('command references supporting reference files', () => {
    assert.ok(
      command.includes('revision-loop.md'),
      'must reference revision-loop.md for stall detection pattern'
    );
    assert.ok(
      command.includes('gates.md'),
      'must reference gates.md for gate taxonomy'
    );
    assert.ok(
      command.includes('agent-contracts.md'),
      'must reference agent-contracts.md for completion markers'
    );
  });

  test('command declares Agent in allowed-tools (required for spawning sub-agents)', () => {
    assert.ok(
      command.includes('- Agent'),
      'Agent must be in allowed-tools — command spawns isolated agents for planning and reviewing'
    );
  });

  test('command has Copilot runtime_note for AskUserQuestion fallback', () => {
    assert.ok(
      command.includes('vscode_askquestions'),
      'must document vscode_askquestions fallback for Copilot compatibility'
    );
  });

  test('--codex is the default reviewer when no flag is specified', () => {
    // The parenthesized form '(default if no reviewer specified)' is already
    // covered by the first substring check, so it needs no separate clause.
    assert.ok(
      command.includes('default if no reviewer specified') ||
        command.includes('default: --codex'),
      '--codex must be documented as the default reviewer'
    );
  });
});

// ─── Workflow: initialization ──────────────────────────────────────────────

describe('plan-review-convergence workflow: initialization (#2306)', () => {
  const workflow = fs.readFileSync(WORKFLOW_PATH, 'utf8');

  test('workflow calls gsd-tools.cjs init plan-phase for initialization', () => {
    assert.ok(
      workflow.includes('gsd-tools.cjs') && workflow.includes('init') && workflow.includes('plan-phase'),
      'workflow must initialize via gsd-tools.cjs init plan-phase'
    );
  });

  test('workflow parses --max-cycles with default of 3', () => {
    // Loose check: any occurrence of '3' in the workflow satisfies the second clause.
    assert.ok(
      workflow.includes('MAX_CYCLES') && workflow.includes('3'),
      'workflow must parse --max-cycles with default of 3'
    );
  });

  test('workflow displays a startup banner with phase number and reviewer flags', () => {
    assert.ok(
      workflow.includes('PLAN CONVERGENCE') || workflow.includes('Plan Convergence'),
      'workflow must display a startup banner'
    );
  });
});

// ─── Workflow: initial planning gate ──────────────────────────────────────

describe('plan-review-convergence workflow: initial planning gate (#2306)', () => {
  const workflow = fs.readFileSync(WORKFLOW_PATH, 'utf8');

  test('workflow skips initial planning when plans already exist', () => {
    assert.ok(
      workflow.includes('has_plans') || workflow.includes('plan_count'),
      'workflow must check whether plans already exist before spawning planning agent'
    );
  });

  test('workflow spawns isolated planning agent when no plans exist', () => {
    assert.ok(
      workflow.includes('gsd-plan-phase'),
      'workflow must spawn Agent → gsd-plan-phase when no plans exist'
    );
  });

  test('workflow errors if initial planning produces no PLAN.md files', () => {
    assert.ok(
      workflow.includes('PLAN_COUNT') || workflow.includes('plan_count'),
      'workflow must verify PLAN.md files were created after initial planning'
    );
  });
});

// ─── Workflow: convergence loop ────────────────────────────────────────────

describe('plan-review-convergence workflow: convergence loop (#2306)', () => {
  const workflow = fs.readFileSync(WORKFLOW_PATH, 'utf8');

  test('workflow spawns isolated review agent each cycle', () => {
    assert.ok(
      workflow.includes('gsd-review'),
      'workflow must spawn Agent → gsd-review each cycle'
    );
  });

  test('workflow detects HIGH concerns by grepping REVIEWS.md', () => {
    assert.ok(
      workflow.includes('HIGH_COUNT') || workflow.includes('grep'),
      'workflow must grep REVIEWS.md for HIGH concerns to determine convergence'
    );
  });

  test('workflow exits loop when HIGH_COUNT == 0 (converged)', () => {
    assert.ok(
      workflow.includes('HIGH_COUNT == 0') ||
        workflow.includes('HIGH_COUNT === 0') ||
        workflow.includes('converged'),
      'workflow must exit the loop when no HIGH concerns remain'
    );
  });

  test('workflow updates STATE.md on convergence', () => {
    assert.ok(
      workflow.includes('planned-phase') || workflow.includes('state'),
      'workflow must update STATE.md via gsd-tools.cjs when converged'
    );
  });

  test('workflow spawns replan agent with --reviews flag', () => {
    assert.ok(
      workflow.includes('--reviews'),
      'replan agent must pass --reviews so gsd-plan-phase incorporates review feedback'
    );
  });

  test('workflow passes --skip-research to replan agent (research already done)', () => {
    assert.ok(
      workflow.includes('--skip-research'),
      'replan agent must skip research — only initial planning needs research'
    );
  });
});

// ─── Workflow: stall detection ─────────────────────────────────────────────

describe('plan-review-convergence workflow: stall detection (#2306)', () => {
  const workflow = fs.readFileSync(WORKFLOW_PATH, 'utf8');

  test('workflow tracks previous HIGH count to detect stalls', () => {
    assert.ok(
      workflow.includes('prev_high_count') || workflow.includes('prev_HIGH'),
      'workflow must track the previous cycle HIGH count for stall detection'
    );
  });

  test('workflow warns when HIGH count is not decreasing', () => {
    assert.ok(
      workflow.includes('stall') || workflow.includes('Stall') || workflow.includes('not decreasing'),
      'workflow must warn user when HIGH count is not decreasing between cycles'
    );
  });
});

// ─── Workflow: escalation gate ────────────────────────────────────────────

describe('plan-review-convergence workflow: escalation gate (#2306)', () => {
  const workflow = fs.readFileSync(WORKFLOW_PATH, 'utf8');

  test('workflow escalates to user when max cycles reached with HIGHs remaining', () => {
    assert.ok(
      workflow.includes('MAX_CYCLES') &&
        (workflow.includes('AskUserQuestion') || workflow.includes('vscode_askquestions')),
      'workflow must escalate to user via AskUserQuestion when max cycles reached'
    );
  });

  test('escalation offers "Proceed anyway" option', () => {
    assert.ok(
      workflow.includes('Proceed anyway'),
      'escalation gate must offer "Proceed anyway" to accept plans with remaining HIGH concerns'
    );
  });

  test('escalation offers "Manual review" option', () => {
    assert.ok(
      workflow.includes('Manual review') || workflow.includes('manual'),
      'escalation gate must offer a manual review option'
    );
  });

  test('workflow has text-mode fallback for escalation (plain numbered list)', () => {
    assert.ok(
      workflow.includes('TEXT_MODE') || workflow.includes('text_mode'),
      'workflow must support TEXT_MODE for plain-text escalation prompt'
    );
  });
});

// ─── Workflow: stall detection — behavioral ───────────────────────────────

describe('plan-review-convergence workflow: stall detection behavioral (#2306)', () => {
  const workflow = fs.readFileSync(WORKFLOW_PATH, 'utf8');

  test('workflow surfaces stall warning when prev_high_count equals current HIGH_COUNT', () => {
    // Behavioral intent: two consecutive cycles with the same HIGH count must
    // trigger the stall warning. This is checked structurally: the workflow text
    // must contain the HIGH_COUNT >= prev_high_count comparison and a warning
    // string that would appear in output.
    assert.ok(
      workflow.includes('prev_high_count') || workflow.includes('prev_HIGH'),
      'workflow must track prev_high_count across cycles'
    );

    // The comparison that detects the stall
    assert.ok(
      workflow.includes('HIGH_COUNT >= prev_high_count') ||
        workflow.includes('HIGH_COUNT >= prev_HIGH') ||
        workflow.includes('not decreasing'),
      'workflow must compare current HIGH count against previous to detect stall'
    );

    // The stall warning text that appears in output
    assert.ok(
      workflow.includes('stall') || workflow.includes('Stall') || workflow.includes('not decreasing'),
      'workflow must emit a stall warning when HIGH count is not decreasing'
    );
  });
});

// ─── Workflow: --max-cycles 1 immediate escalation — behavioral ────────────

describe('plan-review-convergence workflow: --max-cycles 1 immediate escalation behavioral (#2306)', () => {
  const workflow = fs.readFileSync(WORKFLOW_PATH, 'utf8');

  test('workflow escalates immediately after cycle 1 when --max-cycles 1 and HIGH > 0', () => {
    // Behavioral intent: with max_cycles=1, if HIGH_COUNT > 0 after the first
    // review cycle, the workflow must trigger the escalation gate (the
    // cycle >= MAX_CYCLES check fires on cycle 1 itself). This is checked
    // structurally: the workflow text must contain the logic for this edge case.
    assert.ok(
      workflow.includes('cycle >= MAX_CYCLES') ||
        workflow.includes('cycle >= max_cycles') ||
        (workflow.includes('MAX_CYCLES') && workflow.includes('AskUserQuestion')),
      'workflow must check cycle >= MAX_CYCLES so --max-cycles 1 triggers escalation after first cycle'
    );

    // The escalation gate must cover the HIGH > 0 case after a single cycle
    assert.ok(
      workflow.includes('HIGH_COUNT > 0') ||
        workflow.includes('HIGH concerns remain') ||
        workflow.includes('Proceed anyway'),
      'escalation gate must be reachable when HIGH_COUNT > 0 after a single cycle'
    );
  });
});

// ─── Workflow: REVIEWS.md verification ────────────────────────────────────

describe('plan-review-convergence workflow: artifact verification (#2306)', () => {
  const workflow = fs.readFileSync(WORKFLOW_PATH, 'utf8');

  test('workflow verifies REVIEWS.md exists after each review cycle', () => {
    assert.ok(
      workflow.includes('REVIEWS.md') || workflow.includes('REVIEWS_FILE'),
      'workflow must verify REVIEWS.md was produced by the review agent each cycle'
    );
  });

  test('workflow errors if review agent does not produce REVIEWS.md', () => {
    assert.ok(
      workflow.includes('REVIEWS_FILE') || workflow.includes('review agent did not produce'),
      'workflow must error if the review agent fails to produce REVIEWS.md'
    );
  });
});