Generate unit and E2E tests for a completed phase based on its SUMMARY.md, CONTEXT.md, and implementation. Classifies each changed file into TDD (unit), E2E (browser), or Skip categories, presents a test plan for user approval, then generates tests following RED-GREEN conventions.

Users currently hand-craft /gsd-quick prompts for test generation after each phase. This workflow standardizes the process with proper classification, quality gates, and gap reporting.

<required_reading> Read all files referenced by the invoking prompt's execution_context before starting. </required_reading>

Parse `$ARGUMENTS` for:

  • Phase number (integer, decimal, or letter-suffix) → store as `$PHASE_ARG`
  • Remaining text after the phase number → store as `$EXTRA_INSTRUCTIONS` (optional)

Example: `/gsd-add-tests 12 focus on edge cases` → `$PHASE_ARG=12`, `$EXTRA_INSTRUCTIONS="focus on edge cases"`
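
For illustration only, a TypeScript sketch of this parsing rule (the function name and regex are assumptions, not part of the workflow tooling):

    // Hypothetical sketch: split $ARGUMENTS into a phase token and trailing instructions.
    // Accepts integer ("12"), decimal ("12.1"), or letter-suffixed ("12a") phase numbers.
    function parseArguments(args: string): { phaseArg: string; extraInstructions: string } | null {
      const match = args.trim().match(/^(\d+(?:\.\d+)?[a-zA-Z]?)\s*(.*)$/);
      if (!match) return null; // no phase number: caller prints the usage error and exits
      return { phaseArg: match[1], extraInstructions: match[2] };
    }

    // parseArguments("12 focus on edge cases")
    //   → { phaseArg: "12", extraInstructions: "focus on edge cases" }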

If no phase argument provided:

ERROR: Phase number required
Usage: /gsd-add-tests <phase> [additional instructions]
Example: /gsd-add-tests 12
Example: /gsd-add-tests 12 focus on edge cases in the pricing module

Exit.

Load phase operation context:
INIT=$(node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" init phase-op "${PHASE_ARG}")
if [[ "$INIT" == @file:* ]]; then INIT=$(cat "${INIT#@file:}"); fi

Extract from init JSON: phase_dir, phase_number, phase_name.

Verify the phase directory exists. If not:

ERROR: Phase directory not found for phase ${PHASE_ARG}
Ensure the phase exists in .planning/phases/

Exit.

Read the phase artifacts (in order of priority):

  1. ${phase_dir}/*-SUMMARY.md — what was implemented, files changed
  2. ${phase_dir}/CONTEXT.md — acceptance criteria, decisions
  3. ${phase_dir}/*-VERIFICATION.md — user-verified scenarios (if UAT was done)

If no SUMMARY.md exists:

ERROR: No SUMMARY.md found for phase ${PHASE_ARG}
This command works on completed phases. Run /gsd-execute-phase first.

Exit.

Present banner:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 GSD ► ADD TESTS — Phase ${phase_number}: ${phase_name}
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Extract the list of files modified by the phase from SUMMARY.md ("Files Changed" or equivalent section).

For each file, classify into one of three categories:

| Category | Criteria | Test Type |
|----------|----------|-----------|
| TDD      | Pure functions where `expect(fn(input)).toBe(output)` is writable | Unit tests |
| E2E      | UI behavior verifiable by browser automation | Playwright/E2E tests |
| Skip     | Not meaningfully testable or already covered | None |

TDD classification — apply when:

  • Business logic: calculations, pricing, tax rules, validation
  • Data transformations: mapping, filtering, aggregation, formatting
  • Parsers: CSV, JSON, XML, custom format parsing
  • Validators: input validation, schema validation, business rules
  • State machines: status transitions, workflow steps
  • Utilities: string manipulation, date handling, number formatting

E2E classification — apply when:

  • Keyboard shortcuts: key bindings, modifier keys, chord sequences
  • Navigation: page transitions, routing, breadcrumbs, back/forward
  • Form interactions: submit, validation errors, field focus, autocomplete
  • Selection: row selection, multi-select, shift-click ranges
  • Drag and drop: reordering, moving between containers
  • Modal dialogs: open, close, confirm, cancel
  • Data grids: sorting, filtering, inline editing, column resize

Skip classification — apply when:

  • UI layout/styling: CSS classes, visual appearance, responsive breakpoints
  • Configuration: config files, environment variables, feature flags
  • Glue code: dependency injection setup, middleware registration, routing tables
  • Migrations: database migrations, schema changes
  • Simple CRUD: basic create/read/update/delete with no business logic
  • Type definitions: records, DTOs, interfaces with no logic

Read each file to verify classification. Don't classify based on filename alone.

Present the classification to the user for confirmation before proceeding:

Text mode (`workflow.text_mode: true` in config or the `--text` flag): Set TEXT_MODE=true if `--text` is present in `$ARGUMENTS` or text_mode from the init JSON is true. When TEXT_MODE is active, replace every AskUserQuestion call with a plain-text numbered list and ask the user to type the number of their choice (an illustrative rendering follows the AskUserQuestion block below). This is required on non-Claude runtimes (OpenAI Codex, Gemini CLI, etc.), where AskUserQuestion is not available.

AskUserQuestion(
  header: "Test Classification",
  question: |
    ## Files classified for testing

    ### TDD (Unit Tests) — {N} files
    {list of files with brief reason}

    ### E2E (Browser Tests) — {M} files
    {list of files with brief reason}

    ### Skip — {K} files
    {list of files with brief reason}

    {if $EXTRA_INSTRUCTIONS: "Additional instructions: ${EXTRA_INSTRUCTIONS}"}

    How would you like to proceed?
  options:
    - "Approve and generate test plan"
    - "Adjust classification (I'll specify changes)"
    - "Cancel"
)

If user selects "Adjust classification": apply their changes and re-present. If user selects "Cancel": exit gracefully.
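
When TEXT_MODE is active, the options portion of this question is presented as a plain-text numbered list instead, for example:

    How would you like to proceed?

      1. Approve and generate test plan
      2. Adjust classification (I'll specify changes)
      3. Cancel

    Type the number of your choice.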

Before generating the test plan, discover the project's existing test structure:
# Find existing test directories
find . -type d \( -name "*test*" -o -name "*spec*" -o -name "*__tests__*" \) 2>/dev/null | head -20
# Find existing test files for convention matching
find . -type f \( -name "*.test.*" -o -name "*.spec.*" -o -name "*Tests.fs" -o -name "*Test.fs" \) 2>/dev/null | head -20
# Check for test runners
ls package.json *.sln 2>/dev/null || true

Identify:

  • Test directory structure (where unit tests live, where E2E tests live)
  • Naming conventions (.test.ts, .spec.ts, *Tests.fs, etc.)
  • Test runner commands (how to execute unit tests, how to execute E2E tests)
  • Test framework (xUnit, NUnit, Jest, Playwright, etc.)
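
To help with test-runner discovery in JavaScript/TypeScript projects, a small Node/TypeScript sketch that inspects package.json scripts for likely commands (the script names checked here are heuristic assumptions):

    import { readFileSync } from "node:fs";

    // Heuristic sketch: look at package.json "scripts" for test-like entries.
    // Projects vary, so treat the result as a suggestion to confirm with the user.
    function discoverTestCommands(pkgPath = "package.json"): { unit?: string; e2e?: string } {
      const pkg = JSON.parse(readFileSync(pkgPath, "utf8"));
      const scripts: Record<string, string> = pkg.scripts ?? {};
      const unitKey = ["test", "test:unit", "unit"].find((k) => k in scripts);
      const e2eKey = ["test:e2e", "e2e", "playwright"].find((k) => k in scripts);
      return {
        unit: unitKey ? `npm run ${unitKey}` : undefined,
        e2e: e2eKey ? `npm run ${e2eKey}` : undefined,
      };
    }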

If test structure is ambiguous, ask the user:

AskUserQuestion(
  header: "Test Structure",
  question: "I found multiple test locations. Where should I create tests?",
  options: [list discovered locations]
)
For each approved file, create a detailed test plan.

For TDD files, plan tests following RED-GREEN-REFACTOR:

  1. Identify testable functions/methods in the file
  2. For each function: list input scenarios, expected outputs, edge cases
  3. Note: since code already exists, tests may pass immediately — that's OK, but verify they test the RIGHT behavior

For E2E files, plan tests following RED-GREEN gates:

  1. Identify user scenarios from CONTEXT.md/VERIFICATION.md
  2. For each scenario: describe the user action, expected outcome, assertions
  3. Note: RED gate means confirming the test would fail if the feature were broken

Present the complete test plan:

AskUserQuestion(
  header: "Test Plan",
  question: |
    ## Test Generation Plan

    ### Unit Tests ({N} tests across {M} files)
    {for each file: test file path, list of test cases}

    ### E2E Tests ({P} tests across {Q} files)
    {for each file: test file path, list of test scenarios}

    ### Test Commands
    - Unit: {discovered test command}
    - E2E: {discovered e2e command}

    Ready to generate?
  options:
    - "Generate all"
    - "Cherry-pick (I'll specify which)"
    - "Adjust plan"
)

If "Cherry-pick": ask user which tests to include. If "Adjust plan": apply changes and re-present.

For each approved TDD test:
  1. Create test file following discovered project conventions (directory, naming, imports)

  2. Write the test with a clear arrange/act/assert structure (a hedged example follows this list):

    // Arrange — set up inputs and expected outputs
    // Act — call the function under test
    // Assert — verify the output matches expectations
    
  3. Run the test:

    {discovered test command}
    
  4. Evaluate result:

    • Test passes: Good — the implementation satisfies the test. Verify the test checks meaningful behavior (not just that it compiles).
    • Test fails with assertion error: This may be a genuine bug discovered by the test. Flag it:
      ⚠️ Potential bug found: {test name}
      Expected: {expected}
      Actual: {actual}
      File: {implementation file}
      
      Do NOT fix the implementation — this is a test-generation command, not a fix command. Record the finding.
    • Test fails with error (import, syntax, etc.): This is a test error. Fix the test and re-run.
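
As an illustration only, a generated unit test in this shape might look like the following Jest-style TypeScript sketch (the `calculateLineTotal` module and its behavior are hypothetical, not taken from any real phase):

    import { calculateLineTotal } from "../src/pricing/calculateLineTotal"; // hypothetical module

    describe("calculateLineTotal", () => {
      it("applies the quantity discount above the threshold", () => {
        // Arrange — set up inputs and expected outputs
        const input = { unitPrice: 10, quantity: 12, discountThreshold: 10, discountRate: 0.1 };
        const expected = 108; // 12 * 10, minus the 10% discount

        // Act — call the function under test
        const actual = calculateLineTotal(input);

        // Assert — verify the output matches expectations
        expect(actual).toBe(expected);
      });
    });
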
For each approved E2E test:
  1. Check for existing tests covering the same scenario:

    grep -r "{scenario keyword}" {e2e test directory} 2>/dev/null || true
    

    If found, extend rather than duplicate.

  2. Create a test file targeting the user scenario from CONTEXT.md/VERIFICATION.md (see the sketch below)

  3. Run the E2E test:

    {discovered e2e command}
    
  4. Evaluate result:

    • GREEN (passes): Record success
    • RED (fails): Determine if it's a test issue or a genuine application bug. Flag bugs:
      ⚠️ E2E failure: {test name}
      Scenario: {description}
      Error: {error message}
      
    • Cannot run: Report blocker. Do NOT mark as complete.
      🛑 E2E blocker: {reason tests cannot run}
      

No-skip rule: If E2E tests cannot execute (missing dependencies, environment issues), report the blocker and mark the test as incomplete. Never mark success without actually running the test.
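
For illustration, an E2E test targeting such a scenario might look like this Playwright-style TypeScript sketch (the route, field label, and message text are hypothetical):

    import { test, expect } from "@playwright/test";

    // Hypothetical scenario: submitting the pricing form with an invalid quantity
    // shows a validation error message.
    test("shows a validation error for an invalid quantity", async ({ page }) => {
      await page.goto("/pricing");                  // hypothetical route
      await page.getByLabel("Quantity").fill("-1"); // hypothetical field label
      await page.getByRole("button", { name: "Submit" }).click();

      await expect(page.getByText("Quantity must be at least 1")).toBeVisible();
    });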

Create a test coverage report and present to user:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 GSD ► TEST GENERATION COMPLETE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

## Results

| Category | Generated | Passing | Failing | Blocked |
|----------|-----------|---------|---------|---------|
| Unit     | {N}       | {n1}    | {n2}    | {n3}    |
| E2E      | {M}       | {m1}    | {m2}    | {m3}    |

## Files Created/Modified
{list of test files with paths}

## Coverage Gaps
{areas that couldn't be tested and why}

## Bugs Discovered
{any assertion failures that indicate implementation bugs}

Record test generation in project state:

node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" state-snapshot

If there are passing tests to commit:

git add {test files}
git commit -m "test(phase-${phase_number}): add unit and E2E tests from add-tests command"

Present next steps:

---

## ▶ Next Up

{if bugs discovered:}
**Fix discovered bugs:** `/gsd-quick fix the {N} test failures discovered in phase ${phase_number}`

{if blocked tests:}
**Resolve test blockers:** {description of what's needed}

{otherwise:}
**All tests passing!** Phase ${phase_number} is fully tested.

---

**Also available:**
- `/gsd-add-tests {next_phase}` — test another phase
- `/gsd-verify-work {phase_number}` — run UAT verification

---

<success_criteria>

  • Phase artifacts loaded (SUMMARY.md, CONTEXT.md, optionally VERIFICATION.md)
  • All changed files classified into TDD/E2E/Skip categories
  • Classification presented to user and approved
  • Project test structure discovered (directories, conventions, runners)
  • Test plan presented to user and approved
  • TDD tests generated with arrange/act/assert structure
  • E2E tests generated targeting user scenarios
  • All tests executed — no untested tests marked as passing
  • Bugs discovered by tests flagged (not fixed)
  • Test files committed with proper message
  • Coverage gaps documented
  • Next steps presented to user

</success_criteria>