feat(workflow): add opt-in TDD pipeline mode (#2119)

* feat(workflow): add opt-in TDD pipeline mode (workflow.tdd_mode)

Add workflow.tdd_mode config key (default: false) that enables
red-green-refactor as a first-class phase execution mode. When
enabled, the planner aggressively applies type: tdd to eligible
tasks and the executor enforces RED/GREEN/REFACTOR gate sequence
with fail-fast on unexpected GREEN before RED. An end-of-phase
collaborative review checkpoint verifies gate compliance.

Closes #1871

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): allowlist plan-phase.md in prompt injection scan

plan-phase.md exceeds 50K chars after TDD mode integration.
This is legitimate orchestration complexity, not prompt stuffing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ci: trigger CI run

* ci: trigger CI run

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Tom Boucher
2026-04-11 14:42:01 -04:00
committed by GitHub
parent d19b61a158
commit e24cb18b72
11 changed files with 299 additions and 2 deletions


@@ -344,7 +344,20 @@ When executing task with `tdd="true"`:
**4. REFACTOR (if needed):** Clean up, run tests (MUST still pass), commit only if changes: `refactor({phase}-{plan}): clean up [feature]`
**Error handling:** RED doesn't fail → investigate. GREEN doesn't pass → debug/iterate. REFACTOR breaks → undo.
## Plan-Level TDD Gate Enforcement (type: tdd plans)
When the plan frontmatter has `type: tdd`, the entire plan follows the RED/GREEN/REFACTOR cycle as a single feature, and the gate sequence is mandatory.
**Fail-fast rule:** If a test passes unexpectedly during the RED phase (before any implementation), STOP. The feature may already exist or the test is not testing what you think. Investigate and fix the test before proceeding to GREEN. Do NOT skip RED by proceeding with a passing test.
**Gate sequence validation:** After completing the plan, verify in git log:
1. A `test(...)` commit exists (RED gate)
2. A `feat(...)` commit exists after it (GREEN gate)
3. Optionally a `refactor(...)` commit exists after GREEN (REFACTOR gate)
If RED or GREEN gate commits are missing, add a warning to SUMMARY.md under a `## TDD Gate Compliance` section.
</tdd_execution>
<task_commit_protocol>


@@ -290,6 +290,10 @@ This prevents the "scavenger hunt" anti-pattern where executors explore the code
## TDD Detection
**When `workflow.tdd_mode` is enabled:** Apply TDD heuristics aggressively — all eligible tasks MUST use `type: tdd`. Read @~/.claude/get-shit-done/references/tdd.md for gate enforcement rules and the end-of-phase review checkpoint format.
**When `workflow.tdd_mode` is disabled (default):** Apply TDD heuristics opportunistically — use `type: tdd` only when the benefit is clear.
**Heuristic:** Can you write `expect(fn(input)).toBe(output)` before writing `fn`?
- Yes → Create a dedicated TDD plan (type: tdd)
- No → Standard task in standard plan
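
As an illustration of the heuristic, the sketch below fixes the input/output pairs before the function exists, which is what makes a task eligible for a dedicated TDD plan. The `slugify` function, the `cases` table, and the `runCases` helper are hypothetical names chosen for this example; they are not part of GSD.

```javascript
// Hypothetical example: the expected outputs are known before `slugify` is written.
const cases = [
  ['Hello World', 'hello-world'],
  ['  TDD  Mode ', 'tdd-mode'],
];

// RED: this check is written first. It returns the number of failing cases,
// counting a thrown error (e.g. "not implemented") as a failure.
function runCases(fn) {
  return cases.filter(([input, expected]) => {
    try {
      return fn(input) !== expected;
    } catch {
      return true;
    }
  }).length;
}

// GREEN: the minimal implementation is added only after the cases fail.
function slugify(text) {
  return text.trim().toLowerCase().split(/\s+/).join('-');
}
```

If the cases cannot be written down before the implementation (for example, layout or glue code with no crisp output), the task stays a standard task in a standard plan.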


@@ -34,6 +34,7 @@ GSD stores project settings in `.planning/config.json`. Created during `/gsd-new
"research_before_questions": false,
"discuss_mode": "discuss",
"skip_discuss": false,
"tdd_mode": false,
"text_mode": false,
"use_worktrees": true,
"code_review": true,
@@ -142,6 +143,7 @@ All workflow toggles follow the **absent = enabled** pattern. If a key is missin
| `workflow.plan_bounce_script` | string | (none) | Path to the external script invoked for plan bounce validation. Receives the PLAN.md path as its first argument. Required when `plan_bounce` is `true`. Added in v1.36 |
| `workflow.plan_bounce_passes` | number | `2` | Number of sequential bounce passes to run. Each pass feeds the previous pass's output back into the validator. Higher values increase rigor at the cost of latency. Added in v1.36 |
| `workflow.code_review_command` | string | (none) | Shell command for external code review integration in `/gsd-ship`. Receives changed file paths via stdin. Non-zero exit blocks the ship workflow. Added in v1.36 |
| `workflow.tdd_mode` | boolean | `false` | Enable TDD pipeline as a first-class execution mode. When `true`, the planner aggressively applies `type: tdd` to eligible tasks (business logic, APIs, validations, algorithms) and the executor enforces RED/GREEN/REFACTOR gate sequence. An end-of-phase collaborative review checkpoint verifies gate compliance. Added in v1.37 |
| `workflow.cross_ai_execution` | boolean | `false` | Delegate phase execution to an external AI CLI instead of spawning local executor agents. Useful for leveraging a different model's strengths for specific phases. Added in v1.36 |
| `workflow.cross_ai_command` | string | (none) | Shell command template for cross-AI execution. Receives the phase prompt via stdin. Must produce SUMMARY.md-compatible output. Required when `cross_ai_execution` is `true`. Added in v1.36 |
| `workflow.cross_ai_timeout` | number | `300` | Timeout in seconds for cross-AI execution commands. Prevents runaway external processes. Added in v1.36 |
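
Because `workflow.tdd_mode` is documented with a default of `false`, a consumer of the config would presumably treat a missing key as disabled. The following is a minimal sketch of that reading; `isTddModeEnabled` is a hypothetical helper and not a function exposed by gsd-tools, and the strict boolean comparison is an assumption.

```javascript
// Sketch only: `isTddModeEnabled` is hypothetical, not part of gsd-tools.
function isTddModeEnabled(config) {
  // Opt-in toggle: only an explicit boolean `true` enables the mode.
  // A missing `workflow` section or a missing key counts as disabled.
  return config?.workflow?.tdd_mode === true;
}
```

Comparing against the boolean `true` rather than any truthy value keeps a stray string such as `"false"` from enabling the mode by accident.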


@@ -2404,3 +2404,20 @@ Test suite that scans all agent, workflow, and command files for embedded inject
- REQ-CMDPATH-03: Relative paths are resolved from the project root
**Configuration:** `claude_md_path`
---
### 116. TDD Pipeline Mode
**Purpose:** Opt-in TDD (red-green-refactor) as a first-class phase execution mode. When enabled, the planner aggressively selects `type: tdd` for eligible tasks and the executor enforces the RED/GREEN/REFACTOR gate sequence, failing fast when a test passes unexpectedly during the RED phase.
**Requirements:**
- REQ-TDD-01: `workflow.tdd_mode` config key (boolean, default `false`)
- REQ-TDD-02: When enabled, planner applies TDD heuristics from `references/tdd.md` to all eligible tasks (business logic, APIs, validations, algorithms, state machines)
- REQ-TDD-03: Executor enforces gate sequence for `type: tdd` plans — RED commit (`test(...)`) must precede GREEN commit (`feat(...)`)
- REQ-TDD-04: Executor fails fast if tests pass unexpectedly during RED phase (feature already exists or test is wrong)
- REQ-TDD-05: End-of-phase collaborative review checkpoint verifies gate compliance across all TDD plans (advisory, non-blocking)
- REQ-TDD-06: Gate violations surfaced in SUMMARY.md under `## TDD Gate Compliance` section
**Configuration:** `workflow.tdd_mode`
**Reference files:** `tdd.md`, `checkpoints.md`


@@ -17,6 +17,7 @@ const VALID_CONFIG_KEYS = new Set([
'workflow.research', 'workflow.plan_check', 'workflow.verifier',
'workflow.nyquist_validation', 'workflow.ai_integration_phase', 'workflow.ui_phase', 'workflow.ui_safety_gate',
'workflow.auto_advance', 'workflow.node_repair', 'workflow.node_repair_budget',
'workflow.tdd_mode',
'workflow.text_mode',
'workflow.research_before_questions',
'workflow.discuss_mode',
@@ -157,6 +158,7 @@ function buildNewProjectConfig(userChoices) {
ui_phase: true,
ui_safety_gate: true,
ai_integration_phase: true,
tdd_mode: false,
text_mode: false,
research_before_questions: false,
discuss_mode: 'discuss',


@@ -759,6 +759,36 @@ timeout 30 bash -c 'until node -e "fetch(\"http://localhost:3000\").then(r=>{pro
</anti_patterns>
<type name="tdd-review">
## checkpoint:tdd-review (TDD Mode Only)
**When:** All waves in a phase complete and `workflow.tdd_mode` is enabled. Inserted by the execute-phase orchestrator after `aggregate_results`.
**Purpose:** Collaborative review of TDD gate compliance across all `type: tdd` plans in the phase. Advisory — does not block execution.
**Use for:**
- Verifying RED/GREEN/REFACTOR commit sequence for each TDD plan
- Surfacing gate violations (missing RED or GREEN commits)
- Reviewing test quality (tests fail for the right reason)
- Confirming minimal GREEN implementations
**Structure:**
```xml
<task type="checkpoint:tdd-review" gate="advisory">
  <what-checked>TDD gate compliance for {count} plans in Phase {X}</what-checked>
  <gate-results>
    | Plan | RED | GREEN | REFACTOR | Status |
    |------|-----|-------|----------|--------|
    | {id} | ✓ | ✓ | ✓ | Pass |
  </gate-results>
  <violations>[List of gate violations, or "None"]</violations>
  <resume-signal>Review complete — proceed to phase verification</resume-signal>
</task>
```
**Auto-mode behavior:** When `workflow._auto_chain_active` or `workflow.auto_advance` is true, the TDD review checkpoint auto-approves (advisory gate — never blocks).
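The auto-mode rule can be read as a small decision function. This is a sketch under the assumption that the three flags live on one `workflow` object; `shouldPauseForTddReview` is a hypothetical name used only for illustration.

```javascript
// Sketch only: `shouldPauseForTddReview` is hypothetical.
// Returns true when a human should be shown the review before continuing.
function shouldPauseForTddReview(workflow) {
  // The checkpoint is only inserted when TDD mode is on.
  if (workflow?.tdd_mode !== true) return false;
  // Advisory gate: either auto flag approves it without pausing.
  const auto = Boolean(workflow._auto_chain_active || workflow.auto_advance);
  return !auto;
}
```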
</type>
<summary>
Checkpoints formalize human-in-the-loop points for verification and decisions, not manual work.


@@ -247,6 +247,73 @@ Both follow same format: `{type}({phase}-{plan}): {description}`
- Consistent with overall commit strategy
</commit_pattern>
<gate_enforcement>
## Gate Enforcement Rules
When `workflow.tdd_mode` is enabled in config, the RED/GREEN/REFACTOR gate sequence is enforced for all `type: tdd` plans.
### Gate Definitions
| Gate | Required | Commit Pattern | Validation |
|------|----------|---------------|------------|
| RED | Yes | `test({phase}-{plan}): ...` | Test exists AND fails before implementation |
| GREEN | Yes | `feat({phase}-{plan}): ...` | Test passes after implementation |
| REFACTOR | No | `refactor({phase}-{plan}): ...` | Tests still pass after cleanup |
### Fail-Fast Rules
1. **Unexpected GREEN in RED phase:** If the test passes before any implementation code is written, STOP. The feature may already exist or the test is wrong. Investigate before proceeding.
2. **Missing RED commit:** If no `test(...)` commit precedes the `feat(...)` commit, the TDD discipline was violated. Flag in SUMMARY.md.
3. **REFACTOR breaks tests:** Undo the refactor immediately. Commit was premature — refactor in smaller steps.
### Executor Gate Validation
After completing a `type: tdd` plan, the executor validates the git log:
```bash
# Check for RED gate commit
git log --oneline --grep="^test(${PHASE}-${PLAN})" | head -1
# Check for GREEN gate commit
git log --oneline --grep="^feat(${PHASE}-${PLAN})" | head -1
# Check for optional REFACTOR gate commit
git log --oneline --grep="^refactor(${PHASE}-${PLAN})" | head -1
```
If RED or GREEN gate commits are missing, add a `## TDD Gate Compliance` section to SUMMARY.md with the violation details.
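The `git log --grep` commands above confirm that each gate commit exists, but not that RED precedes GREEN. One possible way to check ordering is sketched below. `checkGates` and its return shape are assumptions for illustration; the input is a list of commit subjects, oldest first, such as the output of `git log --reverse --format=%s` split on newlines.

```javascript
// Sketch only: `checkGates` is hypothetical, not part of the executor.
// `subjects` holds commit subject lines ordered oldest first.
function checkGates(subjects, phase, plan) {
  const scope = `(${phase}-${plan})`;
  // Index of the first commit of `type` at or after position `from`, or -1.
  const first = (type, from = 0) =>
    subjects.findIndex((s, i) => i >= from && s.startsWith(type + scope));

  const green = first('feat');
  const red = first('test');
  // RED passes only if a test commit exists and precedes the first feat commit.
  const redOk = red !== -1 && (green === -1 || red < green);
  const greenOk = green !== -1;
  // REFACTOR is optional and only counts after GREEN.
  const refactorOk = greenOk && first('refactor', green + 1) !== -1;

  const violations = [];
  if (!redOk) violations.push('RED');
  if (!greenOk) violations.push('GREEN');
  return { red: redOk, green: greenOk, refactor: refactorOk, violations };
}
```

A non-empty `violations` array corresponds to the case where the `## TDD Gate Compliance` section would be written to SUMMARY.md.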
</gate_enforcement>
<end_of_phase_review>
## End-of-Phase TDD Review Checkpoint
When `workflow.tdd_mode` is enabled, the execute-phase orchestrator inserts a collaborative review checkpoint after all waves complete but before phase verification.
### Review Checkpoint Format
```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
TDD REVIEW — Phase {X}
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
TDD Plans: {count} | Gate violations: {count}
| Plan | RED | GREEN | REFACTOR | Status |
|------|-----|-------|----------|--------|
| {id} | ✓ | ✓ | ✓ | Pass |
| {id} | ✓ | ✗ | — | FAIL |
{If violations exist:}
⚠ Gate violations are advisory — review before advancing.
```
### What the Review Checks
1. **Gate sequence:** Each TDD plan has RED → GREEN commits in order
2. **Test quality:** RED phase tests fail for the right reason (not import errors or syntax)
3. **Minimal GREEN:** Implementation is minimal — no premature optimization in GREEN phase
4. **Refactor discipline:** If REFACTOR commit exists, tests still pass
This checkpoint is advisory — it does not block phase completion but surfaces TDD discipline issues for human review.
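To make the table format concrete, here is a sketch of how the summary line and rows could be rendered from per-plan gate results. `renderTddReview` and the `{ id, red, green, refactor }` result shape are hypothetical; the banner lines are omitted for brevity.

```javascript
// Sketch only: `renderTddReview` and its input shape are hypothetical.
function renderTddReview(results) {
  const mark = (ok) => (ok ? '✓' : '✗');
  const passed = (r) => r.red && r.green; // REFACTOR is optional
  const rows = results.map((r) => {
    // An absent REFACTOR commit is shown as a dash, not as a failure.
    const refactor = r.refactor ? '✓' : '\u2014';
    const status = passed(r) ? 'Pass' : 'FAIL';
    return `| ${r.id} | ${mark(r.red)} | ${mark(r.green)} | ${refactor} | ${status} |`;
  });
  const violations = results.filter((r) => !passed(r)).length;
  return [
    `TDD Plans: ${results.length} | Gate violations: ${violations}`,
    '| Plan | RED | GREEN | REFACTOR | Status |',
    '|------|-----|-------|----------|--------|',
    ...rows,
  ].join('\n');
}
```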
</end_of_phase_review>
<context_budget>
## Context Budget


@@ -919,6 +919,50 @@ If `SECURITY_CFG` is `true` AND SECURITY.md exists: check frontmatter `threats_o
```
</step>
<step name="tdd_review_checkpoint">
**Optional step — TDD collaborative review.**
```bash
TDD_MODE=$(node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" config-get workflow.tdd_mode --default false 2>/dev/null)
```
**Skip if `TDD_MODE` is `false`.**
When `TDD_MODE` is `true`, check whether any completed plans in this phase have `type: tdd` in their frontmatter:
```bash
TDD_PLANS=$(grep -rl "^type: tdd" "${PHASE_DIR}"/*-PLAN.md 2>/dev/null | wc -l | tr -d ' ')
```
**If `TDD_PLANS` > 0:** Insert end-of-phase collaborative review checkpoint.
1. Collect all SUMMARY.md files for TDD plans
2. For each TDD plan summary, verify the RED/GREEN/REFACTOR gate sequence:
- RED gate: A failing test commit exists (`test(...)` commit with MUST-fail evidence)
- GREEN gate: An implementation commit exists (`feat(...)` commit making tests pass)
- REFACTOR gate: Optional cleanup commit (`refactor(...)` commit, tests still pass)
3. If any TDD plan is missing the RED or GREEN gate commits, flag it:
```
⚠ TDD gate violation: Plan {plan_id} missing {RED|GREEN} phase commit.
Expected commit pattern: test({phase}-{plan}): ... → feat({phase}-{plan}): ...
```
4. Present collaborative review summary:
```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
TDD REVIEW — Phase {X}
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
TDD Plans: {TDD_PLANS} | Gate violations: {count}
| Plan | RED | GREEN | REFACTOR | Status |
|------|-----|-------|----------|--------|
| {id} | ✓ | ✓ | ✓ | Pass |
| {id} | ✓ | ✗ | — | FAIL |
```
**Gate violations are advisory** — they do not block execution but are surfaced to the user for review. The verifier agent (step `verify_phase_goal`) will also check TDD discipline as part of its quality assessment.
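The `grep` above matches any line that starts with `type: tdd` anywhere in a plan file. A stricter check that looks only at the frontmatter might look like the sketch below. It assumes plans open with a YAML frontmatter block delimited by `---` lines, which this diff does not show; `isTddPlan` is a hypothetical helper.

```javascript
// Sketch only: `isTddPlan` is hypothetical and assumes `---`-delimited frontmatter.
function isTddPlan(planText) {
  // Capture the frontmatter block at the very start of the file.
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(planText);
  // Require an exact `type: tdd` line inside that block.
  return match !== null && /^type:\s*tdd\s*$/m.test(match[1]);
}
```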
</step>
<step name="handle_partial_wave_execution">
If `WAVE_FILTER` was used, re-run plan discovery after execution:


@@ -33,8 +33,11 @@ AGENT_SKILLS_RESEARCHER=$(node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" a
AGENT_SKILLS_PLANNER=$(node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" agent-skills gsd-planner 2>/dev/null)
AGENT_SKILLS_CHECKER=$(node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" agent-skills gsd-checker 2>/dev/null)
CONTEXT_WINDOW=$(node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" config-get context_window 2>/dev/null || echo "200000")
TDD_MODE=$(node "$HOME/.claude/get-shit-done/bin/gsd-tools.cjs" config-get workflow.tdd_mode 2>/dev/null || echo "false")
```
When `TDD_MODE` is `true`, the planner agent is instructed to apply `type: tdd` to eligible tasks using heuristics from `references/tdd.md`. The planner's `<required_reading>` is extended to include `@~/.claude/get-shit-done/references/tdd.md` so gate enforcement rules are available during planning.
When `CONTEXT_WINDOW >= 500000`, the planner prompt includes the 3 most recent prior phase CONTEXT.md and SUMMARY.md files PLUS any phases explicitly listed in the current phase's `Depends on:` field in ROADMAP.md. Explicit dependencies always load regardless of recency (e.g., Phase 7 declaring `Depends on: Phase 2` always sees Phase 2's context). Bounded recency keeps the planner's context budget focused on recent work.
Parse JSON for: `researcher_model`, `planner_model`, `checker_model`, `research_enabled`, `plan_checker_enabled`, `nyquist_validation_enabled`, `commit_docs`, `text_mode`, `phase_found`, `phase_dir`, `phase_number`, `phase_name`, `phase_slug`, `padded_phase`, `has_research`, `has_context`, `has_reviews`, `has_plans`, `plan_count`, `planning_exists`, `roadmap_exists`, `phase_req_ids`, `response_language`.
@@ -719,6 +722,16 @@ ${AGENT_SKILLS_PLANNER}
**Project instructions:** Read ./CLAUDE.md if exists — follow project-specific guidelines
**Project skills:** Check .claude/skills/ or .agents/skills/ directory (if either exists) — read SKILL.md files, plans should account for project skill rules
${TDD_MODE === 'true' ? `
<tdd_mode_active>
**TDD Mode is ENABLED.** Apply TDD heuristics from @~/.claude/get-shit-done/references/tdd.md to all eligible tasks:
- Business logic with defined I/O → type: tdd
- API endpoints with request/response contracts → type: tdd
- Data transformations, validation, algorithms → type: tdd
- UI, config, glue code, CRUD → standard plan (type: execute)
Each TDD plan gets one feature with RED/GREEN/REFACTOR gate sequence.
</tdd_mode_active>
` : ''}
</planning_context>
<downstream_consumer>


@@ -53,7 +53,7 @@ const ALLOWLIST = new Set([
'get-shit-done/bin/lib/security.cjs', // The security module itself
'get-shit-done/workflows/discuss-phase.md', // Large workflow (~50K) with power mode + i18n
'get-shit-done/workflows/execute-phase.md', // Large orchestration workflow (~51K) with wave execution + code-review gate
'get-shit-done/workflows/plan-phase.md', // Large orchestration workflow (~51K) with TDD mode integration
'hooks/gsd-prompt-guard.js', // The prompt guard hook
'tests/security.test.cjs', // Security tests
'tests/prompt-injection-scan.test.cjs', // This file

tests/tdd-mode.test.cjs (new file, 105 lines)

@@ -0,0 +1,105 @@
/**
 * GSD Tools Tests — workflow.tdd_mode config key
 *
 * Validates that the tdd_mode workflow toggle is a first-class config key
 * with correct default, round-trip behavior, and presence in VALID_CONFIG_KEYS.
 *
 * Requirements: #1871
 */

const { test, describe, beforeEach, afterEach } = require('node:test');
const assert = require('node:assert/strict');
const fs = require('fs');
const path = require('path');
const { runGsdTools, createTempProject, cleanup } = require('./helpers.cjs');

// ─── helpers ──────────────────────────────────────────────────────────────────

function readConfig(tmpDir) {
  const configPath = path.join(tmpDir, '.planning', 'config.json');
  return JSON.parse(fs.readFileSync(configPath, 'utf-8'));
}

// ─── VALID_CONFIG_KEYS ──────────────────────────────────────────────────────

describe('workflow.tdd_mode in VALID_CONFIG_KEYS', () => {
  test('workflow.tdd_mode is a recognized config key', () => {
    const { VALID_CONFIG_KEYS } = require('../get-shit-done/bin/lib/config.cjs');
    assert.ok(
      VALID_CONFIG_KEYS.has('workflow.tdd_mode'),
      'workflow.tdd_mode should be in VALID_CONFIG_KEYS'
    );
  });
});

// ─── config default value ───────────────────────────────────────────────────

describe('workflow.tdd_mode default value', () => {
  let tmpDir;

  beforeEach(() => {
    tmpDir = createTempProject();
  });

  afterEach(() => {
    cleanup(tmpDir);
  });

  test('defaults to false in new project config', () => {
    // Ensure config is created with defaults
    const result = runGsdTools('config-ensure-section', tmpDir, { HOME: tmpDir });
    assert.ok(result.success, `config-ensure-section failed: ${result.error}`);

    const config = readConfig(tmpDir);
    assert.strictEqual(
      config.workflow.tdd_mode,
      false,
      'workflow.tdd_mode should default to false'
    );
  });
});

// ─── config round-trip (set / get) ─────────────────────────────────────────

describe('workflow.tdd_mode config round-trip', () => {
  let tmpDir;

  beforeEach(() => {
    tmpDir = createTempProject();
    // Create a config file first
    runGsdTools('config-ensure-section', tmpDir, { HOME: tmpDir });
  });

  afterEach(() => {
    cleanup(tmpDir);
  });

  test('config-set workflow.tdd_mode true round-trips via config-get', () => {
    const setResult = runGsdTools('config-set workflow.tdd_mode true', tmpDir);
    assert.ok(setResult.success, `config-set failed: ${setResult.error}`);

    const getResult = runGsdTools('config-get workflow.tdd_mode', tmpDir);
    assert.ok(getResult.success, `config-get failed: ${getResult.error}`);
    assert.strictEqual(getResult.output, 'true');
  });

  test('config-set workflow.tdd_mode false round-trips via config-get', () => {
    // First set to true, then back to false
    runGsdTools('config-set workflow.tdd_mode true', tmpDir);

    const setResult = runGsdTools('config-set workflow.tdd_mode false', tmpDir);
    assert.ok(setResult.success, `config-set failed: ${setResult.error}`);

    const getResult = runGsdTools('config-get workflow.tdd_mode', tmpDir);
    assert.ok(getResult.success, `config-get failed: ${getResult.error}`);
    assert.strictEqual(getResult.output, 'false');
  });

  test('persists in config.json as boolean', () => {
    runGsdTools('config-set workflow.tdd_mode true', tmpDir);

    const config = readConfig(tmpDir);
    assert.strictEqual(config.workflow.tdd_mode, true);
    assert.strictEqual(typeof config.workflow.tdd_mode, 'boolean');
  });
});