feat: add test quality audit step to verify-phase workflow

Adds a new `audit_test_quality` step between `scan_antipatterns` and
`identify_human_verification` that catches test-level deceptions:

- Disabled tests (it.skip/xit/test.todo) covering phase requirements
- Circular tests (system generating its own expected values)
- Weak assertions (existence-only when value-level proof needed)
- Expected value provenance tracking for parity/migration phases

Any blocker from this audit forces `gaps_found` status, preventing
phases from being marked complete with inadequate test evidence.

Fixes #1457

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
gg
2026-03-28 17:24:59 -03:00
parent 8af7ad96fc
commit 759ff38d44
2 changed files with 295 additions and 2 deletions


@@ -13,6 +13,7 @@ Goal-backward verification:
1. What must be TRUE for the goal to be achieved?
2. What must EXIST for those truths to hold?
3. What must be WIRED for those artifacts to function?
4. What must TESTS PROVE for those truths to be evidenced?
Then verify each level against the actual codebase.
</core_principle>
@@ -190,6 +191,93 @@ Extract files modified in this phase from SUMMARY.md, scan each:
Categorize: 🛑 Blocker (prevents goal) | ⚠️ Warning (incomplete) | Info (notable).
</step>
<step name="audit_test_quality">
**Verify that tests PROVE what they claim to prove.**
This step catches test-level deceptions that pass all prior checks: files exist, are substantive, are wired, and tests pass — but the tests don't actually validate the requirement.
**1. Identify requirement-linked test files**
From PLAN and SUMMARY files, map each requirement to the test files that are supposed to prove it.
**2. Disabled test scan**
For ALL test files linked to requirements, search for disabled/skipped patterns:
```bash
grep -rn -E "it\.skip|describe\.skip|test\.skip|xit\(|xdescribe\(|xtest\(|@pytest\.mark\.skip|@unittest\.skip|#\[ignore\]|\.pending|it\.todo|test\.todo" "$TEST_FILE"
```
**Rule:** A disabled test linked to a requirement = requirement NOT tested.
- 🛑 BLOCKER if the disabled test is the only test proving that requirement
- ⚠️ WARNING if other active tests also cover the requirement
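The blocker/warning rule above can be sketched mechanically once each requirement's linked tests carry a `skipped` flag from the grep scan. A minimal sketch (the helper name and input shape are illustrative assumptions, not part of this diff):

```javascript
// Hypothetical sketch: apply the disabled-test rule to one requirement.
// Each entry represents a test linked to the requirement; `skipped` comes
// from matching the disabled-test patterns scanned above.
const SKIP_RE =
  /it\.skip|describe\.skip|test\.skip|xit\(|xdescribe\(|xtest\(|it\.todo|test\.todo/;

function disabledTestVerdict(linkedTests) {
  const active = linkedTests.filter((t) => !t.skipped);
  const skipped = linkedTests.filter((t) => t.skipped);
  if (skipped.length === 0) return "OK";
  // The disabled test is the ONLY proof of the requirement -> blocker.
  return active.length === 0 ? "BLOCKER" : "WARNING";
}
```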
**3. Circular test detection**
Search for scripts/utilities that generate expected values by running the system under test:
```bash
grep -rn -E "writeFileSync|writeFile|fs\.write|open\([^)]*['\"]w" "$TEST_DIRS"
```
For each match, check if it also imports the system/service/module being tested. If a script both imports the system-under-test AND writes expected output values → CIRCULAR.
**Circular test indicators:**
- Script imports a service AND writes to fixture files
- Expected values have comments like "computed from engine", "captured from baseline"
- Script filename contains "capture", "baseline", "generate", "snapshot" in test context
- Expected values were added in the same commit as the test assertions
**Rule:** A test comparing system output against values generated by the same system is circular. It proves consistency, not correctness.
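The two-condition heuristic can be sketched directly from the patterns above (the function name is illustrative; the regexes mirror the indicators listed):

```javascript
// Hypothetical sketch: flag a script as CIRCULAR when it both imports the
// system-under-test AND writes expected values to fixture files.
const IMPORTS_SUT_RE = /import.*(?:Service|Engine|Calculator|Controller)/;
const WRITES_FILES_RE = /writeFileSync|writeFile|fs\.write/;

function isCircularScript(source) {
  // Either condition alone is benign (a test helper may read fixtures,
  // a seeder may write them); the combination is the red flag.
  return IMPORTS_SUT_RE.test(source) && WRITES_FILES_RE.test(source);
}
```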
**4. Expected value provenance** (for comparison/parity/migration requirements)
When a requirement demands comparison with an external source ("identical to X", "matches Y", "same output as Z"):
- Is the external source actually invoked or referenced in the test pipeline?
- Do fixture files contain data sourced from the external system?
- Or do all expected values come from the new system itself or from mathematical formulas?
**Provenance classification:**
- VALID: Expected value from external/legacy system output, manual capture, or independent oracle
- PARTIAL: Expected value from mathematical derivation (proves formula, not system match)
- CIRCULAR: Expected value from the system being tested
- UNKNOWN: No provenance information — treat as SUSPECT
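One way to sketch this classification, assuming fixtures carry a free-text `comment` field (the keyword lists are illustrative heuristics, not an exhaustive taxonomy):

```javascript
// Hypothetical sketch: bucket a fixture's expected values by provenance
// using keyword heuristics on its comment field.
function classifyProvenance(fixture) {
  const comment = fixture.comment || "";
  if (comment.length === 0) return "UNKNOWN"; // no provenance info -> SUSPECT
  if (/legacy|php|real output|manual capture|oracle/i.test(comment)) return "VALID";
  if (/captured from engine|generated by the system/i.test(comment)) return "CIRCULAR";
  if (/computed|derived|formula/i.test(comment)) return "PARTIAL";
  return "UNKNOWN";
}
```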
**5. Assertion strength**
For each test linked to a requirement, classify the strongest assertion:
| Level | Examples | Proves |
|-------|---------|--------|
| Existence | `toBeDefined()`, `!= null` | Something returned |
| Type | `typeof x === 'number'` | Correct shape |
| Status | `code === 200` | No error |
| Value | `toEqual(expected)`, `toBeCloseTo(x)` | Specific value |
| Behavioral | Multi-step workflow assertions | End-to-end correctness |
If a requirement demands value-level or behavioral-level proof and the test only has existence/type/status assertions → INSUFFICIENT.
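The table can be applied mechanically by ranking the strongest assertion found in a test body. A sketch with illustrative regexes (not a complete pattern set):

```javascript
// Hypothetical sketch: find the strongest assertion level in a test body.
// Later entries outrank earlier ones, mirroring the table above.
const ASSERTION_LEVELS = [
  ["existence", /toBeDefined|not\.toBeNull|assert\.ok\(/],
  ["type", /typeof|toBeInstanceOf/],
  ["status", /status(Code)?\s*===?\s*\d{3}/],
  ["value", /toEqual|toBeCloseTo|strictEqual|deepStrictEqual/],
];

function strongestAssertion(testBody) {
  let strongest = "none";
  for (const [level, re] of ASSERTION_LEVELS) {
    if (re.test(testBody)) strongest = level;
  }
  return strongest;
}
```

A value-level requirement whose test only reaches `existence`, `type`, or `status` is INSUFFICIENT under the rule above.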
**6. Coverage quantity**
If a requirement specifies a quantity of test cases (e.g., "30 calculations"), check if the actual number of active (non-skipped) test cases meets the requirement.
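A rough sketch of that count (the call-site regex is an assumption: the word boundary excludes `xit(`, and the required `(` excludes `it.skip(` / `it.todo(`):

```javascript
// Hypothetical sketch: count active (non-skipped) test cases in a file.
// `\b(?:it|test)\s*\(` matches `it(` / `test(` but not `xit(` (no word
// boundary before `it`) or `it.skip(` (a `.` follows instead of `(`).
function countActiveCases(source) {
  return (source.match(/\b(?:it|test)\s*\(/g) || []).length;
}

function meetsQuantity(source, required) {
  return countActiveCases(source) >= required;
}
```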
**Reporting — add to VERIFICATION.md:**
```markdown
### Test Quality Audit
| Test File | Linked Req | Active | Skipped | Circular | Assertion Level | Verdict |
|-----------|-----------|--------|---------|----------|----------------|---------|
**Disabled tests on requirements:** {N} → {BLOCKER if any req has ONLY disabled tests}
**Circular patterns detected:** {N} → {BLOCKER if any}
**Insufficient assertions:** {N} → {WARNING}
```
**Impact on status:** Any BLOCKER from test quality audit → overall status = `gaps_found`, regardless of other checks passing.
</step>
<step name="identify_human_verification">
**Always needs human:** Visual appearance, user flow completion, real-time behavior (WebSocket/SSE), external service integration, performance feel, error message clarity.
@@ -201,7 +289,7 @@ Format each as: Test Name → What to do → Expected result → Why can't verif
<step name="determine_status">
Classify status using this decision tree IN ORDER (most restrictive first):
1. IF any truth FAILED, artifact MISSING/STUB, key link NOT_WIRED, blocker found, **or test quality audit found blockers (disabled requirement tests, circular tests)**:
**gaps_found**
2. IF the previous step produced ANY human verification items:
@@ -222,7 +310,7 @@ If gaps_found:
2. **Generate plan per cluster:** Objective, 2-3 tasks (files/action/verify each), re-verify step. Keep focused: single concern per plan.
3. **Order by dependency:** Fix missing → fix stubs → fix wiring → **fix test evidence** → verify.
</step>
<step name="create_report">
@@ -253,6 +341,7 @@ Orchestrator routes: `passed` → update_roadmap | `gaps_found` → create/execu
- [ ] All key links verified
- [ ] Requirements coverage assessed (if applicable)
- [ ] Anti-patterns scanned and categorized
- [ ] Test quality audited (disabled tests, circular patterns, assertion strength, provenance)
- [ ] Human verification items identified
- [ ] Overall status determined
- [ ] Fix plans generated (if gaps_found)


@@ -0,0 +1,204 @@
/**
* Tests for the audit_test_quality step in verify-phase.md
*
* Validates that the verifier's test quality audit detects:
* - Disabled tests (it.skip) covering requirements
* - Circular tests (system generating its own expected values)
* - Weak assertions on requirement-linked tests
*/
const { describe, test, beforeEach, afterEach } = require('node:test');
const assert = require('node:assert/strict');
const fs = require('fs');
const path = require('path');
const { createTempProject, cleanup } = require('./helpers.cjs');
describe('audit_test_quality step', () => {
let tmpDir;
beforeEach(() => {
tmpDir = createTempProject();
});
afterEach(() => {
cleanup(tmpDir);
});
describe('disabled test detection', () => {
test('detects it.skip in test files', () => {
const testDir = path.join(tmpDir, 'tests');
fs.mkdirSync(testDir, { recursive: true });
fs.writeFileSync(path.join(testDir, 'parity.test.js'), [
'describe("parity", () => {',
' it.skip("matches PHP output", async () => {',
' expect(result).toBeCloseTo(155.96, 2);',
' });',
'});',
].join('\n'));
const content = fs.readFileSync(path.join(testDir, 'parity.test.js'), 'utf8');
const skipPatterns = /it\.skip|describe\.skip|test\.skip|xit\(|xdescribe\(|xtest\(/g;
const matches = content.match(skipPatterns);
assert.ok(matches, 'Should detect skip patterns');
assert.strictEqual(matches.length, 1);
});
test('detects multiple skip patterns across frameworks', () => {
const testDir = path.join(tmpDir, 'tests');
fs.mkdirSync(testDir, { recursive: true });
fs.writeFileSync(path.join(testDir, 'multi.test.js'), [
'describe.skip("suite", () => {});',
'xit("old jasmine", () => {});',
'test.skip("jest skip", () => {});',
'it.todo("not implemented");',
].join('\n'));
const content = fs.readFileSync(path.join(testDir, 'multi.test.js'), 'utf8');
const skipPatterns = /it\.skip|describe\.skip|test\.skip|xit\(|xdescribe\(|xtest\(|it\.todo|test\.todo/g;
const matches = content.match(skipPatterns);
assert.ok(matches, 'Should detect all skip variants');
assert.strictEqual(matches.length, 4);
});
test('does not flag active tests as skipped', () => {
const testDir = path.join(tmpDir, 'tests');
fs.mkdirSync(testDir, { recursive: true });
fs.writeFileSync(path.join(testDir, 'active.test.js'), [
'describe("active suite", () => {',
' it("does the thing", () => {',
' expect(result).toBe(true);',
' });',
' test("also works", () => {',
' expect(other).toBe(42);',
' });',
'});',
].join('\n'));
const content = fs.readFileSync(path.join(testDir, 'active.test.js'), 'utf8');
const skipPatterns = /it\.skip|describe\.skip|test\.skip|xit\(|xdescribe\(|xtest\(|it\.todo|test\.todo/g;
const matches = content.match(skipPatterns);
assert.strictEqual(matches, null, 'Active tests should not match skip patterns');
});
});
describe('circular test detection', () => {
test('detects script that imports system-under-test and writes fixtures', () => {
const testDir = path.join(tmpDir, 'tests');
fs.mkdirSync(testDir, { recursive: true });
fs.writeFileSync(path.join(testDir, 'captureBaseline.js'), [
'import { CalculationService } from "../server/services/calculationService.js";',
'import { writeFileSync } from "fs";',
'',
'const result = await CalculationService.execute(input);',
'fixture.expectedOutput = result.value;',
'writeFileSync("fixtures/data.json", JSON.stringify(fixture));',
].join('\n'));
const content = fs.readFileSync(path.join(testDir, 'captureBaseline.js'), 'utf8');
const importsSystem = /import.*(?:Service|Engine|Calculator|Controller)/.test(content);
const writesFiles = /writeFileSync|writeFile|fs\.write/.test(content);
assert.ok(importsSystem, 'Should detect system-under-test import');
assert.ok(writesFiles, 'Should detect file writing');
assert.ok(importsSystem && writesFiles, 'Script that imports SUT and writes fixtures is CIRCULAR');
});
test('does not flag test helpers that only read fixtures', () => {
const testDir = path.join(tmpDir, 'tests');
fs.mkdirSync(testDir, { recursive: true });
fs.writeFileSync(path.join(testDir, 'loadFixtures.js'), [
'import { readFileSync } from "fs";',
'export function loadFixture(name) {',
' return JSON.parse(readFileSync(`fixtures/${name}.json`, "utf8"));',
'}',
].join('\n'));
const content = fs.readFileSync(path.join(testDir, 'loadFixtures.js'), 'utf8');
const importsSystem = /import.*(?:Service|Engine|Calculator|Controller)/.test(content);
const writesFiles = /writeFileSync|writeFile|fs\.write/.test(content);
assert.ok(!importsSystem, 'Should not flag read-only helper as importing SUT');
assert.ok(!writesFiles, 'Should not flag read-only helper as writing files');
});
});
describe('assertion strength classification', () => {
test('classifies existence-only assertions as INSUFFICIENT for value requirements', () => {
const assertions = [
'expect(result).toBeDefined()',
'expect(result).not.toBeNull()',
'assert.ok(result)',
];
const existencePattern = /toBeDefined|not\.toBeNull|assert\.ok\(/;
const valuePattern = /toEqual|toBeCloseTo|strictEqual|deepStrictEqual/;
for (const assertion of assertions) {
assert.ok(existencePattern.test(assertion), `"${assertion}" should match existence pattern`);
assert.ok(!valuePattern.test(assertion), `"${assertion}" should NOT match value pattern`);
}
});
test('classifies value assertions as sufficient', () => {
const assertions = [
'expect(result).toBeCloseTo(155.96, 2)',
'expect(result).toEqual({ amount: 100 })',
'assert.strictEqual(result, 42)',
];
const valuePattern = /toEqual|toBeCloseTo|strictEqual|deepStrictEqual/;
for (const assertion of assertions) {
assert.ok(valuePattern.test(assertion), `"${assertion}" should match value pattern`);
}
});
});
describe('provenance classification', () => {
test('fixture with legacy system comment classified as VALID', () => {
const fixture = {
legacyId: 10341,
comment: 'Real PHP fixture - output from legacy system',
dbDependent: true,
expectedOutput: { value: 155.96 },
};
const hasLegacySource = /legacy|php|real|manual|captured from/i.test(fixture.comment || '');
assert.ok(hasLegacySource, 'Comment referencing legacy system = VALID provenance');
});
test('fixture with synthetic/baseline comment classified as SUSPECT', () => {
const fixture = {
legacyId: null,
comment: 'Synthetic offline fixture - computed from known algorithm',
dbDependent: false,
expectedOutput: { value: 1240.68 },
};
const hasSyntheticSource = /synthetic|computed|baseline|generated|captured from engine/i.test(fixture.comment || '');
const hasLegacySource = /legacy|php|real output|manual capture/i.test(fixture.comment || '');
assert.ok(hasSyntheticSource, 'Comment indicating synthetic source detected');
assert.ok(!hasLegacySource, 'Should NOT be classified as legacy source');
});
test('fixture with no comment classified as UNKNOWN', () => {
const fixture = {
expectedOutput: { value: 42 },
};
const hasAnyProvenance = (fixture.comment || '').length > 0;
assert.ok(!hasAnyProvenance, 'No comment = UNKNOWN provenance');
});
});
});