mirror of
https://github.com/glittercowboy/get-shit-done
synced 2026-04-25 17:25:23 +02:00
feat: add test quality audit step to verify-phase workflow
Adds a new `audit_test_quality` step between `scan_antipatterns` and `identify_human_verification` that catches test-level deceptions: - Disabled tests (it.skip/xit/test.todo) covering phase requirements - Circular tests (system generating its own expected values) - Weak assertions (existence-only when value-level proof needed) - Expected value provenance tracking for parity/migration phases Any blocker from this audit forces `gaps_found` status, preventing phases from being marked complete with inadequate test evidence. Fixes #1457 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -13,6 +13,7 @@ Goal-backward verification:
|
|||||||
1. What must be TRUE for the goal to be achieved?
|
1. What must be TRUE for the goal to be achieved?
|
||||||
2. What must EXIST for those truths to hold?
|
2. What must EXIST for those truths to hold?
|
||||||
3. What must be WIRED for those artifacts to function?
|
3. What must be WIRED for those artifacts to function?
|
||||||
|
4. What must TESTS PROVE for those truths to be evidenced?
|
||||||
|
|
||||||
Then verify each level against the actual codebase.
|
Then verify each level against the actual codebase.
|
||||||
</core_principle>
|
</core_principle>
|
||||||
@@ -190,6 +191,93 @@ Extract files modified in this phase from SUMMARY.md, scan each:
|
|||||||
Categorize: 🛑 Blocker (prevents goal) | ⚠️ Warning (incomplete) | ℹ️ Info (notable).
|
Categorize: 🛑 Blocker (prevents goal) | ⚠️ Warning (incomplete) | ℹ️ Info (notable).
|
||||||
</step>
|
</step>
|
||||||
|
|
||||||
|
<step name="audit_test_quality">
|
||||||
|
**Verify that tests PROVE what they claim to prove.**
|
||||||
|
|
||||||
|
This step catches test-level deceptions that pass all prior checks: files exist, are substantive, are wired, and tests pass — but the tests don't actually validate the requirement.
|
||||||
|
|
||||||
|
**1. Identify requirement-linked test files**
|
||||||
|
|
||||||
|
From PLAN and SUMMARY files, map each requirement to the test files that are supposed to prove it.
|
||||||
|
|
||||||
|
**2. Disabled test scan**
|
||||||
|
|
||||||
|
For ALL test files linked to requirements, search for disabled/skipped patterns:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
grep -rn -E "it\.skip|describe\.skip|test\.skip|xit\(|xdescribe\(|xtest\(|@pytest\.mark\.skip|@unittest\.skip|#\[ignore\]|\.pending|it\.todo|test\.todo" "$TEST_FILE"
|
||||||
|
```
|
||||||
|
|
||||||
|
**Rule:** A disabled test linked to a requirement = requirement NOT tested.
|
||||||
|
- 🛑 BLOCKER if the disabled test is the only test proving that requirement
|
||||||
|
- ⚠️ WARNING if other active tests also cover the requirement
|
||||||
|
|
||||||
|
**3. Circular test detection**
|
||||||
|
|
||||||
|
Search for scripts/utilities that generate expected values by running the system under test:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
grep -rn -E "writeFileSync|writeFile|fs\.write|open\(.*w\)" "$TEST_DIRS"
|
||||||
|
```
|
||||||
|
|
||||||
|
For each match, check if it also imports the system/service/module being tested. If a script both imports the system-under-test AND writes expected output values → CIRCULAR.
|
||||||
|
|
||||||
|
**Circular test indicators:**
|
||||||
|
- Script imports a service AND writes to fixture files
|
||||||
|
- Expected values have comments like "computed from engine", "captured from baseline"
|
||||||
|
- Script filename contains "capture", "baseline", "generate", "snapshot" in test context
|
||||||
|
- Expected values were added in the same commit as the test assertions
|
||||||
|
|
||||||
|
**Rule:** A test comparing system output against values generated by the same system is circular. It proves consistency, not correctness.
|
||||||
|
|
||||||
|
**4. Expected value provenance** (for comparison/parity/migration requirements)
|
||||||
|
|
||||||
|
When a requirement demands comparison with an external source ("identical to X", "matches Y", "same output as Z"):
|
||||||
|
|
||||||
|
- Is the external source actually invoked or referenced in the test pipeline?
|
||||||
|
- Do fixture files contain data sourced from the external system?
|
||||||
|
- Or do all expected values come from the new system itself or from mathematical formulas?
|
||||||
|
|
||||||
|
**Provenance classification:**
|
||||||
|
- VALID: Expected value from external/legacy system output, manual capture, or independent oracle
|
||||||
|
- PARTIAL: Expected value from mathematical derivation (proves formula, not system match)
|
||||||
|
- CIRCULAR: Expected value from the system being tested
|
||||||
|
- UNKNOWN: No provenance information — treat as SUSPECT
|
||||||
|
|
||||||
|
**5. Assertion strength**
|
||||||
|
|
||||||
|
For each test linked to a requirement, classify the strongest assertion:
|
||||||
|
|
||||||
|
| Level | Examples | Proves |
|
||||||
|
|-------|---------|--------|
|
||||||
|
| Existence | `toBeDefined()`, `!= null` | Something returned |
|
||||||
|
| Type | `typeof x === 'number'` | Correct shape |
|
||||||
|
| Status | `code === 200` | No error |
|
||||||
|
| Value | `toEqual(expected)`, `toBeCloseTo(x)` | Specific value |
|
||||||
|
| Behavioral | Multi-step workflow assertions | End-to-end correctness |
|
||||||
|
|
||||||
|
If a requirement demands value-level or behavioral-level proof and the test only has existence/type/status assertions → INSUFFICIENT.
|
||||||
|
|
||||||
|
**6. Coverage quantity**
|
||||||
|
|
||||||
|
If a requirement specifies a quantity of test cases (e.g., "30 calculations"), check if the actual number of active (non-skipped) test cases meets the requirement.
|
||||||
|
|
||||||
|
**Reporting — add to VERIFICATION.md:**
|
||||||
|
|
||||||
|
```markdown
|
||||||
|
### Test Quality Audit
|
||||||
|
|
||||||
|
| Test File | Linked Req | Active | Skipped | Circular | Assertion Level | Verdict |
|
||||||
|
|-----------|-----------|--------|---------|----------|----------------|---------|
|
||||||
|
|
||||||
|
**Disabled tests on requirements:** {N} → {BLOCKER if any req has ONLY disabled tests}
|
||||||
|
**Circular patterns detected:** {N} → {BLOCKER if any}
|
||||||
|
**Insufficient assertions:** {N} → {WARNING}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Impact on status:** Any BLOCKER from test quality audit <20><><EFBFBD> overall status = `gaps_found`, regardless of other checks passing.
|
||||||
|
</step>
|
||||||
|
|
||||||
<step name="identify_human_verification">
|
<step name="identify_human_verification">
|
||||||
**Always needs human:** Visual appearance, user flow completion, real-time behavior (WebSocket/SSE), external service integration, performance feel, error message clarity.
|
**Always needs human:** Visual appearance, user flow completion, real-time behavior (WebSocket/SSE), external service integration, performance feel, error message clarity.
|
||||||
|
|
||||||
@@ -201,7 +289,7 @@ Format each as: Test Name → What to do → Expected result → Why can't verif
|
|||||||
<step name="determine_status">
|
<step name="determine_status">
|
||||||
Classify status using this decision tree IN ORDER (most restrictive first):
|
Classify status using this decision tree IN ORDER (most restrictive first):
|
||||||
|
|
||||||
1. IF any truth FAILED, artifact MISSING/STUB, key link NOT_WIRED, or blocker found:
|
1. IF any truth FAILED, artifact MISSING/STUB, key link NOT_WIRED, blocker found, **or test quality audit found blockers (disabled requirement tests, circular tests)**:
|
||||||
→ **gaps_found**
|
→ **gaps_found**
|
||||||
|
|
||||||
2. IF the previous step produced ANY human verification items:
|
2. IF the previous step produced ANY human verification items:
|
||||||
@@ -222,7 +310,7 @@ If gaps_found:
|
|||||||
|
|
||||||
2. **Generate plan per cluster:** Objective, 2-3 tasks (files/action/verify each), re-verify step. Keep focused: single concern per plan.
|
2. **Generate plan per cluster:** Objective, 2-3 tasks (files/action/verify each), re-verify step. Keep focused: single concern per plan.
|
||||||
|
|
||||||
3. **Order by dependency:** Fix missing → fix stubs → fix wiring → verify.
|
3. **Order by dependency:** Fix missing → fix stubs → fix wiring → **fix test evidence** → verify.
|
||||||
</step>
|
</step>
|
||||||
|
|
||||||
<step name="create_report">
|
<step name="create_report">
|
||||||
@@ -253,6 +341,7 @@ Orchestrator routes: `passed` → update_roadmap | `gaps_found` → create/execu
|
|||||||
- [ ] All key links verified
|
- [ ] All key links verified
|
||||||
- [ ] Requirements coverage assessed (if applicable)
|
- [ ] Requirements coverage assessed (if applicable)
|
||||||
- [ ] Anti-patterns scanned and categorized
|
- [ ] Anti-patterns scanned and categorized
|
||||||
|
- [ ] Test quality audited (disabled tests, circular patterns, assertion strength, provenance)
|
||||||
- [ ] Human verification items identified
|
- [ ] Human verification items identified
|
||||||
- [ ] Overall status determined
|
- [ ] Overall status determined
|
||||||
- [ ] Fix plans generated (if gaps_found)
|
- [ ] Fix plans generated (if gaps_found)
|
||||||
|
|||||||
204
tests/verify-test-quality.test.cjs
Normal file
204
tests/verify-test-quality.test.cjs
Normal file
@@ -0,0 +1,204 @@
|
|||||||
|
/**
|
||||||
|
* Tests for the audit_test_quality step in verify-phase.md
|
||||||
|
*
|
||||||
|
* Validates that the verifier's test quality audit detects:
|
||||||
|
* - Disabled tests (it.skip) covering requirements
|
||||||
|
* - Circular tests (system generating its own expected values)
|
||||||
|
* - Weak assertions on requirement-linked tests
|
||||||
|
*/
|
||||||
|
|
||||||
|
const { describe, test, beforeEach, afterEach } = require('node:test');
|
||||||
|
const assert = require('node:assert/strict');
|
||||||
|
const fs = require('fs');
|
||||||
|
const path = require('path');
|
||||||
|
const { createTempProject, cleanup } = require('./helpers.cjs');
|
||||||
|
|
||||||
|
describe('audit_test_quality step', () => {
|
||||||
|
let tmpDir;
|
||||||
|
|
||||||
|
beforeEach(() => {
|
||||||
|
tmpDir = createTempProject();
|
||||||
|
});
|
||||||
|
|
||||||
|
afterEach(() => {
|
||||||
|
cleanup(tmpDir);
|
||||||
|
});
|
||||||
|
|
||||||
|
describe('disabled test detection', () => {
|
||||||
|
test('detects it.skip in test files', () => {
|
||||||
|
const testDir = path.join(tmpDir, 'tests');
|
||||||
|
fs.mkdirSync(testDir, { recursive: true });
|
||||||
|
|
||||||
|
fs.writeFileSync(path.join(testDir, 'parity.test.js'), [
|
||||||
|
'describe("parity", () => {',
|
||||||
|
' it.skip("matches PHP output", async () => {',
|
||||||
|
' expect(result).toBeCloseTo(155.96, 2);',
|
||||||
|
' });',
|
||||||
|
'});',
|
||||||
|
].join('\n'));
|
||||||
|
|
||||||
|
const content = fs.readFileSync(path.join(testDir, 'parity.test.js'), 'utf8');
|
||||||
|
const skipPatterns = /it\.skip|describe\.skip|test\.skip|xit\(|xdescribe\(|xtest\(/g;
|
||||||
|
const matches = content.match(skipPatterns);
|
||||||
|
|
||||||
|
assert.ok(matches, 'Should detect skip patterns');
|
||||||
|
assert.strictEqual(matches.length, 1);
|
||||||
|
});
|
||||||
|
|
||||||
|
test('detects multiple skip patterns across frameworks', () => {
|
||||||
|
const testDir = path.join(tmpDir, 'tests');
|
||||||
|
fs.mkdirSync(testDir, { recursive: true });
|
||||||
|
|
||||||
|
fs.writeFileSync(path.join(testDir, 'multi.test.js'), [
|
||||||
|
'describe.skip("suite", () => {});',
|
||||||
|
'xit("old jasmine", () => {});',
|
||||||
|
'test.skip("jest skip", () => {});',
|
||||||
|
'it.todo("not implemented");',
|
||||||
|
].join('\n'));
|
||||||
|
|
||||||
|
const content = fs.readFileSync(path.join(testDir, 'multi.test.js'), 'utf8');
|
||||||
|
const skipPatterns = /it\.skip|describe\.skip|test\.skip|xit\(|xdescribe\(|xtest\(|it\.todo|test\.todo/g;
|
||||||
|
const matches = content.match(skipPatterns);
|
||||||
|
|
||||||
|
assert.ok(matches, 'Should detect all skip variants');
|
||||||
|
assert.strictEqual(matches.length, 4);
|
||||||
|
});
|
||||||
|
|
||||||
|
test('does not flag active tests as skipped', () => {
|
||||||
|
const testDir = path.join(tmpDir, 'tests');
|
||||||
|
fs.mkdirSync(testDir, { recursive: true });
|
||||||
|
|
||||||
|
fs.writeFileSync(path.join(testDir, 'active.test.js'), [
|
||||||
|
'describe("active suite", () => {',
|
||||||
|
' it("does the thing", () => {',
|
||||||
|
' expect(result).toBe(true);',
|
||||||
|
' });',
|
||||||
|
' test("also works", () => {',
|
||||||
|
' expect(other).toBe(42);',
|
||||||
|
' });',
|
||||||
|
'});',
|
||||||
|
].join('\n'));
|
||||||
|
|
||||||
|
const content = fs.readFileSync(path.join(testDir, 'active.test.js'), 'utf8');
|
||||||
|
const skipPatterns = /it\.skip|describe\.skip|test\.skip|xit\(|xdescribe\(|xtest\(|it\.todo|test\.todo/g;
|
||||||
|
const matches = content.match(skipPatterns);
|
||||||
|
|
||||||
|
assert.strictEqual(matches, null, 'Active tests should not match skip patterns');
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
describe('circular test detection', () => {
|
||||||
|
test('detects script that imports system-under-test and writes fixtures', () => {
|
||||||
|
const testDir = path.join(tmpDir, 'tests');
|
||||||
|
fs.mkdirSync(testDir, { recursive: true });
|
||||||
|
|
||||||
|
fs.writeFileSync(path.join(testDir, 'captureBaseline.js'), [
|
||||||
|
'import { CalculationService } from "../server/services/calculationService.js";',
|
||||||
|
'import { writeFileSync } from "fs";',
|
||||||
|
'',
|
||||||
|
'const result = await CalculationService.execute(input);',
|
||||||
|
'fixture.expectedOutput = result.value;',
|
||||||
|
'writeFileSync("fixtures/data.json", JSON.stringify(fixture));',
|
||||||
|
].join('\n'));
|
||||||
|
|
||||||
|
const content = fs.readFileSync(path.join(testDir, 'captureBaseline.js'), 'utf8');
|
||||||
|
|
||||||
|
const importsSystem = /import.*(?:Service|Engine|Calculator|Controller)/.test(content);
|
||||||
|
const writesFiles = /writeFileSync|writeFile|fs\.write/.test(content);
|
||||||
|
|
||||||
|
assert.ok(importsSystem, 'Should detect system-under-test import');
|
||||||
|
assert.ok(writesFiles, 'Should detect file writing');
|
||||||
|
assert.ok(importsSystem && writesFiles, 'Script that imports SUT and writes fixtures is CIRCULAR');
|
||||||
|
});
|
||||||
|
|
||||||
|
test('does not flag test helpers that only read fixtures', () => {
|
||||||
|
const testDir = path.join(tmpDir, 'tests');
|
||||||
|
fs.mkdirSync(testDir, { recursive: true });
|
||||||
|
|
||||||
|
fs.writeFileSync(path.join(testDir, 'loadFixtures.js'), [
|
||||||
|
'import { readFileSync } from "fs";',
|
||||||
|
'export function loadFixture(name) {',
|
||||||
|
' return JSON.parse(readFileSync(`fixtures/${name}.json`, "utf8"));',
|
||||||
|
'}',
|
||||||
|
].join('\n'));
|
||||||
|
|
||||||
|
const content = fs.readFileSync(path.join(testDir, 'loadFixtures.js'), 'utf8');
|
||||||
|
|
||||||
|
const importsSystem = /import.*(?:Service|Engine|Calculator|Controller)/.test(content);
|
||||||
|
const writesFiles = /writeFileSync|writeFile|fs\.write/.test(content);
|
||||||
|
|
||||||
|
assert.ok(!importsSystem, 'Should not flag read-only helper as importing SUT');
|
||||||
|
assert.ok(!writesFiles, 'Should not flag read-only helper as writing files');
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
describe('assertion strength classification', () => {
|
||||||
|
test('classifies existence-only assertions as INSUFFICIENT for value requirements', () => {
|
||||||
|
const assertions = [
|
||||||
|
'expect(result).toBeDefined()',
|
||||||
|
'expect(result).not.toBeNull()',
|
||||||
|
'assert.ok(result)',
|
||||||
|
];
|
||||||
|
|
||||||
|
const existencePattern = /toBeDefined|not\.toBeNull|assert\.ok\(/;
|
||||||
|
const valuePattern = /toEqual|toBeCloseTo|strictEqual|deepStrictEqual/;
|
||||||
|
|
||||||
|
for (const assertion of assertions) {
|
||||||
|
assert.ok(existencePattern.test(assertion), `"${assertion}" should match existence pattern`);
|
||||||
|
assert.ok(!valuePattern.test(assertion), `"${assertion}" should NOT match value pattern`);
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
test('classifies value assertions as sufficient', () => {
|
||||||
|
const assertions = [
|
||||||
|
'expect(result).toBeCloseTo(155.96, 2)',
|
||||||
|
'expect(result).toEqual({ amount: 100 })',
|
||||||
|
'assert.strictEqual(result, 42)',
|
||||||
|
];
|
||||||
|
|
||||||
|
const valuePattern = /toEqual|toBeCloseTo|strictEqual|deepStrictEqual/;
|
||||||
|
|
||||||
|
for (const assertion of assertions) {
|
||||||
|
assert.ok(valuePattern.test(assertion), `"${assertion}" should match value pattern`);
|
||||||
|
}
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
describe('provenance classification', () => {
|
||||||
|
test('fixture with legacy system comment classified as VALID', () => {
|
||||||
|
const fixture = {
|
||||||
|
legacyId: 10341,
|
||||||
|
comment: 'Real PHP fixture - output from legacy system',
|
||||||
|
dbDependent: true,
|
||||||
|
expectedOutput: { value: 155.96 },
|
||||||
|
};
|
||||||
|
|
||||||
|
const hasLegacySource = /legacy|php|real|manual|captured from/i.test(fixture.comment || '');
|
||||||
|
assert.ok(hasLegacySource, 'Comment referencing legacy system = VALID provenance');
|
||||||
|
});
|
||||||
|
|
||||||
|
test('fixture with synthetic/baseline comment classified as SUSPECT', () => {
|
||||||
|
const fixture = {
|
||||||
|
legacyId: null,
|
||||||
|
comment: 'Synthetic offline fixture - computed from known algorithm',
|
||||||
|
dbDependent: false,
|
||||||
|
expectedOutput: { value: 1240.68 },
|
||||||
|
};
|
||||||
|
|
||||||
|
const hasSyntheticSource = /synthetic|computed|baseline|generated|captured from engine/i.test(fixture.comment || '');
|
||||||
|
const hasLegacySource = /legacy|php|real output|manual capture/i.test(fixture.comment || '');
|
||||||
|
|
||||||
|
assert.ok(hasSyntheticSource, 'Comment indicating synthetic source detected');
|
||||||
|
assert.ok(!hasLegacySource, 'Should NOT be classified as legacy source');
|
||||||
|
});
|
||||||
|
|
||||||
|
test('fixture with no comment classified as UNKNOWN', () => {
|
||||||
|
const fixture = {
|
||||||
|
expectedOutput: { value: 42 },
|
||||||
|
};
|
||||||
|
|
||||||
|
const hasAnyProvenance = (fixture.comment || '').length > 0;
|
||||||
|
assert.ok(!hasAnyProvenance, 'No comment = UNKNOWN provenance');
|
||||||
|
});
|
||||||
|
});
|
||||||
|
});
|
||||||
Reference in New Issue
Block a user