mirror of
https://github.com/glittercowboy/get-shit-done
synced 2026-04-25 17:25:23 +02:00
feat: add test quality audit step to verify-phase workflow

Adds a new `audit_test_quality` step between `scan_antipatterns` and `identify_human_verification` that catches test-level deceptions:

- Disabled tests (it.skip/xit/test.todo) covering phase requirements
- Circular tests (system generating its own expected values)
- Weak assertions (existence-only when value-level proof needed)
- Expected value provenance tracking for parity/migration phases

Any blocker from this audit forces `gaps_found` status, preventing phases from being marked complete with inadequate test evidence.

Fixes #1457

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -13,6 +13,7 @@ Goal-backward verification:

1. What must be TRUE for the goal to be achieved?
2. What must EXIST for those truths to hold?
3. What must be WIRED for those artifacts to function?
4. What must TESTS PROVE for those truths to be evidenced?

Then verify each level against the actual codebase.

</core_principle>
@@ -190,6 +191,93 @@ Extract files modified in this phase from SUMMARY.md, scan each:

Categorize: 🛑 Blocker (prevents goal) | ⚠️ Warning (incomplete) | ℹ️ Info (notable).

</step>

<step name="audit_test_quality">

**Verify that tests PROVE what they claim to prove.**

This step catches test-level deceptions that pass all prior checks: files exist, are substantive, are wired, and tests pass — but the tests don't actually validate the requirement.

**1. Identify requirement-linked test files**

From PLAN and SUMMARY files, map each requirement to the test files that are supposed to prove it.

**2. Disabled test scan**

For ALL test files linked to requirements, search for disabled/skipped patterns:

```bash
grep -rn -E "it\.skip|describe\.skip|test\.skip|xit\(|xdescribe\(|xtest\(|@pytest\.mark\.skip|@unittest\.skip|#\[ignore\]|\.pending|it\.todo|test\.todo" "$TEST_FILE"
```

**Rule:** A disabled test linked to a requirement = requirement NOT tested.

- 🛑 BLOCKER if the disabled test is the only test proving that requirement
- ⚠️ WARNING if other active tests also cover the requirement
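The blocker/warning rule above can be sketched as a small classifier. This is an illustrative sketch, not part of the workflow: `classifyDisabledTests` and the input shape are assumptions.

```javascript
// Illustrative sketch: classify one requirement's disabled-test findings.
// A requirement whose ONLY linked tests are disabled is untested (BLOCKER);
// if at least one active test remains, the skip is downgraded to WARNING.
const SKIP_PATTERNS =
  /it\.skip|describe\.skip|test\.skip|xit\(|xdescribe\(|xtest\(|it\.todo|test\.todo/;

function classifyDisabledTests(linkedTests) {
  // linkedTests: [{ name, source }] — every test mapped to one requirement
  const disabled = linkedTests.filter((t) => SKIP_PATTERNS.test(t.source));
  if (disabled.length === 0) return 'OK';
  const active = linkedTests.length - disabled.length;
  return active === 0 ? 'BLOCKER' : 'WARNING';
}
```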
**3. Circular test detection**

Search for scripts/utilities that generate expected values by running the system under test:

```bash
grep -rn -E "writeFileSync|writeFile|fs\.write|open\(.*w\)" "$TEST_DIRS"
```

For each match, check if it also imports the system/service/module being tested. If a script both imports the system-under-test AND writes expected output values → CIRCULAR.

**Circular test indicators:**

- Script imports a service AND writes to fixture files
- Expected values have comments like "computed from engine", "captured from baseline"
- Script filename contains "capture", "baseline", "generate", "snapshot" in test context
- Expected values were added in the same commit as the test assertions

**Rule:** A test comparing system output against values generated by the same system is circular. It proves consistency, not correctness.
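A minimal sketch of that two-signal check. The import heuristic (matching `Service`/`Engine`/`Calculator`/`Controller` names) is an assumption; real system-under-test module names vary by project.

```javascript
// Heuristic circular-test check: a script is flagged CIRCULAR when it both
// imports something that looks like the system-under-test AND writes files
// (i.e. it may be generating its own expected values).
const IMPORTS_SUT =
  /(?:import|require\().*(?:Service|Engine|Calculator|Controller)/;
const WRITES_FILES = /writeFileSync|writeFile|fs\.write/;

function isCircularScript(source) {
  return IMPORTS_SUT.test(source) && WRITES_FILES.test(source);
}
```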
**4. Expected value provenance** (for comparison/parity/migration requirements)

When a requirement demands comparison with an external source ("identical to X", "matches Y", "same output as Z"):

- Is the external source actually invoked or referenced in the test pipeline?
- Do fixture files contain data sourced from the external system?
- Or do all expected values come from the new system itself or from mathematical formulas?

**Provenance classification:**

- VALID: Expected value from external/legacy system output, manual capture, or independent oracle
- PARTIAL: Expected value from mathematical derivation (proves formula, not system match)
- CIRCULAR: Expected value from the system being tested
- UNKNOWN: No provenance information — treat as SUSPECT
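One way to mechanize the classification from a fixture's comment field, as a sketch. The keyword lists are illustrative assumptions; adapt them to the project's fixture conventions.

```javascript
// Illustrative sketch: map a fixture's comment to a provenance class.
// Checked in priority order: explicit legacy/external markers win, then
// circular markers, then formula-derivation markers; anything else is UNKNOWN.
function classifyProvenance(fixture) {
  const comment = fixture.comment || '';
  if (comment.length === 0) return 'UNKNOWN';
  if (/legacy|real output|manual capture|from PHP/i.test(comment)) return 'VALID';
  if (/computed from engine|captured from baseline|generated by/i.test(comment)) return 'CIRCULAR';
  if (/derived|formula|computed from known algorithm/i.test(comment)) return 'PARTIAL';
  return 'UNKNOWN';
}
```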
**5. Assertion strength**

For each test linked to a requirement, classify the strongest assertion:

| Level | Examples | Proves |
|-------|----------|--------|
| Existence | `toBeDefined()`, `!= null` | Something returned |
| Type | `typeof x === 'number'` | Correct shape |
| Status | `code === 200` | No error |
| Value | `toEqual(expected)`, `toBeCloseTo(x)` | Specific value |
| Behavioral | Multi-step workflow assertions | End-to-end correctness |

If a requirement demands value-level or behavioral-level proof and the test only has existence/type/status assertions → INSUFFICIENT.
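The table can be approximated with a strongest-first regex scan, sketched below. Pattern lists are assumptions (Jest/node:assert style); the behavioral level is omitted because detecting multi-step workflow assertions needs more than pattern matching.

```javascript
// Illustrative sketch: return the strongest assertion level found in a test
// body, scanning from strongest to weakest so the best evidence wins.
const ASSERTION_LEVELS = [
  ['value', /toEqual|toBeCloseTo|toStrictEqual|strictEqual|deepStrictEqual/],
  ['status', /(?:status|code)\s*===?\s*\d{3}/],
  ['type', /typeof\s|toBeInstanceOf|instanceof/],
  ['existence', /toBeDefined|not\.toBeNull|toBeTruthy|assert\.ok\(/],
];

function strongestAssertion(testSource) {
  for (const [level, pattern] of ASSERTION_LEVELS) {
    if (pattern.test(testSource)) return level;
  }
  return 'none';
}
```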
**6. Coverage quantity**

If a requirement specifies a quantity of test cases (e.g., "30 calculations"), check if the actual number of active (non-skipped) test cases meets the requirement.
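The active-case count can be sketched as below. This is a rough regex count of `it(`/`test(` openings; a real implementation would use an AST to handle nesting and string literals.

```javascript
// Rough sketch: count active (non-skipped) test cases in a file and compare
// against a required quantity. Counts every it(/test( opening, including
// skipped variants, then subtracts the skipped/todo ones.
function meetsQuantity(source, required) {
  const all = (source.match(/\b(?:it|test)(?:\.(?:skip|todo|only))?\s*\(/g) || []).length;
  const skipped = (source.match(/\b(?:it|test)\.(?:skip|todo)\s*\(/g) || []).length;
  const active = all - skipped;
  return { active, meets: active >= required };
}
```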
**Reporting — add to VERIFICATION.md:**

```markdown
### Test Quality Audit

| Test File | Linked Req | Active | Skipped | Circular | Assertion Level | Verdict |
|-----------|-----------|--------|---------|----------|----------------|---------|

**Disabled tests on requirements:** {N} → {BLOCKER if any req has ONLY disabled tests}
**Circular patterns detected:** {N} → {BLOCKER if any}
**Insufficient assertions:** {N} → {WARNING}
```

**Impact on status:** Any BLOCKER from test quality audit → overall status = `gaps_found`, regardless of other checks passing.

</step>
<step name="identify_human_verification">

**Always needs human:** Visual appearance, user flow completion, real-time behavior (WebSocket/SSE), external service integration, performance feel, error message clarity.

@@ -201,7 +289,7 @@ Format each as: Test Name → What to do → Expected result → Why can't verif
<step name="determine_status">

Classify status using this decision tree IN ORDER (most restrictive first):

- 1. IF any truth FAILED, artifact MISSING/STUB, key link NOT_WIRED, or blocker found:
+ 1. IF any truth FAILED, artifact MISSING/STUB, key link NOT_WIRED, blocker found, **or test quality audit found blockers (disabled requirement tests, circular tests)**:
   → **gaps_found**

2. IF the previous step produced ANY human verification items:
@@ -222,7 +310,7 @@ If gaps_found:

2. **Generate plan per cluster:** Objective, 2-3 tasks (files/action/verify each), re-verify step. Keep focused: single concern per plan.

- 3. **Order by dependency:** Fix missing → fix stubs → fix wiring → verify.
+ 3. **Order by dependency:** Fix missing → fix stubs → fix wiring → **fix test evidence** → verify.

</step>
<step name="create_report">

@@ -253,6 +341,7 @@ Orchestrator routes: `passed` → update_roadmap | `gaps_found` → create/execu

- [ ] All key links verified
- [ ] Requirements coverage assessed (if applicable)
- [ ] Anti-patterns scanned and categorized
- [ ] Test quality audited (disabled tests, circular patterns, assertion strength, provenance)
- [ ] Human verification items identified
- [ ] Overall status determined
- [ ] Fix plans generated (if gaps_found)
204
tests/verify-test-quality.test.cjs
Normal file
@@ -0,0 +1,204 @@
/**
 * Tests for the audit_test_quality step in verify-phase.md
 *
 * Validates that the verifier's test quality audit detects:
 * - Disabled tests (it.skip) covering requirements
 * - Circular tests (system generating its own expected values)
 * - Weak assertions on requirement-linked tests
 */

const { describe, test, beforeEach, afterEach } = require('node:test');
const assert = require('node:assert/strict');
const fs = require('fs');
const path = require('path');
const { createTempProject, cleanup } = require('./helpers.cjs');
describe('audit_test_quality step', () => {
  let tmpDir;

  beforeEach(() => {
    tmpDir = createTempProject();
  });

  afterEach(() => {
    cleanup(tmpDir);
  });
  describe('disabled test detection', () => {
    test('detects it.skip in test files', () => {
      const testDir = path.join(tmpDir, 'tests');
      fs.mkdirSync(testDir, { recursive: true });

      fs.writeFileSync(path.join(testDir, 'parity.test.js'), [
        'describe("parity", () => {',
        '  it.skip("matches PHP output", async () => {',
        '    expect(result).toBeCloseTo(155.96, 2);',
        '  });',
        '});',
      ].join('\n'));

      const content = fs.readFileSync(path.join(testDir, 'parity.test.js'), 'utf8');
      const skipPatterns = /it\.skip|describe\.skip|test\.skip|xit\(|xdescribe\(|xtest\(/g;
      const matches = content.match(skipPatterns);

      assert.ok(matches, 'Should detect skip patterns');
      assert.strictEqual(matches.length, 1);
    });

    test('detects multiple skip patterns across frameworks', () => {
      const testDir = path.join(tmpDir, 'tests');
      fs.mkdirSync(testDir, { recursive: true });

      fs.writeFileSync(path.join(testDir, 'multi.test.js'), [
        'describe.skip("suite", () => {});',
        'xit("old jasmine", () => {});',
        'test.skip("jest skip", () => {});',
        'it.todo("not implemented");',
      ].join('\n'));

      const content = fs.readFileSync(path.join(testDir, 'multi.test.js'), 'utf8');
      const skipPatterns = /it\.skip|describe\.skip|test\.skip|xit\(|xdescribe\(|xtest\(|it\.todo|test\.todo/g;
      const matches = content.match(skipPatterns);

      assert.ok(matches, 'Should detect all skip variants');
      assert.strictEqual(matches.length, 4);
    });

    test('does not flag active tests as skipped', () => {
      const testDir = path.join(tmpDir, 'tests');
      fs.mkdirSync(testDir, { recursive: true });

      fs.writeFileSync(path.join(testDir, 'active.test.js'), [
        'describe("active suite", () => {',
        '  it("does the thing", () => {',
        '    expect(result).toBe(true);',
        '  });',
        '  test("also works", () => {',
        '    expect(other).toBe(42);',
        '  });',
        '});',
      ].join('\n'));

      const content = fs.readFileSync(path.join(testDir, 'active.test.js'), 'utf8');
      const skipPatterns = /it\.skip|describe\.skip|test\.skip|xit\(|xdescribe\(|xtest\(|it\.todo|test\.todo/g;
      const matches = content.match(skipPatterns);

      assert.strictEqual(matches, null, 'Active tests should not match skip patterns');
    });
  });
  describe('circular test detection', () => {
    test('detects script that imports system-under-test and writes fixtures', () => {
      const testDir = path.join(tmpDir, 'tests');
      fs.mkdirSync(testDir, { recursive: true });

      fs.writeFileSync(path.join(testDir, 'captureBaseline.js'), [
        'import { CalculationService } from "../server/services/calculationService.js";',
        'import { writeFileSync } from "fs";',
        '',
        'const result = await CalculationService.execute(input);',
        'fixture.expectedOutput = result.value;',
        'writeFileSync("fixtures/data.json", JSON.stringify(fixture));',
      ].join('\n'));

      const content = fs.readFileSync(path.join(testDir, 'captureBaseline.js'), 'utf8');

      const importsSystem = /import.*(?:Service|Engine|Calculator|Controller)/.test(content);
      const writesFiles = /writeFileSync|writeFile|fs\.write/.test(content);

      assert.ok(importsSystem, 'Should detect system-under-test import');
      assert.ok(writesFiles, 'Should detect file writing');
      assert.ok(importsSystem && writesFiles, 'Script that imports SUT and writes fixtures is CIRCULAR');
    });

    test('does not flag test helpers that only read fixtures', () => {
      const testDir = path.join(tmpDir, 'tests');
      fs.mkdirSync(testDir, { recursive: true });

      fs.writeFileSync(path.join(testDir, 'loadFixtures.js'), [
        'import { readFileSync } from "fs";',
        'export function loadFixture(name) {',
        '  return JSON.parse(readFileSync(`fixtures/${name}.json`, "utf8"));',
        '}',
      ].join('\n'));

      const content = fs.readFileSync(path.join(testDir, 'loadFixtures.js'), 'utf8');

      const importsSystem = /import.*(?:Service|Engine|Calculator|Controller)/.test(content);
      const writesFiles = /writeFileSync|writeFile|fs\.write/.test(content);

      assert.ok(!importsSystem, 'Should not flag read-only helper as importing SUT');
      assert.ok(!writesFiles, 'Should not flag read-only helper as writing files');
    });
  });
  describe('assertion strength classification', () => {
    test('classifies existence-only assertions as INSUFFICIENT for value requirements', () => {
      const assertions = [
        'expect(result).toBeDefined()',
        'expect(result).not.toBeNull()',
        'assert.ok(result)',
      ];

      const existencePattern = /toBeDefined|not\.toBeNull|assert\.ok\(/;
      const valuePattern = /toEqual|toBeCloseTo|strictEqual|deepStrictEqual/;

      for (const assertion of assertions) {
        assert.ok(existencePattern.test(assertion), `"${assertion}" should match existence pattern`);
        assert.ok(!valuePattern.test(assertion), `"${assertion}" should NOT match value pattern`);
      }
    });

    test('classifies value assertions as sufficient', () => {
      const assertions = [
        'expect(result).toBeCloseTo(155.96, 2)',
        'expect(result).toEqual({ amount: 100 })',
        'assert.strictEqual(result, 42)',
      ];

      const valuePattern = /toEqual|toBeCloseTo|strictEqual|deepStrictEqual/;

      for (const assertion of assertions) {
        assert.ok(valuePattern.test(assertion), `"${assertion}" should match value pattern`);
      }
    });
  });
  describe('provenance classification', () => {
    test('fixture with legacy system comment classified as VALID', () => {
      const fixture = {
        legacyId: 10341,
        comment: 'Real PHP fixture - output from legacy system',
        dbDependent: true,
        expectedOutput: { value: 155.96 },
      };

      const hasLegacySource = /legacy|php|real|manual|captured from/i.test(fixture.comment || '');
      assert.ok(hasLegacySource, 'Comment referencing legacy system = VALID provenance');
    });

    test('fixture with synthetic/baseline comment classified as SUSPECT', () => {
      const fixture = {
        legacyId: null,
        comment: 'Synthetic offline fixture - computed from known algorithm',
        dbDependent: false,
        expectedOutput: { value: 1240.68 },
      };

      const hasSyntheticSource = /synthetic|computed|baseline|generated|captured from engine/i.test(fixture.comment || '');
      const hasLegacySource = /legacy|php|real output|manual capture/i.test(fixture.comment || '');

      assert.ok(hasSyntheticSource, 'Comment indicating synthetic source detected');
      assert.ok(!hasLegacySource, 'Should NOT be classified as legacy source');
    });

    test('fixture with no comment classified as UNKNOWN', () => {
      const fixture = {
        expectedOutput: { value: 42 },
      };

      const hasAnyProvenance = (fixture.comment || '').length > 0;
      assert.ok(!hasAnyProvenance, 'No comment = UNKNOWN provenance');
    });
  });
});