test(resilience): address PR #2944 review (exhaustive taxonomy sanity check)

Addresses the P2 comment from Greptile on PR #2944:

> "The invariant is verified only for IMPUTATION.crisis_monitoring_absent
> vs IMPUTATION.curated_list_absent, so a future IMPUTE entry that
> violates the ordering (e.g. a stable-absence override with a score
> below an unmonitored one) would not be caught here. Extending the
> loop to group all stable-absence and unmonitored entries from both
> tables and assert the min stable-absence > max unmonitored would
> make the guard exhaustive."

Fair. The earlier test only pinned the two base entries, so a
regression in a per-metric IMPUTE override (the exact layer most
likely to drift over time) would have slipped through CI.

This commit rewrites the semantic-sanity test to:

1. Collect every entry from both IMPUTATION and IMPUTE into a single
   labeled list.
2. Partition by imputationClass.
3. Assert min stable-absence score > max unmonitored score.
4. Assert min stable-absence certaintyCoverage > max unmonitored
   certaintyCoverage.
5. Report the offending entries in the failure message so a reviewer
   who hits the failure immediately sees which entry broke the
   invariant and from which table.

Under the current values the invariant holds:

- stable-absence scores: IMPUTATION.crisis_monitoring_absent=85,
  IMPUTE.ipcFood=88, IMPUTE.unhcrDisplacement=85.
- unmonitored scores: IMPUTATION.curated_list_absent=50,
  IMPUTE.wtoData=60, IMPUTE.bisEer=50, IMPUTE.bisCredit=50.
- min stable-absence = 85 > max unmonitored = 60, so the check passes
  with a 25-point gap.

Testing:

- npx tsx --test tests/resilience-dimension-scorers.test.mts: 51/51 pass
  (46 existing + 5 taxonomy tests, including the rewritten
  semantic-sanity check)
- npm run typecheck: clean

Generated with Claude Opus 4.6 (1M context) via Claude Code
+ Compound Engineering v2.49.0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-25 17:14:57 +02:00 · 2026-04-11 14:26:33 +04:00
parent 61ca872b9a
commit 0f3d05ac04
1 changed file with 36 additions and 7 deletions
--- a/tests/resilience-dimension-scorers.test.mts
+++ b/tests/resilience-dimension-scorers.test.mts
@@ -755,18 +755,47 @@ describe('resilience imputation taxonomy (T1.7)', () => {
    }
  });

-  it('stable-absence entries score higher than unmonitored (semantic sanity)', () => {
+  it('stable-absence entries score higher than unmonitored, across BOTH tables (semantic sanity)', () => {
    // stable-absence = strong positive signal (feed is comprehensive,
    // nothing happened). unmonitored = we do not know, penalized.
-    // If this assertion ever fails, the semantic meaning of the classes
-    // has drifted and the taxonomy needs to be re-argued.
+    // The invariant must hold across every entry in both IMPUTATION and
+    // IMPUTE, otherwise a per-metric override can silently break the
+    // ordering (e.g. a `stable-absence` override with a score lower than
+    // an `unmonitored` entry would pass a base-entries-only check but violate
+    // the taxonomy's semantic meaning).
+    //
+    // Raised in review of PR #2944: the earlier version of this test
+    // only checked the two base entries in IMPUTATION and would have
+    // missed a regression in an IMPUTE override.
+    const allEntries = [
+      ...Object.entries(IMPUTATION).map(([k, v]) => ({ label: `IMPUTATION.${k}`, entry: v })),
+      ...Object.entries(IMPUTE).map(([k, v]) => ({ label: `IMPUTE.${k}`, entry: v })),
+    ];
+
+    const stableAbsence = allEntries.filter((e) => e.entry.imputationClass === 'stable-absence');
+    const unmonitored = allEntries.filter((e) => e.entry.imputationClass === 'unmonitored');
+
+    assert.ok(stableAbsence.length > 0, 'expected at least one stable-absence entry across both tables');
+    assert.ok(unmonitored.length > 0, 'expected at least one unmonitored entry across both tables');
+
+    const minStableScore = Math.min(...stableAbsence.map((e) => e.entry.score));
+    const maxUnmonitoredScore = Math.max(...unmonitored.map((e) => e.entry.score));
    assert.ok(
-      IMPUTATION.crisis_monitoring_absent.score > IMPUTATION.curated_list_absent.score,
-      `stable-absence score (${IMPUTATION.crisis_monitoring_absent.score}) should be higher than unmonitored (${IMPUTATION.curated_list_absent.score})`,
+      minStableScore > maxUnmonitoredScore,
+      `every stable-absence entry must score higher than every unmonitored entry. ` +
+      `min stable-absence score = ${minStableScore}, max unmonitored score = ${maxUnmonitoredScore}. ` +
+      `stable-absence entries: ${stableAbsence.map((e) => `${e.label}=${e.entry.score}`).join(', ')}. ` +
+      `unmonitored entries: ${unmonitored.map((e) => `${e.label}=${e.entry.score}`).join(', ')}.`,
    );
+
+    const minStableCertainty = Math.min(...stableAbsence.map((e) => e.entry.certaintyCoverage));
+    const maxUnmonitoredCertainty = Math.max(...unmonitored.map((e) => e.entry.certaintyCoverage));
    assert.ok(
-      IMPUTATION.crisis_monitoring_absent.certaintyCoverage > IMPUTATION.curated_list_absent.certaintyCoverage,
-      `stable-absence certainty (${IMPUTATION.crisis_monitoring_absent.certaintyCoverage}) should be higher than unmonitored (${IMPUTATION.curated_list_absent.certaintyCoverage})`,
+      minStableCertainty > maxUnmonitoredCertainty,
+      `every stable-absence entry must have higher certaintyCoverage than every unmonitored entry. ` +
+      `min stable-absence certainty = ${minStableCertainty}, max unmonitored certainty = ${maxUnmonitoredCertainty}. ` +
+      `stable-absence entries: ${stableAbsence.map((e) => `${e.label}=${e.entry.certaintyCoverage}`).join(', ')}. ` +
+      `unmonitored entries: ${unmonitored.map((e) => `${e.label}=${e.entry.certaintyCoverage}`).join(', ')}.`,
    );
  });
 });
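
For readers without the repository at hand, the partition-and-compare check in the diff above can be sketched as a standalone script. This is a minimal illustration, not the project's actual code: the `Entry` shape, the `labelEntries` and `checkScoreOrdering` helpers, and the table layout are assumptions made for the example. Only the entry names, classes, and scores are taken from the commit message. The real test also applies the same comparison to `certaintyCoverage`, which is omitted here because the commit does not list those values.

```typescript
// Hypothetical stand-in for the real table entries: only the fields the check reads.
type ImputationClass = 'stable-absence' | 'unmonitored';
interface Entry { imputationClass: ImputationClass; score: number }
interface Labeled { label: string; entry: Entry }

// Scores and classes copied from the commit message; the table shapes are assumed.
const IMPUTATION: Record<string, Entry> = {
  crisis_monitoring_absent: { imputationClass: 'stable-absence', score: 85 },
  curated_list_absent: { imputationClass: 'unmonitored', score: 50 },
};
const IMPUTE: Record<string, Entry> = {
  ipcFood: { imputationClass: 'stable-absence', score: 88 },
  unhcrDisplacement: { imputationClass: 'stable-absence', score: 85 },
  wtoData: { imputationClass: 'unmonitored', score: 60 },
  bisEer: { imputationClass: 'unmonitored', score: 50 },
  bisCredit: { imputationClass: 'unmonitored', score: 50 },
};

// Step 1: tag every entry with the table it came from.
function labelEntries(tableName: string, table: Record<string, Entry>): Labeled[] {
  return Object.entries(table).map(([key, entry]) => ({ label: `${tableName}.${key}`, entry }));
}

interface OrderingResult {
  ok: boolean;
  minStable: number;
  maxUnmonitored: number;
  gap: number;
  detail: string;
}

// Steps 2, 3 and 5: partition by class, compare the extremes, describe the entries.
function checkScoreOrdering(entries: Labeled[]): OrderingResult {
  const stable = entries.filter((e) => e.entry.imputationClass === 'stable-absence');
  const unmonitored = entries.filter((e) => e.entry.imputationClass === 'unmonitored');

  // Math.min() of an empty list is Infinity, which would make the comparison
  // pass vacuously, so an empty partition is rejected up front.
  if (stable.length === 0 || unmonitored.length === 0) {
    throw new Error('expected at least one entry in each class');
  }

  const minStable = Math.min(...stable.map((e) => e.entry.score));
  const maxUnmonitored = Math.max(...unmonitored.map((e) => e.entry.score));
  const describe = (list: Labeled[]) => list.map((e) => `${e.label}=${e.entry.score}`).join(', ');

  return {
    ok: minStable > maxUnmonitored,
    minStable,
    maxUnmonitored,
    gap: minStable - maxUnmonitored,
    detail: `stable-absence: ${describe(stable)}. unmonitored: ${describe(unmonitored)}.`,
  };
}

const allEntries = [...labelEntries('IMPUTATION', IMPUTATION), ...labelEntries('IMPUTE', IMPUTE)];
const result = checkScoreOrdering(allEntries);
console.log(result.ok, result.minStable, result.maxUnmonitored, result.gap);
console.log(result.detail);
```

With the values listed in the commit message, the minimum stable-absence score is 85 and the maximum unmonitored score is 60, so `ok` is true and `gap` is 25. Adding a hypothetical stable-absence override with a score at or below 60 flips `ok` to false, which is the regression the base-entries-only version of the test could not detect.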