worldmonitor/tests/resilience-construct-invariants.test.mts
Elie Habib df392b0514 feat(resilience): PR 0 — cohort-sanity release-gate harness (#3369)
* feat(resilience): PR 0 — cohort-sanity release-gate harness

Lands the audit infrastructure for the resilience cohort-ranking
structural audit (plan 2026-04-24-002). Release gate, not merge gate:
the audit tells release review what to look at before publishing a
ranking; it does not block a PR.

What's new
- scripts/audit-resilience-cohorts.mjs — Markdown report generator.
  Fetches the live ranking + per-country scores (or reads a fixture
  in offline mode), emits per-cohort per-dimension tables, contribution
  decomposition, saturated / outlier / identical-score flags, and a
  top-N movers comparison vs a baseline snapshot.
- tests/resilience-construct-invariants.test.mts — 12 formula-level
  anchor-value assertions with synthetic inputs. Covers HHI, external
  debt (Greenspan-Guidotti anchor), and sovereign fiscal buffer
  (saturating transform). Tests the MATH, not a country's rank.
- tests/fixtures/resilience-audit-fixture.json — offline fixture that
  mirrors the 2026-04-24 GCC state (KW>QA>AE) so the audit tool can
  be smoke-tested without API-key access.
- docs/methodology/cohort-sanity-release-gate.md — operational doc
  explaining when to run, how to read the report, and the explicit
  anti-pattern note on rank-targeted acceptance criteria.
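
The anchor values those invariant tests pin can be reproduced with a few lines of arithmetic. The sketch below is not the scorer code: `normalizeLowerBetter`, `hhiScore`, `debtScore` and `bufferScore` are hypothetical names, and the linear, clamped goalpost shape is an assumption that happens to agree with the anchors quoted in the test file (HHI 0 → 100 and 0.5 → 0; debt ratio 0 / 1 / 2 → 100 / 50 / 0; `100 * (1 - exp(-em / 12))` for the fiscal buffer).

```typescript
// Hypothetical helpers that restate the formulas quoted in the test comments.
// They are NOT the real scorers in _dimension-scorers.ts.

// Linear goalpost normalisation for a lower-is-better input, clamped to [0, 100].
function normalizeLowerBetter(value: number, best: number, worst: number): number {
  const t = (value - best) / (worst - best);
  return 100 * (1 - Math.min(1, Math.max(0, t)));
}

// HHI is scaled by 10000 and normalised against the (0, 5000) goalpost.
function hhiScore(hhi: number): number {
  return normalizeLowerBetter(hhi * 10000, 0, 5000);
}

// Greenspan-Guidotti anchor: ratio 1.0 sits midway between best = 0 and worst = 2.
function debtScore(debtToReservesRatio: number): number {
  return normalizeLowerBetter(debtToReservesRatio, 0, 2);
}

// Saturating transform: 100 * (1 - exp(-em / 12)).
function bufferScore(effectiveMonths: number): number {
  return 100 * (1 - Math.exp(-effectiveMonths / 12));
}

console.log(hhiScore(0), hhiScore(0.5)); // 100 0
console.log(debtScore(0), debtScore(1), debtScore(2)); // 100 50 0
console.log(bufferScore(12).toFixed(1), bufferScore(24).toFixed(1)); // 63.2 86.5
```

Because every expected value follows from the formula alone, a failing anchor points at a changed transform or goalpost rather than at a change in any country's data.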

Verified
- `npx tsx --test tests/resilience-construct-invariants.test.mts` —
  12 pass (HHI, debt, SWF invariants all green against current scorer)
- `npm run test:data` — 6706 pass / 0 fail
- `FIXTURE=tests/fixtures/resilience-audit-fixture.json
   OUT=/tmp/audit.md node scripts/audit-resilience-cohorts.mjs`
  runs to completion and correctly flags:
  (a) coverage-outlier on AE.importConcentration (0.3 vs peers 1.0)
  (b) saturated-high on GCC.externalDebtCoverage (all 6 at 100)
  — the two top cohort-sanity findings from the plan.

Not in this PR
- The live-API baseline snapshot
  (docs/snapshots/resilience-ranking-live-pre-cohort-audit-2026-04-24.json)
  is deferred to a manual release-prep step: run
  `WORLDMONITOR_API_KEY=wm_xxx API_BASE=https://api.worldmonitor.app
   node scripts/freeze-resilience-ranking.mjs` before the first
  methodology PR (PR 1 HHI period widening) so its movers table has
  something to compare against.
- No scorer changes. No cache-prefix bumps. This PR is pure tooling.

* fix(resilience): fail-closed on fetch failures + pillar-combine formula mode

Addresses review P1 + P2 on PR #3369.

P1 — fetch-failure silent-drop.
Per-country score fetches that failed were logged to stderr, silently
stored as null, and then filtered out of cohort tables via
`codes.filter((cc) => scoreMap.get(cc))`. A transient 403/500 on the
very country carrying the ranking anomaly could produce a Markdown
report that looked valid — wrong failure mode for a release gate.

Fix:
- `fetchScoresConcurrent` now tracks failures in a dedicated Map and
  does NOT insert null placeholders; missing cohort members are
  computed against the requested cohort code set.
- The report has a blocker banner at top AND an always-rendered
  "Fetch failures / missing members" section (shown even when empty,
  so an operator learns to look).
- `STRICT=1` writes the report, then exits code 3 on any fetch
  failure or missing cohort member, code 4 on formula-mode drift,
  code 0 otherwise. Automation can differentiate the two.
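
A minimal sketch of the fail-closed pattern described above, under stated assumptions: `collectScores`, `missingMembers` and `strictExitCode` are hypothetical names, the `Score` shape is invented for illustration, and the real `fetchScoresConcurrent` is not shown in this commit message. Checking exit code 3 before exit code 4 when both conditions hold is also an assumption.

```typescript
type Score = { overallScore: number };
type FetchOutcome = { scores: Map<string, Score>; failures: Map<string, string> };

// Failures go into their own Map; no null placeholder is ever stored in `scores`.
async function collectScores(
  codes: string[],
  fetchOne: (cc: string) => Promise<Score>,
): Promise<FetchOutcome> {
  const scores = new Map<string, Score>();
  const failures = new Map<string, string>();
  await Promise.all(codes.map(async (cc) => {
    try {
      scores.set(cc, await fetchOne(cc));
    } catch (err) {
      failures.set(cc, err instanceof Error ? err.message : String(err));
    }
  }));
  return { scores, failures };
}

// Missing members are computed against the REQUESTED cohort, not against what came back.
function missingMembers(cohort: string[], scores: Map<string, Score>): string[] {
  return cohort.filter((cc) => !scores.has(cc));
}

// STRICT-style exit code: 3 = fetch failure or missing member, 4 = formula drift, 0 = clean.
function strictExitCode(failureCount: number, missingCount: number, formulaDrift: boolean): number {
  if (failureCount > 0 || missingCount > 0) return 3;
  if (formulaDrift) return 4;
  return 0;
}

// Demo with a stub fetcher that fails for one country.
const cohort = ['AE', 'KW', 'QA'];
collectScores(cohort, async (cc) => {
  if (cc === 'AE') throw new Error('HTTP 403');
  return { overallScore: 70 };
}).then(({ scores, failures }) => {
  const missing = missingMembers(cohort, scores);
  console.log(Array.from(failures.keys()), missing, strictExitCode(failures.size, missing.length, false));
});
```

Diffing against the requested cohort is what makes the gate fail closed: a country that silently drops out of the response still appears as missing.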

P2 — pillar-combine formula mode invalidates contribution rows.
`docs/methodology/cohort-sanity-release-gate.md:63` tells operators
to run this audit before activating `RESILIENCE_PILLAR_COMBINE_ENABLED`,
but the contribution decomposition is a domain-weighted roll-up that
is ONLY valid when `overallScore = sum(domain.score * domain.weight)`.
Once pillar combine is on, `overallScore = penalizedPillarScore(pillars)`
(non-linear in dim scores); decomposition rows become materially
misleading for exactly the release-gate scenario the doc prescribes.

Fix:
- Added `detectFormulaMode(scoreMap)` that takes countries with:
  (a) `sum(domain.weight)` within 0.05 of 1.0 (complete response), AND
  (b) every dim at `coverage ≥ 0.9` (stable share math)
  and compares `|Σ contributions - overallScore|` against
  `CONTRIB_TOLERANCE` (default 1.5). If > 50% of ≥ 3 eligible
  countries drift, pillar combine is flagged.
- Report emits a blocker banner at top, a "Formula mode" line in
  the header, and a "Formula-mode diagnostic" section with the first
  three offenders. Under `STRICT=1` exits code 4.
- Methodology doc updated: new "Fail-closed semantics" section,
  "Formula mode" operator guide, ENV table entries for STRICT +
  CONTRIB_TOLERANCE.
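
The detection rule can be sketched as follows. This is an illustration, not the script's code: `detectFormulaModeSketch` and the `CountryScore` / `Domain` / `Dim` shapes are hypothetical, and only the thresholds (weight sum within 0.05 of 1.0, coverage ≥ 0.9, tolerance 1.5, more than 50% of at least 3 eligible countries) come from the description above.

```typescript
type Dim = { coverage: number };
type Domain = { score: number; weight: number; dimensions: Dim[] };
type CountryScore = { overallScore: number; domains: Domain[] };

const WEIGHT_SUM_TOLERANCE = 0.05; // (a) complete response
const MIN_COVERAGE = 0.9;          // (b) stable share math
const MIN_ELIGIBLE = 3;            // need at least 3 eligible countries to decide

function detectFormulaModeSketch(scoreMap: Map<string, CountryScore>, contribTolerance = 1.5) {
  let eligible = 0;
  const offenders: string[] = [];
  scoreMap.forEach((s, cc) => {
    const weightSum = s.domains.reduce((acc, d) => acc + d.weight, 0);
    const complete = Math.abs(weightSum - 1) <= WEIGHT_SUM_TOLERANCE;
    const stable = s.domains.every((d) => d.dimensions.every((dim) => dim.coverage >= MIN_COVERAGE));
    if (!complete || !stable) return; // not eligible for the comparison
    eligible += 1;
    // Domain-weighted roll-up: valid only while overallScore is linear in domain scores.
    const contributions = s.domains.reduce((acc, d) => acc + d.score * d.weight, 0);
    if (Math.abs(contributions - s.overallScore) > contribTolerance) offenders.push(cc);
  });
  const pillarCombineSuspected = eligible >= MIN_ELIGIBLE && offenders.length / eligible > 0.5;
  return { eligible, offenders, pillarCombineSuspected };
}

// Demo: the roll-up is 80*0.5 + 60*0.5 = 70, so overallScore 70 is linear and 55 drifts.
const demoCountry = (overallScore: number): CountryScore => ({
  overallScore,
  domains: [
    { score: 80, weight: 0.5, dimensions: [{ coverage: 1 }] },
    { score: 60, weight: 0.5, dimensions: [{ coverage: 0.95 }] },
  ],
});
const demo = detectFormulaModeSketch(new Map([
  ['AA', demoCountry(55)],
  ['BB', demoCountry(55)],
  ['CC', demoCountry(70)],
]));
console.log(demo.pillarCombineSuspected, demo.offenders); // true [ 'AA', 'BB' ]
```

Restricting the comparison to complete, high-coverage responses keeps partial data from being misread as a formula change.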

Verified:
- `tests/audit-cohort-formula-detection.test.mts` (NEW) — 3 child-process
  smoke tests: missing-members banner + STRICT exit 3, all-clear exit 0,
  pillar-mode banner + STRICT exit 4. All pass.
- `npx tsx --test tests/resilience-construct-invariants.test.mts
   tests/audit-cohort-formula-detection.test.mts` — 15 pass / 0 fail
- `npm run test:data` — 6709 pass / 0 fail
- `npm run typecheck` / `typecheck:api` — green
- `npm run lint` / `lint:md` — no warnings on new / changed files
  (refactor split buildReport complexity from 51 → under 50 by
  extracting `renderCohortSection` + `renderDimCell`)
- Fixture smoke: AE.importConcentration coverage-outlier and
  GCC.externalDebtCoverage saturated-high flags still fire correctly.

* fix(resilience): PR 0 review — fixture-mode source label, try/catch country-names, ASCII minus

Addresses 3 P2 Greptile findings on #3369:

1. **Misleading Source: line in fixture mode.** `FIXTURE_PATH` sets
   `API_BASE=''`, so the report header showed a bare "/api/..." path that
   never resolved — making a fixture run visually indistinguishable from
   a live run. Now surfaces `Source: fixture://<path>` in fixture mode.

2. **`loadCountryNameMap` crashes without useful diagnostics.** A missing
   or unparseable `shared/country-names.json` produced a raw unhandled
   rejection. Now the read and the parse are each wrapped in their own
   try/catch; on either failure the script logs a developer-friendly
   warning and falls back to ISO-2 codes (report shows "AE" instead of
   "Uae"). Keeps the audit operable in CI-offline scenarios.

3. **Unicode minus `−` (U+2212) instead of ASCII `-` in `fmtDelta`.**
   Downstream operators diff / grep / CSV-pipe the report; the Unicode
   minus breaks byte-level text tooling. Replaced with ASCII hyphen-
   minus. Left the U+2212 in the formula-mode diagnostic prose
   (`|Σ contributions − overallScore|`) where it's mathematical notation,
   not data.
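
As an illustration of finding 3, a delta formatter that only ever emits the ASCII hyphen-minus might look like the sketch below. `fmtDeltaSketch` is a hypothetical stand-in; the real `fmtDelta` is not shown here, and the explicit `+` prefix and one-decimal default are assumptions.

```typescript
// Hypothetical stand-in for fmtDelta: signs are ASCII only ("-" is U+002D, never U+2212).
function fmtDeltaSketch(delta: number, digits = 1): string {
  const abs = Math.abs(delta).toFixed(digits);
  if (Number(abs) === 0) return abs; // values that round to zero carry no sign
  return delta < 0 ? `-${abs}` : `+${abs}`;
}

console.log(fmtDeltaSketch(-2.5)); // -2.5
console.log(fmtDeltaSketch(1.5)); // +1.5
console.log(fmtDeltaSketch(-0.01)); // 0.0
```

Keeping the sign in ASCII means a plain `grep -- '-2.5'` or a CSV import sees the value as written.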

Verified

- `npx tsx --test tests/audit-cohort-formula-detection.test.mts tests/resilience-construct-invariants.test.mts` — 15 pass / 0 fail
- Fixture-mode run produces `Source: fixture://tests/fixtures/...`
- Movers-table negative deltas now use ASCII `-`
2026-04-24 18:13:22 +04:00

// Construct invariants — formula-level assertions with synthetic inputs.
//
// Purpose. Complement `resilience-dimension-monotonicity.test.mts` (which
// pins direction) with precise ANCHOR-VALUE checks. These tests fail when
// the scoring FORMULA breaks, not when a country's RANK changes. They are
// deliberately country-identity-free so the audit gate (see
// `docs/methodology/cohort-sanity-release-gate.md`) does not collapse into
// an outcome-seeking "ENTITY A must > ENTITY B" assertion — that is the
// anti-pattern the cohort-sanity skill explicitly warns against.
//
// Plan reference. PR 0 from
// `docs/plans/2026-04-24-002-fix-resilience-cohort-ranking-structural-audit-plan.md`
// (§"PR 0 — Release-gate audit harness"):
// > `score(HHI=0.05) > score(HHI=0.20)`
// > `score(debtToReservesRatio=0) > score(ratio=1) > score(ratio=2)`
// > `score(effMo=12) > score(effMo=3)`
// > `score(lowCarbonShare=80, fossilImportDep=0) > score(lowCarbonShare=0, fossilImportDep=100)`
//
// The tests are organised by scorer and include both the monotonicity
// claim and the precise anchor value where the construct fixes one
// (Greenspan-Guidotti = 50; saturating transform at effMo=12 = ~63).
// An anchor drift > 1 point is an invariant break: investigate before
// editing the test.
import assert from 'node:assert/strict';
import { describe, it } from 'node:test';
import {
  scoreImportConcentration,
  scoreExternalDebtCoverage,
  scoreSovereignFiscalBuffer,
  type ResilienceSeedReader,
} from '../server/worldmonitor/resilience/v1/_dimension-scorers.ts';

const TEST_ISO2 = 'XX';

function makeReader(keyValueMap: Record<string, unknown>): ResilienceSeedReader {
  return async (key: string) => keyValueMap[key] ?? null;
}

describe('construct invariants — importConcentration', () => {
  async function scoreWith(hhi: number) {
    return scoreImportConcentration(TEST_ISO2, makeReader({
      'resilience:recovery:import-hhi:v1': { countries: { [TEST_ISO2]: { hhi } } },
    }));
  }

  it('score(HHI=0.05) > score(HHI=0.20)', async () => {
    const diversified = await scoreWith(0.05);
    const concentrated = await scoreWith(0.20);
    assert.ok(
      diversified.score > concentrated.score,
      `HHI 0.05→0.20 should lower score; got ${diversified.score} vs ${concentrated.score}`,
    );
  });

  it('HHI=0 anchors at score 100 (no-concentration pole)', async () => {
    const r = await scoreWith(0);
    assert.ok(Math.abs(r.score - 100) < 1, `expected ~100 at HHI=0, got ${r.score}`);
  });

  it('HHI=0.5 (fully concentrated under current 0..5000 goalpost) anchors at score 0', async () => {
    // Current scorer: hhi×10000 normalised against (0, 5000). 0.5×10000 = 5000 → 0.
    const r = await scoreWith(0.5);
    assert.ok(Math.abs(r.score - 0) < 1, `expected ~0 at HHI=0.5 under current goalpost, got ${r.score}`);
  });
});

describe('construct invariants — externalDebtCoverage (Greenspan-Guidotti anchor)', () => {
  async function scoreWith(debtToReservesRatio: number) {
    return scoreExternalDebtCoverage(TEST_ISO2, makeReader({
      'resilience:recovery:external-debt:v1': {
        countries: { [TEST_ISO2]: { debtToReservesRatio } },
      },
    }));
  }

  it('ratio=0 → score 100 (zero-rollover-exposure pole)', async () => {
    const r = await scoreWith(0);
    assert.ok(Math.abs(r.score - 100) < 1, `expected ~100 at ratio=0, got ${r.score}`);
  });

  it('ratio=1.0 → score 50 (Greenspan-Guidotti threshold)', async () => {
    const r = await scoreWith(1.0);
    assert.ok(
      Math.abs(r.score - 50) < 1,
      `expected ~50 at ratio=1.0 under Greenspan-Guidotti anchor (worst=2), got ${r.score}`,
    );
  });

  it('ratio=2.0 → score 0 (acute rollover-shock pole)', async () => {
    const r = await scoreWith(2.0);
    assert.ok(Math.abs(r.score - 0) < 1, `expected ~0 at ratio=2.0, got ${r.score}`);
  });

  it('monotonic: score(ratio=0) > score(ratio=1) > score(ratio=2)', async () => {
    const [r0, r1, r2] = await Promise.all([scoreWith(0), scoreWith(1), scoreWith(2)]);
    assert.ok(r0.score > r1.score && r1.score > r2.score,
      `expected strictly decreasing; got ${r0.score}, ${r1.score}, ${r2.score}`);
  });
});

describe('construct invariants — sovereignFiscalBuffer (saturating transform)', () => {
  // Saturating transform per scorer (line ~1687):
  //   score = 100 * (1 - exp(-em / 12))
  // Reference values (not tuning points — these are what the formula SHOULD
  // produce if no one has silently redefined it):
  //   em=0  → 0
  //   em=3  → 100*(1-e^-0.25) ≈ 22.1
  //   em=12 → 100*(1-e^-1) ≈ 63.2
  //   em=24 → 100*(1-e^-2) ≈ 86.5
  //   em→∞  → 100
  async function scoreWithEm(em: number) {
    return scoreSovereignFiscalBuffer(TEST_ISO2, makeReader({
      'resilience:recovery:sovereign-wealth:v1': {
        countries: { [TEST_ISO2]: { totalEffectiveMonths: em, completeness: 1.0 } },
      },
    }));
  }

  it('em=0 → score 0 (no SWF buffer)', async () => {
    const r = await scoreWithEm(0);
    assert.ok(Math.abs(r.score - 0) < 1, `expected ~0 at em=0, got ${r.score}`);
  });

  it('em=12 → score ≈ 63 (one-year saturating anchor)', async () => {
    const r = await scoreWithEm(12);
    const expected = 100 * (1 - Math.exp(-1));
    assert.ok(
      Math.abs(r.score - expected) < 1,
      `expected ~${expected.toFixed(1)} at em=12, got ${r.score}`,
    );
  });

  it('em=24 → score ≈ 86 (two-year saturating anchor)', async () => {
    const r = await scoreWithEm(24);
    const expected = 100 * (1 - Math.exp(-2));
    assert.ok(
      Math.abs(r.score - expected) < 1,
      `expected ~${expected.toFixed(1)} at em=24, got ${r.score}`,
    );
  });

  it('monotonic: score(em=3) < score(em=12) < score(em=24)', async () => {
    const [r3, r12, r24] = await Promise.all([scoreWithEm(3), scoreWithEm(12), scoreWithEm(24)]);
    assert.ok(r3.score < r12.score && r12.score < r24.score,
      `expected strictly increasing; got em=3:${r3.score}, em=12:${r12.score}, em=24:${r24.score}`);
  });

  it('country not in manifest → score 0, coverage 1.0 (legitimate zero, not imputed)', async () => {
    // Seed present but country absent = "no SWF" (legitimate structural zero).
    // This is distinct from "seed missing entirely" which returns IMPUTE.
    const r = await scoreSovereignFiscalBuffer(TEST_ISO2, makeReader({
      'resilience:recovery:sovereign-wealth:v1': { countries: {} },
    }));
    assert.equal(r.score, 0, `expected 0 when country has no manifest entry, got ${r.score}`);
    assert.equal(r.coverage, 1.0, `expected coverage=1.0 (legitimate observation), got ${r.coverage}`);
    assert.equal(r.imputationClass, null, `expected null imputation (not imputed), got ${r.imputationClass}`);
  });
});