mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-25 17:14:57 +02:00
* feat(resilience): PR 0 — cohort-sanity release-gate harness

Lands the audit infrastructure for the resilience cohort-ranking structural audit (plan 2026-04-24-002). Release gate, not merge gate: the audit tells release review what to look at before publishing a ranking; it does not block a PR.

What's new
- scripts/audit-resilience-cohorts.mjs — Markdown report generator. Fetches the live ranking + per-country scores (or reads a fixture in offline mode), emits per-cohort per-dimension tables, contribution decomposition, saturated / outlier / identical-score flags, and a top-N movers comparison vs a baseline snapshot.
- tests/resilience-construct-invariants.test.mts — 12 formula-level anchor-value assertions with synthetic inputs. Covers HHI, external debt (Greenspan-Guidotti anchor), and sovereign fiscal buffer (saturating transform). Tests the MATH, not a country's rank.
- tests/fixtures/resilience-audit-fixture.json — offline fixture that mirrors the 2026-04-24 GCC state (KW>QA>AE) so the audit tool can be smoke-tested without API-key access.
- docs/methodology/cohort-sanity-release-gate.md — operational doc explaining when to run, how to read the report, and the explicit anti-pattern note on rank-targeted acceptance criteria.

Verified
- `npx tsx --test tests/resilience-construct-invariants.test.mts` — 12 pass (HHI, debt, SWF invariants all green against current scorer)
- `npm run test:data` — 6706 pass / 0 fail
- `FIXTURE=tests/fixtures/resilience-audit-fixture.json OUT=/tmp/audit.md node scripts/audit-resilience-cohorts.mjs` runs to completion and correctly flags:
  (a) coverage-outlier on AE.importConcentration (0.3 vs peers 1.0)
  (b) saturated-high on GCC.externalDebtCoverage (all 6 at 100)
  — the two top cohort-sanity findings from the plan.
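The two fixture findings correspond to two simple per-cohort checks. A minimal sketch of how such flags could be computed follows; the function names, the 99.5 saturation floor, and the 0.5 coverage gap measured against the peer median are assumptions picked to reproduce the findings above, not the thresholds used in `scripts/audit-resilience-cohorts.mjs`.

```typescript
// Hypothetical sketch of two cohort-sanity flags. Names and thresholds are
// assumptions, not the values used by scripts/audit-resilience-cohorts.mjs.

export type CohortCell = { iso2: string; score: number; coverage: number };

// Saturated-high: every cohort member sits at (or within `floor` of) the 100
// ceiling, so the dimension cannot discriminate between members.
export function flagSaturatedHigh(cells: CohortCell[], floor = 99.5): boolean {
  return cells.length > 0 && cells.every((c) => c.score >= floor);
}

function median(xs: number[]): number {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Coverage-outlier: a member whose coverage trails the median of its peers
// (the cohort excluding itself) by more than `gap`.
export function flagCoverageOutliers(cells: CohortCell[], gap = 0.5): string[] {
  return cells
    .filter((c) => {
      const peers = cells.filter((p) => p.iso2 !== c.iso2).map((p) => p.coverage);
      return peers.length > 0 && median(peers) - c.coverage > gap;
    })
    .map((c) => c.iso2);
}
```

Excluding the member itself and using the peer median (rather than the cohort mean) keeps a single low-coverage country from pulling the reference value toward itself and hiding its own gap.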
Not in this PR
- The live-API baseline snapshot (docs/snapshots/resilience-ranking-live-pre-cohort-audit-2026-04-24.json) is deferred to a manual release-prep step: run `WORLDMONITOR_API_KEY=wm_xxx API_BASE=https://api.worldmonitor.app node scripts/freeze-resilience-ranking.mjs` before the first methodology PR (PR 1 HHI period widening) so its movers table has something to compare against.
- No scorer changes. No cache-prefix bumps. This PR is pure tooling.

* fix(resilience): fail-closed on fetch failures + pillar-combine formula mode

Addresses review P1 + P2 on PR #3369.

P1 — fetch-failure silent-drop. Per-country score fetches that failed were logged to stderr, silently stored as null, and then filtered out of cohort tables via `codes.filter((cc) => scoreMap.get(cc))`. A transient 403/500 on the very country carrying the ranking anomaly could produce a Markdown report that looked valid — wrong failure mode for a release gate. Fix:
- `fetchScoresConcurrent` now tracks failures in a dedicated Map and does NOT insert null placeholders; missing cohort members are computed against the requested cohort code set.
- The report has a ⛔ blocker banner at top AND an always-rendered "Fetch failures / missing members" section (shown even when empty, so an operator learns to look).
- `STRICT=1` writes the report, then exits code 3 on any fetch failure or missing cohort member, code 4 on formula-mode drift, code 0 otherwise. Automation can differentiate the two.

P2 — pillar-combine formula mode invalidates contribution rows. `docs/methodology/cohort-sanity-release-gate.md:63` tells operators to run this audit before activating `RESILIENCE_PILLAR_COMBINE_ENABLED`, but the contribution decomposition is a domain-weighted roll-up that is ONLY valid when `overallScore = sum(domain.score * domain.weight)`.
Once pillar combine is on, `overallScore = penalizedPillarScore(pillars)` (non-linear in dim scores); decomposition rows become materially misleading for exactly the release-gate scenario the doc prescribes. Fix:
- Added `detectFormulaMode(scoreMap)` that takes countries with:
  (a) `sum(domain.weight)` within 0.05 of 1.0 (complete response), AND
  (b) every dim at `coverage ≥ 0.9` (stable share math)
  and compares `|Σ contributions - overallScore|` against `CONTRIB_TOLERANCE` (default 1.5). If > 50% of ≥ 3 eligible countries drift, pillar combine is flagged.
- Report emits a ⛔ blocker banner at top, a "Formula mode" line in the header, and a "Formula-mode diagnostic" section with the first three offenders. Under `STRICT=1` exits code 4.
- Methodology doc updated: new "Fail-closed semantics" section, "Formula mode" operator guide, ENV table entries for STRICT + CONTRIB_TOLERANCE.

Verified:
- `tests/audit-cohort-formula-detection.test.mts` (NEW) — 3 child-process smoke tests: missing-members banner + STRICT exit 3, all-clear exit 0, pillar-mode banner + STRICT exit 4. All pass.
- `npx tsx --test tests/resilience-construct-invariants.test.mts tests/audit-cohort-formula-detection.test.mts` — 15 pass / 0 fail
- `npm run test:data` — 6709 pass / 0 fail
- `npm run typecheck` / `typecheck:api` — green
- `npm run lint` / `lint:md` — no warnings on new / changed files (refactor split buildReport complexity from 51 → under 50 by extracting `renderCohortSection` + `renderDimCell`)
- Fixture smoke: AE.importConcentration coverage-outlier and GCC.externalDebtCoverage saturated-high flags still fire correctly.

* fix(resilience): PR 0 review — fixture-mode source label, try/catch country-names, ASCII minus

Addresses 3 P2 Greptile findings on #3369:

1. **Misleading Source: line in fixture mode.** `FIXTURE_PATH` sets `API_BASE=''`, so the report header showed a bare "/api/..." path that never resolved — making a fixture run visually indistinguishable from a live run.
   Now surfaces `Source: fixture://<path>` in fixture mode.
2. **`loadCountryNameMap` crashes without useful diagnostics.** A missing or unparseable `shared/country-names.json` produced a raw unhandled rejection. Now the read and the parse are each wrapped in their own try/catch; on either failure the script logs a developer-friendly warning and falls back to ISO-2 codes (report shows "AE" instead of "Uae"). Keeps the audit operable in CI-offline scenarios.
3. **Unicode minus `−` (U+2212) instead of ASCII `-` in `fmtDelta`.** Downstream operators diff / grep / CSV-pipe the report; the Unicode minus breaks byte-level text tooling. Replaced with ASCII hyphen-minus. Left the U+2212 in the formula-mode diagnostic prose (`|Σ contributions − overallScore|`) where it's mathematical notation, not data.

Verified
- `npx tsx --test tests/audit-cohort-formula-detection.test.mts tests/resilience-construct-invariants.test.mts` — 15 pass / 0 fail
- Fixture-mode run produces `Source: fixture://tests/fixtures/...`
- Movers-table negative deltas now use ASCII `-`
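The P1 fix hinges on keeping successes and failures in separate maps, so a failed country is never represented by a `null` that a later filter can silently drop. A minimal sketch under stated assumptions: the fetcher is injected as a stand-in for the per-country HTTP request, and the `limit` worker-pool parameter and the return shape are hypothetical rather than the script's actual signature.

```typescript
// Hypothetical sketch in the spirit of fetchScoresConcurrent. The fetcher is
// injected so the example runs offline; `limit` is an assumed pool size.
export type Fetcher<T> = (iso2: string) => Promise<T>;

export async function fetchScoresConcurrent<T>(
  codes: string[],
  fetchOne: Fetcher<T>,
  limit = 4,
): Promise<{ scores: Map<string, T>; failures: Map<string, string> }> {
  const scores = new Map<string, T>();
  const failures = new Map<string, string>();
  let next = 0;

  // Each worker claims the next unprocessed code until the list is drained.
  // `next++` is safe: JS runs these continuations on a single thread.
  async function worker(): Promise<void> {
    while (next < codes.length) {
      const iso2 = codes[next++];
      try {
        scores.set(iso2, await fetchOne(iso2));
      } catch (err) {
        // Record the failure explicitly instead of storing a null placeholder.
        failures.set(iso2, err instanceof Error ? err.message : String(err));
      }
    }
  }

  const poolSize = Math.min(limit, codes.length);
  await Promise.all(Array.from({ length: poolSize }, () => worker()));
  return { scores, failures };
}
```

Because a failed country never enters `scores`, anything that renders cohort tables has to consult `failures` (or the requested code list) to notice the gap, which is the job of the always-rendered "Fetch failures / missing members" section.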
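The fail-closed exit semantics under `STRICT=1` reduce to two small functions: compute missing members against the requested cohort, then map the run's problems to an exit code. The names here are hypothetical, and so is the precedence: the description does not say which code wins when a run has both fetch failures and formula drift, so this sketch assumes code 3 takes priority.

```typescript
// Hypothetical sketch of the STRICT=1 fail-closed checks. Function names and
// the precedence of exit code 3 over exit code 4 are assumptions.

// Compare against the cohort that was requested, not against whatever came back.
export function findMissingMembers(
  cohort: readonly string[],
  scores: ReadonlyMap<string, unknown>,
): string[] {
  return cohort.filter((cc) => !scores.has(cc));
}

export function strictExitCode(opts: {
  fetchFailures: number;
  missingMembers: number;
  formulaDrift: boolean;
}): 0 | 3 | 4 {
  if (opts.fetchFailures > 0 || opts.missingMembers > 0) return 3; // incomplete data
  if (opts.formulaDrift) return 4; // contribution rows would be misleading
  return 0;
}
```

In a Node script the result would typically be assigned to `process.exitCode` only after the report has been written, so the Markdown file exists for review even on a failing run.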
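`detectFormulaMode` is described precisely enough to sketch. The 0.05 weight-sum window, the 0.9 coverage floor, the 1.5 default tolerance, and the "more than half of at least three eligible countries" rule come from the description above. The payload shape (`overallScore` plus `domains[]` carrying `score`, `weight`, and per-dimension `coverage`), treating each domain's contribution as `score * weight`, and returning `"indeterminate"` when fewer than three countries are eligible are all assumptions.

```typescript
// Hypothetical payload shape; the real per-country response may differ.
interface DimensionScore { coverage: number }
interface DomainScore { score: number; weight: number; dimensions: DimensionScore[] }
interface CountryScore { overallScore: number; domains: DomainScore[] }

type FormulaMode = "domain-weighted" | "pillar-combine" | "indeterminate";

export function detectFormulaMode(
  scoreMap: ReadonlyMap<string, CountryScore>,
  tolerance = 1.5, // CONTRIB_TOLERANCE default
): { mode: FormulaMode; offenders: string[] } {
  const offenders: string[] = [];
  let eligible = 0;

  for (const [iso2, c] of scoreMap) {
    // Eligibility: complete weights (sum within 0.05 of 1.0) and stable coverage.
    const weightSum = c.domains.reduce((s, d) => s + d.weight, 0);
    const stable = c.domains.every((d) => d.dimensions.every((dim) => dim.coverage >= 0.9));
    if (Math.abs(weightSum - 1) > 0.05 || !stable) continue;
    eligible++;

    // Under the linear roll-up, the contributions must add back up to overallScore.
    const contribSum = c.domains.reduce((s, d) => s + d.score * d.weight, 0);
    if (Math.abs(contribSum - c.overallScore) > tolerance) offenders.push(iso2);
  }

  if (eligible < 3) return { mode: "indeterminate", offenders };
  const mode = offenders.length / eligible > 0.5 ? "pillar-combine" : "domain-weighted";
  return { mode, offenders };
}
```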
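Wrapping the read and the parse in separate `try/catch` blocks lets each failure report its own cause while both degrade to the same ISO-2 fallback. This sketch assumes the file maps ISO-2 codes directly to display names (the real `shared/country-names.json` layout may differ), and the warning text and the `displayName` helper are illustrative.

```typescript
import { readFile } from "node:fs/promises";

// Hypothetical sketch; assumes the JSON maps ISO-2 code -> display name.
export async function loadCountryNameMap(path: string): Promise<Map<string, string>> {
  let raw: string;
  try {
    raw = await readFile(path, "utf8");
  } catch (err) {
    console.warn(`[audit] cannot read ${path} (${(err as Error).message}); using ISO-2 codes`);
    return new Map();
  }
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) throw new Error("expected a JSON object");
    const entries = Object.entries(parsed as Record<string, unknown>)
      .filter((e): e is [string, string] => typeof e[1] === "string");
    return new Map(entries);
  } catch (err) {
    console.warn(`[audit] cannot parse ${path} (${(err as Error).message}); using ISO-2 codes`);
    return new Map();
  }
}

// Name when known, otherwise the raw ISO-2 code (e.g. "AE").
export const displayName = (names: Map<string, string>, iso2: string): string =>
  names.get(iso2) ?? iso2;
```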
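A delta formatter that only ever emits ASCII signs might look like the sketch that follows. The one-decimal precision, the explicit `+` for gains, and collapsing values that round to zero into `0.0` (so no `-0.0` appears in the movers table) are assumptions, not the script's documented behaviour.

```typescript
// Hypothetical formatter; precision and sign conventions are assumptions.
export function fmtDelta(delta: number): string {
  const abs = Math.abs(delta).toFixed(1);
  if (abs === "0.0") return "0.0"; // avoid "+0.0" / "-0.0" noise
  // ASCII hyphen-minus (U+002D), not U+2212, so diff/grep/CSV tooling matches.
  return delta > 0 ? `+${abs}` : `-${abs}`;
}
```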
162 lines
6.6 KiB
TypeScript
// Construct invariants — formula-level assertions with synthetic inputs.
//
// Purpose. Complement `resilience-dimension-monotonicity.test.mts` (which
// pins direction) with precise ANCHOR-VALUE checks. These tests fail when
// the scoring FORMULA breaks, not when a country's RANK changes. They are
// deliberately country-identity-free so the audit gate (see
// `docs/methodology/cohort-sanity-release-gate.md`) does not collapse into
// an outcome-seeking "ENTITY A must > ENTITY B" assertion — that is the
// anti-pattern the cohort-sanity skill explicitly warns against.
//
// Plan reference. PR 0 from
// `docs/plans/2026-04-24-002-fix-resilience-cohort-ranking-structural-audit-plan.md`
// (§"PR 0 — Release-gate audit harness"):
// > `score(HHI=0.05) > score(HHI=0.20)`
// > `score(debtToReservesRatio=0) > score(ratio=1) > score(ratio=2)`
// > `score(effMo=12) > score(effMo=3)`
// > `score(lowCarbonShare=80, fossilImportDep=0) > score(lowCarbonShare=0, fossilImportDep=100)`
//
// The tests are organised by scorer and include both the monotonicity
// claim and the precise anchor value where the construct fixes one
// (Greenspan-Guidotti = 50; saturating transform at effMo=12 = ~63).
// This file covers the first three invariants above; the energy-mix
// (lowCarbonShare / fossilImportDep) invariant is not asserted here.
// An anchor drift > 1 point is an invariant break: investigate before
// editing the test.

import assert from 'node:assert/strict';
import { describe, it } from 'node:test';

import {
  scoreImportConcentration,
  scoreExternalDebtCoverage,
  scoreSovereignFiscalBuffer,
  type ResilienceSeedReader,
} from '../server/worldmonitor/resilience/v1/_dimension-scorers.ts';

const TEST_ISO2 = 'XX';

function makeReader(keyValueMap: Record<string, unknown>): ResilienceSeedReader {
  return async (key: string) => keyValueMap[key] ?? null;
}

describe('construct invariants — importConcentration', () => {
  async function scoreWith(hhi: number) {
    return scoreImportConcentration(TEST_ISO2, makeReader({
      'resilience:recovery:import-hhi:v1': { countries: { [TEST_ISO2]: { hhi } } },
    }));
  }

  it('score(HHI=0.05) > score(HHI=0.20)', async () => {
    const diversified = await scoreWith(0.05);
    const concentrated = await scoreWith(0.20);
    assert.ok(
      diversified.score > concentrated.score,
      `HHI 0.05→0.20 should lower score; got ${diversified.score} → ${concentrated.score}`,
    );
  });

  it('HHI=0 anchors at score 100 (no-concentration pole)', async () => {
    const r = await scoreWith(0);
    assert.ok(Math.abs(r.score - 100) < 1, `expected ~100 at HHI=0, got ${r.score}`);
  });

  it('HHI=0.5 (fully concentrated under current 0..5000 goalpost) anchors at score 0', async () => {
    // Current scorer: hhi×10000 normalised against (0, 5000). 0.5×10000 = 5000 → 0.
    // HHI=0.5 is the worst goalpost, not literal single-supplier concentration (HHI=1.0).
    const r = await scoreWith(0.5);
    assert.ok(Math.abs(r.score - 0) < 1, `expected ~0 at HHI=0.5 under current goalpost, got ${r.score}`);
  });
});

describe('construct invariants — externalDebtCoverage (Greenspan-Guidotti anchor)', () => {
  async function scoreWith(debtToReservesRatio: number) {
    return scoreExternalDebtCoverage(TEST_ISO2, makeReader({
      'resilience:recovery:external-debt:v1': {
        countries: { [TEST_ISO2]: { debtToReservesRatio } },
      },
    }));
  }

  it('ratio=0 → score 100 (zero-rollover-exposure pole)', async () => {
    const r = await scoreWith(0);
    assert.ok(Math.abs(r.score - 100) < 1, `expected ~100 at ratio=0, got ${r.score}`);
  });

  it('ratio=1.0 → score 50 (Greenspan-Guidotti threshold)', async () => {
    const r = await scoreWith(1.0);
    assert.ok(
      Math.abs(r.score - 50) < 1,
      `expected ~50 at ratio=1.0 under Greenspan-Guidotti anchor (worst=2), got ${r.score}`,
    );
  });

  it('ratio=2.0 → score 0 (acute rollover-shock pole)', async () => {
    const r = await scoreWith(2.0);
    assert.ok(Math.abs(r.score - 0) < 1, `expected ~0 at ratio=2.0, got ${r.score}`);
  });

  it('monotonic: score(ratio=0) > score(ratio=1) > score(ratio=2)', async () => {
    const [r0, r1, r2] = await Promise.all([scoreWith(0), scoreWith(1), scoreWith(2)]);
    assert.ok(r0.score > r1.score && r1.score > r2.score,
      `expected strictly decreasing; got ${r0.score}, ${r1.score}, ${r2.score}`);
  });
});

describe('construct invariants — sovereignFiscalBuffer (saturating transform)', () => {
  // Saturating transform per scorer (line ~1687):
  //   score = 100 * (1 - exp(-em / 12))
  // Reference values (not tuning points — these are what the formula SHOULD
  // produce if no one has silently redefined it):
  //   em=0  → 0
  //   em=3  → 100*(1-e^-0.25) ≈ 22.1
  //   em=12 → 100*(1-e^-1)    ≈ 63.2
  //   em=24 → 100*(1-e^-2)    ≈ 86.5
  //   em→∞  → 100

  async function scoreWithEm(em: number) {
    return scoreSovereignFiscalBuffer(TEST_ISO2, makeReader({
      'resilience:recovery:sovereign-wealth:v1': {
        countries: { [TEST_ISO2]: { totalEffectiveMonths: em, completeness: 1.0 } },
      },
    }));
  }

  it('em=0 → score 0 (no SWF buffer)', async () => {
    const r = await scoreWithEm(0);
    assert.ok(Math.abs(r.score - 0) < 1, `expected ~0 at em=0, got ${r.score}`);
  });

  it('em=12 → score ≈ 63 (one-year saturating anchor)', async () => {
    const r = await scoreWithEm(12);
    const expected = 100 * (1 - Math.exp(-1));
    assert.ok(
      Math.abs(r.score - expected) < 1,
      `expected ~${expected.toFixed(1)} at em=12, got ${r.score}`,
    );
  });

  it('em=24 → score ≈ 86 (two-year saturating anchor)', async () => {
    const r = await scoreWithEm(24);
    const expected = 100 * (1 - Math.exp(-2));
    assert.ok(
      Math.abs(r.score - expected) < 1,
      `expected ~${expected.toFixed(1)} at em=24, got ${r.score}`,
    );
  });

  it('monotonic: score(em=3) < score(em=12) < score(em=24)', async () => {
    const [r3, r12, r24] = await Promise.all([scoreWithEm(3), scoreWithEm(12), scoreWithEm(24)]);
    assert.ok(r3.score < r12.score && r12.score < r24.score,
      `expected strictly increasing; got em=3:${r3.score}, em=12:${r12.score}, em=24:${r24.score}`);
  });

  it('country not in manifest → score 0, coverage 1.0 (legitimate zero, not imputed)', async () => {
    // Seed present but country absent = "no SWF" (legitimate structural zero).
    // This is distinct from "seed missing entirely" which returns IMPUTE.
    const r = await scoreSovereignFiscalBuffer(TEST_ISO2, makeReader({
      'resilience:recovery:sovereign-wealth:v1': { countries: {} },
    }));
    assert.equal(r.score, 0, `expected 0 when country has no manifest entry, got ${r.score}`);
    assert.equal(r.coverage, 1.0, `expected coverage=1.0 (legitimate observation), got ${r.coverage}`);
    assert.equal(r.imputationClass, null, `expected null imputation (not imputed), got ${r.imputationClass}`);
  });
});