worldmonitor/tests/audit-cohort-formula-detection.test.mts
Elie Habib df392b0514 feat(resilience): PR 0 — cohort-sanity release-gate harness (#3369)
* feat(resilience): PR 0 — cohort-sanity release-gate harness

Lands the audit infrastructure for the resilience cohort-ranking
structural audit (plan 2026-04-24-002). Release gate, not merge gate:
the audit tells release review what to look at before publishing a
ranking; it does not block a PR.

What's new
- scripts/audit-resilience-cohorts.mjs — Markdown report generator.
  Fetches the live ranking + per-country scores (or reads a fixture
  in offline mode), emits per-cohort per-dimension tables, contribution
  decomposition, saturated / outlier / identical-score flags, and a
  top-N movers comparison vs a baseline snapshot.
- tests/resilience-construct-invariants.test.mts — 12 formula-level
  anchor-value assertions with synthetic inputs. Covers HHI, external
  debt (Greenspan-Guidotti anchor), and sovereign fiscal buffer
  (saturating transform). Tests the MATH, not a country's rank.
- tests/fixtures/resilience-audit-fixture.json — offline fixture that
  mirrors the 2026-04-24 GCC state (KW>QA>AE) so the audit tool can
  be smoke-tested without API-key access.
- docs/methodology/cohort-sanity-release-gate.md — operational doc
  explaining when to run, how to read the report, and the explicit
  anti-pattern note on rank-targeted acceptance criteria.

Verified
- `npx tsx --test tests/resilience-construct-invariants.test.mts` —
  12 pass (HHI, debt, SWF invariants all green against current scorer)
- `npm run test:data` — 6706 pass / 0 fail
- `FIXTURE=tests/fixtures/resilience-audit-fixture.json
   OUT=/tmp/audit.md node scripts/audit-resilience-cohorts.mjs`
  runs to completion and correctly flags:
  (a) coverage-outlier on AE.importConcentration (0.3 vs peers 1.0)
  (b) saturated-high on GCC.externalDebtCoverage (all 6 at 100)
  — the two top cohort-sanity findings from the plan.

Not in this PR
- The live-API baseline snapshot
  (docs/snapshots/resilience-ranking-live-pre-cohort-audit-2026-04-24.json)
  is deferred to a manual release-prep step: run
  `WORLDMONITOR_API_KEY=wm_xxx API_BASE=https://api.worldmonitor.app
   node scripts/freeze-resilience-ranking.mjs` before the first
  methodology PR (PR 1 HHI period widening) so its movers table has
  something to compare against.
- No scorer changes. No cache-prefix bumps. This PR is pure tooling.

* fix(resilience): fail-closed on fetch failures + pillar-combine formula mode

Addresses review P1 + P2 on PR #3369.

P1 — fetch-failure silent-drop.
Per-country score fetches that failed were logged to stderr, silently
stored as null, and then filtered out of cohort tables via
`codes.filter((cc) => scoreMap.get(cc))`. A transient 403/500 on the
very country carrying the ranking anomaly could produce a Markdown
report that looked valid — wrong failure mode for a release gate.

Fix:
- `fetchScoresConcurrent` now tracks failures in a dedicated Map and
  does NOT insert null placeholders; missing cohort members are
  computed against the requested cohort code set.
- The report has a ⛔ blocker banner at top AND an always-rendered
  "Fetch failures / missing members" section (shown even when empty,
  so an operator learns to look).
- `STRICT=1` writes the report, then exits code 3 on any fetch
  failure or missing cohort member, code 4 on formula-mode drift,
  code 0 otherwise. Automation can differentiate the two.
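
The fail-closed rules above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: `findMissingMembers`, `resolveExitCode`, and the `AuditOutcome` shape are hypothetical names, and both the precedence of code 3 over code 4 and the non-strict exit of 0 are assumptions.

```typescript
// Hypothetical shape; the real script's internals may differ.
type AuditOutcome = {
  fetchFailures: Map<string, string>; // countryCode -> error message
  missingMembers: string[];           // requested cohort codes with no score data
  formulaDrift: boolean;              // true when pillar combine is detected
};

// Compare against the REQUESTED code set, so a failed fetch cannot
// silently disappear from the cohort table.
function findMissingMembers(requested: string[], scoreMap: Map<string, unknown>): string[] {
  return requested.filter((cc) => !scoreMap.has(cc));
}

// Assumption: fetch/missing-member problems take precedence over formula drift,
// and a non-strict run always exits 0.
function resolveExitCode(outcome: AuditOutcome, strict: boolean): number {
  if (!strict) return 0;
  if (outcome.fetchFailures.size > 0 || outcome.missingMembers.length > 0) return 3;
  if (outcome.formulaDrift) return 4;
  return 0;
}
```

Because the report is written before the exit code is resolved, an operator still gets the Markdown output even when automation sees a non-zero status.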

P2 — pillar-combine formula mode invalidates contribution rows.
`docs/methodology/cohort-sanity-release-gate.md:63` tells operators
to run this audit before activating `RESILIENCE_PILLAR_COMBINE_ENABLED`,
but the contribution decomposition is a domain-weighted roll-up that
is ONLY valid when `overallScore = sum(domain.score * domain.weight)`.
Once pillar combine is on, `overallScore = penalizedPillarScore(pillars)`
(non-linear in dim scores); decomposition rows become materially
misleading for exactly the release-gate scenario the doc prescribes.

Fix:
- Added `detectFormulaMode(scoreMap)` that takes countries with:
  (a) `sum(domain.weight)` within 0.05 of 1.0 (complete response), AND
  (b) every dim at `coverage ≥ 0.9` (stable share math)
  and compares `|Σ contributions - overallScore|` against
  `CONTRIB_TOLERANCE` (default 1.5). If > 50% of ≥ 3 eligible
  countries drift, pillar combine is flagged.
- Report emits a ⛔ blocker banner at top, a "Formula mode" line in
  the header, and a "Formula-mode diagnostic" section with the first
  three offenders. Under `STRICT=1` exits code 4.
- Methodology doc updated: new "Fail-closed semantics" section,
  "Formula mode" operator guide, ENV table entries for STRICT +
  CONTRIB_TOLERANCE.
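
A minimal sketch of the detection rule described above, under stated assumptions: the type names, the `detectFormulaModeSketch` name, and the return shape are illustrative and are not taken from the script. The thresholds (0.05 weight slack, coverage ≥ 0.9, tolerance 1.5, more than 50% of at least 3 eligible countries) come from the description.

```typescript
// Illustrative types mirroring the fields used in the test fixtures.
type Dim = { score: number; coverage: number };
type Domain = { weight: number; score: number; dimensions: Dim[] };
type ScoreDoc = { overallScore: number; domains: Domain[] };

const CONTRIB_TOLERANCE = 1.5;

// Eligible = complete response (weights sum to ~1.0) with stable share math.
function isEligible(doc: ScoreDoc): boolean {
  const weightSum = doc.domains.reduce((s, d) => s + d.weight, 0);
  const complete = Math.abs(weightSum - 1.0) <= 0.05;
  const stable = doc.domains.every((d) => d.dimensions.every((dim) => dim.coverage >= 0.9));
  return complete && stable;
}

function detectFormulaModeSketch(scoreMap: Map<string, ScoreDoc>, tolerance = CONTRIB_TOLERANCE) {
  const eligible = [...scoreMap.values()].filter(isEligible);
  const drifting = eligible.filter((doc) => {
    // Legacy formula: overallScore = sum(domain.score * domain.weight).
    const contributions = doc.domains.reduce((s, d) => s + d.score * d.weight, 0);
    return Math.abs(contributions - doc.overallScore) > tolerance;
  });
  // Flag pillar combine only with enough evidence: >= 3 eligible, > 50% drifting.
  const pillarCombine = eligible.length >= 3 && drifting.length / eligible.length > 0.5;
  return { eligible: eligible.length, drifting: drifting.length, pillarCombine };
}
```

Requiring at least three eligible countries keeps a single incomplete or low-coverage response from flipping the report into pillar-combine mode.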

Verified:
- `tests/audit-cohort-formula-detection.test.mts` (NEW) — 3 child-process
  smoke tests: missing-members banner + STRICT exit 3, all-clear exit 0,
  pillar-mode banner + STRICT exit 4. All pass.
- `npx tsx --test tests/resilience-construct-invariants.test.mts
   tests/audit-cohort-formula-detection.test.mts` — 15 pass / 0 fail
- `npm run test:data` — 6709 pass / 0 fail
- `npm run typecheck` / `typecheck:api` — green
- `npm run lint` / `lint:md` — no warnings on new / changed files
  (refactor split buildReport complexity from 51 → under 50 by
  extracting `renderCohortSection` + `renderDimCell`)
- Fixture smoke: AE.importConcentration coverage-outlier and
  GCC.externalDebtCoverage saturated-high flags still fire correctly.

* fix(resilience): PR 0 review — fixture-mode source label, try/catch country-names, ASCII minus

Addresses 3 P2 Greptile findings on #3369:

1. **Misleading Source: line in fixture mode.** `FIXTURE_PATH` sets
   `API_BASE=''`, so the report header showed a bare "/api/..." path that
   never resolved — making a fixture run visually indistinguishable from
   a live run. Now surfaces `Source: fixture://<path>` in fixture mode.

2. **`loadCountryNameMap` crashes without useful diagnostics.** A missing
   or unparseable `shared/country-names.json` produced a raw unhandled
   rejection. Now the read and the parse are each wrapped in their own
   try/catch; on either failure the script logs a developer-friendly
   warning and falls back to ISO-2 codes (report shows "AE" instead of
   "Uae"). Keeps the audit operable in CI-offline scenarios.

3. **Unicode minus `−` (U+2212) instead of ASCII `-` in `fmtDelta`.**
   Downstream operators diff / grep / CSV-pipe the report; the Unicode
   minus breaks byte-level text tooling. Replaced with ASCII hyphen-
   minus. Left the U+2212 in the formula-mode diagnostic prose
   (`|Σ contributions − overallScore|`) where it's mathematical notation,
   not data.
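
Findings 2 and 3 can be illustrated with the sketch below. It is hypothetical: the `Sketch`-suffixed functions and `displayName` are invented for illustration, the JSON is assumed to be a flat ISO-2 code to name object, and the two-decimal formatting is an assumption rather than the script's confirmed output.

```typescript
import * as fs from 'node:fs';

// Read and parse are guarded separately so the warning says which step failed.
// An empty map means callers fall back to raw ISO-2 codes.
function loadCountryNameMapSketch(filePath: string): Map<string, string> {
  let raw: string;
  try {
    raw = fs.readFileSync(filePath, 'utf8');
  } catch (err) {
    console.warn(`[audit] cannot read ${filePath}; using ISO-2 codes: ${(err as Error).message}`);
    return new Map();
  }
  try {
    // Assumption: the file is a flat { "AE": "United Arab Emirates", ... } object.
    return new Map(Object.entries(JSON.parse(raw) as Record<string, string>));
  } catch (err) {
    console.warn(`[audit] cannot parse ${filePath}; using ISO-2 codes: ${(err as Error).message}`);
    return new Map();
  }
}

// Fall back to the code itself when no name is known.
function displayName(names: Map<string, string>, code: string): string {
  return names.get(code) ?? code;
}

// ASCII hyphen-minus (U+002D) only, so grep / diff / CSV tooling sees plain bytes.
function fmtDeltaSketch(delta: number): string {
  const sign = delta > 0 ? '+' : delta < 0 ? '-' : '';
  return `${sign}${Math.abs(delta).toFixed(2)}`;
}
```

Returning an empty map instead of throwing keeps the audit usable offline, at the cost of less readable country labels.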

Verified

- `npx tsx --test tests/audit-cohort-formula-detection.test.mts tests/resilience-construct-invariants.test.mts` — 15 pass / 0 fail
- Fixture-mode run produces `Source: fixture://tests/fixtures/...`
- Movers-table negative deltas now use ASCII `-`
2026-04-24 18:13:22 +04:00


// Smoke-tests for the fail-closed behaviour of
// `scripts/audit-resilience-cohorts.mjs`. Verifies:
// (1) Missing cohort members produce a ⛔ banner at report top
// and a dedicated "Fetch failures / missing members" section.
// (2) STRICT=1 exits non-zero (code 3) when members are missing.
// (3) Formula-mode detection correctly banners when pillar-combine
// is active (Σ contributions ≠ overallScore for complete responses)
// and correctly does NOT banner when contributions sum.
//
// The tests drive the script as a child process against synthetic
// fixtures so they exercise the full `main()` flow (report shape,
// exit codes, stderr logging) rather than just the pure helpers.
import assert from 'node:assert/strict';
import { describe, it } from 'node:test';
import { spawnSync } from 'node:child_process';
import path from 'node:path';
import fs from 'node:fs';
import os from 'node:os';
import { fileURLToPath } from 'node:url';
const __filename = fileURLToPath(import.meta.url);
const REPO_ROOT = path.resolve(path.dirname(__filename), '..');
const SCRIPT = path.join(REPO_ROOT, 'scripts', 'audit-resilience-cohorts.mjs');

// Writes a synthetic fixture to a per-process temp file and returns its path.
function writeFixture(name: string, fixture: unknown): string {
  const tmpFile = path.join(os.tmpdir(), `audit-fixture-${name}-${process.pid}.json`);
  fs.writeFileSync(tmpFile, JSON.stringify(fixture));
  return tmpFile;
}

// Runs the audit script as a child process and returns its exit status,
// captured streams, and the Markdown report it wrote (empty if none).
function runAudit(env: Record<string, string>): { status: number | null; stdout: string; stderr: string; report: string } {
  const outFile = path.join(os.tmpdir(), `audit-out-${Date.now()}-${Math.random().toString(36).slice(2)}.md`);
  const result = spawnSync('node', [SCRIPT], {
    env: { ...process.env, OUT: outFile, ...env },
    encoding: 'utf8',
  });
  let report = '';
  try { report = fs.readFileSync(outFile, 'utf8'); } catch { /* no report written */ }
  // Remove the temp report so repeated runs do not accumulate files in tmpdir.
  fs.rmSync(outFile, { force: true });
  return {
    status: result.status,
    stdout: result.stdout ?? '',
    stderr: result.stderr ?? '',
    report,
  };
}

// Complete fixture: 57 cohort members so missing-member banner does NOT fire.
// Domain weights sum to 1.0 and coverage is 1.0 throughout.
// Σ contributions per country should land within CONTRIB_TOLERANCE of overall.
function buildCompleteFixture(options: { pillarMode?: boolean } = {}): unknown {
  const allCohortCodes = Array.from(new Set([
    'AE', 'SA', 'KW', 'QA', 'OM', 'BH',
    'FR', 'US', 'GB', 'JP', 'KR', 'DE', 'CA', 'FI', 'SE', 'BE',
    'SG', 'MY', 'TH', 'VN', 'ID', 'PH',
    'BR', 'MX', 'CO', 'VE', 'AR', 'EC',
    'NG', 'ZA', 'ET', 'KE', 'GH', 'CD', 'SD',
    'RU', 'KZ', 'AZ', 'UA', 'UZ', 'GE', 'AM',
    'LK', 'PK', 'LB', 'TR', 'EG', 'TN',
    'HK', 'NL', 'PA', 'LT',
    'NO',
    'YE', 'SY', 'SO', 'AF',
  ]));
  const buildDoc = (overallScore: number) => {
    const dimScore = overallScore;
    return {
      countryCode: 'XX',
      overallScore: options.pillarMode ? 10 : overallScore,
      // When pillarMode=true we deliberately set overallScore to a value
      // that won't match Σ contributions (penalizedPillarScore semantics)
      // so the detector fires. coverage=1.0 across all dims keeps the
      // eligibility gate satisfied.
      level: 'moderate',
      baselineScore: overallScore,
      stressScore: overallScore,
      stressFactor: 0.2,
      domains: [
        { id: 'economic', weight: 0.17, score: dimScore, dimensions: [
          { id: 'macroFiscal', score: dimScore, coverage: 1.0, observedWeight: 1, imputedWeight: 0, imputationClass: '' },
        ]},
        { id: 'infrastructure', weight: 0.15, score: dimScore, dimensions: [
          { id: 'infrastructure', score: dimScore, coverage: 1.0, observedWeight: 1, imputedWeight: 0, imputationClass: '' },
        ]},
        { id: 'energy', weight: 0.11, score: dimScore, dimensions: [
          { id: 'energy', score: dimScore, coverage: 1.0, observedWeight: 1, imputedWeight: 0, imputationClass: '' },
        ]},
        { id: 'social-governance', weight: 0.19, score: dimScore, dimensions: [
          { id: 'governanceInstitutional', score: dimScore, coverage: 1.0, observedWeight: 1, imputedWeight: 0, imputationClass: '' },
        ]},
        { id: 'health-food', weight: 0.13, score: dimScore, dimensions: [
          { id: 'healthPublicService', score: dimScore, coverage: 1.0, observedWeight: 1, imputedWeight: 0, imputationClass: '' },
        ]},
        { id: 'recovery', weight: 0.25, score: dimScore, dimensions: [
          { id: 'externalDebtCoverage', score: dimScore, coverage: 1.0, observedWeight: 1, imputedWeight: 0, imputationClass: '' },
        ]},
      ],
    };
  };
  const scores: Record<string, unknown> = {};
  for (const cc of allCohortCodes) {
    scores[cc] = { ...(buildDoc(70) as Record<string, unknown>), countryCode: cc };
  }
  const items = allCohortCodes.slice(0, 6).map((cc) => ({
    countryCode: cc, overallScore: 70, level: 'moderate', lowConfidence: false, overallCoverage: 1.0, rankStable: true,
  }));
  return { ranking: { items, greyedOut: [] }, scores };
}

describe('audit-resilience-cohorts fail-closed — missing cohort members', () => {
  it('banners the report when fixture omits cohort members AND exits 3 under STRICT=1', () => {
    // Minimal fixture intentionally omits almost every cohort member.
    const fixture = {
      ranking: { items: [
        { countryCode: 'AE', overallScore: 72.72, level: 'high', lowConfidence: false, overallCoverage: 0.88, rankStable: true },
      ], greyedOut: [] },
      scores: {
        AE: { countryCode: 'AE', overallScore: 72.72, level: 'high', baselineScore: 72, stressScore: 70, stressFactor: 0.15, domains: [
          { id: 'recovery', weight: 0.25, score: 50, dimensions: [
            { id: 'externalDebtCoverage', score: 100, coverage: 1.0, observedWeight: 1, imputedWeight: 0, imputationClass: '' },
          ]},
        ]},
      },
    };
    const fixturePath = writeFixture('missing-members', fixture);
    try {
      const result = runAudit({ FIXTURE: fixturePath, STRICT: '1' });
      assert.equal(result.status, 3, `expected STRICT exit code 3 for missing members; got ${result.status}; stderr=${result.stderr}`);
      assert.match(result.report, /⛔ \*\*Fetch failures \/ missing cohort members/, 'expected missing-members banner at report top');
      assert.match(result.report, /## Fetch failures \/ missing members/, 'expected dedicated Fetch-failures section');
      assert.match(result.report, /Cohort members with no score data:/, 'expected missing-members list');
    } finally {
      fs.unlinkSync(fixturePath);
    }
  });

  it('exits 0 under STRICT=1 when all cohort members present + formula matches', () => {
    const fixture = buildCompleteFixture({ pillarMode: false });
    const fixturePath = writeFixture('complete', fixture);
    try {
      const result = runAudit({ FIXTURE: fixturePath, STRICT: '1' });
      assert.equal(result.status, 0, `expected STRICT exit 0; got ${result.status}; stderr=${result.stderr}`);
      assert.doesNotMatch(result.report, /⛔ \*\*Fetch failures/, 'missing-members banner should NOT fire');
      assert.doesNotMatch(result.report, /⛔ \*\*Formula mode not supported/, 'formula-mode banner should NOT fire on legacy-formula response');
    } finally {
      fs.unlinkSync(fixturePath);
    }
  });
});

describe('audit-resilience-cohorts fail-closed — formula mode', () => {
  it('banners the report when Σ contributions diverges from overallScore AND exits 4 under STRICT=1', () => {
    const fixture = buildCompleteFixture({ pillarMode: true });
    const fixturePath = writeFixture('pillar-mode', fixture);
    try {
      const result = runAudit({ FIXTURE: fixturePath, STRICT: '1' });
      assert.equal(result.status, 4, `expected STRICT exit code 4 for formula mismatch; got ${result.status}; stderr=${result.stderr}`);
      assert.match(result.report, /⛔ \*\*Formula mode not supported/, 'expected formula-mode banner at report top');
      assert.match(result.report, /PILLAR-COMBINE \(decomposition invalid\)/, 'expected formula-mode line in header');
      assert.match(result.report, /## Formula-mode diagnostic/, 'expected dedicated formula-mode diagnostic section');
    } finally {
      fs.unlinkSync(fixturePath);
    }
  });
});