mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-25 17:14:57 +02:00
* feat(resilience): PR 0 - cohort-sanity release-gate harness

  Lands the audit infrastructure for the resilience cohort-ranking structural audit (plan 2026-04-24-002). Release gate, not merge gate: the audit tells release review what to look at before publishing a ranking; it does not block a PR.

  What's new
  - scripts/audit-resilience-cohorts.mjs: Markdown report generator. Fetches the live ranking + per-country scores (or reads a fixture in offline mode), emits per-cohort per-dimension tables, contribution decomposition, saturated / outlier / identical-score flags, and a top-N movers comparison vs a baseline snapshot.
  - tests/resilience-construct-invariants.test.mts: 12 formula-level anchor-value assertions with synthetic inputs. Covers HHI, external debt (Greenspan-Guidotti anchor), and sovereign fiscal buffer (saturating transform). Tests the MATH, not a country's rank.
  - tests/fixtures/resilience-audit-fixture.json: offline fixture that mirrors the 2026-04-24 GCC state (KW>QA>AE) so the audit tool can be smoke-tested without API-key access.
  - docs/methodology/cohort-sanity-release-gate.md: operational doc explaining when to run, how to read the report, and the explicit anti-pattern note on rank-targeted acceptance criteria.

  Verified
  - `npx tsx --test tests/resilience-construct-invariants.test.mts`: 12 pass (HHI, debt, SWF invariants all green against current scorer)
  - `npm run test:data`: 6706 pass / 0 fail
  - `FIXTURE=tests/fixtures/resilience-audit-fixture.json OUT=/tmp/audit.md node scripts/audit-resilience-cohorts.mjs` runs to completion and correctly flags the two top cohort-sanity findings from the plan:
    (a) coverage-outlier on AE.importConcentration (0.3 vs peers 1.0)
    (b) saturated-high on GCC.externalDebtCoverage (all 6 at 100)

  Not in this PR
  - The live-API baseline snapshot (docs/snapshots/resilience-ranking-live-pre-cohort-audit-2026-04-24.json) is deferred to a manual release-prep step: run `WORLDMONITOR_API_KEY=wm_xxx API_BASE=https://api.worldmonitor.app node scripts/freeze-resilience-ranking.mjs` before the first methodology PR (PR 1 HHI period widening) so its movers table has something to compare against.
  - No scorer changes. No cache-prefix bumps. This PR is pure tooling.

* fix(resilience): fail-closed on fetch failures + pillar-combine formula mode

  Addresses review P1 + P2 on PR #3369.

  P1: fetch-failure silent-drop. Per-country score fetches that failed were logged to stderr, silently stored as null, and then filtered out of cohort tables via `codes.filter((cc) => scoreMap.get(cc))`. A transient 403/500 on the very country carrying the ranking anomaly could produce a Markdown report that looked valid, which is the wrong failure mode for a release gate.

  Fix:
  - `fetchScoresConcurrent` now tracks failures in a dedicated Map and does NOT insert null placeholders; missing cohort members are computed against the requested cohort code set.
  - The report has a ⛔ blocker banner at top AND an always-rendered "Fetch failures / missing members" section (shown even when empty, so an operator learns to look).
  - `STRICT=1` writes the report, then exits code 3 on any fetch failure or missing cohort member, code 4 on formula-mode drift, code 0 otherwise. Automation can differentiate the two.

  P2: pillar-combine formula mode invalidates contribution rows. `docs/methodology/cohort-sanity-release-gate.md:63` tells operators to run this audit before activating `RESILIENCE_PILLAR_COMBINE_ENABLED`, but the contribution decomposition is a domain-weighted roll-up that is ONLY valid when `overallScore = sum(domain.score * domain.weight)`. Once pillar combine is on, `overallScore = penalizedPillarScore(pillars)` (non-linear in dim scores); decomposition rows become materially misleading for exactly the release-gate scenario the doc prescribes.

  Fix:
  - Added `detectFormulaMode(scoreMap)` that takes countries with (a) `sum(domain.weight)` within 0.05 of 1.0 (complete response), AND (b) every dim at `coverage ≥ 0.9` (stable share math), and compares `|Σ contributions - overallScore|` against `CONTRIB_TOLERANCE` (default 1.5). If > 50% of ≥ 3 eligible countries drift, pillar combine is flagged.
  - Report emits a ⛔ blocker banner at top, a "Formula mode" line in the header, and a "Formula-mode diagnostic" section with the first three offenders. Under `STRICT=1` exits code 4.
  - Methodology doc updated: new "Fail-closed semantics" section, "Formula mode" operator guide, ENV table entries for STRICT + CONTRIB_TOLERANCE.

  Verified:
  - `tests/audit-cohort-formula-detection.test.mts` (NEW): 3 child-process smoke tests: missing-members banner + STRICT exit 3, all-clear exit 0, pillar-mode banner + STRICT exit 4. All pass.
  - `npx tsx --test tests/resilience-construct-invariants.test.mts tests/audit-cohort-formula-detection.test.mts`: 15 pass / 0 fail
  - `npm run test:data`: 6709 pass / 0 fail
  - `npm run typecheck` / `typecheck:api`: green
  - `npm run lint` / `lint:md`: no warnings on new / changed files (refactor split buildReport complexity from 51 → under 50 by extracting `renderCohortSection` + `renderDimCell`)
  - Fixture smoke: AE.importConcentration coverage-outlier and GCC.externalDebtCoverage saturated-high flags still fire correctly.

* fix(resilience): PR 0 review - fixture-mode source label, try/catch country-names, ASCII minus

  Addresses 3 P2 Greptile findings on #3369:

  1. **Misleading Source: line in fixture mode.** `FIXTURE_PATH` sets `API_BASE=''`, so the report header showed a bare "/api/..." path that never resolved, making a fixture run visually indistinguishable from a live run. Now surfaces `Source: fixture://<path>` in fixture mode.
  2. **`loadCountryNameMap` crashes without useful diagnostics.** A missing or unparseable `shared/country-names.json` produced a raw unhandled rejection. Now the read and the parse are each wrapped in their own try/catch; on either failure the script logs a developer-friendly warning and falls back to ISO-2 codes (report shows "AE" instead of "Uae"). Keeps the audit operable in CI-offline scenarios.
  3. **Unicode minus `−` (U+2212) instead of ASCII `-` in `fmtDelta`.** Downstream operators diff / grep / CSV-pipe the report; the Unicode minus breaks byte-level text tooling. Replaced with ASCII hyphen-minus. Left the U+2212 in the formula-mode diagnostic prose (`|Σ contributions − overallScore|`) where it's mathematical notation, not data.

  Verified
  - `npx tsx --test tests/audit-cohort-formula-detection.test.mts tests/resilience-construct-invariants.test.mts`: 15 pass / 0 fail
  - Fixture-mode run produces `Source: fixture://tests/fixtures/...`
  - Movers-table negative deltas now use ASCII `-`
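The formula-mode check described in the second fix can be illustrated with a minimal sketch. This is not the script's actual `detectFormulaMode`; the interfaces, the `detectFormulaModeSketch` name, the `'undetermined'` result for fewer than three eligible countries, and the inclusive boundary comparisons are all assumptions made for illustration. Only the thresholds (0.05 weight slack, coverage ≥ 0.9, tolerance 1.5, more than 50% of at least 3 eligible countries) come from the commit message.

```typescript
// Hypothetical shapes, modelled on the fields the test fixtures use.
interface Dimension { id: string; score: number; coverage: number }
interface Domain { id: string; weight: number; score: number; dimensions: Dimension[] }
interface ScoreDoc { countryCode: string; overallScore: number; domains: Domain[] }

type FormulaMode = 'domain-weighted' | 'pillar-combine' | 'undetermined';

const WEIGHT_SUM_SLACK = 0.05; // sum(domain.weight) must be within this of 1.0
const MIN_COVERAGE = 0.9;      // every dimension must reach this coverage
const MIN_ELIGIBLE = 3;        // fewer eligible countries: no verdict (assumption)

// A country is eligible when its response is complete and coverage is stable.
function isEligible(doc: ScoreDoc): boolean {
  const weightSum = doc.domains.reduce((acc, d) => acc + d.weight, 0);
  const completeResponse = Math.abs(weightSum - 1) <= WEIGHT_SUM_SLACK;
  const stableCoverage = doc.domains.every((d) =>
    d.dimensions.every((dim) => dim.coverage >= MIN_COVERAGE));
  return completeResponse && stableCoverage;
}

// Domain-weighted roll-up: sum(domain.score * domain.weight).
function contributionSum(doc: ScoreDoc): number {
  return doc.domains.reduce((acc, d) => acc + d.score * d.weight, 0);
}

function detectFormulaModeSketch(docs: ScoreDoc[], tolerance = 1.5): FormulaMode {
  const eligible = docs.filter(isEligible);
  if (eligible.length < MIN_ELIGIBLE) return 'undetermined';
  const drifting = eligible.filter(
    (doc) => Math.abs(contributionSum(doc) - doc.overallScore) > tolerance);
  return drifting.length / eligible.length > 0.5 ? 'pillar-combine' : 'domain-weighted';
}

// Usage: the same six domain weights as the test fixture, every score at 70.
const weights = [0.17, 0.15, 0.11, 0.19, 0.13, 0.25];
const makeDoc = (countryCode: string, overallScore: number): ScoreDoc => ({
  countryCode,
  overallScore,
  domains: weights.map((weight, i) => ({
    id: `domain-${i}`,
    weight,
    score: 70,
    dimensions: [{ id: `dim-${i}`, score: 70, coverage: 1.0 }],
  })),
});

const legacy = ['AE', 'SA', 'KW'].map((cc) => makeDoc(cc, 70));
const pillar = ['AE', 'SA', 'KW'].map((cc) => makeDoc(cc, 10));

console.log(detectFormulaModeSketch(legacy)); // domain-weighted
console.log(detectFormulaModeSketch(pillar)); // pillar-combine
```

With every domain score at 70 and weights summing to 1.0, the roll-up is about 70, so an `overallScore` of 70 stays inside the tolerance while an `overallScore` of 10 drifts by roughly 60 and trips the check.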
167 lines
8.1 KiB
TypeScript
// Smoke-tests for the fail-closed behaviour of
// `scripts/audit-resilience-cohorts.mjs`. Verifies:
//   (1) Missing cohort members produce a ⛔ banner at report top
//       and a dedicated "Fetch failures / missing members" section.
//   (2) STRICT=1 exits non-zero (code 3) when members are missing.
//   (3) Formula-mode detection correctly banners when pillar-combine
//       is active (Σ contributions ≠ overallScore for complete responses)
//       and correctly does NOT banner when contributions sum.
//
// The tests drive the script as a child process against synthetic
// fixtures so they exercise the full `main()` flow (report shape,
// exit codes, stderr logging) rather than just the pure helpers.

import assert from 'node:assert/strict';
import { describe, it } from 'node:test';
import { spawnSync } from 'node:child_process';
import path from 'node:path';
import fs from 'node:fs';
import os from 'node:os';
import { fileURLToPath } from 'node:url';

const __filename = fileURLToPath(import.meta.url);
const REPO_ROOT = path.resolve(path.dirname(__filename), '..');
const SCRIPT = path.join(REPO_ROOT, 'scripts', 'audit-resilience-cohorts.mjs');

function writeFixture(name: string, fixture: unknown): string {
  const tmpFile = path.join(os.tmpdir(), `audit-fixture-${name}-${process.pid}.json`);
  fs.writeFileSync(tmpFile, JSON.stringify(fixture));
  return tmpFile;
}

function runAudit(env: Record<string, string>): { status: number | null; stdout: string; stderr: string; report: string } {
  const outFile = path.join(os.tmpdir(), `audit-out-${Date.now()}-${Math.random().toString(36).slice(2)}.md`);
  const result = spawnSync('node', [SCRIPT], {
    env: { ...process.env, OUT: outFile, ...env },
    encoding: 'utf8',
  });
  let report = '';
  try { report = fs.readFileSync(outFile, 'utf8'); } catch { /* no report written */ }
  return {
    status: result.status,
    stdout: result.stdout ?? '',
    stderr: result.stderr ?? '',
    report,
  };
}

// Complete fixture: 57 cohort members so missing-member banner does NOT fire.
// Domain weights sum to 1.0 and coverage is 1.0 throughout.
// Σ contributions per country should land within CONTRIB_TOLERANCE of overall.
function buildCompleteFixture(options: { pillarMode?: boolean } = {}): unknown {
  const allCohortCodes = Array.from(new Set([
    'AE', 'SA', 'KW', 'QA', 'OM', 'BH',
    'FR', 'US', 'GB', 'JP', 'KR', 'DE', 'CA', 'FI', 'SE', 'BE',
    'SG', 'MY', 'TH', 'VN', 'ID', 'PH',
    'BR', 'MX', 'CO', 'VE', 'AR', 'EC',
    'NG', 'ZA', 'ET', 'KE', 'GH', 'CD', 'SD',
    'RU', 'KZ', 'AZ', 'UA', 'UZ', 'GE', 'AM',
    'LK', 'PK', 'LB', 'TR', 'EG', 'TN',
    'HK', 'NL', 'PA', 'LT',
    'NO',
    'YE', 'SY', 'SO', 'AF',
  ]));

  const buildDoc = (overallScore: number) => {
    const dimScore = overallScore;
    return {
      countryCode: 'XX',
      // When pillarMode=true we deliberately set overallScore to a value
      // that won't match Σ contributions (penalizedPillarScore semantics)
      // so the detector fires. coverage=1.0 across all dims keeps the
      // eligibility gate satisfied.
      overallScore: options.pillarMode ? 10 : overallScore,
      level: 'moderate',
      baselineScore: overallScore,
      stressScore: overallScore,
      stressFactor: 0.2,
      domains: [
        { id: 'economic', weight: 0.17, score: dimScore, dimensions: [
          { id: 'macroFiscal', score: dimScore, coverage: 1.0, observedWeight: 1, imputedWeight: 0, imputationClass: '' },
        ]},
        { id: 'infrastructure', weight: 0.15, score: dimScore, dimensions: [
          { id: 'infrastructure', score: dimScore, coverage: 1.0, observedWeight: 1, imputedWeight: 0, imputationClass: '' },
        ]},
        { id: 'energy', weight: 0.11, score: dimScore, dimensions: [
          { id: 'energy', score: dimScore, coverage: 1.0, observedWeight: 1, imputedWeight: 0, imputationClass: '' },
        ]},
        { id: 'social-governance', weight: 0.19, score: dimScore, dimensions: [
          { id: 'governanceInstitutional', score: dimScore, coverage: 1.0, observedWeight: 1, imputedWeight: 0, imputationClass: '' },
        ]},
        { id: 'health-food', weight: 0.13, score: dimScore, dimensions: [
          { id: 'healthPublicService', score: dimScore, coverage: 1.0, observedWeight: 1, imputedWeight: 0, imputationClass: '' },
        ]},
        { id: 'recovery', weight: 0.25, score: dimScore, dimensions: [
          { id: 'externalDebtCoverage', score: dimScore, coverage: 1.0, observedWeight: 1, imputedWeight: 0, imputationClass: '' },
        ]},
      ],
    };
  };

  const scores: Record<string, unknown> = {};
  for (const cc of allCohortCodes) {
    scores[cc] = { ...(buildDoc(70) as Record<string, unknown>), countryCode: cc };
  }
  const items = allCohortCodes.slice(0, 6).map((cc) => ({
    countryCode: cc, overallScore: 70, level: 'moderate', lowConfidence: false, overallCoverage: 1.0, rankStable: true,
  }));
  return { ranking: { items, greyedOut: [] }, scores };
}

describe('audit-resilience-cohorts fail-closed — missing cohort members', () => {
  it('banners the report when fixture omits cohort members AND exits 3 under STRICT=1', () => {
    // Minimal fixture intentionally omits almost every cohort member.
    const fixture = {
      ranking: { items: [
        { countryCode: 'AE', overallScore: 72.72, level: 'high', lowConfidence: false, overallCoverage: 0.88, rankStable: true },
      ], greyedOut: [] },
      scores: {
        AE: { countryCode: 'AE', overallScore: 72.72, level: 'high', baselineScore: 72, stressScore: 70, stressFactor: 0.15, domains: [
          { id: 'recovery', weight: 0.25, score: 50, dimensions: [
            { id: 'externalDebtCoverage', score: 100, coverage: 1.0, observedWeight: 1, imputedWeight: 0, imputationClass: '' },
          ]},
        ]},
      },
    };
    const fixturePath = writeFixture('missing-members', fixture);
    try {
      const result = runAudit({ FIXTURE: fixturePath, STRICT: '1' });
      assert.equal(result.status, 3, `expected STRICT exit code 3 for missing members; got ${result.status}; stderr=${result.stderr}`);
      assert.match(result.report, /⛔ \*\*Fetch failures \/ missing cohort members/, 'expected missing-members banner at report top');
      assert.match(result.report, /## Fetch failures \/ missing members/, 'expected dedicated Fetch-failures section');
      assert.match(result.report, /Cohort members with no score data:/, 'expected missing-members list');
    } finally {
      fs.unlinkSync(fixturePath);
    }
  });

  it('exits 0 under STRICT=1 when all cohort members present + formula matches', () => {
    const fixture = buildCompleteFixture({ pillarMode: false });
    const fixturePath = writeFixture('complete', fixture);
    try {
      const result = runAudit({ FIXTURE: fixturePath, STRICT: '1' });
      assert.equal(result.status, 0, `expected STRICT exit 0; got ${result.status}; stderr=${result.stderr}`);
      assert.doesNotMatch(result.report, /⛔ \*\*Fetch failures/, 'missing-members banner should NOT fire');
      assert.doesNotMatch(result.report, /⛔ \*\*Formula mode not supported/, 'formula-mode banner should NOT fire on legacy-formula response');
    } finally {
      fs.unlinkSync(fixturePath);
    }
  });
});

describe('audit-resilience-cohorts fail-closed — formula mode', () => {
  it('banners the report when Σ contributions diverges from overallScore AND exits 4 under STRICT=1', () => {
    const fixture = buildCompleteFixture({ pillarMode: true });
    const fixturePath = writeFixture('pillar-mode', fixture);
    try {
      const result = runAudit({ FIXTURE: fixturePath, STRICT: '1' });
      assert.equal(result.status, 4, `expected STRICT exit code 4 for formula mismatch; got ${result.status}; stderr=${result.stderr}`);
      assert.match(result.report, /⛔ \*\*Formula mode not supported/, 'expected formula-mode banner at report top');
      assert.match(result.report, /PILLAR-COMBINE \(decomposition invalid\)/, 'expected formula-mode line in header');
      assert.match(result.report, /## Formula-mode diagnostic/, 'expected dedicated formula-mode diagnostic section');
    } finally {
      fs.unlinkSync(fixturePath);
    }
  });
});
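
The tests above pin three `STRICT=1` exit codes: 0, 3, and 4. A release-prep wrapper could branch on them along the following lines. This is a sketch only: `classifyAuditExit` and the outcome labels are hypothetical names, and treating every other status (including `null`, which `spawnSync` reports when the child was killed by a signal) as `'unexpected'` is an assumption, not documented behaviour of the audit script.

```typescript
// Hypothetical outcome labels for the exit codes the tests assert on.
type AuditOutcome = 'ok' | 'missing-data' | 'formula-drift' | 'unexpected';

function classifyAuditExit(status: number | null): AuditOutcome {
  switch (status) {
    case 0:
      return 'ok';            // all cohort members present, formula matches
    case 3:
      return 'missing-data';  // fetch failure or missing cohort member
    case 4:
      return 'formula-drift'; // Σ contributions diverges from overallScore
    default:
      return 'unexpected';    // crash, signal (null), or an unmapped code
  }
}

console.log(classifyAuditExit(3)); // missing-data
console.log(classifyAuditExit(4)); // formula-drift
```

Keeping the mapping in one function means a caller can pass `spawnSync(...).status` straight through without special-casing `null`.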