mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-25 17:14:57 +02:00
* feat(resilience): PR 2 dimension wiring: split reserveAdequacy + add sovereignFiscalBuffer

Plan §3.4 follow-up to #3305 + #3319. Lands the scorer + dimension registration so the SWF seed from the Railway cron feeds a real score once the bake-in window closes. No weight rebalance yet (separate commit with Spearman sensitivity check), no health.js graduation yet (7-day ON_DEMAND window per feedback_health_required_key_needs_railway_cron_first.md), no bootstrap wiring yet (follow-up PR).

Shape of the change

Retirement:
- reserveAdequacy joins fuelStockDays in RESILIENCE_RETIRED_DIMENSIONS. The legacy scorer now mirrors scoreFuelStockDays: returns coverage=0 / imputationClass=null so the dimension is filtered out of the confidence / coverage averages via the registry filter in computeLowConfidence, computeOverallCoverage, and the widget's formatResilienceConfidence. Kept in RESILIENCE_DIMENSION_ORDER for structural continuity (tests, cached payload shape, registry membership). Indicator registry tier demoted to 'experimental'.

Two new active dimensions:
- liquidReserveAdequacy (replaces the liquid-reserves half of the retired reserveAdequacy). Same source (WB FI.RES.TOTL.MO, total reserves in months of imports) but re-anchored 1..12 months instead of 1..18. Twelve months ≈ IMF "full reserve adequacy" benchmark for a diversified emerging-market importer; the tighter ceiling prevents wealthy commodity-exporters from claiming outsized credit for on-paper reserve stocks that are not the relevant shock-absorption buffer.
- sovereignFiscalBuffer. Reads resilience:recovery:sovereign-wealth:v1 (populated by scripts/seed-sovereign-wealth.mjs, landed in #3305 + wired into Railway cron in #3319). Computes the saturating transform:

    effectiveMonths = Σ [ aum/annualImports × 12 × access × liquidity × transparency ]
    score = 100 × (1 − exp(−effectiveMonths / 12))

  Exponential saturation prevents Norway-type outliers (effective months in the 100s) from dominating the recovery pillar.

Three code paths in scoreSovereignFiscalBuffer:
1. Seed key absent entirely → IMPUTE.recoverySovereignFiscalBuffer (score 50 / coverage 0.3 / unmonitored). Covers the Railway-cron bake-in window before the first successful tick.
2. Seed present, country NOT in manifest → score=0 with FULL coverage. Substantive absence, NOT imputation, per plan §3.4 "What happens to no-SWF countries." 0 × weight = 0 in the numerator, so the country correctly scores lower than SWF-holding peers on this dim.
3. Seed present, country in payload → saturating score, coverage derated by the partial-seed completeness signal (so a Mubadala or Temasek scrape drift on a multi-fund country shows up as lower confidence rather than a silently-understated total).

Indicator registry:
- Demoted recoveryReserveMonths (tied to retired reserveAdequacy) to tier='experimental'.
- Added recoveryLiquidReserveMonths: WB FI.RES.TOTL.MO, anchors 1..12, tier='core', coverage=188.
- Added recoverySovereignWealthEffectiveMonths: the new SWF signal, tier='experimental' for now because the manifest only has 8 funds (below the 180-core / 137-§3.6-gate threshold). Graduating to 'core' requires expanding the manifest past ~137 entries (a later PR).

Tests updated
- resilience-release-gate: 19→21 dim count; RETIRED_DIMENSIONS allow-list now includes reserveAdequacy alongside fuelStockDays.
- resilience-dimension-scorers: scoreReserveAdequacy monotonicity + "high reserves score well" tests migrated to scoreLiquidReserveAdequacy (same source, new 1..12 anchor). New retirement-shape test for scoreReserveAdequacy mirroring the PR 3 fuelStockDays retirement test. Four new scorer tests pin the three code paths of scoreSovereignFiscalBuffer (absent seed / no-SWF country / SWF country / partial-completeness derate).
- resilience-scorers fixture: baseline 60.12→60.35, recovery-domain flat mean 47.33→48.75, overall 63.27→63.6. Each number commented with the driver (split adds liquidReserveAdequacy 18@1.0 + sovereignFiscalBuffer 50@0.3 at IMPUTE; retired reserveAdequacy drops out).
- resilience-dimension-monotonicity: target scoreLiquidReserveAdequacy instead of scoreReserveAdequacy.
- resilience-handlers: response-shape dim count 19→21.
- resilience-indicator-registry: coverage 19→21 dimensions.
- resilience-dimension-freshness: allowlisted the new sovereign-wealth seed-meta key in KNOWN_SEEDS_NOT_IN_HEALTH for the ON_DEMAND window.
- resilience-methodology-lint HEADING_TO_DIMENSION: added the two new heading mappings. Methodology doc gets H4 sections for Liquid Reserve Adequacy and Sovereign Fiscal Buffer; Reserve Adequacy section is annotated as retired.
- resilience-retired-dimensions-parity: client-side RESILIENCE_RETIRED_DIMENSION_IDS gets reserveAdequacy. Parser upgraded to strip inline `// …` comments from the array body so a future reviewer can drop a rationale next to an entry without breaking parity.
- resilience-confidence-averaging: fixture updated to include both retired dims (reserveAdequacy + fuelStockDays); confirms the registry filter correctly excludes BOTH from the visible coverage reading.

Extraction harness (scripts/compare-resilience-current-vs-proposed.mjs):
- recoveryLiquidReserveMonths: reads the same reserve-adequacy seed field as recoveryReserveMonths.
- recoverySovereignWealthEffectiveMonths: reads the new SWF seed key on field totalEffectiveMonths. Absent-payload → 0 for correlation math (matches the substantive-no-SWF scorer branch).

Out of scope for this commit (follow-ups)
- Recovery-domain weight rebalance + Spearman sensitivity rerun against the PR 0 baseline.
- health.js graduation (SEED_META entry + ON_DEMAND_KEYS removal) once Railway cron has ~7 days of clean runs.
- api/bootstrap.js wiring once an RPC consumer needs the SWF data.
- Manifest expansion past 137 countries so sovereignFiscalBuffer can graduate from tier='experimental' to tier='core'.

Tests: 6573/6573 data-tier tests pass. Typecheck clean on both tsconfig configs. Biome clean on all touched files.

* fix(resilience): PR 2 review: add widget labels for new dimensions

P2 review finding on PR #3324. DIMENSION_LABELS in src/components/resilience-widget-utils.ts covered only the old 19 dimension IDs, so the two new active dims (liquidReserveAdequacy, sovereignFiscalBuffer) would render with their raw internal IDs in the confidence grid for every country once the scorer started emitting them. The widget test at getResilienceDimensionLabel also asserted only the 19-label set, so the gap would have shipped silently.

Fix: add user-facing short labels for both new dims. "Reserves" is already claimed by the retired reserveAdequacy, so the replacement disambiguates with "Liquid Reserves"; sovereignFiscalBuffer → "Sovereign Wealth" per the methodology doc H4 heading.

Also added a regression guard: a new test asserts EVERY id in RESILIENCE_DIMENSION_ORDER resolves to a non-id label. Any future dimension that ships without a matching DIMENSION_LABELS entry now fails CI loudly instead of leaking the ID into the UI.

Tests: 502/502 resilience tests pass (+1 new coverage check). Typecheck clean on both configs.

* fix(resilience): PR 2 review: remove dead IMPUTE.recoveryReserveAdequacy entry

Greptile P2: the retired scoreReserveAdequacy stub no longer reads from IMPUTE (it hardcodes coverage=0 / imputationClass=null per the retirement pattern), making IMPUTE.recoveryReserveAdequacy dead code. Removed the entry + added a breadcrumb comment pointing at the replacement IMPUTE.recoveryLiquidReserveAdequacy.

The second P2 (bootstrap.js not wired) is a deliberate non-goal; the reviewer explicitly flags it "for visibility" since it's tracked in the PR body. No action this commit; bootstrap wiring lands alongside the SEED_META graduation after the ~7-day Railway-cron bake-in.

Tests: 502/502 resilience tests still pass. Typecheck clean.
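The saturating transform and the three scorer branches described in the first commit can be sketched roughly as follows. This is an illustration only, not the repository's scoreSovereignFiscalBuffer: the type names, field names, and the choice to report the completeness signal directly as coverage are all assumptions made for the example. Only the formula and the three branch outcomes (50 / 0.3 imputed, 0 with full coverage, saturating score) come from the commit message.

```typescript
// Hypothetical shapes: the real seed payload layout is not shown in the source.
interface SketchFund {
  aumUsd: number;
  access: number; // 0..1 haircut
  liquidity: number; // 0..1 haircut
  transparency: number; // 0..1 haircut
}

interface SketchCountryEntry {
  funds: SketchFund[];
  annualImportsUsd: number;
  completeness: number; // 0..1, share of the country's funds scraped successfully (assumed)
}

type SketchSeed = Record<string, SketchCountryEntry | undefined>;

interface SketchScore {
  score: number;
  coverage: number;
  imputed: boolean;
}

// effectiveMonths = Σ [ aum/annualImports × 12 × access × liquidity × transparency ]
function sketchEffectiveMonths(entry: SketchCountryEntry): number {
  if (entry.annualImportsUsd <= 0) return 0; // guard added for the sketch
  return entry.funds.reduce(
    (sum, f) =>
      sum + (f.aumUsd / entry.annualImportsUsd) * 12 * f.access * f.liquidity * f.transparency,
    0,
  );
}

// score = 100 × (1 − exp(−effectiveMonths / 12)); approaches 100 but never exceeds it.
function sketchSaturate(effectiveMonths: number): number {
  return 100 * (1 - Math.exp(-effectiveMonths / 12));
}

function sketchSovereignFiscalBuffer(seed: SketchSeed | null, iso2: string): SketchScore {
  // Path 1: seed key absent entirely, so fall back to the imputed default.
  if (seed === null) return { score: 50, coverage: 0.3, imputed: true };

  // Path 2: seed present but country not in the manifest: a substantive zero.
  const entry = seed[iso2];
  if (entry === undefined) return { score: 0, coverage: 1, imputed: false };

  // Path 3: saturating score; coverage derated by completeness (assumed to map 1:1).
  return {
    score: sketchSaturate(sketchEffectiveMonths(entry)),
    coverage: entry.completeness,
    imputed: false,
  };
}
```

Under this sketch, 12 effective months scores about 63, and an outlier with 120 effective months scores just under 100 rather than ten times as much, which is the outlier-damping behaviour the commit message describes.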
144 lines
5.8 KiB
TypeScript
import assert from 'node:assert/strict';
import { describe, it } from 'node:test';

import {
  computeLowConfidence,
  computeOverallCoverage,
} from '../server/worldmonitor/resilience/v1/_shared';
import type {
  GetResilienceScoreResponse,
  ResilienceDimension,
} from '../src/generated/server/worldmonitor/resilience/v1/service_server';

// PR 3 §3.5 follow-up (reviewer P1): the retired dimension (fuelStockDays,
// post-retirement) returns coverage=0 structurally and contributes zero
// weight to the domain score via coverageWeightedMean. The user-facing
// confidence/coverage averages must exclude retired dims; otherwise
// the retirement silently drags the reported averageCoverage down for
// every country even though the dimension is not part of the score.
//
// Reviewer anchor: on the US profile, including retired dims gave
// averageCoverage=0.8105 vs 0.8556 when retired dims are excluded,
// enough drift to misclassify edge countries as lowConfidence and to
// shift the widget's overallCoverage pill for the whole ranking.
//
// Critical invariant: the filter is keyed on the retired-dim REGISTRY,
// not on `coverage === 0`. Non-retired dimensions can legitimately
// emit coverage=0 on genuinely sparse-data countries via weightedBlend
// fall-through, and those entries MUST continue to drag confidence
// down: that is the sparse-data signal lowConfidence exists to
// surface. A too-aggressive `coverage === 0` filter would hide the
// sparsity and e.g. let South Sudan pass as full-confidence.

function dim(id: string, coverage: number): ResilienceDimension {
  return {
    id,
    score: 50,
    coverage,
    observedWeight: coverage > 0 ? 1 : 0,
    imputedWeight: 0,
    imputationClass: '',
    freshness: { lastObservedAtMs: '0', staleness: '' },
  };
}

describe('computeOverallCoverage: retired-dim exclusion', () => {
  it('excludes retired dimensions from the average', () => {
    const response = {
      domains: [
        {
          id: 'recovery',
          dimensions: [
            dim('fiscalSpace', 0.9),
            dim('liquidReserveAdequacy', 0.8), // active replacement for reserveAdequacy
            // Retired dims contribute coverage=0 in real payloads; both
            // must be filtered out so the visible coverage reading
            // tracks only the active dims.
            dim('reserveAdequacy', 0), // retired in PR 2 §3.4
            dim('fuelStockDays', 0), // retired in PR 3 §3.5
          ],
        },
      ],
    } as unknown as GetResilienceScoreResponse;

    // (0.9 + 0.8) / 2 = 0.85: only the two active dims count.
    // With retired included the flat mean would be
    // (0.9 + 0.8 + 0 + 0) / 4 = 0.425, the regression shape.
    assert.equal(computeOverallCoverage(response).toFixed(4), '0.8500');
  });

  it('keeps NON-retired coverage=0 dims in the average (sparse-data signal)', () => {
    // A genuinely sparse-data country can emit coverage=0 on non-retired
    // dims via weightedBlend fall-through. Those entries must stay in
    // the average so sparse countries still surface as low confidence
    // via the flat mean path.
    const response = {
      domains: [
        {
          id: 'economic',
          dimensions: [
            dim('macroFiscal', 0.9),
            // NON-retired coverage=0: represents genuine data sparsity.
            dim('currencyExternal', 0),
          ],
        },
      ],
    } as unknown as GetResilienceScoreResponse;

    // (0.9 + 0) / 2 = 0.45. If the filter were keyed on coverage=0,
    // the genuine sparsity would be hidden and this would be 0.9.
    assert.equal(computeOverallCoverage(response).toFixed(4), '0.4500');
  });

  it('returns 0 when ALL dims are retired (degenerate case)', () => {
    const response = {
      domains: [
        { id: 'recovery', dimensions: [dim('fuelStockDays', 0)] },
      ],
    } as unknown as GetResilienceScoreResponse;
    assert.equal(computeOverallCoverage(response), 0);
  });
});

describe('computeLowConfidence: retired-dim exclusion', () => {
  it('does not flip lowConfidence purely on retired-dim drag', () => {
    // Three active dims at 0.72 = 0.72 mean (above the low-confidence
    // threshold). A single retired dim at coverage=0 must not flip the
    // flag by dragging the flat mean below the threshold; that was
    // the regression on the US profile.
    const dims = [
      dim('fiscalSpace', 0.72),
      // liquidReserveAdequacy, not reserveAdequacy: the latter is itself
      // retired as of PR 2 §3.4 and would be filtered out, leaving only
      // two active dims in this fixture.
      dim('liquidReserveAdequacy', 0.72),
      dim('externalDebtCoverage', 0.72),
      dim('fuelStockDays', 0), // retired
    ];
    assert.equal(computeLowConfidence(dims, 0), false,
      'retired fuelStockDays must not flip lowConfidence for an otherwise well-covered country');
  });

  it('DOES flip lowConfidence for non-retired coverage=0 dims (sparse data)', () => {
    // A sparse-data country: multiple non-retired dims at coverage=0
    // via weightedBlend fall-through. The flat mean drops below the
    // threshold and the flag must fire: this is the sparse-data
    // signal lowConfidence exists to surface. A too-aggressive filter
    // on coverage=0 would hide this.
    const dims = [
      dim('macroFiscal', 0.9),
      dim('currencyExternal', 0), // non-retired coverage=0
      dim('tradeSanctions', 0), // non-retired coverage=0
      dim('cyberDigital', 0), // non-retired coverage=0
    ];
    assert.equal(computeLowConfidence(dims, 0), true,
      'non-retired coverage=0 dims must drag lowConfidence down; that is the sparse-data signal');
  });

  it('respects the imputationShare threshold independently', () => {
    // Imputation-share check is a separate arm of the OR; retired-dim
    // filtering must not suppress a legitimate high-imputation-share
    // trigger.
    const dims = [dim('fiscalSpace', 0.95)];
    assert.equal(computeLowConfidence(dims, 0.6), true,
      'imputationShare > 0.4 must flip lowConfidence even when coverage looks strong');
  });
});
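
The invariant these tests pin (filter on the retired-dimension registry, never on `coverage === 0`) can be illustrated with a minimal sketch. It is not the code in `_shared`: the `sketch*` names, the flat `SketchDim` shape, and the 0.55 coverage threshold are assumptions made for the example. The 0.4 imputation-share threshold is taken from the assertion message above, and the two retired IDs from the commit message.

```typescript
// Hypothetical minimal dimension shape for the sketch.
interface SketchDim {
  id: string;
  coverage: number;
}

// Retired IDs named in the commit message; the real registry lives elsewhere.
const SKETCH_RETIRED_DIMENSIONS = new Set<string>(['fuelStockDays', 'reserveAdequacy']);

// Flat mean over active dims only. Membership in the registry decides
// exclusion, so a non-retired dim at coverage=0 still pulls the mean down.
function sketchMeanCoverage(dims: SketchDim[]): number {
  const active = dims.filter((d) => !SKETCH_RETIRED_DIMENSIONS.has(d.id));
  if (active.length === 0) return 0; // degenerate case: everything retired
  return active.reduce((sum, d) => sum + d.coverage, 0) / active.length;
}

// Two independent arms joined by OR. The coverage threshold default is an
// assumed placeholder, not the project's actual value.
function sketchLowConfidence(
  dims: SketchDim[],
  imputationShare: number,
  coverageThreshold = 0.55,
  imputationThreshold = 0.4,
): boolean {
  return sketchMeanCoverage(dims) < coverageThreshold || imputationShare > imputationThreshold;
}
```

Keying the filter on IDs keeps the two meanings of zero coverage separate: a retired dimension is structurally absent from the score, while a live dimension at zero is a real data gap that should lower confidence.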