mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-25 17:14:57 +02:00
* feat(resilience): PR 2 dimension wiring — split reserveAdequacy + add sovereignFiscalBuffer

Plan §3.4 follow-up to #3305 + #3319. Lands the scorer + dimension registration so the SWF seed from the Railway cron feeds a real score once the bake-in window closes. No weight rebalance yet (separate commit with Spearman sensitivity check), no health.js graduation yet (7-day ON_DEMAND window per feedback_health_required_key_needs_railway_cron_first.md), no bootstrap wiring yet (follow-up PR).

Shape of the change

Retirement:
- reserveAdequacy joins fuelStockDays in RESILIENCE_RETIRED_DIMENSIONS. The legacy scorer now mirrors scoreFuelStockDays: it returns coverage=0 / imputationClass=null, so the dimension is filtered out of the confidence / coverage averages via the registry filter in computeLowConfidence, computeOverallCoverage, and the widget's formatResilienceConfidence. It is kept in RESILIENCE_DIMENSION_ORDER for structural continuity (tests, cached payload shape, registry membership). Its indicator registry tier is demoted to 'experimental'.

Two new active dimensions:
- liquidReserveAdequacy (replaces the liquid-reserves half of the retired reserveAdequacy). Same source (WB FI.RES.TOTL.MO, total reserves in months of imports) but re-anchored to 1..12 months instead of 1..18. Twelve months ≈ the IMF "full reserve adequacy" benchmark for a diversified emerging-market importer; the tighter ceiling prevents wealthy commodity exporters from claiming outsized credit for on-paper reserve stocks that are not the relevant shock-absorption buffer.
- sovereignFiscalBuffer. Reads resilience:recovery:sovereign-wealth:v1 (populated by scripts/seed-sovereign-wealth.mjs, landed in #3305 + wired into Railway cron in #3319).
  Computes the saturating transform:
    effectiveMonths = Σ [ aum/annualImports × 12 × access × liquidity × transparency ]
    score = 100 × (1 − exp(−effectiveMonths / 12))
  Exponential saturation prevents Norway-type outliers (effective months in the 100s) from dominating the recovery pillar.
  Three code paths in scoreSovereignFiscalBuffer:
  1. Seed key absent entirely → IMPUTE.recoverySovereignFiscalBuffer (score 50 / coverage 0.3 / unmonitored). Covers the Railway-cron bake-in window before the first successful tick.
  2. Seed present, country NOT in manifest → score=0 with FULL coverage. This is substantive absence, NOT imputation, per plan §3.4 "What happens to no-SWF countries." 0 × weight = 0 in the numerator, so the country correctly scores lower than SWF-holding peers on this dim.
  3. Seed present, country in payload → saturating score, coverage derated by the partial-seed completeness signal (so a Mubadala or Temasek scrape drift on a multi-fund country shows up as lower confidence rather than a silently understated total).

Indicator registry:
- Demoted recoveryReserveMonths (tied to retired reserveAdequacy) to tier='experimental'.
- Added recoveryLiquidReserveMonths: WB FI.RES.TOTL.MO, anchors 1..12, tier='core', coverage=188.
- Added recoverySovereignWealthEffectiveMonths: the new SWF signal, tier='experimental' for now because the manifest only has 8 funds (below the 180-core / 137-§3.6-gate threshold). Graduating to 'core' requires expanding the manifest past ~137 entries, which is a later PR.

Tests updated
- resilience-release-gate: 19→21 dim count; the RETIRED_DIMENSIONS allowlist now includes reserveAdequacy alongside fuelStockDays.
- resilience-dimension-scorers: scoreReserveAdequacy monotonicity + "high reserves score well" tests migrated to scoreLiquidReserveAdequacy (same source, new 1..12 anchor). New retirement-shape test for scoreReserveAdequacy mirroring the PR 3 fuelStockDays retirement test.
  Four new scorer tests pin the three code paths of scoreSovereignFiscalBuffer (absent seed / no-SWF country / SWF country / partial-completeness derate).
- resilience-scorers fixture: baseline 60.12→60.35, recovery-domain flat mean 47.33→48.75, overall 63.27→63.6. Each number is commented with its driver (the split adds liquidReserveAdequacy 18@1.0 + sovereignFiscalBuffer 50@0.3 at IMPUTE; the retired reserveAdequacy drops out).
- resilience-dimension-monotonicity: targets scoreLiquidReserveAdequacy instead of scoreReserveAdequacy.
- resilience-handlers: response-shape dim count 19→21.
- resilience-indicator-registry: coverage 19→21 dimensions.
- resilience-dimension-freshness: allowlisted the new sovereign-wealth seed-meta key in KNOWN_SEEDS_NOT_IN_HEALTH for the ON_DEMAND window.
- resilience-methodology-lint HEADING_TO_DIMENSION: added the two new heading mappings. The methodology doc gets H4 sections for Liquid Reserve Adequacy and Sovereign Fiscal Buffer; the Reserve Adequacy section is annotated as retired.
- resilience-retired-dimensions-parity: client-side RESILIENCE_RETIRED_DIMENSION_IDS gets reserveAdequacy. The parser is upgraded to strip inline `// …` comments from the array body so a future reviewer can drop a rationale next to an entry without breaking parity.
- resilience-confidence-averaging: fixture updated to include both retired dims (reserveAdequacy + fuelStockDays); this confirms the registry filter correctly excludes BOTH from the visible coverage reading.

Extraction harness (scripts/compare-resilience-current-vs-proposed.mjs):
- recoveryLiquidReserveMonths: reads the same reserve-adequacy seed field as recoveryReserveMonths.
- recoverySovereignWealthEffectiveMonths: reads the new SWF seed key on field totalEffectiveMonths. An absent payload maps to 0 for correlation math (matches the substantive-no-SWF scorer branch).

Out of scope for this commit (follow-ups)
- Recovery-domain weight rebalance + Spearman sensitivity rerun against the PR 0 baseline.
- health.js graduation (SEED_META entry + ON_DEMAND_KEYS removal) once Railway cron has ~7 days of clean runs.
- api/bootstrap.js wiring once an RPC consumer needs the SWF data.
- Manifest expansion past 137 countries so sovereignFiscalBuffer can graduate from tier='experimental' to tier='core'.

Tests: 6573/6573 data-tier tests pass. Typecheck clean on both tsconfig configs. Biome clean on all touched files.

* fix(resilience): PR 2 review — add widget labels for new dimensions

P2 review finding on PR #3324. DIMENSION_LABELS in src/components/resilience-widget-utils.ts covered only the old 19 dimension IDs, so the two new active dims (liquidReserveAdequacy, sovereignFiscalBuffer) would have rendered with their raw internal IDs in the confidence grid for every country once the scorer started emitting them. The widget test at getResilienceDimensionLabel also asserted only the 19-label set, so the gap would have shipped silently.

Fix: add user-facing short labels for both new dims. "Reserves" is already claimed by the retired reserveAdequacy, so the replacement disambiguates with "Liquid Reserves"; sovereignFiscalBuffer → "Sovereign Wealth" per the methodology doc H4 heading.

Also added a regression guard: a new test asserts that EVERY id in RESILIENCE_DIMENSION_ORDER resolves to a non-id label. Any future dimension that ships without a matching DIMENSION_LABELS entry now fails CI loudly instead of leaking the ID into the UI.

Tests: 502/502 resilience tests pass (+1 new coverage check). Typecheck clean on both configs.

* fix(resilience): PR 2 review — remove dead IMPUTE.recoveryReserveAdequacy entry

Greptile P2: the retired scoreReserveAdequacy stub no longer reads from IMPUTE (it hardcodes coverage=0 / imputationClass=null per the retirement pattern), making IMPUTE.recoveryReserveAdequacy dead code. Removed the entry + added a breadcrumb comment pointing at the replacement IMPUTE.recoveryLiquidReserveAdequacy.
The second P2 (bootstrap.js not wired) is a deliberate non-goal: the reviewer explicitly flags it "for visibility" since it's tracked in the PR body. No action in this commit; bootstrap wiring lands alongside the SEED_META graduation after the ~7-day Railway-cron bake-in.

Tests: 502/502 resilience tests still pass. Typecheck clean.
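The commit message above describes the retirement mechanism: a retired scorer returns coverage=0, and the registry filter drops it before coverage is averaged. The minimal TypeScript sketch below illustrates that mechanism. It is not the code of computeOverallCoverage. The `DimensionCoverage` shape, the `overallCoverageSketch` name and the use of an unweighted mean are assumptions. Only the two retired ids and the 1.0 / 0.3 coverage values come from the commit message.

```typescript
// Hypothetical per-dimension result; real scorer output carries more fields.
interface DimensionCoverage {
  id: string;
  coverage: number; // 0..1
}

// The two retired dimension ids named in the commit message.
const RETIRED_DIMENSION_IDS: ReadonlySet<string> = new Set(['fuelStockDays', 'reserveAdequacy']);

// Average coverage over active dimensions only. The unweighted mean is an
// assumption for illustration; the real helper may weight dimensions.
function overallCoverageSketch(
  dims: readonly DimensionCoverage[],
  retired: ReadonlySet<string> = RETIRED_DIMENSION_IDS,
): number {
  const active = dims.filter((d) => !retired.has(d.id));
  if (active.length === 0) return 0;
  return active.reduce((sum, d) => sum + d.coverage, 0) / active.length;
}

const example: DimensionCoverage[] = [
  { id: 'liquidReserveAdequacy', coverage: 1.0 },
  { id: 'sovereignFiscalBuffer', coverage: 0.3 }, // IMPUTE during bake-in
  { id: 'reserveAdequacy', coverage: 0 }, // retired stub
  { id: 'fuelStockDays', coverage: 0 }, // retired stub
];

console.log(overallCoverageSketch(example)); // filtered mean
console.log(overallCoverageSketch(example, new Set<string>())); // naive mean, no filter
```

With the retired rows filtered out, the mean is (1.0 + 0.3) / 2 = 0.65. Averaging all four rows naively gives 0.325, which is the understatement the registry filter avoids.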
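The re-anchoring from 1..18 to 1..12 months can be illustrated if we assume that the scorer maps reserve months onto 0..100 with a linear, clamped goalpost normalization. The registry's `goalposts.worst` / `goalposts.best` pair is consistent with this, but the commit does not state the exact formula. `goalpostScore`, `Goalposts` and the two anchor constants are hypothetical names.

```typescript
interface Goalposts {
  worst: number;
  best: number;
}

// Assumed normalization: linear between the goalposts, clamped to [0, 100].
// Written for higherBetter indicators such as reserve months of imports.
function goalpostScore(value: number, { worst, best }: Goalposts): number {
  const t = (value - worst) / (best - worst);
  return 100 * Math.min(1, Math.max(0, t));
}

// Anchors from the commit message: the retired reserveAdequacy used 1..18,
// liquidReserveAdequacy uses 1..12 (months of imports).
const LEGACY_RESERVE_ANCHORS: Goalposts = { worst: 1, best: 18 };
const LIQUID_RESERVE_ANCHORS: Goalposts = { worst: 1, best: 12 };

for (const months of [3, 6, 12, 18, 30]) {
  const legacy = goalpostScore(months, LEGACY_RESERVE_ANCHORS).toFixed(1);
  const liquid = goalpostScore(months, LIQUID_RESERVE_ANCHORS).toFixed(1);
  console.log(`${months} months: legacy ${legacy}, liquid ${liquid}`);
}
```

Under this assumption, on the old anchors a country holding 18 months of reserves scores 100 and one holding 12 months scores about 64.7. On the new anchors both score 100. This is the "no outsized credit" effect the commit describes. Mid-range values rise as well (6 months: about 29.4 → 45.5).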
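The saturating transform and the three code paths of scoreSovereignFiscalBuffer can be combined into one sketch. This is a minimal illustration, not the repository's implementation. The payload shape (`SwfFund`, `SwfCountryEntry`, a `completeness` fraction in [0, 1] that directly scales coverage), the `DimensionScore` type and the function names are hypothetical. Only the formula, the 50 / 0.3 imputation values and the score-0-with-full-coverage rule come from the commit message.

```typescript
// Hypothetical seed shapes; the real payload under
// resilience:recovery:sovereign-wealth:v1 may be structured differently.
interface SwfFund {
  aumUsd: number; // assets under management
  access: number; // 0..1 haircut: how readily the state can draw on it
  liquidity: number; // 0..1 haircut: share of holdings that is liquid
  transparency: number; // 0..1 haircut: disclosure quality
}

interface SwfCountryEntry {
  annualImportsUsd: number;
  funds: SwfFund[];
  completeness: number; // assumed 0..1 share of manifest funds scraped OK
}

type SwfPayload = Record<string, SwfCountryEntry>;

interface DimensionScore {
  score: number;
  coverage: number;
  imputationClass: 'unmonitored' | null;
}

// effectiveMonths = Σ [ aum/annualImports × 12 × access × liquidity × transparency ]
function effectiveMonths(entry: SwfCountryEntry): number {
  return entry.funds.reduce(
    (sum, f) =>
      sum + (f.aumUsd / entry.annualImportsUsd) * 12 * f.access * f.liquidity * f.transparency,
    0,
  );
}

// score = 100 × (1 − exp(−effectiveMonths / 12)); approaches 100 asymptotically,
// so buffers beyond a few years of imports add almost nothing.
function saturatingScore(months: number): number {
  return 100 * (1 - Math.exp(-months / 12));
}

function scoreSovereignFiscalBufferSketch(
  iso2: string,
  payload: SwfPayload | null,
): DimensionScore {
  // Path 1: seed key absent (cron bake-in) -> imputed 50 @ 0.3 coverage.
  if (payload == null) {
    return { score: 50, coverage: 0.3, imputationClass: 'unmonitored' };
  }
  const entry = payload[iso2];
  // Path 2: seed present but the country has no SWF -> real zero, full coverage.
  if (entry == null) {
    return { score: 0, coverage: 1, imputationClass: null };
  }
  // Path 3: saturating score; coverage derated by scrape completeness.
  return {
    score: saturatingScore(effectiveMonths(entry)),
    coverage: Math.min(1, Math.max(0, entry.completeness)),
    imputationClass: null,
  };
}
```

Take a single fund whose AUM equals one year of imports and which has no haircuts. Then effectiveMonths is 12 and the score is 100 × (1 − e⁻¹) ≈ 63.2. At 300 effective months (a Norway-type outlier) the score is effectively 100, so extra AUM barely moves the recovery pillar.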
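The parity-test parser change strips inline `// …` comments from the array body before reading its entries. It might look like the following hypothetical sketch. `parseArrayLiteralIds` and its regex-based approach are assumptions, not the repository's parser. The sketch handles only `//` line comments and single- or double-quoted entries, and a `]` inside a comment would still end the array match early.

```typescript
// Hypothetical helper: pull the quoted ids out of `const NAME = [ ... ]` in a
// source file, dropping `// ...` line comments from the array body first so
// that rationale comments (which may contain quotes) are not parsed as ids.
function parseArrayLiteralIds(source: string, constName: string): string[] {
  const declaration = new RegExp(`const\\s+${constName}[^=]*=\\s*\\[([\\s\\S]*?)\\]`);
  const match = source.match(declaration);
  if (!match) throw new Error(`${constName} array literal not found`);
  const body = (match[1] ?? '')
    .split('\n')
    .map((line) => line.replace(/\/\/.*$/, '')) // strip inline comments
    .join('\n');
  return Array.from(body.matchAll(/['"]([^'"]+)['"]/g), (m) => m[1] ?? '');
}

const clientSource = `
export const RESILIENCE_RETIRED_DIMENSION_IDS: readonly string[] = [
  'fuelStockDays', // don't count toward coverage
  'reserveAdequacy', // split into liquidReserveAdequacy + sovereignFiscalBuffer
];
`;

console.log(parseArrayLiteralIds(clientSource, 'RESILIENCE_RETIRED_DIMENSION_IDS'));
```

Without the comment-stripping step, the apostrophe in "don't" would shift the quote pairing. The parser would then return a garbage entry and silently drop `reserveAdequacy`.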
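The label regression guard added in the widget fix reduces to one check: no dimension id may fall through to itself as its display label. The miniature version below is hypothetical. It uses a three-entry order and label map (only the three label strings come from the commit message) and assumes a lookup that falls back to the raw id.

```typescript
// Hypothetical miniature: the real order has 21 ids and the real label map
// lives in src/components/resilience-widget-utils.ts.
const DIMENSION_ORDER_SAMPLE = ['reserveAdequacy', 'liquidReserveAdequacy', 'sovereignFiscalBuffer'];

const DIMENSION_LABELS_SAMPLE: Readonly<Record<string, string>> = {
  reserveAdequacy: 'Reserves',
  liquidReserveAdequacy: 'Liquid Reserves',
  sovereignFiscalBuffer: 'Sovereign Wealth',
};

// Assumed lookup behavior: an unknown id falls back to the raw id itself,
// which is exactly the UI leak the guard is meant to catch.
function labelFor(id: string): string {
  return DIMENSION_LABELS_SAMPLE[id] ?? id;
}

// Guard: report every id whose label is just the id.
function findUnlabelledIds(order: readonly string[]): string[] {
  return order.filter((id) => labelFor(id) === id);
}

console.log(findUnlabelledIds(DIMENSION_ORDER_SAMPLE));
console.log(findUnlabelledIds([...DIMENSION_ORDER_SAMPLE, 'newDim']));
```

If an id without a label (here the hypothetical `newDim`) is appended, `findUnlabelledIds` returns it. Asserting on an empty result is what turns a missing label into a CI failure.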
109 lines
5.3 KiB
TypeScript
import assert from 'node:assert/strict';
import { describe, it } from 'node:test';

import { RESILIENCE_DIMENSION_ORDER } from '../server/worldmonitor/resilience/v1/_dimension-scorers.ts';
import { INDICATOR_REGISTRY } from '../server/worldmonitor/resilience/v1/_indicator-registry.ts';
import type { IndicatorSpec } from '../server/worldmonitor/resilience/v1/_indicator-registry.ts';

describe('indicator registry', () => {
  it('covers all 21 dimensions (19 active + 2 retired)', () => {
    const coveredDimensions = new Set(INDICATOR_REGISTRY.map((i) => i.dimension));
    for (const dimId of RESILIENCE_DIMENSION_ORDER) {
      assert.ok(coveredDimensions.has(dimId), `${dimId} has no indicators in registry`);
    }
    assert.equal(coveredDimensions.size, 21);
  });

  it('has no duplicate indicator ids', () => {
    const ids = INDICATOR_REGISTRY.map((i) => i.id);
    const unique = new Set(ids);
    assert.equal(ids.length, unique.size, `duplicate ids: ${ids.filter((id, idx) => ids.indexOf(id) !== idx).join(', ')}`);
  });

  it('every indicator has valid direction and positive weight', () => {
    for (const spec of INDICATOR_REGISTRY) {
      assert.ok(['higherBetter', 'lowerBetter'].includes(spec.direction), `${spec.id} has invalid direction: ${spec.direction}`);
      assert.ok(spec.weight > 0, `${spec.id} has non-positive weight: ${spec.weight}`);
    }
  });

  it('every indicator has valid cadence and scope', () => {
    const validCadences = new Set(['realtime', 'daily', 'weekly', 'monthly', 'quarterly', 'annual']);
    const validScopes = new Set(['global', 'curated']);
    for (const spec of INDICATOR_REGISTRY) {
      assert.ok(validCadences.has(spec.cadence), `${spec.id} has invalid cadence: ${spec.cadence}`);
      assert.ok(validScopes.has(spec.scope), `${spec.id} has invalid scope: ${spec.scope}`);
    }
  });

  it('goalposts worst != best for every indicator', () => {
    for (const spec of INDICATOR_REGISTRY) {
      assert.notEqual(spec.goalposts.worst, spec.goalposts.best, `${spec.id} has worst === best (${spec.goalposts.worst})`);
    }
  });

  it('imputation entries have valid type, score in [0,100], certainty in (0,1]', () => {
    const withImputation = INDICATOR_REGISTRY.filter((i): i is IndicatorSpec & { imputation: NonNullable<IndicatorSpec['imputation']> } => i.imputation != null);
    assert.ok(withImputation.length > 0, 'expected at least one indicator with imputation');
    for (const spec of withImputation) {
      assert.ok(['absenceSignal', 'conservative'].includes(spec.imputation.type), `${spec.id} has invalid imputation type`);
      assert.ok(spec.imputation.score >= 0 && spec.imputation.score <= 100, `${spec.id} imputation score out of range`);
      assert.ok(spec.imputation.certainty > 0 && spec.imputation.certainty <= 1, `${spec.id} imputation certainty out of range`);
    }
  });

  it('every dimension has non-experimental weights that sum to ~1.0', () => {
    // Weight-sum invariant applies to the CURRENTLY-ACTIVE indicator
    // set only. Indicators at tier='experimental' are flag-gated
    // / in-progress work (e.g. the PR 1 v2 energy construct lands
    // behind RESILIENCE_ENERGY_V2_ENABLED; until the flag flips,
    // these indicators are NOT part of the live score and their
    // weights must not be counted against the 1.0 invariant).
    const byDimension = new Map<string, IndicatorSpec[]>();
    for (const spec of INDICATOR_REGISTRY) {
      if (spec.tier === 'experimental') continue;
      const list = byDimension.get(spec.dimension) ?? [];
      list.push(spec);
      byDimension.set(spec.dimension, list);
    }
    for (const [dimId, specs] of byDimension) {
      const totalWeight = specs.reduce((sum, s) => sum + s.weight, 0);
      assert.ok(
        Math.abs(totalWeight - 1) < 0.01,
        `${dimId} non-experimental weights sum to ${totalWeight.toFixed(4)}, expected ~1.0`,
      );
    }
  });

  it('experimental weights are bounded at or below 1.0 per dimension', () => {
    // Loose invariant for experimental indicators. A dimension's
    // experimental set may only carry PART of the post-promotion
    // weight — if some legacy indicators are RETAINED across the
    // construct-repair (e.g. PR 1 retains energyPriceStress at a
    // different weight and renames gasStorageStress to
    // euGasStorageStress, both already in the non-experimental set),
    // the experimental-only subsum will be < 1.0.
    //
    // Post-promotion weight-sum correctness for flag-gated indicator
    // sets is the SCORER's responsibility to verify (via the flag-on
    // behavioural tests in resilience-energy-v2.test.mts), not the
    // registry's. This test enforces only the upper bound: no
    // dimension should accumulate experimental weight in excess of
    // the total it will eventually ship under the flag.
    const byDimension = new Map<string, IndicatorSpec[]>();
    for (const spec of INDICATOR_REGISTRY) {
      if (spec.tier !== 'experimental') continue;
      const list = byDimension.get(spec.dimension) ?? [];
      list.push(spec);
      byDimension.set(spec.dimension, list);
    }
    for (const [dimId, specs] of byDimension) {
      const experimentalWeight = specs.reduce((sum, s) => sum + s.weight, 0);
      assert.ok(
        experimentalWeight <= 1.0 + 0.01,
        `${dimId} experimental weights sum to ${experimentalWeight.toFixed(4)}, must not exceed 1.0`,
      );
    }
  });
});