feat(resilience): PR 2 §3.4 recovery-domain weight rebalance (#3328)

* feat(resilience): PR 2 §3.4 recovery-domain weight rebalance

  Dials the two PR 2 §3.4 recovery dims (liquidReserveAdequacy,
  sovereignFiscalBuffer) to ~10% share each of the recovery-domain
  score via a new per-dimension weight channel in the coverage-weighted
  mean. Matches the plan's direction that the sovereign-wealth signal
  complement, rather than dominate, the classical liquid-reserves and
  fiscal-space signals.

  Implementation

  - RESILIENCE_DIMENSION_WEIGHTS: new
    Record<ResilienceDimensionId, number> alongside
    RESILIENCE_DOMAIN_WEIGHTS. Every dim has an explicit entry
    (default 1.0) so rebalance decisions stay auditable; the two new
    recovery dims carry 0.5 each.

    Share math at full coverage (6 active recovery dims):

      weight sum          = 4 × 1.0 + 2 × 0.5 = 5.0
      each new-dim share  = 0.5 / 5.0 = 0.10 ✓
      each core-dim share = 1.0 / 5.0 = 0.20

    Retired dims (reserveAdequacy, fuelStockDays) keep weight 1.0 in
    the map; their coverage=0 neutralizes them at the coverage channel
    regardless. Explicit entries guard against a future scorer bug
    accidentally returning coverage>0 for a retired dim and falling
    through the `?? 1.0` default, so every retirement decision is now
    tied to a single explicit source of truth.

  - coverageWeightedMean (_shared.ts): refactored to apply
    `coverage × dimWeight` per dim instead of `coverage` alone.
    Backward-compatible when all weights default to 1.0 (reduces to
    the original mean). All three aggregation callers
    (buildDomainList, baselineScore, stressScore) pick up the
    weighting transparently.

  Test coverage

  1. New `tests/resilience-recovery-weight-rebalance.test.mts`: pins
     the per-dim weight values, asserts the share math (0.10 new /
     0.20 core), verifies completeness of the weight map, and
     documents why retired dims stay in the map at 1.0.
  2. New `tests/resilience-recovery-ordering.test.mts`: fixture-based
     Spearman-proxy sensitivity check.
     Asserts NO > US > YE ordering preserved on both the overall score
     and the recovery-domain subscore after the rebalance. (Live
     post-merge Spearman rerun against the PR 0 snapshot is tracked as
     a follow-up commit.)
  3. resilience-scorers.test.mts fixture anchors updated in lockstep:

       baselineScore:      60.35 → 62.17 (low-scoring
                           liquidReserveAdequacy + partial-coverage SWF
                           now contribute ~half the weight)
       overallScore:       63.60 → 64.39 (recovery subscore lifts by
                           ~3 pts from the rebalance, overall by ~0.79)
       recovery flat mean: 48.75 (unchanged; flat mean doesn't apply
                           weights by design; documents the
                           coverage-weighted diff)

     Local coverageWeightedMean helper in the test mirrors the
     production implementation (weights applied per dim).

  Methodology doc

  - New "Per-dimension weights in the recovery domain" subsection with
    the weight table and a sentence explaining the cap.
    Cross-references the source of truth
    (RESILIENCE_DIMENSION_WEIGHTS).

  Deliberate non-goals

  - Live post-merge Spearman ≥0.85 check against the PR 0 baseline
    snapshot. Fixture ordering is preserved (new ordering test); the
    live-data check runs after Railway cron refreshes the rankings on
    the new weights and commits
    docs/snapshots/resilience-ranking-live-post-pr2-<date>.json.
    Tracked as the final piece of PR 2 §3.4 alongside the health.js /
    bootstrap graduation (waiting on the 7-day Railway cron bake-in
    window).

  Tests: 6588/6588 data-tier tests pass. Typecheck clean on both
  tsconfig configs. Biome clean on touched files. NO > US > YE fixture
  ordering preserved.

* fix(resilience): PR 2 review - thread RESILIENCE_DIMENSION_WEIGHTS through the comparison harness

  Greptile P2: the operator comparison harness
  (scripts/compare-resilience-current-vs-proposed.mjs) claims its
  domain scores "mirror the production scorer's coverage-weighted
  mean" and is the artifact generator for Spearman / rank-delta
  acceptance decisions.
  After PR 2 §3.4's weight rebalance, the production mirror diverged:
  production now applies RESILIENCE_DIMENSION_WEIGHTS
  (liquidReserveAdequacy = 0.5, sovereignFiscalBuffer = 0.5) inside
  coverageWeightedMean, but the harness still used equal-weight
  aggregation. Left unfixed, post-merge Spearman / rank-delta
  diagnostics would compare live API scores (with the 0.5 recovery
  weights) against harness predictions that assume equal-share dims,
  silently biasing every acceptance decision until someone noticed a
  country's rank-delta didn't track.

  Fix

  - Mirrored coverageWeightedMean now accepts dimensionWeights and
    applies `coverage × weight` per dim, matching _shared.ts exactly.
  - Mirrored buildDomainList accepts + forwards dimensionWeights.
  - main() imports RESILIENCE_DIMENSION_WEIGHTS from the scorer module
    and passes it through to buildDomainList at the single call site.
  - Missing-entry default = 1.0 (same contract as production), which
    makes the harness forward-compatible with any future weight
    refactor (if a refactor adds a new dim without an explicit entry,
    the production fallback path still produces the correct number).

  Verification

  - Harness syntax-check clean (node -c).
  - RESILIENCE_DIMENSION_WEIGHTS import resolves correctly from the
    harness's import path.
  - 509/509 resilience tests still pass (harness isn't in the test
    suite; the invariant is that production ↔ harness use the same
    math, and the production side is covered by
    tests/resilience-recovery-weight-rebalance.test.mts).

* fix(resilience): PR 2 review - bump cache prefixes v10→v11 + document coverage-vs-weight asymmetry

  Greptile P1 + P2 on PR #3328.

  P1 - cache prefix not bumped after formula change
  --------------------------------------------------
  The per-dim weight rebalance changes the score formula, but the
  `_formula` tag only distinguishes 'd6' vs 'pc' (legacy 6-domain vs
  pillar-combined); it does NOT detect intra-'d6' weight changes.
  Left unfixed, scores cached before deploy would be served with the
  old equal-weight math for up to the full 6h TTL, and the ranking key
  for up to its 12h TTL. Matches the established v9→v10 pattern for
  every prior formula-changing deploy.

  Bumped in lockstep:

  - RESILIENCE_SCORE_CACHE_PREFIX: v10 → v11
  - RESILIENCE_RANKING_CACHE_KEY: v10 → v11
  - RESILIENCE_HISTORY_KEY_PREFIX: v5 → v6
  - scripts/seed-resilience-scores.mjs local mirrors
  - api/health.js resilienceRanking literal
  - 4 analysis/backtest scripts that read the cached keys directly
  - Test fixtures in resilience-{ranking, handlers, scores-seed,
    pillar-aggregation}.test.* that assert on literal key values

  The v5→v6 history bump is the critical one: without it,
  pre-rebalance history points would mix with post-rebalance points
  inside the 30-day window, and change30d / trend math would diff
  values from different formulas against each other, producing
  false-negative "falling" trends for every country across the deploy
  window.

  P2 - coverage-vs-weight asymmetry in computeLowConfidence / computeOverallCoverage
  ----------------------------------------------------------------------------------
  Reviewer flagged that these two functions still average coverage
  equally across all non-retired dims, even after the scoring
  aggregation started applying RESILIENCE_DIMENSION_WEIGHTS. The
  asymmetry is INTENTIONAL: these signals answer a different question
  from scoring:

    scoring aggregation: "how much does each dim matter to the score?"
    coverage signal:     "how much real data do we have on this country?"

  A dim at weight 0.5 still has the same data-availability footprint
  as a weight=1.0 dim: its coverage value reflects whether we
  successfully fetched the upstream source, not whether the scorer
  cares about it. Applying scoring weights to the coverage signal
  would let a half-weight dim hide half its sparsity from the
  overallCoverage pill, misleading users reading coverage as a
  data-quality indicator.
  Added explicit comments to both functions noting the asymmetry is
  deliberate and pointing at the other site for matching rationale.
  No code change, just documentation.

  Tests: 6588/6588 data-tier tests pass (+511 resilience-specific
  including the prefix-literal assertions). Typecheck clean on both
  tsconfig configs. Biome clean on touched files.

* docs(resilience): bump methodology doc cache-prefix references to v11/v6

  Greptile P2 on PR #3328: Redis keys table in the reproducibility
  appendix still published `score:v10` / `ranking:v10` / `history:v5`,
  and the rollback instructions told operators to flush those keys.
  After the recovery-domain weight rebalance, live cache runs at
  `score:v11` / `ranking:v11` / `history:v6`.

  - Updated the Redis keys table (lines 490-492) to match `_shared.ts`.
  - Updated the rollback block to name the current keys.
  - Left the historical "Activation sequence" narrative intact (it
    accurately describes the pillar-combine PR's v9→v10 / v4→v5 bump)
    but added a parenthetical pointing at the current v11/v6 values.

  No code change: doc-only correction for operator accuracy.

* fix(docs): escape MDX-unsafe `<137` pattern to unblock Mintlify deploy

  Line 643 had `(<137 countries)`. MDX parses `<137` as a JSX tag
  starting with digit `1`, which is illegal and breaks the deploy with
  "Unexpected character \`1\` (U+0031) before name". Surfaced after
  the prior cache-prefix commit forced Mintlify to re-parse this file.

  Replaced with "fewer than 137 countries" for unambiguous rendering.
  Other `<` occurrences in this doc (lines 34, 642) are followed by
  whitespace and don't trip MDX's tag parser.
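
The `coverage × dimWeight` aggregation and the share math described in the first commit can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in `_shared.ts`: the `DimensionScore` shape, the `effectiveShare` helper, the sample scores, and the zero-total fallback are hypothetical. Only the weight values and the `?? 1.0` missing-entry default are taken from the message.

```typescript
// Hypothetical, simplified shape; the real scorer types are not shown in this commit.
type DimensionScore = { id: string; score: number; coverage: number };
type WeightMap = Partial<Record<string, number>>;

// Weight values as quoted in the commit message.
const DIMENSION_WEIGHTS: WeightMap = {
  fiscalSpace: 1.0,
  externalDebtCoverage: 1.0,
  importConcentration: 1.0,
  stateContinuity: 1.0,
  liquidReserveAdequacy: 0.5,
  sovereignFiscalBuffer: 0.5,
  reserveAdequacy: 1.0, // retired: neutralized by coverage=0, not by weight
  fuelStockDays: 1.0, // retired: same
};

// Effective weight of one dim = coverage × per-dimension weight (missing entry → 1.0).
function effectiveWeight(dim: DimensionScore, weights: WeightMap): number {
  return dim.coverage * (weights[dim.id] ?? 1.0);
}

// Coverage-weighted mean with per-dimension weights.
// Returning 0 when no dim has any effective weight is an assumption of this sketch.
function coverageWeightedMean(
  dims: DimensionScore[],
  weights: WeightMap = DIMENSION_WEIGHTS,
): number {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const dim of dims) {
    const w = effectiveWeight(dim, weights);
    weightedSum += dim.score * w;
    totalWeight += w;
  }
  return totalWeight > 0 ? weightedSum / totalWeight : 0;
}

// Share of the domain score attributable to one dim (hypothetical helper).
function effectiveShare(
  dims: DimensionScore[],
  id: string,
  weights: WeightMap = DIMENSION_WEIGHTS,
): number {
  const total = dims.reduce((sum, d) => sum + effectiveWeight(d, weights), 0);
  const target = dims.find((d) => d.id === id);
  return target && total > 0 ? effectiveWeight(target, weights) / total : 0;
}

// Made-up scores: four core dims at 60, two new dims at 30, one retired dim.
const sampleDims: DimensionScore[] = [
  { id: "fiscalSpace", score: 60, coverage: 1 },
  { id: "externalDebtCoverage", score: 60, coverage: 1 },
  { id: "importConcentration", score: 60, coverage: 1 },
  { id: "stateContinuity", score: 60, coverage: 1 },
  { id: "liquidReserveAdequacy", score: 30, coverage: 1 },
  { id: "sovereignFiscalBuffer", score: 30, coverage: 1 },
  { id: "reserveAdequacy", score: 50, coverage: 0 },
];
```

On this made-up sample, `coverageWeightedMean(sampleDims)` gives 54, while passing an empty weight map (every dim falls back to 1.0) gives the original coverage-weighted mean of 50. `effectiveShare` returns 0.10 for each new dim and 0.20 for each core dim, and the retired dim contributes nothing because its coverage is 0.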
2026-04-25 17:14:57 +02:00 · 2026-04-23 10:25:18 +04:00
parent fe0e13b99e
commit d3d406448a
17 changed files with 463 additions and 91 deletions
--- a/docs/methodology/country-resilience-index.mdx
+++ b/docs/methodology/country-resilience-index.mdx
@@ -227,6 +227,31 @@ All six WGI indicators are equally weighted.

 This domain forms the recovery-capacity pillar. It measures a country's ability to bounce back from an acute shock along fiscal, monetary, trade, institutional, and energy dimensions.

+**Per-dimension weights in the recovery domain (PR 2 §3.4).** Four
+core recovery dimensions (`fiscalSpace`, `externalDebtCoverage`,
+`importConcentration`, `stateContinuity`) carry the default weight
+`1.0`. The two PR 2 §3.4 replacements for the retired `reserveAdequacy`
+carry weight `0.5` each:
+
+| Dimension | Weight | Share at full coverage |
+|---|---:|---:|
+| fiscalSpace | 1.0 | 20% |
+| externalDebtCoverage | 1.0 | 20% |
+| importConcentration | 1.0 | 20% |
+| stateContinuity | 1.0 | 20% |
+| liquidReserveAdequacy | 0.5 | 10% |
+| sovereignFiscalBuffer | 0.5 | 10% |
+
+The `0.5` weight on the two new dims caps their combined contribution
+to the recovery score at ~20%, matching the plan's direction that the
+sovereign-wealth signal complement — rather than dominate — the
+classical liquid-reserves and fiscal-space signals. The weights are
+applied via `RESILIENCE_DIMENSION_WEIGHTS` in
+`server/worldmonitor/resilience/v1/_dimension-scorers.ts`;
+`coverageWeightedMean` in `_shared.ts` multiplies each dim's coverage
+by its weight before computing the domain average, so a dim with
+`coverage=0` (retirement) still contributes zero regardless of weight.
+
 #### Fiscal Space

 | Indicator | Description | Direction | Goalposts (worst-best) | Weight | Source | Cadence |
@@ -462,9 +487,9 @@ The CRI is designed to be auditable end-to-end: given the Redis snapshot at any

 | Key | Type | TTL | Written by | Read by |
 |---|---|---|---|---|
-| `resilience:score:v10:{countryCode}` | JSON | 6 hours | `buildResilienceScore` in `server/worldmonitor/resilience/v1/_shared.ts` | `getResilienceScore` handler |
-| `resilience:ranking:v10` | JSON | 6 hours | `buildResilienceRanking`, only when all countries are scored | `getResilienceRanking` handler |
-| `resilience:history:v5:{countryCode}` | sorted set | indefinite, trimmed to 30 days | `appendHistory` during scoring | trend and `change30d` computation |
+| `resilience:score:v11:{countryCode}` | JSON | 6 hours | `buildResilienceScore` in `server/worldmonitor/resilience/v1/_shared.ts` | `getResilienceScore` handler |
+| `resilience:ranking:v11` | JSON | 6 hours | `buildResilienceRanking`, only when all countries are scored | `getResilienceRanking` handler |
+| `resilience:history:v6:{countryCode}` | sorted set | indefinite, trimmed to 30 days | `appendHistory` during scoring | trend and `change30d` computation |
 | `resilience:intervals:v1:{countryCode}` | JSON | 6 hours | `scripts/seed-resilience-intervals.mjs` | `getResilienceScore` (optional `scoreInterval` field) |
 | `seed-meta:resilience:static` | JSON | 2 hours | `scripts/seed-resilience-static.mjs` at the end of each successful seed run | scorer for `dataVersion` population, health checks |
 | `resilience:static:{countryCode}` | JSON | 400 days | `scripts/seed-resilience-static.mjs` | scorer for all baseline signals (WGI, WHO, FAO, GPI, RSF, and so on) |
@@ -573,12 +598,12 @@ The plan's non-compensatory pillar combine is the methodologically stronger form
 **Activation sequence**: the rank-stability evidence supports flipping the default — there is no statistical reason to keep the legacy compensatory form. The blocker is messaging: publishing "US = 54.50" the day after publishing "US = 68.26" without a methodology note would look like a regression instead of a rigor upgrade. The pillar-combine activation PR wires the following so the flip is a single env-var change with no code deploy required:

 1. **Feature flag**: `RESILIENCE_PILLAR_COMBINE_ENABLED`, read dynamically from `process.env` per call. Default `false`. Set to `true` in Vercel env + Railway env to activate.
-2. **Cache invalidation**: per-country score cache bumped from `resilience:score:v9:` to `resilience:score:v10:`, ranking cache bumped from `resilience:ranking:v9` to `resilience:ranking:v10`, and score-history bumped from `resilience:history:v4:` to `resilience:history:v5:`. The version bumps are a clean-slate guard; the actual cross-formula isolation is the `_formula` tag written into every cached score / ranking payload and the `:d6` / `:pc` suffix on every history sorted-set member, checked at read time so a flag flip forces a rebuild without waiting for TTLs.
+2. **Cache invalidation**: per-country score cache bumped from `resilience:score:v9:` to `resilience:score:v10:`, ranking cache bumped from `resilience:ranking:v9` to `resilience:ranking:v10`, and score-history bumped from `resilience:history:v4:` to `resilience:history:v5:` (subsequently bumped to `resilience:score:v11:`, `resilience:ranking:v11`, and `resilience:history:v6:` in the recovery-domain weight rebalance — see the Redis keys table above for current values). The version bumps are a clean-slate guard; the actual cross-formula isolation is the `_formula` tag written into every cached score / ranking payload and the `:d6` / `:pc` suffix on every history sorted-set member, checked at read time so a flag flip forces a rebuild without waiting for TTLs.
 3. **Methodology-aware level thresholds**: `classifyResilienceLevel` reads `isPillarCombineEnabled()` and switches the high/medium cutoffs from 70/40 (6-domain) to 60/30 (pillar-combined). Without this, scale compression alone would demote FI (75.64 → 68.60) and NZ (76.26 → 67.93) from "high" to "medium" purely because the formula changed, not because anything about the country changed. The re-anchored cutoffs preserve the qualitative label for every country whose old label was correct.
 4. **Re-anchored release-gate bands**: `tests/resilience-pillar-combine-activation.test.mts` pins high-band anchors (NO, CH, DK) at ≥ 60 (vs the 6-domain formula's ≥ 70 floor) and low-band anchors (YE, SO) at ≤ 40 (vs ≤ 45). The snapshot test reads `methodologyFormula` from each snapshot and applies the matching bands. The live sample numbers confirm the bands hold with margin: NO proposed ≈ 71.59 (≥ 60 by 11 points), YE ≈ 27.36 (≤ 40 by 13 points).
 5. **Projected snapshot**: `docs/snapshots/resilience-ranking-pillar-combined-projected-2026-04-21.json` carries the top/bottom/major-economies tables at the proposed formula so reviewers can preview the post-activation ranking before flipping the flag. Once the flag is on in production, run `scripts/freeze-resilience-ranking.mjs` to capture the authoritative full-universe snapshot.

-Rollback: set `RESILIENCE_PILLAR_COMBINE_ENABLED=false`, flush the `resilience:score:v10:*`, `resilience:ranking:v10`, and `resilience:history:v5:*` keys (or wait for TTLs to expire). The 6-domain formula lives alongside the pillar combine in `_shared.ts` and needs no code change to come back.
+Rollback: set `RESILIENCE_PILLAR_COMBINE_ENABLED=false`, flush the current `resilience:score:v11:*`, `resilience:ranking:v11`, and `resilience:history:v6:*` keys (or wait for TTLs to expire). The 6-domain formula lives alongside the pillar combine in `_shared.ts` and needs no code change to come back.

 Until operators set the flag, `overall_score` remains the 6-domain weighted aggregate documented above.

@@ -615,7 +640,7 @@ Self-assessed against the standard composite-indicator review axes on a 0-10 sca
 - **§3.5 point 1 — `fuelStockDays` permanently retired from the core score.** IEA/EIA fuel-stock disclosure covers ~45 OECD-member countries; every other country was imputed `unmonitored`. `scoreFuelStockDays` now pins at `score=50, coverage=0, imputationClass=null` for every country. Coverage-weighted domain aggregation excludes it (coverage=0 contributes zero weight), and user-facing confidence / coverage averages exclude it via the `RESILIENCE_RETIRED_DIMENSIONS` registry filter (distinct from non-retired runtime coverage=0 entries, which must keep dragging confidence down — that is the sparse-data signal). `imputationClass=null` (not `source-failure`) because retirement is structural, not a runtime outage; `source-failure` would render a false "Source down" label in the widget on every country. The `recoveryFuelStockDays` registry entry remains (tier=`experimental`) so the data surfaces on IEA-member drill-downs. Re-retention requires a globally-comparable strategic-reserve disclosure concept (>180 countries) to emerge.
 - **§3.5 point 2 — `currencyExternal` rebuilt on IMF inflation + WB reserves.** BIS REER / DSR covered only the 64 BIS-reporting economies; the old composite fell through to curated_list_absent (coverage 0.3) or a thin IMF proxy (coverage 0.45) for ~130 of 195 countries. New dimension: `inflationStability` (IMF WEO headline inflation, weight 0.60) + `fxReservesAdequacy` (WB reserves in months, weight 0.40). Coverage ladder: both=0.85, inflation-only=0.55, reserves-only=0.40, neither=0.30. Legacy `fxVolatility` + `fxDeviation` kept as `tier='experimental'` on country drill-downs for the 64 BIS economies.
 - **§3.5 point 3 — `externalDebtCoverage` re-goalposted from (0..5) to (0..2).** The old goalpost made ratios < 0.5 all score above 90, saturating at 100 across the full 9-country probe (including stressed states). New goalpost is anchored on Greenspan-Guidotti: ratio=1.0 (short-term debt matches reserves = reserve inadequacy threshold) → score 50; ratio=2.0 (double the threshold = acute rollover-shock exposure) → score 0. Ratios above 2.0 clamp to 0.
- **§3.6 — Coverage-and-influence gate on indicator weight.** `tests/resilience-coverage-influence-gate.test.mts` fails the build if any core indicator with observed coverage below 70% of the ~195-country universe (<137 countries) carries more than 5% nominal weight in the overall score. The effective-influence half (variance-explained, Pearson-derivative) runs through `scripts/validate-resilience-sensitivity.mjs` and is committed as an artifact per plan §5 acceptance-criterion 9.
+- **§3.6 — Coverage-and-influence gate on indicator weight.** `tests/resilience-coverage-influence-gate.test.mts` fails the build if any core indicator with observed coverage below 70% of the ~195-country universe (fewer than 137 countries) carries more than 5% nominal weight in the overall score. The effective-influence half (variance-explained, Pearson-derivative) runs through `scripts/validate-resilience-sensitivity.mjs` and is committed as an artifact per plan §5 acceptance-criterion 9.
 - **Acceptance gates (plan §6):** Spearman vs prior-state >= 0.85, no country swings >5 points from PR 1 state (plan §3.5 deliverable row 4), all release-gate anchors hold, matched-pair directions verified. Sensitivity rerun and post-PR-3 snapshot committed as `docs/snapshots/resilience-ranking-live-post-pr3-<date>.json` at flag-flip/ranking-refresh time.
 - **Construct-audit updates:** `docs/methodology/indicator-sources.yaml` updates `recoveryDebtToReserves.constructStatus` from `dead-signal` to `observed-mechanism` citing the Greenspan-Guidotti anchor.
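
The `externalDebtCoverage` re-goalposting in §3.5 point 3 can be checked with a small sketch. It assumes a linear, lower-is-better interpolation between the goalposts with clamping to 0..100. That form is an assumption, chosen because it reproduces the anchor points quoted above (ratio 1.0 → 50 and ratio 2.0 → 0 on the new scale, ratio 0.5 → 90 on the old one). The helper name `scoreLowerIsBetter` is hypothetical and is not the scorer's actual function.

```typescript
// Hypothetical helper: linear goalpost normalization for a lower-is-better
// indicator, clamped to the 0..100 score range.
function scoreLowerIsBetter(value: number, best: number, worst: number): number {
  const raw = (100 * (worst - value)) / (worst - best);
  return Math.min(100, Math.max(0, raw));
}

// Old goalposts (0..5): a ratio of 0.5 already scores 90, so ratios below 0.5 saturate near the top.
const oldScoreAtHalf = scoreLowerIsBetter(0.5, 0, 5);

// New goalposts (0..2), anchored on the Greenspan-Guidotti threshold at ratio = 1.0.
const newScoreAtThreshold = scoreLowerIsBetter(1.0, 0, 2); // threshold → mid-scale
const newScoreAtDouble = scoreLowerIsBetter(2.0, 0, 2); // double the threshold → floor
const newScoreAboveRange = scoreLowerIsBetter(3.0, 0, 2); // beyond the worst goalpost → clamped
```

Under these assumptions the old scale gives 90 at ratio 0.5, while the new scale gives 50 at ratio 1.0 and 0 at ratios of 2.0 and above. This matches the anchors stated in the bullet.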