eliott/worldmonitor

mirror of https://github.com/koala73/worldmonitor.git synced 2026-04-25 17:14:57 +02:00

Author	SHA1	Message	Date
Elie Habib	d3d406448a	feat(resilience): PR 2 §3.4 recovery-domain weight rebalance (#3328 ) * feat(resilience): PR 2 §3.4 recovery-domain weight rebalance Dials the two PR 2 §3.4 recovery dims (liquidReserveAdequacy, sovereignFiscalBuffer) to ~10% share each of the recovery-domain score via a new per-dimension weight channel in the coverage-weighted mean. Matches the plan's direction that the sovereign-wealth signal complement — rather than dominate — the classical liquid-reserves and fiscal-space signals. Implementation - RESILIENCE_DIMENSION_WEIGHTS: new Record<ResilienceDimensionId, number> alongside RESILIENCE_DOMAIN_WEIGHTS. Every dim has an explicit entry (default 1.0) so rebalance decisions stay auditable; the two new recovery dims carry 0.5 each. Share math at full coverage (6 active recovery dims): weight sum = 4 × 1.0 + 2 × 0.5 = 5.0 each new-dim share = 0.5 / 5.0 = 0.10 ✓ each core-dim share = 1.0 / 5.0 = 0.20 Retired dims (reserveAdequacy, fuelStockDays) keep weight 1.0 in the map; their coverage=0 neutralizes them at the coverage channel regardless. Explicit entries guard against a future scorer bug accidentally returning coverage>0 for a retired dim and falling through the `?? 1.0` default — every retirement decision is now tied to a single explicit source of truth. - coverageWeightedMean (_shared.ts): refactored to apply `coverage × dimWeight` per dim instead of `coverage` alone. Backward- compatible when all weights default to 1.0 (reduces to the original mean). All three aggregation callers — buildDomainList, baseline- Score, stressScore — pick up the weighting transparently. Test coverage 1. New `tests/resilience-recovery-weight-rebalance.test.mts`: pins the per-dim weight values, asserts the share math (0.10 new / 0.20 core), verifies completeness of the weight map, and documents why retired dims stay in the map at 1.0. 2. New `tests/resilience-recovery-ordering.test.mts`: fixture-based Spearman-proxy sensitivity check. 
Asserts NO > US > YE ordering preserved on both the overall score and the recovery-domain subscore after the rebalance. (Live post-merge Spearman rerun against the PR 0 snapshot is tracked as a follow-up commit.) 3. resilience-scorers.test.mts fixture anchors updated in lockstep: baselineScore: 60.35 → 62.17 (low-scoring liquidReserveAdequacy + partial-coverage SWF now contribute ~half the weight) overallScore: 63.60 → 64.39 (recovery subscore lifts by ~3 pts from the rebalance, overall by ~0.79) recovery flat mean: 48.75 (unchanged — flat mean doesn't apply weights by design; documents the coverage-weighted diff) Local coverageWeightedMean helper in the test mirrors the production implementation (weights applied per dim). Methodology doc - New "Per-dimension weights in the recovery domain" subsection with the weight table and a sentence explaining the cap. Cross-references the source of truth (RESILIENCE_DIMENSION_WEIGHTS). Deliberate non-goals - Live post-merge Spearman ≥0.85 check against the PR 0 baseline snapshot. Fixture ordering is preserved (new ordering test); the live-data check runs after Railway cron refreshes the rankings on the new weights and commits docs/snapshots/resilience-ranking-live- post-pr2-<date>.json. Tracked as the final piece of PR 2 §3.4 alongside the health.js / bootstrap graduation (waiting on the 7-day Railway cron bake-in window). Tests: 6588/6588 data-tier tests pass. Typecheck clean on both tsconfig configs. Biome clean on touched files. NO > US > YE fixture ordering preserved. * fix(resilience): PR 2 review — thread RESILIENCE_DIMENSION_WEIGHTS through the comparison harness Greptile P2: the operator comparison harness (scripts/compare-resilience-current-vs-proposed.mjs) claims its domain scores "mirror the production scorer's coverage-weighted mean" and is the artifact generator for Spearman / rank-delta acceptance decisions. 
After PR 2 §3.4's weight rebalance, the production mirror diverged — production now applies RESILIENCE_DIMENSION_WEIGHTS (liquidReserveAdequacy = 0.5, sovereignFiscalBuffer = 0.5) inside coverageWeightedMean, but the harness still used equal-weight aggregation. Left unfixed, post-merge Spearman / rank-delta diagnostics would compare live API scores (with the 0.5 recovery weights) against harness predictions that assume equal-share dims — silently biasing every acceptance decision until someone noticed a country's rank- delta didn't track. Fix - Mirrored coverageWeightedMean now accepts dimensionWeights and applies `coverage × weight` per dim, matching _shared.ts exactly. - Mirrored buildDomainList accepts + forwards dimensionWeights. - main() imports RESILIENCE_DIMENSION_WEIGHTS from the scorer module and passes it through to buildDomainList at the single call site. - Missing-entry default = 1.0 (same contract as production) — makes the harness forward-compatible with any future weight refactor (adds a new dim without an explicit entry, old production fallback path still produces the correct number). Verification - Harness syntax-check clean (node -c). - RESILIENCE_DIMENSION_WEIGHTS import resolves correctly from the harness's import path. - 509/509 resilience tests still pass (harness isn't in the test suite; the invariant is that production ↔ harness use the same math, and the production side is covered by tests/resilience- recovery-weight-rebalance.test.mts). * fix(resilience): PR 2 review — bump cache prefixes v10→v11 + document coverage-vs-weight asymmetry Greptile P1 + P2 on PR #3328. P1 — cache prefix not bumped after formula change -------------------------------------------------- The per-dim weight rebalance changes the score formula, but the `_formula` tag only distinguishes 'd6' vs 'pc' (pillar-combined vs legacy 6-domain) — it does NOT detect intra-'d6' weight changes. 
Left unfixed, scores cached before deploy would be served with the old equal-weight math for up to the full 6h TTL, and the ranking key for up to its 12h TTL. Matches the established v9→v10 pattern for every prior formula-changing deploy. Bumped in lockstep: - RESILIENCE_SCORE_CACHE_PREFIX: v10 → v11 - RESILIENCE_RANKING_CACHE_KEY: v10 → v11 - RESILIENCE_HISTORY_KEY_PREFIX: v5 → v6 - scripts/seed-resilience-scores.mjs local mirrors - api/health.js resilienceRanking literal - 4 analysis/backtest scripts that read the cached keys directly - Test fixtures in resilience-{ranking, handlers, scores-seed, pillar-aggregation}.test.* that assert on literal key values The v5→v6 history bump is the critical one: without it, pre-rebalance history points would mix with post-rebalance points inside the 30-day window, and change30d / trend math would diff values from different formulas against each other, producing false-negative "falling" trends for every country across the deploy window. P2 — coverage-vs-weight asymmetry in computeLowConfidence / computeOverallCoverage ---------------------------------------------------------------------------------- Reviewer flagged that these two functions still average coverage equally across all non-retired dims, even after the scoring aggregation started applying RESILIENCE_DIMENSION_WEIGHTS. The asymmetry is INTENTIONAL — these signals answer a different question from scoring: scoring aggregation: "how much does each dim matter to the score?" coverage signal: "how much real data do we have on this country?" A dim at weight 0.5 still has the same data-availability footprint as a weight=1.0 dim: its coverage value reflects whether we successfully fetched the upstream source, not whether the scorer cares about it. Applying scoring weights to the coverage signal would let a half-weight dim hide half its sparsity from the overallCoverage pill, misleading users reading coverage as a data-quality indicator. 
Added explicit comments to both functions noting the asymmetry is deliberate and pointing at the other site for matching rationale. No code change — just documentation. Tests: 6588/6588 data-tier tests pass (+511 resilience-specific including the prefix-literal assertions). Typecheck clean on both tsconfig configs. Biome clean on touched files. * docs(resilience): bump methodology doc cache-prefix references to v11/v6 Greptile P2 on PR #3328: Redis keys table in the reproducibility appendix still published `score:v10` / `ranking:v10` / `history:v5`, and the rollback instructions told operators to flush those keys. After the recovery-domain weight rebalance, live cache runs at `score:v11` / `ranking:v11` / `history:v6`. - Updated the Redis keys table (line 490-492) to match `_shared.ts`. - Updated the rollback block to name the current keys. - Left the historical "Activation sequence" narrative intact (it accurately describes the pillar-combine PR's v9→v10 / v4→v5 bump) but added a parenthetical pointing at the current v11/v6 values. No code change — doc-only correction for operator accuracy. * fix(docs): escape MDX-unsafe `<137` pattern to unblock Mintlify deploy Line 643 had `(<137 countries)` — MDX parses `<137` as a JSX tag starting with digit `1`, which is illegal and breaks the deploy with "Unexpected character \`1\` (U+0031) before name". Surfaced after the prior cache-prefix commit forced Mintlify to re-parse this file. Replaced with "fewer than 137 countries" for unambiguous rendering. Other `<` occurrences in this doc (lines 34, 642) are followed by whitespace and don't trip MDX's tag parser.	2026-04-23 10:25:18 +04:00
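The share math in the weight-rebalance commit above (4 × 1.0 + 2 × 0.5 = 5.0, so each new dim holds 0.5 / 5.0 = 0.10) can be checked with a minimal sketch of a coverage-weighted mean that applies `coverage × weight` per dim. This is not the code from `_shared.ts`: the `DimensionScore` shape, the `dimensionShare` helper, the four `core*` ids, the sample scores, and the zero fallback for an empty denominator are all assumptions made for illustration. Only the two 0.5 weights, the 1.0 default, and the retired-dim names come from the commit message.

```typescript
// Illustrative shapes only; the production types in _shared.ts may differ.
interface DimensionScore {
  id: string;
  score: number; // 0..100
  coverage: number; // 0..1, where 0 means no usable data (e.g. a retired dim)
}

type DimensionWeights = Partial<Record<string, number>>;

// Mean of scores in which each dim counts for coverage × weight.
// A missing weight entry falls back to 1.0, as the commit describes.
function coverageWeightedMean(dims: DimensionScore[], weights: DimensionWeights): number {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const dim of dims) {
    const effective = dim.coverage * (weights[dim.id] ?? 1.0);
    weightedSum += dim.score * effective;
    totalWeight += effective;
  }
  // Returning 0 for an empty denominator is an assumption of this sketch.
  return totalWeight > 0 ? weightedSum / totalWeight : 0;
}

// Hypothetical helper: the fraction of the aggregate that one dim controls.
function dimensionShare(id: string, dims: DimensionScore[], weights: DimensionWeights): number {
  let own = 0;
  let total = 0;
  for (const dim of dims) {
    const effective = dim.coverage * (weights[dim.id] ?? 1.0);
    total += effective;
    if (dim.id === id) own = effective;
  }
  return total > 0 ? own / total : 0;
}

// Only the two 0.5 entries are taken from the commit message.
const sketchWeights: DimensionWeights = {
  liquidReserveAdequacy: 0.5,
  sovereignFiscalBuffer: 0.5,
};

// Six active recovery dims at full coverage plus two retired dims.
// "coreA".."coreD" and all scores are placeholders.
const sketchRecoveryDims: DimensionScore[] = [
  { id: "coreA", score: 70, coverage: 1 },
  { id: "coreB", score: 60, coverage: 1 },
  { id: "coreC", score: 50, coverage: 1 },
  { id: "coreD", score: 40, coverage: 1 },
  { id: "liquidReserveAdequacy", score: 20, coverage: 1 },
  { id: "sovereignFiscalBuffer", score: 80, coverage: 1 },
  { id: "reserveAdequacy", score: 0, coverage: 0 }, // retired: coverage 0 neutralizes it
  { id: "fuelStockDays", score: 0, coverage: 0 }, // retired
];
```

With an empty weight map the function reduces to a plain coverage-weighted mean, which is the backward-compatibility property the commit claims.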
Elie Habib	da0f26a3cf	feat(resilience): PR 0 diagnostic freeze + fairness-audit harness (no scoring changes) (#3284) * feat(resilience): PR 0 diagnostic freeze + fairness-audit harness Lands the before-state and measurement apparatus every subsequent resilience-scorer PR validates against. Zero scoring changes. Per the v3 plan at docs/plans/2026-04-22-001-fix-resilience-scorer-structural-bias-plan.md this is tranche 0 of five. What lands: - Construct contract published in the methodology doc: absolute resilience not development-adjusted, mechanism test for every indicator, peer-relative views published separately from the core. - Known construct limitations section: six construct errors scheduled for PR 1-3 repair with explicit mapping to plan tranches. - Indicator-source manifest at docs/methodology/indicator-sources.yaml with source, seriesId, seriesUrl, coveragePct, lastObservedYear, license, mechanismTestRationale, and a constructStatus classification. - Pre-repair ranking snapshot at docs/snapshots/resilience-ranking-live-pre-repair-2026-04-22.json (217 items + 5 greyedOut, captured 2026-04-22 08:38 UTC at commit `425507d15`). - Cohort configuration at tests/helpers/resilience-cohorts.mts: six cohorts covering 87 countries (net-fuel-exporters, net-energy-importers-oecd, nuclear-heavy-generation, coal-heavy-domestic, small-island-importers, fragile-states). - Matched-pair sanity panel at tests/helpers/resilience-matched-pairs.mts: six pairs (FR/DE, NO/CA, UAE/BH, JP/KR, IN/ZA, SG/CH) with expected-direction rationale and minGap for acceptance gate 7. - scripts/compare-resilience-current-vs-proposed.mjs extended to emit cohortSummary and matchedPairSummary alongside the existing output shape (backward compatible). - tests/resilience-cohort-config.test.mts: 11 validations ensuring the cohort + matched-pair configs stay well-formed. 
Deferred to PR 0.5 (before PR 1 lands): - Monotonicity test harness for all 19 dimension scorers pinning the sign of every indicator. - Pearson-derivative variable-influence baseline inside the sensitivity script producing the nominal-weight-vs-effective-influence table that plan acceptance gate 8 requires. Verification: typecheck:all clean, 430/430 resilience tests pass, 11/11 new cohort-config tests pass, snapshot auto-discovered and validated by the existing snapshot-test harness. * feat(resilience): PR 0 follow-ups — monotonicity harness, variable-influence baseline, cross-consumer formula gate Completes the PR 0 scope per the v3 plan §5 deliverables. Three adds: 1. Monotonicity test harness tests/resilience-dimension-monotonicity.test.mts pins the direction of movement for 14 indicators across 7 dimensions (reserve adequacy, fiscal space 3x, external debt coverage, import concentration, governance WGI, food/water 2x, energy 5x). Each test builds two synthetic ResilienceSeedReader fixtures differing only in the target indicator and asserts the dimension score moves in the documented direction. The scoreEnergy tests explicitly flag three indicators (gasShare, coalShare, electricityConsumption) that PR 1 §3.1-3.2 overturns so future readers understand which directional claims the plan intentionally replaces. 2. Variable-influence baseline scripts/compare-resilience-current-vs-proposed.mjs now computes per-dimension Pearson correlation against the current overallScore scaled by the dimension's nominal domain weight (a Pearson-derivative approximation of Sobol indices). The output carries a variableInfluence[] array sorted by abs(effectiveInfluence) desc. Acceptance gate 8 from the plan compares post-change effective influence against assigned nominal weight; divergences flag a wealth-proxy or saturated-signal construct problem. 3. 
Cross-consumer formula gate Five external consumers of resilience:score:v10:* now filter stale- formula entries so a flag flip does not serve mixed-formula data downstream: - server/worldmonitor/supply-chain/v1/get-route-impact.ts — readResilienceScore() checks _formula via the new getCurrentCacheFormula export and returns 0 on mismatch. - scripts/validate-resilience-correlation.mjs, scripts/validate-resilience-backtest.mjs, scripts/backtest-resilience-outcomes.mjs, scripts/benchmark-resilience-external.mjs — each inlines a currentCacheFormulaLocal() helper that mirrors the server's formula derivation from env, skips parsed entries whose _formula disagrees, and logs the skip count so operators can notice a mismatch during the flip window. A mixed-formula cohort (some countries d6-tagged, others pc-tagged) would confound every correlation, AUC, and Spearman this repair plan depends on for its acceptance gates. These guards close that gap. Verification: typecheck:all clean, 444/444 resilience tests pass (+14 from the new monotonicity harness). * fix(resilience): PR 0 review follow-ups — sample-union + doc tense Two review-driven fixes on top of PR 0. 1. scripts/compare-resilience-current-vs-proposed.mjs — the cohort and matched-pair summaries were computed against the historical 52-country sensitivity seed, which silently excluded the small-island-importers cohort (zero members in the seed) and the sg-vs-ch matched pair (Singapore not in the seed). With the current script those acceptance gates are partially measured at best. SAMPLE now = union(historical 52 seed, every cohort member, every matched-pair endpoint). The imports for RESILIENCE_COHORTS and MATCHED_PAIRS moved from inside main() to module scope so the union can be computed before the script runs. Net sample size grows from 52 to ~95 countries. Still fast enough for an interactive pass; makes the acceptance gates honest. 2. 
docs/methodology/country-resilience-index.mdx — the construct contract wording read as present-tense compliance ("Every indicator in the scorer passes a single mechanism test"), which contradicted the immediately-following passage about indicators that currently fail the test. Reworded to "is being evaluated against" and added an explicit PR-0-does-not-change-scoring paragraph that names the known-failing indicators (electricityConsumption, gas/coal flat penalties, WHO per-capita health spend) and points at the repair plan for the replacement schedule. Verification: typecheck:all clean, 444/444 resilience tests pass. * fix(resilience): compare-script loads frozen baseline + emits per-indicator influence Addresses two P1 review findings on PR #3284: 1. Script previously compared current-6d vs proposed-pillar-combined from the SAME checkout; never loaded the frozen pre-PR-0 baseline, so acceptance gates 2/6/7 ("no country moved >15pts vs baseline", cohort median shift vs baseline, matched-pair gap change vs baseline) could not be enforced for later scorer PRs. Now auto-discovers the most recent resilience-ranking-live-pre-repair-<date>.json (or post-<pr>-<date>) in docs/snapshots/ and emits a baselineComparison block with: spearmanVsBaseline, maxCountryAbsDelta, biggestDriftsVsBaseline, cohortShiftVsBaseline, matchedPairGapChange. If no baseline is found, the block is emitted with status 'unavailable' so callers distinguish missing-baseline from passed-baseline. 2. variableInfluence was emitted only at the dimension level, which hid the exact sub-indicators the repair plan targets (electricityConsumption, gasShare, coalShare, etc.) inside their parent dimension. Added extractIndicatorValues() which pulls twelve construct-risk indicators per country from the shared memoized reader, then computes per-indicator Pearson correlation against the current overall score. Emitted as perIndicatorInfluence[], sorted by absolute effective influence. 
Acceptance gate 8 ("effective influence agrees in sign and rank-order with assigned nominal weights") is now computable at the indicator level, not only at the dimension level. No production code touched; diagnostic-harness only. * fix(resilience): baseline-snapshot selection by structured parse, not filename sort Addresses P1 review on compare-resilience-current-vs-proposed.mjs:118-130. Plain filename sort breaks the "immediate-prior state" contract two ways: 1. Lexical ordering: `pre-repair` sorts after `post-` (`pr...` to 'r' > 'o'), so the PR-0 freeze would keep winning even after post-PR snapshots exist. Later scorer PRs would then report acceptance-gate deltas against the original pre-repair freeze instead of the immediately-prior post-PR-(N-1) snapshot — the gate would appear valid while measuring against the wrong baseline. 2. Lexical ordering: `pr10` < `pr9` (digit-by-digit), so PR-10 would lose the selection to PR-9. Fix: parseBaselineSnapshotMeta() extracts (kind, prNumber, date) from the filename. Sort keys are (kindRank desc, prNumber desc, date desc): - post always beats pre-repair (kindRank 1 vs 0) - among posts, prNumber compared numerically (10 beats 9) - date breaks ties (same-PR re-snapshots, later capture wins) - unlabeled post tags get prNumber 0 so they sort between pre-repair and any numbered PR snapshot Surfaced in output: baselineKind / baselinePrNumber / baselineDate alongside baselineFile so the operator can verify which snapshot was selected without having to reopen the file. Module now isMain-guarded per feedback_seed_isMain_guard memory so tests can import parseBaselineSnapshotMeta without firing the scoring run. Added tests/resilience-baseline-snapshot-ordering.test.mjs (9 tests) pinning the ordering contract for every known failure mode. Diagnostic-harness change only. No production code touched. 
fix(resilience): full scorable universe + registry-driven per-indicator influence Addresses two fresh P1 review findings on the PR 0 compare harness. Finding 1 — acceptance math ran on a curated ~95-country sample, so plan gate 2 could miss large regressions in excluded countries. - Main scoring loop now iterates the FULL scorable universe (listScorableCountries()), not the 52-country seed + cohort union. - Removed SAMPLE / HISTORICAL_SENSITIVITY_SEED constants. - Added scorableUniverseSize + cohortMissingFromScorable to output so operators see universe size and any cohort/pair endpoint that listScorable refuses to score (fail-loud, not silent drop). Finding 3 — per-indicator influence was a hand-picked 12-indicator subset, hiding most registry indicators from the baseline that later scorer PRs need. - Extraction is now driven by INDICATOR_REGISTRY. Every Core + Enrichment indicator gets a row with explicit extractionStatus: implemented \| not-implemented (with reason) \| unregistered-in-harness - EXTRACTION_RULES covers 40/59 indicators across 11 shape families (static-path, static-wb-infrastructure, static-wgi, static-wgi-mean, static-who, energy-mix-field, gas-storage-field, recovery-country- field, imf-macro/labor-country-field, national-debt, sanctions-count). - Remaining 19 indicators need either a scorer trace hook (PR 0.5) or a safe aggregation duplicate; each carries a reason string. - extractionCoverage summary (totalIndicators / implemented / notImplemented / unregisteredInHarness / coreImplemented / coreTotal) exposed in output so PR 0.5 progress is measurable. Added tests/resilience-indicator-extraction-plan.test.mjs (11 tests) pinning: every registry entry has an extraction row; not-implemented rows carry a reason; all 12 plan-named construct-risk indicators stay extractable; Core-tier coverage floor of 45%; shape-family unit tests. Diagnostic-harness change only. No production code touched. 
* fix(resilience): wire event-aggregate per-indicator influence via exported scorer helpers Addresses P1 review on PR 0 compare harness. Previous commit marked 16 Core-tier indicators as 'not-implemented' because they needed scorer event-window/severity-weighting math; that left the gate-9 acceptance apparatus incomplete for a large part of the shipped score. Fix: export the scorer-internal aggregation helpers so the harness calls them directly. Zero aggregation math duplicated in the harness, harness and scorer cannot drift. Exported from _dimension-scorers.ts (purely additive): summarizeCyber, summarizeOutages, summarizeGps, summarizeUcdp, summarizeUnrest, summarizeSocialVelocity, getCountryDisplacement, getThreatSummaryScore, countTradeRestrictions, countTradeBarriers. 13 extraction rules moved from not-implemented to implemented: cyberThreats, internetOutages, infraOutages, gpsJamming, ucdpConflict, unrestEvents, socialVelocity, newsThreatScore, displacementTotal, displacementHosted, tradeRestrictions, tradeBarriers, recoveryConflictPressure, recoveryDisplacementVelocity. Coverage: 52/59 total (88%), 46/50 Core-tier (92%). Four Core indicators remain not-implemented for STRUCTURAL reasons, NOT missing code. Scorer inputs are genuinely global scalars with zero per-country variance, so Pearson(indicator, overall) is 0 or NaN by construction: shippingStress, transitDisruption, energyPriceStress — scorer reads a global scalar applied to every country; a per-country effective signal would need re-expression as (global x per-country exposure), which is a derived signal in a different entry. aquastatWaterAvailability — needs a distinct sub-indicator path resolver; enrichment follow-up. New test asserts the three no-per-country-variance indicators STAY not-implemented with a matching reason, so any future extraction that appears to cover them without fixing the underlying construct fails. 
Dispatcher split into STATIC / SIMPLE / AGGREGATE extractor tables to stay under biome complexity limit. Core-tier floor test raised from 45% to 80%. 89 resilience tests pass, typecheck clean, biome clean. No production behaviour changes. * fix(resilience): tag-gated AQUASTAT extractor closes the last fixable Core gap Reviewer flagged aquastatWaterAvailability as the only remaining Core indicator where the not-implemented status was structurally fixable rather than conceptually impossible. Both aquastatWaterStress and aquastatWaterAvailability share a single .aquastat.value field; the scorer's scoreAquastatValue splits them by the sibling .aquastat.indicator tag keyword (stress/withdrawal/ dependency to stress family; availability/renewable/access to availability family). The harness now mirrors this branching: - classifyAquastatFamily implements the scorer's priority order (stress-family match wins even if the tag also contains an availability keyword, matching the sequential if-check at _dimension-scorers.ts L770-776). - static-aquastat-stress / static-aquastat-availability extractors return the value only when the family matches, so stress-family readings never corrupt the availability Pearson and vice versa. Core-tier coverage: 46/50 to 47/50 (94%). The 3 remaining Core not-implemented indicators (shippingStress, transitDisruption, energyPriceStress) are all structural impossibilities: scorer inputs are global scalars with zero per-country variance. New contract test pins both directions of the tag gate plus the priority-order edge case (a tag containing both families' keywords routes to stress). 90 resilience tests pass, typecheck clean, biome clean.	2026-04-22 16:44:12 +04:00
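The baseline-snapshot selection rule from the commit above (sort by kindRank desc, prNumber desc, date desc instead of by filename) can be sketched as follows. This is an independent illustration, not the harness's `parseBaselineSnapshotMeta`: the function names, the `SnapshotMeta` shape, and the exact filename regex are assumptions. The regex is inferred from the two filename forms quoted in this log (`resilience-ranking-live-pre-repair-<date>.json` and `resilience-ranking-live-post-pr2-<date>.json`), plus an unlabeled `post-<date>` form that is assumed here.

```typescript
type SnapshotKind = "pre-repair" | "post";

interface SnapshotMeta {
  file: string;
  kind: SnapshotKind;
  prNumber: number; // 0 for pre-repair and for unlabeled post snapshots
  date: string; // yyyy-mm-dd, so string comparison is chronological
}

// Assumed filename shapes (see lead-in); anything else is ignored.
const SNAPSHOT_RE =
  /^resilience-ranking-live-(pre-repair|post(?:-pr(\d+))?)-(\d{4}-\d{2}-\d{2})\.json$/;

function parseSnapshotMeta(file: string): SnapshotMeta | null {
  const match = SNAPSHOT_RE.exec(file);
  if (match === null) return null;
  const date = match[3];
  if (date === undefined) return null;
  const kind: SnapshotKind = match[1] === "pre-repair" ? "pre-repair" : "post";
  // Number("10") > Number("9"), unlike the string comparison "pr10" < "pr9".
  const prNumber = match[2] !== undefined ? Number(match[2]) : 0;
  return { file, kind, prNumber, date };
}

function kindRank(kind: SnapshotKind): number {
  return kind === "post" ? 1 : 0;
}

// Negative result means `a` sorts first, i.e. `a` is the better baseline.
function compareSnapshots(a: SnapshotMeta, b: SnapshotMeta): number {
  const byKind = kindRank(b.kind) - kindRank(a.kind);
  if (byKind !== 0) return byKind;
  const byPr = b.prNumber - a.prNumber;
  if (byPr !== 0) return byPr;
  if (a.date === b.date) return 0;
  return a.date < b.date ? 1 : -1;
}

function selectBaseline(files: string[]): SnapshotMeta | null {
  const parsed: SnapshotMeta[] = [];
  for (const file of files) {
    const meta = parseSnapshotMeta(file);
    if (meta !== null) parsed.push(meta);
  }
  parsed.sort(compareSnapshots);
  return parsed[0] ?? null;
}
```

Parsing the PR number into an integer is what removes both lexical failure modes the commit lists: `pre-repair` no longer outranks `post-`, and PR 10 outranks PR 9.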
Elie Habib	fbaf07e106	feat(resilience): flag-gated pillar-combined score activation (default off) (#3267 ) Wires the non-compensatory 3-pillar combined overall_score behind a RESILIENCE_PILLAR_COMBINE_ENABLED env flag. Default is false so this PR ships zero behavior change in production. When flipped true the top-level overall_score switches from the 6-domain weighted aggregate to penalizedPillarScore(pillars) with alpha 0.5 and pillar weights 0.40 / 0.35 / 0.25. Evidence from docs/snapshots/resilience-pillar-sensitivity-2026-04-21: - Spearman rank correlation current vs proposed 0.9935 - Mean score delta -13.44 points (every country drops, penalty is always at most 1) - Max top-50 rank swing 6 positions (Russia) - No ceiling or floor effects under plus/minus 20pct perturbation - Release gate PASS 0/19 Code change in server/worldmonitor/resilience/v1/_shared.ts: - New isPillarCombineEnabled() reads env dynamically so tests can flip state without reloading the module - overallScore branches on (isPillarCombineEnabled() AND RESILIENCE_SCHEMA_V2_ENABLED AND pillars.length > 0); otherwise falls through to the 6-domain aggregate (unchanged default path) - RESILIENCE_SCORE_CACHE_PREFIX bumped v9 to v10 - RESILIENCE_RANKING_CACHE_KEY bumped v9 to v10 Cache invalidation: the version bump forces both per-country score cache and ranking cache to recompute from the current code path on first read after a flag flip. Without the bump, 6-domain values cached under the flag-off path would continue to serve for up to 6-12 hours after the flip, producing a ragged mix of formulas. 
Ripple of v9 to v10: - api/health.js registry entry - scripts/seed-resilience-scores.mjs (both keys) - scripts/validate-resilience-correlation.mjs, scripts/backtest-resilience-outcomes.mjs, scripts/validate-resilience-backtest.mjs, scripts/benchmark-resilience-external.mjs - tests/resilience-ranking.test.mts 24 fixture usages - tests/resilience-handlers.test.mts - tests/resilience-scores-seed.test.mjs explicit pin - tests/resilience-pillar-aggregation.test.mts explicit pin - docs/methodology/country-resilience-index.mdx New tests/resilience-pillar-combine-activation.test.mts: 7 assertions exercising the flag-on path against the release fixtures with re-anchored bands (NO at least 60, YE/SO at most 40, NO greater than US preserved, elite greater than fragile). Regression guard verifies flipping the flag back off restores the 6-domain aggregate. tests/resilience-ranking-snapshot.test.mts: band thresholds now resolve from a METHODOLOGY_BANDS table keyed on snapshot.methodologyFormula. Backward compatible (missing formula defaults to domain-weighted-6d bands). Snapshots: - docs/snapshots/resilience-ranking-2026-04-21.json tagged methodologyFormula domain-weighted-6d - docs/snapshots/resilience-ranking-pillar-combined-projected-2026-04-21.json new: top/bottom/major-economies tables projected from the 52-country sensitivity sample. Explicitly tagged projected (NOT a full-universe live capture). When the flag is flipped in production, run scripts/freeze-resilience-ranking.mjs to capture the authoritative full-universe snapshot. Methodology doc: Pillar-combined score activation section rewritten to describe the flag-gated mechanism (activation is an env-var flip, no code deploy) and the rollback path. Verification: npm run typecheck:all clean, 397/397 resilience tests pass (up from 390, +7 activation tests). Activation plan: 1. Merge this PR with flag default false (zero behavior change) 2. Set RESILIENCE_PILLAR_COMBINE_ENABLED=true in Vercel and Railway env 3. 
Redeploy or wait for next cold start; v9 to v10 bump forces every country to be rescored on first read 4. Run scripts/freeze-resilience-ranking.mjs against the flag-on deployment and commit the resulting snapshot 5. Ship a v2.0 methodology-change note explaining the re-anchored scale so analysts understand the universal ~13 point score drop is a scale rebase, not a country-level regression Rollback: set RESILIENCE_PILLAR_COMBINE_ENABLED=false, flush resilience:score:v10:* and resilience:ranking:v10 keys (or wait for TTLs). The 6-domain formula stays alongside the pillar combine in _shared.ts and needs no code change to come back.	2026-04-22 06:52:07 +04:00
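The flag gate described in the commit above can be sketched as a small selector. The branch condition (flag on AND schema v2 on AND at least one pillar) and the `d6` / `pc` formula tags come from this log. Everything else is an assumption for illustration: the `Env` parameter stands in for the process environment, the `=== "true"` comparison is a guess at how the flag is parsed, and the pillar combiner is passed in as a callback because the `penalizedPillarScore` formula itself is not spelled out here.

```typescript
// Stand-in for the process environment so the sketch stays self-contained.
type Env = Record<string, string | undefined>;

interface OverallScoreInputs {
  domainAggregate: number; // the 6-domain weighted aggregate (default path)
  pillarScores: number[];
  schemaV2Enabled: boolean;
}

interface OverallScoreResult {
  score: number;
  formula: "d6" | "pc"; // legacy 6-domain vs pillar-combined
}

// Reads the flag on every call, so a test can flip it without reloading a module.
// Treating only the literal string "true" as enabled is an assumption.
function isPillarCombineEnabled(env: Env): boolean {
  return env["RESILIENCE_PILLAR_COMBINE_ENABLED"] === "true";
}

function selectOverallScore(
  inputs: OverallScoreInputs,
  env: Env,
  combinePillars: (pillars: number[]) => number,
): OverallScoreResult {
  const usePillars =
    isPillarCombineEnabled(env) && inputs.schemaV2Enabled && inputs.pillarScores.length > 0;
  if (usePillars) {
    return { score: combinePillars(inputs.pillarScores), formula: "pc" };
  }
  // Any unmet condition falls through to the unchanged default path.
  return { score: inputs.domainAggregate, formula: "d6" };
}
```

Tagging the result with the formula that produced it is what lets downstream readers skip entries cached under the other formula during a flip window.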
Elie Habib	676331607a	feat(resilience): three-pillar aggregation with penalized weighted mean (T2.3) (#2990) * feat(resilience): three-pillar aggregation with penalized weighted mean (Phase 2 T2.3) Wire real three-pillar scoring: structural-readiness (0.40), live-shock-exposure (0.35), recovery-capacity (0.25). Add penalizedPillarScore formula with alpha=0.50 penalty factor for backtest tuning. Set recovery domain weight to 0.25 and redistribute existing domain weights proportionally to sum to 1.0. Bump cache keys v8 to v9. The penalized formula is exported and tested but overallScore stays as the v1 domain-weighted sum until the flag flips in PR 10. * fix(resilience): update test description v8 to v9 (#2990 review) Test descriptions said "(v8)" but assertions check v9 cache keys.	2026-04-12 10:18:42 +04:00
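The step "set recovery domain weight to 0.25 and redistribute existing domain weights proportionally to sum to 1.0" amounts to scaling every existing weight by (1 − 0.25) / (sum of existing weights). A minimal sketch of that arithmetic follows. The function name and the existing domain ids and weights are hypothetical; only the 0.25 recovery weight and the sum-to-1.0 constraint come from the commit message.

```typescript
type DomainWeights = Record<string, number>;

// Adds `newId` at a fixed weight and rescales the existing weights so that
// their relative proportions are preserved and the total is 1.0.
function addDomainWithFixedWeight(
  existing: DomainWeights,
  newId: string,
  newWeight: number,
): DomainWeights {
  if (newWeight < 0 || newWeight >= 1) {
    throw new Error("newWeight must be in [0, 1)");
  }
  let existingTotal = 0;
  for (const id of Object.keys(existing)) {
    existingTotal += existing[id] ?? 0;
  }
  if (existingTotal <= 0) {
    throw new Error("existing weights must sum to a positive number");
  }
  const scale = (1 - newWeight) / existingTotal;
  const result: DomainWeights = {};
  for (const id of Object.keys(existing)) {
    result[id] = (existing[id] ?? 0) * scale;
  }
  result[newId] = newWeight;
  return result;
}

// Placeholder domains and weights, not the repository's actual values.
const sketchBefore: DomainWeights = { domainA: 0.4, domainB: 0.3, domainC: 0.2, domainD: 0.1 };
const sketchAfter = addDomainWithFixedWeight(sketchBefore, "recovery", 0.25);
```

In this placeholder case the existing weights already sum to 1.0, so each is multiplied by 0.75 and their pairwise ratios are unchanged.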
Elie Habib	dca2e1ca3c	feat(resilience): expose imputationClass on ResilienceDimension (T1.7 schema pass) (#2959 ) * feat(resilience): expose imputationClass on ResilienceDimension (T1.7 schema pass) Ships the Phase 1 T1.7 schema pass of the country-resilience reference grade upgrade plan. PR #2944 shipped the classifier table foundation (ImputationClass type, ImputationEntry interface, IMPUTATION/IMPUTE tagged with four semantic classes) and explicitly deferred the schema propagation. This PR lands that propagation so downstream consumers can distinguish "country is stable" from "country is unmonitored" from "upstream is down" from "structurally not applicable" on a per-dimension basis. What this PR commits - Proto: new imputation_class string field on ResilienceDimension (empty string = dimension has any observed data; otherwise one of stable-absence, unmonitored, source-failure, not-applicable). - Generated TS types: regenerated service_server.ts and service_client.ts via make generate. - Scorer: ResilienceDimensionScore carries ImputationClass \| null. WeightedMetric carries an optional imputationClass that imputation paths populate. weightedBlend aggregates the dominant class by weight when the dimension is fully imputed, returns null otherwise. - All IMPUTE.* early-return paths propagate the class from the table (IMPUTE.bisEer, IMPUTE.wtoData, IMPUTE.ipcFood, IMPUTE.unhcrDisplacement). - Response builder: _shared.ts buildDimensionList passes the class through to the ResilienceDimension proto field. - Tests: weightedBlend aggregation semantics (5 cases), dimension-level propagation from IMPUTE tables, serialized response includes the field. 
What is deliberately NOT in this PR - No widget icon rendering (T1.6 full grid, PR 3 of 5) - No source-failure seed-meta consultation (PR 4 of 5) - No freshness field (T1.5 propagation, PR 2 of 5) - No cache key bump: the new field is empty-string default, existing cached responses continue to deserialize cleanly Verified - make generate clean - npm run typecheck + typecheck:api clean - tests/resilience-dimension-scorers.test.mts all passing (existing + new) - tests/resilience-.test.mts + test:data suite passing (4361 tests) - npm run lint exits 0 fix(resilience): normalize cached score responses on read (#2959 P2) Greptile P2 finding on PR #2959: cachedFetchJson and getCachedResilienceScores return pre-change payloads verbatim, so a resilience:score:v7 entry written before this PR lands lacks the imputationClass field. Downstream consumers that read dim.imputationClass get undefined for up to 6 hours until the cache TTL expires. Fix: add normalizeResilienceScoreResponse helper that defaults missing optional fields in place and apply it at both read sites. Defaults imputationClass to empty string, matching the proto3 default for the new imputation_class field. - ensureResilienceScoreCached applies the normalizer after cachedFetchJson returns. - getCachedResilienceScores applies it after each successful JSON.parse on the pipeline result. - Two new test cases: stale payload without imputationClass gets defaulted, present values are preserved. - Not bumping the cache key: stale-read defaults are safe, the key bump would invalidate every cached score for a 6-hour cold-start cycle. The normalizer is extensible when PR #2961 adds freshness to the same payload. P3 finding (broken docs reference) verified invalid: the proto comment points to docs/methodology/country-resilience-index.mdx, which IS the current file. The .md predecessor was renamed in PR #2945 (T1.3 methodology doc promotion to CII parity). No change needed to the comment. 
* fix(resilience): bump score cache key v7 to v8, drop normalizer (#2959 P2) Second fixup for the Greptile P2 finding on #2959. The previous fixup (`40ea22009`) added normalizeResilienceScoreResponse to default missing imputationClass fields on cached payloads to empty string. The reviewer correctly pushed back: defaulting to empty string is the proto3 default for "dimension has observed data", which silently misreports pre-rollout imputed dimensions as observed until the 6h TTL expires. Correct fix: bump RESILIENCE_SCORE_CACHE_PREFIX from resilience:score:v7: to resilience:score:v8:. Invalidates every pre-change cache entry, so the next request per country repopulates with the correct imputationClass written by the scorer. Cost: a 6h warmup cycle where first-request-per-country recomputes the score, ~100ms per country across hundreds of requests. Also deletes the normalizeResilienceScoreResponse helper and its two call sites. It was misleading defense-in-depth that can hide future schema drift bugs. Future additive field additions should bump the key, not silently default fields. - server/worldmonitor/resilience/v1/_shared.ts: prefix v7 to v8, delete normalizer function and both call sites. - scripts/seed-resilience-scores.mjs, validate-resilience-correlation.mjs, validate-resilience-backtest.mjs: mirror constants bumped. - tests/resilience-scores-seed.test.mjs: pin literal v7 to v8. - tests/resilience-ranking.test.mts: 7 hardcoded cache keys bumped. - tests/resilience-handlers.test.mts: stray v7 cache key bumped. - tests/resilience-release-gate.test.mts: the two normalizer test cases from `40ea22009` deleted along with the helper. - docs/methodology/country-resilience-index.mdx: Redis keys table updated from v7 to v8 to match the canonical constant. P3 (broken docs reference) confirmed invalid a second time. docs/methodology/country-resilience-index.mdx exists on origin/main AND on the PR branch with the same blob hash `d2ab1ebad3`. 
docs/methodology/resilience-index.md does not exist on either. No proto comment change.	2026-04-11 23:50:27 +04:00
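The "dominant class by weight" rule described for weightedBlend in #2959 can be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: the `WeightedMetric` shape shown here, the helper name `dominantImputationClass`, and the tie-breaking order are assumptions; only the four class names and the "null unless fully imputed" behaviour come from the commit message.

```typescript
// Class names are taken from the commit message; everything else is hypothetical.
type ImputationClass =
  | "stable-absence"
  | "unmonitored"
  | "source-failure"
  | "not-applicable";

// Assumed shape: imputationClass is set only when the metric value was imputed.
interface WeightedMetric {
  weight: number;
  imputationClass?: ImputationClass;
}

// Fixed iteration order; ties resolve to the earlier entry (an assumption).
const ALL_CLASSES: ImputationClass[] = [
  "stable-absence",
  "unmonitored",
  "source-failure",
  "not-applicable",
];

// Returns the class holding the largest total weight when every metric is
// imputed; returns null if any metric is observed, or if there is no
// positive weight to attribute.
function dominantImputationClass(metrics: WeightedMetric[]): ImputationClass | null {
  const totals: Partial<Record<ImputationClass, number>> = {};
  for (const metric of metrics) {
    if (metric.imputationClass === undefined) {
      return null; // at least one observed metric: dimension is not fully imputed
    }
    totals[metric.imputationClass] = (totals[metric.imputationClass] ?? 0) + metric.weight;
  }
  let best: ImputationClass | null = null;
  let bestWeight = 0;
  for (const cls of ALL_CLASSES) {
    const total = totals[cls] ?? 0;
    if (total > bestWeight) {
      best = cls;
      bestWeight = total;
    }
  }
  return best;
}
```

Under these assumptions, a dimension built from one 0.6-weight "unmonitored" metric and one 0.4-weight "stable-absence" metric reports "unmonitored", while any dimension containing an observed metric reports null.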
Elie Habib	75e9c22dd3	feat(resilience): populate dataVersion field from seed-meta timestamp (#2865) * feat(resilience): populate dataVersion field from seed-meta timestamp Sets dataVersion to the ISO date of the most recent static bundle seed, making the data vintage visible to API consumers. * fix(resilience): bump score cache to v7 for dataVersion field addition	2026-04-09 12:22:46 +04:00
Elie Habib	09ed68db09	fix(resilience): revert overall score to domain-weighted average + fix RSF direction (#2847) * fix(resilience): revert overall score to domain-weighted average + fix RSF direction 1. overallScore reverted from baseline*(1-stressFactor) to sum(domainScore*domainWeight) — the multiplicative formula crushed all scores by 30-50% 2. RSF press freedom: normalizeHigherBetter → normalizeLowerBetter (RSF 0=best, 100=worst; Norway 6.52 was scoring 7 instead of 93) 3. Seed script ranking write removed (handler owns greyedOut split) 4. Widget Impact row removed (stressFactor no longer drives headline) 5. Cache keys bumped: score v6, ranking v6, history v3 * fix(resilience): update validation scripts to v6 + remove lock from read-only seed 1. Validation scripts (backtest, correlation, sensitivity) updated from v5 to v6 cache keys. Sensitivity formula updated to domain-weighted. 2. Seed script lock removed — read-only health check needs no lock. * chore: add clarifying comment on orphaned ranking TTL export	2026-04-09 08:49:54 +04:00
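The two fixes in #2847 can be illustrated with a short sketch. The function names normalizeHigherBetter and normalizeLowerBetter appear in the commit message, but the signatures below (value, best, worst bounds, linear scaling to 0-100 with clamping) are assumptions made for illustration, as is the `DomainScore` shape and the assumption that domain weights sum to 1.

```typescript
// Clamp helper, hypothetical.
function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Assumed signature: maps [worst, best] linearly onto [0, 100]; higher raw values score higher.
function normalizeHigherBetter(value: number, worst: number, best: number): number {
  return clamp(((value - worst) / (best - worst)) * 100, 0, 100);
}

// Assumed signature: maps [best, worst] linearly onto [100, 0]; lower raw values score higher.
function normalizeLowerBetter(value: number, best: number, worst: number): number {
  return clamp(((worst - value) / (worst - best)) * 100, 0, 100);
}

// Hypothetical shape for a scored domain.
interface DomainScore {
  score: number;  // 0-100
  weight: number; // assumed to sum to 1 across domains
}

// Domain-weighted overall score: sum(domainScore * domainWeight).
function overallScore(domains: DomainScore[]): number {
  return domains.reduce((sum, d) => sum + d.score * d.weight, 0);
}

// RSF example from the commit message: 0 = best, 100 = worst, Norway = 6.52.
const norwayWrong = Math.round(normalizeHigherBetter(6.52, 0, 100)); // treats 6.52 as a low score
const norwayFixed = Math.round(normalizeLowerBetter(6.52, 0, 100));  // treats 6.52 as near-best
```

With these assumed signatures the Norway figure reproduces the numbers quoted in the commit: the higher-is-better direction rounds to 7, the lower-is-better direction rounds to 93.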
Elie Habib	5b0e11262e	feat(resilience): external index correlation validation (ND-GAIN, INFORM) (#2824) * feat(resilience): external index correlation validation (ND-GAIN, INFORM) Batch validation script computing Spearman rho between WorldMonitor resilience scores and ND-GAIN readiness / INFORM risk indices for 50 representative countries. Identifies top divergences. Phase 4 gate: rho > 0.6 with at least 2 benchmark indices. * fix(resilience): use correct score cache key v5 in correlation script Was hardcoded to v4, but production is v5 after PR #2821 (baseline/stress engine). Script would miss all cached scores and fail with "Too few scores available". * chore(resilience): remove unused redisGetJson from correlation script	2026-04-08 16:23:10 +04:00
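For readers unfamiliar with the statistic used in #2824, Spearman rho is the Pearson correlation of the two rank vectors. The sketch below is a generic implementation with average ranks for ties, not the validation script itself; the function names `averageRanks` and `spearmanRho` are hypothetical, and how the script handles the sign of a risk index such as INFORM is not stated in the commit message.

```typescript
// Assigns 1-based ranks; tied values share the average of their positions.
function averageRanks(values: number[]): number[] {
  const order = values.map((_, i) => i).sort((a, b) => values[a] - values[b]);
  const ranks = new Array<number>(values.length).fill(0);
  let i = 0;
  while (i < order.length) {
    let j = i;
    // Extend j over the run of values tied with position i.
    while (j + 1 < order.length && values[order[j + 1]] === values[order[i]]) {
      j++;
    }
    const shared = (i + j) / 2 + 1; // average of 1-based positions i+1 .. j+1
    for (let k = i; k <= j; k++) {
      ranks[order[k]] = shared;
    }
    i = j + 1;
  }
  return ranks;
}

// Spearman rho: Pearson correlation computed on the ranks of x and y.
// Returns NaN when either input is constant (zero rank variance).
function spearmanRho(x: number[], y: number[]): number {
  if (x.length !== y.length || x.length < 2) {
    throw new Error("inputs must have equal length of at least 2");
  }
  const rx = averageRanks(x);
  const ry = averageRanks(y);
  const n = rx.length;
  const meanX = rx.reduce((s, v) => s + v, 0) / n;
  const meanY = ry.reduce((s, v) => s + v, 0) / n;
  let cov = 0;
  let varX = 0;
  let varY = 0;
  for (let k = 0; k < n; k++) {
    const dx = rx[k] - meanX;
    const dy = ry[k] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  return cov / Math.sqrt(varX * varY);
}
```

A gate such as the Phase 4 threshold would then compare the returned value against 0.6 for each benchmark index.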