worldmonitor

mirror of https://github.com/koala73/worldmonitor.git synced 2026-04-25 17:14:57 +02:00

Author	SHA1	Message	Date
Elie Habib	184e82cb40	feat(resilience): PR 3A — net-imports denominator for sovereignFiscalBuffer (#3380) PR 3A of cohort-audit plan 2026-04-24-002. Construct correction for re-export hubs: the SWF rawMonths denominator was gross imports, which double-counted flow-through trade that never represents domestic consumption. Net-imports fix: rawMonths = aum / (grossImports × (1 − reexportShareOfImports)) × 12 applied to any country in the re-export share manifest. Countries NOT in the manifest get gross imports unchanged (status-quo fallback). Plan acceptance gates — verified synthetically in this PR: Construct invariant. Two synthetic countries, same SWF, same gross imports. A re-exports 60%; B re-exports 0%. Post-fix, A's rawMonths is 2.5× B's (1/(1-0.6) = 2.5). Pinned in tests/resilience-net-imports-denominator.test.mts. SWF-heavy exporter invariant. Country with share ≤ 5%: rawMonths lift < 5% vs baseline (negligible). Pinned. What shipped 1. Re-export share manifest infrastructure. - scripts/shared/reexport-share-manifest.yaml (new, empty) — schema committed; entries populated in follow-up PRs with UNCTAD Handbook citations. - scripts/shared/reexport-share-loader.mjs (new) — loader + strict validator, mirrors swf-manifest-loader.mjs. - scripts/seed-recovery-reexport-share.mjs (new) — publishes resilience:recovery:reexport-share:v1 from manifest. Empty manifest = valid (no countries, no adjustment). 2. SWF seeder uses net-imports denominator. - scripts/seed-sovereign-wealth.mjs exports computeNetImports(gross, share) — pure helper, unit-tested. - Per-country loop: reads manifest, computes denominatorImports, applies to rawMonths math. - Payload records annualImports (gross, audit), denominatorImports (used in math), reexportShareOfImports (provenance). - Summary log reports which countries had a net-imports adjustment applied with source year. 3. Bundle wiring.
- Reexport-Share runs BEFORE Sovereign-Wealth in the recovery bundle so the SWF seeder reads fresh re-export data in the same cron tick. - tests/seed-bundle-resilience-recovery.test.mjs expected-entries updated (6 → 7) with ordering preservation. 4. Cache-prefix bump (per cache-prefix-bump-propagation-scope skill). - RESILIENCE_SCORE_CACHE_PREFIX: v11 → v12 - RESILIENCE_RANKING_CACHE_KEY: v11 → v12 - RESILIENCE_HISTORY_KEY_PREFIX: v6 → v7 (history rotation prevents 30-day rolling window from mixing pre/post-fix scores and manufacturing false "falling" trends on deploy day). - Source of truth: server/worldmonitor/resilience/v1/_shared.ts - Mirrored in: scripts/seed-resilience-scores.mjs, scripts/validate-resilience-correlation.mjs, scripts/backtest-resilience-outcomes.mjs, scripts/validate-resilience-backtest.mjs, scripts/benchmark-resilience-external.mjs, api/health.js - Test literals bumped in 4 test files (26 line edits). - EXTENDED tests/resilience-cache-keys-health-sync.test.mts with a parity pass that reads every known mirror file and asserts both (a) canonical prefix present AND (b) no stale v<older> literals in non-comment code. Found one legacy log-line that still referenced v9 (scripts/seed-resilience-scores.mjs:342) and refactored it to use the RESILIENCE_RANKING_CACHE_KEY constant so future bumps self-update. Explicitly NOT in this PR - liquidReserveAdequacy denominator fix. The plan's PR 3A wording mentions both dims, but the RESERVES ratio (WB FI.RES.TOTL.MO) is a PRE-COMPUTED WB series; applying a post-hoc net-imports adjustment mixes WB's denominator year with our manifest-year, and the math change belongs in PR 3B (unified liquidity) where the α calibration is explicit. This PR stays scoped to sovereignFiscalBuffer. - Live re-export share entries. The manifest ships EMPTY in this PR; entries with UNCTAD citations are one-per-PR follow-ups so each figure is individually auditable. 
Verified - tests/resilience-net-imports-denominator.test.mts — 9 pass (construct contract: 2.5× ratio gate, monotonicity, boundary rejections, backward-compat on missing manifest entry, cohort-proportionality, SWF-heavy-exporter-unchanged) - tests/reexport-share-loader.test.mts — 7 pass (committed-manifest shape + 6 schema-violation rejections) - tests/resilience-cache-keys-health-sync.test.mts — 5 pass (existing 3 + 2 new parity checks across all mirror files) - tests/seed-bundle-resilience-recovery.test.mjs — 17 pass (expected entries bumped to 7) - npm run test:data — 6714 pass / 0 fail - npm run typecheck / typecheck:api — green - npm run lint / lint:md — clean Deployment notes Score + ranking + history cache prefixes all bump in the same deploy. Per established v10→v11 precedent (and the cache-prefix-bump-propagation-scope skill): - Score / ranking: 6h TTL — the new prefix populates via the Railway resilience-scores cron within one tick. - History: 30d ring — the v7 ring starts empty; the first 30 days post-deploy lack baseline points, so trend / change30d will read as "no change" until v7 accumulates a window. - Legacy v11 keys can be deleted from Redis at any time post-deploy (no reader references them). Leaving them in place costs storage but does no harm.	2026-04-24 18:14:04 +04:00
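The net-imports denominator described in this commit can be sketched as follows. This is an illustrative re-implementation, not the code in scripts/seed-sovereign-wealth.mjs: only the name computeNetImports(gross, share) and the rawMonths formula come from the commit message, while the rawMonths wrapper, the exact boundary rule (share in [0, 1)), and the sample figures are assumptions made for the example.

```typescript
// Hypothetical sketch; the real helper may validate differently.
function computeNetImports(grossImports: number, reexportShare?: number): number {
  if (!Number.isFinite(grossImports) || grossImports <= 0) {
    throw new RangeError("grossImports must be a positive finite number");
  }
  // No manifest entry: status-quo fallback to gross imports.
  if (reexportShare === undefined) return grossImports;
  // Assumed boundary rule: share in [0, 1) keeps the denominator positive.
  if (!Number.isFinite(reexportShare) || reexportShare < 0 || reexportShare >= 1) {
    throw new RangeError("reexportShare must be in [0, 1)");
  }
  return grossImports * (1 - reexportShare);
}

// rawMonths = aum / (grossImports × (1 − reexportShareOfImports)) × 12
function rawMonths(aum: number, grossImports: number, reexportShare?: number): number {
  return (aum / computeNetImports(grossImports, reexportShare)) * 12;
}

// Construct invariant from the commit: same SWF, same gross imports,
// country A re-exports 60%, country B re-exports 0%.
const hubA = rawMonths(100, 200, 0.6);
const plainB = rawMonths(100, 200, 0);
const ratio = hubA / plainB; // 1 / (1 - 0.6), about 2.5
console.log({ hubA, plainB, ratio });
```

Keeping the helper pure (no manifest or Redis access) is what lets the 2.5× ratio gate be pinned in a unit test without any fixtures.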
Elie Habib	d3d406448a	feat(resilience): PR 2 §3.4 recovery-domain weight rebalance (#3328) * feat(resilience): PR 2 §3.4 recovery-domain weight rebalance Dials the two PR 2 §3.4 recovery dims (liquidReserveAdequacy, sovereignFiscalBuffer) to ~10% share each of the recovery-domain score via a new per-dimension weight channel in the coverage-weighted mean. Matches the plan's direction that the sovereign-wealth signal complement — rather than dominate — the classical liquid-reserves and fiscal-space signals. Implementation - RESILIENCE_DIMENSION_WEIGHTS: new Record<ResilienceDimensionId, number> alongside RESILIENCE_DOMAIN_WEIGHTS. Every dim has an explicit entry (default 1.0) so rebalance decisions stay auditable; the two new recovery dims carry 0.5 each. Share math at full coverage (6 active recovery dims): weight sum = 4 × 1.0 + 2 × 0.5 = 5.0; each new-dim share = 0.5 / 5.0 = 0.10 ✓; each core-dim share = 1.0 / 5.0 = 0.20. Retired dims (reserveAdequacy, fuelStockDays) keep weight 1.0 in the map; their coverage=0 neutralizes them at the coverage channel regardless. Explicit entries guard against a future scorer bug accidentally returning coverage>0 for a retired dim and falling through the `?? 1.0` default — every retirement decision is now tied to a single explicit source of truth. - coverageWeightedMean (_shared.ts): refactored to apply `coverage × dimWeight` per dim instead of `coverage` alone. Backward-compatible when all weights default to 1.0 (reduces to the original mean). All three aggregation callers — buildDomainList, baselineScore, stressScore — pick up the weighting transparently. Test coverage 1. New `tests/resilience-recovery-weight-rebalance.test.mts`: pins the per-dim weight values, asserts the share math (0.10 new / 0.20 core), verifies completeness of the weight map, and documents why retired dims stay in the map at 1.0. 2. New `tests/resilience-recovery-ordering.test.mts`: fixture-based Spearman-proxy sensitivity check.
Asserts NO > US > YE ordering preserved on both the overall score and the recovery-domain subscore after the rebalance. (Live post-merge Spearman rerun against the PR 0 snapshot is tracked as a follow-up commit.) 3. resilience-scorers.test.mts fixture anchors updated in lockstep: baselineScore: 60.35 → 62.17 (low-scoring liquidReserveAdequacy + partial-coverage SWF now contribute ~half the weight); overallScore: 63.60 → 64.39 (recovery subscore lifts by ~3 pts from the rebalance, overall by ~0.79); recovery flat mean: 48.75 (unchanged — flat mean doesn't apply weights by design; documents the coverage-weighted diff). Local coverageWeightedMean helper in the test mirrors the production implementation (weights applied per dim). Methodology doc - New "Per-dimension weights in the recovery domain" subsection with the weight table and a sentence explaining the cap. Cross-references the source of truth (RESILIENCE_DIMENSION_WEIGHTS). Deliberate non-goals - Live post-merge Spearman ≥0.85 check against the PR 0 baseline snapshot. Fixture ordering is preserved (new ordering test); the live-data check runs after Railway cron refreshes the rankings on the new weights and commits docs/snapshots/resilience-ranking-live-post-pr2-<date>.json. Tracked as the final piece of PR 2 §3.4 alongside the health.js / bootstrap graduation (waiting on the 7-day Railway cron bake-in window). Tests: 6588/6588 data-tier tests pass. Typecheck clean on both tsconfig configs. Biome clean on touched files. NO > US > YE fixture ordering preserved. * fix(resilience): PR 2 review — thread RESILIENCE_DIMENSION_WEIGHTS through the comparison harness Greptile P2: the operator comparison harness (scripts/compare-resilience-current-vs-proposed.mjs) claims its domain scores "mirror the production scorer's coverage-weighted mean" and is the artifact generator for Spearman / rank-delta acceptance decisions.
After PR 2 §3.4's weight rebalance, the production mirror diverged — production now applies RESILIENCE_DIMENSION_WEIGHTS (liquidReserveAdequacy = 0.5, sovereignFiscalBuffer = 0.5) inside coverageWeightedMean, but the harness still used equal-weight aggregation. Left unfixed, post-merge Spearman / rank-delta diagnostics would compare live API scores (with the 0.5 recovery weights) against harness predictions that assume equal-share dims — silently biasing every acceptance decision until someone noticed a country's rank-delta didn't track. Fix - Mirrored coverageWeightedMean now accepts dimensionWeights and applies `coverage × weight` per dim, matching _shared.ts exactly. - Mirrored buildDomainList accepts + forwards dimensionWeights. - main() imports RESILIENCE_DIMENSION_WEIGHTS from the scorer module and passes it through to buildDomainList at the single call site. - Missing-entry default = 1.0 (same contract as production) — makes the harness forward-compatible with any future weight refactor (adds a new dim without an explicit entry, old production fallback path still produces the correct number). Verification - Harness syntax-check clean (node -c). - RESILIENCE_DIMENSION_WEIGHTS import resolves correctly from the harness's import path. - 509/509 resilience tests still pass (harness isn't in the test suite; the invariant is that production ↔ harness use the same math, and the production side is covered by tests/resilience-recovery-weight-rebalance.test.mts). * fix(resilience): PR 2 review — bump cache prefixes v10→v11 + document coverage-vs-weight asymmetry Greptile P1 + P2 on PR #3328. P1 — cache prefix not bumped after formula change -------------------------------------------------- The per-dim weight rebalance changes the score formula, but the `_formula` tag only distinguishes 'd6' vs 'pc' (legacy 6-domain vs pillar-combined) — it does NOT detect intra-'d6' weight changes.
Left unfixed, scores cached before deploy would be served with the old equal-weight math for up to the full 6h TTL, and the ranking key for up to its 12h TTL. Matches the established v9→v10 pattern for every prior formula-changing deploy. Bumped in lockstep: - RESILIENCE_SCORE_CACHE_PREFIX: v10 → v11 - RESILIENCE_RANKING_CACHE_KEY: v10 → v11 - RESILIENCE_HISTORY_KEY_PREFIX: v5 → v6 - scripts/seed-resilience-scores.mjs local mirrors - api/health.js resilienceRanking literal - 4 analysis/backtest scripts that read the cached keys directly - Test fixtures in resilience-{ranking, handlers, scores-seed, pillar-aggregation}.test.* that assert on literal key values The v5→v6 history bump is the critical one: without it, pre-rebalance history points would mix with post-rebalance points inside the 30-day window, and change30d / trend math would diff values from different formulas against each other, producing false-negative "falling" trends for every country across the deploy window. P2 — coverage-vs-weight asymmetry in computeLowConfidence / computeOverallCoverage ---------------------------------------------------------------------------------- Reviewer flagged that these two functions still average coverage equally across all non-retired dims, even after the scoring aggregation started applying RESILIENCE_DIMENSION_WEIGHTS. The asymmetry is INTENTIONAL — these signals answer a different question from scoring: scoring aggregation: "how much does each dim matter to the score?" coverage signal: "how much real data do we have on this country?" A dim at weight 0.5 still has the same data-availability footprint as a weight=1.0 dim: its coverage value reflects whether we successfully fetched the upstream source, not whether the scorer cares about it. Applying scoring weights to the coverage signal would let a half-weight dim hide half its sparsity from the overallCoverage pill, misleading users reading coverage as a data-quality indicator. 
Added explicit comments to both functions noting the asymmetry is deliberate and pointing at the other site for matching rationale. No code change — just documentation. Tests: 6588/6588 data-tier tests pass (+511 resilience-specific including the prefix-literal assertions). Typecheck clean on both tsconfig configs. Biome clean on touched files. * docs(resilience): bump methodology doc cache-prefix references to v11/v6 Greptile P2 on PR #3328: Redis keys table in the reproducibility appendix still published `score:v10` / `ranking:v10` / `history:v5`, and the rollback instructions told operators to flush those keys. After the recovery-domain weight rebalance, live cache runs at `score:v11` / `ranking:v11` / `history:v6`. - Updated the Redis keys table (line 490-492) to match `_shared.ts`. - Updated the rollback block to name the current keys. - Left the historical "Activation sequence" narrative intact (it accurately describes the pillar-combine PR's v9→v10 / v4→v5 bump) but added a parenthetical pointing at the current v11/v6 values. No code change — doc-only correction for operator accuracy. * fix(docs): escape MDX-unsafe `<137` pattern to unblock Mintlify deploy Line 643 had `(<137 countries)` — MDX parses `<137` as a JSX tag starting with digit `1`, which is illegal and breaks the deploy with "Unexpected character \`1\` (U+0031) before name". Surfaced after the prior cache-prefix commit forced Mintlify to re-parse this file. Replaced with "fewer than 137 countries" for unambiguous rendering. Other `<` occurrences in this doc (lines 34, 642) are followed by whitespace and don't trip MDX's tag parser.	2026-04-23 10:25:18 +04:00
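The `coverage × dimWeight` aggregation and the share math from this commit can be sketched as below. This is a minimal illustration, not the code in _shared.ts: the DimensionScore shape, the coreA..coreD placeholder ids, the zero-total fallback, and the sample scores are assumptions; the two 0.5 weights and the `?? 1.0` missing-entry default are taken from the commit message.

```typescript
// Hypothetical shape for one scored dimension.
interface DimensionScore {
  id: string;
  score: number;    // 0..100
  coverage: number; // 0..1, share of upstream data actually available
}

// Only the two rebalanced recovery dims are listed; everything else defaults to 1.0.
const DIMENSION_WEIGHTS: Record<string, number> = {
  liquidReserveAdequacy: 0.5,
  sovereignFiscalBuffer: 0.5,
};

function coverageWeightedMean(
  dims: DimensionScore[],
  weights: Record<string, number> = {},
): number {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const dim of dims) {
    // Effective weight is coverage × dimWeight; a missing entry means 1.0.
    const w = dim.coverage * (weights[dim.id] ?? 1.0);
    weightedSum += dim.score * w;
    totalWeight += w;
  }
  // Assumed fallback: no usable data yields 0 rather than NaN.
  return totalWeight > 0 ? weightedSum / totalWeight : 0;
}

// Six active recovery dims at full coverage; core ids are placeholders.
const recoveryDims: DimensionScore[] = [
  { id: "coreA", score: 50, coverage: 1 },
  { id: "coreB", score: 50, coverage: 1 },
  { id: "coreC", score: 50, coverage: 1 },
  { id: "coreD", score: 50, coverage: 1 },
  { id: "liquidReserveAdequacy", score: 100, coverage: 1 },
  { id: "sovereignFiscalBuffer", score: 100, coverage: 1 },
];

const rebalanced = coverageWeightedMean(recoveryDims, DIMENSION_WEIGHTS); // (200 + 100) / 5
const equalWeight = coverageWeightedMean(recoveryDims);                   // 400 / 6
const newDimShare = 0.5 / (4 * 1.0 + 2 * 0.5);                            // 0.10
console.log({ rebalanced, equalWeight, newDimShare });
```

Passing an empty weight map reproduces the original coverage-only mean, which is the backward-compatibility property the commit relies on.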
Elie Habib	fbaf07e106	feat(resilience): flag-gated pillar-combined score activation (default off) (#3267) Wires the non-compensatory 3-pillar combined overall_score behind a RESILIENCE_PILLAR_COMBINE_ENABLED env flag. Default is false so this PR ships zero behavior change in production. When flipped true the top-level overall_score switches from the 6-domain weighted aggregate to penalizedPillarScore(pillars) with alpha 0.5 and pillar weights 0.40 / 0.35 / 0.25. Evidence from docs/snapshots/resilience-pillar-sensitivity-2026-04-21: - Spearman rank correlation current vs proposed 0.9935 - Mean score delta -13.44 points (every country drops, penalty is always at most 1) - Max top-50 rank swing 6 positions (Russia) - No ceiling or floor effects under plus/minus 20pct perturbation - Release gate PASS 0/19 Code change in server/worldmonitor/resilience/v1/_shared.ts: - New isPillarCombineEnabled() reads env dynamically so tests can flip state without reloading the module - overallScore branches on (isPillarCombineEnabled() AND RESILIENCE_SCHEMA_V2_ENABLED AND pillars.length > 0); otherwise falls through to the 6-domain aggregate (unchanged default path) - RESILIENCE_SCORE_CACHE_PREFIX bumped v9 to v10 - RESILIENCE_RANKING_CACHE_KEY bumped v9 to v10 Cache invalidation: the version bump forces both per-country score cache and ranking cache to recompute from the current code path on first read after a flag flip. Without the bump, 6-domain values cached under the flag-off path would continue to serve for up to 6-12 hours after the flip, producing a ragged mix of formulas.
Ripple of v9 to v10: - api/health.js registry entry - scripts/seed-resilience-scores.mjs (both keys) - scripts/validate-resilience-correlation.mjs, scripts/backtest-resilience-outcomes.mjs, scripts/validate-resilience-backtest.mjs, scripts/benchmark-resilience-external.mjs - tests/resilience-ranking.test.mts 24 fixture usages - tests/resilience-handlers.test.mts - tests/resilience-scores-seed.test.mjs explicit pin - tests/resilience-pillar-aggregation.test.mts explicit pin - docs/methodology/country-resilience-index.mdx New tests/resilience-pillar-combine-activation.test.mts: 7 assertions exercising the flag-on path against the release fixtures with re-anchored bands (NO at least 60, YE/SO at most 40, NO greater than US preserved, elite greater than fragile). Regression guard verifies flipping the flag back off restores the 6-domain aggregate. tests/resilience-ranking-snapshot.test.mts: band thresholds now resolve from a METHODOLOGY_BANDS table keyed on snapshot.methodologyFormula. Backward compatible (missing formula defaults to domain-weighted-6d bands). Snapshots: - docs/snapshots/resilience-ranking-2026-04-21.json tagged methodologyFormula domain-weighted-6d - docs/snapshots/resilience-ranking-pillar-combined-projected-2026-04-21.json new: top/bottom/major-economies tables projected from the 52-country sensitivity sample. Explicitly tagged projected (NOT a full-universe live capture). When the flag is flipped in production, run scripts/freeze-resilience-ranking.mjs to capture the authoritative full-universe snapshot. Methodology doc: Pillar-combined score activation section rewritten to describe the flag-gated mechanism (activation is an env-var flip, no code deploy) and the rollback path. Verification: npm run typecheck:all clean, 397/397 resilience tests pass (up from 390, +7 activation tests). Activation plan: 1. Merge this PR with flag default false (zero behavior change) 2. Set RESILIENCE_PILLAR_COMBINE_ENABLED=true in Vercel and Railway env 3. 
Redeploy or wait for next cold start; v9 to v10 bump forces every country to be rescored on first read 4. Run scripts/freeze-resilience-ranking.mjs against the flag-on deployment and commit the resulting snapshot 5. Ship a v2.0 methodology-change note explaining the re-anchored scale so analysts understand the universal ~13 point score drop is a scale rebase, not a country-level regression Rollback: set RESILIENCE_PILLAR_COMBINE_ENABLED=false, flush resilience:score:v10:* and resilience:ranking:v10 keys (or wait for TTLs). The 6-domain formula stays alongside the pillar combine in _shared.ts and needs no code change to come back.	2026-04-22 06:52:07 +04:00
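The flag-gated branch described in this commit can be sketched as below. This is a hedged illustration, not the code in _shared.ts: the flag name and the three-part condition come from the commit message, while the Env type, the explicit env parameter (production reportedly reads the environment directly), the "true" string comparison, selectOverallScore, and the two stand-in scoring callbacks are assumptions.

```typescript
// Hypothetical environment shape; passed in explicitly so the sketch is testable.
type Env = Record<string, string | undefined>;

// Evaluated on every call, so flipping the flag takes effect without a module reload.
function isPillarCombineEnabled(env: Env): boolean {
  // Assumption: only the literal string "true" enables the flag.
  return env.RESILIENCE_PILLAR_COMBINE_ENABLED === "true";
}

function selectOverallScore(
  env: Env,
  schemaV2Enabled: boolean,
  pillars: number[],
  domainAggregate: () => number,
  pillarCombined: (pillars: number[]) => number,
): number {
  // All three conditions must hold; otherwise fall through to the 6-domain aggregate.
  if (isPillarCombineEnabled(env) && schemaV2Enabled && pillars.length > 0) {
    return pillarCombined(pillars);
  }
  return domainAggregate();
}

// Stand-in scorers with fixed outputs, for illustration only.
const sixDomain = () => 64;
const combined = (_pillars: number[]) => 51;

const env: Env = {};
const flagOff = selectOverallScore(env, true, [70, 60, 50], sixDomain, combined);
env.RESILIENCE_PILLAR_COMBINE_ENABLED = "true";
const flagOn = selectOverallScore(env, true, [70, 60, 50], sixDomain, combined);
env.RESILIENCE_PILLAR_COMBINE_ENABLED = "false";
const rolledBack = selectOverallScore(env, true, [70, 60, 50], sixDomain, combined);
console.log({ flagOff, flagOn, rolledBack });
```

Because both formulas stay in the code path, rollback is an env change plus a cache flush, with no redeploy of different code.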
Elie Habib	676331607a	feat(resilience): three-pillar aggregation with penalized weighted mean (T2.3) (#2990) * feat(resilience): three-pillar aggregation with penalized weighted mean (Phase 2 T2.3) Wire real three-pillar scoring: structural-readiness (0.40), live-shock-exposure (0.35), recovery-capacity (0.25). Add penalizedPillarScore formula with alpha=0.50 penalty factor for backtest tuning. Set recovery domain weight to 0.25 and redistribute existing domain weights proportionally to sum to 1.0. Bump cache keys v8 to v9. The penalized formula is exported and tested but overallScore stays as the v1 domain-weighted sum until the flag flips in PR 10. * fix(resilience): update test description v8 to v9 (#2990 review) Test descriptions said "(v8)" but assertions check v9 cache keys.	2026-04-12 10:18:42 +04:00
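The proportional redistribution mentioned in this commit (recovery fixed at 0.25, existing domain weights rescaled so the total is 1.0) can be sketched as below. This is only an illustration of the arithmetic: the function name, the domain ids, and the starting weights are hypothetical, and the repository's actual weight values are not given in the commit message.

```typescript
// Give `newId` a fixed weight and scale every existing weight by the same
// factor so the full map sums to 1.0 and relative proportions are preserved.
function addDomainWithWeight(
  existing: Record<string, number>,
  newId: string,
  newWeight: number,
): Record<string, number> {
  const currentSum = Object.values(existing).reduce((a, b) => a + b, 0);
  if (currentSum <= 0 || newWeight < 0 || newWeight >= 1) {
    throw new RangeError("need positive existing weights and newWeight in [0, 1)");
  }
  const scale = (1 - newWeight) / currentSum;
  const result: Record<string, number> = {};
  for (const [id, weight] of Object.entries(existing)) {
    result[id] = weight * scale;
  }
  result[newId] = newWeight;
  return result;
}

// Placeholder starting weights (not the repository's real values).
const before = { domainA: 0.5, domainB: 0.3, domainC: 0.2 };
const after = addDomainWithWeight(before, "recovery", 0.25);
const total = Object.values(after).reduce((a, b) => a + b, 0);
console.log(after, total);
```

With these placeholder inputs the existing domains shrink by a common factor of 0.75, so their ordering and ratios are unchanged while recovery takes exactly a quarter of the total.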

4 Commits