mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-25 17:14:57 +02:00
* feat(resilience): PR 2 §3.4 recovery-domain weight rebalance
Dials the two PR 2 §3.4 recovery dims (liquidReserveAdequacy,
sovereignFiscalBuffer) to ~10% share each of the recovery-domain
score via a new per-dimension weight channel in the coverage-weighted
mean. Matches the plan's direction that the sovereign-wealth signal
complement — rather than dominate — the classical liquid-reserves
and fiscal-space signals.
Implementation
- RESILIENCE_DIMENSION_WEIGHTS: new Record<ResilienceDimensionId, number>
alongside RESILIENCE_DOMAIN_WEIGHTS. Every dim has an explicit entry
(default 1.0) so rebalance decisions stay auditable; the two new
recovery dims carry 0.5 each.
Share math at full coverage (6 active recovery dims):
weight sum = 4 × 1.0 + 2 × 0.5 = 5.0
each new-dim share = 0.5 / 5.0 = 0.10 ✓
each core-dim share = 1.0 / 5.0 = 0.20
Retired dims (reserveAdequacy, fuelStockDays) keep weight 1.0 in
the map; their coverage=0 neutralizes them at the coverage channel
regardless. Explicit entries guard against a future scorer bug
accidentally returning coverage>0 for a retired dim and falling
through the `?? 1.0` default — every retirement decision is now
tied to a single explicit source of truth.
- coverageWeightedMean (_shared.ts): refactored to apply
`coverage × dimWeight` per dim instead of `coverage` alone. Backward-
compatible when all weights default to 1.0 (reduces to the original
mean). All three aggregation callers (buildDomainList, baselineScore,
stressScore) pick up the weighting transparently.
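The weighting described above can be sketched as follows. This is a minimal illustration, not the code in _shared.ts: the `DimScore` shape, the function name `coverageWeightedMeanSketch`, the zero-coverage return value, and the example scores are all assumptions made for the sketch. Only the `coverage × weight` rule, the `?? 1.0` fallback, and the 0.5 weights come from this commit.

```typescript
// Hypothetical shape; the real types in _shared.ts may differ.
interface DimScore {
  id: string;
  score: number; // 0..100
  coverage: number; // 0..1
}

// Each dim contributes with an effective weight of coverage × dimWeight.
// A missing weight entry falls back to 1.0, as the commit describes.
function coverageWeightedMeanSketch(
  dims: DimScore[],
  weights: Record<string, number>,
): number {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const dim of dims) {
    const effective = dim.coverage * (weights[dim.id] ?? 1.0);
    weightedSum += dim.score * effective;
    totalWeight += effective;
  }
  // Returning 0 when nothing has coverage is an assumption of this sketch.
  return totalWeight > 0 ? weightedSum / totalWeight : 0;
}

// Invented scores: four core dims at 60, the two new dims at 30.
const recoveryDims: DimScore[] = [
  { id: 'fiscalSpace', score: 60, coverage: 1 },
  { id: 'externalDebtCoverage', score: 60, coverage: 1 },
  { id: 'importConcentration', score: 60, coverage: 1 },
  { id: 'stateContinuity', score: 60, coverage: 1 },
  { id: 'liquidReserveAdequacy', score: 30, coverage: 1 },
  { id: 'sovereignFiscalBuffer', score: 30, coverage: 1 },
];

// (4 × 60 + 2 × 0.5 × 30) / 5.0 = 54
const rebalanced = coverageWeightedMeanSketch(recoveryDims, {
  liquidReserveAdequacy: 0.5,
  sovereignFiscalBuffer: 0.5,
});

// With no explicit weights every dim defaults to 1.0: 300 / 6 = 50
const flat = coverageWeightedMeanSketch(recoveryDims, {});
```

With an empty weight map the sketch reduces to the plain coverage-weighted mean, which is the backward-compatibility property claimed above.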
Test coverage
1. New `tests/resilience-recovery-weight-rebalance.test.mts`:
pins the per-dim weight values, asserts the share math
(0.10 new / 0.20 core), verifies completeness of the weight map,
and documents why retired dims stay in the map at 1.0.
2. New `tests/resilience-recovery-ordering.test.mts`: fixture-based
Spearman-proxy sensitivity check. Asserts NO > US > YE ordering
preserved on both the overall score and the recovery-domain
subscore after the rebalance. (Live post-merge Spearman rerun
against the PR 0 snapshot is tracked as a follow-up commit.)
3. resilience-scorers.test.mts fixture anchors updated in lockstep:
baselineScore: 60.35 → 62.17 (low-scoring liquidReserveAdequacy
+ partial-coverage SWF now contribute ~half the weight)
overallScore: 63.60 → 64.39 (recovery subscore lifts by ~3 pts
from the rebalance, overall by ~0.79)
recovery flat mean: 48.75 (unchanged — flat mean doesn't apply
weights by design; documents the coverage-weighted diff)
Local coverageWeightedMean helper in the test mirrors the
production implementation (weights applied per dim).
Methodology doc
- New "Per-dimension weights in the recovery domain" subsection with
the weight table and a sentence explaining the cap. Cross-references
the source of truth (RESILIENCE_DIMENSION_WEIGHTS).
Deliberate non-goals
- Live post-merge Spearman ≥0.85 check against the PR 0 baseline
snapshot. Fixture ordering is preserved (new ordering test); the
live-data check runs after Railway cron refreshes the rankings on
the new weights and commits
docs/snapshots/resilience-ranking-live-post-pr2-<date>.json.
Tracked as the final piece of PR 2 §3.4 alongside the health.js /
bootstrap graduation (waiting on the 7-day Railway cron bake-in
window).
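For reference, a Spearman check of the kind deferred above could look roughly like the sketch below. It is not the harness code: the function names and the example scores are invented, and it assumes no tied scores (ties would need average ranks).

```typescript
// Rank keys by descending score; rank 1 is the highest score.
function rankDescending(scores: Record<string, number>): Map<string, number> {
  const ordered = Object.keys(scores).sort(
    (a, b) => (scores[b] ?? 0) - (scores[a] ?? 0),
  );
  return new Map(
    ordered.map((key, index): [string, number] => [key, index + 1]),
  );
}

// Spearman rho for untied ranks: 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
function spearmanSketch(
  before: Record<string, number>,
  after: Record<string, number>,
): number {
  const keys = Object.keys(before);
  const n = keys.length;
  if (n < 2) throw new Error('need at least two countries to rank');
  const beforeRanks = rankDescending(before);
  const afterRanks = rankDescending(after);
  let sumSquaredDiff = 0;
  for (const key of keys) {
    const d = (beforeRanks.get(key) ?? 0) - (afterRanks.get(key) ?? 0);
    sumSquaredDiff += d * d;
  }
  return 1 - (6 * sumSquaredDiff) / (n * (n * n - 1));
}

// Invented scores that keep the NO > US > YE ordering on both sides.
const preRebalance = { NO: 80, US: 70, YE: 30 };
const postRebalance = { NO: 82, US: 71, YE: 31 };
const rho = spearmanSketch(preRebalance, postRebalance); // 1 (same ordering)

// A fully reversed ordering gives -1.
const reversedRho = spearmanSketch(preRebalance, { NO: 30, US: 70, YE: 80 });
```

A live check would compare `rho` over the full country set against the 0.85 threshold named above.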
Tests: 6588/6588 data-tier tests pass. Typecheck clean on both
tsconfig configs. Biome clean on touched files. NO > US > YE
fixture ordering preserved.
* fix(resilience): PR 2 review — thread RESILIENCE_DIMENSION_WEIGHTS through the comparison harness
Greptile P2: the operator comparison harness
(scripts/compare-resilience-current-vs-proposed.mjs) claims its domain
scores "mirror the production scorer's coverage-weighted mean" and is
the artifact generator for Spearman / rank-delta acceptance decisions.
After PR 2 §3.4's weight rebalance, the production mirror diverged —
production now applies RESILIENCE_DIMENSION_WEIGHTS (liquidReserveAdequacy
= 0.5, sovereignFiscalBuffer = 0.5) inside coverageWeightedMean, but
the harness still used equal-weight aggregation.
Left unfixed, post-merge Spearman / rank-delta diagnostics would
compare live API scores (with the 0.5 recovery weights) against
harness predictions that assume equal-share dims — silently biasing
every acceptance decision until someone noticed a country's rank-
delta didn't track.
Fix
- Mirrored coverageWeightedMean now accepts dimensionWeights and
applies `coverage × weight` per dim, matching _shared.ts exactly.
- Mirrored buildDomainList accepts + forwards dimensionWeights.
- main() imports RESILIENCE_DIMENSION_WEIGHTS from the scorer module
and passes it through to buildDomainList at the single call site.
- Missing-entry default = 1.0 (same contract as production). This
keeps the harness forward-compatible with future weight refactors:
if a new dim is added without an explicit entry, the harness falls
back to 1.0 exactly as production does and still produces the same
number.
Verification
- Harness syntax-check clean (node -c).
- RESILIENCE_DIMENSION_WEIGHTS import resolves correctly from the
harness's import path.
- 509/509 resilience tests still pass (harness isn't in the test
suite; the invariant is that production ↔ harness use the same
math, and the production side is covered by
tests/resilience-recovery-weight-rebalance.test.mts).
* fix(resilience): PR 2 review — bump cache prefixes v10→v11 + document coverage-vs-weight asymmetry
Greptile P1 + P2 on PR #3328.
P1 — cache prefix not bumped after formula change
--------------------------------------------------
The per-dim weight rebalance changes the score formula, but the
`_formula` tag only distinguishes 'd6' vs 'pc' (pillar-combined vs
legacy 6-domain) — it does NOT detect intra-'d6' weight changes. Left
unfixed, scores cached before deploy would be served with the old
equal-weight math for up to the full 6h TTL, and the ranking key for
up to its 12h TTL. Matches the established v9→v10 pattern for every
prior formula-changing deploy.
Bumped in lockstep:
- RESILIENCE_SCORE_CACHE_PREFIX: v10 → v11
- RESILIENCE_RANKING_CACHE_KEY: v10 → v11
- RESILIENCE_HISTORY_KEY_PREFIX: v5 → v6
- scripts/seed-resilience-scores.mjs local mirrors
- api/health.js resilienceRanking literal
- 4 analysis/backtest scripts that read the cached keys directly
- Test fixtures in resilience-{ranking, handlers, scores-seed,
pillar-aggregation}.test.* that assert on literal key values
The v5→v6 history bump is the critical one: without it, pre-rebalance
history points would mix with post-rebalance points inside the 30-day
window, and change30d / trend math would diff values from different
formulas against each other, producing spurious trends (for example
false "falling" readings) for every country across the deploy window.
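The effect of a prefix bump can be illustrated with an in-memory stand-in for the cache. The key layout, the `readScore` helper, and the cached value below are hypothetical; the real prefixes are the constants listed above.

```typescript
// Hypothetical key layout for illustration only.
const SCORE_PREFIX_OLD = 'score:v10:';
const SCORE_PREFIX_NEW = 'score:v11:';

// In-memory stand-in for the real cache.
const cache = new Map<string, number>();

function readScore(prefix: string, iso2: string): number | undefined {
  return cache.get(`${prefix}${iso2}`);
}

// An entry written before the deploy, computed with the old formula
// (the value 61 is invented).
cache.set(`${SCORE_PREFIX_OLD}NO`, 61);

// A reader still on the old prefix keeps seeing the stale value.
const stale = readScore(SCORE_PREFIX_OLD, 'NO');

// A reader on the bumped prefix misses, so the score is recomputed
// with the new weights instead of being served from the old entry.
const afterBump = readScore(SCORE_PREFIX_NEW, 'NO');
```

The old entries are never read again under the new prefix and simply expire with their TTL.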
P2 — coverage-vs-weight asymmetry in computeLowConfidence / computeOverallCoverage
----------------------------------------------------------------------------------
Reviewer flagged that these two functions still average coverage
equally across all non-retired dims, even after the scoring aggregation
started applying RESILIENCE_DIMENSION_WEIGHTS. The asymmetry is
INTENTIONAL — these signals answer a different question from scoring:
scoring aggregation: "how much does each dim matter to the score?"
coverage signal: "how much real data do we have on this country?"
A dim at weight 0.5 still has the same data-availability footprint as
a weight=1.0 dim: its coverage value reflects whether we successfully
fetched the upstream source, not whether the scorer cares about it.
Applying scoring weights to the coverage signal would let a
half-weight dim hide half its sparsity from the overallCoverage pill,
misleading users reading coverage as a data-quality indicator.
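A small numeric sketch of that argument, assuming full coverage on five recovery dims and no data for liquidReserveAdequacy. The helper names are invented and this is not the code of computeOverallCoverage; only the 0.5 weights come from the rebalance.

```typescript
// Coverage per recovery dim; liquidReserveAdequacy has no data.
const coverageByDim: Record<string, number> = {
  fiscalSpace: 1,
  externalDebtCoverage: 1,
  importConcentration: 1,
  stateContinuity: 1,
  liquidReserveAdequacy: 0,
  sovereignFiscalBuffer: 1,
};

// Scoring weights from the rebalance; unlisted dims default to 1.0.
const scoringWeights: Record<string, number> = {
  liquidReserveAdequacy: 0.5,
  sovereignFiscalBuffer: 0.5,
};

// Equal-weight average: every dim counts the same toward data coverage.
function equalWeightCoverage(coverage: Record<string, number>): number {
  const values = Object.values(coverage);
  return values.reduce((sum, c) => sum + c, 0) / values.length;
}

// What the coverage signal would become if scoring weights were applied.
function scoreWeightedCoverage(
  coverage: Record<string, number>,
  weights: Record<string, number>,
): number {
  let weighted = 0;
  let total = 0;
  for (const [id, c] of Object.entries(coverage)) {
    const w = weights[id] ?? 1.0;
    weighted += c * w;
    total += w;
  }
  return weighted / total;
}

const honest = equalWeightCoverage(coverageByDim); // 5 / 6 ≈ 0.833
const flattering = scoreWeightedCoverage(coverageByDim, scoringWeights); // 4.5 / 5 = 0.9
```

The missing dim costs one sixth of coverage under the equal-weight average but only one tenth once scoring weights are applied, which is the understatement of sparsity the comments warn about.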
Added explicit comments to both functions noting the asymmetry is
deliberate and pointing at the other site for matching rationale.
No code change — just documentation.
Tests: 6588/6588 data-tier tests pass (+511 resilience-specific
including the prefix-literal assertions). Typecheck clean on both
tsconfig configs. Biome clean on touched files.
* docs(resilience): bump methodology doc cache-prefix references to v11/v6
Greptile P2 on PR #3328: Redis keys table in the reproducibility
appendix still published `score:v10` / `ranking:v10` / `history:v5`,
and the rollback instructions told operators to flush those keys.
After the recovery-domain weight rebalance, live cache runs at
`score:v11` / `ranking:v11` / `history:v6`.
- Updated the Redis keys table (lines 490-492) to match `_shared.ts`.
- Updated the rollback block to name the current keys.
- Left the historical "Activation sequence" narrative intact (it
accurately describes the pillar-combine PR's v9→v10 / v4→v5 bump)
but added a parenthetical pointing at the current v11/v6 values.
No code change — doc-only correction for operator accuracy.
* fix(docs): escape MDX-unsafe `<137` pattern to unblock Mintlify deploy
Line 643 had `(<137 countries)` — MDX parses `<137` as a JSX tag
starting with digit `1`, which is illegal and breaks the deploy with
"Unexpected character \`1\` (U+0031) before name". Surfaced after the
prior cache-prefix commit forced Mintlify to re-parse this file.
Replaced with "fewer than 137 countries" for unambiguous rendering.
Other `<` occurrences in this doc (lines 34, 642) are followed by
whitespace and don't trip MDX's tag parser.
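A rough pre-deploy check for this pattern could look like the sketch below. It is a plain regex scan, not Mintlify's or MDX's parser, and the function name is hypothetical; it flags any `<` immediately followed by a digit, including occurrences that a real parser might tolerate.

```typescript
// Return every "<digits" occurrence, e.g. "<137", in the given text.
function findUnsafeLessThan(source: string): string[] {
  return source.match(/<\d+/g) ?? [];
}

const flagged = findUnsafeLessThan('coverage is sparse (<137 countries)');
// flagged is ['<137']

const clean = findUnsafeLessThan('scores where x < 5 are low');
// clean is [] because the "<" is followed by whitespace
```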
112 lines
4.5 KiB
TypeScript
import assert from 'node:assert/strict';
import { describe, it } from 'node:test';

import {
  RESILIENCE_DIMENSION_DOMAINS,
  RESILIENCE_DIMENSION_ORDER,
  RESILIENCE_DIMENSION_WEIGHTS,
  RESILIENCE_RETIRED_DIMENSIONS,
  type ResilienceDimensionId,
} from '../server/worldmonitor/resilience/v1/_dimension-scorers.ts';

// PR 2 §3.4 recovery-domain weight rebalance. The plan pins the two
// new dims (liquidReserveAdequacy, sovereignFiscalBuffer) at ~0.10
// share of the recovery-domain score, with the other four active
// recovery dims absorbing the residual. This test locks the share
// arithmetic against regression: any future weight change must
// explicitly update this test with the new targets so the operator
// rationale stays auditable.
//
// Math (6 active recovery dims at coverage=1.0, weights from
// RESILIENCE_DIMENSION_WEIGHTS):
//   fiscalSpace            × 1.0
//   externalDebtCoverage   × 1.0
//   importConcentration    × 1.0
//   stateContinuity        × 1.0
//   liquidReserveAdequacy  × 0.5
//   sovereignFiscalBuffer  × 0.5
//   Total weighted coverage = 4.0 + 2×0.5 = 5.0
//   Each new-dim share   = 0.5 / 5.0 = 0.10
//   Each other-dim share = 1.0 / 5.0 = 0.20
describe('recovery-domain weight rebalance (PR 2 §3.4)', () => {
  const recoveryDims = RESILIENCE_DIMENSION_ORDER.filter(
    (id) => RESILIENCE_DIMENSION_DOMAINS[id] === 'recovery',
  );
  const activeRecoveryDims = recoveryDims.filter(
    (id) => !RESILIENCE_RETIRED_DIMENSIONS.has(id),
  );

  it('exposes a per-dimension weight entry for every dim in the order', () => {
    for (const id of RESILIENCE_DIMENSION_ORDER) {
      assert.ok(
        RESILIENCE_DIMENSION_WEIGHTS[id] != null,
        `RESILIENCE_DIMENSION_WEIGHTS missing entry for ${id}. Every dim must have an explicit weight; default 1.0 is fine but must be spelled out so the rebalance decisions stay auditable.`,
      );
    }
  });

  it('pins liquidReserveAdequacy + sovereignFiscalBuffer at weight 0.5', () => {
    assert.equal(
      RESILIENCE_DIMENSION_WEIGHTS.liquidReserveAdequacy,
      0.5,
      'plan §3.4 targets ~10% recovery share; weight 0.5 with the other 4 dims at 1.0 gives 0.5/5.0 = 0.10',
    );
    assert.equal(
      RESILIENCE_DIMENSION_WEIGHTS.sovereignFiscalBuffer,
      0.5,
      'plan §3.4 targets ~10% recovery share; weight 0.5 with the other 4 dims at 1.0 gives 0.5/5.0 = 0.10',
    );
  });

  it('the four active core recovery dims carry weight 1.0', () => {
    const coreRecovery: ResilienceDimensionId[] = [
      'fiscalSpace',
      'externalDebtCoverage',
      'importConcentration',
      'stateContinuity',
    ];
    for (const id of coreRecovery) {
      assert.equal(
        RESILIENCE_DIMENSION_WEIGHTS[id],
        1.0,
        `${id} must carry weight 1.0 per plan §3.4 "other recovery dimensions absorb residual"`,
      );
    }
  });

  it('recovery-domain share math: each new dim = 10% at full coverage', () => {
    // Reproduce the coverage-weighted-mean share denominator using
    // coverage=1.0 for all active dims. If this ever diverges from
    // 0.10 the plan's target is no longer met.
    const weightSum = activeRecoveryDims.reduce(
      (s, id) => s + (RESILIENCE_DIMENSION_WEIGHTS[id] ?? 1),
      0,
    );
    const liquidShare = RESILIENCE_DIMENSION_WEIGHTS.liquidReserveAdequacy / weightSum;
    const swfShare = RESILIENCE_DIMENSION_WEIGHTS.sovereignFiscalBuffer / weightSum;
    // ±0.005 = tolerant of one future addition drifting the share
    // slightly; the plan says "~0.10" not exactly 0.10.
    assert.ok(
      Math.abs(liquidShare - 0.10) < 0.005,
      `liquidReserveAdequacy share at full coverage = ${liquidShare.toFixed(4)}, expected ~0.10`,
    );
    assert.ok(
      Math.abs(swfShare - 0.10) < 0.005,
      `sovereignFiscalBuffer share at full coverage = ${swfShare.toFixed(4)}, expected ~0.10`,
    );
  });

  it('retired recovery dims (reserveAdequacy, fuelStockDays) stay in the weight map', () => {
    // Retired dims have coverage=0 and so are neutralized at the
    // coverage channel regardless of weight. Keeping them in the
    // weight map at 1.0 rather than stripping them is the defensive
    // choice: if a future scorer bug accidentally returns coverage>0
    // for a retired dim, a missing weight entry here would make the
    // aggregation silently fall through to the `?? 1.0` default,
    // bypassing the retirement signal. Having explicit weights
    // enforces a single source of truth.
    assert.ok(RESILIENCE_DIMENSION_WEIGHTS.reserveAdequacy != null);
    assert.ok(RESILIENCE_DIMENSION_WEIGHTS.fuelStockDays != null);
  });
});