worldmonitor/tests/resilience-pillar-aggregation.test.mts
Elie Habib d3d406448a feat(resilience): PR 2 §3.4 recovery-domain weight rebalance (#3328)
* feat(resilience): PR 2 §3.4 recovery-domain weight rebalance

Dials the two PR 2 §3.4 recovery dims (liquidReserveAdequacy,
sovereignFiscalBuffer) to ~10% share each of the recovery-domain
score via a new per-dimension weight channel in the coverage-weighted
mean. Matches the plan's direction that the sovereign-wealth signal
complement — rather than dominate — the classical liquid-reserves
and fiscal-space signals.

Implementation

- RESILIENCE_DIMENSION_WEIGHTS: new Record<ResilienceDimensionId, number>
  alongside RESILIENCE_DOMAIN_WEIGHTS. Every dim has an explicit entry
  (default 1.0) so rebalance decisions stay auditable; the two new
  recovery dims carry 0.5 each.

  Share math at full coverage (6 active recovery dims):
    weight sum                  = 4 × 1.0 + 2 × 0.5 = 5.0
    each new-dim share          = 0.5 / 5.0 = 0.10  ✓
    each core-dim share         = 1.0 / 5.0 = 0.20

  Retired dims (reserveAdequacy, fuelStockDays) keep weight 1.0 in
  the map; their coverage=0 neutralizes them at the coverage channel
  regardless. Explicit entries guard against a future scorer bug
  accidentally returning coverage>0 for a retired dim and falling
  through the `?? 1.0` default — every retirement decision is now
  tied to a single explicit source of truth.

- coverageWeightedMean (_shared.ts): refactored to apply
  `coverage × dimWeight` per dim instead of `coverage` alone.
  Backward-compatible when all weights default to 1.0 (reduces to the
  original mean). All three aggregation callers (buildDomainList,
  baselineScore, stressScore) pick up the weighting transparently.
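
The share math and the `coverage × dimWeight` aggregation above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `_shared.ts`: the `DimScore` shape, the `effectiveWeight` and `dimensionShares` helpers, the `coreA`..`coreD` dim ids, and the scores are hypothetical. Only the two 0.5 weights and the 1.0 default come from this commit message. Returning 0 when nothing has coverage is also an assumption.

```typescript
// Illustrative sketch only; the production implementation lives in _shared.ts.
// DimScore, effectiveWeight, dimensionShares and the coreA..coreD ids are hypothetical.
type DimScore = { id: string; score: number; coverage: number };
type WeightMap = Partial<Record<string, number>>;

// Weights quoted in the commit message; a dim without an entry defaults to 1.0.
const DIMENSION_WEIGHTS: WeightMap = {
  liquidReserveAdequacy: 0.5,
  sovereignFiscalBuffer: 0.5,
};

// Effective weight of one dim: coverage × per-dimension weight.
function effectiveWeight(dim: DimScore, weights: WeightMap): number {
  return dim.coverage * (weights[dim.id] ?? 1.0);
}

function coverageWeightedMean(dims: DimScore[], weights: WeightMap = {}): number {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const dim of dims) {
    const w = effectiveWeight(dim, weights);
    weightedSum += dim.score * w;
    totalWeight += w;
  }
  // Assumption: an aggregate with no effective weight scores 0.
  return totalWeight > 0 ? weightedSum / totalWeight : 0;
}

// Fraction of the aggregate that each dim controls.
function dimensionShares(dims: DimScore[], weights: WeightMap = {}): Record<string, number> {
  const total = dims.reduce((sum, dim) => sum + effectiveWeight(dim, weights), 0);
  const shares: Record<string, number> = {};
  for (const dim of dims) {
    shares[dim.id] = total > 0 ? effectiveWeight(dim, weights) / total : 0;
  }
  return shares;
}

// Six active recovery dims at full coverage: four core dims plus the two new ones.
const recoveryDims: DimScore[] = [
  { id: 'coreA', score: 60, coverage: 1 },
  { id: 'coreB', score: 60, coverage: 1 },
  { id: 'coreC', score: 60, coverage: 1 },
  { id: 'coreD', score: 60, coverage: 1 },
  { id: 'liquidReserveAdequacy', score: 20, coverage: 1 },
  { id: 'sovereignFiscalBuffer', score: 20, coverage: 1 },
];

const shares = dimensionShares(recoveryDims, DIMENSION_WEIGHTS);
const weightedMean = coverageWeightedMean(recoveryDims, DIMENSION_WEIGHTS);
const flatMean = coverageWeightedMean(recoveryDims); // every weight defaults to 1.0
console.log(shares, weightedMean, flatMean);
```

With these made-up scores, the two low-scoring half-weight dims pull the aggregate down less than they would under equal weights, which is the direction of the fixture-anchor changes listed under "Test coverage".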

Test coverage

1. New `tests/resilience-recovery-weight-rebalance.test.mts`:
   pins the per-dim weight values, asserts the share math
   (0.10 new / 0.20 core), verifies completeness of the weight map,
   and documents why retired dims stay in the map at 1.0.
2. New `tests/resilience-recovery-ordering.test.mts`: fixture-based
   Spearman-proxy sensitivity check. Asserts NO > US > YE ordering
   preserved on both the overall score and the recovery-domain
   subscore after the rebalance. (Live post-merge Spearman rerun
   against the PR 0 snapshot is tracked as a follow-up commit.)
3. resilience-scorers.test.mts fixture anchors updated in lockstep:
     baselineScore: 60.35 → 62.17 (low-scoring liquidReserveAdequacy
       + partial-coverage SWF now contribute ~half the weight)
     overallScore:  63.60 → 64.39 (recovery subscore lifts by ~3 pts
       from the rebalance, overall by ~0.79)
     recovery flat mean: 48.75 (unchanged — flat mean doesn't apply
       weights by design; documents the coverage-weighted diff)
   Local coverageWeightedMean helper in the test mirrors the
   production implementation (weights applied per dim).
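
The ordering and Spearman checks mentioned in item 2 can be sketched as below. This is a generic rank-correlation helper, not the test file's actual code: the function names are hypothetical and the before/after scores for NO, US, YE are made up for illustration. Only the NO > US > YE ordering and the 0.85 threshold come from this commit message.

```typescript
// Illustrative sketch; helper names and the fixture scores are hypothetical.

// Assign 1-based ranks, giving tied values the average of their positions.
function ranks(values: number[]): number[] {
  const order = values.map((value, index) => ({ value, index }));
  order.sort((a, b) => a.value - b.value);
  const out = new Array<number>(values.length).fill(0);
  let start = 0;
  while (start < order.length) {
    let end = start;
    while (end + 1 < order.length && order[end + 1]!.value === order[start]!.value) end++;
    const averageRank = (start + end) / 2 + 1;
    for (let k = start; k <= end; k++) out[order[k]!.index] = averageRank;
    start = end + 1;
  }
  return out;
}

// Pearson correlation; returns 0 when either input has zero variance.
function pearson(x: number[], y: number[]): number {
  const n = x.length;
  if (n === 0 || n !== y.length) throw new Error('inputs must be non-empty and of equal length');
  const meanX = x.reduce((sum, v) => sum + v, 0) / n;
  const meanY = y.reduce((sum, v) => sum + v, 0) / n;
  let cov = 0;
  let varX = 0;
  let varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i]! - meanX;
    const dy = y[i]! - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  const denom = Math.sqrt(varX * varY);
  return denom > 0 ? cov / denom : 0;
}

// Spearman's rho is the Pearson correlation of the two rank vectors.
function spearman(a: number[], b: number[]): number {
  return pearson(ranks(a), ranks(b));
}

// Hypothetical overall scores for NO, US, YE before and after a rebalance.
const before = [82, 70, 25];
const after = [83, 71.5, 24];
const rho = spearman(before, after);
const orderingPreserved = after[0]! > after[1]! && after[1]! > after[2]!;
console.log(rho >= 0.85, orderingPreserved);
```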

Methodology doc

- New "Per-dimension weights in the recovery domain" subsection with
  the weight table and a sentence explaining the cap. Cross-references
  the source of truth (RESILIENCE_DIMENSION_WEIGHTS).

Deliberate non-goals

- Live post-merge Spearman ≥0.85 check against the PR 0 baseline
  snapshot. Fixture ordering is preserved (new ordering test); the
  live-data check runs after Railway cron refreshes the rankings on
  the new weights and commits docs/snapshots/resilience-ranking-live-
  post-pr2-<date>.json. Tracked as the final piece of PR 2 §3.4
  alongside the health.js / bootstrap graduation (waiting on the
  7-day Railway cron bake-in window).

Tests: 6588/6588 data-tier tests pass. Typecheck clean on both
tsconfig configs. Biome clean on touched files. NO > US > YE
fixture ordering preserved.

* fix(resilience): PR 2 review — thread RESILIENCE_DIMENSION_WEIGHTS through the comparison harness

Greptile P2: the operator comparison harness
(scripts/compare-resilience-current-vs-proposed.mjs) claims its domain
scores "mirror the production scorer's coverage-weighted mean" and is
the artifact generator for Spearman / rank-delta acceptance decisions.
After PR 2 §3.4's weight rebalance, the production mirror diverged —
production now applies RESILIENCE_DIMENSION_WEIGHTS (liquidReserveAdequacy
= 0.5, sovereignFiscalBuffer = 0.5) inside coverageWeightedMean, but
the harness still used equal-weight aggregation.

Left unfixed, post-merge Spearman / rank-delta diagnostics would
compare live API scores (with the 0.5 recovery weights) against
harness predictions that assume equal-share dims — silently biasing
every acceptance decision until someone noticed a country's rank-
delta didn't track.

Fix

- Mirrored coverageWeightedMean now accepts dimensionWeights and
  applies `coverage × weight` per dim, matching _shared.ts exactly.
- Mirrored buildDomainList accepts + forwards dimensionWeights.
- main() imports RESILIENCE_DIMENSION_WEIGHTS from the scorer module
  and passes it through to buildDomainList at the single call site.
- Missing-entry default = 1.0 (same contract as production), which
  keeps the harness forward-compatible with future weight refactors:
  if a new dim is added without an explicit entry, the harness takes
  the same fallback path as production and still produces the correct
  number.

Verification

- Harness syntax-check clean (node -c).
- RESILIENCE_DIMENSION_WEIGHTS import resolves correctly from the
  harness's import path.
- 509/509 resilience tests still pass (harness isn't in the test
  suite; the invariant is that production ↔ harness use the same
  math, and the production side is covered by tests/resilience-
  recovery-weight-rebalance.test.mts).

* fix(resilience): PR 2 review — bump cache prefixes v10→v11 + document coverage-vs-weight asymmetry

Greptile P1 + P2 on PR #3328.

P1 — cache prefix not bumped after formula change
--------------------------------------------------
The per-dim weight rebalance changes the score formula, but the
`_formula` tag only distinguishes 'd6' vs 'pc' (pillar-combined vs
legacy 6-domain) — it does NOT detect intra-'d6' weight changes. Left
unfixed, scores cached before deploy would be served with the old
equal-weight math for up to the full 6h TTL, and the ranking key for
up to its 12h TTL. Matches the established v9→v10 pattern for every
prior formula-changing deploy.

Bumped in lockstep:
 - RESILIENCE_SCORE_CACHE_PREFIX:     v10  → v11
 - RESILIENCE_RANKING_CACHE_KEY:      v10  → v11
 - RESILIENCE_HISTORY_KEY_PREFIX:      v5  → v6
 - scripts/seed-resilience-scores.mjs local mirrors
 - api/health.js resilienceRanking literal
 - 4 analysis/backtest scripts that read the cached keys directly
 - Test fixtures in resilience-{ranking, handlers, scores-seed,
   pillar-aggregation}.test.* that assert on literal key values

The v5→v6 history bump is the critical one: without it, pre-rebalance
history points would mix with post-rebalance points inside the 30-day
window, and change30d / trend math would diff values from different
formulas against each other, producing spurious "falling" trends
for every country across the deploy window.

P2 — coverage-vs-weight asymmetry in computeLowConfidence / computeOverallCoverage
----------------------------------------------------------------------------------
Reviewer flagged that these two functions still average coverage
equally across all non-retired dims, even after the scoring aggregation
started applying RESILIENCE_DIMENSION_WEIGHTS. The asymmetry is
INTENTIONAL — these signals answer a different question from scoring:

  scoring aggregation: "how much does each dim matter to the score?"
  coverage signal:     "how much real data do we have on this country?"

A dim at weight 0.5 still has the same data-availability footprint as
a weight=1.0 dim: its coverage value reflects whether we successfully
fetched the upstream source, not whether the scorer cares about it.
Applying scoring weights to the coverage signal would let a
half-weight dim hide half its sparsity from the overallCoverage pill,
misleading users reading coverage as a data-quality indicator.

Added explicit comments to both functions noting the asymmetry is
deliberate and pointing at the other site for matching rationale.
No code change — just documentation.
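
The asymmetry can be made concrete with a small sketch. Both function names, the dim ids, and the coverage values are hypothetical; the real logic is in computeLowConfidence / computeOverallCoverage. The first function is the equal-average behaviour the commit keeps; the second is the rejected alternative, included only to show how it would understate sparsity.

```typescript
// Illustrative sketch; function names, dim ids and coverage values are hypothetical.
type DimCoverage = { id: string; coverage: number };
type ScoringWeights = Partial<Record<string, number>>;

// Behaviour kept by the commit: every non-retired dim counts equally.
function overallCoverageUnweighted(dims: DimCoverage[]): number {
  if (dims.length === 0) return 0;
  return dims.reduce((sum, dim) => sum + dim.coverage, 0) / dims.length;
}

// Rejected alternative: scoring weights leak into the data-availability signal.
function overallCoverageWeighted(dims: DimCoverage[], weights: ScoringWeights): number {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const dim of dims) {
    const w = weights[dim.id] ?? 1.0;
    weightedSum += dim.coverage * w;
    totalWeight += w;
  }
  return totalWeight > 0 ? weightedSum / totalWeight : 0;
}

// Five fully covered dims plus one half-weight dim with no data at all.
const coverageDims: DimCoverage[] = [
  { id: 'coreA', coverage: 1 },
  { id: 'coreB', coverage: 1 },
  { id: 'coreC', coverage: 1 },
  { id: 'coreD', coverage: 1 },
  { id: 'liquidReserveAdequacy', coverage: 1 },
  { id: 'sovereignFiscalBuffer', coverage: 0 },
];
const scoringWeights: ScoringWeights = { liquidReserveAdequacy: 0.5, sovereignFiscalBuffer: 0.5 };

const unweighted = overallCoverageUnweighted(coverageDims);
const weighted = overallCoverageWeighted(coverageDims, scoringWeights);
console.log({ unweighted, weighted });
```

In this example the missing dim costs 1/6 of the unweighted coverage but only 0.5/5.5 of the weighted one, so the weighted variant reports a noticeably smaller gap for the same missing data.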

Tests: 6588/6588 data-tier tests pass (+511 resilience-specific
including the prefix-literal assertions). Typecheck clean on both
tsconfig configs. Biome clean on touched files.

* docs(resilience): bump methodology doc cache-prefix references to v11/v6

Greptile P2 on PR #3328: Redis keys table in the reproducibility
appendix still published `score:v10` / `ranking:v10` / `history:v5`,
and the rollback instructions told operators to flush those keys.
After the recovery-domain weight rebalance, live cache runs at
`score:v11` / `ranking:v11` / `history:v6`.

- Updated the Redis keys table (lines 490-492) to match `_shared.ts`.
- Updated the rollback block to name the current keys.
- Left the historical "Activation sequence" narrative intact (it
  accurately describes the pillar-combine PR's v9→v10 / v4→v5 bump)
  but added a parenthetical pointing at the current v11/v6 values.

No code change — doc-only correction for operator accuracy.

* fix(docs): escape MDX-unsafe `<137` pattern to unblock Mintlify deploy

Line 643 had `(<137 countries)` — MDX parses `<137` as a JSX tag
starting with digit `1`, which is illegal and breaks the deploy with
"Unexpected character \`1\` (U+0031) before name". Surfaced after the
prior cache-prefix commit forced Mintlify to re-parse this file.

Replaced with "fewer than 137 countries" for unambiguous rendering.
Other `<` occurrences in this doc (lines 34, 642) are followed by
whitespace and don't trip MDX's tag parser.
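
A check for this class of breakage could look like the sketch below. `findMdxUnsafeAngleBrackets` is a hypothetical helper, not part of this repo or of Mintlify. It applies only the narrow rule described above (a `<` directly followed by a digit) and does not skip inline code or fenced blocks, so its hits are candidates for manual review rather than confirmed errors.

```typescript
// Illustrative sketch; findMdxUnsafeAngleBrackets is a hypothetical helper.
// It flags `<` directly followed by a digit and ignores Markdown context
// (code spans, fenced blocks), so hits still need a manual look.
function findMdxUnsafeAngleBrackets(text: string): number[] {
  const hits: number[] = [];
  text.split('\n').forEach((line, index) => {
    if (/<\d/.test(line)) hits.push(index + 1); // 1-based line numbers
  });
  return hits;
}

const sample = [
  'Coverage is partial (<137 countries).',
  'Scores where x < 5 are flagged.',
  'Covers fewer than 137 countries.',
].join('\n');

// Only the first sample line has `<` immediately followed by a digit.
const unsafeLines = findMdxUnsafeAngleBrackets(sample);
console.log(unsafeLines);
```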
2026-04-23 10:25:18 +04:00

180 lines
6.4 KiB
TypeScript

import assert from 'node:assert/strict';
import { describe, it } from 'node:test';

import {
  PENALTY_ALPHA,
  RESILIENCE_SCORE_CACHE_PREFIX,
  penalizedPillarScore,
} from '../server/worldmonitor/resilience/v1/_shared.ts';
import {
  PILLAR_DOMAINS,
  PILLAR_ORDER,
  PILLAR_WEIGHTS,
  buildPillarList,
  type ResiliencePillarId,
} from '../server/worldmonitor/resilience/v1/_pillar-membership.ts';
import type { ResilienceDomain } from '../src/generated/server/worldmonitor/resilience/v1/service_server.ts';

function makeDomain(id: string, score: number, coverage: number): ResilienceDomain {
  return {
    id,
    score,
    weight: 0.17,
    dimensions: [
      { id: `${id}-d1`, score, coverage, observedWeight: coverage, imputedWeight: 1 - coverage, imputationClass: '', freshness: { lastObservedAtMs: '0', staleness: '' } },
    ],
  };
}

describe('penalizedPillarScore', () => {
  it('returns 0 for empty pillars', () => {
    assert.equal(penalizedPillarScore([]), 0);
  });

  it('equal pillar scores produce minimal penalty (penalty factor approaches 1)', () => {
    const pillars = [
      { score: 60, weight: 0.40 },
      { score: 60, weight: 0.35 },
      { score: 60, weight: 0.25 },
    ];
    const result = penalizedPillarScore(pillars);
    const weighted = 60 * 0.40 + 60 * 0.35 + 60 * 0.25;
    const penalty = 1 - 0.5 * (1 - 60 / 100);
    assert.equal(result, Math.round(weighted * penalty * 100) / 100);
  });

  it('one pillar at 0 applies maximum penalty (factor = 0.5 at alpha=0.5)', () => {
    const pillars = [
      { score: 80, weight: 0.40 },
      { score: 70, weight: 0.35 },
      { score: 0, weight: 0.25 },
    ];
    const result = penalizedPillarScore(pillars);
    const weighted = 80 * 0.40 + 70 * 0.35 + 0 * 0.25;
    const penalty = 1 - 0.5 * (1 - 0 / 100);
    assert.equal(result, Math.round(weighted * penalty * 100) / 100);
    assert.equal(penalty, 0.5);
  });

  it('realistic scores (S=70, L=45, R=60) produce expected value', () => {
    const pillars = [
      { score: 70, weight: 0.40 },
      { score: 45, weight: 0.35 },
      { score: 60, weight: 0.25 },
    ];
    const result = penalizedPillarScore(pillars);
    const weighted = 70 * 0.40 + 45 * 0.35 + 60 * 0.25;
    const minScore = 45;
    const penalty = 1 - 0.5 * (1 - minScore / 100);
    const expected = Math.round(weighted * penalty * 100) / 100;
    assert.equal(result, expected);
    assert.ok(result > 0 && result < 100, `result=${result} should be in (0,100)`);
  });

  it('all pillars at 100 produce no penalty (factor = 1.0)', () => {
    const pillars = [
      { score: 100, weight: 0.40 },
      { score: 100, weight: 0.35 },
      { score: 100, weight: 0.25 },
    ];
    const result = penalizedPillarScore(pillars);
    assert.equal(result, 100);
  });
});

describe('buildPillarList', () => {
  it('returns empty array when schemaV2Enabled is false', () => {
    const domains: ResilienceDomain[] = [makeDomain('economic', 75, 0.9)];
    assert.deepEqual(buildPillarList(domains, false), []);
  });

  it('produces 3 pillars with non-zero scores from real domain data', () => {
    const domains: ResilienceDomain[] = [
      makeDomain('economic', 75, 0.9),
      makeDomain('social-governance', 65, 0.85),
      makeDomain('infrastructure', 70, 0.8),
      makeDomain('energy', 60, 0.7),
      makeDomain('health-food', 55, 0.75),
      makeDomain('recovery', 50, 0.6),
    ];
    const pillars = buildPillarList(domains, true);
    assert.equal(pillars.length, 3);
    for (const pillar of pillars) {
      assert.ok(pillar.score > 0, `pillar ${pillar.id} score should be > 0, got ${pillar.score}`);
      assert.ok(pillar.coverage > 0, `pillar ${pillar.id} coverage should be > 0, got ${pillar.coverage}`);
    }
  });

  it('recovery-capacity pillar contains the recovery domain', () => {
    const domains: ResilienceDomain[] = [
      makeDomain('economic', 75, 0.9),
      makeDomain('social-governance', 65, 0.85),
      makeDomain('infrastructure', 70, 0.8),
      makeDomain('energy', 60, 0.7),
      makeDomain('health-food', 55, 0.75),
      makeDomain('recovery', 50, 0.6),
    ];
    const pillars = buildPillarList(domains, true);
    const recovery = pillars.find((p) => p.id === 'recovery-capacity');
    assert.ok(recovery, 'recovery-capacity pillar should exist');
    assert.equal(recovery!.domains.length, 1, 'recovery-capacity pillar should have 1 domain');
    assert.equal(recovery!.domains[0]!.id, 'recovery');
  });

  it('pillar weights match PILLAR_WEIGHTS', () => {
    const domains: ResilienceDomain[] = [
      makeDomain('economic', 75, 0.9),
      makeDomain('social-governance', 65, 0.85),
      makeDomain('infrastructure', 70, 0.8),
      makeDomain('energy', 60, 0.7),
      makeDomain('health-food', 55, 0.75),
      makeDomain('recovery', 50, 0.6),
    ];
    const pillars = buildPillarList(domains, true);
    for (const pillar of pillars) {
      assert.equal(pillar.weight, PILLAR_WEIGHTS[pillar.id as ResiliencePillarId]);
    }
  });

  it('structural-readiness contains economic + social-governance', () => {
    const domains: ResilienceDomain[] = [
      makeDomain('economic', 75, 0.9),
      makeDomain('social-governance', 65, 0.85),
      makeDomain('infrastructure', 70, 0.8),
      makeDomain('energy', 60, 0.7),
      makeDomain('health-food', 55, 0.75),
      makeDomain('recovery', 50, 0.6),
    ];
    const pillars = buildPillarList(domains, true);
    const sr = pillars.find((p) => p.id === 'structural-readiness')!;
    const domainIds = sr.domains.map((d) => d.id).sort();
    assert.deepEqual(domainIds, ['economic', 'social-governance']);
  });
});

describe('pillar constants', () => {
  it('PENALTY_ALPHA equals 0.50', () => {
    assert.equal(PENALTY_ALPHA, 0.50);
  });

  it('RESILIENCE_SCORE_CACHE_PREFIX is v11', () => {
    assert.equal(RESILIENCE_SCORE_CACHE_PREFIX, 'resilience:score:v11:');
  });

  it('PILLAR_ORDER has 3 entries', () => {
    assert.equal(PILLAR_ORDER.length, 3);
  });

  it('pillar weights sum to 1.0', () => {
    const sum = PILLAR_ORDER.reduce((s, id) => s + PILLAR_WEIGHTS[id], 0);
    assert.ok(Math.abs(sum - 1.0) < 0.001, `pillar weights sum to ${sum}, expected 1.0`);
  });

  it('every domain appears in exactly one pillar', () => {
    const allDomains = PILLAR_ORDER.flatMap((id) => PILLAR_DOMAINS[id]);
    const unique = new Set(allDomains);
    assert.equal(allDomains.length, unique.size, 'no domain should appear in multiple pillars');
    assert.equal(unique.size, 6, 'all 6 domains should be covered');
  });
});