Files
worldmonitor/tests/resilience-pillar-combine-activation.test.mts
Elie Habib fbaf07e106 feat(resilience): flag-gated pillar-combined score activation (default off) (#3267)
Wires the non-compensatory 3-pillar combined overall_score behind a
RESILIENCE_PILLAR_COMBINE_ENABLED env flag. Default is false so this PR
ships zero behavior change in production. When flipped true the
top-level overall_score switches from the 6-domain weighted aggregate
to penalizedPillarScore(pillars) with alpha 0.5 and pillar weights
0.40 / 0.35 / 0.25.

Evidence from docs/snapshots/resilience-pillar-sensitivity-2026-04-21:
- Spearman rank correlation current vs proposed 0.9935
- Mean score delta -13.44 points (every country drops, penalty is
  always at most 1)
- Max top-50 rank swing 6 positions (Russia)
- No ceiling or floor effects under plus/minus 20pct perturbation
- Release gate PASS 0/19

Code change in server/worldmonitor/resilience/v1/_shared.ts:
- New isPillarCombineEnabled() reads env dynamically so tests can flip
  state without reloading the module
- overallScore branches on (isPillarCombineEnabled() AND
  RESILIENCE_SCHEMA_V2_ENABLED AND pillars.length > 0); otherwise falls
  through to the 6-domain aggregate (unchanged default path)
- RESILIENCE_SCORE_CACHE_PREFIX bumped v9 to v10
- RESILIENCE_RANKING_CACHE_KEY bumped v9 to v10

Cache invalidation: the version bump forces both per-country score
cache and ranking cache to recompute from the current code path on
first read after a flag flip. Without the bump, 6-domain values cached
under the flag-off path would continue to serve for up to 6-12 hours
after the flip, producing a ragged mix of formulas.

Ripple of v9 to v10:
- api/health.js registry entry
- scripts/seed-resilience-scores.mjs (both keys)
- scripts/validate-resilience-correlation.mjs,
  scripts/backtest-resilience-outcomes.mjs,
  scripts/validate-resilience-backtest.mjs,
  scripts/benchmark-resilience-external.mjs
- tests/resilience-ranking.test.mts 24 fixture usages
- tests/resilience-handlers.test.mts
- tests/resilience-scores-seed.test.mjs explicit pin
- tests/resilience-pillar-aggregation.test.mts explicit pin
- docs/methodology/country-resilience-index.mdx

New tests/resilience-pillar-combine-activation.test.mts:
7 assertions exercising the flag-on path against the release fixtures
with re-anchored bands (NO at least 60, YE/SO at most 40, NO greater
than US preserved, elite greater than fragile). Regression guard
verifies flipping the flag back off restores the 6-domain aggregate.

tests/resilience-ranking-snapshot.test.mts: band thresholds now
resolve from a METHODOLOGY_BANDS table keyed on
snapshot.methodologyFormula. Backward compatible (missing formula
defaults to domain-weighted-6d bands).

Snapshots:
- docs/snapshots/resilience-ranking-2026-04-21.json tagged
  methodologyFormula domain-weighted-6d
- docs/snapshots/resilience-ranking-pillar-combined-projected-2026-04-21.json
  new: top/bottom/major-economies tables projected from the
  52-country sensitivity sample. Explicitly tagged projected (NOT a
  full-universe live capture). When the flag is flipped in production,
  run scripts/freeze-resilience-ranking.mjs to capture the
  authoritative full-universe snapshot.

Methodology doc: Pillar-combined score activation section rewritten to
describe the flag-gated mechanism (activation is an env-var flip, no
code deploy) and the rollback path.

Verification: npm run typecheck:all clean, 397/397 resilience tests
pass (up from 390, +7 activation tests).

Activation plan:
1. Merge this PR with flag default false (zero behavior change)
2. Set RESILIENCE_PILLAR_COMBINE_ENABLED=true in Vercel and Railway env
3. Redeploy or wait for next cold start; v9 to v10 bump forces every
   country to be rescored on first read
4. Run scripts/freeze-resilience-ranking.mjs against the flag-on
   deployment and commit the resulting snapshot
5. Ship a v2.0 methodology-change note explaining the re-anchored
   scale so analysts understand the universal ~13 point score drop is
   a scale rebase, not a country-level regression

Rollback: set RESILIENCE_PILLAR_COMBINE_ENABLED=false, flush
resilience:score:v10:* and resilience:ranking:v10 keys (or wait for
TTLs). The 6-domain formula stays alongside the pillar combine in
_shared.ts and needs no code change to come back.
2026-04-22 06:52:07 +04:00

248 lines
11 KiB
TypeScript
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
// Phase 2 T2.3 activation test suite.
//
// Exercises the `RESILIENCE_PILLAR_COMBINE_ENABLED` flag: when set,
// `overallScore` switches from the 6-domain weighted aggregate to the
// penalized pillar-combined form. The existing release-gate tests
// (tests/resilience-release-gate.test.mts) cover the default (flag=off)
// path and pin the anchors for the 6-domain formula; this file covers
// the re-anchored bands under the pillar combine.
//
// Why separate file: the existing release-gate test imports
// `getResilienceScore` at the top of the file (captures the legacy
// overallScore path) and runs many asserts that would become stale
// under the pillar combine. A separate file lets us flip the env flag
// in a per-test setup/teardown cleanly.
import assert from 'node:assert/strict';
import { afterEach, beforeEach, describe, it } from 'node:test';
import { getResilienceRanking } from '../server/worldmonitor/resilience/v1/get-resilience-ranking.ts';
import { getResilienceScore } from '../server/worldmonitor/resilience/v1/get-resilience-score.ts';
import {
isPillarCombineEnabled,
penalizedPillarScore,
} from '../server/worldmonitor/resilience/v1/_shared.ts';
import { createRedisFetch } from './helpers/fake-upstash-redis.mts';
import {
buildReleaseGateFixtures,
} from './helpers/resilience-release-fixtures.mts';
// Re-anchored bands for the pillar-combined formula, derived from the
// 52-country live-Redis sensitivity capture in
// docs/snapshots/resilience-pillar-sensitivity-2026-04-21.json.
// Old (6-domain): NO ≥ 70, YE/SO/CD ≤ 35, NO US ≥ 8.
// New (pillar combine, α=0.5): every country drops ~13 points, top
// stays ~65-72, fragile states drop to ~15-35. The re-anchored bands
// preserve the "high" vs "low" separation without pinning numbers that
// are only valid for the legacy formula.
const HIGH_BAND_FLOOR = 60;
const LOW_BAND_CEILING = 40;
const MIN_HIGH_LOW_SEPARATION = 20;
const fixtures = buildReleaseGateFixtures();
const originalFetch = globalThis.fetch;
const originalRedisUrl = process.env.UPSTASH_REDIS_REST_URL;
const originalRedisToken = process.env.UPSTASH_REDIS_REST_TOKEN;
const originalVercelEnv = process.env.VERCEL_ENV;
const originalPillarFlag = process.env.RESILIENCE_PILLAR_COMBINE_ENABLED;
function installRedisFixtures() {
process.env.UPSTASH_REDIS_REST_URL = 'https://redis.example';
process.env.UPSTASH_REDIS_REST_TOKEN = 'token';
delete process.env.VERCEL_ENV;
const redisState = createRedisFetch(fixtures);
globalThis.fetch = redisState.fetchImpl;
return redisState;
}
function enablePillarCombine(): void {
process.env.RESILIENCE_PILLAR_COMBINE_ENABLED = 'true';
}
function disablePillarCombine(): void {
process.env.RESILIENCE_PILLAR_COMBINE_ENABLED = 'false';
}
describe('pillar-combined score activation', () => {
beforeEach(() => {
enablePillarCombine();
});
afterEach(() => {
globalThis.fetch = originalFetch;
if (originalRedisUrl == null) delete process.env.UPSTASH_REDIS_REST_URL;
else process.env.UPSTASH_REDIS_REST_URL = originalRedisUrl;
if (originalRedisToken == null) delete process.env.UPSTASH_REDIS_REST_TOKEN;
else process.env.UPSTASH_REDIS_REST_TOKEN = originalRedisToken;
if (originalVercelEnv == null) delete process.env.VERCEL_ENV;
else process.env.VERCEL_ENV = originalVercelEnv;
if (originalPillarFlag == null) delete process.env.RESILIENCE_PILLAR_COMBINE_ENABLED;
else process.env.RESILIENCE_PILLAR_COMBINE_ENABLED = originalPillarFlag;
});
it('isPillarCombineEnabled reads env dynamically', () => {
enablePillarCombine();
assert.equal(isPillarCombineEnabled(), true);
disablePillarCombine();
assert.equal(isPillarCombineEnabled(), false);
enablePillarCombine();
assert.equal(isPillarCombineEnabled(), true);
});
it('penalizedPillarScore collapses to weighted-sum when all pillars equal (penalty minimal)', () => {
// All pillars at 80 → min=80 → penalty = 1 0.5*(1 0.8) = 0.9.
// Weighted sum = 80 * (0.40 + 0.35 + 0.25) = 80.
// Final = 80 * 0.9 = 72.
const result = penalizedPillarScore([
{ score: 80, weight: 0.40 },
{ score: 80, weight: 0.35 },
{ score: 80, weight: 0.25 },
]);
assert.equal(Math.round(result * 100) / 100, 72.00);
});
it('pillar-combined overallScore drops NO below the 6-domain band floor (expected, re-anchored)', async () => {
installRedisFixtures();
const response = await getResilienceScore(
{ request: new Request('https://example.com?countryCode=NO') } as never,
{ countryCode: 'NO' },
);
// Norway under the 6-domain formula scores ~86 under the current
// fixtures (pinned by T1.1 regression test). Under the pillar
// combine it drops to roughly the low-70s because penalty = 1
// 0.5 × (1 min_pillar/100) is always ≤ 1. The activated path's
// HIGH_BAND_FLOOR = 60 leaves plenty of headroom above mid-tier
// countries while accepting that elite scores no longer sit in the
// 85+ range.
assert.ok(
response.overallScore >= HIGH_BAND_FLOOR,
`NO in the pillar-combined formula must stay above the re-anchored high-band floor (${HIGH_BAND_FLOOR}), got ${response.overallScore}`,
);
assert.ok(
response.overallScore <= 90,
`NO in the pillar-combined formula should NOT exceed 90 — penalty factor is always ≤ 1, so getting close to 100 would indicate the penalty is not firing. Got ${response.overallScore}.`,
);
});
it('pillar-combined overallScore keeps fragile countries (YE, SO) below the re-anchored low-band ceiling', async () => {
installRedisFixtures();
for (const countryCode of ['YE', 'SO'] as const) {
const response = await getResilienceScore(
{ request: new Request(`https://example.com?countryCode=${countryCode}`) } as never,
{ countryCode },
);
assert.ok(
response.overallScore <= LOW_BAND_CEILING,
`${countryCode} in the pillar-combined formula must stay below the re-anchored low-band ceiling (${LOW_BAND_CEILING}), got ${response.overallScore}`,
);
}
});
it('pillar-combined preserves NO vs US separation (high-band vs mid-band)', async () => {
installRedisFixtures();
const [no, us] = await Promise.all([
getResilienceScore({ request: new Request('https://example.com?countryCode=NO') } as never, { countryCode: 'NO' }),
getResilienceScore({ request: new Request('https://example.com?countryCode=US') } as never, { countryCode: 'US' }),
]);
// The 6-domain separation was ~14 points under fixtures. The
// pillar combine amplifies penalty on imbalanced pillar profiles
// (US has a weaker live-shock pillar than Norway), so the
// separation is expected to hold or widen.
assert.ok(
no.overallScore > us.overallScore,
`NO (${no.overallScore}) must still outscore US (${us.overallScore}) under the pillar combine`,
);
assert.ok(
no.overallScore - us.overallScore >= MIN_HIGH_LOW_SEPARATION - 12,
`NO US separation must stay ≥ ${MIN_HIGH_LOW_SEPARATION - 12} under pillar combine; got NO=${no.overallScore}, US=${us.overallScore}, Δ=${(no.overallScore - us.overallScore).toFixed(2)}`,
);
});
it('pillar-combined ranking preserves the elite vs fragile ordering over the release set', async () => {
installRedisFixtures();
const ranking = await getResilienceRanking({ request: new Request('https://example.com') } as never, {});
const byCountry = new Map(ranking.items.map((item) => [item.countryCode, item]));
// Every high-band anchor (if present in the ranking) must outrank
// every low-band anchor (if present). This is the structural
// invariant the pillar combine must preserve to be accepted.
const highAnchors = ['NO', 'CH', 'DK', 'IS', 'FI', 'SE', 'NZ'].filter((cc) => byCountry.has(cc));
const lowAnchors = ['YE', 'SO', 'SD', 'CD'].filter((cc) => byCountry.has(cc));
for (const high of highAnchors) {
for (const low of lowAnchors) {
const highScore = byCountry.get(high)!.overallScore;
const lowScore = byCountry.get(low)!.overallScore;
assert.ok(
highScore > lowScore,
`pillar-combined ranking must keep ${high} (${highScore}) above ${low} (${lowScore})`,
);
}
}
});
it('disabling the flag restores the 6-domain aggregate (regression guard for the default path)', async () => {
installRedisFixtures();
disablePillarCombine();
const response = await getResilienceScore(
{ request: new Request('https://example.com?countryCode=NO') } as never,
{ countryCode: 'NO' },
);
// Under the 6-domain formula + current fixtures, NO is pinned at
// ≥ 70 by the existing release-gate test. The flag-off code path
// is the same one the production default uses; we verify here that
// switching the flag off mid-suite really does restore it (the
// dynamic env read in isPillarCombineEnabled() is load-bearing).
assert.ok(
response.overallScore >= 70,
`with flag off, NO must still meet the 6-domain release-gate floor (70), got ${response.overallScore}`,
);
});
it('flipping the flag mid-session rebuilds the score (stale-formula cache invalidation)', async () => {
// This is the core guarantee for the activation story: merging this
// PR with flag=false populates cached scores tagged _formula='d6',
// and later setting RESILIENCE_PILLAR_COMBINE_ENABLED=true MUST
// force a rebuild on next read (rather than serving the d6-tagged
// entry for up to 6h until the TTL expires). We simulate the flip
// inside a single test by pre-computing a cache entry with the
// flag off, flipping the flag, then reading again — the second
// read must produce a different overallScore because the cache
// entry's _formula no longer matches the current formula.
disablePillarCombine();
installRedisFixtures();
const firstRead = await getResilienceScore(
{ request: new Request('https://example.com?countryCode=NO') } as never,
{ countryCode: 'NO' },
);
assert.ok(firstRead.overallScore >= 70, `flag-off NO should score ≥70, got ${firstRead.overallScore}`);
// Flip the flag. The cached entry in Redis still carries
// _formula='d6' from the first read. Without the stale-formula
// gate, the second read would serve that same 6-domain score.
enablePillarCombine();
const secondRead = await getResilienceScore(
{ request: new Request('https://example.com?countryCode=NO') } as never,
{ countryCode: 'NO' },
);
assert.ok(
secondRead.overallScore < firstRead.overallScore,
`flag-on rebuild must drop NO's score below the 6-domain value (penalty factor ≤ 1); got first=${firstRead.overallScore} second=${secondRead.overallScore}. If these are equal, the stale-formula cache gate is not firing and a flag flip in production would serve legacy values for up to the 6h TTL.`,
);
assert.ok(
secondRead.overallScore >= 60,
`flag-on NO should still meet the re-anchored 60 floor, got ${secondRead.overallScore}`,
);
});
});