Elie Habib bdfb415f8f fix(resilience-ranking): return warmed scores from memory, skip lossy re-read (#3121)
* fix(resilience-ranking): return warmed scores from memory, skip lossy re-read

Upstash REST writes via /set aren't always visible to an immediately-following
/pipeline GET in the same Vercel invocation (documented in PR #3057 /
feedback_upstash_write_reread_race_in_handler.md). The ranking handler was
warming 222 countries, then re-reading them from Redis to compute a coverage
ratio; that re-read could return zero scores despite every SET succeeding,
collapsing coverage to 0% (below the 75% coverage gate) and silently dropping
the ranking publish. Consequence:
`resilience:ranking:v9` missing, per-country score keys absent, health
reports EMPTY_ON_DEMAND even while the seeder keeps writing a "fresh" meta.

Fix: warmMissingResilienceScores now returns Map<cc, GetResilienceScoreResponse>
with every successfully computed score. The handler merges those into
cachedScores directly and drops the post-warm re-read. Coverage now reflects
what was actually computed in-memory this invocation, not what Redis happens
to surface after write lag.
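
A minimal sketch of the merge-then-gate step described above, not the
handler's actual code. `ScoreResponse`, `mergeWarmedScores`, `coverageRatio`
and `shouldPublishRanking` are hypothetical names, and the `>=` comparison
against the 75% gate is an assumption:

```typescript
// Hypothetical, simplified stand-in for GetResilienceScoreResponse.
interface ScoreResponse {
  countryCode: string;
  overallScore: number;
}

// Assumption: the gate is inclusive at 75%.
const COVERAGE_GATE = 0.75;

// Merge scores computed during this invocation into the ones that were
// already read from the cache, without going back to Redis.
function mergeWarmedScores(
  cached: Map<string, ScoreResponse>,
  warmed: Map<string, ScoreResponse>,
): Map<string, ScoreResponse> {
  const merged = new Map<string, ScoreResponse>();
  cached.forEach((score, cc) => merged.set(cc, score));
  warmed.forEach((score, cc) => merged.set(cc, score));
  return merged;
}

// Fraction of the expected countries that have a score in memory.
function coverageRatio(
  scores: Map<string, ScoreResponse>,
  allCountries: string[],
): number {
  if (allCountries.length === 0) return 0;
  const covered = allCountries.filter((cc) => scores.has(cc)).length;
  return covered / allCountries.length;
}

function shouldPublishRanking(
  scores: Map<string, ScoreResponse>,
  allCountries: string[],
): boolean {
  return coverageRatio(scores, allCountries) >= COVERAGE_GATE;
}
```

Because coverage is derived from the merged in-memory map, a lagging read
after the warm step can no longer drag it to zero.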

Adds a regression test that simulates pipeline-GET returning null for
freshly-SET score keys; it fails against the old code (coverage=0, no
ranking written) and passes with the fix (coverage=100%, ranking written).

Slice A of the resilience-ranking recovery plan; Slice B (seeder meta
truthfulness) follows.

* fix(resilience-ranking): verify score-key persistence via pipeline SET response

PR review P1: trusting every fulfilled ensureResilienceScoreCached() result as
"cached" turned the read-lag fix into a write-failure false positive.
cachedFetchJson's underlying setCachedJson only logs and swallows write errors,
so a transient /set failure on resilience:score:v9:* would leave per-country
scores absent while the ranking aggregate and its seed-meta got published on
top of them — worse than the bug this PR was meant to fix.

Fix: use the pipeline SET response as the authoritative persistence signal.

- Extract the score builder into a pure `buildResilienceScore()` with no
  caching side-effects (appendHistory stays — it's part of the score
  semantics).
- `ensureResilienceScoreCached()` still wraps it in cachedFetchJson so
  single-country RPC callers keep their log-and-return-anyway behavior.
- `warmMissingResilienceScores()` now computes in-memory, persists all
  scores in one pipeline SET, and only returns countries whose command
  reported `result: OK`. Pipeline SET's response is synchronous with the
  write, so OK means actually stored — no ambiguity with read-after-write
  lag.
- When runRedisPipeline returns fewer responses than commands (transport
  failure), return an empty map: no proof of persistence → coverage gate
  can't false-positive.
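
The persistence check in the bullets above could look roughly like this.
It is a sketch under assumptions: `buildScoreSetCommands` and
`selectPersisted` are hypothetical helpers, the `{ result } | { error }`
per-command shape follows the Upstash REST pipeline format, and the real
code may also set a TTL that is omitted here:

```typescript
// One entry per pipeline command, in command order.
interface PipelineResult {
  result?: unknown;
  error?: string;
}

// Build one SET per score, remembering which country each command is for.
function buildScoreSetCommands<T>(
  scores: Map<string, T>,
  keyFor: (cc: string) => string,
): { countryCodes: string[]; commands: string[][] } {
  const countryCodes: string[] = [];
  const commands: string[][] = [];
  scores.forEach((score, cc) => {
    countryCodes.push(cc);
    commands.push(["SET", keyFor(cc), JSON.stringify(score)]);
  });
  return { countryCodes, commands };
}

// Keep only the countries whose SET reported OK. A short response array
// means there is no proof of persistence, so nothing is returned.
function selectPersisted<T>(
  scores: Map<string, T>,
  countryCodes: string[],
  results: PipelineResult[],
): Map<string, T> {
  const persisted = new Map<string, T>();
  if (results.length < countryCodes.length) return persisted;
  countryCodes.forEach((cc, i) => {
    const response = results[i];
    const score = scores.get(cc);
    if (response && response.result === "OK" && score !== undefined) {
      persisted.set(cc, score);
    }
  });
  return persisted;
}
```

The pipeline call itself would sit between the two helpers; only the map
returned by `selectPersisted` should feed the coverage gate.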

Adds regression test that blocks pipeline SETs to score keys and asserts
the ranking + meta are NOT published. Existing race-regression test still
passes.

* fix(resilience-ranking): preserve env key prefix on warm pipeline SET

PR review P1: the pipeline SET added to verify score-key persistence was
called with raw=true, bypassing the preview/dev key prefix (preview:<sha>:).
Two coupled regressions:

  1. Preview/dev deploys write unprefixed `resilience:score:v9:*` keys, but
     all reads (getCachedResilienceScores, ensureResilienceScoreCached via
     setCachedJson/cachedFetchJson) look in the prefixed namespace. Warmed
     scores become invisible to the same preview on the next read.
  2. Because production uses the empty prefix, preview writes land directly
     in the production-visible namespace, defeating the environment
     isolation guard in server/_shared/redis.ts.

Fix: drop the raw=true flag so runRedisPipeline applies prefixKey on each
command, symmetric with the reads. Adds __resetKeyPrefixCacheForTests in
redis.ts so tests can exercise a non-empty prefix without relying on
process-startup memoization order.
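
To illustrate why raw=true broke the read/write symmetry, here is a sketch
of a memoized prefix applied per pipeline command. `createKeyPrefixer` and
`prefixPipeline` are hypothetical, and so is the exact prefix rule (full
SHA, `unknown` fallback, same format for development); the real logic lives
in server/_shared/redis.ts and may differ:

```typescript
type EnvVars = Record<string, string | undefined>;

function createKeyPrefixer(env: EnvVars) {
  let cachedPrefix: string | undefined; // memoized on first use

  const computePrefix = (): string => {
    const vercelEnv = env["VERCEL_ENV"];
    // Assumption: only preview/development deploys get a prefix.
    if (vercelEnv !== "preview" && vercelEnv !== "development") return "";
    const sha = env["VERCEL_GIT_COMMIT_SHA"] ?? "unknown";
    return `${vercelEnv}:${sha}:`;
  };

  return {
    prefixKey(key: string): string {
      if (cachedPrefix === undefined) cachedPrefix = computePrefix();
      return cachedPrefix + key;
    },
    // Counterpart of a test-only reset hook: forces recomputation.
    resetForTests(): void {
      cachedPrefix = undefined;
    },
  };
}

// Prefix the key (second element) of every command unless raw is set.
function prefixPipeline(
  commands: string[][],
  prefixKey: (key: string) => string,
  raw = false,
): string[][] {
  if (raw) return commands;
  return commands.map((cmd) =>
    cmd.map((part, i) => (i === 1 ? prefixKey(part) : part)),
  );
}
```

With raw left at its default, writes and reads agree on the namespace; with
raw=true a preview deploy would write the bare production-visible key.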

Adds regression test that simulates VERCEL_ENV=preview + a commit SHA and
asserts every score-key SET in the pipeline carries the preview:<sha>:
prefix. Fails on old code (raw writes), passes now. installRedis gains an
opt-in `keepVercelEnv` so the test can run under a forced env without
being clobbered by the helper's default reset.

* test(resilience-ranking): snapshot + restore VERCEL_GIT_COMMIT_SHA in afterEach

PR review P2: the preview-prefix test mutates process.env.VERCEL_GIT_COMMIT_SHA
but the file's afterEach only restored VERCEL_ENV. A process started with a
real preview SHA (e.g. CI) would have that value unconditionally deleted after
the test ran, leaking changed state into later tests and producing different
prefix behavior locally vs. CI.

Fix: capture originalVercelSha at module load, restore it in afterEach, and
invalidate the memoized key prefix after each test so the next one recomputes
against the restored env. The preview-prefix test's finally block is no
longer needed — the shared teardown handles it.
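
The snapshot/restore pattern can be sketched as a small helper. This is an
illustration, not the test file's code; `snapshotEnv` is a hypothetical
name, and it takes the env object as a parameter so it can run without a
test framework:

```typescript
type EnvLike = Record<string, string | undefined>;

// Capture the current values of the given keys and return a function that
// puts them back, deleting keys that were originally unset.
function snapshotEnv(env: EnvLike, keys: string[]): () => void {
  const saved = keys.map((key) => ({ key, value: env[key] }));
  return () => {
    saved.forEach(({ key, value }) => {
      if (value === undefined) {
        delete env[key];
      } else {
        env[key] = value;
      }
    });
  };
}
```

In the suite this would be called once at module load with `process.env`
and the two Vercel variables, with the returned function invoked in
afterEach right before the key-prefix cache is reset.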

Verified: suite still passes 11/11 under both `VERCEL_ENV=production` (unset)
and `VERCEL_ENV=preview VERCEL_GIT_COMMIT_SHA=ci-original-sha` process
environments.
2026-04-16 10:17:22 +04:00