Files
worldmonitor/server
Elie Habib 3c47c1b222 fix(supply-chain): split chokepoint transit data + close silent zero-state cache (#3185)
* fix(supply-chain): split chokepoint transit data + close silent zero-state cache

Production supply-chain panel was rendering 13 empty chokepoints because
the getChokepointStatus RPC silently cached zero-state for 5 minutes:

1. supply_chain:transit-summaries:v1 grew to ~500 KB (180d × 13 × 14 fields
   of history per chokepoint).
2. REDIS_OP_TIMEOUT_MS is 1.5 s. Vercel Sydney edge → Upstash for a 500 KB
   GET consistently exceeded the budget; getCachedJson caught the AbortError
   and returned null.
3. The 500 KB portwatch fallback read hit the same timeout.
4. summaries = {} → every summaries[cp.id] was undefined → 13 chokepoints
   got the zero-state default → cached as a non-null success response for
   REDIS_CACHE_TTL (5 min) instead of NEG_SENTINEL (120 s).

Fix (one PR, per docs/plans/chokepoint-rpc-payload-split.md):

- ais-relay.cjs: split seedTransitSummaries output.
  - supply_chain:transit-summaries:v1 — compact (~30 KB, no history).
  - supply_chain:transit-summaries:history:v1:{id} — per chokepoint
    (~35 KB each, 13 keys). Both under the 1.5 s Redis read budget.
- New RPC GetChokepointHistory: lazy-loaded on card expand.
- get-chokepoint-status.ts: drop the 500 KB portwatch/corridorrisk/
  chokepoint_transits fallback reads. Treat a null transit-summaries
  read as upstreamUnavailable=true so cachedFetchJson writes NEG_SENTINEL
  (2 min) instead of a 5-min zero-state pin. Omit history from the
  response (proto field stays declared; empty array).
- server/_shared/redis.ts: tag AbortError timeouts with [REDIS-TIMEOUT]
  key=… timeoutMs=… so log drains / Sentry-Vercel integration pick up
  large-payload timeouts instead of them being silently swallowed.
- SupplyChainPanel.ts + MapPopup.ts: lazy-fetch history on card expand
  via fetchChokepointHistory; session-scoped cache; graceful "History
  unavailable" on empty/error. PRO gating on the map popup unchanged.
- Gateway: cache-tier entry for /get-chokepoint-history (slow).
- Tests: regression guards for upstreamUnavailable gate + per-id key
  shape + handler wiring + proto query annotations.

Audit included in plan: no other RPC consumer read stacks >200 KB
besides displacement:summary:v1:2026 (724 KB, same risk, flagged for
follow-up PR). wildfire:fires:v1 at 1.7 MB loads via bootstrap (3 s
timeout, different path) — monitor but out of scope.

Expected impact:
- supply_chain:chokepoints:v4 payload drops from ~508 KB to <100 KB.
- supply_chain:transit-summaries:v1 drops from ~502 KB to <50 KB.
- RPC Redis reads stay well under 1.5 s in the hot path.
- Silent zero-state pinning is now impossible: null reads → 2-min neg
  cache → self-heal on next relay tick.

* fix(supply-chain): address PR #3185 review — stop caching empty/error + fix partial coverage

Two P1 regressions caught in review:

1. Client cache poisoning on empty/error (MapPopup.ts, SupplyChainPanel.ts)
   Empty-array is truthy in JS, so MapPopup's `!cached && !inflight` branch
   never fired once we cached []. Neither `cached && cached.length` fired
   either — popup stuck on "Loading transit history..." for the session.
   SupplyChainPanel had the explicit `cached && !cached.length` branch but
   still never retried, so the same transient became session-sticky there too.

   Fix: cache ONLY non-empty successful responses. Empty/error show the
   "History unavailable" placeholder but leave the cache untouched, so the
   next re-expand retries. The /get-chokepoint-history gateway tier is
   "slow" (5-min CF edge cache) → retries stay cheap.

2. Partial portwatch coverage treated as healthy (ais-relay.cjs)
   seedTransitSummaries iterated Object.entries(pw), so if seed-portwatch
   dropped N of 13 chokepoints (ArcGIS reject/empty), summaries had <13 keys.
   get-chokepoint-status upstreamUnavailable fires only on fully-empty
   summaries, so the N missing chokepoints fell through to zero-state rows
   that got pinned in cache for 5 minutes.

   Fix: iterate CANONICAL_IDS (Object.keys(CHOKEPOINT_THREAT_LEVELS)) and
   fill zero-state for any ID missing from pw. Shape is consistently 13
   keys. Track pwCovered → envelope + seed-meta recordCount reflect real
   upstream coverage (not shape size), so health.js can distinguish 13/13
   healthy from 10/13 partial. Warn-log on shortfall.

Tests: new regression guards
- panel must NOT cache empty arrays (historyCache.set with []).
- writer must iterate CANONICAL_IDS, not Object.entries(pw).
- seed-meta recordCount binds to pwCovered.

5718/5718 data tests pass. typecheck + typecheck:api clean.
2026-04-18 23:14:00 +04:00
..