mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-25 17:14:57 +02:00
* fix(supply-chain): split chokepoint transit data + close silent zero-state cache
Production supply-chain panel was rendering 13 empty chokepoints because
the getChokepointStatus RPC silently cached zero-state for 5 minutes:
1. supply_chain:transit-summaries:v1 grew to ~500 KB (180d × 13 × 14 fields
of history per chokepoint).
2. REDIS_OP_TIMEOUT_MS is 1.5 s. Vercel Sydney edge → Upstash for a 500 KB
GET consistently exceeded the budget; getCachedJson caught the AbortError
and returned null.
3. The 500 KB portwatch fallback read hit the same timeout.
4. summaries = {} → every summaries[cp.id] was undefined → 13 chokepoints
got the zero-state default → cached as a non-null success response for
REDIS_CACHE_TTL (5 min) instead of NEG_SENTINEL (120 s).
Fix (one PR, per docs/plans/chokepoint-rpc-payload-split.md):
- ais-relay.cjs: split seedTransitSummaries output.
- supply_chain:transit-summaries:v1 — compact (~30 KB, no history).
- supply_chain:transit-summaries:history:v1:{id} — per chokepoint
(~35 KB each, 13 keys). Both under the 1.5 s Redis read budget.
- New RPC GetChokepointHistory: lazy-loaded on card expand.
- get-chokepoint-status.ts: drop the 500 KB portwatch/corridorrisk/
chokepoint_transits fallback reads. Treat a null transit-summaries
read as upstreamUnavailable=true so cachedFetchJson writes NEG_SENTINEL
(2 min) instead of a 5-min zero-state pin. Omit history from the
response (proto field stays declared; empty array).
- server/_shared/redis.ts: tag AbortError timeouts with [REDIS-TIMEOUT]
key=… timeoutMs=… so log drains / Sentry-Vercel integration pick up
large-payload timeouts instead of them being silently swallowed.
- SupplyChainPanel.ts + MapPopup.ts: lazy-fetch history on card expand
via fetchChokepointHistory; session-scoped cache; graceful "History
unavailable" on empty/error. PRO gating on the map popup unchanged.
- Gateway: cache-tier entry for /get-chokepoint-history (slow).
- Tests: regression guards for upstreamUnavailable gate + per-id key
shape + handler wiring + proto query annotations.
Audit included in plan: no other RPC consumer read stacks >200 KB
besides displacement:summary:v1:2026 (724 KB, same risk, flagged for
follow-up PR). wildfire:fires:v1 at 1.7 MB loads via bootstrap (3 s
timeout, different path) — monitor but out of scope.
Expected impact:
- supply_chain:chokepoints:v4 payload drops from ~508 KB to <100 KB.
- supply_chain:transit-summaries:v1 drops from ~502 KB to <50 KB.
- RPC Redis reads stay well under 1.5 s in the hot path.
- Silent zero-state pinning is now impossible: null reads → 2-min neg
cache → self-heal on next relay tick.
* fix(supply-chain): address PR #3185 review — stop caching empty/error + fix partial coverage
Two P1 regressions caught in review:
1. Client cache poisoning on empty/error (MapPopup.ts, SupplyChainPanel.ts)
Empty-array is truthy in JS, so MapPopup's `!cached && !inflight` branch
never fired once we cached []. Neither `cached && cached.length` fired
either — popup stuck on "Loading transit history..." for the session.
SupplyChainPanel had the explicit `cached && !cached.length` branch but
still never retried, so the same transient became session-sticky there too.
Fix: cache ONLY non-empty successful responses. Empty/error show the
"History unavailable" placeholder but leave the cache untouched, so the
next re-expand retries. The /get-chokepoint-history gateway tier is
"slow" (5-min CF edge cache) → retries stay cheap.
2. Partial portwatch coverage treated as healthy (ais-relay.cjs)
seedTransitSummaries iterated Object.entries(pw), so if seed-portwatch
dropped N of 13 chokepoints (ArcGIS reject/empty), summaries had <13 keys.
get-chokepoint-status upstreamUnavailable fires only on fully-empty
summaries, so the N missing chokepoints fell through to zero-state rows
that got pinned in cache for 5 minutes.
Fix: iterate CANONICAL_IDS (Object.keys(CHOKEPOINT_THREAT_LEVELS)) and
fill zero-state for any ID missing from pw. Shape is consistently 13
keys. Track pwCovered → envelope + seed-meta recordCount reflect real
upstream coverage (not shape size), so health.js can distinguish 13/13
healthy from 10/13 partial. Warn-log on shortfall.
Tests: new regression guards
- panel must NOT cache empty arrays (historyCache.set with []).
- writer must iterate CANONICAL_IDS, not Object.entries(pw).
- seed-meta recordCount binds to pwCovered.
5718/5718 data tests pass. typecheck + typecheck:api clean.