fix(supply-chain): split chokepoint transit data + close silent zero-state cache (#3185)

* fix(supply-chain): split chokepoint transit data + close silent zero-state cache

Production supply-chain panel was rendering 13 empty chokepoints because
the getChokepointStatus RPC silently cached zero-state for 5 minutes:

1. supply_chain:transit-summaries:v1 grew to ~500 KB (180d × 13 × 14 fields
   of history per chokepoint).
2. REDIS_OP_TIMEOUT_MS is 1.5 s. Vercel Sydney edge → Upstash for a 500 KB
   GET consistently exceeded the budget; getCachedJson caught the AbortError
   and returned null.
3. The 500 KB portwatch fallback read hit the same timeout.
4. summaries = {} → every summaries[cp.id] was undefined → 13 chokepoints
   got the zero-state default → cached as a non-null success response for
   REDIS_CACHE_TTL (5 min) instead of NEG_SENTINEL (120 s).

Fix (one PR, per docs/plans/chokepoint-rpc-payload-split.md):

- ais-relay.cjs: split seedTransitSummaries output.
  - supply_chain:transit-summaries:v1 — compact (~30 KB, no history).
  - supply_chain:transit-summaries:history:v1:{id} — per chokepoint
    (~35 KB each, 13 keys). Both under the 1.5 s Redis read budget.
- New RPC GetChokepointHistory: lazy-loaded on card expand.
- get-chokepoint-status.ts: drop the 500 KB portwatch/corridorrisk/
  chokepoint_transits fallback reads. Treat a null transit-summaries
  read as upstreamUnavailable=true so cachedFetchJson writes NEG_SENTINEL
  (2 min) instead of a 5-min zero-state pin. Omit history from the
  response (proto field stays declared; empty array).
- server/_shared/redis.ts: tag AbortError timeouts with [REDIS-TIMEOUT]
  key=… timeoutMs=… so log drains / Sentry-Vercel integration pick up
  large-payload timeouts instead of them being silently swallowed.
- SupplyChainPanel.ts + MapPopup.ts: lazy-fetch history on card expand
  via fetchChokepointHistory; session-scoped cache; graceful "History
  unavailable" on empty/error. PRO gating on the map popup unchanged.
- Gateway: cache-tier entry for /get-chokepoint-history (slow).
- Tests: regression guards for upstreamUnavailable gate + per-id key
  shape + handler wiring + proto query annotations.

Audit included in plan: no other RPC consumer read stacks >200 KB
besides displacement:summary:v1:2026 (724 KB, same risk, flagged for
follow-up PR). wildfire:fires:v1 at 1.7 MB loads via bootstrap (3 s
timeout, different path) — monitor but out of scope.

Expected impact:
- supply_chain:chokepoints:v4 payload drops from ~508 KB to <100 KB.
- supply_chain:transit-summaries:v1 drops from ~502 KB to <50 KB.
- RPC Redis reads stay well under 1.5 s in the hot path.
- Silent zero-state pinning is now impossible: null reads → 2-min neg
  cache → self-heal on next relay tick.

* fix(supply-chain): address PR #3185 review — stop caching empty/error + fix partial coverage

Two P1 regressions caught in review:

1. Client cache poisoning on empty/error (MapPopup.ts, SupplyChainPanel.ts)
   Empty-array is truthy in JS, so MapPopup's `!cached && !inflight` branch
   never fired once we cached []. Neither `cached && cached.length` fired
   either — popup stuck on "Loading transit history..." for the session.
   SupplyChainPanel had the explicit `cached && !cached.length` branch but
   still never retried, so the same transient became session-sticky there too.

   Fix: cache ONLY non-empty successful responses. Empty/error show the
   "History unavailable" placeholder but leave the cache untouched, so the
   next re-expand retries. The /get-chokepoint-history gateway tier is
   "slow" (5-min CF edge cache) → retries stay cheap.

2. Partial portwatch coverage treated as healthy (ais-relay.cjs)
   seedTransitSummaries iterated Object.entries(pw), so if seed-portwatch
   dropped N of 13 chokepoints (ArcGIS reject/empty), summaries had <13 keys.
   get-chokepoint-status upstreamUnavailable fires only on fully-empty
   summaries, so the N missing chokepoints fell through to zero-state rows
   that got pinned in cache for 5 minutes.

   Fix: iterate CANONICAL_IDS (Object.keys(CHOKEPOINT_THREAT_LEVELS)) and
   fill zero-state for any ID missing from pw. Shape is consistently 13
   keys. Track pwCovered → envelope + seed-meta recordCount reflect real
   upstream coverage (not shape size), so health.js can distinguish 13/13
   healthy from 10/13 partial. Warn-log on shortfall.

Tests: new regression guards
- panel must NOT cache empty arrays (historyCache.set with []).
- writer must iterate CANONICAL_IDS, not Object.entries(pw).
- seed-meta recordCount binds to pwCovered.

5718/5718 data tests pass. typecheck + typecheck:api clean.
This commit is contained in:
Elie Habib
2026-04-18 23:14:00 +04:00
committed by GitHub
parent d1e084061d
commit 3c47c1b222
19 changed files with 633 additions and 119 deletions

View File

@@ -53,6 +53,41 @@ paths:
application/json:
schema:
$ref: '#/components/schemas/Error'
/api/supply-chain/v1/get-chokepoint-history:
get:
tags:
- SupplyChainService
summary: GetChokepointHistory
description: |-
GetChokepointHistory returns transit-count history for a single chokepoint,
loaded lazily on card expand. Keeps the status RPC compact (no 180-day
history per chokepoint on every call).
operationId: GetChokepointHistory
parameters:
- name: chokepointId
in: query
required: false
schema:
type: string
responses:
"200":
description: Successful response
content:
application/json:
schema:
$ref: '#/components/schemas/GetChokepointHistoryResponse'
"400":
description: Validation error
content:
application/json:
schema:
$ref: '#/components/schemas/ValidationError'
default:
description: Error response
content:
application/json:
schema:
$ref: '#/components/schemas/Error'
/api/supply-chain/v1/get-critical-minerals:
get:
tags:
@@ -623,6 +658,29 @@ components:
type: string
hazardAlertName:
type: string
GetChokepointHistoryRequest:
type: object
properties:
chokepointId:
type: string
required:
- chokepointId
description: |-
GetChokepointHistory returns the transit-count history for a single
chokepoint. Loaded lazily on card expand so the main chokepoint-status
response can stay compact (no 180-day history per chokepoint).
GetChokepointHistoryResponse:
type: object
properties:
chokepointId:
type: string
history:
type: array
items:
$ref: '#/components/schemas/TransitDayCount'
fetchedAt:
type: string
format: int64
GetCriticalMineralsRequest:
type: object
GetCriticalMineralsResponse: