mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-25 17:14:57 +02:00
* fix(supply-chain): popup-keyed history re-query + dataAvailable flag for partial coverage Two P1 findings on #3185 post-merge review: 1. MapPopup cross-chokepoint history contamination Popup's async history resolve re-queried [data-transit-chart] without a cpId key. User opens popup A → fetch starts for cpA; user opens popup B before it resolves → cpA's history mounts into cpB's chart container. Fix: add data-transit-chart-id keyed by cpId; re-query by it on resolve. Mirrors SupplyChainPanel's existing data-chart-cp-id pattern. 2. Partial portwatch coverage still looked healthy Previous fix emits all 13 canonical summaries (zero-state fill for missing IDs) and records pwCovered in seed-meta, but: - get-chokepoint-status still zero-filled missing chokepoints and cached the response as healthy — panel rendered silent empty rows. - api/health.js only degrades on recordCount=0, so 10/13 partial read as OK despite the UI hiding entire chokepoints. Fix: - proto: TransitSummary.data_available (field 12). Writer tags with Boolean(cpData). Status RPC passes through; defaults true for pre-fix payloads (absence = covered). - Status RPC writes seed-meta recordCount as covered count (not shape size), and flips response-level upstreamUnavailable on partial. - api/health.js: new minRecordCount field on SEED_META entries + new COVERAGE_PARTIAL status (warn rollup). chokepoints entry declares minRecordCount: 13. recordCount < 13 → COVERAGE_PARTIAL. - Client (panel + popup): skip stats/chart rendering when !dataAvailable; show "Transit data unavailable (upstream partial)" microcopy so users understand the gap. 5759/5759 data tests pass. Typecheck + typecheck:api clean. * fix(supply-chain): guarantee Simulate Closure button exits Computing state User reports "Simulate Closure does nothing beyond write Computing…" — the button sticks at Computing forever. Two causes: 1. Scenario worker appears down (0 scenario-result:* keys in Redis in the last 24h of 24h-TTL). Railway-side — separate intervention needed to redeploy scripts/scenario-worker.mjs. 2. Client leaked the "Computing…" state on multiple exit paths: - signal.aborted early-return inside the poll loop never reset the button. Second click fired abort on first → first returned without resetting → button stayed "Computing…" until next render. - !this.content.isConnected early-return also skipped reset (less user-visible but same class of bug). - catch block swallowed AbortError without resetting. - POST /run had no hard timeout — a hanging edge function left the button in Computing indefinitely. Fix: - resetButton(text) helper touches the btn only if still connected; applied in every exit path (abort, timeout, post-success, catch). - AbortSignal.any([caller, AbortSignal.timeout(20_000)]) on POST /run. - console.error on failure so Simulate Closure errors surface in ops. - Error message includes "scenario worker may be down" on loop timeout so operators see the right suspect. Backend observations (for follow-up): - Hormuz backend is healthy (/api/health chokepoints OK, 13 records, 1 min old; live RPC has hormuz_strait.riskLevel=critical, wow=-22, flowEstimate present; GetChokepointHistory returns 174 entries). User-reported "Hormuz empty" is likely browser/CDN stale cache from before PR #3185; hard refresh should resolve. - scenario-worker.mjs has zero result keys in 24h. Railway service needs verification/redeployment. * fix(scenario): wrong Upstash RPUSH format silently broke every Simulate Closure Railway scenario-worker log shows every job failing field validation since at least 03:06Z today: [scenario-worker] Job failed field validation, discarding: ["{\"jobId\":\"scenario:1776535792087:cynxx5v4\",... The leading [" in the payload is the smoking gun. api/scenario/v1/run.ts was POSTing to /rpush/{key} with body `[payload]`, expecting Upstash to unpack the array and push one string value. Upstash does NOT parse that form — it stored the literal `["{...}"]` string as a single list value. Worker BLMOVEs the literal string → JSON.parse → array → destructure `{jobId, scenarioId, iso2}` on an array returns undefined for all three → every job discarded without writing a result. Client poll returns `pending` for the full 60s timeout, then (on the prior client code path) leaked the stuck "Computing…" button state indefinitely. Fix: use the standard Upstash REST command format — POST to the base URL with body `["RPUSH", key, value]`. Matches scripts/ais-relay.cjs upstashLpush. After this, the scenario-queue:pending list stores the raw payload string, BLMOVE returns the payload, JSON.parse gives the object, validation passes, computeScenario runs, result key gets written, client poll sees `done`. Zero result keys existed in prod Redis in the last 24h (24h TTL on scenario-result:*) — confirms the fix addresses the production outage.