3613 Commits

Author SHA1 Message Date
Elie Habib
0d076a689f style(auth): green Sign In button matching 2D selected state (#3147)
* style(auth): switch Sign In button to green accent to match 2D selected state

Use the same --green background and --bg text color as .map-dim-btn.active
so the CTA matches the existing visual vocabulary and stands out against
the dark header. Adds a soft green glow on hover/focus-visible via
color-mix (already used extensively in this stylesheet).

* fix(a11y): pin Sign In text color to #0a0a0a for WCAG AA in light theme

Reviewer on #3147 flagged that using color: var(--bg) on the green
background fails WCAG AA in light theme. var(--bg) resolves to #f8f9fa,
which lands at ~3.13:1 on var(--green) #16a34a for a 13px label (AA
needs 4.5:1).

Pinning the label to #0a0a0a in both themes gives:
  Light  #0a0a0a on #16a34a is about 6.0:1 (AA pass)
  Dark   #0a0a0a on #44ff88 is about 15.0:1 (AAA pass)

Dark theme rendering is unchanged because var(--bg) already resolved
to #0a0a0a there. Only the light-theme regression is corrected.
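
The AA numbers above can be reproduced with the WCAG 2.x relative-luminance
formula; a minimal sketch (hex values are from the commit, helper names are
illustrative):

```javascript
// WCAG 2.x contrast ratio between two hex colors (e.g. '#0a0a0a').
function luminance(hex) {
  const [r, g, b] = [1, 3, 5].map((i) => {
    const c = parseInt(hex.slice(i, i + 2), 16) / 255;
    // Piecewise sRGB-to-linear transfer function from WCAG 2.x
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

function contrast(fg, bg) {
  const [hi, lo] = [luminance(fg), luminance(bg)].sort((a, b) => b - a);
  return (hi + 0.05) / (lo + 0.05);
}

// The regression: var(--bg) in light theme on the green CTA
console.log(contrast('#f8f9fa', '#16a34a').toFixed(2)); // ≈ 3.13 — fails AA (needs 4.5)
// The fix: pinned label color
console.log(contrast('#0a0a0a', '#16a34a').toFixed(2)); // ≈ 6.01 — passes AA
```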
2026-04-17 20:20:29 +04:00
Sebastien Melki
72e3e3ee3b fix(auth): remove isProUser() gate so all visitors see Sign In button (#3115)
The setupAuthWidget() call was gated behind isProUser(), which created
a deadlock: new users without legacy API keys could never see the
Sign In button and thus could never authenticate. Removes the guard
in both the main app and the pro landing page pricing section.

Closes #034

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 19:08:29 +04:00
Elie Habib
dcf73385ca fix(scoring): rebalance formula weights severity 55%, corroboration 15% (#3144)
* fix(scoring): rebalance formula weights severity 55%, corroboration 15%

PR A of the scoring recalibration plan (docs/plans/2026-04-17-002).

The v2 shadow-log recalibration (690 items, Pearson 0.413) showed the
formula compresses scores into a narrow 30-70 range, making the 85
critical gate unreachable and the 65 high gate marginal. Root cause:
corroboration at 30% weight penalizes breaking single-source news
(the most important alerts) while severity at 40% doesn't separate
critical from high widely enough.

Weight change:
  BEFORE: severity 0.40 + sourceTier 0.20 + corroboration 0.30 + recency 0.10
  AFTER:  severity 0.55 + sourceTier 0.20 + corroboration 0.15 + recency 0.10

Expected effect: critical/tier1/fresh rises from 76 to 88 (clears 85
gate). critical/tier2/fresh rises from 71 to 83 (recommend lowering
critical gate to 80 at activation time). high/tier2/fresh rises from
61 to 69 (clears 65 gate). The HIGH-CRITICAL gap widens from 10 to
14 points for same-tier items.
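
A sketch of the weighted sum with hypothetical component values (0-100)
chosen so they reproduce the quoted examples; the actual per-field component
scoring is relay-internal and not shown in this commit:

```javascript
// Hypothetical component values (0-100) that reproduce the quoted examples.
const SEVERITY = { critical: 100, high: 75 };
const TIER = { tier1: 100, tier2: 75 };

function score(w, severity, sourceTier, corroboration, recency) {
  return Math.round(
    w.severity * severity +
    w.sourceTier * sourceTier +
    w.corroboration * corroboration +
    w.recency * recency
  );
}

const BEFORE = { severity: 0.40, sourceTier: 0.20, corroboration: 0.30, recency: 0.10 };
const AFTER  = { severity: 0.55, sourceTier: 0.20, corroboration: 0.15, recency: 0.10 };

// critical / tier1 / single-source / fresh: 76 -> 88 (clears the 85 gate)
console.log(score(BEFORE, SEVERITY.critical, TIER.tier1, 20, 100)); // 76
console.log(score(AFTER,  SEVERITY.critical, TIER.tier1, 20, 100)); // 88
```

The same components give 71 -> 83 for critical/tier2/fresh and 61 -> 69 for
high/tier2/fresh, matching the expected-effect numbers above.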

Also:
- Bumps shadow-log key from v2 to v3 for a clean recalibration dataset
  (v2 had old-weight scores that would contaminate the 48h soak)
- Updates proto/news_item.proto formula comment to reflect new weights
- Updates cache-keys.ts documentation

No cache migration needed: the classify cache stores {level, category},
not scores. Scores are computed at read time from the stored level +
the formula, so new digest requests immediately produce new scores.

Gates remain OFF. After 48h of v3 data, re-run:
  node scripts/shadow-score-report.mjs
  node scripts/shadow-score-rank.mjs sample 25

🤖 Generated with Claude Opus 4.6 via Claude Code + Compound Engineering v2.49.0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: regenerate proto OpenAPI docs for weight rebalance

* fix(scoring): bump SHADOW_SCORE_LOG_KEY export to v3

The exported constant in cache-keys.ts was left at v2 while the relay's
local constant was bumped to v3. Anyone importing the export (or grep-
discovering it) would get a stale key. Architecture review flagged this.

* fix(scoring): update test + stale comments for shadow-log v3

Review found the regression test still asserted v2 key, causing CI
failure. Also fixed stale v1/v2 references in report script header,
default-key comment, report title render, and shouldNotify docstring.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-17 17:43:39 +04:00
Elie Habib
20864f9c8a feat(settings): promote Notifications into its own tab (#3145)
* feat(settings): promote Notifications from settings accordion to its own tab

Notifications was buried at the bottom of the Settings accordion list, one
more click away than Panels/Sources. Since the feature is Pro-gated and
channel-heavy (Telegram pairing, Slack/Discord OAuth, webhook URL entry,
quiet hours, digest scheduling), a dedicated tab gives it the surface it
needs and makes the upsell visible to free users.

Extracts the notifications HTML and attach() logic into
src/services/notifications-settings.ts. Removes the inline block and now
unused imports (notification-channels, clerk, entitlements.hasTier,
variant, uqr, QUIET_HOURS/DIGEST_CRON rollout flags) from
preferences-content.ts. Adds a 'notifications' tab between Sources and
API Keys in UnifiedSettings (web only; desktop app keeps the old layout).

Rollout-flag tests (digest/quiet-hours) now read from the new module.

* perf(settings): lazy-attach Notifications tab to avoid eager fetch

Previously render() called notifs.attach() unconditionally, which fired
getChannelsData() on every modal open for Pro users even when they never
visited the Notifications tab. Mirrors the loadApiKeys() pattern: store
the render result in pendingNotifs and only call attach() when the tab
is first activated (or is the initial active tab). Cleanup on close,
destroy, and re-render remains unchanged.

Addresses greptile P2 on PR #3145.
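
The lazy-attach pattern can be sketched roughly like this (pendingNotifs and
attach() follow the commit message; the counting stub is illustrative, not
the real notifications-settings module):

```javascript
// Illustrative stand-in for the notifications-settings module.
function makeNotifs() {
  let attachCount = 0;
  return {
    render: () => ({ attach: () => { attachCount += 1; } }), // attach() fires the channels fetch
    attachCount: () => attachCount,
  };
}

function makeSettingsModal(notifs) {
  let pendingNotifs = null;
  return {
    render() {
      // Store the render result instead of attaching eagerly.
      pendingNotifs = notifs.render();
    },
    activateTab(name) {
      // Only attach (and therefore fetch) when the tab is first visited.
      if (name === 'notifications' && pendingNotifs) {
        pendingNotifs.attach();
        pendingNotifs = null; // subsequent activations are no-ops
      }
    },
  };
}

const notifs = makeNotifs();
const modal = makeSettingsModal(notifs);
modal.render();                     // no fetch yet
modal.activateTab('sources');       // still no fetch
modal.activateTab('notifications'); // fetch fires exactly once
modal.activateTab('notifications');
console.log(notifs.attachCount());  // 1
```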
2026-04-17 17:43:21 +04:00
Elie Habib
1cf249c2f8 fix(security): strip importanceScore from /api/notify payload + scope fan-out by userId (#3143)
* fix(security): strip importanceScore from /api/notify payload + scope fan-out by userId

Closes todo #196 (activation blocker for IMPORTANCE_SCORE_LIVE=1).

Before this fix, any authenticated Pro user could POST to /api/notify with
`payload.importanceScore: 100` and `severity: 'critical'`, bypassing the
relay's IMPORTANCE_SCORE_MIN gate and fan-out would hit every Pro user with
matching rules (no userId filter). This was a pre-existing vulnerability
surfaced during the scoring pipeline work in PR #3069.

Two changes:

1. api/notify.ts — strip `importanceScore` and `corroborationCount` from
   the user-submitted payload before publishing to wm:events:queue. These
   fields are relay-internal (computed by ais-relay's scoring pipeline).
   Also validates `severity` against the known allowlist (critical, high,
   medium, low, info) instead of accepting any string.

2. scripts/notification-relay.cjs — scope rule matching: if the event
   carries `event.userId` (browser-submitted via /api/notify), only match
   rules where `rule.userId === event.userId`. Relay-emitted events (from
   ais-relay, regional-snapshot) have no userId and continue to fan out to
   all matching Pro users. This prevents a single user from broadcasting
   crafted events to every other Pro subscriber's notification channels.

Net effect: browser-submitted events can only reach the submitting user's
own Telegram/Slack/Email/webhook channels, and cannot carry an injected
importanceScore.
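
A condensed sketch of both changes (field names are from the commit; the
handler plumbing around them is illustrative):

```javascript
// Module-scope allowlist (cf. the later P2 review fix) rather than per-request.
const VALID_SEVERITIES = new Set(['critical', 'high', 'medium', 'low', 'info']);

// 1. Strip relay-internal fields from the user-submitted payload and
//    validate severity against the allowlist instead of accepting any string.
function sanitizePayload(payload) {
  const { importanceScore, corroborationCount, ...rest } = payload;
  if (rest.severity !== undefined && !VALID_SEVERITIES.has(rest.severity)) {
    throw new Error('invalid severity');
  }
  return rest;
}

// 2. Scope fan-out: browser-submitted events carry userId and only match the
//    submitter's own rules; relay-emitted events (no userId) fan out to all.
function ruleMatchesUser(rule, event) {
  if (event.userId) return rule.userId === event.userId;
  return true;
}

const clean = sanitizePayload({ severity: 'critical', importanceScore: 100 });
console.log('importanceScore' in clean); // false — injected score never reaches the queue
```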

🤖 Generated with Claude Opus 4.6 via Claude Code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(security): reject internal relay control events from /api/notify

Review found that `flush_quiet_held` and `channel_welcome` are internal
relay control events (dispatched by Railway cron scripts) that the public
/api/notify endpoint accepted because only eventType length was checked.
A Pro user could POST `{"eventType":"flush_quiet_held","payload":{},
"variant":"full"}` to force-drain their held quiet-hours queue on demand,
bypassing batch_on_wake behavior.

Now returns 403 for reserved event types. The denylist approach (vs
allowlist) is deliberate: new user-facing event types shouldn't require
an API change to work, while new internal events must explicitly be
added to the deny set if they carry privileged semantics.

* fix(security): exempt browser events from score gate + hoist Sets to module scope

Two review findings from Greptile on PR #3143:

P1: Once IMPORTANCE_SCORE_LIVE=1 activates, browser-submitted rss_alert
events (which had importanceScore stripped by the first commit) would
evaluate to score=0 at the relay's top-level gate and be silently
dropped before rule matching. Fix: add `&& !event.userId` to the gate
condition — browser events carry userId and have no server-computed
score, so the gate must not apply to them. Relay-emitted events (no
userId, server-computed score) are still gated as before.

P2: VALID_SEVERITIES and INTERNAL_EVENT_TYPES Sets were allocated inside
the handler on every request. Hoisted to module scope.
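
The gated condition from the P1 fix amounts to (flag and constant names are
illustrative; only the `&& !event.userId` exemption is from the commit):

```javascript
// Top-level relay gate: drop low-scoring events before rule matching, but
// only for relay-emitted events that carry a server-computed score.
function passesScoreGate(event, minScore, scoreLive) {
  if (!scoreLive) return true;   // IMPORTANCE_SCORE_LIVE off: no gating
  if (event.userId) return true; // browser-submitted: score was stripped, exempt
  return (event.importanceScore ?? 0) >= minScore;
}

// Browser rss_alert with stripped score is no longer silently dropped:
console.log(passesScoreGate({ userId: 'u1', eventType: 'rss_alert' }, 65, true)); // true
// Relay-emitted low-score event is still gated:
console.log(passesScoreGate({ importanceScore: 40 }, 65, true)); // false
```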

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-17 11:14:25 +04:00
Elie Habib
94571b5df9 feat(positioning): standalone 24/7 Positioning panel with directional gauges (#3141)
* feat(positioning): standalone 24/7 Positioning panel with directional gauges

Extract Hyperliquid perp positioning from the "Perp Flow" tab inside
CommoditiesPanel into a dedicated "24/7 Positioning" panel (#3140).

New panel features:
- SVG arc gauge (0-100) per asset with color encoding:
  green = bullish/longs crowded, red = bearish/shorts crowded
- Grouped by asset class: Commodities, Crypto, FX
- Visual elevation (glow + border) for scores above 40
- Click-through to relevant panels (crypto, commodities)
- Hover tooltips with score, funding rate, OI delta
- Warmup banner, stale badges, unavailable state
- Compact secondary metrics (funding rate + OI delta 1h)
- Mobile responsive (2-column grid below 600px)

Registration: panel-layout.ts, App.ts (primeTask + 5min refresh),
commands.ts (CMD+K), panels.ts (all variants), index.ts export,
en.json i18n keys.

Removals from CommoditiesPanel:
- "Perp Flow" tab and its tab type
- fetchHyperliquidFlow() method
- _renderFlow() and _renderFlowGrid() methods
- Associated state (_flow, _flowLoading)
- App.ts primeTask and refreshScheduler entries

No backend changes. Reuses existing getHyperliquidFlow RPC and
market:hyperliquid-flow:v1 Redis key.

* fix(positioning): correct oiDelta lookback, click targets, warmup state (review)

P1 #1: oiDelta1h used 2-point lookback (5min delta mislabeled as 1h).
  Restored to 13-point lookback matching the original MarketPanel.ts
  implementation (12 samples back = 1h at 5min cadence).

P1 #2: CLICK_TARGETS used display-style names (GOLD, WTI, BRENT) but
  seeded symbols are xyz:-prefixed Hyperliquid names (xyz:GOLD, xyz:CL,
  xyz:BRENTOIL). Fixed to match actual seeder output from
  seed-hyperliquid-flow.mjs.

P2: Empty/unavailable snapshots (cold seed, fresh deploy) now show
  warmup guidance instead of a hard error. The prior Perp Flow tab
  intentionally treated this as normal bootstrap; the new panel had
  regressed to showError, which is confusing on first deploy.
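
The lookback fix in P1 #1 amounts to (the sample shape is illustrative; the
real series comes from the Redis history):

```javascript
// Open-interest delta over 1h from a 5-min cadence series (newest sample last).
// 13 points = current sample + 12 samples back = 60 minutes of history.
function oiDelta1h(samples) {
  const LOOKBACK = 13;
  if (samples.length < LOOKBACK) return null; // not enough history yet (warmup)
  const latest = samples[samples.length - 1];
  const hourAgo = samples[samples.length - LOOKBACK];
  return latest - hourAgo;
}

const series = [100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 115];
console.log(oiDelta1h(series)); // 15 — not the 5-min delta (4) the 2-point bug produced
```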

* fix(positioning): use data-panel selector, guard missing target panels (review)

P1: Click-through used getElementById('panel-X') but panels mount with
data-panel="X" attribute. Changed to querySelector('[data-panel="X"]').

P2: Crypto cards (BTC/ETH/SOL) target the 'crypto' panel which doesn't
exist in the commodity variant. resolveClickTarget() now checks if the
target panel is actually in the DOM before adding clickable styling and
data-pos-navigate. Cards whose target panel is absent render as
non-clickable (no cursor pointer, no navigate attribute).
2026-04-17 08:48:00 +04:00
Elie Habib
d9194a5179 fix(railway): tolerate Ubuntu apt mirror failures in NIXPACKS + Dockerfile builds (#3142)
Ubuntu's noble-security package-index CDN is returning hash-sum
mismatches (2026-04-17), causing ALL Railway NIXPACKS builds to fail
at the 'apt-get update && apt-get install curl' layer with exit
code 100. Multiple Railway services are red.

NIXPACKS' aptPkgs = ['curl'] generates a strict
'apt-get update && apt-get install -y' that fails hard on any
mirror error. Fix: replace aptPkgs with manual cmds that:
  1. Allow apt-get update to partially fail (|| true)
  2. Use --fix-missing on apt-get install so packages from healthy
     mirrors still install even if one mirror is broken

Same treatment for consumer-prices-core/Dockerfile.

Files changed:
- nixpacks.toml (root — used by ais-relay + standalone cron seeders)
- scripts/nixpacks.toml (used by bundled seed services)
- consumer-prices-core/Dockerfile

The || true on apt-get update is safe because:
  1. curl is the only package we install and it's often already present
     in the NIXPACKS base image (nix-env provides it)
  2. If curl genuinely isn't available, the seeder will fail at runtime
     with a clear 'curl: not found' error — not a silent degradation
2026-04-17 08:35:20 +04:00
Elie Habib
aeef68dd56 fix(relay): envelope-aware Redis reads restore chokepoint transit chart (#3139)
* fix(relay): envelope-aware Redis reads for PortWatch/CorridorRisk/stocks-bootstrap

PR #3097 migrated 91 seed producers to the contract-mode envelope shape
{_seed, data}, but ais-relay's private upstashGet() reads raw JSON and
does not unwrap the envelope. Three callsites in the relay now see the
wrapper metadata as the payload and silently corrupt downstream:

- seedTransitSummaries() iterates {_seed, data} as chokepoint IDs and
  writes supply_chain:transit-summaries:v1 with keys "_seed" and "data",
  both mapping to empty-history summary objects. Every chokepoint's
  transitSummary.history is therefore empty, which gates off the
  time-series chart in SupplyChainPanel and MapPopup waterway popup.
- loadWsbTickerSet() reads market:stocks-bootstrap:v1 and sees
  data.quotes === undefined, silently disabling WSB ticker matching.

Fix: add envelopeRead(key) next to envelopeWrite — mirrors the
server/_shared/redis.ts::getCachedJson semantics (envelope-aware by
default; legacy raw shapes pass through unchanged). Swap the three
upstashGet calls that target contract-mode canonical keys.

After the relay re-seeds at its 10-min cadence, transit-summaries:v1
will contain a proper {hormuz_strait, suez, ...} map and the chart
comes back in the panel and map popup.

Unit tests cover contract-mode unwrap, legacy passthrough, null, and
the array-with-numeric-indices false-positive edge case. Existing
static assertions updated to guard against regression to raw
upstashGet on these keys.
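
The unwrap semantics described above (mirroring getCachedJson) look roughly
like this; the {_seed, data} shape is from PR #3097, the helper body is a
sketch:

```javascript
// Envelope-aware unwrap: contract-mode values are { _seed, data };
// legacy raw shapes, nulls, and arrays pass through unchanged.
function unwrapEnvelope(value) {
  if (
    value !== null &&
    typeof value === 'object' &&
    !Array.isArray(value) && // arrays with numeric indices are NOT envelopes
    '_seed' in value &&
    'data' in value
  ) {
    return value.data;
  }
  return value;
}

// Contract-mode value unwraps to the payload; legacy raw value passes through.
console.log(unwrapEnvelope({ _seed: { fetchedAt: '2026-04-17' }, data: { suez: {} } }));
console.log(unwrapEnvelope({ suez: {} }));
```

Without the unwrap, iterating the return value yields the keys "_seed" and
"data" instead of chokepoint IDs — exactly the corruption described above.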

* fix(backtest): envelope-aware reads in resilience backtest scripts

backtest-resilience-outcomes.mjs reads economic:fx:yoy:v1, infra:outages:v1,
and conflict:ucdp-events:v1 — all migrated to the {_seed, data} envelope by
PR #3097. The private redisGetJson helper did not unwrap, so AUC computation
was silently running against the envelope wrapper instead of the country
event maps (same class of bug as PR #3139's relay fix, offline blast radius).

validate-resilience-backtest.mjs uses the same read pattern across multiple
family keys and is fixed for the same reason.

Both scripts now import unwrapEnvelope from _seed-envelope-source.mjs (the
canonical ESM source of truth used by seed-chokepoint-flows.mjs,
seed-energy-spine.mjs, seed-forecasts.mjs, and others). Legacy raw shapes
still pass through unchanged.

* fix(relay): envelopeRead for OREF_REDIS_KEY bootstrap (Greptile P1)

orefPersistHistory() writes OREF_REDIS_KEY via envelopeWrite at line 1133,
but the bootstrap reader at line 1214 was still using raw upstashGet.
cached.history was therefore undefined and Array.isArray() always false,
so OREF alert history was never restored from Redis after a relay restart
— every cold start hit the upstream API unnecessarily.

Also adds the two regression guards Greptile flagged as missing:
- loadWsbTickerSet() reading market:stocks-bootstrap:v1 via envelopeRead
- orefBootstrapFromRedis reading OREF_REDIS_KEY via envelopeRead

Same class of bug as the three callsites fixed earlier in this PR.
2026-04-17 08:12:07 +04:00
Elie Habib
bd559fec88 fix(liquidity-shifts): wire missing primeTask so fetchData() is called (#3138)
LiquidityShiftsPanel was created in panel-layout.ts but never had a
primeTask('liquidity-shifts', ...) call in App.ts, so fetchData() was
never invoked. The panel rendered its initial showLoading() state and
stayed there permanently, showing "Loading..." with the radar spinner.

Every other self-fetching panel (cot-positioning, gold-intelligence,
fear-greed, etf-flows, etc.) has its primeTask call. This one was
missed when the panel was added.
2026-04-17 07:36:09 +04:00
Sebastien Melki
a4d9b0a5fa feat(auth): user-facing API key management (create / list / revoke) (#3125)
* feat(auth): user-facing API key management (create / list / revoke)

Adds full-stack API key management so authenticated users can create,
list, and revoke their own API keys from the Settings UI.

Backend:
- Convex `userApiKeys` table with SHA-256 key hash storage
- Mutations: createApiKey, listApiKeys, revokeApiKey
- Internal query validateKeyByHash + touchKeyLastUsed for gateway
- HTTP endpoints: /api/api-keys (CRUD) + /api/internal-validate-api-key
- Gateway middleware validates user-owned keys via Convex + Redis cache

Frontend:
- New "API Keys" tab in UnifiedSettings (visible when signed in)
- Create form with copy-on-creation banner (key shown once)
- List with prefix display, timestamps, and revoke action
- Client-side key generation + hashing (plaintext never sent to DB)

Closes #3116

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(api-keys): address PR review — cache invalidation, prefix validation, revoked-key guard

- Invalidate Redis cache on key revocation so gateway rejects revoked keys
  immediately instead of waiting for 5-min TTL expiry (P1)
- Enforce `wm_` prefix format with regex instead of loose length check (P2)
- Skip `touchKeyLastUsed` for revoked keys to preserve clean audit trail (P2)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(api-keys): address consolidated PR review (P0–P3)

P0: gate createApiKey on pro entitlement (tier >= 1); isCallerPremium
now verifies key-owner tier instead of treating existence as premium.

P1: wire wm_ user keys into the domain gateway auth path with async
Convex-backed validation; user keys go through entitlement checks
(only admin keys bypass). Lower cache TTL 300s → 60s and await
revocation cache-bust instead of fire-and-forget.

P2: remove dead HTTP create/list/revoke path from convex/http.ts;
switch to cachedFetchJson (stampede protection, env-prefixed keys,
standard NEG_SENTINEL); add tenancy check on cache-invalidation
endpoint via new /api/internal-get-key-owner route; add 22 Convex
tests covering tier gate, per-user limit, duplicate hash, ownership
revoke guard, getKeyOwner, and touchKeyLastUsed debounce.

P3: tighten keyPrefix regex to exactly 5 hex chars; debounce
touchKeyLastUsed (5 min); surface PRO_REQUIRED in UI.
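
The touchKeyLastUsed debounce in P3 is a last-write timestamp check; a
sketch (the 5-min window is from the message, the map-based state is
illustrative):

```javascript
const TOUCH_DEBOUNCE_MS = 5 * 60 * 1000;
const lastTouched = new Map(); // keyId -> last write timestamp

// Returns true when the lastUsed write should actually hit the DB.
function shouldTouchLastUsed(keyId, now = Date.now()) {
  const prev = lastTouched.get(keyId);
  if (prev !== undefined && now - prev < TOUCH_DEBOUNCE_MS) return false; // debounced
  lastTouched.set(keyId, now);
  return true;
}

console.log(shouldTouchLastUsed('k1', 0));       // true  — first use writes
console.log(shouldTouchLastUsed('k1', 60_000));  // false — 1 min later, debounced
console.log(shouldTouchLastUsed('k1', 301_000)); // true  — past the 5-min window
```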

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(api-keys): gate on apiAccess (not tier), wire wm_ keys through edge routes, harden error paths

- Gate API key creation/validation on features.apiAccess instead of tier >= 1.
  Pro (tier 1, apiAccess=false) can no longer mint keys — only API_STARTER+.
- Wire wm_ user keys through standalone edge routes (shipping/route-intelligence,
  shipping/webhooks) that were short-circuiting on validateApiKey before async
  Convex validation could run.
- Restore fail-soft behavior in validateUserApiKey: transient Convex/network
  errors degrade to unauthorized instead of bubbling a 500.
- Fail-closed on cache invalidation endpoint: ownership check errors now return
  503 instead of silently proceeding (surfaces Convex outages in logs).
- Tests updated: positive paths use api_starter (apiAccess=true), new test locks
  Pro-without-API-access rejection. 23 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(webhooks): remove wm_ user key fallback from shipping webhooks

Webhook ownership is keyed to SHA-256(apiKey) via callerFingerprint(),
not to the user. With user-owned keys (up to 5 per user), this causes
cross-key blindness (webhooks invisible when calling with a different
key) and revoke-orphaning (revoking the creating key makes the webhook
permanently unmanageable). User keys remain supported on the read-only
route-intelligence endpoint. Webhook ownership migration to userId will
follow in a separate PR.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Elie Habib <elie.habib@gmail.com>
2026-04-17 07:20:39 +04:00
Elie Habib
935417e390 chore(relay): socialVelocity + wsbTickers to hourly fetch (6x Reddit traffic reduction) (#3135)
* chore(relay): socialVelocity + wsbTickers to hourly fetch (was 10min)

Reduce Reddit rate-limiting blast radius. Both seeders fetch 5 subreddits
combined (2 for SV: worldnews, geopolitics; 3 for WSB: wallstreetbets,
stocks, investing) with no proxy or OAuth. Reddit's behavioral heuristic
for datacenter IPs consistently flags the Railway IP after ~50min of
10-min polling and returns HTTP 403 on every subsequent cycle until the
container restarts with a new IP.

Evidence (2026-04-16 ais-relay log):
  13:32-14:22 UTC: 6 successful 10-min cycles for both seeders
  16:06-16:16 UTC: 2 more successful cycles after a restart
  16:26 UTC:       BOTH subs flip to HTTP 403 simultaneously
  16:36, 16:46, 16:56: every cycle, all 5 subreddits return 403

Dropping success-path frequency from 6/hour to 1/hour cuts the traffic
Reddit's heuristic sees by 6x. On the failure path the 20-min retry is kept
as-is — during a block we've already been flagged, so extra retries don't
make it worse.

Changes:
- SOCIAL_VELOCITY_INTERVAL_MS:   10min → 60min
- SOCIAL_VELOCITY_TTL:           30min → 3h   (3× new interval)
- WSB_TICKERS_INTERVAL_MS:       10min → 60min
- WSB_TICKERS_TTL:               30min → 3h   (3× new interval)
- api/health.js maxStaleMin:     30min → 180min for both (3× interval)
- api/seed-health.js intervalMin: 15 → 90 for wsb-tickers (maxStaleMin / 2)

Proper fix (proxy fallback or Reddit OAuth) deferred.

* fix(seed-health): add socialVelocity parity entry — greptile P2

Review finding on PR #3135: wsbTickers was bumped from intervalMin=15 to 90
but socialVelocity had no seed-health.js entry at all. Both Reddit seeders
now share the same 60-min cadence; adding the missing entry gives parity.

P2-1 (malformed comment lines 5682-5683) is a false positive — verified
the lines do start with '//' in the file.
2026-04-16 22:17:58 +04:00
Elie Habib
0075af5a47 fix(sector-valuations): proxy Yahoo quoteSummary via Decodo curl egress (#3134)
* fix(sector-valuations): proxy Yahoo quoteSummary via Decodo curl egress

Yahoo's /v10/finance/quoteSummary returns HTTP 401 from Railway container
IPs. Railway logs 2026-04-16 show all 12 sector ETFs failing every 5-min
cron:

  [Sector] Yahoo quoteSummary XLK HTTP 401  (x12 per tick)
  [Market] Seeded 12/12 sectors, 0 valuations

Add a curl-based proxy fallback that matches scripts/_yahoo-fetch.mjs:
hit us.decodo.com (curl egress pool) NOT gate.decodo.com (CONNECT egress
pool). Per the 2026-04-16 probe documented in _yahoo-fetch.mjs header,
Yahoo blocks Decodo's CONNECT egress IPs but accepts the curl egress.
Reusing ytFetchViaProxy here would keep failing silently.

Shares the existing _yahooProxyFailCount / _yahooProxyCooldownUntil
cooldown state with fetchYahooChartDirect so both Yahoo paths pause
together if Decodo's curl pool also gets blocked.

No change to direct-path behavior when Yahoo is healthy.

* fix(sector-valuations): don't proxy on empty quoteSummary result (review)

Direct 200 with data.quoteSummary.result[0] absent is an app-level "no
data for this symbol" signal (e.g. delisted ETF). Proxy won't return
different data for a symbol Yahoo itself doesn't carry — falling back
would burn the 5-failure cooldown budget on structurally empty symbols
and mask a genuine proxy outage.

Resolve null on !result; keep JSON.parse catch going to proxy (garbage
body IS a transport-level signal — captive portal, Cloudflare challenge).

Review feedback from PR #3134.

* fix(sector-valuations): split cooldown per egress route, cover transport failures (review)

Review feedback on PR #3134, both P1.

P1 #1 — transport failures bypassed cooldown
  execFileSync timeouts, proxy-connect refusals, and JSON.parse on garbage
  bodies all went through the catch block and returned null without
  ticking _yahooProxyFailCount. In the exact failure mode this PR hardens
  against, the relay would have thrashed through 12 × 20s curl attempts
  per tick with no backoff. Extract a bumpCooldown() helper and call it
  from both the non-2xx branch and the catch block.

P1 #2 — two Decodo egress pools shared one cooldown budget
  fetchYahooChartDirect uses CONNECT via gate.decodo.com.
  _yahooQuoteSummaryProxyFallback uses curl via us.decodo.com.
  These are independent egress IP pools — per the 2026-04-16 probe,
  Yahoo blocks CONNECT but accepts curl. Sharing cooldown means 5
  CONNECT failures suppress the healthy curl path (and vice versa).
  Split into _yahooConnectProxy* (chart) and _yahooCurlProxy* (sector
  valuations).

Also: on proxy 200 with empty result, reset the curl counter. The route
is healthy even if this specific symbol has no data — don't pretend it's
a failure.
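
The per-route split plus the bumpCooldown() helper, sketched (the 5-failure
budget is from the message; the cooldown duration is an assumption):

```javascript
const FAIL_LIMIT = 5;               // failures before pausing a route
const COOLDOWN_MS = 30 * 60 * 1000; // pause duration (illustrative)

// One independent state object per Decodo egress route.
function makeRoute() {
  return { failCount: 0, cooldownUntil: 0 };
}
const connectRoute = makeRoute(); // gate.decodo.com (CONNECT, chart fetch)
const curlRoute = makeRoute();    // us.decodo.com   (curl, quoteSummary)

// Called from BOTH the non-2xx branch and the catch block, so transport
// failures (timeouts, connect refusals, garbage bodies) also tick the counter.
function bumpCooldown(route, now = Date.now()) {
  route.failCount += 1;
  if (route.failCount >= FAIL_LIMIT) route.cooldownUntil = now + COOLDOWN_MS;
}

function routeAvailable(route, now = Date.now()) {
  return now >= route.cooldownUntil;
}

// Five CONNECT failures pause only the CONNECT route; curl stays healthy.
for (let i = 0; i < 5; i++) bumpCooldown(connectRoute, 1000);
console.log(routeAvailable(connectRoute, 2000)); // false
console.log(routeAvailable(curlRoute, 2000));    // true
```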

* fix(sector-valuations): non-blocking curl + settle guard (review round 3)

Review feedback on PR #3134, both P1.

P1 #1 - double proxy invocation on timeout/error race
  req.destroy() inside the timeout handler can still emit 'error', and
  both handlers eagerly called resolve(_yahooQuoteSummaryProxyFallback(...)).
  A single upstream timeout therefore launched two curl subprocesses,
  double-ticked the cooldown counter, and blocked twice. Add a settled
  flag; settle() exits early on the second handler before evaluating the
  fallback.
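
The settle guard takes a thunk so the second handler bails out before the
fallback is even evaluated; a minimal sketch with a fake request that fires
both timeout and error (the race in P1 #1):

```javascript
// settle() receives a thunk, not a value: the early return on the second
// handler happens BEFORE the fallback is evaluated, so only one curl
// subprocess is ever launched for a single upstream failure.
function fetchWithGuard(startRequest, fallback) {
  return new Promise((resolve) => {
    let settled = false;
    const settle = (produce) => {
      if (settled) return; // second handler exits here, fallback untouched
      settled = true;
      resolve(produce());  // resolve(promise) chains through the outer Promise
    };
    startRequest({
      onTimeout: () => settle(() => fallback()),
      onError: () => settle(() => fallback()),
    });
  });
}

// Fake request where destroy-on-timeout also emits 'error'.
let fallbackCalls = 0;
fetchWithGuard(
  (h) => { h.onTimeout(); h.onError(); },
  () => { fallbackCalls += 1; return Promise.resolve('proxied'); }
).then((v) => console.log(v, fallbackCalls)); // 'proxied' 1
```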

P1 #2 - execFileSync blocks the relay event loop
  The relay serves HTTP/WS traffic on the same thread that awaits
  seedSectorSummary's per-symbol Yahoo fetch. execFileSync for up to 20s
  per failure x 5 failures before cooldown = ~100s of frozen event loop.
  Switch to promisify(execFile). resolve(promise) chains the Promise
  through fetchYahooQuoteSummary's outer Promise, so the main-loop await
  yields while curl runs. Other traffic continues during the fetch.

tests/sector-valuations.test.mjs: bump the static-analysis window from
1500 to 2000 chars so the field-extraction markers (ytdReturn etc.)
stay inside the window after the settle guard was added.
2026-04-16 20:02:31 +04:00
Elie Habib
7d27cec21c feat(relay): seeder-loop heartbeats for chokepoint-flows + climate-news (#3133)
* feat(relay): seeder-loop heartbeats for chokepoint-flows + climate-news

Detect silent relay-loop failures (ERR_MODULE_NOT_FOUND at import, event-loop
blocked, container restart loop) up to 4 hours earlier than the data-level
seed-meta staleness window.

The chokepoint-flows bug that motivated this PR was invisible in health for
32 hours because each 6h cron tick fired, execFile'd the child, child died
at import, and NO ONE updated seed-meta:energy:chokepoint-flows. Since the
last successful write was still within its 3-day TTL, the data key was
present and the old seed-meta was still there — STALE_SEED triggered only
at +12h, and even then was a warn (not crit) that could easily be missed.

Fix:
- In scripts/ais-relay.cjs, write a success-only heartbeat via upstashSet
  after each execFile-spawned seeder exits cleanly. TTL = 3x the loop
  interval (18h for chokepoint-flows, 90min for climate-news) so a single
  missed cycle doesn't flap but two consecutive misses alarm.
  Payload shape matches seed-meta for drop-in compatibility with the
  existing health-check reader: { fetchedAt, recordCount, durMs }.

- In api/health.js, register two new STANDALONE_KEYS entries pointing at
  the heartbeat keys, plus SEED_META entries with tighter maxStaleMin:
    chokepointFlowsRelayHeartbeat: 480min (8h vs 720min existing)
    climateNewsRelayHeartbeat:     60min  (vs 90min existing)
  When the relay loop fails for >2 intervals, the heartbeat goes stale
  first and surfaces as STALE_SEED in /api/health, giving 4h more notice
  than waiting for seed-meta:energy:chokepoint-flows.

This is orthogonal to PR #3132 (fixes the actual ERR_MODULE_NOT_FOUND root
cause). Heartbeat is defensive observability for the NEXT failure mode we
can't predict.
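
The success-only heartbeat write amounts to (the upstashSet signature is
assumed; payload fields and the 3x-interval TTL rule are from the message):

```javascript
// TTL = 3x the loop interval: one missed cycle doesn't flap the alarm,
// two consecutive misses let the key expire and surface STALE_SEED.
function heartbeatTtlSec(intervalMs) {
  return Math.round((intervalMs * 3) / 1000);
}

// Hypothetical write helper; the real relay calls its private upstashSet
// after each execFile'd seeder exits cleanly.
async function writeHeartbeat(upstashSet, key, intervalMs, recordCount, durMs) {
  const payload = { fetchedAt: new Date().toISOString(), recordCount, durMs };
  await upstashSet(key, JSON.stringify(payload), heartbeatTtlSec(intervalMs));
}

console.log(heartbeatTtlSec(6 * 60 * 60 * 1000)); // 64800s = 18h for chokepoint-flows
console.log(heartbeatTtlSec(30 * 60 * 1000));     // 5400s  = 90min for climate-news
```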

* fix(health): gate new relay heartbeat keys as ON_DEMAND during deploy window — greptile review

Review finding on PR #3133: new heartbeat keys (relay:heartbeat:chokepoint-flows,
relay:heartbeat:climate-news) are written by ais-relay.cjs AFTER the first
successful post-deploy loop. Vercel deploys api/health.js instantly, so the
window between 'merge' and 'first heartbeat written' is:
  - chokepoint-flows: up to 6h (initial loop tick)
  - climate-news:     up to 30min

During that window the heartbeat keys don't exist in Redis. classifyKey()
would return EMPTY (crit), which counts toward critCount and can flip overall
/api/health to DEGRADED even though climateNews and chokepointFlows data
themselves are fine.

Matches existing rule in project memory
(feedback_health_required_key_needs_railway_cron_first.md) — new seeder +
health.js registration in same PR needs ON_DEMAND gating until the Railway
side catches up, then harden after ~7 days.

Fix: add both keys to ON_DEMAND_KEYS with TRANSITIONAL comments, matching
the fxYoy / hyperliquidFlow pattern already used for the same issue.
2026-04-16 18:21:51 +04:00
Elie Habib
7381a90a44 fix(sentry): guard ConvexClient on Firefox 149/Linux + filter Quark noise (#3130)
* fix(sentry): guard ConvexClient construct on Firefox 149/Linux + filter Quark browser noise

ConvexClient import throws `TypeError: t is not a constructor` on Firefox 149/Linux
(WORLDMONITOR-N0 + MX, 5 events / 2 users). Breadcrumbs proved both entitlements
and billing subscription watchers were failing on `new CC(convexUrl)`. Wraps the
constructor in try/catch so getConvexClient returns null — callers already have
the null path wired for the no-VITE_CONVEX_URL case, so subscription features
silently no-op instead of error-bubbling into Sentry via billing.ts:71.

Also filters Quark browser (Alibaba mobile) touch-tracking injection that sets
`bodyTouched` on undefined — property name has 0 matches in repo (WORLDMONITOR-N1).

* fix(convex-client): reset authReadyPromise on constructor failure

Addresses greptile-apps review on PR #3130: on the catch path, authReadyPromise
had just been set to a never-resolving Promise at function entry. Without this
reset, any future waitForConvexAuth() caller that doesn't pre-check the client
is non-null would silently block for the full 10s timeout.
2026-04-16 17:28:31 +04:00
Elie Habib
c31662c3c9 fix(relay): COPY missing _seed-envelope-source + _seed-contract — chokepointFlows stale 32h (#3132)
* fix(relay): COPY _seed-envelope-source + _seed-contract into Dockerfile.relay

Root cause of chokepointFlows STALE_SEED (1911min stale, maxStaleMin=720):
since 2026-04-14 (PR #3097/#3101 landing), scripts/_seed-utils.mjs imports
_seed-envelope-source.mjs and _seed-contract.mjs. Dockerfile.relay COPY'd
_seed-utils.mjs but NOT its new transitive dependencies, so every execFile
invocation of seed-chokepoint-flows.mjs, seed-climate-news.mjs, and
seed-ember-electricity.mjs crashed at import with ERR_MODULE_NOT_FOUND.
The ais-relay loop kept firing every 6h but each child died instantly —
no visible error because execFile only surfaces child stderr to the
parent relay's log stream.

Local repro: node scripts/seed-chokepoint-flows.mjs runs fine in 3.6s
and writes 7 records. Same command inside the relay container would
throw at the import line because the file doesn't exist.

Fix:
1. Add COPY scripts/_seed-envelope-source.mjs and
   COPY scripts/_seed-contract.mjs to Dockerfile.relay.
2. Add a static guard test (tests/dockerfile-relay-imports.test.mjs)
   that walks the transitive-import graph (BFS) from every COPY'd
   entrypoint and fails if any reachable scripts/*.mjs|cjs file isn't
   also COPY'd. This
   would have caught the original regression.

Matches feedback_dockerfile_relay_explicit_copy.md — we now have a test
enforcing it.

* fix(test): scanner also covers require() and createRequire(...)(...) — greptile P2

Review finding on PR #3132: collectRelativeImports only matched ESM
import/export syntax, so require('./x.cjs') in ais-relay.cjs and
createRequire(import.meta.url)('./x.cjs') in _seed-utils.mjs were
invisible to the guard. No active bug (_proxy-utils.cjs is already
COPY'd) but a future createRequire pointing at a new uncopied helper
would slip through.

Two regexes now cover both forms:
- cjsRe: direct require('./x') — with a non-identifier lookbehind so
  'thisrequire(' or 'foorequire(' can't match.
- createRequireRe: createRequire(...)('./x') chained-call — the outer
  call is applied to createRequire's return value, not to a 'require('
  token, so the first regex misses it on its own.

Added a unit test asserting both forms resolve on known sites
(_seed-utils.mjs and ais-relay.cjs) so the next edit to this file
can't silently drop coverage.
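A minimal sketch of the three import forms the guard test must recognize. The function name mirrors the commit (collectRelativeImports); the exact production regexes may differ in detail:

```javascript
// ESM: import/export ... from './x'
const esmRe = /\b(?:import|export)\s[^'"]*['"](\.\.?\/[^'"]+)['"]/g;
// CJS: require('./x') — non-identifier lookbehind so 'thisrequire(' or
// 'foorequire(' can't match.
const cjsRe = /(?<![\w$])require\(\s*['"](\.\.?\/[^'"]+)['"]\s*\)/g;
// createRequire(...)('./x') — the outer call applies to createRequire's
// RETURN value, not a bare 'require(' token, so cjsRe alone misses it.
const createRequireRe = /createRequire\([^)]*\)\(\s*['"](\.\.?\/[^'"]+)['"]\s*\)/g;

function collectRelativeImports(source) {
  const found = new Set();
  for (const re of [esmRe, cjsRe, createRequireRe]) {
    for (const match of source.matchAll(re)) found.add(match[1]);
  }
  return [...found];
}
```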
2026-04-16 17:28:16 +04:00
Elie Habib
d1a3fdffed fix(portwatch): unblock port-activity seeder (global EP4 refs, conc 12, progress logs, SIGTERM) (#3128)
* fix(portwatch): global EP4 refs + concurrency 12 + progress logs + SIGTERM cleanup

seed-portwatch-port-activity has been SIGKILL'd at the Railway 10-min
container ceiling on every run since 2026-04-14 (recordCount=174,
seedAgeMin=3096 = 51.6h = 4+ failed cycles), leaving portwatchPortActivity
STALE_SEED and the 30-min lock leaking between runs.

Root cause: ~240 ISO3s x 2 per-country ArcGIS queries at CONCURRENCY=4
with zero per-batch logging — slow enough to miss the 420s timeoutMs
and silent enough that the timeout line was the only log on failure.

Fixes (all 4):
1. fetchAllPortRefs(): one paginated EP4 query (where=1=1), grouped by
   ISO3 locally — collapses ~240 ref calls into ~5 pages.
2. CONCURRENCY 4 -> 12 and only queue activity fetches for iso3s that
   appear in refsByIso3 and in the iso3->iso2 map.
3. Per-page ref logs + per-batch activity logs every 5 batches — next
   failure will show exactly where it stalls.
4. SIGTERM/SIGINT handler releases the lock and extends prev-snapshot
   TTLs before exit so the next cron tick isn't blocked and Redis data
   doesn't evaporate when the bundle-runner kills the child.

* fix(portwatch): advance pagination by actual features.length, not PAGE_SIZE

Review finding on PR #3128: the new fetchAllPortRefs() global pager
assumes the server honors resultRecordCount=2000, but ArcGIS's
PortWatch_ports_database FeatureServer caps responses at 1000 rows.
Incrementing offset by PAGE_SIZE (=2000) silently skipped rows
1000-1999 on EVERY page.

Verified against the live endpoint:
- returnCountOnly: 2065
- offset=0 size=2000: returned 1000 (ETL=true)
- offset=1000 size=2000: returned 1000 (ETL=true)
- offset=2000 size=2000: returned 65 (ETL=false)

The buggy loop therefore loaded 1065 refs instead of 2065 — silently
dropping 34 mapped countries with EP3 activity entirely and leaving
110 more countries with partial ref coverage. Partial coverage fell
back to (0,0) lat/lon via `refMap.get(portId) || { lat: 0, lon: 0 }`.
Not a regression from the old code (per-country EP4 fetches maxed
at ~148 ports and never hit the cap), but a real bug introduced by
the global pager.

Fix: advance `offset` by `features.length` (the actual returned count)
instead of by `PAGE_SIZE`. Applied to both fetchAllPortRefs (EP4) and
fetchActivityRows (EP3) for consistency. Added break-on-empty guard so
a server that returns exceededTransferLimit=true with 0 features can't
infinite-loop.

Regression test asserts `offset += features.length` appears for both
paginators and `offset += PAGE_SIZE` appears nowhere in the file.
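The corrected pager shape can be sketched as follows. fetchPage stands in for a single ArcGIS query (resultOffset/resultRecordCount); the field names match ArcGIS's response shape, the rest is illustrative:

```javascript
async function fetchAllPages(fetchPage, pageSize = 2000) {
  const all = [];
  let offset = 0;
  for (;;) {
    const { features, exceededTransferLimit } = await fetchPage(offset, pageSize);
    // Break-on-empty guard: a server returning exceededTransferLimit=true
    // with 0 features must not infinite-loop.
    if (!features.length) break;
    all.push(...features);
    if (!exceededTransferLimit) break;
    // Advance by the ACTUAL returned count, not by pageSize — the server
    // may cap responses (here: 1000 rows) below the requested size.
    offset += features.length;
  }
  return all;
}
```

With a server capped at 1000 rows and 2065 total, this walks offsets 0/1000/2000 and returns all 2065 rows, where `offset += pageSize` would have skipped rows 1000-1999 on every page.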
2026-04-16 15:51:21 +04:00
Elie Habib
5093d82e45 fix(seed-bundle-resilience): drop Resilience-Scores interval 6h to 2h so refresh runs (#3126)
Live log 2026-04-16 09:25 showed the bundle runner SKIPPING Resilience-Scores
(last seeded 203min ago, interval 360min -> 288min skip threshold). Every
Railway cron fire within the 4.8h skip window bypassed the section entirely,
so refreshRankingAggregate() -- the whole point of the Slice B work merged in
#3124 -- never ran. Ranking could then silently expire in the gap.

Lower intervalMs to 2h. The bundle runner skip threshold becomes 96min;
hourly Railway fires run the section about every 2h. Well within the 12h
ranking TTL, and cheap per warm-path run:

  - computeAndWriteIntervals (~100ms local CPU + one pipeline write)
  - refreshRankingAggregate -> /api/resilience/v1/get-resilience-ranking?refresh=1
    (handler recompute + 2-SET pipeline, ~2-5s)
  - STRLEN + GET-meta verify in parallel (~200ms)

Total ~5-10s per warm-scores run. The expensive 222-country warm still only
runs when scores are actually missing.

Structural test pins intervalMs <= 2 hours so this doesn't silently regress.

Full resilience suite: 378/378.
2026-04-16 13:41:28 +04:00
Elie Habib
da1fa3367b fix(resilience-ranking): chunked warm SET, always-on rebuild, truthful meta (Slice B) (#3124)
* fix(resilience-ranking): chunked warm SET, always-on rebuild, truthful meta

Slice B follow-up to PR #3121. Three coupled production failures observed:

1. Per-country score persistence works (Slice A), but the 222-SET single
   pipeline body (~600KB) exceeds REDIS_PIPELINE_TIMEOUT_MS (5s) on Vercel
   Edge. runRedisPipeline returns []; persistence guard correctly returns
   empty; coverage = 0/222 < 75%; ranking publish silently dropped. Live
   Railway log: "Ranking: 0 ranked, 222 greyed out" → "Rebuilt … with 222
   countries (bulk-call race left ranking:v9 null)" — second call only
   succeeded because Upstash had finally caught up between attempts.

2. The seeder's probe + rebuild block lives inside `if (missing > 0)`. When
   per-country scores survive a cron tick (TTL 6h, cron every 6h), missing=0
   and the rebuild path is skipped. Ranking aggregate then expires alone and
   is never refreshed until scores also expire — multi-hour gaps where
   `resilience:ranking:v9` is gone while seed-meta still claims freshness.

3. `writeRankingSeedMeta` fires whenever finalWarmed > 0, regardless of
   whether the ranking key is actually present. Health endpoint sees fresh
   meta + missing data → EMPTY_ON_DEMAND with a misleading seedAge.

Fixes:
- _shared.ts: split the warm pipeline SET into SET_BATCH=30-command chunks
  so each pipeline body fits well under timeout. Pad missing-batch results
  with empty entries so the per-command alignment stays correct (failed
  batches stay excluded from `warmed`, no proof = no claim).
- seed-resilience-scores.mjs: extract `ensureRankingPresent` helper, call
  it from BOTH the missing>0 and missing===0 branches so the ranking gets
  refreshed every cron. Add a post-rebuild STRLEN verification — rebuild
  HTTP can return 200 with a payload but still skip the SET (coverage gate,
  pipeline failure).
- main(): only writeRankingSeedMeta when result.rankingPresent === true.
  Otherwise log and let the next cron retry.

Tests:
- resilience-ranking.test.mts: assert pipelines stay ≤30 commands.
- resilience-scores-seed.test.mjs: structural checks that the rebuild is
  hoisted (≥2 callsites of ensureRankingPresent), STRLEN verification is
  present, and meta write is gated on rankingPresent.

Full resilience suite: 373/373 pass (was 370 — 3 new tests).

* fix(resilience-ranking): seeder no longer writes seed-meta (handler is sole writer)

Reviewer P1: ensureRankingPresent() returning true only means the ranking
key exists in Redis — not that THIS cron actually wrote it. The handler
skips both the ranking SET and the meta SET when coverage < 75%, so an
older ranking from a prior cron can linger while this cron's data didn't
land. Under that scenario, the previous commit still wrote a fresh
seed-meta:resilience:ranking, recreating the stale-meta-over-stale-data
failure this PR is meant to eliminate.

Fix: remove seeder-side seed-meta writes entirely. The ranking handler
already writes ranking + meta atomically in the same pipeline when (and
only when) coverage passes the gate. ensureRankingPresent() triggers the
handler every cron, which addresses the original rationale for the seeder
heartbeat (meta going stale during quiet Pro usage) without the seeder
needing to lie.

Consequence on failure:
- Coverage gate trips → handler writes neither ranking nor meta.
- seed-meta stays at its previous timestamp; api/health reports accurate
  staleness (STALE_SEED after maxStaleMin, then CRIT) instead of a fresh
  meta over stale/empty data.

Tests updated: the "meta gated on rankingPresent" assertion is replaced
with "seeder must not SET seed-meta:resilience:ranking" + "no
writeRankingSeedMeta". Comments may still reference the key name for
maintainer clarity — the assertion targets actual SET commands.

Full resilience suite: 373/373 pass.

* fix(resilience-ranking): always refresh + 12h TTL (close timing hole)

Reviewer P1+P2:

- P1: ranking TTL == cron interval (both 6h) left a timing hole. If a cron
  wrote the key near the end of its run and the next cron fired near the
  start of its interval, the key was still alive at probe time →
  ensureRankingPresent() returned early → no rebuild → key expired a short
  while later and stayed absent until a cron eventually ran while the key
  was missing. Multi-hour EMPTY_ON_DEMAND gaps.

- P2: probing only the ranking data key (not seed-meta) meant a partial
  handler pipeline (ranking SET ok, meta SET missed) would self-heal only
  when the ranking itself expired — never during its TTL window.

Fix:

1. Bump RESILIENCE_RANKING_CACHE_TTL_SECONDS from 6h to 12h (2x cron
   interval). A single missed or slow cron no longer causes a gap.
   Server-side and seeder-side constants kept in sync.

2. Replace ensureRankingPresent() with refreshRankingAggregate(): drop the
   'if key present, skip' short-circuit. Rebuild every cron, unconditionally.
   One cheap HTTP call keeps ranking + seed-meta rolling forward together
   and self-heals the partial-pipeline case — handler retries the atomic
   pair every 6h regardless of whether the keys are currently live.

3. Update health.js comment to reflect the new TTL and refresh cadence
   (12h data TTL, 6h refresh, 12h staleness threshold = 2 missed ticks).

Tests:
- RESILIENCE_RANKING_CACHE_TTL_SECONDS asserts 12h (was 6h).
- New assertion: refreshRankingAggregate must NOT early-return on probe-
  hit, and the rebuild HTTP call must be unconditional in its body.
- DEL-guard test relaxed to allow comments between '{' and the DEL line
  (structural property preserved).

Full resilience suite: 375/375.

* fix(resilience-ranking): parallelize warm batches + atomic rebuild via ?refresh=1

Reviewer P2s:

- Warm path serialized the 8 batch pipelines with `await` in a for-loop,
  adding ~7 extra Upstash round-trips (100-500ms each on Edge) to the warm
  wall-clock. Batches are independent; Promise.all collapses them into one
  slowest-batch window.

- DEL+rebuild created a brief absence window: if the rebuild request failed
  transiently, the ranking stayed absent until the next cron. Now seeder
  calls `/api/resilience/v1/get-resilience-ranking?refresh=1` and the
  handler bypasses its cache-hit early-return, recomputing and SETting
  atomically. On rebuild failure, the existing (possibly stale-but-present)
  ranking is preserved instead of being nuked.

Handler: read ctx.request.url for the refresh query param; guard the URL
parse with try/catch so an unparseable url falls back to the cached-first
behavior.

Tests:
- New: ?refresh=1 must bypass the cache-hit early-return (fails on old code,
  passes now).
- DEL-guard test replaced with 'does NOT DEL' + 'uses ?refresh=1'.
- Batch chunking still asserted at SET_BATCH=30.

Full resilience suite: 376/376.
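A minimal sketch combining the SET_BATCH=30 chunking from the first commit with the Promise.all parallelization above. runRedisPipeline is a stand-in for the Upstash REST pipeline call; the names and shapes are assumptions, not the real _shared.ts API:

```javascript
const SET_BATCH = 30;

async function runChunkedPipeline(commands, runRedisPipeline) {
  const chunks = [];
  for (let i = 0; i < commands.length; i += SET_BATCH) {
    chunks.push(commands.slice(i, i + SET_BATCH));
  }
  // Chunks are independent — issue them in parallel (one slowest-batch
  // window instead of N serial round-trips).
  const settled = await Promise.all(
    chunks.map((chunk) => runRedisPipeline(chunk).catch(() => []))
  );
  const results = [];
  settled.forEach((res, i) => {
    // Pad failed/short chunk results with null so results[j] stays aligned
    // with commands[j]; null is not OK, so those writes are never counted
    // as warmed (no proof = no claim).
    for (let j = 0; j < chunks[i].length; j++) results.push(res[j] ?? null);
  });
  return results;
}
```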

* fix(resilience-ranking): bulk-warm call also needs ?refresh=1 (asymmetric TTL hazard)

Reviewer P1: in the 6h-12h window, per-country score keys have expired
(TTL 6h) but the ranking aggregate is still alive (TTL 12h). The seeder's
bulk-warm call was hitting get-resilience-ranking without ?refresh=1, so
the handler's cache-hit early-return fired and the entire warm path was
skipped. Scores stayed missing; coverage degraded; the only recovery was
the per-country laggard loop (5-request batches) — which silently no-ops
when WM_KEY is absent. This defeated the whole point of the chunked bulk
warm introduced in this PR.

Fix: the bulk-warm fetch at scripts/seed-resilience-scores.mjs:167 now
appends ?refresh=1, matching the rebuild call. Every seeder-initiated hit
on the ranking endpoint forces the handler to route through
warmMissingResilienceScores and its chunked pipeline SET, regardless of
whether the aggregate is still cached.

Test extended: structural assertion now scans ALL occurrences of
get-resilience-ranking in the seeder and requires every one of them to
carry ?refresh=1. Fails the moment a future change adds a bare call.

Full resilience suite: 376/376.

* fix(resilience-ranking): gate ?refresh=1 on seed key + detect partial pipeline publish

Reviewer P1: ?refresh=1 was honored for any caller — including valid Pro
bearer tokens. A full warm is ~222 score computations + chunked pipeline
SETs; a Pro user looping on refresh=1 (or an automated client) could DoS
Upstash quota and Edge budget. Gate refresh behind
WORLDMONITOR_VALID_KEYS / WORLDMONITOR_API_KEY (X-WorldMonitor-Key
header) — the same allowlist the cron uses. Pro bearer tokens get the
standard cache-first path; refresh requires the seed service key.

Reviewer P2: the handler's atomic runRedisPipeline SET of ranking + meta
is non-transactional on Upstash REST — either SET can fail independently.
If the ranking landed but meta missed, the seeder's STRLEN verify would
pass (ranking present) while /api/health stays stuck on stale meta.

Two-part fix:
- Handler inspects pipelineResult[0] and [1] and logs a warning when
  either SET didn't return OK. Ops-greppable signal.
- Seeder's verify now checks BOTH keys in parallel: STRLEN on ranking
  data, and GET + fetchedAt freshness (<5min) on seed-meta. Partial
  publish logs a warning; next cron retries (SET is idempotent).

Tests:
- New: ?refresh=1 without/with-wrong X-WorldMonitor-Key must NOT trigger
  recompute (falls back to cached response). Existing bypass test updated
  to carry a valid seed key header.

Full resilience suite: 376/376 + 1 new = 377/377.
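The refresh gate described above can be sketched roughly as below. The header and env-var names are taken from the commit message; the handler plumbing around them is an assumption:

```javascript
function refreshAllowed(request, env) {
  let url = null;
  try { url = new URL(request.url); } catch { /* unparseable → cached-first */ }
  if (!url || url.searchParams.get('refresh') !== '1') return false;
  const key = request.headers.get('X-WorldMonitor-Key');
  const allowlist = (env.WORLDMONITOR_VALID_KEYS || env.WORLDMONITOR_API_KEY || '')
    .split(',').map((s) => s.trim()).filter(Boolean);
  // Pro bearer tokens are NOT on this list — only the seed service key
  // may force a full 222-country recompute.
  return Boolean(key) && allowlist.includes(key);
}
```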
2026-04-16 12:48:41 +04:00
Elie Habib
3c1caa75e6 feat(gdelt): _gdelt-fetch helper with curl-multi-retry proxy + seed-gdelt-intel migration (#3122)
* feat(_gdelt-fetch): curl proxy multi-retry helper for per-IP-throttled API

GDELT (api.gdeltproject.org) is a public free API with strict per-IP
throttling. seed-gdelt-intel currently has no proxy fallback — Railway
egress IPs hit 429 storms and the seeder degrades.

Probed 2026-04-16: Decodo curl egress against GDELT gives ~40% success
per attempt (session-rotates IPs per call). Helper retries up to 5×;
expected overall success ~92% (1 - 0.6^5).

PROXY STRATEGY — CURL ONLY WITH MULTI-RETRY

Differs from _yahoo-fetch.mjs (single proxy attempt) and
_open-meteo-archive.mjs (CONNECT + curl cascade):
- Curl-only: CONNECT not yet probed cleanly against GDELT.
- Multi-retry on the curl leg: the proxy IS the rotation mechanism
  (each call → different egress IP), so successive attempts probe
  different IPs in the throttle pool.
- Distinguishes retryable (HTTP 429/503 from upstream) from
  non-retryable (parse failure, auth, network) — bails immediately on
  non-retryable to avoid 5× of wasted log noise.

Direct loop uses LONGER backoff than Yahoo's 5s base (10s) — GDELT's
throttle window is wider than Yahoo's, so quick retries usually re-hit
the same throttle.

Tests (tests/gdelt-fetch.test.mjs, 13 cases — every learning from
PR #3118 + #3119 + #3120 baked in):

- Production defaults: curl resolver/fetcher reference equality
- Production defaults: NO CONNECT leg (regression guard for unverified path)
- 200 OK passthrough
- 429 with no proxy → throws with HTTP 429 in message
- Retry-After parsed (DI _sleep capture asserts 7000ms not retryBaseMs)
- Retry-After absent → linear backoff retryBaseMs (paired branch test)
- **Proxy multi-retry: 4× HTTP 429 then 5th succeeds → returns data**
  (asserts 5 proxy calls + 4 inter-proxy backoffs of proxyRetryBaseMs)
- **Proxy non-retryable (parse failure) bails after 1 attempt**
  (does NOT burn all proxyMaxAttempts on a structural failure)
- **Proxy retryable + non-retryable mix: retries on 429, bails on parse**
- Thrown fetch error on final retry → proxy multi-retry runs (P1 guard)
- All proxy attempts fail → throws with 'X/N attempts' in message + cause
- Malformed JSON does NOT emit succeeded log before throw (P2 guard)
- parseRetryAfterMs unit
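A minimal parser consistent with the Retry-After tests above ('7' asserts a 7000ms sleep); the exported parseRetryAfterMs may differ in detail:

```javascript
function parseRetryAfterMs(headerValue) {
  if (!headerValue) return null;
  const seconds = Number(headerValue);
  // delta-seconds form: '7' → 7000ms
  if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
  // Retry-After also permits an HTTP-date
  const httpDate = Date.parse(headerValue);
  if (!Number.isNaN(httpDate)) return Math.max(0, httpDate - Date.now());
  return null; // absent/unparseable → caller falls back to its own backoff
}
```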

Verification:
- tests/gdelt-fetch.test.mjs → 13/13 pass
- node --check scripts/_gdelt-fetch.mjs → clean

Phase 1 of 2. Seeder migration follows.

* feat(seed-gdelt-intel): migrate to _gdelt-fetch helper

Replaces direct fetch + ad-hoc retry in seed-gdelt-intel with the new
fetchGdeltJson helper. Each topic call now gets:
  3 direct retries (10/20/40s backoff) → 5 curl proxy attempts via
  Decodo session-rotating egress.

Specific changes:
- import fetchGdeltJson from _gdelt-fetch.mjs
- fetchTopicArticles: replace fetch+retry+throw block with single
  await fetchGdeltJson(url, { label: topic.id })
- fetchTopicTimeline: same — best-effort try/catch returns [] on any
  failure (preserved). Helper still attempts proxy fallback before
  throwing, so a 429-throttled IP doesn't kill the timeline.
- fetchWithRetry: collapsed from outer 3-retry loop with 60/120/240s
  backoff (which would have multiplied to 24 attempts/topic on top of
  helper's 8) to a thin wrapper that translates exhaustion into the
  {exhausted, articles:[]} shape the caller uses to drive
  POST_EXHAUST_DELAY_MS cooldown.
- Drop CHROME_UA import (no longer used directly; helper handles it).

Helper's exhausted-throw includes 'HTTP 429' substring when 429 was
the upstream signal, so the existing is429 detection in
fetchWithRetry continues to work without modification.

Verification:
- node --check scripts/seed-gdelt-intel.mjs → clean
- npm run typecheck:all → clean
- npm run test:data → 5382/5382 (was 5363, +13 from helper + 6 from
  prior PR work)

Phase 2 of 2.

* fix(_gdelt-fetch): proxy timeouts/network errors RETRY (rotates Decodo session)

P1 from PR #3122 review: probed Decodo curl egress against GDELT
(2026-04-16) gave 200/200/429/TIMEOUT/429 — TIMEOUT is part of the
normal transient mix that the multi-retry design exists to absorb.
Pre-fix, the logic only retried on substring 'HTTP 429'/'HTTP 503' matches,
so a curl exec timeout (Node Error with no .status, not a SyntaxError)
bailed on the first attempt. The PR's headline 'expected ~92% success
with 5 attempts' was therefore not actually achievable for one of the
exact failure modes that motivated the design.

Reframed the proxy retryability decision around what we CAN reliably
discriminate from the curl error shape:

  curlErr.status == number    → retry only if 429/503
                                (curlFetch attaches .status only when
                                 curl returned a clean HTTP status)
  curlErr instanceof SyntaxError → bail (parse failure is structural)
  otherwise                   → RETRY (timeout, ECONNRESET, DNS, curl
                                 exec failure, CONNECT tunnel failure
                                 — all transient; rotating Decodo
                                 session usually clears them)
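The decision table above reduces to a small classifier. That curlFetch attaches .status only on a clean HTTP status is taken from the commit; everything else here is an illustrative assumption:

```javascript
function isProxyRetryable(curlErr) {
  if (typeof curlErr.status === 'number') {
    // Clean HTTP status from curl: retry only throttle/overload.
    return curlErr.status === 429 || curlErr.status === 503;
  }
  if (curlErr instanceof SyntaxError) return false; // parse failure is structural
  // Timeout, ECONNRESET, DNS, curl exec failure, CONNECT tunnel failure:
  // all transient — a rotated Decodo session usually clears them.
  return true;
}
```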

P2 from same review: tests covered HTTP-status proxy retries + parse
failures but never the timeout/thrown-error class. Added 3 tests:

- proxy timeout (no .status) RETRIES → asserts proxyCalls=2 after a
  first-attempt ETIMEDOUT then second-attempt success
- proxy ECONNRESET (no .status) RETRIES → same pattern
- proxy HTTP 4xx with .status (e.g. 401 auth) does NOT retry → bails
  after 1 attempt

Existing tests still pass — they use 'HTTP 429' Error WITHOUT .status,
which now flows through the 'else: assume transient' branch and still
retries. Only differences: the regex parsing is gone and curlFetch's
.status property is the canonical signal.

Verification:
- tests/gdelt-fetch.test.mjs: 16/16 (was 13, +3)
- npm run test:data: 5385/5385 (+3)
- npm run typecheck:all: clean

Followup commit on PR #3122.

* fix(seed-gdelt-intel): timeline calls fast-fail (maxRetries:0, proxyMaxAttempts:0)

P1 from PR #3122 review: fetchTopicTimeline is best-effort (returns []
on any failure), but the migration routed it through fetchGdeltJson
with the helper's article-fetch defaults: 3 direct retries (10/20/40s
backoff = ~70s) + 5 proxy attempts (5s base = ~20s) = ~90s worst case
per call. Called 2× per topic × 6 topics = 12 calls = up to ~18 minutes
of blocking on data the seeder discards on failure. Pre-helper code
did a single direct fetch with no retry.

Real operational regression under exactly the GDELT 429 storm conditions
this PR is meant to absorb.

Fix:

1. seed-gdelt-intel.mjs:fetchTopicTimeline now passes
   maxRetries:0, proxyMaxAttempts:0 — single direct attempt, no proxy,
   throws on first failure → caught, returns []. Matches pre-helper
   timing exactly. Article fetches keep the full retry budget; only
   timelines fast-fail.

2. _gdelt-fetch.mjs gate: skip the proxy block entirely when
   proxyMaxAttempts <= 0. Pre-fix, the 'trying proxy (curl) up to 0×'
   log line would still emit even though the for loop runs zero times,
   misleadingly implying the proxy was attempted when it wasn't.

Tests (2 new):

- maxRetries:0 + proxyMaxAttempts:0 → asserts directCalls=1,
  proxyCalls=0 even though _curlProxyResolver returns a valid auth
  string (proxy block must be fully bypassed).
- proxyMaxAttempts:0 → captures console.log and asserts no 'trying
  proxy' line emitted (no misleading 'up to 0×' line).

Verification:
- tests/gdelt-fetch.test.mjs: 18/18 (was 16, +2)
- npm run test:data: 5387/5387 (+2)
- npm run typecheck:all: clean

Followup commit on PR #3122.

* fix(gdelt): direct parse-failure reaches proxy + timeline budget tweak + JSDoc accuracy

3 Greptile P2s on PR #3122:

P2a — _gdelt-fetch.mjs:112: `resp.json()` was called outside the
try/catch that guards fetch(). A 200 OK with HTML/garbage body (WAF
challenge, partial response, gzip mismatch) would throw SyntaxError
and escape the helper entirely — proxy fallback never ran. The proxy
leg already parsed inside its own catch; making the direct leg
symmetric. New regression test: direct 200 OK with malformed JSON
must reach the proxy and recover.

P2b — seed-gdelt-intel.mjs timeline budget bumped from 0/0 to 0/2.
Best-effort timelines still fast-fail on direct 429 (no direct
retries) but get 2 proxy attempts via Decodo session rotation before
returning []. Worst case: ~25s/call × 12 calls = ~5 min ceiling under
heavy throttling vs ~3 min with 0/0. Tradeoff: small additional time
budget for a real chance to recover timeline data via proxy IP rotation.
Articles still keep the full retry budget.

P2c — JSDoc said 'Linear proxy backoff base' but the implementation
uses a flat constant (proxyRetryBaseMs, line 156). Linear growth
would not help here because Decodo rotates the session IP per call —
the next attempt's success is independent of the previous wait. Doc
now reads 'Fixed (constant, NOT linear) backoff' with the rationale.

Verification:
- tests/gdelt-fetch.test.mjs: 19/19 pass (was 18, +1)
- npm run test:data: 5388/5388 (+1)
- npm run typecheck:all: clean

Followup commit on PR #3122.

* test(gdelt): clarify helper-API vs seeder-mirror tests + add 0/2 lock

Reviewer feedback on PR #3122 conflated two test classes:
- Helper-API tests (lock the helper's contract for arbitrary callers
  using budget knobs like 0/0 — independent of any specific seeder)
- Seeder-mirror tests (lock the budget the actual production caller
  in seed-gdelt-intel.mjs uses)

Pre-fix, the test file only had the 0/0 helper-API tests, with a
section header that read 'Best-effort caller budgets (fast-fail)' —
ambiguous about whether 0/0 was the helper API contract or the
seeder's choice. The reviewer assumed the seeder still used 0/0 because the
tests locked it, but seed-gdelt-intel.mjs:97-98 actually uses 0/2
(per the prior P2b fix).

Fixes:

1. Section header for the 0/0 tests now explicitly says these are
   helper-API tests and notes that seed-gdelt-intel uses 0/2 (not
   0/0). Eliminates the conflation.

2. New 'Seeder-mirror: 0/2' section with 2 tests that lock the
   seeder's actual choice end-to-end:

   - 0/2 with first proxy attempt 429 + second succeeds → returns
     data (asserts directCalls=1, proxyCalls=2)
   - 0/2 with both proxy attempts failing → throws exhausted with
     '2/2 attempts' in message (asserts the budget propagates to the
     error message correctly)

   These tests would catch any future regression where the seeder's
   0/2 choice gets reverted to 0/0 OR where the helper stops
   honoring the proxyMaxAttempts override.

Verification:
- tests/gdelt-fetch.test.mjs: 21/21 (was 19, +2)
- npm run test:data: 5390/5390 (+2)
- npm run typecheck:all: clean

Followup commit on PR #3122.
2026-04-16 10:41:15 +04:00
Elie Habib
bdfb415f8f fix(resilience-ranking): return warmed scores from memory, skip lossy re-read (#3121)
* fix(resilience-ranking): return warmed scores from memory, skip lossy re-read

Upstash REST writes via /set aren't always visible to an immediately-following
/pipeline GET in the same Vercel invocation (documented in PR #3057 /
feedback_upstash_write_reread_race_in_handler.md). The ranking handler was
warming 222 countries then re-reading them from Redis to compute a coverage
ratio; that re-read could return 0 despite every SET succeeding, collapsing
coverage to 0% < 75% and silently dropping the ranking publish. Consequence:
`resilience:ranking:v9` missing, per-country score keys absent, health
reports EMPTY_ON_DEMAND even while the seeder keeps writing a "fresh" meta.

Fix: warmMissingResilienceScores now returns Map<cc, GetResilienceScoreResponse>
with every successfully computed score. The handler merges those into
cachedScores directly and drops the post-warm re-read. Coverage now reflects
what was actually computed in-memory this invocation, not what Redis happens
to surface after write lag.

Adds a regression test that simulates pipeline-GET returning null for
freshly-SET score keys; it fails against the old code (coverage=0, no
ranking written) and passes with the fix (coverage=100%, ranking written).

Slice A of the resilience-ranking recovery plan; Slice B (seeder meta
truthfulness) follows.

* fix(resilience-ranking): verify score-key persistence via pipeline SET response

PR review P1: trusting every fulfilled ensureResilienceScoreCached() result as
"cached" turned the read-lag fix into a write-failure false positive.
cachedFetchJson's underlying setCachedJson only logs and swallows write errors,
so a transient /set failure on resilience:score:v9:* would leave per-country
scores absent while the ranking aggregate and its seed-meta got published on
top of them — worse than the bug this PR was meant to fix.

Fix: use the pipeline SET response as the authoritative persistence signal.

- Extract the score builder into a pure `buildResilienceScore()` with no
  caching side-effects (appendHistory stays — it's part of the score
  semantics).
- `ensureResilienceScoreCached()` still wraps it in cachedFetchJson so
  single-country RPC callers keep their log-and-return-anyway behavior.
- `warmMissingResilienceScores()` now computes in-memory, persists all
  scores in one pipeline SET, and only returns countries whose command
  reported `result: OK`. Pipeline SET's response is synchronous with the
  write, so OK means actually stored — no ambiguity with read-after-write
  lag.
- When runRedisPipeline returns fewer responses than commands (transport
  failure), return an empty map: no proof of persistence → coverage gate
  can't false-positive.

Adds regression test that blocks pipeline SETs to score keys and asserts
the ranking + meta are NOT published. Existing race-regression test still
passes.
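The "pipeline SET response as persistence proof" rule can be sketched as below. The entry shape and the EX TTL are illustrative assumptions, not the real _shared.ts API:

```javascript
async function persistVerified(entries, runRedisPipeline) {
  const commands = entries.map(([key, value]) =>
    ['SET', key, JSON.stringify(value), 'EX', 21600]);
  const responses = await runRedisPipeline(commands);
  // Transport failure (fewer responses than commands): no proof of
  // persistence → claim nothing, so the coverage gate can't false-positive.
  if (!Array.isArray(responses) || responses.length < commands.length) {
    return new Map();
  }
  const stored = new Map();
  entries.forEach(([key, value], i) => {
    // The pipeline response is synchronous with the write, so
    // result === 'OK' means the SET actually landed — no ambiguity
    // with read-after-write lag.
    if (responses[i] && responses[i].result === 'OK') stored.set(key, value);
  });
  return stored;
}
```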

* fix(resilience-ranking): preserve env key prefix on warm pipeline SET

PR review P1: the pipeline SET added to verify score-key persistence was
called with raw=true, bypassing the preview/dev key prefix (preview:<sha>:).
Two coupled regressions:

  1. Preview/dev deploys write unprefixed `resilience:score:v9:*` keys, but
     all reads (getCachedResilienceScores, ensureResilienceScoreCached via
     setCachedJson/cachedFetchJson) look in the prefixed namespace. Warmed
     scores become invisible to the same preview on the next read.
  2. Because production uses the empty prefix, preview writes land directly
     in the production-visible namespace, defeating the environment
     isolation guard in server/_shared/redis.ts.

Fix: drop the raw=true flag so runRedisPipeline applies prefixKey on each
command, symmetric with the reads. Adds __resetKeyPrefixCacheForTests in
redis.ts so tests can exercise a non-empty prefix without relying on
process-startup memoization order.

Adds regression test that simulates VERCEL_ENV=preview + a commit SHA and
asserts every score-key SET in the pipeline carries the preview:<sha>:
prefix. Fails on old code (raw writes), passes now. installRedis gains an
opt-in `keepVercelEnv` so the test can run under a forced env without
being clobbered by the helper's default reset.

* test(resilience-ranking): snapshot + restore VERCEL_GIT_COMMIT_SHA in afterEach

PR review P2: the preview-prefix test mutates process.env.VERCEL_GIT_COMMIT_SHA
but the file's afterEach only restored VERCEL_ENV. A process started with a
real preview SHA (e.g. CI) would have that value unconditionally deleted after
the test ran, leaking changed state into later tests and producing different
prefix behavior locally vs. CI.

Fix: capture originalVercelSha at module load, restore it in afterEach, and
invalidate the memoized key prefix after each test so the next one recomputes
against the restored env. The preview-prefix test's finally block is no
longer needed — the shared teardown handles it.

Verified: suite still passes 11/11 under both `VERCEL_ENV=production` (unset)
and `VERCEL_ENV=preview VERCEL_GIT_COMMIT_SHA=ci-original-sha` process
environments.
2026-04-16 10:17:22 +04:00
Elie Habib
fec135d6b8 chore(sentry): filter DuckDuckGo browser Response did not contain 'success' or 'data' noise (#3123)
Adds one ignoreErrors pattern for the distinctive DuckDuckGo browser-internal
error phrase (WORLDMONITOR-MZ). The message is never emitted by our own code,
contains backtick-quoted field names that are not in our vocabulary, and arrives
with an empty stack from DuckDuckGo 26.3 on macOS.
2026-04-16 10:06:58 +04:00
Elie Habib
9b07fc8d8a feat(yahoo): _yahoo-fetch helper with curl-only Decodo proxy fallback + 4 seeder migrations (#3120)
* feat(_yahoo-fetch): curl-only Decodo proxy fallback helper

Yahoo Finance throttles Railway egress IPs aggressively. 4 seeders
(seed-commodity-quotes, seed-etf-flows, seed-gulf-quotes, seed-market-quotes)
duplicated the same fetchYahooWithRetry block with no proxy fallback.
This helper consolidates them and adds the proxy fallback.

Yahoo-specific: CURL-ONLY proxy strategy. Probed 2026-04-16:
  query1.finance.yahoo.com via CONNECT (httpsProxyFetchRaw): HTTP 404
  query1.finance.yahoo.com via curl    (curlFetch):          HTTP 200
Yahoo's edge blocks Decodo's CONNECT egress IPs but accepts the curl
egress IPs. Helper deliberately omits the CONNECT leg — adding it
would burn time on guaranteed-404 attempts. Production defaults expose
ONLY curlProxyResolver + curlFetcher.

All learnings from PR #3118 + #3119 reviews baked in:
- lastDirectError accumulator across the loop, embedded in final throw +
  Error.cause chain
- catch block uses break (NOT throw) so thrown errors also reach proxy
- DI seams (_curlProxyResolver, _proxyCurlFetcher) for hermetic tests
- _PROXY_DEFAULTS exported for production-default lock tests
- Sync curlFetch wrapped with await Promise.resolve() to future-proof
  against an async refactor (Greptile P2 from #3119)
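
The accumulator-plus-break control flow from the bullets above can be sketched as follows. This is a simplified stand-in for `_yahoo-fetch.mjs`, with illustrative names and a stubbed response shape, not the helper's real signature:

```javascript
// Direct retries first; on exhaustion OR a thrown error, fall through to
// the proxy leg. lastDirectError is embedded in the final throw + cause.
async function fetchWithProxyFallback(url, { fetcher, proxyFetcher, retries = 3 }) {
  let lastDirectError = null;
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      const resp = await fetcher(url);
      if (resp.ok) return resp.json;
      lastDirectError = new Error(`HTTP ${resp.status}`);
      if (resp.status !== 429) break; // non-retryable status: fall through to proxy
    } catch (err) {
      lastDirectError = err; // record, then break (NOT throw), so thrown
      break;                 // errors also reach the proxy fallback below
    }
  }
  if (proxyFetcher) {
    try {
      return await proxyFetcher(url); // DI seam: tests inject a mock here
    } catch (proxyErr) {
      throw new Error(
        `retries exhausted (direct: ${lastDirectError?.message}; proxy: ${proxyErr.message})`,
        { cause: lastDirectError },
      );
    }
  }
  throw new Error(`retries exhausted (direct: ${lastDirectError?.message})`, {
    cause: lastDirectError,
  });
}
```

The `break` in the catch block is the P1 guard: a timeout or ECONNRESET on the final direct attempt must still reach the proxy leg rather than propagate immediately.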

Tests (tests/yahoo-fetch.test.mjs, 11 cases):
- Production defaults: curl resolver/fetcher reference equality
- Production defaults: NO CONNECT leg present (regression guard)
- 200 OK passthrough, never touches proxy
- 429 with no proxy → throws exhausted with HTTP 429 in message
- Retry-After header parsed correctly
- 429 + curl proxy succeeds → returns proxy data
- Thrown fetch error on final retry → proxy fallback runs (P1 guard)
- 429 + proxy ALSO fails → both errors visible in message + cause chain
- Proxy malformed JSON → throws exhausted
- Non-retryable 500 → no extra direct retry, falls to proxy
- parseRetryAfterMs unit (exported sanity check)

Verification: 11/11 helper tests pass. node --check clean.

Phase 1 of 2 — seeder migrations follow.

* feat(yahoo-seeders): migrate 4 seeders to _yahoo-fetch helper

Removes the duplicated fetchYahooWithRetry function (4 byte-identical
copies across seed-commodity-quotes, seed-etf-flows, seed-gulf-quotes,
seed-market-quotes) and routes all Yahoo Finance fetches through the
new scripts/_yahoo-fetch.mjs helper. Each seeder gains the curl-only
Decodo proxy fallback baked into the helper.

Per-seeder changes (mechanical):
- import { fetchYahooJson } from './_yahoo-fetch.mjs'
- delete the local fetchYahooWithRetry function
- replace 'const resp = await fetchYahooWithRetry(url, label); if (!resp)
  return X; const json = await resp.json()' with
  'let json; try { json = await fetchYahooJson(url, { label }); }
  catch { return X; }'
- prune now-unused CHROME_UA/sleep imports where applicable

Latent bugs fixed in passing:
- seed-etf-flows.mjs:23 and seed-market-quotes.mjs:38 referenced
  CHROME_UA without importing it (would throw ReferenceError at
  runtime if the helper were called). Now the call site is gone in
  etf-flows; in market-quotes CHROME_UA is now properly imported because
  the Finnhub call still uses it.

seed-commodity-quotes also has fetchYahooChart1y (separate non-retry
function for gold history). Migrated to use fetchYahooJson under the
hood — preserves return shape, adds proxy fallback automatically.

Verification:
- node --check clean on all 4 modified seeders
- npm run typecheck:all clean
- npm run test:data: 5374/5374 pass

Phase 2 of 2.

* fix(_yahoo-fetch): log success AFTER parse + add _sleep DI seam for honest Retry-After test

Greptile P2: "[YAHOO] proxy (curl) succeeded" was logged BEFORE
JSON.parse(text). On malformed proxy JSON, Railway logs would show:

  [YAHOO] proxy (curl) succeeded for AAPL
  throw: Yahoo retries exhausted ...

Contradictory + breaks the post-deploy log-grep verification this PR
relies on ("look for [YAHOO] proxy (curl) succeeded"). Fix: parse
first; success log only fires when parse succeeds AND the value is
about to be returned.

Greptile P3: 'Retry-After header parsed correctly' test used header
value '0', but parseRetryAfterMs() treats non-positive seconds as null
→ helper falls through to default linear backoff. So the test was
exercising the wrong branch despite its name.

Fix: added _sleep DI opt seam to the helper. New test injects a sleep
spy and asserts the captured duration:

  Retry-After: '7' → captured sleep == [7000]   (Retry-After branch)
  no Retry-After  → captured sleep == [10]      (default backoff = retryBaseMs * 1)

Two paired tests lock both branches separately so a future regression
that collapses them is caught.

Also added a log-ordering regression test: malformed proxy JSON must
NOT emit the 'succeeded' log. Captures console.log into an array and
asserts no 'proxy (curl) succeeded' line appeared before the throw.

Verification:
- tests/yahoo-fetch.test.mjs: 13/13 (was 11, +2)
- npm run test:data: 5376/5376 (+2)
- npm run typecheck:all: clean

Followup commits on PR #3120.
2026-04-16 09:25:06 +04:00
Elie Habib
57414e4762 fix(open-meteo): curl proxy as second-choice when CONNECT proxy fails (#3119)
* fix(open-meteo): curl proxy as second-choice when CONNECT proxy fails

Decodo's CONNECT egress and curl egress reach DIFFERENT IP pools (per
scripts/_proxy-utils.cjs:67). Probed 2026-04-16 against Yahoo Finance:

  Yahoo via CONNECT (httpsProxyFetchRaw): HTTP 404
  Yahoo via curl (curlFetch):              HTTP 200

For Open-Meteo both paths happen to work today, but pinning the helper
to one path is a single point of failure if Decodo rebalances pools, or
if Open-Meteo starts behaving like Yahoo. PR #3118 wired only the
CONNECT path (`httpsProxyFetchRaw`); this commit adds curl as a
second-choice attempt that runs only when CONNECT also fails.

Cascade:
  direct retries (3) → CONNECT proxy (1) → curl proxy (1) → throw

Steady-state cost: zero. Curl exec only runs when CONNECT also failed.

Final exhausted-throw now appends the LAST proxy error too, so on-call
sees both upstream signals (direct + proxy) instead of just direct.

Tests: added 4 cases locking the cascade behavior:

- CONNECT fails → curl succeeds: returns curl data, neither throws
- CONNECT succeeds: curl never invoked (cost gate)
- CONNECT fails AND curl fails: throws exhausted with both errors
  visible in the message (HTTP 429 from direct + curl 502 from proxy)
- curl returns malformed JSON: caught + warns + throws exhausted

Updated 2 existing tests to also stub _proxyCurlFetcher so they don't
shell out to real curl when CONNECT is mocked-failed (would have run
real curl with proxy.test:8000 → 8s timeout per test).

Verification:
- tests/open-meteo-proxy-fallback.test.mjs → 12/12 pass (was 8, +4 new)
- npm run test:data → 5367/5367 (+4)
- npm run typecheck:all → clean

Followup to PR #3118.

* fix: CONNECT leg uses resolveProxyForConnect; lock production defaults

P1 from PR #3119 review: the cascade was logged as 'CONNECT proxy → curl
proxy' but BOTH legs were resolving via resolveProxy() — which rewrites
gate.decodo.com → us.decodo.com for curl egress. So the 'two-leg
cascade' was actually one Decodo egress pool wearing two transport
mechanisms. Defeats the redundancy this PR is supposed to provide.

Fix: import resolveProxyForConnect (preserves gate.decodo.com — the
host Decodo routes via its CONNECT egress pool, distinct from the
curl-egress pool reached by us.decodo.com via resolveProxy). CONNECT
leg uses resolveProxyForConnect; curl leg uses resolveProxy. Matches
the established pattern in scripts/seed-portwatch-chokepoints-ref.mjs:33-37
and scripts/seed-recovery-external-debt.mjs:31-35.

Refactored test seams: split single _proxyResolver into
_connectProxyResolver + _curlProxyResolver. Test files inject both.

P2 fix: every cascade test injected _proxyResolver, so the suite stayed
green even when production defaults were misconfigured. Exported
_PROXY_DEFAULTS object and added 2 lock-tests:

  1. CONNECT leg uses resolveProxyForConnect, curl leg uses resolveProxy
     (reference equality on each of 4 default fields).
  2. connect/curl resolvers are different functions — guards against the
     'collapsed cascade' regression class generally, not just this
     specific instance.

Updated the 8 existing cascade tests to inject BOTH resolvers. The
docstring at the top of the file now spells out the wiring invariant
and points to the lock-tests.
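
The lock-test idea reduces to exporting the default wiring as data and asserting reference equality, so the suite fails when production defaults are miswired even though every behavioral test injects mocks. A minimal sketch with illustrative resolver bodies:

```javascript
// Stand-ins for the two real resolvers (distinct egress pools).
function resolveProxyForConnect() { /* gate.decodo.com: CONNECT egress pool */ }
function resolveProxy() { /* us.decodo.com: curl egress pool */ }

// Exported so tests can inspect production defaults without network access.
const _PROXY_DEFAULTS = {
  connectResolver: resolveProxyForConnect,
  curlResolver: resolveProxy,
};

// Lock tests: (1) each leg wired to the intended resolver by reference;
// (2) the two resolvers differ, guarding the "collapsed cascade" class.
function runLockTests(defaults) {
  const correctlyWired = defaults.connectResolver === resolveProxyForConnect
    && defaults.curlResolver === resolveProxy;
  const notCollapsed = defaults.connectResolver !== defaults.curlResolver;
  return correctlyWired && notCollapsed;
}
```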

Verification:
- tests/open-meteo-proxy-fallback.test.mjs: 14/14 pass (+2)
- npm run test:data: 5369/5369 (+2)
- npm run typecheck:all: clean

Followup commit on PR #3119.

* fix(open-meteo): future-proof sync curlFetch call with Promise.resolve+await

Greptile P2: _proxyCurlFetcher (curlFetch / execFileSync) is sync today,
adjacent CONNECT path is async (await _proxyFetcher(...)). A future
refactor of curlFetch to async would silently break this line — JSON.parse
would receive a Promise<string> instead of a string and explode at parse
time, not at the obvious call site.

Wrapping with await Promise.resolve(...) is a no-op for the current sync
implementation but auto-handles a future async refactor. Comment spells
out the contract so the wrap doesn't read as cargo-cult.
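
The future-proofing trick works because `Promise.resolve(x)` passes promises through unchanged and wraps plain values, so a single `await` handles both. Illustrative sketch (both fetchers are hypothetical):

```javascript
function syncFetch() { return '{"n":1}'; }          // today: curlFetch is sync
async function asyncFetch() { return '{"n":2}'; }   // hypothetical future async refactor

async function parseEither(fetcher) {
  // No-op for a sync return value; transparently unwraps a Promise.
  const text = await Promise.resolve(fetcher());
  return JSON.parse(text); // always receives a string, never Promise<string>
}
```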

Tests still 14/14.
2026-04-16 09:24:12 +04:00
Elie Habib
5d1c8625e9 fix(seed-climate-zone-normals): proxy fallback when Open-Meteo 429s on Railway IP (#3118)
* fix(seed-climate-zone-normals): proxy fallback when Open-Meteo 429s on Railway IP

Railway logs.1776312819911.log showed seed-climate-zone-normals failing
every batch with HTTP 429 from Open-Meteo's free-tier per-IP throttle
(2026-04-16). The seeder retried with 2/4/8/16s backoff but exhausted
without ever falling back to the project's Decodo proxy infrastructure
that other rate-limited sources (FRED, IMF) already use.

Open-Meteo throttles by source IP. Railway containers share IP pools and
get 429 storms whenever zone-normals fires (monthly cron — high churn
when it runs). Result: PR #3097's bake clock for climate:zone-normals:v1
couldn't start, because the seeder couldn't write the contract envelope
even when manually triggered.

Fix: after direct retries exhaust, _open-meteo-archive.mjs falls back to
httpsProxyFetchRaw (Decodo) — same pattern as fredFetchJson and
imfFetchJson in _seed-utils.mjs. Skips silently if no proxy is configured
(preserves existing behavior in non-Railway envs).

Added tests/open-meteo-proxy-fallback.test.mjs (4 cases):
- 429 with no proxy → throws after exhausting retries (pre-fix behavior preserved)
- 200 OK → returns parsed batch without touching proxy path
- batch size mismatch → throws even on 200
- Non-retryable 500 → break out, attempt proxy, throw exhausted (no extra
  direct retry — matches new control flow)

Verification: npm run test:data → 5359/5359, +4 new. node --check clean.

Same pattern can be applied to any other helper that fetches Open-Meteo
(grep 'open-meteo' scripts/) if more 429s show up.

* fix: proxy fallback runs on thrown direct errors + actually-exercised tests

Addresses two PR #3118 review findings.

P1: catch block did 'throw err' on the final direct attempt, silently
bypassing the proxy fallback for thrown-error cases (timeout, ECONNRESET,
DNS failures). Only non-OK HTTP responses reached the proxy path. Fix:
record the error in lastDirectError and 'break' so control falls through
to the proxy fallback regardless of whether the direct path failed via
thrown error or non-OK status.

Also: include lastDirectError context in the final 'retries exhausted'
message + Error.cause so on-call can see what triggered the fallback
attempt (was: opaque 'retries exhausted').

P2: tests didn't exercise the actual proxy path. Refactored helper to
accept _proxyResolver and _proxyFetcher opt overrides (production
defaults to real resolveProxy/httpsProxyFetchRaw from _seed-utils.mjs;
tests inject mocks). Added 4 new cases:

- 429 + proxy succeeds → returns proxy data
- thrown fetch error on final retry → proxy fallback runs (P1 regression
  guard with explicit assertion: directCalls=2, proxyCalls=1)
- 429 + proxy ALSO fails → throws exhausted, original HTTP 429 in
  message + cause chain
- Proxy returns wrong batch size → caught + warns + throws exhausted

Verification:
- tests/open-meteo-proxy-fallback.test.mjs: 8/8 pass (4 added)
- npm run test:data: 5363/5363 pass (+4 from prior 5359)
- node --check clean
2026-04-16 08:28:05 +04:00
Elie Habib
e6a6d4e326 fix(bundle-runner): stream child stdio + SIGKILL escalation on timeout (#3114)
* fix(bundle-runner): stream child stdio + SIGKILL escalation on timeout

Silent Railway crashes in seed-bundle-portwatch — container exits after
~7min with ZERO logs from the hanging section. Root cause in the runner,
not the seeder: execFile buffers child stdout until the callback fires,
and its default SIGTERM never escalates to SIGKILL, so a child with
in-flight HTTPS sockets can outlive the timeout and be killed by the
container limit before any error is logged.

Switch to spawn + live line-prefixed streaming. On timeout, send SIGTERM,
then SIGKILL after a 10s grace. Always log the terminal reason (timeout
/ exit code / signal) so the next failing bundle surfaces the hung
section on its own line instead of going dark.

Applies to all 15 seed-bundle-*.mjs services that use this runner.
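
The spawn-plus-escalation shape can be sketched as below. Names and log format are illustrative, not the runner's real API; the later follow-up commit additionally moves the terminal "Failed" log before the grace window, which this sketch already does.

```javascript
import { spawn } from 'node:child_process';

// Stream child output live; on timeout log the reason, SIGTERM, then
// SIGKILL after a grace period if the child ignores SIGTERM.
function runSection(cmd, args, { timeoutMs, killGraceMs = 10_000, log = console.log }) {
  return new Promise((resolve) => {
    const child = spawn(cmd, args, { stdio: ['ignore', 'pipe', 'pipe'] });
    let settled = false;
    const settle = (result) => { if (!settled) { settled = true; resolve(result); } };

    child.stdout.on('data', (buf) => log(`[child] ${buf}`));
    child.stderr.on('data', (buf) => log(`[child:err] ${buf}`));

    const timer = setTimeout(() => {
      log(`Failed: timeout after ${timeoutMs}ms`); // flush reason BEFORE grace window
      child.kill('SIGTERM');
      setTimeout(() => child.kill('SIGKILL'), killGraceMs).unref();
    }, timeoutMs);

    child.on('error', (err) => { clearTimeout(timer); settle({ error: err }); });
    child.on('close', (code, signal) => { clearTimeout(timer); settle({ code, signal }); });
  });
}
```

Unlike `execFile`, nothing here buffers output until exit, so a hang surfaces its last line in the log stream instead of going dark.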

* fix(bundle-runner): guard double-resolve, update docstring, add tests

Review follow-ups:
- Idempotent settle() so spawn 'error' + 'close' can't double-resolve
- Header comment reflects spawn + streaming + SIGKILL behavior
- tests/bundle-runner.test.mjs covers live streaming, SIGKILL escalation
  when a child ignores SIGTERM, and non-zero exit reporting

* fix(bundle-runner): address PR review — declare softKill before settle, handle stdio error

* fix(bundle-runner): log terminal reason BEFORE SIGKILL grace + include grace in budget

Review P1 follow-up. Two gaps the previous commit left open:

1. A section with timeoutMs close to Railway's ~10min container cap could
   be killed by the container mid-grace, before the "Failed ... timeout"
   line reached the log stream. Fix: emit the terminal Failed line at the
   moment softKill fires (before SIGTERM), so the reason is flushed BEFORE
   any grace window that could be truncated by a container kill.

2. The admission check used raw timeoutMs, but worst-case runtime is
   timeoutMs + KILL_GRACE_MS when the child ignores SIGTERM. A section
   that "fit" the budget could still overrun. Fix: compare elapsed +
   timeout + grace against maxBundleMs.

close handler still settles the promise but no longer re-logs on the
timeout path (alreadyLogged flag). New test asserts the Failed line
precedes SIGKILL escalation, and that budget accounts for grace.
2026-04-16 07:58:18 +04:00
Elie Habib
13446a2170 fix(seed-contract-probe): send Origin header so /api/bootstrap boundary check doesn't 401 (#3100)
* fix(seed-contract-probe): send Origin header so /api/bootstrap boundary check doesn't 401

Production probe returned {boundary: [{endpoint: '/api/bootstrap', pass: false,
status: 401, reason: 'status:401'}]}. Root cause: checkPublicBoundary's
self-fetch had no Origin header, so /api/bootstrap's validateApiKey() treated
it as a non-browser caller and required an API key.

Fix: set Origin: https://worldmonitor.app on the boundary self-fetch. This
takes the trusted-browser path without needing to embed an API key in the
probe. The probe runs edge-side with x-probe-secret internal auth; emulating
a trusted browser is only for boundary response-shape verification.

Tests still 17/17.

* fix(seed-contract-probe): explicit User-Agent on boundary self-fetch

Per AGENTS.md, server-side fetches must include a UA. middleware.ts:138
returns 403 for !ua || ua.length < 10 on non-public paths, and
/api/bootstrap is not in PUBLIC_API_PATHS — the probe works today only
because Vercel Edge implicitly adds a UA. Making it explicit.

Addresses greptile P2 on PR #3100.
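
Taken together, the two fixes amount to two headers on the self-fetch. A minimal sketch; the UA string below is a hypothetical example, only the Origin value and the length-10 rule come from the commits above:

```javascript
// Headers the boundary self-fetch needs to pass both gates.
function boundaryFetchHeaders() {
  return {
    'Origin': 'https://worldmonitor.app',     // trusted-browser path in validateApiKey()
    'User-Agent': 'seed-contract-probe/1.0',  // middleware 403s when !ua || ua.length < 10
  };
}
```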
2026-04-15 15:34:38 +04:00
Elie Habib
1b4335353f fix(sentry): suppress 6 noise patterns from triage (#3105)
* fix(sentry): suppress 6 noise patterns flagged in triage

Add filters so resolved issues don't re-fire:
- Convex "Connection lost while action was in flight" → ignoreErrors
- Convex WS onmessage JSON.parse truncation (Ping/Updated frames) → beforeSend gated on onmessage frame
- chrome/moz/safari-extension frames intercepting fetch → beforeSend
- Sentry SDK breadcrumb null.contains DOM crash → beforeSend gated on sentry chunk
- bare "Failed to fetch" (no TypeError: prefix in msg) → extend existing regex

* test(sentry): tighten onmessage guard + add tests for 3 new filters

Review feedback from PR #3105:
- add !hasFirstParty guard to Convex WS onmessage JSON-parse filter
- add 6 test cases covering chrome-extension drop, sentry null.contains
  gate, and Convex onmessage suppression + first-party regression paths

* fix(sentry): gate extension and sentry-contains filters on !hasFirstParty

Review feedback PR #3105 (round 2):
- extension-frame drop no longer suppresses events when a first-party
  frame is also on the stack (a real app regression could have an
  extension wrapper around it)
- sentry null.contains filter no longer suppresses when a first-party
  frame is present (Sentry wraps first-party handlers, so a genuine
  el.contains() bug produces a stack with both main-*.js and sentry-*.js)

Adds 3 more tests covering the !hasFirstParty boundary.
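
The `!hasFirstParty` gate reduces to a pure predicate over stack frames. A simplified sketch (frame shapes and the `main-*.js` bundle pattern are taken from the description above; the real filter runs inside `beforeSend`):

```javascript
// Drop an event only when an extension frame is present AND no first-party
// frame is on the stack: a real app regression can sit under an extension
// wrapper, and suppressing it would hide the bug.
function shouldDropEvent(frames) {
  const isExtension = (f) => /^(chrome|moz|safari)-extension:/.test(f.filename || '');
  const hasFirstParty = frames.some((f) => /\/main-[\w.]+\.js$/.test(f.filename || ''));
  return frames.some(isExtension) && !hasFirstParty;
}
```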
2026-04-15 15:28:15 +04:00
Elie Habib
90f4ac0f78 feat(consumer-prices): strict search-hit validator (shadow mode) (#3101)
* feat(consumer-prices): add 'candidate' match state + negativeTokens schema

Schema foundation for the strict-validator plan:
- migration 008 widens product_matches.match_status CHECK to include
  'candidate' so weak search hits can be persisted without entering
  aggregates (aggregate.ts + snapshots filter on ('auto','approved')
  so candidates are excluded automatically).
- BasketItemSchema gains optional negativeTokens[] — config-driven
  reject tokens for obvious class errors (e.g. 'canned' for fresh
  tomatoes). Product-taxonomy splits like plain vs greek yogurt
  belong in separate substitutionGroup values, not here.
- upsertProductMatch accepts 'candidate' and writes evidence_json
  so reviewers can see why a match was downgraded.

* feat(consumer-prices): add validateSearchHit pure helper + known-bad test fixtures

Deterministic post-extraction validator that replaces the boolean
isTitlePlausible gate for scoring and candidate triage. Evaluates
four signals and returns { ok, score, reasons, signals }:

  - class-error rejects from BasketItem.negativeTokens (whole-token
    match for single words; substring match for hyphenated entries
    like 'plant-based' so 'Plant-Based Yogurt' trips without needing
    token-splitting gymnastics)
  - non-food indicators (seeds, fertilizer, planting) — shared with
    the legacy gate
  - token-overlap ratio over identity tokens (>2 chars, non-packaging)
  - quantity-window conformance against minBaseQty/maxBaseQty

Score is a 0..1 weighted sum (overlap 0.55, size 0.35/0.2/0, class-
clean 0.10). AUTO_MATCH_THRESHOLD=0.75 exported for the scrape-side
auto-vs-candidate decision.
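
The weighted-score shape can be sketched as below. This is heavily simplified: the weights and threshold are the ones stated above, but the real helper derives `overlapRatio`, the size tier, and the class-clean flag from title tokens, quantity windows, and negativeTokens rather than taking them as inputs.

```javascript
const AUTO_MATCH_THRESHOLD = 0.75; // scrape-side auto-vs-candidate cutoff

// overlap 0.55 + size up to 0.35 + class-clean 0.10 => score in 0..1
function scoreHit({ overlapRatio, sizeScore, classClean }) {
  const score = 0.55 * overlapRatio + 0.35 * sizeScore + 0.10 * (classClean ? 1 : 0);
  return { ok: classClean, score }; // class-error rejects force ok=false
}

// ok AND above threshold keeps 'auto'; everything else downgrades.
function matchStatus(validator) {
  return validator.ok && validator.score >= AUTO_MATCH_THRESHOLD ? 'auto' : 'candidate';
}
```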

Locked all five bad log examples into regression tests and added
matching positive cases so the rule set proves both sides of the
boundary. Also added vitest.config.ts so consumer-prices-core tests
run under its own config instead of inheriting the worldmonitor
root config (which excludes this directory).

* feat(consumer-prices): wire validator (shadow) + replace 1.0 auto-match

search.ts:
- Thread BasketItem constraints (baseUnit, min/maxBaseQty, negativeTokens,
  substitutionGroup) through discoverTargets → fetchTarget → parseListing
  using explicit named fields, not an opaque JSON blob.
- _extractFromUrl now runs validateSearchHit alongside isTitlePlausible.
  Legacy gate remains the hard gate; validator is shadow-only for now —
  when legacy accepts but validator rejects, a [search:shadow-reject]
  line is logged with reasons + score so the rollout diff report can
  inform the decision to flip the gate. No live behavior change.
- ValidatorResult attached to SearchPayload + rawPayload so scrape.ts
  can score the match without re-running the validator.

scrape.ts:
- Remove unconditional matchScore:1.0 / status:'auto' insert. Use the
  validator score from the adapter payload. Hits with ok=true and
  score >= AUTO_MATCH_THRESHOLD (0.75) keep 'auto'; everything else
  (including validator.ok=false) writes 'candidate' with evidence_json
  carrying the reasons + signals. Aggregates filter on ('auto','approved')
  so candidates are excluded automatically.
- Adapters without a validator (exa-search, etc.) fall back to the
  legacy 1.0/auto behavior so this PR is a no-op for non-search paths.

* feat(consumer-prices): populate negativeTokens for 6 known-bad groups

* fix(consumer-prices): enforce validator on pin path + drop 'cane' from sugar rejects

Addresses PR #3101 review:

1. Pinned direct hits bypassed the validator downgrade — the new
   auto-vs-candidate decision only ran inside the !wasDirectHit block,
   so a pin that drifted onto the wrong product (the steady-state
   common path) would still flow poisoned prices into aggregates
   through the existing 'auto' match. Now: before inserting an
   observation, if the direct hit's validator.ok === false, skip the
   observation and route the target through handlePinError so the pin
   soft-disables after 3 strikes. Legacy isTitlePlausible continues to
   gate the pin extraction itself.

2. 'cane' was a hard reject for sugar_white across all 10 baskets but
   'white cane sugar' is a legitimate SKU descriptor — would have
   downgraded real products to candidate and dropped coverage. Removed
   from every essentials_*.yaml sugar_white negativeTokens list.
   Added a regression test that locks in 'Silver Spoon White Cane
   Sugar 1kg' as a must-pass positive case.

* fix(consumer-prices): strip size tokens from identity + protect approved rows

Addresses PR #3101 round 2 review:

1. Compact size tokens ("1kg", "500g", "250ml") were kept as identity
   tokens. Firecrawl emits size spaced ("1 kg"), which tokenises to
   ["1","kg"] — both below the length>2 floor — so the compact "1kg"
   token could never match. Short canonical names like "Onions 1kg"
   lost 0.5 token overlap and legitimate hits landed at score 0.725 <
   AUTO_MATCH_THRESHOLD, silently downgrading to candidate. Size
   fidelity is already enforced by the quantity-window check; identity
   tokens now ignore /^\d+(?:\.\d+)?[a-z]+$/. New regression test
   locks in "Fresh Red Onions 1 kg" as a must-pass case.

2. upsertProductMatch's DO UPDATE unconditionally wrote EXCLUDED.status.
   A re-scrape whose validator scored an already-approved URL below
   0.75 would silently demote human-curated 'approved' rows to
   'candidate'. Added a CASE guard so approved stays approved; every
   other state follows the new validator verdict.

* fix(consumer-prices): widen curated-state guard to review + rejected

PR #3101 round 3: the CASE only protected 'approved' from being
overwritten. 'review' (written by validate.ts when a price is an
outlier, or by humans sending a row back) and 'rejected' (human
block) are equally curated — a re-scrape under this path silently
overwrites them with the fresh validator verdict and re-enables the
URL in aggregate queries on the next pass.

Widen the immutable set to ('approved','review','rejected'). Also
stop clearing pin_disabled_at on those rows so a quarantined pin
keeps its disabled flag until the review workflow resolves it.
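
The curated-state guard is a SQL CASE in `upsertProductMatch`'s ON CONFLICT DO UPDATE; its decision logic reduces to the following pure function (a sketch of the rule, not the actual query):

```javascript
// Human-curated states are immutable under re-scrape; everything else
// follows the fresh validator verdict ('auto' or 'candidate').
const CURATED = new Set(['approved', 'review', 'rejected']);

function nextMatchStatus(existingStatus, validatorStatus) {
  return CURATED.has(existingStatus) ? existingStatus : validatorStatus;
}
```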

* fix(analyze-stock): classify dividend frequency by median gap

recentDivs.length within a hard 365.25-day window misclassifies quarterly
payers whose last-year Q1 payment falls just outside the cutoff — common
after mid-April each year, when Date.now() - 365.25d lands after Jan's
payment timestamp. The test 'non-zero CAGR for a quarterly payer' flaked
calendar-dependently for this reason.

Prefer median inter-payment interval: quarterly = ~91d median gap,
regardless of where the trailing-12-month window happens to bisect the
payment series. Falls back to the old count when <2 entries exist.

Also documents the CAGR filter invariant in the test helper.

* fix(analyze-stock): suppress frequency when no recent divs + detect regime slowdowns

Addresses PR #3102 review:

1. Suspended programs no longer leak a frequency badge. When recentDivs
   is empty, dividendYield and trailingAnnualDividendRate are both 0;
   emitting 'Quarterly' derived from historical median would contradict
   those zeros in the UI. paymentsPerYear now short-circuits to 0 before
   the interval classifier runs.

2. Whole-history median-gap no longer masks cadence regime changes. The
   reconciliation now depends on trailing-year count:
     recent >= 3  → interval classifier (robust to calendar drift)
     recent 1..2  → inspect most-recent inter-payment gap:
                    > 180d = real slowdown, trust count (Annual)
                    <= 180d = calendar drift, trust interval (Quarterly)
     recent 0     → empty frequency (suspended)
   The interval classifier itself is now scoped to the last 2 years so
   it responds to regime changes instead of averaging over 5y of history.

Regression tests:
- 'emits empty frequency when the dividend program has been suspended' —
  3y of quarterly history + 18mo silence must report '' not 'Quarterly'.
- 'detects a recent quarterly → annual cadence change' — 12 historical
  quarterly payments + 1 recent annual payment must report 'Annual'.

* fix(analyze-stock): scope interval median to trailing year when recent>=3

Addresses PR #3102 review round 2: the reconciler's recent>=3 branch
called paymentsPerYearFromInterval(entries), which scopes to the last
2 years. A monthly→quarterly shift (12 monthly payments in year -2..-1
plus 4 quarterly in year -1..0) produced a 2-year median of ~30d and
misclassified as Monthly even though the current trailing-year cadence
is clearly quarterly.

Pass recentDivs directly to the interval classifier when recent>=3.
Two payments in the trailing year = 1 gap which suffices for the median
(gap count >=1, median well-defined). The historical-window 2y scoping
still applies for the recent 1..2 branch, where we actively need
history to distinguish drift from slowdown.

Regression test: 12 monthly payments from -13..-24 months ago + 4
quarterly payments inside the trailing year must classify as Quarterly.

* fix(analyze-stock): use true median (avg of two middles) for even gap counts

PR #3102 P2: gaps[floor(length/2)] returns the upper-middle value for
even-length arrays, biasing toward slower cadence at classifier
thresholds when the trailing-year sample is small. Use the average of
the two middles for even lengths. Harmless on 5-year histories with
50+ gaps where values cluster, but correct at sparse sample sizes where
the trailing-year branch can have only 2–3 gaps.
2026-04-15 14:28:18 +04:00
Elie Habib
fffc5d9607 fix(analyze-stock): classify dividend frequency by median gap (#3102)
* fix(analyze-stock): classify dividend frequency by median gap

recentDivs.length within a hard 365.25-day window misclassifies quarterly
payers whose last-year Q1 payment falls just outside the cutoff — common
after mid-April each year, when Date.now() - 365.25d lands after Jan's
payment timestamp. The test 'non-zero CAGR for a quarterly payer' flaked
calendar-dependently for this reason.

Prefer median inter-payment interval: quarterly = ~91d median gap,
regardless of where the trailing-12-month window happens to bisect the
payment series. Falls back to the old count when <2 entries exist.

Also documents the CAGR filter invariant in the test helper.

* fix(analyze-stock): suppress frequency when no recent divs + detect regime slowdowns

Addresses PR #3102 review:

1. Suspended programs no longer leak a frequency badge. When recentDivs
   is empty, dividendYield and trailingAnnualDividendRate are both 0;
   emitting 'Quarterly' derived from historical median would contradict
   those zeros in the UI. paymentsPerYear now short-circuits to 0 before
   the interval classifier runs.

2. Whole-history median-gap no longer masks cadence regime changes. The
   reconciliation now depends on trailing-year count:
     recent >= 3  → interval classifier (robust to calendar drift)
     recent 1..2  → inspect most-recent inter-payment gap:
                    > 180d = real slowdown, trust count (Annual)
                    <= 180d = calendar drift, trust interval (Quarterly)
     recent 0     → empty frequency (suspended)
   The interval classifier itself is now scoped to the last 2 years so
   it responds to regime changes instead of averaging over 5y of history.
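
The branch table above can be sketched as a small reconciler (illustrative names; the real code also scopes the interval median differently per branch, as the later follow-ups describe):

```javascript
// recentCount: payments in the trailing 365.25d window
// lastGapDays: most recent inter-payment gap
// classifyByInterval: median-gap classifier, injected for testability
function paymentsPerYear(recentCount, lastGapDays, classifyByInterval) {
  if (recentCount === 0) return 0;                   // suspended: no frequency badge
  if (recentCount >= 3) return classifyByInterval(); // enough trailing-year signal
  // 1..2 recent payments: distinguish real slowdown from calendar drift
  return lastGapDays > 180 ? recentCount : classifyByInterval();
}
```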

Regression tests:
- 'emits empty frequency when the dividend program has been suspended' —
  3y of quarterly history + 18mo silence must report '' not 'Quarterly'.
- 'detects a recent quarterly → annual cadence change' — 12 historical
  quarterly payments + 1 recent annual payment must report 'Annual'.

* fix(analyze-stock): scope interval median to trailing year when recent>=3

Addresses PR #3102 review round 2: the reconciler's recent>=3 branch
called paymentsPerYearFromInterval(entries), which scopes to the last
2 years. A monthly→quarterly shift (12 monthly payments in year -2..-1
plus 4 quarterly in year -1..0) produced a 2-year median of ~30d and
misclassified as Monthly even though the current trailing-year cadence
is clearly quarterly.

Pass recentDivs directly to the interval classifier when recent>=3.
Two payments in the trailing year = 1 gap, which suffices for the median
(gap count >=1, median well-defined). The historical-window 2y scoping
still applies for the recent 1..2 branch, where we actively need
history to distinguish drift from slowdown.

Regression test: 12 monthly payments from -13..-24 months ago + 4
quarterly payments inside the trailing year must classify as Quarterly.

* fix(analyze-stock): use true median (avg of two middles) for even gap counts

PR #3102 P2: gaps[floor(length/2)] returns the upper-middle value for
even-length arrays, biasing toward slower cadence at classifier
thresholds when the trailing-year sample is small. Use the average of
the two middles for even lengths. Harmless on 5-year histories with
50+ gaps where values cluster, but correct at sparse sample sizes where
the trailing-year branch can have only 2–3 gaps.
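
The true-median fix is self-contained enough to sketch directly (assumes `gaps` holds inter-payment gaps in days):

```javascript
// Average of the two middle values for even lengths; the single middle
// for odd lengths. Avoids the upper-middle bias of gaps[floor(n/2)].
function medianGap(gaps) {
  const sorted = [...gaps].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}
```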
2026-04-15 14:00:57 +04:00
Elie Habib
1346946f15 fix(middleware): allow /api/seed-contract-probe to bypass bot UA filter (#3099)
Vercel log showed 'Middleware 403 Forbidden' on /api/seed-contract-probe
for both curl-from-ops and UptimeRobot requests. middleware.ts's BOT_UA
regex matches 'curl/' and 'bot', so any monitoring/probe UA was blocked
before reaching the handler — even though the probe has its own
RELAY_SHARED_SECRET auth that makes the UA check redundant.

Added /api/seed-contract-probe to PUBLIC_API_PATHS (joining /api/version
and /api/health). Safe: the endpoint enforces x-probe-secret matching
RELAY_SHARED_SECRET internally; bypassing the generic UA gate does not
reduce security.

Commented the allowlist to spell out the invariant: entries must carry
their own auth, because this list disables the middleware's generic bot
gate.

Verified via Vercel Inspector log trace:
  Firewall: bypass → OK
  Middleware: 403 Forbidden ← this commit fixes it
  Handler: (unreachable before fix)
2026-04-15 09:40:35 +04:00
Elie Habib
044598346e feat(seed-contract): PR 2a — runSeed envelope dual-write + 91 seeders migrated (#3097)
* feat(seed-contract): PR 2a — runSeed envelope dual-write + 91 seeders migrated

Opt-in contract path in runSeed: when opts.declareRecords is provided, write
{_seed, data} envelope to the canonical key alongside legacy seed-meta:*
(dual-write). State machine: OK / OK_ZERO / RETRY with zeroIsValid opt.
declareRecords throws or returns non-integer → hard fail (contract violation).
extraKeys[*] support per-key declareRecords; each extra key writes its own
envelope. Legacy seeders (no declareRecords) entirely unchanged.
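
The envelope write can be sketched as below. The `{_seed, data}` shape, the OK / OK_ZERO / RETRY states, and the non-integer hard fail are from the description above; the exact field names inside `_seed` beyond `records`, `schemaVersion`, and `maxStaleMin` are assumptions for illustration.

```javascript
function buildEnvelope(data, { declareRecords, schemaVersion, maxStaleMin, zeroIsValid = false }) {
  const records = declareRecords(data);
  if (!Number.isInteger(records)) {
    // Contract violation: a throwing or non-integer declareRecords hard-fails.
    throw new Error('contract violation: declareRecords must return an integer');
  }
  const state = records > 0 ? 'OK' : zeroIsValid ? 'OK_ZERO' : 'RETRY';
  return {
    _seed: { records, state, schemaVersion, maxStaleMin, writtenAt: Date.now() },
    data,
  };
}
```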

Migrated all 91 scripts/seed-*.mjs to contract mode. Each exports
declareRecords returning the canonical record count, and passes
schemaVersion: 1 + maxStaleMin (matched to api/health.js SEED_META, or 2.5x
interval where no registry entry exists). Contract conformance reports 84/86
seeders with full descriptor (2 pre-existing warnings).

Legacy seed-meta keys still written so unmigrated readers keep working;
follow-up slices flip health.js + readers to envelope-first.

Tests: 61/61 PR 1 tests still pass.

Next slices for PR 2:
- api/health.js registry collapse + 15 seed-bundle-*.mjs canonicalKey wiring
- reader migration (mcp, resilience, aviation, displacement, regional-snapshot)
- direct writers — ais-relay.cjs, consumer-prices-core publish.ts
- public-boundary stripSeedEnvelope + test migration

Plan: docs/plans/2026-04-14-002-fix-runseed-zero-record-lockout-plan.md

* fix(seed-contract): unwrap envelopes in internal cross-seed readers

After PR 2a enveloped 91 canonical keys as {_seed, data}, every script-side
reader that returned the raw parsed JSON started silently handing callers the
envelope instead of the bare payload. WoW baselines (bigmac, grocery-basket,
fear-greed) saw undefined .countries / .composite; seed-climate-anomalies saw
undefined .normals from climate:zone-normals:v1; seed-thermal-escalation saw
undefined .fireDetections from wildfire:fires:v1; seed-forecasts' ~40-key
pipeline batch returned envelopes for every input.

Fix: route every script-side reader through unwrapEnvelope(...).data. Legacy
bare-shape values pass through unchanged (unwrapEnvelope returns
{_seed: null, data: raw} for any non-envelope shape).
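
The unwrap contract the fix relies on can be sketched like this (a simplified stand-in for the real unwrapEnvelope, which per the test suite also handles stringified JSON):

```javascript
// Envelope values yield their payload; any legacy bare shape (or null)
// passes through unchanged as { _seed: null, data: raw }.
function unwrapEnvelope(raw) {
  if (raw && typeof raw === 'object' && '_seed' in raw && 'data' in raw) {
    return { _seed: raw._seed, data: raw.data };
  }
  return { _seed: null, data: raw };
}
```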

Changed:
- scripts/_seed-utils.mjs: import unwrapEnvelope; redisGet, readSeedSnapshot,
  verifySeedKey all unwrap. Exported new readCanonicalValue() helper for
  cross-seed consumers.
- 18 seed-*.mjs scripts with local redisGet-style helpers or inline fetch
  patched to unwrap via the envelope source module (subagent sweep).
- scripts/seed-forecasts.mjs pipeline batch: parse() unwraps each result.
- scripts/seed-energy-spine.mjs redisMget: unwraps each result.

Tests:
- tests/seed-utils-envelope-reads.test.mjs: 7 new cases covering envelope
  + legacy + null paths for readSeedSnapshot and verifySeedKey.
- Full seed suite: 67/67 pass (was 61, +6 new).

Addresses both of the user's P1 findings on PR #3097.

* feat(seed-contract): envelope-aware reads in server + api helpers

Every RPC and public-boundary reader now automatically strips _seed from
contract-mode canonical keys. Legacy bare-shape values pass through unchanged
(unwrapEnvelope no-ops on non-envelope shapes).

Changed helpers (one-place fix — unblocks ~60 call sites):
- server/_shared/redis.ts: getRawJson, getCachedJson, getCachedJsonBatch
  unwrap by default. cachedFetchJson inherits via getCachedJson.
- api/_upstash-json.js: readJsonFromUpstash unwraps (covers api/mcp.ts
  tool responses + all its canonical-key reads).
- api/bootstrap.js: getCachedJsonBatch unwraps (public-boundary —
  clients never see envelope metadata).

Left intentionally unchanged:
- api/health.js / api/seed-health.js: read only seed-meta:* keys which
  remain bare-shape during dual-write. unwrapEnvelope already imported at
  the meta-read boundary (PR 1) as a defensive no-op.

Tests: 67/67 seed tests pass. typecheck + typecheck:api clean.

This is the blast-radius fix the PR #3097 review called out — external
readers that would otherwise see {_seed, data} after the writer side
migrated.

* fix(test): strip export keyword in vm.runInContext'd seed source

cross-source-signals-regulatory.test.mjs loads scripts/seed-cross-source-signals.mjs
via vm.runInContext, which cannot parse ESM `export` syntax. PR 2a added
`export function declareRecords` to every seeder, which broke this test's
static-analysis approach.

Fix: strip the `export` keyword from the declareRecords line in the
preprocessed source string so the function body still evaluates as a plain
declaration.

Full test:data suite: 5307/5307 pass. typecheck + typecheck:api clean.

* feat(seed-contract): consumer-prices publish.ts writes envelopes

Wrap the 5 canonical keys written by consumer-prices-core/src/jobs/publish.ts
(overview, movers:7d/30d, freshness, categories:7d/30d/90d, retailer-spread,
basket-series) in {_seed, data} envelopes. Legacy seed-meta:<key> writes
preserved for dual-write.

Inlined a buildEnvelope helper (10 lines) rather than taking a cross-package
dependency — consumer-prices-core is a standalone npm package. Documented the
four-file parity contract (mjs source, ts mirror, js edge mirror, this copy).

Contract fields: sourceVersion='consumer-prices-core-publish-v1', schemaVersion=1,
state='OK' (recordCount>0) or 'OK_ZERO' (legitimate zero).

Typecheck: no new errors in publish.ts.

* fix(seed-contract): 3 more server-side readers unwrap envelopes

Found during final audit:

- server/worldmonitor/resilience/v1/_shared.ts: resilience score reader
  parsed cached GetResilienceScoreResponse raw. Contract-mode seed-resilience-scores
  now envelopes those keys.
- server/worldmonitor/resilience/v1/get-resilience-ranking.ts: p05/p95
  interval lookup parsed raw from seed-resilience-scores' extra-key path.
- server/worldmonitor/infrastructure/v1/_shared.ts: mgetJson() used for
  count-source keys (wildfire:fires:v1, news:insights:v1) which are both
  contract-mode now.

All three now unwrap via server/_shared/seed-envelope. Legacy shapes pass
through unchanged.

Typecheck clean.

* feat(seed-contract): ais-relay.cjs direct writes produce envelopes

32 canonical-key write sites in scripts/ais-relay.cjs now produce {_seed, data}
envelopes. Inlined buildEnvelope() (CJS module can't require ESM source) +
envelopeWrite(key, data, ttlSeconds, meta) wrapper. Enveloped keys span market
bootstrap, aviation, cyber-threats, theater-posture, weather-alerts, economic
spending/fred/worldbank, tech-events, corridor-risk, usni-fleet, shipping-stress,
social:reddit, wsb-tickers, pizzint, product-catalog, chokepoint transits,
ucdp-events, satellites, oref.

Left bare (not seeded data keys): seed-meta:* (dual-write legacy),
classifyCacheKey LLM cache, notam:prev-closed-state internal state,
wm:notif:scan-dedup flags.

Updated tests/ucdp-seed-resilience.test.mjs regex to accept both upstashSet
(pre-contract) and envelopeWrite (post-contract) call patterns.

* feat(seed-contract): 15 bundle files add canonicalKey for envelope gate

54 bundle sections across 12 files now declare canonicalKey alongside the
existing seedMetaKey. _bundle-runner.mjs (from PR 1) prefers canonicalKey
when both are present — gates section runs on envelope._seed.fetchedAt
read directly from the data key, eliminating the meta-outlives-data class
of bugs.

Files touched:
- climate (5), derived-signals (2), ecb-eu (3), energy-sources (6),
  health (2), imf-extended (4), macro (10), market-backup (9),
  portwatch (4), relay-backup (2), resilience-recovery (5), static-ref (2)

Skipped (14 sections, 3 whole bundles): multi-key writers, dynamic
templated keys (displacement year-scoped), or non-runSeed orchestrators
(regional brief cron, resilience-scores' 222-country publish, validation/
benchmark scripts). These continue to use seedMetaKey or their own gate.

seedMetaKey preserved everywhere — dual-write. _bundle-runner.mjs falls
back to legacy when canonicalKey is absent.

All 15 bundles pass node --check. test:data: 5307/5307. typecheck:all: clean.

* fix(seed-contract): 4 PR #3097 review P1s — transform/declareRecords mismatches + envelope leaks

Addresses both P1 findings and the extra-key seed-meta leak surfaced in review:

1. runSeed helper-level invariant: seed-meta:* keys NEVER envelope.
   scripts/_seed-utils.mjs exports shouldEnvelopeKey(key) — returns false for
   any key starting with 'seed-meta:'. Both atomicPublish (canonical) and
   writeExtraKey (extras) gate the envelope wrap through this helper. Fixes
   seed-iea-oil-stocks' ANALYSIS_META_EXTRA_KEY silently getting enveloped,
   which broke health.js parsing the value as bare {fetchedAt, recordCount}.
   Also defends against any future manual writeExtraKey(..., envelopeMeta)
   call that happens to target a seed-meta:* key.

2. seed-token-panels canonical + extras fixed.
   publishTransform returns data.defi (the defi panel itself, shape {tokens}).
   Old declareRecords counted data.defi.tokens + data.ai.tokens + data.other.tokens
   on the transformed payload → 0 → RETRY path → canonical market:defi-tokens:v1
   never wrote, and because runSeed returned before the extraKeys loop,
   market:ai-tokens:v1 + market:other-tokens:v1 stayed stale too.
   New: declareRecords counts data.tokens on the transformed shape. AI_KEY +
   OTHER_KEY extras reuse the same function (transforms return structurally
   identical panels). Added isMain guard so test imports don't fire runSeed.

3. api/product-catalog.js cached reader unwraps envelope.
   ais-relay.cjs now envelopes product-catalog:v2 via envelopeWrite(). The
   edge reader did raw JSON.parse(result) and returned {_seed, data} to
   clients, breaking the cached path. Fix: import unwrapEnvelope from
   ./_seed-envelope.js, apply after JSON.parse. One site — :238-241 is
   downstream of getFromCache(), so the single reader fix covers both.

4. Regression lock tests/seed-contract-transform-regressions.test.mjs (11 cases):
   - shouldEnvelopeKey invariant: seed-meta:* false, canonical true
   - Token-panels declareRecords works on transformed shape (canonical + both extras)
   - Explicit repro of pre-fix buggy signature returning 0 — guards against revert
   - resolveRecordCount accepts 0, rejects non-integer
   - Product-catalog envelope unwrap returns bare shape; legacy passes through
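
The declareRecords mismatch in item 2 reproduces in miniature: publishTransform hands declareRecords the single defi panel, so the old counter sums three paths that no longer exist on that shape. Both counters below are illustrative reconstructions, not the seeder's actual code:

```javascript
// Shape after publishTransform: one panel, { tokens: [...] }.
const transformed = { tokens: [{ sym: 'AAA', price: 1 }] };

// Pre-fix signature: counts defi/ai/other on the UN-transformed payload,
// so on the transformed shape every path is undefined and the sum is 0.
const buggyDeclare = (d) =>
  (d.defi?.tokens?.length ?? 0) + (d.ai?.tokens?.length ?? 0) + (d.other?.tokens?.length ?? 0);

// Post-fix signature: counts tokens on the transformed panel directly.
const fixedDeclare = (d) => (Array.isArray(d?.tokens) ? d.tokens.length : 0);
```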

Verification:
- npm run test:data → 5318/5318 pass (was 5307 — 11 new regressions)
- npm run typecheck:all → clean
- node --check on every modified script

iea-oil-stocks canonical declareRecords was NOT broken (user confirmed during
review — buildIndex preserves .members); only its ANALYSIS_META_EXTRA_KEY
was affected, now covered generically by commit 1's helper invariant.

* fix(seed-contract): seed-token-panels validateFn also runs on post-transform shape

Review finding: fixing declareRecords wasn't sufficient — atomicPublish() runs
validateFn(publishData) on the transformed payload too. seed-token-panels'
validate() checked data.defi/.ai/.other on the transformed {tokens} shape,
returned false, and runSeed took the early skipped-write branch (before even
reaching the declareRecords RETRY logic). Net effect: same as before the
declareRecords fix — canonical + both extras stayed stale.

Fix: validate() now checks the canonical defi panel directly (Array.isArray
(data?.tokens) && has at least one t.price > 0). AI/OTHER panels are validated
implicitly by their own extraKey declareRecords on write.
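
A sketch of the corrected validate() described above, assuming the panel shape from the commit text (an array of tokens with a numeric price field); the real validateFn may carry extra checks:

```javascript
// Accept a panel only if it has at least one token with a positive price.
function validateTokenPanel(data) {
  return Array.isArray(data?.tokens) && data.tokens.some((t) => Number(t?.price) > 0);
}
```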

Audited the other 9 seeders with publishTransform (bls-series, bis-extended,
bis-data, gdelt-intel, trade-flows, iea-oil-stocks, jodi-gas, sanctions-pressure,
forecasts): all validateFn's correctly target the post-transform shape. Only
token-panels regressed.

Added 4 regression tests (tests/seed-contract-transform-regressions.test.mjs):
- validate accepts transformed panel with priced tokens
- validate rejects all-zero-price tokens
- validate rejects empty/missing tokens
- Explicit pre-fix repro (buggy old signature fails on transformed shape)

Verification:
- npm run test:data → 5322/5322 pass (was 5318; +4 new)
- npm run typecheck:all → clean
- node --check clean

* feat(seed-contract): add /api/seed-contract-probe validation endpoint

Single machine-readable gate for 'is PR #3097 working in production'.
Replaces the curl/jq ritual with one authenticated edge call that returns
HTTP 200 ok:true or 503 + failing check list.

What it validates:
- 8 canonical keys have {_seed, data} envelopes with required data fields
  and minRecords floors (fsi-eu, zone-normals, 3 token panels + minRecords
  guard against token-panels RETRY regression, product-catalog, wildfire,
  earthquakes).
- 2 seed-meta:* keys remain BARE (shouldEnvelopeKey invariant; guards
  against iea-oil-stocks ANALYSIS_META_EXTRA_KEY-class regressions).
- /api/product-catalog + /api/bootstrap responses contain no '_seed' leak.

Auth: x-probe-secret header must match RELAY_SHARED_SECRET (reuses existing
Vercel↔Railway internal trust boundary).

Probe logic is exported (checkProbe, checkPublicBoundary, DEFAULT_PROBES) for
hermetic testing. tests/seed-contract-probe.test.mjs covers every branch:
envelope pass/fail on field/records/shape, bare pass/fail on shape/field,
missing/malformed JSON, Redis non-2xx, boundary seed-leak detection,
DEFAULT_PROBES sanity (seed-meta invariant present, token-panels minRecords
guard present).

Usage:
  curl -H "x-probe-secret: $RELAY_SHARED_SECRET" \
       https://api.worldmonitor.app/api/seed-contract-probe

PR 3 will extend the probe with a stricter mode that asserts seed-meta:*
keys are GONE (not just bare) once legacy dual-write is removed.

Verification:
- tests/seed-contract-probe.test.mjs → 15/15 pass
- npm run test:data → 5338/5338 (was 5322; +16 new incl. conformance)
- npm run typecheck:all → clean

* fix(seed-contract): tighten probe — minRecords on AI/OTHER + cache-path source header

Review P2 findings: the probe's stated guards were weaker than advertised.

1. market:ai-tokens:v1 + market:other-tokens:v1 probes claimed to guard the
   token-panels extra-key RETRY regression but only checked shape='envelope'
   + dataHas:['tokens']. If an extra-key declareRecords regressed to 0, both
   probes would still pass because checkProbe() only inspects _seed.recordCount
   when minRecords is set. Now both enforce minRecords: 1.

2. /api/product-catalog boundary check only asserted no '_seed' leak — which
   is also true for the static fallback path. A broken cached reader
   (getFromCache returning null or throwing) could serve fallback silently
   and still pass this probe. Now:
   - api/product-catalog.js emits X-Product-Catalog-Source: cache|dodo|fallback
     on the response (the json() helper gained an optional source param wired
     to each of the three branches).
   - checkPublicBoundary declaratively requires that header's value match
     'cache' for /api/product-catalog, so a fallback-serve fails the probe
     with reason 'source:fallback!=cache' or 'source:missing!=cache'.

Test updates (tests/seed-contract-probe.test.mjs):
- Boundary check reworked to use a BOUNDARY_CHECKS config with optional
  requireSourceHeader per endpoint.
- New cases: served-from-cache passes, served-from-fallback fails with source
  mismatch, missing header fails, seed-leak still takes precedence, bad
  status fails.
- Token-panels sanity test now asserts minRecords≥1 on all 3 panels.

Verification:
- tests/seed-contract-probe.test.mjs → 17/17 pass (was 15, +2 net)
- npm run test:data → 5340/5340
- npm run typecheck:all → clean
2026-04-15 09:16:27 +04:00
Elie Habib
224d6fa2e3 fix(consumer-prices): count risers+fallers in movers recordCount (#3098)
* fix(consumer-prices): count risers+fallers in movers recordCount

Health endpoint reported consumerPricesMovers as EMPTY_DATA whenever the
30d window had zero risers, because recordCount's `??` chain in publish.ts
picks only one sibling array. Bipolar payloads (risers[] + fallers[]) need
the sum; otherwise a valid all-fallers payload registers as 0 records and
trips false staleness alarms.
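
The ??-chain bug is easy to reproduce: ?? only falls through on null/undefined, so an empty risers array (length 0, not nullish) stops the chain and the fallers are never counted. Payload shape below is illustrative:

```javascript
const snapshot = { risers: [], fallers: [{ item: 'milk' }, { item: 'eggs' }] };

// Pre-fix: 0 is not nullish, so the chain returns risers.length and ignores fallers.
const buggyCount = snapshot.risers?.length ?? snapshot.fallers?.length ?? 0;

// Post-fix: bipolar payloads need the sum of both sibling arrays.
const fixedCount = (snapshot.risers?.length ?? 0) + (snapshot.fallers?.length ?? 0);
```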

Fix both the authoritative publish job and the manual fallback seed script.

* fix(consumer-prices): floor movers recordCount at 1 + include essentialsSeries fallback

Addresses PR #3098 review:

1. All-flat markets (every sampled item unchanged) legitimately produce
   risers=[] AND fallers=[] from buildMoversSnapshot. Summing the two still
   yields 0 → health reports EMPTY_DATA for a valid snapshot. Floor at 1;
   advanceSeedMeta already gates writes on upstream freshness, so this
   can't mask an upstream-unavailable case.

2. Seed script's non-movers fallback was missing essentialsSeries, so
   basket-series payloads from the manual script reported recordCount=1
   instead of the series length. Align with publish.ts.

* fix(consumer-prices): force recordCount=0 for upstreamUnavailable placeholders

Addresses PR #3098 review: flooring movers at 1 in the manual fallback seeder
also floored the synthetic emptyMovers() placeholder (upstreamUnavailable=true)
the script writes when BASE_URL is unset or the upstream returns null. Since
writeExtraKeyWithMeta always persists seed-meta, that made a real outage read
green in api/health.js. Short-circuit upstreamUnavailable payloads to 0 so
the outage surfaces.
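
Combining the three commits' rules gives roughly this counting logic; a hedged sketch, not the actual publish.ts or seed-script code:

```javascript
// upstreamUnavailable placeholders must read as 0 so outages surface in
// health; real (possibly all-flat) snapshots are floored at 1.
function moversRecordCount(payload) {
  if (payload?.upstreamUnavailable) return 0;
  const n = (payload?.risers?.length ?? 0) + (payload?.fallers?.length ?? 0);
  return Math.max(1, n);
}
```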
2026-04-15 09:06:24 +04:00
Elie Habib
dc10e47197 feat(seed-contract): PR 1 foundation — envelope + contract + conformance test (#3095)
* feat(seed-contract): PR 1 foundation — envelope helpers + contract validators + static conformance test

Adds the foundational pieces for the unified seed contract rollout described in
docs/plans/2026-04-14-002-fix-runseed-zero-record-lockout-plan.md. Behavior-
preserving by construction: legacy-shape Redis values unwrap as { _seed: null,
data: raw } and pass through every helper unchanged.

New files:
- scripts/_seed-envelope-source.mjs — single source of truth for unwrapEnvelope,
  stripSeedEnvelope, buildEnvelope.
- api/_seed-envelope.js — edge-safe mirror (AGENTS.md:80 forbids api/* importing
  from server/).
- server/_shared/seed-envelope.ts — TS mirror with SeedMeta, SeedEnvelope,
  UnwrapResult types.
- scripts/_seed-contract.mjs — SeedContractError + validateDescriptor (10
  required fields, 10 optional, unknown-field rejection) + resolveRecordCount
  (non-negative integer or throw).
- scripts/verify-seed-envelope-parity.mjs — diffs function bodies between the
  two JS copies; TS copy guarded by tsc.
- tests/seed-envelope.test.mjs — 14 tests for the three helpers (null,
  legacy-passthrough, stringified JSON, round-trip).
- tests/seed-contract.test.mjs — 25 tests for validateDescriptor/
  resolveRecordCount + a soft-warn conformance scan that STATICALLY parses
  scripts/seed-*.mjs (never dynamic import — several seeders process.exit() at
  module load). Currently logs 91 seeders awaiting declareRecords migration.

Wiring (minimal, behavior-preserving):
- api/health.js: imports unwrapEnvelope; routes readSeedMeta's parsed value
  through it. Legacy meta has no _seed wrapper → passes through unchanged.
- scripts/_bundle-runner.mjs: readSectionFreshness prefers envelope at
  section.canonicalKey when present, falls back to the existing
  seed-meta:<key> read via section.seedMetaKey (unchanged path today since no
  bundle defines canonicalKey yet).

No seeder modified. No writes changed. All 5279 existing data tests still
green; both typechecks clean; parity verifier green; 39 new tests pass.

PR 2 will migrate seeders, bundles, and readers to envelope semantics. PR 3
removes the legacy path and hard-fails the conformance test.

* fix(seed-contract): address PR #3095 review — metaTtlSeconds opt, bundle fallback, strict conformance mode

Review findings applied:

P1 — metaTtlSeconds missing from OPTIONAL_FIELDS whitelist.
scripts/seed-jodi-gas.mjs:250 passes metaTtlSeconds to runSeed(); field is
consumed by _seed-utils writeSeedMeta. Without it in the whitelist, PR 2's
validateDescriptor wiring would throw 'unknown field' the moment jodi-gas
migrates. Added with a 'removed in PR 3' note.

P2 — Bundle canonicalKey short-circuit over-runs during migration.
readSectionFreshness previously returned null if canonicalKey had no envelope
yet, even when a legacy seed-meta key was also declared — making every cron
re-run the section. Fixed to fall through to seedMetaKey on null envelope so
the transition state is safe.

P3 — Conformance soft-warn signal was invisible in CI.
tests/seed-contract.test.mjs now emits a t.diagnostic summary line
('N/M seeders export declareRecords') visible on every run and gates hard-fail
behind SEED_CONTRACT_STRICT=1 so PR 3 can flip to strict without more code.

Nitpick — parity regex missed 'export async function'.
Added '(?:async\s+)?' to scripts/verify-seed-envelope-parity.mjs function
extraction regex.

Verified: 39 tests green, parity verifier green, strict mode correctly
hard-fails with 91 seeders missing (expected during PR 1).

* fix(seed-contract): address review round 2 — NaN/empty-string validation, Error cause, parity CI wiring

P2 — Non-finite ttlSeconds/maxStaleMin bypassed validation.
`typeof NaN === 'number'` and `NaN > 0 === false` meant a NaN duration
passed the old typeof+<=0 checks and would have poisoned TTLs once
validateDescriptor is wired into runSeed. Now gated by Number.isFinite,
which rejects NaN and ±Infinity. Tests added for NaN/Infinity on both
fields.
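
The NaN gap is a plain JavaScript pitfall, reproducible in two lines (predicate names are illustrative):

```javascript
// typeof NaN === 'number' and NaN <= 0 === false, so the old check passes NaN.
const oldCheckPasses = (v) => typeof v === 'number' && !(v <= 0);
// Number.isFinite rejects NaN and ±Infinity in one call.
const newCheckPasses = (v) => Number.isFinite(v) && v > 0;
```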

P2 — Empty/whitespace-only strings for domain/resource/canonicalKey/sourceVersion
bypassed validation. Added .trim() === '' rejection + tests per field.
This mattered because canonicalKey='' would have landed writes at the
empty key and seed-meta under a blank resource namespace.

P3 — SeedContractError silently dropped the ES2022 Error cause option.
Constructor now forwards { cause } through super() so err.cause works
with standard tooling (Node's default stack printer, Sentry chained-cause
serialization). resolveRecordCount's manual err.cause = err assignment
was replaced with the options-bag form. Test added for both constructor
direct-use and the resolveRecordCount wrap path.
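
A minimal sketch of the options-bag forwarding; the real SeedContractError carries additional contract context not shown here:

```javascript
// Forwarding the options bag through super() preserves err.cause, so
// chained-cause tooling (stack printers, Sentry serializers) sees it.
class SeedContractError extends Error {
  constructor(message, options) {
    super(message, options);
    this.name = 'SeedContractError';
  }
}
```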

P3 — Parity verifier was not on an automated path. Added
tests/seed-envelope-parity.test.mjs which spawns scripts/verify-seed-envelope-parity.mjs
via execFile; non-zero exit (drift) → test fails. Now runs as part of
`npm run test:data` (tsx --test tests/*.test.mjs). Drift injection
confirmed: sed -i modifying api/_seed-envelope.js makes the test fail
with 'Command failed' from execFile.

51 tests total (was 39). All green on clean tree.

* fix(seed-contract): conformance test checks full descriptor, not just declareRecords

Previous conformance check green-lit any seeder that exported
declareRecords, even if the runSeed(...) call-site omitted other
validateDescriptor-required opts (validateFn, ttlSeconds, sourceVersion,
schemaVersion, maxStaleMin). That would have produced a false readiness
signal for PR 3's strict flip: test goes green, but wiring
validateDescriptor() into runSeed in PR 2 would still throw at runtime
across the fleet.

Examples verified on the PR head:
- scripts/seed-cot.mjs:188-192 — no sourceVersion/schemaVersion/maxStaleMin
- scripts/seed-market-breadth.mjs:121-124 — same
- scripts/seed-jodi-gas.mjs:248-253 — no schemaVersion/maxStaleMin

Now the conformance test:
1. AST-lite extracts the runSeed(...) call site with balanced parens,
   tolerating strings and comments.
2. Checks every REQUIRED_OPTS_FIELDS entry (validateFn, declareRecords,
   ttlSeconds, sourceVersion, schemaVersion, maxStaleMin) is present as
   an object key in that call-site.
3. Emits a per-file diagnostic listing missing fields.
4. Migration signal is now accurate: 0/91 seeders fully satisfy the
   descriptor (the old check, looking only at declareRecords, reported
   0/91 as missing). Matches
   the underlying validateDescriptor behavior.

Verified: strict mode (SEED_CONTRACT_STRICT=1) surfaces 'opt:schemaVersion,
opt:maxStaleMin' as missing fields per seeder — actionable for PR 2
migration work. 51 tests total (unchanged count; behavior change is in
which seeders the one conformance test considers migrated).

* fix(seed-contract): strip comments/strings before parsing runSeed() call site

The conformance scanner located the first 'runSeed(' substring in the raw
source, which caught commented-out mentions upstream of the real call.
Offending files where this produced false 'incomplete' diagnoses:
- scripts/seed-bis-data.mjs:209 // runSeed() calls process.exit(0)…
  real call at :220
- scripts/seed-economy.mjs:788 header comment mentioning runSeed()
  real call at :891

Three files had the same pattern. Under strict mode these would have been
false hard failures in PR 3 even when the real descriptor was migrated.

Fix:
- stripCommentsAndStrings(src) produces a view where block comments, line
  comments, and string/template literals are replaced with spaces (line
  feeds preserved). Indices stay aligned with the original source so
  extractRunSeedCall can match against the stripped view and then slice
  the original source for the real call body.
- descriptorFieldsPresent() also runs its field-presence regex against
  the stripped call body so '// TODO: validateFn' inside the call doesn't
  fool the check.
- hasRunSeedCall() uses the stripped view too, which correctly excludes
  5 seeders that only mentioned runSeed in comments. Count dropped
  91→86 real callers.
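
An index-preserving stripper in the same spirit can be sketched with a single regex pass. The commit's real implementation is a character scanner; this regex version is a simplified illustration and does not handle regex literals or nested template interpolation:

```javascript
// Block comments, line comments, and string/template literals are replaced
// char-for-char with spaces (newlines kept), so indices into the stripped
// view map 1:1 onto the original source.
function stripCommentsAndStrings(src) {
  return src.replace(
    /\/\*[\s\S]*?\*\/|\/\/[^\n]*|'(?:\\.|[^'\\\n])*'|"(?:\\.|[^"\\\n])*"|`(?:\\.|[^`\\])*`/g,
    (m) => m.replace(/[^\n]/g, ' ')
  );
}
```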

Added 4 targeted tests covering:
- runSeed() inside a line comment ahead of the real call
- runSeed() inside a block comment
- runSeed() inside a string literal ("don't call runSeed() directly")
- descriptor field names inside an inline comment don't count as present

Verified on the actual files: seed-bis-data.mjs first real runSeed( in
stripped source is at line 220 (was line 209 before fix).

40 tests total, all green.

* fix(seed-contract): parity verifier survives unbalanced braces in string/template literals

Addresses Greptile P2 on PR #3095: the body extractor in
scripts/verify-seed-envelope-parity.mjs counted raw { and } on every
character. A future helper body that legitimately contains
`const marker = '{'` would have pushed depth past zero at the literal
brace and truncated the body — silently masking drift in the rest of
the function.

Extracted the scan into scanBalanced(source, start, open, close) which
skips characters inside line comments, block comments, and string /
template literals (with escape handling and template-literal ${} recursion
for interpolation). Call sites in extractFunctions updated to use the new
scanner for both the arg-list parens and the function body braces.

Made extractFunctions and scanBalanced exported so the new test file
can exercise them directly. Gated main() behind an isMain check so
importing the module from tests doesn't trigger process.exit.

New tests in tests/seed-envelope-parity.test.mjs:
- extractFunctions tolerates unbalanced braces in string literals
- same for template literals
- same for braces inside block comments
- same for braces inside line comments
- scanBalanced respects backslash-escapes inside strings
- scanBalanced recurses into template-literal ${} interpolation

Also addresses the other two Greptile P2s which were already fixed in
earlier commits on this branch:
- Empty-string gap (99646dd9a): .trim()==='' rejection added
- SeedContractError cause drop (99646dd9a): constructor forwards cause
  through super's options bag per the ES2022 Error cause spec

61 tests green. Both typechecks clean.
2026-04-14 22:11:56 +04:00
Sebastien Melki
e39ffc3c3b feat(analytics): send sub status & planKey with Umami identity (#3093)
* feat(analytics): send subscription status and planKey with Umami identity (#3092)

Enhance identifyUser() to include subStatus and planKey from billing
data. initAuthAnalytics() now subscribes to both auth and billing
changes, re-identifying when subscription data arrives via Convex.
Also identify all authenticated users, not just pro.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: store billing unsubscribe + add destroyAuthAnalytics teardown

Address Greptile P2 review: store the billing unsubscribe for symmetry
with _unsubAuth, and add destroyAuthAnalytics() so both listeners can
be properly cleaned up.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(analytics): reset _lastSub on sign-out to prevent cross-user leak

When user A signs out and user B signs in, the auth callback fires
first with user B's id while _lastSub still holds A's subscription.
identifyUser would then send (userB.id, userB.role, userA.status,
userA.planKey) to Umami until billing's Convex update landed.

Clear _lastSub in _syncIdentity's signed-out branch so the next
sign-in starts clean — subStatus/planKey are simply omitted (both
guards use != null) until billing delivers the new sub.

* fix(analytics): reset _lastSub on direct user switch, not just sign-out

App.ts:857-864 supports a direct user A -> user B account switch without
an intermediate null auth state. The previous fix only cleared _lastSub
inside _syncIdentity()'s sign-out branch, so on a direct switch the auth
subscriber fired with user B while _lastSub still held user A's
subscription, leaking A's status/planKey into B's Umami identify call.

Detect user-id change in the auth subscriber and drop _lastSub before
calling _syncIdentity(). Sign-out reset inside _syncIdentity is kept as
belt-and-suspenders for the no-auth path.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Elie Habib <elie.habib@gmail.com>
2026-04-14 21:33:12 +04:00
Elie Habib
9b180d6ee2 fix(bundle-runner): wall-time budget to prevent Railway 10min SIGKILL (#3094)
* fix(bundle-runner): enforce wall-time budget to prevent Railway 10min SIGKILL

Railway cron services SIGKILL the container at 10min. When a bundle
happened to have two heavy sections due in the same tick (e.g.
PW-Main + PW-Port-Activity with timeoutMs totaling 15min+), the second
section's stdout never flushed and Railway marked the run as crashed —
even though earlier sections published successfully.

- _bundle-runner.mjs: add maxBundleMs budget (default 9min, 60s headroom
  under Railway's 10min ceiling). Sections whose worst-case timeout would
  exceed the remaining budget are deferred to the next tick with a clear
  log line. Summary now reports ran/skipped/deferred/failed.
- seed-bundle-portwatch.mjs: lower PW-Port-Activity timeoutMs 600s→420s
  so a single section can no longer consume the entire budget.
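
The budget gate reduces to a few lines (names and shapes illustrative); note the follow-up fix in this same PR makes maxBundleMs opt-in, defaulting to Infinity:

```javascript
// Defer any section whose worst-case timeout would overrun the remaining
// wall-time budget; deferred sections are picked up on the next cron tick.
function planSections(sections, maxBundleMs) {
  const start = Date.now();
  const ran = [], deferred = [];
  for (const s of sections) {
    const elapsed = Date.now() - start;
    if (elapsed + s.timeoutMs > maxBundleMs) {
      deferred.push(s.name);
    } else {
      ran.push(s.name); // section would execute here
    }
  }
  return { ran, deferred };
}
```

With a 540_000 ms budget and two 600_000 ms sections, both defer on the very first tick, which is exactly the P1 that motivated the opt-in default.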

Observed on 2026-04-14 16:03 UTC portwatch run: PW-Disruptions +
PW-Main ran cleanly, PW-Port-Activity started with ~9m37s of Railway
budget and its 10min execFile timeout, got SIGKILL'd before any output
flushed, job marked as crash.

* fix(bundle-runner): make maxBundleMs opt-in to avoid deferring other bundles

Greptile PR review flagged P1: default maxBundleMs=540_000 silently
applied to all runBundle callers. At least 12 sections across 7 other
bundles (energy-sources, climate, resilience, resilience-validation,
imf-extended, static-ref, health) have timeoutMs >= 540_000, which
means 0 + 600_000 > 540_000 is true on every first tick — those
sections would be permanently deferred with no alarm.

Default to Infinity; portwatch opts in via { maxBundleMs: 540_000 }.
Other Railway-constrained bundles can opt in as their timeouts are
audited.
2026-04-14 21:08:40 +04:00
Elie Habib
51c9d2c95f fix(seed-forecasts): pipeline timeout 10s→45s + BATCH_SIZE 10→5 (#3090)
* fix(seed-forecasts): pipeline timeout 10s→45s + BATCH_SIZE 10→5

Root cause (validated via STRLEN probe + Railway log): readInputKeys()
batched GETs against Upstash REST /pipeline deterministically timed out
at the 10s budget. ~40 input keys totaling ~2.27 MB; top 5 keys (ucdp
657KB + chokepoints 500KB + cyber-threats 390KB + commodities 192KB +
gpsjam 174KB) = 90% of payload. Worst-case co-located batch at
BATCH_SIZE=10 was ~1.55 MB; at the observed Upstash REST slow-spike floor
(~100 KB/s, implied by the failure pattern), 1.55 MB needs ~16s, exceeding
the 10s budget.

Production proof — Railway log 2026-04-14 10:01 UTC:
  Reading input data from Redis...
  Retry 1/2 in 1000ms: The operation was aborted due to timeout
  ... 12 consecutive abort-timeouts (4 outer × ~3 inner) ...
  FETCH FAILED: The operation was aborted due to timeout
  === Failed gracefully (188070ms) ===

Fix:
  BATCH_SIZE 10 → 5    (reduces probability of tail co-location)
  timeout    10s → 45s (2.4× headroom at observed floor)

Round-trips 4 → 8; per-batch overhead ~150-500ms total amortized by
undici keep-alive. Negligible vs hourly cadence.

What this PR does NOT do (5-agent deepen-plan review caught these):
  - Does NOT remove input keys. Initial draft proposed dropping 3
    "stub" keys. All 3 are LIVE: producers traced to seed-insights,
    seed-conflict-intel, and seed-forecasts itself (L14919 — self-
    referential EMA windows state key). Zero-byte STRLEN snapshot
    caught inter-cycle gaps, not dead keys. Removing reads would
    break newsDigest, acledEvents, and EMA windows.
  - Does NOT bump api/health.js maxStaleMin. Right fix = make read
    succeed, not widen alarm.
  - Does NOT extract shared batchedPipelineGet helper. Tracked.

Latent sibling bugs (separate PRs per feedback_no_pr_pollution):
  - seed-cross-source-signals.mjs:163 (15s, 23 keys)
  - seed-correlation.mjs:26 (10s, 9 market keys)
  - seed-energy-spine.mjs:71 (30s, 300 cmds/batch)
  - seed-resilience-scores.mjs:73 (30s, BATCH=50 writes)

Plan: docs/plans/2026-04-14-001-fix-seed-forecasts-pipeline-timeout-plan.md
Skill: ~/.claude/skills/upstash-pipeline-payload-timeout/SKILL.md

Tests: node --check + typecheck + typecheck:api clean.

* docs(seed-forecasts): drop personal-machine path from comment

Architecture-strategist review on PR #3090 caught the same anti-pattern
flagged on PR #3088: the inline comment referenced ~/.claude/skills/...
which only resolves on the author's machine. Replaced with a self-contained
"diagnostic methodology" paragraph so the rationale is portable and
contributors on CI / other machines see complete context.

No code change.

* docs(seed-forecasts): correct comment math + add reorder follow-up note (Greptile P2)

Greptile review on PR #3090 caught: BATCH_SIZE divides the keys array
deterministically by index, so the worst batch is FIXED by array order
(not random co-location as my comment implied). Verified live with
STRLEN: batch 2 (indices 5-9 = chokepoints + iran + ucdp + unrest +
outages) is 1.17 MB, not the 1.9 MB worst-random-case I claimed.

Updated comment to reflect:
  - Actual deterministic worst-case batch (1.17 MB).
  - Headroom recalc: 1.17 MB at 100 KB/s = ~12s; 45s gives 3.7× margin.
  - Architectural insight as follow-up: interleaving heavies (chokepoints
    + ucdp) with smalls in the keys array would split the deterministic
    worst-case across two batches, halving per-request payload. Tracked
    for a future PR (no PR pollution).

No code change; comment-only correction.
2026-04-14 15:27:40 +04:00
Elie Habib
5152ed7a06 fix(spr-policies): wire seed-spr-policies into seed-bundle-energy-sources (#3089)
scripts/seed-spr-policies.mjs exists as a runnable seeder with proper isMain
guard, but no Railway service ever invokes it. /api/health shows
sprPolicies as EMPTY with seedAgeMin: null — energy:spr-policies:v1 has
literally never been written to Redis.

Wired it as a bundle entry alongside the other energy seeders. Cadence:
weekly. Static-registry data (scripts/data/spr-policies.json) only needs to
run once after deploys + restarts to populate the key; the 400d
maxStaleMin in api/health.js confirms intent. 60s timeout is generous
for a JSON-file read + Redis write.

Tests: node --check on the bundle clean; npm run typecheck clean.
2026-04-14 13:29:35 +04:00
Elie Habib
6fdd9d8440 fix(commodity-quotes): move .then() block to opts.afterPublish — resurrect 3 dead Redis writes (#3088)
* fix(commodity-quotes): move .then() block to opts.afterPublish — 3 dead Redis writes resurrected

runSeed() ends with process.exit(0) on success, terminating the Node
process before any chained .then() microtask runs. The block at
seed-commodity-quotes.mjs:258-287 has been silently dead — its three
Redis writes never executed:

  - market:commodities:v1:<symbols>          (alias for canonical key)
  - market:quotes:v1:<symbols>               (with finnhubSkipped flags)
  - market:gold-extended:v1                  (cross-currency XAU + drivers)

Production proof from Railway log 2026-04-14 08:50:31:
  === market:commodities Seed ===
  [Yahoo] ^VIX: $18.64 (-2.51%)
  ... 33 Yahoo symbols ...
  Verified: data present in Redis
  === Done (8691ms) ===
  Starting Container          ← next cron; ZERO [Gold] log lines

Health endpoint shows goldExtended as EMPTY with seedAgeMin: null — that
key has literally never been written.

Fix:
- Extract post-publish writes into writeCompanionKeys(data).
- Wire it via opts.afterPublish, which IS awaited inside runSeed BEFORE
  process.exit. The companion writes will now actually run.
- Wrap the call in try/catch so companion-key failures log explicitly
  rather than masking the canonical write success.
- Remove the now-redundant module-level seedData variable; afterPublish
  receives the canonical data as its first argument.

Effect: next cron cycle (~5 min) writes all 3 keys for the first time.
goldExtended health flips from EMPTY (seedAgeMin: null) to OK.

Tests: node --check syntax clean; npm run typecheck clean. The fix is
structural — verified by the runSeed contract in _seed-utils.mjs ~L792-794
which awaits afterPublish before process.exit.

* fix(commodity-quotes): split required vs optional companion writes (Codex P1 #3088)

Codex P1 review caught: the catch-all wrapper around writeCompanionKeys
masked Upstash failures on the REQUIRED alias keys (market:commodities:v1
and market:quotes:v1). A transient Upstash 5xx mid-write would log a
warning and return — runSeed would then write canonical seed-meta as fresh
and exit 0, leaving the alias keys stale or missing with no health signal.
That defeats the entire point of this PR: the canonical key would look
healthy while the resurrected companions silently failed again.

Fix: split into two functions with different error semantics.

writeRequiredCompanionKeys(data):
  - Both alias-key writes propagate errors. If either Upstash write fails,
    the exception bubbles to runSeed's outer try/catch, lock is released,
    seed-meta is NOT stamped fresh, and the outer .catch fires
    process.exit(1). Health correctly flags STALE_SEED on next /api/health
    poll.

writeOptionalGoldExtended():
  - Yahoo XAU fetch + writeExtraKeyWithMeta wrapped in its own try/catch.
    Yahoo flakiness only degrades the gold-extended panel (which has its
    own seed-meta key that goes stale independently); the canonical
    commodity publish stays healthy.

Per Codex's own suggestion: only the OPTIONAL gold-extended branch is
downgraded to a warning. Required writes are not.

Tests: node --check + typecheck clean.

* fix(commodity-quotes): address Greptile P2 — parallelize alias writes, inline JSDoc

Two P2 findings on PR #3088:

1. Sequential alias-key writes — independent Redis writes were awaited one
   after the other. Wrapped in Promise.all to halve latency on every seed
   cycle. No read-after-write ordering required.

2. Local-machine path in JSDoc (~/.claude/skills/...) was unresolvable for
   other contributors. Inlined the key facts (background, error semantics,
   parallelization rationale) directly into the JSDoc so the context
   survives on any workstation or CI runner. Also dropped a stale
   duplicate JSDoc block that was left over from the prior refactor.

Tests: node --check + typecheck clean.
2026-04-14 13:29:05 +04:00
Elie Habib
56103684c6 fix(seed-utils): payloadBytes>0 fallback for runSeed recordCount auto-detect (#3087)
* fix(seed-utils): payloadBytes>0 fallback for runSeed recordCount auto-detect

Phantom EMPTY_DATA in /api/health: 16 of 21 failing health checks were
caused by seeders publishing custom payload shapes without passing
opts.recordCount. The auto-detect chain in runSeed only matches a hardcoded
list of shapes; anything else falls through to recordCount=0 and triggers
EMPTY_DATA in /api/health even though the payload is fully populated and
verified in Redis.

Smoking-gun log signature from Railway 2026-04-14:
  [BLS-Series] recordCount:0, payloadBytes:6093, Verified: data present
  [VPD-Tracker] recordCount:0, payloadBytes:3068853, Verified: data present
  [Disease-Outbreaks] recordCount:0, payloadBytes:92684, Verified: data present

Fix:
- Extract recordCount logic into pure exported computeRecordCount() for
  unit testability.
- Add payloadBytes>0 → 1 fallback at the end of the resolution chain. When
  triggered, console.warn names the seeder so the author can add an
  explicit opts.recordCount for accurate dashboards.
- Resolution order unchanged for existing callers: opts.recordCount wins,
  then known-shape auto-detect, then the new payloadBytes fallback, then 0.
  Explicit opts.recordCount=0 still wins (test covers it).
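
The resolution chain reads roughly like this — an illustrative
reconstruction; the real computeRecordCount in _seed-utils.mjs matches
more payload shapes than shown:

```javascript
function computeRecordCount(opts, data, payloadBytes) {
  // 1. Explicit opts.recordCount always wins — including an explicit 0.
  if (opts.recordCount != null) return opts.recordCount;
  // 2. Known-shape auto-detect (abbreviated list here).
  const detectedFromShape =
    Array.isArray(data) ? data.length
    : Array.isArray(data?.events) ? data.events.length
    : Array.isArray(data?.items) ? data.items.length
    : undefined;
  // A real 0 from a known shape (e.g. events: []) must NOT fall through.
  if (detectedFromShape != null) return detectedFromShape;
  // 3. New fallback: non-empty payload of unknown shape counts as 1,
  //    with a warning so the seeder author adds an explicit count.
  if (payloadBytes > 0) {
    console.warn('recordCount fallback: pass opts.recordCount explicitly');
    return 1;
  }
  return 0;
}
```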

Effect: clears 16 phantom CRITs on the next bundle cycle. Per-seeder warns
will surface in logs so we can add accurate opts.recordCount in follow-up.

Tests: 11 new computeRecordCount cases (opts precedence, auto-detect shapes,
fallback behavior, no-spurious-warn, explicit-zero precedence).
seed-utils.test.mjs 18/18 + seed-utils-empty-data-failure.test.mjs 2/2 +
typecheck clean.

* test(seed-utils): address Greptile P2 — replace it.each mutation, add empty-known-shape edge case

Greptile review on PR #3087 caught two minor test issues:

1. `it.each = undefined` mutated the imported `it` function (ES module
   live binding). Replaced with a plain comment.

2. Missing edge case: `data: { events: [] }` with payloadBytes > 0 should
   NOT trigger the payloadBytes fallback because detectedFromShape resolves
   to a real 0 (not undefined). Without this guard, a future regression
   could collapse the !=null check and silently mask genuine empty
   upstream cycles as "1 record". Test added.

Tests: 19/19 (was 18). No production code change.
2026-04-14 13:28:00 +04:00
Sebastien Melki
3314db2664 fix(relay): treat quiet hours start === end as disabled, not 24/7 (#3061) (#3066)
* fix(relay): treat quiet hours start === end as disabled, not 24/7 (#3061)

When quietHoursStart equalled quietHoursEnd, the midnight-spanning
branch evaluated `hour >= N || hour < N` which is true for all hours,
silently suppressing all non-critical alerts permanently.

Add an early return for start === end in the relay and reject the
combination in Convex validation.

Closes #3061

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: cross-check quiet hours start/end against persisted value on single-field updates

Addresses Greptile review: validateQuietHoursArgs only caught start===end
when both arrived in the same call. Now the mutation handlers also check
against the DB record to prevent sequential single-field updates from
creating a start===end state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: gate quiet hours start===end check on effectiveEnabled

Only enforce the start !== end invariant when quiet hours are effectively
enabled. This allows users with legacy start===end records to disable
quiet hours, change timezone/override, or recover from old bad state
without getting locked out.

Addresses koala73's P1 review feedback on #3066.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(relay): extract quiet-hours + consolidate equality check, add tests

- Move isInQuietHours/toLocalHour to scripts/lib/quiet-hours.cjs so they
  are testable without importing the full relay (which has top-level
  side effects and env requirements).
- Drop the unconditional start===end check from validateQuietHoursArgs;
  the effectiveEnabled-guarded check in setQuietHours /
  setQuietHoursForUser is now the single source of truth. Previously a
  user disabling quiet hours with start===end would be rejected even
  though the values are irrelevant when disabled.
- Add tests/quiet-hours.test.mjs covering: disabled, start===end
  regression (#3061), midnight-spanning window, same-day window,
  inclusive/exclusive bounds, invalid timezone, timezone handling,
  defaults.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Elie Habib <elie.habib@gmail.com>
2026-04-14 13:14:53 +04:00
Elie Habib
29d39462e1 fix(crypto-quotes): use CoinPaprika as primary, CoinGecko as fallback (#3086)
* fix(crypto-quotes): use CoinPaprika as primary, CoinGecko as fallback

Railway bundle log 2026-04-14 07:17:10 UTC showed seed-bundle-market-backup
finishing with failed:1. Crypto-Quotes hit CoinGecko 429s on every retry:

  [Crypto-Quotes] CoinGecko 429 — waiting 10s (1/5)
  ... (5 attempts, 10/20/30/40/50s back-off)
  [Crypto-Quotes] Crypto-Quotes failed after 120.0s: timeout

Root cause: CoinGecko 5-step retry budget (10+20+30+40+50 = 150s) exceeds
the bundle 120s section timeout, so the existing CoinPaprika fallback
never runs — the child process is killed mid-retry.

Fix: swap source order. CoinPaprika is now primary; CoinGecko is retained
as fallback for sparkline_in_7d data (CoinPaprika does not provide
sparklines). Probed CoinPaprika live: all 10 mapped crypto IDs present in
/v1/tickers, no auth required.

Trade-off: when CoinPaprika is healthy, sparkline arrays will be empty.
Acceptable — the panel already handles undefined sparklines, and the
alternative (no quotes at all because CoinGecko is rate-limited) is worse.

Tests: crypto-config.test.mjs 6/6, typecheck + typecheck:api clean.

* fix(crypto-quotes): require full coverage in validate() — no partial snapshots

Codex review on PR #3086 caught: validate() only required >=1 quote with
positive price. With the new CoinPaprika-primary path, a single dropped or
renamed ticker would silently publish a 9/10 snapshot. Health stays green
while one tracked asset disappears from the panel — exactly the silent
data-loss class we want to avoid on a fixed-cardinality top-10 feed.

Tightened validate() to require:
- quotes.length === CRYPTO_IDS.length (full cardinality)
- every quote has Number.isFinite(price) && price > 0
- every configured symbol is present in the response (defends against
  duplicate IDs masquerading as full coverage)
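
Sketch of the tightened gate — CRYPTO_IDS is truncated to three entries
here for illustration (the real list is the fixed top-10), and the quote
shape is assumed:

```javascript
const CRYPTO_IDS = [
  { id: 'btc-bitcoin', symbol: 'BTC' },
  { id: 'eth-ethereum', symbol: 'ETH' },
  { id: 'sol-solana', symbol: 'SOL' },
];

function validate(quotes) {
  if (quotes.length !== CRYPTO_IDS.length) return false;       // full cardinality
  if (!quotes.every((q) => Number.isFinite(q.price) && q.price > 0)) return false;
  const seen = new Set(quotes.map((q) => q.symbol));
  return CRYPTO_IDS.every((c) => seen.has(c.symbol));          // no duplicate-masking
}
```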

When the validator rejects, runSeed() takes the skipped path: existing
TTL is extended, seed-meta is bumped with count=0, and the Railway log
will scream which symbol is missing on the next cycle so the broken
CoinPaprika mapping is caught immediately.

Tests: crypto-config.test.mjs 6/6, typecheck clean.

* fix(crypto-quotes): address Greptile P2 — fallback retry budget + sourceVersion

P2-1: CoinGecko fallback was still wired with maxAttempts=5 (10+20+30+40+50
= 150s budget), so when CoinPaprika fails the fallback path could itself
overrun the bundle's 120s section timeout — recreating the exact failure
mode this PR fixes. Capped at maxAttempts=2 (10+20=30s) so the fallback
always finishes well within the bundle window.

P2-2: sourceVersion in seed-meta was still 'coingecko-markets' even though
CoinPaprika is now primary. Changed to 'coinpaprika-tickers+coingecko-fallback'
so health dashboards and on-call runbooks see the real data path.

Tests: crypto-config.test.mjs 6/6, typecheck clean.
2026-04-14 12:43:11 +04:00
Elie Habib
b92b22bd20 fix(fuel-prices): tolerate Brazil ANP failure to stop Railway crash-loop (#3085)
* fix(fuel-prices): tolerate Brazil ANP failure to stop Railway crash-loop

Brazil gov.br is structurally unreachable from Railway IPs:
- Decodo proxy 403s all .gov.br CONNECTs by policy
- Direct fetch fails undici TLS handshake from Railway egress

After PR #3082 tightened the publish gate to require zero failed sources,
every run exits 1 -> Railway "Deployment crashed" banner + STALE_SEED.

Add TOLERATED_FAILURES = {'Brazil'}; validateFuel ignores tolerated names
when checking failedSources. Critical regions (US/GB/MY) and the >=30
country floor still gate publish. Brazil's outage stays visible via the
existing [FRESHNESS] log.
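
A sketch of the gate, with names taken from this message and the country
shape assumed:

```javascript
const TOLERATED_FAILURES = new Set(['Brazil']);
const CRITICAL = ['US', 'GB', 'MY'];

function validateFuel(countries, failedSources) {
  // Only untolerated failures block publish; Brazil's outage is logged
  // elsewhere but does not reject the snapshot.
  const untolerated = failedSources.filter((s) => !TOLERATED_FAILURES.has(s));
  if (untolerated.length > 0) return false;
  if (countries.length < 30) return false;             // country floor
  const codes = new Set(countries.map((c) => c.code));
  return CRITICAL.every((c) => codes.has(c));          // critical regions fresh
}
```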

* fix(fuel-prices): rotate :prev on tolerated-only failures to keep WoW fresh

Reviewer catch: after tolerating Brazil, allSourcesFresh stays false forever
→ :prev never rotates → panel's WoW stretches into 2-week, 3-week, ... deltas
for every non-Brazil country while still labeled 'week-over-week'.

Gate :prev rotation on untolerated failures only. Tolerated sources are
absent from the snapshot entirely, so rotating is safe (no stale-self-
compare poisoning next week).

* fix(fuel-prices): distinguish tolerated vs untolerated sources in [DEGRADED] log

Greptile P2: the [DEGRADED] message said 'publish will be rejected' even
when only tolerated sources (Brazil) failed — confusing for operators
watching Railway logs.
2026-04-14 12:36:07 +04:00
Elie Habib
16d868bd6d fix(comtrade): retry on transient 5xx to stop silent reporter drops (#3084)
* fix(comtrade): retry on transient 5xx to stop silent reporter drops

Railway log 2026-04-14 bilateral-hs4 run: India (699) hit HTTP 503 on both
batches and was dropped entirely from the snapshot. Iran (364) hit 500
mid-batch. All three Comtrade seeders (bilateral-hs4, trade-flows,
recovery-import-hhi) retried only on 429; any 5xx = silent coverage gap.

Adds bounded 5xx retry (3 attempts, 5s then 15s backoff) in each seeder.
On giveup caller returns empty so resume cache picks the reporter up next
cycle. Exports isTransientComtrade + fetchBilateral for unit tests; 6 new
tests pin the contract.

* fix(comtrade): collapse 429+5xx into single classification loop (PR review)

Reviewer caught that the 429 branch bypassed the 5xx retry path: a 429 ->
503 sequence would return [] immediately after the 429-retry without
consuming any transient-5xx retries, leaving the silent-drop bug intact
for that specific sequence.

Both seeders now use a single while-loop that reclassifies each response:
- 429 (once, with full backoff)
- 5xx (up to 2 retries with 5s/15s or 5s/10s backoff)
- anything else -> break and return
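
The loop's shape, as a hedged sketch with fetchPage and sleep injected
(the real seeders differ in backoff constants and response parsing):

```javascript
async function fetchWithRetry(fetchPage, sleep) {
  let used429 = false;
  let transientRetries = 0;
  const TRANSIENT_BACKOFF = [5_000, 15_000];
  while (true) {
    const res = await fetchPage();
    if (res.status === 429 && !used429) {          // 429: one full back-off
      used429 = true;
      await sleep(60_000);
      continue;
    }
    if (res.status >= 500 && transientRetries < TRANSIENT_BACKOFF.length) {
      await sleep(TRANSIENT_BACKOFF[transientRetries++]);  // 5xx: bounded retries
      continue;
    }
    if (res.status === 200) return res.rows;
    return [];   // give up: empty result, resume cache retries next cycle
  }
}
```

Because every response re-enters the same classifier, a 429 → 503
sequence now consumes the transient-5xx retries instead of bailing.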

Two new tests lock in the mixed case: 429 then 503 still consumes
transient retries; consecutive 429s cap at one wait. 8/8 pass.

* test(comtrade): inject sleep to drop retry-test runtime from 185s to 277ms

PR review flagged that the new mixed 429+5xx tests slept for the full
production backoffs (60s + 5s + 15s = 80s per test), making the unit
suite unnecessarily slow and CI-timeout-prone.

Add a module-local _retrySleep binding with __setSleepForTests(fn)
export. Production keeps the real sleep; tests swap in a no-op that
records requested delays. The sleepCalls array now pins the production
cadence so a future refactor that changes [60_000, 5_000, 15_000] has
to update the test too.

8/8 pass in 277ms (down from 185s).

* test(comtrade): update 60s-on-429 static-analysis regex for _retrySleep alias

The existing substring check 'sleep(60_000)' broke after the previous
commit renamed production calls to _retrySleep(60_000) for test injection.
Widen the regex to accept either the bare or injectable form; both
preserve the 60s production cadence.

* test(comtrade): extend retry coverage to trade-flows + recovery-import-hhi

Three P2 review findings addressed:

1. partnerCode 000 in the succeeds-on-third test was changed to real code
   156. groupByProduct() filters 0/000 downstream, so the test was passing
   while the user-visible seeder would still drop the row.
2. trade-flows and recovery-import-hhi had no unit coverage for their new
   retry state machines. Adds 7 tests covering succeed-first, retry-then-
   succeed, giveup-after-3, and mixed 429+5xx classification.
3. Both seeders now expose __setSleepForTests + export their fetch helper.
   seed-trade-flows also gets an isMain guard so tests import without
   triggering a real seed run. sleepCalls asserts pin the production
   cadence.

15 retry tests pass in 183ms. Full suite 5198/5198.

* fix(trade-flows): per-reporter coverage gate blocks full-reporter flatline

PR #3084 review P1: the existing MIN_COVERAGE_RATIO=0.70 gate was
global-only. 6 reporters × 5 commodities = 30 pairs; losing one entire
reporter (e.g. the India/Taiwan silent-drop this PR is trying to stop)
is only 5/30 missing = 83% global coverage, which passed.

Adds a per-reporter coverage floor: each reporter must have ≥40% of
its commodities populated (2 of 5). Global gate kept as the broad-
outage catch; per-reporter gate catches the single-reporter flatline.

Extracts checkCoverage() as a pure function for unit testing — mocking
30+ fetches in fetchAllFlows is fragile, and the failure mode lives in
the gate, not the fetcher.
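
An illustrative version of the two-level gate (the pair shape is an
assumption; the real checkCoverage returns a per-reporter breakdown):

```javascript
function checkCoverage(pairs, reporters, commoditiesPerReporter,
                       { globalFloor = 0.70, perReporterFloor = 0.40 } = {}) {
  const total = reporters.length * commoditiesPerReporter;
  // Broad-outage catch: overall coverage floor.
  if (pairs.length / total < globalFloor) return { ok: false, reason: 'global' };
  // Single-reporter flatline catch: each reporter needs its own floor.
  for (const r of reporters) {
    const have = pairs.filter((p) => p.reporter === r).length;
    if (have / commoditiesPerReporter < perReporterFloor) {
      return { ok: false, reason: `reporter:${r}` };
    }
  }
  return { ok: true };
}
```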

6 new tests cover: 30/30 ok; India flatline → reject at 83% global;
Taiwan flatline; broad outage → reject via global gate; healthy
80% global with 4/5 per-reporter → ok; per-reporter breakdown shape.

5204/5204 tests pass.
2026-04-14 12:29:17 +04:00
Elie Habib
7e7ca70faf fix(fuel-prices): resilient seeder — proxy, retry, stale-carry-forward, strict gate (#3082)
* fix(fuel-prices): resilient seeder — proxy, retry, stale-carry-forward, strict gate

Addresses 2026-04-07 run where 4 of 7 sources failed (NZ 403, BR/MX fetch failed)
and the seeder silently published 30 countries with Brazil/Mexico/NZ vanishing
from the UI.

- Startup proxy diagnostic so PROXY_URL misconfigs are immediately visible.
- New fetchWithProxyPreferred (proxy-first, direct fallback) + withFuelRetry
  (3 attempts, backoff) wrapping NZ/BR/MX upstream calls.
- Swap MX from dead datos.gob.mx to CRE publicacionexterna XML (13k stations).
- Stale-carry-forward failed sources from :prev snapshot (stale: true) instead
  of dropping countries; fresh-only ranking; skip WoW for stale entries.
- Gate :prev rotation on all-sources-succeeded so partial runs don't poison
  next week's WoW.
- Strict validateFn: >=25 countries AND US+GB+MY fresh. Prior gate was >=1.
- emptyDataIsFailure: true so a validation failure doesn't refresh seed-meta.
- Wrap imperative body in main() + isMain guard; export parseCREStationPrices
  and validateFuel; 9 new unit tests.

* fix(fuel-prices): remove stale-carry-forward, harden validator (PR review)

Reviewer flagged two P1s on the prior commit:

1. stale-carry-forward inserted stale: true rows into the published payload,
   but the proto schema and panel have no staleness render path. Users would
   see week-old BR/MX/NZ prices as current. Resilience turned into a
   freshness bug.
2. Validator counted stale-carried entries toward the floor. US/GB/MY fresh
   + 22 stale still passed, refreshing seed-meta.fetchedAt and leaving health
   operationally healthy indefinitely. Hid the outage.

Fix: remove stale-carry-forward entirely. Tighten validator to require
countries.length >= 30, US+GB+MY present, and failedSources.length === 0.
Partial-failure runs now rejected → 10-day cache TTL serves last healthy
snapshot → health STALE_SEED after maxStaleMin. Correct, visible signal.

Drops dead code: SOURCE_COUNTRY_CODES, staleCarried/freshCountries, stale
WoW skip. Tests updated for the failedSources gate.
2026-04-14 09:16:23 +04:00
Elie Habib
45c98284da fix(trade): correct UN Comtrade reporter codes for India and Taiwan (#3081)
* fix(trade): correct UN Comtrade reporter codes for India and Taiwan

seed-trade-flows was fetching India (356) and Taiwan (158) using UN M49
codes. UN Comtrade registers India as 699 and Taiwan as 490 ("Other
Asia, nes"), so every fetch silently returned count:0 — 10 of 30
reporter×commodity pairs yielded zero records per run. Live probe
confirms 699→500 India rows, 490→159 Taiwan rows.

- Update reporter codes in seed-trade-flows.mjs and its consumer
  list-comtrade-flows.ts.
- Update ISO2_TO_COMTRADE in _comtrade-reporters.ts and
  seed-energy-spine.mjs so energy-shock and sector-dependency RPCs
  resolve the correct Comtrade keys for IN/TW.
- Add IN/TW overrides to seed-comtrade-bilateral-hs4 and
  seed-recovery-import-hhi (they iterate shared/un-to-iso2.json which
  must remain pure M49 for other callers).
- Fix partner-dedupe bug in seed-trade-flows: the preview endpoint
  returns partner-level rows; keying by (flowCode, year) without
  summing kept only the last partner seen, so tradeValueUsd was a
  random counterparty's value, not the World aggregate. Sum across
  partners and label as World.
- Add a 70% coverage floor on reporter×commodity pairs so an entire
  reporter silently flatlining now throws in Phase 1 (TTL extend, no
  seed-meta refresh) rather than publishing a partial snapshot.
- Sync energy-shock test fixture.

* fix(trade): apply Comtrade IN/TW overrides to runtime consumers too

Follow-up to PR review: the seed-side fix was incomplete because two
request-time consumers still mapped iso2 → M49 (356/158) when hitting
Comtrade or reading the (now-rekeyed) seeded cache.

- server/worldmonitor/supply-chain/v1/_bilateral-hs4-lazy.ts: apply
  the IN=699 / TW=490 override when deriving ISO2_TO_UN, so the lazy
  bilateral-hs4 fetch path used by get-route-impact and
  get-country-chokepoint-index stops silently returning count:0 for
  India and Taiwan when the seeded cache is cold.
- src/utils/country-codes.ts: add iso2ToComtradeReporterCode helper
  with the override baked in. Keep iso2ToUnCode as pure M49 (used
  elsewhere for legitimate M49 semantics).
- src/app/country-intel.ts: switch the listComtradeFlows call on the
  country brief page to the new helper so IN/TW resolve to the same
  reporter codes the seeder now writes under.
2026-04-14 09:05:50 +04:00
Elie Habib
9d27ff0d6a fix(seeds): strict-floor validators must not poison seed-meta on empty (#3078)
* fix(seeds): strict-floor validators must not poison seed-meta on empty

When `runSeed`'s validateFn rejected (empty/short data), seed-meta was
refreshed with `fetchedAt=now, recordCount=0`. Bundle runners read
`fetchedAt` to decide skip — so one transient empty fetch locked the
IMF-extended bundle (30-day cadence) out for a full month.

Adds opt-in `emptyDataIsFailure` flag that skips the meta refresh on
validation failure, letting the bundle retry next cron fire and health
flip to STALE_SEED. Wires it on all four IMF/WEO seeders (floor 150-190
countries), which structurally can't have legitimate empty results.

Default behavior unchanged for quiet-period feeds (news, events) where
empty is normal.

Observed: Railway log 2026-04-13 18:58 — imf-external validation fail;
next fire 8h later skipped "483min ago / interval 43200min".

* test(seeds): regression coverage for emptyDataIsFailure branch

Static-analysis guard against the PR #3078 regression reintroducing itself:
- Asserts runSeed gates writeFreshnessMetadata on opts.emptyDataIsFailure
  and that extendExistingTtl still runs in both branches (cache preserved).
- Asserts the four strict-floor IMF seeders (external/growth/labor/macro)
  pass emptyDataIsFailure: true.

Prevents silent bundle-lockout if someone removes the gate or adds a new
strict-floor seeder without the flag.

* fix(seeds): strict-floor failure must exit(1) + behavioral test

P2 (surfacing upstream failures in bundle summary):
Strict-floor seeders with emptyDataIsFailure:true now process.exit(1)
after logging FAILURE. _bundle-runner's spawnSeed wraps execFile, so
non-zero exit rejects → failed++ increments → bundle itself exits 1.
Before: bundle logged 'Done' and ran++ on a poisoned upstream, hiding
30-day outages from Railway monitoring.

P3 (behavioral regression coverage, replacing static source-shape test):
Stubs globalThis.fetch (Upstash REST) + process.exit to drive runSeed
through both branches. Asserts on actual Redis commands:
- strict path: zero seed-meta SET, pipeline EXPIRE still called, exit(1)
- default path: exactly one seed-meta SET, exit(0)
Catches future regressions where writeFreshnessMetadata is reintroduced
indirectly, and is immune to cosmetic refactors of _seed-utils.mjs.

* test(seeds): regression for emptyDataIsFailure meta-refresh gate

Proves that validation failure with opts.emptyDataIsFailure:true does NOT
write seed-meta (strict-floor seeders) while the default behavior DOES
write count=0 meta (quiet-period feeds). Addresses PR #3078 review.
2026-04-14 09:02:21 +04:00
Elie Habib
1875531e2a fix(seed-economy): retry proxy/EIA transients; gate stress index on full FRED coverage (#3080)
* fix(seed-economy): retry proxy/EIA transients; fail stress index on missing FRED components

Log review of 16 runs (2026-04-14 00:45–04:31 UTC) showed 50% degraded:
- Decodo proxy flapped with HTTP 500/502/522, `fredFetchJson` fell back
  direct on first proxy error and FRED then returned 500 to Railway IP,
  dropping series.
- 5 EIA panels (EnergyPrices, Crude, NatGas, SPR, Refinery) timed out in
  lockstep at 01:00 and 02:00, producing full `no write` runs.
- StressIndex silently excluded missing FRED components (VIXCLS, T10Y3M,
  STLFSI4, ICSA), publishing a degraded composite as if healthy.

Changes:
- fredFetchJson: retry proxy 3x with jittered backoff on 5xx/522/timeout
  before falling back direct.
- eiaFetchJson helper: 20s timeout (was 10s) + 3x retry on 5xx/timeout;
  wired into all EIA call-sites.
- computeStressIndex: throw when any FRED-sourced component is missing;
  GSCPI (ais-relay) can still be absent. Caught in fetchAll so other
  secondary writes proceed but composite is not published degraded.
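
A sketch of the strict component gate — the composite here is a plain
average purely for illustration (the real weighting is not shown in this
message); only the required-component check mirrors the change:

```javascript
const REQUIRED_FRED = ['VIXCLS', 'T10Y3M', 'STLFSI4', 'ICSA'];

function computeStressIndex(components) {
  // Any missing FRED-sourced component is a hard failure, not a silent
  // exclusion — the caller catches and skips publishing the composite.
  for (const name of REQUIRED_FRED) {
    if (!Number.isFinite(components[name])) {
      throw new Error(`stress index: missing FRED component ${name}`);
    }
  }
  const present = REQUIRED_FRED.map((n) => components[n]);
  if (Number.isFinite(components.GSCPI)) present.push(components.GSCPI); // optional
  return present.reduce((a, b) => a + b, 0) / present.length;
}
```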

* fix(seed-economy): narrow stress-index catch; don't retry 4xx in eiaFetchJson

- computeStressIndex try/catch no longer wraps the Redis write so a
  write failure surfaces as a run error instead of being swallowed.
- eiaFetchJson bails immediately on 4xx and non-transient thrown
  errors; only 5xx / timeouts / network resets are retried.
2026-04-14 08:55:01 +04:00
Elie Habib
93113174b8 fix(seed-fires): retry per-country FIRMS fetch once to cut silent coverage gaps (#3079)
* fix(seed-fires): retry per-country FIRMS fetch once before giving up

Turkey/North Korea/Russia/Iran/Israel-Gaza/Saudi/Syria failed ~1.1x per
30-min run in prod logs (77 fails / 69 runs, 2026-04-12→13), silently
zeroing those regions on the map for up to ~20% of the day. Upstream is
transiently flaky on large bboxes, not auth-related.

Adds 1 retry with 5s backoff in fetchRegionSource. Worst-case added
latency ~100s, still well under the 30-min cadence.

* fix(seed-fires): align retry backoff with rate-limit pacing; extend lock TTL

Addresses review:
- Retry backoff 5s -> 6s matches the inter-call pacing budget; prevents
  breaching FIRMS free-tier 10 req/min under clustered fast-failures.
- lockTtlMs 10m -> 25m; retry path doubles worst-case per-slot runtime
  (~71s) and can exceed the old lock, risking concurrent publish races.
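
The retry wrapper is tiny — a sketch with the per-region fetcher and
sleep injected (names assumed):

```javascript
async function fetchRegionWithRetry(fetchRegion, region, sleep) {
  try {
    return await fetchRegion(region);
  } catch {
    await sleep(6_000);           // matches the FIRMS 10 req/min pacing budget
    return fetchRegion(region);   // second failure propagates to the caller
  }
}
```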

* fix(seed-fires): bump lock TTL to 40m to cover full worst-case retry runtime

27 slots × ~72s worst-case = 32.4 min, exceeds the previous 25m TTL.
Bump to 40m so a hung/crashed run can't hold a stale lock forever while
still safely covering legitimate long runs. Next 30m cron tick will see
the lock held and skip, which is the intended behavior.
2026-04-14 08:51:45 +04:00
Elie Habib
21d33c4bb5 fix(hyperliquid-flow): fetch both default dex and xyz builder dex (#3077)
Root cause: Hyperliquid's commodity and FX perps (xyz:CL, xyz:BRENTOIL,
xyz:GOLD, xyz:SILVER, xyz:PLATINUM, xyz:PALLADIUM, xyz:COPPER, xyz:NATGAS,
xyz:EUR, xyz:JPY) live on a separate 'xyz' builder dex, NOT the default
perp dex. The MIT reference repo listed these with xyz: prefixes but
didn't document that they require {type:metaAndAssetCtxs, dex:xyz} as a
separate POST.

Production symptom (Railway bundle logs 2026-04-14 04:10):
  [Hyperliquid-Flow] SKIPPED: validation failed (empty data)

The seeder polled the default dex only, matched 4 of 14 whitelisted assets
(BTC/ETH/SOL/PAXG), and validateFn rejected snapshots with <12 assets.
Seed-meta was refreshed on the skipped path so health stayed OK but
market:hyperliquid:flow:v1 was never written.

Fix:
- New fetchAllMetaAndCtxs(): parallel-fetches both dexes and merges
  {universe, assetCtxs} by concatenation. xyz entries already carry the
  xyz: prefix in their universe names.
- New validateDexPayload(raw, dexLabel, minUniverse): per-dex floor so the
  thinner xyz dex (~63 entries) does not false-trip the default floor of
  50. Errors include the dex label for debuggability.
- validateUpstream(): back-compat wrapper — accepts either the legacy
  single-dex [meta, assetCtxs] tuple (buildSnapshot tests) or the merged
  {universe, assetCtxs} shape from fetchAllMetaAndCtxs.
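
The dual-dex merge in sketch form — the {type, dex} request body is
taken from this message; the transport helper postInfo is an assumed
injection point:

```javascript
async function fetchAllMetaAndCtxs(postInfo) {
  const [def, xyz] = await Promise.all([
    postInfo({ type: 'metaAndAssetCtxs' }),              // default perp dex
    postInfo({ type: 'metaAndAssetCtxs', dex: 'xyz' }),  // xyz builder dex
  ]);
  // Each response is a [meta, assetCtxs] tuple; merge by concatenation —
  // xyz universe entries already carry the xyz: prefix in their names.
  return {
    universe: [...def[0].universe, ...xyz[0].universe],
    assetCtxs: [...def[1], ...xyz[1]],
  };
}
```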

Tests: 37/37 green. New tests cover dual-dex fetch merge, cross-dex error
propagation, xyz floor accept/reject, and merged-shape pass-through.
2026-04-14 08:28:57 +04:00
Elie Habib
30ddad28d7 fix(seeds): upstream API drift — SPDR XLSX + IMF IRFCL + IMF-External BX/BM drop (#3076)
* fix(seeds): gold-etf XLSX migration, IRFCL dataflow, imf-external BX/BM drop

Three upstream-drift regressions caught from the market-backup + imf-extended
bundle logs. Root causes validated by live API probes before coding.

1. seed-gold-etf-flows: SPDR /assets/dynamic/GLD/GLD_US_archive_EN.csv now
   silently returns a PDF (Content-Type: application/pdf) — site migrated
   to api.spdrgoldshares.com/api/v1/historical-archive which serves XLSX.
   Swapped the CSV parser for an exceljs-based XLSX parser. Adds
   browser-ish Origin/Referer headers (SPDR swaps payload for PDF
   without them) and a Content-Type guard. Column layout: Date | Closing |
   ... | Tonnes | Total NAV USD.
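   The guard-then-map shape described above might look like the following
   sketch. In the real seeder exceljs extracts the rows (workbook.xlsx.load
   on the response buffer, then worksheet.eachRow); here the guard and the
   column mapping are factored as pure functions over already-extracted cell
   arrays, and the helper names plus the last-two-columns assumption are
   illustrative:

```javascript
// Fail loudly when SPDR swaps the payload for a PDF instead of feeding the
// wrong bytes to the XLSX parser.
function assertXlsxContentType(contentType) {
  const ok =
    typeof contentType === "string" &&
    (contentType.includes("spreadsheetml") ||
      contentType.includes("application/octet-stream"));
  if (!ok) throw new Error(`unexpected Content-Type: ${contentType}`);
}

// Column layout per the commit: Date | Closing | ... | Tonnes | Total NAV USD.
// Taking tonnes and NAV from the last two columns is an assumption.
function mapArchiveRow(cells) {
  const [date, closing] = cells;
  const tonnes = Number(cells[cells.length - 2]);
  const navUsd = Number(cells[cells.length - 1]);
  if (!date || !Number.isFinite(tonnes)) return null; // skip header/blank rows
  return { date: String(date), closing: Number(closing), tonnes, navUsd };
}
```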

2. seed-gold-cb-reserves: PR #3038 shipped with IMF.STA/IFS dataflow and
   3-segment key M..<indicator> — both wrong. IFS isn't exposed on
   api.imf.org (HTTP 204). Gold-reserves data lives under IMF.STA/IRFCL
   with 4 dimensions (COUNTRY.INDICATOR.SECTOR.FREQUENCY). Verified live:
   *.IRFCLDT1_IRFCL56_FTO.*.M returns 111 series. Switched to IRFCL +
   IRFCLDT1_IRFCL56_FTO (fine troy ounces) and fallbacks. The
   valueIsOunces flag now matches _FTO suffix (keeps legacy _OZT/OUNCE
   detection for backward compat).
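   The 4-dimension key and the suffix-driven unit flag can be sketched as
   follows (dimension order and the verified wildcard key come from the
   commit; the option-object shape is an assumption):

```javascript
// Build an IRFCL series key. Dimension order per the commit:
// COUNTRY.INDICATOR.SECTOR.FREQUENCY, e.g. *.IRFCLDT1_IRFCL56_FTO.*.M
function buildIrfclKey({ country = "*", indicator, sector = "*", freq = "M" }) {
  return [country, indicator, sector, freq].join(".");
}

// _FTO means fine troy ounces in the new dataflow; legacy _OZT/OUNCE
// detection is kept for backward compat, per the commit.
function valueIsOunces(indicator) {
  return /_FTO$|_OZT$|OUNCE/.test(indicator);
}
```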

3. seed-imf-external: BX/BM (export/import LEVELS, USD) WEO coverage
   collapsed to ~10 countries in late 2026 — the seeder's >=190-country
   validate floor was failing every run. Dropped BX/BM from fetch + join;
   kept BCA (~209) / TM_RPCH (~189) / TX_RPCH (~190). exportsUsd /
   importsUsd / tradeBalanceUsd fields kept as explicit null so consumers
   see a deliberate gap. validate floor lowered to 180 (BCA∪TM∪TX union).
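   The resulting per-country contract and the lowered floor can be sketched
   like this (field names follow the commit; the record and validator shapes
   are assumptions):

```javascript
// Kept indicators populate real values; the trade-level fields stay explicit
// nulls so consumers see a deliberate gap, not a missing key.
function buildCountryRow(iso, weo) {
  return {
    iso,
    currentAccountUsd: weo.BCA ?? null,
    importVolumePctChg: weo.TM_RPCH ?? null,
    exportVolumePctChg: weo.TX_RPCH ?? null,
    // BX/BM dropped (WEO coverage collapsed to ~10 countries).
    exportsUsd: null,
    importsUsd: null,
    tradeBalanceUsd: null,
  };
}

// Floor of 180 countries over the BCA/TM/TX union, per the commit.
function validateCoverage(rows, floor = 180) {
  const covered = rows.filter(
    (r) =>
      r.currentAccountUsd !== null ||
      r.importVolumePctChg !== null ||
      r.exportVolumePctChg !== null
  );
  if (covered.length < floor) {
    throw new Error(`coverage ${covered.length} < floor ${floor}`);
  }
  return rows;
}
```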

Tests: 32/32 pass. Rewrote gold-etf tests to use synthetic XLSX fixtures
(exceljs resolved from scripts/package.json since repo root doesn't have
it). Updated imf-external tests for the new indicator set + null BX/BM
contract + 180-country validate threshold.

* fix(mcp): update get_country_macro description after BX/BM drop

Consumer-side catch during PR #3076 validation: the MCP tool description
still promised 'exports, imports, trade balance' fields that the seeder
fix nulls out. LLM consumers would be directed to exportsUsd/importsUsd/
tradeBalanceUsd fields that always return null since seed-imf-external
dropped BX/BM (WEO coverage collapsed to ~10 countries).

Updated description to list only the indicators actually populated
(currentAccountUsd, importVolumePctChg, exportVolumePctChg) with an
explicit note about the null trade-level fields so LLMs don't attempt
to use them.

* fix(gold-cb-reserves): compute real pctOfReserves + add exceljs to root

Follow-up to #3076 review.

1. pctOfReserves was hardcoded to 0 with an "IFS doesn't give us total
   reserves" comment. That was a lazy limitation claim — IMF IRFCL DOES
   expose total official reserve assets as IRFCLDT1_IRFCL65_USD parallel
   to the gold USD series IRFCLDT1_IRFCL56_USD. fetchCbReserves now
   pulls all three indicators (primary FTO tonnage + the two USD series)
   via Promise.allSettled and passes the USD pair to buildReservesPayload
   so it can compute the true gold share per country. Falls back to 0
   only when the denominator is genuinely missing for that country
   (IRFCL coverage: 114 gold_usd, 96 total_usd series; ~15% of holders
   have no matched-month denominator). 3-month lookback window absorbs
   per-country reporting lag.
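   The share computation with its lookback fallback might look like this
   sketch (the indicator pairing and the 3-month window come from the commit;
   the month-keyed map shape and helper names are assumptions — in the real
   seeder the three series arrive via Promise.allSettled):

```javascript
// Shift a "YYYY-MM" month key by `delta` months.
function shiftMonth(ym, delta) {
  const [y, m] = ym.split("-").map(Number);
  const t = y * 12 + (m - 1) + delta;
  return `${Math.floor(t / 12)}-${String((((t % 12) + 12) % 12) + 1).padStart(2, "0")}`;
}

// Gold share of total official reserves for one country. The denominator is
// the IRFCLDT1_IRFCL65_USD total-reserves series; the numerator is the gold
// USD series. Walks back up to `lookback` months to absorb reporting lag,
// requiring a matched-month pair, and falls back to 0 only when the
// denominator is genuinely missing.
function pctOfReserves(goldUsdByMonth, totalUsdByMonth, month, lookback = 3) {
  for (let i = 0; i < lookback; i++) {
    const m = shiftMonth(month, -i);
    const gold = goldUsdByMonth[m];
    const total = totalUsdByMonth[m];
    if (gold != null && total > 0) return (gold / total) * 100;
  }
  return 0;
}
```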

2. CI fix: tests couldn't find exceljs because scripts/package.json is
   not workspace-linked to the repo root — CI runs `npm ci` at root
   only. Added exceljs@^4.4.0 as a root devDependency. Runtime seeder
   continues to resolve it from scripts/node_modules via Node's upward
   module resolution.

Three new tests cover pct computation, missing-denominator fallback, and
the 3-month lookback window.
2026-04-14 08:19:47 +04:00