3613 Commits

Author SHA1 Message Date
Elie Habib
0d076a689f style(auth): green Sign In button matching 2D selected state (#3147)
* style(auth): switch Sign In button to green accent to match 2D selected state

Use the same --green background and --bg text color as .map-dim-btn.active
so the CTA matches the existing visual vocabulary and stands out against
the dark header. Adds a soft green glow on hover/focus-visible via
color-mix (already used extensively in this stylesheet).

* fix(a11y): pin Sign In text color to #0a0a0a for WCAG AA in light theme

Reviewer on #3147 flagged that using color: var(--bg) on the green
background fails WCAG AA in light theme. var(--bg) resolves to #f8f9fa,
which lands at ~3.13:1 on var(--green) #16a34a for a 13px label (AA
needs 4.5:1).

Pinning the label to #0a0a0a in both themes gives:
  Light  #0a0a0a on #16a34a is about 6.0:1 (AA pass)
  Dark   #0a0a0a on #44ff88 is about 15.0:1 (AAA pass)

Dark theme rendering is unchanged because var(--bg) already resolved
to #0a0a0a there. Only the light-theme regression is corrected.
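
The AA numbers above can be reproduced with the WCAG 2.x relative-luminance
formula; a minimal sketch (hex values are from the commit, helper names are
illustrative):

```javascript
// WCAG 2.x contrast ratio between two hex colors (e.g. '#0a0a0a').
function luminance(hex) {
  const [r, g, b] = [1, 3, 5].map((i) => {
    const c = parseInt(hex.slice(i, i + 2), 16) / 255;
    // Piecewise sRGB-to-linear transfer function from WCAG 2.x
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

function contrast(fg, bg) {
  const [hi, lo] = [luminance(fg), luminance(bg)].sort((a, b) => b - a);
  return (hi + 0.05) / (lo + 0.05);
}

// The regression: var(--bg) in light theme on the green CTA
console.log(contrast('#f8f9fa', '#16a34a').toFixed(2)); // ≈ 3.13 — fails AA (needs 4.5)
// The fix: pinned label color
console.log(contrast('#0a0a0a', '#16a34a').toFixed(2)); // ≈ 6.01 — passes AA
```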
2026-04-17 20:20:29 +04:00
Sebastien Melki
72e3e3ee3b fix(auth): remove isProUser() gate so all visitors see Sign In button (#3115)
The setupAuthWidget() call was gated behind isProUser(), which created
a deadlock: new users without legacy API keys could never see the
Sign In button and thus could never authenticate. Removes the guard
in both the main app and the pro landing page pricing section.

Closes #034

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 19:08:29 +04:00
Elie Habib
dcf73385ca fix(scoring): rebalance formula weights severity 55%, corroboration 15% (#3144)
* fix(scoring): rebalance formula weights severity 55%, corroboration 15%

PR A of the scoring recalibration plan (docs/plans/2026-04-17-002).

The v2 shadow-log recalibration (690 items, Pearson 0.413) showed the
formula compresses scores into a narrow 30-70 range, making the 85
critical gate unreachable and the 65 high gate marginal. Root cause:
corroboration at 30% weight penalizes breaking single-source news
(the most important alerts) while severity at 40% doesn't separate
critical from high widely enough.

Weight change:
  BEFORE: severity 0.40 + sourceTier 0.20 + corroboration 0.30 + recency 0.10
  AFTER:  severity 0.55 + sourceTier 0.20 + corroboration 0.15 + recency 0.10

Expected effect: critical/tier1/fresh rises from 76 to 88 (clears 85
gate). critical/tier2/fresh rises from 71 to 83 (recommend lowering
critical gate to 80 at activation time). high/tier2/fresh rises from
61 to 69 (clears 65 gate). The HIGH-CRITICAL gap widens from 10 to
14 points for same-tier items.
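
A sketch of the weighted sum with hypothetical component values (0-100)
chosen so they reproduce the quoted examples; the actual per-field component
scoring is relay-internal and not shown in this commit:

```javascript
// Hypothetical component values (0-100) that reproduce the quoted examples.
const SEVERITY = { critical: 100, high: 75 };
const TIER = { tier1: 100, tier2: 75 };

function score(w, severity, sourceTier, corroboration, recency) {
  return Math.round(
    w.severity * severity +
    w.sourceTier * sourceTier +
    w.corroboration * corroboration +
    w.recency * recency
  );
}

const BEFORE = { severity: 0.40, sourceTier: 0.20, corroboration: 0.30, recency: 0.10 };
const AFTER  = { severity: 0.55, sourceTier: 0.20, corroboration: 0.15, recency: 0.10 };

// critical / tier1 / single-source / fresh: 76 -> 88 (clears the 85 gate)
console.log(score(BEFORE, SEVERITY.critical, TIER.tier1, 20, 100)); // 76
console.log(score(AFTER,  SEVERITY.critical, TIER.tier1, 20, 100)); // 88
```

The same components give 71 -> 83 for critical/tier2/fresh and 61 -> 69 for
high/tier2/fresh, matching the expected-effect numbers above.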

Also:
- Bumps shadow-log key from v2 to v3 for a clean recalibration dataset
  (v2 had old-weight scores that would contaminate the 48h soak)
- Updates proto/news_item.proto formula comment to reflect new weights
- Updates cache-keys.ts documentation

No cache migration needed: the classify cache stores {level, category},
not scores. Scores are computed at read time from the stored level +
the formula, so new digest requests immediately produce new scores.

Gates remain OFF. After 48h of v3 data, re-run:
  node scripts/shadow-score-report.mjs
  node scripts/shadow-score-rank.mjs sample 25

🤖 Generated with Claude Opus 4.6 via Claude Code + Compound Engineering v2.49.0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: regenerate proto OpenAPI docs for weight rebalance

* fix(scoring): bump SHADOW_SCORE_LOG_KEY export to v3

The exported constant in cache-keys.ts was left at v2 while the relay's
local constant was bumped to v3. Anyone importing the export (or grep-
discovering it) would get a stale key. Architecture review flagged this.

* fix(scoring): update test + stale comments for shadow-log v3

Review found the regression test still asserted v2 key, causing CI
failure. Also fixed stale v1/v2 references in report script header,
default-key comment, report title render, and shouldNotify docstring.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-17 17:43:39 +04:00
Elie Habib
20864f9c8a feat(settings): promote Notifications into its own tab (#3145)
* feat(settings): promote Notifications from settings accordion to its own tab

Notifications was buried at the bottom of the Settings accordion list, one
more click away than Panels/Sources. Since the feature is Pro-gated and
channel-heavy (Telegram pairing, Slack/Discord OAuth, webhook URL entry,
quiet hours, digest scheduling), a dedicated tab gives it the surface it
needs and makes the upsell visible to free users.

Extracts the notifications HTML and attach() logic into
src/services/notifications-settings.ts. Removes the inline block and now
unused imports (notification-channels, clerk, entitlements.hasTier,
variant, uqr, QUIET_HOURS/DIGEST_CRON rollout flags) from
preferences-content.ts. Adds a 'notifications' tab between Sources and
API Keys in UnifiedSettings (web only; desktop app keeps the old layout).

Rollout-flag tests (digest/quiet-hours) now read from the new module.

* perf(settings): lazy-attach Notifications tab to avoid eager fetch

Previously render() called notifs.attach() unconditionally, which fired
getChannelsData() on every modal open for Pro users even when they never
visited the Notifications tab. Mirrors the loadApiKeys() pattern: store
the render result in pendingNotifs and only call attach() when the tab
is first activated (or is the initial active tab). Cleanup on close,
destroy, and re-render remains unchanged.

Addresses greptile P2 on PR #3145.
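
The lazy-attach pattern can be sketched roughly like this (pendingNotifs and
attach() follow the commit message; the counting stub is illustrative, not
the real notifications-settings module):

```javascript
// Illustrative stand-in for the notifications-settings module.
function makeNotifs() {
  let attachCount = 0;
  return {
    render: () => ({ attach: () => { attachCount += 1; } }), // attach() fires the channels fetch
    attachCount: () => attachCount,
  };
}

function makeSettingsModal(notifs) {
  let pendingNotifs = null;
  return {
    render() {
      // Store the render result instead of attaching eagerly.
      pendingNotifs = notifs.render();
    },
    activateTab(name) {
      // Only attach (and therefore fetch) when the tab is first visited.
      if (name === 'notifications' && pendingNotifs) {
        pendingNotifs.attach();
        pendingNotifs = null; // subsequent activations are no-ops
      }
    },
  };
}

const notifs = makeNotifs();
const modal = makeSettingsModal(notifs);
modal.render();                     // no fetch yet
modal.activateTab('sources');       // still no fetch
modal.activateTab('notifications'); // fetch fires exactly once
modal.activateTab('notifications');
console.log(notifs.attachCount());  // 1
```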
2026-04-17 17:43:21 +04:00
Elie Habib
1cf249c2f8 fix(security): strip importanceScore from /api/notify payload + scope fan-out by userId (#3143)
* fix(security): strip importanceScore from /api/notify payload + scope fan-out by userId

Closes todo #196 (activation blocker for IMPORTANCE_SCORE_LIVE=1).

Before this fix, any authenticated Pro user could POST to /api/notify with
`payload.importanceScore: 100` and `severity: 'critical'`, bypassing the
relay's IMPORTANCE_SCORE_MIN gate and fan-out would hit every Pro user with
matching rules (no userId filter). This was a pre-existing vulnerability
surfaced during the scoring pipeline work in PR #3069.

Two changes:

1. api/notify.ts — strip `importanceScore` and `corroborationCount` from
   the user-submitted payload before publishing to wm:events:queue. These
   fields are relay-internal (computed by ais-relay's scoring pipeline).
   Also validates `severity` against the known allowlist (critical, high,
   medium, low, info) instead of accepting any string.

2. scripts/notification-relay.cjs — scope rule matching: if the event
   carries `event.userId` (browser-submitted via /api/notify), only match
   rules where `rule.userId === event.userId`. Relay-emitted events (from
   ais-relay, regional-snapshot) have no userId and continue to fan out to
   all matching Pro users. This prevents a single user from broadcasting
   crafted events to every other Pro subscriber's notification channels.

Net effect: browser-submitted events can only reach the submitting user's
own Telegram/Slack/Email/webhook channels, and cannot carry an injected
importanceScore.
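
A condensed sketch of both changes (field names are from the commit; the
handler plumbing around them is illustrative):

```javascript
// Module-scope allowlist (cf. the later P2 review fix) rather than per-request.
const VALID_SEVERITIES = new Set(['critical', 'high', 'medium', 'low', 'info']);

// 1. Strip relay-internal fields from the user-submitted payload and
//    validate severity against the allowlist instead of accepting any string.
function sanitizePayload(payload) {
  const { importanceScore, corroborationCount, ...rest } = payload;
  if (rest.severity !== undefined && !VALID_SEVERITIES.has(rest.severity)) {
    throw new Error('invalid severity');
  }
  return rest;
}

// 2. Scope fan-out: browser-submitted events carry userId and only match the
//    submitter's own rules; relay-emitted events (no userId) fan out to all.
function ruleMatchesUser(rule, event) {
  if (event.userId) return rule.userId === event.userId;
  return true;
}

const clean = sanitizePayload({ severity: 'critical', importanceScore: 100 });
console.log('importanceScore' in clean); // false — injected score never reaches the queue
```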

🤖 Generated with Claude Opus 4.6 via Claude Code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(security): reject internal relay control events from /api/notify

Review found that `flush_quiet_held` and `channel_welcome` are internal
relay control events (dispatched by Railway cron scripts) that the public
/api/notify endpoint accepted because only eventType length was checked.
A Pro user could POST `{"eventType":"flush_quiet_held","payload":{},
"variant":"full"}` to force-drain their held quiet-hours queue on demand,
bypassing batch_on_wake behavior.

Now returns 403 for reserved event types. The denylist approach (vs
allowlist) is deliberate: new user-facing event types shouldn't require
an API change to work, while new internal events must explicitly be
added to the deny set if they carry privileged semantics.

* fix(security): exempt browser events from score gate + hoist Sets to module scope

Two review findings from Greptile on PR #3143:

P1: Once IMPORTANCE_SCORE_LIVE=1 activates, browser-submitted rss_alert
events (which had importanceScore stripped by the first commit) would
evaluate to score=0 at the relay's top-level gate and be silently
dropped before rule matching. Fix: add `&& !event.userId` to the gate
condition — browser events carry userId and have no server-computed
score, so the gate must not apply to them. Relay-emitted events (no
userId, server-computed score) are still gated as before.

P2: VALID_SEVERITIES and INTERNAL_EVENT_TYPES Sets were allocated inside
the handler on every request. Hoisted to module scope.
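
The gated condition from the P1 fix amounts to (flag and constant names are
illustrative; only the `&& !event.userId` exemption is from the commit):

```javascript
// Top-level relay gate: drop low-scoring events before rule matching, but
// only for relay-emitted events that carry a server-computed score.
function passesScoreGate(event, minScore, scoreLive) {
  if (!scoreLive) return true;   // IMPORTANCE_SCORE_LIVE off: no gating
  if (event.userId) return true; // browser-submitted: score was stripped, exempt
  return (event.importanceScore ?? 0) >= minScore;
}

// Browser rss_alert with stripped score is no longer silently dropped:
console.log(passesScoreGate({ userId: 'u1', eventType: 'rss_alert' }, 65, true)); // true
// Relay-emitted low-score event is still gated:
console.log(passesScoreGate({ importanceScore: 40 }, 65, true)); // false
```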

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-17 11:14:25 +04:00
Elie Habib
94571b5df9 feat(positioning): standalone 24/7 Positioning panel with directional gauges (#3141)
* feat(positioning): standalone 24/7 Positioning panel with directional gauges

Extract Hyperliquid perp positioning from the "Perp Flow" tab inside
CommoditiesPanel into a dedicated "24/7 Positioning" panel (#3140).

New panel features:
- SVG arc gauge (0-100) per asset with color encoding:
  green = bullish/longs crowded, red = bearish/shorts crowded
- Grouped by asset class: Commodities, Crypto, FX
- Visual elevation (glow + border) for scores above 40
- Click-through to relevant panels (crypto, commodities)
- Hover tooltips with score, funding rate, OI delta
- Warmup banner, stale badges, unavailable state
- Compact secondary metrics (funding rate + OI delta 1h)
- Mobile responsive (2-column grid below 600px)

Registration: panel-layout.ts, App.ts (primeTask + 5min refresh),
commands.ts (CMD+K), panels.ts (all variants), index.ts export,
en.json i18n keys.

Removals from CommoditiesPanel:
- "Perp Flow" tab and its tab type
- fetchHyperliquidFlow() method
- _renderFlow() and _renderFlowGrid() methods
- Associated state (_flow, _flowLoading)
- App.ts primeTask and refreshScheduler entries

No backend changes. Reuses existing getHyperliquidFlow RPC and
market:hyperliquid-flow:v1 Redis key.

* fix(positioning): correct oiDelta lookback, click targets, warmup state (review)

P1 #1: oiDelta1h used 2-point lookback (5min delta mislabeled as 1h).
  Restored to 13-point lookback matching the original MarketPanel.ts
  implementation (12 samples back = 1h at 5min cadence).

P1 #2: CLICK_TARGETS used display-style names (GOLD, WTI, BRENT) but
  seeded symbols are xyz:-prefixed Hyperliquid names (xyz:GOLD, xyz:CL,
  xyz:BRENTOIL). Fixed to match actual seeder output from
  seed-hyperliquid-flow.mjs.

P2: Empty/unavailable snapshots (cold seed, fresh deploy) now show
  warmup guidance instead of a hard error. The prior Perp Flow tab
  intentionally treated this as normal bootstrap; the new panel had
  regressed to showError, which is confusing on first deploy.
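
The lookback fix in P1 #1 amounts to (the sample shape is illustrative; the
real series comes from the Redis history):

```javascript
// Open-interest delta over 1h from a 5-min cadence series (newest sample last).
// 13 points = current sample + 12 samples back = 60 minutes of history.
function oiDelta1h(samples) {
  const LOOKBACK = 13;
  if (samples.length < LOOKBACK) return null; // not enough history yet (warmup)
  const latest = samples[samples.length - 1];
  const hourAgo = samples[samples.length - LOOKBACK];
  return latest - hourAgo;
}

const series = [100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 115];
console.log(oiDelta1h(series)); // 15 — not the 5-min delta (4) the 2-point bug produced
```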

* fix(positioning): use data-panel selector, guard missing target panels (review)

P1: Click-through used getElementById('panel-X') but panels mount with
data-panel="X" attribute. Changed to querySelector('[data-panel="X"]').

P2: Crypto cards (BTC/ETH/SOL) target the 'crypto' panel which doesn't
exist in the commodity variant. resolveClickTarget() now checks if the
target panel is actually in the DOM before adding clickable styling and
data-pos-navigate. Cards whose target panel is absent render as
non-clickable (no cursor pointer, no navigate attribute).
2026-04-17 08:48:00 +04:00
Elie Habib
d9194a5179 fix(railway): tolerate Ubuntu apt mirror failures in NIXPACKS + Dockerfile builds (#3142)
Ubuntu's noble-security package-index CDN is returning hash-sum
mismatches (2026-04-17), causing ALL Railway NIXPACKS builds to fail
at the 'apt-get update && apt-get install curl' layer with exit
code 100. Multiple Railway services are red.

NIXPACKS' aptPkgs = ['curl'] generates a strict
'apt-get update && apt-get install -y' that fails hard on any
mirror error. Fix: replace aptPkgs with manual cmds that:
  1. Allow apt-get update to partially fail (|| true)
  2. Use --fix-missing on apt-get install so packages from healthy
     mirrors still install even if one mirror is broken

Same treatment for consumer-prices-core/Dockerfile.

Files changed:
- nixpacks.toml (root — used by ais-relay + standalone cron seeders)
- scripts/nixpacks.toml (used by bundled seed services)
- consumer-prices-core/Dockerfile

The || true on apt-get update is safe because:
  1. curl is the only package we install and it's often already present
     in the NIXPACKS base image (nix-env provides it)
  2. If curl genuinely isn't available, the seeder will fail at runtime
     with a clear 'curl: not found' error — not a silent degradation
2026-04-17 08:35:20 +04:00
Elie Habib
aeef68dd56 fix(relay): envelope-aware Redis reads restore chokepoint transit chart (#3139)
* fix(relay): envelope-aware Redis reads for PortWatch/CorridorRisk/stocks-bootstrap

PR #3097 migrated 91 seed producers to the contract-mode envelope shape
{_seed, data}, but ais-relay's private upstashGet() reads raw JSON and
does not unwrap the envelope. Three callsites in the relay now see the
wrapper metadata as the payload and silently corrupt downstream:

- seedTransitSummaries() iterates {_seed, data} as chokepoint IDs and
  writes supply_chain:transit-summaries:v1 with keys "_seed" and "data",
  both mapping to empty-history summary objects. Every chokepoint's
  transitSummary.history is therefore empty, which gates off the
  time-series chart in SupplyChainPanel and MapPopup waterway popup.
- loadWsbTickerSet() reads market:stocks-bootstrap:v1 and sees
  data.quotes === undefined, silently disabling WSB ticker matching.

Fix: add envelopeRead(key) next to envelopeWrite — mirrors the
server/_shared/redis.ts::getCachedJson semantics (envelope-aware by
default; legacy raw shapes pass through unchanged). Swap the three
upstashGet calls that target contract-mode canonical keys.

After the relay re-seeds at its 10-min cadence, transit-summaries:v1
will contain a proper {hormuz_strait, suez, ...} map and the chart
comes back in the panel and map popup.

Unit tests cover contract-mode unwrap, legacy passthrough, null, and
the array-with-numeric-indices false-positive edge case. Existing
static assertions updated to guard against regression to raw
upstashGet on these keys.
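
The unwrap semantics described above (mirroring getCachedJson) look roughly
like this; the {_seed, data} shape is from PR #3097, the helper body is a
sketch:

```javascript
// Envelope-aware unwrap: contract-mode values are { _seed, data };
// legacy raw shapes, nulls, and arrays pass through unchanged.
function unwrapEnvelope(value) {
  if (
    value !== null &&
    typeof value === 'object' &&
    !Array.isArray(value) && // arrays with numeric indices are NOT envelopes
    '_seed' in value &&
    'data' in value
  ) {
    return value.data;
  }
  return value;
}

// Contract-mode value unwraps to the payload; legacy raw value passes through.
console.log(unwrapEnvelope({ _seed: { fetchedAt: '2026-04-17' }, data: { suez: {} } }));
console.log(unwrapEnvelope({ suez: {} }));
```

Without the unwrap, iterating the return value yields the keys "_seed" and
"data" instead of chokepoint IDs — exactly the corruption described above.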

* fix(backtest): envelope-aware reads in resilience backtest scripts

backtest-resilience-outcomes.mjs reads economic:fx:yoy:v1, infra:outages:v1,
and conflict:ucdp-events:v1 — all migrated to the {_seed, data} envelope by
PR #3097. The private redisGetJson helper did not unwrap, so AUC computation
was silently running against the envelope wrapper instead of the country
event maps (same class of bug as PR #3139's relay fix, offline blast radius).

validate-resilience-backtest.mjs uses the same read pattern across multiple
family keys and is fixed for the same reason.

Both scripts now import unwrapEnvelope from _seed-envelope-source.mjs (the
canonical ESM source of truth used by seed-chokepoint-flows.mjs,
seed-energy-spine.mjs, seed-forecasts.mjs, and others). Legacy raw shapes
still pass through unchanged.

* fix(relay): envelopeRead for OREF_REDIS_KEY bootstrap (Greptile P1)

orefPersistHistory() writes OREF_REDIS_KEY via envelopeWrite at line 1133,
but the bootstrap reader at line 1214 was still using raw upstashGet.
cached.history was therefore undefined and Array.isArray() always false,
so OREF alert history was never restored from Redis after a relay restart
— every cold start hit the upstream API unnecessarily.

Also adds the two regression guards Greptile flagged as missing:
- loadWsbTickerSet() reading market:stocks-bootstrap:v1 via envelopeRead
- orefBootstrapFromRedis reading OREF_REDIS_KEY via envelopeRead

Same class of bug as the three callsites fixed earlier in this PR.
2026-04-17 08:12:07 +04:00
Elie Habib
bd559fec88 fix(liquidity-shifts): wire missing primeTask so fetchData() is called (#3138)
LiquidityShiftsPanel was created in panel-layout.ts but never had a
primeTask('liquidity-shifts', ...) call in App.ts, so fetchData() was
never invoked. The panel rendered its initial showLoading() state and
stayed there permanently, showing "Loading..." with the radar spinner.

Every other self-fetching panel (cot-positioning, gold-intelligence,
fear-greed, etf-flows, etc.) has its primeTask call. This one was
missed when the panel was added.
2026-04-17 07:36:09 +04:00
Sebastien Melki
a4d9b0a5fa feat(auth): user-facing API key management (create / list / revoke) (#3125)
* feat(auth): user-facing API key management (create / list / revoke)

Adds full-stack API key management so authenticated users can create,
list, and revoke their own API keys from the Settings UI.

Backend:
- Convex `userApiKeys` table with SHA-256 key hash storage
- Mutations: createApiKey, listApiKeys, revokeApiKey
- Internal query validateKeyByHash + touchKeyLastUsed for gateway
- HTTP endpoints: /api/api-keys (CRUD) + /api/internal-validate-api-key
- Gateway middleware validates user-owned keys via Convex + Redis cache

Frontend:
- New "API Keys" tab in UnifiedSettings (visible when signed in)
- Create form with copy-on-creation banner (key shown once)
- List with prefix display, timestamps, and revoke action
- Client-side key generation + hashing (plaintext never sent to DB)

Closes #3116

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(api-keys): address PR review — cache invalidation, prefix validation, revoked-key guard

- Invalidate Redis cache on key revocation so gateway rejects revoked keys
  immediately instead of waiting for 5-min TTL expiry (P1)
- Enforce `wm_` prefix format with regex instead of loose length check (P2)
- Skip `touchKeyLastUsed` for revoked keys to preserve clean audit trail (P2)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(api-keys): address consolidated PR review (P0–P3)

P0: gate createApiKey on pro entitlement (tier >= 1); isCallerPremium
now verifies key-owner tier instead of treating existence as premium.

P1: wire wm_ user keys into the domain gateway auth path with async
Convex-backed validation; user keys go through entitlement checks
(only admin keys bypass). Lower cache TTL 300s → 60s and await
revocation cache-bust instead of fire-and-forget.

P2: remove dead HTTP create/list/revoke path from convex/http.ts;
switch to cachedFetchJson (stampede protection, env-prefixed keys,
standard NEG_SENTINEL); add tenancy check on cache-invalidation
endpoint via new /api/internal-get-key-owner route; add 22 Convex
tests covering tier gate, per-user limit, duplicate hash, ownership
revoke guard, getKeyOwner, and touchKeyLastUsed debounce.

P3: tighten keyPrefix regex to exactly 5 hex chars; debounce
touchKeyLastUsed (5 min); surface PRO_REQUIRED in UI.
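
The touchKeyLastUsed debounce in P3 is a last-write timestamp check; a
sketch (the 5-min window is from the message, the map-based state is
illustrative):

```javascript
const TOUCH_DEBOUNCE_MS = 5 * 60 * 1000;
const lastTouched = new Map(); // keyId -> last write timestamp

// Returns true when the lastUsed write should actually hit the DB.
function shouldTouchLastUsed(keyId, now = Date.now()) {
  const prev = lastTouched.get(keyId);
  if (prev !== undefined && now - prev < TOUCH_DEBOUNCE_MS) return false; // debounced
  lastTouched.set(keyId, now);
  return true;
}

console.log(shouldTouchLastUsed('k1', 0));       // true  — first use writes
console.log(shouldTouchLastUsed('k1', 60_000));  // false — 1 min later, debounced
console.log(shouldTouchLastUsed('k1', 301_000)); // true  — past the 5-min window
```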

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(api-keys): gate on apiAccess (not tier), wire wm_ keys through edge routes, harden error paths

- Gate API key creation/validation on features.apiAccess instead of tier >= 1.
  Pro (tier 1, apiAccess=false) can no longer mint keys — only API_STARTER+.
- Wire wm_ user keys through standalone edge routes (shipping/route-intelligence,
  shipping/webhooks) that were short-circuiting on validateApiKey before async
  Convex validation could run.
- Restore fail-soft behavior in validateUserApiKey: transient Convex/network
  errors degrade to unauthorized instead of bubbling a 500.
- Fail-closed on cache invalidation endpoint: ownership check errors now return
  503 instead of silently proceeding (surfaces Convex outages in logs).
- Tests updated: positive paths use api_starter (apiAccess=true), new test locks
  Pro-without-API-access rejection. 23 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(webhooks): remove wm_ user key fallback from shipping webhooks

Webhook ownership is keyed to SHA-256(apiKey) via callerFingerprint(),
not to the user. With user-owned keys (up to 5 per user), this causes
cross-key blindness (webhooks invisible when calling with a different
key) and revoke-orphaning (revoking the creating key makes the webhook
permanently unmanageable). User keys remain supported on the read-only
route-intelligence endpoint. Webhook ownership migration to userId will
follow in a separate PR.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Elie Habib <elie.habib@gmail.com>
2026-04-17 07:20:39 +04:00
Elie Habib
935417e390 chore(relay): socialVelocity + wsbTickers to hourly fetch (6x Reddit traffic reduction) (#3135)
* chore(relay): socialVelocity + wsbTickers to hourly fetch (was 10min)

Reduce Reddit rate-limiting blast radius. Both seeders fetch 5 subreddits
combined (2 for SV: worldnews, geopolitics; 3 for WSB: wallstreetbets,
stocks, investing) with no proxy or OAuth. Reddit's behavioral heuristic
for datacenter IPs consistently flags the Railway IP after ~50min of
10-min polling and returns HTTP 403 on every subsequent cycle until the
container restarts with a new IP.

Evidence (2026-04-16 ais-relay log):
  13:32-14:22 UTC: 6 successful 10-min cycles for both seeders
  16:06-16:16 UTC: 2 more successful cycles after a restart
  16:26 UTC:       BOTH subs flip to HTTP 403 simultaneously
  16:36, 16:46, 16:56: every cycle, all 5 subreddits return 403

Dropping success-path frequency from 6/hour to 1/hour cuts the traffic
Reddit's heuristic sees by 6x. On the failure path the 20-min retry is kept
as-is — during a block we've already been flagged, so extra retries don't
make it worse.

Changes:
- SOCIAL_VELOCITY_INTERVAL_MS:   10min → 60min
- SOCIAL_VELOCITY_TTL:           30min → 3h   (3× new interval)
- WSB_TICKERS_INTERVAL_MS:       10min → 60min
- WSB_TICKERS_TTL:               30min → 3h   (3× new interval)
- api/health.js maxStaleMin:     30min → 180min for both (3× interval)
- api/seed-health.js intervalMin: 15 → 90 for wsb-tickers (maxStaleMin / 2)

Proper fix (proxy fallback or Reddit OAuth) deferred.

* fix(seed-health): add socialVelocity parity entry — greptile P2

Review finding on PR #3135: wsbTickers was bumped from intervalMin=15 to 90
but socialVelocity had no seed-health.js entry at all. Both Reddit seeders
now share the same 60-min cadence; adding the missing entry gives parity.

P2-1 (malformed comment lines 5682-5683) is a false positive — verified
the lines do start with '//' in the file.
2026-04-16 22:17:58 +04:00
Elie Habib
0075af5a47 fix(sector-valuations): proxy Yahoo quoteSummary via Decodo curl egress (#3134)
* fix(sector-valuations): proxy Yahoo quoteSummary via Decodo curl egress

Yahoo's /v10/finance/quoteSummary returns HTTP 401 from Railway container
IPs. Railway logs 2026-04-16 show all 12 sector ETFs failing every 5-min
cron:

  [Sector] Yahoo quoteSummary XLK HTTP 401  (x12 per tick)
  [Market] Seeded 12/12 sectors, 0 valuations

Add a curl-based proxy fallback that matches scripts/_yahoo-fetch.mjs:
hit us.decodo.com (curl egress pool) NOT gate.decodo.com (CONNECT egress
pool). Per the 2026-04-16 probe documented in _yahoo-fetch.mjs header,
Yahoo blocks Decodo's CONNECT egress IPs but accepts the curl egress.
Reusing ytFetchViaProxy here would keep failing silently.

Shares the existing _yahooProxyFailCount / _yahooProxyCooldownUntil
cooldown state with fetchYahooChartDirect so both Yahoo paths pause
together if Decodo's curl pool also gets blocked.

No change to direct-path behavior when Yahoo is healthy.

* fix(sector-valuations): don't proxy on empty quoteSummary result (review)

Direct 200 with data.quoteSummary.result[0] absent is an app-level "no
data for this symbol" signal (e.g. delisted ETF). Proxy won't return
different data for a symbol Yahoo itself doesn't carry — falling back
would burn the 5-failure cooldown budget on structurally empty symbols
and mask a genuine proxy outage.

Resolve null on !result; keep JSON.parse catch going to proxy (garbage
body IS a transport-level signal — captive portal, Cloudflare challenge).

Review feedback from PR #3134.

* fix(sector-valuations): split cooldown per egress route, cover transport failures (review)

Review feedback on PR #3134, both P1.

P1 #1 — transport failures bypassed cooldown
  execFileSync timeouts, proxy-connect refusals, and JSON.parse on garbage
  bodies all went through the catch block and returned null without
  ticking _yahooProxyFailCount. In the exact failure mode this PR hardens
  against, the relay would have thrashed through 12 × 20s curl attempts
  per tick with no backoff. Extract a bumpCooldown() helper and call it
  from both the non-2xx branch and the catch block.

P1 #2 — two Decodo egress pools shared one cooldown budget
  fetchYahooChartDirect uses CONNECT via gate.decodo.com.
  _yahooQuoteSummaryProxyFallback uses curl via us.decodo.com.
  These are independent egress IP pools — per the 2026-04-16 probe,
  Yahoo blocks CONNECT but accepts curl. Sharing cooldown means 5
  CONNECT failures suppress the healthy curl path (and vice versa).
  Split into _yahooConnectProxy* (chart) and _yahooCurlProxy* (sector
  valuations).

Also: on proxy 200 with empty result, reset the curl counter. The route
is healthy even if this specific symbol has no data — don't pretend it's
a failure.
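
The per-route split plus the bumpCooldown() helper, sketched (the 5-failure
budget is from the message; the cooldown duration is an assumption):

```javascript
const FAIL_LIMIT = 5;               // failures before pausing a route
const COOLDOWN_MS = 30 * 60 * 1000; // pause duration (illustrative)

// One independent state object per Decodo egress route.
function makeRoute() {
  return { failCount: 0, cooldownUntil: 0 };
}
const connectRoute = makeRoute(); // gate.decodo.com (CONNECT, chart fetch)
const curlRoute = makeRoute();    // us.decodo.com   (curl, quoteSummary)

// Called from BOTH the non-2xx branch and the catch block, so transport
// failures (timeouts, connect refusals, garbage bodies) also tick the counter.
function bumpCooldown(route, now = Date.now()) {
  route.failCount += 1;
  if (route.failCount >= FAIL_LIMIT) route.cooldownUntil = now + COOLDOWN_MS;
}

function routeAvailable(route, now = Date.now()) {
  return now >= route.cooldownUntil;
}

// Five CONNECT failures pause only the CONNECT route; curl stays healthy.
for (let i = 0; i < 5; i++) bumpCooldown(connectRoute, 1000);
console.log(routeAvailable(connectRoute, 2000)); // false
console.log(routeAvailable(curlRoute, 2000));    // true
```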

* fix(sector-valuations): non-blocking curl + settle guard (review round 3)

Review feedback on PR #3134, both P1.

P1 #1 - double proxy invocation on timeout/error race
  req.destroy() inside the timeout handler can still emit 'error', and
  both handlers eagerly called resolve(_yahooQuoteSummaryProxyFallback(...)).
  A single upstream timeout therefore launched two curl subprocesses,
  double-ticked the cooldown counter, and blocked twice. Add a settled
  flag; settle() exits early on the second handler before evaluating the
  fallback.
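
The settle guard takes a thunk so the second handler bails out before the
fallback is even evaluated; a minimal sketch with a fake request that fires
both timeout and error (the race in P1 #1):

```javascript
// settle() receives a thunk, not a value: the early return on the second
// handler happens BEFORE the fallback is evaluated, so only one curl
// subprocess is ever launched for a single upstream failure.
function fetchWithGuard(startRequest, fallback) {
  return new Promise((resolve) => {
    let settled = false;
    const settle = (produce) => {
      if (settled) return; // second handler exits here, fallback untouched
      settled = true;
      resolve(produce());  // resolve(promise) chains through the outer Promise
    };
    startRequest({
      onTimeout: () => settle(() => fallback()),
      onError: () => settle(() => fallback()),
    });
  });
}

// Fake request where destroy-on-timeout also emits 'error'.
let fallbackCalls = 0;
fetchWithGuard(
  (h) => { h.onTimeout(); h.onError(); },
  () => { fallbackCalls += 1; return Promise.resolve('proxied'); }
).then((v) => console.log(v, fallbackCalls)); // 'proxied' 1
```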

P1 #2 - execFileSync blocks the relay event loop
  The relay serves HTTP/WS traffic on the same thread that awaits
  seedSectorSummary's per-symbol Yahoo fetch. execFileSync for up to 20s
  per failure x 5 failures before cooldown = ~100s of frozen event loop.
  Switch to promisify(execFile). resolve(promise) chains the Promise
  through fetchYahooQuoteSummary's outer Promise, so the main-loop await
  yields while curl runs. Other traffic continues during the fetch.

tests/sector-valuations.test.mjs: bump the static-analysis window from
1500 to 2000 chars so the field-extraction markers (ytdReturn etc.)
stay inside the window after the settle guard was added.
2026-04-16 20:02:31 +04:00
Elie Habib
7d27cec21c feat(relay): seeder-loop heartbeats for chokepoint-flows + climate-news (#3133)
* feat(relay): seeder-loop heartbeats for chokepoint-flows + climate-news

Detect silent relay-loop failures (ERR_MODULE_NOT_FOUND at import, event-loop
blocked, container restart loop) up to 4 hours earlier than the data-level
seed-meta staleness window.

The chokepoint-flows bug that motivated this PR was invisible in health for
32 hours because each 6h cron tick fired, execFile'd the child, child died
at import, and NO ONE updated seed-meta:energy:chokepoint-flows. Since the
last successful write was still within its 3-day TTL, the data key was
present and the old seed-meta was still there — STALE_SEED triggered only
at +12h, and even then was a warn (not crit) that could easily be missed.

Fix:
- In scripts/ais-relay.cjs, write a success-only heartbeat via upstashSet
  after each execFile-spawned seeder exits cleanly. TTL = 3x the loop
  interval (18h for chokepoint-flows, 90min for climate-news) so a single
  missed cycle doesn't flap but two consecutive misses alarm.
  Payload shape matches seed-meta for drop-in compatibility with the
  existing health-check reader: { fetchedAt, recordCount, durMs }.

- In api/health.js, register two new STANDALONE_KEYS entries pointing at
  the heartbeat keys, plus SEED_META entries with tighter maxStaleMin:
    chokepointFlowsRelayHeartbeat: 480min (8h vs 720min existing)
    climateNewsRelayHeartbeat:     60min  (vs 90min existing)
  When the relay loop fails for >2 intervals, the heartbeat goes stale
  first and surfaces as STALE_SEED in /api/health, giving 4h more notice
  than waiting for seed-meta:energy:chokepoint-flows.

This is orthogonal to PR #3132 (fixes the actual ERR_MODULE_NOT_FOUND root
cause). Heartbeat is defensive observability for the NEXT failure mode we
can't predict.
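
The success-only heartbeat write amounts to (the upstashSet signature is
assumed; payload fields and the 3x-interval TTL rule are from the message):

```javascript
// TTL = 3x the loop interval: one missed cycle doesn't flap the alarm,
// two consecutive misses let the key expire and surface STALE_SEED.
function heartbeatTtlSec(intervalMs) {
  return Math.round((intervalMs * 3) / 1000);
}

// Hypothetical write helper; the real relay calls its private upstashSet
// after each execFile'd seeder exits cleanly.
async function writeHeartbeat(upstashSet, key, intervalMs, recordCount, durMs) {
  const payload = { fetchedAt: new Date().toISOString(), recordCount, durMs };
  await upstashSet(key, JSON.stringify(payload), heartbeatTtlSec(intervalMs));
}

console.log(heartbeatTtlSec(6 * 60 * 60 * 1000)); // 64800s = 18h for chokepoint-flows
console.log(heartbeatTtlSec(30 * 60 * 1000));     // 5400s  = 90min for climate-news
```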

* fix(health): gate new relay heartbeat keys as ON_DEMAND during deploy window — greptile review

Review finding on PR #3133: new heartbeat keys (relay:heartbeat:chokepoint-flows,
relay:heartbeat:climate-news) are written by ais-relay.cjs AFTER the first
successful post-deploy loop. Vercel deploys api/health.js instantly, so the
window between 'merge' and 'first heartbeat written' is:
  - chokepoint-flows: up to 6h (initial loop tick)
  - climate-news:     up to 30min

During that window the heartbeat keys don't exist in Redis. classifyKey()
would return EMPTY (crit), which counts toward critCount and can flip overall
/api/health to DEGRADED even though climateNews and chokepointFlows data
themselves are fine.

Matches existing rule in project memory
(feedback_health_required_key_needs_railway_cron_first.md) — new seeder +
health.js registration in same PR needs ON_DEMAND gating until the Railway
side catches up, then harden after ~7 days.

Fix: add both keys to ON_DEMAND_KEYS with TRANSITIONAL comments, matching
the fxYoy / hyperliquidFlow pattern already used for the same issue.
2026-04-16 18:21:51 +04:00
Elie Habib
7381a90a44 fix(sentry): guard ConvexClient on Firefox 149/Linux + filter Quark noise (#3130)
* fix(sentry): guard ConvexClient construct on Firefox 149/Linux + filter Quark browser noise

ConvexClient import throws `TypeError: t is not a constructor` on Firefox 149/Linux
(WORLDMONITOR-N0 + MX, 5 events / 2 users). Breadcrumbs proved both entitlements
and billing subscription watchers were failing on `new CC(convexUrl)`. Wraps the
constructor in try/catch so getConvexClient returns null — callers already have
the null path wired for the no-VITE_CONVEX_URL case, so subscription features
silently no-op instead of error-bubbling into Sentry via billing.ts:71.

Also filters Quark browser (Alibaba mobile) touch-tracking injection that sets
`bodyTouched` on undefined — property name has 0 matches in repo (WORLDMONITOR-N1).

* fix(convex-client): reset authReadyPromise on constructor failure

Addresses greptile-apps review on PR #3130: on the catch path, authReadyPromise
had just been set to a never-resolving Promise at function entry. Without this
reset, any future waitForConvexAuth() caller that doesn't pre-check the client
is non-null would silently block for the full 10s timeout.
2026-04-16 17:28:31 +04:00
Elie Habib
c31662c3c9 fix(relay): COPY missing _seed-envelope-source + _seed-contract — chokepointFlows stale 32h (#3132)
* fix(relay): COPY _seed-envelope-source + _seed-contract into Dockerfile.relay

Root cause of chokepointFlows STALE_SEED (1911min stale, maxStaleMin=720):
since 2026-04-14 (PR #3097/#3101 landing), scripts/_seed-utils.mjs imports
_seed-envelope-source.mjs and _seed-contract.mjs. Dockerfile.relay COPY'd
_seed-utils.mjs but NOT its new transitive dependencies, so every execFile
invocation of seed-chokepoint-flows.mjs, seed-climate-news.mjs, and
seed-ember-electricity.mjs crashed at import with ERR_MODULE_NOT_FOUND.
The ais-relay loop kept firing every 6h but each child died instantly —
no visible error because execFile only surfaces child stderr to the
parent relay's log stream.

Local repro: node scripts/seed-chokepoint-flows.mjs runs fine in 3.6s
and writes 7 records. Same command inside the relay container would
throw at the import line because the file doesn't exist.

Fix:
1. Add COPY scripts/_seed-envelope-source.mjs and
   COPY scripts/_seed-contract.mjs to Dockerfile.relay.
2. Add a static guard test (tests/dockerfile-relay-imports.test.mjs)
   that walks the transitive-import graph (BFS) from every COPY'd
   entrypoint and fails if any reachable scripts/*.mjs|cjs file isn't
   also COPY'd. This
   would have caught the original regression.

Matches feedback_dockerfile_relay_explicit_copy.md — we now have a test
enforcing it.

* fix(test): scanner also covers require() and createRequire(...)(...) — greptile P2

Review finding on PR #3132: collectRelativeImports only matched ESM
import/export syntax, so require('./x.cjs') in ais-relay.cjs and
createRequire(import.meta.url)('./x.cjs') in _seed-utils.mjs were
invisible to the guard. No active bug (_proxy-utils.cjs is already
COPY'd) but a future createRequire pointing at a new uncopied helper
would slip through.

Two regexes now cover both forms:
- cjsRe: direct require('./x') — with a non-identifier lookbehind so
  'thisrequire(' or 'foorequire(' can't match.
- createRequireRe: createRequire(...)('./x') chained-call — the outer
  call is applied to createRequire's return value, not to a 'require('
  token, so the first regex misses it on its own.

Added a unit test asserting both forms resolve on known sites
(_seed-utils.mjs and ais-relay.cjs) so the next edit to this file
can't silently drop coverage.
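A minimal sketch of the three import forms the guard test must recognize. The function name mirrors the commit (collectRelativeImports); the exact production regexes may differ in detail:

```javascript
// ESM: import/export ... from './x'
const esmRe = /\b(?:import|export)\s[^'"]*['"](\.\.?\/[^'"]+)['"]/g;
// CJS: require('./x') — non-identifier lookbehind so 'thisrequire(' or
// 'foorequire(' can't match.
const cjsRe = /(?<![\w$])require\(\s*['"](\.\.?\/[^'"]+)['"]\s*\)/g;
// createRequire(...)('./x') — the outer call applies to createRequire's
// RETURN value, not a bare 'require(' token, so cjsRe alone misses it.
const createRequireRe = /createRequire\([^)]*\)\(\s*['"](\.\.?\/[^'"]+)['"]\s*\)/g;

function collectRelativeImports(source) {
  const found = new Set();
  for (const re of [esmRe, cjsRe, createRequireRe]) {
    for (const match of source.matchAll(re)) found.add(match[1]);
  }
  return [...found];
}
```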
2026-04-16 17:28:16 +04:00
Elie Habib
d1a3fdffed fix(portwatch): unblock port-activity seeder (global EP4 refs, conc 12, progress logs, SIGTERM) (#3128)
* fix(portwatch): global EP4 refs + concurrency 12 + progress logs + SIGTERM cleanup

seed-portwatch-port-activity has been SIGKILL'd at the Railway 10-min
container ceiling on every run since 2026-04-14 (recordCount=174,
seedAgeMin=3096 = 51.6h = 4+ failed cycles), leaving portwatchPortActivity
STALE_SEED and the 30-min lock leaking between runs.

Root cause: ~240 ISO3s x 2 per-country ArcGIS queries at CONCURRENCY=4
with zero per-batch logging — slow enough to miss the 420s timeoutMs
and silent enough that the timeout line was the only log on failure.

Fixes (all 4):
1. fetchAllPortRefs(): one paginated EP4 query (where=1=1), grouped by
   ISO3 locally — collapses ~240 ref calls into ~5 pages.
2. CONCURRENCY 4 -> 12 and only queue activity fetches for iso3s that
   appear in refsByIso3 and in the iso3->iso2 map.
3. Per-page ref logs + per-batch activity logs every 5 batches — next
   failure will show exactly where it stalls.
4. SIGTERM/SIGINT handler releases the lock and extends prev-snapshot
   TTLs before exit so the next cron tick isn't blocked and Redis data
   doesn't evaporate when the bundle-runner kills the child.

* fix(portwatch): advance pagination by actual features.length, not PAGE_SIZE

Review finding on PR #3128: the new fetchAllPortRefs() global pager
assumes the server honors resultRecordCount=2000, but ArcGIS's
PortWatch_ports_database FeatureServer caps responses at 1000 rows.
Incrementing offset by PAGE_SIZE (=2000) silently skipped rows
1000-1999 on EVERY page.

Verified against the live endpoint:
- returnCountOnly: 2065
- offset=0 size=2000: returned 1000 (ETL=true)
- offset=1000 size=2000: returned 1000 (ETL=true)
- offset=2000 size=2000: returned 65 (ETL=false)

The buggy loop therefore loaded 1065 refs instead of 2065 — silently
dropping 34 mapped countries with EP3 activity entirely and leaving
110 more countries with partial ref coverage. Partial coverage fell
back to (0,0) lat/lon via `refMap.get(portId) || { lat: 0, lon: 0 }`.
Not a regression from the old code (per-country EP4 fetches maxed
at ~148 ports and never hit the cap), but a real bug introduced by
the global pager.

Fix: advance `offset` by `features.length` (the actual returned count)
instead of by `PAGE_SIZE`. Applied to both fetchAllPortRefs (EP4) and
fetchActivityRows (EP3) for consistency. Added break-on-empty guard so
a server that returns exceededTransferLimit=true with 0 features can't
infinite-loop.

Regression test asserts `offset += features.length` appears for both
paginators and `offset += PAGE_SIZE` appears nowhere in the file.
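The corrected pager shape can be sketched as follows. fetchPage stands in for a single ArcGIS query (resultOffset/resultRecordCount); the field names match ArcGIS's response shape, the rest is illustrative:

```javascript
async function fetchAllPages(fetchPage, pageSize = 2000) {
  const all = [];
  let offset = 0;
  for (;;) {
    const { features, exceededTransferLimit } = await fetchPage(offset, pageSize);
    // Break-on-empty guard: a server returning exceededTransferLimit=true
    // with 0 features must not infinite-loop.
    if (!features.length) break;
    all.push(...features);
    if (!exceededTransferLimit) break;
    // Advance by the ACTUAL returned count, not by pageSize — the server
    // may cap responses (here: 1000 rows) below the requested size.
    offset += features.length;
  }
  return all;
}
```

With a server capped at 1000 rows and 2065 total, this walks offsets 0/1000/2000 and returns all 2065 rows, where `offset += pageSize` would have skipped rows 1000-1999 on every page.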
2026-04-16 15:51:21 +04:00
Elie Habib
5093d82e45 fix(seed-bundle-resilience): drop Resilience-Scores interval 6h to 2h so refresh runs (#3126)
Live log 2026-04-16 09:25 showed the bundle runner SKIPPING Resilience-Scores
(last seeded 203min ago, interval 360min -> 288min skip threshold). Every
Railway cron fire within the 4.8h skip window bypassed the section entirely,
so refreshRankingAggregate() -- the whole point of the Slice B work merged in
#3124 -- never ran. Ranking could then silently expire in the gap.

Lower intervalMs to 2h. The bundle runner skip threshold becomes 96min;
hourly Railway fires run the section about every 2h. Well within the 12h
ranking TTL, and cheap per warm-path run:

  - computeAndWriteIntervals (~100ms local CPU + one pipeline write)
  - refreshRankingAggregate -> /api/resilience/v1/get-resilience-ranking?refresh=1
    (handler recompute + 2-SET pipeline, ~2-5s)
  - STRLEN + GET-meta verify in parallel (~200ms)

Total ~5-10s per warm-scores run. The expensive 222-country warm still only
runs when scores are actually missing.

Structural test pins intervalMs <= 2 hours so this doesn't silently regress.

Full resilience suite: 378/378.
2026-04-16 13:41:28 +04:00
Elie Habib
da1fa3367b fix(resilience-ranking): chunked warm SET, always-on rebuild, truthful meta (Slice B) (#3124)
* fix(resilience-ranking): chunked warm SET, always-on rebuild, truthful meta

Slice B follow-up to PR #3121. Three coupled production failures observed:

1. Per-country score persistence works (Slice A), but the 222-SET single
   pipeline body (~600KB) exceeds REDIS_PIPELINE_TIMEOUT_MS (5s) on Vercel
   Edge. runRedisPipeline returns []; persistence guard correctly returns
   empty; coverage = 0/222 < 75%; ranking publish silently dropped. Live
   Railway log: "Ranking: 0 ranked, 222 greyed out" → "Rebuilt … with 222
   countries (bulk-call race left ranking:v9 null)" — second call only
   succeeded because Upstash had finally caught up between attempts.

2. The seeder's probe + rebuild block lives inside `if (missing > 0)`. When
   per-country scores survive a cron tick (TTL 6h, cron every 6h), missing=0
   and the rebuild path is skipped. Ranking aggregate then expires alone and
   is never refreshed until scores also expire — multi-hour gaps where
   `resilience:ranking:v9` is gone while seed-meta still claims freshness.

3. `writeRankingSeedMeta` fires whenever finalWarmed > 0, regardless of
   whether the ranking key is actually present. Health endpoint sees fresh
   meta + missing data → EMPTY_ON_DEMAND with a misleading seedAge.

Fixes:
- _shared.ts: split the warm pipeline SET into SET_BATCH=30-command chunks
  so each pipeline body fits well under timeout. Pad missing-batch results
  with empty entries so the per-command alignment stays correct (failed
  batches stay excluded from `warmed`, no proof = no claim).
- seed-resilience-scores.mjs: extract `ensureRankingPresent` helper, call
  it from BOTH the missing>0 and missing===0 branches so the ranking gets
  refreshed every cron. Add a post-rebuild STRLEN verification — rebuild
  HTTP can return 200 with a payload but still skip the SET (coverage gate,
  pipeline failure).
- main(): only writeRankingSeedMeta when result.rankingPresent === true.
  Otherwise log and let the next cron retry.

Tests:
- resilience-ranking.test.mts: assert pipelines stay ≤30 commands.
- resilience-scores-seed.test.mjs: structural checks that the rebuild is
  hoisted (≥2 callsites of ensureRankingPresent), STRLEN verification is
  present, and meta write is gated on rankingPresent.

Full resilience suite: 373/373 pass (was 370 — 3 new tests).

* fix(resilience-ranking): seeder no longer writes seed-meta (handler is sole writer)

Reviewer P1: ensureRankingPresent() returning true only means the ranking
key exists in Redis — not that THIS cron actually wrote it. The handler
skips both the ranking SET and the meta SET when coverage < 75%, so an
older ranking from a prior cron can linger while this cron's data didn't
land. Under that scenario, the previous commit still wrote a fresh
seed-meta:resilience:ranking, recreating the stale-meta-over-stale-data
failure this PR is meant to eliminate.

Fix: remove seeder-side seed-meta writes entirely. The ranking handler
already writes ranking + meta atomically in the same pipeline when (and
only when) coverage passes the gate. ensureRankingPresent() triggers the
handler every cron, which addresses the original rationale for the seeder
heartbeat (meta going stale during quiet Pro usage) without the seeder
needing to lie.

Consequence on failure:
- Coverage gate trips → handler writes neither ranking nor meta.
- seed-meta stays at its previous timestamp; api/health reports accurate
  staleness (STALE_SEED after maxStaleMin, then CRIT) instead of a fresh
  meta over stale/empty data.

Tests updated: the "meta gated on rankingPresent" assertion is replaced
with "seeder must not SET seed-meta:resilience:ranking" + "no
writeRankingSeedMeta". Comments may still reference the key name for
maintainer clarity — the assertion targets actual SET commands.

Full resilience suite: 373/373 pass.

* fix(resilience-ranking): always refresh + 12h TTL (close timing hole)

Reviewer P1+P2:

- P1: ranking TTL == cron interval (both 6h) left a timing hole. If a cron
  wrote the key near the end of its run and the next cron fired near the
  start of its interval, the key was still alive at probe time →
  ensureRankingPresent() returned early → no rebuild → key expired a short
  while later and stayed absent until a cron eventually ran while the key
  was missing. Multi-hour EMPTY_ON_DEMAND gaps.

- P2: probing only the ranking data key (not seed-meta) meant a partial
  handler pipeline (ranking SET ok, meta SET missed) would self-heal only
  when the ranking itself expired — never during its TTL window.

Fix:

1. Bump RESILIENCE_RANKING_CACHE_TTL_SECONDS from 6h to 12h (2x cron
   interval). A single missed or slow cron no longer causes a gap.
   Server-side and seeder-side constants kept in sync.

2. Replace ensureRankingPresent() with refreshRankingAggregate(): drop the
   'if key present, skip' short-circuit. Rebuild every cron, unconditionally.
   One cheap HTTP call keeps ranking + seed-meta rolling forward together
   and self-heals the partial-pipeline case — handler retries the atomic
   pair every 6h regardless of whether the keys are currently live.

3. Update health.js comment to reflect the new TTL and refresh cadence
   (12h data TTL, 6h refresh, 12h staleness threshold = 2 missed ticks).

Tests:
- RESILIENCE_RANKING_CACHE_TTL_SECONDS asserts 12h (was 6h).
- New assertion: refreshRankingAggregate must NOT early-return on probe-
  hit, and the rebuild HTTP call must be unconditional in its body.
- DEL-guard test relaxed to allow comments between '{' and the DEL line
  (structural property preserved).

Full resilience suite: 375/375.

* fix(resilience-ranking): parallelize warm batches + atomic rebuild via ?refresh=1

Reviewer P2s:

- Warm path serialized the 8 batch pipelines with `await` in a for-loop,
  adding ~7 extra Upstash round-trips (100-500ms each on Edge) to the warm
  wall-clock. Batches are independent; Promise.all collapses them into one
  slowest-batch window.

- DEL+rebuild created a brief absence window: if the rebuild request failed
  transiently, the ranking stayed absent until the next cron. Now seeder
  calls `/api/resilience/v1/get-resilience-ranking?refresh=1` and the
  handler bypasses its cache-hit early-return, recomputing and SETting
  atomically. On rebuild failure, the existing (possibly stale-but-present)
  ranking is preserved instead of being nuked.

Handler: read ctx.request.url for the refresh query param; guard the URL
parse with try/catch so an unparseable url falls back to the cached-first
behavior.

Tests:
- New: ?refresh=1 must bypass the cache-hit early-return (fails on old code,
  passes now).
- DEL-guard test replaced with 'does NOT DEL' + 'uses ?refresh=1'.
- Batch chunking still asserted at SET_BATCH=30.

Full resilience suite: 376/376.
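A minimal sketch combining the SET_BATCH=30 chunking from the first commit with the Promise.all parallelization above. runRedisPipeline is a stand-in for the Upstash REST pipeline call; the names and shapes are assumptions, not the real _shared.ts API:

```javascript
const SET_BATCH = 30;

async function runChunkedPipeline(commands, runRedisPipeline) {
  const chunks = [];
  for (let i = 0; i < commands.length; i += SET_BATCH) {
    chunks.push(commands.slice(i, i + SET_BATCH));
  }
  // Chunks are independent — issue them in parallel (one slowest-batch
  // window instead of N serial round-trips).
  const settled = await Promise.all(
    chunks.map((chunk) => runRedisPipeline(chunk).catch(() => []))
  );
  const results = [];
  settled.forEach((res, i) => {
    // Pad failed/short chunk results with null so results[j] stays aligned
    // with commands[j]; null is not OK, so those writes are never counted
    // as warmed (no proof = no claim).
    for (let j = 0; j < chunks[i].length; j++) results.push(res[j] ?? null);
  });
  return results;
}
```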

* fix(resilience-ranking): bulk-warm call also needs ?refresh=1 (asymmetric TTL hazard)

Reviewer P1: in the 6h-12h window, per-country score keys have expired
(TTL 6h) but the ranking aggregate is still alive (TTL 12h). The seeder's
bulk-warm call was hitting get-resilience-ranking without ?refresh=1, so
the handler's cache-hit early-return fired and the entire warm path was
skipped. Scores stayed missing; coverage degraded; the only recovery was
the per-country laggard loop (5-request batches) — which silently no-ops
when WM_KEY is absent. This defeated the whole point of the chunked bulk
warm introduced in this PR.

Fix: the bulk-warm fetch at scripts/seed-resilience-scores.mjs:167 now
appends ?refresh=1, matching the rebuild call. Every seeder-initiated hit
on the ranking endpoint forces the handler to route through
warmMissingResilienceScores and its chunked pipeline SET, regardless of
whether the aggregate is still cached.

Test extended: structural assertion now scans ALL occurrences of
get-resilience-ranking in the seeder and requires every one of them to
carry ?refresh=1. Fails the moment a future change adds a bare call.

Full resilience suite: 376/376.

* fix(resilience-ranking): gate ?refresh=1 on seed key + detect partial pipeline publish

Reviewer P1: ?refresh=1 was honored for any caller — including valid Pro
bearer tokens. A full warm is ~222 score computations + chunked pipeline
SETs; a Pro user looping on refresh=1 (or an automated client) could DoS
Upstash quota and Edge budget. Gate refresh behind
WORLDMONITOR_VALID_KEYS / WORLDMONITOR_API_KEY (X-WorldMonitor-Key
header) — the same allowlist the cron uses. Pro bearer tokens get the
standard cache-first path; refresh requires the seed service key.

Reviewer P2: the handler's atomic runRedisPipeline SET of ranking + meta
is non-transactional on Upstash REST — either SET can fail independently.
If the ranking landed but meta missed, the seeder's STRLEN verify would
pass (ranking present) while /api/health stays stuck on stale meta.

Two-part fix:
- Handler inspects pipelineResult[0] and [1] and logs a warning when
  either SET didn't return OK. Ops-greppable signal.
- Seeder's verify now checks BOTH keys in parallel: STRLEN on ranking
  data, and GET + fetchedAt freshness (<5min) on seed-meta. Partial
  publish logs a warning; next cron retries (SET is idempotent).

Tests:
- New: ?refresh=1 without/with-wrong X-WorldMonitor-Key must NOT trigger
  recompute (falls back to cached response). Existing bypass test updated
  to carry a valid seed key header.

Full resilience suite: 376/376 + 1 new = 377/377.
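The refresh gate described above can be sketched roughly as below. The header and env-var names are taken from the commit message; the handler plumbing around them is an assumption:

```javascript
function refreshAllowed(request, env) {
  let url = null;
  try { url = new URL(request.url); } catch { /* unparseable → cached-first */ }
  if (!url || url.searchParams.get('refresh') !== '1') return false;
  const key = request.headers.get('X-WorldMonitor-Key');
  const allowlist = (env.WORLDMONITOR_VALID_KEYS || env.WORLDMONITOR_API_KEY || '')
    .split(',').map((s) => s.trim()).filter(Boolean);
  // Pro bearer tokens are NOT on this list — only the seed service key
  // may force a full 222-country recompute.
  return Boolean(key) && allowlist.includes(key);
}
```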
2026-04-16 12:48:41 +04:00
Elie Habib
3c1caa75e6 feat(gdelt): _gdelt-fetch helper with curl-multi-retry proxy + seed-gdelt-intel migration (#3122)
* feat(_gdelt-fetch): curl proxy multi-retry helper for per-IP-throttled API

GDELT (api.gdeltproject.org) is a public free API with strict per-IP
throttling. seed-gdelt-intel currently has no proxy fallback — Railway
egress IPs hit 429 storms and the seeder degrades.

Probed 2026-04-16: Decodo curl egress against GDELT gives ~40% success
per attempt (session-rotates IPs per call). Helper retries up to 5×;
expected overall success ~92% (1 - 0.6^5).

PROXY STRATEGY — CURL ONLY WITH MULTI-RETRY

Differs from _yahoo-fetch.mjs (single proxy attempt) and
_open-meteo-archive.mjs (CONNECT + curl cascade):
- Curl-only: CONNECT not yet probed cleanly against GDELT.
- Multi-retry on the curl leg: the proxy IS the rotation mechanism
  (each call → different egress IP), so successive attempts probe
  different IPs in the throttle pool.
- Distinguishes retryable (HTTP 429/503 from upstream) from
  non-retryable (parse failure, auth, network) — bails immediately on
  non-retryable to avoid 5× of wasted log noise.

Direct loop uses LONGER backoff than Yahoo's 5s base (10s) — GDELT's
throttle window is wider than Yahoo's, so quick retries usually re-hit
the same throttle.

Tests (tests/gdelt-fetch.test.mjs, 13 cases — every learning from
PR #3118 + #3119 + #3120 baked in):

- Production defaults: curl resolver/fetcher reference equality
- Production defaults: NO CONNECT leg (regression guard for unverified path)
- 200 OK passthrough
- 429 with no proxy → throws with HTTP 429 in message
- Retry-After parsed (DI _sleep capture asserts 7000ms not retryBaseMs)
- Retry-After absent → linear backoff retryBaseMs (paired branch test)
- **Proxy multi-retry: 4× HTTP 429 then 5th succeeds → returns data**
  (asserts 5 proxy calls + 4 inter-proxy backoffs of proxyRetryBaseMs)
- **Proxy non-retryable (parse failure) bails after 1 attempt**
  (does NOT burn all proxyMaxAttempts on a structural failure)
- **Proxy retryable + non-retryable mix: retries on 429, bails on parse**
- Thrown fetch error on final retry → proxy multi-retry runs (P1 guard)
- All proxy attempts fail → throws with 'X/N attempts' in message + cause
- Malformed JSON does NOT emit succeeded log before throw (P2 guard)
- parseRetryAfterMs unit
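A minimal parser consistent with the Retry-After tests above ('7' asserts a 7000ms sleep); the exported parseRetryAfterMs may differ in detail:

```javascript
function parseRetryAfterMs(headerValue) {
  if (!headerValue) return null;
  const seconds = Number(headerValue);
  // delta-seconds form: '7' → 7000ms
  if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
  // Retry-After also permits an HTTP-date
  const httpDate = Date.parse(headerValue);
  if (!Number.isNaN(httpDate)) return Math.max(0, httpDate - Date.now());
  return null; // absent/unparseable → caller falls back to its own backoff
}
```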

Verification:
- tests/gdelt-fetch.test.mjs → 13/13 pass
- node --check scripts/_gdelt-fetch.mjs → clean

Phase 1 of 2. Seeder migration follows.

* feat(seed-gdelt-intel): migrate to _gdelt-fetch helper

Replaces direct fetch + ad-hoc retry in seed-gdelt-intel with the new
fetchGdeltJson helper. Each topic call now gets:
  3 direct retries (10/20/40s backoff) → 5 curl proxy attempts via
  Decodo session-rotating egress.

Specific changes:
- import fetchGdeltJson from _gdelt-fetch.mjs
- fetchTopicArticles: replace fetch+retry+throw block with single
  await fetchGdeltJson(url, { label: topic.id })
- fetchTopicTimeline: same — best-effort try/catch returns [] on any
  failure (preserved). Helper still attempts proxy fallback before
  throwing, so a 429-throttled IP doesn't kill the timeline.
- fetchWithRetry: collapsed from outer 3-retry loop with 60/120/240s
  backoff (which would have multiplied to 24 attempts/topic on top of
  helper's 8) to a thin wrapper that translates exhaustion into the
  {exhausted, articles:[]} shape the caller uses to drive
  POST_EXHAUST_DELAY_MS cooldown.
- Drop CHROME_UA import (no longer used directly; helper handles it).

Helper's exhausted-throw includes 'HTTP 429' substring when 429 was
the upstream signal, so the existing is429 detection in
fetchWithRetry continues to work without modification.

Verification:
- node --check scripts/seed-gdelt-intel.mjs → clean
- npm run typecheck:all → clean
- npm run test:data → 5382/5382 (was 5363, +13 from helper + 6 from
  prior PR work)

Phase 2 of 2.

* fix(_gdelt-fetch): proxy timeouts/network errors RETRY (rotates Decodo session)

P1 from PR #3122 review: probed Decodo curl egress against GDELT
(2026-04-16) gave 200/200/429/TIMEOUT/429 — TIMEOUT is part of the
normal transient mix that the multi-retry design exists to absorb.
Pre-fix, the logic only retried on substring 'HTTP 429'/'HTTP 503' matches,
so a curl exec timeout (Node Error with no .status, not a SyntaxError)
bailed on the first attempt. The PR's headline 'expected ~92% success
with 5 attempts' was therefore not actually achievable for one of the
exact failure modes that motivated the design.

Reframed the proxy retryability decision around what we CAN reliably
discriminate from the curl error shape:

  curlErr.status == number    → retry only if 429/503
                                (curlFetch attaches .status only when
                                 curl returned a clean HTTP status)
  curlErr instanceof SyntaxError → bail (parse failure is structural)
  otherwise                   → RETRY (timeout, ECONNRESET, DNS, curl
                                 exec failure, CONNECT tunnel failure
                                 — all transient; rotating Decodo
                                 session usually clears them)
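The decision table above reduces to a small classifier. That curlFetch attaches .status only on a clean HTTP status is taken from the commit; everything else here is an illustrative assumption:

```javascript
function isProxyRetryable(curlErr) {
  if (typeof curlErr.status === 'number') {
    // Clean HTTP status from curl: retry only throttle/overload.
    return curlErr.status === 429 || curlErr.status === 503;
  }
  if (curlErr instanceof SyntaxError) return false; // parse failure is structural
  // Timeout, ECONNRESET, DNS, curl exec failure, CONNECT tunnel failure:
  // all transient — a rotated Decodo session usually clears them.
  return true;
}
```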

P2 from same review: tests covered HTTP-status proxy retries + parse
failures but never the timeout/thrown-error class. Added 3 tests:

- proxy timeout (no .status) RETRIES → asserts proxyCalls=2 after a
  first-attempt ETIMEDOUT then second-attempt success
- proxy ECONNRESET (no .status) RETRIES → same pattern
- proxy HTTP 4xx with .status (e.g. 401 auth) does NOT retry → bails
  after 1 attempt

Existing tests still pass — they use 'HTTP 429' Error WITHOUT .status,
which now flows through the 'else: assume transient' branch and still
retries. Only differences: the regex parsing is gone and curlFetch's
.status property is the canonical signal.

Verification:
- tests/gdelt-fetch.test.mjs: 16/16 (was 13, +3)
- npm run test:data: 5385/5385 (+3)
- npm run typecheck:all: clean

Followup commit on PR #3122.

* fix(seed-gdelt-intel): timeline calls fast-fail (maxRetries:0, proxyMaxAttempts:0)

P1 from PR #3122 review: fetchTopicTimeline is best-effort (returns []
on any failure), but the migration routed it through fetchGdeltJson
with the helper's article-fetch defaults: 3 direct retries (10/20/40s
backoff = ~70s) + 5 proxy attempts (5s base = ~20s) = ~90s worst case
per call. Called 2× per topic × 6 topics = 12 calls = up to ~18 minutes
of blocking on data the seeder discards on failure. Pre-helper code
did a single direct fetch with no retry.

Real operational regression under exactly the GDELT 429 storm conditions
this PR is meant to absorb.

Fix:

1. seed-gdelt-intel.mjs:fetchTopicTimeline now passes
   maxRetries:0, proxyMaxAttempts:0 — single direct attempt, no proxy,
   throws on first failure → caught, returns []. Matches pre-helper
   timing exactly. Article fetches keep the full retry budget; only
   timelines fast-fail.

2. _gdelt-fetch.mjs gate: skip the proxy block entirely when
   proxyMaxAttempts <= 0. Pre-fix, the 'trying proxy (curl) up to 0×'
   log line would still emit even though the for loop runs zero times,
   misleadingly implying the proxy was attempted when it wasn't.

Tests (2 new):

- maxRetries:0 + proxyMaxAttempts:0 → asserts directCalls=1,
  proxyCalls=0 even though _curlProxyResolver returns a valid auth
  string (proxy block must be fully bypassed).
- proxyMaxAttempts:0 → captures console.log and asserts no 'trying
  proxy' line emitted (no misleading 'up to 0×' line).

Verification:
- tests/gdelt-fetch.test.mjs: 18/18 (was 16, +2)
- npm run test:data: 5387/5387 (+2)
- npm run typecheck:all: clean

Followup commit on PR #3122.

* fix(gdelt): direct parse-failure reaches proxy + timeline budget tweak + JSDoc accuracy

3 Greptile P2s on PR #3122:

P2a — _gdelt-fetch.mjs:112: `resp.json()` was called outside the
try/catch that guards fetch(). A 200 OK with HTML/garbage body (WAF
challenge, partial response, gzip mismatch) would throw SyntaxError
and escape the helper entirely — proxy fallback never ran. The proxy
leg already parsed inside its own catch; making the direct leg
symmetric. New regression test: direct 200 OK with malformed JSON
must reach the proxy and recover.

P2b — seed-gdelt-intel.mjs timeline budget bumped from 0/0 to 0/2.
Best-effort timelines still fast-fail on direct 429 (no direct
retries) but get 2 proxy attempts via Decodo session rotation before
returning []. Worst case: ~25s/call × 12 calls = ~5 min ceiling under
heavy throttling vs ~3 min with 0/0. Tradeoff: small additional time
budget for a real chance to recover timeline data via proxy IP rotation.
Articles still keep the full retry budget.

P2c — JSDoc said 'Linear proxy backoff base' but the implementation
uses a flat constant (proxyRetryBaseMs, line 156). Linear growth
would not help here because Decodo rotates the session IP per call —
the next attempt's success is independent of the previous wait. Doc
now reads 'Fixed (constant, NOT linear) backoff' with the rationale.

Verification:
- tests/gdelt-fetch.test.mjs: 19/19 pass (was 18, +1)
- npm run test:data: 5388/5388 (+1)
- npm run typecheck:all: clean

Followup commit on PR #3122.

* test(gdelt): clarify helper-API vs seeder-mirror tests + add 0/2 lock

Reviewer feedback on PR #3122 conflated two test classes:
- Helper-API tests (lock the helper's contract for arbitrary callers
  using budget knobs like 0/0 — independent of any specific seeder)
- Seeder-mirror tests (lock the budget the actual production caller
  in seed-gdelt-intel.mjs uses)

Pre-fix, the test file only had the 0/0 helper-API tests, with a
section header that read 'Best-effort caller budgets (fast-fail)' —
ambiguous about whether 0/0 was the helper API contract or the
seeder's choice. The reviewer assumed the seeder still used 0/0 because the
tests locked it, but seed-gdelt-intel.mjs:97-98 actually uses 0/2
(per the prior P2b fix).

Fixes:

1. Section header for the 0/0 tests now explicitly says these are
   helper-API tests and notes that seed-gdelt-intel uses 0/2 (not
   0/0). Eliminates the conflation.

2. New 'Seeder-mirror: 0/2' section with 2 tests that lock the
   seeder's actual choice end-to-end:

   - 0/2 with first proxy attempt 429 + second succeeds → returns
     data (asserts directCalls=1, proxyCalls=2)
   - 0/2 with both proxy attempts failing → throws exhausted with
     '2/2 attempts' in message (asserts the budget propagates to the
     error message correctly)

   These tests would catch any future regression where the seeder's
   0/2 choice gets reverted to 0/0 OR where the helper stops
   honoring the proxyMaxAttempts override.

Verification:
- tests/gdelt-fetch.test.mjs: 21/21 (was 19, +2)
- npm run test:data: 5390/5390 (+2)
- npm run typecheck:all: clean

Followup commit on PR #3122.
2026-04-16 10:41:15 +04:00
Elie Habib
bdfb415f8f fix(resilience-ranking): return warmed scores from memory, skip lossy re-read (#3121)
* fix(resilience-ranking): return warmed scores from memory, skip lossy re-read

Upstash REST writes via /set aren't always visible to an immediately-following
/pipeline GET in the same Vercel invocation (documented in PR #3057 /
feedback_upstash_write_reread_race_in_handler.md). The ranking handler was
warming 222 countries then re-reading them from Redis to compute a coverage
ratio; that re-read could return 0 despite every SET succeeding, collapsing
coverage to 0% < 75% and silently dropping the ranking publish. Consequence:
`resilience:ranking:v9` missing, per-country score keys absent, health
reports EMPTY_ON_DEMAND even while the seeder keeps writing a "fresh" meta.

Fix: warmMissingResilienceScores now returns Map<cc, GetResilienceScoreResponse>
with every successfully computed score. The handler merges those into
cachedScores directly and drops the post-warm re-read. Coverage now reflects
what was actually computed in-memory this invocation, not what Redis happens
to surface after write lag.

Adds a regression test that simulates pipeline-GET returning null for
freshly-SET score keys; it fails against the old code (coverage=0, no
ranking written) and passes with the fix (coverage=100%, ranking written).

Slice A of the resilience-ranking recovery plan; Slice B (seeder meta
truthfulness) follows.

* fix(resilience-ranking): verify score-key persistence via pipeline SET response

PR review P1: trusting every fulfilled ensureResilienceScoreCached() result as
"cached" turned the read-lag fix into a write-failure false positive.
cachedFetchJson's underlying setCachedJson only logs and swallows write errors,
so a transient /set failure on resilience:score:v9:* would leave per-country
scores absent while the ranking aggregate and its seed-meta got published on
top of them — worse than the bug this PR was meant to fix.

Fix: use the pipeline SET response as the authoritative persistence signal.

- Extract the score builder into a pure `buildResilienceScore()` with no
  caching side-effects (appendHistory stays — it's part of the score
  semantics).
- `ensureResilienceScoreCached()` still wraps it in cachedFetchJson so
  single-country RPC callers keep their log-and-return-anyway behavior.
- `warmMissingResilienceScores()` now computes in-memory, persists all
  scores in one pipeline SET, and only returns countries whose command
  reported `result: OK`. Pipeline SET's response is synchronous with the
  write, so OK means actually stored — no ambiguity with read-after-write
  lag.
- When runRedisPipeline returns fewer responses than commands (transport
  failure), return an empty map: no proof of persistence → coverage gate
  can't false-positive.

Adds regression test that blocks pipeline SETs to score keys and asserts
the ranking + meta are NOT published. Existing race-regression test still
passes.
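The "pipeline SET response as persistence proof" rule can be sketched as below. The entry shape and the EX TTL are illustrative assumptions, not the real _shared.ts API:

```javascript
async function persistVerified(entries, runRedisPipeline) {
  const commands = entries.map(([key, value]) =>
    ['SET', key, JSON.stringify(value), 'EX', 21600]);
  const responses = await runRedisPipeline(commands);
  // Transport failure (fewer responses than commands): no proof of
  // persistence → claim nothing, so the coverage gate can't false-positive.
  if (!Array.isArray(responses) || responses.length < commands.length) {
    return new Map();
  }
  const stored = new Map();
  entries.forEach(([key, value], i) => {
    // The pipeline response is synchronous with the write, so
    // result === 'OK' means the SET actually landed — no ambiguity
    // with read-after-write lag.
    if (responses[i] && responses[i].result === 'OK') stored.set(key, value);
  });
  return stored;
}
```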

* fix(resilience-ranking): preserve env key prefix on warm pipeline SET

PR review P1: the pipeline SET added to verify score-key persistence was
called with raw=true, bypassing the preview/dev key prefix (preview:<sha>:).
Two coupled regressions:

  1. Preview/dev deploys write unprefixed `resilience:score:v9:*` keys, but
     all reads (getCachedResilienceScores, ensureResilienceScoreCached via
     setCachedJson/cachedFetchJson) look in the prefixed namespace. Warmed
     scores become invisible to the same preview on the next read.
  2. Because production uses the empty prefix, preview writes land directly
     in the production-visible namespace, defeating the environment
     isolation guard in server/_shared/redis.ts.

Fix: drop the raw=true flag so runRedisPipeline applies prefixKey on each
command, symmetric with the reads. Adds __resetKeyPrefixCacheForTests in
redis.ts so tests can exercise a non-empty prefix without relying on
process-startup memoization order.

Adds regression test that simulates VERCEL_ENV=preview + a commit SHA and
asserts every score-key SET in the pipeline carries the preview:<sha>:
prefix. Fails on old code (raw writes), passes now. installRedis gains an
opt-in `keepVercelEnv` so the test can run under a forced env without
being clobbered by the helper's default reset.

* test(resilience-ranking): snapshot + restore VERCEL_GIT_COMMIT_SHA in afterEach

PR review P2: the preview-prefix test mutates process.env.VERCEL_GIT_COMMIT_SHA
but the file's afterEach only restored VERCEL_ENV. A process started with a
real preview SHA (e.g. CI) would have that value unconditionally deleted after
the test ran, leaking changed state into later tests and producing different
prefix behavior locally vs. CI.

Fix: capture originalVercelSha at module load, restore it in afterEach, and
invalidate the memoized key prefix after each test so the next one recomputes
against the restored env. The preview-prefix test's finally block is no
longer needed — the shared teardown handles it.

Verified: suite still passes 11/11 under both `VERCEL_ENV=production` (unset)
and `VERCEL_ENV=preview VERCEL_GIT_COMMIT_SHA=ci-original-sha` process
environments.
2026-04-16 10:17:22 +04:00
Elie Habib
fec135d6b8 chore(sentry): filter DuckDuckGo browser Response did not contain 'success' or 'data' noise (#3123)
Adds one ignoreErrors pattern for the distinctive DuckDuckGo browser-internal
error phrase (WORLDMONITOR-MZ). The message is never emitted by our own code,
contains backtick-quoted field names that are not in our vocabulary, and arrives
with an empty stack from DuckDuckGo 26.3 on macOS.
2026-04-16 10:06:58 +04:00
Elie Habib
9b07fc8d8a feat(yahoo): _yahoo-fetch helper with curl-only Decodo proxy fallback + 4 seeder migrations (#3120)
* feat(_yahoo-fetch): curl-only Decodo proxy fallback helper

Yahoo Finance throttles Railway egress IPs aggressively. 4 seeders
(seed-commodity-quotes, seed-etf-flows, seed-gulf-quotes, seed-market-quotes)
duplicated the same fetchYahooWithRetry block with no proxy fallback.
This helper consolidates them and adds the proxy fallback.

Yahoo-specific: CURL-ONLY proxy strategy. Probed 2026-04-16:
  query1.finance.yahoo.com via CONNECT (httpsProxyFetchRaw): HTTP 404
  query1.finance.yahoo.com via curl    (curlFetch):          HTTP 200
Yahoo's edge blocks Decodo's CONNECT egress IPs but accepts the curl
egress IPs. Helper deliberately omits the CONNECT leg — adding it
would burn time on guaranteed-404 attempts. Production defaults expose
ONLY curlProxyResolver + curlFetcher.

All learnings from PR #3118 + #3119 reviews baked in:
- lastDirectError accumulator across the loop, embedded in final throw +
  Error.cause chain
- catch block uses break (NOT throw) so thrown errors also reach proxy
- DI seams (_curlProxyResolver, _proxyCurlFetcher) for hermetic tests
- _PROXY_DEFAULTS exported for production-default lock tests
- Sync curlFetch wrapped with await Promise.resolve() to future-proof
  against an async refactor (Greptile P2 from #3119)
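
The accumulator-plus-break control flow from the bullets above can be sketched as follows. This is a simplified stand-in for `_yahoo-fetch.mjs`, with illustrative names and a stubbed response shape, not the helper's real signature:

```javascript
// Direct retries first; on exhaustion OR a thrown error, fall through to
// the proxy leg. lastDirectError is embedded in the final throw + cause.
async function fetchWithProxyFallback(url, { fetcher, proxyFetcher, retries = 3 }) {
  let lastDirectError = null;
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      const resp = await fetcher(url);
      if (resp.ok) return resp.json;
      lastDirectError = new Error(`HTTP ${resp.status}`);
      if (resp.status !== 429) break; // non-retryable status: fall through to proxy
    } catch (err) {
      lastDirectError = err; // record, then break (NOT throw), so thrown
      break;                 // errors also reach the proxy fallback below
    }
  }
  if (proxyFetcher) {
    try {
      return await proxyFetcher(url); // DI seam: tests inject a mock here
    } catch (proxyErr) {
      throw new Error(
        `retries exhausted (direct: ${lastDirectError?.message}; proxy: ${proxyErr.message})`,
        { cause: lastDirectError },
      );
    }
  }
  throw new Error(`retries exhausted (direct: ${lastDirectError?.message})`, {
    cause: lastDirectError,
  });
}
```

The `break` in the catch block is the P1 guard: a timeout or ECONNRESET on the final direct attempt must still reach the proxy leg rather than propagate immediately.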

Tests (tests/yahoo-fetch.test.mjs, 11 cases):
- Production defaults: curl resolver/fetcher reference equality
- Production defaults: NO CONNECT leg present (regression guard)
- 200 OK passthrough, never touches proxy
- 429 with no proxy → throws exhausted with HTTP 429 in message
- Retry-After header parsed correctly
- 429 + curl proxy succeeds → returns proxy data
- Thrown fetch error on final retry → proxy fallback runs (P1 guard)
- 429 + proxy ALSO fails → both errors visible in message + cause chain
- Proxy malformed JSON → throws exhausted
- Non-retryable 500 → no extra direct retry, falls to proxy
- parseRetryAfterMs unit (exported sanity check)

Verification: 11/11 helper tests pass. node --check clean.

Phase 1 of 2 — seeder migrations follow.

* feat(yahoo-seeders): migrate 4 seeders to _yahoo-fetch helper

Removes the duplicated fetchYahooWithRetry function (4 byte-identical
copies across seed-commodity-quotes, seed-etf-flows, seed-gulf-quotes,
seed-market-quotes) and routes all Yahoo Finance fetches through the
new scripts/_yahoo-fetch.mjs helper. Each seeder gains the curl-only
Decodo proxy fallback baked into the helper.

Per-seeder changes (mechanical):
- import { fetchYahooJson } from './_yahoo-fetch.mjs'
- delete the local fetchYahooWithRetry function
- replace 'const resp = await fetchYahooWithRetry(url, label); if (!resp)
  return X; const json = await resp.json()' with
  'let json; try { json = await fetchYahooJson(url, { label }); }
  catch { return X; }'
- prune now-unused CHROME_UA/sleep imports where applicable

Latent bugs fixed in passing:
- seed-etf-flows.mjs:23 and seed-market-quotes.mjs:38 referenced
  CHROME_UA without importing it (would throw ReferenceError at
  runtime if the helper were called). Now the call site is gone in
  etf-flows; in market-quotes CHROME_UA is now properly imported because
  the Finnhub call still uses it.

seed-commodity-quotes also has fetchYahooChart1y (separate non-retry
function for gold history). Migrated to use fetchYahooJson under the
hood — preserves return shape, adds proxy fallback automatically.

Verification:
- node --check clean on all 4 modified seeders
- npm run typecheck:all clean
- npm run test:data: 5374/5374 pass

Phase 2 of 2.

* fix(_yahoo-fetch): log success AFTER parse + add _sleep DI seam for honest Retry-After test

Greptile P2: "[YAHOO] proxy (curl) succeeded" was logged BEFORE
JSON.parse(text). On malformed proxy JSON, Railway logs would show:

  [YAHOO] proxy (curl) succeeded for AAPL
  throw: Yahoo retries exhausted ...

Contradictory + breaks the post-deploy log-grep verification this PR
relies on ("look for [YAHOO] proxy (curl) succeeded"). Fix: parse
first; success log only fires when parse succeeds AND the value is
about to be returned.

Greptile P3: 'Retry-After header parsed correctly' test used header
value '0', but parseRetryAfterMs() treats non-positive seconds as null
→ helper falls through to default linear backoff. So the test was
exercising the wrong branch despite its name.

Fix: added _sleep DI opt seam to the helper. New test injects a sleep
spy and asserts the captured duration:

  Retry-After: '7' → captured sleep == [7000]   (Retry-After branch)
  no Retry-After  → captured sleep == [10]      (default backoff = retryBaseMs * 1)

Two paired tests lock both branches separately so a future regression
that collapses them is caught.

Also added a log-ordering regression test: malformed proxy JSON must
NOT emit the 'succeeded' log. Captures console.log into an array and
asserts no 'proxy (curl) succeeded' line appeared before the throw.

Verification:
- tests/yahoo-fetch.test.mjs: 13/13 (was 11, +2)
- npm run test:data: 5376/5376 (+2)
- npm run typecheck:all: clean

Followup commits on PR #3120.
2026-04-16 09:25:06 +04:00
Elie Habib
57414e4762 fix(open-meteo): curl proxy as second-choice when CONNECT proxy fails (#3119)
* fix(open-meteo): curl proxy as second-choice when CONNECT proxy fails

Decodo's CONNECT egress and curl egress reach DIFFERENT IP pools (per
scripts/_proxy-utils.cjs:67). Probed 2026-04-16 against Yahoo Finance:

  Yahoo via CONNECT (httpsProxyFetchRaw): HTTP 404
  Yahoo via curl (curlFetch):              HTTP 200

For Open-Meteo both paths happen to work today, but pinning the helper
to one path is a single point of failure if Decodo rebalances pools, or
if Open-Meteo starts behaving like Yahoo. PR #3118 wired only the
CONNECT path (`httpsProxyFetchRaw`); this commit adds curl as a
second-choice attempt that runs only when CONNECT also fails.

Cascade:
  direct retries (3) → CONNECT proxy (1) → curl proxy (1) → throw

Steady-state cost: zero. Curl exec only runs when CONNECT also failed.

Final exhausted-throw now appends the LAST proxy error too, so on-call
sees both upstream signals (direct + proxy) instead of just direct.

Tests: added 4 cases locking the cascade behavior:

- CONNECT fails → curl succeeds: returns curl data, neither throws
- CONNECT succeeds: curl never invoked (cost gate)
- CONNECT fails AND curl fails: throws exhausted with both errors
  visible in the message (HTTP 429 from direct + curl 502 from proxy)
- curl returns malformed JSON: caught + warns + throws exhausted

Updated 2 existing tests to also stub _proxyCurlFetcher so they don't
shell out to real curl when CONNECT is mocked-failed (would have run
real curl with proxy.test:8000 → 8s timeout per test).

Verification:
- tests/open-meteo-proxy-fallback.test.mjs → 12/12 pass (was 8, +4 new)
- npm run test:data → 5367/5367 (+4)
- npm run typecheck:all → clean

Followup to PR #3118.

* fix: CONNECT leg uses resolveProxyForConnect; lock production defaults

P1 from PR #3119 review: the cascade was logged as 'CONNECT proxy → curl
proxy' but BOTH legs were resolving via resolveProxy() — which rewrites
gate.decodo.com → us.decodo.com for curl egress. So the 'two-leg
cascade' was actually one Decodo egress pool wearing two transport
mechanisms. Defeats the redundancy this PR is supposed to provide.

Fix: import resolveProxyForConnect (preserves gate.decodo.com — the
host Decodo routes via its CONNECT egress pool, distinct from the
curl-egress pool reached by us.decodo.com via resolveProxy). CONNECT
leg uses resolveProxyForConnect; curl leg uses resolveProxy. Matches
the established pattern in scripts/seed-portwatch-chokepoints-ref.mjs:33-37
and scripts/seed-recovery-external-debt.mjs:31-35.

Refactored test seams: split single _proxyResolver into
_connectProxyResolver + _curlProxyResolver. Test files inject both.

P2 fix: every cascade test injected _proxyResolver, so the suite stayed
green even when production defaults were misconfigured. Exported
_PROXY_DEFAULTS object and added 2 lock-tests:

  1. CONNECT leg uses resolveProxyForConnect, curl leg uses resolveProxy
     (reference equality on each of 4 default fields).
  2. connect/curl resolvers are different functions — guards against the
     'collapsed cascade' regression class generally, not just this
     specific instance.

Updated the 8 existing cascade tests to inject BOTH resolvers. The
docstring at the top of the file now spells out the wiring invariant
and points to the lock-tests.
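
The lock-test idea reduces to exporting the default wiring as data and asserting reference equality, so the suite fails when production defaults are miswired even though every behavioral test injects mocks. A minimal sketch with illustrative resolver bodies:

```javascript
// Stand-ins for the two real resolvers (distinct egress pools).
function resolveProxyForConnect() { /* gate.decodo.com: CONNECT egress pool */ }
function resolveProxy() { /* us.decodo.com: curl egress pool */ }

// Exported so tests can inspect production defaults without network access.
const _PROXY_DEFAULTS = {
  connectResolver: resolveProxyForConnect,
  curlResolver: resolveProxy,
};

// Lock tests: (1) each leg wired to the intended resolver by reference;
// (2) the two resolvers differ, guarding the "collapsed cascade" class.
function runLockTests(defaults) {
  const correctlyWired = defaults.connectResolver === resolveProxyForConnect
    && defaults.curlResolver === resolveProxy;
  const notCollapsed = defaults.connectResolver !== defaults.curlResolver;
  return correctlyWired && notCollapsed;
}
```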

Verification:
- tests/open-meteo-proxy-fallback.test.mjs: 14/14 pass (+2)
- npm run test:data: 5369/5369 (+2)
- npm run typecheck:all: clean

Followup commit on PR #3119.

* fix(open-meteo): future-proof sync curlFetch call with Promise.resolve+await

Greptile P2: _proxyCurlFetcher (curlFetch / execFileSync) is sync today,
adjacent CONNECT path is async (await _proxyFetcher(...)). A future
refactor of curlFetch to async would silently break this line — JSON.parse
would receive a Promise<string> instead of a string and explode at parse
time, not at the obvious call site.

Wrapping with await Promise.resolve(...) is a no-op for the current sync
implementation but auto-handles a future async refactor. Comment spells
out the contract so the wrap doesn't read as cargo-cult.
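
The future-proofing trick works because `Promise.resolve(x)` passes promises through unchanged and wraps plain values, so a single `await` handles both. Illustrative sketch (both fetchers are hypothetical):

```javascript
function syncFetch() { return '{"n":1}'; }          // today: curlFetch is sync
async function asyncFetch() { return '{"n":2}'; }   // hypothetical future async refactor

async function parseEither(fetcher) {
  // No-op for a sync return value; transparently unwraps a Promise.
  const text = await Promise.resolve(fetcher());
  return JSON.parse(text); // always receives a string, never Promise<string>
}
```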

Tests still 14/14.
2026-04-16 09:24:12 +04:00
Elie Habib
5d1c8625e9 fix(seed-climate-zone-normals): proxy fallback when Open-Meteo 429s on Railway IP (#3118)
* fix(seed-climate-zone-normals): proxy fallback when Open-Meteo 429s on Railway IP

Railway logs.1776312819911.log showed seed-climate-zone-normals failing
every batch with HTTP 429 from Open-Meteo's free-tier per-IP throttle
(2026-04-16). The seeder retried with 2/4/8/16s backoff but exhausted
without ever falling back to the project's Decodo proxy infrastructure
that other rate-limited sources (FRED, IMF) already use.

Open-Meteo throttles by source IP. Railway containers share IP pools and
get 429 storms whenever zone-normals fires (monthly cron — high churn
when it runs). Result: PR #3097's bake clock for climate:zone-normals:v1
couldn't start, because the seeder couldn't write the contract envelope
even when manually triggered.

Fix: after direct retries exhaust, _open-meteo-archive.mjs falls back to
httpsProxyFetchRaw (Decodo) — same pattern as fredFetchJson and
imfFetchJson in _seed-utils.mjs. Skips silently if no proxy is configured
(preserves existing behavior in non-Railway envs).

Added tests/open-meteo-proxy-fallback.test.mjs (4 cases):
- 429 with no proxy → throws after exhausting retries (pre-fix behavior preserved)
- 200 OK → returns parsed batch without touching proxy path
- batch size mismatch → throws even on 200
- Non-retryable 500 → break out, attempt proxy, throw exhausted (no extra
  direct retry — matches new control flow)

Verification: npm run test:data → 5359/5359, +4 new. node --check clean.

Same pattern can be applied to any other helper that fetches Open-Meteo
(grep 'open-meteo' scripts/) if more 429s show up.

* fix: proxy fallback runs on thrown direct errors + actually-exercised tests

Addresses two PR #3118 review findings.

P1: catch block did 'throw err' on the final direct attempt, silently
bypassing the proxy fallback for thrown-error cases (timeout, ECONNRESET,
DNS failures). Only non-OK HTTP responses reached the proxy path. Fix:
record the error in lastDirectError and 'break' so control falls through
to the proxy fallback regardless of whether the direct path failed via
thrown error or non-OK status.

Also: include lastDirectError context in the final 'retries exhausted'
message + Error.cause so on-call can see what triggered the fallback
attempt (was: opaque 'retries exhausted').

P2: tests didn't exercise the actual proxy path. Refactored helper to
accept _proxyResolver and _proxyFetcher opt overrides (production
defaults to real resolveProxy/httpsProxyFetchRaw from _seed-utils.mjs;
tests inject mocks). Added 4 new cases:

- 429 + proxy succeeds → returns proxy data
- thrown fetch error on final retry → proxy fallback runs (P1 regression
  guard with explicit assertion: directCalls=2, proxyCalls=1)
- 429 + proxy ALSO fails → throws exhausted, original HTTP 429 in
  message + cause chain
- Proxy returns wrong batch size → caught + warns + throws exhausted

Verification:
- tests/open-meteo-proxy-fallback.test.mjs: 8/8 pass (4 added)
- npm run test:data: 5363/5363 pass (+4 from prior 5359)
- node --check clean
2026-04-16 08:28:05 +04:00
Elie Habib
e6a6d4e326 fix(bundle-runner): stream child stdio + SIGKILL escalation on timeout (#3114)
* fix(bundle-runner): stream child stdio + SIGKILL escalation on timeout

Silent Railway crashes in seed-bundle-portwatch — container exits after
~7min with ZERO logs from the hanging section. Root cause in the runner,
not the seeder: execFile buffers child stdout until the callback fires,
and its default SIGTERM never escalates to SIGKILL, so a child with
in-flight HTTPS sockets can outlive the timeout and be killed by the
container limit before any error is logged.

Switch to spawn + live line-prefixed streaming. On timeout, send SIGTERM,
then SIGKILL after a 10s grace. Always log the terminal reason (timeout
/ exit code / signal) so the next failing bundle surfaces the hung
section on its own line instead of going dark.

Applies to all 15 seed-bundle-*.mjs services that use this runner.
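
The spawn-plus-escalation shape can be sketched as below. Names and log format are illustrative, not the runner's real API; the later follow-up commit additionally moves the terminal "Failed" log before the grace window, which this sketch already does.

```javascript
import { spawn } from 'node:child_process';

// Stream child output live; on timeout log the reason, SIGTERM, then
// SIGKILL after a grace period if the child ignores SIGTERM.
function runSection(cmd, args, { timeoutMs, killGraceMs = 10_000, log = console.log }) {
  return new Promise((resolve) => {
    const child = spawn(cmd, args, { stdio: ['ignore', 'pipe', 'pipe'] });
    let settled = false;
    const settle = (result) => { if (!settled) { settled = true; resolve(result); } };

    child.stdout.on('data', (buf) => log(`[child] ${buf}`));
    child.stderr.on('data', (buf) => log(`[child:err] ${buf}`));

    const timer = setTimeout(() => {
      log(`Failed: timeout after ${timeoutMs}ms`); // flush reason BEFORE grace window
      child.kill('SIGTERM');
      setTimeout(() => child.kill('SIGKILL'), killGraceMs).unref();
    }, timeoutMs);

    child.on('error', (err) => { clearTimeout(timer); settle({ error: err }); });
    child.on('close', (code, signal) => { clearTimeout(timer); settle({ code, signal }); });
  });
}
```

Unlike `execFile`, nothing here buffers output until exit, so a hang surfaces its last line in the log stream instead of going dark.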

* fix(bundle-runner): guard double-resolve, update docstring, add tests

Review follow-ups:
- Idempotent settle() so spawn 'error' + 'close' can't double-resolve
- Header comment reflects spawn + streaming + SIGKILL behavior
- tests/bundle-runner.test.mjs covers live streaming, SIGKILL escalation
  when a child ignores SIGTERM, and non-zero exit reporting

* fix(bundle-runner): address PR review — declare softKill before settle, handle stdio error

* fix(bundle-runner): log terminal reason BEFORE SIGKILL grace + include grace in budget

Review P1 follow-up. Two gaps the previous commit left open:

1. A section with timeoutMs close to Railway's ~10min container cap could
   be killed by the container mid-grace, before the "Failed ... timeout"
   line reached the log stream. Fix: emit the terminal Failed line at the
   moment softKill fires (before SIGTERM), so the reason is flushed BEFORE
   any grace window that could be truncated by a container kill.

2. The admission check used raw timeoutMs, but worst-case runtime is
   timeoutMs + KILL_GRACE_MS when the child ignores SIGTERM. A section
   that "fit" the budget could still overrun. Fix: compare elapsed +
   timeout + grace against maxBundleMs.

close handler still settles the promise but no longer re-logs on the
timeout path (alreadyLogged flag). New test asserts the Failed line
precedes SIGKILL escalation, and that budget accounts for grace.
2026-04-16 07:58:18 +04:00
Elie Habib
13446a2170 fix(seed-contract-probe): send Origin header so /api/bootstrap boundary check doesn't 401 (#3100)
* fix(seed-contract-probe): send Origin header so /api/bootstrap boundary check doesn't 401

Production probe returned {boundary: [{endpoint: '/api/bootstrap', pass: false,
status: 401, reason: 'status:401'}]}. Root cause: checkPublicBoundary's
self-fetch had no Origin header, so /api/bootstrap's validateApiKey() treated
it as a non-browser caller and required an API key.

Fix: set Origin: https://worldmonitor.app on the boundary self-fetch. This
takes the trusted-browser path without needing to embed an API key in the
probe. The probe runs edge-side with x-probe-secret internal auth; emulating
a trusted browser is only for boundary response-shape verification.

Tests still 17/17.

* fix(seed-contract-probe): explicit User-Agent on boundary self-fetch

Per AGENTS.md, server-side fetches must include a UA. middleware.ts:138
returns 403 for !ua || ua.length < 10 on non-public paths, and
/api/bootstrap is not in PUBLIC_API_PATHS — the probe works today only
because Vercel Edge implicitly adds a UA. Making it explicit.

Addresses greptile P2 on PR #3100.
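
Taken together, the two fixes amount to two headers on the self-fetch. A minimal sketch; the UA string below is a hypothetical example, only the Origin value and the length-10 rule come from the commits above:

```javascript
// Headers the boundary self-fetch needs to pass both gates.
function boundaryFetchHeaders() {
  return {
    'Origin': 'https://worldmonitor.app',     // trusted-browser path in validateApiKey()
    'User-Agent': 'seed-contract-probe/1.0',  // middleware 403s when !ua || ua.length < 10
  };
}
```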
2026-04-15 15:34:38 +04:00
Elie Habib
1b4335353f fix(sentry): suppress 6 noise patterns from triage (#3105)
* fix(sentry): suppress 6 noise patterns flagged in triage

Add filters so resolved issues don't re-fire:
- Convex "Connection lost while action was in flight" → ignoreErrors
- Convex WS onmessage JSON.parse truncation (Ping/Updated frames) → beforeSend gated on onmessage frame
- chrome/moz/safari-extension frames intercepting fetch → beforeSend
- Sentry SDK breadcrumb null.contains DOM crash → beforeSend gated on sentry chunk
- bare "Failed to fetch" (no TypeError: prefix in msg) → extend existing regex

* test(sentry): tighten onmessage guard + add tests for 3 new filters

Review feedback from PR #3105:
- add !hasFirstParty guard to Convex WS onmessage JSON-parse filter
- add 6 test cases covering chrome-extension drop, sentry null.contains
  gate, and Convex onmessage suppression + first-party regression paths

* fix(sentry): gate extension and sentry-contains filters on !hasFirstParty

Review feedback PR #3105 (round 2):
- extension-frame drop no longer suppresses events when a first-party
  frame is also on the stack (a real app regression could have an
  extension wrapper around it)
- sentry null.contains filter no longer suppresses when a first-party
  frame is present (Sentry wraps first-party handlers, so a genuine
  el.contains() bug produces a stack with both main-*.js and sentry-*.js)

Adds 3 more tests covering the !hasFirstParty boundary.
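
The `!hasFirstParty` gate reduces to a pure predicate over stack frames. A simplified sketch (frame shapes and the `main-*.js` bundle pattern are taken from the description above; the real filter runs inside `beforeSend`):

```javascript
// Drop an event only when an extension frame is present AND no first-party
// frame is on the stack: a real app regression can sit under an extension
// wrapper, and suppressing it would hide the bug.
function shouldDropEvent(frames) {
  const isExtension = (f) => /^(chrome|moz|safari)-extension:/.test(f.filename || '');
  const hasFirstParty = frames.some((f) => /\/main-[\w.]+\.js$/.test(f.filename || ''));
  return frames.some(isExtension) && !hasFirstParty;
}
```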
2026-04-15 15:28:15 +04:00
Elie Habib
90f4ac0f78 feat(consumer-prices): strict search-hit validator (shadow mode) (#3101)
* feat(consumer-prices): add 'candidate' match state + negativeTokens schema

Schema foundation for the strict-validator plan:
- migration 008 widens product_matches.match_status CHECK to include
  'candidate' so weak search hits can be persisted without entering
  aggregates (aggregate.ts + snapshots filter on ('auto','approved')
  so candidates are excluded automatically).
- BasketItemSchema gains optional negativeTokens[] — config-driven
  reject tokens for obvious class errors (e.g. 'canned' for fresh
  tomatoes). Product-taxonomy splits like plain vs greek yogurt
  belong in separate substitutionGroup values, not here.
- upsertProductMatch accepts 'candidate' and writes evidence_json
  so reviewers can see why a match was downgraded.

* feat(consumer-prices): add validateSearchHit pure helper + known-bad test fixtures

Deterministic post-extraction validator that replaces the boolean
isTitlePlausible gate for scoring and candidate triage. Evaluates
four signals and returns { ok, score, reasons, signals }:

  - class-error rejects from BasketItem.negativeTokens (whole-token
    match for single words; substring match for hyphenated entries
    like 'plant-based' so 'Plant-Based Yogurt' trips without needing
    token-splitting gymnastics)
  - non-food indicators (seeds, fertilizer, planting) — shared with
    the legacy gate
  - token-overlap ratio over identity tokens (>2 chars, non-packaging)
  - quantity-window conformance against minBaseQty/maxBaseQty

Score is a 0..1 weighted sum (overlap 0.55, size 0.35/0.2/0, class-
clean 0.10). AUTO_MATCH_THRESHOLD=0.75 exported for the scrape-side
auto-vs-candidate decision.
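
The weighted-score shape can be sketched as below. This is heavily simplified: the weights and threshold are the ones stated above, but the real helper derives `overlapRatio`, the size tier, and the class-clean flag from title tokens, quantity windows, and negativeTokens rather than taking them as inputs.

```javascript
const AUTO_MATCH_THRESHOLD = 0.75; // scrape-side auto-vs-candidate cutoff

// overlap 0.55 + size up to 0.35 + class-clean 0.10 => score in 0..1
function scoreHit({ overlapRatio, sizeScore, classClean }) {
  const score = 0.55 * overlapRatio + 0.35 * sizeScore + 0.10 * (classClean ? 1 : 0);
  return { ok: classClean, score }; // class-error rejects force ok=false
}

// ok AND above threshold keeps 'auto'; everything else downgrades.
function matchStatus(validator) {
  return validator.ok && validator.score >= AUTO_MATCH_THRESHOLD ? 'auto' : 'candidate';
}
```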

Locked all five bad log examples into regression tests and added
matching positive cases so the rule set proves both sides of the
boundary. Also added vitest.config.ts so consumer-prices-core tests
run under its own config instead of inheriting the worldmonitor
root config (which excludes this directory).

* feat(consumer-prices): wire validator (shadow) + replace 1.0 auto-match

search.ts:
- Thread BasketItem constraints (baseUnit, min/maxBaseQty, negativeTokens,
  substitutionGroup) through discoverTargets → fetchTarget → parseListing
  using explicit named fields, not an opaque JSON blob.
- _extractFromUrl now runs validateSearchHit alongside isTitlePlausible.
  Legacy gate remains the hard gate; validator is shadow-only for now —
  when legacy accepts but validator rejects, a [search:shadow-reject]
  line is logged with reasons + score so the rollout diff report can
  inform the decision to flip the gate. No live behavior change.
- ValidatorResult attached to SearchPayload + rawPayload so scrape.ts
  can score the match without re-running the validator.

scrape.ts:
- Remove unconditional matchScore:1.0 / status:'auto' insert. Use the
  validator score from the adapter payload. Hits with ok=true and
  score >= AUTO_MATCH_THRESHOLD (0.75) keep 'auto'; everything else
  (including validator.ok=false) writes 'candidate' with evidence_json
  carrying the reasons + signals. Aggregates filter on ('auto','approved')
  so candidates are excluded automatically.
- Adapters without a validator (exa-search, etc.) fall back to the
  legacy 1.0/auto behavior so this PR is a no-op for non-search paths.

* feat(consumer-prices): populate negativeTokens for 6 known-bad groups

* fix(consumer-prices): enforce validator on pin path + drop 'cane' from sugar rejects

Addresses PR #3101 review:

1. Pinned direct hits bypassed the validator downgrade — the new
   auto-vs-candidate decision only ran inside the !wasDirectHit block,
   so a pin that drifted onto the wrong product (the steady-state
   common path) would still flow poisoned prices into aggregates
   through the existing 'auto' match. Now: before inserting an
   observation, if the direct hit's validator.ok === false, skip the
   observation and route the target through handlePinError so the pin
   soft-disables after 3 strikes. Legacy isTitlePlausible continues to
   gate the pin extraction itself.

2. 'cane' was a hard reject for sugar_white across all 10 baskets but
   'white cane sugar' is a legitimate SKU descriptor — would have
   downgraded real products to candidate and dropped coverage. Removed
   from every essentials_*.yaml sugar_white negativeTokens list.
   Added a regression test that locks in 'Silver Spoon White Cane
   Sugar 1kg' as a must-pass positive case.

* fix(consumer-prices): strip size tokens from identity + protect approved rows

Addresses PR #3101 round 2 review:

1. Compact size tokens ("1kg", "500g", "250ml") were kept as identity
   tokens. Firecrawl emits size spaced ("1 kg"), which tokenises to
   ["1","kg"] — both below the length>2 floor — so the compact "1kg"
   token could never match. Short canonical names like "Onions 1kg"
   lost 0.5 token overlap and legitimate hits landed at score 0.725 <
   AUTO_MATCH_THRESHOLD, silently downgrading to candidate. Size
   fidelity is already enforced by the quantity-window check; identity
   tokens now ignore /^\d+(?:\.\d+)?[a-z]+$/. New regression test
   locks in "Fresh Red Onions 1 kg" as a must-pass case.

2. upsertProductMatch's DO UPDATE unconditionally wrote EXCLUDED.status.
   A re-scrape whose validator scored an already-approved URL below
   0.75 would silently demote human-curated 'approved' rows to
   'candidate'. Added a CASE guard so approved stays approved; every
   other state follows the new validator verdict.

* fix(consumer-prices): widen curated-state guard to review + rejected

PR #3101 round 3: the CASE only protected 'approved' from being
overwritten. 'review' (written by validate.ts when a price is an
outlier, or by humans sending a row back) and 'rejected' (human
block) are equally curated — a re-scrape under this path silently
overwrites them with the fresh validator verdict and re-enables the
URL in aggregate queries on the next pass.

Widen the immutable set to ('approved','review','rejected'). Also
stop clearing pin_disabled_at on those rows so a quarantined pin
keeps its disabled flag until the review workflow resolves it.
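
The curated-state guard is a SQL CASE in `upsertProductMatch`'s ON CONFLICT DO UPDATE; its decision logic reduces to the following pure function (a sketch of the rule, not the actual query):

```javascript
// Human-curated states are immutable under re-scrape; everything else
// follows the fresh validator verdict ('auto' or 'candidate').
const CURATED = new Set(['approved', 'review', 'rejected']);

function nextMatchStatus(existingStatus, validatorStatus) {
  return CURATED.has(existingStatus) ? existingStatus : validatorStatus;
}
```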

* fix(analyze-stock): classify dividend frequency by median gap

recentDivs.length within a hard 365.25-day window misclassifies quarterly
payers whose last-year Q1 payment falls just outside the cutoff — common
after mid-April each year, when Date.now() - 365.25d lands after Jan's
payment timestamp. The test 'non-zero CAGR for a quarterly payer' flaked
calendar-dependently for this reason.

Prefer median inter-payment interval: quarterly = ~91d median gap,
regardless of where the trailing-12-month window happens to bisect the
payment series. Falls back to the old count when <2 entries exist.

Also documents the CAGR filter invariant in the test helper.

* fix(analyze-stock): suppress frequency when no recent divs + detect regime slowdowns

Addresses PR #3102 review:

1. Suspended programs no longer leak a frequency badge. When recentDivs
   is empty, dividendYield and trailingAnnualDividendRate are both 0;
   emitting 'Quarterly' derived from historical median would contradict
   those zeros in the UI. paymentsPerYear now short-circuits to 0 before
   the interval classifier runs.

2. Whole-history median-gap no longer masks cadence regime changes. The
   reconciliation now depends on trailing-year count:
     recent >= 3  → interval classifier (robust to calendar drift)
     recent 1..2  → inspect most-recent inter-payment gap:
                    > 180d = real slowdown, trust count (Annual)
                    <= 180d = calendar drift, trust interval (Quarterly)
     recent 0     → empty frequency (suspended)
   The interval classifier itself is now scoped to the last 2 years so
   it responds to regime changes instead of averaging over 5y of history.

Regression tests:
- 'emits empty frequency when the dividend program has been suspended' —
  3y of quarterly history + 18mo silence must report '' not 'Quarterly'.
- 'detects a recent quarterly → annual cadence change' — 12 historical
  quarterly payments + 1 recent annual payment must report 'Annual'.

* fix(analyze-stock): scope interval median to trailing year when recent>=3

Addresses PR #3102 review round 2: the reconciler's recent>=3 branch
called paymentsPerYearFromInterval(entries), which scopes to the last
2 years. A monthly→quarterly shift (12 monthly payments in year -2..-1
plus 4 quarterly in year -1..0) produced a 2-year median of ~30d and
misclassified as Monthly even though the current trailing-year cadence
is clearly quarterly.

Pass recentDivs directly to the interval classifier when recent>=3.
Two payments in the trailing year = 1 gap which suffices for the median
(gap count >=1, median well-defined). The historical-window 2y scoping
still applies for the recent 1..2 branch, where we actively need
history to distinguish drift from slowdown.

Regression test: 12 monthly payments from -13..-24 months ago + 4
quarterly payments inside the trailing year must classify as Quarterly.

* fix(analyze-stock): use true median (avg of two middles) for even gap counts

PR #3102 P2: gaps[floor(length/2)] returns the upper-middle value for
even-length arrays, biasing toward slower cadence at classifier
thresholds when the trailing-year sample is small. Use the average of
the two middles for even lengths. Harmless on 5-year histories with
50+ gaps where values cluster, but correct at sparse sample sizes where
the trailing-year branch can have only 2–3 gaps.
2026-04-15 14:28:18 +04:00
Elie Habib
fffc5d9607 fix(analyze-stock): classify dividend frequency by median gap (#3102)
* fix(analyze-stock): classify dividend frequency by median gap

recentDivs.length within a hard 365.25-day window misclassifies quarterly
payers whose last-year Q1 payment falls just outside the cutoff — common
after mid-April each year, when Date.now() - 365.25d lands after Jan's
payment timestamp. The test 'non-zero CAGR for a quarterly payer' flaked
calendar-dependently for this reason.

Prefer median inter-payment interval: quarterly = ~91d median gap,
regardless of where the trailing-12-month window happens to bisect the
payment series. Falls back to the old count when <2 entries exist.

Also documents the CAGR filter invariant in the test helper.

* fix(analyze-stock): suppress frequency when no recent divs + detect regime slowdowns

Addresses PR #3102 review:

1. Suspended programs no longer leak a frequency badge. When recentDivs
   is empty, dividendYield and trailingAnnualDividendRate are both 0;
   emitting 'Quarterly' derived from historical median would contradict
   those zeros in the UI. paymentsPerYear now short-circuits to 0 before
   the interval classifier runs.

2. Whole-history median-gap no longer masks cadence regime changes. The
   reconciliation now depends on trailing-year count:
     recent >= 3  → interval classifier (robust to calendar drift)
     recent 1..2  → inspect most-recent inter-payment gap:
                    > 180d = real slowdown, trust count (Annual)
                    <= 180d = calendar drift, trust interval (Quarterly)
     recent 0     → empty frequency (suspended)
   The interval classifier itself is now scoped to the last 2 years so
   it responds to regime changes instead of averaging over 5y of history.
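
The branch table above can be sketched as a small reconciler (illustrative names; the real code also scopes the interval median differently per branch, as the later follow-ups describe):

```javascript
// recentCount: payments in the trailing 365.25d window
// lastGapDays: most recent inter-payment gap
// classifyByInterval: median-gap classifier, injected for testability
function paymentsPerYear(recentCount, lastGapDays, classifyByInterval) {
  if (recentCount === 0) return 0;                   // suspended: no frequency badge
  if (recentCount >= 3) return classifyByInterval(); // enough trailing-year signal
  // 1..2 recent payments: distinguish real slowdown from calendar drift
  return lastGapDays > 180 ? recentCount : classifyByInterval();
}
```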

Regression tests:
- 'emits empty frequency when the dividend program has been suspended' —
  3y of quarterly history + 18mo silence must report '' not 'Quarterly'.
- 'detects a recent quarterly → annual cadence change' — 12 historical
  quarterly payments + 1 recent annual payment must report 'Annual'.

* fix(analyze-stock): scope interval median to trailing year when recent>=3

Addresses PR #3102 review round 2: the reconciler's recent>=3 branch
called paymentsPerYearFromInterval(entries), which scopes to the last
2 years. A monthly→quarterly shift (12 monthly payments in year -2..-1
plus 4 quarterly in year -1..0) produced a 2-year median of ~30d and
misclassified as Monthly even though the current trailing-year cadence
is clearly quarterly.

Pass recentDivs directly to the interval classifier when recent>=3.
Two payments in the trailing year = 1 gap, which suffices for the median
(gap count >=1, median well-defined). The historical-window 2y scoping
still applies for the recent 1..2 branch, where we actively need
history to distinguish drift from slowdown.

Regression test: 12 monthly payments from -13..-24 months ago + 4
quarterly payments inside the trailing year must classify as Quarterly.

* fix(analyze-stock): use true median (avg of two middles) for even gap counts

PR #3102 P2: gaps[floor(length/2)] returns the upper-middle value for
even-length arrays, biasing toward slower cadence at classifier
thresholds when the trailing-year sample is small. Use the average of
the two middles for even lengths. Harmless on 5-year histories with
50+ gaps where values cluster, but correct at sparse sample sizes where
the trailing-year branch can have only 2–3 gaps.
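
The true-median fix is self-contained enough to sketch directly (assumes `gaps` holds inter-payment gaps in days):

```javascript
// Average of the two middle values for even lengths; the single middle
// for odd lengths. Avoids the upper-middle bias of gaps[floor(n/2)].
function medianGap(gaps) {
  const sorted = [...gaps].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}
```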
2026-04-15 14:00:57 +04:00
Elie Habib
1346946f15 fix(middleware): allow /api/seed-contract-probe to bypass bot UA filter (#3099)
Vercel log showed 'Middleware 403 Forbidden' on /api/seed-contract-probe
for both curl-from-ops and UptimeRobot requests. middleware.ts's BOT_UA
regex matches 'curl/' and 'bot', so any monitoring/probe UA was blocked
before reaching the handler — even though the probe has its own
RELAY_SHARED_SECRET auth that makes the UA check redundant.

Added /api/seed-contract-probe to PUBLIC_API_PATHS (joining /api/version
and /api/health). Safe: the endpoint enforces x-probe-secret matching
RELAY_SHARED_SECRET internally; bypassing the generic UA gate does not
reduce security.

Commented the allowlist to spell out the invariant: entries must carry
their own auth, because this list disables the middleware's generic bot
gate.

Verified via Vercel Inspector log trace:
  Firewall: bypass → OK
  Middleware: 403 Forbidden ← this commit fixes it
  Handler: (unreachable before fix)
2026-04-15 09:40:35 +04:00
Elie Habib
044598346e feat(seed-contract): PR 2a — runSeed envelope dual-write + 91 seeders migrated (#3097)
* feat(seed-contract): PR 2a — runSeed envelope dual-write + 91 seeders migrated

Opt-in contract path in runSeed: when opts.declareRecords is provided, write
{_seed, data} envelope to the canonical key alongside legacy seed-meta:*
(dual-write). State machine: OK / OK_ZERO / RETRY with zeroIsValid opt.
declareRecords throws or returns non-integer → hard fail (contract violation).
extraKeys[*] support per-key declareRecords; each extra key writes its own
envelope. Legacy seeders (no declareRecords) entirely unchanged.
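
The envelope write can be sketched as below. The `{_seed, data}` shape, the OK / OK_ZERO / RETRY states, and the non-integer hard fail are from the description above; the exact field names inside `_seed` beyond `records`, `schemaVersion`, and `maxStaleMin` are assumptions for illustration.

```javascript
function buildEnvelope(data, { declareRecords, schemaVersion, maxStaleMin, zeroIsValid = false }) {
  const records = declareRecords(data);
  if (!Number.isInteger(records)) {
    // Contract violation: a throwing or non-integer declareRecords hard-fails.
    throw new Error('contract violation: declareRecords must return an integer');
  }
  const state = records > 0 ? 'OK' : zeroIsValid ? 'OK_ZERO' : 'RETRY';
  return {
    _seed: { records, state, schemaVersion, maxStaleMin, writtenAt: Date.now() },
    data,
  };
}
```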

Migrated all 91 scripts/seed-*.mjs to contract mode. Each exports
declareRecords returning the canonical record count, and passes
schemaVersion: 1 + maxStaleMin (matched to api/health.js SEED_META, or 2.5x
interval where no registry entry exists). Contract conformance reports 84/86
seeders with full descriptor (2 pre-existing warnings).

Legacy seed-meta keys still written so unmigrated readers keep working;
follow-up slices flip health.js + readers to envelope-first.

Tests: 61/61 PR 1 tests still pass.

Next slices for PR 2:
- api/health.js registry collapse + 15 seed-bundle-*.mjs canonicalKey wiring
- reader migration (mcp, resilience, aviation, displacement, regional-snapshot)
- direct writers — ais-relay.cjs, consumer-prices-core publish.ts
- public-boundary stripSeedEnvelope + test migration

Plan: docs/plans/2026-04-14-002-fix-runseed-zero-record-lockout-plan.md

* fix(seed-contract): unwrap envelopes in internal cross-seed readers

After PR 2a enveloped 91 canonical keys as {_seed, data}, every script-side
reader that returned the raw parsed JSON started silently handing callers the
envelope instead of the bare payload. WoW baselines (bigmac, grocery-basket,
fear-greed) saw undefined .countries / .composite; seed-climate-anomalies saw
undefined .normals from climate:zone-normals:v1; seed-thermal-escalation saw
undefined .fireDetections from wildfire:fires:v1; seed-forecasts' ~40-key
pipeline batch returned envelopes for every input.

Fix: route every script-side reader through unwrapEnvelope(...).data. Legacy
bare-shape values pass through unchanged (unwrapEnvelope returns
{_seed: null, data: raw} for any non-envelope shape).
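
The unwrap contract the fix relies on can be sketched like this (a simplified stand-in for the real unwrapEnvelope, which per the test suite also handles stringified JSON):

```javascript
// Envelope values yield their payload; any legacy bare shape (or null)
// passes through unchanged as { _seed: null, data: raw }.
function unwrapEnvelope(raw) {
  if (raw && typeof raw === 'object' && '_seed' in raw && 'data' in raw) {
    return { _seed: raw._seed, data: raw.data };
  }
  return { _seed: null, data: raw };
}
```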

Changed:
- scripts/_seed-utils.mjs: import unwrapEnvelope; redisGet, readSeedSnapshot,
  verifySeedKey all unwrap. Exported new readCanonicalValue() helper for
  cross-seed consumers.
- 18 seed-*.mjs scripts with local redisGet-style helpers or inline fetch
  patched to unwrap via the envelope source module (subagent sweep).
- scripts/seed-forecasts.mjs pipeline batch: parse() unwraps each result.
- scripts/seed-energy-spine.mjs redisMget: unwraps each result.

Tests:
- tests/seed-utils-envelope-reads.test.mjs: 7 new cases covering envelope
  + legacy + null paths for readSeedSnapshot and verifySeedKey.
- Full seed suite: 67/67 pass (was 61, +6 new).

Addresses both of the user's P1 findings on PR #3097.

* feat(seed-contract): envelope-aware reads in server + api helpers

Every RPC and public-boundary reader now automatically strips _seed from
contract-mode canonical keys. Legacy bare-shape values pass through unchanged
(unwrapEnvelope no-ops on non-envelope shapes).

Changed helpers (one-place fix — unblocks ~60 call sites):
- server/_shared/redis.ts: getRawJson, getCachedJson, getCachedJsonBatch
  unwrap by default. cachedFetchJson inherits via getCachedJson.
- api/_upstash-json.js: readJsonFromUpstash unwraps (covers api/mcp.ts
  tool responses + all its canonical-key reads).
- api/bootstrap.js: getCachedJsonBatch unwraps (public-boundary —
  clients never see envelope metadata).

Left intentionally unchanged:
- api/health.js / api/seed-health.js: read only seed-meta:* keys which
  remain bare-shape during dual-write. unwrapEnvelope already imported at
  the meta-read boundary (PR 1) as a defensive no-op.

Tests: 67/67 seed tests pass. typecheck + typecheck:api clean.

This is the blast-radius fix the PR #3097 review called out — external
readers that would otherwise see {_seed, data} after the writer side
migrated.

* fix(test): strip export keyword in vm.runInContext'd seed source

cross-source-signals-regulatory.test.mjs loads scripts/seed-cross-source-signals.mjs
via vm.runInContext, which cannot parse ESM `export` syntax. PR 2a added
`export function declareRecords` to every seeder, which broke this test's
static-analysis approach.

Fix: strip the `export` keyword from the declareRecords line in the
preprocessed source string so the function body still evaluates as a plain
declaration.

Full test:data suite: 5307/5307 pass. typecheck + typecheck:api clean.

* feat(seed-contract): consumer-prices publish.ts writes envelopes

Wrap the 5 canonical keys written by consumer-prices-core/src/jobs/publish.ts
(overview, movers:7d/30d, freshness, categories:7d/30d/90d, retailer-spread,
basket-series) in {_seed, data} envelopes. Legacy seed-meta:<key> writes
preserved for dual-write.

Inlined a buildEnvelope helper (10 lines) rather than taking a cross-package
dependency — consumer-prices-core is a standalone npm package. Documented the
four-file parity contract (mjs source, ts mirror, js edge mirror, this copy).

Contract fields: sourceVersion='consumer-prices-core-publish-v1', schemaVersion=1,
state='OK' (recordCount>0) or 'OK_ZERO' (legitimate zero).

Typecheck: no new errors in publish.ts.

* fix(seed-contract): 3 more server-side readers unwrap envelopes

Found during final audit:

- server/worldmonitor/resilience/v1/_shared.ts: resilience score reader
  parsed cached GetResilienceScoreResponse raw. Contract-mode seed-resilience-scores
  now envelopes those keys.
- server/worldmonitor/resilience/v1/get-resilience-ranking.ts: p05/p95
  interval lookup parsed raw from seed-resilience-scores' extra-key path.
- server/worldmonitor/infrastructure/v1/_shared.ts: mgetJson() used for
  count-source keys (wildfire:fires:v1, news:insights:v1) which are both
  contract-mode now.

All three now unwrap via server/_shared/seed-envelope. Legacy shapes pass
through unchanged.

Typecheck clean.

* feat(seed-contract): ais-relay.cjs direct writes produce envelopes

32 canonical-key write sites in scripts/ais-relay.cjs now produce {_seed, data}
envelopes. Inlined buildEnvelope() (CJS module can't require ESM source) +
envelopeWrite(key, data, ttlSeconds, meta) wrapper. Enveloped keys span market
bootstrap, aviation, cyber-threats, theater-posture, weather-alerts, economic
spending/fred/worldbank, tech-events, corridor-risk, usni-fleet, shipping-stress,
social:reddit, wsb-tickers, pizzint, product-catalog, chokepoint transits,
ucdp-events, satellites, oref.

Left bare (not seeded data keys): seed-meta:* (dual-write legacy),
classifyCacheKey LLM cache, notam:prev-closed-state internal state,
wm:notif:scan-dedup flags.

Updated tests/ucdp-seed-resilience.test.mjs regex to accept both upstashSet
(pre-contract) and envelopeWrite (post-contract) call patterns.

* feat(seed-contract): 15 bundle files add canonicalKey for envelope gate

54 bundle sections across 12 files now declare canonicalKey alongside the
existing seedMetaKey. _bundle-runner.mjs (from PR 1) prefers canonicalKey
when both are present — gates section runs on envelope._seed.fetchedAt
read directly from the data key, eliminating the meta-outlives-data class
of bugs.

Files touched:
- climate (5), derived-signals (2), ecb-eu (3), energy-sources (6),
  health (2), imf-extended (4), macro (10), market-backup (9),
  portwatch (4), relay-backup (2), resilience-recovery (5), static-ref (2)

Skipped (14 sections, 3 whole bundles): multi-key writers, dynamic
templated keys (displacement year-scoped), or non-runSeed orchestrators
(regional brief cron, resilience-scores' 222-country publish, validation/
benchmark scripts). These continue to use seedMetaKey or their own gate.

seedMetaKey preserved everywhere — dual-write. _bundle-runner.mjs falls
back to legacy when canonicalKey is absent.

All 15 bundles pass node --check. test:data: 5307/5307. typecheck:all: clean.

* fix(seed-contract): 4 PR #3097 review P1s — transform/declareRecords mismatches + envelope leaks

Addresses both P1 findings and the extra-key seed-meta leak surfaced in review:

1. runSeed helper-level invariant: seed-meta:* keys NEVER envelope.
   scripts/_seed-utils.mjs exports shouldEnvelopeKey(key) — returns false for
   any key starting with 'seed-meta:'. Both atomicPublish (canonical) and
   writeExtraKey (extras) gate the envelope wrap through this helper. Fixes
   seed-iea-oil-stocks' ANALYSIS_META_EXTRA_KEY silently getting enveloped,
   which broke health.js parsing the value as bare {fetchedAt, recordCount}.
   Also defends against any future manual writeExtraKey(..., envelopeMeta)
   call that happens to target a seed-meta:* key.

2. seed-token-panels canonical + extras fixed.
   publishTransform returns data.defi (the defi panel itself, shape {tokens}).
   Old declareRecords counted data.defi.tokens + data.ai.tokens + data.other.tokens
   on the transformed payload → 0 → RETRY path → canonical market:defi-tokens:v1
   never wrote, and because runSeed returned before the extraKeys loop,
   market:ai-tokens:v1 + market:other-tokens:v1 stayed stale too.
   New: declareRecords counts data.tokens on the transformed shape. AI_KEY +
   OTHER_KEY extras reuse the same function (transforms return structurally
   identical panels). Added isMain guard so test imports don't fire runSeed.

3. api/product-catalog.js cached reader unwraps envelope.
   ais-relay.cjs now envelopes product-catalog:v2 via envelopeWrite(). The
   edge reader did raw JSON.parse(result) and returned {_seed, data} to
   clients, breaking the cached path. Fix: import unwrapEnvelope from
   ./_seed-envelope.js, apply after JSON.parse. One site — :238-241 is
   downstream of getFromCache(), so the single reader fix covers both.

4. Regression lock tests/seed-contract-transform-regressions.test.mjs (11 cases):
   - shouldEnvelopeKey invariant: seed-meta:* false, canonical true
   - Token-panels declareRecords works on transformed shape (canonical + both extras)
   - Explicit repro of pre-fix buggy signature returning 0 — guards against revert
   - resolveRecordCount accepts 0, rejects non-integer
   - Product-catalog envelope unwrap returns bare shape; legacy passes through
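
The declareRecords mismatch in item 2 reproduces in miniature: publishTransform hands declareRecords the single defi panel, so the old counter sums three paths that no longer exist on that shape. Both counters below are illustrative reconstructions, not the seeder's actual code:

```javascript
// Shape after publishTransform: one panel, { tokens: [...] }.
const transformed = { tokens: [{ sym: 'AAA', price: 1 }] };

// Pre-fix signature: counts defi/ai/other on the UN-transformed payload,
// so on the transformed shape every path is undefined and the sum is 0.
const buggyDeclare = (d) =>
  (d.defi?.tokens?.length ?? 0) + (d.ai?.tokens?.length ?? 0) + (d.other?.tokens?.length ?? 0);

// Post-fix signature: counts tokens on the transformed panel directly.
const fixedDeclare = (d) => (Array.isArray(d?.tokens) ? d.tokens.length : 0);
```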

Verification:
- npm run test:data → 5318/5318 pass (was 5307 — 11 new regressions)
- npm run typecheck:all → clean
- node --check on every modified script

iea-oil-stocks canonical declareRecords was NOT broken (user confirmed during
review — buildIndex preserves .members); only its ANALYSIS_META_EXTRA_KEY
was affected, now covered generically by commit 1's helper invariant.

* fix(seed-contract): seed-token-panels validateFn also runs on post-transform shape

Review finding: fixing declareRecords wasn't sufficient — atomicPublish() runs
validateFn(publishData) on the transformed payload too. seed-token-panels'
validate() checked data.defi/.ai/.other on the transformed {tokens} shape,
returned false, and runSeed took the early skipped-write branch (before even
reaching the declareRecords RETRY logic). Net effect: same as before the
declareRecords fix — canonical + both extras stayed stale.

Fix: validate() now checks the canonical defi panel directly (Array.isArray
(data?.tokens) && has at least one t.price > 0). AI/OTHER panels are validated
implicitly by their own extraKey declareRecords on write.
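
A sketch of the corrected validate() described above, assuming the panel shape from the commit text (an array of tokens with a numeric price field); the real validateFn may carry extra checks:

```javascript
// Accept a panel only if it has at least one token with a positive price.
function validateTokenPanel(data) {
  return Array.isArray(data?.tokens) && data.tokens.some((t) => Number(t?.price) > 0);
}
```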

Audited the other 9 seeders with publishTransform (bls-series, bis-extended,
bis-data, gdelt-intel, trade-flows, iea-oil-stocks, jodi-gas, sanctions-pressure,
forecasts): all validateFn's correctly target the post-transform shape. Only
token-panels regressed.

Added 4 regression tests (tests/seed-contract-transform-regressions.test.mjs):
- validate accepts transformed panel with priced tokens
- validate rejects all-zero-price tokens
- validate rejects empty/missing tokens
- Explicit pre-fix repro (buggy old signature fails on transformed shape)

Verification:
- npm run test:data → 5322/5322 pass (was 5318; +4 new)
- npm run typecheck:all → clean
- node --check clean

* feat(seed-contract): add /api/seed-contract-probe validation endpoint

Single machine-readable gate for 'is PR #3097 working in production'.
Replaces the curl/jq ritual with one authenticated edge call that returns
HTTP 200 ok:true or 503 + failing check list.

What it validates:
- 8 canonical keys have {_seed, data} envelopes with required data fields
  and minRecords floors (fsi-eu, zone-normals, 3 token panels + minRecords
  guard against token-panels RETRY regression, product-catalog, wildfire,
  earthquakes).
- 2 seed-meta:* keys remain BARE (shouldEnvelopeKey invariant; guards
  against iea-oil-stocks ANALYSIS_META_EXTRA_KEY-class regressions).
- /api/product-catalog + /api/bootstrap responses contain no '_seed' leak.

Auth: x-probe-secret header must match RELAY_SHARED_SECRET (reuses existing
Vercel↔Railway internal trust boundary).

Probe logic is exported (checkProbe, checkPublicBoundary, DEFAULT_PROBES) for
hermetic testing. tests/seed-contract-probe.test.mjs covers every branch:
envelope pass/fail on field/records/shape, bare pass/fail on shape/field,
missing/malformed JSON, Redis non-2xx, boundary seed-leak detection,
DEFAULT_PROBES sanity (seed-meta invariant present, token-panels minRecords
guard present).

Usage:
  curl -H "x-probe-secret: $RELAY_SHARED_SECRET" \
       https://api.worldmonitor.app/api/seed-contract-probe

PR 3 will extend the probe with a stricter mode that asserts seed-meta:*
keys are GONE (not just bare) once legacy dual-write is removed.

Verification:
- tests/seed-contract-probe.test.mjs → 15/15 pass
- npm run test:data → 5338/5338 (was 5322; +16 new incl. conformance)
- npm run typecheck:all → clean

* fix(seed-contract): tighten probe — minRecords on AI/OTHER + cache-path source header

Review P2 findings: the probe's stated guards were weaker than advertised.

1. market:ai-tokens:v1 + market:other-tokens:v1 probes claimed to guard the
   token-panels extra-key RETRY regression but only checked shape='envelope'
   + dataHas:['tokens']. If an extra-key declareRecords regressed to 0, both
   probes would still pass because checkProbe() only inspects _seed.recordCount
   when minRecords is set. Now both enforce minRecords: 1.

2. /api/product-catalog boundary check only asserted no '_seed' leak — which
   is also true for the static fallback path. A broken cached reader
   (getFromCache returning null or throwing) could serve fallback silently
   and still pass this probe. Now:
   - api/product-catalog.js emits X-Product-Catalog-Source: cache|dodo|fallback
     on the response (the json() helper gained an optional source param wired
     to each of the three branches).
   - checkPublicBoundary declaratively requires that header's value match
     'cache' for /api/product-catalog, so a fallback-serve fails the probe
     with reason 'source:fallback!=cache' or 'source:missing!=cache'.

Test updates (tests/seed-contract-probe.test.mjs):
- Boundary check reworked to use a BOUNDARY_CHECKS config with optional
  requireSourceHeader per endpoint.
- New cases: served-from-cache passes, served-from-fallback fails with source
  mismatch, missing header fails, seed-leak still takes precedence, bad
  status fails.
- Token-panels sanity test now asserts minRecords≥1 on all 3 panels.

Verification:
- tests/seed-contract-probe.test.mjs → 17/17 pass (was 15, +2 net)
- npm run test:data → 5340/5340
- npm run typecheck:all → clean
2026-04-15 09:16:27 +04:00
Elie Habib
224d6fa2e3 fix(consumer-prices): count risers+fallers in movers recordCount (#3098)
* fix(consumer-prices): count risers+fallers in movers recordCount

Health endpoint reported consumerPricesMovers as EMPTY_DATA whenever the
30d window had zero risers, because recordCount's `??` chain in publish.ts
picks only one sibling array. Bipolar payloads (risers[] + fallers[]) need
the sum; otherwise a valid all-fallers payload registers as 0 records and
trips false staleness alarms.
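
The ??-chain bug is easy to reproduce: ?? only falls through on null/undefined, so an empty risers array (length 0, not nullish) stops the chain and the fallers are never counted. Payload shape below is illustrative:

```javascript
const snapshot = { risers: [], fallers: [{ item: 'milk' }, { item: 'eggs' }] };

// Pre-fix: 0 is not nullish, so the chain returns risers.length and ignores fallers.
const buggyCount = snapshot.risers?.length ?? snapshot.fallers?.length ?? 0;

// Post-fix: bipolar payloads need the sum of both sibling arrays.
const fixedCount = (snapshot.risers?.length ?? 0) + (snapshot.fallers?.length ?? 0);
```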

Fix both the authoritative publish job and the manual fallback seed script.

* fix(consumer-prices): floor movers recordCount at 1 + include essentialsSeries fallback

Addresses PR #3098 review:

1. All-flat markets (every sampled item unchanged) legitimately produce
   risers=[] AND fallers=[] from buildMoversSnapshot. Summing the two still
   yields 0 → health reports EMPTY_DATA for a valid snapshot. Floor at 1;
   advanceSeedMeta already gates writes on upstream freshness, so this
   can't mask an upstream-unavailable case.

2. Seed script's non-movers fallback was missing essentialsSeries, so
   basket-series payloads from the manual script reported recordCount=1
   instead of the series length. Align with publish.ts.

* fix(consumer-prices): force recordCount=0 for upstreamUnavailable placeholders

Addresses PR #3098 review: flooring movers at 1 in the manual fallback seeder
also floored the synthetic emptyMovers() placeholder (upstreamUnavailable=true)
the script writes when BASE_URL is unset or the upstream returns null. Since
writeExtraKeyWithMeta always persists seed-meta, that made a real outage read
green in api/health.js. Short-circuit upstreamUnavailable payloads to 0 so
the outage surfaces.
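
Combining the three commits' rules gives roughly this counting logic; a hedged sketch, not the actual publish.ts or seed-script code:

```javascript
// upstreamUnavailable placeholders must read as 0 so outages surface in
// health; real (possibly all-flat) snapshots are floored at 1.
function moversRecordCount(payload) {
  if (payload?.upstreamUnavailable) return 0;
  const n = (payload?.risers?.length ?? 0) + (payload?.fallers?.length ?? 0);
  return Math.max(1, n);
}
```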
2026-04-15 09:06:24 +04:00
Elie Habib
dc10e47197 feat(seed-contract): PR 1 foundation — envelope + contract + conformance test (#3095)
* feat(seed-contract): PR 1 foundation — envelope helpers + contract validators + static conformance test

Adds the foundational pieces for the unified seed contract rollout described in
docs/plans/2026-04-14-002-fix-runseed-zero-record-lockout-plan.md. Behavior-
preserving by construction: legacy-shape Redis values unwrap as { _seed: null,
data: raw } and pass through every helper unchanged.

New files:
- scripts/_seed-envelope-source.mjs — single source of truth for unwrapEnvelope,
  stripSeedEnvelope, buildEnvelope.
- api/_seed-envelope.js — edge-safe mirror (AGENTS.md:80 forbids api/* importing
  from server/).
- server/_shared/seed-envelope.ts — TS mirror with SeedMeta, SeedEnvelope,
  UnwrapResult types.
- scripts/_seed-contract.mjs — SeedContractError + validateDescriptor (10
  required fields, 10 optional, unknown-field rejection) + resolveRecordCount
  (non-negative integer or throw).
- scripts/verify-seed-envelope-parity.mjs — diffs function bodies between the
  two JS copies; TS copy guarded by tsc.
- tests/seed-envelope.test.mjs — 14 tests for the three helpers (null,
  legacy-passthrough, stringified JSON, round-trip).
- tests/seed-contract.test.mjs — 25 tests for validateDescriptor/
  resolveRecordCount + a soft-warn conformance scan that STATICALLY parses
  scripts/seed-*.mjs (never dynamic import — several seeders process.exit() at
  module load). Currently logs 91 seeders awaiting declareRecords migration.

Wiring (minimal, behavior-preserving):
- api/health.js: imports unwrapEnvelope; routes readSeedMeta's parsed value
  through it. Legacy meta has no _seed wrapper → passes through unchanged.
- scripts/_bundle-runner.mjs: readSectionFreshness prefers envelope at
  section.canonicalKey when present, falls back to the existing
  seed-meta:<key> read via section.seedMetaKey (unchanged path today since no
  bundle defines canonicalKey yet).

No seeder modified. No writes changed. All 5279 existing data tests still
green; both typechecks clean; parity verifier green; 39 new tests pass.

PR 2 will migrate seeders, bundles, and readers to envelope semantics. PR 3
removes the legacy path and hard-fails the conformance test.

* fix(seed-contract): address PR #3095 review — metaTtlSeconds opt, bundle fallback, strict conformance mode

Review findings applied:

P1 — metaTtlSeconds missing from OPTIONAL_FIELDS whitelist.
scripts/seed-jodi-gas.mjs:250 passes metaTtlSeconds to runSeed(); field is
consumed by _seed-utils writeSeedMeta. Without it in the whitelist, PR 2's
validateDescriptor wiring would throw 'unknown field' the moment jodi-gas
migrates. Added with a 'removed in PR 3' note.

P2 — Bundle canonicalKey short-circuit over-runs during migration.
readSectionFreshness previously returned null if canonicalKey had no envelope
yet, even when a legacy seed-meta key was also declared — making every cron
re-run the section. Fixed to fall through to seedMetaKey on null envelope so
the transition state is safe.

P3 — Conformance soft-warn signal was invisible in CI.
tests/seed-contract.test.mjs now emits a t.diagnostic summary line
('N/M seeders export declareRecords') visible on every run and gates hard-fail
behind SEED_CONTRACT_STRICT=1 so PR 3 can flip to strict without more code.

Nitpick — parity regex missed 'export async function'.
Added '(?:async\s+)?' to scripts/verify-seed-envelope-parity.mjs function
extraction regex.

Verified: 39 tests green, parity verifier green, strict mode correctly
hard-fails with 91 seeders missing (expected during PR 1).

* fix(seed-contract): address review round 2 — NaN/empty-string validation, Error cause, parity CI wiring

P2 — Non-finite ttlSeconds/maxStaleMin bypassed validation.
`typeof NaN === 'number'` and `NaN > 0 === false` meant a NaN duration
passed the old typeof+<=0 checks and would have poisoned TTLs once
validateDescriptor is wired into runSeed. Now gated by Number.isFinite,
which rejects NaN and ±Infinity. Tests added for NaN/Infinity on both
fields.
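
The NaN gap is a plain JavaScript pitfall, reproducible in two lines (predicate names are illustrative):

```javascript
// typeof NaN === 'number' and NaN <= 0 === false, so the old check passes NaN.
const oldCheckPasses = (v) => typeof v === 'number' && !(v <= 0);
// Number.isFinite rejects NaN and ±Infinity in one call.
const newCheckPasses = (v) => Number.isFinite(v) && v > 0;
```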

P2 — Empty/whitespace-only strings for domain/resource/canonicalKey/sourceVersion
bypassed validation. Added .trim() === '' rejection + tests per field.
This mattered because canonicalKey='' would have landed writes at the
empty key and seed-meta under a blank resource namespace.

P3 — SeedContractError silently dropped the ES2022 Error cause option.
Constructor now forwards { cause } through super() so err.cause works
with standard tooling (Node's default stack printer, Sentry chained-cause
serialization). resolveRecordCount's manual err.cause = err assignment
was replaced with the options-bag form. Test added for both constructor
direct-use and the resolveRecordCount wrap path.
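
A minimal sketch of the options-bag forwarding; the real SeedContractError carries additional contract context not shown here:

```javascript
// Forwarding the options bag through super() preserves err.cause, so
// chained-cause tooling (stack printers, Sentry serializers) sees it.
class SeedContractError extends Error {
  constructor(message, options) {
    super(message, options);
    this.name = 'SeedContractError';
  }
}
```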

P3 — Parity verifier was not on an automated path. Added
tests/seed-envelope-parity.test.mjs which spawns scripts/verify-seed-envelope-parity.mjs
via execFile; non-zero exit (drift) → test fails. Now runs as part of
`npm run test:data` (tsx --test tests/*.test.mjs). Drift injection
confirmed: sed -i modifying api/_seed-envelope.js makes the test fail
with 'Command failed' from execFile.

51 tests total (was 39). All green on clean tree.

* fix(seed-contract): conformance test checks full descriptor, not just declareRecords

Previous conformance check green-lit any seeder that exported
declareRecords, even if the runSeed(...) call-site omitted other
validateDescriptor-required opts (validateFn, ttlSeconds, sourceVersion,
schemaVersion, maxStaleMin). That would have produced a false readiness
signal for PR 3's strict flip: test goes green, but wiring
validateDescriptor() into runSeed in PR 2 would still throw at runtime
across the fleet.

Examples verified on the PR head:
- scripts/seed-cot.mjs:188-192 — no sourceVersion/schemaVersion/maxStaleMin
- scripts/seed-market-breadth.mjs:121-124 — same
- scripts/seed-jodi-gas.mjs:248-253 — no schemaVersion/maxStaleMin

Now the conformance test:
1. AST-lite extracts the runSeed(...) call site with balanced parens,
   tolerating strings and comments.
2. Checks every REQUIRED_OPTS_FIELDS entry (validateFn, declareRecords,
   ttlSeconds, sourceVersion, schemaVersion, maxStaleMin) is present as
   an object key in that call-site.
3. Emits a per-file diagnostic listing missing fields.
4. Migration signal is now accurate: 0/91 seeders fully satisfy the
   descriptor (the old check, looking only at declareRecords, reported
   0/91 as missing). Matches
   the underlying validateDescriptor behavior.

Verified: strict mode (SEED_CONTRACT_STRICT=1) surfaces 'opt:schemaVersion,
opt:maxStaleMin' as missing fields per seeder — actionable for PR 2
migration work. 51 tests total (unchanged count; behavior change is in
which seeders the one conformance test considers migrated).

* fix(seed-contract): strip comments/strings before parsing runSeed() call site

The conformance scanner located the first 'runSeed(' substring in the raw
source, which caught commented-out mentions upstream of the real call.
Offending files where this produced false 'incomplete' diagnoses:
- scripts/seed-bis-data.mjs:209 // runSeed() calls process.exit(0)…
  real call at :220
- scripts/seed-economy.mjs:788 header comment mentioning runSeed()
  real call at :891

Three files had the same pattern. Under strict mode these would have been
false hard failures in PR 3 even when the real descriptor was migrated.

Fix:
- stripCommentsAndStrings(src) produces a view where block comments, line
  comments, and string/template literals are replaced with spaces (line
  feeds preserved). Indices stay aligned with the original source so
  extractRunSeedCall can match against the stripped view and then slice
  the original source for the real call body.
- descriptorFieldsPresent() also runs its field-presence regex against
  the stripped call body so '// TODO: validateFn' inside the call doesn't
  fool the check.
- hasRunSeedCall() uses the stripped view too, which correctly excludes
  5 seeders that only mentioned runSeed in comments. Count dropped
  91→86 real callers.
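
An index-preserving stripper in the same spirit can be sketched with a single regex pass. The commit's real implementation is a character scanner; this regex version is a simplified illustration and does not handle regex literals or nested template interpolation:

```javascript
// Block comments, line comments, and string/template literals are replaced
// char-for-char with spaces (newlines kept), so indices into the stripped
// view map 1:1 onto the original source.
function stripCommentsAndStrings(src) {
  return src.replace(
    /\/\*[\s\S]*?\*\/|\/\/[^\n]*|'(?:\\.|[^'\\\n])*'|"(?:\\.|[^"\\\n])*"|`(?:\\.|[^`\\])*`/g,
    (m) => m.replace(/[^\n]/g, ' ')
  );
}
```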

Added 4 targeted tests covering:
- runSeed() inside a line comment ahead of the real call
- runSeed() inside a block comment
- runSeed() inside a string literal ("don't call runSeed() directly")
- descriptor field names inside an inline comment don't count as present

Verified on the actual files: seed-bis-data.mjs first real runSeed( in
stripped source is at line 220 (was line 209 before fix).

40 tests total, all green.

* fix(seed-contract): parity verifier survives unbalanced braces in string/template literals

Addresses Greptile P2 on PR #3095: the body extractor in
scripts/verify-seed-envelope-parity.mjs counted raw { and } on every
character. A future helper body that legitimately contains
`const marker = '{'` would have pushed depth past zero at the literal
brace and truncated the body — silently masking drift in the rest of
the function.

Extracted the scan into scanBalanced(source, start, open, close) which
skips characters inside line comments, block comments, and string /
template literals (with escape handling and template-literal ${} recursion
for interpolation). Call sites in extractFunctions updated to use the new
scanner for both the arg-list parens and the function body braces.

Made extractFunctions and scanBalanced exported so the new test file
can exercise them directly. Gated main() behind an isMain check so
importing the module from tests doesn't trigger process.exit.

New tests in tests/seed-envelope-parity.test.mjs:
- extractFunctions tolerates unbalanced braces in string literals
- same for template literals
- same for braces inside block comments
- same for braces inside line comments
- scanBalanced respects backslash-escapes inside strings
- scanBalanced recurses into template-literal ${} interpolation

Also addresses the other two Greptile P2s which were already fixed in
earlier commits on this branch:
- Empty-string gap (99646dd9a): .trim()==='' rejection added
- SeedContractError cause drop (99646dd9a): constructor forwards cause
  through super's options bag per the ES2022 Error cause spec

61 tests green. Both typechecks clean.
2026-04-14 22:11:56 +04:00
Sebastien Melki
e39ffc3c3b feat(analytics): send sub status & planKey with Umami identity (#3093)
* feat(analytics): send subscription status and planKey with Umami identity (#3092)

Enhance identifyUser() to include subStatus and planKey from billing
data. initAuthAnalytics() now subscribes to both auth and billing
changes, re-identifying when subscription data arrives via Convex.
Also identify all authenticated users, not just pro.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: store billing unsubscribe + add destroyAuthAnalytics teardown

Address Greptile P2 review: store the billing unsubscribe for symmetry
with _unsubAuth, and add destroyAuthAnalytics() so both listeners can
be properly cleaned up.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(analytics): reset _lastSub on sign-out to prevent cross-user leak

When user A signs out and user B signs in, the auth callback fires
first with user B's id while _lastSub still holds A's subscription.
identifyUser would then send (userB.id, userB.role, userA.status,
userA.planKey) to Umami until billing's Convex update landed.

Clear _lastSub in _syncIdentity's signed-out branch so the next
sign-in starts clean — subStatus/planKey are simply omitted (both
guards use != null) until billing delivers the new sub.

* fix(analytics): reset _lastSub on direct user switch, not just sign-out

App.ts:857-864 supports a direct user A -> user B account switch without
an intermediate null auth state. The previous fix only cleared _lastSub
inside _syncIdentity()'s sign-out branch, so on a direct switch the auth
subscriber fired with user B while _lastSub still held user A's
subscription, leaking A's status/planKey into B's Umami identify call.

Detect user-id change in the auth subscriber and drop _lastSub before
calling _syncIdentity(). Sign-out reset inside _syncIdentity is kept as
belt-and-suspenders for the no-auth path.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Elie Habib <elie.habib@gmail.com>
2026-04-14 21:33:12 +04:00
Elie Habib
9b180d6ee2 fix(bundle-runner): wall-time budget to prevent Railway 10min SIGKILL (#3094)
* fix(bundle-runner): enforce wall-time budget to prevent Railway 10min SIGKILL

Railway cron services SIGKILL the container at 10min. When a bundle
happened to have two heavy sections due in the same tick (e.g.
PW-Main + PW-Port-Activity with timeoutMs totaling 15min+), the second
section's stdout never flushed and Railway marked the run as crashed —
even though earlier sections published successfully.

- _bundle-runner.mjs: add maxBundleMs budget (default 9min, 60s headroom
  under Railway's 10min ceiling). Sections whose worst-case timeout would
  exceed the remaining budget are deferred to the next tick with a clear
  log line. Summary now reports ran/skipped/deferred/failed.
- seed-bundle-portwatch.mjs: lower PW-Port-Activity timeoutMs 600s→420s
  so a single section can no longer consume the entire budget.
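
The budget gate reduces to a few lines (names and shapes illustrative); note the follow-up fix in this same PR makes maxBundleMs opt-in, defaulting to Infinity:

```javascript
// Defer any section whose worst-case timeout would overrun the remaining
// wall-time budget; deferred sections are picked up on the next cron tick.
function planSections(sections, maxBundleMs) {
  const start = Date.now();
  const ran = [], deferred = [];
  for (const s of sections) {
    const elapsed = Date.now() - start;
    if (elapsed + s.timeoutMs > maxBundleMs) {
      deferred.push(s.name);
    } else {
      ran.push(s.name); // section would execute here
    }
  }
  return { ran, deferred };
}
```

With a 540_000 ms budget and two 600_000 ms sections, both defer on the very first tick, which is exactly the P1 that motivated the opt-in default.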

Observed on 2026-04-14 16:03 UTC portwatch run: PW-Disruptions +
PW-Main ran cleanly, PW-Port-Activity started with ~9m37s of Railway
budget and its 10min execFile timeout, got SIGKILL'd before any output
flushed, job marked as crash.

* fix(bundle-runner): make maxBundleMs opt-in to avoid deferring other bundles

Greptile PR review flagged P1: default maxBundleMs=540_000 silently
applied to all runBundle callers. At least 12 sections across 7 other
bundles (energy-sources, climate, resilience, resilience-validation,
imf-extended, static-ref, health) have timeoutMs >= 540_000, which
means 0 + 600_000 > 540_000 is true on every first tick — those
sections would be permanently deferred with no alarm.

Default to Infinity; portwatch opts in via { maxBundleMs: 540_000 }.
Other Railway-constrained bundles can opt in as their timeouts are
audited.
2026-04-14 21:08:40 +04:00
Elie Habib
51c9d2c95f fix(seed-forecasts): pipeline timeout 10s→45s + BATCH_SIZE 10→5 (#3090)
* fix(seed-forecasts): pipeline timeout 10s→45s + BATCH_SIZE 10→5

Root cause (validated via STRLEN probe + Railway log): readInputKeys()
batched GETs against Upstash REST /pipeline deterministically timed out
at the 10s budget. ~40 input keys totaling ~2.27 MB; top 5 keys (ucdp
657KB + chokepoints 500KB + cyber-threats 390KB + commodities 192KB +
gpsjam 174KB) = 90% of payload. Worst-case co-located batch at
BATCH_SIZE=10 was ~1.55 MB; at the observed Upstash REST slow-spike floor
(~100 KB/s, implied by the failure pattern), 1.55 MB needs ~16s, exceeding
the 10s budget.

Production proof — Railway log 2026-04-14 10:01 UTC:
  Reading input data from Redis...
  Retry 1/2 in 1000ms: The operation was aborted due to timeout
  ... 12 consecutive abort-timeouts (4 outer × ~3 inner) ...
  FETCH FAILED: The operation was aborted due to timeout
  === Failed gracefully (188070ms) ===

Fix:
  BATCH_SIZE 10 → 5    (reduces probability of tail co-location)
  timeout    10s → 45s (2.4× headroom at observed floor)

Round-trips 4 → 8; per-batch overhead ~150-500ms total amortized by
undici keep-alive. Negligible vs hourly cadence.

What this PR does NOT do (5-agent deepen-plan review caught these):
  - Does NOT remove input keys. Initial draft proposed dropping 3
    "stub" keys. All 3 are LIVE: producers traced to seed-insights,
    seed-conflict-intel, and seed-forecasts itself (L14919 — self-
    referential EMA windows state key). Zero-byte STRLEN snapshot
    caught inter-cycle gaps, not dead keys. Removing reads would
    break newsDigest, acledEvents, and EMA windows.
  - Does NOT bump api/health.js maxStaleMin. Right fix = make read
    succeed, not widen alarm.
  - Does NOT extract shared batchedPipelineGet helper. Tracked.

Latent sibling bugs (separate PRs per feedback_no_pr_pollution):
  - seed-cross-source-signals.mjs:163 (15s, 23 keys)
  - seed-correlation.mjs:26 (10s, 9 market keys)
  - seed-energy-spine.mjs:71 (30s, 300 cmds/batch)
  - seed-resilience-scores.mjs:73 (30s, BATCH=50 writes)

Plan: docs/plans/2026-04-14-001-fix-seed-forecasts-pipeline-timeout-plan.md
Skill: ~/.claude/skills/upstash-pipeline-payload-timeout/SKILL.md

Tests: node --check + typecheck + typecheck:api clean.

* docs(seed-forecasts): drop personal-machine path from comment

Architecture-strategist review on PR #3090 caught the same anti-pattern
flagged on PR #3088: the inline comment referenced ~/.claude/skills/...
which only resolves on the author's machine. Replaced with a self-contained
"diagnostic methodology" paragraph so the rationale is portable and
contributors on CI / other machines see complete context.

No code change.

* docs(seed-forecasts): correct comment math + add reorder follow-up note (Greptile P2)

Greptile review on PR #3090 caught: BATCH_SIZE divides the keys array
deterministically by index, so the worst batch is FIXED by array order
(not random co-location as my comment implied). Verified live with
STRLEN: batch 2 (indices 5-9 = chokepoints + iran + ucdp + unrest +
outages) is 1.17 MB, not the 1.9 MB worst-random-case I claimed.

Updated comment to reflect:
  - Actual deterministic worst-case batch (1.17 MB).
  - Headroom recalc: 1.17 MB at 100 KB/s = ~12s; 45s gives 3.7× margin.
  - Architectural insight as follow-up: interleaving heavies (chokepoints
    + ucdp) with smalls in the keys array would split the deterministic
    worst-case across two batches, halving per-request payload. Tracked
    for a future PR (no PR pollution).

No code change; comment-only correction.
2026-04-14 15:27:40 +04:00
Elie Habib
5152ed7a06 fix(spr-policies): wire seed-spr-policies into seed-bundle-energy-sources (#3089)
scripts/seed-spr-policies.mjs exists as a runnable seeder with proper isMain
guard, but no Railway service ever invokes it. /api/health shows
sprPolicies as EMPTY with seedAgeMin: null — energy:spr-policies:v1 has
literally never been written to Redis.

Wired it as a bundle entry alongside the other energy seeders. Cadence:
weekly. Static-registry data (scripts/data/spr-policies.json) only needs to
run once after deploys + restarts to populate the key; the 400d
maxStaleMin in api/health.js confirms intent. 60s timeout is generous
for a JSON-file read + Redis write.

Tests: node --check on the bundle clean; npm run typecheck clean.
2026-04-14 13:29:35 +04:00
Elie Habib
6fdd9d8440 fix(commodity-quotes): move .then() block to opts.afterPublish — resurrect 3 dead Redis writes (#3088)
* fix(commodity-quotes): move .then() block to opts.afterPublish — 3 dead Redis writes resurrected

runSeed() ends with process.exit(0) on success, terminating the Node
process before any chained .then() microtask runs. The block at
seed-commodity-quotes.mjs:258-287 has been silently dead — its three
Redis writes never executed:

  - market:commodities:v1:<symbols>          (alias for canonical key)
  - market:quotes:v1:<symbols>               (with finnhubSkipped flags)
  - market:gold-extended:v1                  (cross-currency XAU + drivers)

Production proof from Railway log 2026-04-14 08:50:31:
  === market:commodities Seed ===
  [Yahoo] ^VIX: $18.64 (-2.51%)
  ... 33 Yahoo symbols ...
  Verified: data present in Redis
  === Done (8691ms) ===
  Starting Container          ← next cron; ZERO [Gold] log lines

Health endpoint shows goldExtended as EMPTY with seedAgeMin: null — that
key has literally never been written.

Fix:
- Extract post-publish writes into writeCompanionKeys(data).
- Wire it via opts.afterPublish, which IS awaited inside runSeed BEFORE
  process.exit. The companion writes will now actually run.
- Wrap the call in try/catch so companion-key failures log explicitly
  rather than masking the canonical write success.
- Remove the now-redundant module-level seedData variable; afterPublish
  receives the canonical data as its first argument.

Effect: next cron cycle (~5 min) writes all 3 keys for the first time.
goldExtended health flips from EMPTY (seedAgeMin: null) to OK.

Tests: node --check syntax clean; npm run typecheck clean. The fix is
structural — verified by the runSeed contract in _seed-utils.mjs ~L792-794
which awaits afterPublish before process.exit.

* fix(commodity-quotes): split required vs optional companion writes (Codex P1 #3088)

Codex P1 review caught: the catch-all wrapper around writeCompanionKeys
masked Upstash failures on the REQUIRED alias keys (market:commodities:v1
and market:quotes:v1). A transient Upstash 5xx mid-write would log a
warning and return — runSeed would then write canonical seed-meta as fresh
and exit 0, leaving the alias keys stale or missing with no health signal.
That defeats the entire point of this PR: the canonical key would look
healthy while the resurrected companions silently failed again.

Fix: split into two functions with different error semantics.

writeRequiredCompanionKeys(data):
  - Both alias-key writes propagate errors. If either Upstash write fails,
    the exception bubbles to runSeed's outer try/catch, lock is released,
    seed-meta is NOT stamped fresh, and the outer .catch fires
    process.exit(1). Health correctly flags STALE_SEED on next /api/health
    poll.

writeOptionalGoldExtended():
  - Yahoo XAU fetch + writeExtraKeyWithMeta wrapped in its own try/catch.
    Yahoo flakiness only degrades the gold-extended panel (which has its
    own seed-meta key that goes stale independently); the canonical
    commodity publish stays healthy.

Per Codex's own suggestion: only the OPTIONAL gold-extended branch is
downgraded to a warning. Required writes are not.

Tests: node --check + typecheck clean.

* fix(commodity-quotes): address Greptile P2 — parallelize alias writes, inline JSDoc

Two P2 findings on PR #3088:

1. Sequential alias-key writes — independent Redis writes were awaited one
   after the other. Wrapped in Promise.all to halve latency on every seed
   cycle. No read-after-write ordering required.

2. Local-machine path in JSDoc (~/.claude/skills/...) was unresolvable for
   other contributors. Inlined the key facts (background, error semantics,
   parallelization rationale) directly into the JSDoc so the context
   survives on any workstation or CI runner. Also dropped a stale
   duplicate JSDoc block that was left over from the prior refactor.

Tests: node --check + typecheck clean.
2026-04-14 13:29:05 +04:00
Elie Habib
56103684c6 fix(seed-utils): payloadBytes>0 fallback for runSeed recordCount auto-detect (#3087)
* fix(seed-utils): payloadBytes>0 fallback for runSeed recordCount auto-detect

Phantom EMPTY_DATA in /api/health: 16 of 21 failing health checks were
caused by seeders publishing custom payload shapes without passing
opts.recordCount. The auto-detect chain in runSeed only matches a hardcoded
list of shapes; anything else falls through to recordCount=0 and triggers
EMPTY_DATA in /api/health even though the payload is fully populated and
verified in Redis.

Smoking-gun log signature from Railway 2026-04-14:
  [BLS-Series] recordCount:0, payloadBytes:6093, Verified: data present
  [VPD-Tracker] recordCount:0, payloadBytes:3068853, Verified: data present
  [Disease-Outbreaks] recordCount:0, payloadBytes:92684, Verified: data present

Fix:
- Extract recordCount logic into pure exported computeRecordCount() for
  unit testability.
- Add payloadBytes>0 → 1 fallback at the end of the resolution chain. When
  triggered, console.warn names the seeder so the author can add an
  explicit opts.recordCount for accurate dashboards.
- Resolution order unchanged for existing callers: opts.recordCount wins,
  then known-shape auto-detect, then the new payloadBytes fallback, then 0.
  Explicit opts.recordCount=0 still wins (test covers it).
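
The resolution chain reads roughly like this — an illustrative
reconstruction; the real computeRecordCount in _seed-utils.mjs matches
more payload shapes than shown:

```javascript
function computeRecordCount(opts, data, payloadBytes) {
  // 1. Explicit opts.recordCount always wins — including an explicit 0.
  if (opts.recordCount != null) return opts.recordCount;
  // 2. Known-shape auto-detect (abbreviated list here).
  const detectedFromShape =
    Array.isArray(data) ? data.length
    : Array.isArray(data?.events) ? data.events.length
    : Array.isArray(data?.items) ? data.items.length
    : undefined;
  // A real 0 from a known shape (e.g. events: []) must NOT fall through.
  if (detectedFromShape != null) return detectedFromShape;
  // 3. New fallback: non-empty payload of unknown shape counts as 1,
  //    with a warning so the seeder author adds an explicit count.
  if (payloadBytes > 0) {
    console.warn('recordCount fallback: pass opts.recordCount explicitly');
    return 1;
  }
  return 0;
}
```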

Effect: clears 16 phantom CRITs on the next bundle cycle. Per-seeder warns
will surface in logs so we can add accurate opts.recordCount in follow-up.

Tests: 11 new computeRecordCount cases (opts precedence, auto-detect shapes,
fallback behavior, no-spurious-warn, explicit-zero precedence).
seed-utils.test.mjs 18/18 + seed-utils-empty-data-failure.test.mjs 2/2 +
typecheck clean.

* test(seed-utils): address Greptile P2 — replace it.each mutation, add empty-known-shape edge case

Greptile review on PR #3087 caught two minor test issues:

1. `it.each = undefined` mutated the imported `it` function (ES module
   live binding). Replaced with a plain comment.

2. Missing edge case: `data: { events: [] }` with payloadBytes > 0 should
   NOT trigger the payloadBytes fallback because detectedFromShape resolves
   to a real 0 (not undefined). Without this guard, a future regression
   could collapse the !=null check and silently mask genuine empty
   upstream cycles as "1 record". Test added.

Tests: 19/19 (was 18). No production code change.
2026-04-14 13:28:00 +04:00
Sebastien Melki
3314db2664 fix(relay): treat quiet hours start === end as disabled, not 24/7 (#3061) (#3066)
* fix(relay): treat quiet hours start === end as disabled, not 24/7 (#3061)

When quietHoursStart equalled quietHoursEnd, the midnight-spanning
branch evaluated `hour >= N || hour < N` which is true for all hours,
silently suppressing all non-critical alerts permanently.

Add an early return for start === end in the relay and reject the
combination in Convex validation.

Closes #3061

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: cross-check quiet hours start/end against persisted value on single-field updates

Addresses Greptile review: validateQuietHoursArgs only caught start===end
when both arrived in the same call. Now the mutation handlers also check
against the DB record to prevent sequential single-field updates from
creating a start===end state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: gate quiet hours start===end check on effectiveEnabled

Only enforce the start !== end invariant when quiet hours are effectively
enabled. This allows users with legacy start===end records to disable
quiet hours, change timezone/override, or recover from old bad state
without getting locked out.

Addresses koala73's P1 review feedback on #3066.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(relay): extract quiet-hours + consolidate equality check, add tests

- Move isInQuietHours/toLocalHour to scripts/lib/quiet-hours.cjs so they
  are testable without importing the full relay (which has top-level
  side effects and env requirements).
- Drop the unconditional start===end check from validateQuietHoursArgs;
  the effectiveEnabled-guarded check in setQuietHours /
  setQuietHoursForUser is now the single source of truth. Previously a
  user disabling quiet hours with start===end would be rejected even
  though the values are irrelevant when disabled.
- Add tests/quiet-hours.test.mjs covering: disabled, start===end
  regression (#3061), midnight-spanning window, same-day window,
  inclusive/exclusive bounds, invalid timezone, timezone handling,
  defaults.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Elie Habib <elie.habib@gmail.com>
2026-04-14 13:14:53 +04:00
Elie Habib
29d39462e1 fix(crypto-quotes): use CoinPaprika as primary, CoinGecko as fallback (#3086)
* fix(crypto-quotes): use CoinPaprika as primary, CoinGecko as fallback

Railway bundle log 2026-04-14 07:17:10 UTC showed seed-bundle-market-backup
finishing with failed:1. Crypto-Quotes hit CoinGecko 429s on every retry:

  [Crypto-Quotes] CoinGecko 429 — waiting 10s (1/5)
  ... (5 attempts, 10/20/30/40/50s back-off)
  [Crypto-Quotes] Crypto-Quotes failed after 120.0s: timeout

Root cause: CoinGecko 5-step retry budget (10+20+30+40+50 = 150s) exceeds
the bundle 120s section timeout, so the existing CoinPaprika fallback
never runs — the child process is killed mid-retry.

Fix: swap source order. CoinPaprika is now primary; CoinGecko is retained
as fallback for sparkline_in_7d data (CoinPaprika does not provide
sparklines). Probed CoinPaprika live: all 10 mapped crypto IDs present in
/v1/tickers, no auth required.

Trade-off: when CoinPaprika is healthy, sparkline arrays will be empty.
Acceptable — the panel already handles undefined sparklines, and the
alternative (no quotes at all because CoinGecko is rate-limited) is worse.

Tests: crypto-config.test.mjs 6/6, typecheck + typecheck:api clean.

* fix(crypto-quotes): require full coverage in validate() — no partial snapshots

Codex review on PR #3086 caught: validate() only required >=1 quote with
positive price. With the new CoinPaprika-primary path, a single dropped or
renamed ticker would silently publish a 9/10 snapshot. Health stays green
while one tracked asset disappears from the panel — exactly the silent
data-loss class we want to avoid on a fixed-cardinality top-10 feed.

Tightened validate() to require:
- quotes.length === CRYPTO_IDS.length (full cardinality)
- every quote has Number.isFinite(price) && price > 0
- every configured symbol is present in the response (defends against
  duplicate IDs masquerading as full coverage)
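
Sketch of the tightened gate — CRYPTO_IDS is truncated to three entries
here for illustration (the real list is the fixed top-10), and the quote
shape is assumed:

```javascript
const CRYPTO_IDS = [
  { id: 'btc-bitcoin', symbol: 'BTC' },
  { id: 'eth-ethereum', symbol: 'ETH' },
  { id: 'sol-solana', symbol: 'SOL' },
];

function validate(quotes) {
  if (quotes.length !== CRYPTO_IDS.length) return false;       // full cardinality
  if (!quotes.every((q) => Number.isFinite(q.price) && q.price > 0)) return false;
  const seen = new Set(quotes.map((q) => q.symbol));
  return CRYPTO_IDS.every((c) => seen.has(c.symbol));          // no duplicate-masking
}
```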

When the validator rejects, runSeed() takes the skipped path: existing
TTL is extended, seed-meta is bumped with count=0, and the Railway log
will scream which symbol is missing on the next cycle so the broken
CoinPaprika mapping is caught immediately.

Tests: crypto-config.test.mjs 6/6, typecheck clean.

* fix(crypto-quotes): address Greptile P2 — fallback retry budget + sourceVersion

P2-1: CoinGecko fallback was still wired with maxAttempts=5 (10+20+30+40+50
= 150s budget), so when CoinPaprika fails the fallback path could itself
overrun the bundle's 120s section timeout — recreating the exact failure
mode this PR fixes. Capped at maxAttempts=2 (10+20=30s) so the fallback
always finishes well within the bundle window.

P2-2: sourceVersion in seed-meta was still 'coingecko-markets' even though
CoinPaprika is now primary. Changed to 'coinpaprika-tickers+coingecko-fallback'
so health dashboards and on-call runbooks see the real data path.

Tests: crypto-config.test.mjs 6/6, typecheck clean.
2026-04-14 12:43:11 +04:00
Elie Habib
b92b22bd20 fix(fuel-prices): tolerate Brazil ANP failure to stop Railway crash-loop (#3085)
* fix(fuel-prices): tolerate Brazil ANP failure to stop Railway crash-loop

Brazil gov.br is structurally unreachable from Railway IPs:
- Decodo proxy 403s all .gov.br CONNECTs by policy
- Direct fetch fails undici TLS handshake from Railway egress

After PR #3082 tightened the publish gate to require zero failed sources,
every run exits 1 -> Railway "Deployment crashed" banner + STALE_SEED.

Add TOLERATED_FAILURES = {'Brazil'}; validateFuel ignores tolerated names
when checking failedSources. Critical regions (US/GB/MY) and the >=30
country floor still gate publish. Brazil's outage stays visible via the
existing [FRESHNESS] log.
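
A sketch of the gate, with names taken from this message and the country
shape assumed:

```javascript
const TOLERATED_FAILURES = new Set(['Brazil']);
const CRITICAL = ['US', 'GB', 'MY'];

function validateFuel(countries, failedSources) {
  // Only untolerated failures block publish; Brazil's outage is logged
  // elsewhere but does not reject the snapshot.
  const untolerated = failedSources.filter((s) => !TOLERATED_FAILURES.has(s));
  if (untolerated.length > 0) return false;
  if (countries.length < 30) return false;             // country floor
  const codes = new Set(countries.map((c) => c.code));
  return CRITICAL.every((c) => codes.has(c));          // critical regions fresh
}
```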

* fix(fuel-prices): rotate :prev on tolerated-only failures to keep WoW fresh

Reviewer catch: after tolerating Brazil, allSourcesFresh stays false forever
→ :prev never rotates → panel's WoW stretches into 2-week, 3-week, ... deltas
for every non-Brazil country while still labeled 'week-over-week'.

Gate :prev rotation on untolerated failures only. Tolerated sources are
absent from the snapshot entirely, so rotating is safe (no stale-self-
compare poisoning next week).

* fix(fuel-prices): distinguish tolerated vs untolerated sources in [DEGRADED] log

Greptile P2: the [DEGRADED] message said 'publish will be rejected' even
when only tolerated sources (Brazil) failed — confusing for operators
watching Railway logs.
2026-04-14 12:36:07 +04:00
Elie Habib
16d868bd6d fix(comtrade): retry on transient 5xx to stop silent reporter drops (#3084)
* fix(comtrade): retry on transient 5xx to stop silent reporter drops

Railway log 2026-04-14 bilateral-hs4 run: India (699) hit HTTP 503 on both
batches and was dropped entirely from the snapshot. Iran (364) hit 500
mid-batch. All three Comtrade seeders (bilateral-hs4, trade-flows,
recovery-import-hhi) retried only on 429; any 5xx = silent coverage gap.

Adds bounded 5xx retry (3 attempts, 5s then 15s backoff) in each seeder.
On giveup caller returns empty so resume cache picks the reporter up next
cycle. Exports isTransientComtrade + fetchBilateral for unit tests; 6 new
tests pin the contract.

* fix(comtrade): collapse 429+5xx into single classification loop (PR review)

Reviewer caught that the 429 branch bypassed the 5xx retry path: a 429 ->
503 sequence would return [] immediately after the 429-retry without
consuming any transient-5xx retries, leaving the silent-drop bug intact
for that specific sequence.

Both seeders now use a single while-loop that reclassifies each response:
- 429 (once, with full backoff)
- 5xx (up to 2 retries with 5s/15s or 5s/10s backoff)
- anything else -> break and return
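
The loop's shape, as a hedged sketch with fetchPage and sleep injected
(the real seeders differ in backoff constants and response parsing):

```javascript
async function fetchWithRetry(fetchPage, sleep) {
  let used429 = false;
  let transientRetries = 0;
  const TRANSIENT_BACKOFF = [5_000, 15_000];
  while (true) {
    const res = await fetchPage();
    if (res.status === 429 && !used429) {          // 429: one full back-off
      used429 = true;
      await sleep(60_000);
      continue;
    }
    if (res.status >= 500 && transientRetries < TRANSIENT_BACKOFF.length) {
      await sleep(TRANSIENT_BACKOFF[transientRetries++]);  // 5xx: bounded retries
      continue;
    }
    if (res.status === 200) return res.rows;
    return [];   // give up: empty result, resume cache retries next cycle
  }
}
```

Because every response re-enters the same classifier, a 429 → 503
sequence now consumes the transient-5xx retries instead of bailing.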

Two new tests lock in the mixed case: 429 then 503 still consumes
transient retries; consecutive 429s cap at one wait. 8/8 pass.

* test(comtrade): inject sleep to drop retry-test runtime from 185s to 277ms

PR review flagged that the new mixed 429+5xx tests slept for the full
production backoffs (60s + 5s + 15s = 80s per test), making the unit
suite unnecessarily slow and CI-timeout-prone.

Add a module-local _retrySleep binding with __setSleepForTests(fn)
export. Production keeps the real sleep; tests swap in a no-op that
records requested delays. The sleepCalls array now pins the production
cadence so a future refactor that changes [60_000, 5_000, 15_000] has
to update the test too.

8/8 pass in 277ms (down from 185s).

* test(comtrade): update 60s-on-429 static-analysis regex for _retrySleep alias

The existing substring check 'sleep(60_000)' broke after the previous
commit renamed production calls to _retrySleep(60_000) for test injection.
Widen the regex to accept either the bare or injectable form; both
preserve the 60s production cadence.

* test(comtrade): extend retry coverage to trade-flows + recovery-import-hhi

Three P2 review findings addressed:

1. partnerCode 000 in the succeeds-on-third test was changed to real code
   156. groupByProduct() filters 0/000 downstream, so the test was passing
   while the user-visible seeder would still drop the row.
2. trade-flows and recovery-import-hhi had no unit coverage for their new
   retry state machines. Adds 7 tests covering succeed-first, retry-then-
   succeed, giveup-after-3, and mixed 429+5xx classification.
3. Both seeders now expose __setSleepForTests + export their fetch helper.
   seed-trade-flows also gets an isMain guard so tests import without
   triggering a real seed run. sleepCalls asserts pin the production
   cadence.

15 retry tests pass in 183ms. Full suite 5198/5198.

* fix(trade-flows): per-reporter coverage gate blocks full-reporter flatline

PR #3084 review P1: the existing MIN_COVERAGE_RATIO=0.70 gate was
global-only. 6 reporters × 5 commodities = 30 pairs; losing one entire
reporter (e.g. the India/Taiwan silent-drop this PR is trying to stop)
is only 5/30 missing = 83% global coverage, which passed.

Adds a per-reporter coverage floor: each reporter must have ≥40% of
its commodities populated (2 of 5). Global gate kept as the broad-
outage catch; per-reporter gate catches the single-reporter flatline.

Extracts checkCoverage() as a pure function for unit testing — mocking
30+ fetches in fetchAllFlows is fragile, and the failure mode lives in
the gate, not the fetcher.
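
An illustrative version of the two-level gate (the pair shape is an
assumption; the real checkCoverage returns a per-reporter breakdown):

```javascript
function checkCoverage(pairs, reporters, commoditiesPerReporter,
                       { globalFloor = 0.70, perReporterFloor = 0.40 } = {}) {
  const total = reporters.length * commoditiesPerReporter;
  // Broad-outage catch: overall coverage floor.
  if (pairs.length / total < globalFloor) return { ok: false, reason: 'global' };
  // Single-reporter flatline catch: each reporter needs its own floor.
  for (const r of reporters) {
    const have = pairs.filter((p) => p.reporter === r).length;
    if (have / commoditiesPerReporter < perReporterFloor) {
      return { ok: false, reason: `reporter:${r}` };
    }
  }
  return { ok: true };
}
```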

6 new tests cover: 30/30 ok; India flatline → reject at 83% global;
Taiwan flatline; broad outage → reject via global gate; healthy
80% global with 4/5 per-reporter → ok; per-reporter breakdown shape.

5204/5204 tests pass.
2026-04-14 12:29:17 +04:00
Elie Habib
7e7ca70faf fix(fuel-prices): resilient seeder — proxy, retry, stale-carry-forward, strict gate (#3082)
* fix(fuel-prices): resilient seeder — proxy, retry, stale-carry-forward, strict gate

Addresses 2026-04-07 run where 4 of 7 sources failed (NZ 403, BR/MX fetch failed)
and the seeder silently published 30 countries with Brazil/Mexico/NZ vanishing
from the UI.

- Startup proxy diagnostic so PROXY_URL misconfigs are immediately visible.
- New fetchWithProxyPreferred (proxy-first, direct fallback) + withFuelRetry
  (3 attempts, backoff) wrapping NZ/BR/MX upstream calls.
- Swap MX from dead datos.gob.mx to CRE publicacionexterna XML (13k stations).
- Stale-carry-forward failed sources from :prev snapshot (stale: true) instead
  of dropping countries; fresh-only ranking; skip WoW for stale entries.
- Gate :prev rotation on all-sources-succeeded so partial runs don't poison
  next week's WoW.
- Strict validateFn: >=25 countries AND US+GB+MY fresh. Prior gate was >=1.
- emptyDataIsFailure: true so a validation failure doesn't refresh seed-meta.
- Wrap imperative body in main() + isMain guard; export parseCREStationPrices
  and validateFuel; 9 new unit tests.

* fix(fuel-prices): remove stale-carry-forward, harden validator (PR review)

Reviewer flagged two P1s on the prior commit:

1. stale-carry-forward inserted stale: true rows into the published payload,
   but the proto schema and panel have no staleness render path. Users would
   see week-old BR/MX/NZ prices as current. Resilience turned into a
   freshness bug.
2. Validator counted stale-carried entries toward the floor. US/GB/MY fresh
   + 22 stale still passed, refreshing seed-meta.fetchedAt and leaving health
   operationally healthy indefinitely. Hid the outage.

Fix: remove stale-carry-forward entirely. Tighten validator to require
countries.length >= 30, US+GB+MY present, and failedSources.length === 0.
Partial-failure runs now rejected → 10-day cache TTL serves last healthy
snapshot → health STALE_SEED after maxStaleMin. Correct, visible signal.

Drops dead code: SOURCE_COUNTRY_CODES, staleCarried/freshCountries, stale
WoW skip. Tests updated for the failedSources gate.
2026-04-14 09:16:23 +04:00
Elie Habib
45c98284da fix(trade): correct UN Comtrade reporter codes for India and Taiwan (#3081)
* fix(trade): correct UN Comtrade reporter codes for India and Taiwan

seed-trade-flows was fetching India (356) and Taiwan (158) using UN M49
codes. UN Comtrade registers India as 699 and Taiwan as 490 ("Other
Asia, nes"), so every fetch silently returned count:0 — 10 of 30
reporter×commodity pairs yielded zero records per run. Live probe
confirms 699→500 India rows, 490→159 Taiwan rows.

- Update reporter codes in seed-trade-flows.mjs and its consumer
  list-comtrade-flows.ts.
- Update ISO2_TO_COMTRADE in _comtrade-reporters.ts and
  seed-energy-spine.mjs so energy-shock and sector-dependency RPCs
  resolve the correct Comtrade keys for IN/TW.
- Add IN/TW overrides to seed-comtrade-bilateral-hs4 and
  seed-recovery-import-hhi (they iterate shared/un-to-iso2.json which
  must remain pure M49 for other callers).
- Fix partner-dedupe bug in seed-trade-flows: the preview endpoint
  returns partner-level rows; keying by (flowCode, year) without
  summing kept only the last partner seen, so tradeValueUsd was a
  random counterparty's value, not the World aggregate. Sum across
  partners and label as World.
- Add a 70% coverage floor on reporter×commodity pairs so an entire
  reporter silently flatlining now throws in Phase 1 (TTL extend, no
  seed-meta refresh) rather than publishing a partial snapshot.
- Sync energy-shock test fixture.

* fix(trade): apply Comtrade IN/TW overrides to runtime consumers too

Follow-up to PR review: the seed-side fix was incomplete because two
request-time consumers still mapped iso2 → M49 (356/158) when hitting
Comtrade or reading the (now-rekeyed) seeded cache.

- server/worldmonitor/supply-chain/v1/_bilateral-hs4-lazy.ts: apply
  the IN=699 / TW=490 override when deriving ISO2_TO_UN, so the lazy
  bilateral-hs4 fetch path used by get-route-impact and
  get-country-chokepoint-index stops silently returning count:0 for
  India and Taiwan when the seeded cache is cold.
- src/utils/country-codes.ts: add iso2ToComtradeReporterCode helper
  with the override baked in. Keep iso2ToUnCode as pure M49 (used
  elsewhere for legitimate M49 semantics).
- src/app/country-intel.ts: switch the listComtradeFlows call on the
  country brief page to the new helper so IN/TW resolve to the same
  reporter codes the seeder now writes under.
2026-04-14 09:05:50 +04:00
Elie Habib
9d27ff0d6a fix(seeds): strict-floor validators must not poison seed-meta on empty (#3078)
* fix(seeds): strict-floor validators must not poison seed-meta on empty

When `runSeed`'s validateFn rejected (empty/short data), seed-meta was
refreshed with `fetchedAt=now, recordCount=0`. Bundle runners read
`fetchedAt` to decide skip — so one transient empty fetch locked the
IMF-extended bundle (30-day cadence) out for a full month.

Adds opt-in `emptyDataIsFailure` flag that skips the meta refresh on
validation failure, letting the bundle retry next cron fire and health
flip to STALE_SEED. Wires it on all four IMF/WEO seeders (floor 150-190
countries), which structurally can't have legitimate empty results.

Default behavior unchanged for quiet-period feeds (news, events) where
empty is normal.

Observed: Railway log 2026-04-13 18:58 — imf-external validation fail;
next fire 8h later skipped "483min ago / interval 43200min".

* test(seeds): regression coverage for emptyDataIsFailure branch

Static-analysis guard against the PR #3078 regression reintroducing itself:
- Asserts runSeed gates writeFreshnessMetadata on opts.emptyDataIsFailure
  and that extendExistingTtl still runs in both branches (cache preserved).
- Asserts the four strict-floor IMF seeders (external/growth/labor/macro)
  pass emptyDataIsFailure: true.

Prevents silent bundle-lockout if someone removes the gate or adds a new
strict-floor seeder without the flag.

* fix(seeds): strict-floor failure must exit(1) + behavioral test

P2 (surfacing upstream failures in bundle summary):
Strict-floor seeders with emptyDataIsFailure:true now process.exit(1)
after logging FAILURE. _bundle-runner's spawnSeed wraps execFile, so
non-zero exit rejects → failed++ increments → bundle itself exits 1.
Before: bundle logged 'Done' and ran++ on a poisoned upstream, hiding
30-day outages from Railway monitoring.

P3 (behavioral regression coverage, replacing static source-shape test):
Stubs globalThis.fetch (Upstash REST) + process.exit to drive runSeed
through both branches. Asserts on actual Redis commands:
- strict path: zero seed-meta SET, pipeline EXPIRE still called, exit(1)
- default path: exactly one seed-meta SET, exit(0)
Catches future regressions where writeFreshnessMetadata is reintroduced
indirectly, and is immune to cosmetic refactors of _seed-utils.mjs.

* test(seeds): regression for emptyDataIsFailure meta-refresh gate

Proves that validation failure with opts.emptyDataIsFailure:true does NOT
write seed-meta (strict-floor seeders) while the default behavior DOES
write count=0 meta (quiet-period feeds). Addresses PR #3078 review.
2026-04-14 09:02:21 +04:00
Elie Habib
1875531e2a fix(seed-economy): retry proxy/EIA transients; gate stress index on full FRED coverage (#3080)
* fix(seed-economy): retry proxy/EIA transients; fail stress index on missing FRED components

Log review of 16 runs (2026-04-14 00:45–04:31 UTC) showed 50% degraded:
- Decodo proxy flapped with HTTP 500/502/522, `fredFetchJson` fell back
  direct on first proxy error and FRED then returned 500 to Railway IP,
  dropping series.
- 5 EIA panels (EnergyPrices, Crude, NatGas, SPR, Refinery) timed out in
  lockstep at 01:00 and 02:00, producing full `no write` runs.
- StressIndex silently excluded missing FRED components (VIXCLS, T10Y3M,
  STLFSI4, ICSA), publishing a degraded composite as if healthy.

Changes:
- fredFetchJson: retry proxy 3x with jittered backoff on 5xx/522/timeout
  before falling back direct.
- eiaFetchJson helper: 20s timeout (was 10s) + 3x retry on 5xx/timeout;
  wired into all EIA call-sites.
- computeStressIndex: throw when any FRED-sourced component is missing;
  GSCPI (ais-relay) can still be absent. Caught in fetchAll so other
  secondary writes proceed but composite is not published degraded.
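
A sketch of the strict component gate — the composite here is a plain
average purely for illustration (the real weighting is not shown in this
message); only the required-component check mirrors the change:

```javascript
const REQUIRED_FRED = ['VIXCLS', 'T10Y3M', 'STLFSI4', 'ICSA'];

function computeStressIndex(components) {
  // Any missing FRED-sourced component is a hard failure, not a silent
  // exclusion — the caller catches and skips publishing the composite.
  for (const name of REQUIRED_FRED) {
    if (!Number.isFinite(components[name])) {
      throw new Error(`stress index: missing FRED component ${name}`);
    }
  }
  const present = REQUIRED_FRED.map((n) => components[n]);
  if (Number.isFinite(components.GSCPI)) present.push(components.GSCPI); // optional
  return present.reduce((a, b) => a + b, 0) / present.length;
}
```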

* fix(seed-economy): narrow stress-index catch; don't retry 4xx in eiaFetchJson

- computeStressIndex try/catch no longer wraps the Redis write so a
  write failure surfaces as a run error instead of being swallowed.
- eiaFetchJson bails immediately on 4xx and non-transient thrown
  errors; only 5xx / timeouts / network resets are retried.
2026-04-14 08:55:01 +04:00
Elie Habib
93113174b8 fix(seed-fires): retry per-country FIRMS fetch once to cut silent coverage gaps (#3079)
* fix(seed-fires): retry per-country FIRMS fetch once before giving up

Turkey/North Korea/Russia/Iran/Israel-Gaza/Saudi/Syria failed ~1.1x per
30-min run in prod logs (77 fails / 69 runs, 2026-04-12→13), silently
zeroing those regions on the map for up to ~20% of the day. Upstream is
transiently flaky on large bboxes, not auth-related.

Adds 1 retry with 5s backoff in fetchRegionSource. Worst-case added
latency ~100s, still well under the 30-min cadence.

* fix(seed-fires): align retry backoff with rate-limit pacing; extend lock TTL

Addresses review:
- Retry backoff 5s -> 6s matches the inter-call pacing budget; prevents
  breaching FIRMS free-tier 10 req/min under clustered fast-failures.
- lockTtlMs 10m -> 25m; retry path doubles worst-case per-slot runtime
  (~71s) and can exceed the old lock, risking concurrent publish races.
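
The retry wrapper is tiny — a sketch with the per-region fetcher and
sleep injected (names assumed):

```javascript
async function fetchRegionWithRetry(fetchRegion, region, sleep) {
  try {
    return await fetchRegion(region);
  } catch {
    await sleep(6_000);           // matches the FIRMS 10 req/min pacing budget
    return fetchRegion(region);   // second failure propagates to the caller
  }
}
```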

* fix(seed-fires): bump lock TTL to 40m to cover full worst-case retry runtime

27 slots × ~72s worst-case = 32.4 min, exceeds the previous 25m TTL.
Bump to 40m so a hung/crashed run can't hold a stale lock forever while
still safely covering legitimate long runs. Next 30m cron tick will see
the lock held and skip, which is the intended behavior.
2026-04-14 08:51:45 +04:00
Elie Habib
21d33c4bb5 fix(hyperliquid-flow): fetch both default dex and xyz builder dex (#3077)
Root cause: Hyperliquid's commodity and FX perps (xyz:CL, xyz:BRENTOIL,
xyz:GOLD, xyz:SILVER, xyz:PLATINUM, xyz:PALLADIUM, xyz:COPPER, xyz:NATGAS,
xyz:EUR, xyz:JPY) live on a separate 'xyz' builder dex, NOT the default
perp dex. The MIT reference repo listed these with xyz: prefixes but
didn't document that they require {type:metaAndAssetCtxs, dex:xyz} as a
separate POST.

Production symptom (Railway bundle logs 2026-04-14 04:10):
  [Hyperliquid-Flow] SKIPPED: validation failed (empty data)

The seeder polled the default dex only, matched 4 of 14 whitelisted assets
(BTC/ETH/SOL/PAXG), and validateFn rejected snapshots with <12 assets.
Seed-meta was refreshed on the skipped path so health stayed OK but
market:hyperliquid:flow:v1 was never written.

Fix:
- New fetchAllMetaAndCtxs(): parallel-fetches both dexes and merges
  {universe, assetCtxs} by concatenation. xyz entries already carry the
  xyz: prefix in their universe names.
- New validateDexPayload(raw, dexLabel, minUniverse): per-dex floor so the
  thinner xyz dex (~63 entries) does not false-trip the default floor of
  50. Errors include the dex label for debuggability.
- validateUpstream(): back-compat wrapper — accepts either the legacy
  single-dex [meta, assetCtxs] tuple (buildSnapshot tests) or the merged
  {universe, assetCtxs} shape from fetchAllMetaAndCtxs.
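
The dual-dex merge in sketch form — the {type, dex} request body is
taken from this message; the transport helper postInfo is an assumed
injection point:

```javascript
async function fetchAllMetaAndCtxs(postInfo) {
  const [def, xyz] = await Promise.all([
    postInfo({ type: 'metaAndAssetCtxs' }),              // default perp dex
    postInfo({ type: 'metaAndAssetCtxs', dex: 'xyz' }),  // xyz builder dex
  ]);
  // Each response is a [meta, assetCtxs] tuple; merge by concatenation —
  // xyz universe entries already carry the xyz: prefix in their names.
  return {
    universe: [...def[0].universe, ...xyz[0].universe],
    assetCtxs: [...def[1], ...xyz[1]],
  };
}
```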

Tests: 37/37 green. New tests cover dual-dex fetch merge, cross-dex error
propagation, xyz floor accept/reject, and merged-shape pass-through.
2026-04-14 08:28:57 +04:00
Elie Habib
30ddad28d7 fix(seeds): upstream API drift — SPDR XLSX + IMF IRFCL + IMF-External BX/BM drop (#3076)
* fix(seeds): gold-etf XLSX migration, IRFCL dataflow, imf-external BX/BM drop

Three upstream-drift regressions caught from the market-backup + imf-extended
bundle logs. Root causes validated by live API probes before coding.

1. seed-gold-etf-flows: SPDR /assets/dynamic/GLD/GLD_US_archive_EN.csv now
   silently returns a PDF (Content-Type: application/pdf) — site migrated
   to api.spdrgoldshares.com/api/v1/historical-archive which serves XLSX.
   Swapped the CSV parser for an exceljs-based XLSX parser. Adds
   browser-ish Origin/Referer headers (SPDR swaps payload for PDF
   without them) and a Content-Type guard. Column layout: Date | Closing |
   ... | Tonnes | Total NAV USD.
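   The guard-then-map shape described above might look like the following
   sketch. In the real seeder exceljs extracts the rows (workbook.xlsx.load
   on the response buffer, then worksheet.eachRow); here the guard and the
   column mapping are factored as pure functions over already-extracted cell
   arrays, and the helper names plus the last-two-columns assumption are
   illustrative:

```javascript
// Fail loudly when SPDR swaps the payload for a PDF instead of feeding the
// wrong bytes to the XLSX parser.
function assertXlsxContentType(contentType) {
  const ok =
    typeof contentType === "string" &&
    (contentType.includes("spreadsheetml") ||
      contentType.includes("application/octet-stream"));
  if (!ok) throw new Error(`unexpected Content-Type: ${contentType}`);
}

// Column layout per the commit: Date | Closing | ... | Tonnes | Total NAV USD.
// Taking tonnes and NAV from the last two columns is an assumption.
function mapArchiveRow(cells) {
  const [date, closing] = cells;
  const tonnes = Number(cells[cells.length - 2]);
  const navUsd = Number(cells[cells.length - 1]);
  if (!date || !Number.isFinite(tonnes)) return null; // skip header/blank rows
  return { date: String(date), closing: Number(closing), tonnes, navUsd };
}
```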

2. seed-gold-cb-reserves: PR #3038 shipped with IMF.STA/IFS dataflow and
   3-segment key M..<indicator> — both wrong. IFS isn't exposed on
   api.imf.org (HTTP 204). Gold-reserves data lives under IMF.STA/IRFCL
   with 4 dimensions (COUNTRY.INDICATOR.SECTOR.FREQUENCY). Verified live:
   *.IRFCLDT1_IRFCL56_FTO.*.M returns 111 series. Switched to IRFCL +
   IRFCLDT1_IRFCL56_FTO (fine troy ounces) and fallbacks. The
   valueIsOunces flag now matches _FTO suffix (keeps legacy _OZT/OUNCE
   detection for backward compat).
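   The 4-dimension key and the suffix-driven unit flag can be sketched as
   follows (dimension order and the verified wildcard key come from the
   commit; the option-object shape is an assumption):

```javascript
// Build an IRFCL series key. Dimension order per the commit:
// COUNTRY.INDICATOR.SECTOR.FREQUENCY, e.g. *.IRFCLDT1_IRFCL56_FTO.*.M
function buildIrfclKey({ country = "*", indicator, sector = "*", freq = "M" }) {
  return [country, indicator, sector, freq].join(".");
}

// _FTO means fine troy ounces in the new dataflow; legacy _OZT/OUNCE
// detection is kept for backward compat, per the commit.
function valueIsOunces(indicator) {
  return /_FTO$|_OZT$|OUNCE/.test(indicator);
}
```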

3. seed-imf-external: BX/BM (export/import LEVELS, USD) WEO coverage
   collapsed to ~10 countries in late 2026 — the seeder's >=190-country
   validate floor was failing every run. Dropped BX/BM from fetch + join;
   kept BCA (~209) / TM_RPCH (~189) / TX_RPCH (~190). exportsUsd /
   importsUsd / tradeBalanceUsd fields kept as explicit null so consumers
   see a deliberate gap. validate floor lowered to 180 (BCA∪TM∪TX union).
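   The resulting per-country contract and the lowered floor can be sketched
   like this (field names follow the commit; the record and validator shapes
   are assumptions):

```javascript
// Kept indicators populate real values; the trade-level fields stay explicit
// nulls so consumers see a deliberate gap, not a missing key.
function buildCountryRow(iso, weo) {
  return {
    iso,
    currentAccountUsd: weo.BCA ?? null,
    importVolumePctChg: weo.TM_RPCH ?? null,
    exportVolumePctChg: weo.TX_RPCH ?? null,
    // BX/BM dropped (WEO coverage collapsed to ~10 countries).
    exportsUsd: null,
    importsUsd: null,
    tradeBalanceUsd: null,
  };
}

// Floor of 180 countries over the BCA/TM/TX union, per the commit.
function validateCoverage(rows, floor = 180) {
  const covered = rows.filter(
    (r) =>
      r.currentAccountUsd !== null ||
      r.importVolumePctChg !== null ||
      r.exportVolumePctChg !== null
  );
  if (covered.length < floor) {
    throw new Error(`coverage ${covered.length} < floor ${floor}`);
  }
  return rows;
}
```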

Tests: 32/32 pass. Rewrote gold-etf tests to use synthetic XLSX fixtures
(exceljs resolved from scripts/package.json since repo root doesn't have
it). Updated imf-external tests for the new indicator set + null BX/BM
contract + 180-country validate threshold.

* fix(mcp): update get_country_macro description after BX/BM drop

Consumer-side catch during PR #3076 validation: the MCP tool description
still promised 'exports, imports, trade balance' fields that the seeder
fix nulls out. LLM consumers would be directed to exportsUsd/importsUsd/
tradeBalanceUsd fields that always return null since seed-imf-external
dropped BX/BM (WEO coverage collapsed to ~10 countries).

Updated description to list only the indicators actually populated
(currentAccountUsd, importVolumePctChg, exportVolumePctChg) with an
explicit note about the null trade-level fields so LLMs don't attempt
to use them.

* fix(gold-cb-reserves): compute real pctOfReserves + add exceljs to root

Follow-up to #3076 review.

1. pctOfReserves was hardcoded to 0 with an "IFS doesn't give us total
   reserves" comment. That was a lazy limitation claim — IMF IRFCL DOES
   expose total official reserve assets as IRFCLDT1_IRFCL65_USD parallel
   to the gold USD series IRFCLDT1_IRFCL56_USD. fetchCbReserves now
   pulls all three indicators (primary FTO tonnage + the two USD series)
   via Promise.allSettled and passes the USD pair to buildReservesPayload
   so it can compute the true gold share per country. Falls back to 0
   only when the denominator is genuinely missing for that country
   (IRFCL coverage: 114 gold_usd, 96 total_usd series; ~15% of holders
   have no matched-month denominator). 3-month lookback window absorbs
   per-country reporting lag.
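   The share computation with its lookback fallback might look like this
   sketch (the indicator pairing and the 3-month window come from the commit;
   the month-keyed map shape and helper names are assumptions — in the real
   seeder the three series arrive via Promise.allSettled):

```javascript
// Shift a "YYYY-MM" month key by `delta` months.
function shiftMonth(ym, delta) {
  const [y, m] = ym.split("-").map(Number);
  const t = y * 12 + (m - 1) + delta;
  return `${Math.floor(t / 12)}-${String((((t % 12) + 12) % 12) + 1).padStart(2, "0")}`;
}

// Gold share of total official reserves for one country. The denominator is
// the IRFCLDT1_IRFCL65_USD total-reserves series; the numerator is the gold
// USD series. Walks back up to `lookback` months to absorb reporting lag,
// requiring a matched-month pair, and falls back to 0 only when the
// denominator is genuinely missing.
function pctOfReserves(goldUsdByMonth, totalUsdByMonth, month, lookback = 3) {
  for (let i = 0; i < lookback; i++) {
    const m = shiftMonth(month, -i);
    const gold = goldUsdByMonth[m];
    const total = totalUsdByMonth[m];
    if (gold != null && total > 0) return (gold / total) * 100;
  }
  return 0;
}
```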

2. CI fix: tests couldn't find exceljs because scripts/package.json is
   not workspace-linked to the repo root — CI runs `npm ci` at root
   only. Added exceljs@^4.4.0 as a root devDependency. Runtime seeder
   continues to resolve it from scripts/node_modules via Node's upward
   module resolution.

Three new tests cover pct computation, missing-denominator fallback, and
the 3-month lookback window.
2026-04-14 08:19:47 +04:00