mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-25 17:14:57 +02:00
d9194a51793242cbb9fc0e3f494d2adbfb281638
3407 Commits
d9194a5179
fix(railway): tolerate Ubuntu apt mirror failures in NIXPACKS + Dockerfile builds (#3142)
Ubuntu's noble-security package-index CDN is returning hash-sum
mismatches (2026-04-17), causing ALL Railway NIXPACKS builds to fail
at the 'apt-get update && apt-get install curl' layer with exit
code 100. Multiple Railway services are red.
NIXPACKS' aptPkgs = ['curl'] generates a strict
'apt-get update && apt-get install -y' that fails hard on any
mirror error. Fix: replace aptPkgs with manual cmds that:
1. Allow apt-get update to partially fail (|| true)
2. Use --fix-missing on apt-get install so packages from healthy
mirrors still install even if one mirror is broken
Same treatment for consumer-prices-core/Dockerfile.
Files changed:
- nixpacks.toml (root — used by ais-relay + standalone cron seeders)
- scripts/nixpacks.toml (used by bundled seed services)
- consumer-prices-core/Dockerfile
The || true on apt-get update is safe because:
1. curl is the only package we install and it's often already present
in the NIXPACKS base image (nix-env provides it)
2. If curl genuinely isn't available, the seeder will fail at runtime
with a clear 'curl: not found' error — not a silent degradation
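The tolerant-install pattern described above reduces to a layer like this — a sketch, not the repo's exact Dockerfile or nixpacks cmds:

```dockerfile
# Sketch: let the index update partially fail (broken mirror), then install
# with --fix-missing so packages from healthy mirrors still land.
RUN apt-get update || true && \
    apt-get install -y --fix-missing curl
```

The nixpacks.toml cmds form would presumably carry the same two-step shell line in place of the strict `apt-get update && apt-get install -y` that aptPkgs generates.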
aeef68dd56
fix(relay): envelope-aware Redis reads restore chokepoint transit chart (#3139)
* fix(relay): envelope-aware Redis reads for PortWatch/CorridorRisk/stocks-bootstrap

PR #3097 migrated 91 seed producers to the contract-mode envelope shape {_seed, data}, but ais-relay's private upstashGet() reads raw JSON and does not unwrap the envelope. Three callsites in the relay now see the wrapper metadata as the payload and silently corrupt downstream:

- seedTransitSummaries() iterates {_seed, data} as chokepoint IDs and writes supply_chain:transit-summaries:v1 with keys "_seed" and "data", both mapping to empty-history summary objects. Every chokepoint's transitSummary.history is therefore empty, which gates off the time-series chart in SupplyChainPanel and MapPopup waterway popup.
- loadWsbTickerSet() reads market:stocks-bootstrap:v1 and sees data.quotes === undefined, silently disabling WSB ticker matching.

Fix: add envelopeRead(key) next to envelopeWrite — mirrors the server/_shared/redis.ts::getCachedJson semantics (envelope-aware by default; legacy raw shapes pass through unchanged). Swap the three upstashGet calls that target contract-mode canonical keys. After the relay re-seeds at its 10-min cadence, transit-summaries:v1 will contain a proper {hormuz_strait, suez, ...} map and the chart comes back in the panel and map popup.

Unit tests cover contract-mode unwrap, legacy passthrough, null, and the array-with-numeric-indices false-positive edge case. Existing static assertions updated to guard against regression to raw upstashGet on these keys.

* fix(backtest): envelope-aware reads in resilience backtest scripts

backtest-resilience-outcomes.mjs reads economic:fx:yoy:v1, infra:outages:v1, and conflict:ucdp-events:v1 — all migrated to the {_seed, data} envelope by PR #3097. The private redisGetJson helper did not unwrap, so AUC computation was silently running against the envelope wrapper instead of the country event maps (same class of bug as PR #3139's relay fix, offline blast radius). validate-resilience-backtest.mjs uses the same read pattern across multiple family keys and is fixed for the same reason.

Both scripts now import unwrapEnvelope from _seed-envelope-source.mjs (the canonical ESM source of truth used by seed-chokepoint-flows.mjs, seed-energy-spine.mjs, seed-forecasts.mjs, and others). Legacy raw shapes still pass through unchanged.

* fix(relay): envelopeRead for OREF_REDIS_KEY bootstrap (Greptile P1)

orefPersistHistory() writes OREF_REDIS_KEY via envelopeWrite at line 1133, but the bootstrap reader at line 1214 was still using raw upstashGet. cached.history was therefore undefined and Array.isArray() always false, so OREF alert history was never restored from Redis after a relay restart — every cold start hit the upstream API unnecessarily.

Also adds the two regression guards Greptile flagged as missing:
- loadWsbTickerSet() reading market:stocks-bootstrap:v1 via envelopeRead
- orefBootstrapFromRedis reading OREF_REDIS_KEY via envelopeRead

Same class of bug as the three callsites fixed earlier in this PR.
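The envelope-aware read described above can be sketched as follows; the internals of `unwrapEnvelope` are an assumption reconstructed from the shapes the message names ({_seed, data} envelope, legacy passthrough, array false-positive):

```javascript
// Sketch of an envelope-aware read: contract-mode values are wrapped as
// {_seed, data}; legacy raw shapes (null, primitives, arrays, plain objects)
// pass through unchanged. Arrays are explicitly excluded so an array with
// numeric indices can never be mistaken for an envelope.
function unwrapEnvelope(value) {
  if (
    value !== null &&
    typeof value === 'object' &&
    !Array.isArray(value) &&
    '_seed' in value &&
    'data' in value
  ) {
    return value.data;
  }
  return value; // legacy passthrough
}
```

With a helper like this, `envelopeRead(key)` is just `unwrapEnvelope(await upstashGet(key))`, which is why the legacy callsites keep working unchanged.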
bd559fec88
fix(liquidity-shifts): wire missing primeTask so fetchData() is called (#3138)
LiquidityShiftsPanel was created in panel-layout.ts but never had a
primeTask('liquidity-shifts', ...) call in App.ts, so fetchData() was
never invoked. The panel rendered its initial showLoading() state and
stayed there permanently, showing "Loading..." with the radar spinner.
Every other self-fetching panel (cot-positioning, gold-intelligence,
fear-greed, etf-flows, etc.) has its primeTask call. This one was
missed when the panel was added.
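This class of omission can be guarded generically. A sketch under stated assumptions — `primeTask`'s signature and the panel registry shape are illustrative, not the app's actual API:

```javascript
// Sketch: a registry-driven priming pass, so a panel added to the layout
// cannot silently miss its primeTask wiring. All names are illustrative.
const SELF_FETCHING_PANELS = [
  'cot-positioning', 'gold-intelligence', 'fear-greed', 'etf-flows',
  'liquidity-shifts', // the panel this commit wires up
];

function primeAll(primeTask, panels) {
  const primed = [];
  for (const id of SELF_FETCHING_PANELS) {
    const panel = panels.get(id);
    if (!panel) continue;                    // panel not in the current layout
    primeTask(id, () => panel.fetchData());  // the call this commit adds
    primed.push(id);
  }
  return primed;
}
```

Driving every primeTask call from one list means a missing entry shows up as a missing string in a registry, which is much easier to spot in review than a missing call site in App.ts.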
a4d9b0a5fa
feat(auth): user-facing API key management (create / list / revoke) (#3125)
* feat(auth): user-facing API key management (create / list / revoke)

Adds full-stack API key management so authenticated users can create, list, and revoke their own API keys from the Settings UI.

Backend:
- Convex `userApiKeys` table with SHA-256 key hash storage
- Mutations: createApiKey, listApiKeys, revokeApiKey
- Internal query validateKeyByHash + touchKeyLastUsed for gateway
- HTTP endpoints: /api/api-keys (CRUD) + /api/internal-validate-api-key
- Gateway middleware validates user-owned keys via Convex + Redis cache

Frontend:
- New "API Keys" tab in UnifiedSettings (visible when signed in)
- Create form with copy-on-creation banner (key shown once)
- List with prefix display, timestamps, and revoke action
- Client-side key generation + hashing (plaintext never sent to DB)

Closes #3116

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(api-keys): address PR review — cache invalidation, prefix validation, revoked-key guard

- Invalidate Redis cache on key revocation so gateway rejects revoked keys immediately instead of waiting for 5-min TTL expiry (P1)
- Enforce `wm_` prefix format with regex instead of loose length check (P2)
- Skip `touchKeyLastUsed` for revoked keys to preserve clean audit trail (P2)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(api-keys): address consolidated PR review (P0–P3)

P0: gate createApiKey on pro entitlement (tier >= 1); isCallerPremium now verifies key-owner tier instead of treating existence as premium.
P1: wire wm_ user keys into the domain gateway auth path with async Convex-backed validation; user keys go through entitlement checks (only admin keys bypass). Lower cache TTL 300s → 60s and await revocation cache-bust instead of fire-and-forget.
P2: remove dead HTTP create/list/revoke path from convex/http.ts; switch to cachedFetchJson (stampede protection, env-prefixed keys, standard NEG_SENTINEL); add tenancy check on cache-invalidation endpoint via new /api/internal-get-key-owner route; add 22 Convex tests covering tier gate, per-user limit, duplicate hash, ownership revoke guard, getKeyOwner, and touchKeyLastUsed debounce.
P3: tighten keyPrefix regex to exactly 5 hex chars; debounce touchKeyLastUsed (5 min); surface PRO_REQUIRED in UI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(api-keys): gate on apiAccess (not tier), wire wm_ keys through edge routes, harden error paths

- Gate API key creation/validation on features.apiAccess instead of tier >= 1. Pro (tier 1, apiAccess=false) can no longer mint keys — only API_STARTER+.
- Wire wm_ user keys through standalone edge routes (shipping/route-intelligence, shipping/webhooks) that were short-circuiting on validateApiKey before async Convex validation could run.
- Restore fail-soft behavior in validateUserApiKey: transient Convex/network errors degrade to unauthorized instead of bubbling a 500.
- Fail-closed on cache invalidation endpoint: ownership check errors now return 503 instead of silently proceeding (surfaces Convex outages in logs).
- Tests updated: positive paths use api_starter (apiAccess=true), new test locks Pro-without-API-access rejection. 23 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(webhooks): remove wm_ user key fallback from shipping webhooks

Webhook ownership is keyed to SHA-256(apiKey) via callerFingerprint(), not to the user. With user-owned keys (up to 5 per user), this causes cross-key blindness (webhooks invisible when calling with a different key) and revoke-orphaning (revoking the creating key makes the webhook permanently unmanageable). User keys remain supported on the read-only route-intelligence endpoint. Webhook ownership migration to userId will follow in a separate PR.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Elie Habib <elie.habib@gmail.com>
935417e390
chore(relay): socialVelocity + wsbTickers to hourly fetch (6x Reddit traffic reduction) (#3135)
* chore(relay): socialVelocity + wsbTickers to hourly fetch (was 10min)

Reduce Reddit rate-limiting blast radius. Both seeders fetch 5 subreddits combined (2 for SV: worldnews, geopolitics; 3 for WSB: wallstreetbets, stocks, investing) with no proxy or OAuth. Reddit's behavioral heuristic for datacenter IPs consistently flags the Railway IP after ~50min of 10-min polling and returns HTTP 403 on every subsequent cycle until the container restarts with a new IP.

Evidence (2026-04-16 ais-relay log):
- 13:32-14:22 UTC: 6 successful 10-min cycles for both seeders
- 16:06-16:16 UTC: 2 more successful cycles after a restart
- 16:26 UTC: BOTH subs flip to HTTP 403 simultaneously
- 16:36, 16:46, 16:56: every cycle, all 5 subreddits return 403

Dropping success-path frequency from 6/hour to 1/hour cuts the traffic Reddit's heuristic sees by 6x. On the failure path the 20-min retry is kept as-is — during a block we've already been flagged, so extra retries don't make it worse.

Changes:
- SOCIAL_VELOCITY_INTERVAL_MS: 10min → 60min
- SOCIAL_VELOCITY_TTL: 30min → 3h (3× new interval)
- WSB_TICKERS_INTERVAL_MS: 10min → 60min
- WSB_TICKERS_TTL: 30min → 3h (3× new interval)
- api/health.js maxStaleMin: 30min → 180min for both (3× interval)
- api/seed-health.js intervalMin: 15 → 90 for wsb-tickers (maxStaleMin / 2)

Proper fix (proxy fallback or Reddit OAuth) deferred.

* fix(seed-health): add socialVelocity parity entry — greptile P2

Review finding on PR #3135: wsbTickers was bumped from intervalMin=15 to 90 but socialVelocity had no seed-health.js entry at all. Both Reddit seeders now share the same 60-min cadence; adding the missing entry gives parity.

P2-1 (malformed comment lines 5682-5683) is a false positive — verified the lines do start with '//' in the file.
0075af5a47
fix(sector-valuations): proxy Yahoo quoteSummary via Decodo curl egress (#3134)
* fix(sector-valuations): proxy Yahoo quoteSummary via Decodo curl egress

Yahoo's /v10/finance/quoteSummary returns HTTP 401 from Railway container IPs. Railway logs 2026-04-16 show all 12 sector ETFs failing every 5-min cron:

    [Sector] Yahoo quoteSummary XLK HTTP 401 (x12 per tick)
    [Market] Seeded 12/12 sectors, 0 valuations

Add a curl-based proxy fallback that matches scripts/_yahoo-fetch.mjs: hit us.decodo.com (curl egress pool) NOT gate.decodo.com (CONNECT egress pool). Per the 2026-04-16 probe documented in the _yahoo-fetch.mjs header, Yahoo blocks Decodo's CONNECT egress IPs but accepts the curl egress. Reusing ytFetchViaProxy here would keep failing silently.

Shares the existing _yahooProxyFailCount / _yahooProxyCooldownUntil cooldown state with fetchYahooChartDirect so both Yahoo paths pause together if Decodo's curl pool also gets blocked. No change to direct-path behavior when Yahoo is healthy.

* fix(sector-valuations): don't proxy on empty quoteSummary result (review)

Direct 200 with data.quoteSummary.result[0] absent is an app-level "no data for this symbol" signal (e.g. delisted ETF). Proxy won't return different data for a symbol Yahoo itself doesn't carry — falling back would burn the 5-failure cooldown budget on structurally empty symbols and mask a genuine proxy outage. Resolve null on !result; keep the JSON.parse catch going to proxy (a garbage body IS a transport-level signal — captive portal, Cloudflare challenge).

Review feedback from PR #3134.

* fix(sector-valuations): split cooldown per egress route, cover transport failures (review)

Review feedback on PR #3134, both P1.

P1 #1 — transport failures bypassed cooldown. execFileSync timeouts, proxy-connect refusals, and JSON.parse on garbage bodies all went through the catch block and returned null without ticking _yahooProxyFailCount. In the exact failure mode this PR hardens against, the relay would have thrashed through 12 × 20s curl attempts per tick with no backoff. Extract a bumpCooldown() helper and call it from both the non-2xx branch and the catch block.

P1 #2 — two Decodo egress pools shared one cooldown budget. fetchYahooChartDirect uses CONNECT via gate.decodo.com. _yahooQuoteSummaryProxyFallback uses curl via us.decodo.com. These are independent egress IP pools — per the 2026-04-16 probe, Yahoo blocks CONNECT but accepts curl. Sharing cooldown means 5 CONNECT failures suppress the healthy curl path (and vice versa). Split into _yahooConnectProxy* (chart) and _yahooCurlProxy* (sector valuations).

Also: on proxy 200 with empty result, reset the curl counter. The route is healthy even if this specific symbol has no data — don't pretend it's a failure.

* fix(sector-valuations): non-blocking curl + settle guard (review round 3)

Review feedback on PR #3134, both P1.

P1 #1 - double proxy invocation on timeout/error race. req.destroy() inside the timeout handler can still emit 'error', and both handlers eagerly called resolve(_yahooQuoteSummaryProxyFallback(...)). A single upstream timeout therefore launched two curl subprocesses, double-ticked the cooldown counter, and blocked twice. Add a settled flag; settle() exits early on the second handler before evaluating the fallback.

P1 #2 - execFileSync blocks the relay event loop. The relay serves HTTP/WS traffic on the same thread that awaits seedSectorSummary's per-symbol Yahoo fetch. execFileSync for up to 20s per failure x 5 failures before cooldown = ~100s of frozen event loop. Switch to promisify(execFile). resolve(promise) chains the Promise through fetchYahooQuoteSummary's outer Promise, so the main-loop await yields while curl runs. Other traffic continues during the fetch.

tests/sector-valuations.test.mjs: bump the static-analysis window from 1500 to 2000 chars so the field-extraction markers (ytdReturn etc.) stay inside the window after the settle guard was added.
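The round-3 settle guard can be sketched as follows. Names are illustrative: `startRequest` and `proxyFallback` stand in for the real http request and the Decodo curl fallback described in the message:

```javascript
// Settle-guard: a timeout and a later 'error' event (emitted by req.destroy())
// must not BOTH launch the proxy fallback. Whichever handler fires second
// exits in settle() before the fallback is even evaluated.
function fetchWithFallback(startRequest, proxyFallback) {
  return new Promise((resolve) => {
    let settled = false;
    const settle = (fn) => {
      if (settled) return; // second handler bails here
      settled = true;
      fn();
    };
    startRequest({
      // resolve(promise) chains the fallback's Promise through the outer one,
      // so an async (non-blocking) fallback keeps the event loop free.
      onTimeout: () => settle(() => resolve(proxyFallback())),
      onError: () => settle(() => resolve(proxyFallback())),
      onData: (body) => settle(() => resolve(body)),
    });
  });
}
```

The key detail is that the fallback call sits inside the closure passed to settle(), so the guard short-circuits before any subprocess is spawned or any cooldown counter is ticked.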
7d27cec21c
feat(relay): seeder-loop heartbeats for chokepoint-flows + climate-news (#3133)
* feat(relay): seeder-loop heartbeats for chokepoint-flows + climate-news
Detect silent relay-loop failures (ERR_MODULE_NOT_FOUND at import, event-loop
blocked, container restart loop) up to 4 hours earlier than the data-level
seed-meta staleness window.
The chokepoint-flows bug that motivated this PR was invisible in health for
32 hours because each 6h cron tick fired, execFile'd the child, child died
at import, and NO ONE updated seed-meta:energy:chokepoint-flows. Since the
last successful write was still within its 3-day TTL, the data key was
present and the old seed-meta was still there — STALE_SEED triggered only
at +12h, and even then was a warn (not crit) that could easily be missed.
Fix:
- In scripts/ais-relay.cjs, write a success-only heartbeat via upstashSet
after each execFile-spawned seeder exits cleanly. TTL = 3x the loop
interval (18h for chokepoint-flows, 90min for climate-news) so a single
missed cycle doesn't flap but two consecutive misses alarm.
Payload shape matches seed-meta for drop-in compatibility with the
existing health-check reader: { fetchedAt, recordCount, durMs }.
- In api/health.js, register two new STANDALONE_KEYS entries pointing at
the heartbeat keys, plus SEED_META entries with tighter maxStaleMin:
chokepointFlowsRelayHeartbeat: 480min (8h vs 720min existing)
climateNewsRelayHeartbeat: 60min (vs 90min existing)
When the relay loop fails for >2 intervals, the heartbeat goes stale
first and surfaces as STALE_SEED in /api/health, giving 4h more notice
than waiting for seed-meta:energy:chokepoint-flows.
This is orthogonal to PR #3132 (fixes the actual ERR_MODULE_NOT_FOUND root
cause). Heartbeat is defensive observability for the NEXT failure mode we
can't predict.
* fix(health): gate new relay heartbeat keys as ON_DEMAND during deploy window — greptile review
Review finding on PR #3133: new heartbeat keys (relay:heartbeat:chokepoint-flows,
relay:heartbeat:climate-news) are written by ais-relay.cjs AFTER the first
successful post-deploy loop. Vercel deploys api/health.js instantly, so the
window between 'merge' and 'first heartbeat written' is:
- chokepoint-flows: up to 6h (initial loop tick)
- climate-news: up to 30min
During that window the heartbeat keys don't exist in Redis. classifyKey()
would return EMPTY (crit), which counts toward critCount and can flip overall
/api/health to DEGRADED even though climateNews and chokepointFlows data
themselves are fine.
Matches existing rule in project memory
(feedback_health_required_key_needs_railway_cron_first.md) — new seeder +
health.js registration in same PR needs ON_DEMAND gating until the Railway
side catches up, then harden after ~7 days.
Fix: add both keys to ON_DEMAND_KEYS with TRANSITIONAL comments, matching
the fxYoy / hyperliquidFlow pattern already used for the same issue.
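The heartbeat arithmetic above (TTL = 3x the loop interval, seed-meta-compatible payload) reduces to a small helper. The `relay:heartbeat:*` key naming matches the keys listed in the message; everything else about the write site is an assumption:

```javascript
// Sketch: success-only heartbeat written after an execFile'd seeder exits
// cleanly. TTL = 3x the loop interval, so one missed cycle doesn't flap
// but two consecutive misses go stale and surface as STALE_SEED.
function heartbeatFor(name, intervalMs, recordCount, durMs) {
  return {
    key: `relay:heartbeat:${name}`,
    ttlSeconds: Math.round((3 * intervalMs) / 1000),
    // Same shape as seed-meta, for drop-in compatibility with the
    // existing health-check reader.
    payload: { fetchedAt: new Date().toISOString(), recordCount, durMs },
  };
}
```

The relay would pass the result to its upstashSet-style write on the success path only; a crashed child never refreshes the key, which is exactly the signal the health check needs.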
7381a90a44
fix(sentry): guard ConvexClient on Firefox 149/Linux + filter Quark noise (#3130)
* fix(sentry): guard ConvexClient construct on Firefox 149/Linux + filter Quark browser noise

ConvexClient import throws `TypeError: t is not a constructor` on Firefox 149/Linux (WORLDMONITOR-N0 + MX, 5 events / 2 users). Breadcrumbs proved both entitlements and billing subscription watchers were failing on `new CC(convexUrl)`. Wraps the constructor in try/catch so getConvexClient returns null — callers already have the null path wired for the no-VITE_CONVEX_URL case, so subscription features silently no-op instead of error-bubbling into Sentry via billing.ts:71.

Also filters Quark browser (Alibaba mobile) touch-tracking injection that sets `bodyTouched` on undefined — the property name has 0 matches in the repo (WORLDMONITOR-N1).

* fix(convex-client): reset authReadyPromise on constructor failure

Addresses greptile-apps review on PR #3130: on the catch path, authReadyPromise had just been set to a never-resolving Promise at function entry. Without this reset, any future waitForConvexAuth() caller that doesn't pre-check the client is non-null would silently block for the full 10s timeout.
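A sketch of the guarded construction plus the review fix. The module state and function shape are assumptions; only the try/catch-to-null behavior and the promise reset come from the message:

```javascript
// Sketch: guard the ConvexClient constructor so browsers where it throws
// get the same null-client path as the no-VITE_CONVEX_URL case, and reset
// the pending auth promise so waitForConvexAuth() callers don't block.
let client;                 // undefined = not yet constructed, null = unavailable
let authReadyPromise = null;

function getConvexClient(ConvexClient, convexUrl) {
  if (client !== undefined) return client;     // memoized
  if (!convexUrl) return (client = null);      // no-VITE_CONVEX_URL path
  authReadyPromise = new Promise(() => {});    // resolved elsewhere on auth success
  try {
    client = new ConvexClient(convexUrl);
  } catch (err) {
    client = null;                             // callers already handle null
    authReadyPromise = Promise.resolve();      // don't strand future waiters
  }
  return client;
}
```

Without the last reset, the never-resolving promise set at entry would survive the failed construction, which is exactly the 10s-timeout hang the review caught.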
c31662c3c9
fix(relay): COPY missing _seed-envelope-source + _seed-contract — chokepointFlows stale 32h (#3132)
* fix(relay): COPY _seed-envelope-source + _seed-contract into Dockerfile.relay

Root cause of chokepointFlows STALE_SEED (1911min stale, maxStaleMin=720): since 2026-04-14 (PR #3097/#3101 landing), scripts/_seed-utils.mjs imports _seed-envelope-source.mjs and _seed-contract.mjs. Dockerfile.relay COPY'd _seed-utils.mjs but NOT its new transitive dependencies, so every execFile invocation of seed-chokepoint-flows.mjs, seed-climate-news.mjs, and seed-ember-electricity.mjs crashed at import with ERR_MODULE_NOT_FOUND. The ais-relay loop kept firing every 6h but each child died instantly — no visible error because execFile only surfaces child stderr to the parent relay's log stream.

Local repro: node scripts/seed-chokepoint-flows.mjs runs fine in 3.6s and writes 7 records. The same command inside the relay container would throw at the import line because the file doesn't exist.

Fix:
1. Add COPY scripts/_seed-envelope-source.mjs and COPY scripts/_seed-contract.mjs to Dockerfile.relay.
2. Add a static guard test (tests/dockerfile-relay-imports.test.mjs) that BFS's the transitive-import graph from every COPY'd entrypoint and fails if any reached scripts/*.mjs|cjs isn't also COPY'd. This would have caught the original regression.

Matches feedback_dockerfile_relay_explicit_copy.md — we now have a test enforcing it.

* fix(test): scanner also covers require() and createRequire(...)(...) — greptile P2

Review finding on PR #3132: collectRelativeImports only matched ESM import/export syntax, so require('./x.cjs') in ais-relay.cjs and createRequire(import.meta.url)('./x.cjs') in _seed-utils.mjs were invisible to the guard. No active bug (_proxy-utils.cjs is already COPY'd) but a future createRequire pointing at a new uncopied helper would slip through.

Two regexes now cover both forms:
- cjsRe: direct require('./x') — with a non-identifier lookbehind so 'thisrequire(' or 'foorequire(' can't match.
- createRequireRe: createRequire(...)('./x') chained-call — the outer call is applied to createRequire's return value, not to a 'require(' token, so the first regex misses it on its own.

Added a unit test asserting both forms resolve on known sites (_seed-utils.mjs and ais-relay.cjs) so the next edit to this file can't silently drop coverage.
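The two regexes can be sketched as follows. The patterns are illustrative reconstructions of cjsRe / createRequireRe from the description above, not the repo's exact source:

```javascript
// Direct require('./x') with a non-identifier lookbehind, so tokens like
// 'thisrequire(' or 'foorequire(' can't match.
const cjsRe = /(?<![A-Za-z0-9_$.])require\(\s*['"](\.{1,2}\/[^'"]+)['"]\s*\)/g;

// createRequire(...)('./x') chained call: the outer call applies to
// createRequire's return value, not to a 'require(' token, so cjsRe
// misses it on its own.
const createRequireRe = /createRequire\([^)]*\)\(\s*['"](\.{1,2}\/[^'"]+)['"]\s*\)/g;

function collectRelativeImports(source) {
  const found = new Set();
  for (const re of [cjsRe, createRequireRe]) {
    for (const m of source.matchAll(re)) found.add(m[1]);
  }
  return [...found];
}
```

A BFS guard would then resolve each returned specifier against the importing file's directory and fail the test if the resolved script isn't in the Dockerfile's COPY list.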
d1a3fdffed
fix(portwatch): unblock port-activity seeder (global EP4 refs, conc 12, progress logs, SIGTERM) (#3128)
* fix(portwatch): global EP4 refs + concurrency 12 + progress logs + SIGTERM cleanup

seed-portwatch-port-activity has been SIGKILL'd at the Railway 10-min container ceiling on every run since 2026-04-14 (recordCount=174, seedAgeMin=3096 = 51.6h = 4+ failed cycles), leaving portwatchPortActivity STALE_SEED and the 30-min lock leaking between runs. Root cause: ~240 ISO3s x 2 per-country ArcGIS queries at CONCURRENCY=4 with zero per-batch logging — slow enough to miss the 420s timeoutMs and silent enough that the timeout line was the only log on failure.

Fixes (all 4):
1. fetchAllPortRefs(): one paginated EP4 query (where=1=1), grouped by ISO3 locally — collapses ~240 ref calls into ~5 pages.
2. CONCURRENCY 4 -> 12 and only queue activity fetches for iso3s that appear in refsByIso3 and in the iso3->iso2 map.
3. Per-page ref logs + per-batch activity logs every 5 batches — the next failure will show exactly where it stalls.
4. SIGTERM/SIGINT handler releases the lock and extends prev-snapshot TTLs before exit so the next cron tick isn't blocked and Redis data doesn't evaporate when the bundle-runner kills the child.

* fix(portwatch): advance pagination by actual features.length, not PAGE_SIZE

Review finding on PR #3128: the new fetchAllPortRefs() global pager assumes the server honors resultRecordCount=2000, but ArcGIS's PortWatch_ports_database FeatureServer caps responses at 1000 rows. Incrementing offset by PAGE_SIZE (=2000) silently skipped rows 1000-1999 on EVERY page.

Verified against the live endpoint:
- returnCountOnly: 2065
- offset=0 size=2000: returned 1000 (ETL=true)
- offset=1000 size=2000: returned 1000 (ETL=true)
- offset=2000 size=2000: returned 65 (ETL=false)

The buggy loop therefore loaded 1065 refs instead of 2065 — silently dropping 34 mapped countries with EP3 activity entirely and leaving 110 more countries with partial ref coverage. Partial coverage fell back to (0,0) lat/lon via `refMap.get(portId) || { lat: 0, lon: 0 }`. Not a regression from the old code (per-country EP4 fetches maxed at ~148 ports and never hit the cap), but a real bug introduced by the global pager.

Fix: advance `offset` by `features.length` (the actual returned count) instead of by `PAGE_SIZE`. Applied to both fetchAllPortRefs (EP4) and fetchActivityRows (EP3) for consistency. Added break-on-empty guard so a server that returns exceededTransferLimit=true with 0 features can't infinite-loop. Regression test asserts `offset += features.length` appears for both paginators and `offset += PAGE_SIZE` appears nowhere in the file.
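The corrected pager reduces to this loop. It is kept synchronous for brevity; `fetchPage` stands in for the real async ArcGIS query:

```javascript
// Sketch of the corrected pager: advance the offset by the ACTUAL number of
// rows returned, not by the requested page size, so a server-side row cap
// (ArcGIS caps at 1000 here) can never silently skip rows.
function fetchAllRows(fetchPage, pageSize = 2000) {
  const rows = [];
  let offset = 0;
  for (;;) {
    const { features, exceededTransferLimit } = fetchPage(offset, pageSize);
    if (features.length === 0) break;  // guard: ETL=true with 0 rows can't loop forever
    rows.push(...features);
    offset += features.length;         // NOT offset += pageSize
    if (!exceededTransferLimit) break;
  }
  return rows;
}
```

With the buggy `offset += pageSize` version, the 2065-row endpoint above yields 1000 + 65 = 1065 rows; advancing by `features.length` walks 1000, 1000, 65 and recovers all 2065.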
5093d82e45
fix(seed-bundle-resilience): drop Resilience-Scores interval 6h to 2h so refresh runs (#3126)
Live log 2026-04-16 09:25 showed the bundle runner SKIPPING Resilience-Scores (last seeded 203min ago, interval 360min -> 288min skip threshold). Every Railway cron fire within the 4.8h skip window bypassed the section entirely, so refreshRankingAggregate() — the whole point of the Slice B work merged in #3124 — never ran. Ranking could then silently expire in the gap.

Lower intervalMs to 2h. The bundle runner skip threshold becomes 96min; hourly Railway fires run the section about every 2h. Well within the 12h ranking TTL, and cheap per warm-path run:
- computeAndWriteIntervals (~100ms local CPU + one pipeline write)
- refreshRankingAggregate -> /api/resilience/v1/get-resilience-ranking?refresh=1 (handler recompute + 2-SET pipeline, ~2-5s)
- STRLEN + GET-meta verify in parallel (~200ms)

Total ~5-10s per warm-scores run. The expensive 222-country warm still only runs when scores are actually missing. Structural test pins intervalMs <= 2 hours so this doesn't silently regress. Full resilience suite: 378/378.
da1fa3367b
fix(resilience-ranking): chunked warm SET, always-on rebuild, truthful meta (Slice B) (#3124)
* fix(resilience-ranking): chunked warm SET, always-on rebuild, truthful meta Slice B follow-up to PR #3121. Three coupled production failures observed: 1. Per-country score persistence works (Slice A), but the 222-SET single pipeline body (~600KB) exceeds REDIS_PIPELINE_TIMEOUT_MS (5s) on Vercel Edge. runRedisPipeline returns []; persistence guard correctly returns empty; coverage = 0/222 < 75%; ranking publish silently dropped. Live Railway log: "Ranking: 0 ranked, 222 greyed out" → "Rebuilt … with 222 countries (bulk-call race left ranking:v9 null)" — second call only succeeded because Upstash had finally caught up between attempts. 2. The seeder's probe + rebuild block lives inside `if (missing > 0)`. When per-country scores survive a cron tick (TTL 6h, cron every 6h), missing=0 and the rebuild path is skipped. Ranking aggregate then expires alone and is never refreshed until scores also expire — multi-hour gaps where `resilience:ranking:v9` is gone while seed-meta still claims freshness. 3. `writeRankingSeedMeta` fires whenever finalWarmed > 0, regardless of whether the ranking key is actually present. Health endpoint sees fresh meta + missing data → EMPTY_ON_DEMAND with a misleading seedAge. Fixes: - _shared.ts: split the warm pipeline SET into SET_BATCH=30-command chunks so each pipeline body fits well under timeout. Pad missing-batch results with empty entries so the per-command alignment stays correct (failed batches stay excluded from `warmed`, no proof = no claim). - seed-resilience-scores.mjs: extract `ensureRankingPresent` helper, call it from BOTH the missing>0 and missing===0 branches so the ranking gets refreshed every cron. Add a post-rebuild STRLEN verification — rebuild HTTP can return 200 with a payload but still skip the SET (coverage gate, pipeline failure). - main(): only writeRankingSeedMeta when result.rankingPresent === true. Otherwise log and let the next cron retry. 
Tests: - resilience-ranking.test.mts: assert pipelines stay ≤30 commands. - resilience-scores-seed.test.mjs: structural checks that the rebuild is hoisted (≥2 callsites of ensureRankingPresent), STRLEN verification is present, and meta write is gated on rankingPresent. Full resilience suite: 373/373 pass (was 370 — 3 new tests). * fix(resilience-ranking): seeder no longer writes seed-meta (handler is sole writer) Reviewer P1: ensureRankingPresent() returning true only means the ranking key exists in Redis — not that THIS cron actually wrote it. The handler skips both the ranking SET and the meta SET when coverage < 75%, so an older ranking from a prior cron can linger while this cron's data didn't land. Under that scenario, the previous commit still wrote a fresh seed-meta:resilience:ranking, recreating the stale-meta-over-stale-data failure this PR is meant to eliminate. Fix: remove seeder-side seed-meta writes entirely. The ranking handler already writes ranking + meta atomically in the same pipeline when (and only when) coverage passes the gate. ensureRankingPresent() triggers the handler every cron, which addresses the original rationale for the seeder heartbeat (meta going stale during quiet Pro usage) without the seeder needing to lie. Consequence on failure: - Coverage gate trips → handler writes neither ranking nor meta. - seed-meta stays at its previous timestamp; api/health reports accurate staleness (STALE_SEED after maxStaleMin, then CRIT) instead of a fresh meta over stale/empty data. Tests updated: the "meta gated on rankingPresent" assertion is replaced with "seeder must not SET seed-meta:resilience:ranking" + "no writeRankingSeedMeta". Comments may still reference the key name for maintainer clarity — the assertion targets actual SET commands. Full resilience suite: 373/373 pass. * fix(resilience-ranking): always refresh + 12h TTL (close timing hole) Reviewer P1+P2: - P1: ranking TTL == cron interval (both 6h) left a timing hole. 
If a cron wrote the key near the end of its run and the next cron fired near the start of its interval, the key was still alive at probe time → ensureRankingPresent() returned early → no rebuild → key expired a short while later and stayed absent until a cron eventually ran while the key was missing. Multi-hour EMPTY_ON_DEMAND gaps. - P2: probing only the ranking data key (not seed-meta) meant a partial handler pipeline (ranking SET ok, meta SET missed) would self-heal only when the ranking itself expired — never during its TTL window. Fix: 1. Bump RESILIENCE_RANKING_CACHE_TTL_SECONDS from 6h to 12h (2x cron interval). A single missed or slow cron no longer causes a gap. Server-side and seeder-side constants kept in sync. 2. Replace ensureRankingPresent() with refreshRankingAggregate(): drop the 'if key present, skip' short-circuit. Rebuild every cron, unconditionally. One cheap HTTP call keeps ranking + seed-meta rolling forward together and self-heals the partial-pipeline case — handler retries the atomic pair every 6h regardless of whether the keys are currently live. 3. Update health.js comment to reflect the new TTL and refresh cadence (12h data TTL, 6h refresh, 12h staleness threshold = 2 missed ticks). Tests: - RESILIENCE_RANKING_CACHE_TTL_SECONDS asserts 12h (was 6h). - New assertion: refreshRankingAggregate must NOT early-return on probe- hit, and the rebuild HTTP call must be unconditional in its body. - DEL-guard test relaxed to allow comments between '{' and the DEL line (structural property preserved). Full resilience suite: 375/375. * fix(resilience-ranking): parallelize warm batches + atomic rebuild via ?refresh=1 Reviewer P2s: - Warm path serialized the 8 batch pipelines with `await` in a for-loop, adding ~7 extra Upstash round-trips (100-500ms each on Edge) to the warm wall-clock. Batches are independent; Promise.all collapses them into one slowest-batch window. 
- DEL+rebuild created a brief absence window: if the rebuild request failed transiently, the ranking stayed absent until the next cron. Now seeder calls `/api/resilience/v1/get-resilience-ranking?refresh=1` and the handler bypasses its cache-hit early-return, recomputing and SETting atomically. On rebuild failure, the existing (possibly stale-but-present) ranking is preserved instead of being nuked. Handler: read ctx.request.url for the refresh query param; guard the URL parse with try/catch so an unparseable url falls back to the cached-first behavior. Tests: - New: ?refresh=1 must bypass the cache-hit early-return (fails on old code, passes now). - DEL-guard test replaced with 'does NOT DEL' + 'uses ?refresh=1'. - Batch chunking still asserted at SET_BATCH=30. Full resilience suite: 376/376. * fix(resilience-ranking): bulk-warm call also needs ?refresh=1 (asymmetric TTL hazard) Reviewer P1: in the 6h-12h window, per-country score keys have expired (TTL 6h) but the ranking aggregate is still alive (TTL 12h). The seeder's bulk-warm call was hitting get-resilience-ranking without ?refresh=1, so the handler's cache-hit early-return fired and the entire warm path was skipped. Scores stayed missing; coverage degraded; the only recovery was the per-country laggard loop (5-request batches) — which silently no-ops when WM_KEY is absent. This defeated the whole point of the chunked bulk warm introduced in this PR. Fix: the bulk-warm fetch at scripts/seed-resilience-scores.mjs:167 now appends ?refresh=1, matching the rebuild call. Every seeder-initiated hit on the ranking endpoint forces the handler to route through warmMissingResilienceScores and its chunked pipeline SET, regardless of whether the aggregate is still cached. Test extended: structural assertion now scans ALL occurrences of get-resilience-ranking in the seeder and requires every one of them to carry ?refresh=1. Fails the moment a future change adds a bare call. Full resilience suite: 376/376. 
* fix(resilience-ranking): gate ?refresh=1 on seed key + detect partial pipeline publish
Reviewer P1: ?refresh=1 was honored for any caller — including valid Pro bearer tokens. A full warm is ~222 score computations + chunked pipeline SETs; a Pro user looping on refresh=1 (or an automated client) could DoS Upstash quota and Edge budget.
Gate refresh behind WORLDMONITOR_VALID_KEYS / WORLDMONITOR_API_KEY (X-WorldMonitor-Key header) — the same allowlist the cron uses. Pro bearer tokens get the standard cache-first path; refresh requires the seed service key.
Reviewer P2: the handler's atomic runRedisPipeline SET of ranking + meta is non-transactional on Upstash REST — either SET can fail independently. If the ranking landed but meta missed, the seeder's STRLEN verify would pass (ranking present) while /api/health stays stuck on stale meta.
Two-part fix:
- Handler inspects pipelineResult[0] and [1] and logs a warning when either SET didn't return OK. Ops-greppable signal.
- Seeder's verify now checks BOTH keys in parallel: STRLEN on ranking data, and GET + fetchedAt freshness (<5min) on seed-meta. Partial publish logs a warning; next cron retries (SET is idempotent).
Tests:
- New: ?refresh=1 without/with-wrong X-WorldMonitor-Key must NOT trigger recompute (falls back to cached response).
Existing bypass test updated to carry a valid seed key header.
Full resilience suite: 376/376 + 1 new = 377/377. |
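The refresh-gating decision described in this commit can be sketched as a pure function. This is a minimal illustration, not the handler's actual code: `shouldForceRefresh` and `validKeys` are hypothetical names; only the header name, the query param, and the guarded-URL-parse fallback come from the commit text.

```javascript
// Honor ?refresh=1 only for callers presenting a key from the seed-service
// allowlist (X-WorldMonitor-Key). Pro bearer tokens stay on the cache-first path.
function shouldForceRefresh(requestUrl, headers, validKeys) {
  let wantsRefresh = false;
  try {
    // Guarded parse: an unparseable URL falls back to cached-first behavior.
    wantsRefresh = new URL(requestUrl).searchParams.get('refresh') === '1';
  } catch {
    return false;
  }
  if (!wantsRefresh) return false;
  const key = headers['x-worldmonitor-key'];
  return typeof key === 'string' && validKeys.has(key);
}
```

A caller with `?refresh=1` but no (or a wrong) seed key falls through to the cached response instead of triggering the ~222-country warm.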
||
|
|
3c1caa75e6 |
feat(gdelt): _gdelt-fetch helper with curl-multi-retry proxy + seed-gdelt-intel migration (#3122)
* feat(_gdelt-fetch): curl proxy multi-retry helper for per-IP-throttled API
GDELT (api.gdeltproject.org) is a public free API with strict per-IP throttling. seed-gdelt-intel currently has no proxy fallback — Railway egress IPs hit 429 storms and the seeder degrades.
Probed 2026-04-16: Decodo curl egress against GDELT gives ~40% success per attempt (session-rotates IPs per call). Helper retries up to 5×; expected overall success ~92% (1 - 0.6^5).
PROXY STRATEGY — CURL ONLY WITH MULTI-RETRY
Differs from _yahoo-fetch.mjs (single proxy attempt) and _open-meteo-archive.mjs (CONNECT + curl cascade):
- Curl-only: CONNECT not yet probed cleanly against GDELT.
- Multi-retry on the curl leg: the proxy IS the rotation mechanism (each call → different egress IP), so successive attempts probe different IPs in the throttle pool.
- Distinguishes retryable (HTTP 429/503 from upstream) from non-retryable (parse failure, auth, network) — bails immediately on non-retryable to avoid 5× of wasted log noise.
Direct loop uses LONGER backoff than Yahoo's 5s base (10s) — GDELT's throttle window is wider than Yahoo's, so quick retries usually re-hit the same throttle.
Tests (tests/gdelt-fetch.test.mjs, 13 cases — every learning from PR #3118 + #3119 + #3120 baked in):
- Production defaults: curl resolver/fetcher reference equality
- Production defaults: NO CONNECT leg (regression guard for unverified path)
- 200 OK passthrough
- 429 with no proxy → throws with HTTP 429 in message
- Retry-After parsed (DI _sleep capture asserts 7000ms not retryBaseMs)
- Retry-After absent → linear backoff retryBaseMs (paired branch test)
- **Proxy multi-retry: 4× HTTP 429 then 5th succeeds → returns data** (asserts 5 proxy calls + 4 inter-proxy backoffs of proxyRetryBaseMs)
- **Proxy non-retryable (parse failure) bails after 1 attempt** (does NOT burn all proxyMaxAttempts on a structural failure)
- **Proxy retryable + non-retryable mix: retries on 429, bails on parse**
- Thrown fetch error on final retry → proxy multi-retry runs (P1 guard)
- All proxy attempts fail → throws with 'X/N attempts' in message + cause
- Malformed JSON does NOT emit succeeded log before throw (P2 guard)
- parseRetryAfterMs unit
Verification:
- tests/gdelt-fetch.test.mjs → 13/13 pass
- node --check scripts/_gdelt-fetch.mjs → clean
Phase 1 of 2. Seeder migration follows.
* feat(seed-gdelt-intel): migrate to _gdelt-fetch helper
Replaces direct fetch + ad-hoc retry in seed-gdelt-intel with the new fetchGdeltJson helper. Each topic call now gets: 3 direct retries (10/20/40s backoff) → 5 curl proxy attempts via Decodo session-rotating egress.
Specific changes:
- import fetchGdeltJson from _gdelt-fetch.mjs
- fetchTopicArticles: replace fetch+retry+throw block with single await fetchGdeltJson(url, { label: topic.id })
- fetchTopicTimeline: same — best-effort try/catch returns [] on any failure (preserved). Helper still attempts proxy fallback before throwing, so a 429-throttled IP doesn't kill the timeline.
- fetchWithRetry: collapsed from outer 3-retry loop with 60/120/240s backoff (which would have multiplied to 24 attempts/topic on top of helper's 8) to a thin wrapper that translates exhaustion into the {exhausted, articles:[]} shape the caller uses to drive POST_EXHAUST_DELAY_MS cooldown.
- Drop CHROME_UA import (no longer used directly; helper handles it).
Helper's exhausted-throw includes 'HTTP 429' substring when 429 was the upstream signal, so the existing is429 detection in fetchWithRetry continues to work without modification.
Verification:
- node --check scripts/seed-gdelt-intel.mjs → clean
- npm run typecheck:all → clean
- npm run test:data → 5382/5382 (was 5363, +13 from helper + 6 from prior PR work)
Phase 2 of 2.
* fix(_gdelt-fetch): proxy timeouts/network errors RETRY (rotates Decodo session)
P1 from PR #3122 review: probed Decodo curl egress against GDELT (2026-04-16) gave 200/200/429/TIMEOUT/429 — TIMEOUT is part of the normal transient mix that the multi-retry design exists to absorb. Pre-fix logic only retried on substring 'HTTP 429'/'HTTP 503' matches, so a curl exec timeout (Node Error with no .status, not a SyntaxError) bailed on the first attempt. The PR's headline 'expected ~92% success with 5 attempts' was therefore not actually achievable for one of the exact failure modes that motivated the design.
Reframed the proxy retryability decision around what we CAN reliably discriminate from the curl error shape:
- curlErr.status == number → retry only if 429/503 (curlFetch attaches .status only when curl returned a clean HTTP status)
- curlErr instanceof SyntaxError → bail (parse failure is structural)
- otherwise → RETRY (timeout, ECONNRESET, DNS, curl exec failure, CONNECT tunnel failure — all transient; rotating Decodo session usually clears them)
P2 from same review: tests covered HTTP-status proxy retries + parse failures but never the timeout/thrown-error class.
Added 3 tests:
- proxy timeout (no .status) RETRIES → asserts proxyCalls=2 after a first-attempt ETIMEDOUT then second-attempt success
- proxy ECONNRESET (no .status) RETRIES → same pattern
- proxy HTTP 4xx with .status (e.g. 401 auth) does NOT retry → bails after 1 attempt
Existing tests still pass — they use 'HTTP 429' Error WITHOUT .status, which now flows through the 'else: assume transient' branch and still retries. Only differences: the regex parsing is gone and curlFetch's .status property is the canonical signal.
Verification:
- tests/gdelt-fetch.test.mjs: 16/16 (was 13, +3)
- npm run test:data: 5385/5385 (+3)
- npm run typecheck:all: clean
Followup commit on PR #3122.
* fix(seed-gdelt-intel): timeline calls fast-fail (maxRetries:0, proxyMaxAttempts:0)
P1 from PR #3122 review: fetchTopicTimeline is best-effort (returns [] on any failure), but the migration routed it through fetchGdeltJson with the helper's article-fetch defaults: 3 direct retries (10/20/40s backoff = ~70s) + 5 proxy attempts (5s base = ~20s) = ~90s worst case per call. Called 2× per topic × 6 topics = 12 calls = up to ~18 minutes of blocking on data the seeder discards on failure. Pre-helper code did a single direct fetch with no retry. Real operational regression under exactly the GDELT 429 storm conditions this PR is meant to absorb.
Fix:
1. seed-gdelt-intel.mjs:fetchTopicTimeline now passes maxRetries:0, proxyMaxAttempts:0 — single direct attempt, no proxy, throws on first failure → caught, returns []. Matches pre-helper timing exactly. Article fetches keep the full retry budget; only timelines fast-fail.
2. _gdelt-fetch.mjs gate: skip the proxy block entirely when proxyMaxAttempts <= 0. Pre-fix the 'trying proxy (curl) up to 0×' log line would still emit even though the for loop runs zero times, producing a misleading line that the proxy was attempted when it wasn't.
Tests (2 new):
- maxRetries:0 + proxyMaxAttempts:0 → asserts directCalls=1, proxyCalls=0 even though _curlProxyResolver returns a valid auth string (proxy block must be fully bypassed).
- proxyMaxAttempts:0 → captures console.log and asserts no 'trying proxy' line emitted (no misleading 'up to 0×' line).
Verification:
- tests/gdelt-fetch.test.mjs: 18/18 (was 16, +2)
- npm run test:data: 5387/5387 (+2)
- npm run typecheck:all: clean
Followup commit on PR #3122.
* fix(gdelt): direct parse-failure reaches proxy + timeline budget tweak + JSDoc accuracy
3 Greptile P2s on PR #3122:
P2a — _gdelt-fetch.mjs:112: `resp.json()` was called outside the try/catch that guards fetch(). A 200 OK with HTML/garbage body (WAF challenge, partial response, gzip mismatch) would throw SyntaxError and escape the helper entirely — proxy fallback never ran. The proxy leg already parsed inside its own catch; making the direct leg symmetric. New regression test: direct 200 OK with malformed JSON must reach the proxy and recover.
P2b — seed-gdelt-intel.mjs timeline budget bumped from 0/0 to 0/2. Best-effort timelines still fast-fail on direct 429 (no direct retries) but get 2 proxy attempts via Decodo session rotation before returning []. Worst case: ~25s/call × 12 calls = ~5 min ceiling under heavy throttling vs ~3 min with 0/0. Tradeoff: small additional time budget for a real chance to recover timeline data via proxy IP rotation. Articles still keep the full retry budget.
P2c — JSDoc said 'Linear proxy backoff base' but the implementation uses a flat constant (proxyRetryBaseMs, line 156). Linear growth would not help here because Decodo rotates the session IP per call — the next attempt's success is independent of the previous wait. Doc now reads 'Fixed (constant, NOT linear) backoff' with the rationale.
Verification:
- tests/gdelt-fetch.test.mjs: 19/19 pass (was 18, +1)
- npm run test:data: 5388/5388 (+1)
- npm run typecheck:all: clean
Followup commit on PR #3122.
* test(gdelt): clarify helper-API vs seeder-mirror tests + add 0/2 lock
Reviewer feedback on PR #3122 conflated two test classes:
- Helper-API tests (lock the helper's contract for arbitrary callers using budget knobs like 0/0 — independent of any specific seeder)
- Seeder-mirror tests (lock the budget the actual production caller in seed-gdelt-intel.mjs uses)
Pre-fix the test file only had the 0/0 helper-API tests, with a section header that read 'Best-effort caller budgets (fast-fail)' — ambiguous about whether 0/0 was the helper API contract or the seeder's choice. Reviewer assumed seeder still used 0/0 because the tests locked it, but seed-gdelt-intel.mjs:97-98 actually uses 0/2 (per the prior P2b fix).
Fixes:
1. Section header for the 0/0 tests now explicitly says these are helper-API tests and notes that seed-gdelt-intel uses 0/2 (not 0/0). Eliminates the conflation.
2. New 'Seeder-mirror: 0/2' section with 2 tests that lock the seeder's actual choice end-to-end:
- 0/2 with first proxy attempt 429 + second succeeds → returns data (asserts directCalls=1, proxyCalls=2)
- 0/2 with both proxy attempts failing → throws exhausted with '2/2 attempts' in message (asserts the budget propagates to the error message correctly)
These tests would catch any future regression where the seeder's 0/2 choice gets reverted to 0/0 OR where the helper stops honoring the proxyMaxAttempts override.
Verification:
- tests/gdelt-fetch.test.mjs: 21/21 (was 19, +2)
- npm run test:data: 5390/5390 (+2)
- npm run typecheck:all: clean
Followup commit on PR #3122. |
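The reframed retryability decision in this PR reduces to one small classifier over the curl error shape. A minimal sketch, assuming the commit's description of curlFetch's `.status` contract; `isRetryableProxyError` is an illustrative name, not the helper's real export:

```javascript
// Classify a proxy-leg failure by error shape instead of message-text regex.
function isRetryableProxyError(err) {
  // curlFetch attaches .status only when curl got a clean HTTP status back:
  // retry only the throttle signals.
  if (typeof err.status === 'number') return err.status === 429 || err.status === 503;
  // A parse failure is structural; more attempts just burn the budget.
  if (err instanceof SyntaxError) return false;
  // Everything else (timeout, ECONNRESET, DNS, curl exec failure) is assumed
  // transient: the next attempt rotates the Decodo session to a fresh IP.
  return true;
}
```

Note the fallback branch is deliberately permissive: because each proxy call egresses from a different IP, retrying an unclassified transport error is cheap and often succeeds.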
||
|
|
bdfb415f8f |
fix(resilience-ranking): return warmed scores from memory, skip lossy re-read (#3121)
* fix(resilience-ranking): return warmed scores from memory, skip lossy re-read
Upstash REST writes via /set aren't always visible to an immediately-following /pipeline GET in the same Vercel invocation (documented in PR #3057 / feedback_upstash_write_reread_race_in_handler.md). The ranking handler was warming 222 countries then re-reading them from Redis to compute a coverage ratio; that re-read could return 0 despite every SET succeeding, collapsing coverage to 0% < 75% and silently dropping the ranking publish.
Consequence: `resilience:ranking:v9` missing, per-country score keys absent, health reports EMPTY_ON_DEMAND even while the seeder keeps writing a "fresh" meta.
Fix: warmMissingResilienceScores now returns Map<cc, GetResilienceScoreResponse> with every successfully computed score. The handler merges those into cachedScores directly and drops the post-warm re-read. Coverage now reflects what was actually computed in-memory this invocation, not what Redis happens to surface after write lag.
Adds a regression test that simulates pipeline-GET returning null for freshly-SET score keys; it fails against the old code (coverage=0, no ranking written) and passes with the fix (coverage=100%, ranking written).
Slice A of the resilience-ranking recovery plan; Slice B (seeder meta truthfulness) follows.
* fix(resilience-ranking): verify score-key persistence via pipeline SET response
PR review P1: trusting every fulfilled ensureResilienceScoreCached() result as "cached" turned the read-lag fix into a write-failure false positive. cachedFetchJson's underlying setCachedJson only logs and swallows write errors, so a transient /set failure on resilience:score:v9:* would leave per-country scores absent while the ranking aggregate and its seed-meta got published on top of them — worse than the bug this PR was meant to fix.
Fix: use the pipeline SET response as the authoritative persistence signal.
- Extract the score builder into a pure `buildResilienceScore()` with no caching side-effects (appendHistory stays — it's part of the score semantics).
- `ensureResilienceScoreCached()` still wraps it in cachedFetchJson so single-country RPC callers keep their log-and-return-anyway behavior.
- `warmMissingResilienceScores()` now computes in-memory, persists all scores in one pipeline SET, and only returns countries whose command reported `result: OK`. Pipeline SET's response is synchronous with the write, so OK means actually stored — no ambiguity with read-after-write lag.
- When runRedisPipeline returns fewer responses than commands (transport failure), return an empty map: no proof of persistence → coverage gate can't false-positive.
Adds regression test that blocks pipeline SETs to score keys and asserts the ranking + meta are NOT published. Existing race-regression test still passes.
* fix(resilience-ranking): preserve env key prefix on warm pipeline SET
PR review P1: the pipeline SET added to verify score-key persistence was called with raw=true, bypassing the preview/dev key prefix (preview:<sha>:). Two coupled regressions:
1. Preview/dev deploys write unprefixed `resilience:score:v9:*` keys, but all reads (getCachedResilienceScores, ensureResilienceScoreCached via setCachedJson/cachedFetchJson) look in the prefixed namespace. Warmed scores become invisible to the same preview on the next read.
2. Because production uses the empty prefix, preview writes land directly in the production-visible namespace, defeating the environment isolation guard in server/_shared/redis.ts.
Fix: drop the raw=true flag so runRedisPipeline applies prefixKey on each command, symmetric with the reads.
Adds __resetKeyPrefixCacheForTests in redis.ts so tests can exercise a non-empty prefix without relying on process-startup memoization order.
Adds regression test that simulates VERCEL_ENV=preview + a commit SHA and asserts every score-key SET in the pipeline carries the preview:<sha>: prefix. Fails on old code (raw writes), passes now.
installRedis gains an opt-in `keepVercelEnv` so the test can run under a forced env without being clobbered by the helper's default reset.
* test(resilience-ranking): snapshot + restore VERCEL_GIT_COMMIT_SHA in afterEach
PR review P2: the preview-prefix test mutates process.env.VERCEL_GIT_COMMIT_SHA but the file's afterEach only restored VERCEL_ENV. A process started with a real preview SHA (e.g. CI) would have that value unconditionally deleted after the test ran, leaking changed state into later tests and producing different prefix behavior locally vs. CI.
Fix: capture originalVercelSha at module load, restore it in afterEach, and invalidate the memoized key prefix after each test so the next one recomputes against the restored env. The preview-prefix test's finally block is no longer needed — the shared teardown handles it.
Verified: suite still passes 11/11 under both `VERCEL_ENV=production` (unset) and `VERCEL_ENV=preview VERCEL_GIT_COMMIT_SHA=ci-original-sha` process environments. |
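The "only trust SETs the pipeline acknowledged" rule above can be sketched in a few lines. This assumes Upstash's /pipeline response shape of one `{ result }` entry per command; `keepPersistedScores` and its parameters are hypothetical names for illustration:

```javascript
// Return only the countries whose pipeline SET reported OK; a short response
// array (transport failure) yields an empty map so the coverage gate fails safe.
function keepPersistedScores(countryCodes, scores, pipelineResults) {
  if (!Array.isArray(pipelineResults) || pipelineResults.length < countryCodes.length) {
    // No proof of persistence for any key: claim nothing.
    return new Map();
  }
  const persisted = new Map();
  countryCodes.forEach((cc, i) => {
    if (pipelineResults[i] && pipelineResults[i].result === 'OK') {
      persisted.set(cc, scores.get(cc));
    }
  });
  return persisted;
}
```

Because the pipeline response is synchronous with the write, an `OK` here is a stronger signal than a post-warm GET, which can lag the SET within the same invocation.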
||
|
|
fec135d6b8 |
chore(sentry): filter DuckDuckGo browser Response did not contain 'success' or 'data' noise (#3123)
Adds one ignoreErrors pattern for the distinctive DuckDuckGo browser-internal error phrase (WORLDMONITOR-MZ). The message is never emitted by our own code, contains backtick-quoted field names that are not in our vocabulary, and arrives with an empty stack from DuckDuckGo 26.3 on macOS. |
||
|
|
9b07fc8d8a |
feat(yahoo): _yahoo-fetch helper with curl-only Decodo proxy fallback + 4 seeder migrations (#3120)
* feat(_yahoo-fetch): curl-only Decodo proxy fallback helper
Yahoo Finance throttles Railway egress IPs aggressively. 4 seeders (seed-commodity-quotes, seed-etf-flows, seed-gulf-quotes, seed-market-quotes) duplicated the same fetchYahooWithRetry block with no proxy fallback. This helper consolidates them and adds the proxy fallback.
Yahoo-specific: CURL-ONLY proxy strategy. Probed 2026-04-16:
- query1.finance.yahoo.com via CONNECT (httpsProxyFetchRaw): HTTP 404
- query1.finance.yahoo.com via curl (curlFetch): HTTP 200
Yahoo's edge blocks Decodo's CONNECT egress IPs but accepts the curl egress IPs. Helper deliberately omits the CONNECT leg — adding it would burn time on guaranteed-404 attempts. Production defaults expose ONLY curlProxyResolver + curlFetcher.
All learnings from PR #3118 + #3119 reviews baked in:
- lastDirectError accumulator across the loop, embedded in final throw + Error.cause chain
- catch block uses break (NOT throw) so thrown errors also reach proxy
- DI seams (_curlProxyResolver, _proxyCurlFetcher) for hermetic tests
- _PROXY_DEFAULTS exported for production-default lock tests
- Sync curlFetch wrapped with await Promise.resolve() to future-proof against an async refactor (Greptile P2 from #3119)
Tests (tests/yahoo-fetch.test.mjs, 11 cases):
- Production defaults: curl resolver/fetcher reference equality
- Production defaults: NO CONNECT leg present (regression guard)
- 200 OK passthrough, never touches proxy
- 429 with no proxy → throws exhausted with HTTP 429 in message
- Retry-After header parsed correctly
- 429 + curl proxy succeeds → returns proxy data
- Thrown fetch error on final retry → proxy fallback runs (P1 guard)
- 429 + proxy ALSO fails → both errors visible in message + cause chain
- Proxy malformed JSON → throws exhausted
- Non-retryable 500 → no extra direct retry, falls to proxy
- parseRetryAfterMs unit (exported sanity check)
Verification: 11/11 helper tests pass. node --check clean.
Phase 1 of 2 — seeder migrations follow.
* feat(yahoo-seeders): migrate 4 seeders to _yahoo-fetch helper
Removes the duplicated fetchYahooWithRetry function (4 byte-identical copies across seed-commodity-quotes, seed-etf-flows, seed-gulf-quotes, seed-market-quotes) and routes all Yahoo Finance fetches through the new scripts/_yahoo-fetch.mjs helper. Each seeder gains the curl-only Decodo proxy fallback baked into the helper.
Per-seeder changes (mechanical):
- import { fetchYahooJson } from './_yahoo-fetch.mjs'
- delete the local fetchYahooWithRetry function
- replace 'const resp = await fetchYahooWithRetry(url, label); if (!resp) return X; const json = await resp.json()' with 'let json; try { json = await fetchYahooJson(url, { label }); } catch { return X; }'
- prune now-unused CHROME_UA/sleep imports where applicable
Latent bugs fixed in passing:
- seed-etf-flows.mjs:23 and seed-market-quotes.mjs:38 referenced CHROME_UA without importing it (would throw ReferenceError at runtime if the helper were called). Now the call site is gone in etf-flows; in market-quotes CHROME_UA is properly imported because the Finnhub call still uses it.
seed-commodity-quotes also has fetchYahooChart1y (separate non-retry function for gold history). Migrated to use fetchYahooJson under the hood — preserves return shape, adds proxy fallback automatically.
Verification:
- node --check clean on all 4 modified seeders
- npm run typecheck:all clean
- npm run test:data: 5374/5374 pass
Phase 2 of 2.
* fix(_yahoo-fetch): log success AFTER parse + add _sleep DI seam for honest Retry-After test
Greptile P2: "[YAHOO] proxy (curl) succeeded" was logged BEFORE JSON.parse(text). On malformed proxy JSON, Railway logs would show:
[YAHOO] proxy (curl) succeeded for AAPL
throw: Yahoo retries exhausted ...
Contradictory + breaks the post-deploy log-grep verification this PR relies on ("look for [YAHOO] proxy (curl) succeeded").
Fix: parse first; success log only fires when parse succeeds AND the value is about to be returned.
Greptile P3: 'Retry-After header parsed correctly' test used header value '0', but parseRetryAfterMs() treats non-positive seconds as null → helper falls through to default linear backoff. So the test was exercising the wrong branch despite its name.
Fix: added _sleep DI opt seam to the helper. New test injects a sleep spy and asserts the captured duration:
- Retry-After: '7' → captured sleep == [7000] (Retry-After branch)
- no Retry-After → captured sleep == [10] (default backoff = retryBaseMs * 1)
Two paired tests lock both branches separately so a future regression that collapses them is caught.
Also added a log-ordering regression test: malformed proxy JSON must NOT emit the 'succeeded' log. Captures console.log into an array and asserts no 'proxy (curl) succeeded' line appeared before the throw.
Verification:
- tests/yahoo-fetch.test.mjs: 13/13 (was 11, +2)
- npm run test:data: 5376/5376 (+2)
- npm run typecheck:all: clean
Followup commits on PR #3120. |
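The two backoff branches the paired tests lock can be sketched as follows. This is an illustrative reconstruction from the commit text (positive Retry-After wins; non-positive or absent falls back to linear `retryBaseMs * attempt`), not the helper's exact source:

```javascript
// Positive integer Retry-After seconds → milliseconds; '0', absent, or
// garbage → null, which routes the caller to the default backoff branch.
function parseRetryAfterMs(headerValue) {
  const seconds = Number(headerValue);
  if (!Number.isFinite(seconds) || seconds <= 0) return null;
  return Math.round(seconds * 1000);
}

function nextBackoffMs(retryAfterHeader, attempt, retryBaseMs) {
  const fromHeader = parseRetryAfterMs(retryAfterHeader);
  if (fromHeader !== null) return fromHeader; // Retry-After branch
  return retryBaseMs * attempt;               // linear backoff branch
}
```

This is exactly why the original test with header value '0' was misleading: it silently exercised the linear-backoff branch, not the Retry-After branch its name claimed.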
||
|
|
57414e4762 |
fix(open-meteo): curl proxy as second-choice when CONNECT proxy fails (#3119)
* fix(open-meteo): curl proxy as second-choice when CONNECT proxy fails
Decodo's CONNECT egress and curl egress reach DIFFERENT IP pools (per scripts/_proxy-utils.cjs:67). Probed 2026-04-16 against Yahoo Finance:
- Yahoo via CONNECT (httpsProxyFetchRaw): HTTP 404
- Yahoo via curl (curlFetch): HTTP 200
For Open-Meteo both paths happen to work today, but pinning the helper to one path is a single point of failure if Decodo rebalances pools, or if Open-Meteo starts behaving like Yahoo. PR #3118 wired only the CONNECT path (`httpsProxyFetchRaw`); this commit adds curl as a second-choice attempt that runs only when CONNECT also fails.
Cascade: direct retries (3) → CONNECT proxy (1) → curl proxy (1) → throw
Steady-state cost: zero. Curl exec only runs when CONNECT also failed. Final exhausted-throw now appends the LAST proxy error too, so on-call sees both upstream signals (direct + proxy) instead of just direct.
Tests: added 4 cases locking the cascade behavior:
- CONNECT fails → curl succeeds: returns curl data, neither throws
- CONNECT succeeds: curl never invoked (cost gate)
- CONNECT fails AND curl fails: throws exhausted with both errors visible in the message (HTTP 429 from direct + curl 502 from proxy)
- curl returns malformed JSON: caught + warns + throws exhausted
Updated 2 existing tests to also stub _proxyCurlFetcher so they don't shell out to real curl when CONNECT is mocked-failed (would have run real curl with proxy.test:8000 → 8s timeout per test).
Verification:
- tests/open-meteo-proxy-fallback.test.mjs → 12/12 pass (was 8, +4 new)
- npm run test:data → 5367/5367 (+4)
- npm run typecheck:all → clean
Followup to PR #3118.
* fix: CONNECT leg uses resolveProxyForConnect; lock production defaults
P1 from PR #3119 review: the cascade was logged as 'CONNECT proxy → curl proxy' but BOTH legs were resolving via resolveProxy() — which rewrites gate.decodo.com → us.decodo.com for curl egress.
So the 'two-leg cascade' was actually one Decodo egress pool wearing two transport mechanisms. Defeats the redundancy this PR is supposed to provide.
Fix: import resolveProxyForConnect (preserves gate.decodo.com — the host Decodo routes via its CONNECT egress pool, distinct from the curl-egress pool reached by us.decodo.com via resolveProxy). CONNECT leg uses resolveProxyForConnect; curl leg uses resolveProxy. Matches the established pattern in scripts/seed-portwatch-chokepoints-ref.mjs:33-37 and scripts/seed-recovery-external-debt.mjs:31-35.
Refactored test seams: split single _proxyResolver into _connectProxyResolver + _curlProxyResolver. Test files inject both.
P2 fix: every cascade test injected _proxyResolver, so the suite stayed green even when production defaults were misconfigured. Exported _PROXY_DEFAULTS object and added 2 lock-tests:
1. CONNECT leg uses resolveProxyForConnect, curl leg uses resolveProxy (reference equality on each of 4 default fields).
2. connect/curl resolvers are different functions — guards against the 'collapsed cascade' regression class generally, not just this specific instance.
Updated the 8 existing cascade tests to inject BOTH resolvers. The docstring at the top of the file now spells out the wiring invariant and points to the lock-tests.
Verification:
- tests/open-meteo-proxy-fallback.test.mjs: 14/14 pass (+2)
- npm run test:data: 5369/5369 (+2)
- npm run typecheck:all: clean
Followup commit on PR #3119.
* fix(open-meteo): future-proof sync curlFetch call with Promise.resolve+await
Greptile P2: _proxyCurlFetcher (curlFetch / execFileSync) is sync today, adjacent CONNECT path is async (await _proxyFetcher(...)). A future refactor of curlFetch to async would silently break this line — JSON.parse would receive a Promise<string> instead of a string and explode at parse time, not at the obvious call site. Wrapping with await Promise.resolve(...) is a no-op for the current sync implementation but auto-handles a future async refactor.
Comment spells out the contract so the wrap doesn't read as cargo-cult. Tests still 14/14. |
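The two-leg cascade and the Promise.resolve wrap can be shown together in a toy version. Function and parameter names here are illustrative (the real helper's seams are `_proxyFetcher` / `_proxyCurlFetcher`); only the ordering, the cost gate, and the both-errors-visible throw come from the commit text:

```javascript
// Try the CONNECT leg first; run the curl leg only when CONNECT fails;
// surface both errors when everything is exhausted.
async function proxyCascade(connectFetch, curlFetch) {
  let connectErr;
  try {
    return JSON.parse(await connectFetch());
  } catch (err) {
    connectErr = err; // fall through to the curl leg (cost gate: curl only on failure)
  }
  try {
    // await Promise.resolve(...) is a no-op for a sync curlFetch but keeps this
    // line correct if curlFetch is ever refactored to return a Promise.
    return JSON.parse(await Promise.resolve(curlFetch()));
  } catch (curlErr) {
    throw new Error(
      `proxy cascade exhausted: ${connectErr.message}; ${curlErr.message}`,
      { cause: curlErr }
    );
  }
}
```

If `connectFetch` succeeds, `curlFetch` is never invoked, matching the "steady-state cost: zero" claim.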
||
|
|
5d1c8625e9 |
fix(seed-climate-zone-normals): proxy fallback when Open-Meteo 429s on Railway IP (#3118)
* fix(seed-climate-zone-normals): proxy fallback when Open-Meteo 429s on Railway IP
Railway logs.1776312819911.log showed seed-climate-zone-normals failing every batch with HTTP 429 from Open-Meteo's free-tier per-IP throttle (2026-04-16). The seeder retried with 2/4/8/16s backoff but exhausted without ever falling back to the project's Decodo proxy infrastructure that other rate-limited sources (FRED, IMF) already use.
Open-Meteo throttles by source IP. Railway containers share IP pools and get 429 storms whenever zone-normals fires (monthly cron — high churn when it runs). Result: PR #3097's bake clock for climate:zone-normals:v1 couldn't start, because the seeder couldn't write the contract envelope even when manually triggered.
Fix: after direct retries exhaust, _open-meteo-archive.mjs falls back to httpsProxyFetchRaw (Decodo) — same pattern as fredFetchJson and imfFetchJson in _seed-utils.mjs. Skips silently if no proxy is configured (preserves existing behavior in non-Railway envs).
Added tests/open-meteo-proxy-fallback.test.mjs (4 cases):
- 429 with no proxy → throws after exhausting retries (pre-fix behavior preserved)
- 200 OK → returns parsed batch without touching proxy path
- batch size mismatch → throws even on 200
- Non-retryable 500 → break out, attempt proxy, throw exhausted (no extra direct retry — matches new control flow)
Verification: npm run test:data → 5359/5359, +4 new. node --check clean.
Same pattern can be applied to any other helper that fetches Open-Meteo (grep 'open-meteo' scripts/) if more 429s show up.
* fix: proxy fallback runs on thrown direct errors + actually-exercised tests
Addresses two PR #3118 review findings.
P1: catch block did 'throw err' on the final direct attempt, silently bypassing the proxy fallback for thrown-error cases (timeout, ECONNRESET, DNS failures). Only non-OK HTTP responses reached the proxy path.
Fix: record the error in lastDirectError and 'break' so control falls through to the proxy fallback regardless of whether the direct path failed via thrown error or non-OK status.
Also: include lastDirectError context in the final 'retries exhausted' message + Error.cause so on-call can see what triggered the fallback attempt (was: opaque 'retries exhausted').
P2: tests didn't exercise the actual proxy path. Refactored helper to accept _proxyResolver and _proxyFetcher opt overrides (production defaults to real resolveProxy/httpsProxyFetchRaw from _seed-utils.mjs; tests inject mocks). Added 4 new cases:
- 429 + proxy succeeds → returns proxy data
- thrown fetch error on final retry → proxy fallback runs (P1 regression guard with explicit assertion: directCalls=2, proxyCalls=1)
- 429 + proxy ALSO fails → throws exhausted, original HTTP 429 in message + cause chain
- Proxy returns wrong batch size → caught + warns + throws exhausted
Verification:
- tests/open-meteo-proxy-fallback.test.mjs: 8/8 pass (4 added)
- npm run test:data: 5363/5363 pass (+4 from prior 5359)
- node --check clean |
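The break-not-throw control flow fixed by this P1 can be sketched as a small retry loop. This is a simplified model under assumed shapes (`directFetch` resolves to `{ ok, status, body }`; names are hypothetical), showing only the invariant: both failure modes (thrown errors and non-OK statuses) must reach the proxy attempt:

```javascript
// Direct retries record lastDirectError and fall through to the proxy;
// the catch block breaks instead of rethrowing so thrown errors also get there.
async function fetchWithProxyFallback(directFetch, proxyFetch, maxRetries) {
  let lastDirectError = null;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const resp = await directFetch();
      if (resp.ok) return resp.body;
      lastDirectError = new Error(`HTTP ${resp.status}`);
    } catch (err) {
      lastDirectError = err;
      break; // thrown error: stop retrying, but still try the proxy below
    }
  }
  try {
    return await proxyFetch();
  } catch (proxyErr) {
    // Keep the direct-path signal visible in message + cause chain.
    throw new Error(`retries exhausted (${lastDirectError.message})`, { cause: proxyErr });
  }
}
```

The pre-fix equivalent would have been `throw err` in the catch block, which is exactly what skipped the proxy for timeout/ECONNRESET/DNS failures.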
||
|
|
e6a6d4e326 |
fix(bundle-runner): stream child stdio + SIGKILL escalation on timeout (#3114)
* fix(bundle-runner): stream child stdio + SIGKILL escalation on timeout
Silent Railway crashes in seed-bundle-portwatch — container exits after ~7min with ZERO logs from the hanging section. Root cause in the runner, not the seeder: execFile buffers child stdout until the callback fires, and its default SIGTERM never escalates to SIGKILL, so a child with in-flight HTTPS sockets can outlive the timeout and be killed by the container limit before any error is logged.
Switch to spawn + live line-prefixed streaming. On timeout, send SIGTERM, then SIGKILL after a 10s grace. Always log the terminal reason (timeout / exit code / signal) so the next failing bundle surfaces the hung section on its own line instead of going dark.
Applies to all 15 seed-bundle-*.mjs services that use this runner.
* fix(bundle-runner): guard double-resolve, update docstring, add tests
Review follow-ups:
- Idempotent settle() so spawn 'error' + 'close' can't double-resolve
- Header comment reflects spawn + streaming + SIGKILL behavior
- tests/bundle-runner.test.mjs covers live streaming, SIGKILL escalation when a child ignores SIGTERM, and non-zero exit reporting
* fix(bundle-runner): address PR review — declare softKill before settle, handle stdio error
* fix(bundle-runner): log terminal reason BEFORE SIGKILL grace + include grace in budget
Review P1 follow-up. Two gaps the previous commit left open:
1. A section with timeoutMs close to Railway's ~10min container cap could be killed by the container mid-grace, before the "Failed ... timeout" line reached the log stream. Fix: emit the terminal Failed line at the moment softKill fires (before SIGTERM), so the reason is flushed BEFORE any grace window that could be truncated by a container kill.
2. The admission check used raw timeoutMs, but worst-case runtime is timeoutMs + KILL_GRACE_MS when the child ignores SIGTERM. A section that "fit" the budget could still overrun. Fix: compare elapsed + timeout + grace against maxBundleMs.
close handler still settles the promise but no longer re-logs on the timeout path (alreadyLogged flag).
New test asserts the Failed line precedes SIGKILL escalation, and that budget accounts for grace. |
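The timeout path described across these commits can be sketched against any object exposing `kill()` and `exitCode`. A minimal model, not the runner's actual implementation (`softKill`, `fitsBudget`, and the log callback are illustrative names; the 10s grace and the log-before-SIGTERM ordering come from the commit text):

```javascript
const KILL_GRACE_MS = 10_000;

function softKill(child, sectionName, log = console.error) {
  // Emit the Failed line BEFORE the grace window so a container kill
  // mid-grace cannot truncate the reason out of the log stream.
  log(`Failed ${sectionName}: timeout`);
  child.kill('SIGTERM');
  const timer = setTimeout(() => {
    // Escalate only if the child ignored SIGTERM and is still alive.
    if (child.exitCode === null) child.kill('SIGKILL');
  }, KILL_GRACE_MS);
  timer.unref?.(); // don't hold the runner's event loop open for the grace timer
}

// Admission check: worst-case runtime includes the SIGKILL grace, so a section
// only "fits" when elapsed + timeout + grace stays inside the bundle budget.
function fitsBudget(elapsedMs, timeoutMs, maxBundleMs) {
  return elapsedMs + timeoutMs + KILL_GRACE_MS <= maxBundleMs;
}
```

The ordering matters: with the log emitted first, even a container kill during the grace window leaves the terminal reason in the stream.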
||
|
|
13446a2170 |
fix(seed-contract-probe): send Origin header so /api/bootstrap boundary check doesn't 401 (#3100)
* fix(seed-contract-probe): send Origin header so /api/bootstrap boundary check doesn't 401
Production probe returned {boundary: [{endpoint: '/api/bootstrap', pass: false,
status: 401, reason: 'status:401'}]}. Root cause: checkPublicBoundary's
self-fetch had no Origin header, so /api/bootstrap's validateApiKey() treated
it as a non-browser caller and required an API key.
Fix: set Origin: https://worldmonitor.app on the boundary self-fetch. This
takes the trusted-browser path without needing to embed an API key in the
probe. The probe runs edge-side with x-probe-secret internal auth; emulating
a trusted browser is only for boundary response-shape verification.
Tests still 17/17.
* fix(seed-contract-probe): explicit User-Agent on boundary self-fetch
Per AGENTS.md, server-side fetches must include a UA. middleware.ts:138
returns 403 for !ua || ua.length < 10 on non-public paths, and
/api/bootstrap is not in PUBLIC_API_PATHS — the probe works today only
because Vercel Edge implicitly adds a UA. Making it explicit.
Addresses greptile P2 on PR #3100.
|
||
|
|
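The two probe review fixes (Origin header for the trusted-browser path, explicit User-Agent of at least 10 characters) combine into a self-fetch like the following sketch. The helper name, UA string, and injectable `fetchImpl` parameter are illustrative; the real logic lives in checkPublicBoundary.

```javascript
// Illustrative sketch of the boundary self-fetch described above.
async function boundarySelfFetch(endpoint, fetchImpl = fetch) {
  const res = await fetchImpl(`https://worldmonitor.app${endpoint}`, {
    headers: {
      // Takes validateApiKey's trusted-browser path; no API key embedded.
      Origin: 'https://worldmonitor.app',
      // middleware.ts 403s when !ua || ua.length < 10 on non-public paths,
      // so the UA is explicit rather than relying on the edge runtime's default.
      'User-Agent': 'worldmonitor-seed-contract-probe/1.0',
    },
  });
  return { endpoint, pass: res.ok, status: res.status };
}
```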
1b4335353f |
fix(sentry): suppress 6 noise patterns from triage (#3105)
* fix(sentry): suppress 6 noise patterns flagged in triage
Add filters so resolved issues don't re-fire:
- Convex "Connection lost while action was in flight" → ignoreErrors
- Convex WS onmessage JSON.parse truncation (Ping/Updated frames) →
beforeSend gated on onmessage frame
- chrome/moz/safari-extension frames intercepting fetch → beforeSend
- Sentry SDK breadcrumb null.contains DOM crash → beforeSend gated on
sentry chunk
- bare "Failed to fetch" (no TypeError: prefix in msg) → extend existing
regex
* test(sentry): tighten onmessage guard + add tests for 3 new filters
Review feedback from PR #3105:
- add !hasFirstParty guard to Convex WS onmessage JSON-parse filter
- add 6 test cases covering chrome-extension drop, sentry null.contains
gate, and Convex onmessage suppression + first-party regression paths
* fix(sentry): gate extension and sentry-contains filters on !hasFirstParty
Review feedback PR #3105 (round 2):
- extension-frame drop no longer suppresses events when a first-party
frame is also on the stack (a real app regression could have an
extension wrapper around it)
- sentry null.contains filter no longer suppresses when a first-party
frame is present (Sentry wraps first-party handlers, so a genuine
el.contains() bug produces a stack with both main-*.js and sentry-*.js)
Adds 3 more tests covering the !hasFirstParty boundary. |
||
|
|
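The !hasFirstParty gating pattern these commits converge on can be sketched as a beforeSend hook. The `main-*.js` bundle-name pattern comes from the commit text; everything else here (helper name, regexes) is an illustrative assumption, not the app's actual Sentry config.

```javascript
// Illustrative beforeSend gate; assumes first-party bundles are named
// main-*.js, per the commit message. Not the app's real Sentry config.
function makeBeforeSend() {
  return function beforeSend(event) {
    const frames = event.exception?.values?.[0]?.stacktrace?.frames ?? [];
    const hasFirstParty = frames.some((f) => /\/main-[\w.]+\.js/.test(f.filename ?? ''));
    const hasExtension = frames.some((f) =>
      /^(chrome|moz|safari)-extension:/.test(f.filename ?? ''));
    // Drop extension-intercepted errors only when NO first-party frame is
    // on the stack: a real regression wrapped by an extension must survive.
    if (hasExtension && !hasFirstParty) return null;
    return event; // keep everything else
  };
}
```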
90f4ac0f78 |
feat(consumer-prices): strict search-hit validator (shadow mode) (#3101)
* feat(consumer-prices): add 'candidate' match state + negativeTokens schema
Schema foundation for the strict-validator plan:
- migration 008 widens product_matches.match_status CHECK to include
'candidate' so weak search hits can be persisted without entering
aggregates (aggregate.ts + snapshots filter on ('auto','approved')
so candidates are excluded automatically).
- BasketItemSchema gains optional negativeTokens[] — config-driven
reject tokens for obvious class errors (e.g. 'canned' for fresh
tomatoes). Product-taxonomy splits like plain vs greek yogurt
belong in separate substitutionGroup values, not here.
- upsertProductMatch accepts 'candidate' and writes evidence_json
so reviewers can see why a match was downgraded.
* feat(consumer-prices): add validateSearchHit pure helper + known-bad test fixtures
Deterministic post-extraction validator that replaces the boolean
isTitlePlausible gate for scoring and candidate triage. Evaluates
four signals and returns { ok, score, reasons, signals }:
- class-error rejects from BasketItem.negativeTokens (whole-token
match for single words; substring match for hyphenated entries
like 'plant-based' so 'Plant-Based Yogurt' trips without needing
token-splitting gymnastics)
- non-food indicators (seeds, fertilizer, planting) — shared with
the legacy gate
- token-overlap ratio over identity tokens (>2 chars, non-packaging)
- quantity-window conformance against minBaseQty/maxBaseQty
Score is a 0..1 weighted sum (overlap 0.55, size 0.35/0.2/0, class-
clean 0.10). AUTO_MATCH_THRESHOLD=0.75 exported for the scrape-side
auto-vs-candidate decision.
Locked all five bad log examples into regression tests and added
matching positive cases so the rule set proves both sides of the
boundary. Also added vitest.config.ts so consumer-prices-core tests
run under its own config instead of inheriting the worldmonitor
root config (which excludes this directory).
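The weighted sum described above can be restated as a toy function. The weights (overlap 0.55, size 0.35/0.2/0, class-clean 0.10) and the 0.75 threshold come from the commit message; the `sizeState` encoding and function name are simplified and illustrative.

```javascript
// Toy restatement of the weighted score described above; weights are from
// the commit message, the shape of the inputs is illustrative.
const AUTO_MATCH_THRESHOLD = 0.75;

function scoreHit({ overlapRatio, sizeState, classClean }) {
  // sizeState: 'in-window' (conforms to minBaseQty/maxBaseQty),
  // 'unknown' (no quantity parsed), or 'out-of-window'.
  const sizeWeight = { 'in-window': 0.35, unknown: 0.2, 'out-of-window': 0 }[sizeState];
  return overlapRatio * 0.55 + sizeWeight + (classClean ? 0.10 : 0);
}
```

Note how a half-overlap, in-window, class-clean hit lands at 0.725, just under the threshold, which is exactly the silent-downgrade case the later size-token fix addresses.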
* feat(consumer-prices): wire validator (shadow) + replace 1.0 auto-match
search.ts:
- Thread BasketItem constraints (baseUnit, min/maxBaseQty, negativeTokens,
substitutionGroup) through discoverTargets → fetchTarget → parseListing
using explicit named fields, not an opaque JSON blob.
- _extractFromUrl now runs validateSearchHit alongside isTitlePlausible.
Legacy gate remains the hard gate; validator is shadow-only for now —
when legacy accepts but validator rejects, a [search:shadow-reject]
line is logged with reasons + score so the rollout diff report can
inform the decision to flip the gate. No live behavior change.
- ValidatorResult attached to SearchPayload + rawPayload so scrape.ts
can score the match without re-running the validator.
scrape.ts:
- Remove unconditional matchScore:1.0 / status:'auto' insert. Use the
validator score from the adapter payload. Hits with ok=true and
score >= AUTO_MATCH_THRESHOLD (0.75) keep 'auto'; everything else
(including validator.ok=false) writes 'candidate' with evidence_json
carrying the reasons + signals. Aggregates filter on ('auto','approved')
so candidates are excluded automatically.
- Adapters without a validator (exa-search, etc.) fall back to the
legacy 1.0/auto behavior so this PR is a no-op for non-search paths.
* feat(consumer-prices): populate negativeTokens for 6 known-bad groups
* fix(consumer-prices): enforce validator on pin path + drop 'cane' from sugar rejects
Addresses PR #3101 review:
1. Pinned direct hits bypassed the validator downgrade — the new
auto-vs-candidate decision only ran inside the !wasDirectHit block,
so a pin that drifted onto the wrong product (the steady-state
common path) would still flow poisoned prices into aggregates
through the existing 'auto' match. Now: before inserting an
observation, if the direct hit's validator.ok === false, skip the
observation and route the target through handlePinError so the pin
soft-disables after 3 strikes. Legacy isTitlePlausible continues to
gate the pin extraction itself.
2. 'cane' was a hard reject for sugar_white across all 10 baskets but
'white cane sugar' is a legitimate SKU descriptor — would have
downgraded real products to candidate and dropped coverage. Removed
from every essentials_*.yaml sugar_white negativeTokens list.
Added a regression test that locks in 'Silver Spoon White Cane
Sugar 1kg' as a must-pass positive case.
* fix(consumer-prices): strip size tokens from identity + protect approved rows
Addresses PR #3101 round 2 review:
1. Compact size tokens ("1kg", "500g", "250ml") were kept as identity
tokens. Firecrawl emits size spaced ("1 kg"), which tokenises to
["1","kg"] — both below the length>2 floor — so the compact "1kg"
token could never match. Short canonical names like "Onions 1kg"
lost 0.5 token overlap and legitimate hits landed at score 0.725 <
AUTO_MATCH_THRESHOLD, silently downgrading to candidate. Size
fidelity is already enforced by the quantity-window check; identity
tokens now ignore /^\d+(?:\.\d+)?[a-z]+$/. New regression test
locks in "Fresh Red Onions 1 kg" as a must-pass case.
2. upsertProductMatch's DO UPDATE unconditionally wrote EXCLUDED.status.
A re-scrape whose validator scored an already-approved URL below
0.75 would silently demote human-curated 'approved' rows to
'candidate'. Added a CASE guard so approved stays approved; every
other state follows the new validator verdict.
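Point 1's size-token exclusion can be sketched as below. The regex is quoted from the commit; the tokenisation around it is an illustrative assumption.

```javascript
// Sketch of identity-token selection per point 1 above. The size-token
// regex is from the commit; split/filter details are illustrative.
const SIZE_TOKEN = /^\d+(?:\.\d+)?[a-z]+$/;

function identityTokens(title) {
  return title
    .toLowerCase()
    .split(/[^a-z0-9.]+/)
    // Keep tokens longer than 2 chars, and drop compact size tokens like
    // "1kg" / "500ml": size fidelity is enforced by the quantity-window check.
    .filter((t) => t.length > 2 && !SIZE_TOKEN.test(t));
}
```

Under this rule "Onions 1kg" contributes only "onions" to identity, so "Fresh Red Onions 1 kg" can no longer lose overlap to an unmatchable "1kg" token.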
* fix(consumer-prices): widen curated-state guard to review + rejected
PR #3101 round 3: the CASE only protected 'approved' from being
overwritten. 'review' (written by validate.ts when a price is an
outlier, or by humans sending a row back) and 'rejected' (human
block) are equally curated — a re-scrape under this path silently
overwrites them with the fresh validator verdict and re-enables the
URL in aggregate queries on the next pass.
Widen the immutable set to ('approved','review','rejected'). Also
stop clearing pin_disabled_at on those rows so a quarantined pin
keeps its disabled flag until the review workflow resolves it.
* fix(analyze-stock): classify dividend frequency by median gap
recentDivs.length within a hard 365.25-day window misclassifies quarterly
payers whose last-year Q1 payment falls just outside the cutoff — common
after mid-April each year, when Date.now() - 365.25d lands after Jan's
payment timestamp. The test 'non-zero CAGR for a quarterly payer' flaked
calendar-dependently for this reason.
Prefer median inter-payment interval: quarterly = ~91d median gap,
regardless of where the trailing-12-month window happens to bisect the
payment series. Falls back to the old count when <2 entries exist.
Also documents the CAGR filter invariant in the test helper.
* fix(analyze-stock): suppress frequency when no recent divs + detect regime slowdowns
Addresses PR #3102 review:
1. Suspended programs no longer leak a frequency badge. When recentDivs
is empty, dividendYield and trailingAnnualDividendRate are both 0;
emitting 'Quarterly' derived from historical median would contradict
those zeros in the UI. paymentsPerYear now short-circuits to 0 before
the interval classifier runs.
2. Whole-history median-gap no longer masks cadence regime changes. The
reconciliation now depends on trailing-year count:
recent >= 3 → interval classifier (robust to calendar drift)
recent 1..2 → inspect most-recent inter-payment gap:
> 180d = real slowdown, trust count (Annual)
<= 180d = calendar drift, trust interval (Quarterly)
recent 0 → empty frequency (suspended)
The interval classifier itself is now scoped to the last 2 years so
it responds to regime changes instead of averaging over 5y of history.
Regression tests:
- 'emits empty frequency when the dividend program has been suspended' —
3y of quarterly history + 18mo silence must report '' not 'Quarterly'.
- 'detects a recent quarterly → annual cadence change' — 12 historical
quarterly payments + 1 recent annual payment must report 'Annual'.
* fix(analyze-stock): scope interval median to trailing year when recent>=3
Addresses PR #3102 review round 2: the reconciler's recent>=3 branch
called paymentsPerYearFromInterval(entries), which scopes to the last
2 years. A monthly→quarterly shift (12 monthly payments in year -2..-1
plus 4 quarterly in year -1..0) produced a 2-year median of ~30d and
misclassified as Monthly even though the current trailing-year cadence
is clearly quarterly.
Pass recentDivs directly to the interval classifier when recent>=3.
Two payments in the trailing year = 1 gap which suffices for the median
(gap count >=1, median well-defined). The historical-window 2y scoping
still applies for the recent 1..2 branch, where we actively need
history to distinguish drift from slowdown.
Regression test: 12 monthly payments from -13..-24 months ago + 4
quarterly payments inside the trailing year must classify as Quarterly.
* fix(analyze-stock): use true median (avg of two middles) for even gap counts
PR #3102 P2: gaps[floor(length/2)] returns the upper-middle value for
even-length arrays, biasing toward slower cadence at classifier
thresholds when the trailing-year sample is small. Use the average of
the two middles for even lengths. Harmless on 5-year histories with
50+ gaps where values cluster, but correct at sparse sample sizes where
the trailing-year branch can have only 2–3 gaps.
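The classifier these dividend-frequency commits converge on can be sketched as follows. Only the ~91d quarterly gap and the even-length median rule (average of the two middles) come from the commits; the exact cadence thresholds here are assumptions.

```javascript
// Illustrative cadence classifier: true median of inter-payment gaps,
// averaging the two middles for even counts. Thresholds are assumptions.
const DAY_MS = 86_400_000;

function medianGapDays(timestamps) {
  const ts = [...timestamps].sort((a, b) => a - b);
  const gaps = ts.slice(1).map((t, i) => (t - ts[i]) / DAY_MS).sort((a, b) => a - b);
  if (gaps.length === 0) return null; // <2 payments: no gap to measure
  const mid = Math.floor(gaps.length / 2);
  // Even length: average the two middles instead of taking the upper one,
  // which biases toward slower cadence at sparse sample sizes.
  return gaps.length % 2 ? gaps[mid] : (gaps[mid - 1] + gaps[mid]) / 2;
}

function paymentsPerYearFromInterval(timestamps) {
  const gap = medianGapDays(timestamps);
  if (gap === null) return 0;
  if (gap <= 45) return 12;  // ~30d  → Monthly
  if (gap <= 135) return 4;  // ~91d  → Quarterly
  if (gap <= 270) return 2;  // ~182d → Semi-annual
  return 1;                  // ~365d → Annual
}
```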
|
||
|
|
fffc5d9607 |
fix(analyze-stock): classify dividend frequency by median gap (#3102)
* fix(analyze-stock): classify dividend frequency by median gap
recentDivs.length within a hard 365.25-day window misclassifies quarterly
payers whose last-year Q1 payment falls just outside the cutoff — common
after mid-April each year, when Date.now() - 365.25d lands after Jan's
payment timestamp. The test 'non-zero CAGR for a quarterly payer' flaked
calendar-dependently for this reason.
Prefer median inter-payment interval: quarterly = ~91d median gap,
regardless of where the trailing-12-month window happens to bisect the
payment series. Falls back to the old count when <2 entries exist.
Also documents the CAGR filter invariant in the test helper.
* fix(analyze-stock): suppress frequency when no recent divs + detect regime slowdowns
Addresses PR #3102 review:
1. Suspended programs no longer leak a frequency badge. When recentDivs
is empty, dividendYield and trailingAnnualDividendRate are both 0;
emitting 'Quarterly' derived from historical median would contradict
those zeros in the UI. paymentsPerYear now short-circuits to 0 before
the interval classifier runs.
2. Whole-history median-gap no longer masks cadence regime changes. The
reconciliation now depends on trailing-year count:
recent >= 3 → interval classifier (robust to calendar drift)
recent 1..2 → inspect most-recent inter-payment gap:
> 180d = real slowdown, trust count (Annual)
<= 180d = calendar drift, trust interval (Quarterly)
recent 0 → empty frequency (suspended)
The interval classifier itself is now scoped to the last 2 years so
it responds to regime changes instead of averaging over 5y of history.
Regression tests:
- 'emits empty frequency when the dividend program has been suspended' —
3y of quarterly history + 18mo silence must report '' not 'Quarterly'.
- 'detects a recent quarterly → annual cadence change' — 12 historical
quarterly payments + 1 recent annual payment must report 'Annual'.
* fix(analyze-stock): scope interval median to trailing year when recent>=3
Addresses PR #3102 review round 2: the reconciler's recent>=3 branch
called paymentsPerYearFromInterval(entries), which scopes to the last
2 years. A monthly→quarterly shift (12 monthly payments in year -2..-1
plus 4 quarterly in year -1..0) produced a 2-year median of ~30d and
misclassified as Monthly even though the current trailing-year cadence
is clearly quarterly.
Pass recentDivs directly to the interval classifier when recent>=3.
Two payments in the trailing year = 1 gap which suffices for the median
(gap count >=1, median well-defined). The historical-window 2y scoping
still applies for the recent 1..2 branch, where we actively need
history to distinguish drift from slowdown.
Regression test: 12 monthly payments from -13..-24 months ago + 4
quarterly payments inside the trailing year must classify as Quarterly.
* fix(analyze-stock): use true median (avg of two middles) for even gap counts
PR #3102 P2: gaps[floor(length/2)] returns the upper-middle value for
even-length arrays, biasing toward slower cadence at classifier
thresholds when the trailing-year sample is small. Use the average of
the two middles for even lengths. Harmless on 5-year histories with
50+ gaps where values cluster, but correct at sparse sample sizes where
the trailing-year branch can have only 2–3 gaps. |
||
|
|
1346946f15 |
fix(middleware): allow /api/seed-contract-probe to bypass bot UA filter (#3099)
Vercel log showed 'Middleware 403 Forbidden' on /api/seed-contract-probe
for both curl-from-ops and UptimeRobot requests. middleware.ts's BOT_UA
regex matches 'curl/' and 'bot', so any monitoring/probe UA was blocked
before reaching the handler — even though the probe has its own
RELAY_SHARED_SECRET auth that makes the UA check redundant.
Added /api/seed-contract-probe to PUBLIC_API_PATHS (joining /api/version
and /api/health). Safe: the endpoint enforces x-probe-secret matching
RELAY_SHARED_SECRET internally; bypassing the generic UA gate does not
reduce security.
Commented the allowlist to spell out the invariant: entries must carry
their own auth, because this list disables the middleware's generic bot
gate.
Verified via Vercel Inspector log trace:
Firewall: bypass → OK
Middleware: 403 Forbidden ← this commit fixes it
Handler: (unreachable before fix) |
||
|
|
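The allowlist invariant reads roughly like this as code. This is an illustrative sketch, not the app's actual middleware.ts; the paths and regex are taken from the commit text, the function shape is assumed.

```javascript
// Illustrative sketch of the middleware gate described above.
// Invariant: every PUBLIC_API_PATHS entry MUST enforce its own auth,
// because inclusion here disables the generic bot-UA gate.
const PUBLIC_API_PATHS = new Set([
  '/api/version',
  '/api/health',
  '/api/seed-contract-probe', // auth: x-probe-secret === RELAY_SHARED_SECRET
]);

const BOT_UA = /curl\/|bot|spider/i; // assumed pattern; real regex is in middleware.ts

function gate(pathname, ua) {
  if (PUBLIC_API_PATHS.has(pathname)) return 200; // handler does its own auth
  if (!ua || ua.length < 10 || BOT_UA.test(ua)) return 403;
  return 200;
}
```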
044598346e |
feat(seed-contract): PR 2a — runSeed envelope dual-write + 91 seeders migrated (#3097)
* feat(seed-contract): PR 2a — runSeed envelope dual-write + 91 seeders migrated
Opt-in contract path in runSeed: when opts.declareRecords is provided, write
{_seed, data} envelope to the canonical key alongside legacy seed-meta:*
(dual-write). State machine: OK / OK_ZERO / RETRY with zeroIsValid opt.
declareRecords throws or returns non-integer → hard fail (contract violation).
extraKeys[*] support per-key declareRecords; each extra key writes its own
envelope. Legacy seeders (no declareRecords) entirely unchanged.
Migrated all 91 scripts/seed-*.mjs to contract mode. Each exports
declareRecords returning the canonical record count, and passes
schemaVersion: 1 + maxStaleMin (matched to api/health.js SEED_META, or 2.5x
interval where no registry entry exists). Contract conformance reports 84/86
seeders with full descriptor (2 pre-existing warnings).
Legacy seed-meta keys still written so unmigrated readers keep working;
follow-up slices flip health.js + readers to envelope-first.
Tests: 61/61 PR 1 tests still pass.
Next slices for PR 2:
- api/health.js registry collapse + 15 seed-bundle-*.mjs canonicalKey wiring
- reader migration (mcp, resilience, aviation, displacement, regional-snapshot)
- direct writers — ais-relay.cjs, consumer-prices-core publish.ts
- public-boundary stripSeedEnvelope + test migration
Plan: docs/plans/2026-04-14-002-fix-runseed-zero-record-lockout-plan.md
* fix(seed-contract): unwrap envelopes in internal cross-seed readers
After PR 2a enveloped 91 canonical keys as {_seed, data}, every script-side
reader that returned the raw parsed JSON started silently handing callers the
envelope instead of the bare payload. WoW baselines (bigmac, grocery-basket,
fear-greed) saw undefined .countries / .composite; seed-climate-anomalies saw
undefined .normals from climate:zone-normals:v1; seed-thermal-escalation saw
undefined .fireDetections from wildfire:fires:v1; seed-forecasts' ~40-key
pipeline batch returned envelopes for every input.
Fix: route every script-side reader through unwrapEnvelope(...).data. Legacy
bare-shape values pass through unchanged (unwrapEnvelope returns
{_seed: null, data: raw} for any non-envelope shape).
Changed:
- scripts/_seed-utils.mjs: import unwrapEnvelope; redisGet, readSeedSnapshot,
verifySeedKey all unwrap. Exported new readCanonicalValue() helper for
cross-seed consumers.
- 18 seed-*.mjs scripts with local redisGet-style helpers or inline fetch
patched to unwrap via the envelope source module (subagent sweep).
- scripts/seed-forecasts.mjs pipeline batch: parse() unwraps each result.
- scripts/seed-energy-spine.mjs redisMget: unwraps each result.
Tests:
- tests/seed-utils-envelope-reads.test.mjs: 7 new cases covering envelope
+ legacy + null paths for readSeedSnapshot and verifySeedKey.
- Full seed suite: 67/67 pass (was 61, +6 new).
Addresses both of user's P1 findings on PR #3097.
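The unwrap semantics these reader fixes rely on can be sketched as below. The real source of truth is scripts/_seed-envelope-source.mjs (mirrored in api/_seed-envelope.js and server/_shared/seed-envelope.ts); this sketch only illustrates the documented behavior: contract-mode envelopes unwrap, everything else passes through as `{ _seed: null, data: raw }`.

```javascript
// Illustrative sketch of unwrapEnvelope's documented semantics; see
// scripts/_seed-envelope-source.mjs for the real implementation.
function unwrapEnvelope(raw) {
  if (
    raw !== null &&
    typeof raw === 'object' &&
    !Array.isArray(raw) &&   // arrays with numeric indices are NOT envelopes
    '_seed' in raw &&
    'data' in raw
  ) {
    return { _seed: raw._seed, data: raw.data };
  }
  // Legacy bare shapes, null, arrays: pass through unchanged.
  return { _seed: null, data: raw };
}
```

Callers then read `.data` unconditionally, which is why routing every script-side reader through one helper fixes ~60 call sites at once.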
* feat(seed-contract): envelope-aware reads in server + api helpers
Every RPC and public-boundary reader now automatically strips _seed from
contract-mode canonical keys. Legacy bare-shape values pass through unchanged
(unwrapEnvelope no-ops on non-envelope shapes).
Changed helpers (one-place fix — unblocks ~60 call sites):
- server/_shared/redis.ts: getRawJson, getCachedJson, getCachedJsonBatch
unwrap by default. cachedFetchJson inherits via getCachedJson.
- api/_upstash-json.js: readJsonFromUpstash unwraps (covers api/mcp.ts
tool responses + all its canonical-key reads).
- api/bootstrap.js: getCachedJsonBatch unwraps (public-boundary —
clients never see envelope metadata).
Left intentionally unchanged:
- api/health.js / api/seed-health.js: read only seed-meta:* keys which
remain bare-shape during dual-write. unwrapEnvelope already imported at
the meta-read boundary (PR 1) as a defensive no-op.
Tests: 67/67 seed tests pass. typecheck + typecheck:api clean.
This is the blast-radius fix the PR #3097 review called out — external
readers that would otherwise see {_seed, data} after the writer side
migrated.
* fix(test): strip export keyword in vm.runInContext'd seed source
cross-source-signals-regulatory.test.mjs loads scripts/seed-cross-source-signals.mjs
via vm.runInContext, which cannot parse ESM `export` syntax. PR 2a added
`export function declareRecords` to every seeder, which broke this test's
static-analysis approach.
Fix: strip the `export` keyword from the declareRecords line in the
preprocessed source string so the function body still evaluates as a plain
declaration.
Full test:data suite: 5307/5307 pass. typecheck + typecheck:api clean.
* feat(seed-contract): consumer-prices publish.ts writes envelopes
Wrap the 5 canonical keys written by consumer-prices-core/src/jobs/publish.ts
(overview, movers:7d/30d, freshness, categories:7d/30d/90d, retailer-spread,
basket-series) in {_seed, data} envelopes. Legacy seed-meta:<key> writes
preserved for dual-write.
Inlined a buildEnvelope helper (10 lines) rather than taking a cross-package
dependency — consumer-prices-core is a standalone npm package. Documented the
four-file parity contract (mjs source, ts mirror, js edge mirror, this copy).
Contract fields: sourceVersion='consumer-prices-core-publish-v1', schemaVersion=1,
state='OK' (recordCount>0) or 'OK_ZERO' (legitimate zero).
Typecheck: no new errors in publish.ts.
* fix(seed-contract): 3 more server-side readers unwrap envelopes
Found during final audit:
- server/worldmonitor/resilience/v1/_shared.ts: resilience score reader
parsed cached GetResilienceScoreResponse raw. Contract-mode seed-resilience-scores
now envelopes those keys.
- server/worldmonitor/resilience/v1/get-resilience-ranking.ts: p05/p95
interval lookup parsed raw from seed-resilience-scores' extra-key path.
- server/worldmonitor/infrastructure/v1/_shared.ts: mgetJson() used for
count-source keys (wildfire:fires:v1, news:insights:v1) which are both
contract-mode now.
All three now unwrap via server/_shared/seed-envelope. Legacy shapes pass
through unchanged.
Typecheck clean.
* feat(seed-contract): ais-relay.cjs direct writes produce envelopes
32 canonical-key write sites in scripts/ais-relay.cjs now produce {_seed, data}
envelopes. Inlined buildEnvelope() (CJS module can't require ESM source) +
envelopeWrite(key, data, ttlSeconds, meta) wrapper. Enveloped keys span market
bootstrap, aviation, cyber-threats, theater-posture, weather-alerts, economic
spending/fred/worldbank, tech-events, corridor-risk, usni-fleet, shipping-stress,
social:reddit, wsb-tickers, pizzint, product-catalog, chokepoint transits,
ucdp-events, satellites, oref.
Left bare (not seeded data keys): seed-meta:* (dual-write legacy),
classifyCacheKey LLM cache, notam:prev-closed-state internal state,
wm:notif:scan-dedup flags.
Updated tests/ucdp-seed-resilience.test.mjs regex to accept both upstashSet
(pre-contract) and envelopeWrite (post-contract) call patterns.
* feat(seed-contract): 15 bundle files add canonicalKey for envelope gate
54 bundle sections across 12 files now declare canonicalKey alongside the
existing seedMetaKey. _bundle-runner.mjs (from PR 1) prefers canonicalKey
when both are present — gates section runs on envelope._seed.fetchedAt
read directly from the data key, eliminating the meta-outlives-data class
of bugs.
Files touched:
- climate (5), derived-signals (2), ecb-eu (3), energy-sources (6),
health (2), imf-extended (4), macro (10), market-backup (9),
portwatch (4), relay-backup (2), resilience-recovery (5), static-ref (2)
Skipped (14 sections, 3 whole bundles): multi-key writers, dynamic
templated keys (displacement year-scoped), or non-runSeed orchestrators
(regional brief cron, resilience-scores' 222-country publish, validation/
benchmark scripts). These continue to use seedMetaKey or their own gate.
seedMetaKey preserved everywhere — dual-write. _bundle-runner.mjs falls
back to legacy when canonicalKey is absent.
All 15 bundles pass node --check. test:data: 5307/5307. typecheck:all: clean.
* fix(seed-contract): 4 PR #3097 review P1s — transform/declareRecords mismatches + envelope leaks
Addresses both P1 findings and the extra-key seed-meta leak surfaced in review:
1. runSeed helper-level invariant: seed-meta:* keys NEVER envelope.
scripts/_seed-utils.mjs exports shouldEnvelopeKey(key) — returns false for
any key starting with 'seed-meta:'. Both atomicPublish (canonical) and
writeExtraKey (extras) gate the envelope wrap through this helper. Fixes
seed-iea-oil-stocks' ANALYSIS_META_EXTRA_KEY silently getting enveloped,
which broke health.js parsing the value as bare {fetchedAt, recordCount}.
Also defends against any future manual writeExtraKey(..., envelopeMeta)
call that happens to target a seed-meta:* key.
2. seed-token-panels canonical + extras fixed.
publishTransform returns data.defi (the defi panel itself, shape {tokens}).
Old declareRecords counted data.defi.tokens + data.ai.tokens + data.other.tokens
on the transformed payload → 0 → RETRY path → canonical market:defi-tokens:v1
never wrote, and because runSeed returned before the extraKeys loop,
market:ai-tokens:v1 + market:other-tokens:v1 stayed stale too.
New: declareRecords counts data.tokens on the transformed shape. AI_KEY +
OTHER_KEY extras reuse the same function (transforms return structurally
identical panels). Added isMain guard so test imports don't fire runSeed.
3. api/product-catalog.js cached reader unwraps envelope.
ais-relay.cjs now envelopes product-catalog:v2 via envelopeWrite(). The
edge reader did raw JSON.parse(result) and returned {_seed, data} to
clients, breaking the cached path. Fix: import unwrapEnvelope from
./_seed-envelope.js, apply after JSON.parse. One site — :238-241 is
downstream of getFromCache(), so the single reader fix covers both.
4. Regression lock tests/seed-contract-transform-regressions.test.mjs (11 cases):
- shouldEnvelopeKey invariant: seed-meta:* false, canonical true
- Token-panels declareRecords works on transformed shape (canonical + both extras)
- Explicit repro of pre-fix buggy signature returning 0 — guards against revert
- resolveRecordCount accepts 0, rejects non-integer
- Product-catalog envelope unwrap returns bare shape; legacy passes through
Verification:
- npm run test:data → 5318/5318 pass (was 5307 — 11 new regressions)
- npm run typecheck:all → clean
- node --check on every modified script
iea-oil-stocks canonical declareRecords was NOT broken (user confirmed during
review — buildIndex preserves .members); only its ANALYSIS_META_EXTRA_KEY
was affected, now covered generically by commit 1's helper invariant.
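Item 1's helper-level invariant, restated as code (illustrative; the real helper lives in scripts/_seed-utils.mjs):

```javascript
// Sketch of the shouldEnvelopeKey invariant from item 1 above.
// seed-meta:* values stay bare during dual-write, because health.js
// parses them as plain { fetchedAt, recordCount } objects.
function shouldEnvelopeKey(key) {
  return typeof key === 'string' && !key.startsWith('seed-meta:');
}
```

Both the canonical publish path and writeExtraKey gate the envelope wrap through this one predicate, so a future extra-key write targeting a seed-meta:* key cannot regress.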
* fix(seed-contract): seed-token-panels validateFn also runs on post-transform shape
Review finding: fixing declareRecords wasn't sufficient — atomicPublish() runs
validateFn(publishData) on the transformed payload too. seed-token-panels'
validate() checked data.defi/.ai/.other on the transformed {tokens} shape,
returned false, and runSeed took the early skipped-write branch (before even
reaching the declareRecords RETRY logic). Net effect: same as before the
declareRecords fix — canonical + both extras stayed stale.
Fix: validate() now checks the canonical defi panel directly (Array.isArray
(data?.tokens) && has at least one t.price > 0). AI/OTHER panels are validated
implicitly by their own extraKey declareRecords on write.
Audited the other 9 seeders with publishTransform (bls-series, bis-extended,
bis-data, gdelt-intel, trade-flows, iea-oil-stocks, jodi-gas, sanctions-pressure,
forecasts): all validateFn's correctly target the post-transform shape. Only
token-panels regressed.
Added 4 regression tests (tests/seed-contract-transform-regressions.test.mjs):
- validate accepts transformed panel with priced tokens
- validate rejects all-zero-price tokens
- validate rejects empty/missing tokens
- Explicit pre-fix repro (buggy old signature fails on transformed shape)
Verification:
- npm run test:data → 5322/5322 pass (was 5318; +4 new)
- npm run typecheck:all → clean
- node --check clean
* feat(seed-contract): add /api/seed-contract-probe validation endpoint
Single machine-readable gate for 'is PR #3097 working in production'.
Replaces the curl/jq ritual with one authenticated edge call that returns
HTTP 200 ok:true or 503 + failing check list.
What it validates:
- 8 canonical keys have {_seed, data} envelopes with required data fields
and minRecords floors (fsi-eu, zone-normals, 3 token panels + minRecords
guard against token-panels RETRY regression, product-catalog, wildfire,
earthquakes).
- 2 seed-meta:* keys remain BARE (shouldEnvelopeKey invariant; guards
against iea-oil-stocks ANALYSIS_META_EXTRA_KEY-class regressions).
- /api/product-catalog + /api/bootstrap responses contain no '_seed' leak.
Auth: x-probe-secret header must match RELAY_SHARED_SECRET (reuses existing
Vercel↔Railway internal trust boundary).
Probe logic is exported (checkProbe, checkPublicBoundary, DEFAULT_PROBES) for
hermetic testing. tests/seed-contract-probe.test.mjs covers every branch:
envelope pass/fail on field/records/shape, bare pass/fail on shape/field,
missing/malformed JSON, Redis non-2xx, boundary seed-leak detection,
DEFAULT_PROBES sanity (seed-meta invariant present, token-panels minRecords
guard present).
Usage:
curl -H "x-probe-secret: $RELAY_SHARED_SECRET" \
https://api.worldmonitor.app/api/seed-contract-probe
PR 3 will extend the probe with a stricter mode that asserts seed-meta:*
keys are GONE (not just bare) once legacy dual-write is removed.
Verification:
- tests/seed-contract-probe.test.mjs → 15/15 pass
- npm run test:data → 5338/5338 (was 5322; +16 new incl. conformance)
- npm run typecheck:all → clean
* fix(seed-contract): tighten probe — minRecords on AI/OTHER + cache-path source header
Review P2 findings: the probe's stated guards were weaker than advertised.
1. market:ai-tokens:v1 + market:other-tokens:v1 probes claimed to guard the
token-panels extra-key RETRY regression but only checked shape='envelope'
+ dataHas:['tokens']. If an extra-key declareRecords regressed to 0, both
probes would still pass because checkProbe() only inspects _seed.recordCount
when minRecords is set. Now both enforce minRecords: 1.
2. /api/product-catalog boundary check only asserted no '_seed' leak — which
is also true for the static fallback path. A broken cached reader
(getFromCache returning null or throwing) could serve fallback silently
and still pass this probe. Now:
- api/product-catalog.js emits X-Product-Catalog-Source: cache|dodo|fallback
on the response (the json() helper gained an optional source param wired
to each of the three branches).
- checkPublicBoundary declaratively requires that header's value match
'cache' for /api/product-catalog, so a fallback-serve fails the probe
with reason 'source:fallback!=cache' or 'source:missing!=cache'.
Test updates (tests/seed-contract-probe.test.mjs):
- Boundary check reworked to use a BOUNDARY_CHECKS config with optional
requireSourceHeader per endpoint.
- New cases: served-from-cache passes, served-from-fallback fails with source
mismatch, missing header fails, seed-leak still takes precedence, bad
status fails.
- Token-panels sanity test now asserts minRecords≥1 on all 3 panels.
Verification:
- tests/seed-contract-probe.test.mjs → 17/17 pass (was 15, +2 net)
- npm run test:data → 5340/5340
- npm run typecheck:all → clean
|
||
|
|
224d6fa2e3 |
fix(consumer-prices): count risers+fallers in movers recordCount (#3098)
* fix(consumer-prices): count risers+fallers in movers recordCount
Health endpoint reported consumerPricesMovers as EMPTY_DATA whenever the
30d window had zero risers, because recordCount's `??` chain in
publish.ts picks only one sibling array. Bipolar payloads (risers[] +
fallers[]) need the sum; otherwise a valid all-fallers payload registers
as 0 records and trips false staleness alarms. Fix both the
authoritative publish job and the manual fallback seed script.
* fix(consumer-prices): floor movers recordCount at 1 + include essentialsSeries fallback
Addresses PR #3098 review:
1. All-flat markets (every sampled item unchanged) legitimately produce
risers=[] AND fallers=[] from buildMoversSnapshot. Summing the two
still yields 0 → health reports EMPTY_DATA for a valid snapshot. Floor
at 1; advanceSeedMeta already gates writes on upstream freshness, so
this can't mask an upstream-unavailable case.
2. Seed script's non-movers fallback was missing essentialsSeries, so
basket-series payloads from the manual script reported recordCount=1
instead of the series length. Align with publish.ts.
* fix(consumer-prices): force recordCount=0 for upstreamUnavailable placeholders
Addresses PR #3098 review: flooring movers at 1 in the manual fallback
seeder also floored the synthetic emptyMovers() placeholder
(upstreamUnavailable=true) the script writes when BASE_URL is unset or
the upstream returns null. Since writeExtraKeyWithMeta always persists
seed-meta, that made a real outage read green in api/health.js.
Short-circuit upstreamUnavailable payloads to 0 so the outage surfaces. |
||
|
|
dc10e47197 |
feat(seed-contract): PR 1 foundation — envelope + contract + conformance test (#3095)
* feat(seed-contract): PR 1 foundation — envelope helpers + contract validators + static conformance test
Adds the foundational pieces for the unified seed contract rollout described in
docs/plans/2026-04-14-002-fix-runseed-zero-record-lockout-plan.md. Behavior-
preserving by construction: legacy-shape Redis values unwrap as { _seed: null,
data: raw } and pass through every helper unchanged.
New files:
- scripts/_seed-envelope-source.mjs — single source of truth for unwrapEnvelope,
stripSeedEnvelope, buildEnvelope.
- api/_seed-envelope.js — edge-safe mirror (AGENTS.md:80 forbids api/* importing
from server/).
- server/_shared/seed-envelope.ts — TS mirror with SeedMeta, SeedEnvelope,
UnwrapResult types.
- scripts/_seed-contract.mjs — SeedContractError + validateDescriptor (10
required fields, 10 optional, unknown-field rejection) + resolveRecordCount
(non-negative integer or throw).
- scripts/verify-seed-envelope-parity.mjs — diffs function bodies between the
two JS copies; TS copy guarded by tsc.
- tests/seed-envelope.test.mjs — 14 tests for the three helpers (null,
legacy-passthrough, stringified JSON, round-trip).
- tests/seed-contract.test.mjs — 25 tests for validateDescriptor/
resolveRecordCount + a soft-warn conformance scan that STATICALLY parses
scripts/seed-*.mjs (never dynamic import — several seeders process.exit() at
module load). Currently logs 91 seeders awaiting declareRecords migration.
Wiring (minimal, behavior-preserving):
- api/health.js: imports unwrapEnvelope; routes readSeedMeta's parsed value
through it. Legacy meta has no _seed wrapper → passes through unchanged.
- scripts/_bundle-runner.mjs: readSectionFreshness prefers envelope at
section.canonicalKey when present, falls back to the existing
seed-meta:<key> read via section.seedMetaKey (unchanged path today since no
bundle defines canonicalKey yet).
No seeder modified. No writes changed. All 5279 existing data tests still
green; both typechecks clean; parity verifier green; 39 new tests pass.
PR 2 will migrate seeders, bundles, and readers to envelope semantics. PR 3
removes the legacy path and hard-fails the conformance test.
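The envelope semantics described above can be sketched in a few lines. This is a reconstruction of the contract only, not the actual scripts/_seed-envelope-source.mjs code; the precise shape test there may differ:

```javascript
// Sketch of the unwrap contract: a plain-object {_seed, data} wrapper
// unwraps; null, arrays, and every legacy raw shape pass through unchanged
// as { _seed: null, data: raw }. The Array.isArray guard keeps arrays from
// being mistaken for envelopes.
function unwrapEnvelope(raw) {
  const isEnvelope =
    raw !== null &&
    typeof raw === 'object' &&
    !Array.isArray(raw) &&
    '_seed' in raw &&
    'data' in raw;
  if (isEnvelope) return { _seed: raw._seed, data: raw.data };
  return { _seed: null, data: raw }; // legacy passthrough: behavior-preserving
}
```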
* fix(seed-contract): address PR #3095 review — metaTtlSeconds opt, bundle fallback, strict conformance mode
Review findings applied:
P1 — metaTtlSeconds missing from OPTIONAL_FIELDS whitelist.
scripts/seed-jodi-gas.mjs:250 passes metaTtlSeconds to runSeed(); field is
consumed by _seed-utils writeSeedMeta. Without it in the whitelist, PR 2's
validateDescriptor wiring would throw 'unknown field' the moment jodi-gas
migrates. Added with a 'removed in PR 3' note.
P2 — Bundle canonicalKey short-circuit over-runs during migration.
readSectionFreshness previously returned null if canonicalKey had no envelope
yet, even when a legacy seed-meta key was also declared — making every cron
re-run the section. Fixed to fall through to seedMetaKey on null envelope so
the transition state is safe.
P3 — Conformance soft-warn signal was invisible in CI.
tests/seed-contract.test.mjs now emits a t.diagnostic summary line
('N/M seeders export declareRecords') visible on every run and gates hard-fail
behind SEED_CONTRACT_STRICT=1 so PR 3 can flip to strict without more code.
Nitpick — parity regex missed 'export async function'.
Added '(?:async\s+)?' to scripts/verify-seed-envelope-parity.mjs function
extraction regex.
Verified: 39 tests green, parity verifier green, strict mode correctly
hard-fails with 91 seeders missing (expected during PR 1).
* fix(seed-contract): address review round 2 — NaN/empty-string validation, Error cause, parity CI wiring
P2 — Non-finite ttlSeconds/maxStaleMin bypassed validation.
`typeof NaN === 'number'` and `NaN > 0 === false` meant a NaN duration
passed the old typeof+<=0 checks and would have poisoned TTLs once
validateDescriptor is wired into runSeed. Now gated by Number.isFinite,
which rejects NaN and ±Infinity. Tests added for NaN/Infinity on both
fields.
P2 — Empty/whitespace-only strings for domain/resource/canonicalKey/sourceVersion
bypassed validation. Added .trim() === '' rejection + tests per field.
This mattered because canonicalKey='' would have landed writes at the
empty key and seed-meta under a blank resource namespace.
P3 — SeedContractError silently dropped the Error v2 cause option.
Constructor now forwards { cause } through super() so err.cause works
with standard tooling (Node's default stack printer, Sentry chained-cause
serialization). resolveRecordCount's manual err.cause = err assignment
was replaced with the options-bag form. Test added for both constructor
direct-use and the resolveRecordCount wrap path.
P3 — Parity verifier was not on an automated path. Added
tests/seed-envelope-parity.test.mjs which spawns scripts/verify-seed-envelope-parity.mjs
via execFile; non-zero exit (drift) → test fails. Now runs as part of
`npm run test:data` (tsx --test tests/*.test.mjs). Drift injection
confirmed: sed -i modifying api/_seed-envelope.js makes the test fail
with 'Command failed' from execFile.
51 tests total (was 39). All green on clean tree.
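The two validation rules and the cause forwarding above can be illustrated with a reduced sketch; field names follow the commit message, but the real validateDescriptor checks many more fields:

```javascript
// Reduced sketch of the review fixes: Number.isFinite rejects NaN and
// +/-Infinity where a bare typeof+<=0 check did not, blank identifier
// strings are rejected after trim(), and SeedContractError forwards the
// { cause } options bag through super().
class SeedContractError extends Error {
  constructor(message, options) {
    super(message, options); // forwards { cause } so err.cause works with standard tooling
    this.name = 'SeedContractError';
  }
}
function checkDuration(field, value) {
  if (!Number.isFinite(value) || value <= 0) {
    throw new SeedContractError(`${field} must be a finite positive number`);
  }
}
function checkIdentifier(field, value) {
  if (typeof value !== 'string' || value.trim() === '') {
    throw new SeedContractError(`${field} must be a non-blank string`);
  }
}
```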
* fix(seed-contract): conformance test checks full descriptor, not just declareRecords
Previous conformance check green-lit any seeder that exported
declareRecords, even if the runSeed(...) call-site omitted other
validateDescriptor-required opts (validateFn, ttlSeconds, sourceVersion,
schemaVersion, maxStaleMin). That would have produced a false readiness
signal for PR 3's strict flip: test goes green, but wiring
validateDescriptor() into runSeed in PR 2 would still throw at runtime
across the fleet.
Examples verified on the PR head:
- scripts/seed-cot.mjs:188-192 — no sourceVersion/schemaVersion/maxStaleMin
- scripts/seed-market-breadth.mjs:121-124 — same
- scripts/seed-jodi-gas.mjs:248-253 — no schemaVersion/maxStaleMin
Now the conformance test:
1. AST-lite extracts the runSeed(...) call site with balanced parens,
tolerating strings and comments.
2. Checks every REQUIRED_OPTS_FIELDS entry (validateFn, declareRecords,
ttlSeconds, sourceVersion, schemaVersion, maxStaleMin) is present as
an object key in that call-site.
3. Emits a per-file diagnostic listing missing fields.
4. Migration signal is now accurate: 0/91 seeders fully satisfy the
descriptor (was claiming 0/91 missing just declareRecords). Matches
the underlying validateDescriptor behavior.
Verified: strict mode (SEED_CONTRACT_STRICT=1) surfaces 'opt:schemaVersion,
opt:maxStaleMin' as missing fields per seeder — actionable for PR 2
migration work. 51 tests total (unchanged count; behavior change is in
which seeders the one conformance test considers migrated).
* fix(seed-contract): strip comments/strings before parsing runSeed() call site
The conformance scanner located the first 'runSeed(' substring in the raw
source, which caught commented-out mentions upstream of the real call.
Offending files where this produced false 'incomplete' diagnoses:
- scripts/seed-bis-data.mjs:209 // runSeed() calls process.exit(0)…
real call at :220
- scripts/seed-economy.mjs:788 header comment mentioning runSeed()
real call at :891
Three files had the same pattern. Under strict mode these would have been
false hard failures in PR 3 even when the real descriptor was migrated.
Fix:
- stripCommentsAndStrings(src) produces a view where block comments, line
comments, and string/template literals are replaced with spaces (line
feeds preserved). Indices stay aligned with the original source so
extractRunSeedCall can match against the stripped view and then slice
the original source for the real call body.
- descriptorFieldsPresent() also runs its field-presence regex against
the stripped call body so '// TODO: validateFn' inside the call doesn't
fool the check.
- hasRunSeedCall() uses the stripped view too, which correctly excludes
5 seeders that only mentioned runSeed in comments. Count dropped
91→86 real callers.
Added 4 targeted tests covering:
- runSeed() inside a line comment ahead of the real call
- runSeed() inside a block comment
- runSeed() inside a string literal ("don't call runSeed() directly")
- descriptor field names inside an inline comment don't count as present
Verified on the actual files: seed-bis-data.mjs first real runSeed( in
stripped source is at line 220 (was line 209 before fix).
40 tests total, all green.
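An index-preserving stripper of the kind described above can be sketched as follows. This is a simplified reconstruction (no template-literal ${} handling), not the actual conformance-test code:

```javascript
// Index-preserving comment/string stripper: comment and string contents are
// replaced with spaces (newlines kept), so offsets in the stripped view line
// up with the original source and a match there can slice the real text.
function stripCommentsAndStrings(src) {
  let out = '';
  let i = 0;
  while (i < src.length) {
    const c = src[i];
    const two = src.slice(i, i + 2);
    if (two === '//') {                 // line comment: spaces until newline
      while (i < src.length && src[i] !== '\n') { out += ' '; i++; }
    } else if (two === '/*') {          // block comment: spaces, newlines kept
      out += '  '; i += 2;
      while (i < src.length && src.slice(i, i + 2) !== '*/') {
        out += src[i] === '\n' ? '\n' : ' '; i++;
      }
      if (i < src.length) { out += '  '; i += 2; }
    } else if (c === '"' || c === "'" || c === '`') {  // string/template literal
      out += c; i++;
      while (i < src.length && src[i] !== c) {
        if (src[i] === '\\') { out += '  '; i += 2; continue; } // honor escapes
        out += src[i] === '\n' ? '\n' : ' '; i++;
      }
      if (i < src.length) { out += c; i++; }
    } else {
      out += c; i++;
    }
  }
  return out;
}
```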
* fix(seed-contract): parity verifier survives unbalanced braces in string/template literals
Addresses Greptile P2 on PR #3095: the body extractor in
scripts/verify-seed-envelope-parity.mjs counted raw { and } on every
character. A future helper body that legitimately contains
`const marker = '{'` would have pushed depth past zero at the literal
brace and truncated the body — silently masking drift in the rest of
the function.
Extracted the scan into scanBalanced(source, start, open, close) which
skips characters inside line comments, block comments, and string /
template literals (with escape handling and template-literal ${} recursion
for interpolation). Call sites in extractFunctions updated to use the new
scanner for both the arg-list parens and the function body braces.
Made extractFunctions and scanBalanced exported so the new test file
can exercise them directly. Gated main() behind an isMain check so
importing the module from tests doesn't trigger process.exit.
New tests in tests/seed-envelope-parity.test.mjs:
- extractFunctions tolerates unbalanced braces in string literals
- same for template literals
- same for braces inside block comments
- same for braces inside line comments
- scanBalanced respects backslash-escapes inside strings
- scanBalanced recurses into template-literal ${} interpolation
Also addresses the other two Greptile P2s which were already fixed in
earlier commits on this branch:
- Empty-string gap (
|
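A reduced sketch of a balanced scanner in the spirit of the scanBalanced fix above; comment skipping and template-literal ${} recursion from the real verifier are elided here, and all names are illustrative:

```javascript
// Scans from `start` (index of the opening delimiter) and returns the index
// of the matching close, ignoring delimiters inside string literals so an
// unbalanced brace in a string no longer truncates the extracted body.
function scanBalanced(source, start, open, close) {
  let depth = 0;
  for (let i = start; i < source.length; i++) {
    const c = source[i];
    if (c === '"' || c === "'" || c === '`') {        // skip string contents
      for (i++; i < source.length && source[i] !== c; i++) {
        if (source[i] === '\\') i++;                  // honor backslash escapes
      }
    } else if (c === open) depth++;
    else if (c === close && --depth === 0) return i;
  }
  return -1;                                          // unbalanced input
}
```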
||
|
|
e39ffc3c3b |
feat(analytics): send sub status & planKey with Umami identity (#3093)
* feat(analytics): send subscription status and planKey with Umami identity (#3092)
Enhance identifyUser() to include subStatus and planKey from billing data.
initAuthAnalytics() now subscribes to both auth and billing changes,
re-identifying when subscription data arrives via Convex. Also identify all
authenticated users, not just pro.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: store billing unsubscribe + add destroyAuthAnalytics teardown
Address Greptile P2 review: store the billing unsubscribe for symmetry with
_unsubAuth, and add destroyAuthAnalytics() so both listeners can be properly
cleaned up.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(analytics): reset _lastSub on sign-out to prevent cross-user leak
When user A signs out and user B signs in, the auth callback fires first
with user B's id while _lastSub still holds A's subscription. identifyUser
would then send (userB.id, userB.role, userA.status, userA.planKey) to Umami
until billing's Convex update landed. Clear _lastSub in _syncIdentity's
signed-out branch so the next sign-in starts clean — subStatus/planKey are
simply omitted (both guards use != null) until billing delivers the new sub.
* fix(analytics): reset _lastSub on direct user switch, not just sign-out
App.ts:857-864 supports a direct user A -> user B account switch without an
intermediate null auth state. The previous fix only cleared _lastSub inside
_syncIdentity()'s sign-out branch, so on a direct switch the auth subscriber
fired with user B while _lastSub still held user A's subscription, leaking
A's status/planKey into B's Umami identify call. Detect user-id change in
the auth subscriber and drop _lastSub before calling _syncIdentity().
Sign-out reset inside _syncIdentity is kept as belt-and-suspenders for the
no-auth path.
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Elie Habib <elie.habib@gmail.com>
|
||
|
|
9b180d6ee2 |
fix(bundle-runner): wall-time budget to prevent Railway 10min SIGKILL (#3094)
* fix(bundle-runner): enforce wall-time budget to prevent Railway 10min SIGKILL
Railway cron services SIGKILL the container at 10min. When a bundle
happened to have two heavy sections due in the same tick (e.g.
PW-Main + PW-Port-Activity with timeoutMs totaling 15min+), the second
section's stdout never flushed and Railway marked the run as crashed —
even though earlier sections published successfully.
- _bundle-runner.mjs: add maxBundleMs budget (default 9min, 60s headroom
under Railway's 10min ceiling). Sections whose worst-case timeout would
exceed the remaining budget are deferred to the next tick with a clear
log line. Summary now reports ran/skipped/deferred/failed.
- seed-bundle-portwatch.mjs: lower PW-Port-Activity timeoutMs 600s→420s
so a single section can no longer consume the entire budget.
Observed on 2026-04-14 16:03 UTC portwatch run: PW-Disruptions +
PW-Main ran cleanly, PW-Port-Activity started with ~9m37s of Railway
budget and its 10min execFile timeout, got SIGKILL'd before any output
flushed, job marked as crash.
* fix(bundle-runner): make maxBundleMs opt-in to avoid deferring other bundles
Greptile PR review flagged P1: default maxBundleMs=540_000 silently
applied to all runBundle callers. At least 12 sections across 7 other
bundles (energy-sources, climate, resilience, resilience-validation,
imf-extended, static-ref, health) have timeoutMs >= 540_000, which
means 0 + 600_000 > 540_000 is true on every first tick — those
sections would be permanently deferred with no alarm.
Default to Infinity; portwatch opts in via { maxBundleMs: 540_000 }.
Other Railway-constrained bundles can opt in as their timeouts are
audited.
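The budgeting rule, deferral semantics, and the opt-in default above can be sketched as a pure planning function. This sketch charges each section its full timeoutMs up front (worst case); the real runner measures actual elapsed wall time, and the section names and timeouts in the usage below are illustrative:

```javascript
// Sketch of the wall-time budget: a section runs only if its worst-case
// timeout still fits in the remaining budget; otherwise it is deferred to
// the next tick. Default Infinity keeps every non-opted-in bundle unchanged.
function planBundle(sections, maxBundleMs = Infinity) {
  let elapsed = 0;
  const ran = [];
  const deferred = [];
  for (const s of sections) {
    if (elapsed + s.timeoutMs > maxBundleMs) {
      deferred.push(s.name); // the real runner logs a clear deferral line here
      continue;
    }
    ran.push(s.name);
    elapsed += s.timeoutMs; // worst case: section consumes its full timeout
  }
  return { ran, deferred };
}
```

With a 540s budget and timeouts of 120s/300s/420s, the first two sections fit and the third is deferred; with the default Infinity budget nothing is deferred.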
|
||
|
|
51c9d2c95f |
fix(seed-forecasts): pipeline timeout 10s→45s + BATCH_SIZE 10→5 (#3090)
* fix(seed-forecasts): pipeline timeout 10s→45s + BATCH_SIZE 10→5
Root cause (validated via STRLEN probe + Railway log): readInputKeys()
batched GETs against Upstash REST /pipeline deterministically timed out
at the 10s budget. ~40 input keys totaling ~2.27 MB; top 5 keys (ucdp
657KB + chokepoints 500KB + cyber-threats 390KB + commodities 192KB +
gpsjam 174KB) = 90% of payload. Worst-case co-located batch at
BATCH_SIZE=10 was ~1.55 MB; at Upstash REST observed slow-spike floor
(~100 KB/s implied by failure pattern), 1.55 MB needs ~16s, exceeding
the 10s budget.
Production proof — Railway log 2026-04-14 10:01 UTC:
Reading input data from Redis...
Retry 1/2 in 1000ms: The operation was aborted due to timeout
... 12 consecutive abort-timeouts (4 outer × ~3 inner) ...
FETCH FAILED: The operation was aborted due to timeout
=== Failed gracefully (188070ms) ===
Fix:
BATCH_SIZE 10 → 5 (reduces probability of tail co-location)
timeout 10s → 45s (2.4× headroom at observed floor)
Round-trips 4 → 8; per-batch overhead ~150-500ms total amortized by
undici keep-alive. Negligible vs hourly cadence.
What this PR does NOT do (5-agent deepen-plan review caught these):
- Does NOT remove input keys. Initial draft proposed dropping 3
"stub" keys. All 3 are LIVE: producers traced to seed-insights,
seed-conflict-intel, and seed-forecasts itself (L14919 — self-
referential EMA windows state key). Zero-byte STRLEN snapshot
caught inter-cycle gaps, not dead keys. Removing reads would
break newsDigest, acledEvents, and EMA windows.
- Does NOT bump api/health.js maxStaleMin. Right fix = make read
succeed, not widen alarm.
- Does NOT extract shared batchedPipelineGet helper. Tracked.
Latent sibling bugs (separate PRs per feedback_no_pr_pollution):
- seed-cross-source-signals.mjs:163 (15s, 23 keys)
- seed-correlation.mjs:26 (10s, 9 market keys)
- seed-energy-spine.mjs:71 (30s, 300 cmds/batch)
- seed-resilience-scores.mjs:73 (30s, BATCH=50 writes)
Plan: docs/plans/2026-04-14-001-fix-seed-forecasts-pipeline-timeout-plan.md
Skill: ~/.claude/skills/upstash-pipeline-payload-timeout/SKILL.md
Tests: node --check + typecheck + typecheck:api clean.
* docs(seed-forecasts): drop personal-machine path from comment
Architecture-strategist review on PR #3090 caught the same anti-pattern
flagged on PR #3088: the inline comment referenced ~/.claude/skills/...
which only resolves on the author's machine. Replaced with a self-contained
"diagnostic methodology" paragraph so the rationale is portable and
contributors on CI / other machines see complete context.
No code change.
* docs(seed-forecasts): correct comment math + add reorder follow-up note (Greptile P2)
Greptile review on PR #3090 caught: BATCH_SIZE divides the keys array
deterministically by index, so the worst batch is FIXED by array order
(not random co-location as my comment implied). Verified live with
STRLEN: batch 2 (indices 5-9 = chokepoints + iran + ucdp + unrest +
outages) is 1.17 MB, not the 1.9 MB worst-random-case I claimed.
Updated comment to reflect:
- Actual deterministic worst-case batch (1.17 MB).
- Headroom recalc: 1.17 MB at 100 KB/s = ~12s; 45s gives 3.7× margin.
- Architectural insight as follow-up: interleaving heavies (chokepoints
+ ucdp) with smalls in the keys array would split the deterministic
worst-case across two batches, halving per-request payload. Tracked
for a future PR (no PR pollution).
No code change; comment-only correction.
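The deterministic-batching math above is easy to reproduce: BATCH_SIZE slices the keys array by index, so the worst batch is fixed by array order. A sketch with sizes in KB taken from the figures quoted in this entry (small keys assumed ~0 for simplicity; key names shortened):

```javascript
// BATCH_SIZE divides the keys array deterministically by index, exactly as
// the commit describes; the fetch/timeout wiring of the real seeder is
// omitted here.
function chunk(keys, batchSize) {
  const batches = [];
  for (let i = 0; i < keys.length; i += batchSize) {
    batches.push(keys.slice(i, i + batchSize));
  }
  return batches;
}
// Worst-case payload for any single /pipeline request, given per-key sizes.
function worstBatchBytes(keys, sizes, batchSize) {
  return Math.max(
    ...chunk(keys, batchSize).map((b) => b.reduce((sum, k) => sum + (sizes[k] ?? 0), 0)),
  );
}
```

With the heavy keys interleaved among small ones, dropping BATCH_SIZE from 10 to 5 splits the heavy payload across two requests, which is also the follow-up reordering insight noted above.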
|
||
|
|
5152ed7a06 |
fix(spr-policies): wire seed-spr-policies into seed-bundle-energy-sources (#3089)
scripts/seed-spr-policies.mjs exists as a runnable seeder with proper isMain
guard, but no Railway service ever invokes it. /api/health shows sprPolicies
as EMPTY with seedAgeMin: null — energy:spr-policies:v1 has literally never
been written to Redis.
Wired it as a bundle entry alongside the other energy seeders. Cadence:
weekly. Static-registry data (scripts/data/spr-policies.json) only needs to
run once after deploys + restarts to populate the key; the 400d maxStaleMin
in api/health.js confirms intent. 60s timeout is generous for a JSON-file
read + Redis write.
Tests: node --check on the bundle clean; npm run typecheck clean.
|
||
|
|
6fdd9d8440 |
fix(commodity-quotes): move .then() block to opts.afterPublish — resurrect 3 dead Redis writes (#3088)
* fix(commodity-quotes): move .then() block to opts.afterPublish — 3 dead Redis writes resurrected
runSeed() ends with process.exit(0) on success, terminating the Node process
before any chained .then() microtask runs. The block at
seed-commodity-quotes.mjs:258-287 has been silently dead — its three Redis
writes never executed:
- market:commodities:v1:<symbols> (alias for canonical key)
- market:quotes:v1:<symbols> (with finnhubSkipped flags)
- market:gold-extended:v1 (cross-currency XAU + drivers)
Production proof from Railway log 2026-04-14 08:50:31:
=== market:commodities Seed ===
[Yahoo] ^VIX: $18.64 (-2.51%)
... 33 Yahoo symbols ...
Verified: data present in Redis
=== Done (8691ms) ===
Starting Container ← next cron; ZERO [Gold] log lines
Health endpoint shows goldExtended as EMPTY with seedAgeMin: null — that key
has literally never been written.
Fix:
- Extract post-publish writes into writeCompanionKeys(data).
- Wire it via opts.afterPublish, which IS awaited inside runSeed BEFORE
  process.exit. The companion writes will now actually run.
- Wrap the call in try/catch so companion-key failures log explicitly rather
  than masking the canonical write success.
- Remove the now-redundant module-level seedData variable; afterPublish
  receives the canonical data as its first argument.
Effect: next cron cycle (~5 min) writes all 3 keys for the first time.
goldExtended health flips from EMPTY (seedAgeMin: null) to OK.
Tests: node --check syntax clean; npm run typecheck clean. The fix is
structural — verified by the runSeed contract in _seed-utils.mjs ~L792-794
which awaits afterPublish before process.exit.
* fix(commodity-quotes): split required vs optional companion writes (Codex P1 #3088)
Codex P1 review caught: the catch-all wrapper around writeCompanionKeys
masked Upstash failures on the REQUIRED alias keys (market:commodities:v1
and market:quotes:v1). A transient Upstash 5xx mid-write would log a warning
and return — runSeed would then write canonical seed-meta as fresh and
exit 0, leaving the alias keys stale or missing with no health signal. That
defeats the entire point of this PR: the canonical key would look healthy
while the resurrected companions silently failed again.
Fix: split into two functions with different error semantics.
writeRequiredCompanionKeys(data):
- Both alias-key writes propagate errors. If either Upstash write fails, the
  exception bubbles to runSeed's outer try/catch, lock is released, seed-meta
  is NOT stamped fresh, and the outer .catch fires process.exit(1). Health
  correctly flags STALE_SEED on next /api/health poll.
writeOptionalGoldExtended():
- Yahoo XAU fetch + writeExtraKeyWithMeta wrapped in its own try/catch. Yahoo
  flakiness only degrades the gold-extended panel (which has its own
  seed-meta key that goes stale independently); the canonical commodity
  publish stays healthy.
Per Codex's own suggestion: only the OPTIONAL gold-extended branch is
downgraded to a warning. Required writes are not.
Tests: node --check + typecheck clean.
* fix(commodity-quotes): address Greptile P2 — parallelize alias writes, inline JSDoc
Two P2 findings on PR #3088:
1. Sequential alias-key writes — independent Redis writes were awaited one
   after the other. Wrapped in Promise.all to halve latency on every seed
   cycle. No read-after-write ordering required.
2. Local-machine path in JSDoc (~/.claude/skills/...) was unresolvable for
   other contributors. Inlined the key facts (background, error semantics,
   parallelization rationale) directly into the JSDoc so the context survives
   on any workstation or CI runner.
Also dropped a stale duplicate JSDoc block that was left over from the prior
refactor.
Tests: node --check + typecheck clean.
|
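The runSeed contract this fix depends on can be modeled synchronously. Everything here is illustrative (runSeedModel is not the real _seed-utils.mjs code); it only shows the ordering guarantee:

```javascript
// Synchronous model of the bug and the fix: runSeed ends in process.exit(0),
// so work chained with .then() after runSeed never runs, while
// opts.afterPublish is invoked (awaited, in the real code) before the exit.
function runSeedModel({ afterPublish } = {}) {
  const executed = [];
  executed.push('publish canonical key');
  if (afterPublish) afterPublish(executed); // awaited BEFORE exit in the real runSeed
  executed.push('process.exit(0)');
  // anything chained via .then() would only run after this point: too late
  return executed;
}
```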
||
|
|
56103684c6 |
fix(seed-utils): payloadBytes>0 fallback for runSeed recordCount auto-detect (#3087)
* fix(seed-utils): payloadBytes>0 fallback for runSeed recordCount auto-detect
Phantom EMPTY_DATA in /api/health: 16 of 21 failing health checks were caused
by seeders publishing custom payload shapes without passing opts.recordCount.
The auto-detect chain in runSeed only matches a hardcoded list of shapes;
anything else falls through to recordCount=0 and triggers EMPTY_DATA in
/api/health even though the payload is fully populated and verified in Redis.
Smoking-gun log signature from Railway 2026-04-14:
[BLS-Series] recordCount:0, payloadBytes:6093, Verified: data present
[VPD-Tracker] recordCount:0, payloadBytes:3068853, Verified: data present
[Disease-Outbreaks] recordCount:0, payloadBytes:92684, Verified: data present
Fix:
- Extract recordCount logic into pure exported computeRecordCount() for unit
  testability.
- Add payloadBytes>0 → 1 fallback at the end of the resolution chain. When
  triggered, console.warn names the seeder so the author can add an explicit
  opts.recordCount for accurate dashboards.
- Resolution order unchanged for existing callers: opts.recordCount wins,
  then known-shape auto-detect, then the new payloadBytes fallback, then 0.
  Explicit opts.recordCount=0 still wins (test covers it).
Effect: clears 16 phantom CRITs on the next bundle cycle. Per-seeder warns
will surface in logs so we can add accurate opts.recordCount in follow-up.
Tests: 11 new computeRecordCount cases (opts precedence, auto-detect shapes,
fallback behavior, no-spurious-warn, explicit-zero precedence).
seed-utils.test.mjs 18/18 + seed-utils-empty-data-failure.test.mjs 2/2 +
typecheck clean.
* test(seed-utils): address Greptile P2 — replace it.each mutation, add empty-known-shape edge case
Greptile review on PR #3087 caught two minor test issues:
1. `it.each = undefined` mutated the imported `it` function (ES module live
   binding). Replaced with a plain comment.
2. Missing edge case: `data: { events: [] }` with payloadBytes > 0 should NOT
   trigger the payloadBytes fallback because detectedFromShape resolves to a
   real 0 (not undefined). Without this guard, a future regression could
   collapse the !=null check and silently mask genuine empty upstream cycles
   as "1 record". Test added.
Tests: 19/19 (was 18). No production code change.
|
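The resolution chain above reads naturally as one function. A hedged sketch (the real computeRecordCount in scripts/_seed-utils.mjs detects many known shapes; only `events` is shown here):

```javascript
// Resolution order from the fix: explicit opts.recordCount wins (including
// an explicit 0), then known-shape auto-detect (a real 0 counts as detected),
// then the payloadBytes>0 -> 1 fallback with a named warning, then 0.
function computeRecordCount({ optsRecordCount, data, payloadBytes, warn = () => {} }) {
  if (optsRecordCount != null) return optsRecordCount;   // explicit always wins
  const detected = Array.isArray(data?.events) ? data.events.length : undefined;
  if (detected != null) return detected;                 // a detected 0 is NOT "unknown"
  if (payloadBytes > 0) {
    warn('unknown payload shape: pass opts.recordCount for accurate dashboards');
    return 1;                                            // clears the phantom EMPTY_DATA
  }
  return 0;
}
```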
||
|
|
3314db2664 |
fix(relay): treat quiet hours start === end as disabled, not 24/7 (#3061) (#3066)
* fix(relay): treat quiet hours start === end as disabled, not 24/7 (#3061)
When quietHoursStart equalled quietHoursEnd, the midnight-spanning branch
evaluated `hour >= N || hour < N` which is true for all hours, silently
suppressing all non-critical alerts permanently. Add an early return for
start === end in the relay and reject the combination in Convex validation.
Closes #3061
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: cross-check quiet hours start/end against persisted value on single-field updates
Addresses Greptile review: validateQuietHoursArgs only caught start===end
when both arrived in the same call. Now the mutation handlers also check
against the DB record to prevent sequential single-field updates from
creating a start===end state.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: gate quiet hours start===end check on effectiveEnabled
Only enforce the start !== end invariant when quiet hours are effectively
enabled. This allows users with legacy start===end records to disable quiet
hours, change timezone/override, or recover from old bad state without
getting locked out. Addresses koala73's P1 review feedback on #3066.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(relay): extract quiet-hours + consolidate equality check, add tests
- Move isInQuietHours/toLocalHour to scripts/lib/quiet-hours.cjs so they are
  testable without importing the full relay (which has top-level side effects
  and env requirements).
- Drop the unconditional start===end check from validateQuietHoursArgs; the
  effectiveEnabled-guarded check in setQuietHours / setQuietHoursForUser is
  now the single source of truth. Previously a user disabling quiet hours
  with start===end would be rejected even though the values are irrelevant
  when disabled.
- Add tests/quiet-hours.test.mjs covering: disabled, start===end regression
  (#3061), midnight-spanning window, same-day window, inclusive/exclusive
  bounds, invalid timezone, timezone handling, defaults.
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Elie Habib <elie.habib@gmail.com>
|
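The fixed check can be sketched in a few lines. Timezone conversion is omitted (`hour` is assumed already local), and the inclusive-start/exclusive-end bounds are one plausible convention, not necessarily the relay's exact one:

```javascript
// Sketch of the quiet-hours predicate after the #3061 fix: start === end is
// "disabled", not a 24/7 window (hour >= N || hour < N is true for every hour).
function isInQuietHours(hour, start, end) {
  if (start === end) return false;                      // regression fix
  if (start < end) return hour >= start && hour < end;  // same-day window
  return hour >= start || hour < end;                   // midnight-spanning window
}
```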
||
|
|
29d39462e1 |
fix(crypto-quotes): use CoinPaprika as primary, CoinGecko as fallback (#3086)
* fix(crypto-quotes): use CoinPaprika as primary, CoinGecko as fallback
Railway bundle log 2026-04-14 07:17:10 UTC showed seed-bundle-market-backup
finishing with failed:1. Crypto-Quotes hit CoinGecko 429s on every retry:
[Crypto-Quotes] CoinGecko 429 — waiting 10s (1/5)
... (5 attempts, 10/20/30/40/50s back-off)
[Crypto-Quotes] Crypto-Quotes failed after 120.0s: timeout
Root cause: CoinGecko 5-step retry budget (10+20+30+40+50 = 150s) exceeds
the bundle 120s section timeout, so the existing CoinPaprika fallback never
runs — the child process is killed mid-retry.
Fix: swap source order. CoinPaprika is now primary; CoinGecko is retained as
fallback for sparkline_in_7d data (CoinPaprika does not provide sparklines).
Probed CoinPaprika live: all 10 mapped crypto IDs present in /v1/tickers, no
auth required.
Trade-off: when CoinPaprika is healthy, sparkline arrays will be empty.
Acceptable — the panel already handles undefined sparklines, and the
alternative (no quotes at all because CoinGecko is rate-limited) is worse.
Tests: crypto-config.test.mjs 6/6, typecheck + typecheck:api clean.
* fix(crypto-quotes): require full coverage in validate() — no partial snapshots
Codex review on PR #3086 caught: validate() only required >=1 quote with
positive price. With the new CoinPaprika-primary path, a single dropped or
renamed ticker would silently publish a 9/10 snapshot. Health stays green
while one tracked asset disappears from the panel — exactly the silent
data-loss class we want to avoid on a fixed-cardinality top-10 feed.
Tightened validate() to require:
- quotes.length === CRYPTO_IDS.length (full cardinality)
- every quote has Number.isFinite(price) && price > 0
- every configured symbol is present in the response (defends against
  duplicate IDs masquerading as full coverage)
When the validator rejects, runSeed() takes the skipped path: existing TTL
is extended, seed-meta is bumped with count=0, and the Railway log will
scream which symbol is missing on the next cycle so the broken CoinPaprika
mapping is caught immediately.
Tests: crypto-config.test.mjs 6/6, typecheck clean.
* fix(crypto-quotes): address Greptile P2 — fallback retry budget + sourceVersion
P2-1: CoinGecko fallback was still wired with maxAttempts=5 (10+20+30+40+50
= 150s budget), so when CoinPaprika fails the fallback path could itself
overrun the bundle's 120s section timeout — recreating the exact failure
mode this PR fixes. Capped at maxAttempts=2 (10+20=30s) so the fallback
always finishes well within the bundle window.
P2-2: sourceVersion in seed-meta was still 'coingecko-markets' even though
CoinPaprika is now primary. Changed to
'coinpaprika-tickers+coingecko-fallback' so health dashboards and on-call
runbooks see the real data path.
Tests: crypto-config.test.mjs 6/6, typecheck clean.
|
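The tightened contract is small enough to sketch directly; CRYPTO_IDS below is an illustrative three-symbol stand-in for the real top-10 config, and validateQuotes is a hypothetical name:

```javascript
// Full-coverage validator per the review fix: exact cardinality, finite
// positive prices, and every configured symbol present (a Set defends
// against duplicate IDs masquerading as full coverage).
const CRYPTO_IDS = ['BTC', 'ETH', 'SOL']; // illustrative subset of the real config
function validateQuotes(quotes) {
  if (!Array.isArray(quotes) || quotes.length !== CRYPTO_IDS.length) return false;
  if (!quotes.every((q) => Number.isFinite(q.price) && q.price > 0)) return false;
  const symbols = new Set(quotes.map((q) => q.symbol));
  return CRYPTO_IDS.every((id) => symbols.has(id));
}
```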
||
|
|
b92b22bd20 |
fix(fuel-prices): tolerate Brazil ANP failure to stop Railway crash-loop (#3085)
* fix(fuel-prices): tolerate Brazil ANP failure to stop Railway crash-loop
Brazil gov.br is structurally unreachable from Railway IPs:
- Decodo proxy 403s all .gov.br CONNECTs by policy
- Direct fetch fails undici TLS handshake from Railway egress
After PR #3082 tightened the publish gate to require zero failed sources,
every run exits 1 -> Railway "Deployment crashed" banner + STALE_SEED.
Add TOLERATED_FAILURES = {'Brazil'}; validateFuel ignores tolerated names
when checking failedSources. Critical regions (US/GB/MY) and the >=30
country floor still gate publish. Brazil's outage stays visible via the
existing [FRESHNESS] log.
* fix(fuel-prices): rotate :prev on tolerated-only failures to keep WoW fresh
Reviewer catch: after tolerating Brazil, allSourcesFresh stays false forever
→ :prev never rotates → panel's WoW stretches into 2-week, 3-week, ...
deltas for every non-Brazil country while still labeled 'week-over-week'.
Gate :prev rotation on untolerated failures only. Tolerated sources are
absent from the snapshot entirely, so rotating is safe (no stale-self-compare
poisoning next week).
* fix(fuel-prices): distinguish tolerated vs untolerated sources in [DEGRADED] log
Greptile P2: the [DEGRADED] message said 'publish will be rejected' even
when only tolerated sources (Brazil) failed — confusing for operators
watching Railway logs.
|
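The gate described across these three commits can be sketched as one pure function. The function name and return shape are assumptions for illustration, not the real validateFuel API:

```javascript
// Tolerated-failure gate: Brazil may fail without blocking publish, while
// critical regions and the country floor still gate; :prev rotation is keyed
// to untolerated failures only so WoW deltas stay weekly.
const TOLERATED_FAILURES = new Set(['Brazil']);
const CRITICAL_REGIONS = ['US', 'GB', 'MY'];
const MIN_COUNTRIES = 30;

function gatePublish({ failedSources, countries }) {
  const untolerated = failedSources.filter((s) => !TOLERATED_FAILURES.has(s));
  const publish =
    untolerated.length === 0 &&
    CRITICAL_REGIONS.every((c) => countries.includes(c)) &&
    countries.length >= MIN_COUNTRIES;
  return { publish, rotatePrev: untolerated.length === 0 };
}
```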
||
|
|
16d868bd6d |
fix(comtrade): retry on transient 5xx to stop silent reporter drops (#3084)
* fix(comtrade): retry on transient 5xx to stop silent reporter drops
Railway log 2026-04-14 bilateral-hs4 run: India (699) hit HTTP 503 on both
batches and was dropped entirely from the snapshot. Iran (364) hit 500
mid-batch. All three Comtrade seeders (bilateral-hs4, trade-flows,
recovery-import-hhi) retried only on 429; any 5xx = silent coverage gap.
Adds bounded 5xx retry (3 attempts, 5s then 15s backoff) in each seeder.
On giveup the caller returns empty so the resume cache picks the reporter
up next cycle. Exports isTransientComtrade + fetchBilateral for unit
tests; 6 new tests pin the contract.
* fix(comtrade): collapse 429+5xx into single classification loop (PR review)
Reviewer caught that the 429 branch bypassed the 5xx retry path: a
429 -> 503 sequence would return [] immediately after the 429-retry
without consuming any transient-5xx retries, leaving the silent-drop bug
intact for that specific sequence. Both seeders now use a single
while-loop that reclassifies each response:
- 429 (once, with full backoff)
- 5xx (up to 2 retries with 5s/15s or 5s/10s backoff)
- anything else -> break and return
Two new tests lock in the mixed case: 429 then 503 still consumes
transient retries; consecutive 429s cap at one wait. 8/8 pass.
* test(comtrade): inject sleep to drop retry-test runtime from 185s to 277ms
PR review flagged that the new mixed 429+5xx tests slept for the full
production backoffs (60s + 5s + 15s = 80s per test), making the unit
suite unnecessarily slow and CI-timeout-prone.
Add a module-local _retrySleep binding with __setSleepForTests(fn)
export. Production keeps the real sleep; tests swap in a no-op that
records requested delays. The sleepCalls array now pins the production
cadence so a future refactor that changes [60_000, 5_000, 15_000] has to
update the test too. 8/8 pass in 277ms (down from 185s).
* test(comtrade): update 60s-on-429 static-analysis regex for _retrySleep alias
The existing substring check 'sleep(60_000)' broke after the previous
commit renamed production calls to _retrySleep(60_000) for test
injection. Widen the regex to accept either the bare or injectable form;
both preserve the 60s production cadence.
* test(comtrade): extend retry coverage to trade-flows + recovery-import-hhi
Three P2 review findings addressed:
1. partnerCode 000 in the succeeds-on-third test was changed to real code
   156. groupByProduct() filters 0/000 downstream, so the test was
   passing while the user-visible seeder would still drop the row.
2. trade-flows and recovery-import-hhi had no unit coverage for their new
   retry state machines. Adds 7 tests covering succeed-first,
   retry-then-succeed, giveup-after-3, and mixed 429+5xx classification.
3. Both seeders now expose __setSleepForTests + export their fetch
   helper. seed-trade-flows also gets an isMain guard so tests import
   without triggering a real seed run.
sleepCalls asserts pin the production cadence. 15 retry tests pass in
183ms. Full suite 5198/5198.
* fix(trade-flows): per-reporter coverage gate blocks full-reporter flatline
PR #3084 review P1: the existing MIN_COVERAGE_RATIO=0.70 gate was
global-only. 6 reporters × 5 commodities = 30 pairs; losing one entire
reporter (e.g. the India/Taiwan silent-drop this PR is trying to stop) is
only 5/30 missing = 83% global coverage, which passed.
Adds a per-reporter coverage floor: each reporter must have ≥40% of its
commodities populated (2 of 5). Global gate kept as the broad-outage
catch; per-reporter gate catches the single-reporter flatline. Extracts
checkCoverage() as a pure function for unit testing — mocking 30+ fetches
in fetchAllFlows is fragile, and the failure mode lives in the gate, not
the fetcher.
6 new tests cover: 30/30 ok; India flatline → reject at 83% global;
Taiwan flatline; broad outage → reject via global gate; healthy 80%
global with 4/5 per-reporter → ok; per-reporter breakdown shape.
5204/5204 tests pass.
|
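The single classification loop this commit describes can be sketched as follows. Function names, the `doFetch` contract, and the backoff constants shown are illustrative stand-ins, not the repo's actual code:

```javascript
// Injectable sleep so tests can record delays instead of waiting (the
// __setSleepForTests pattern described in the commit message).
let _retrySleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
export function __setSleepForTests(fn) { _retrySleep = fn; }

const TRANSIENT_BACKOFFS_MS = [5_000, 15_000]; // up to 2 retries on 5xx
const RATE_LIMIT_BACKOFF_MS = 60_000;          // one full wait on 429

export async function fetchWithRetry(doFetch) {
  let rateLimitWaits = 0;
  let transientTries = 0;
  while (true) {
    const res = await doFetch();
    if (res.status === 429 && rateLimitWaits === 0) {
      rateLimitWaits++; // consecutive 429s cap at one wait
      await _retrySleep(RATE_LIMIT_BACKOFF_MS);
      continue;         // reclassify the next response
    }
    if (res.status >= 500 && transientTries < TRANSIENT_BACKOFFS_MS.length) {
      await _retrySleep(TRANSIENT_BACKOFFS_MS[transientTries++]);
      continue;         // a 429 -> 503 sequence still consumes these retries
    }
    // Anything else: success passes through, exhausted retries give up
    // (the real seeders return empty so the resume cache retries later).
    return res.status === 200 ? res : null;
  }
}
```

Because every response re-enters the same classifier, the mixed 429-then-503 sequence that bypassed the original two-branch code now consumes the transient budget correctly.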
||
|
|
7e7ca70faf |
fix(fuel-prices): resilient seeder — proxy, retry, stale-carry-forward, strict gate (#3082)
* fix(fuel-prices): resilient seeder — proxy, retry, stale-carry-forward, strict gate
Addresses the 2026-04-07 run where 4 of 7 sources failed (NZ 403, BR/MX
fetch failed) and the seeder silently published 30 countries with
Brazil/Mexico/NZ vanishing from the UI.
- Startup proxy diagnostic so PROXY_URL misconfigs are immediately
  visible.
- New fetchWithProxyPreferred (proxy-first, direct fallback) +
  withFuelRetry (3 attempts, backoff) wrapping NZ/BR/MX upstream calls.
- Swap MX from the dead datos.gob.mx to CRE publicacionexterna XML
  (13k stations).
- Stale-carry-forward failed sources from the :prev snapshot
  (stale: true) instead of dropping countries; fresh-only ranking; skip
  WoW for stale entries.
- Gate :prev rotation on all-sources-succeeded so partial runs don't
  poison next week's WoW.
- Strict validateFn: >=25 countries AND US+GB+MY fresh. Prior gate
  was >=1.
- emptyDataIsFailure: true so a validation fail doesn't refresh
  seed-meta.
- Wrap the imperative body in main() + isMain guard; export
  parseCREStationPrices and validateFuel; 9 new unit tests.
* fix(fuel-prices): remove stale-carry-forward, harden validator (PR review)
Reviewer flagged two P1s on the prior commit:
1. stale-carry-forward inserted stale: true rows into the published
   payload, but the proto schema and panel have no staleness render path.
   Users would see week-old BR/MX/NZ prices as current. Resilience turned
   into a freshness bug.
2. The validator counted stale-carried entries toward the floor. US/GB/MY
   fresh + 22 stale still passed, refreshing seed-meta.fetchedAt and
   leaving health operationally healthy indefinitely. Hid the outage.
Fix: remove stale-carry-forward entirely. Tighten the validator to
require countries.length >= 30, US+GB+MY present, and
failedSources.length === 0. Partial-failure runs are now rejected →
10-day cache TTL serves the last healthy snapshot → health STALE_SEED
after maxStaleMin. Correct, visible signal.
Drops dead code: SOURCE_COUNTRY_CODES, staleCarried/freshCountries, the
stale WoW skip.
Tests updated for the failedSources gate.
|
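The hardened gate can be sketched as a small pure predicate. This is an illustrative shape only — the real validateFuel lives in the seeder and its input structure may differ:

```javascript
// Strict publish gate per the review fix: zero failed sources, a >=30
// country floor, and the critical regions present. All names illustrative.
const CRITICAL_REGIONS = ['US', 'GB', 'MY'];

export function validateFuel({ countries, failedSources }) {
  if (failedSources.length !== 0) return false; // any source failure rejects the run
  if (countries.length < 30) return false;      // coverage floor
  const codes = new Set(countries.map((c) => c.code));
  return CRITICAL_REGIONS.every((code) => codes.has(code));
}
```

Rejecting the whole run (rather than publishing a partial snapshot) is what lets the long cache TTL serve the last healthy snapshot and lets health flip to STALE_SEED as a visible signal.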
||
|
|
45c98284da |
fix(trade): correct UN Comtrade reporter codes for India and Taiwan (#3081)
* fix(trade): correct UN Comtrade reporter codes for India and Taiwan
seed-trade-flows was fetching India (356) and Taiwan (158) using UN M49
codes. UN Comtrade registers India as 699 and Taiwan as 490 ("Other
Asia, nes"), so every fetch silently returned count:0 — 10 of 30
reporter×commodity pairs yielded zero records per run. Live probe
confirms 699→500 India rows, 490→159 Taiwan rows.
- Update reporter codes in seed-trade-flows.mjs and its consumer
list-comtrade-flows.ts.
- Update ISO2_TO_COMTRADE in _comtrade-reporters.ts and
seed-energy-spine.mjs so energy-shock and sector-dependency RPCs
resolve the correct Comtrade keys for IN/TW.
- Add IN/TW overrides to seed-comtrade-bilateral-hs4 and
seed-recovery-import-hhi (they iterate shared/un-to-iso2.json which
must remain pure M49 for other callers).
- Fix partner-dedupe bug in seed-trade-flows: the preview endpoint
returns partner-level rows; keying by (flowCode, year) without
summing kept only the last partner seen, so tradeValueUsd was a
random counterparty's value, not the World aggregate. Sum across
partners and label as World.
- Add a 70% coverage floor on reporter×commodity pairs so an entire
reporter silently flatlining now throws in Phase 1 (TTL extend, no
seed-meta refresh) rather than publishing a partial snapshot.
- Sync energy-shock test fixture.
* fix(trade): apply Comtrade IN/TW overrides to runtime consumers too
Follow-up to PR review: the seed-side fix was incomplete because two
request-time consumers still mapped iso2 → M49 (356/158) when hitting
Comtrade or reading the (now-rekeyed) seeded cache.
- server/worldmonitor/supply-chain/v1/_bilateral-hs4-lazy.ts: apply
the IN=699 / TW=490 override when deriving ISO2_TO_UN, so the lazy
bilateral-hs4 fetch path used by get-route-impact and
get-country-chokepoint-index stops silently returning count:0 for
India and Taiwan when the seeded cache is cold.
- src/utils/country-codes.ts: add iso2ToComtradeReporterCode helper
with the override baked in. Keep iso2ToUnCode as pure M49 (used
elsewhere for legitimate M49 semantics).
- src/app/country-intel.ts: switch the listComtradeFlows call on the
country brief page to the new helper so IN/TW resolve to the same
reporter codes the seeder now writes under.
|
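The partner-dedupe fix in seed-trade-flows amounts to summing partner-level rows into one World aggregate per (flowCode, year) instead of letting the last partner seen win. A minimal sketch, with illustrative field names:

```javascript
// The preview endpoint returns one row per partner; keying by
// (flowCode, year) without summing kept only the last partner's value.
// Sum across partners and relabel as the World aggregate instead.
export function aggregateToWorld(rows) {
  const byKey = new Map();
  for (const row of rows) {
    const key = `${row.flowCode}:${row.year}`;
    const acc = byKey.get(key) ?? { ...row, partner: 'World', tradeValueUsd: 0 };
    acc.tradeValueUsd += row.tradeValueUsd; // accumulate, never overwrite
    byKey.set(key, acc);
  }
  return [...byKey.values()];
}
```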
||
|
|
9d27ff0d6a |
fix(seeds): strict-floor validators must not poison seed-meta on empty (#3078)
* fix(seeds): strict-floor validators must not poison seed-meta on empty
When `runSeed`'s validateFn rejected (empty/short data), seed-meta was
refreshed with `fetchedAt=now, recordCount=0`. Bundle runners read
`fetchedAt` to decide skip — so one transient empty fetch locked the
IMF-extended bundle (30-day cadence) out for a full month.
Adds an opt-in `emptyDataIsFailure` flag that skips the meta refresh on
validation failure, letting the bundle retry next cron fire and health
flip to STALE_SEED. Wires it on all four IMF/WEO seeders (floor 150-190
countries), which structurally can't have legitimate empty results.
Default behavior unchanged for quiet-period feeds (news, events) where
empty is normal.
Observed: Railway log 2026-04-13 18:58 — imf-external validation fail;
next fire 8h later skipped "483min ago / interval 43200min".
* test(seeds): regression coverage for emptyDataIsFailure branch
Static-analysis guard against the PR #3078 regression reintroducing
itself:
- Asserts runSeed gates writeFreshnessMetadata on opts.emptyDataIsFailure
  and that extendExistingTtl still runs in both branches (cache
  preserved).
- Asserts the four strict-floor IMF seeders (external/growth/labor/macro)
  pass emptyDataIsFailure: true.
Prevents silent bundle-lockout if someone removes the gate or adds a new
strict-floor seeder without the flag.
* fix(seeds): strict-floor failure must exit(1) + behavioral test
P2 (surfacing upstream failures in bundle summary): strict-floor seeders
with emptyDataIsFailure:true now process.exit(1) after logging FAILURE.
_bundle-runner's spawnSeed wraps execFile, so non-zero exit rejects →
failed++ increments → bundle itself exits 1. Before: the bundle logged
'Done' and ran++ on a poisoned upstream, hiding 30-day outages from
Railway monitoring.
P3 (behavioral regression coverage, replacing the static source-shape
test): stubs globalThis.fetch (Upstash REST) + process.exit to drive
runSeed through both branches. Asserts on actual Redis commands:
- strict path: zero seed-meta SET, pipeline EXPIRE still called, exit(1)
- default path: exactly one seed-meta SET, exit(0)
Catches future regressions where writeFreshnessMetadata is reintroduced
indirectly, and is immune to cosmetic refactors of _seed-utils.mjs.
* test(seeds): regression for emptyDataIsFailure meta-refresh gate
Proves that validation failure with opts.emptyDataIsFailure:true does NOT
write seed-meta (strict-floor seeders) while the default behavior DOES
write count=0 meta (quiet-period feeds). Addresses PR #3078 review.
|
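The branch structure these commits describe can be condensed into a few lines. This is a sketch of the control flow only — the real runSeed in scripts/_seed-utils.mjs handles Redis writes, TTLs, and logging, and the callback-style parameters here are invented for illustration:

```javascript
// Sketch of the validation-failure branch: strict-floor seeders skip the
// meta refresh and fail the process; quiet-period feeds write count=0
// meta as before. The cached snapshot's TTL is extended in both branches.
export async function onValidationResult({ valid, emptyDataIsFailure, writeMeta, extendTtl, exit }) {
  await extendTtl(); // preserve the last good snapshot either way
  if (valid) return;
  if (emptyDataIsFailure) {
    exit(1);         // no meta refresh: bundle retries next cron fire,
    return;          // health flips to STALE_SEED, bundle summary shows the failure
  }
  await writeMeta({ recordCount: 0 }); // empty is normal for this feed
}
```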
||
|
|
1875531e2a |
fix(seed-economy): retry proxy/EIA transients; gate stress index on full FRED coverage (#3080)
* fix(seed-economy): retry proxy/EIA transients; fail stress index on missing FRED components
Log review of 16 runs (2026-04-14 00:45–04:31 UTC) showed 50% degraded:
- The Decodo proxy flapped with HTTP 500/502/522; `fredFetchJson` fell
  back direct on the first proxy error and FRED then returned 500 to the
  Railway IP, dropping series.
- 5 EIA panels (EnergyPrices, Crude, NatGas, SPR, Refinery) timed out in
  lockstep at 01:00 and 02:00, producing full `no write` runs.
- StressIndex silently excluded missing FRED components (VIXCLS, T10Y3M,
  STLFSI4, ICSA), publishing a degraded composite as if healthy.
Changes:
- fredFetchJson: retry the proxy 3x with jittered backoff on
  5xx/522/timeout before falling back direct.
- eiaFetchJson helper: 20s timeout (was 10s) + 3x retry on 5xx/timeout;
  wired into all EIA call-sites.
- computeStressIndex: throw when any FRED-sourced component is missing;
  GSCPI (ais-relay) can still be absent. Caught in fetchAll so other
  secondary writes proceed but the composite is not published degraded.
* fix(seed-economy): narrow stress-index catch; don't retry 4xx in eiaFetchJson
- computeStressIndex try/catch no longer wraps the Redis write so a write
  failure surfaces as a run error instead of being swallowed.
- eiaFetchJson bails immediately on 4xx and non-transient thrown errors;
  only 5xx / timeouts / network resets are retried.
|
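The missing-component gate on the composite can be sketched as below. The component IDs come from the commit message; the aggregation itself is a simplified stand-in (the real composite weighting is not shown in the log):

```javascript
// Throw when any FRED-sourced component is absent so the composite is
// never published degraded; GSCPI (seeded by ais-relay) may be missing.
const FRED_COMPONENTS = ['VIXCLS', 'T10Y3M', 'STLFSI4', 'ICSA'];

export function computeStressIndex(components) {
  for (const id of FRED_COMPONENTS) {
    if (components[id] == null) {
      throw new Error(`[StressIndex] missing FRED component ${id}; not publishing degraded`);
    }
  }
  // Simplified aggregation over whatever components are present:
  const present = Object.values(components).filter((v) => v != null);
  return present.reduce((sum, v) => sum + v, 0) / present.length;
}
```

The caller catches this in fetchAll, so the other secondary writes of the run still proceed; only the composite write is suppressed.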
||
|
|
93113174b8 |
fix(seed-fires): retry per-country FIRMS fetch once to cut silent coverage gaps (#3079)
* fix(seed-fires): retry per-country FIRMS fetch once before giving up
Turkey/North Korea/Russia/Iran/Israel-Gaza/Saudi/Syria failed ~1.1x per
30-min run in prod logs (77 fails / 69 runs, 2026-04-12→13), silently
zeroing those regions on the map for up to ~20% of the day. Upstream is
transiently flaky on large bboxes, not auth-related.
Adds 1 retry with 5s backoff in fetchRegionSource. Worst-case added
latency ~100s, still well under the 30-min cadence.
* fix(seed-fires): align retry backoff with rate-limit pacing; extend lock TTL
Addresses review:
- Retry backoff 5s -> 6s matches the inter-call pacing budget; prevents
  breaching the FIRMS free-tier 10 req/min under clustered fast-failures.
- lockTtlMs 10m -> 25m; the retry path doubles worst-case per-slot
  runtime (~71s) and can exceed the old lock, risking concurrent publish
  races.
* fix(seed-fires): bump lock TTL to 40m to cover full worst-case retry runtime
27 slots × ~72s worst-case = 32.4 min, which exceeds the previous 25m
TTL. Bump to 40m so a hung/crashed run can't hold a stale lock forever
while still safely covering legitimate long runs. The next 30m cron tick
will see the lock held and skip, which is the intended behavior.
|
||
|
|
21d33c4bb5 |
fix(hyperliquid-flow): fetch both default dex and xyz builder dex (#3077)
Root cause: Hyperliquid's commodity and FX perps (xyz:CL, xyz:BRENTOIL,
xyz:GOLD, xyz:SILVER, xyz:PLATINUM, xyz:PALLADIUM, xyz:COPPER, xyz:NATGAS,
xyz:EUR, xyz:JPY) live on a separate 'xyz' builder dex, NOT the default
perp dex. The MIT reference repo listed these with xyz: prefixes but
didn't document that they require {type:metaAndAssetCtxs, dex:xyz} as a
separate POST.
Production symptom (Railway bundle logs 2026-04-14 04:10):
[Hyperliquid-Flow] SKIPPED: validation failed (empty data)
The seeder polled the default dex only, matched 4 of 14 whitelisted assets
(BTC/ETH/SOL/PAXG), and validateFn rejected snapshots with <12 assets.
Seed-meta was refreshed on the skipped path so health stayed OK but
market:hyperliquid:flow:v1 was never written.
Fix:
- New fetchAllMetaAndCtxs(): parallel-fetches both dexes and merges
{universe, assetCtxs} by concatenation. xyz entries already carry the
xyz: prefix in their universe names.
- New validateDexPayload(raw, dexLabel, minUniverse): per-dex floor so the
thinner xyz dex (~63 entries) does not false-trip the default floor of
50. Errors include the dex label for debuggability.
- validateUpstream(): back-compat wrapper — accepts either the legacy
single-dex [meta, assetCtxs] tuple (buildSnapshot tests) or the merged
{universe, assetCtxs} shape from fetchAllMetaAndCtxs.
Tests: 37/37 green. New tests cover dual-dex fetch merge, cross-dex error
propagation, xyz floor accept/reject, and merged-shape pass-through.
|
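The dual-dex fetch-and-merge at the heart of this fix can be sketched as below. The request body shape ({type:'metaAndAssetCtxs', dex:'xyz'} as a separate POST) follows the commit message; the `postInfo` helper and the tuple response shape are illustrative:

```javascript
// Fetch both the default perp dex and the 'xyz' builder dex in parallel,
// then merge {universe, assetCtxs} by concatenation. xyz entries already
// carry the xyz: prefix in their universe names, so no re-labeling needed.
export async function fetchAllMetaAndCtxs(postInfo) {
  const [main, xyz] = await Promise.all([
    postInfo({ type: 'metaAndAssetCtxs' }),             // default dex (BTC/ETH/SOL/PAXG)
    postInfo({ type: 'metaAndAssetCtxs', dex: 'xyz' }), // builder dex (commodity + FX perps)
  ]);
  // Each response is a [meta, assetCtxs] tuple.
  return {
    universe: [...main[0].universe, ...xyz[0].universe],
    assetCtxs: [...main[1], ...xyz[1]],
  };
}
```

Because Promise.all rejects if either dex fetch fails, a cross-dex error propagates to the caller instead of silently publishing a half-merged universe.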
||
|
|
30ddad28d7 |
fix(seeds): upstream API drift — SPDR XLSX + IMF IRFCL + IMF-External BX/BM drop (#3076)
* fix(seeds): gold-etf XLSX migration, IRFCL dataflow, imf-external BX/BM drop
Three upstream-drift regressions caught from the market-backup +
imf-extended bundle logs. Root causes validated by live API probes before
coding.
1. seed-gold-etf-flows: SPDR /assets/dynamic/GLD/GLD_US_archive_EN.csv
   now silently returns a PDF (Content-Type: application/pdf) — the site
   migrated to api.spdrgoldshares.com/api/v1/historical-archive which
   serves XLSX. Swapped the CSV parser for an exceljs-based XLSX parser.
   Adds browser-ish Origin/Referer headers (SPDR swaps the payload for a
   PDF without them) and a Content-Type guard. Column layout:
   Date | Closing | ... | Tonnes | Total NAV USD.
2. seed-gold-cb-reserves: PR #3038 shipped with the IMF.STA/IFS dataflow
   and a 3-segment key M..<indicator> — both wrong. IFS isn't exposed on
   api.imf.org (HTTP 204). Gold-reserves data lives under IMF.STA/IRFCL
   with 4 dimensions (COUNTRY.INDICATOR.SECTOR.FREQUENCY). Verified live:
   *.IRFCLDT1_IRFCL56_FTO.*.M returns 111 series. Switched to IRFCL +
   IRFCLDT1_IRFCL56_FTO (fine troy ounces) and fallbacks. The
   valueIsOunces flag now matches the _FTO suffix (keeps legacy
   _OZT/OUNCE detection for backward compat).
3. seed-imf-external: BX/BM (export/import LEVELS, USD) WEO coverage
   collapsed to ~10 countries in late 2026 — the seeder's >=190-country
   validate floor was failing every run. Dropped BX/BM from fetch + join;
   kept BCA (~209) / TM_RPCH (~189) / TX_RPCH (~190). exportsUsd /
   importsUsd / tradeBalanceUsd fields kept as explicit null so consumers
   see a deliberate gap. Validate floor lowered to 180 (BCA∪TM∪TX union).
Tests: 32/32 pass. Rewrote gold-etf tests to use synthetic XLSX fixtures
(exceljs resolved from scripts/package.json since the repo root doesn't
have it). Updated imf-external tests for the new indicator set + null
BX/BM contract + 180-country validate threshold.
* fix(mcp): update get_country_macro description after BX/BM drop
Consumer-side catch during PR #3076 validation: the MCP tool description
still promised 'exports, imports, trade balance' fields that the seeder
fix nulls out. LLM consumers would be directed to exportsUsd/importsUsd/
tradeBalanceUsd fields that always return null since seed-imf-external
dropped BX/BM (WEO coverage collapsed to ~10 countries). Updated the
description to list only the indicators actually populated
(currentAccountUsd, importVolumePctChg, exportVolumePctChg) with an
explicit note about the null trade-level fields so LLMs don't attempt to
use them.
* fix(gold-cb-reserves): compute real pctOfReserves + add exceljs to root
Follow-up to #3076 review.
1. pctOfReserves was hardcoded to 0 with a "IFS doesn't give us total
   reserves" comment. That was a lazy limitation claim — IMF IRFCL DOES
   expose total official reserve assets as IRFCLDT1_IRFCL65_USD parallel
   to the gold USD series IRFCLDT1_IRFCL56_USD. fetchCbReserves now pulls
   all three indicators (primary FTO tonnage + the two USD series) via
   Promise.allSettled and passes the USD pair to buildReservesPayload so
   it can compute the true gold share per country. Falls back to 0 only
   when the denominator is genuinely missing for that country (IRFCL
   coverage: 114 gold_usd, 96 total_usd series; ~15% of holders have no
   matched-month denominator). A 3-month lookback window absorbs
   per-country reporting lag.
2. CI fix: tests couldn't find exceljs because scripts/package.json is
   not workspace-linked to the repo root — CI runs `npm ci` at root only.
   Added exceljs@^4.4.0 as a root devDependency. The runtime seeder
   continues to resolve it from scripts/node_modules via Node's upward
   module resolution.
3 new tests cover pct computation, missing-denominator fallback, and the
3-month lookback window.
|
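The gold-share computation with its missing-denominator fallback reduces to a small guard. A sketch with illustrative names — the real logic lives in buildReservesPayload and also applies the 3-month lookback window:

```javascript
// Gold share of total official reserves: gold USD value over total
// reserve assets USD. Falls back to 0 only when the denominator is
// genuinely missing (no matched-month total for that country).
export function computePctOfReserves(goldUsd, totalReservesUsd) {
  if (!Number.isFinite(totalReservesUsd) || totalReservesUsd <= 0) return 0;
  return (goldUsd / totalReservesUsd) * 100;
}
```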
||
|
|
e32d9b631c |
feat(market): Hyperliquid perp positioning flow as leading indicator (#3074)
* feat(market): Hyperliquid perp positioning flow as leading indicator
Adds a 4-component composite (funding × volume × OI × basis)
"positioning stress" score for ~14 perps spanning crypto (BTC/ETH/SOL),
tokenized gold (PAXG), commodity perps (WTI, Brent, Gold, Silver, Pt, Pd,
Cu, NatGas), and FX perps (EUR, JPY). Polls Hyperliquid /info every 5min
via Railway cron; publishes a single self-contained snapshot with
embedded sparkline arrays (60 samples = 5h history). Surfaces as a new
"Perp Flow" tab in CommoditiesPanel with separate Commodities / FX
sections.
Why: existing CFTC COT is weekly + US-centric; market quotes are
price-only. Hyperliquid xyz: perps give 24/7 global positioning data that
has been shown to lead spot moves on commodities and FX by
minutes-to-hours.
Implementation:
- scripts/seed-hyperliquid-flow.mjs — pure scoring math, symbol
  whitelist, content-type + schema validation, prior-state read via
  readSeedSnapshot(), warmup contract (first run / post-outage zeroes
  vol/OI deltas), missing-symbol carry-forward, $500k/24h min-notional
  guard to suppress thin xyz: noise. TTL 2700s (9× cadence).
- proto/worldmonitor/market/v1/get_hyperliquid_flow.proto + service.proto
  registration; make generate regenerated client/server bindings.
- server/worldmonitor/market/v1/get-hyperliquid-flow.ts — getCachedJson
  reader matching the get-cot-positioning.ts seeded-handler pattern.
- server/gateway.ts cache-tier entry (medium).
- api/health.js: hyperliquidFlow registered with maxStaleMin:15
  (3× cadence) + transitional ON_DEMAND_KEYS gate for the first ~7 days
  of bake-in.
- api/seed-health.js mirror with intervalMin:5.
- scripts/seed-bundle-market-backup.mjs entry (NIXPACKS auto-redeploy on
  scripts/** watch).
- src/components/MarketPanel.ts: CommoditiesPanel grows a Perp Flow tab +
  fetchHyperliquidFlow() RPC method; OI Δ1h derived from the sparkOi
  tail.
- src/App.ts: prime via primeVisiblePanelData() + recurring refresh via
  refreshScheduler.scheduleRefresh() at 5min cadence (the panel does NOT
  own a setInterval; matches the App.ts:1251 lifecycle convention).
- 28 unit tests covering scoring parity, warmup flag, min-notional guard,
  schema rejection, missing-symbol carry-forward, post-outage cold start,
  sparkline cap, alert threshold.
Tests: test:data 5169/5169, hyperliquid-flow-seed 28/28, route-cache-tier
5/5, typecheck + typecheck:api green. One pre-existing test:sidecar
failure (cloud-fallback origin headers) is unrelated and reproduces on
origin/main.
* fix(hyperliquid-flow): address review feedback — volume baseline window, warmup lifecycle, error logging
Two real correctness bugs and four review nits from the PR #3074 review
pass.
Correctness fixes:
1. The volume baseline was anchored to the OLDEST 12 samples, not the
   newest. sparkVol is newest-at-tail (shiftAndAppend), so slice(0, 12)
   pinned the rolling mean to the first hour of data forever once
   len >= 12. Volume scoring would drift further from current conditions
   each poll. Switched to slice(-VOLUME_BASELINE_MIN_SAMPLES) so the
   baseline tracks the most recent window. Regression test added.
2. The warmup flag flipped to false on the second successful poll while
   volume scoring still needed 12+ samples to activate. The UI told users
   warmup lasted ~1h but the badge disappeared after 5 min. Tied
   per-asset warmup to real baseline readiness (coldStart OR vol samples
   < 12 OR prior OI missing). Snapshot-level warmup = any asset still
   warming. Three new tests cover the persist-through-baseline-build,
   clear-once-ready, and missing-OI paths.
Review nits:
- Handler: a bare catch swallowed Redis/parse errors; now logs
  err.message.
- Panel: a bare catch in fetchHyperliquidFlow hid RPC 500s; now logs.
- MarketPanel.ts: deleted the hand-rolled RawHyperliquidAsset;
  mapHyperliquidFlowResponse now takes GetHyperliquidFlowResponse from
  the generated client so proto drift fails compilation instead of
  silently.
- Seeder: added @ts-check + JSDoc on computeAsset per the type-safety
  rule.
- validateUpstream: MAX_UPSTREAM_UNIVERSE=2000 cap bounds memory.
- buildSnapshot: logs unknown xyz: perps upstream (once per run) so ops
  sees when Hyperliquid adds markets we could whitelist.
Tests: 37/37 green; typecheck + typecheck:api clean.
* fix(hyperliquid-flow): wire bootstrap hydration per AGENTS.md mandate
Greptile review caught that AGENTS.md:187 mandates new data sources be
wired into bootstrap hydration. The plan had deferred this on "lazy
deep-dive signal" grounds, but the project convention is binding.
- server/_shared/cache-keys.ts: add hyperliquidFlow to
  BOOTSTRAP_CACHE_KEYS + BOOTSTRAP_TIERS ('slow' — non-blocking,
  page-load-parallel).
- api/bootstrap.js: add to the inlined BOOTSTRAP_CACHE_KEYS + SLOW_KEYS
  so bootstrap.test.mjs canonical-mirror assertions pass.
- src/components/MarketPanel.ts:
  - Import getHydratedData from @/services/bootstrap.
  - New mapHyperliquidFlowSeed() normalizes the raw seed-JSON shape
    (numeric fields) into HyperliquidFlowView. The RPC mapper handles the
    proto shape (string-encoded numbers); bootstrap emits the raw blob.
  - fetchHyperliquidFlow now reads hydrated data first, renders
    immediately, then refreshes from RPC — mirrors the FearGreedPanel
    pattern.
Tests: 72/72 green (bootstrap + cache-tier + hyperliquid-flow-seed).
|
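The volume-baseline bug in correctness fix 1 is easy to reproduce in isolation. A minimal sketch — the constant name follows the commit message, the rest is illustrative:

```javascript
// Sparklines append newest samples at the tail (shiftAndAppend), so the
// rolling mean must read the NEWEST window via slice(-N). The bug used
// slice(0, N), pinning the baseline to the first hour of data forever.
const VOLUME_BASELINE_MIN_SAMPLES = 12;

export function volumeBaseline(sparkVol) {
  if (sparkVol.length < VOLUME_BASELINE_MIN_SAMPLES) return null; // still warming up
  const window = sparkVol.slice(-VOLUME_BASELINE_MIN_SAMPLES);    // newest 12 samples
  return window.reduce((sum, v) => sum + v, 0) / VOLUME_BASELINE_MIN_SAMPLES;
}
```

With slice(0, 12) the baseline for the series below would stay at the stale first-hour mean of 1 forever; slice(-12) tracks the recent hour.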
||
|
|
825313eee7 |
Add Liquidity Shifts panel and include Silver in COT seeder (#3070)
* Add liquidity shifts panel and extend COT seeding for silver
* fix(liquidity-shifts): review fixes — i18n, CSS, CMD+K, div-by-zero
Addresses review feedback on PR #3070:
- Pull all user-facing strings into src/locales/en.json under
  components.liquidityShifts.*; title and tooltip now use t().
- Move inline styles into .liquidity-shifts-panel / .liquidity-row CSS
  classes in src/styles/main.css so theme tokens and print/dark rules
  apply.
- Replace the bespoke renderShiftPill with shared formatChange +
  getChangeClass helpers (consistency with GulfEconomiesPanel and the
  .up/.down theme colors).
- pct() now returns null when long+short is 0; renderShiftPill and the
  lev formatter show '—' instead of a misleading 100%.
- Replace the em-dash in the stocks section title with a colon (repo
  rule).
- Add Liquidity Shifts to CMD+K via src/config/commands.ts.
- Instrument label lookup moved to an INSTRUMENT_LABELS map.
* fix(liquidity-shifts): register in COMMODITY variant, hide Lev line for commodities
Greptile review on PR #3070:
- P1: add 'liquidity-shifts' to COMMODITY_PANELS — the panel was
  specifically designed for commodity positioning but was absent from the
  variant.
- P2: the CFTC Disaggregated report (GC/SI/CL) has no Leveraged Funds
  category; only the TFF report (ES/NQ) populates those fields. The row
  previously rendered 'Lev +0.0%' for every commodity, which looked like
  broken data. Skip the sub-line entirely when both leveragedFunds values
  are 0.
* fix(liquidity-shifts): sort stock quotes by TOP_STOCKS order
listMarketQuotes preserves seed-bootstrap order (the provider fallback
sequence from seed-market-quotes), not request order, so the panel
rendered 'Top US Stocks' in a shifting arbitrary order. Re-sort the
returned quotes by the TOP_STOCKS index client-side to keep row order
stable and deterministic.
|
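The client-side re-sort described in the last commit is a rank-map sort. A sketch with an illustrative ticker list — the real TOP_STOCKS constant lives in the panel code:

```javascript
// listMarketQuotes preserves seed order, not request order, so re-sort
// by the caller's TOP_STOCKS index; unknown symbols sink to the end.
export function sortByTopStocks(quotes, topStocks) {
  const rank = new Map(topStocks.map((symbol, i) => [symbol, i]));
  return [...quotes].sort(
    (a, b) => (rank.get(a.symbol) ?? Infinity) - (rank.get(b.symbol) ?? Infinity),
  );
}
```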
||
|
|
ced3e7058b |
perf(aviation): halve seed cadence to 30min + extend Redis TTLs (#3073)
* perf(aviation): halve seed cadence to 30min + extend Redis TTLs
AviationStack upstream call volume has been climbing over the past few
days. Halving the seed-aviation Railway cron from every 15 min to every
30 min cuts this seeder's contribution from ~576/day to ~288/day.
With the longer interval, the Redis TTLs must span the new cron gap with
buffer — otherwise keys expire between runs and the panel goes empty.
Bumping OPS_TTL and NEWS_TTL from 300/900s to 2100s (35 min) gives a
5-minute buffer over the 30-min interval.
The Railway cron schedule itself is changed on the dashboard (service
a8e49386-64c1-4e1e-9f82-4eb69a55fce3), out of band from this PR.
* perf(aviation): widen TTL buffer to 10min per review
Greptile review flagged the 5-min buffer on the 2100s TTL as tight. Worst
case: serial AviationStack calls for all default airports can take ~63s,
and the Railway cron can fire slightly late — a run that starts 29 min
after the previous one could write a key only seconds before the old one
expires, risking a brief empty-panel window. Bump OPS_TTL and NEWS_TTL
from 2100s to 2400s (40 min) to give a 10-minute buffer over the new
30-min cron interval.
|
||
|
|
c072edc89f |
fix(economy): GSCPI shape mismatch with ais-relay payload (#3072)
* fix(economy): GSCPI shape mismatch with ais-relay payload
`seed-economy.mjs` was reporting `[StressIndex] GSCPI not in Redis yet
(ais-relay lag or first run) — excluding` even when GSCPI was current
in Redis. The Stress Index then computed on 5/6 components instead
of 6/6 every run.
Root cause: shape mismatch.
- ais-relay.cjs (`seedGscpi()`) writes the FRED-compatible payload
`{ series: { series_id, title, units, frequency, observations: [...] } }`
- seed-economy.mjs `fetchGscpiFromRedis()` was reading the legacy
flat shape `{ observations: [...] }` (top-level), so
`Array.isArray(parsed.observations)` was always false → null returned
→ "not in Redis yet" log, even though 343 monthly observations were
sitting in `economic:fred:v1:GSCPI:0`
Fix: extract the parsing into `extractGscpiObservations()` which checks
both shapes (`parsed.series.observations` first, then top-level
`parsed.observations` for back-compat). The "not in Redis yet" message
will now correctly fire only when the relay is genuinely behind.
Verified against live Redis: returns `{ observations: 343 entries,
latest 2026-03-01 = 0.68 }` instead of null.
Tests added in tests/gscpi-shape-extraction.test.mjs (3 cases:
ais-relay shape, legacy flat shape, malformed payload).
* style(economy): single @type cast in extractGscpiObservations
PR #3072 review (P2): cast `parsed` to `any` once into a local instead
of repeating the inline `/** @type {any} */` annotation on every access.
Same behavior, less visual noise.
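The dual-shape parser this commit describes can be sketched directly from the two payload shapes it lists. Field access is illustrative but follows the shapes named in the message:

```javascript
// ais-relay writes the FRED-compatible envelope
// { series: { series_id, ..., observations: [...] } }, while the legacy
// shape is a flat top-level { observations: [...] }. Check both; return
// null for malformed payloads so the "not in Redis yet" path fires only
// when the relay is genuinely behind.
export function extractGscpiObservations(parsed) {
  if (parsed && Array.isArray(parsed.series?.observations)) {
    return parsed.series.observations; // ais-relay shape
  }
  if (parsed && Array.isArray(parsed.observations)) {
    return parsed.observations;        // legacy flat shape (back-compat)
  }
  return null;                         // malformed payload
}
```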
|
||
|
|
46d17efe55 |
fix(resilience): wider FX YoY upstream + sanctions absolute threshold (#3071)
* fix(resilience): wider FX YoY upstream + sanctions absolute threshold
Two backtest families consistently failed Outcome-Backtest gates because
the detectors were reading the wrong shape of upstream data, not because
the upstream seeders were missing.
FX Stress (was AUC=0.500):
- BIS WS_EER (`economic:bis:eer:v1`) only covers 12 G10/major-EM
  countries — Argentina, Egypt, Turkey, Pakistan, Nigeria etc. are
  absent, so the detector had no positive events to score against
- Add `seed-fx-yoy.mjs` fetching Yahoo Finance 2-year monthly history
  across 45 single-country currencies, computing YoY % and 24-month
  peak-to-trough drawdown
- Switch the detector to read drawdown24m with a -15% threshold (matches
  the methodology spec); falls back to yoyChange/realChange for
  back-compat
- Why drawdown, not just YoY: rolling 12-month windows slice through
  historic crises (Egypt's March 2024 devaluation falls outside an
  April→April window by 2026); drawdown captures actual stress magnitude
  regardless of crisis timing
- Verified locally: flags AR (-38%), TR (-28%), NG (-21%), MX (-18%)
Sanctions Shocks (was AUC=0.624):
- The detector previously used the top quartile (Q3) of country counts,
  which conflated genuine comprehensive-sanctions targets (RU, IR, KP,
  CU, SY, VE, BY, MM) with financial hubs (UK, CH, DE, US) hosting many
  sanctioned entities
- Replace with an absolute threshold of 100 entities — the OFAC
  distribution is heavy-tailed enough that this cleanly separates targets
  from hubs
Both fixes use existing seeded data (or new seeded data via
seed-fx-yoy.mjs) — no hardcoded country curation.
api/health.js: register `economic:fx:yoy:v1` in STANDALONE_KEYS +
SEED_META so the validation cron monitors freshness.
Railway: deploy `seed-fx-yoy` as a daily cron service (NIXPACKS builder,
startCommand `node scripts/seed-fx-yoy.mjs`, schedule `30 6 * * *`).
* fix(seed-fx-yoy): use running-peak max-drawdown instead of global peak
PR #3071 review (P1): the original drawdown calculation found the global
maximum across the entire window, then the lowest point AFTER that peak.
This silently erased earlier crashes when the currency later recovered to
a new high — exactly the class of events the FX Stress family is trying
to capture.
Example series [5, 10, 7, 9, 6, 11, 10]: the true worst drawdown is
10 → 6 = -40%, but the broken implementation picked the later global peak
11 and reported only 11 → 10 = -9.1%.
Fix: sweep forward tracking the running peak; for each subsequent bar
compute the drop from that running peak; keep the largest such drop. This
is the standard max-drawdown computation and correctly handles
recover-then-fall-again sequences.
Live data verification:
- BR (Brazilian real) was missing from the flagged set under the broken
  algorithm because BRL recovered above its 2024-04 peak. With the fix it
  correctly surfaces at drawdown=-15.8% (peak 2024-04, trough 2024-12).
- KR / CO peaks now resolve to mid-series dates instead of end-of-window,
  proving the running-peak scan is finding intermediate peaks.
Tests added covering: the reviewer's regression case, peak-at-start (NGN
style), pure appreciation, multi-trough series, yoyChange anchor.
* fix(health): gate fxYoy as on-demand to avoid post-merge CRIT alarm
PR #3071 review (P1): registering `fxYoy` as a required standalone seeded
key creates an operational hazard during the deploy gap. After merge,
Vercel auto-deploys `api/health.js` immediately, but the `seed-fx-yoy`
Railway cron lives in a separate deployment surface that must be
triggered manually. Any gap (deploy-order race, first-cron failure, env
var typo) flips health to DEGRADED/UNHEALTHY because `classifyKey()`
marks the check as `EMPTY` without an on-demand or empty-data-OK
exemption.
Add `fxYoy` to ON_DEMAND_KEYS as a transitional safety net (matches the
pattern recovery* uses for "stub seeders not yet deployed"). The key is
still monitored — freshness via seed-meta — but missing data downgrades
from CRIT to WARN, which won't page anyone. Once the Railway cron has
fired cleanly for ~7 days in production we can remove this entry and let
it be a hard-required key like the rest of the FRED family.
Note: the Railway service IS already provisioned (cron `30 6 * * *`,
0.5 vCPU / 0.5 GB, NIXPACKS, watchPatterns scoped to the seeder + utils)
and the `economic:fx:yoy:v1` key is currently fresh in production from
local test runs. The gating here is defense-in-depth against the
operational coupling, not against a known absent key.
|
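The running-peak sweep described in the seed-fx-yoy fix is the standard max-drawdown computation. A self-contained sketch, using the reviewer's own regression series from the commit message:

```javascript
// Running-peak max drawdown: for each bar, measure the drop from the
// highest value seen SO FAR, and keep the worst such drop. Unlike the
// broken global-peak version, this preserves crashes that happened
// before a later recovery to a new high.
export function maxDrawdown(series) {
  let runningPeak = -Infinity;
  let worst = 0; // 0 means no drawdown observed (pure appreciation)
  for (const value of series) {
    if (value > runningPeak) runningPeak = value;
    const drawdown = (value - runningPeak) / runningPeak; // <= 0
    if (drawdown < worst) worst = drawdown;
  }
  return worst; // e.g. -0.4 means a 40% peak-to-trough drop
}
```

On the reviewer's series [5, 10, 7, 9, 6, 11, 10] this returns the true worst drawdown of 10 → 6 = -40%, where the global-peak version would anchor on 11 and report only -9.1%.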
||
|
|
7aa8dd1bf2 |
fix(scoring): relay recomputes importanceScore post-LLM + shadow-log v2 + parity test (#3069)
* fix(scoring): relay recomputes importanceScore post-LLM + shadow-log v2 + parity test
Before this change, classified rss_alert events published by ais-relay carried
a stale importanceScore: the digest computed it from the keyword-level threat
before the LLM upgrade, and the relay republished that value unchanged. Shadow
log (2,850 entries / 7 days) had Pearson 0.31 vs human rating with zero events
reaching the ≥85 critical gate — the score being measured was the keyword
fallback, not the AI classification.
Fixes:
- ais-relay.cjs: recompute importanceScore locally from the post-LLM level
using an exact mirror of the digest formula (SEVERITY_SCORES, SCORE_WEIGHTS,
SOURCE_TIERS, formula). Publish includes corroborationCount for downstream
shadow-log enrichment.
- notification-relay.cjs: delete the duplicate shadowLogScore() call that
produced ~50% near-duplicate pairs. Move shadow log to v2 key with
JSON-encoded members carrying severity, source, corroborationCount,
variant — future calibration cycles get cleaner data.
- shadow-score-{report,rank}.mjs: parse both v2 JSON and legacy v1 string
members; default to v2, override via SHADOW_SCORE_KEY env.
- _classifier.ts: narrow keyword additions — blockade, siege, sanction
(singular), escalation → HIGH; evacuation orders (plural) → CRITICAL.
- tests/importance-score-parity.test.mjs: extracts tier map and formula from
both TS digest and CJS relay sources, asserts identical output across 12
sample cases. Catches any future drift.
- tests/relay-importance-recompute.test.mjs +
  notification-relay-shadow-log.test.mjs: regression tests for the
  publish path and single-write discipline.
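The dual-format member parsing in shadow-score-{report,rank}.mjs described above might look roughly like this (hypothetical helper name and return shape; the real parser is in the scripts, not this log):

```javascript
// v2 ZSET members are JSON objects carrying severity, source,
// corroborationCount, variant; legacy v1 members are plain strings.
// Parse both, tagging which shape was seen.
function parseShadowMember(member) {
  try {
    const parsed = JSON.parse(member);
    // Guard: JSON.parse also accepts bare numbers and quoted strings,
    // which are not v2 records.
    if (parsed && typeof parsed === 'object' && !Array.isArray(parsed)) {
      return { version: 2, ...parsed };
    }
  } catch {
    // Not JSON at all: fall through to the legacy v1 string shape.
  }
  return { version: 1, raw: member };
}
```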
Gates remain OFF. After deploy, collect 48h of fresh shadow:score-log:v2
data, re-run scripts/shadow-score-rank.mjs for calibration, then set final
IMPORTANCE_SCORE_MIN / high / critical thresholds before enabling
IMPORTANCE_SCORE_LIVE=1.
See docs/internal/scoringDiagnostic.md (local) for full diagnosis.
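The shape of the mirrored scorer can be illustrated like this. NOTE: every numeric value and weight below is invented for illustration; the real SEVERITY_SCORES, SCORE_WEIGHTS, and SOURCE_TIERS maps live in the digest/relay sources and are not reproduced in this log. The `?? 0` severity fallback is the relay's one documented deviation (0 instead of NaN on unknown levels):

```javascript
// ILLUSTRATIVE values only — the real maps are in the digest and relay
// sources, not in this commit log.
const SEVERITY_SCORES = { low: 20, medium: 45, high: 70, critical: 90 };
const SCORE_WEIGHTS = { severity: 0.7, sourceTier: 0.2, corroboration: 0.1 };

function importanceScore(level, sourceTierScore, corroborationCount) {
  // '?? 0' is the intentional defensiveness deviation: an unknown level
  // scores 0 instead of propagating NaN through the arithmetic.
  const severity = SEVERITY_SCORES[level] ?? 0;
  const corroboration = Math.min(corroborationCount ?? 0, 5) * 20; // capped
  return Math.round(
    severity * SCORE_WEIGHTS.severity +
    sourceTierScore * SCORE_WEIGHTS.sourceTier +
    corroboration * SCORE_WEIGHTS.corroboration
  );
}
```

The parity test's job is exactly to keep the TS digest and the CJS relay copies of this arithmetic from drifting apart.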
🤖 Generated with Claude Sonnet 4.6 via Claude Code + Compound Engineering v2.49.0
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(scoring): PR #3069 review amendments — revert risky keywords + extract SOURCE_TIERS
Addresses review findings on PR #3069 (todos/193 through 204).
BLOCKING (P1):
- Revert 5 keyword additions in _classifier.ts. Review showed `escalation`,
`sanction`, `siege`, `blockade`, `evacuation orders` fire HIGH on
`de-escalation`, `sanctioned`, `besieged`, `blockaded` (substring matches),
and the plural `evacuation orders` is already covered by the singular.
Classifier work will land in a separate PR with phrase-based rules.
- Delete dead `digestImportanceScore` field from relay allTitles metadata
(written in two places, read nowhere).
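A phrase-aware matcher of the kind the follow-up classifier PR is aiming for can be sketched like this (hypothetical helper; note that plain `\b` boundaries alone are not enough, since `-` is a non-word character and `\bescalation\b` still matches `de-escalation`):

```javascript
// Substring matching fires HIGH on 'de-escalation', 'sanctioned',
// 'besieged', 'blockaded'. Requiring that the keyword is not glued to a
// word character OR a hyphen on either side rules those out.
function matchesKeyword(title, keyword) {
  const escaped = keyword.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp(`(?<![\\w-])${escaped}(?![\\w-])`, 'i').test(title);
}

matchesKeyword('Port blockade declared', 'blockade');       // true
matchesKeyword('City blockaded overnight', 'blockade');     // false
matchesKeyword('De-escalation talks resume', 'escalation'); // false
```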
IMPORTANT (P2):
- Extract SOURCE_TIERS to shared/source-tiers.{json,cjs} using the
existing shared/rss-allowed-domains.* precedent. Dockerfile.relay
already `COPY shared/` (whole dir), so no infra change. Deletes
255-line inline duplicate from ais-relay.cjs; TS digest imports the
same JSON via resolveJsonModule. Tier-map parity is now structural.
- Simplify parity test — tier extraction no longer needed. SEVERITY_SCORES
+ SCORE_WEIGHTS + scoring function parity retained across 12 cases
plus an unknown-level defensiveness case. Deleted no-op regex replace
(`.replace(/X/g, 'X')`). Fixed misleading recency docstring.
- Pipeline shadow log: ZADD + ZREMRANGEBYSCORE + belt+suspenders EXPIRE
now go in a single Upstash /pipeline POST (~50% RT reduction, no
billing delta).
- Bounded ZRANGE in shadow-score-report.mjs (20k cap + warn if reached).
- Bump outbound webhook envelope v1→v2 to signal the new
`corroborationCount` field on rss_alert payloads.
- Restore rss_alert eventType gate at shadowLogScore caller (skip
promise cost for non-rss events).
- Align ais-relay scorer comment with reality: it has ONE intentional
deviation from digest (`?? 0` on severity for defensiveness, returning
0 vs NaN on unknown levels). Documented + tested.
P3:
- Narrow loadEnv in scripts to only UPSTASH_REDIS_REST_* (was setting
any UPPER_UNDERSCORE env var from .env.local).
- Escape markdown specials in rating-sheet.md title embeds.
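The narrowed loadEnv filter amounts to something like this (illustrative helper name; the actual scripts' loader is not shown in this log):

```javascript
// Only hydrate the Upstash REST credentials from .env.local — previously
// any UPPER_UNDERSCORE key in the file was copied into process.env.
const ALLOWED_ENV = /^UPSTASH_REDIS_REST_/;

function pickAllowedEnv(parsed) {
  return Object.fromEntries(
    Object.entries(parsed).filter(([key]) => ALLOWED_ENV.test(key))
  );
}
```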
Pre-existing activation blockers NOT fixed here (tracked as todos 196,
203): `/api/notify` accepts arbitrary importanceScore from authenticated
Pro users, and notification channel bodies don't escape mrkdwn/Discord
markup. Both must close before `IMPORTANCE_SCORE_LIVE=1`.
Net: -614 lines (more deleted than added). 26 regression assertions pass.
npm run typecheck, typecheck:api, test:data all pass.
🤖 Generated with Claude Sonnet 4.6 via Claude Code + Compound Engineering v2.49.0
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(scoring): mirror source-tiers.{json,cjs} into scripts/shared/
The `scripts-shared-mirror` enforcement in tests/edge-functions.test.mjs
requires every *.json and *.cjs in shared/ to have a byte-identical copy
in scripts/shared/ (Railway rootDirectory=scripts deploy bundle cannot
reach repo-root shared/). Last commit added shared/source-tiers.{json,cjs}
without mirroring them. CI caught it.
* fix(scoring): revert webhook envelope to v1 + log shadow-log pipeline failures
Two P1/P2 review findings on PR #3069:
P1: Bumping webhook envelope to version: '2' was a unilateral breaking
change — the other webhook producers (proactive-intelligence.mjs:407,
seed-digest-notifications.mjs:736) still emit v1, so the same endpoint
would receive mixed envelope versions per event type. A consumer
validating `version === '1'` would break specifically on realtime
rss_alert deliveries while proactive_brief and digest events kept
working. Revert to '1' and document why — `corroborationCount` is an
additive payload field, backwards-compatible for typical consumers;
strict consumers using `additionalProperties: false` should be handled
via a coordinated version bump across all producers in a separate PR.
P2: The new shadow-log /pipeline write swallowed all errors silently
(no resp.ok check, no per-command error inspection), a regression from
the old upstashRest() path which at least logged non-2xx. Since the
48h recalibration cycle depends on shadow:score-log:v2 filling with
clean data, a bad auth token or malformed pipeline body would leave
operators staring at an empty ZSET with no signal. Now logs HTTP
failures and per-command pipeline errors.
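The failure-logging pipeline write described above could look like this sketch (hypothetical helper; Upstash's REST /pipeline endpoint takes an array of command arrays and answers with one `{ result }` or `{ error }` entry per command — `fetchImpl` is injectable here purely to keep the sketch testable):

```javascript
// Single round trip: ZADD + ZREMRANGEBYSCORE + EXPIRE in one /pipeline
// POST. Both HTTP failures and per-command errors are logged, so an
// empty shadow ZSET is never a silent mystery.
async function shadowLogPipeline({ url, token, key, score, member, cutoff, ttl, fetchImpl = fetch }) {
  const commands = [
    ['ZADD', key, String(score), member],
    ['ZREMRANGEBYSCORE', key, '-inf', String(cutoff)],
    ['EXPIRE', key, String(ttl)], // belt-and-suspenders TTL
  ];
  const resp = await fetchImpl(`${url}/pipeline`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${token}` },
    body: JSON.stringify(commands),
  });
  if (!resp.ok) {
    console.error(`[shadow-log] pipeline HTTP ${resp.status}`);
    return false;
  }
  const results = await resp.json();
  let ok = true;
  results.forEach((entry, i) => {
    if (entry.error) {
      console.error(`[shadow-log] ${commands[i][0]} failed: ${entry.error}`);
      ok = false;
    }
  });
  return ok;
}
```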
* docs(scoring): fix stale v1 references + clarify two-copy source-tiers mirror
Two follow-up review findings on PR #3069:
P2 — Source-tier "single source of truth" comments were outdated.
PR #3069 ships TWO JSON copies (shared/source-tiers.json for Vercel edge +
main relay container, scripts/shared/source-tiers.json for Railway services
using rootDirectory=scripts). Comments in server/_shared/source-tiers.ts
and scripts/ais-relay.cjs now explicitly document the mirror setup and
point at the two tests that enforce byte-identity: the existing
scripts-shared-mirror test (tests/edge-functions.test.mjs:37-48) and a
new explicit cross-check in tests/importance-score-parity.test.mjs.
Adding the assertion here is belt-and-suspenders: if edge-functions.test.mjs
is ever narrowed, the parity test still catches drift.
P3 — Stale v1 references in shared metadata/docs. The actual writer
moved to shadow:score-log:v2 in notification-relay.cjs, but
server/_shared/cache-keys.ts:23-31 still documented v1 and exported the
v1 string. No runtime impact (the export has zero importers — relay
uses its own local const) but misleading. Updated the doc block to
explain both v1 (legacy, self-pruning) and v2 (current), bumped the
constant to v2, and added a comment that notification-relay.cjs is the
owner. Header comment in scripts/shadow-score-report.mjs now documents
the SHADOW_SCORE_KEY override path too.
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
|