eliott/worldmonitor - worldmonitor - lab48

mirror of https://github.com/koala73/worldmonitor.git synced 2026-04-25 17:14:57 +02:00

Author	SHA1	Message	Date
Elie Habib	044598346e	feat(seed-contract): PR 2a — runSeed envelope dual-write + 91 seeders migrated (#3097) * feat(seed-contract): PR 2a — runSeed envelope dual-write + 91 seeders migrated Opt-in contract path in runSeed: when opts.declareRecords is provided, write {_seed, data} envelope to the canonical key alongside legacy seed-meta:* (dual-write). State machine: OK / OK_ZERO / RETRY with zeroIsValid opt. declareRecords throws or returns non-integer → hard fail (contract violation). extraKeys[] support per-key declareRecords; each extra key writes its own envelope. Legacy seeders (no declareRecords) entirely unchanged. Migrated all 91 scripts/seed-.mjs to contract mode. Each exports declareRecords returning the canonical record count, and passes schemaVersion: 1 + maxStaleMin (matched to api/health.js SEED_META, or 2.5x interval where no registry entry exists). Contract conformance reports 84/86 seeders with full descriptor (2 pre-existing warnings). Legacy seed-meta keys still written so unmigrated readers keep working; follow-up slices flip health.js + readers to envelope-first. Tests: 61/61 PR 1 tests still pass. Next slices for PR 2: - api/health.js registry collapse + 15 seed-bundle-.mjs canonicalKey wiring - reader migration (mcp, resilience, aviation, displacement, regional-snapshot) - direct writers — ais-relay.cjs, consumer-prices-core publish.ts - public-boundary stripSeedEnvelope + test migration Plan: docs/plans/2026-04-14-002-fix-runseed-zero-record-lockout-plan.md fix(seed-contract): unwrap envelopes in internal cross-seed readers After PR 2a enveloped 91 canonical keys as {_seed, data}, every script-side reader that returned the raw parsed JSON started silently handing callers the envelope instead of the bare payload.
WoW baselines (bigmac, grocery-basket, fear-greed) saw undefined .countries / .composite; seed-climate-anomalies saw undefined .normals from climate:zone-normals:v1; seed-thermal-escalation saw undefined .fireDetections from wildfire:fires:v1; seed-forecasts' ~40-key pipeline batch returned envelopes for every input. Fix: route every script-side reader through unwrapEnvelope(...).data. Legacy bare-shape values pass through unchanged (unwrapEnvelope returns {_seed: null, data: raw} for any non-envelope shape). Changed: - scripts/_seed-utils.mjs: import unwrapEnvelope; redisGet, readSeedSnapshot, verifySeedKey all unwrap. Exported new readCanonicalValue() helper for cross-seed consumers. - 18 seed-.mjs scripts with local redisGet-style helpers or inline fetch patched to unwrap via the envelope source module (subagent sweep). - scripts/seed-forecasts.mjs pipeline batch: parse() unwraps each result. - scripts/seed-energy-spine.mjs redisMget: unwraps each result. Tests: - tests/seed-utils-envelope-reads.test.mjs: 7 new cases covering envelope + legacy + null paths for readSeedSnapshot and verifySeedKey. - Full seed suite: 67/67 pass (was 61, +6 new). Addresses both of user's P1 findings on PR #3097. feat(seed-contract): envelope-aware reads in server + api helpers Every RPC and public-boundary reader now automatically strips _seed from contract-mode canonical keys. Legacy bare-shape values pass through unchanged (unwrapEnvelope no-ops on non-envelope shapes). Changed helpers (one-place fix — unblocks ~60 call sites): - server/_shared/redis.ts: getRawJson, getCachedJson, getCachedJsonBatch unwrap by default. cachedFetchJson inherits via getCachedJson. - api/_upstash-json.js: readJsonFromUpstash unwraps (covers api/mcp.ts tool responses + all its canonical-key reads). - api/bootstrap.js: getCachedJsonBatch unwraps (public-boundary — clients never see envelope metadata). 
Left intentionally unchanged: - api/health.js / api/seed-health.js: read only seed-meta:* keys which remain bare-shape during dual-write. unwrapEnvelope already imported at the meta-read boundary (PR 1) as a defensive no-op. Tests: 67/67 seed tests pass. typecheck + typecheck:api clean. This is the blast-radius fix the PR #3097 review called out — external readers that would otherwise see {_seed, data} after the writer side migrated. * fix(test): strip export keyword in vm.runInContext'd seed source cross-source-signals-regulatory.test.mjs loads scripts/seed-cross-source-signals.mjs via vm.runInContext, which cannot parse ESM `export` syntax. PR 2a added `export function declareRecords` to every seeder, which broke this test's static-analysis approach. Fix: strip the `export` keyword from the declareRecords line in the preprocessed source string so the function body still evaluates as a plain declaration. Full test:data suite: 5307/5307 pass. typecheck + typecheck:api clean. * feat(seed-contract): consumer-prices publish.ts writes envelopes Wrap the 5 canonical keys written by consumer-prices-core/src/jobs/publish.ts (overview, movers:7d/30d, freshness, categories:7d/30d/90d, retailer-spread, basket-series) in {_seed, data} envelopes. Legacy seed-meta:<key> writes preserved for dual-write. Inlined a buildEnvelope helper (10 lines) rather than taking a cross-package dependency — consumer-prices-core is a standalone npm package. Documented the four-file parity contract (mjs source, ts mirror, js edge mirror, this copy). Contract fields: sourceVersion='consumer-prices-core-publish-v1', schemaVersion=1, state='OK' (recordCount>0) or 'OK_ZERO' (legitimate zero). Typecheck: no new errors in publish.ts. * fix(seed-contract): 3 more server-side readers unwrap envelopes Found during final audit: - server/worldmonitor/resilience/v1/_shared.ts: resilience score reader parsed cached GetResilienceScoreResponse raw. Contract-mode seed-resilience-scores now envelopes those keys. 
- server/worldmonitor/resilience/v1/get-resilience-ranking.ts: p05/p95 interval lookup parsed raw from seed-resilience-scores' extra-key path. - server/worldmonitor/infrastructure/v1/_shared.ts: mgetJson() used for count-source keys (wildfire:fires:v1, news:insights:v1) which are both contract-mode now. All three now unwrap via server/_shared/seed-envelope. Legacy shapes pass through unchanged. Typecheck clean. * feat(seed-contract): ais-relay.cjs direct writes produce envelopes 32 canonical-key write sites in scripts/ais-relay.cjs now produce {_seed, data} envelopes. Inlined buildEnvelope() (CJS module can't require ESM source) + envelopeWrite(key, data, ttlSeconds, meta) wrapper. Enveloped keys span market bootstrap, aviation, cyber-threats, theater-posture, weather-alerts, economic spending/fred/worldbank, tech-events, corridor-risk, usni-fleet, shipping-stress, social:reddit, wsb-tickers, pizzint, product-catalog, chokepoint transits, ucdp-events, satellites, oref. Left bare (not seeded data keys): seed-meta:* (dual-write legacy), classifyCacheKey LLM cache, notam:prev-closed-state internal state, wm:notif:scan-dedup flags. Updated tests/ucdp-seed-resilience.test.mjs regex to accept both upstashSet (pre-contract) and envelopeWrite (post-contract) call patterns. * feat(seed-contract): 15 bundle files add canonicalKey for envelope gate 54 bundle sections across 12 files now declare canonicalKey alongside the existing seedMetaKey. _bundle-runner.mjs (from PR 1) prefers canonicalKey when both are present — gates section runs on envelope._seed.fetchedAt read directly from the data key, eliminating the meta-outlives-data class of bugs. 
Files touched: - climate (5), derived-signals (2), ecb-eu (3), energy-sources (6), health (2), imf-extended (4), macro (10), market-backup (9), portwatch (4), relay-backup (2), resilience-recovery (5), static-ref (2) Skipped (14 sections, 3 whole bundles): multi-key writers, dynamic templated keys (displacement year-scoped), or non-runSeed orchestrators (regional brief cron, resilience-scores' 222-country publish, validation/ benchmark scripts). These continue to use seedMetaKey or their own gate. seedMetaKey preserved everywhere — dual-write. _bundle-runner.mjs falls back to legacy when canonicalKey is absent. All 15 bundles pass node --check. test:data: 5307/5307. typecheck:all: clean. * fix(seed-contract): 4 PR #3097 review P1s — transform/declareRecords mismatches + envelope leaks Addresses both P1 findings and the extra-key seed-meta leak surfaced in review: 1. runSeed helper-level invariant: seed-meta:* keys NEVER envelope. scripts/_seed-utils.mjs exports shouldEnvelopeKey(key) — returns false for any key starting with 'seed-meta:'. Both atomicPublish (canonical) and writeExtraKey (extras) gate the envelope wrap through this helper. Fixes seed-iea-oil-stocks' ANALYSIS_META_EXTRA_KEY silently getting enveloped, which broke health.js parsing the value as bare {fetchedAt, recordCount}. Also defends against any future manual writeExtraKey(..., envelopeMeta) call that happens to target a seed-meta:* key. 2. seed-token-panels canonical + extras fixed. publishTransform returns data.defi (the defi panel itself, shape {tokens}). Old declareRecords counted data.defi.tokens + data.ai.tokens + data.other.tokens on the transformed payload → 0 → RETRY path → canonical market:defi-tokens:v1 never wrote, and because runSeed returned before the extraKeys loop, market:ai-tokens:v1 + market:other-tokens:v1 stayed stale too. New: declareRecords counts data.tokens on the transformed shape. 
AI_KEY + OTHER_KEY extras reuse the same function (transforms return structurally identical panels). Added isMain guard so test imports don't fire runSeed. 3. api/product-catalog.js cached reader unwraps envelope. ais-relay.cjs now envelopes product-catalog:v2 via envelopeWrite(). The edge reader did raw JSON.parse(result) and returned {_seed, data} to clients, breaking the cached path. Fix: import unwrapEnvelope from ./_seed-envelope.js, apply after JSON.parse. One site — :238-241 is downstream of getFromCache(), so the single reader fix covers both. 4. Regression lock tests/seed-contract-transform-regressions.test.mjs (11 cases): - shouldEnvelopeKey invariant: seed-meta:* false, canonical true - Token-panels declareRecords works on transformed shape (canonical + both extras) - Explicit repro of pre-fix buggy signature returning 0 — guards against revert - resolveRecordCount accepts 0, rejects non-integer - Product-catalog envelope unwrap returns bare shape; legacy passes through Verification: - npm run test:data → 5318/5318 pass (was 5307 — 11 new regressions) - npm run typecheck:all → clean - node --check on every modified script iea-oil-stocks canonical declareRecords was NOT broken (user confirmed during review — buildIndex preserves .members); only its ANALYSIS_META_EXTRA_KEY was affected, now covered generically by commit 1's helper invariant. * fix(seed-contract): seed-token-panels validateFn also runs on post-transform shape Review finding: fixing declareRecords wasn't sufficient — atomicPublish() runs validateFn(publishData) on the transformed payload too. seed-token-panels' validate() checked data.defi/.ai/.other on the transformed {tokens} shape, returned false, and runSeed took the early skipped-write branch (before even reaching the declareRecords RETRY logic). Net effect: same as before the declareRecords fix — canonical + both extras stayed stale. 
Fix: validate() now checks the canonical defi panel directly (Array.isArray (data?.tokens) && has at least one t.price > 0). AI/OTHER panels are validated implicitly by their own extraKey declareRecords on write. Audited the other 9 seeders with publishTransform (bls-series, bis-extended, bis-data, gdelt-intel, trade-flows, iea-oil-stocks, jodi-gas, sanctions-pressure, forecasts): all validateFn's correctly target the post-transform shape. Only token-panels regressed. Added 4 regression tests (tests/seed-contract-transform-regressions.test.mjs): - validate accepts transformed panel with priced tokens - validate rejects all-zero-price tokens - validate rejects empty/missing tokens - Explicit pre-fix repro (buggy old signature fails on transformed shape) Verification: - npm run test:data → 5322/5322 pass (was 5318; +4 new) - npm run typecheck:all → clean - node --check clean * feat(seed-contract): add /api/seed-contract-probe validation endpoint Single machine-readable gate for 'is PR #3097 working in production'. Replaces the curl/jq ritual with one authenticated edge call that returns HTTP 200 ok:true or 503 + failing check list. What it validates: - 8 canonical keys have {_seed, data} envelopes with required data fields and minRecords floors (fsi-eu, zone-normals, 3 token panels + minRecords guard against token-panels RETRY regression, product-catalog, wildfire, earthquakes). - 2 seed-meta:* keys remain BARE (shouldEnvelopeKey invariant; guards against iea-oil-stocks ANALYSIS_META_EXTRA_KEY-class regressions). - /api/product-catalog + /api/bootstrap responses contain no '_seed' leak. Auth: x-probe-secret header must match RELAY_SHARED_SECRET (reuses existing Vercel↔Railway internal trust boundary). Probe logic is exported (checkProbe, checkPublicBoundary, DEFAULT_PROBES) for hermetic testing. 
tests/seed-contract-probe.test.mjs covers every branch: envelope pass/fail on field/records/shape, bare pass/fail on shape/field, missing/malformed JSON, Redis non-2xx, boundary seed-leak detection, DEFAULT_PROBES sanity (seed-meta invariant present, token-panels minRecords guard present). Usage: curl -H "x-probe-secret: $RELAY_SHARED_SECRET" \ https://api.worldmonitor.app/api/seed-contract-probe PR 3 will extend the probe with a stricter mode that asserts seed-meta:* keys are GONE (not just bare) once legacy dual-write is removed. Verification: - tests/seed-contract-probe.test.mjs → 15/15 pass - npm run test:data → 5338/5338 (was 5322; +16 new incl. conformance) - npm run typecheck:all → clean * fix(seed-contract): tighten probe — minRecords on AI/OTHER + cache-path source header Review P2 findings: the probe's stated guards were weaker than advertised. 1. market:ai-tokens:v1 + market:other-tokens:v1 probes claimed to guard the token-panels extra-key RETRY regression but only checked shape='envelope' + dataHas:['tokens']. If an extra-key declareRecords regressed to 0, both probes would still pass because checkProbe() only inspects _seed.recordCount when minRecords is set. Now both enforce minRecords: 1. 2. /api/product-catalog boundary check only asserted no '_seed' leak — which is also true for the static fallback path. A broken cached reader (getFromCache returning null or throwing) could serve fallback silently and still pass this probe. Now: - api/product-catalog.js emits X-Product-Catalog-Source: cache|dodo|fallback on the response (the json() helper gained an optional source param wired to each of the three branches). - checkPublicBoundary declaratively requires that header's value match 'cache' for /api/product-catalog, so a fallback-serve fails the probe with reason 'source:fallback!=cache' or 'source:missing!=cache'.
Test updates (tests/seed-contract-probe.test.mjs): - Boundary check reworked to use a BOUNDARY_CHECKS config with optional requireSourceHeader per endpoint. - New cases: served-from-cache passes, served-from-fallback fails with source mismatch, missing header fails, seed-leak still takes precedence, bad status fails. - Token-panels sanity test now asserts minRecords≥1 on all 3 panels. Verification: - tests/seed-contract-probe.test.mjs → 17/17 pass (was 15, +2 net) - npm run test:data → 5340/5340 - npm run typecheck:all → clean	2026-04-15 09:16:27 +04:00
Elie Habib	249c088639	fix: add fetch error cause logging to all remaining seed scripts (#1643)	2026-03-15 12:02:37 +04:00
Elie Habib	fe67111dc9	feat: harness engineering P0 - linting, testing, architecture docs (#1587 ) * feat: harness engineering P0 - linting, testing, architecture docs Add foundational infrastructure for agent-first development: - AGENTS.md: agent entry point with progressive disclosure to deeper docs - ARCHITECTURE.md: 12-section system reference with source-file refs and ownership rule - Biome 2.4.7 linter with project-tuned rules, CI workflow (lint-code.yml) - Architectural boundary lint enforcing forward-only dependency direction (lint-boundaries.mjs) - Unit test CI workflow (test.yml), all 1083 tests passing - Fixed 9 pre-existing test failures (bootstrap sync, deploy-config headers, globe parity, redis mocks, geometry URL, import.meta.env null safety) - Fixed 12 architectural boundary violations (types moved to proper layers) - Added 3 missing cache tier entries in gateway.ts - Synced cache-keys.ts with bootstrap.js - Renamed docs/architecture.mdx to "Design Philosophy" with cross-references - Deprecated legacy docs/Docs_To_Review/ARCHITECTURE.md - Harness engineering roadmap tracking doc * fix: address PR review feedback on harness-engineering-p0 - countries-geojson.test.mjs: skip gracefully when CDN unreachable instead of failing CI on network issues - country-geometry-overrides.test.mts: relax timing assertion (250ms -> 2000ms) for constrained CI environments - lint-boundaries.mjs: implement the documented api/ boundary check (was documented but missing, causing false green) * fix(lint): scan api/ .ts files in boundary check The api/ boundary check only scanned .js/.mjs files, missing the 25 sebuf RPC .ts edge functions. 
Now scans .ts files with correct rules: - Legacy .js: fully self-contained (no server/ or src/ imports) - RPC .ts: may import server/ and src/generated/ (bundled at deploy), but blocks imports from src/ application code * fix(lint): detect import() type expressions in boundary lint - Move AppContext back to app/app-context.ts (aggregate type that references components/services/utils belongs at the top, not types/) - Move HappyContentCategory and TechHQ to types/ (simple enums/interfaces) - Boundary lint now catches import('@/layer') expressions, not just from '@/layer' imports - correlation-engine imports of AppContext marked boundary-ignore (type-only imports of top-level aggregate)	2026-03-14 21:29:21 +04:00
Elie Habib	9211339d1c	fix(seeds): prevent API quota burn and respect rate limits (#1167 ) * fix(cyber): prevent AbuseIPDB quota burn when Redis rate check fails The catch block in fetchAbuseIpDb() was falling through to the API call when the Redis rate-limit check failed (e.g. Redis down, first run with no key). With a 10-minute cron interval, this could exhaust the 100 calls/day free-plan limit in under 17 hours. Now returns early with { ok: false, threats: [] } so the other 4 IOC sources still seed normally while AbuseIPDB is safely skipped. * fix(seeds): respect API rate limits and log fetch failures 1. seed-fire-detections.mjs: increase delay from 200ms to 6s between FIRMS API calls. Free tier allows 10 req/min; 27 calls at 200ms exceeded this and caused silent failures. 2. ais-relay.cjs (positive events): increase GDELT delay from 500ms to 5.5s to respect the documented 1 req/5s rate limit. 3. ais-relay.cjs (cyber fetchers): replace 5 silent `catch { return [] }` blocks with `console.warn` logging so failures are visible in Railway logs. Dead code today (cyber loop disabled) but sets the right example for contributors. * fix(seeds): extend FIRMS lock TTL and restore AbuseIPDB resilience P1: seed-fire-detections.mjs — the 6s FIRMS pacing makes the job take ~162s minimum, exceeding the default 120s lock TTL. Extend lockTtlMs to 300s (5 min) to prevent overlapping cron invocations. P2: seed-cyber-threats.mjs — revert the early return on Redis rate-check failure. A transient Redis blip should not permanently disable AbuseIPDB for that run. Instead, log a warning and proceed with caution. The 2h rate-limit interval + 10-min cron means at most 1 extra call per Redis outage window, well within the 100/day budget. * fix(wildfire): extend lock TTL to 10 min for worst-case FIRMS timeouts 27 calls × (6s pacing + 30s per-request timeout) = 972s worst case. 300s lock was still too short under partial upstream slowness.	2026-03-07 10:51:45 +04:00
Elie Habib	804e4128f6	fix(cyber): suppress MaxListenersExceededWarning in GeoIP hydration (#1120) setMaxListeners on AbortSignal to match concurrent fetch count, preventing 100+ warning lines in Railway logs.	2026-03-06 13:53:27 +04:00
Elie Habib	5e25bb1386	fix(health): resolve all critical health check failures (#1111 ) ## Summary - Reclassify 10 on-demand keys (BIS, supply chain, theater posture, etc.) from BOOTSTRAP → STANDALONE + ON_DEMAND to stop false CRITs - Fix seed-insights Railway OOM by correcting service-level settings - Unify LLM fallback chain (Groq → OpenRouter → Ollama) in seed-insights - Switch OpenRouter model to `openai/gpt-oss-safeguard-20b:nitro` - Fix GDELT v2/geo → v1/gkg_geojson for unrestEvents and positiveGeoEvents (v2 endpoint is dead) - Add seed-meta writes for marketQuotes/commodityQuotes in AIS relay (zero extra Yahoo calls) - Remove aggressive coord filter in cyber threats that dropped all threats when GeoIP rate-limited ## Health impact - 6 false CRITs → eliminated (reclassified as on-demand) - marketQuotes/commodityQuotes STALE_SEED → OK (seed-meta tracking) - unrestEvents EMPTY_DATA → OK (GDELT v1 fix) - positiveGeoEvents EMPTY_DATA → OK (GDELT v1 fix in relay) - cyberThreats resilience improved (coord filter removal)	2026-03-06 13:49:15 +04:00
Elie Habib	478df641fa	fix: rate-guard AbuseIPDB calls and disable duplicate cyber seed loop (#1055) Root cause: AbuseIPDB has 100 calls/day limit. The cyber seed cron runs every 2h with a 2h TTL — tight race causes Vercel handler fallthrough to live fetches when the key expires between cron runs. Three fixes: 1. Rate-guard AbuseIPDB in seed-cyber-threats.mjs: checks Redis key `rate:abuseipdb:last-call` before calling API, uses cached threats from `cache:abuseipdb:threats` between calls (2h minimum interval) 2. Disable duplicate cyber seed loop in ais-relay.cjs (standalone cron handles it — avoids 12 extra AbuseIPDB calls/day) 3. Increase seed TTL from 2h to 3h to survive 1 missed cron cycle	2026-03-05 14:37:21 +04:00
Elie Habib	78a14306d9	feat: add seed-first pattern to 15 RPC handlers with Railway seed scripts (#989) Migrate handlers from direct external API calls to seed-first pattern: Railway cron seeds Redis → handlers read from Redis → fallback to live fetch if seed stale and SEED_FALLBACK_* env enabled. Handlers updated: earthquakes, fire-detections, internet-outages, climate-anomalies, unrest-events, cyber-threats, market-quotes, commodity-quotes, crypto-quotes, etf-flows, gulf-quotes, stablecoin-markets, natural-events, displacement-summary, risk-scores. Also adds: - scripts/_seed-utils.mjs (shared seed framework with atomic publish, distributed locks, retry, freshness metadata) - 13 seed scripts for Railway cron - api/seed-health.js monitoring endpoint - scripts/validate-seed-migration.mjs post-deploy validation - Restored multi-source CII in get-risk-scores (8 sources: ACLED, UCDP, outages, climate, cyber, fires, GPS, Iran)	2026-03-04 17:37:15 +04:00
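
The envelope contract described in commit 044598346e can be illustrated with a minimal sketch. The function names (`shouldEnvelopeKey`, `buildEnvelope`, `unwrapEnvelope`) and the `_seed` fields (`fetchedAt`, `recordCount`, `schemaVersion`, `state`) are taken from the commit message, but the bodies below are assumptions written for illustration. They are not the repository's code, and the argument order of `buildEnvelope` and the exact envelope-detection test are hypothetical.

```javascript
// Illustrative sketch only: hypothetical bodies for helpers named in the log.

// Invariant from the log: seed-meta:* keys are never enveloped.
function shouldEnvelopeKey(key) {
  return !key.startsWith('seed-meta:');
}

// Wrap a payload as {_seed, data}. The (data, recordCount, meta) signature
// is an assumption. A non-integer record count is a contract violation.
function buildEnvelope(data, recordCount, meta = {}) {
  if (!Number.isInteger(recordCount)) {
    throw new Error('contract violation: recordCount must be an integer');
  }
  return {
    _seed: {
      fetchedAt: meta.fetchedAt ?? Date.now(),
      recordCount,
      schemaVersion: meta.schemaVersion ?? 1,
      // 'OK' when records exist, 'OK_ZERO' for a legitimate zero.
      state: recordCount > 0 ? 'OK' : 'OK_ZERO',
    },
    data,
  };
}

// Return {_seed, data} for envelopes; legacy bare shapes pass through
// as {_seed: null, data: raw}, as the log describes.
function unwrapEnvelope(raw) {
  const isEnvelope =
    raw !== null &&
    typeof raw === 'object' &&
    !Array.isArray(raw) &&
    raw._seed !== null &&
    typeof raw._seed === 'object' &&
    'data' in raw;
  return isEnvelope
    ? { _seed: raw._seed, data: raw.data }
    : { _seed: null, data: raw };
}

// Usage: a reader always goes through unwrapEnvelope(...).data, so it
// works during dual-write whether the key holds an envelope or a bare value.
const stored = buildEnvelope({ tokens: [{ price: 1.2 }] }, 1);
const viaEnvelope = unwrapEnvelope(stored).data;        // { tokens: [...] }
const viaLegacy = unwrapEnvelope({ tokens: [] }).data;  // { tokens: [] }
```

Routing every reader through one unwrap helper is what lets writers migrate key by key: callers never need to know which shape is currently stored.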