Commit Graph

5 Commits

Author SHA1 Message Date
Elie Habib
429b1d99dd Revert "feat: seed orchestrator with auto-seeding, persistence, and managemen…" (#2060)
This reverts commit bb79386d24.
2026-03-22 19:59:42 +04:00
Jon Torrez
bb79386d24 feat: seed orchestrator with auto-seeding, persistence, and management CLI (#1940)
* feat(docker): enable Redis RDB persistence, add seed-orchestrator to supervisord, copy scripts into image

* feat(orchestrator): add prefixed logger utility

* feat(orchestrator): add seed-meta read/write helpers

* feat(orchestrator): add child process runner with timeout support

* feat(orchestrator): add central seed catalog with 42 seeders

* feat(orchestrator): implement seed orchestrator with tiered scheduling

Adds scripts/seed-orchestrator.mjs with:
- classifySeeders: classify seeders into active/skipped by env vars
- buildStartupSummary: human-readable startup report
- Tiered cold start (hot/warm/cold/frozen with per-tier concurrency)
- Freshness check via seed-meta keys before running stale seeders
- Steady-state scheduling with setTimeout-based recurring timers
- Overlap protection, retry-after-60s, consecutive failure demotion
- Global concurrency cap of 5 with queue-based overflow
- Graceful shutdown on SIGTERM/SIGINT (15s drain timeout)
- Meta writing for null-metaKey seeders to seed-meta:orchestrator:{name}

* fix(seeds): use local API for warm-ping seeders in Docker mode

* fix(orchestrator): allow ACLED email+password as alternative to access token

* feat(wmsm): add seed manager CLI scaffold with help, catalog, and checks

* feat(wmsm): implement status command with freshness display

* feat(wmsm): implement schedule command with next-run estimates

* feat(wmsm): implement refresh command with single and --all modes

* feat(wmsm): implement flush and logs commands

* fix(wmsm): auto-detect docker vs podman runtime

* feat(orchestrator): extract pure scheduling functions and add test harness

* feat(orchestrator): add SEED_TURBO=real|dry mode with compressed intervals

* feat(orchestrator): add SEED_TURBO env passthrough and fix retry log message
2026-03-22 19:51:03 +04:00
Elie Habib
65543d71d5 fix(seeds): improve resilience and fix dead APIs across seed scripts (#1644)
* fix(seeds): improve resilience and fix dead APIs across seed scripts

- Fix wrong domain in seed-service-statuses (worldmonitor.app to api.worldmonitor.app)
- Fix Kalshi API domain migration (trading-api.kalshi.com to api.elections.kalshi.com)
- Replace dead trending APIs (gitterapp.com, herokuapp.com) with OSSInsight + GitHub Search
- Fix case-sensitive HTML detection in seed-usni-fleet (lowercase doctype not matched)
- Add Promise.allSettled rejection logging across 8 seed scripts
- Wrap fetch loops in try-catch (seed-supply-chain-trade, seed-economy) so a single
  network error no longer kills the entire function
- Update list-trending-repos.ts RPC handler to match seed changes

* fix(seeds): correct OSSInsight response parsing and period-aware GitHub Search fallback

- OSSInsight returns {data: {rows: [...]}} not {data: [...]}, fix both seed and handler
- GitHub Search fallback now respects period parameter (daily=1d, weekly=7d, monthly=30d)

* fix(seeds): correct OSSInsight period values (past_week/past_month, not past_7_days/past_28_days)
2026-03-15 13:31:41 +04:00
Elie Habib
ba3f94133c fix(seeds): add Origin header to warm-ping scripts for CORS bypass (#1619)
Warm-ping requests from Railway IPs were getting Cloudflare bot
challenge HTML instead of JSON. The relay's service-statuses warm-ping
works because it sends Origin: 'https://worldmonitor.app'. Added the
same header to seed-infra.mjs and seed-military-maritime-news.mjs.

Fixes: usniFleet STALE_SEED, cableHealth EMPTY_ON_DEMAND
2026-03-15 03:15:30 +04:00
Elie Habib
485d416065 feat(seeds): Railway seed scripts for all unseeded Vercel RPC endpoints (#1599)
* feat(seeds): add Railway seed scripts for economic and trade endpoints

Two new seed scripts to eliminate Vercel edge external API calls:

seed-economy.mjs:
- EIA energy prices (WTI, Brent) -> economic:energy:v1:all
- EIA energy capacity (Solar, Wind, Coal) -> economic:capacity:v1:COL,SUN,WND:20
- FRED series (10 series) -> economic:fred:v1:<id>:120
- Macro signals (Yahoo, Alternative.me, Mempool) -> economic:macro-signals:v1

seed-supply-chain-trade.mjs:
- Shipping rates (FRED) -> supply_chain:shipping:v2
- Trade barriers (WTO tariff gap) -> trade:barriers:v1:tariff-gap:50
- Trade restrictions (WTO MFN overview) -> trade:restrictions:v1:tariff-overview:50
- Trade flows (WTO, 15 major reporters) -> trade:flows:v1:<reporter>:000:10
- Tariff trends (WTO, 15 major reporters) -> trade:tariffs:v1:<reporter>:all:10

Cache keys match handler patterns exactly so cachedFetchJson finds
pre-seeded data and avoids live external API calls from Vercel edge.

* feat(seeds): add seed-aviation.mjs for airport ops and aviation news

Seeds 2 aviation endpoints with predictable default params:
- getAirportOpsSummary (AviationStack + NOTAM) -> aviation:ops-summary:v1:CDG,ESB,FRA,IST,LHR,SAW
- listAviationNews (9 RSS feeds, 24h window) -> aviation:news::24:v1

NOT seeded (inherently on-demand, user-specific inputs):
- getFlightStatus: specific flight number lookup
- trackAircraft: bounding-box or icao24 queries
- listAirportFlights: arbitrary airport+direction+limit combos
- getCarrierOps: depends on listAirportFlights with variable params

* feat(seeds): add seed-conflict-intel.mjs for ACLED, HAPI, and PizzINT

Seeds 3 conflict/intelligence endpoints with predictable default params:
- listAcledEvents (all countries, last 30 days) -> conflict:acled:v1:all:0:0
- getHumanitarianSummary (20 top conflict countries) -> conflict:humanitarian:v1:<CC>
- getPizzintStatus (base + GDELT variants) -> intel:pizzint:v1:base, intel:pizzint:v1:gdelt

NOT seeded (inherently on-demand, LLM or user-specific inputs):
- classifyEvent: per-headline LLM classification
- deductSituation: per-query LLM deduction
- getCountryIntelBrief: per-country LLM brief with context hash
- getCountryFacts: per-country REST Countries + Wikidata + Wikipedia
- searchGdeltDocuments: per-query GDELT search

Requires: ACLED_EMAIL, ACLED_KEY, UPSTASH_REDIS_REST_URL/TOKEN

* feat(seeds): add seed-research.mjs for arXiv, HN, tech events, trending repos

Seeds 4 research endpoints:
- listArxivPapers (cs.AI, cs.CL, cs.CR) -> research:arxiv:v1:<cat>::50
- listHackernewsItems (top, best feeds) -> research:hackernews:v1:<feed>:30
- listTechEvents (Techmeme ICS + dev.events RSS) -> research:tech-events:v1
- listTrendingRepos (python, javascript, typescript) -> research:trending:v1:<lang>:daily:50

Tech events key is also seeded by the relay, this script provides backup
hydration and ensures the key is warm even if relay hasn't run yet.

Requires: UPSTASH_REDIS_REST_URL/TOKEN

* feat(seeds): add seed-military-maritime-news.mjs for USNI and nav warnings

Seeds 2 endpoints with predictable default params:
- USNI Fleet Report (WordPress JSON API) -> usni-fleet:sebuf:v1 + stale backup
- Navigational Warnings (NGA broadcast, all areas) -> maritime:navwarnings:v1:all

NOT seeded (inherently on-demand):
- getAircraftDetails/batch: per-icao24 Wingbits lookup
- listMilitaryFlights: bounding-box query (quantized 1-degree grid)
- getVesselSnapshot: in-memory cache, reads from relay /ais-snapshot
- listFeedDigest: per-feed-URL RSS caching (hundreds of feeds, relay proxied)
- summarizeArticle: per-article LLM summarization

Requires: UPSTASH_REDIS_REST_URL/TOKEN

* feat(seeds): add seed-infra.mjs warm-ping for service statuses and cable health

Uses warm-ping pattern (calls Vercel RPC from Railway) because:
- list-service-statuses: 30 status page parsers with 8 custom formats
- get-cable-health: NGA text analysis with cable name matching + proximity
Replicating this logic in a standalone script is fragile and duplicative.

NOT seeded (on-demand):
- search-imagery: per-bbox/datetime STAC query
- get-giving-summary: hardcoded baselines, no external fetches
- get-webcam-image: per-webcamId Windy API lookup

* fix(seeds): move secondary key writes before process.exit, fix data shapes

Critical bugs found in code review:

1. runSeed() calls process.exit(0) after primary key write, so .then()
   callbacks were dead code. All secondary keys (FRED, macro signals,
   trade data, HAPI summaries, pizzint, HN, trending, etc.) were NEVER
   written. Fix: move writeExtraKey calls inside fetchAll() before return.

2. FRED cache key used :120 suffix but handler default is :0 (req.limit||0).
   Fixed to :0 so seed matches handler cache key for default requests.

3. USNI and nav warnings seed parsers produced wrong data shapes vs handler
   (different field names, missing fields). Converted to warm-ping pattern
   (like seed-infra.mjs) to avoid shape divergence.

* fix(seeds): reduce GDELT 429 rate limiting in seed-gdelt-intel

Problems from logs: every topic fetch hits 429, runs take 3-5min,
4th run failed fatally after 12min of cascading retries.

Fixes:
- Increase inter-topic delay: 12s -> 20s (GDELT needs longer cooldown)
- Increase initial backoff: 10s -> 20s, with 15s increments per retry
- Graceful degradation: exhausted retries return empty topic instead of
  throwing (prevents withRetry from restarting ALL topics from scratch)
- Align TTL with health.js: 3600s -> 7200s (matches maxStaleMin:120)
- Validation allows partial success (3/6 topics minimum)

Cron interval should also be increased from 30min to 2h on Railway
to match the new 2h TTL.

* fix(seeds): 4 bugs from review - ACLED auth, NOTAM key, infra precedence, curated events

P1: ACLED auth used wrong endpoint (api/acled/token) and env vars (ACLED_KEY).
Fixed to match server/acled-auth.ts: ACLED_EMAIL+ACLED_PASSWORD via /oauth/token,
with ACLED_ACCESS_TOKEN static fallback.

P1: Aviation NOTAM key was aviation:notam-closures:v1, handler reads
aviation:notam:closures:v2. Fixed key to match _shared.ts.

P2: Infra warm-ping had operator precedence bug in nullish coalescing:
(a ?? b) ? c : d instead of a ?? (b ? c : d). Added parens.

P2: Research seed missed curated conferences that the handler appends
(CURATED_EVENTS in list-tech-events.ts). Added same curated events so
seeded data matches what the handler would produce.

* fix(seeds): add seed-meta freshness metadata for all secondary keys

Added writeExtraKeyWithMeta() to _seed-utils.mjs that writes both the
data key and a seed-meta:<key> freshness metadata entry. All secondary
key writes in seed scripts now use this helper so health.js can track
freshness for: energy capacity, FRED series, macro signals, trade
barriers/restrictions/flows/tariffs, aviation news, HAPI summaries,
PizzINT, arXiv categories, HN feeds, tech events, trending repos.

Previously only the primary key per script got seed-meta (via runSeed),
leaving secondary keys operationally invisible to health monitoring.

* fix(seeds): align seed-meta keys with health.js conventions

P1: writeExtraKeyWithMeta wrote seed-meta:<full-cache-key> (e.g.,
seed-meta:economic:macro-signals:v1), but health.js expects normalized
names without version suffixes (seed-meta:economic:macro-signals).
Fixed by stripping trailing :v\d+ from key. Added metaKeyOverride
param for cases needing explicit control.

P1: shipping seed used runSeed('supply-chain', 'shipping-trade', ...)
producing seed-meta:supply-chain:shipping-trade, but health.js expects
seed-meta:supply_chain:shipping. Fixed domain/resource to match.

* fix(seeds): only write seed-meta after successful data key write

writeExtraKey() now returns false on failure. writeExtraKeyWithMeta()
skips seed-meta write when the data write fails, preventing false-positive
health reports for keys like macro-signals and tech-events.
2026-03-15 00:37:31 +04:00