mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-05-13 02:26:22 +02:00
* feat(seeds): add Railway seed scripts for economic and trade endpoints
Two new seed scripts to eliminate Vercel edge external API calls:
seed-economy.mjs:
- EIA energy prices (WTI, Brent) -> economic:energy:v1:all
- EIA energy capacity (Solar, Wind, Coal) -> economic:capacity:v1:COL,SUN,WND:20
- FRED series (10 series) -> economic:fred:v1:<id>:120
- Macro signals (Yahoo, Alternative.me, Mempool) -> economic:macro-signals:v1
seed-supply-chain-trade.mjs:
- Shipping rates (FRED) -> supply_chain:shipping:v2
- Trade barriers (WTO tariff gap) -> trade:barriers:v1:tariff-gap:50
- Trade restrictions (WTO MFN overview) -> trade:restrictions:v1:tariff-overview:50
- Trade flows (WTO, 15 major reporters) -> trade:flows:v1:<reporter>:000:10
- Tariff trends (WTO, 15 major reporters) -> trade:tariffs:v1:<reporter>:all:10
Cache keys match handler patterns exactly so cachedFetchJson finds
pre-seeded data and avoids live external API calls from Vercel edge.
* feat(seeds): add seed-aviation.mjs for airport ops and aviation news
Seeds 2 aviation endpoints with predictable default params:
- getAirportOpsSummary (AviationStack + NOTAM) -> aviation:ops-summary:v1:CDG,ESB,FRA,IST,LHR,SAW
- listAviationNews (9 RSS feeds, 24h window) -> aviation:news::24:v1
NOT seeded (inherently on-demand, user-specific inputs):
- getFlightStatus: specific flight number lookup
- trackAircraft: bounding-box or icao24 queries
- listAirportFlights: arbitrary airport+direction+limit combos
- getCarrierOps: depends on listAirportFlights with variable params
* feat(seeds): add seed-conflict-intel.mjs for ACLED, HAPI, and PizzINT
Seeds 3 conflict/intelligence endpoints with predictable default params:
- listAcledEvents (all countries, last 30 days) -> conflict:acled:v1:all:0:0
- getHumanitarianSummary (20 top conflict countries) -> conflict:humanitarian:v1:<CC>
- getPizzintStatus (base + GDELT variants) -> intel:pizzint:v1:base, intel:pizzint:v1:gdelt
NOT seeded (inherently on-demand, LLM or user-specific inputs):
- classifyEvent: per-headline LLM classification
- deductSituation: per-query LLM deduction
- getCountryIntelBrief: per-country LLM brief with context hash
- getCountryFacts: per-country REST Countries + Wikidata + Wikipedia
- searchGdeltDocuments: per-query GDELT search
Requires: ACLED_EMAIL, ACLED_KEY, UPSTASH_REDIS_REST_URL/TOKEN
* feat(seeds): add seed-research.mjs for arXiv, HN, tech events, trending repos
Seeds 4 research endpoints:
- listArxivPapers (cs.AI, cs.CL, cs.CR) -> research:arxiv:v1:<cat>::50
- listHackernewsItems (top, best feeds) -> research:hackernews:v1:<feed>:30
- listTechEvents (Techmeme ICS + dev.events RSS) -> research:tech-events:v1
- listTrendingRepos (python, javascript, typescript) -> research:trending:v1:<lang>:daily:50
The tech events key is also seeded by the relay; this script provides backup
hydration and ensures the key is warm even if the relay hasn't run yet.
Requires: UPSTASH_REDIS_REST_URL/TOKEN
* feat(seeds): add seed-military-maritime-news.mjs for USNI and nav warnings
Seeds 2 endpoints with predictable default params:
- USNI Fleet Report (WordPress JSON API) -> usni-fleet:sebuf:v1 + stale backup
- Navigational Warnings (NGA broadcast, all areas) -> maritime:navwarnings:v1:all
NOT seeded (inherently on-demand):
- getAircraftDetails/batch: per-icao24 Wingbits lookup
- listMilitaryFlights: bounding-box query (quantized 1-degree grid)
- getVesselSnapshot: in-memory cache, reads from relay /ais-snapshot
- listFeedDigest: per-feed-URL RSS caching (hundreds of feeds, relay proxied)
- summarizeArticle: per-article LLM summarization
Requires: UPSTASH_REDIS_REST_URL/TOKEN
* feat(seeds): add seed-infra.mjs warm-ping for service statuses and cable health
Uses warm-ping pattern (calls Vercel RPC from Railway) because:
- list-service-statuses: 30 status page parsers with 8 custom formats
- get-cable-health: NGA text analysis with cable name matching + proximity
Replicating this logic in a standalone script is fragile and duplicative.
NOT seeded (on-demand):
- search-imagery: per-bbox/datetime STAC query
- get-giving-summary: hardcoded baselines, no external fetches
- get-webcam-image: per-webcamId Windy API lookup
* fix(seeds): move secondary key writes before process.exit, fix data shapes
Critical bugs found in code review:
1. runSeed() calls process.exit(0) after primary key write, so .then()
callbacks were dead code. All secondary keys (FRED, macro signals,
trade data, HAPI summaries, pizzint, HN, trending, etc.) were NEVER
written. Fix: move writeExtraKey calls inside fetchAll() before return.
2. FRED cache key used :120 suffix but handler default is :0 (req.limit||0).
Fixed to :0 so seed matches handler cache key for default requests.
3. USNI and nav warnings seed parsers produced wrong data shapes vs handler
(different field names, missing fields). Converted to warm-ping pattern
(like seed-infra.mjs) to avoid shape divergence.
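Bug 1 above can be modelled in a few lines. This is a simplified, synchronous sketch, not the real `runSeed()` from `_seed-utils.mjs`: `runSeedSketch`, `fakeExit`, and `runScript` are hypothetical names, and the simulated exit throws so the demo can keep running, whereas `process.exit(0)` terminates the process outright.

```javascript
// Sketch only: fakeExit stands in for process.exit(0). It throws so the demo
// can continue; the real call ends the process immediately.
class ExitSignal extends Error {}
const fakeExit = () => { throw new ExitSignal('exit(0)'); };

// Hypothetical model of runSeed(): fetch, write the primary key, then exit.
function runSeedSketch(store, primaryKey, fetchAll, exit) {
  fetchAll();
  store.push(primaryKey);
  exit(); // control never returns to the caller
}

// Runs a script body and swallows only the simulated exit.
function runScript(body) {
  const store = [];
  try {
    body(store);
  } catch (err) {
    if (!(err instanceof ExitSignal)) throw err;
  }
  return store;
}

// Before the fix: the secondary write sits after runSeed, so it is never reached.
const before = runScript((store) => {
  runSeedSketch(store, 'primary', () => ({}), fakeExit);
  store.push('secondary');
});

// After the fix: the secondary write happens inside fetchAll(), ahead of the exit.
const after = runScript((store) => {
  runSeedSketch(store, 'primary', () => {
    store.push('secondary');
    return {};
  }, fakeExit);
});

console.log(before); // [ 'primary' ]
console.log(after);  // [ 'secondary', 'primary' ]
```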
* fix(seeds): reduce GDELT 429 rate limiting in seed-gdelt-intel
Problems from logs: every topic fetch hits 429, runs take 3-5min,
4th run failed fatally after 12min of cascading retries.
Fixes:
- Increase inter-topic delay: 12s -> 20s (GDELT needs longer cooldown)
- Increase initial backoff: 10s -> 20s, with 15s increments per retry
- Graceful degradation: exhausted retries return empty topic instead of
throwing (prevents withRetry from restarting ALL topics from scratch)
- Align TTL with health.js: 3600s -> 7200s (matches maxStaleMin:120)
- Validation allows partial success (3/6 topics minimum)
Cron interval should also be increased from 30min to 2h on Railway
to match the new 2h TTL.
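The new timing can be checked with a little arithmetic. The sketch below uses the delays listed above and assumes three retries per topic (the default in the script); the helper names `backoffMs` and `worstCaseSleepMs` are illustrative and do not exist in the seed script.

```javascript
// Values from the fix list above.
const INITIAL_BACKOFF_MS = 20_000;   // first wait after a 429
const BACKOFF_STEP_MS = 15_000;      // added per further retry
const INTER_TOPIC_DELAY_MS = 20_000; // pause between topics

// Wait before retry number `attempt` (0-based): 20s, 35s, 50s, ...
function backoffMs(attempt) {
  return INITIAL_BACKOFF_MS + attempt * BACKOFF_STEP_MS;
}

// Total time spent sleeping if every request of every topic returns 429.
// The final attempt gives up without sleeping, so only `maxRetries` waits count.
function worstCaseSleepMs(topicCount, maxRetries) {
  let perTopic = 0;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    perTopic += backoffMs(attempt);
  }
  return topicCount * perTopic + (topicCount - 1) * INTER_TOPIC_DELAY_MS;
}

console.log(worstCaseSleepMs(6, 3) / 1000); // 730
```

With six topics and three retries, the sleeps alone add up to 730 seconds, a little over 12 minutes, before any request time is counted.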
* fix(seeds): 4 bugs from review - ACLED auth, NOTAM key, infra precedence, curated events
P1: ACLED auth used wrong endpoint (api/acled/token) and env vars (ACLED_KEY).
Fixed to match server/acled-auth.ts: ACLED_EMAIL+ACLED_PASSWORD via /oauth/token,
with ACLED_ACCESS_TOKEN static fallback.
P1: Aviation NOTAM key was aviation:notam-closures:v1, handler reads
aviation:notam:closures:v2. Fixed key to match _shared.ts.
P2: Infra warm-ping had operator precedence bug in nullish coalescing:
(a ?? b) ? c : d instead of a ?? (b ? c : d). Added parens.
P2: Research seed missed curated conferences that the handler appends
(CURATED_EVENTS in list-tech-events.ts). Added same curated events so
seeded data matches what the handler would produce.
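The precedence issue in the infra fix is easy to reproduce in isolation. In JavaScript, `??` binds more tightly than the conditional operator, so an unparenthesized `a ?? b ? c : d` is parsed as `(a ?? b) ? c : d`. The values below are placeholders, not the actual expressions from seed-infra.mjs.

```javascript
const a = 'explicit'; // a value that is already set
const b = false;

// How the unparenthesized form is parsed: `a` is only used as a truthiness test.
const unparenthesized = a ?? b ? 'c' : 'd';
const buggy = (a ?? b) ? 'c' : 'd';

// Intended meaning: use `a` when it is set, otherwise fall back to the ternary.
const fixed = a ?? (b ? 'c' : 'd');

console.log(unparenthesized); // c
console.log(buggy);           // c
console.log(fixed);           // explicit
```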
* fix(seeds): add seed-meta freshness metadata for all secondary keys
Added writeExtraKeyWithMeta() to _seed-utils.mjs that writes both the
data key and a seed-meta:<key> freshness metadata entry. All secondary
key writes in seed scripts now use this helper so health.js can track
freshness for: energy capacity, FRED series, macro signals, trade
barriers/restrictions/flows/tariffs, aviation news, HAPI summaries,
PizzINT, arXiv categories, HN feeds, tech events, trending repos.
Previously only the primary key per script got seed-meta (via runSeed),
leaving secondary keys operationally invisible to health monitoring.
* fix(seeds): align seed-meta keys with health.js conventions
P1: writeExtraKeyWithMeta wrote seed-meta:<full-cache-key> (e.g.,
seed-meta:economic:macro-signals:v1), but health.js expects normalized
names without version suffixes (seed-meta:economic:macro-signals).
Fixed by stripping trailing :v\d+ from key. Added metaKeyOverride
param for cases needing explicit control.
P1: shipping seed used runSeed('supply-chain', 'shipping-trade', ...)
producing seed-meta:supply-chain:shipping-trade, but health.js expects
seed-meta:supply_chain:shipping. Fixed domain/resource to match.
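The key normalization described above might look roughly like this. It is a sketch under assumptions: the function name `seedMetaKeySketch` is hypothetical, and treating `metaKeyOverride` as the part after the `seed-meta:` prefix is a guess at the helper's contract, not something the commit message states.

```javascript
// Derive the seed-meta key for a cache key by stripping a trailing :v<digits>.
// If metaKeyOverride is given, it wins (assumed semantics, see note above).
function seedMetaKeySketch(cacheKey, metaKeyOverride) {
  if (metaKeyOverride) return `seed-meta:${metaKeyOverride}`;
  return `seed-meta:${cacheKey.replace(/:v\d+$/, '')}`;
}

console.log(seedMetaKeySketch('economic:macro-signals:v1'));
// seed-meta:economic:macro-signals

// The version segment is not trailing here, so the regex leaves the key alone.
console.log(seedMetaKeySketch('conflict:acled:v1:all:0:0'));
// seed-meta:conflict:acled:v1:all:0:0

// Illustrative override; the name health.js actually expects is not given in the source.
console.log(seedMetaKeySketch('conflict:acled:v1:all:0:0', 'conflict:acled'));
// seed-meta:conflict:acled
```

Because only a trailing version suffix is stripped, keys that carry parameters after the version segment come through unchanged, which is presumably the case the override parameter is meant to cover.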
* fix(seeds): only write seed-meta after successful data key write
writeExtraKey() now returns false on failure. writeExtraKeyWithMeta()
skips seed-meta write when the data write fails, preventing false-positive
health reports for keys like macro-signals and tech-events.
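The write ordering can be sketched as follows. This is a synchronous, in-memory model for illustration only: the real helpers talk to Redis asynchronously, and `makeStore`, the `*Sketch` function names, and the shape of the metadata value are all assumptions.

```javascript
// Hypothetical in-memory stand-in for the Redis client; keys listed in
// failingKeys simulate a failed write.
function makeStore(failingKeys = new Set()) {
  const data = new Map();
  return {
    data,
    set(key, value) {
      if (failingKeys.has(key)) return false;
      data.set(key, value);
      return true;
    },
  };
}

// Returns false on failure instead of throwing, as described above.
function writeExtraKeySketch(store, key, value) {
  return store.set(key, value);
}

// Writes seed-meta only after the data key was written successfully.
function writeExtraKeyWithMetaSketch(store, key, value) {
  if (!writeExtraKeySketch(store, key, value)) return false;
  const metaKey = `seed-meta:${key.replace(/:v\d+$/, '')}`;
  store.set(metaKey, { fetchedAt: Date.now() }); // metadata shape is assumed
  return true;
}

const healthy = makeStore();
const broken = makeStore(new Set(['research:tech-events:v1']));

console.log(writeExtraKeyWithMetaSketch(healthy, 'research:tech-events:v1', [])); // true
console.log(writeExtraKeyWithMetaSketch(broken, 'research:tech-events:v1', []));  // false
console.log(broken.data.has('seed-meta:research:tech-events'));                   // false
```

With the failed write, no freshness entry is left behind, so a health check that reads seed-meta sees the key as missing or stale rather than fresh.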
115 lines
4.2 KiB
JavaScript
#!/usr/bin/env node

import { loadEnvFile, CHROME_UA, runSeed, sleep } from './_seed-utils.mjs';

loadEnvFile(import.meta.url);

const CANONICAL_KEY = 'intelligence:gdelt-intel:v1';
const CACHE_TTL = 7200; // 2h, aligns with health.js maxStaleMin:120
const GDELT_DOC_API = 'https://api.gdeltproject.org/api/v2/doc/doc';
const INTER_TOPIC_DELAY_MS = 20_000; // 20s between topics to avoid 429

const INTEL_TOPICS = [
  { id: 'military', query: '(military exercise OR troop deployment OR airstrike OR "naval exercise") sourcelang:eng' },
  { id: 'cyber', query: '(cyberattack OR ransomware OR hacking OR "data breach" OR APT) sourcelang:eng' },
  { id: 'nuclear', query: '(nuclear OR uranium enrichment OR IAEA OR "nuclear weapon" OR plutonium) sourcelang:eng' },
  { id: 'sanctions', query: '(sanctions OR embargo OR "trade war" OR tariff OR "economic pressure") sourcelang:eng' },
  { id: 'intelligence', query: '(espionage OR spy OR intelligence agency OR covert OR surveillance) sourcelang:eng' },
  { id: 'maritime', query: '(naval blockade OR piracy OR "strait of hormuz" OR "south china sea" OR warship) sourcelang:eng' },
];

// Accept only absolute http(s) URLs.
function isValidUrl(str) {
  try {
    const u = new URL(str);
    return u.protocol === 'http:' || u.protocol === 'https:';
  } catch { return false; }
}

// Map a raw GDELT article to the stored shape; returns null if its URL is invalid.
function normalizeArticle(raw) {
  const url = raw.url || '';
  if (!isValidUrl(url)) return null;
  return {
    title: String(raw.title || '').slice(0, 500),
    url,
    source: String(raw.domain || raw.source?.domain || '').slice(0, 200),
    date: String(raw.seendate || ''),
    image: isValidUrl(raw.socialimage || '') ? raw.socialimage : '',
    language: String(raw.language || ''),
    tone: typeof raw.tone === 'number' ? raw.tone : 0,
  };
}

// Fetch up to 10 articles from the last 24h for one topic; throws on a non-2xx response.
async function fetchTopicArticles(topic) {
  const url = new URL(GDELT_DOC_API);
  url.searchParams.set('query', topic.query);
  url.searchParams.set('mode', 'artlist');
  url.searchParams.set('maxrecords', '10');
  url.searchParams.set('format', 'json');
  url.searchParams.set('sort', 'date');
  url.searchParams.set('timespan', '24h');

  const resp = await fetch(url.toString(), {
    headers: { 'User-Agent': CHROME_UA },
    signal: AbortSignal.timeout(15_000),
  });

  if (!resp.ok) throw new Error(`GDELT ${topic.id}: HTTP ${resp.status}`);

  const data = await resp.json();
  const articles = (data.articles || [])
    .map(normalizeArticle)
    .filter(Boolean);

  return {
    id: topic.id,
    articles,
    fetchedAt: new Date().toISOString(),
  };
}

// Retry on HTTP 429 with linear backoff. Any other error, or exhausted retries,
// yields an empty topic instead of throwing.
async function fetchWithRetry(topic, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fetchTopicArticles(topic);
    } catch (err) {
      const is429 = err.message?.includes('429');
      if (!is429 || attempt === maxRetries) {
        // Non-429 error or exhausted retries: return empty rather than killing the whole seed
        console.warn(` ${topic.id}: giving up after ${attempt + 1} attempts (${err.message})`);
        return { id: topic.id, articles: [], fetchedAt: new Date().toISOString() };
      }
      // Start backoff at 20s (GDELT needs longer cooldown than 10s)
      const backoff = 20_000 + attempt * 15_000;
      console.log(` 429 rate-limited, waiting ${backoff / 1000}s...`);
      await sleep(backoff);
    }
  }
}

// Fetch topics sequentially, pausing between them to stay under GDELT's rate limit.
async function fetchAllTopics() {
  const topics = [];
  for (let i = 0; i < INTEL_TOPICS.length; i++) {
    if (i > 0) await sleep(INTER_TOPIC_DELAY_MS);
    console.log(` Fetching ${INTEL_TOPICS[i].id}...`);
    const result = await fetchWithRetry(INTEL_TOPICS[i]);
    console.log(` ${result.articles.length} articles`);
    topics.push(result);
  }
  return { topics, fetchedAt: new Date().toISOString() };
}

// Partial success is acceptable as long as enough topics returned articles.
function validate(data) {
  if (!Array.isArray(data?.topics) || data.topics.length === 0) return false;
  const populated = data.topics.filter((t) => Array.isArray(t.articles) && t.articles.length > 0);
  return populated.length >= 3; // at least 3 of 6 topics must have articles
}

runSeed('intelligence', 'gdelt-intel', CANONICAL_KEY, fetchAllTopics, {
  validateFn: validate,
  ttlSeconds: CACHE_TTL,
  sourceVersion: 'gdelt-doc-v2',
}).catch((err) => {
  console.error('FATAL:', err.message || err);
  process.exit(1);
});