mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-25 17:14:57 +02:00
* feat(seed-contract): PR 2a — runSeed envelope dual-write + 91 seeders migrated
Opt-in contract path in runSeed: when opts.declareRecords is provided, write
{_seed, data} envelope to the canonical key alongside legacy seed-meta:*
(dual-write). State machine: OK / OK_ZERO / RETRY with zeroIsValid opt.
declareRecords throws or returns non-integer → hard fail (contract violation).
extraKeys[*] support per-key declareRecords; each extra key writes its own
envelope. Legacy seeders (no declareRecords) entirely unchanged.
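A minimal sketch of the record-count contract and the OK / OK_ZERO / RETRY decision described above. resolveRecordCount is named later in this PR, but its body here is an assumption. decideSeedState is a hypothetical helper, and the rule "zero is OK_ZERO only when zeroIsValid is set" is inferred from the text, not copied from runSeed.

```javascript
// Sketch only: resolveRecordCount's body and decideSeedState are illustrative assumptions.
function resolveRecordCount(declareRecords, data) {
  let count;
  try {
    count = declareRecords(data);
  } catch (err) {
    // A throwing declareRecords is a contract violation: fail hard, do not retry.
    throw new Error(`contract violation: declareRecords threw: ${err.message}`);
  }
  if (!Number.isInteger(count)) {
    throw new Error(`contract violation: declareRecords returned ${String(count)}`);
  }
  return count; // 0 is accepted here; the state machine decides what it means
}

function decideSeedState(recordCount, { zeroIsValid = false } = {}) {
  if (recordCount > 0) return 'OK';
  // Zero records publish as OK_ZERO only when the seeder opts in; otherwise retry.
  return zeroIsValid ? 'OK_ZERO' : 'RETRY';
}

const declare = (data) => (Array.isArray(data?.outbreaks) ? data.outbreaks.length : 0);
console.log(decideSeedState(resolveRecordCount(declare, { outbreaks: [1, 2] }))); // OK
console.log(decideSeedState(resolveRecordCount(declare, {}))); // RETRY
console.log(decideSeedState(resolveRecordCount(declare, {}), { zeroIsValid: true })); // OK_ZERO
```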
Migrated all 91 scripts/seed-*.mjs to contract mode. Each exports
declareRecords returning the canonical record count, and passes
schemaVersion: 1 + maxStaleMin (matched to api/health.js SEED_META, or 2.5x
interval where no registry entry exists). Contract conformance reports 84/86
seeders with full descriptor (2 pre-existing warnings).
Legacy seed-meta keys still written so unmigrated readers keep working;
follow-up slices flip health.js + readers to envelope-first.
Tests: 61/61 PR 1 tests still pass.
Next slices for PR 2:
- api/health.js registry collapse + 15 seed-bundle-*.mjs canonicalKey wiring
- reader migration (mcp, resilience, aviation, displacement, regional-snapshot)
- direct writers — ais-relay.cjs, consumer-prices-core publish.ts
- public-boundary stripSeedEnvelope + test migration
Plan: docs/plans/2026-04-14-002-fix-runseed-zero-record-lockout-plan.md
* fix(seed-contract): unwrap envelopes in internal cross-seed readers
After PR 2a enveloped 91 canonical keys as {_seed, data}, every script-side
reader that returned the raw parsed JSON started silently handing callers the
envelope instead of the bare payload. WoW baselines (bigmac, grocery-basket,
fear-greed) saw undefined .countries / .composite; seed-climate-anomalies saw
undefined .normals from climate:zone-normals:v1; seed-thermal-escalation saw
undefined .fireDetections from wildfire:fires:v1; seed-forecasts' ~40-key
pipeline batch returned envelopes for every input.
Fix: route every script-side reader through unwrapEnvelope(...).data. Legacy
bare-shape values pass through unchanged (unwrapEnvelope returns
{_seed: null, data: raw} for any non-envelope shape).
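A minimal sketch of the pass-through behaviour described above. It assumes an envelope is any plain object with own `_seed` and `data` properties. The repository's unwrapEnvelope may apply a stricter check.

```javascript
// Sketch of unwrapEnvelope: envelopes split into {_seed, data};
// anything else passes through as {_seed: null, data: raw}.
function isEnvelope(value) {
  return value !== null &&
    typeof value === 'object' &&
    !Array.isArray(value) &&
    Object.prototype.hasOwnProperty.call(value, '_seed') &&
    Object.prototype.hasOwnProperty.call(value, 'data');
}

function unwrapEnvelope(raw) {
  if (isEnvelope(raw)) return { _seed: raw._seed, data: raw.data };
  return { _seed: null, data: raw };
}

// Reader pattern: callers always take .data, so legacy bare values keep working.
const enveloped = { _seed: { recordCount: 2 }, data: { countries: ['US', 'FR'] } };
const legacy = { countries: ['US', 'FR'] };
console.log(unwrapEnvelope(enveloped).data.countries.length); // 2
console.log(unwrapEnvelope(legacy).data.countries.length); // 2
console.log(unwrapEnvelope(null).data); // null
```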
Changed:
- scripts/_seed-utils.mjs: import unwrapEnvelope; redisGet, readSeedSnapshot,
verifySeedKey all unwrap. Exported new readCanonicalValue() helper for
cross-seed consumers.
- 18 seed-*.mjs scripts with local redisGet-style helpers or inline fetch
patched to unwrap via the envelope source module (subagent sweep).
- scripts/seed-forecasts.mjs pipeline batch: parse() unwraps each result.
- scripts/seed-energy-spine.mjs redisMget: unwraps each result.
Tests:
- tests/seed-utils-envelope-reads.test.mjs: 7 new cases covering envelope
+ legacy + null paths for readSeedSnapshot and verifySeedKey.
- Full seed suite: 67/67 pass (was 61, +6 new).
Addresses both of the user's P1 findings on PR #3097.
* feat(seed-contract): envelope-aware reads in server + api helpers
Every RPC and public-boundary reader now automatically strips _seed from
contract-mode canonical keys. Legacy bare-shape values pass through unchanged
(unwrapEnvelope no-ops on non-envelope shapes).
Changed helpers (one-place fix — unblocks ~60 call sites):
- server/_shared/redis.ts: getRawJson, getCachedJson, getCachedJsonBatch
unwrap by default. cachedFetchJson inherits via getCachedJson.
- api/_upstash-json.js: readJsonFromUpstash unwraps (covers api/mcp.ts
tool responses + all its canonical-key reads).
- api/bootstrap.js: getCachedJsonBatch unwraps (public-boundary —
clients never see envelope metadata).
Left intentionally unchanged:
- api/health.js / api/seed-health.js: read only seed-meta:* keys which
remain bare-shape during dual-write. unwrapEnvelope already imported at
the meta-read boundary (PR 1) as a defensive no-op.
Tests: 67/67 seed tests pass. typecheck + typecheck:api clean.
This is the blast-radius fix the PR #3097 review called out — external
readers that would otherwise see {_seed, data} after the writer side
migrated.
* fix(test): strip export keyword in vm.runInContext'd seed source
cross-source-signals-regulatory.test.mjs loads scripts/seed-cross-source-signals.mjs
via vm.runInContext, which cannot parse ESM `export` syntax. PR 2a added
`export function declareRecords` to every seeder, which broke this test's
static-analysis approach.
Fix: strip the `export` keyword from the declareRecords line in the
preprocessed source string so the function body still evaluates as a plain
declaration.
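A sketch of the preprocessing step. It assumes the test holds the seeder source as a string. vm.runInContext, like `new Function`, compiles its input as a classic script rather than an ES module, so an `export` declaration is a SyntaxError in both. The example uses `new Function` only to stay dependency-free. The regex and helper name are hypothetical, not the test's actual code.

```javascript
// Hypothetical preprocessing: drop `export` in front of declareRecords so the
// declaration is valid in a classic (non-module) script.
function stripDeclareRecordsExport(source) {
  return source.replace(/^export\s+(function\s+declareRecords\b)/m, '$1');
}

const seederSource = [
  'export function declareRecords(data) {',
  '  return Array.isArray(data?.items) ? data.items.length : 0;',
  '}',
].join('\n');

// Compiling the raw source as a script fails on `export`...
let rawFails = false;
try { new Function(seederSource); } catch (err) { rawFails = err instanceof SyntaxError; }

// ...but the stripped source compiles, and the declaration can be called.
const stripped = stripDeclareRecordsExport(seederSource);
const declareRecords = new Function(`${stripped}\nreturn declareRecords;`)();
console.log(rawFails, declareRecords({ items: [1, 2, 3] })); // true 3
```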
Full test:data suite: 5307/5307 pass. typecheck + typecheck:api clean.
* feat(seed-contract): consumer-prices publish.ts writes envelopes
Wrap the 5 canonical keys written by consumer-prices-core/src/jobs/publish.ts
(overview, movers:7d/30d, freshness, categories:7d/30d/90d, retailer-spread,
basket-series) in {_seed, data} envelopes. Legacy seed-meta:<key> writes
preserved for dual-write.
Inlined a buildEnvelope helper (10 lines) rather than taking a cross-package
dependency — consumer-prices-core is a standalone npm package. Documented the
four-file parity contract (mjs source, ts mirror, js edge mirror, this copy).
Contract fields: sourceVersion='consumer-prices-core-publish-v1', schemaVersion=1,
state='OK' (recordCount>0) or 'OK_ZERO' (legitimate zero).
Typecheck: no new errors in publish.ts.
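A sketch of what an inlined buildEnvelope could look like with the contract fields listed above. `_seed.fetchedAt` and `_seed.recordCount` are the fields other parts of this PR read (the bundle runner and the probe). The rest of the shape and the signature are assumptions, not the code in publish.ts.

```javascript
// Hypothetical buildEnvelope: wraps a payload with seed metadata.
function buildEnvelope(data, recordCount, { sourceVersion, schemaVersion = 1, now = Date.now() } = {}) {
  if (!Number.isInteger(recordCount)) throw new Error('recordCount must be an integer');
  return {
    _seed: {
      fetchedAt: now,
      recordCount,
      sourceVersion,
      schemaVersion,
      // OK when there is data, OK_ZERO for a legitimate empty publish.
      state: recordCount > 0 ? 'OK' : 'OK_ZERO',
    },
    data,
  };
}

const env = buildEnvelope({ items: [] }, 0, {
  sourceVersion: 'consumer-prices-core-publish-v1',
  now: 1700000000000,
});
console.log(env._seed.state, env._seed.schemaVersion); // OK_ZERO 1
```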
* fix(seed-contract): 3 more server-side readers unwrap envelopes
Found during final audit:
- server/worldmonitor/resilience/v1/_shared.ts: resilience score reader
parsed cached GetResilienceScoreResponse raw. Contract-mode seed-resilience-scores
now envelopes those keys.
- server/worldmonitor/resilience/v1/get-resilience-ranking.ts: p05/p95
interval lookup parsed raw from seed-resilience-scores' extra-key path.
- server/worldmonitor/infrastructure/v1/_shared.ts: mgetJson() used for
count-source keys (wildfire:fires:v1, news:insights:v1) which are both
contract-mode now.
All three now unwrap via server/_shared/seed-envelope. Legacy shapes pass
through unchanged.
Typecheck clean.
* feat(seed-contract): ais-relay.cjs direct writes produce envelopes
32 canonical-key write sites in scripts/ais-relay.cjs now produce {_seed, data}
envelopes. Inlined buildEnvelope() (CJS module can't require ESM source) +
envelopeWrite(key, data, ttlSeconds, meta) wrapper. Enveloped keys span market
bootstrap, aviation, cyber-threats, theater-posture, weather-alerts, economic
spending/fred/worldbank, tech-events, corridor-risk, usni-fleet, shipping-stress,
social:reddit, wsb-tickers, pizzint, product-catalog, chokepoint transits,
ucdp-events, satellites, oref.
Left bare (not seeded data keys): seed-meta:* (dual-write legacy),
classifyCacheKey LLM cache, notam:prev-closed-state internal state,
wm:notif:scan-dedup flags.
Updated tests/ucdp-seed-resilience.test.mjs regex to accept both upstashSet
(pre-contract) and envelopeWrite (post-contract) call patterns.
* feat(seed-contract): 15 bundle files add canonicalKey for envelope gate
54 bundle sections across 12 files now declare canonicalKey alongside the
existing seedMetaKey. _bundle-runner.mjs (from PR 1) prefers canonicalKey
when both are present — gates section runs on envelope._seed.fetchedAt
read directly from the data key, eliminating the meta-outlives-data class
of bugs.
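A sketch of the gate preference described above. A Map stands in for Redis. `intervalMs`, the helper names, and the seed-meta key name are hypothetical. The point is that freshness comes from the data key's own envelope when canonicalKey is declared, so a fresh seed-meta value cannot hide stale data.

```javascript
// Hypothetical sketch of the bundle-runner gate (not _bundle-runner.mjs itself).
function lastFetchedAt(section, store) {
  if (section.canonicalKey) {
    // Preferred: read freshness from the data key's envelope. A missing or bare
    // (pre-contract) value yields null, which makes the section due.
    return store.get(section.canonicalKey)?._seed?.fetchedAt ?? null;
  }
  // Legacy fallback: bare seed-meta value {fetchedAt, recordCount}.
  return store.get(section.seedMetaKey)?.fetchedAt ?? null;
}

function isDue(section, store, now = Date.now()) {
  const fetchedAt = lastFetchedAt(section, store);
  return fetchedAt === null || now - fetchedAt >= section.intervalMs;
}

const store = new Map([
  ['climate:zone-normals:v1', { _seed: { fetchedAt: 1_000 }, data: {} }],
  ['seed-meta:climate:zone-normals', { fetchedAt: 9_000, recordCount: 12 }],
]);
const section = {
  canonicalKey: 'climate:zone-normals:v1',
  seedMetaKey: 'seed-meta:climate:zone-normals',
  intervalMs: 5_000,
};
// The meta key looks fresh (9_000), but the data itself is old (1_000), so the section runs.
console.log(isDue(section, store, 10_000)); // true
```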
Files touched:
- climate (5), derived-signals (2), ecb-eu (3), energy-sources (6),
health (2), imf-extended (4), macro (10), market-backup (9),
portwatch (4), relay-backup (2), resilience-recovery (5), static-ref (2)
Skipped (14 sections, 3 whole bundles): multi-key writers, dynamic
templated keys (displacement year-scoped), or non-runSeed orchestrators
(regional brief cron, resilience-scores' 222-country publish, validation/
benchmark scripts). These continue to use seedMetaKey or their own gate.
seedMetaKey preserved everywhere — dual-write. _bundle-runner.mjs falls
back to legacy when canonicalKey is absent.
All 15 bundles pass node --check. test:data: 5307/5307. typecheck:all: clean.
* fix(seed-contract): 4 PR #3097 review P1s — transform/declareRecords mismatches + envelope leaks
Addresses both P1 findings and the extra-key seed-meta leak surfaced in review:
1. runSeed helper-level invariant: seed-meta:* keys NEVER envelope.
scripts/_seed-utils.mjs exports shouldEnvelopeKey(key) — returns false for
any key starting with 'seed-meta:'. Both atomicPublish (canonical) and
writeExtraKey (extras) gate the envelope wrap through this helper. Fixes
seed-iea-oil-stocks' ANALYSIS_META_EXTRA_KEY silently getting enveloped,
which broke health.js parsing the value as bare {fetchedAt, recordCount}.
Also defends against any future manual writeExtraKey(..., envelopeMeta)
call that happens to target a seed-meta:* key.
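A sketch of the invariant. shouldEnvelopeKey follows the stated rule (false for any key starting with 'seed-meta:'). serializeForWrite and the key names are hypothetical. They stand in for the shared gate that atomicPublish and writeExtraKey both pass through.

```javascript
// seed-meta:* keys are never wrapped in an envelope.
function shouldEnvelopeKey(key) {
  return !key.startsWith('seed-meta:');
}

// Hypothetical write-path helper: canonical and extra-key writes share this gate.
function serializeForWrite(key, data, envelopeMeta) {
  const value = envelopeMeta && shouldEnvelopeKey(key) ? { _seed: envelopeMeta, data } : data;
  return JSON.stringify(value);
}

console.log(serializeForWrite('energy:oil-stocks:v1', { members: [] }, { recordCount: 0 }));
// {"_seed":{"recordCount":0},"data":{"members":[]}}
console.log(serializeForWrite('seed-meta:energy:oil-stocks-analysis', { fetchedAt: 1, recordCount: 3 }, { recordCount: 3 }));
// {"fetchedAt":1,"recordCount":3}
```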
2. seed-token-panels canonical + extras fixed.
publishTransform returns data.defi (the defi panel itself, shape {tokens}).
Old declareRecords counted data.defi.tokens + data.ai.tokens + data.other.tokens
on the transformed payload → 0 → RETRY path → canonical market:defi-tokens:v1
never wrote, and because runSeed returned before the extraKeys loop,
market:ai-tokens:v1 + market:other-tokens:v1 stayed stale too.
New: declareRecords counts data.tokens on the transformed shape. AI_KEY +
OTHER_KEY extras reuse the same function (transforms return structurally
identical panels). Added isMain guard so test imports don't fire runSeed.
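The mismatch reproduces in a few lines. publishTransform narrows the payload to the defi panel, so a counter written against the pre-transform shape sees nothing. The payload values and both function bodies below are illustrative reconstructions from the description above, not the seeder's code.

```javascript
// Hypothetical token-panels payload before publishTransform.
const rawPayload = {
  defi: { tokens: [{ symbol: 'AAA', price: 1.2 }, { symbol: 'BBB', price: 3.4 }] },
  ai: { tokens: [{ symbol: 'CCC', price: 5.6 }] },
  other: { tokens: [] },
};

// publishTransform returns only the defi panel, i.e. shape {tokens}.
const publishTransform = (data) => data.defi;

// Pre-fix: sums all three panels, which no longer exist after the transform.
const oldDeclareRecords = (data) =>
  (data?.defi?.tokens?.length ?? 0) + (data?.ai?.tokens?.length ?? 0) + (data?.other?.tokens?.length ?? 0);

// Post-fix: counts the tokens of whichever panel it receives (canonical or extra key).
const declareRecords = (data) => (Array.isArray(data?.tokens) ? data.tokens.length : 0);

const published = publishTransform(rawPayload);
console.log(oldDeclareRecords(published)); // 0 -> RETRY, nothing written
console.log(declareRecords(published)); // 2
console.log(declareRecords(rawPayload.ai)); // 1
```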
3. api/product-catalog.js cached reader unwraps envelope.
ais-relay.cjs now envelopes product-catalog:v2 via envelopeWrite(). The
edge reader did raw JSON.parse(result) and returned {_seed, data} to
clients, breaking the cached path. Fix: import unwrapEnvelope from
./_seed-envelope.js and apply it after JSON.parse. Only one site needs the
change: the code at :238-241 is downstream of getFromCache(), so the single
reader fix covers both paths.
4. Regression lock tests/seed-contract-transform-regressions.test.mjs (11 cases):
- shouldEnvelopeKey invariant: seed-meta:* false, canonical true
- Token-panels declareRecords works on transformed shape (canonical + both extras)
- Explicit repro of pre-fix buggy signature returning 0 — guards against revert
- resolveRecordCount accepts 0, rejects non-integer
- Product-catalog envelope unwrap returns bare shape; legacy passes through
Verification:
- npm run test:data → 5318/5318 pass (was 5307 — 11 new regressions)
- npm run typecheck:all → clean
- node --check on every modified script
iea-oil-stocks canonical declareRecords was NOT broken (user confirmed during
review — buildIndex preserves .members); only its ANALYSIS_META_EXTRA_KEY
was affected, now covered generically by commit 1's helper invariant.
* fix(seed-contract): seed-token-panels validateFn also runs on post-transform shape
Review finding: fixing declareRecords wasn't sufficient — atomicPublish() runs
validateFn(publishData) on the transformed payload too. seed-token-panels'
validate() checked data.defi/.ai/.other on the transformed {tokens} shape,
returned false, and runSeed took the early skipped-write branch (before even
reaching the declareRecords RETRY logic). Net effect: same as before the
declareRecords fix — canonical + both extras stayed stale.
Fix: validate() now checks the canonical defi panel directly
(Array.isArray(data?.tokens) && at least one token has t.price > 0). AI/OTHER
panels are validated implicitly by their own extraKey declareRecords on write.
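A sketch of the post-transform validator as described above. It runs against the `{tokens}` panel that atomicPublish hands to validateFn. The sample token objects are illustrative.

```javascript
// The defi panel must be a token array with at least one positive price.
function validate(data) {
  return Array.isArray(data?.tokens) && data.tokens.some((t) => t?.price > 0);
}

console.log(validate({ tokens: [{ price: 0 }, { price: 2.5 }] })); // true
console.log(validate({ tokens: [{ price: 0 }] })); // false: all-zero prices
console.log(validate({ defi: { tokens: [{ price: 1 }] } })); // false: pre-transform shape
```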
Audited the other 9 seeders with publishTransform (bls-series, bis-extended,
bis-data, gdelt-intel, trade-flows, iea-oil-stocks, jodi-gas, sanctions-pressure,
forecasts): all validateFns correctly target the post-transform shape. Only
token-panels regressed.
Added 4 regression tests (tests/seed-contract-transform-regressions.test.mjs):
- validate accepts transformed panel with priced tokens
- validate rejects all-zero-price tokens
- validate rejects empty/missing tokens
- Explicit pre-fix repro (buggy old signature fails on transformed shape)
Verification:
- npm run test:data → 5322/5322 pass (was 5318; +4 new)
- npm run typecheck:all → clean
- node --check clean
* feat(seed-contract): add /api/seed-contract-probe validation endpoint
Single machine-readable gate for 'is PR #3097 working in production'.
Replaces the curl/jq ritual with one authenticated edge call that returns
HTTP 200 ok:true or 503 + failing check list.
What it validates:
- 8 canonical keys have {_seed, data} envelopes with required data fields
and minRecords floors (fsi-eu, zone-normals, 3 token panels + minRecords
guard against token-panels RETRY regression, product-catalog, wildfire,
earthquakes).
- 2 seed-meta:* keys remain BARE (shouldEnvelopeKey invariant; guards
against iea-oil-stocks ANALYSIS_META_EXTRA_KEY-class regressions).
- /api/product-catalog + /api/bootstrap responses contain no '_seed' leak.
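A sketch of a single key check of this kind. The spec fields `shape`, `dataHas` and `minRecords` follow the commit text. The function name, its signature, and the reason strings are assumptions, not the exported checkProbe().

```javascript
// Hypothetical probe check over a raw Redis value (JSON string or null).
function checkProbeSketch(probe, rawJson) {
  if (rawJson == null) return { ok: false, reason: 'missing' };
  let value;
  try {
    value = JSON.parse(rawJson);
  } catch {
    return { ok: false, reason: 'malformed-json' };
  }
  const isEnvelope = value !== null && typeof value === 'object' && '_seed' in value && 'data' in value;
  if (probe.shape === 'envelope' && !isEnvelope) return { ok: false, reason: 'shape:bare!=envelope' };
  if (probe.shape === 'bare' && isEnvelope) return { ok: false, reason: 'shape:envelope!=bare' };
  const payload = isEnvelope ? value.data : value;
  for (const field of probe.dataHas ?? []) {
    if (payload == null || !(field in Object(payload))) return { ok: false, reason: `field:${field}` };
  }
  // recordCount is only inspected when minRecords is set on the probe.
  if (probe.minRecords != null) {
    const count = isEnvelope ? value._seed?.recordCount : undefined;
    if (!Number.isInteger(count) || count < probe.minRecords) {
      return { ok: false, reason: `records:${count}<${probe.minRecords}` };
    }
  }
  return { ok: true };
}

const aiTokensProbe = { key: 'market:ai-tokens:v1', shape: 'envelope', dataHas: ['tokens'], minRecords: 1 };
const zeroRecords = JSON.stringify({ _seed: { recordCount: 0 }, data: { tokens: [] } });
console.log(checkProbeSketch(aiTokensProbe, zeroRecords).reason); // records:0<1
```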
Auth: x-probe-secret header must match RELAY_SHARED_SECRET (reuses existing
Vercel↔Railway internal trust boundary).
Probe logic is exported (checkProbe, checkPublicBoundary, DEFAULT_PROBES) for
hermetic testing. tests/seed-contract-probe.test.mjs covers every branch:
envelope pass/fail on field/records/shape, bare pass/fail on shape/field,
missing/malformed JSON, Redis non-2xx, boundary seed-leak detection,
DEFAULT_PROBES sanity (seed-meta invariant present, token-panels minRecords
guard present).
Usage:
curl -H "x-probe-secret: $RELAY_SHARED_SECRET" \
https://api.worldmonitor.app/api/seed-contract-probe
PR 3 will extend the probe with a stricter mode that asserts seed-meta:*
keys are GONE (not just bare) once legacy dual-write is removed.
Verification:
- tests/seed-contract-probe.test.mjs → 15/15 pass
- npm run test:data → 5338/5338 (was 5322; +16 new incl. conformance)
- npm run typecheck:all → clean
* fix(seed-contract): tighten probe — minRecords on AI/OTHER + cache-path source header
Review P2 findings: the probe's stated guards were weaker than advertised.
1. market:ai-tokens:v1 + market:other-tokens:v1 probes claimed to guard the
token-panels extra-key RETRY regression but only checked shape='envelope'
+ dataHas:['tokens']. If an extra-key declareRecords regressed to 0, both
probes would still pass because checkProbe() only inspects _seed.recordCount
when minRecords is set. Now both enforce minRecords: 1.
2. /api/product-catalog boundary check only asserted no '_seed' leak — which
is also true for the static fallback path. A broken cached reader
(getFromCache returning null or throwing) could serve fallback silently
and still pass this probe. Now:
- api/product-catalog.js emits X-Product-Catalog-Source: cache|dodo|fallback
on the response (the json() helper gained an optional source param wired
to each of the three branches).
- checkPublicBoundary declaratively requires that header's value match
'cache' for /api/product-catalog, so a fallback-serve fails the probe
with reason 'source:fallback!=cache' or 'source:missing!=cache'.
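A sketch of the tightened boundary check. The `source:<actual>!=<expected>` reasons match the text above. The `{ name, value }` config shape, the other reason strings, and the `"_seed"` substring test are assumptions. The seed-leak check runs before the source check so it takes precedence.

```javascript
// Hypothetical public-boundary check over a fetched response.
function checkBoundarySketch(response, { requireSourceHeader } = {}) {
  if (response.status !== 200) return { ok: false, reason: `status:${response.status}` };
  // A leaked envelope field fails first, regardless of which branch served the response.
  if (response.body.includes('"_seed"')) return { ok: false, reason: 'seed-leak' };
  if (requireSourceHeader) {
    const actual = response.headers.get(requireSourceHeader.name) ?? 'missing';
    if (actual !== requireSourceHeader.value) {
      return { ok: false, reason: `source:${actual}!=${requireSourceHeader.value}` };
    }
  }
  return { ok: true };
}

const catalogCheck = { requireSourceHeader: { name: 'X-Product-Catalog-Source', value: 'cache' } };
const fallbackResponse = {
  status: 200,
  body: JSON.stringify({ products: [] }),
  headers: new Headers({ 'X-Product-Catalog-Source': 'fallback' }),
};
console.log(checkBoundarySketch(fallbackResponse, catalogCheck).reason); // source:fallback!=cache
```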
Test updates (tests/seed-contract-probe.test.mjs):
- Boundary check reworked to use a BOUNDARY_CHECKS config with optional
requireSourceHeader per endpoint.
- New cases: served-from-cache passes, served-from-fallback fails with source
mismatch, missing header fails, seed-leak still takes precedence, bad
status fails.
- Token-panels sanity test now asserts minRecords≥1 on all 3 panels.
Verification:
- tests/seed-contract-probe.test.mjs → 17/17 pass (was 15, +2 net)
- npm run test:data → 5340/5340
- npm run typecheck:all → clean
314 lines
13 KiB
JavaScript
#!/usr/bin/env node

import { loadEnvFile, CHROME_UA, runSeed } from './_seed-utils.mjs';
import { extractCountryCode } from './shared/geo-extract.mjs';

loadEnvFile(import.meta.url);

// WHO DON uses multi-word or hyphenated country names that the bigram scanner misses.
// These override extractCountryCode for exact substring matches (checked first, case-insensitive).
const WHO_NAME_OVERRIDES = {
  'democratic republic of the congo': 'CD',
  'dr congo': 'CD',
  'timor-leste': 'TL',
  'east timor': 'TL',
  'papua new guinea': 'PG',
  'kingdom of saudi arabia': 'SA',
  'united kingdom': 'GB',
};

function extractCountryCodeFull(text) {
  const lower = text.toLowerCase();
  for (const [name, iso2] of Object.entries(WHO_NAME_OVERRIDES)) {
    if (lower.includes(name)) return iso2;
  }
  return extractCountryCode(text) ?? '';
}

const CANONICAL_KEY = 'health:disease-outbreaks:v1';
const CACHE_TTL = 259200; // 72h (3 days) — 3× daily cron interval per gold standard; survives 2 consecutive missed runs

// WHO Disease Outbreak News JSON API (RSS at /feeds/entity/csr/don/en/rss.xml is dead since 2024)
const WHO_DON_API = 'https://www.who.int/api/emergencies/diseaseoutbreaknews?sf_provider=dynamicProvider372&sf_culture=en&$orderby=PublicationDateAndTime%20desc&$select=Title,ItemDefaultUrl,PublicationDateAndTime&$top=30';
// CDC Health Alert Network RSS (US-centric; supplements WHO for North American events)
const CDC_FEED = 'https://tools.cdc.gov/api/v2/resources/media/132608.rss';
// Outbreak News Today — aggregates WHO, CDC, and regional health ministry alerts
const OUTBREAK_NEWS_FEED = 'https://outbreaknewstoday.com/feed/';
// ThinkGlobalHealth disease tracker — 1,600+ ProMED-sourced real-time alerts with lat/lng
const THINKGLOBALHEALTH_BUNDLE = 'https://raw.githubusercontent.com/thinkglobalhealth/disease_tracker/main/index_bundle.js';
// Keep alerts within this many days; avoids flooding the map with old events
const TGH_LOOKBACK_DAYS = 90;

const RSS_MAX_BYTES = 500_000; // guard against oversized responses before regex; compared against string length (UTF-16 code units), not bytes


function stableHash(str) {
  let h = 0;
  for (let i = 0; i < str.length; i++) h = (Math.imul(31, h) + str.charCodeAt(i)) | 0;
  return Math.abs(h).toString(36);
}

/**
 * Extract location string from WHO-style titles.
 * Handles: "Disease – Country" (en or em dash), "Disease - Country" (hyphen), "Disease in Country".
 */
function extractLocationFromTitle(title) {
  // WHO DON pattern: "Disease – Country" or "Disease - Country" (one or more dash-separated segments)
  // Split on en/em dashes or a spaced hyphen (" - "), then return the last segment if it starts with a capital letter.
  const segments = title.split(/\s*[–—]\s*|\s+-\s+/);
  if (segments.length >= 2) {
    const last = segments[segments.length - 1].trim();
    if (/^[A-Z]/.test(last)) return last;
  }
  // Fallback: "... in <Country/Region>"
  const inMatch = title.match(/\bin\s+([A-Z][^,.(]+)/);
  if (inMatch) return inMatch[1].trim();
  return '';
}

function detectAlertLevel(title, desc) {
  const text = `${title} ${desc}`.toLowerCase();
  if (text.includes('outbreak') || text.includes('emergency') || text.includes('epidemic') || text.includes('pandemic')) return 'alert';
  if (text.includes('warning') || text.includes('spread') || text.includes('cases increasing')) return 'warning';
  return 'watch';
}

function detectDisease(title) {
  const lower = title.toLowerCase();
  const known = ['mpox', 'monkeypox', 'ebola', 'cholera', 'covid', 'dengue', 'measles',
    'polio', 'marburg', 'lassa', 'plague', 'yellow fever', 'typhoid', 'influenza',
    'avian flu', 'h5n1', 'h5n2', 'anthrax', 'rabies', 'meningitis', 'hepatitis',
    'nipah', 'rift valley', 'crimean-congo', 'leishmaniasis', 'malaria', 'diphtheria',
    'chikungunya', 'botulism', 'brucellosis', 'salmonella', 'listeria', 'e. coli',
    'norovirus', 'legionella', 'campylobacter'];
  for (const d of known) {
    if (lower.includes(d)) return d.charAt(0).toUpperCase() + d.slice(1);
  }
  return 'Unknown Disease';
}

/**
 * Fetch WHO Disease Outbreak News via their JSON API (RSS feed is dead since 2024).
 * Returns normalized items array.
 */
async function fetchWhoDonApi() {
  try {
    const resp = await fetch(WHO_DON_API, {
      headers: { Accept: 'application/json', 'User-Agent': CHROME_UA },
      signal: AbortSignal.timeout(15000),
    });
    if (!resp.ok) { console.warn(`[Disease] WHO DON API HTTP ${resp.status}`); return []; }
    const data = await resp.json();
    const items = data?.value;
    if (!Array.isArray(items)) { console.warn('[Disease] WHO DON API: unexpected response shape'); return []; }
    return items.map((item) => ({
      title: (item.Title || '').trim(),
      link: item.ItemDefaultUrl ? `https://www.who.int${item.ItemDefaultUrl}` : '',
      desc: '',
      publishedMs: item.PublicationDateAndTime ? new Date(item.PublicationDateAndTime).getTime() : Date.now(),
      sourceName: 'WHO',
    })).filter(i => i.title && !isNaN(i.publishedMs));
  } catch (e) {
    console.warn('[Disease] WHO DON API fetch error:', e?.message || e);
    return [];
  }
}

async function fetchRssItems(url, sourceName) {
  try {
    const resp = await fetch(url, {
      headers: { Accept: 'application/rss+xml, application/xml, text/xml', 'User-Agent': CHROME_UA },
      signal: AbortSignal.timeout(15000),
    });
    if (!resp.ok) { console.warn(`[Disease] ${sourceName} HTTP ${resp.status}`); return []; }
    const xml = await resp.text();
    const bounded = xml.length > RSS_MAX_BYTES ? xml.slice(0, RSS_MAX_BYTES) : xml;
    const items = [];
    const itemRe = /<item>([\s\S]*?)<\/item>/g;
    let match;
    while ((match = itemRe.exec(bounded)) !== null) {
      const block = match[1];
      const title = (block.match(/<title>(?:<!\[CDATA\[)?([\s\S]*?)(?:\]\]>)?<\/title>/) || [])[1]?.trim() || '';
      const link = (block.match(/<link>(?:<!\[CDATA\[)?([\s\S]*?)(?:\]\]>)?<\/link>/) || [])[1]?.trim() || '';
      const rawDesc = (block.match(/<description>(?:<!\[CDATA\[)?([\s\S]*?)(?:\]\]>)?<\/description>/) || [])[1] || '';
      // Decode common HTML entities (&amp; last, to avoid double-decoding), then strip any tags they revealed.
      const desc = rawDesc
        .replace(/&lt;/g, '<').replace(/&gt;/g, '>').replace(/&quot;/g, '"').replace(/&#0?39;|&#x27;|&apos;/g, "'").replace(/&nbsp;/g, ' ')
        .replace(/&amp;/g, '&')
        .replace(/<[^>]+>/g, '').trim().slice(0, 300);
      const pubDate = (block.match(/<pubDate>(.*?)<\/pubDate>/) || [])[1]?.trim() || '';
      const publishedMs = pubDate ? new Date(pubDate).getTime() : Date.now();
      if (!title || isNaN(publishedMs)) continue;
      items.push({ title, link, desc, publishedMs, sourceName });
    }
    return items;
  } catch (e) {
    console.warn(`[Disease] ${sourceName} fetch error:`, e?.message || e);
    return [];
  }
}

/**
 * Fetch ThinkGlobalHealth disease tracker data.
 * The site (https://thinkglobalhealth.github.io/disease_tracker/) embeds all ProMED-reviewed
 * disease alerts directly in index_bundle.js as a JS object literal array:
 *   var a=[{Alert_ID:"...",lat:"...",lng:"...",diseases:"...",country:"...",date:"M/D/YYYY",...}]
 * ~1,600 records with exact lat/lng coordinates. We filter to the last TGH_LOOKBACK_DAYS days.
 */
async function fetchThinkGlobalHealth() {
  try {
    const resp = await fetch(THINKGLOBALHEALTH_BUNDLE, {
      headers: { 'User-Agent': CHROME_UA, Accept: 'application/javascript, text/javascript' },
      signal: AbortSignal.timeout(30000),
    });
    if (!resp.ok) { console.warn(`[Disease] ThinkGlobalHealth HTTP ${resp.status}`); return []; }
    const bundle = await resp.text();

    // Extract the data array: "var a=[{Alert_ID:"
    const marker = 'var a=[{Alert_ID:';
    const startIdx = bundle.indexOf(marker);
    if (startIdx === -1) { console.warn('[Disease] ThinkGlobalHealth: data marker not found'); return []; }

    // Find the end of the array by counting brackets from the [ position
    const arrStart = startIdx + 'var a='.length;
    let depth = 0, end = arrStart;
    for (; end < bundle.length; end++) {
      if (bundle[end] === '[' || bundle[end] === '{') depth++;
      else if (bundle[end] === ']' || bundle[end] === '}') { depth--; if (depth === 0) { end++; break; } }
    }
    const arrayStr = bundle.slice(arrStart, end);

    // Parse JS object literals (keys are unquoted identifiers, all values are strings).
    // Pattern: {Key:"value",...} — flat objects only.
    const records = [];
    const objRe = /\{([^{}]+)\}/g;
    let m;
    while ((m = objRe.exec(arrayStr)) !== null) {
      const obj = {};
      const pairRe = /(\w+):"((?:[^"\\]|\\.)*)"/g;
      let p;
      while ((p = pairRe.exec(m[1])) !== null) obj[p[1]] = p[2];
      if (obj.Alert_ID) records.push(obj);
    }

    const cutoff = Date.now() - TGH_LOOKBACK_DAYS * 86400_000;
    const items = [];
    for (const rec of records) {
      if (!rec.lat || !rec.lng || !rec.diseases || !rec.date) continue;
      const publishedMs = new Date(rec.date).getTime();
      if (isNaN(publishedMs) || publishedMs < cutoff) continue;
      // place_name from TGH is often "City, District, Country" — take only the first segment for display.
      const cityName = (rec.place_name || '').split(',')[0].trim() || rec.country || '';
      items.push({
        title: `${rec.diseases}${rec.country ? ` - ${rec.country}` : ''}`,
        link: rec.link || '',
        desc: rec.summary ? rec.summary.slice(0, 300) : '',
        publishedMs,
        sourceName: 'ThinkGlobalHealth',
        _country: rec.country || '',
        _disease: rec.diseases || '',
        _location: cityName,
        _lat: Number.isFinite(parseFloat(rec.lat)) ? parseFloat(rec.lat) : null,
        _lng: Number.isFinite(parseFloat(rec.lng)) ? parseFloat(rec.lng) : null,
        _cases: parseInt(rec.cases_count || rec.cases || '0', 10) || 0,
      });
    }
    console.log(`[Disease] ThinkGlobalHealth: ${records.length} total, ${items.length} in last ${TGH_LOOKBACK_DAYS}d`);
    return items;
  } catch (e) {
    console.warn('[Disease] ThinkGlobalHealth fetch error:', e?.message || e);
    return [];
  }
}

function mapItem(item) {
  const location = item._location || extractLocationFromTitle(item.title)
    || (item.sourceName === 'CDC' ? 'United States' : '');
  const disease = item._disease || detectDisease(item.title);
  const countryCode = item._country
    ? (extractCountryCodeFull(item._country) || extractCountryCodeFull(location || item.title))
    : extractCountryCodeFull(location || `${item.title} ${item.desc}`);
  return {
    id: `${item.sourceName.toLowerCase()}-${stableHash(item.link || item.title)}-${item.publishedMs}`,
    disease,
    location,
    countryCode,
    alertLevel: detectAlertLevel(item.title, item.desc),
    summary: item.desc,
    sourceUrl: item.link,
    publishedAt: item.publishedMs,
    sourceName: item.sourceName,
    lat: item._lat ?? 0,
    lng: item._lng ?? 0,
    cases: item._cases || 0,
  };
}

async function fetchDiseaseOutbreaks() {
  const [whoItems, cdcItems, outbreakNewsItems, tghItems] = await Promise.all([
    fetchWhoDonApi(),
    fetchRssItems(CDC_FEED, 'CDC'),
    fetchRssItems(OUTBREAK_NEWS_FEED, 'Outbreak News Today'),
    fetchThinkGlobalHealth(),
  ]);
  console.log(`[Disease] Sources: WHO=${whoItems.length} CDC=${cdcItems.length} ONT=${outbreakNewsItems.length} TGH=${tghItems.length}`);

  // TGH items are already disease-curated with exact lat/lng — skip keyword filter,
  // preserve all geo-located alerts, and don't collapse by disease+country.
  const tghOutbreaks = tghItems.map(mapItem);

  const diseaseKeywords = ['outbreak', 'disease', 'virus', 'fever', 'flu', 'ebola', 'mpox',
    'cholera', 'dengue', 'measles', 'polio', 'plague', 'avian', 'h5n1', 'epidemic',
    'infection', 'pathogen', 'rabies', 'meningitis', 'hepatitis', 'nipah', 'marburg',
    'diphtheria', 'chikungunya', 'rift valley', 'influenza', 'botulism',
    'salmonella', 'listeria', 'e. coli', 'norovirus', 'legionella', 'campylobacter'];

  const otherOutbreaks = [...whoItems, ...cdcItems, ...outbreakNewsItems]
    .filter(item => {
      const text = `${item.title} ${item.desc}`.toLowerCase();
      return diseaseKeywords.some(k => text.includes(k));
    })
    .map(mapItem);

  // Sort before dedup so the first occurrence is always the most recent.
  otherOutbreaks.sort((a, b) => b.publishedAt - a.publishedAt);

  // Deduplicate non-TGH items by disease+country (keep most recent per pair).
  // TGH items each represent a distinct geo-located event — never collapse them.
  const seen = new Set();
  const dedupedOthers = otherOutbreaks.filter(o => {
    const key = o.disease === 'Unknown Disease' ? o.id : `${o.disease}:${o.countryCode || o.location}`;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });

  // TGH first (precise geo), then WHO/CDC/ONT (already sorted above before dedup).
  const tghSorted = tghOutbreaks.sort((a, b) => b.publishedAt - a.publishedAt);

  // Up to 150 TGH geo-pinned alerts + up to 50 from other authoritative sources.
  const outbreaks = [...tghSorted.slice(0, 150), ...dedupedOthers.slice(0, 50)];

  return { outbreaks, fetchedAt: Date.now() };
}

function validate(data) {
  return Array.isArray(data?.outbreaks) && data.outbreaks.length >= 1;
}

export function declareRecords(data) {
  return Array.isArray(data?.outbreaks) ? data.outbreaks.length : 0;
}

runSeed('health', 'disease-outbreaks', CANONICAL_KEY, fetchDiseaseOutbreaks, {
  validateFn: validate,
  ttlSeconds: CACHE_TTL,
  sourceVersion: 'who-api-cdc-ont-v6',

  declareRecords,
  schemaVersion: 1,
  maxStaleMin: 2880,
}).catch((err) => {
  const _cause = err.cause ? ` (cause: ${err.cause.message || err.cause.code || err.cause})` : '';
  console.error('FATAL:', (err.message || err) + _cause);
  process.exit(1);
});