worldmonitor/scripts/seed-disease-outbreaks.mjs
Elie Habib 044598346e feat(seed-contract): PR 2a — runSeed envelope dual-write + 91 seeders migrated (#3097)
* feat(seed-contract): PR 2a — runSeed envelope dual-write + 91 seeders migrated

Opt-in contract path in runSeed: when opts.declareRecords is provided, write
{_seed, data} envelope to the canonical key alongside legacy seed-meta:*
(dual-write). State machine: OK / OK_ZERO / RETRY with zeroIsValid opt.
declareRecords throws or returns non-integer → hard fail (contract violation).
extraKeys[*] support per-key declareRecords; each extra key writes its own
envelope. Legacy seeders (no declareRecords) entirely unchanged.
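
The state machine described above can be sketched as follows. This is a minimal illustration under assumptions, not the runSeed implementation: the signatures of `resolveRecordCount`, `resolveState`, and `buildEnvelope`, the rejection of negative counts, and the exact `_seed` field list are hypothetical. Only the `{_seed, data}` shape, the three states, the `zeroIsValid` opt, and the non-integer hard fail come from the description.

```javascript
// Hypothetical sketch of the contract path; names and signatures are assumptions.

// A throw from declareRecords propagates to the caller; a non-integer result
// is treated as a contract violation. Rejecting negatives is an added assumption.
function resolveRecordCount(declareRecords, payload) {
  const count = declareRecords(payload);
  if (!Number.isInteger(count) || count < 0) {
    throw new Error(`contract violation: declareRecords returned ${String(count)}`);
  }
  return count;
}

// OK when records exist; zero is OK_ZERO only when the seeder opts in, else RETRY.
function resolveState(recordCount, { zeroIsValid = false } = {}) {
  if (recordCount > 0) return 'OK';
  return zeroIsValid ? 'OK_ZERO' : 'RETRY';
}

// Wrap the payload in an envelope; the _seed fields shown are illustrative.
function buildEnvelope(data, meta) {
  return { _seed: { ...meta }, data };
}

const payload = { outbreaks: [{ id: 'a' }, { id: 'b' }] };
const recordCount = resolveRecordCount((d) => d.outbreaks.length, payload);
const envelope = buildEnvelope(payload, {
  fetchedAt: Date.now(),
  recordCount,
  state: resolveState(recordCount),
  schemaVersion: 1,
});
```

In this sketch a RETRY result would mean the canonical key is not written, which matches the token-panels failure mode described later in this message.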

Migrated all 91 scripts/seed-*.mjs to contract mode. Each exports
declareRecords returning the canonical record count, and passes
schemaVersion: 1 + maxStaleMin (matched to api/health.js SEED_META, or 2.5x
interval where no registry entry exists). Contract conformance reports 84/86
seeders with full descriptor (2 pre-existing warnings).

Legacy seed-meta keys still written so unmigrated readers keep working;
follow-up slices flip health.js + readers to envelope-first.

Tests: 61/61 PR 1 tests still pass.

Next slices for PR 2:
- api/health.js registry collapse + 15 seed-bundle-*.mjs canonicalKey wiring
- reader migration (mcp, resilience, aviation, displacement, regional-snapshot)
- direct writers — ais-relay.cjs, consumer-prices-core publish.ts
- public-boundary stripSeedEnvelope + test migration

Plan: docs/plans/2026-04-14-002-fix-runseed-zero-record-lockout-plan.md

* fix(seed-contract): unwrap envelopes in internal cross-seed readers

After PR 2a enveloped 91 canonical keys as {_seed, data}, every script-side
reader that returned the raw parsed JSON started silently handing callers the
envelope instead of the bare payload. WoW baselines (bigmac, grocery-basket,
fear-greed) saw undefined .countries / .composite; seed-climate-anomalies saw
undefined .normals from climate:zone-normals:v1; seed-thermal-escalation saw
undefined .fireDetections from wildfire:fires:v1; seed-forecasts' ~40-key
pipeline batch returned envelopes for every input.

Fix: route every script-side reader through unwrapEnvelope(...).data. Legacy
bare-shape values pass through unchanged (unwrapEnvelope returns
{_seed: null, data: raw} for any non-envelope shape).
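
A minimal sketch of the unwrap behaviour described above. The return shape for non-envelope values is taken from this message; the envelope detection rule (a non-array object with an object `_seed` and a `data` property) is an assumption and may differ from the real helper.

```javascript
// Hypothetical stand-in for unwrapEnvelope; the detection rule is an assumption.
function unwrapEnvelope(raw) {
  const isEnvelope =
    raw !== null &&
    typeof raw === 'object' &&
    !Array.isArray(raw) &&
    raw._seed !== null &&
    typeof raw._seed === 'object' &&
    'data' in raw;
  // Non-envelope (legacy bare) values pass through untouched under `data`.
  return isEnvelope ? { _seed: raw._seed, data: raw.data } : { _seed: null, data: raw };
}

const enveloped = { _seed: { fetchedAt: 1, recordCount: 1 }, data: { countries: ['US'] } };
const legacyBare = { countries: ['US'] };

const fromEnvelope = unwrapEnvelope(enveloped).data;
const fromLegacy = unwrapEnvelope(legacyBare).data;
```

Both reads yield an object with `.countries`, which is why routing every reader through `.data` is safe during dual-write.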

Changed:
- scripts/_seed-utils.mjs: import unwrapEnvelope; redisGet, readSeedSnapshot,
  verifySeedKey all unwrap. Exported new readCanonicalValue() helper for
  cross-seed consumers.
- 18 seed-*.mjs scripts with local redisGet-style helpers or inline fetch
  patched to unwrap via the envelope source module (subagent sweep).
- scripts/seed-forecasts.mjs pipeline batch: parse() unwraps each result.
- scripts/seed-energy-spine.mjs redisMget: unwraps each result.

Tests:
- tests/seed-utils-envelope-reads.test.mjs: 7 new cases covering envelope
  + legacy + null paths for readSeedSnapshot and verifySeedKey.
- Full seed suite: 67/67 pass (was 61, +6 new).

Addresses both of user's P1 findings on PR #3097.

* feat(seed-contract): envelope-aware reads in server + api helpers

Every RPC and public-boundary reader now automatically strips _seed from
contract-mode canonical keys. Legacy bare-shape values pass through unchanged
(unwrapEnvelope no-ops on non-envelope shapes).

Changed helpers (one-place fix — unblocks ~60 call sites):
- server/_shared/redis.ts: getRawJson, getCachedJson, getCachedJsonBatch
  unwrap by default. cachedFetchJson inherits via getCachedJson.
- api/_upstash-json.js: readJsonFromUpstash unwraps (covers api/mcp.ts
  tool responses + all its canonical-key reads).
- api/bootstrap.js: getCachedJsonBatch unwraps (public-boundary —
  clients never see envelope metadata).

Left intentionally unchanged:
- api/health.js / api/seed-health.js: read only seed-meta:* keys which
  remain bare-shape during dual-write. unwrapEnvelope already imported at
  the meta-read boundary (PR 1) as a defensive no-op.

Tests: 67/67 seed tests pass. typecheck + typecheck:api clean.

This is the blast-radius fix the PR #3097 review called out — external
readers that would otherwise see {_seed, data} after the writer side
migrated.

* fix(test): strip export keyword in vm.runInContext'd seed source

cross-source-signals-regulatory.test.mjs loads scripts/seed-cross-source-signals.mjs
via vm.runInContext, which cannot parse ESM `export` syntax. PR 2a added
`export function declareRecords` to every seeder, which broke this test's
static-analysis approach.

Fix: strip the `export` keyword from the declareRecords line in the
preprocessed source string so the function body still evaluates as a plain
declaration.
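
The preprocessing step can be sketched like this. The sample source and the regex are assumptions, not the test's actual code, and the Function constructor is used here as a stand-in for vm.runInContext: both parse their input as a script rather than a module, so both reject `export`.

```javascript
// Hypothetical seeder source; only the `export function declareRecords` line matters.
const source = [
  "const CANONICAL_KEY = 'demo:key:v1';",
  'export function declareRecords(data) {',
  '  return Array.isArray(data?.items) ? data.items.length : 0;',
  '}',
].join('\n');

// Drop the keyword only where it directly precedes the declareRecords declaration.
const stripped = source.replace(/^export\s+(?=function\s+declareRecords\b)/m, '');

// Script-mode evaluation fails on the original source...
let originalParses = true;
try {
  new Function(source);
} catch (err) {
  originalParses = !(err instanceof SyntaxError);
}

// ...and succeeds on the stripped source, leaving a plain function declaration.
const declareRecordsFromSource = new Function(`${stripped}\nreturn declareRecords;`)();
```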

Full test:data suite: 5307/5307 pass. typecheck + typecheck:api clean.

* feat(seed-contract): consumer-prices publish.ts writes envelopes

Wrap the 5 canonical keys written by consumer-prices-core/src/jobs/publish.ts
(overview, movers:7d/30d, freshness, categories:7d/30d/90d, retailer-spread,
basket-series) in {_seed, data} envelopes. Legacy seed-meta:<key> writes
preserved for dual-write.

Inlined a buildEnvelope helper (10 lines) rather than taking a cross-package
dependency — consumer-prices-core is a standalone npm package. Documented the
four-file parity contract (mjs source, ts mirror, js edge mirror, this copy).

Contract fields: sourceVersion='consumer-prices-core-publish-v1', schemaVersion=1,
state='OK' (recordCount>0) or 'OK_ZERO' (legitimate zero).

Typecheck: no new errors in publish.ts.

* fix(seed-contract): 3 more server-side readers unwrap envelopes

Found during final audit:

- server/worldmonitor/resilience/v1/_shared.ts: resilience score reader
  parsed cached GetResilienceScoreResponse raw. Contract-mode seed-resilience-scores
  now envelopes those keys.
- server/worldmonitor/resilience/v1/get-resilience-ranking.ts: p05/p95
  interval lookup parsed raw from seed-resilience-scores' extra-key path.
- server/worldmonitor/infrastructure/v1/_shared.ts: mgetJson() used for
  count-source keys (wildfire:fires:v1, news:insights:v1) which are both
  contract-mode now.

All three now unwrap via server/_shared/seed-envelope. Legacy shapes pass
through unchanged.

Typecheck clean.

* feat(seed-contract): ais-relay.cjs direct writes produce envelopes

32 canonical-key write sites in scripts/ais-relay.cjs now produce {_seed, data}
envelopes. Inlined buildEnvelope() (CJS module can't require ESM source) +
envelopeWrite(key, data, ttlSeconds, meta) wrapper. Enveloped keys span market
bootstrap, aviation, cyber-threats, theater-posture, weather-alerts, economic
spending/fred/worldbank, tech-events, corridor-risk, usni-fleet, shipping-stress,
social:reddit, wsb-tickers, pizzint, product-catalog, chokepoint transits,
ucdp-events, satellites, oref.

Left bare (not seeded data keys): seed-meta:* (dual-write legacy),
classifyCacheKey LLM cache, notam:prev-closed-state internal state,
wm:notif:scan-dedup flags.

Updated tests/ucdp-seed-resilience.test.mjs regex to accept both upstashSet
(pre-contract) and envelopeWrite (post-contract) call patterns.

* feat(seed-contract): 15 bundle files add canonicalKey for envelope gate

54 bundle sections across 12 files now declare canonicalKey alongside the
existing seedMetaKey. _bundle-runner.mjs (from PR 1) prefers canonicalKey
when both are present — gates section runs on envelope._seed.fetchedAt
read directly from the data key, eliminating the meta-outlives-data class
of bugs.
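
To illustrate why reading the gate timestamp from the data key avoids the meta-outlives-data case, here is a hedged sketch. `readGateTimestamp`, the `Map`-backed store, and the demo key names are hypothetical; the real _bundle-runner.mjs logic is not shown in this message and may differ.

```javascript
// Hypothetical sketch; `store` is a Map standing in for Redis reads.
function readGateTimestamp(section, store) {
  if (section.canonicalKey) {
    // Freshness comes from the data key itself: expired data yields no timestamp.
    return store.get(section.canonicalKey)?._seed?.fetchedAt ?? null;
  }
  // Legacy fallback: the separate seed-meta key.
  return store.get(section.seedMetaKey)?.fetchedAt ?? null;
}

// The meta key survived, but the data key has expired.
const store = new Map([
  ['seed-meta:demo:section', { fetchedAt: 1700000000000, recordCount: 5 }],
]);

const legacyGate = readGateTimestamp({ seedMetaKey: 'seed-meta:demo:section' }, store);
const contractGate = readGateTimestamp(
  { seedMetaKey: 'seed-meta:demo:section', canonicalKey: 'demo:section:v1' },
  store,
);
```

Under the legacy gate the section looks fresh even though its data is gone; under the canonicalKey gate the missing timestamp would let the section run again.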

Files touched:
- climate (5), derived-signals (2), ecb-eu (3), energy-sources (6),
  health (2), imf-extended (4), macro (10), market-backup (9),
  portwatch (4), relay-backup (2), resilience-recovery (5), static-ref (2)

Skipped (14 sections, 3 whole bundles): multi-key writers, dynamic
templated keys (displacement year-scoped), or non-runSeed orchestrators
(regional brief cron, resilience-scores' 222-country publish, validation/
benchmark scripts). These continue to use seedMetaKey or their own gate.

seedMetaKey preserved everywhere — dual-write. _bundle-runner.mjs falls
back to legacy when canonicalKey is absent.

All 15 bundles pass node --check. test:data: 5307/5307. typecheck:all: clean.

* fix(seed-contract): 4 PR #3097 review P1s — transform/declareRecords mismatches + envelope leaks

Addresses both P1 findings and the extra-key seed-meta leak surfaced in review:

1. runSeed helper-level invariant: seed-meta:* keys NEVER envelope.
   scripts/_seed-utils.mjs exports shouldEnvelopeKey(key) — returns false for
   any key starting with 'seed-meta:'. Both atomicPublish (canonical) and
   writeExtraKey (extras) gate the envelope wrap through this helper. Fixes
   seed-iea-oil-stocks' ANALYSIS_META_EXTRA_KEY silently getting enveloped,
   which broke health.js parsing the value as bare {fetchedAt, recordCount}.
   Also defends against any future manual writeExtraKey(..., envelopeMeta)
   call that happens to target a seed-meta:* key.

2. seed-token-panels canonical + extras fixed.
   publishTransform returns data.defi (the defi panel itself, shape {tokens}).
   Old declareRecords counted data.defi.tokens + data.ai.tokens + data.other.tokens
   on the transformed payload → 0 → RETRY path → canonical market:defi-tokens:v1
   never wrote, and because runSeed returned before the extraKeys loop,
   market:ai-tokens:v1 + market:other-tokens:v1 stayed stale too.
   New: declareRecords counts data.tokens on the transformed shape. AI_KEY +
   OTHER_KEY extras reuse the same function (transforms return structurally
   identical panels). Added isMain guard so test imports don't fire runSeed.

3. api/product-catalog.js cached reader unwraps envelope.
   ais-relay.cjs now envelopes product-catalog:v2 via envelopeWrite(). The
   edge reader did raw JSON.parse(result) and returned {_seed, data} to
   clients, breaking the cached path. Fix: import unwrapEnvelope from
   ./_seed-envelope.js, apply after JSON.parse. One site — :238-241 is
   downstream of getFromCache(), so the single reader fix covers both.

4. Regression lock tests/seed-contract-transform-regressions.test.mjs (11 cases):
   - shouldEnvelopeKey invariant: seed-meta:* false, canonical true
   - Token-panels declareRecords works on transformed shape (canonical + both extras)
   - Explicit repro of pre-fix buggy signature returning 0 — guards against revert
   - resolveRecordCount accepts 0, rejects non-integer
   - Product-catalog envelope unwrap returns bare shape; legacy passes through
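
Items 1 and 2 above can be sketched as follows. The `seed-meta:` prefix rule and the two declareRecords shapes come from the description; the function bodies and the sample token fields (`symbol`, `price`) are assumptions for illustration.

```javascript
// Item 1: seed-meta:* keys are never enveloped (body is an assumed implementation).
function shouldEnvelopeKey(key) {
  return !String(key).startsWith('seed-meta:');
}

// Item 2, pre-fix: counts on the un-transformed shape, so it sees nothing
// once publishTransform has reduced the payload to a single panel.
function declareRecordsPreFix(data) {
  return (
    (data?.defi?.tokens?.length ?? 0) +
    (data?.ai?.tokens?.length ?? 0) +
    (data?.other?.tokens?.length ?? 0)
  );
}

// Item 2, post-fix: counts tokens on the transformed {tokens} shape.
function declareRecords(data) {
  return Array.isArray(data?.tokens) ? data.tokens.length : 0;
}

// Hypothetical transformed panel, as written to the canonical key.
const transformedPanel = { tokens: [{ symbol: 'AAA', price: 1.2 }, { symbol: 'BBB', price: 0.4 }] };

const preFixCount = declareRecordsPreFix(transformedPanel);
const postFixCount = declareRecords(transformedPanel);
```

The pre-fix count of zero is what pushed the seeder onto the RETRY path.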

Verification:
- npm run test:data → 5318/5318 pass (was 5307 — 11 new regressions)
- npm run typecheck:all → clean
- node --check on every modified script

iea-oil-stocks canonical declareRecords was NOT broken (user confirmed during
review — buildIndex preserves .members); only its ANALYSIS_META_EXTRA_KEY
was affected, now covered generically by item 1's helper invariant.

* fix(seed-contract): seed-token-panels validateFn also runs on post-transform shape

Review finding: fixing declareRecords wasn't sufficient — atomicPublish() runs
validateFn(publishData) on the transformed payload too. seed-token-panels'
validate() checked data.defi/.ai/.other on the transformed {tokens} shape,
returned false, and runSeed took the early skipped-write branch (before even
reaching the declareRecords RETRY logic). Net effect: same as before the
declareRecords fix — canonical + both extras stayed stale.

Fix: validate() now checks the canonical defi panel directly (Array.isArray
(data?.tokens) && has at least one t.price > 0). AI/OTHER panels are validated
implicitly by their own extraKey declareRecords on write.
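
A minimal sketch of the corrected check, assuming tokens are plain objects with a numeric `price` field (the `symbol` field in the samples is hypothetical):

```javascript
// Validates the transformed defi panel: a tokens array with at least one priced token.
function validate(data) {
  return Array.isArray(data?.tokens) && data.tokens.some((t) => t?.price > 0);
}

const pricedPanel = { tokens: [{ symbol: 'AAA', price: 0 }, { symbol: 'BBB', price: 2.5 }] };
const zeroPricePanel = { tokens: [{ symbol: 'AAA', price: 0 }] };
const preTransformShape = { defi: { tokens: [{ symbol: 'AAA', price: 2.5 }] } };
```

Here `validate(pricedPanel)` is true, while the all-zero panel, an empty panel, and the pre-transform shape are all rejected.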

Audited the other 9 seeders with publishTransform (bls-series, bis-extended,
bis-data, gdelt-intel, trade-flows, iea-oil-stocks, jodi-gas, sanctions-pressure,
forecasts): all validateFn's correctly target the post-transform shape. Only
token-panels regressed.

Added 4 regression tests (tests/seed-contract-transform-regressions.test.mjs):
- validate accepts transformed panel with priced tokens
- validate rejects all-zero-price tokens
- validate rejects empty/missing tokens
- Explicit pre-fix repro (buggy old signature fails on transformed shape)

Verification:
- npm run test:data → 5322/5322 pass (was 5318; +4 new)
- npm run typecheck:all → clean
- node --check clean

* feat(seed-contract): add /api/seed-contract-probe validation endpoint

Single machine-readable gate for 'is PR #3097 working in production'.
Replaces the curl/jq ritual with one authenticated edge call that returns
HTTP 200 ok:true or 503 + failing check list.

What it validates:
- 8 canonical keys have {_seed, data} envelopes with required data fields
  and minRecords floors (fsi-eu, zone-normals, 3 token panels + minRecords
  guard against token-panels RETRY regression, product-catalog, wildfire,
  earthquakes).
- 2 seed-meta:* keys remain BARE (shouldEnvelopeKey invariant; guards
  against iea-oil-stocks ANALYSIS_META_EXTRA_KEY-class regressions).
- /api/product-catalog + /api/bootstrap responses contain no '_seed' leak.

Auth: x-probe-secret header must match RELAY_SHARED_SECRET (reuses existing
Vercel↔Railway internal trust boundary).

Probe logic is exported (checkProbe, checkPublicBoundary, DEFAULT_PROBES) for
hermetic testing. tests/seed-contract-probe.test.mjs covers every branch:
envelope pass/fail on field/records/shape, bare pass/fail on shape/field,
missing/malformed JSON, Redis non-2xx, boundary seed-leak detection,
DEFAULT_PROBES sanity (seed-meta invariant present, token-panels minRecords
guard present).

Usage:
  curl -H "x-probe-secret: $RELAY_SHARED_SECRET" \
       https://api.worldmonitor.app/api/seed-contract-probe

PR 3 will extend the probe with a stricter mode that asserts seed-meta:*
keys are GONE (not just bare) once legacy dual-write is removed.

Verification:
- tests/seed-contract-probe.test.mjs → 15/15 pass
- npm run test:data → 5338/5338 (was 5322; +16 new incl. conformance)
- npm run typecheck:all → clean

* fix(seed-contract): tighten probe — minRecords on AI/OTHER + cache-path source header

Review P2 findings: the probe's stated guards were weaker than advertised.

1. market:ai-tokens:v1 + market:other-tokens:v1 probes claimed to guard the
   token-panels extra-key RETRY regression but only checked shape='envelope'
   + dataHas:['tokens']. If an extra-key declareRecords regressed to 0, both
   probes would still pass because checkProbe() only inspects _seed.recordCount
   when minRecords is set. Now both enforce minRecords: 1.

2. /api/product-catalog boundary check only asserted no '_seed' leak — which
   is also true for the static fallback path. A broken cached reader
   (getFromCache returning null or throwing) could serve fallback silently
   and still pass this probe. Now:
   - api/product-catalog.js emits X-Product-Catalog-Source: cache|dodo|fallback
     on the response (the json() helper gained an optional source param wired
     to each of the three branches).
   - checkPublicBoundary declaratively requires that header's value match
     'cache' for /api/product-catalog, so a fallback-serve fails the probe
     with reason 'source:fallback!=cache' or 'source:missing!=cache'.

Test updates (tests/seed-contract-probe.test.mjs):
- Boundary check reworked to use a BOUNDARY_CHECKS config with optional
  requireSourceHeader per endpoint.
- New cases: served-from-cache passes, served-from-fallback fails with source
  mismatch, missing header fails, seed-leak still takes precedence, bad
  status fails.
- Token-panels sanity test now asserts minRecords≥1 on all 3 panels.

Verification:
- tests/seed-contract-probe.test.mjs → 17/17 pass (was 15, +2 net)
- npm run test:data → 5340/5340
- npm run typecheck:all → clean
2026-04-15 09:16:27 +04:00

314 lines
13 KiB
JavaScript
#!/usr/bin/env node
import { loadEnvFile, CHROME_UA, runSeed } from './_seed-utils.mjs';
import { extractCountryCode } from './shared/geo-extract.mjs';
loadEnvFile(import.meta.url);
// WHO DON uses multi-word or hyphenated country names that the bigram scanner misses.
// These override extractCountryCode for exact substring matches (checked first, case-insensitive).
const WHO_NAME_OVERRIDES = {
'democratic republic of the congo': 'CD',
'dr congo': 'CD',
'timor-leste': 'TL',
'east timor': 'TL',
'papua new guinea': 'PG',
'kingdom of saudi arabia': 'SA',
'united kingdom': 'GB',
};
function extractCountryCodeFull(text) {
const lower = text.toLowerCase();
for (const [name, iso2] of Object.entries(WHO_NAME_OVERRIDES)) {
if (lower.includes(name)) return iso2;
}
return extractCountryCode(text) ?? '';
}
const CANONICAL_KEY = 'health:disease-outbreaks:v1';
const CACHE_TTL = 259200; // 72h (3 days) — 3× daily cron interval per gold standard; survives 2 consecutive missed runs
// WHO Disease Outbreak News JSON API (RSS at /feeds/entity/csr/don/en/rss.xml is dead since 2024)
const WHO_DON_API = 'https://www.who.int/api/emergencies/diseaseoutbreaknews?sf_provider=dynamicProvider372&sf_culture=en&$orderby=PublicationDateAndTime%20desc&$select=Title,ItemDefaultUrl,PublicationDateAndTime&$top=30';
// CDC Health Alert Network RSS (US-centric; supplements WHO for North American events)
const CDC_FEED = 'https://tools.cdc.gov/api/v2/resources/media/132608.rss';
// Outbreak News Today — aggregates WHO, CDC, and regional health ministry alerts
const OUTBREAK_NEWS_FEED = 'https://outbreaknewstoday.com/feed/';
// ThinkGlobalHealth disease tracker — 1,600+ ProMED-sourced real-time alerts with lat/lng
const THINKGLOBALHEALTH_BUNDLE = 'https://raw.githubusercontent.com/thinkglobalhealth/disease_tracker/main/index_bundle.js';
// Keep alerts within this many days; avoids flooding the map with old events
const TGH_LOOKBACK_DAYS = 90;
const RSS_MAX_BYTES = 500_000; // guard against oversized responses before regex
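// Multiply-by-31 string hash folded to a signed 32-bit integer, rendered in base 36.
// Deterministic across runs (used for stable item ids); not collision-resistant.
function stableHash(str) {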
let h = 0;
for (let i = 0; i < str.length; i++) h = (Math.imul(31, h) + str.charCodeAt(i)) | 0;
return Math.abs(h).toString(36);
}
/**
* Extract location string from WHO-style titles.
* Handles: "Disease — Country" (em-dash), "Disease - Country" (hyphen), "Disease in Country".
*/
function extractLocationFromTitle(title) {
// WHO DON pattern: "Disease — Country" or "Disease - Country" (one or more dash-separated segments)
// Split on em-dash, en-dash, or " - " / " – " to get all segments, then take the last capitalized one.
const segments = title.split(/\s*[–—]\s*|\s+-\s+/);
if (segments.length >= 2) {
const last = segments[segments.length - 1].trim();
if (/^[A-Z]/.test(last)) return last;
}
// Fallback: "... in <Country/Region>"
const inMatch = title.match(/\bin\s+([A-Z][^,.(]+)/);
if (inMatch) return inMatch[1].trim();
return '';
}
function detectAlertLevel(title, desc) {
const text = `${title} ${desc}`.toLowerCase();
if (text.includes('outbreak') || text.includes('emergency') || text.includes('epidemic') || text.includes('pandemic')) return 'alert';
if (text.includes('warning') || text.includes('spread') || text.includes('cases increasing')) return 'warning';
return 'watch';
}
function detectDisease(title) {
const lower = title.toLowerCase();
const known = ['mpox', 'monkeypox', 'ebola', 'cholera', 'covid', 'dengue', 'measles',
'polio', 'marburg', 'lassa', 'plague', 'yellow fever', 'typhoid', 'influenza',
'avian flu', 'h5n1', 'h5n2', 'anthrax', 'rabies', 'meningitis', 'hepatitis',
'nipah', 'rift valley', 'crimean-congo', 'leishmaniasis', 'malaria', 'diphtheria',
'chikungunya', 'botulism', 'brucellosis', 'salmonella', 'listeria', 'e. coli',
'norovirus', 'legionella', 'campylobacter'];
for (const d of known) {
if (lower.includes(d)) return d.charAt(0).toUpperCase() + d.slice(1);
}
return 'Unknown Disease';
}
/**
* Fetch WHO Disease Outbreak News via their JSON API (RSS feed is dead since 2024).
* Returns normalized items array.
*/
async function fetchWhoDonApi() {
try {
const resp = await fetch(WHO_DON_API, {
headers: { Accept: 'application/json', 'User-Agent': CHROME_UA },
signal: AbortSignal.timeout(15000),
});
if (!resp.ok) { console.warn(`[Disease] WHO DON API HTTP ${resp.status}`); return []; }
const data = await resp.json();
const items = data?.value;
if (!Array.isArray(items)) { console.warn('[Disease] WHO DON API: unexpected response shape'); return []; }
return items.map((item) => ({
title: (item.Title || '').trim(),
link: item.ItemDefaultUrl ? `https://www.who.int${item.ItemDefaultUrl}` : '',
desc: '',
publishedMs: item.PublicationDateAndTime ? new Date(item.PublicationDateAndTime).getTime() : Date.now(),
sourceName: 'WHO',
})).filter(i => i.title && !isNaN(i.publishedMs));
} catch (e) {
console.warn('[Disease] WHO DON API fetch error:', e?.message || e);
return [];
}
}
async function fetchRssItems(url, sourceName) {
try {
const resp = await fetch(url, {
headers: { Accept: 'application/rss+xml, application/xml, text/xml', 'User-Agent': CHROME_UA },
signal: AbortSignal.timeout(15000),
});
if (!resp.ok) { console.warn(`[Disease] ${sourceName} HTTP ${resp.status}`); return []; }
const xml = await resp.text();
const bounded = xml.length > RSS_MAX_BYTES ? xml.slice(0, RSS_MAX_BYTES) : xml;
const items = [];
const itemRe = /<item>([\s\S]*?)<\/item>/g;
let match;
while ((match = itemRe.exec(bounded)) !== null) {
const block = match[1];
const title = (block.match(/<title>(?:<!\[CDATA\[)?([\s\S]*?)(?:\]\]>)?<\/title>/) || [])[1]?.trim() || '';
const link = (block.match(/<link>(?:<!\[CDATA\[)?([\s\S]*?)(?:\]\]>)?<\/link>/) || [])[1]?.trim() || '';
const rawDesc = (block.match(/<description>(?:<!\[CDATA\[)?([\s\S]*?)(?:\]\]>)?<\/description>/) || [])[1] || '';
const desc = rawDesc
.replace(/&lt;/g, '<').replace(/&gt;/g, '>').replace(/&amp;/g, '&').replace(/&quot;/g, '"').replace(/&#39;/g, "'").replace(/&nbsp;/g, ' ')
.replace(/<[^>]+>/g, '').trim().slice(0, 300);
const pubDate = (block.match(/<pubDate>(.*?)<\/pubDate>/) || [])[1]?.trim() || '';
const publishedMs = pubDate ? new Date(pubDate).getTime() : Date.now();
if (!title || isNaN(publishedMs)) continue;
items.push({ title, link, desc, publishedMs, sourceName });
}
return items;
} catch (e) {
console.warn(`[Disease] ${sourceName} fetch error:`, e?.message || e);
return [];
}
}
/**
* Fetch ThinkGlobalHealth disease tracker data.
* The site (https://thinkglobalhealth.github.io/disease_tracker/) embeds all ProMED-reviewed
* disease alerts directly in index_bundle.js as a JS object literal array:
* var a=[{Alert_ID:"...",lat:"...",lng:"...",diseases:"...",country:"...",date:"M/D/YYYY",...}]
* ~1,600 records with exact lat/lng coordinates. We filter to last TGH_LOOKBACK_DAYS days.
*/
async function fetchThinkGlobalHealth() {
try {
const resp = await fetch(THINKGLOBALHEALTH_BUNDLE, {
headers: { 'User-Agent': CHROME_UA, Accept: 'application/javascript, text/javascript' },
signal: AbortSignal.timeout(30000),
});
if (!resp.ok) { console.warn(`[Disease] ThinkGlobalHealth HTTP ${resp.status}`); return []; }
const bundle = await resp.text();
// Extract the data array: "var a=[{Alert_ID:"
const marker = 'var a=[{Alert_ID:';
const startIdx = bundle.indexOf(marker);
if (startIdx === -1) { console.warn('[Disease] ThinkGlobalHealth: data marker not found'); return []; }
// Find the end of the array by counting brackets from the [ position.
// Brackets inside string values are counted too, so this assumes they are balanced or absent.
const arrStart = startIdx + 'var a='.length;
let depth = 0, end = arrStart;
for (; end < bundle.length; end++) {
if (bundle[end] === '[' || bundle[end] === '{') depth++;
else if (bundle[end] === ']' || bundle[end] === '}') { depth--; if (depth === 0) { end++; break; } }
}
const arrayStr = bundle.slice(arrStart, end);
// Parse JS object literals (keys are unquoted identifiers, all values are strings).
// Pattern: {Key:"value",...} — flat objects only.
const records = [];
const objRe = /\{([^{}]+)\}/g;
let m;
while ((m = objRe.exec(arrayStr)) !== null) {
const obj = {};
const pairRe = /(\w+):"((?:[^"\\]|\\.)*)"/g;
let p;
while ((p = pairRe.exec(m[1])) !== null) obj[p[1]] = p[2];
if (obj.Alert_ID) records.push(obj);
}
const cutoff = Date.now() - TGH_LOOKBACK_DAYS * 86400_000;
const items = [];
for (const rec of records) {
if (!rec.lat || !rec.lng || !rec.diseases || !rec.date) continue;
const publishedMs = new Date(rec.date).getTime();
if (isNaN(publishedMs) || publishedMs < cutoff) continue;
// place_name from TGH is often "City, District, Country" — take only the first segment for display.
const cityName = (rec.place_name || '').split(',')[0].trim() || rec.country || '';
items.push({
title: `${rec.diseases}${rec.country ? ` - ${rec.country}` : ''}`,
link: rec.link || '',
desc: rec.summary ? rec.summary.slice(0, 300) : '',
publishedMs,
sourceName: 'ThinkGlobalHealth',
_country: rec.country || '',
_disease: rec.diseases || '',
_location: cityName,
_lat: Number.isFinite(parseFloat(rec.lat)) ? parseFloat(rec.lat) : null,
_lng: Number.isFinite(parseFloat(rec.lng)) ? parseFloat(rec.lng) : null,
_cases: parseInt(rec.cases_count || rec.cases || '0', 10) || 0,
});
}
console.log(`[Disease] ThinkGlobalHealth: ${records.length} total, ${items.length} in last ${TGH_LOOKBACK_DAYS}d`);
return items;
} catch (e) {
console.warn('[Disease] ThinkGlobalHealth fetch error:', e?.message || e);
return [];
}
}
function mapItem(item) {
const location = item._location || extractLocationFromTitle(item.title)
|| (item.sourceName === 'CDC' ? 'United States' : '');
const disease = item._disease || detectDisease(item.title);
const countryCode = item._country
? (extractCountryCodeFull(item._country) || extractCountryCodeFull(location || item.title))
: extractCountryCodeFull(location || `${item.title} ${item.desc}`);
return {
id: `${item.sourceName.toLowerCase()}-${stableHash(item.link || item.title)}-${item.publishedMs}`,
disease,
location,
countryCode,
alertLevel: detectAlertLevel(item.title, item.desc),
summary: item.desc,
sourceUrl: item.link,
publishedAt: item.publishedMs,
sourceName: item.sourceName,
lat: item._lat ?? 0,
lng: item._lng ?? 0,
cases: item._cases || 0,
};
}
async function fetchDiseaseOutbreaks() {
const [whoItems, cdcItems, outbreakNewsItems, tghItems] = await Promise.all([
fetchWhoDonApi(),
fetchRssItems(CDC_FEED, 'CDC'),
fetchRssItems(OUTBREAK_NEWS_FEED, 'Outbreak News Today'),
fetchThinkGlobalHealth(),
]);
console.log(`[Disease] Sources: WHO=${whoItems.length} CDC=${cdcItems.length} ONT=${outbreakNewsItems.length} TGH=${tghItems.length}`);
// TGH items are already disease-curated with exact lat/lng — skip keyword filter,
// preserve all geo-located alerts, and don't collapse by disease+country.
const tghOutbreaks = tghItems.map(mapItem);
const diseaseKeywords = ['outbreak', 'disease', 'virus', 'fever', 'flu', 'ebola', 'mpox',
'cholera', 'dengue', 'measles', 'polio', 'plague', 'avian', 'h5n1', 'epidemic',
'infection', 'pathogen', 'rabies', 'meningitis', 'hepatitis', 'nipah', 'marburg',
'diphtheria', 'chikungunya', 'rift valley', 'influenza', 'botulism',
'salmonella', 'listeria', 'e. coli', 'norovirus', 'legionella', 'campylobacter'];
const otherOutbreaks = [...whoItems, ...cdcItems, ...outbreakNewsItems]
.filter(item => {
const text = `${item.title} ${item.desc}`.toLowerCase();
return diseaseKeywords.some(k => text.includes(k));
})
.map(mapItem);
// Sort before dedup so the first occurrence is always the most recent.
otherOutbreaks.sort((a, b) => b.publishedAt - a.publishedAt);
// Deduplicate non-TGH items by disease+country (keep most recent per pair).
// TGH items each represent a distinct geo-located event — never collapse them.
const seen = new Set();
const dedupedOthers = otherOutbreaks.filter(o => {
const key = o.disease === 'Unknown Disease' ? o.id : `${o.disease}:${o.countryCode || o.location}`;
if (seen.has(key)) return false;
seen.add(key);
return true;
});
// TGH first (precise geo), then WHO/CDC/ONT (already sorted above before dedup).
const tghSorted = tghOutbreaks.sort((a, b) => b.publishedAt - a.publishedAt);
// Up to 150 TGH geo-pinned alerts + up to 50 from other authoritative sources.
const outbreaks = [...tghSorted.slice(0, 150), ...dedupedOthers.slice(0, 50)];
return { outbreaks, fetchedAt: Date.now() };
}
function validate(data) {
return Array.isArray(data?.outbreaks) && data.outbreaks.length >= 1;
}
export function declareRecords(data) {
return Array.isArray(data?.outbreaks) ? data.outbreaks.length : 0;
}
runSeed('health', 'disease-outbreaks', CANONICAL_KEY, fetchDiseaseOutbreaks, {
validateFn: validate,
ttlSeconds: CACHE_TTL,
sourceVersion: 'who-api-cdc-ont-v6',
declareRecords,
schemaVersion: 1,
maxStaleMin: 2880,
}).catch((err) => {
const _cause = err.cause ? ` (cause: ${err.cause.message || err.cause.code || err.cause})` : '';
console.error('FATAL:', (err.message || err) + _cause);
process.exit(1);
});