worldmonitor/scripts
Elie Habib af3d2ce45f fix(health): disease outbreaks — WHO JSON API + ThinkGlobalHealth source (#2388)
* fix(health): disease outbreaks seeder — WHO JSON API, ThinkGlobalHealth, curlFetch proxy

WHO DON RSS (feeds/entity/csr/don/en/rss.xml) has been dead since 2024, and
Outbreak News Today was being blocked by Railway IPs, leaving only US-only CDC
alerts. This produced 50 items in Redis but only 2 dots on the map.

Fixes:
- Replace dead WHO RSS with WHO DON JSON API
  (www.who.int/api/emergencies/diseaseoutbreaknews) — 30 authoritative global
  outbreak items with proper Disease – Country title format
- Add ThinkGlobalHealth disease tracker as primary source: scrapes
  index_bundle.js (ProMED-reviewed alerts with lat/lng, country, date);
  ~250 items in last 90 days across 50+ countries
- Add curlFetch + proxy support for Outbreak News Today, matching the pattern
  used by seed-fear-greed for Railway IP-blocked sources
- Fix extractLocationFromTitle to handle regular hyphen (- ) in addition to
  en dash (–), covering all WHO DON title formats
- Add WHO_NAME_OVERRIDES for multi-word country names the bigram scanner
  misses (DRC, Timor-Leste, Papua New Guinea, Saudi Arabia)
- Use location-first country extraction to prevent disease names that are also
  country names from causing false positives (Sudan virus – Uganda → UG)
- Add deduplication by disease+country pair to avoid flooding with repeated
  state-level US alerts
- Bump sourceVersion to who-api-cdc-ont-v4
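
For illustration, the location-first extraction described above could look roughly like the sketch below. It is not the code from this commit: the function bodies, the COUNTRY_CODES table, and the entries shown for WHO_NAME_OVERRIDES are assumptions; only the names extractLocationFromTitle and WHO_NAME_OVERRIDES come from the message itself.

```javascript
// Hypothetical override entries; the real WHO_NAME_OVERRIDES contents
// are not shown in this commit message.
const WHO_NAME_OVERRIDES = {
  'democratic republic of the congo': 'CD',
  'timor-leste': 'TL',
  'papua new guinea': 'PG',
  'saudi arabia': 'SA',
};

// Hypothetical minimal country table, merged with the overrides.
const COUNTRY_CODES = { uganda: 'UG', sudan: 'SD', ...WHO_NAME_OVERRIDES };

// Split on a spaced hyphen or en dash and return the last segment.
// Returns '' when the title contains no such separator.
function extractLocationFromTitle(title) {
  const parts = title.split(/\s+[-–]\s+/);
  return parts.length > 1 ? parts[parts.length - 1].trim() : '';
}

// Return the ISO code of the longest known country name found in the text.
function findCountryCode(text) {
  const lower = text.toLowerCase();
  const names = Object.keys(COUNTRY_CODES).sort((a, b) => b.length - a.length);
  const hit = names.find((name) => lower.includes(name));
  return hit ? COUNTRY_CODES[hit] : null;
}

// Location first: only fall back to scanning the whole title when the
// location segment yields nothing, so "Sudan virus" cannot win over "Uganda".
function countryFromTitle(title) {
  const location = extractLocationFromTitle(title);
  return (location && findCountryCode(location)) || findCountryCode(title);
}
```

Checking the location segment before the full title is what keeps a disease name such as "Sudan virus" from being read as the country.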

* fix(disease): address 3 Greptile review issues on PR #2388

- resolveProxy: handle PROXY_URL passwords containing colons by splitting
  on the first 3 colon positions only, joining remainder as password
- extractLocationFromTitle: split on all dash separators and take the last
  capitalized segment, preventing "Disease - Update - Country" from
  capturing "Update - Country" instead of "Country"
- dedup: use unique id as key for "Unknown Disease" events so distinct
  alerts in the same country are not collapsed into one record
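
A minimal sketch of two of the review fixes above, under stated assumptions. It assumes PROXY_URL has the form host:port:user:password, which the message implies but does not state, and the helper names parseProxy, dedupKey, and dedupe as well as the event fields id, disease, and countryCode are hypothetical.

```javascript
// Assumed PROXY_URL format (not confirmed by the commit): host:port:user:password.
// Only the first three colons act as separators; any further colons
// belong to the password and are joined back in.
function parseProxy(raw) {
  const parts = raw.split(':');
  if (parts.length < 4) return null;
  const [host, port, user, ...rest] = parts;
  return { host, port: Number(port), user, password: rest.join(':') };
}

// Hypothetical event shape: { id, disease, countryCode }.
// Events with an unrecognized disease are keyed by their unique id, so
// distinct alerts in the same country are not collapsed into one record.
function dedupKey(event) {
  return event.disease === 'Unknown Disease'
    ? `id:${event.id}`
    : `${event.disease.toLowerCase()}|${event.countryCode}`;
}

// Keep the first event seen for each key.
function dedupe(events) {
  const seen = new Map();
  for (const event of events) {
    const key = dedupKey(event);
    if (!seen.has(key)) seen.set(key, event);
  }
  return [...seen.values()];
}
```

With these helpers, repeated state-level alerts for the same disease and country collapse to one record, while two "Unknown Disease" alerts from the same country both survive.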
2026-03-28 00:47:08 +04:00