worldmonitor/scripts
Elie Habib 1d0f75cbf6 fix(health): fix cableHealth and spending EMPTY/CRIT flapping (#1806)
* fix(health): prevent cableHealth EMPTY/CRIT when NGA has no active warnings

When the NGA broadcast-warn API returns 0 active warnings, computeHealthMap
produces an empty cables object. The handler wrote recordCount:0 to seed-meta,
causing health.js to report EMPTY (CRIT) even though the feed is operational.

Zero cable disruptions is a valid healthy state, not missing data.
Write Math.max(count, 1) so health.js only fires CRIT if the feed is
genuinely broken (no seed-meta at all), not when NGA reports no disruptions.

* fix(health): fix cableHealth EMPTY/CRIT — TTL shorter than warm-ping interval

Root cause: CACHE_TTL was 600s (10 min) but the relay warm-ping runs every
30 min. The cable-health-v1 Redis key expired 20 min before the next ping,
causing health.js to see a missing key → EMPTY → CRIT for 20 of every 30 min.

Fix: increase CACHE_TTL to 3600s (1h) to match health.js maxStaleMin:60 and
outlive the warm-ping interval with margin.

Also reverts the earlier incorrect Math.max(count,1) change — health.js reads
the cached payload directly, not meta.recordCount.
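
The 20-minute gap follows directly from the two numbers above. A minimal sketch of the arithmetic, assuming the key is rewritten on each warm-ping and nowhere else (`missingWindowSeconds` is a hypothetical helper, not a function in this repository):

```javascript
// Hypothetical helper: seconds per refresh cycle during which a key with the
// given TTL is absent from Redis, assuming it is only rewritten once per cycle.
function missingWindowSeconds(ttlSeconds, refreshIntervalSeconds) {
  return Math.max(0, refreshIntervalSeconds - ttlSeconds);
}

const WARM_PING_INTERVAL_S = 30 * 60; // relay warm-ping cadence stated above

const before = missingWindowSeconds(600, WARM_PING_INTERVAL_S);  // old CACHE_TTL
const after = missingWindowSeconds(3600, WARM_PING_INTERVAL_S);  // new CACHE_TTL

console.log(`missing before: ${before / 60} min, after: ${after / 60} min`);
```

With the old TTL the key is absent for 1200s of every 1800s cycle; with the new TTL the window is zero even if one ping is skipped.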

* fix(health): fix spending EMPTY/CRIT — TTL matched seed interval exactly

SPENDING_CACHE_TTL was 3600s, equal to SPENDING_SEED_INTERVAL_MS (1h). With the
TTL equal to the interval, the key expires just as the next seed is due, so any
seeder delay leaves a window where health.js sees EMPTY → CRIT. Double the TTL
to 2h so the key always outlives the seeder.
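
One way to keep the two values from drifting back into equality is to derive the TTL from the interval instead of hard-coding both. This is only a sketch: `TTL_SAFETY_FACTOR` and `ttlSecondsFor` are hypothetical names, and the actual change may simply edit the constant.

```javascript
// Seed interval as stated above (1h, in milliseconds).
const SPENDING_SEED_INTERVAL_MS = 60 * 60 * 1000;

// Hypothetical: how many seed intervals a cached key should survive.
const TTL_SAFETY_FACTOR = 2;

// Hypothetical helper: convert a seed interval in ms to a Redis TTL in seconds.
function ttlSecondsFor(intervalMs, factor = TTL_SAFETY_FACTOR) {
  return Math.ceil((intervalMs * factor) / 1000);
}

const SPENDING_CACHE_TTL = ttlSecondsFor(SPENDING_SEED_INTERVAL_MS);

console.log(SPENDING_CACHE_TTL); // 7200 seconds, i.e. 2h
```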

* fix(health): fix weather CACHE_TTL matching seed interval exactly

WEATHER_CACHE_TTL was 900s = WEATHER_SEED_INTERVAL_MS (15 min). Same
TTL=interval race as spending: key can expire at the exact moment the
seeder fires, leaving a window where health.js sees EMPTY. Double TTL
to 1800s (30 min) to eliminate the race.

* fix(health): fix 5 remaining TTL < 2x interval races in ais-relay

Ensure every Redis key TTL is at least 2× its seeder interval so a single
slow/missed cycle never causes EMPTY/CRIT flapping:

- USNI:                7h → 12h  (interval 6h,  was 1.17x)
- THEATER_POSTURE:    15m → 20m  (interval 10m, was 1.5x)
- CYBER:               3h →  4h  (interval 2h,  was 1.5x)
- CHOKEPOINT_TRANSIT: 15m → 20m  (interval 10m, was 1.5x)
- TRANSIT_SUMMARY:    15m → 20m  (interval 10m, was 1.5x)
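
The 2× rule can be checked mechanically against the values in this list. The sketch below uses the list labels as plain strings; they are not necessarily the constant names in ais-relay, and `violations` is a hypothetical helper.

```javascript
const MIN = 60;
const HOUR = 3600;

// Values copied from the list above, converted to seconds.
const ttlTable = [
  { key: 'USNI',               oldTtl: 7 * HOUR, newTtl: 12 * HOUR, interval: 6 * HOUR },
  { key: 'THEATER_POSTURE',    oldTtl: 15 * MIN, newTtl: 20 * MIN,  interval: 10 * MIN },
  { key: 'CYBER',              oldTtl: 3 * HOUR, newTtl: 4 * HOUR,  interval: 2 * HOUR },
  { key: 'CHOKEPOINT_TRANSIT', oldTtl: 15 * MIN, newTtl: 20 * MIN,  interval: 10 * MIN },
  { key: 'TRANSIT_SUMMARY',    oldTtl: 15 * MIN, newTtl: 20 * MIN,  interval: 10 * MIN },
];

// Hypothetical helper: keys whose TTL is below minRatio times the seed interval.
function violations(table, field, minRatio = 2) {
  return table
    .filter((row) => row[field] / row.interval < minRatio)
    .map((row) => row.key);
}

console.log('before:', violations(ttlTable, 'oldTtl')); // all five keys
console.log('after:', violations(ttlTable, 'newTtl'));  // empty array
```

Every new TTL lands exactly on the 2× boundary, so the comparison is strict (`<`) and a ratio of exactly 2 passes.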

* test: update transit-summaries TTL assertion to match new 2x minimum rule
2026-03-18 16:43:34 +04:00