worldmonitor/api
Elie Habib 9e022f23bb fix(cable-health): stop EMPTY alarm during NGA outages — writeback fallback + mark zero-events healthy (#3230)
User reported health endpoint showing:
  "cableHealth": { status: "EMPTY", records: 0, seedAgeMin: 0, maxStaleMin: 90 }

despite the 30-min warm-ping loop running. Two bugs stacked:

1. The null-upstream path in get-cable-health.ts didn't write Redis.
   When the fetcher returns null, cachedFetchJson stores NEG_SENTINEL
   (10 bytes) in cable-health-v1 for 2 min. The handler then returned
   `fallbackCache || { cables: {} }` to the client WITHOUT writing to
   cable-health-v1 or refreshing seed-meta. api/health.js saw strlen=10
   → strlenIsData=false → hasData=false → records=0 → EMPTY (CRIT).

   Fix: on null result, write the fallback response back to CACHE_KEY
   (short TTL matching NEG_SENTINEL so a recovered NGA fetch can
   overwrite immediately) AND refresh seed-meta with the real count.
   Health now sees hasData=true during an outage.

2. Zero cables was treated as EMPTY_DATA (CRIT), but `cables: {}` is
   a valid healthy state: NGA has no active subsea cable warnings.
   The old `Math.max(count, 1)` on recordCount was an intentional lie
   to sidestep this; recordCount is now honest.

   Fix: add `cableHealth` to EMPTY_DATA_OK_KEYS. Matches the existing
   pattern for notamClosures, gpsjam, weatherAlerts — "zero events is
   valid, not critical". recordCount now reports actual cables.length.
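
A minimal sketch of the writeback in fix 1, not the actual handler
code. FakeStore is a hypothetical in-memory stand-in for Redis, and
SEED_META_KEY, SEED_META_TTL_SEC, the seed-meta payload shape and the
serveFallback name are assumptions for illustration. Only
cable-health-v1 and the 2-min TTL come from the description above.

```typescript
interface CableHealthResponse {
  cables: Record<string, unknown>;
}

interface Entry {
  value: string;
  ttlSec: number;
}

// Hypothetical stand-in for the Redis client; records value and TTL only.
class FakeStore {
  private readonly data = new Map<string, Entry>();

  set(key: string, value: string, ttlSec: number): void {
    this.data.set(key, { value, ttlSec });
  }

  get(key: string): Entry | undefined {
    return this.data.get(key);
  }
}

const CACHE_KEY = "cable-health-v1";            // key named in this commit
const SEED_META_KEY = "seed-meta:cable-health"; // assumed key name
const NEG_TTL_SEC = 120;                        // 2 min, same as NEG_SENTINEL
const SEED_META_TTL_SEC = 3600;                 // assumed value

// Called when the upstream fetch returned null (NGA outage).
function serveFallback(
  store: FakeStore,
  fallbackCache: CableHealthResponse | null,
  nowMs: number,
): CableHealthResponse {
  const response = fallbackCache || { cables: {} };
  // Assumption: the record count is the number of keys in `cables`.
  const recordCount = Object.keys(response.cables).length;

  // Write the fallback back with a short TTL so a recovered fetch
  // can replace it quickly.
  store.set(CACHE_KEY, JSON.stringify(response), NEG_TTL_SEC);
  // Refresh seed-meta with the real count, no floor applied.
  store.set(
    SEED_META_KEY,
    JSON.stringify({ fetchedAt: nowMs, recordCount }),
    SEED_META_TTL_SEC,
  );
  return response;
}
```

With this shape the client still gets the fallback payload, and the
health check reads a real JSON body under cable-health-v1 instead of
the sentinel.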

Combined: NGA outage → fallback cached locally + written back → health
reads hasData=true, records=N, no false alarm. NGA healthy with zero
active warnings → cables={}, records=0, EMPTY_DATA_OK → OK. NGA healthy
with warnings → cables={...}, records>0 → OK.
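
The three outcomes above can be sketched as a small classifier. This
is an illustration, not the real api/health.js logic: classifyHealth,
the HealthStatus type and the `strlen > 10` test are assumptions
(the actual strlenIsData check is not shown in this commit).

```typescript
type HealthStatus = "OK" | "EMPTY" | "EMPTY_DATA";

// NEG_SENTINEL is 10 bytes per the description above.
const NEG_SENTINEL_BYTES = 10;

// Keys for which zero records is a valid, healthy state.
const EMPTY_DATA_OK_KEYS = new Set<string>([
  "notamClosures",
  "gpsjam",
  "weatherAlerts",
  "cableHealth", // added by this commit
]);

function classifyHealth(
  name: string,
  strlen: number,
  records: number,
): HealthStatus {
  // Assumed check: anything no longer than the sentinel is not data.
  const hasData = strlen > NEG_SENTINEL_BYTES;
  if (!hasData) return "EMPTY"; // CRIT
  if (records === 0 && !EMPTY_DATA_OK_KEYS.has(name)) {
    return "EMPTY_DATA"; // CRIT for keys that must never be empty
  }
  return "OK";
}

// Sentinel only (pre-fix outage path): EMPTY.
classifyHealth("cableHealth", 10, 0);
// Written-back `{"cables":{}}` is 13 bytes with zero records: OK.
classifyHealth("cableHealth", 13, 0);
// Healthy with active warnings: OK.
classifyHealth("cableHealth", 200, 3);
```

A key outside EMPTY_DATA_OK_KEYS with zero records would still return
EMPTY_DATA under this sketch, which is the behavior cableHealth had
before the change.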

Regression guard to keep in mind: if anyone later removes cableHealth
from EMPTY_DATA_OK_KEYS and wants strict zero-events to alarm, they'd
also need to revisit `Math.max(count, 1)` or an equivalent floor so
the "legitimately empty but healthy" state doesn't CRIT.
2026-04-20 15:21:04 +04:00