Phase 3 PR1: Regime drift history (writer + RPC) (#2981)

* feat(intelligence): regime drift history (Phase 3 PR1)

Phase 3 PR1 of the Regional Intelligence Model. Adds an append-only
regime transition log per region plus a premium-gated RPC to read it.

## What landed

### New writer module: scripts/regional-snapshot/regime-history.mjs

Single public entry point:

  recordRegimeTransition(region, snapshot, diff, opts?)
    -> { recorded, entry, pushed, trimmed }

Pure builder + Redis-ops orchestrator + dependency-injected publisher.

Flow:
  1. buildTransitionEntry() returns null when the diff has no regime_changed
     (steady-state snapshots produce no entry, so the log is a pure transition stream)
  2. publishTransitionWithOps() LPUSHes onto
     intelligence:regime-history:v1:{region}, then LTRIMs to keep the
     most recent REGIME_HISTORY_MAX (100) entries
  3. defaultPublisher binds real Upstash REST calls; tests inject an
     in-memory ops object for offline coverage
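The flow above can be sketched as follows. This is a minimal TypeScript sketch, not the contents of regime-history.mjs. The snapshot and diff field names (regime.label, regime.previous_label, regime.transitioned_at, generated_at, snapshot_id, and diff.regime_changed as a boolean) are assumptions modelled on the RegimeTransition proto fields. The ListOps interface and MemoryListOps are hypothetical stand-ins for the injected ops object.

```typescript
// Sketch only: snapshot/diff field names are assumptions modelled on the
// RegimeTransition proto fields, not the real regime-history.mjs internals.
const REGIME_HISTORY_KEY_PREFIX = "intelligence:regime-history:v1:";
const REGIME_HISTORY_MAX = 100;

interface RegimeState {
  label: string;
  previous_label?: string;
  transitioned_at?: number;
  transition_driver?: string;
}
interface Snapshot { snapshot_id: string; generated_at: number; regime: RegimeState }
interface Diff { regime_changed?: boolean } // assumed boolean flag
interface TransitionEntry {
  region_id: string;
  label: string;
  previous_label: string;
  transitioned_at: number;
  transition_driver: string;
  snapshot_id: string;
}

// Minimal list-ops surface; tests inject an in-memory implementation.
interface ListOps {
  lpush(key: string, value: string): Promise<boolean>;
  ltrim(key: string, start: number, stop: number): Promise<boolean>;
}

function buildTransitionEntry(region: string, snapshot: Snapshot | null, diff: Diff | null): TransitionEntry | null {
  // Steady-state snapshots (no regime change) produce no entry.
  if (!region || !snapshot || !diff?.regime_changed) return null;
  const { regime } = snapshot;
  return {
    region_id: region,
    label: regime.label,
    previous_label: regime.previous_label ?? "", // "" on a first-ever transition
    transitioned_at: regime.transitioned_at ?? snapshot.generated_at,
    transition_driver: regime.transition_driver ?? "",
    snapshot_id: snapshot.snapshot_id,
  };
}

async function publishTransitionWithOps(
  ops: ListOps,
  entry: TransitionEntry | null,
): Promise<{ pushed: boolean; trimmed: boolean }> {
  if (!entry) return { pushed: false, trimmed: false };
  const key = REGIME_HISTORY_KEY_PREFIX + entry.region_id;
  let pushed = false;
  try {
    pushed = await ops.lpush(key, JSON.stringify(entry));
  } catch {
    pushed = false;
  }
  // LPUSH failure short-circuits: LTRIM is never called.
  if (!pushed) return { pushed: false, trimmed: false };
  let trimmed = false;
  try {
    // LTRIM stop is inclusive, so 0..MAX-1 keeps the newest MAX entries.
    trimmed = await ops.ltrim(key, 0, REGIME_HISTORY_MAX - 1);
  } catch {
    trimmed = false; // non-fatal: the entry landed and the next cycle re-trims
  }
  return { pushed: true, trimmed };
}

// Hypothetical in-memory stand-in for the Upstash-backed ops.
class MemoryListOps implements ListOps {
  lists = new Map<string, string[]>();
  async lpush(key: string, value: string): Promise<boolean> {
    const list = this.lists.get(key) ?? [];
    list.unshift(value); // LPUSH prepends, so index 0 is the newest entry
    this.lists.set(key, list);
    return true;
  }
  async ltrim(key: string, start: number, stop: number): Promise<boolean> {
    const list = this.lists.get(key) ?? [];
    this.lists.set(key, list.slice(start, stop + 1)); // non-negative indices only
    return true;
  }
}
```

Because the LTRIM stop index is inclusive, pushing 105 entries through MemoryListOps leaves the 100 newest, with the most recent at index 0.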

LTRIM failure is non-fatal: the entry has already landed, and the next
cycle will re-trim. LPUSH failure short-circuits and reports pushed=false.
The recorder NEVER throws, and the seed loop additionally wraps it in its
own try/catch so snapshot persistence is never blocked.
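A hypothetical sketch of the never-throw contract around recordRegimeTransition. The simplified entry shape, the Publisher signature (resolving to either { pushed, trimmed } or false), and the stub defaultPublisher are assumptions for illustration; the real default publisher binds Upstash REST calls.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface TransitionEntry { region_id: string; label: string; snapshot_id: string }
interface Snapshot { snapshot_id: string; regime: { label: string } }
interface Diff { regime_changed?: boolean }
type PublishResult = { pushed: boolean; trimmed: boolean };
type Publisher = (entry: TransitionEntry) => Promise<PublishResult | false>;
interface RecordResult { recorded: boolean; entry: TransitionEntry | null; pushed: boolean; trimmed: boolean }

// Stub standing in for the Upstash-backed publisher (assumed to report
// failure when it cannot reach Redis).
const defaultPublisher: Publisher = async () => false;

async function recordRegimeTransition(
  region: string,
  snapshot: Snapshot | null,
  diff: Diff | null,
  opts: { publisher?: Publisher } = {},
): Promise<RecordResult> {
  const none: RecordResult = { recorded: false, entry: null, pushed: false, trimmed: false };
  try {
    // No regime change: nothing to record.
    if (!region || !snapshot || !diff?.regime_changed) return none;
    const entry: TransitionEntry = {
      region_id: region,
      label: snapshot.regime.label,
      snapshot_id: snapshot.snapshot_id,
    };
    const published = await (opts.publisher ?? defaultPublisher)(entry);
    if (!published) return { ...none, entry }; // publisher returned false
    return { recorded: published.pushed, entry, pushed: published.pushed, trimmed: published.trimmed };
  } catch {
    // Publisher (or builder) exceptions are swallowed: the recorder never throws.
    return none;
  }
}
```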

### seed-regional-snapshots.mjs hook

Added a regime-history call alongside the existing alert-emitter call,
right after persistSnapshot succeeds. Same best-effort contract:
unconditional try/catch, log a warning on throw, continue the main loop.
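A sketch of where the best-effort hooks sit in the seed loop. runSeedLoop, the boolean return from persistSnapshot, and the sideEffects array are hypothetical; only the ordering (persist first, then each side effect in its own try/catch that logs a warning) follows the description above.

```typescript
// Hypothetical seed-loop shape: only the try/catch placement mirrors the PR text.
type SideEffect = (region: string) => Promise<unknown>;

async function runSeedLoop(
  regions: string[],
  persistSnapshot: (region: string) => Promise<boolean>,
  sideEffects: SideEffect[], // e.g. the alert emitter and the regime-history recorder
  warn: (msg: string) => void,
): Promise<number> {
  let persisted = 0;
  for (const region of regions) {
    const ok = await persistSnapshot(region);
    if (!ok) continue; // side effects only run after a successful persist
    persisted++;
    for (const step of sideEffects) {
      try {
        await step(region);
      } catch (err) {
        // Best-effort: log and keep going; never abort the main loop.
        warn(`[${region}] side effect failed: ${String(err)}`);
      }
    }
  }
  return persisted;
}
```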

### Proto + RPC: GetRegimeHistory

  proto/worldmonitor/intelligence/v1/get_regime_history.proto

  - GetRegimeHistoryRequest { region_id, limit (0..100) }
  - GetRegimeHistoryResponse { transitions: RegimeTransition[] }
  - RegimeTransition { region_id, label, previous_label,
                       transitioned_at, transition_driver, snapshot_id }

region_id validated as strict kebab-case (same regex as
get-regional-snapshot). limit capped server-side at MAX_LIMIT=100,
defaulting to 50 when omitted.
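A sketch of the request normalization described above. The kebab-case regex is an assumed stand-in for the one shared with get-regional-snapshot, and treating limit=0 as "omitted" assumes proto3 default semantics for the limit field; normalizeRequest is a hypothetical name.

```typescript
// Assumed kebab-case pattern; the real regex is shared with get-regional-snapshot.
const REGION_ID_RE = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;
const MAX_LIMIT = 100;
const DEFAULT_LIMIT = 50;

function normalizeRequest(req: { regionId?: string; limit?: number }): { regionId: string; limit: number } | null {
  const regionId = req.regionId ?? "";
  if (!REGION_ID_RE.test(regionId)) return null; // reject anything not strict kebab-case
  // proto3 scalars default to 0, so 0 (or a missing value) means "not set".
  const raw = Number.isFinite(req.limit) ? Math.floor(req.limit as number) : 0;
  const limit = raw <= 0 ? DEFAULT_LIMIT : Math.min(raw, MAX_LIMIT); // cap server-side
  return { regionId, limit };
}
```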

Added to IntelligenceService in service.proto. Generated openapi
JSON/YAML committed via `make generate`.

### Server handler: server/worldmonitor/intelligence/v1/get-regime-history.ts

LRANGE-based read (newest-first, because the writer LPUSHes). The adapter
is a dedicated exported function, adaptTransition(raw), for testability.
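A sketch of what an adapter like adaptTransition(raw) might look like, matching the behaviours the handler tests list: snake_case to camelCase, empty/zero defaults for missing fields, and a non-numeric transitioned_at becoming 0. The camelCase field names and the assumption that transitioned_at is stored as a plain number are illustrative, not taken from the generated proto types.

```typescript
// Camel-cased output mirroring the proto's RegimeTransition message (assumed names).
interface RegimeTransition {
  regionId: string;
  label: string;
  previousLabel: string;
  transitionedAt: number;
  transitionDriver: string;
  snapshotId: string;
}

// Missing or non-string values collapse to "".
const str = (v: unknown): string => (typeof v === "string" ? v : "");

function adaptTransition(raw: Record<string, unknown>): RegimeTransition {
  const ts = raw.transitioned_at;
  return {
    regionId: str(raw.region_id),
    label: str(raw.label),
    previousLabel: str(raw.previous_label), // "" on a first-ever transition
    transitionedAt: typeof ts === "number" && Number.isFinite(ts) ? ts : 0,
    transitionDriver: str(raw.transition_driver),
    snapshotId: str(raw.snapshot_id),
  };
}
```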

The LRANGE helper is inlined here because server/_shared/redis.ts has no
list helpers yet; this is the first list-reading handler in the
intelligence service. If a second list reader lands, the helper can
be promoted to a shared util.
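A minimal sketch of an inlined LRANGE helper. It assumes the Upstash REST convention of POSTing a JSON command array to the database URL with a Bearer token and reading { result } from the JSON reply; redisLrange, FetchLike, and the cfg object are hypothetical names. Returning null (rather than []) on missing credentials or a failed request keeps "Redis unavailable" distinguishable from "empty list".

```typescript
// Injected fetch so the helper can be exercised offline (hypothetical shape).
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

async function redisLrange(
  key: string,
  start: number,
  stop: number,
  cfg: { url?: string; token?: string; fetchFn: FetchLike },
): Promise<string[] | null> {
  if (!cfg.url || !cfg.token) return null; // credentials missing
  try {
    const res = await cfg.fetchFn(cfg.url, {
      method: "POST",
      headers: { Authorization: `Bearer ${cfg.token}`, "Content-Type": "application/json" },
      body: JSON.stringify(["LRANGE", key, String(start), String(stop)]),
    });
    if (!res.ok) return null;
    const data = (await res.json()) as { result?: unknown };
    const result = data.result;
    // An empty list comes back as [], which is NOT a failure.
    return Array.isArray(result) ? result.filter((v): v is string => typeof v === "string") : null;
  } catch {
    return null; // network error: treat as Redis unavailable
  }
}
```

In production, fetchFn would simply be the global fetch.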

Empty list / Redis miss / failed JSON parse all return
{ transitions: [] } so the client can distinguish "never changed" from
"upstream broken" via the HTTP status code, not the body.

Registered in handler.ts.
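A hypothetical sketch of the per-entry parse step. Entries that fail JSON.parse, or that parse to something other than an object with a string label, are dropped, so an empty or fully malformed list yields { transitions: [] }. parseHistory and the label check are assumptions, not the handler's actual filter.

```typescript
// Parse raw LRANGE strings, skipping anything malformed (assumed filter rule).
function parseHistory(rawItems: string[] | null): Array<Record<string, unknown>> {
  if (!rawItems) return [];
  const out: Array<Record<string, unknown>> = [];
  for (const item of rawItems) {
    try {
      const parsed: unknown = JSON.parse(item);
      if (parsed !== null && typeof parsed === "object" && !Array.isArray(parsed)) {
        const rec = parsed as Record<string, unknown>;
        if (typeof rec.label === "string") out.push(rec);
      }
    } catch {
      // Malformed JSON: drop this entry and keep the rest.
    }
  }
  return out;
}
```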

### Premium gating + cache tier

  src/shared/premium-paths.ts:   added /api/intelligence/v1/get-regime-history
  server/gateway.ts RPC_CACHE_TIER: same path with 'slow' tier (matches
                                    route-parity contract enforced by
                                    tests/route-cache-tier.test.mjs)
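The parity check itself lives in tests/route-cache-tier.test.mjs and is not shown in this PR. As a hypothetical sketch, the two config entries and a parity check over them could look like this; the Set/Record shapes and missingCacheTiers are illustrative, and only the path and the 'slow' tier come from this PR.

```typescript
// Illustrative shapes; only the path and the 'slow' tier are from the PR.
const PREMIUM_PATHS = new Set<string>(["/api/intelligence/v1/get-regime-history"]);
const RPC_CACHE_TIER: Record<string, string> = {
  "/api/intelligence/v1/get-regime-history": "slow",
};

// Returns every route that lacks a cache-tier entry.
function missingCacheTiers(routes: Iterable<string>, tiers: Record<string, string>): string[] {
  return Array.from(routes).filter((route) => !Object.prototype.hasOwnProperty.call(tiers, route));
}
```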

## Tests — 44 new unit tests

tests/regional-snapshot-regime-history.test.mjs (22 tests):

  buildTransitionEntry (7):
    - null on missing diff/region/snapshot
    - returns entry on regime change
    - first-ever transition (empty previous_label)
    - falls back to generated_at when transitioned_at is missing
    - preserves snapshot_id

  publishTransitionWithOps (8):
    - happy path (LPUSH + LTRIM both succeed)
    - canonical key prefix
    - LTRIM uses REGIME_HISTORY_MAX-1 stop
    - LPUSH failure → not pushed, LTRIM not called
    - LTRIM failure → pushed=true, trimmed=false (non-fatal)
    - LPUSH/LTRIM throwing caught and reported
    - null/empty entry → no-op

  recordRegimeTransition (5):
    - no-op on no regime change
    - records on regime change
    - publisher returning false → recorded=false
    - publisher exceptions swallowed
    - critical escalation labels preserved

  module constants (2): key prefix + max are valid

tests/get-regime-history.test.mts (22 tests):

  adaptTransition (4):
    - all fields snake → camel
    - missing fields → empty/zero defaults
    - first-ever transition shape preserved
    - non-numeric transitioned_at → 0

  handler structural checks (7): canonical key prefix, LRANGE usage,
    adapter export, handler export signature, MAX_LIMIT cap matches
    writer, missing-region short-circuit, malformed-entry filter

  intelligence handler registration (2): import + registration

  security wiring (2): premium path + cache-tier entry

  proto definition (7): RPC method declared, import wired, request
    shape, kebab regex, limit bounds, RegimeTransition fields,
    response shape

## Verification

- node --test tests/regional-snapshot-regime-history.test.mjs: 22/22 pass
- npx tsx --test tests/get-regime-history.test.mts: 22/22 pass
- npm run test:data: 4621/4621 pass
- npm run typecheck: clean
- npm run typecheck:api: clean
- biome lint on touched files: clean

## Deferred to future iterations

- Phase 3 PR2: weekly regional briefs LLM seeder (consumes regime history
  to highlight drift events in the weekly summary)
- Phase 3 PR3: UI block in RegionalIntelligenceBoard for regime drift
  timeline (can ride alongside or after PR2)
- Drift analytics: % of last N days spent in each regime, transition
  frequency rolling window, regime cycle detection
- Alert triggers on drift cycles (e.g., "thrashed between regimes 3 times
  in 7 days")
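Neither analytic exists yet. As a rough sketch of two of the ideas above, assuming each transition carries a label and a transitionedAt in epoch milliseconds, time-in-regime and thrash detection could be computed from the history list like this. regimeShare and isThrashing are hypothetical names. Time before the oldest recorded transition is not attributed to any regime, so shares can sum to less than 1 when the capped history does not cover the whole window.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

interface Transition { label: string; transitionedAt: number } // assumed epoch ms

// Fraction of [now - windowMs, now] spent in each regime.
function regimeShare(history: Transition[], now: number, windowMs: number): Map<string, number> {
  const start = now - windowMs;
  const asc = history.slice().sort((a, b) => a.transitionedAt - b.transitionedAt);
  const totalsMs = new Map<string, number>();
  for (let i = 0; i < asc.length; i++) {
    // Each regime lasts until the next transition (or until now).
    const nextAt = i + 1 < asc.length ? asc[i + 1].transitionedAt : now;
    const from = Math.max(asc[i].transitionedAt, start);
    const to = Math.min(nextAt, now);
    if (to <= from) continue; // interval lies entirely outside the window
    totalsMs.set(asc[i].label, (totalsMs.get(asc[i].label) ?? 0) + (to - from));
  }
  const shares = new Map<string, number>();
  totalsMs.forEach((ms, label) => shares.set(label, ms / windowMs));
  return shares;
}

// "Thrashed between regimes N times in D days".
function isThrashing(transitionTimes: number[], now: number, minTransitions = 3, windowDays = 7): boolean {
  const since = now - windowDays * DAY_MS;
  return transitionTimes.filter((t) => t > since && t <= now).length >= minTransitions;
}
```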

* fix(intelligence): address 2 review findings on #2981

P2 #1 — transition_driver always empty in the live path

buildRegimeState(balance, previousLabel, '') at Step 11 passed an empty
driver because the diff had not been computed yet. The regime-history
recorder reads snapshot.regime.transition_driver, which was therefore
always '' in production, even though the tests exercised synthetic
fixtures with a populated driver.

Fix: after Step 15 derives triggerReason via inferTriggerReason(diff),
backfill regime.transition_driver = triggerReason when a genuine regime
change occurred. This ensures both the persisted snapshot's regime block
AND the regime-history entry carry the real driver (e.g., 'regime_shift',
'trigger_activation', 'corridor_break').
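A sketch of the backfill step. backfillTransitionDriver and the RegimeBlock shape are hypothetical; the point is that the snapshot's regime block is mutated before it is persisted and handed to the recorder, so both read the same driver, while steady-state snapshots keep the empty driver.

```typescript
// Hypothetical shape of the snapshot's regime block.
interface RegimeBlock { label: string; previous_label: string; transition_driver: string }

function backfillTransitionDriver(
  snapshot: { regime: RegimeBlock },
  regimeChanged: boolean,
  triggerReason: string,
): void {
  // Step 11 built the regime block with driver '' because no diff existed yet.
  // Only a genuine regime change gets the Step 15 trigger reason.
  if (regimeChanged && triggerReason) {
    snapshot.regime.transition_driver = triggerReason;
  }
}
```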

Added 2 regression tests: a populated driver flows through, and pre-fix
empty-driver snapshots remain backward-compatible.

P2 #2 — Redis failure returns cached false-empty history

get-regime-history.ts returned 200 {transitions:[]} on LRANGE failure.
The gateway caches 200 GET responses at the slow tier, so a transient
Upstash outage would be pinned as a false-empty history until the cache
TTL expired.

Fix: when redisLrange returns null (Redis unavailable or credentials
missing), the response now includes upstreamUnavailable: true in the
body. The gateway already checks for this flag in the response body
(line 434) and sets Cache-Control: no-store, so transient failures are
not cached.
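A sketch of the failure path as described above: a null LRANGE result becomes { transitions: [], upstreamUnavailable: true }, and the Cache-Control choice keys off that flag. buildHistoryResponse, cacheControlFor, and the tierCacheControl parameter are hypothetical; the real flag check lives in the gateway.

```typescript
interface HistoryResponse {
  transitions: Array<Record<string, unknown>>;
  upstreamUnavailable?: boolean;
}

function buildHistoryResponse(raw: string[] | null): HistoryResponse {
  // null means LRANGE itself failed (Redis down or credentials missing).
  if (raw === null) return { transitions: [], upstreamUnavailable: true };
  const transitions: Array<Record<string, unknown>> = [];
  for (const item of raw) {
    try {
      const parsed: unknown = JSON.parse(item);
      if (parsed !== null && typeof parsed === "object") transitions.push(parsed as Record<string, unknown>);
    } catch {
      // Malformed entries are dropped; they are not an upstream failure.
    }
  }
  return { transitions };
}

// The flag forces no-store so a transient outage is never cached as "empty".
function cacheControlFor(body: { upstreamUnavailable?: boolean }, tierCacheControl: string): string {
  return body.upstreamUnavailable === true ? "no-store" : tierCacheControl;
}
```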

Added 1 structural test asserting the upstreamUnavailable flag is set.

Verification:
- 24/24 writer tests, 23/23 handler tests, 4624/4624 full suite pass
- npm run typecheck: clean
- biome lint on touched files: clean

* fix(intelligence): correct misleading 'log once per region' comment (Greptile P2)

Elie Habib
2026-04-12 07:58:01 +04:00
committed by GitHub
parent 6dab59faba
commit 19d67cea94
14 changed files with 1252 additions and 1 deletions
