71 Commits

Author SHA1 Message Date
Elie Habib
8278c8e34e fix(forecasts): unwrap seed-contract envelope in canonical-key sim patcher (#3348)
* fix(forecasts): unwrap seed-contract envelope in canonical-key sim patcher

Production bug observed 2026-04-23 across both forecast worker services
(seed-forecasts-simulation + seed-forecasts-deep): every successful run
logs `[SimulationDecorations] Cannot patch canonical key — predictions
missing or not an array` and silently fails to write simulation
adjustments back to forecast:predictions:v2.

Root cause: PR #3097 (seed-contract envelope dual-write) wraps canonical
seed writes in `{_seed: {...}, data: {predictions: [...]}}` via runSeed.
The Lua patcher (_SIM_PATCH_LUA) and its JS test-path mirror both read
`payload.predictions` directly with no envelope unwrap, so they always
return 'MISSING' against the new shape — matching the documented pattern
in the project's worldmonitor-seed-envelope-consumer-drift learning
(91 producers enveloped, private-helper consumers not migrated).

User-visible impact: ForecastPanel renders simulation-adjusted scores
only when a fast-path seed has touched a forecast since the bug landed;
deep-forecast and simulation re-scores never reach the canonical feed.

Fix:
  - _SIM_PATCH_LUA detects envelope shape (`type(payload._seed) == 'table'
    and type(payload.data) == 'table'`), reads `inner.predictions`, and
    re-encodes preserving the wrapper so envelope shape persists across
    patches. Legacy bare values still pass through unchanged.
  - JS test path mirrors the same unwrap/rewrap.
  - New test WD-20b locks the regression: enveloped store fixture, asserts
    `_seed` wrapper preserved on write + inner predictions patched.

Also resolves the per-run `[seed-contract] forecast:predictions missing
fields: sourceVersion — required in PR 3` warning by passing
`sourceVersion: 'detectors+llm-pipeline'` to runSeed (PR 3 of the
seed-contract migration will start enforcing this; cheap to fix now).

Verified: typecheck (both tsconfigs) clean; lint 0 errors; test:data
6631/6631 green (forecast suite 309/309 incl new WD-20b); edge-functions
176/176 green; markdown + version-check clean.

* fix(forecasts): tighten JS envelope guard to match Lua's strict table check

PR #3348 review (P2):

JS test path used `!!published._seed` (any truthy value) while the Lua
script requires `type(payload._seed) == 'table'` (strict object check).
Asymmetry: a fixture with `_seed: true`, `_seed: 1`, or `_seed: 'string'`
would be treated as enveloped by JS and bare by Lua — meaning the JS
test mirror could silently miss real Lua regressions that depend on
fixture shape, defeating the purpose of having a parity test path.

Tighten JS to require both `_seed` and `data` be plain objects (rejecting
truthy non-objects + arrays), matching Lua's `type() == 'table'` semantics
exactly.

New test WD-20c locks the parity: fixture with non-table `_seed` (string)
+ bare-shape `predictions` → must succeed via bare path, identical to
what Lua would do.
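A minimal JS sketch of the unwrap/patch/rewrap path with the strict guard (shape checks follow the commit text; the function names and patch callback are illustrative, not the actual patcher):

```javascript
// Plain-object check mirroring Lua's `type(x) == 'table'` guard
// (rejects truthy non-objects and, per the parity fix, arrays).
const isPlainObject = (v) =>
  v !== null && typeof v === 'object' && !Array.isArray(v);

// Hypothetical mirror of the canonical-key patcher: unwrap the
// seed-contract envelope, patch inner predictions, and re-encode
// preserving the `_seed` wrapper; legacy bare payloads pass through.
function patchPredictions(payload, patchFn) {
  const enveloped =
    isPlainObject(payload) &&
    isPlainObject(payload._seed) &&
    isPlainObject(payload.data);
  const inner = enveloped ? payload.data : payload;
  if (!Array.isArray(inner?.predictions)) return 'MISSING';
  inner.predictions = inner.predictions.map(patchFn);
  return enveloped ? { ...payload, data: inner } : inner;
}
```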

Verified: 6632/6632 tests pass; new WD-20c green.
2026-04-23 20:38:11 +04:00
Elie Habib
2fddee6b05 feat(simulation): add keyActorRoles to fix actor overlap bonus vocabulary mismatch (#2582)
* feat(simulation): add keyActorRoles field to fix actor overlap bonus vocabulary mismatch

The +0.04 actor overlap bonus never reliably fired in production because
stateSummary.actors uses role-category strings ('Commodity traders',
'Policy officials') while simulation keyActors uses named geo-political
entities ('Iran', 'Houthi'). An audit of 53 production runs showed the
bonus fired exactly once.

Fix: add keyActorRoles?: string[] to SimulationTopPath. The Round 2 prompt
now includes a CANDIDATE ACTOR ROLES section with theater-local role vocab
seeded from candidatePacket.stateSummary.actors. The LLM copies matching
roles into keyActorRoles. applySimulationMerge scores overlap against
keyActorRoles when actorSource=stateSummary, preserving the existing
keyActors entity-overlap path for the affectedAssets fallback.

- buildSimulationPackageFromDeepSnapshot: add actorRoles[] to each theater
  from candidate.stateSummary.actors (theater-scoped, no cross-theater noise)
- buildSimulationRound2SystemPrompt: inject CANDIDATE ACTOR ROLES section
  with exact-copy instruction and keyActorRoles in JSON template
- tryParseSimulationRoundPayload: extract keyActorRoles from round 2 output
- mergedPaths.map(): filter keyActorRoles against theater.actorRoles guardrail
- computeSimulationAdjustment: dual-path overlap — roleOverlapCount for
  stateSummary, keyActorsOverlapCount for affectedAssets (backwards compat)
- summarizeImpactPathScore: project roleOverlapCount + keyActorsOverlapCount
  into path-scorecards.json simDetail

New fields: roleOverlapCount, keyActorsOverlapCount in SimulationAdjustmentDetail
and ScorecardSimDetail. actorOverlapCount preserved as backwards-compat alias.

Tests: 308 pass (was 301 before). New tests T-P1/T-P2/T-P3 (prompt/parser),
T-RO1/T-RO2/T-RO3 (role overlap logic), T-PKG1 (pkg builder actorRoles),
plus fixture updates for T2/T-F/T-G/T-J/T-K/T-N2/T-SC-4.

🤖 Generated with Claude Sonnet 4.6 via Claude Code (https://claude.ai/claude-code) + Compound Engineering v2.49.0

Co-Authored-By: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>

* fix(simulation): address CE review findings from PR #2582

- Add SimulationPackageTheater interface to seed-forecasts.types.d.ts
  (actorRoles was untyped under @ts-check)
- Add keyActorRoles to uiTheaters Redis projection in writeSimulationOutcome
  (field was stripped from Redis snapshot; only visible in R2 artifact)
- Extract keyActorRoles IIFE to named sanitizeKeyActorRoles() function;
  hoist allowedRoles Set computation out of per-path loop
- Harden bonusOverlap ternary: explicit branch for actorSource='none'
  prevents silent fallthrough if new actorSource values are added
- Eliminate roleOverlap intermediate array in computeSimulationAdjustment
- Add U+2028/U+2029 Unicode line-separator stripping to sanitizeForPrompt
- Apply sanitizeForPrompt at tryParseSimulationRoundPayload parse boundary;
  add JSDoc to newly-exported function

All 308 tests pass, typecheck + typecheck:api clean.

* fix(sim): restore const sanitized in sanitizeKeyActorRoles after early-return guard

The prior edit added `if (!allowedRoles.length) return []` but accidentally
removed the `const sanitized = ...` line, leaving the filter on the line
below referencing an undefined variable. This restores the full function body:

  if (!allowedRoles.length) return [];
  const sanitized = (Array.isArray(rawRoles) ? rawRoles : [])
    .map((s) => sanitizeForPrompt(String(s)).slice(0, 80));
  const allowedNorm = new Set(allowedRoles.map(normalizeActorName));
  return sanitized.filter((s) => allowedNorm.has(normalizeActorName(s))).slice(0, 8);

308/308 tests pass.
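For reference, a self-contained runnable sketch of the restored function; the sanitizeForPrompt and normalizeActorName stand-ins are simplified assumptions, not the real helpers:

```javascript
// Stand-ins for helpers defined elsewhere in the worker (assumed,
// simplified): prompt sanitization and actor-name normalization.
const sanitizeForPrompt = (s) => s.replace(/[\u2028\u2029]/g, ' ').trim();
const normalizeActorName = (s) => String(s).replace(/_/g, ' ').toLowerCase().trim();

// Restored function body from the commit, made self-contained.
function sanitizeKeyActorRoles(rawRoles, allowedRoles) {
  if (!allowedRoles.length) return [];
  const sanitized = (Array.isArray(rawRoles) ? rawRoles : [])
    .map((s) => sanitizeForPrompt(String(s)).slice(0, 80));
  const allowedNorm = new Set(allowedRoles.map(normalizeActorName));
  return sanitized.filter((s) => allowedNorm.has(normalizeActorName(s))).slice(0, 8);
}
```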

---------

Co-authored-by: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>
2026-04-01 08:53:13 +04:00
Elie Habib
cc914e927b feat(scorecard): add simDetail to path-scorecards.json for simulation audit (#2558)
Each path-scorecards.json entry now includes a simDetail object containing
the full SimulationAdjustmentDetail fields: bucketChannelMatch, actorOverlapCount,
candidateActorCount, actorSource, resolvedChannel, channelSource, invalidatorHit,
stabilizerHit.

Previously, details from computeSimulationAdjustment were only stored in the
in-memory adjustments[] array and never reached the R2 artifact. This meant
production runs could not be retroactively audited to confirm whether the +0.04
actor-overlap bonus fired, which channel matched, or what actor source was used.

Changes:
- applySimulationMerge: attach details to path.simulationAdjustmentDetail
- summarizeImpactPathScore: project simDetail from simulationAdjustmentDetail
- seed-forecasts.types.d.ts: add simulationAdjustmentDetail to ExpandedPath, add ScorecardSimDetail interface
- Tests T-SC-1 through T-SC-4: 301/301 passing
2026-03-31 00:52:14 +04:00
Elie Habib
303b059ba8 feat(forecast): plumb simulation decorations from ExpandedPath to Forecast Redis key (#2528)
* feat(forecast): plumb simulation decorations from ExpandedPath to Forecast Redis key

- Add `simulation_adjustment`, `sim_path_confidence`, `demoted_by_simulation` fields
  to `Forecast` proto (fields 20-22) and regenerate service client
- `writeSimulationDecorations`: after simulation rescore, builds candidateStateId →
  forecastIds map from snapshot.fullRunStateUnits, picks strongest signal per candidate,
  writes `forecast:sim-decorations:v1` to Redis (3-day TTL)
- `applySimulationDecorationsToForecasts`: reads decorations at start of fast-path seed,
  mutates predictions in-place (stale-by-one-run, non-fatal on failure)
- `buildPublishedForecastPayload`: passes sim fields through to published Redis payload
- Add `__setRedisStoreForTests` / test-store bypass in `getRedisCredentials` for unit tests
- 12 new WD-* tests covering early-exits, strongest-signal selection, candidate conflict
  resolution, demotion flag, non-fatal error handling, and full round-trip

* fix(forecast): always overwrite simulation decorations to prevent 3-day staleness

Without this fix, writeSimulationDecorations() returned early on empty adjustments
or zero decorationCount, leaving the prior run's forecast:sim-decorations:v1 intact
for the full 3-day TTL. Subsequent fast-path seeds would blindly reapply old flags.

Fixes:
- Split the `!adjustments?.length` guard: only skip on missing simulationEvidence
  (bogus data); write an empty byForecastId map when adjustments is [] so later
  runs always overwrite stale entries
- Remove the `decorationCount === 0` early return for the same reason
- Add SIMULATION_DECORATIONS_MAX_AGE_MS (48h) guard in applySimulationDecorationsToForecasts
  as defense-in-depth for the edge case where no simulation has run recently

Tests: WD-1, WD-3 updated; WD-2b, WD-13, WD-14 added (287 total, all pass)

* fix(forecast): call writeSimulationDecorations from inline deep-forecast path

processDeepForecastTask called applySimulationMerge but only extracted
simulationEvidence, discarding mergeResult without writing decorations.
Forecasts processed through the inline deep path (the common case) never
got their forecast:sim-decorations:v1 key populated.

Fix: fire-and-forget writeSimulationDecorations(mergeResult, snapshot)
immediately after applySimulationMerge in processDeepForecastTask, same
pattern as applyPostSimulationRescore. The deep path now writes on the
SAME run (no stale-by-one lag).

Test WD-15: exercises applySimulationMerge → writeSimulationDecorations
with snapshot.fullRunStateUnits, verifying both mapped forecast IDs are
decorated via the full chain used by processDeepForecastTask (288 total).

* fix(forecast): patch forecast:predictions:v2 immediately after sim decorations are written

writeSimulationDecorations only wrote the side key (forecast:sim-decorations:v1).
forecast:predictions:v2 is published by runSeed() before deep/rescore paths run,
so same-run simulation evidence never reached the canonical key — panel consumers
saw prior-run data until the next fast seed re-applied decorations.

Fix: add patchPublishedForecastsWithSimDecorations() called immediately after the
side key write in writeSimulationDecorations(). Reads forecast:predictions:v2,
updates sim fields for matched forecasts, resets to 0 for unmatched (clearing any
stale values from prior runs), and writes back non-fatally.

Tests WD-16 and WD-17 assert on forecast:predictions:v2 directly — not just the
side key — proving the canonical key is updated on the same run (290 total).

* fix(forecast): await writeSimulationDecorations and applyPostSimulationRescore to prevent process.exit abandonment

Two fire-and-forget chains caused patchPublishedForecastsWithSimDecorations to
silently fail when runSimulationWorker(once=true) returned and process.exit fired:

1. processNextSimulationTask fired-and-forgot applyPostSimulationRescore then
   returned immediately, so the rescore and its canonical-key patch never ran.

2. applyPostSimulationRescore fired-and-forgot writeSimulationDecorations and
   returned on the no_path_changes branch before the Redis writes completed.

Both call sites are now awaited. The writeSimulationDecorations call in
processDeepForecastTask is also awaited for consistency -- the deep
path was already safe (heavy artifact writing follows), but its comment
said "fire-and-forget", which contradicted the intent.

* fix(forecast): write seed-meta:forecast:sim-decorations:v1 for health monitoring (AGENTS.md requirement)

* fix(forecast): guard patchPublishedForecastsWithSimDecorations against cross-run stamping

Blind read-modify-write on forecast:predictions:v2 risked an older deep-forecast
or simulation worker overwriting a newer fast-path seed's payload (or zeroing out
newer sim fields) when workers finish out of order.

Fix: patchPublishedForecastsWithSimDecorations now accepts runGeneratedAt and
skips the patch when published.generatedAt > runGeneratedAt — i.e. the canonical
key already belongs to a newer run. writeSimulationDecorations passes
snapshot.generatedAt as the run identity token (same value baked into the
canonical key by buildPublishedSeedPayload for the matching run).

Add WD-18 (mismatch skips) and WD-19 (same-run patches) to cover overlapping-run
scenarios that the prior tests did not exercise.

* fix(forecast): atomic write-if-newer for sim-decorations side key; store run origin timestamp

Two P1 issues closed:

1. generatedAt: Date.now() on the side key made a late older worker's write look
   fresh to applySimulationDecorationsToForecasts(), which measures staleness from
   write time. Changed to snapshot?.generatedAt so freshness reflects the originating
   run's age, not the wall-clock write timestamp.

2. No run-order guard on forecast:sim-decorations:v1 write. A late older worker
   could overwrite a newer run's decorations, poisoning the source that the next
   fast seed reads. Added redisAtomicWriteSimDecorations (Lua EVAL, symmetric with
   the canonical key fix): atomically reads existing generatedAt, skips write if
   existing is from a newer run.

writeSimulationDecorations now returns early on a side-key skip, also bypassing the
canonical patch (which has its own Lua guard regardless).

New tests: WD-21 (end-to-end skip of both side key + canonical), WD-22 (stored
generatedAt equals snapshot.generatedAt), WD-23/WD-24 (atomic helper unit cases).
297 tests passing.
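The write-if-newer decision, sketched in plain JS (the real helper runs this comparison inside a Lua EVAL so the read-compare-write is atomic; the function name and return values here are illustrative):

```javascript
// Pure-JS sketch of the guard the Lua script applies server-side:
// skip the write when the existing value came from a newer run.
function decideWriteIfNewer(existingJson, incomingGeneratedAt) {
  if (existingJson) {
    try {
      const existing = JSON.parse(existingJson);
      if (Number(existing.generatedAt) > incomingGeneratedAt) {
        return 'SKIPPED'; // existing side key belongs to a newer run
      }
    } catch {
      // unparseable existing value: fall through and overwrite
    }
  }
  return 'WRITTEN';
}
```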
2026-03-30 08:14:17 +04:00
Elie Habib
5bfd7bc152 feat(simulation): confidence-weighted adjustments + simulationSignal trace lane (#2515)
* feat(simulation): confidence-weighted adjustments + simulationSignal trace lane

Scale +0.08 and +0.04 simulation bonuses by simPathConfidence (missing/zero
falls back to 1.0 so old artifacts are not penalized). Negative adjustments
(-0.12/-0.15) remain flat — they are structural, not sim-confidence-dependent.

Attach a compact simulationSignal object (backed, adjustmentDelta, channelSource,
demoted, simPathConfidence) to each ExpandedPath when adjustment != 0. Written
to R2 trace artifacts. ForecastPanel UI chip deferred to follow-up PR (requires
proto field + buf generate + prediction-to-path plumbing).

Add simPathConfidence to SimulationAdjustmentDetail for observability.
Add T-N1..T-N8 (confidence weighting) and T-O1..T-O4 (simulationSignal) tests.

🤖 Generated with Claude Sonnet 4.6 via Claude Code (https://claude.ai/claude-code) + Compound Engineering v2.49.0

Co-Authored-By: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>

* fix(simulation): correct zero-confidence handling + wire simulationSignal to scorecard summaries

P1: explicit confidence=0 from LLM now correctly yields simConf=0 (no positive bonus)
instead of falling back to 1.0. Absent/non-finite confidence still uses 1.0 fallback
(conservative — old LLM artifacts without the field are not penalized). The previous
rawConf > 0 guard conflated "absent" and "explicitly unsupported" paths.

P2: summarizeImpactPathScore now forwards simulationSignal into scorecard summaries,
so path-scorecards.json and impact-expansion-debug.json include the new lane alongside
simulationAdjustment and mergedAcceptanceScore. Only forecast-eval.json had it before.

Also exports summarizeImpactPathScore for direct unit testing.

Tests: T-N5 corrected (explicit 0 → no bonus), T-N5b (zero conf + invalidator still
fires flat -0.12), T-O5 (summarizeImpactPathScore includes simulationSignal),
T-O6 (omits field when absent). 270 passing.
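The corrected confidence semantics, sketched (the function name, clamping, and null handling are assumptions; only the absent→1.0 fallback and explicit-0→0 behavior come from the commit):

```javascript
// Absent / non-finite confidence → conservative 1.0 fallback;
// explicit 0 is preserved so no positive bonus applies.
function resolveSimConfidence(rawConf) {
  if (rawConf === undefined || rawConf === null) return 1.0;
  const n = Number(rawConf);
  return Number.isFinite(n) ? Math.min(Math.max(n, 0), 1) : 1.0;
}

// Positive bonuses scale by confidence; negative adjustments stay flat.
const scaleAdjustment = (adj, rawConf) =>
  adj > 0 ? adj * resolveSimConfidence(rawConf) : adj;
```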

🤖 Generated with Claude Sonnet 4.6 via Claude Code (https://claude.ai/claude-code) + Compound Engineering v2.49.0

Co-Authored-By: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>

* fix(simulation): clear stale sim fields before re-merge + fix simPathConfidence comment

P1 (blocking): applySimulationMerge now deletes simulationAdjustment, mergedAcceptanceScore,
simulationSignal, demotedBySimulation, and promotedBySimulation from every expanded path
before running computeSimulationAdjustment. This fires before any early-continue
(no theater match, no candidate packet, zero confidence), so reloaded forecast-eval.json
paths from applyPostSimulationRescore never retain stale simulation metadata from a prior cycle.

P3: corrected simPathConfidence JSDoc in SimulationAdjustmentDetail and SimulationSignal —
absent/non-finite → 1.0 fallback, explicit 0 preserved as 0 (was: "missing/null/zero fall back to 1.0").

Tests: T-O7 (zero-confidence match clears stale fields), T-O8 (no-theater match clears
stale fields). 272 passing.

🤖 Generated with Claude Sonnet 4.6 via Claude Code (https://claude.ai/claude-code) + Compound Engineering v2.49.0

Co-Authored-By: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>

* fix(types): correct SimulationSignal.backed JSDoc — only true for positive adjustments

---------

Co-authored-by: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>
2026-03-29 22:04:06 +04:00
Elie Habib
80918e357a fix(simulation): actor overlap bonus finally fires -- source from stateSummary.actors not affectedAssets (#2477)
* fix(simulation): source actor overlap from candidatePacket.stateSummary.actors (#2459)

The +0.04 actor overlap bonus in computeSimulationAdjustment has never
fired in production because extractPathActors read hop.affectedAssets
(financial instruments: "TTF gas futures", "European utility stocks")
while sim keyActors are geo/political actors ("Houthi", "Iran", "Saudi
Arabia") -- disjoint namespaces, actorOverlapCount=0 on every run.

Fix: source actors from candidatePacket.stateSummary.actors (grounded
stateUnit.actors, same domain as simulation keyActors). Strict precedence
with no union: if stateSummary.actors raw list is non-empty, use only
that; else fall back to affectedAssets for backward compat. Precedence
is on raw-list presence, not post-normalization length, so entries like
['---'] stay on the stateSummary path instead of falling through.

Also fixes normalizeActorName to handle underscore-separated LLM output
("Saudi_Arabia") and entity ID prefix stripping ("state-abc:iran").
Pattern /^[a-z][a-z0-9_-]*$/ strips lowercase ID prefixes but rejects
uppercase ("US", "EU") and natural language ("New York") to prevent
false strips. Both sides of the comparison go through the same function.
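A sketch of the normalization behavior described above (illustrative; the real implementation may differ in ordering and edge cases):

```javascript
// Underscores → spaces, lowercase entity-ID prefixes stripped,
// case-folded so both sides of the comparison normalize identically.
function normalizeActorName(raw) {
  let s = String(raw).trim();
  // Strip prefixes like "state-abc:iran", but only when the prefix looks
  // like a lowercase ID — "US", "EU", "New York" are left alone.
  const idx = s.indexOf(':');
  if (idx > 0 && /^[a-z][a-z0-9_-]*$/.test(s.slice(0, idx))) {
    s = s.slice(idx + 1);
  }
  return s.replace(/_/g, ' ').toLowerCase().trim();
}
```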

Guards Array.isArray on both stateSummary.actors and keyActors to prevent
crashes on non-array truthy values from malformed LLM snapshots.

Retires extractPathActors (single call site, now inlined). Adds
candidateActorCount and actorSource to SimulationAdjustmentDetail for
observability. Exports normalizeActorName for test coverage.

Tests: N-1/2/3/4 (normalizeActorName), T-F/G/H/I/K/L/L-pre/M
(computeSimulationAdjustment), T-J (applySimulationMerge end-to-end).

* fix(types): narrow channelSource to 'direct' | 'market' | 'none' union
2026-03-29 12:30:58 +04:00
Elie Habib
99d564061e feat(simulation): post-simulation re-score to fix stale-sim pipeline sequencing (#2435)
* feat(simulation): post-simulation re-score fixes stale-sim pipeline sequencing

The deep forecast worker runs before the current run's simulation completes,
so applySimulationMerge always received a prior-run simulation (isCurrentRun=false).
Prior-run stateIds don't match current-run paths → adjustments=[], bucketChannelMatch
never fires, near-threshold paths stay rejected.

Fix (Option A): fire-and-forget re-score after writeSimulationOutcome completes.

- buildForecastEvalPayload(): stores full selectedPaths/rejectedPaths (with direct/
  second/third hypothesis data) in forecast-eval.json so re-score can rebuild the
  impact expansion bundle for promoted paths
- writeForecastTraceArtifacts(): writes forecast-eval.json alongside path-scorecards
- applyPostSimulationRescore(): loads forecast-eval + snapshot from R2, reconstructs
  evaluation, calls applySimulationMerge with isCurrentRun=true, re-writes trace
  artifacts if any path was promoted or demoted
- processNextSimulationTask(): fire-and-forget call to applyPostSimulationRescore
  after writeSimulationOutcome + completeSimulationTask

Guard: only re-scores when status='completed_no_material_change' OR a rejected
expanded path is in the 0.42–0.50 acceptanceScore band (within reach of +0.08
bucketChannelMatch bonus). Skips silently otherwise.

Tests: R-1 through R-5 covering skip conditions, no-change case, promotion of a
near-threshold rejected path, and marketContext fallback for invalid LLM channel keys.

* fix(simulation): correct post-rescore guard and generatedAt derivation (P1+P2)

P1: Replace hasNearThreshold (checked rejected paths for promotion — no-op for
`completed` status since applySimulationMerge has no completed+promotion case)
with hasDemotionRisk (checks selected paths with acceptanceScore < 0.62, where
a -0.12 invalidator could push below the 0.50 acceptance threshold).

P2: Use freshOutcome.generatedAt as primary timestamp instead of
parseForecastRunGeneratedAt(runId). RunId carries the run-start timestamp; R2
artifact paths use data.generatedAt at write time. UTC midnight crossings cause
a key mismatch, silently returning no_eval_data. Fallback to runId parse only
when freshOutcome.generatedAt is absent.

* fix(simulation): add CASE 3 for partial demotion + skip duplicate pointer on rescore

applySimulationMerge CASE 3: handles completed run where some (not all) expanded
paths are demoted or promoted. Previously only CASE 1 (completed_no_material_change
→ completed) and CASE 2 (completed → completed_no_material_change) existed. With
the hasDemotionRisk guard in applyPostSimulationRescore, CASE 3 is now reachable:
selected expanded paths are non-empty after adjustment, status stays 'completed',
but evaluation.selectedPaths was never updated. Fixes the silent no-op: demoted
path stayed in selectedPaths in the published evaluation. Bundle and worldState are
rebuilt from the surviving expanded paths.

writeForecastTraceArtifacts: add context.skipPointer to suppress writeForecastTracePointer
on rescore re-writes. Rescore overwrites the same R2 artifact keys — the LPUSH
was consuming a history slot and evicting an older unique run earlier than intended.
readForecastWorldStateHistory dedupes by worldStateKey so no functional double-read,
but the extra slot consumed is still waste. Pass skipPointer: true from rescore call site.

Add R-6: partial demotion in completed run reproduces the two-path case (0.55 and
0.75 selected, invalidator fires on 0.55) and asserts selectedPaths is updated.

* fix(simulation): add hasPromotionOpportunity guard for completed-run rescore

hasDemotionRisk only checked selected paths, leaving completed runs with
near-threshold rejected paths (0.42-0.50) incorrectly skipped as
no_actionable_paths. applySimulationMerge CASE 3 now correctly handles
completed+promotion, but the gate never let it through.

Add hasPromotionOpportunity: check rejected expanded paths with
acceptanceScore in [0.42, 0.50) — a +0.08 bucketChannelMatch bonus
can push them over threshold. The outer guard now passes if either
hasDemotionRisk or hasPromotionOpportunity is true.

Also drop the evalData.status === 'completed' guard from both checks:
the outer condition (status !== 'completed_no_material_change') already
handles routing; the inner checks don't need to repeat it.

Add R-7: completed eval with safe selected (0.75) + near-threshold
rejected (0.44), simulation matches rejected path → +0.08 → 0.52 →
promoted via CASE 3, status stays completed, both paths in selectedPaths.

* fix(simulation): correct actionable-path guard thresholds to match max adjustments

The guard hardcoded 0.42/0.62 based on the +0.08 (bucketChannelMatch-only) and
-0.12 (invalidator) adjustments. computeSimulationAdjustment can reach +0.12
with actor overlap (>= 2 actors: +0.08 + 0.04) and -0.15 with a stabilizer hit.

Correct bounds derived from actual max adjustments:
  max positive: +0.12 → promotion window lower bound: 0.50 - 0.12 = 0.38 (was 0.42)
  max negative: -0.15 → demotion risk upper bound: 0.50 + 0.15 = 0.65 (was 0.62)

Production effects of old bounds:
  - Rejected path at 0.39 with two matching actors (+0.12 → 0.51) was skipped
  - Selected path at 0.64 with a matching stabilizer (-0.15 → 0.49) was skipped
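The corrected guard bounds can be sketched as follows (constants from the commit text; the predicate names follow the earlier commits, bodies are illustrative):

```javascript
const ACCEPT_THRESHOLD = 0.50;
const MAX_POSITIVE = 0.12; // +0.08 bucketChannelMatch + 0.04 actor overlap
const MAX_NEGATIVE = 0.15; // stabilizer hit

// Rejected path within reach of the maximum positive adjustment.
const hasPromotionOpportunity = (rejectedPaths) =>
  rejectedPaths.some((p) =>
    p.acceptanceScore >= ACCEPT_THRESHOLD - MAX_POSITIVE &&
    p.acceptanceScore < ACCEPT_THRESHOLD);

// Selected path that the maximum negative adjustment could push below threshold.
const hasDemotionRisk = (selectedPaths) =>
  selectedPaths.some((p) => p.acceptanceScore < ACCEPT_THRESHOLD + MAX_NEGATIVE);
```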

Add R-8: path at 0.39 + two-actor overlap (+0.12 total) → 0.51 → promoted.
Add R-9: path at 0.64 + stabilizer (-0.15) → 0.49 → demoted via CASE 3.
2026-03-28 21:21:13 +04:00
Elie Habib
fa3bbe27ef fix(simulation): split MENA geo group so Hormuz and Red Sea theaters coexist (#2428)
THEATER_GEO_GROUPS previously mapped both 'Middle East' and 'Red Sea' to
'MENA', causing buildSimulationPackageFromDeepSnapshot to silently drop one
when both were candidates. With an active Iran/Hormuz conflict, Red Sea's
higher rankingScore could evict the Hormuz theater entirely.

Split into 'MENA_Gulf' (Middle East, Persian Gulf) and 'MENA_RedSea' (Red Sea)
so both theaters are independently eligible for simulation selection.

Tests: updated geo-dedup test to assert 3 selected theaters (Hormuz + Red Sea
+ Malacca), added within-group dedup test (Red Sea vs Suez Canal still compete).
2026-03-28 16:06:42 +04:00
Elie Habib
d55bf302d1 fix(simulation): bucketChannelMatch never fires due to two independent bugs (#2422)
Bug 1 (shape): pathBucket fallback read root candidatePacket.topBucketId
which is always undefined for new-schema packets — the grounded bucket
lives at candidatePacket.marketContext.topBucketId. Fixed by adding the
nested path before the legacy root fallback.

Bug 2 (semantic): path.direct.channel carries LLM marketImpact enum
values (supply_disruption, price_spike, risk_off, etc.) that are disjoint
from CHANNEL_KEYWORDS keys — only 3 of 11 are exact matches. The grounded
fallback marketContext.topChannel was never reached because the LLM value
is always truthy. Fixed with tiered Object.hasOwn() validation: use direct
channel if it is a known key, else fall back to marketContext.topChannel
(nested) or topChannel (legacy flat), else return '' to prevent greedy
literal substring matching.
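The tiered resolution, sketched (keyword table abbreviated to key names only; the real CHANNEL_KEYWORDS lists live in the worker, and the function name is illustrative):

```javascript
// Key presence is what matters for the tiered check; keyword lists elided.
const CHANNEL_KEYWORDS = {
  risk_off_rotation: [],
  energy_supply_shock: [],
  shipping_cost_shock: [],
};

function resolveChannel(path, candidatePacket) {
  const direct = path?.direct?.channel;
  if (direct && Object.hasOwn(CHANNEL_KEYWORDS, direct)) return direct; // known key
  return candidatePacket?.marketContext?.topChannel // grounded nested fallback
    ?? candidatePacket?.topChannel                  // legacy flat fallback
    ?? '';                  // empty string prevents greedy substring matching
}
```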

Also adds resolvedChannel and channelSource to adjustment details for
runtime observability in debug artifacts.

5 new tests cover: valid direct key, invalid→nested market fallback,
invalid→flat legacy fallback, invalid+empty→no match, and the production
case where both bucket and channel are absent from direct and must resolve
from marketContext.
2026-03-28 15:11:52 +04:00
Elie Habib
2430fa8500 fix(simulation): expand CHANNEL_KEYWORDS bridge terms for geopolitical scenario language (#2419)
risk_off_rotation: add sovereign risk, shockwave, contagion, spiral, crisis
energy_supply_shock: add oil infrastructure, supply disruption, oil price spike, crude price
shipping_cost_shock: add rerouting, vessel, shipping lane, maritime

LLM simulation output uses geopolitical scenario language that previously
failed the financial keyword matching. Bridge terms connect scenario
vocabulary to channels.
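A minimal sketch of the keyword-bridge matching (term list abbreviated from the commit text; the matchesChannel body is illustrative):

```javascript
// Abbreviated bridge-term table; the real one covers all channels.
const CHANNEL_KEYWORDS = {
  shipping_cost_shock: ['rerouting', 'vessel', 'shipping lane', 'maritime'],
};

// A scenario text matches a channel when it contains any bridge term.
const matchesChannel = (channel, text) =>
  (CHANNEL_KEYWORDS[channel] ?? []).some((kw) =>
    String(text).toLowerCase().includes(kw));
```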

Tests: 3 new matchesChannel bridge keyword tests (227 pass)
2026-03-28 15:04:34 +04:00
Elie Habib
4eb1d292eb fix(simulation): add TypeScript types for simulation pipeline + remove dead legacy fallbacks (#2414)
- Add scripts/seed-forecasts.types.d.ts with ambient interfaces for all
  simulation data structures (CandidatePacket, TheaterResult, SimulationOutcome,
  ExpandedPath, SimulationEvidence, SimulationAdjustmentDetail)
- Add // @ts-check + @param JSDoc to contradictsPremise, negatesDisruption,
  computeSimulationAdjustment, applySimulationMerge
- Add scripts/jsconfig.json to enable tsc --checkJs on seed-forecasts.mjs
- Remove dead legacy fallbacks candidatePacket?.topBucketId and
  candidatePacket?.topChannel (these fields were never at top level in
  production; the correct path is candidatePacket.marketContext.topBucketId).
  TypeScript now enforces this at write time.
- Update all test fixtures to use marketContext: { topBucketId, topChannel }
  shape, matching production CandidatePacket structure
2026-03-28 13:05:18 +04:00
Elie Habib
e6a85f0a36 fix(simulation): normalize commodityKey underscores and expand CHANNEL_KEYWORDS (#2410)
Two matching bugs causing adjustment=0 even after #2402+#2404:

1. commodityKey underscore: code checked text.includes('crude_oil') but
   LLM-generated text always uses natural language ('crude oil'). Fixed
   by .replace(/_/g, ' ') in both contradictsPremise and negatesDisruption.

2. CHANNEL_KEYWORDS too narrow: risk_off_rotation only matched 4 exact
   phrases ('risk off', 'risk aversion', 'flight to safety', 'sell off').
   Simulation paths use broader language ('capital flight', 'risk premium',
   'sell-off', 'retreat', etc.). Expanded to cover LLM output vocabulary.
   Also expanded security_escalation to include 'military', 'geopolit'.

Update test: negatesDisruption commodity test now uses natural language
('crude oil') matching what LLM actually generates, not the internal key.
2026-03-28 12:01:35 +04:00
Elie Habib
8689d68c19 fix(simulation): read topBucketId/topChannel from marketContext in computeSimulationAdjustment (#2404) 2026-03-28 09:57:20 +04:00
Elie Habib
6af900e3b3 fix(simulation): skip NEGATION_TERMS guard for simulation-curated invalidators (#2402) 2026-03-28 09:46:28 +04:00
Elie Habib
a99eb5a023 fix(simulation): store candidateStateId in theaterResults to enable Phase 3 merge (#2374)
* fix(simulation): store candidateStateId in theaterResults to enable Phase 3 merge

applySimulationMerge looks up paths by candidateStateId but theaterResults
only stored theaterId ("theater-1"). The map lookup always returned undefined,
silently no-oping all simulationAdjustment writes.

Fix: write candidateStateId alongside theaterId in theaterResults, and key
the simByTheater map by candidateStateId with theaterId as fallback.

* fix(e2e): add missing Earthquake fields to map-harness test fixture

* Revert "fix(e2e): add missing Earthquake fields to map-harness test fixture"

This reverts commit 69d8930b88.

* test(simulation): add T13 to cover positional theaterId vs candidateStateId mismatch
2026-03-27 21:01:29 +04:00
Elie Habib
8f74288b01 feat(simulation): Phase 3 outcome re-ingestion with simulationAdjustment scoring (#2336)
* feat(simulation): Phase 3 outcome re-ingestion with simulationAdjustment scoring

Parse simulation-outcome.json back into the deep forecast evaluation pipeline
as a simulationEvidence lane. Simulation informs the forecast without replacing
structural validation or observed world signals.

Changes:
- computeSimulationAdjustment: +0.08 bucket/channel match, +0.04 actor overlap,
  -0.12 invalidator contradiction, -0.15 stabilizer negation (spec lines 447-478)
- applySimulationMerge: post-evaluation pass that demotes/promotes expanded paths
  using mergedAcceptanceScore = clamp01(acceptanceScore + simulationAdjustment)
- fetchSimulationOutcomeForMerge: reads R2 full outcome; accepts current run or
  previous run under 6h old; gracefully returns null when unavailable
- processDeepForecastTask: wires in simulation merge between evaluateDeepForecastPaths
  and writeForecastTraceArtifacts; wrapped in try/catch for non-blocking degradation
- buildImpactExpansionDebugPayload: adds simulationEvidence field to debug artifacts
- summarizeImpactPathScore: adds simulationAdjustment + mergedAcceptanceScore fields
- 24 new tests (203 total, 0 failures) covering all adjustment rules, demotion,
  promotion, matching helpers, and debug payload field
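The adjustment arithmetic above can be sketched in a few lines. The four rule weights (+0.08, +0.04, -0.12, -0.15) and the `clamp01` merge come from the commit text; the boolean inputs are hypothetical stand-ins for the real matching helpers.

```javascript
// Sketch of the Phase 3 scoring rules described above. Weights are from
// the commit text; the booleans stand in for the real matching predicates.
const clamp01 = (x) => Math.min(1, Math.max(0, x));

function computeSimulationAdjustmentSketch(m) {
  let adj = 0;
  if (m.bucketChannelMatch) adj += 0.08;       // bucket/channel match
  if (m.actorOverlap) adj += 0.04;             // actor overlap
  if (m.invalidatorContradiction) adj -= 0.12; // invalidator contradiction
  if (m.stabilizerNegation) adj -= 0.15;       // stabilizer negation
  return adj;
}

function mergedAcceptanceScore(acceptanceScore, simulationAdjustment) {
  return clamp01(acceptanceScore + simulationAdjustment);
}

// A 0.55 path with bucket/channel match + actor overlap lands at ~0.67;
// clamp01 keeps high scores from exceeding 1.
console.log(mergedAcceptanceScore(0.55, computeSimulationAdjustmentSketch({
  bucketChannelMatch: true, actorOverlap: true,
  invalidatorContradiction: false, stabilizerNegation: false,
})));
```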

* feat(simulation): replace maritime type gate with significance-based eligibility (#2342)

* feat(simulation): replace maritime type gate with significance-based eligibility

isMaritimeChokeEnergyCandidate was a hardcoded filter requiring a route
in CHOKEPOINT_MARKET_REGIONS plus an energy/freight bucket. Any theater
driven by rate hikes, sovereign stress, political instability, volcanic
events, or infrastructure failure was silently dropped before simulation.

Replace with isSimulationEligible: rankingScore >= 0.40 + hasBucket +
hasCandidateStateId. The ranking formula already encodes signal lift,
transmission strength, and specificity — theater type is irrelevant.

Also generalize buildSimulationRequirementText with stateKind-branched
prose templates (market_repricing, political_instability,
security_escalation, infrastructure_fragility, cyber_pressure, fallback) and
extend buildSimulationPackageConstraints with two new constraint classes:
macro_financial_posture (soft) and structural_event_premise (hard:true)
to prevent MiroFish hallucinating maritime artifacts for non-maritime
theaters.

Maritime simulations are fully backward-compatible: maritime_disruption
stateKind still produces identical requirement text and constraints.

Tests: 213 pass (8 new eligibility tests, 5 requirement-text tests,
4 constraint tests replacing 7 old maritime-specific tests).

🤖 Generated with Claude Sonnet 4.6 via Claude Code (https://claude.ai/claude-code) + Compound Engineering v2.40.0

Co-Authored-By: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>

* test(simulation): fill coverage gaps in theater-agnostic eligibility tests

Three untested paths identified post-review:

T-R6: cyber_pressure branch in buildSimulationRequirementText was
unexercised. New tests assert that "systems availability" and "financial
network continuity" appear in the generated prose.

T-R7: governance_pressure was aliased to the political_instability
template but had no dedicated test. Confirms same prose as T-R3.

T-C5: political_instability + sovereign_risk bucket stacked-constraint
case. sovereign_risk IS in MACRO_FIN_BUCKETS, so this theater generates
both macro_financial_posture (soft) AND structural_event_premise (hard).
T-C2 only asserted the latter; T-C5 explicitly verifies both are
present and have the correct hard/soft values.

Also confirmed: the "critical bug" theory (rankingScore absent on theater
objects) is not a bug — buildSimulationPackageFromDeepSnapshot copies
rankingScore onto each theater object at L12593.

2388 tests pass.

* fix(simulation): theater-agnostic evaluation targets and symmetric demotion

Two P1 issues from PR review:

1. buildSimulationPackageEvaluationTargets still had maritime-specific
   questions hard-coded for all theater types: "disruption at ${route}",
   "freight rate delta on affected trade lanes", "energy price direction
   ($/bbl or %)". For market_repricing/political/cyber theaters this gave
   MiroFish contradictory instructions — constraints said no chokepoints,
   eval targets asked it to model route disruptions.

   Fix: extract buildEvalTargetQuestions() branching on stateKind (7
   branches: maritime_disruption, market_repricing, political_instability,
   governance_pressure, security_escalation, infrastructure_fragility,
   cyber_pressure, fallback). Maritime branch is identical to original.
   Also generalize the T+24h timing marker description and the round-1
   prompt market_cascade instruction.

2. contradictsPremise() and negatesDisruption() only matched on
   routeFacilityKey and commodityKey. For the newly eligible theater types
   (blank route/commodity), invalidators and stabilizers could never fire
   the -0.12 / -0.15 demotion. New theaters were promotable but not
   demotable — asymmetric scoring.

   Fix: when route+commodity are both blank, fall back to subject keywords
   derived from stateKind and topBucketId (split on _, keep >= 4 chars).
   Maritime theater behavior is unchanged (early exit on first branch).

Tests: 2393 pass (+5 new: contradictsPremise non-maritime match,
negatesDisruption non-maritime match and non-match, evalTargets
market_repricing has no maritime framing, negatesDisruption description fix).
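The subject-keyword fallback in fix 2 can be sketched directly from its description: split stateKind and topBucketId on `_` and keep tokens of at least 4 characters. The helper name is hypothetical; the rule is from the commit text.

```javascript
// Hypothetical sketch of the fallback described above: when a theater has
// blank route and commodity keys, derive subject keywords from stateKind
// and topBucketId (split on '_', keep tokens >= 4 chars).
function deriveSubjectKeywords(stateKind, topBucketId) {
  return [stateKind, topBucketId]
    .filter(Boolean)
    .flatMap((s) => String(s).toLowerCase().split('_'))
    .filter((tok) => tok.length >= 4);
}

console.log(deriveSubjectKeywords('political_instability', 'sovereign_risk'));
// ['political', 'instability', 'sovereign', 'risk']
```

Short tokens like the `of` in a hypothetical `cost_of_capital` bucket would be dropped by the length filter, which keeps the keyword set specific enough for invalidator/stabilizer matching.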

---------

Co-authored-by: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>
2026-03-27 10:02:09 +04:00
Elie Habib
13685ad354 fix(simulation): key evaluationTargets and constraints by theaterId (#2332)
Both buildSimulationPackageEvaluationTargets and buildSimulationPackageConstraints
returned flat arrays. The prompt builders consumed them as
pkg.evaluationTargets?.[theater.theaterId] — object lookup on an array always
returns undefined, falling back to the generic string on every run.

Fix: both builders now return Record<theaterId, ...>. Prompt builders updated
to properly render constraint class/statement and evalTarget pathType/question.

Result: evaluation questions (escalation/containment/market_cascade) and
WorldMonitor structural constraints now reach the LLM on every simulation run.
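The bug and fix above reduce to a one-line shape mismatch: a string-keyed lookup on a flat array always yields `undefined`, while the Record shape resolves. The values below are illustrative.

```javascript
// Miniature of the #2332 bug: builders returned flat arrays, but prompt
// builders consumed them as pkg.evaluationTargets?.[theater.theaterId].
const flat = [{ pathType: 'escalation', question: 'Does it spread?' }];
const keyed = { 'theater-1': [{ pathType: 'escalation', question: 'Does it spread?' }] };

console.log(flat?.['theater-1']);          // undefined — generic fallback fired on every run
console.log(keyed?.['theater-1']?.length); // 1 — targets now reach the prompt builder
```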
2026-03-27 01:51:58 +04:00
Elie Habib
b0af1ad84f feat(simulation): geographic theater diversity + market_cascade economic paths (#2264)
* feat(simulation): geographic theater diversity + market_cascade economic paths

Three improvements to the MiroFish simulation pipeline:

1. Geographic deduplication: adds THEATER_GEO_GROUPS constant mapping
   CHOKEPOINT_MARKET_REGIONS values to macro-groups (MENA, AsiaPacific,
   EastEurope, etc.). buildSimulationPackageFromDeepSnapshot now skips
   candidates whose macro-group is already represented, preventing
   Red Sea + Middle East (both MENA) from appearing as separate theaters.

2. Label cleanup: strips trailing (stateKind) parenthetical from theater
   labels before writing to selectedTheaters, so "Black Sea maritime
   disruption state (supply_chain)" becomes "Black Sea maritime disruption
   state" in the UI.

3. market_cascade path: renames spillover → market_cascade across 4 sites
   (evaluationTargets, Round 1 prompt + JSON template, Round 2 prompt +
   JSON template, tryParseSimulationRoundPayload expectedIds). The
   market_cascade path instructs the LLM to model 2nd/3rd order economic
   consequences: energy price direction ($/bbl), freight rate delta,
   downstream sector impacts, and FX stress on import-dependent economies.

Tests: 176 pass (3 net new — geo-dedup, label cleanup, market_cascade
prompt; plus updated entity-collision and path-validation tests).

🤖 Generated with Claude Sonnet 4.6 via Claude Code (https://claude.ai/claude-code) + Compound Engineering v2.49.0

Co-Authored-By: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>

* docs: fix markdownlint MD032 in simulation diversity plan

---------

Co-authored-by: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>
2026-03-26 08:31:19 +04:00
Elie Habib
01f6057389 feat(simulation): MiroFish Phase 2 — theater-limited simulation runner (#2220)
* feat(simulation): MiroFish Phase 2 — theater-limited simulation runner

Adds the simulation execution layer that consumes simulation-package.json
and produces simulation-outcome.json for maritime chokepoint + energy/logistics
theaters, closing the WorldMonitor → MiroFish handoff loop.

Changes:
- scripts/seed-forecasts.mjs: 2-round LLM simulation runner (prompt builders,
  JSON extractor, runTheaterSimulation, writeSimulationOutcome, task queue
  with NX dedup lock, runSimulationWorker poll loop)
- scripts/process-simulation-tasks.mjs: standalone worker entry point
- proto: GetSimulationOutcome RPC + make generate
- server/worldmonitor/forecast/v1/get-simulation-outcome.ts: RPC handler
- server/gateway.ts: slow tier for get-simulation-outcome
- api/health.js: simulationOutcomeLatest in STANDALONE + ON_DEMAND keys
- tests: 14 new tests for simulation runner functions

* fix(simulation): address P1/P2 code review findings from PR #2220

Security (P1 #018):
- sanitizeForPrompt() applied to all entity/seed fields interpolated into
  Round 1 prompt (entityId, class, stance, seedId, type, timing)
- sanitizeForPrompt() applied to actorId and entityIds in Round 2 prompt
- sanitizeForPrompt() + length caps applied to all LLM array fields written
  to R2 (dominantReactions, stabilizers, invalidators, keyActors, timingMarkers)

Validation (P1 #019):
- Added validateRunId() regex guard
- Applied in enqueueSimulationTask() and processNextSimulationTask() loop

Type safety (P1 #020):
- Added isOutcomePointer() and isPackagePointer() type guards in TS handlers
- Replaced unsafe as-casts with runtime-validated guards in both handlers

Correctness (P2 #022):
- Log warning when pkgPointer.runId does not match task runId

Architecture (P2 #024):
- isMaritimeChokeEnergyCandidate() accepts both flat and nested topBucketId
- Call site simplified to pass theater directly

Performance (P2 #025):
- SIMULATION_ROUND1_MAX_TOKENS raised from 1800 to 2200
- Added max 3 initialReactions instruction to Round 1 prompt

Maintainability (P2 #026):
- Simulation pointer keys exported from server/_shared/cache-keys.ts
- Both TS handlers import from shared location

Documentation (P2 #027):
- Strengthened runId no-op description in proto and OpenAPI spec

* fix(todos): add blank lines around lists in markdown todo files

* style(api): reformat openapi yaml to match linter output

* test(simulation): add flat-shape filter test + getSimulationOutcome handler coverage

Two tests identified as missing during PR #2220 review:

1. isMaritimeChokeEnergyCandidate flat-shape tests — covers the || candidate.topBucketId
   normalization added in the P1/P2 review pass. The existing tests only used the nested
   marketContext.topBucketId shape; this adds the flat root-field shape that arrives from
   the simulation-package.json JSON (selectedTheaters entries have topBucketId at root).

2. getSimulationOutcome handler structural tests — verifies the isOutcomePointer guard,
   found:false NOT_FOUND return, found:true success path, note population on runId mismatch,
   and redis_unavailable error string. Follows the readSrc static-analysis pattern used
   elsewhere in server-handlers.test.mjs (handler imports Redis so full integration test
   would require a test Redis instance).
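The flat-vs-nested normalization those tests cover can be sketched as a single resolver: the bucket id may sit at the candidate root (selectedTheaters entries in simulation-package.json) or under marketContext (deep-snapshot candidates). The helper name and body are assumptions modeled on the `|| candidate.topBucketId` normalization the commit describes.

```javascript
// Hypothetical sketch of the P2 #024 normalization: accept both the
// nested marketContext shape and the flat root-field shape.
const resolveTopBucketId = (candidate) =>
  candidate?.marketContext?.topBucketId || candidate?.topBucketId || null;

console.log(resolveTopBucketId({ marketContext: { topBucketId: 'energy' } })); // 'energy'
console.log(resolveTopBucketId({ topBucketId: 'freight' }));                   // 'freight'
console.log(resolveTopBucketId({}));                                           // null
```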
2026-03-25 13:55:59 +04:00
Elie Habib
f87c8c71c4 feat(forecast): Phase 2 simulation package read path (#2219)
* feat(forecast): Phase 2 simulation package read path (getSimulationPackage RPC + Redis existence key)

- writeSimulationPackage now writes forecast:simulation-package:latest to Redis after
  successful R2 write, containing { runId, pkgKey, schemaVersion, theaterCount, generatedAt }
  with TTL matching TRACE_REDIS_TTL_SECONDS (60 days)
- New getSimulationPackage RPC handler reads Redis key, returns pointer metadata without
  requiring an R2 fetch (zero R2 cost for existence check)
- Wired into ForecastServiceHandler and server/gateway.ts cache tier (medium)
- Proto: GetSimulationPackage RPC + get_simulation_package.proto message definitions
- api/health.js: simulationPackageLatest added to STANDALONE_KEYS + ON_DEMAND_KEYS
- Tests: SIMULATION_PACKAGE_LATEST_KEY constant + writeSimulationPackage null-guard test

Closes todo #017 (Phase 2 prerequisites for MiroFish integration)

* chore(generated): regenerate proto types for GetSimulationPackage RPC

* fix(simulation-rpc): distinguish Redis failure from not-found; signal runId mismatch

- Add `error` field to GetSimulationPackageResponse: populated with
  "redis_unavailable" on Redis errors so callers can distinguish a
  healthy not-found (found=false, error="") from a Redis failure
  (found=false, error="redis_unavailable"). Adds console.warn on error.
- Add `note` field: populated when req.runId is supplied but does not
  match the latest package's runId, signalling that per-run filtering
  is not yet active (Phase 3).
- Add proto comment on run_id: "Currently ignored; reserved for Phase 3"
- Add milliseconds annotation to generated_at description.
- Simplify handler: extract NOT_FOUND constant, remove SimulationPackagePointer
  interface, remove || '' / || 0 guards on guaranteed-present fields.
- Regenerate all buf-generated files.

Fixes todos #018 (runId silently ignored) and #019 (error indistinguishable
from not-found). Also resolves todos #022 (simplifications) and #023
(OpenAPI required fields / generatedAt unit annotation).

* fix(simulation-rpc): change cache tier from medium to slow (aligns with deep-run update frequency)

* fix(simulation-rpc): fix key prefixing, make Redis errors reachable, no-cache not-found

Three P1 regressions caught in external review:

1. Key prefix bug: getCachedJson() applies preview:<sha>: prefix in non-production
   environments, but writeSimulationPackage writes the raw key via a direct Redis
   command. In preview/dev the RPC always returned found:false even when the package
   existed. Fix: new getRawJson() in redis.ts always uses the unprefixed key AND throws
   on failure instead of swallowing errors.

2. redis_unavailable unreachable: getCachedJson swallows fetch failures and missing-
   credentials by returning null, so the catch block for redis_unavailable was dead
   code. getRawJson() throws on HTTP errors and missing credentials, making the
   error: "redis_unavailable" contract actually reachable.

3. Negative-cache stampede: the slow tier caches every 200 GET response. A
   request arriving before any deep run had written a package returned
   { found:false }, which the CDN then cached for up to 1h, breaking post-run
   discovery. Fix: markNoCacheResponse() on both the not-found and error
   paths so they are served fresh on every request.
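The key-prefix mismatch in (1) can be illustrated with a toy store. The prefix value, store, and helper bodies here are hypothetical stand-ins; per the commit, the real getRawJson also throws on HTTP errors and missing credentials rather than swallowing them.

```javascript
// Illustration of the preview-environment prefix bug: the writer used the
// raw key while the cached read path prefixed it outside production.
const store = new Map();
const ENV_PREFIX = 'preview:abc123:'; // hypothetical prefix applied by getCachedJson

const writeRaw = (key, value) => store.set(key, value);          // writeSimulationPackage path
const getCachedJson = (key) => store.get(ENV_PREFIX + key) ?? null; // prefixed read
const getRawJson = (key) => store.get(key) ?? null; // real helper also throws on transport errors

writeRaw('forecast:simulation-package:latest', { runId: 'r1' });
console.log(getCachedJson('forecast:simulation-package:latest')); // null — found:false in preview
console.log(getRawJson('forecast:simulation-package:latest'));    // { runId: 'r1' }
```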
2026-03-24 22:45:22 +04:00
Elie Habib
b7e6333877 feat(deep-forecast): Phase 1 simulation package export contract (#2204)
* feat(deep-forecast): Phase 1 simulation package export contract

Add buildSimulationPackageFromDeepSnapshot and writeSimulationPackage to
produce simulation-package.json alongside deep-snapshot.json on every eligible
fast run, completing Phase 1 of the WorldMonitor → MiroFish bridge defined in
docs/internal/wm-mirofish-gap.md.

Phase 1 scope: maritime chokepoint + energy/logistics theaters only.
A candidate qualifies if its routeFacilityKey is a known chokepoint in
CHOKEPOINT_MARKET_REGIONS and the top bucket is energy or freight, or the
commodityKey is an energy commodity.

Package shape (schemaVersion: v1):
- selectedTheaters: top 1–3 qualifying candidates with theater ID, route,
  commodity, bucket, channel, and rankingScore
- simulationRequirement: deterministic template per theater (no LLM, fully
  cacheable), built from label, stateKind, route, commodity, channel, and
  criticalSignalTypes
- structuralWorld: filtered stateUnits, worldSignals, transmission edges,
  market buckets, situationClusters, situationFamilies touching the theater
- entities: extracted from actorRegistry (forecastId overlap), stateUnit
  actors, and evidence table actor entries; classified into 7 entity classes
  (state_actor, military_or_security_actor, regulator_or_central_bank,
  exporter_or_importer, logistics_operator, market_participant,
  media_or_public_bloc); falls back to anchor set if extraction finds nothing
- eventSeeds: headline evidence → live_news, disruption-keyword signal
  evidence → observed_disruption; T+0h timing; relative timing format
- constraints: route_chokepoint_status (hard if criticalSignalLift ≥ 0.25),
  commodity_exposure (always hard), market_admissibility (soft, channel
  routing), known_invalidators (soft, when contradictionScore ≥ 0.10)
- evaluationTargets: deterministic escalation/containment/spillover path
  questions + T+24h/T+48h/T+72h timing markers per theater

Also adds 6 missing chokepoints to CHOKEPOINT_MARKET_REGIONS:
Baltic Sea, Danish Straits, Strait of Gibraltar, Panama Canal,
Lombok Strait, Cape of Good Hope.

writeSimulationPackage fires-and-forgets after writeDeepForecastSnapshot
so it does not add latency to the critical seed path.

17 new unit tests covering: theater filter, package shape, simulationRequirement
content, eventSeeds, constraints (hard/soft), evaluationTargets structure,
entity extraction, key format, and 3-theater cap.

🤖 Generated with Claude Sonnet 4.6 via Claude Code (https://claude.ai/claude-code) + Compound Engineering v2.49.0

Co-Authored-By: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>

* fix(simulation-package): address P1+P2 code review issues from PR #2204

P1 fixes:
- inferEntityClassFromName: use word-boundary regex to prevent "force"
  substring false positives (e.g. "workforce", "Salesforce")
- buildSimulationPackageEntities: key Map entries by candidateStateId
  instead of dominantRegion to prevent collision across theaters
  sharing the same region
- writeSimulationPackage call site: pass priorWorldState so
  actorRegistry is available to buildSimulationPackageFromDeepSnapshot
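The word-boundary fix above is worth seeing concretely: a plain substring test flags "workforce" as containing "force", while a `\b`-anchored regex does not. The matcher names are illustrative.

```javascript
// Before/after sketch of the inferEntityClassFromName fix: substring
// matching produces "force" false positives; word boundaries do not.
const substringMatch = (name) => name.toLowerCase().includes('force');
const boundaryMatch = (name) => /\bforces?\b/i.test(name);

console.log(substringMatch('Global Workforce Alliance')); // true — false positive
console.log(boundaryMatch('Global Workforce Alliance'));  // false
console.log(boundaryMatch('Rapid Support Forces'));       // true
```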

P2 fixes:
- buildSimulationRequirementText: apply sanitizeForPrompt to
  theater.label, stateKind, topChannel, and critTypes before
  string interpolation (stored prompt injection risk)
- buildSimulationPackageEventSeeds: apply sanitizeForPrompt to
  entry.text before .slice(0, 200)
- isMaritimeChokeEnergyCandidate: replace new Set() allocation
  per call with Array.includes for 2-element arrays
- buildSimulationPackageEntities: convert allForecastIds to Set
  before actor registry loop (O(n²) → O(n))
- buildSimulationPackageEvaluationTargets: add missing candidate
  guard with console.warn when candidate is undefined for theater
- selectedTheaters map: add label fallback to dominantRegion /
  'unknown theater' to prevent "undefined" in simulationRequirement

Tests: 6 new unit tests covering the word-boundary fix, entity key
collision, injection stripping, and undefined label guard

---------

Co-authored-by: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>
2026-03-24 20:40:52 +04:00
Elie Habib
226cebf9bc feat(deep-forecast): Phase 2+3 scoring recalibration + autoresearch prompt self-improvement (#2178)
* fix(deep-forecast): lower acceptance threshold 0.60→0.50 to match real score distribution

computeDeepPathAcceptanceScore formula: pathScore×0.55 + quality×0.20 + coherence×0.15
With pathScore≈0.65, quality≈0.30, coherence≈0.55:
  0.358 + 0.060 + 0.083 = 0.50
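The arithmetic above reproduces directly; the formula and typical component values are from the commit text.

```javascript
// computeDeepPathAcceptanceScore weighting from the commit message:
// with typical inputs the old 0.60 floor was structurally unreachable.
const acceptance = (pathScore, quality, coherence) =>
  pathScore * 0.55 + quality * 0.20 + coherence * 0.15;

const typical = acceptance(0.65, 0.30, 0.55);
console.log(typical.toFixed(3)); // "0.500" — at the new 0.50 threshold, well under 0.60
```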

The 0.60 threshold was calibrated before understanding that reportableQualityScore
is constrained by world-state simulation geometry (not hypothesis quality), and
coherence loses 0.15 for generic candidates without routeFacilityKey. The threshold
was structurally unreachable with typical expanded paths.

Verified end-to-end: deep worker now returns [DeepForecast] completed.

Also updates T6 gateDetails assertion and renames the rejection-floor test to
correctly describe the new behavior (strong inputs should be accepted).

111/111 tests pass.

* feat(deep-forecast): autoresearch prompt self-improvement loop + T9/T10 tests

- Add scoreImpactExpansionQuality() locked scorer: commodity rate (35%),
  variable diversity (35%), chain coverage (20%), mapped rate (10%)
- Add runImpactExpansionPromptRefinement(): rate-limited LLM critic loop
  (30min cooldown) that reads learned section from Redis, scores current
  run, generates critique if composite < 0.62, tests on same candidates,
  commits to forecast:prompt:impact-expansion:learned if score improves
- buildImpactExpansionSystemPrompt() now accepts learnedSection param,
  appends it after core rules with separator so model sees prior examples
- buildImpactExpansionCandidateHash() includes learnedFingerprint to
  bust cache when learned section changes
- processDeepForecastTask reads learnedSection from Redis before LLM
  call, runs refinement after both completed and no_material_change paths
- Export scoreImpactExpansionQuality + runImpactExpansionPromptRefinement
- T9: high commodity rate + chain coverage → composite > 0.70
- T10: no commodity + no chain coverage → composite < 0.40
- 113/113 tests pass

* fix(deep-forecast): raise autoresearch threshold 0.62→0.80 + fix JSON parse

- Threshold 0.62 was too low: commodity=1.00 + chain=1.00 compensated for
  diversity=0.50 (all same chains), keeping composite at 0.775 → no critique
- Raise to 0.80 so diversity<0.70 triggers critique even with good commodity/chain
- Fix JSON parser to extract first {…} block (handles Gemini code-fence wrapping)
- Add per-hypothesis log in refinement breakdown for observability
- Add refinementQualityThreshold to gateDetails for self-documenting artifacts
- Verified: critique fires on diversity=0.50 run, committed Hormuz/Baltic/Suez
  region-specific chain examples (score 0.592→0.650)

* feat(deep-forecast): per-candidate parallel LLM calls replace batch expansion

Previously: all candidates → one batch LLM call → LLM averages context →
identical route_disruption → inflation_pass_through chains for all candidates.

Now: each candidate → its own focused LLM call (parallel Promise.all) →
LLM reasons about specific stateKind/region/routeFacility for that candidate.

Results (3 candidates, 3 parallel calls):
- composite: 0.592 → 0.831 (+0.24)
- commodity: 0.17 → 1.00 (all mapped have specific commodity)
- diversity: 0.50 → 0.83 (energy_export_stress, importer_balance_stress
  appearing alongside route_disruption — genuinely different chains)
- Baseline updated: 0.831 (above 0.80 threshold → no critique needed)

Also threads learnedSection through extractSingleImpactExpansionCandidate
so the learned examples from autoresearch apply to each focused call.
Per-candidate cache keys (already existed) now serve as primary cache.

* fix(tests): update recovery test for per-candidate LLM call flow

- Change stage mock from impact_expansion → impact_expansion_single
  (batch primary path removed, per-candidate is now primary)
- Assert parseMode === per_candidate instead of parseStage /^recovered_/
  (recovered_ prefix was only set by old batch_repair path)
- 2257/2257 tests pass

* fix(deep-forecast): add Red Sea, Persian Gulf, South China Sea to chokepoint map

Candidate packets had routeFacilityKey=none for Red Sea / Persian Gulf /
Baltic Sea signals because prediction titles say "Red Sea maritime disruption"
not "Bab el-Mandeb" or "Strait of Hormuz". CHOKEPOINT_MARKET_REGIONS only
had sub-facility names (Bab el-Mandeb, Suez Canal) as keys, not the sea
regions themselves.

Fix: add Red Sea, Persian Gulf, Arabian Sea, Black Sea, South China Sea,
Mediterranean Sea as direct keys so region-level candidate titles resolve.

Result: LLM user prompt now shows routeFacilityKey=Red Sea / Persian Gulf /
Baltic Sea per candidate — giving each focused call the geographic context
needed to generate route-specific chains.

- Autoresearch baseline updated 0.932→0.965 on this run
- T8 extended with Red Sea, Persian Gulf, South China Sea assertions
- 2257/2257 tests pass

* feat(deep-forecast): free-form hypothesis schema + remove registry constraint

- Bump IMPACT_EXPANSION_REGISTRY_VERSION to v4
- Add hypothesisKey, description, geography, affectedAssets, marketImpact, causalLink fields to normalizeImpactHypothesisDraft (keep legacy fields for backward compat)
- Rewrite buildImpactExpansionSystemPrompt: remove IMPACT_VARIABLE_REGISTRY constraint table, use free-form ImpactHypothesis schema with geographic/commodity specificity rules
- Rewrite evaluateImpactHypothesisRejection: use effective key (hypothesisKey || variableKey) for dedup; legacy registry check only for old cached responses without hypothesisKey
- Update validateImpactHypotheses scoring: add geographyScore, commodityScore, causalLinkScore, assetScore terms; channelCoherence/bucketCoherence only apply to legacy responses
- Update parent-must-be-mapped invariant to use hypothesisKey || variableKey as effective key
- Update mapImpactHypothesesToWorldSignals: use effective key for dedup and sourceKey; prefer description/geography over legacy fields
- Update buildImpactPathsForCandidate: match on hypothesisKey || variableKey for parent lookup
- Update buildImpactPathId: use hypothesisKey || variableKey for hash inputs
- Rewrite scoreImpactExpansionQuality: add geographyRate and assetRate metrics; update composite weights
- Update buildImpactPromptCritiqueSystemPrompt/UserPrompt: use hypothesisKey-based chain format in examples
- Add new fields to buildImpactExpansionBundleFromPaths push calls
- Update T7 test assertion: MUST be the exact hypothesisKey instead of variableKey string

* fix(deep-forecast): update breakdown log to show free-form hypothesis fields

* feat(deep-forecast): add commodityDiversity metric to autoresearch scorer

- commodityDiversity = unique commodities / nCandidates (weight 0.35)
  Penalizes runs where all candidates default to same commodity.
  3 candidates all crude_oil → diversity=0.33 → composite ~0.76 → critique fires.
- Rebalanced composite weights: comDiversity 0.35, geo 0.20, keyDiversity 0.15, chain 0.10, commodityRate 0.10, asset 0.05, mappedRate 0.05
- Breakdown log now shows comDiversity + geo + keyDiversity
- Critique prompt updated: commodity_monoculture failure mode, diagnosis targets commodity homogeneity
- T9: added commodityDiversity=1.0 assertion (2 unique commodities across 2 candidates)

* refactor(deep-forecast): replace commodityDiversity with directCommodityDiversity + directGeoDiversity + candidateSpreadScore

Problem: measuring diversity on all mapped hypotheses misses the case where
one candidate generates 10 implications while others generate 0, or where
all candidates converge on the same commodity due to dominating signals.

Fix: score at the DIRECT hypothesis level (root causes only) and add
a candidate-spread metric:

- directCommodityDiversity: unique commodities among direct hypotheses /
  nCandidates. Measures breadth at the root-cause level. 3 candidates all
  crude_oil → 0.33 → composite ~0.77 → critique fires.

- directGeoDiversity: unique primary geographies among direct hypotheses /
  nCandidates. First segment of compound geography strings (e.g.
  'Red Sea, Suez Canal' → 'red sea') to avoid double-counting.

- candidateSpreadScore: normalized inverse-HHI. 1.0 = perfectly even
  distribution across candidates. One candidate with 10 implications and
  others with 0 → scores near 0 → critique fires.
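A plausible reading of "normalized inverse-HHI" that matches both endpoints described above (1.0 for an even spread, near 0 when one candidate holds everything) rescales the effective candidate count to [0, 1]. The exact normalization is an assumption; the endpoints are from the commit text.

```javascript
// Sketch of candidateSpreadScore as normalized inverse-HHI, assuming the
// rescaling (effectiveCount - 1) / (n - 1). HHI = sum of squared shares.
function candidateSpreadScore(countsByCandidate) {
  const counts = Object.values(countsByCandidate);
  const n = counts.length;
  const total = counts.reduce((a, b) => a + b, 0);
  if (n <= 1 || total === 0) return 0;
  const hhi = counts.reduce((a, c) => a + (c / total) ** 2, 0);
  return (1 / hhi - 1) / (n - 1);
}

console.log(candidateSpreadScore({ a: 4, b: 4, c: 4 }));  // ≈ 1 — perfectly even
console.log(candidateSpreadScore({ a: 10, b: 0, c: 0 })); // 0 — one candidate dominates
```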

Weight rationale: comDiversity 0.35, geoDiversity 0.20, spread 0.15,
chain 0.15, comRate 0.08, assetRate 0.04, mappedRate 0.03.

Verified: Run 2 Baltic/Hormuz/Brazil → freight/crude_oil/USD spread=0.98 ✓

* feat(deep-forecast): add convergence object to R2 debug artifact

Surface autoresearch loop outcome per run: converged (bool), finalComposite,
critiqueIterations (0 or 1), refinementCommitted, and perCandidateMappedCount
(candidateStateId → count). After 5+ runs the artifact alone answers whether
the pipeline is improving.

Architectural changes:
- runImpactExpansionPromptRefinement now returns { iterationCount, committed }
  at all exit paths instead of undefined
- Call hoisted before writeForecastTraceArtifacts so the result flows into the
  debug payload via dataForWrite.refinementResult
- buildImpactExpansionDebugPayload assembles convergence from validation +
  refinementResult; exported for direct testing
- Fix: stale diversityScore reference replaced with directCommodityDiversity

Tests: T-conv-1 (converged=true), T-conv-2 (converged=false + iterations=1),
T-conv-3 (perCandidateMappedCount grouping) — 116/116 pass

* fix(deep-forecast): address P1+P2 review issues from convergence observability PR

P1-A: sanitize LLM-returned proposed_addition before Redis write (prompt injection
      guard via sanitizeProposedLlmAddition — strips directive-phrase lines)
P1-B: restore fire-and-forget for runImpactExpansionPromptRefinement; compute
      critiqueIterations from quality score (predicted) instead of awaiting result,
      eliminating 15-30s critical-path latency on poor-quality runs
P1-C: processDeepForecastTask now returns convergence object to callers; add
      convergence_quality_met warn check to evaluateForecastRunArtifacts
P1-D: cap concurrent LLM calls in extractImpactExpansionBundle to 3 (manual
      batching — no p-limit) to respect provider rate limits

P2-1: hash full learnedSection in buildImpactExpansionCandidateHash (was sliced
      to 80 chars, causing cache collisions on long learned sections)
P2-2: add exitReason field to all runImpactExpansionPromptRefinement return paths
P2-3: sanitizeForPrompt strips directive injection phrases; new
      sanitizeProposedLlmAddition applies line-level filtering before Redis write
P2-4: add comment explaining intentional bidirectional affectedAssets/assetsOrSectors
      coalescing in normalizeImpactHypothesisDraft
P2-5: extract makeConvTestData helper in T-conv tests; remove refinementCommitted
      assertions (field removed from convergence shape)
P2-6: convergence_quality_met check added to evaluateForecastRunArtifacts (warn)

🤖 Generated with Claude Sonnet 4.6 via Claude Code (https://claude.ai/claude-code) + Compound Engineering v2.49.0

Co-Authored-By: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>

* fix(docs): add blank lines around lists in plan (MD032)

* fix(deep-forecast): address P1+P2 reviewer issues in convergence observability

P1-1: mapImpactHypothesesToWorldSignals used free-form marketImpact values
(price_spike, shortage, credit_stress, risk_off) verbatim as signal channel
types, producing unknown types that buildMarketTransmissionGraph cannot
consume. Add IMPACT_SIGNAL_CHANNELS set + resolveImpactChannel() to map
free-form strings to the nearest valid channel before signal materialization.

P1-2: sanitizeForPrompt had directive-phrase stripping added that was too
broad for a function called on headlines, evidence tables, case files, and
geopolitical summaries. Reverted to original safe sanitizer (newline/control
char removal only). Directive stripping remains in sanitizeProposedLlmAddition
where it is scoped to Redis-bound LLM-generated additions only.

P2: Renamed convergence.critiqueIterations to predictedCritiqueIterations to
make clear this is a prediction from the quality score, not a measured count
from actual refinement behavior (refinement is fire-and-forget after artifact
write). Updated T-conv-1/2 test assertions to match.

* feat(deep-forecast): inject live news headlines into evidence table

Wire inputs.newsInsights / inputs.newsDigest through the candidate
selection pipeline so buildImpactExpansionEvidenceTable receives up to
3 commodity-relevant live headlines as 'live_news' evidence entries.

Changes:
- IMPACT_COMMODITY_LEXICON: extend fertilizer pattern (fertiliser,
  nitrogen, phosphate, npk); add food_grains and shipping_freight entries
- filterNewsHeadlinesByState: new pure helper that scores headlines by
  alert status, LNG/energy/route/sanctions signal match, lexicon commodity
  match, and source corroboration count (min score 2 to include)
- buildImpactExpansionEvidenceTable: add newsItems param, inject
  live_news entries, raise cap 8→11
- buildImpactExpansionCandidate: add newsInsights/newsDigest params,
  compute newsItems via filterNewsHeadlinesByState
- selectImpactExpansionCandidates: add newsInsights/newsDigest to options
- Call site: pass inputs.newsInsights/newsDigest at seed time
- Export filterNewsHeadlinesByState, buildImpactExpansionEvidenceTable
- 9 new tests (T-news-1 through T-lex-3): all pass, 125 total pass
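The filterNewsHeadlinesByState scoring described above can be sketched as additive points per matched facet with a minimum score of 2 to include a headline. The facet flags below are hypothetical stand-ins for the real regex/lexicon checks (CRITICAL_NEWS_ENERGY_RE, IMPACT_COMMODITY_LEXICON, etc.).

```javascript
// Hypothetical sketch of the headline scorer: one point per matched
// facet, include only when the score reaches 2.
function scoreHeadlineSketch(h) {
  let score = 0;
  if (h.isAlert) score += 1;                    // alert status
  if (h.matchesEnergyOrRouteSignal) score += 1; // LNG/energy/route/sanctions match
  if (h.matchesCommodityLexicon) score += 1;    // lexicon commodity match
  if (h.corroboratingSources >= 2) score += 1;  // source corroboration
  return score;
}

const include = (h) => scoreHeadlineSketch(h) >= 2;

console.log(include({ isAlert: true, matchesCommodityLexicon: true, corroboratingSources: 1 })); // true
console.log(include({ matchesEnergyOrRouteSignal: true, corroboratingSources: 1 }));             // false
```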

🤖 Generated with Claude Sonnet 4.6 (200K context) via Claude Code (https://claude.ai/claude-code) + Compound Engineering v2.49.0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(deep-forecast): remove hardcoded LNG boost from filterNewsHeadlinesByState

The hardcoded +2 LNG boost was commodity-specific and inconsistent with the
intent: headline scoring should be generic, not biased toward any named
commodity. The function already handles the state's detected commodity
dynamically via lexEntry.pattern (IMPACT_COMMODITY_LEXICON).
LNG headlines still score via CRITICAL_NEWS_ENERGY_RE (+1) and
CRITICAL_NEWS_ROUTE_RE (+1) when relevant to the state's region.
All 125 tests pass.

* fix(deep-forecast): address all P1+P2 code review findings from PR #2178

P1 fixes (block-merge):
- Lower third_order mapped floor 0.74→0.70 (max achievable via 0.72 multiplier was 0.72)
- Guard runImpactExpansionPromptRefinement against empty validation (no_mapped exit)
- Replace block-list sanitizeProposedLlmAddition with pattern-based allowlist (HTML/JS/directive takeover)
- Fix TOCTOU on PROMPT_LAST_ATTEMPT_KEY: claim slot before quality check, not after LLM call

P2 fixes:
- Fix learned section overflow: use slice(-MAX) to preserve tail, not discard all prior content
- Add safe_haven_bid and global_crude_spread_stress branches to resolveImpactChannel
- quality_met path now sets rate-limit key (prevents 3 Redis GETs per good run)
- Hoist extractNewsClusterItems outside stateUnit map in selectImpactExpansionCandidates
- Export PROMPT_LEARNED_KEY, PROMPT_BASELINE_KEY, PROMPT_LAST_ATTEMPT_KEY + read/clear helpers

All 125 tests pass.
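The TOCTOU fix in the P1 list follows the standard claim-before-work pattern: take the rate-limit slot atomically before the expensive quality check and LLM call, rather than writing it afterwards. A minimal sketch, assuming a node-redis-style client with `SET ... NX EX` semantics; the key value and TTL are illustrative, and only `PROMPT_LAST_ATTEMPT_KEY` is named in the commit.

```javascript
// Sketch of the TOCTOU fix: claim the slot BEFORE the quality check / LLM call.
// `redis` is any client exposing atomic SET with NX + EX options (assumed API).
const PROMPT_LAST_ATTEMPT_KEY = 'forecast:prompt:last-attempt'; // key name from the PR; value shape assumed

async function claimRefinementSlot(redis, ttlSeconds = 3600) {
  // SET key value NX EX ttl returns null when the slot is already held, so two
  // concurrent workers cannot both pass a check-then-act window.
  const claimed = await redis.set(PROMPT_LAST_ATTEMPT_KEY, String(Date.now()), {
    NX: true,
    EX: ttlSeconds,
  });
  return claimed !== null;
}

async function runPromptRefinement(redis, runLlm) {
  if (!(await claimRefinementSlot(redis))) return { status: 'rate_limited' };
  // Slot is held; the LLM call can no longer race another worker's claim.
  return { status: 'attempted', result: await runLlm() };
}
```

The same claim also covers the P2 `quality_met` item: once the slot is set up front, the happy path no longer needs extra Redis reads to decide whether to write it.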

* fix(todos): add blank lines around lists/headings in todo files (markdownlint)

* fix(todos): fix markdownlint blanks-around-headings/lists in all todo files

---------

Co-authored-by: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>
2026-03-24 18:52:02 +04:00
Elie Habib
c9d4fc49a5 fix(forecast): fix impact expansion LLM prompt + pre-filter validation diagnostic (#2169)
* fix(forecast): fix impact expansion LLM prompt + add pre-filter validation diagnostic

Problem: LLM generates 8 hypotheses but 6 get rejected post-hoc because the
prompt gave 3 flat unlinked enum lists without per-variable constraints, causing
the LLM to hallucinate plausible but invalid channel/bucket combinations. Only
2 direct-only hypotheses survived, producing no expanded paths ->
completed_no_material_change on every deep run.

Also fixes two related bugs:
- Duplicate canonical state unit labels (same region + stateKind) blocked the
  deep worker. Added label dedup filter in buildCanonicalStateUnits.
- When all hypotheses failed state-ID filtering in materializeImpactExpansion,
  impactExpansionSummary showed 0 hypotheses with no rejection reasons, losing
  all diagnostic data.

Changes:
- Bump IMPACT_EXPANSION_REGISTRY_VERSION v1 -> v2 to invalidate cached LLM
  responses so the model re-generates with new constraint guidance
- Add buildRegistryConstraintTable() that serializes IMPACT_VARIABLE_REGISTRY
  and MARKET_BUCKET_ALLOWED_CHANNELS into a compact constraint block
- Rewrite buildImpactExpansionSystemPrompt() to replace 3 flat enum lists with
  the structured constraint table, bucket-channel dual gate rule, and explicit
  MiroFish causal chain guidance (direct -> second_order -> third_order example)
- Add validation field to all 3 return paths in evaluateDeepForecastPaths so
  pre-state-filter rejection data is always preserved
- Add hypothesisValidation to buildImpactExpansionDebugPayload capturing
  totalHypotheses, validatedCount, mappedCount, rejectionReasonCounts, and
  per-hypothesis rejection identity (candidateIndex, candidateStateId,
  variableKey, channel, targetBucket, order, rejectionReason)
- Filter duplicate state unit labels post-finalization in buildCanonicalStateUnits
- Export buildRegistryConstraintTable, IMPACT_VARIABLE_REGISTRY,
  MARKET_BUCKET_ALLOWED_CHANNELS for testability
- Add 4 new tests: mapped=0 early return has validation, no-expanded-accepted
  path has validation, hypothesisValidation flows through buildForecastTraceArtifacts,
  buildRegistryConstraintTable matches registry
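The constraint-table idea above can be sketched as a simple serializer. The registry and allowed-channels shapes here are assumptions (the commit does not show them); what matters is the dual gate: each emitted row joins a variable, one of its legal buckets, and that bucket's allowed channels, so the LLM never sees an unlinked enum.

```javascript
// Hypothetical sketch of buildRegistryConstraintTable(). Registry and
// per-bucket channel map shapes are assumptions for illustration.
function buildRegistryConstraintTable(registry, allowedChannelsByBucket) {
  const lines = ['VARIABLE | BUCKET | CHANNELS'];
  for (const [variableKey, entry] of Object.entries(registry)) {
    for (const bucket of entry.buckets) {
      const channels = allowedChannelsByBucket[bucket] || [];
      // Dual gate: a hypothesis must satisfy BOTH the variable->bucket link
      // and the bucket->channel allowlist, so the row carries them together.
      lines.push(`${variableKey} | ${bucket} | ${channels.join(',')}`);
    }
  }
  return lines.join('\n');
}
```

Because the table is pure data-to-text, the follow-up commit can memoize it as a module-level constant without any invalidation logic.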

* refactor(forecast): memoize constraint table, clarify dedup filter, tighten test A2

- Extract IMPACT_EXPANSION_REGISTRY_CONSTRAINT_TABLE const so the
  constraint table is built once instead of on every prompt invocation
- Replace IIFE dedup filter in buildCanonicalStateUnits with an explicit
  seenLabels Set for readability
- Add format-assumption comment to buildRegistryConstraintTable
- Rename test A2 and add mapped > 0 assertion to pin it past the
  mapped=0 early-return path

* fix(forecast): disambiguate duplicate state unit labels instead of dropping units

The previous dedup filter silently dropped state units whose labels
collided after finalizeStateUnit(), losing their id, forecastIds,
signals, and deep-candidate eligibility.

Two distinct clusters can score below the merge threshold (< 5.5) while
still resolving to the same formatStateUnitLabel() output when they
share the same leadRegion and stateKind. The filter kept the first
and discarded the rest, suppressing valid deep paths.

Fix: replace the .filter() with a .map() that disambiguates collision
labels using dominantDomain suffix, falling back to the last 4 chars
of the unit id if domain-based disambiguation would itself collide.
The seenLabels Set tracks all assigned labels to prevent any secondary
collision. The snapshot validator (and deep worker) no longer sees
duplicate labels, and no units are dropped.

Also export buildCanonicalStateUnits for direct test coverage.
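The disambiguation described above can be sketched as a `.map()` over the finalized units. Unit field names (`label`, `dominantDomain`, `id`) and the suffix formats are assumptions; the commit only fixes the strategy: suffix by dominant domain first, fall back to the last 4 chars of the id, and track every assigned label in a Set.

```javascript
// Sketch of the collision fix: disambiguate, don't drop. Suffix formats and
// unit field names are assumptions for illustration.
function disambiguateStateUnitLabels(units) {
  const seenLabels = new Set(); // tracks every assigned label, incl. suffixed ones
  return units.map((unit) => {
    let label = unit.label;
    if (seenLabels.has(label)) {
      const byDomain = `${unit.label} (${unit.dominantDomain})`;
      // If the domain suffix itself collides, fall back to an id-based suffix.
      label = seenLabels.has(byDomain) ? `${unit.label} [${unit.id.slice(-4)}]` : byDomain;
    }
    seenLabels.add(label);
    return { ...unit, label };
  });
}
```

Unlike the reverted `.filter()`, this keeps every unit's id, forecastIds, signals, and deep-candidate eligibility intact while still guaranteeing label uniqueness for the snapshot validator.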

* feat(deep-forecast): phase 2 scoring recalibration + prompt excellence

Fixes cascade of three scoring gates that caused every deep run to return
completed_no_material_change despite valid LLM hypothesis generation.

Changes:
- Lower second_order validation floors (mapped: 0.66→0.58, internal: 0.58→0.50)
  and raise multiplier (0.85→0.88) so typical LLM quality (strength~0.75,
  conf~0.75, 2 refs, specificityScore=0.2) now reaches mapped status
- Binary evidenceSupport: refs >= 2 → 1, else → 0 (enforces 2-ref requirement
  at scoring layer, not just prompt text; 1-ref hypotheses stay trace_only)
- Parent-must-be-mapped invariant: post-validation pass downgrades mapped
  second_order/third_order whose dependsOnKey has no mapped parent
- Lower pathScore threshold from 0.66 to 0.50 to allow barely-mapped pairs
  through to expanded path generation
- Add 6 missing maritime chokepoints to CHOKEPOINT_MARKET_REGIONS
- Bump IMPACT_EXPANSION_REGISTRY_VERSION v2 → v3 (invalidates stale LLM cache)
- Prompt v3: explicit dependsOnKey pairing, 2-evidence citation rule,
  confidence calibration guidance, direct+second_order pair structure
- Add scoringBreakdown (all hypotheses with scoring factors) and gateDetails
  (active thresholds) to debug artifact for observability feedback loop
- Export buildImpactExpansionSystemPrompt, extractImpactRouteFacilityKey,
  extractImpactCommodityKey for testability
- 8 new tests (T1-T8) covering all phase 2 changes; 111/111 pass
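Two of the phase-2 gates above lend themselves to a short sketch: the binary evidenceSupport rule and the parent-must-be-mapped post-pass. Hypothesis field names (`refs`, `status`, `order`, `dependsOnKey`) are assumptions; the thresholds and downgrade semantics come from the commit.

```javascript
// Sketch of two phase-2 gates; hypothesis field names are assumptions.

// 1) Binary evidenceSupport: the 2-reference rule enforced at the scoring
//    layer, not just in prompt text. 1-ref hypotheses stay trace_only.
function evidenceSupport(hypothesis) {
  return (hypothesis.refs || []).length >= 2 ? 1 : 0;
}

// 2) Parent-must-be-mapped invariant: a single post-validation pass that
//    downgrades mapped second/third_order hypotheses whose dependsOnKey
//    has no mapped parent in the batch.
function enforceMappedParents(hypotheses) {
  const mappedKeys = new Set(
    hypotheses.filter((h) => h.status === 'mapped').map((h) => h.key),
  );
  return hypotheses.map((h) => {
    const isChild = h.order === 'second_order' || h.order === 'third_order';
    if (h.status === 'mapped' && isChild && !mappedKeys.has(h.dependsOnKey)) {
      return { ...h, status: 'trace_only', rejectionReason: 'parent_not_mapped' };
    }
    return h;
  });
}
```

Note this is a single pass over a pre-computed mapped-key set, so a child depending on a just-downgraded sibling would need a second pass; the sketch does not claim the real code handles that case.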
2026-03-24 08:31:41 +04:00
Elie Habib
2e0bc86d81 feat(forecast): add replayable deep forecast lifecycle (#2161)
* feat(forecast): add replayable deep forecast lifecycle

* fix(forecast): serialize replay snapshot market index
2026-03-23 23:59:21 +04:00
Elie Habib
cdc77bcbaa feat(forecast): add queued deep forecast search (#2155)
* feat(forecast): add queued deep forecast search

* test(forecast): cover deep path scoring
2026-03-23 21:36:46 +04:00
Elie Habib
d29fd4e03f fix(forecast): harden reliability recovery pipeline (#2149) 2026-03-23 19:28:26 +04:00
Elie Habib
636ace7b2c feat(forecast): add impact expansion simulation layer (#2138)
* feat(forecast): add impact expansion simulation layer

* fix(forecast): correct impact bucket coherence gate
2026-03-23 15:19:06 +04:00
Elie Habib
00f9ce7c19 fix(forecast): preserve llm narratives on publish refresh (#2134) 2026-03-23 13:50:57 +04:00
Elie Habib
a202b8ebcc feat(consumer-prices): global All view as default, market selector, per-market cache keys (#2128)
* fix(seeders): apply gold standard TTL-extend+retry pattern to Aviation, NOTAM, Cyber, PositiveEvents

* feat(consumer-prices): default to All — global comparison table as landing view

- DEFAULT_MARKET = 'all' so panel opens with the global view
- 🌍 All pill added at front of market bar
- All view fetches all 9 markets in parallel via fetchAllMarketsOverview()
  and renders a comparison table: Market / Index / WoW / Spread / Updated
- Clicking any market row drills into that market's full tab view
- SINGLE_MARKETS exported for use in All-view iteration
- CSS: .cp-global-table and row styles
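The parallel All-view fetch might look roughly like this. The commit only names `fetchAllMarketsOverview()` and the table columns; the per-market fetcher signature, overview shape, and error handling here are assumptions.

```javascript
// Sketch of the All-view fetch: all markets in parallel, tolerating
// per-market failures so one slow/broken market can't blank the table.
// fetchMarketOverview's signature and return shape are assumptions.
async function fetchAllMarketsOverview(markets, fetchMarketOverview) {
  return Promise.all(
    markets.map(async (market) => {
      try {
        const overview = await fetchMarketOverview(market); // e.g. { index, wow, spread, updated }
        return { market, ok: true, overview };
      } catch (err) {
        return { market, ok: false, error: String(err) }; // keep the row, mark it failed
      }
    }),
  );
}
```
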
2026-03-23 10:58:37 +04:00
Elie Habib
166fc58e92 fix(forecast): tighten state coherence and promotion (#2120)
* fix(forecast): tighten state coherence and promotion

* fix(forecast): harden coherence follow-ups
2026-03-23 10:19:17 +04:00
Elie Habib
1058b648a1 feat(forecast): derive market domains from state units (#2116)
* feat(forecast): derive market domains from state units

* fix(forecast): cover state-derived backfill path
2026-03-23 09:26:13 +04:00
Elie Habib
ea991dc7ce fix(forecast): unclog market promotion and state selection (#2082) 2026-03-23 01:19:28 +04:00
Elie Habib
5e8a106999 feat(forecast): extract critical news signals (#2064)
* feat(forecast): extract critical news signals

* fix(forecast): harden critical signal extraction

* feat(forecast): add structured urgent signal extraction

* docs(env): document critical forecast llm overrides
2026-03-22 22:39:00 +04:00
Elie Habib
a24ea45983 feat(forecast): compress situations into state units (#2037) 2026-03-22 10:11:41 +04:00
Elie Habib
7eef3fd9ca feat(forecast): enrich energy transmission signals (#2021) 2026-03-21 23:05:20 +04:00
Elie Habib
3b762492fe feat(forecast): deepen market transmission simulation (#1996) 2026-03-21 17:16:31 +04:00
Elie Habib
41591e33a9 feat(forecast): add macro market signals (#1980) 2026-03-21 12:26:52 +04:00
Elie Habib
5b987ea434 feat(forecast): drive simulation from market state (#1976) 2026-03-21 11:09:04 +04:00
Elie Habib
3670716daa feat(forecast): add market transmission state (#1971) 2026-03-21 09:48:38 +04:00
Elie Habib
8d86607d21 feat(forecast): drive selection from causal memory (#1958) 2026-03-21 01:20:42 +04:00
Elie Habib
f56f11a596 feat(forecast): add simulation memory replay state (#1945) 2026-03-20 20:37:11 +04:00
Elie Habib
8e8db1b40f feat(forecast): calibrate interaction effect promotion (#1936) 2026-03-20 18:49:58 +04:00
Elie Habib
070248b792 fix(forecast): guarantee military forecast inclusion in publish selection pool (#1917)
Military detector forecasts (ADS-B flight tracking + theater posture API)
structurally score near zero on readiness metrics that require LLM-enriched
caseFile content (supporting evidence, news headlines, calibration, triggers).
This causes them to rank below the target count threshold every run despite
a valid elevated posture signal.

Add a domain guarantee post-pass after the 3 selection loops: if no military
forecast was selected and we have room below MAX_TARGET_PUBLISHED_FORECASTS,
inject the highest-scoring eligible military forecast. This does not displace
any already-selected forecast and respects all existing family/situation caps.

Diagnosis: Baltic theater at postureLevel='elevated' with 6 active flights
generates a military forecast (prob=0.41, confidence=0.30, score=0.136) but
gets buried behind 15+ well-grounded situation cluster forecasts at score 0.4+.

Tests: 3 new assertions in 'military domain guarantee in publish selection'.
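The post-pass described above is small enough to sketch. Forecast field names (`domain`, `score`) and the constant's value are assumptions; the invariants are from the commit: never displace an already-selected forecast, only inject when there is room and no military forecast made the cut.

```javascript
// Sketch of the domain guarantee post-pass after the 3 selection loops.
// Field names are assumptions; the cap value is illustrative only.
const MAX_TARGET_PUBLISHED_FORECASTS = 12; // assumed value for illustration

function guaranteeMilitaryForecast(selected, eligible) {
  const hasMilitary = selected.some((f) => f.domain === 'military');
  if (hasMilitary || selected.length >= MAX_TARGET_PUBLISHED_FORECASTS) return selected;
  const candidates = eligible
    .filter((f) => f.domain === 'military' && !selected.includes(f))
    .sort((a, b) => b.score - a.score); // highest-scoring eligible military forecast
  // Append rather than replace: nothing already selected is displaced.
  return candidates.length ? [...selected, candidates[0]] : selected;
}
```
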
2026-03-20 14:04:08 +04:00
Elie Habib
01366fcc00 fix(forecast): block generic-actor cross-theater interactions + raise enrichment budget (#1916)
* fix(forecast): block generic-actor cross-theater interactions + raise enrichment budget

Root cause: actor registry uses name:category as key (e.g. "Incumbent leadership:state"),
causing unrelated situations (Israel conflict, Taiwan political) to share the same actor
ID and fire sharedActor=true in pushInteraction. This propagated into the reportable
ledger and surfaced as junk effects like Israel→Taiwan at 80% confidence.

Two-pronged fix:

1. Specificity gate in pushInteraction: sharedActor now requires avgSpecificity >= 0.75.
   Generic blueprint actors ("Incumbent leadership" ~0.68, "Civil protection authorities"
   ~0.73) no longer qualify as structural cross-situation links. Named domain-specific
   actors ("Threat actors:adversarial" ~0.95) continue to qualify.

2. MACRO_REGION_MAP + isCrossTheaterPair + gate in buildCrossSituationEffects: for
   cross-theater pairs (different macro-regions) with non-exempt channels, requires
   sharedActor=true AND avgActorSpecificity >= 0.90. Exempt channels: cyber_disruption,
   market_repricing (legitimately global). Same-macro-region pairs (Brazil/Mexico both
   AMERICAS) are unaffected.

Verified against live run 1773983083084-bu6b1f:
  BLOCKED: Israel→Taiwan (MENA/EAST_ASIA, spec 0.68)
  BLOCKED: Israel→US political (MENA/AMERICAS, spec 0.68)
  BLOCKED: Cuba→Iran (AMERICAS/MENA, spec 0.73)
  BLOCKED: Brazil→Israel (AMERICAS/MENA, spec 0.85 < 0.90)
  ALLOWED: China→US cyber_disruption (exempt channel)
  ALLOWED: Brazil→Mexico (same AMERICAS)

Also raises ENRICHMENT_COMBINED_MAX from 3 to 5 (total budget 6→8),
targeting enrichedRate improvement from ~38% to ~60%.
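
The two-pronged gate can be sketched as a single predicate. The macro-region map here is a tiny assumed subset (the real MACRO_REGION_MAP is not shown in the commit); the thresholds (0.90), exempt channels, and same-region bypass are from the commit text.

```javascript
// Sketch of the cross-theater gate. The region map is a partial, assumed
// subset; thresholds and exempt channels mirror the commit.
const MACRO_REGION_MAP = { IL: 'MENA', TW: 'EAST_ASIA', CN: 'EAST_ASIA', US: 'AMERICAS', BR: 'AMERICAS' };
const EXEMPT_CHANNELS = new Set(['cyber_disruption', 'market_repricing']); // legitimately global

function isCrossTheaterPair(a, b) {
  const ra = MACRO_REGION_MAP[a.region];
  const rb = MACRO_REGION_MAP[b.region];
  return Boolean(ra && rb && ra !== rb);
}

function allowCrossSituationEffect(a, b, channel, sharedActor, avgActorSpecificity) {
  if (!isCrossTheaterPair(a, b)) return true;         // same macro-region pairs unaffected
  if (EXEMPT_CHANNELS.has(channel)) return true;      // global channels pass the gate
  return sharedActor && avgActorSpecificity >= 0.90;  // named-actor bar for cross-theater
}
```

Replaying the verified cases from the live run against this predicate gives the same verdicts: Israel→Taiwan at specificity 0.68 is blocked, China→US cyber_disruption is allowed, Brazil→Mexico-style same-region pairs are untouched.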

* fix(plans): fix markdown lint errors in forecast semantic quality plan

* fix(plans): fix remaining markdown lint error in plan file
2026-03-20 13:26:58 +04:00
Elie Habib
46cd3728d6 fix(forecast): tighten reportable effect quality (#1902)
* fix(forecast): tighten reportable effect quality

* fix(forecast): preserve structural political carryover

* chore(forecast): document effect grouping heuristics
2026-03-20 00:44:21 +04:00
Elie Habib
8768d10b7f fix(forecast): tighten interaction semantics (#1896)
* fix(forecast): tighten interaction semantics

* fix(forecast): narrow maritime family inference

* fix(forecast): keep full reportable interaction graph
2026-03-19 23:34:46 +04:00
Elie Habib
e434769e37 feat(forecast): add simulation action ledger (#1891)
* feat(forecast): add simulation action ledger

* fix(forecast): preserve directional interaction effects
2026-03-19 21:01:47 +04:00
Elie Habib
486f5f799f fix(forecast): tighten family effect credibility (#1880)
* fix(forecast): tighten family effect credibility

* fix(forecast): respect domain effect thresholds
2026-03-19 18:24:40 +04:00
Elie Habib
08cc2723cc fix(forecast): wire per-situation simulation into per-forecast worldState (#1879)
buildForecastTraceArtifacts was building worldState after tracedPredictions,
so simulation data was never available to buildForecastTraceRecord. Each
forecast's caseFile.worldState had situationId/familyId/simulationSummary
all undefined, making the 3-round MiroFish simulation invisible at the
forecast level.

Fix:
- Compute worldState before tracing (so simulationState is ready)
- Build forecastId → situationSimulation lookup from worldState.simulationState
- Pass lookup into buildForecastTraceRecord; inject situationId, familyId,
  familyLabel, simulationSummary, simulationPosture, simulationPostureScore
  into caseFile.worldState for each matched forecast
- Add regression assertion to forecast-trace-export tests

All 194 forecast tests pass.
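The lookup wiring described in the fix can be sketched as follows. The shapes of `worldState.simulationState` entries and the trace record are assumptions; the essential points from the commit are the ordering (worldState first) and the forecastId → situationSimulation map consumed while tracing.

```javascript
// Sketch of the ordering fix: build worldState first, then a
// forecastId -> situationSimulation lookup consumed by tracing.
// Entry and record shapes are assumptions for illustration.
function buildSimulationLookup(worldState) {
  const lookup = new Map();
  for (const sim of worldState.simulationState || []) {
    for (const forecastId of sim.forecastIds || []) lookup.set(forecastId, sim);
  }
  return lookup;
}

function buildForecastTraceRecord(forecast, simulationLookup) {
  const sim = simulationLookup.get(forecast.id);
  return {
    ...forecast,
    caseFile: {
      worldState: sim
        ? { situationId: sim.situationId, familyId: sim.familyId, simulationSummary: sim.summary }
        : {}, // unmatched forecasts get an empty worldState, not undefined fields
    },
  };
}
```
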
2026-03-19 17:19:49 +04:00
Elie Habib
2deccac691 fix(forecast): allocate publish output by family (#1868)
* fix(forecast): allocate publish output by family

* fix(forecast): backfill deferred family selections
2026-03-19 11:42:12 +04:00