* fix(forecasts): unwrap seed-contract envelope in canonical-key sim patcher
Production bug observed 2026-04-23 across both forecast worker services
(seed-forecasts-simulation + seed-forecasts-deep): every successful run
logs `[SimulationDecorations] Cannot patch canonical key — predictions
missing or not an array` and silently fails to write simulation
adjustments back to forecast:predictions:v2.
Root cause: PR #3097 (seed-contract envelope dual-write) wraps canonical
seed writes in `{_seed: {...}, data: {predictions: [...]}}` via runSeed.
The Lua patcher (_SIM_PATCH_LUA) and its JS test-path mirror both read
`payload.predictions` directly with no envelope unwrap, so they always
return 'MISSING' against the new shape, matching the documented pattern
in the project's worldmonitor-seed-envelope-consumer-drift learning
(91 producers enveloped, private-helper consumers not migrated).
User-visible impact: ForecastPanel renders simulation-adjusted scores
only when a fast-path seed has touched a forecast since the bug landed;
deep-forecast and simulation re-scores never reach the canonical feed.
Fix:
- _SIM_PATCH_LUA detects envelope shape (`type(payload._seed) == 'table'
and type(payload.data) == 'table'`), reads `inner.predictions`, and
re-encodes preserving the wrapper so envelope shape persists across
patches. Legacy bare values still pass through unchanged.
- JS test path mirrors the same unwrap/rewrap.
- New test WD-20b locks the regression: enveloped store fixture, asserts
`_seed` wrapper preserved on write + inner predictions patched.
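The unwrap/rewrap can be sketched as a JS mirror (the function name and return shape here are hypothetical; the real Lua patcher operates on the JSON-encoded Redis value):

```javascript
// Hypothetical mirror of the Lua patcher's envelope handling.
function patchCanonicalPredictions(payload, patchFn) {
  const isTable = (v) => typeof v === 'object' && v !== null;
  // Seed-contract dual-write envelope: {_seed: {...}, data: {predictions: [...]}}
  const enveloped = isTable(payload?._seed) && isTable(payload?.data);
  const inner = enveloped ? payload.data : payload;
  if (!Array.isArray(inner?.predictions)) return { status: 'MISSING' };
  inner.predictions = inner.predictions.map(patchFn);
  // Re-encode the outer object so the _seed wrapper survives the patch;
  // legacy bare values pass through unchanged.
  return { status: 'OK', payload };
}
```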
Also resolves the per-run `[seed-contract] forecast:predictions missing
fields: sourceVersion — required in PR 3` warning by passing
`sourceVersion: 'detectors+llm-pipeline'` to runSeed (PR 3 of the
seed-contract migration will start enforcing this; cheap to fix now).
Verified: typecheck (both tsconfigs) clean; lint 0 errors; test:data
6631/6631 green (forecast suite 309/309 incl new WD-20b); edge-functions
176/176 green; markdown + version-check clean.
* fix(forecasts): tighten JS envelope guard to match Lua's strict table check
PR #3348 review (P2):
JS test path used `!!published._seed` (any truthy value) while the Lua
script requires `type(payload._seed) == 'table'` (strict object check).
Asymmetry: a fixture with `_seed: true`, `_seed: 1`, or `_seed: 'string'`
would be treated as enveloped by JS and bare by Lua — meaning the JS
test mirror could silently miss real Lua regressions that bisect on
fixture shape, defeating the purpose of having a parity test path.
Tighten JS to require both `_seed` and `data` be plain objects (rejecting
truthy non-objects + arrays), matching Lua's `type() == 'table'` semantics
exactly.
New test WD-20c locks the parity: fixture with non-table `_seed` (string)
+ bare-shape `predictions` → must succeed via bare path, identical to
what Lua would do.
Verified: 6632/6632 tests pass; new WD-20c green.
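A minimal sketch of the tightened guard (helper names are illustrative):

```javascript
// Strict plain-object check: rejects truthy non-objects and arrays,
// mirroring the Lua script's type(payload._seed) == 'table' object check.
const isPlainObject = (v) =>
  typeof v === 'object' && v !== null && !Array.isArray(v);

const isEnveloped = (payload) =>
  isPlainObject(payload?._seed) && isPlainObject(payload?.data);
```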
* feat(simulation): add keyActorRoles field to fix actor overlap bonus vocabulary mismatch
The +0.04 actor overlap bonus never reliably fired in production because
stateSummary.actors uses role-category strings ('Commodity traders',
'Policy officials') while simulation keyActors uses named geo-political
entities ('Iran', 'Houthi'). An audit of 53 production runs showed the
bonus fired exactly once.
Fix: add keyActorRoles?: string[] to SimulationTopPath. The Round 2 prompt
now includes a CANDIDATE ACTOR ROLES section with theater-local role vocab
seeded from candidatePacket.stateSummary.actors. The LLM copies matching
roles into keyActorRoles. applySimulationMerge scores overlap against
keyActorRoles when actorSource=stateSummary, preserving the existing
keyActors entity-overlap path for the affectedAssets fallback.
- buildSimulationPackageFromDeepSnapshot: add actorRoles[] to each theater
from candidate.stateSummary.actors (theater-scoped, no cross-theater noise)
- buildSimulationRound2SystemPrompt: inject CANDIDATE ACTOR ROLES section
with exact-copy instruction and keyActorRoles in JSON template
- tryParseSimulationRoundPayload: extract keyActorRoles from round 2 output
- mergedPaths.map(): filter keyActorRoles against theater.actorRoles guardrail
- computeSimulationAdjustment: dual-path overlap — roleOverlapCount for
stateSummary, keyActorsOverlapCount for affectedAssets (backwards compat)
- summarizeImpactPathScore: project roleOverlapCount + keyActorsOverlapCount
into path-scorecards.json simDetail
New fields: roleOverlapCount, keyActorsOverlapCount in SimulationAdjustmentDetail
and ScorecardSimDetail. actorOverlapCount preserved as backwards-compat alias.
Tests: 308 pass (was 301 before). New tests T-P1/T-P2/T-P3 (prompt/parser),
T-RO1/T-RO2/T-RO3 (role overlap logic), T-PKG1 (pkg builder actorRoles),
plus fixture updates for T2/T-F/T-G/T-J/T-K/T-N2/T-SC-4.
🤖 Generated with Claude Sonnet 4.6 via Claude Code (https://claude.ai/claude-code) + Compound Engineering v2.49.0
Co-Authored-By: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>
* fix(simulation): address CE review findings from PR #2582
- Add SimulationPackageTheater interface to seed-forecasts.types.d.ts
(actorRoles was untyped under @ts-check)
- Add keyActorRoles to uiTheaters Redis projection in writeSimulationOutcome
(field was stripped from Redis snapshot; only visible in R2 artifact)
- Extract keyActorRoles IIFE to named sanitizeKeyActorRoles() function;
hoist allowedRoles Set computation out of per-path loop
- Harden bonusOverlap ternary: explicit branch for actorSource='none'
prevents silent fallthrough if new actorSource values are added
- Eliminate roleOverlap intermediate array in computeSimulationAdjustment
- Add U+2028/U+2029 Unicode line-separator stripping to sanitizeForPrompt
- Apply sanitizeForPrompt at tryParseSimulationRoundPayload parse boundary;
add JSDoc to newly-exported function
All 308 tests pass, typecheck + typecheck:api clean.
* fix(sim): restore const sanitized in sanitizeKeyActorRoles after early-return guard
A prior edit added `if (!allowedRoles.length) return []` but accidentally removed
the `const sanitized = ...` line, leaving the filter on the line below referencing
an undefined variable. Restores the full function body:

    if (!allowedRoles.length) return [];
    const sanitized = (Array.isArray(rawRoles) ? rawRoles : [])
      .map((s) => sanitizeForPrompt(String(s)).slice(0, 80));
    const allowedNorm = new Set(allowedRoles.map(normalizeActorName));
    return sanitized.filter((s) => allowedNorm.has(normalizeActorName(s))).slice(0, 8);
308/308 tests pass.
---------
Each path-scorecards.json entry now includes a simDetail object containing
the full SimulationAdjustmentDetail fields: bucketChannelMatch, actorOverlapCount,
candidateActorCount, actorSource, resolvedChannel, channelSource, invalidatorHit,
stabilizerHit.
Previously, details from computeSimulationAdjustment were only stored in the
in-memory adjustments[] array and never reached the R2 artifact. This meant
production runs could not be retroactively audited to confirm whether the +0.04
actor-overlap bonus fired, which channel matched, or what actor source was used.
Changes:
- applySimulationMerge: attach details to path.simulationAdjustmentDetail
- summarizeImpactPathScore: project simDetail from simulationAdjustmentDetail
- seed-forecasts.types.d.ts: add simulationAdjustmentDetail to ExpandedPath, add ScorecardSimDetail interface
- Tests T-SC-1 through T-SC-4: 301/301 passing
* feat(forecast): plumb simulation decorations from ExpandedPath to Forecast Redis key
- Add `simulation_adjustment`, `sim_path_confidence`, `demoted_by_simulation` fields
to `Forecast` proto (fields 20-22) and regenerate service client
- `writeSimulationDecorations`: after simulation rescore, builds candidateStateId →
forecastIds map from snapshot.fullRunStateUnits, picks strongest signal per candidate,
writes `forecast:sim-decorations:v1` to Redis (3-day TTL)
- `applySimulationDecorationsToForecasts`: reads decorations at start of fast-path seed,
mutates predictions in-place (stale-by-one-run, non-fatal on failure)
- `buildPublishedForecastPayload`: passes sim fields through to published Redis payload
- Add `__setRedisStoreForTests` / test-store bypass in `getRedisCredentials` for unit tests
- 12 new WD-* tests covering early-exits, strongest-signal selection, candidate conflict
resolution, demotion flag, non-fatal error handling, and full round-trip
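The strongest-signal selection could look like this sketch (comparison by absolute adjustment magnitude is an assumption; field names are illustrative):

```javascript
// Hypothetical: keep one decoration per candidate, preferring the
// largest-magnitude adjustment (assumed tie-break strategy).
function pickStrongestPerCandidate(adjustments) {
  const byCandidate = new Map();
  for (const adj of adjustments) {
    const prev = byCandidate.get(adj.candidateStateId);
    if (!prev || Math.abs(adj.simulationAdjustment) > Math.abs(prev.simulationAdjustment)) {
      byCandidate.set(adj.candidateStateId, adj);
    }
  }
  return byCandidate;
}
```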
* fix(forecast): always overwrite simulation decorations to prevent 3-day staleness
Without this fix, writeSimulationDecorations() returned early on empty adjustments
or zero decorationCount, leaving the prior run's forecast:sim-decorations:v1 intact
for the full 3-day TTL. Subsequent fast-path seeds would blindly reapply old flags.
Fixes:
- Split the `!adjustments?.length` guard: only skip on missing simulationEvidence
(bogus data); write an empty byForecastId map when adjustments is [] so later
runs always overwrite stale entries
- Remove the `decorationCount === 0` early return for the same reason
- Add SIMULATION_DECORATIONS_MAX_AGE_MS (48h) guard in applySimulationDecorationsToForecasts
as defense-in-depth for the edge case where no simulation has run recently
Tests: WD-1, WD-3 updated; WD-2b, WD-13, WD-14 added (287 total, all pass)
* fix(forecast): call writeSimulationDecorations from inline deep-forecast path
processDeepForecastTask called applySimulationMerge but only extracted
simulationEvidence, discarding mergeResult without writing decorations.
Forecasts processed through the inline deep path (the common case) never
got their forecast:sim-decorations:v1 key populated.
Fix: fire-and-forget writeSimulationDecorations(mergeResult, snapshot)
immediately after applySimulationMerge in processDeepForecastTask, same
pattern as applyPostSimulationRescore. The deep path now writes on the
SAME run (no stale-by-one lag).
Test WD-15: exercises applySimulationMerge → writeSimulationDecorations
with snapshot.fullRunStateUnits, verifying both mapped forecast IDs are
decorated via the full chain used by processDeepForecastTask (288 total).
* fix(forecast): patch forecast:predictions:v2 immediately after sim decorations are written
writeSimulationDecorations only wrote the side key (forecast:sim-decorations:v1).
forecast:predictions:v2 is published by runSeed() before deep/rescore paths run,
so same-run simulation evidence never reached the canonical key — panel consumers
saw prior-run data until the next fast seed re-applied decorations.
Fix: add patchPublishedForecastsWithSimDecorations() called immediately after the
side key write in writeSimulationDecorations(). Reads forecast:predictions:v2,
updates sim fields for matched forecasts, resets to 0 for unmatched (clearing any
stale values from prior runs), writes back non-fatal.
Tests WD-16 and WD-17 assert on forecast:predictions:v2 directly — not just the
side key — proving the canonical key is updated on the same run (290 total).
* fix(forecast): await writeSimulationDecorations and applyPostSimulationRescore to prevent process.exit abandonment
Two fire-and-forget chains caused patchPublishedForecastsWithSimDecorations to
silently fail when runSimulationWorker(once=true) returned and process.exit fired:
1. processNextSimulationTask fired-and-forgot applyPostSimulationRescore then
returned immediately, so the rescore and its canonical-key patch never ran.
2. applyPostSimulationRescore fired-and-forgot writeSimulationDecorations and
returned on the no_path_changes branch before the Redis writes completed.
Both call sites are now awaited. The writeSimulationDecorations call in
processDeepForecastTask is also changed to await for consistency -- the deep
path was already safe (heavy artifact writing follows) but the comment said
"fire-and-forget" which contradicted the intent.
* fix(forecast): write seed-meta:forecast:sim-decorations:v1 for health monitoring (AGENTS.md requirement)
* fix(forecast): guard patchPublishedForecastsWithSimDecorations against cross-run stamping
Blind read-modify-write on forecast:predictions:v2 risked an older deep-forecast
or simulation worker overwriting a newer fast-path seed's payload (or zeroing out
newer sim fields) when workers finish out of order.
Fix: patchPublishedForecastsWithSimDecorations now accepts runGeneratedAt and
skips the patch when published.generatedAt > runGeneratedAt — i.e. the canonical
key already belongs to a newer run. writeSimulationDecorations passes
snapshot.generatedAt as the run identity token (same value baked into the
canonical key by buildPublishedSeedPayload for the matching run).
Add WD-18 (mismatch skips) and WD-19 (same-run patches) to cover overlapping-run
scenarios that the prior tests did not exercise.
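A sketch of the guard with a Map standing in for Redis (the real calls are async; the store interface and return strings here are hypothetical):

```javascript
// In-memory stand-in for the Redis-backed canonical key.
function patchPublishedWithGuard(store, runGeneratedAt, byForecastId) {
  const published = store.get('forecast:predictions:v2');
  if (!published) return 'no_published';
  // The canonical key already belongs to a newer run: skip the patch.
  if (published.generatedAt > runGeneratedAt) return 'skipped_newer_run';
  for (const prediction of published.predictions ?? []) {
    const deco = byForecastId[prediction.forecastId];
    // Matched forecasts get sim fields; unmatched reset to 0 to clear stale values.
    prediction.simulation_adjustment = deco ? deco.simulationAdjustment : 0;
  }
  store.set('forecast:predictions:v2', published);
  return 'patched';
}
```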
* fix(forecast): atomic write-if-newer for sim-decorations side key; store run origin timestamp
Two P1 issues closed:
1. generatedAt: Date.now() on the side key made a late older worker's write look
fresh to applySimulationDecorationsToForecasts(), which measures staleness from
write time. Changed to snapshot?.generatedAt so freshness reflects the originating
run's age, not the wall-clock write timestamp.
2. No run-order guard on forecast:sim-decorations:v1 write. A late older worker
could overwrite a newer run's decorations, poisoning the source that the next
fast seed reads. Added redisAtomicWriteSimDecorations (Lua EVAL, symmetric with
the canonical key fix): atomically reads existing generatedAt, skips write if
existing is from a newer run.
writeSimulationDecorations now returns early on side key skip, also bypassing the
canonical patch (which has its own Lua guard regardless).
New tests: WD-21 (end-to-end skip of both side key + canonical), WD-22 (stored
generatedAt equals snapshot.generatedAt), WD-23/WD-24 (atomic helper unit cases).
297 tests passing.
* feat(simulation): confidence-weighted adjustments + simulationSignal trace lane
Scale +0.08 and +0.04 simulation bonuses by simPathConfidence (missing/zero
falls back to 1.0 so old artifacts are not penalized). Negative adjustments
(-0.12/-0.15) remain flat — they are structural, not sim-confidence-dependent.
Attach a compact simulationSignal object (backed, adjustmentDelta, channelSource,
demoted, simPathConfidence) to each ExpandedPath when adjustment != 0. Written
to R2 trace artifacts. ForecastPanel UI chip deferred to follow-up PR (requires
proto field + buf generate + prediction-to-path plumbing).
Add simPathConfidence to SimulationAdjustmentDetail for observability.
Add T-N1..T-N8 (confidence weighting) and T-O1..T-O4 (simulationSignal) tests.
* fix(simulation): correct zero-confidence handling + wire simulationSignal to scorecard summaries
P1: explicit confidence=0 from LLM now correctly yields simConf=0 (no positive bonus)
instead of falling back to 1.0. Absent/non-finite confidence still uses 1.0 fallback
(conservative — old LLM artifacts without the field are not penalized). The previous
rawConf > 0 guard conflated "absent" and "explicitly unsupported" paths.
P2: summarizeImpactPathScore now forwards simulationSignal into scorecard summaries,
so path-scorecards.json and impact-expansion-debug.json include the new lane alongside
simulationAdjustment and mergedAcceptanceScore. Only forecast-eval.json had it before.
Also exports summarizeImpactPathScore for direct unit testing.
Tests: T-N5 corrected (explicit 0 → no bonus), T-N5b (zero conf + invalidator still
fires flat -0.12), T-O5 (summarizeImpactPathScore includes simulationSignal),
T-O6 (omits field when absent). 270 passing.
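The corrected fallback reduces to a finite-number check (sketch; the real helper name may differ):

```javascript
// Absent/non-finite confidence → conservative 1.0 fallback so old LLM
// artifacts without the field are not penalized; an explicit 0 is
// preserved, so no positive bonus fires.
function resolveSimConfidence(rawConf) {
  return Number.isFinite(rawConf) ? rawConf : 1.0;
}
```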
* fix(simulation): clear stale sim fields before re-merge + fix simPathConfidence comment
P1 (blocking): applySimulationMerge now deletes simulationAdjustment, mergedAcceptanceScore,
simulationSignal, demotedBySimulation, and promotedBySimulation from every expanded path
before running computeSimulationAdjustment. This fires before any early-continue
(no theater match, no candidate packet, zero confidence), so reloaded forecast-eval.json
paths from applyPostSimulationRescore never retain stale simulation metadata from a prior cycle.
P3: corrected simPathConfidence JSDoc in SimulationAdjustmentDetail and SimulationSignal —
absent/non-finite → 1.0 fallback, explicit 0 preserved as 0 (was: "missing/null/zero fall back to 1.0").
Tests: T-O7 (zero-confidence match clears stale fields), T-O8 (no-theater match clears
stale fields). 272 passing.
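The clearing pass can be sketched as (the helper name is illustrative; the field list comes from the commit):

```javascript
// Fields deleted from every expanded path before re-running
// computeSimulationAdjustment, ahead of any early-continue.
const SIM_MERGE_FIELDS = [
  'simulationAdjustment', 'mergedAcceptanceScore', 'simulationSignal',
  'demotedBySimulation', 'promotedBySimulation',
];

function clearStaleSimFields(expandedPaths) {
  for (const path of expandedPaths) {
    for (const field of SIM_MERGE_FIELDS) delete path[field];
  }
}
```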
* fix(types): correct SimulationSignal.backed JSDoc — only true for positive adjustments
---------
* fix(simulation): source actor overlap from candidatePacket.stateSummary.actors (#2459)
The +0.04 actor overlap bonus in computeSimulationAdjustment has never
fired in production because extractPathActors read hop.affectedAssets
(financial instruments: "TTF gas futures", "European utility stocks")
while sim keyActors are geo/political actors ("Houthi", "Iran", "Saudi
Arabia") -- disjoint namespaces, actorOverlapCount=0 on every run.
Fix: source actors from candidatePacket.stateSummary.actors (grounded
stateUnit.actors, same domain as simulation keyActors). Strict precedence
with no union: if stateSummary.actors raw list is non-empty, use only
that; else fall back to affectedAssets for backward compat. Precedence
is on raw list presence, not post-normalization length, so entries like
['---'] use stateSummary path without falling through.
Also fixes normalizeActorName to handle underscore-separated LLM output
("Saudi_Arabia") and entity ID prefix stripping ("state-abc:iran").
Pattern /^[a-z][a-z0-9_-]*$/ strips lowercase ID prefixes but rejects
uppercase ("US", "EU") and natural language ("New York") to prevent
false strips. Both sides of the comparison go through the same function.
Guards Array.isArray on both stateSummary.actors and keyActors to prevent
crashes on non-array truthy values from malformed LLM snapshots.
Retires extractPathActors (single call site, now inlined). Adds
candidateActorCount and actorSource to SimulationAdjustmentDetail for
observability. Exports normalizeActorName for test coverage.
Tests: N-1/2/3/4 (normalizeActorName), T-F/G/H/I/K/L/L-pre/M
(computeSimulationAdjustment), T-J (applySimulationMerge end-to-end).
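A sketch of the described normalization (exact trimming and casing details are assumptions):

```javascript
// Lowercase entity-ID prefixes ("state-abc:") are stripped; uppercase
// ("US") and natural language ("New York") fail the pattern, preventing
// false strips.
const ID_PREFIX = /^[a-z][a-z0-9_-]*$/;

function normalizeActorName(raw) {
  let s = String(raw ?? '').trim();
  const colon = s.indexOf(':');
  if (colon > 0 && ID_PREFIX.test(s.slice(0, colon))) {
    s = s.slice(colon + 1);
  }
  // Underscore-separated LLM output ("Saudi_Arabia") → natural spacing.
  return s.replace(/_/g, ' ').trim().toLowerCase();
}
```

Both sides of the overlap comparison go through the same function, so representation differences cancel out.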
* fix(types): narrow channelSource to 'direct' | 'market' | 'none' union
* feat(simulation): post-simulation re-score fixes stale-sim pipeline sequencing
The deep forecast worker runs before the current run's simulation completes,
so applySimulationMerge always received a prior-run simulation (isCurrentRun=false).
Prior-run stateIds don't match current-run paths → adjustments=[], bucketChannelMatch
never fires, near-threshold paths stay rejected.
Fix (Option A): fire-and-forget re-score after writeSimulationOutcome completes.
- buildForecastEvalPayload(): stores full selectedPaths/rejectedPaths (with direct/
second/third hypothesis data) in forecast-eval.json so re-score can rebuild the
impact expansion bundle for promoted paths
- writeForecastTraceArtifacts(): writes forecast-eval.json alongside path-scorecards
- applyPostSimulationRescore(): loads forecast-eval + snapshot from R2, reconstructs
evaluation, calls applySimulationMerge with isCurrentRun=true, re-writes trace
artifacts if any path was promoted or demoted
- processNextSimulationTask(): fire-and-forget call to applyPostSimulationRescore
after writeSimulationOutcome + completeSimulationTask
Guard: only re-scores when status='completed_no_material_change' OR a rejected
expanded path is in the 0.42–0.50 acceptanceScore band (within reach of +0.08
bucketChannelMatch bonus). Skips silently otherwise.
Tests: R-1 through R-5 covering skip conditions, no-change case, promotion of a
near-threshold rejected path, and marketContext fallback for invalid LLM channel keys.
* fix(simulation): correct post-rescore guard and generatedAt derivation (P1+P2)
P1: Replace hasNearThreshold (checked rejected paths for promotion — no-op for
`completed` status since applySimulationMerge has no completed+promotion case)
with hasDemotionRisk (checks selected paths with acceptanceScore < 0.62, where
a -0.12 invalidator could push below the 0.50 acceptance threshold).
P2: Use freshOutcome.generatedAt as primary timestamp instead of
parseForecastRunGeneratedAt(runId). RunId carries the run-start timestamp; R2
artifact paths use data.generatedAt at write time. UTC midnight crossings cause
a key mismatch, silently returning no_eval_data. Fallback to runId parse only
when freshOutcome.generatedAt is absent.
* fix(simulation): add CASE 3 for partial demotion + skip duplicate pointer on rescore
applySimulationMerge CASE 3: handles completed run where some (not all) expanded
paths are demoted or promoted. Previously only CASE 1 (completed_no_material_change
→ completed) and CASE 2 (completed → completed_no_material_change) existed. With
the hasDemotionRisk guard in applyPostSimulationRescore, CASE 3 is now reachable:
selected expanded paths are non-empty after adjustment, status stays 'completed',
but evaluation.selectedPaths was never updated. Fixes the silent no-op: demoted
path stayed in selectedPaths in the published evaluation. Bundle and worldState are
rebuilt from the surviving expanded paths.
writeForecastTraceArtifacts: add context.skipPointer to suppress writeForecastTracePointer
on rescore re-writes. Rescore overwrites the same R2 artifact keys — the LPUSH
was consuming a history slot and evicting an older unique run earlier than intended.
readForecastWorldStateHistory dedupes by worldStateKey so no functional double-read,
but the extra slot consumed is still waste. Pass skipPointer: true from rescore call site.
Add R-6: partial demotion in completed run reproduces the two-path case (0.55 and
0.75 selected, invalidator fires on 0.55) and asserts selectedPaths is updated.
* fix(simulation): add hasPromotionOpportunity guard for completed-run rescore
hasDemotionRisk only checked selected paths, leaving completed runs with
near-threshold rejected paths (0.42-0.50) incorrectly skipped as
no_actionable_paths. applySimulationMerge CASE 3 now correctly handles
completed+promotion, but the gate never let it through.
Add hasPromotionOpportunity: check rejected expanded paths with
acceptanceScore in [0.42, 0.50) — a +0.08 bucketChannelMatch bonus
can push them over threshold. The outer guard now passes if either
hasDemotionRisk or hasPromotionOpportunity is true.
Also drop the evalData.status === 'completed' guard from both checks:
the outer condition (status !== 'completed_no_material_change') already
handles routing; the inner checks don't need to repeat it.
Add R-7: completed eval with safe selected (0.75) + near-threshold
rejected (0.44), simulation matches rejected path → +0.08 → 0.52 →
promoted via CASE 3, status stays completed, both paths in selectedPaths.
* fix(simulation): correct actionable-path guard thresholds to match max adjustments
The guard hardcoded 0.42/0.62 based on the +0.08 (bucketChannelMatch-only) and
-0.12 (invalidator) adjustments. computeSimulationAdjustment can reach +0.12
with actor overlap (>= 2 actors: +0.08 + 0.04) and -0.15 with a stabilizer hit.
Correct bounds derived from actual max adjustments:
max positive: +0.12 → promotion window lower bound: 0.50 - 0.12 = 0.38 (was 0.42)
max negative: -0.15 → demotion risk upper bound: 0.50 + 0.15 = 0.65 (was 0.62)
Production effects of old bounds:
- Rejected path at 0.39 with two matching actors (+0.12 → 0.51) was skipped
- Selected path at 0.64 with a matching stabilizer (-0.15 → 0.49) was skipped
Add R-8: path at 0.39 + two-actor overlap (+0.12 total) → 0.51 → promoted.
Add R-9: path at 0.64 + stabilizer (-0.15) → 0.49 → demoted via CASE 3.
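The corrected guard derivation, as a sketch (constant names are illustrative):

```javascript
// Bounds derived from the actual max adjustments rather than hardcoded.
const ACCEPTANCE_THRESHOLD = 0.50;
const MAX_POSITIVE = 0.12; // bucketChannelMatch +0.08 + actor overlap +0.04
const MAX_NEGATIVE = 0.15; // stabilizer hit

const PROMOTION_LOWER = ACCEPTANCE_THRESHOLD - MAX_POSITIVE; // 0.38 (was 0.42)
const DEMOTION_UPPER = ACCEPTANCE_THRESHOLD + MAX_NEGATIVE;  // 0.65 (was 0.62)

const hasPromotionOpportunity = (rejectedPaths) => rejectedPaths.some(
  (p) => p.acceptanceScore >= PROMOTION_LOWER && p.acceptanceScore < ACCEPTANCE_THRESHOLD);

const hasDemotionRisk = (selectedPaths) => selectedPaths.some(
  (p) => p.acceptanceScore < DEMOTION_UPPER);
```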
THEATER_GEO_GROUPS previously mapped both 'Middle East' and 'Red Sea' to
'MENA', causing buildSimulationPackageFromDeepSnapshot to silently drop one
when both were candidates. With an active Iran/Hormuz conflict, Red Sea's
higher rankingScore could evict the Hormuz theater entirely.
Split into 'MENA_Gulf' (Middle East, Persian Gulf) and 'MENA_RedSea' (Red Sea)
so both theaters are independently eligible for simulation selection.
Tests: updated geo-dedup test to assert 3 selected theaters (Hormuz + Red Sea
+ Malacca), added within-group dedup test (Red Sea vs Suez Canal still compete).
Bug 1 (shape): pathBucket fallback read root candidatePacket.topBucketId
which is always undefined for new-schema packets — the grounded bucket
lives at candidatePacket.marketContext.topBucketId. Fixed by adding the
nested path before the legacy root fallback.
Bug 2 (semantic): path.direct.channel carries LLM marketImpact enum
values (supply_disruption, price_spike, risk_off, etc.) that are disjoint
from CHANNEL_KEYWORDS keys — only 3 of 11 are exact matches. The grounded
fallback marketContext.topChannel was never reached because the LLM value
is always truthy. Fixed with tiered Object.hasOwn() validation: use direct
channel if it is a known key, else fall back to marketContext.topChannel
(nested) or topChannel (legacy flat), else return '' to prevent greedy
literal substring matching.
Also adds resolvedChannel and channelSource to adjustment details for
runtime observability in debug artifacts.
5 new tests cover: valid direct key, invalid→nested market fallback,
invalid→flat legacy fallback, invalid+empty→no match, and the production
case where both bucket and channel are absent from direct and must resolve
from marketContext.
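The tiered resolution can be sketched like this (the keyword table is an illustrative subset, and the exact fallback order beyond what the commit states is assumed):

```javascript
// Illustrative subset of the real keyword table.
const CHANNEL_KEYWORDS = {
  risk_off_rotation: ['risk off', 'flight to safety'],
  security_escalation: ['military', 'geopolit'],
};

function resolveChannel(path, candidatePacket) {
  const direct = path?.direct?.channel;
  // Tier 1: LLM value, but only when it is a known keyword key.
  if (direct && Object.hasOwn(CHANNEL_KEYWORDS, direct)) {
    return { resolvedChannel: direct, channelSource: 'direct' };
  }
  // Tier 2: grounded market context (nested new schema, then legacy flat).
  const market = candidatePacket?.marketContext?.topChannel ?? candidatePacket?.topChannel;
  if (market) return { resolvedChannel: market, channelSource: 'market' };
  // Tier 3: empty string prevents greedy literal substring matching downstream.
  return { resolvedChannel: '', channelSource: 'none' };
}
```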
- Add scripts/seed-forecasts.types.d.ts with ambient interfaces for all
simulation data structures (CandidatePacket, TheaterResult, SimulationOutcome,
ExpandedPath, SimulationEvidence, SimulationAdjustmentDetail)
- Add // @ts-check + @param JSDoc to contradictsPremise, negatesDisruption,
computeSimulationAdjustment, applySimulationMerge
- Add scripts/jsconfig.json to enable tsc --checkJs on seed-forecasts.mjs
- Remove dead legacy fallbacks candidatePacket?.topBucketId and
candidatePacket?.topChannel (these fields were never at top level in
production; the correct path is candidatePacket.marketContext.topBucketId).
TypeScript now enforces this at write time.
- Update all test fixtures to use marketContext: { topBucketId, topChannel }
shape, matching production CandidatePacket structure
Two matching bugs causing adjustment=0 even after #2402+#2404:
1. commodityKey underscore: code checked text.includes('crude_oil') but
LLM-generated text always uses natural language ('crude oil'). Fixed
by .replace(/_/g, ' ') in both contradictsPremise and negatesDisruption.
2. CHANNEL_KEYWORDS too narrow: risk_off_rotation only matched 4 exact
phrases ('risk off', 'risk aversion', 'flight to safety', 'sell off').
Simulation paths use broader language ('capital flight', 'risk premium',
'sell-off', 'retreat', etc.). Expanded to cover LLM output vocabulary.
Also expanded security_escalation to include 'military', 'geopolit'.
Update test: negatesDisruption commodity test now uses natural language
('crude oil') matching what LLM actually generates, not the internal key.
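The underscore fix amounts to normalizing the key before the substring check (sketch; helper name is illustrative):

```javascript
// Internal keys use underscores ('crude_oil'); LLM prose uses natural
// language ('crude oil'). Normalize the key before substring matching.
function mentionsCommodity(text, commodityKey) {
  const needle = String(commodityKey).replace(/_/g, ' ').toLowerCase();
  return String(text).toLowerCase().includes(needle);
}
```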
* fix(simulation): store candidateStateId in theaterResults to enable Phase 3 merge
applySimulationMerge looks up paths by candidateStateId but theaterResults
only stored theaterId ("theater-1"). The map lookup always returned undefined,
silently no-oping all simulationAdjustment writes.
Fix: write candidateStateId alongside theaterId in theaterResults, and key
the simByTheater map by candidateStateId with theaterId as fallback.
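The map keying, sketched (function name is illustrative):

```javascript
// Key by candidateStateId so applySimulationMerge's path lookup matches;
// theaterId ('theater-1') remains only as a legacy fallback.
function buildSimByTheater(theaterResults) {
  const map = new Map();
  for (const result of theaterResults) {
    map.set(result.candidateStateId ?? result.theaterId, result);
  }
  return map;
}
```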
* fix(e2e): add missing Earthquake fields to map-harness test fixture
* Revert "fix(e2e): add missing Earthquake fields to map-harness test fixture"
This reverts commit 69d8930b88.
* test(simulation): add T13 to cover positional theaterId vs candidateStateId mismatch
* feat(simulation): Phase 3 outcome re-ingestion with simulationAdjustment scoring
Parse simulation-outcome.json back into the deep forecast evaluation pipeline
as a simulationEvidence lane. Simulation informs the forecast without replacing
structural validation or observed world signals.
Changes:
- computeSimulationAdjustment: +0.08 bucket/channel match, +0.04 actor overlap,
-0.12 invalidator contradiction, -0.15 stabilizer negation (spec lines 447-478)
- applySimulationMerge: post-evaluation pass that demotes/promotes expanded paths
using mergedAcceptanceScore = clamp01(acceptanceScore + simulationAdjustment)
- fetchSimulationOutcomeForMerge: reads R2 full outcome; accepts current run or
previous run under 6h old; gracefully returns null when unavailable
- processDeepForecastTask: wires in simulation merge between evaluateDeepForecastPaths
and writeForecastTraceArtifacts; wrapped in try/catch for non-blocking degradation
- buildImpactExpansionDebugPayload: adds simulationEvidence field to debug artifacts
- summarizeImpactPathScore: adds simulationAdjustment + mergedAcceptanceScore fields
- 24 new tests (203 total, 0 failures) covering all adjustment rules, demotion,
promotion, matching helpers, and debug payload field
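The four adjustment rules and the merged score reduce to this sketch (the real trigger predicates are the matching helpers above; they are simplified to booleans here):

```javascript
const clamp01 = (x) => Math.min(1, Math.max(0, x));

// Adjustment values from the commit (spec lines 447-478).
function computeAdjustment({ bucketChannelMatch, actorOverlap, invalidatorHit, stabilizerHit }) {
  let adj = 0;
  if (bucketChannelMatch) adj += 0.08;
  if (actorOverlap) adj += 0.04;
  if (invalidatorHit) adj -= 0.12;
  if (stabilizerHit) adj -= 0.15;
  return adj;
}

const mergedAcceptanceScore = (path, adj) => clamp01(path.acceptanceScore + adj);
```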
* feat(simulation): replace maritime type gate with significance-based eligibility (#2342)
* feat(simulation): replace maritime type gate with significance-based eligibility
isMaritimeChokeEnergyCandidate was a hardcoded filter requiring a route
in CHOKEPOINT_MARKET_REGIONS plus an energy/freight bucket. Any theater
driven by rate hikes, sovereign stress, political instability, volcanic
events, or infrastructure failure was silently dropped before simulation.
Replace with isSimulationEligible: rankingScore >= 0.40 + hasBucket +
hasCandidateStateId. The ranking formula already encodes signal lift,
transmission strength, and specificity — theater type is irrelevant.
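A sketch of the gate (the exact field name behind the bucket check is an assumption):

```javascript
// Theater-agnostic eligibility: rankingScore >= 0.40 + hasBucket +
// hasCandidateStateId; theater type no longer matters.
function isSimulationEligible(theater) {
  return Number(theater?.rankingScore) >= 0.40
    && Boolean(theater?.topBucketId)        // assumed bucket field name
    && Boolean(theater?.candidateStateId);
}
```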
Also generalize buildSimulationRequirementText with stateKind-branched
prose templates (market_repricing, political_instability, security_
escalation, infrastructure_fragility, cyber_pressure, fallback) and
extend buildSimulationPackageConstraints with two new constraint classes:
macro_financial_posture (soft) and structural_event_premise (hard:true)
to prevent MiroFish hallucinating maritime artifacts for non-maritime
theaters.
Maritime simulations are fully backward-compatible: maritime_disruption
stateKind still produces identical requirement text and constraints.
Tests: 213 pass (8 new eligibility tests, 5 requirement-text tests,
4 constraint tests replacing 7 old maritime-specific tests).
* test(simulation): fill coverage gaps in theater-agnostic eligibility tests
Three untested paths identified post-review:
T-R6: cyber_pressure branch in buildSimulationRequirementText was
unexercised. Tests "systems availability" and "financial network
continuity" appear.
T-R7: governance_pressure was aliased to the political_instability
template but had no dedicated test. Confirms same prose as T-R3.
T-C5: political_instability + sovereign_risk bucket stacked-constraint
case. sovereign_risk IS in MACRO_FIN_BUCKETS, so this theater generates
both macro_financial_posture (soft) AND structural_event_premise (hard).
T-C2 only asserted the latter; T-C5 explicitly verifies both are
present and have the correct hard/soft values.
Also confirmed: the "critical bug" theory (rankingScore absent on theater
objects) is not a bug — buildSimulationPackageFromDeepSnapshot copies
rankingScore onto each theater object at L12593.
2388 tests pass.
* fix(simulation): theater-agnostic evaluation targets and symmetric demotion
Two P1 issues from PR review:
1. buildSimulationPackageEvaluationTargets still had maritime-specific
questions hard-coded for all theater types: "disruption at ${route}",
"freight rate delta on affected trade lanes", "energy price direction
($/bbl or %)". For market_repricing/political/cyber theaters this gave
MiroFish contradictory instructions — constraints said no chokepoints,
eval targets asked it to model route disruptions.
Fix: extract buildEvalTargetQuestions() branching on stateKind (7
branches: maritime_disruption, market_repricing, political_instability,
governance_pressure, security_escalation, infrastructure_fragility,
cyber_pressure, fallback). Maritime branch is identical to original.
Also generalize the T+24h timing marker description and the round-1
prompt market_cascade instruction.
2. contradictsPremise() and negatesDisruption() only matched on
routeFacilityKey and commodityKey. For the newly eligible theater types
(blank route/commodity), invalidators and stabilizers could never fire
the -0.12 / -0.15 demotion. New theaters were promotable but not
demotable — asymmetric scoring.
Fix: when route+commodity are both blank, fall back to subject keywords
derived from stateKind and topBucketId (split on _, keep >= 4 chars).
Maritime theater behavior is unchanged (early exit on first branch).
Tests: 2393 pass (+5 new: contradictsPremise non-maritime match,
negatesDisruption non-maritime match and non-match, evalTargets
market_repricing has no maritime framing, negatesDisruption
description fix).
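The blank-route keyword fallback described above can be sketched as follows; the function name and exact token rules here are assumptions for illustration, not the shipped implementation:

```javascript
// Derive subject keywords when route + commodity are both blank:
// split stateKind and topBucketId on '_', keep tokens of >= 4 chars.
function deriveSubjectKeywords(stateKind, topBucketId) {
  return [...new Set(
    [stateKind, topBucketId]
      .filter(Boolean)
      .flatMap((key) => key.split('_'))
      .filter((token) => token.length >= 4),
  )];
}

deriveSubjectKeywords('market_repricing', 'sovereign_risk');
// → ['market', 'repricing', 'sovereign', 'risk']
```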
---------
Co-authored-by: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>
Both buildSimulationPackageEvaluationTargets and buildSimulationPackageConstraints
returned flat arrays. The prompt builders consumed them as
pkg.evaluationTargets?.[theater.theaterId] — object lookup on an array always
returns undefined, falling back to the generic string on every run.
Fix: both builders now return Record<theaterId, ...>. Prompt builders updated
to properly render constraint class/statement and evalTarget pathType/question.
Result: evaluation questions (escalation/containment/market_cascade) and
WorldMonitor structural constraints now reach the LLM on every simulation run.
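The failure mode reduces to a one-liner: a bracket lookup with a string key on an array is always undefined unless that exact property exists on the array object. A minimal repro (shapes illustrative):

```javascript
// Flat array, as the builders previously returned.
const evaluationTargets = [{ theaterId: 'theater-1', question: 'escalation?' }];
console.log(evaluationTargets['theater-1']); // undefined — arrays index by position

// Record<theaterId, ...>, as the builders return after the fix.
const byTheater = { 'theater-1': { question: 'escalation?' } };
console.log(byTheater['theater-1'].question); // 'escalation?'
```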
* feat(simulation): geographic theater diversity + market_cascade economic paths
Three improvements to the MiroFish simulation pipeline:
1. Geographic deduplication: adds THEATER_GEO_GROUPS constant mapping
CHOKEPOINT_MARKET_REGIONS values to macro-groups (MENA, AsiaPacific,
EastEurope, etc.). buildSimulationPackageFromDeepSnapshot now skips
candidates whose macro-group is already represented, preventing
Red Sea + Middle East (both MENA) from appearing as separate theaters.
2. Label cleanup: strips trailing (stateKind) parenthetical from theater
labels before writing to selectedTheaters, so "Black Sea maritime
disruption state (supply_chain)" becomes "Black Sea maritime disruption
state" in the UI.
3. market_cascade path: renames spillover → market_cascade across 4 sites
(evaluationTargets, Round 1 prompt + JSON template, Round 2 prompt +
JSON template, tryParseSimulationRoundPayload expectedIds). The
market_cascade path instructs the LLM to model 2nd/3rd order economic
consequences: energy price direction ($/bbl), freight rate delta,
downstream sector impacts, and FX stress on import-dependent economies.
Tests: 176 pass (3 net new — geo-dedup, label cleanup, market_cascade
prompt; plus updated entity-collision and path-validation tests).
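The macro-group dedup in item 1 can be sketched like this; THEATER_GEO_GROUPS is abridged to a few illustrative entries and the candidate shape is assumed:

```javascript
// Map region values to macro-groups; unknown regions fall back to themselves.
const THEATER_GEO_GROUPS = {
  'Red Sea': 'MENA',
  'Middle East': 'MENA',
  'Black Sea': 'EastEurope',
};

function dedupeTheatersByMacroGroup(candidates) {
  const seenGroups = new Set();
  return candidates.filter((candidate) => {
    const group = THEATER_GEO_GROUPS[candidate.region] ?? candidate.region;
    if (seenGroups.has(group)) return false; // macro-group already represented
    seenGroups.add(group);
    return true;
  });
}
```

With this, Red Sea and Middle East (both MENA) collapse into one theater slot while Black Sea survives.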
🤖 Generated with Claude Sonnet 4.6 via Claude Code (https://claude.ai/claude-code) + Compound Engineering v2.49.0
Co-Authored-By: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>
* docs: fix markdownlint MD032 in simulation diversity plan
---------
Co-authored-by: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>
* feat(simulation): MiroFish Phase 2 — theater-limited simulation runner
Adds the simulation execution layer that consumes simulation-package.json
and produces simulation-outcome.json for maritime chokepoint + energy/logistics
theaters, closing the WorldMonitor → MiroFish handoff loop.
Changes:
- scripts/seed-forecasts.mjs: 2-round LLM simulation runner (prompt builders,
JSON extractor, runTheaterSimulation, writeSimulationOutcome, task queue
with NX dedup lock, runSimulationWorker poll loop)
- scripts/process-simulation-tasks.mjs: standalone worker entry point
- proto: GetSimulationOutcome RPC + make generate
- server/worldmonitor/forecast/v1/get-simulation-outcome.ts: RPC handler
- server/gateway.ts: slow tier for get-simulation-outcome
- api/health.js: simulationOutcomeLatest in STANDALONE + ON_DEMAND keys
- tests: 14 new tests for simulation runner functions
* fix(simulation): address P1/P2 code review findings from PR #2220
Security (P1 #018):
- sanitizeForPrompt() applied to all entity/seed fields interpolated into
Round 1 prompt (entityId, class, stance, seedId, type, timing)
- sanitizeForPrompt() applied to actorId and entityIds in Round 2 prompt
- sanitizeForPrompt() + length caps applied to all LLM array fields written
to R2 (dominantReactions, stabilizers, invalidators, keyActors, timingMarkers)
Validation (P1 #019):
- Added validateRunId() regex guard
- Applied in enqueueSimulationTask() and processNextSimulationTask() loop
Type safety (P1 #020):
- Added isOutcomePointer() and isPackagePointer() type guards in TS handlers
- Replaced unsafe as-casts with runtime-validated guards in both handlers
Correctness (P2 #022):
- Log warning when pkgPointer.runId does not match task runId
Architecture (P2 #024):
- isMaritimeChokeEnergyCandidate() accepts both flat and nested topBucketId
- Call site simplified to pass theater directly
Performance (P2 #025):
- SIMULATION_ROUND1_MAX_TOKENS raised from 1800 to 2200
- Added max 3 initialReactions instruction to Round 1 prompt
Maintainability (P2 #026):
- Simulation pointer keys exported from server/_shared/cache-keys.ts
- Both TS handlers import from shared location
Documentation (P2 #027):
- Strengthened runId no-op description in proto and OpenAPI spec
* fix(todos): add blank lines around lists in markdown todo files
* style(api): reformat openapi yaml to match linter output
* test(simulation): add flat-shape filter test + getSimulationOutcome handler coverage
Two tests identified as missing during PR #2220 review:
1. isMaritimeChokeEnergyCandidate flat-shape tests — covers the || candidate.topBucketId
normalization added in the P1/P2 review pass. The existing tests only used the nested
marketContext.topBucketId shape; this adds the flat root-field shape that arrives from
the simulation-package.json JSON (selectedTheaters entries have topBucketId at root).
2. getSimulationOutcome handler structural tests — verifies the isOutcomePointer guard,
found:false NOT_FOUND return, found:true success path, note population on runId mismatch,
and redis_unavailable error string. Follows the readSrc static-analysis pattern used
elsewhere in server-handlers.test.mjs (handler imports Redis so full integration test
would require a test Redis instance).
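The flat/nested normalization those tests cover can be sketched as below; the field names follow the commit text, the fallback ordering is an assumption:

```javascript
// Accept topBucketId nested under marketContext (deep-snapshot shape) or
// at the root (simulation-package.json selectedTheaters shape).
function getTopBucketId(candidate) {
  return candidate?.marketContext?.topBucketId || candidate?.topBucketId || '';
}
```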
* feat(forecast): Phase 2 simulation package read path (getSimulationPackage RPC + Redis existence key)
- writeSimulationPackage now writes forecast:simulation-package:latest to Redis after
successful R2 write, containing { runId, pkgKey, schemaVersion, theaterCount, generatedAt }
with TTL matching TRACE_REDIS_TTL_SECONDS (60 days)
- New getSimulationPackage RPC handler reads Redis key, returns pointer metadata without
requiring an R2 fetch (zero R2 cost for existence check)
- Wired into ForecastServiceHandler and server/gateway.ts cache tier (medium)
- Proto: GetSimulationPackage RPC + get_simulation_package.proto message definitions
- api/health.js: simulationPackageLatest added to STANDALONE_KEYS + ON_DEMAND_KEYS
- Tests: SIMULATION_PACKAGE_LATEST_KEY constant + writeSimulationPackage null-guard test
Closes todo #017 (Phase 2 prerequisites for MiroFish integration)
* chore(generated): regenerate proto types for GetSimulationPackage RPC
* fix(simulation-rpc): distinguish Redis failure from not-found; signal runId mismatch
- Add `error` field to GetSimulationPackageResponse: populated with
"redis_unavailable" on Redis errors so callers can distinguish a
healthy not-found (found=false, error="") from a Redis failure
(found=false, error="redis_unavailable"). Adds console.warn on error.
- Add `note` field: populated when req.runId is supplied but does not
match the latest package's runId, signalling that per-run filtering
is not yet active (Phase 3).
- Add proto comment on run_id: "Currently ignored; reserved for Phase 3"
- Add milliseconds annotation to generated_at description.
- Simplify handler: extract NOT_FOUND constant, remove SimulationPackagePointer
interface, remove || '' / || 0 guards on guaranteed-present fields.
- Regenerate all buf-generated files.
Fixes todos #018 (runId silently ignored) and #019 (error indistinguishable
from not-found). Also resolves todos #022 (simplifications) and #023
(OpenAPI required fields / generatedAt unit annotation).
* fix(simulation-rpc): change cache tier from medium to slow (aligns with deep-run update frequency)
* fix(simulation-rpc): fix key prefixing, make Redis errors reachable, no-cache not-found
Three P1 regressions caught in external review:
1. Key prefix bug: getCachedJson() applies preview:<sha>: prefix in non-production
environments, but writeSimulationPackage writes the raw key via a direct Redis
command. In preview/dev the RPC always returned found:false even when the package
existed. Fix: new getRawJson() in redis.ts always uses the unprefixed key AND throws
on failure instead of swallowing errors.
2. redis_unavailable unreachable: getCachedJson swallows fetch failures
and missing credentials by returning null, so the catch block for
redis_unavailable was dead
code. getRawJson() throws on HTTP errors and missing credentials, making the
error: "redis_unavailable" contract actually reachable.
3. Negative-cache stampede: slow tier caches every 200 GET. A request before any deep
run wrote a package returned { found:false } which the CDN cached for up to 1h,
breaking post-run discovery. Fix: markNoCacheResponse() on both not-found and
error paths so they are served fresh on every request.
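A sketch of the getRawJson contract described in points 1 and 2 — unprefixed key, throws on missing credentials or HTTP failure instead of swallowing them as null. The env var names and REST shape are assumptions for illustration:

```javascript
// Read a raw (unprefixed) key from an Upstash-style Redis REST endpoint.
// Unlike getCachedJson, failures throw so "redis_unavailable" is reachable.
async function getRawJson(key) {
  const url = process.env.UPSTASH_REDIS_REST_URL;
  const token = process.env.UPSTASH_REDIS_REST_TOKEN;
  if (!url || !token) throw new Error('redis_unavailable'); // was: silent null
  const res = await fetch(`${url}/get/${encodeURIComponent(key)}`, {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) throw new Error('redis_unavailable');
  const body = await res.json();
  return body.result == null ? null : JSON.parse(body.result);
}
```

A healthy not-found still resolves to null; only transport and credential failures throw.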
* feat(deep-forecast): Phase 1 simulation package export contract
Add buildSimulationPackageFromDeepSnapshot and writeSimulationPackage to
produce simulation-package.json alongside deep-snapshot.json on every eligible
fast run, completing Phase 1 of the WorldMonitor → MiroFish bridge defined in
docs/internal/wm-mirofish-gap.md.
Phase 1 scope: maritime chokepoint + energy/logistics theaters only.
A candidate qualifies if its routeFacilityKey is a known chokepoint in
CHOKEPOINT_MARKET_REGIONS and the top bucket is energy or freight, or the
commodityKey is an energy commodity.
Package shape (schemaVersion: v1):
- selectedTheaters: top 1–3 qualifying candidates with theater ID, route,
commodity, bucket, channel, and rankingScore
- simulationRequirement: deterministic template per theater (no LLM, fully
cacheable), built from label, stateKind, route, commodity, channel, and
criticalSignalTypes
- structuralWorld: filtered stateUnits, worldSignals, transmission edges,
market buckets, situationClusters, situationFamilies touching the theater
- entities: extracted from actorRegistry (forecastId overlap), stateUnit
actors, and evidence table actor entries; classified into 7 entity classes
(state_actor, military_or_security_actor, regulator_or_central_bank,
exporter_or_importer, logistics_operator, market_participant,
media_or_public_bloc); falls back to anchor set if extraction finds nothing
- eventSeeds: headline evidence → live_news, disruption-keyword signal
evidence → observed_disruption; T+0h timing; relative timing format
- constraints: route_chokepoint_status (hard if criticalSignalLift ≥ 0.25),
commodity_exposure (always hard), market_admissibility (soft, channel
routing), known_invalidators (soft, when contradictionScore ≥ 0.10)
- evaluationTargets: deterministic escalation/containment/spillover path
questions + T+24h/T+48h/T+72h timing markers per theater
Also adds 6 missing chokepoints to CHOKEPOINT_MARKET_REGIONS:
Baltic Sea, Danish Straits, Strait of Gibraltar, Panama Canal,
Lombok Strait, Cape of Good Hope.
writeSimulationPackage fires-and-forgets after writeDeepForecastSnapshot
so it does not add latency to the critical seed path.
17 new unit tests covering: theater filter, package shape, simulationRequirement
content, eventSeeds, constraints (hard/soft), evaluationTargets structure,
entity extraction, key format, and 3-theater cap.
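The Phase 1 eligibility rule stated above can be sketched as follows; the registry is abridged to two entries and the and/or grouping is an assumption read from the prose:

```javascript
// Known chokepoints and energy commodities (abridged, illustrative).
const CHOKEPOINT_MARKET_REGIONS = { 'Strait of Hormuz': true, 'Suez Canal': true };
const ENERGY_COMMODITIES = new Set(['crude_oil', 'lng', 'natural_gas']);

// A candidate qualifies if its route is a known chokepoint AND either the
// top bucket is energy/freight or the commodity is an energy commodity.
function isMaritimeChokeEnergyCandidate(candidate) {
  if (!(candidate.routeFacilityKey in CHOKEPOINT_MARKET_REGIONS)) return false;
  const bucketOk =
    candidate.topBucketId === 'energy' || candidate.topBucketId === 'freight';
  return bucketOk || ENERGY_COMMODITIES.has(candidate.commodityKey);
}
```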
🤖 Generated with Claude Sonnet 4.6 via Claude Code (https://claude.ai/claude-code) + Compound Engineering v2.49.0
Co-Authored-By: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>
* fix(simulation-package): address P1+P2 code review issues from PR #2204
P1 fixes:
- inferEntityClassFromName: use word-boundary regex to prevent "force"
substring false positives (e.g. "workforce", "Salesforce")
- buildSimulationPackageEntities: key Map entries by candidateStateId
instead of dominantRegion to prevent collision across theaters
sharing the same region
- writeSimulationPackage call site: pass priorWorldState so
actorRegistry is available to buildSimulationPackageFromDeepSnapshot
P2 fixes:
- buildSimulationRequirementText: apply sanitizeForPrompt to
theater.label, stateKind, topChannel, and critTypes before
string interpolation (stored prompt injection risk)
- buildSimulationPackageEventSeeds: apply sanitizeForPrompt to
entry.text before .slice(0, 200)
- isMaritimeChokeEnergyCandidate: replace new Set() allocation
per call with Array.includes for 2-element arrays
- buildSimulationPackageEntities: convert allForecastIds to Set
before actor registry loop (O(n²) → O(n))
- buildSimulationPackageEvaluationTargets: add missing candidate
guard with console.warn when candidate is undefined for theater
- selectedTheaters map: add label fallback to dominantRegion /
'unknown theater' to prevent "undefined" in simulationRequirement
Tests: 6 new unit tests covering the word-boundary fix, entity key
collision, injection stripping, and undefined label guard
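The word-boundary fix reduces to the difference between substring matching and `\b`-anchored matching; the actual keyword list in inferEntityClassFromName is longer than this sketch:

```javascript
// Word-boundary match: "force" as a standalone word, not as a substring.
const MILITARY_WORD_RE = /\b(force|forces|army|navy)\b/i;

MILITARY_WORD_RE.test('Rapid Support Forces'); // true
MILITARY_WORD_RE.test('Salesforce');           // false — no boundary before "force"
MILITARY_WORD_RE.test('workforce');            // false
```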
---------
Co-authored-by: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>
* fix(deep-forecast): lower acceptance threshold 0.60→0.50 to match real score distribution
computeDeepPathAcceptanceScore formula: pathScore×0.55 + quality×0.20 + coherence×0.15
With pathScore≈0.65, quality≈0.30, coherence≈0.55:
0.3575 + 0.0600 + 0.0825 = 0.50
The 0.60 threshold was calibrated before understanding that reportableQualityScore
is constrained by world-state simulation geometry (not hypothesis quality), and
coherence loses 0.15 for generic candidates without routeFacilityKey. The threshold
was structurally unreachable with typical expanded paths.
Verified end-to-end: deep worker now returns [DeepForecast] completed.
Also updates T6 gateDetails assertion and renames the rejection-floor test to
correctly describe the new behavior (strong inputs should be accepted).
111/111 tests pass.
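The threshold arithmetic above, as stated in this commit, works out as follows (weights and example inputs taken from the commit text):

```javascript
// pathScore×0.55 + quality×0.20 + coherence×0.15
function computeDeepPathAcceptanceScore(pathScore, quality, coherence) {
  return pathScore * 0.55 + quality * 0.2 + coherence * 0.15;
}

const score = computeDeepPathAcceptanceScore(0.65, 0.3, 0.55);
// 0.3575 + 0.0600 + 0.0825 = 0.50 — exactly at the new threshold,
// structurally unreachable against the old 0.60 gate
```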
* feat(deep-forecast): autoresearch prompt self-improvement loop + T9/T10 tests
- Add scoreImpactExpansionQuality() locked scorer: commodity rate (35%),
variable diversity (35%), chain coverage (20%), mapped rate (10%)
- Add runImpactExpansionPromptRefinement(): rate-limited LLM critic loop
(30min cooldown) that reads learned section from Redis, scores current
run, generates critique if composite < 0.62, tests on same candidates,
commits to forecast:prompt:impact-expansion:learned if score improves
- buildImpactExpansionSystemPrompt() now accepts learnedSection param,
appends it after core rules with separator so model sees prior examples
- buildImpactExpansionCandidateHash() includes learnedFingerprint to
bust cache when learned section changes
- processDeepForecastTask reads learnedSection from Redis before LLM
call, runs refinement after both completed and no_material_change paths
- Export scoreImpactExpansionQuality + runImpactExpansionPromptRefinement
- T9: high commodity rate + chain coverage → composite > 0.70
- T10: no commodity + no chain coverage → composite < 0.40
- 113/113 tests pass
* fix(deep-forecast): raise autoresearch threshold 0.62→0.80 + fix JSON parse
- Threshold 0.62 was too low: commodity=1.00 + chain=1.00 compensated for
diversity=0.50 (all same chains), keeping composite at 0.775 → no critique
- Raise to 0.80 so diversity<0.70 triggers critique even with good commodity/chain
- Fix JSON parser to extract first {…} block (handles Gemini code-fence wrapping)
- Add per-hypothesis log in refinement breakdown for observability
- Add refinementQualityThreshold to gateDetails for self-documenting artifacts
- Verified: critique fires on diversity=0.50 run, committed Hormuz/Baltic/Suez
region-specific chain examples (score 0.592→0.650)
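First-{…}-block extraction for code-fenced LLM replies can be sketched with brace-depth counting; this ignores braces inside JSON strings, which is fine for a sketch but not for adversarial input:

```javascript
// Extract and parse the first balanced {...} block in a text blob,
// tolerating markdown code-fence wrapping around the JSON.
function extractFirstJsonBlock(text) {
  const start = text.indexOf('{');
  if (start === -1) return null;
  let depth = 0;
  for (let i = start; i < text.length; i += 1) {
    if (text[i] === '{') depth += 1;
    if (text[i] === '}' && (depth -= 1) === 0) {
      return JSON.parse(text.slice(start, i + 1));
    }
  }
  return null; // unbalanced braces
}
```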
* feat(deep-forecast): per-candidate parallel LLM calls replace batch expansion
Previously: all candidates → one batch LLM call → LLM averages context →
identical route_disruption → inflation_pass_through chains for all candidates.
Now: each candidate → its own focused LLM call (parallel Promise.all) →
LLM reasons about specific stateKind/region/routeFacility for that candidate.
Results (3 candidates, 3 parallel calls):
- composite: 0.592 → 0.831 (+0.24)
- commodity: 0.17 → 1.00 (all mapped have specific commodity)
- diversity: 0.50 → 0.83 (energy_export_stress, importer_balance_stress
appearing alongside route_disruption — genuinely different chains)
- Baseline updated: 0.831 (above 0.80 threshold → no critique needed)
Also threads learnedSection through extractSingleImpactExpansionCandidate
so the learned examples from autoresearch apply to each focused call.
Per-candidate cache keys (already existed) now serve as primary cache.
* fix(tests): update recovery test for per-candidate LLM call flow
- Change stage mock from impact_expansion → impact_expansion_single
(batch primary path removed, per-candidate is now primary)
- Assert parseMode === per_candidate instead of parseStage /^recovered_/
(recovered_ prefix was only set by old batch_repair path)
- 2257/2257 tests pass
* fix(deep-forecast): add Red Sea, Persian Gulf, South China Sea to chokepoint map
Candidate packets had routeFacilityKey=none for Red Sea / Persian Gulf /
Baltic Sea signals because prediction titles say "Red Sea maritime disruption"
not "Bab el-Mandeb" or "Strait of Hormuz". CHOKEPOINT_MARKET_REGIONS only
had sub-facility names (Bab el-Mandeb, Suez Canal) as keys, not the sea
regions themselves.
Fix: add Red Sea, Persian Gulf, Arabian Sea, Black Sea, South China Sea,
Mediterranean Sea as direct keys so region-level candidate titles resolve.
Result: LLM user prompt now shows routeFacilityKey=Red Sea / Persian Gulf /
Baltic Sea per candidate — giving each focused call the geographic context
needed to generate route-specific chains.
- Autoresearch baseline updated 0.932→0.965 on this run
- T8 extended with Red Sea, Persian Gulf, South China Sea assertions
- 2257/2257 tests pass
* feat(deep-forecast): free-form hypothesis schema + remove registry constraint
- Bump IMPACT_EXPANSION_REGISTRY_VERSION to v4
- Add hypothesisKey, description, geography, affectedAssets, marketImpact, causalLink fields to normalizeImpactHypothesisDraft (keep legacy fields for backward compat)
- Rewrite buildImpactExpansionSystemPrompt: remove IMPACT_VARIABLE_REGISTRY constraint table, use free-form ImpactHypothesis schema with geographic/commodity specificity rules
- Rewrite evaluateImpactHypothesisRejection: use effective key (hypothesisKey || variableKey) for dedup; legacy registry check only for old cached responses without hypothesisKey
- Update validateImpactHypotheses scoring: add geographyScore, commodityScore, causalLinkScore, assetScore terms; channelCoherence/bucketCoherence only apply to legacy responses
- Update parent-must-be-mapped invariant to use hypothesisKey || variableKey as effective key
- Update mapImpactHypothesesToWorldSignals: use effective key for dedup and sourceKey; prefer description/geography over legacy fields
- Update buildImpactPathsForCandidate: match on hypothesisKey || variableKey for parent lookup
- Update buildImpactPathId: use hypothesisKey || variableKey for hash inputs
- Rewrite scoreImpactExpansionQuality: add geographyRate and assetRate metrics; update composite weights
- Update buildImpactPromptCritiqueSystemPrompt/UserPrompt: use hypothesisKey-based chain format in examples
- Add new fields to buildImpactExpansionBundleFromPaths push calls
- Update T7 test assertion: MUST be the exact hypothesisKey instead of variableKey string
* fix(deep-forecast): update breakdown log to show free-form hypothesis fields
* feat(deep-forecast): add commodityDiversity metric to autoresearch scorer
- commodityDiversity = unique commodities / nCandidates (weight 0.35)
Penalizes runs where all candidates default to same commodity.
3 candidates all crude_oil → diversity=0.33 → composite ~0.76 → critique fires.
- Rebalanced composite weights: comDiversity 0.35, geo 0.20, keyDiversity 0.15, chain 0.10, commodityRate 0.10, asset 0.05, mappedRate 0.05
- Breakdown log now shows comDiversity + geo + keyDiversity
- Critique prompt updated: commodity_monoculture failure mode, diagnosis targets commodity homogeneity
- T9: added commodityDiversity=1.0 assertion (2 unique commodities across 2 candidates)
* refactor(deep-forecast): replace commodityDiversity with directCommodityDiversity + directGeoDiversity + candidateSpreadScore
Problem: measuring diversity on all mapped hypotheses misses the case where
one candidate generates 10 implications while others generate 0, or where
all candidates converge on the same commodity due to dominating signals.
Fix: score at the DIRECT hypothesis level (root causes only) and add
a candidate-spread metric:
- directCommodityDiversity: unique commodities among direct hypotheses /
nCandidates. Measures breadth at the root-cause level. 3 candidates all
crude_oil → 0.33 → composite ~0.77 → critique fires.
- directGeoDiversity: unique primary geographies among direct hypotheses /
nCandidates. First segment of compound geography strings (e.g.
'Red Sea, Suez Canal' → 'red sea') to avoid double-counting.
- candidateSpreadScore: normalized inverse-HHI. 1.0 = perfectly even
distribution across candidates. One candidate with 10 implications and
others with 0 → scores near 0 → critique fires.
Weight rationale: comDiversity 0.35, geoDiversity 0.20, spread 0.15,
chain 0.15, comRate 0.08, assetRate 0.04, mappedRate 0.03.
Verified: Run 2 Baltic/Hormuz/Brazil → freight/crude_oil/USD spread=0.98 ✓
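The normalized inverse-HHI spread metric can be sketched as below; the function name and input shape are assumptions:

```javascript
// 1.0 = implications spread evenly across candidates; near 0 = one
// candidate dominates. HHI spans [1/n (even) .. 1 (all in one)].
function candidateSpreadScore(countsByCandidate) {
  const counts = Object.values(countsByCandidate);
  const n = counts.length;
  const total = counts.reduce((a, b) => a + b, 0);
  if (n <= 1) return 1; // a single candidate is trivially "even"
  if (total === 0) return 0;
  const hhi = counts.reduce((a, c) => a + (c / total) ** 2, 0);
  return (1 - hhi) / (1 - 1 / n); // rescale so 1 = perfectly even
}

candidateSpreadScore({ a: 10, b: 0, c: 0 }); // 0 — one candidate dominates
candidateSpreadScore({ a: 3, b: 3, c: 3 });  // 1 — perfectly even
```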
* feat(deep-forecast): add convergence object to R2 debug artifact
Surface autoresearch loop outcome per run: converged (bool), finalComposite,
critiqueIterations (0 or 1), refinementCommitted, and perCandidateMappedCount
(candidateStateId → count). After 5+ runs the artifact alone answers whether
the pipeline is improving.
Architectural changes:
- runImpactExpansionPromptRefinement now returns { iterationCount, committed }
at all exit paths instead of undefined
- Call hoisted before writeForecastTraceArtifacts so the result flows into the
debug payload via dataForWrite.refinementResult
- buildImpactExpansionDebugPayload assembles convergence from validation +
refinementResult; exported for direct testing
- Fix: stale diversityScore reference replaced with directCommodityDiversity
Tests: T-conv-1 (converged=true), T-conv-2 (converged=false + iterations=1),
T-conv-3 (perCandidateMappedCount grouping) — 116/116 pass
* fix(deep-forecast): address P1+P2 review issues from convergence observability PR
P1-A: sanitize LLM-returned proposed_addition before Redis write (prompt injection
guard via sanitizeProposedLlmAddition — strips directive-phrase lines)
P1-B: restore fire-and-forget for runImpactExpansionPromptRefinement; compute
critiqueIterations from quality score (predicted) instead of awaiting result,
eliminating 15-30s critical-path latency on poor-quality runs
P1-C: processDeepForecastTask now returns convergence object to callers; add
convergence_quality_met warn check to evaluateForecastRunArtifacts
P1-D: cap concurrent LLM calls in extractImpactExpansionBundle to 3 (manual
batching — no p-limit) to respect provider rate limits
P2-1: hash full learnedSection in buildImpactExpansionCandidateHash (was sliced
to 80 chars, causing cache collisions on long learned sections)
P2-2: add exitReason field to all runImpactExpansionPromptRefinement return paths
P2-3: sanitizeForPrompt strips directive injection phrases; new
sanitizeProposedLlmAddition applies line-level filtering before Redis write
P2-4: add comment explaining intentional bidirectional affectedAssets/assetsOrSectors
coalescing in normalizeImpactHypothesisDraft
P2-5: extract makeConvTestData helper in T-conv tests; remove refinementCommitted
assertions (field removed from convergence shape)
P2-6: convergence_quality_met check added to evaluateForecastRunArtifacts (warn)
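The manual batching in P1-D (no p-limit dependency) can be sketched as sequential Promise.all over fixed-size slices; names here are illustrative:

```javascript
// Run fn over items at most batchSize at a time: each batch executes
// concurrently, batches execute one after another.
async function mapInBatches(items, batchSize, fn) {
  const results = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    results.push(...(await Promise.all(batch.map(fn))));
  }
  return results;
}
```

Calling it with batchSize 3 caps in-flight LLM requests at 3 regardless of candidate count.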
🤖 Generated with Claude Sonnet 4.6 via Claude Code (https://claude.ai/claude-code) + Compound Engineering v2.49.0
Co-Authored-By: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>
* fix(docs): add blank lines around lists in plan (MD032)
* fix(deep-forecast): address P1+P2 reviewer issues in convergence observability
P1-1: mapImpactHypothesesToWorldSignals used free-form marketImpact values
(price_spike, shortage, credit_stress, risk_off) verbatim as signal channel
types, producing unknown types that buildMarketTransmissionGraph cannot
consume. Add IMPACT_SIGNAL_CHANNELS set + resolveImpactChannel() to map
free-form strings to the nearest valid channel before signal materialization.
P1-2: sanitizeForPrompt had directive-phrase stripping added that was too
broad for a function called on headlines, evidence tables, case files, and
geopolitical summaries. Reverted to original safe sanitizer (newline/control
char removal only). Directive stripping remains in sanitizeProposedLlmAddition
where it is scoped to Redis-bound LLM-generated additions only.
P2: Renamed convergence.critiqueIterations to predictedCritiqueIterations to
make clear this is a prediction from the quality score, not a measured count
from actual refinement behavior (refinement is fire-and-forget after artifact
write). Updated T-conv-1/2 test assertions to match.
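The P1-1 channel resolution can be sketched as a lookup plus nearest-match fallbacks; the channel names below are illustrative placeholders, not the project's actual IMPACT_SIGNAL_CHANNELS set:

```javascript
// Map free-form LLM marketImpact strings onto a fixed channel vocabulary
// before signal materialization, so downstream graph builders never see
// unknown channel types.
const IMPACT_SIGNAL_CHANNELS = new Set(['energy_price', 'freight_rate', 'credit', 'fx']);

function resolveImpactChannel(raw) {
  if (IMPACT_SIGNAL_CHANNELS.has(raw)) return raw;
  if (/price|spike|shortage/i.test(raw)) return 'energy_price';
  if (/credit/i.test(raw)) return 'credit';
  if (/fx|currency/i.test(raw)) return 'fx';
  return 'energy_price'; // conservative default for unrecognized strings
}
```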
* feat(deep-forecast): inject live news headlines into evidence table
Wire inputs.newsInsights / inputs.newsDigest through the candidate
selection pipeline so buildImpactExpansionEvidenceTable receives up to
3 commodity-relevant live headlines as 'live_news' evidence entries.
Changes:
- IMPACT_COMMODITY_LEXICON: extend fertilizer pattern (fertiliser,
nitrogen, phosphate, npk); add food_grains and shipping_freight entries
- filterNewsHeadlinesByState: new pure helper that scores headlines by
alert status, LNG/energy/route/sanctions signal match, lexicon commodity
match, and source corroboration count (min score 2 to include)
- buildImpactExpansionEvidenceTable: add newsItems param, inject
live_news entries, raise cap 8→11
- buildImpactExpansionCandidate: add newsInsights/newsDigest params,
compute newsItems via filterNewsHeadlinesByState
- selectImpactExpansionCandidates: add newsInsights/newsDigest to options
- Call site: pass inputs.newsInsights/newsDigest at seed time
- Export filterNewsHeadlinesByState, buildImpactExpansionEvidenceTable
- 9 new tests (T-news-1 through T-lex-3): all pass, 125 total pass
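The additive scoring with the min-score-2 cutoff can be sketched as below; the regexes and field names are placeholders, not the shipped constants:

```javascript
// Crude stand-in for the energy/route signal regex.
const ENERGY_ROUTE_RE = /\b(lng|oil|gas|strait|canal|sanction)s?\b/i;

function scoreHeadline(headline, state) {
  let score = 0;
  if (headline.alert) score += 1;                              // alert status
  if (ENERGY_ROUTE_RE.test(headline.title)) score += 1;        // signal match
  if (state.commodityPattern && state.commodityPattern.test(headline.title)) {
    score += 1;                                                // lexicon commodity match
  }
  if ((headline.sources ?? 1) >= 2) score += 1;                // corroboration
  return score;
}

const filterNewsHeadlinesByState = (headlines, state) =>
  headlines.filter((h) => scoreHeadline(h, state) >= 2); // min score 2
```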
🤖 Generated with Claude Sonnet 4.6 (200K context) via Claude Code (https://claude.ai/claude-code) + Compound Engineering v2.49.0
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(deep-forecast): remove hardcoded LNG boost from filterNewsHeadlinesByState
The LNG+2 score was commodity-specific and inconsistent with the
intent: headline scoring should be generic, not biased toward any
named commodity. The function already handles the state's detected
commodity dynamically via lexEntry.pattern (IMPACT_COMMODITY_LEXICON).
LNG headlines still score via CRITICAL_NEWS_ENERGY_RE (+1) and
CRITICAL_NEWS_ROUTE_RE (+1) when relevant to the state's region.
All 125 tests pass.
* fix(deep-forecast): address all P1+P2 code review findings from PR #2178
P1 fixes (block-merge):
- Lower third_order mapped floor 0.74→0.70 (max achievable via 0.72 multiplier was 0.72)
- Guard runImpactExpansionPromptRefinement against empty validation (no_mapped exit)
- Replace block-list sanitizeProposedLlmAddition with pattern-based allowlist (HTML/JS/directive takeover)
- Fix TOCTOU on PROMPT_LAST_ATTEMPT_KEY: claim slot before quality check, not after LLM call
P2 fixes:
- Fix learned section overflow: use slice(-MAX) to preserve tail, not discard all prior content
- Add safe_haven_bid and global_crude_spread_stress branches to resolveImpactChannel
- quality_met path now sets rate-limit key (prevents 3 Redis GETs per good run)
- Hoist extractNewsClusterItems outside stateUnit map in selectImpactExpansionCandidates
- Export PROMPT_LEARNED_KEY, PROMPT_BASELINE_KEY, PROMPT_LAST_ATTEMPT_KEY + read/clear helpers
All 125 tests pass.
* fix(todos): add blank lines around lists/headings in todo files (markdownlint)
* fix(todos): fix markdownlint blanks-around-headings/lists in all todo files
---------
Co-authored-by: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>
* fix(forecast): fix impact expansion LLM prompt + add pre-filter validation diagnostic
Problem: LLM generates 8 hypotheses but 6 get rejected post-hoc because the
prompt gave 3 flat unlinked enum lists without per-variable constraints, causing
the LLM to hallucinate plausible but invalid channel/bucket combinations. Only
2 direct-only hypotheses survived, producing no expanded paths ->
completed_no_material_change on every deep run.
Also fixes two related bugs:
- Duplicate canonical state unit labels (same region + stateKind) blocked the
deep worker. Added label dedup filter in buildCanonicalStateUnits.
- When all hypotheses failed state-ID filtering in materializeImpactExpansion,
impactExpansionSummary showed 0 hypotheses with no rejection reasons, losing
all diagnostic data.
Changes:
- Bump IMPACT_EXPANSION_REGISTRY_VERSION v1 -> v2 to invalidate cached LLM
responses so the model re-generates with new constraint guidance
- Add buildRegistryConstraintTable() that serializes IMPACT_VARIABLE_REGISTRY
and MARKET_BUCKET_ALLOWED_CHANNELS into a compact constraint block
- Rewrite buildImpactExpansionSystemPrompt() to replace 3 flat enum lists with
the structured constraint table, bucket-channel dual gate rule, and explicit
MiroFish causal chain guidance (direct -> second_order -> third_order example)
- Add validation field to all 3 return paths in evaluateDeepForecastPaths so
pre-state-filter rejection data is always preserved
- Add hypothesisValidation to buildImpactExpansionDebugPayload capturing
totalHypotheses, validatedCount, mappedCount, rejectionReasonCounts, and
per-hypothesis rejection identity (candidateIndex, candidateStateId,
variableKey, channel, targetBucket, order, rejectionReason)
- Filter duplicate state unit labels post-finalization in buildCanonicalStateUnits
- Export buildRegistryConstraintTable, IMPACT_VARIABLE_REGISTRY,
MARKET_BUCKET_ALLOWED_CHANNELS for testability
- Add 4 new tests: mapped=0 early return has validation, no-expanded-accepted
path has validation, hypothesisValidation flows through buildForecastTraceArtifacts,
buildRegistryConstraintTable matches registry
* refactor(forecast): memoize constraint table, clarify dedup filter, tighten test A2
- Extract IMPACT_EXPANSION_REGISTRY_CONSTRAINT_TABLE const so the
constraint table is built once instead of on every prompt invocation
- Replace IIFE dedup filter in buildCanonicalStateUnits with an explicit
seenLabels Set for readability
- Add format-assumption comment to buildRegistryConstraintTable
- Rename test A2 and add mapped > 0 assertion to pin it past the
mapped=0 early-return path
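The memoization is just a module-level const; a minimal sketch (builder body and prompt text are placeholders, not the real implementations):

```typescript
// Invocation counter only exists in this sketch, to show the table is built once.
let buildCalls = 0;

function buildRegistryConstraintTable(): string {
  buildCalls += 1;
  return "VARIABLES ..."; // placeholder for the real serialized table
}

// Built exactly once at module load, as in the refactor.
const IMPACT_EXPANSION_REGISTRY_CONSTRAINT_TABLE = buildRegistryConstraintTable();

function buildImpactExpansionSystemPrompt(): string {
  // Every prompt invocation reuses the memoized string.
  return `Constraints:\n${IMPACT_EXPANSION_REGISTRY_CONSTRAINT_TABLE}`;
}
```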
* fix(forecast): disambiguate duplicate state unit labels instead of dropping units
The previous dedup filter silently dropped state units whose labels
collided after finalizeStateUnit(), losing their id, forecastIds,
signals, and deep-candidate eligibility.
Two distinct clusters can score below the merge threshold (< 5.5) while
still resolving to the same formatStateUnitLabel() output when they
share the same leadRegion and stateKind. The filter kept the first
and discarded the rest, suppressing valid deep paths.
Fix: replace the .filter() with a .map() that disambiguates collision
labels using dominantDomain suffix, falling back to the last 4 chars
of the unit id if domain-based disambiguation would itself collide.
The seenLabels Set tracks all assigned labels to prevent any secondary
collision. The snapshot validator (and deep worker) no longer sees
duplicate labels, and no units are dropped.
Also export buildCanonicalStateUnits for direct test coverage.
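The disambiguating .map() can be sketched as below, with a simplified unit shape (the real buildCanonicalStateUnits operates on full finalized state units, and labels come from formatStateUnitLabel()):

```typescript
// Simplified stand-in for a finalized state unit.
interface StateUnit { id: string; label: string; dominantDomain: string }

function disambiguateLabels(units: StateUnit[]): StateUnit[] {
  // Tracks every assigned label so no secondary collision can slip through.
  const seenLabels = new Set<string>();
  return units.map((unit) => {
    let label = unit.label;
    if (seenLabels.has(label)) {
      // First attempt: dominantDomain suffix.
      const byDomain = `${unit.label} (${unit.dominantDomain})`;
      // Fallback: last 4 chars of the unit id if the domain suffix also collides.
      label = seenLabels.has(byDomain)
        ? `${unit.label} (${unit.id.slice(-4)})`
        : byDomain;
    }
    seenLabels.add(label);
    return { ...unit, label };
  });
}
```

Unlike the dropped .filter(), every unit survives with its id, forecastIds, and signals intact; only the colliding label changes.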
* feat(deep-forecast): phase 2 scoring recalibration + prompt excellence
Fixes a cascade of three scoring gates that caused every deep run to return
completed_no_material_change despite valid LLM hypothesis generation.
Changes:
- Lower second_order validation floors (mapped: 0.66→0.58, internal: 0.58→0.50)
and raise multiplier (0.85→0.88) so typical LLM quality (strength~0.75,
conf~0.75, 2 refs, specificityScore=0.2) now reaches mapped status
- Binary evidenceSupport: refs >= 2 → 1, else → 0 (enforces 2-ref requirement
at scoring layer, not just prompt text; 1-ref hypotheses stay trace_only)
- Parent-must-be-mapped invariant: post-validation pass downgrades mapped
second_order/third_order whose dependsOnKey has no mapped parent
- Lower pathScore threshold from 0.66 to 0.50 to allow barely-mapped pairs
through to expanded path generation
- Add 6 missing maritime chokepoints to CHOKEPOINT_MARKET_REGIONS
- Bump IMPACT_EXPANSION_REGISTRY_VERSION v2 → v3 (invalidates stale LLM cache)
- Prompt v3: explicit dependsOnKey pairing, 2-evidence citation rule,
confidence calibration guidance, direct+second_order pair structure
- Add scoringBreakdown (all hypotheses with scoring factors) and gateDetails
(active thresholds) to debug artifact for observability feedback loop
- Export buildImpactExpansionSystemPrompt, extractImpactRouteFacilityKey,
extractImpactCommodityKey for testability
- 8 new tests (T1-T8) covering all phase 2 changes; 111/111 pass
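Two of the gates above can be sketched as follows. The hypothesis shape and status names are simplified stand-ins for the real scoring pipeline, and this single pass does not cascade downgrades through chains of dependents:

```typescript
type Status = "mapped" | "trace_only";
interface Hypothesis {
  key: string;
  order: "direct" | "second_order" | "third_order";
  refs: number;
  dependsOnKey?: string;
  status: Status;
}

// Binary evidenceSupport: refs >= 2 -> 1, else 0. The 2-reference rule is
// enforced at the scoring layer, so 1-ref hypotheses stay trace_only.
const evidenceSupport = (h: Hypothesis): number => (h.refs >= 2 ? 1 : 0);

// Parent-must-be-mapped invariant: a post-validation pass downgrades mapped
// second/third-order hypotheses whose dependsOnKey has no mapped parent.
function enforceParentMapped(hypotheses: Hypothesis[]): Hypothesis[] {
  const mappedKeys = new Set(
    hypotheses.filter((h) => h.status === "mapped").map((h) => h.key),
  );
  return hypotheses.map((h): Hypothesis => {
    if (
      h.status === "mapped" &&
      (h.order === "second_order" || h.order === "third_order") &&
      (!h.dependsOnKey || !mappedKeys.has(h.dependsOnKey))
    ) {
      return { ...h, status: "trace_only" };
    }
    return h;
  });
}
```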
* fix(seeders): apply gold standard TTL-extend+retry pattern to Aviation, NOTAM, Cyber, PositiveEvents
* feat(consumer-prices): default to All — global comparison table as landing view
- DEFAULT_MARKET = 'all' so panel opens with the global view
- 🌍 All pill added at front of market bar
- All view fetches all 9 markets in parallel via fetchAllMarketsOverview()
and renders a comparison table: Market / Index / WoW / Spread / Updated
- Clicking any market row drills into that market's full tab view
- SINGLE_MARKETS exported for use in All-view iteration
- CSS: .cp-global-table and row styles
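The parallel fetch is a Promise.all over the market list; a minimal sketch with a placeholder per-market fetcher and a trimmed SINGLE_MARKETS (the real list has 9 markets and the fetcher hits the API):

```typescript
interface MarketOverview { market: string; index: number; wow: number }

const SINGLE_MARKETS = ["us", "uk", "de"] as const;

// Placeholder for the real per-market API call.
async function fetchMarketOverview(market: string): Promise<MarketOverview> {
  return { market, index: 100, wow: 0.2 };
}

// All markets fetched in parallel; Promise.all preserves list order, so
// rows map 1:1 onto the comparison-table rows.
async function fetchAllMarketsOverview(): Promise<MarketOverview[]> {
  return Promise.all(SINGLE_MARKETS.map((m) => fetchMarketOverview(m)));
}
```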
* fix(forecast): guarantee military domain representation in publish selection
Military detector forecasts (ADS-B flight tracking + theater posture API)
structurally score near zero on readiness metrics that require LLM-enriched
caseFile content (supporting evidence, news headlines, calibration, triggers).
This causes them to rank below the target count threshold every run despite
a valid elevated posture signal.
Add a domain guarantee post-pass after the 3 selection loops: if no military
forecast was selected and we have room below MAX_TARGET_PUBLISHED_FORECASTS,
inject the highest-scoring eligible military forecast. This does not displace
any already-selected forecast and respects all existing family/situation caps.
Diagnosis: Baltic theater at postureLevel='elevated' with 6 active flights
generates a military forecast (prob=0.41, confidence=0.30, score=0.136) but
gets buried behind 15+ well-grounded situation cluster forecasts at score 0.4+.
Tests: 3 new assertions in 'military domain guarantee in publish selection'.
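The post-pass can be sketched like this, with a simplified forecast shape (MAX_TARGET_PUBLISHED_FORECASTS and the eligibility filtering are stand-ins for the real selection code, which also applies family/situation caps):

```typescript
interface Forecast { id: string; domain: string; score: number }

const MAX_TARGET_PUBLISHED_FORECASTS = 16;

// Runs after the 3 selection loops: if no military forecast was selected and
// there is room under the cap, append the highest-scoring eligible one.
// Nothing already selected is displaced.
function guaranteeMilitaryDomain(
  selected: Forecast[],
  eligible: Forecast[],
): Forecast[] {
  const hasMilitary = selected.some((f) => f.domain === "military");
  const hasRoom = selected.length < MAX_TARGET_PUBLISHED_FORECASTS;
  if (hasMilitary || !hasRoom) return selected;
  const best = eligible
    .filter((f) => f.domain === "military")
    .sort((a, b) => b.score - a.score)[0];
  return best ? [...selected, best] : selected;
}
```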
* fix(forecast): block generic-actor cross-theater interactions + raise enrichment budget
Root cause: actor registry uses name:category as key (e.g. "Incumbent leadership:state"),
causing unrelated situations (Israel conflict, Taiwan political) to share the same actor
ID and fire sharedActor=true in pushInteraction. This propagated into the reportable
ledger and surfaced as junk effects like Israel→Taiwan at 80% confidence.
Two-pronged fix:
1. Specificity gate in pushInteraction: sharedActor now requires avgSpecificity >= 0.75.
Generic blueprint actors ("Incumbent leadership" ~0.68, "Civil protection authorities"
~0.73) no longer qualify as structural cross-situation links. Named domain-specific
actors ("Threat actors:adversarial" ~0.95) continue to qualify.
2. MACRO_REGION_MAP + isCrossTheaterPair + gate in buildCrossSituationEffects: for
cross-theater pairs (different macro-regions) with non-exempt channels, requires
sharedActor=true AND avgActorSpecificity >= 0.90. Exempt channels: cyber_disruption,
market_repricing (legitimately global). Same-macro-region pairs (Brazil/Mexico both
AMERICAS) are unaffected.
Verified against live run 1773983083084-bu6b1f:
BLOCKED: Israel→Taiwan (MENA/EAST_ASIA, spec 0.68)
BLOCKED: Israel→US political (MENA/AMERICAS, spec 0.68)
BLOCKED: Cuba→Iran (AMERICAS/MENA, spec 0.73)
BLOCKED: Brazil→Israel (AMERICAS/MENA, spec 0.85 < 0.90)
ALLOWED: China→US cyber_disruption (exempt channel)
ALLOWED: Brazil→Mexico (same AMERICAS)
Also raises ENRICHMENT_COMBINED_MAX from 3 to 5 (total budget 6→8),
targeting enrichedRate improvement from ~38% to ~60%.
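The two gates can be sketched as below. Thresholds (0.75, 0.90) and the exempt channels come from the change description; the macro-region map here is a small hypothetical excerpt:

```typescript
// Hypothetical excerpt of MACRO_REGION_MAP (situation key -> macro-region).
const MACRO_REGION_MAP: Record<string, string> = {
  israel: "MENA", iran: "MENA", taiwan: "EAST_ASIA", us: "AMERICAS",
  cuba: "AMERICAS", brazil: "AMERICAS", mexico: "AMERICAS", china: "EAST_ASIA",
};

// Channels that are legitimately global and bypass the cross-theater gate.
const EXEMPT_CHANNELS = new Set(["cyber_disruption", "market_repricing"]);

function isCrossTheaterPair(a: string, b: string): boolean {
  return MACRO_REGION_MAP[a] !== MACRO_REGION_MAP[b];
}

// Gate 1: sharedActor only counts as a structural link for specific actors.
function qualifiesAsSharedActor(avgSpecificity: number): boolean {
  return avgSpecificity >= 0.75;
}

// Gate 2: cross-theater pairs on non-exempt channels need a shared actor
// with very high specificity; same-macro-region pairs pass unchanged.
function allowCrossSituationEffect(
  a: string, b: string, channel: string,
  sharedActor: boolean, avgActorSpecificity: number,
): boolean {
  if (!isCrossTheaterPair(a, b)) return true;
  if (EXEMPT_CHANNELS.has(channel)) return true;
  return sharedActor && avgActorSpecificity >= 0.90;
}
```

Replaying the live-run examples above through this sketch gives the same BLOCKED/ALLOWED outcomes.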
* fix(plans): fix markdown lint errors in forecast semantic quality plan
* fix(plans): fix remaining markdown lint error in plan file
* fix(forecast): build worldState before tracing so simulation data reaches caseFile
buildForecastTraceArtifacts was building worldState after tracedPredictions,
so simulation data was never available to buildForecastTraceRecord. Each
forecast's caseFile.worldState had situationId/familyId/simulationSummary
all undefined, making the 3-round MiroFish simulation invisible at the
forecast level.
Fix:
- Compute worldState before tracing (so simulationState is ready)
- Build forecastId → situationSimulation lookup from worldState.simulationState
- Pass lookup into buildForecastTraceRecord; inject situationId, familyId,
familyLabel, simulationSummary, simulationPosture, simulationPostureScore
into caseFile.worldState for each matched forecast
- Add regression assertion to forecast-trace-export tests
All 194 forecast tests pass.
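The reordering plus lookup can be sketched as below. All shapes and names are simplified stand-ins; the real buildForecastTraceRecord injects several more fields (familyId, simulationPosture, etc.):

```typescript
interface SituationSimulation { situationId: string; simulationSummary: string }
interface WorldState {
  simulationState: { forecastIds: string[]; simulation: SituationSimulation }[];
}
interface TraceRecord { forecastId: string; worldState?: SituationSimulation }

// forecastId -> situationSimulation lookup built from worldState.simulationState.
function buildSimulationLookup(
  worldState: WorldState,
): Map<string, SituationSimulation> {
  const lookup = new Map<string, SituationSimulation>();
  for (const entry of worldState.simulationState) {
    for (const id of entry.forecastIds) lookup.set(id, entry.simulation);
  }
  return lookup;
}

// worldState is computed BEFORE tracing, so simulation data is available
// when each trace record is built; unmatched forecasts simply get undefined.
function buildTraceRecords(
  forecastIds: string[],
  worldState: WorldState,
): TraceRecord[] {
  const lookup = buildSimulationLookup(worldState);
  return forecastIds.map((forecastId) => ({
    forecastId,
    worldState: lookup.get(forecastId),
  }));
}
```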