eliott/worldmonitor

mirror of https://github.com/koala73/worldmonitor.git synced 2026-04-25 17:14:57 +02:00

Author	SHA1	Message	Date
Elie Habib	01f6057389	feat(simulation): MiroFish Phase 2 — theater-limited simulation runner (#2220 ) * feat(simulation): MiroFish Phase 2 — theater-limited simulation runner Adds the simulation execution layer that consumes simulation-package.json and produces simulation-outcome.json for maritime chokepoint + energy/logistics theaters, closing the WorldMonitor → MiroFish handoff loop. Changes: - scripts/seed-forecasts.mjs: 2-round LLM simulation runner (prompt builders, JSON extractor, runTheaterSimulation, writeSimulationOutcome, task queue with NX dedup lock, runSimulationWorker poll loop) - scripts/process-simulation-tasks.mjs: standalone worker entry point - proto: GetSimulationOutcome RPC + make generate - server/worldmonitor/forecast/v1/get-simulation-outcome.ts: RPC handler - server/gateway.ts: slow tier for get-simulation-outcome - api/health.js: simulationOutcomeLatest in STANDALONE + ON_DEMAND keys - tests: 14 new tests for simulation runner functions * fix(simulation): address P1/P2 code review findings from PR #2220 Security (P1 #018): - sanitizeForPrompt() applied to all entity/seed fields interpolated into Round 1 prompt (entityId, class, stance, seedId, type, timing) - sanitizeForPrompt() applied to actorId and entityIds in Round 2 prompt - sanitizeForPrompt() + length caps applied to all LLM array fields written to R2 (dominantReactions, stabilizers, invalidators, keyActors, timingMarkers) Validation (P1 #019): - Added validateRunId() regex guard - Applied in enqueueSimulationTask() and processNextSimulationTask() loop Type safety (P1 #020): - Added isOutcomePointer() and isPackagePointer() type guards in TS handlers - Replaced unsafe as-casts with runtime-validated guards in both handlers Correctness (P2 #022): - Log warning when pkgPointer.runId does not match task runId Architecture (P2 #024): - isMaritimeChokeEnergyCandidate() accepts both flat and nested topBucketId - Call site simplified to pass theater directly Performance (P2 #025): - 
SIMULATION_ROUND1_MAX_TOKENS raised 1800 to 2200 - Added max 3 initialReactions instruction to Round 1 prompt Maintainability (P2 #026): - Simulation pointer keys exported from server/_shared/cache-keys.ts - Both TS handlers import from shared location Documentation (P2 #027): - Strengthened runId no-op description in proto and OpenAPI spec * fix(todos): add blank lines around lists in markdown todo files * style(api): reformat openapi yaml to match linter output * test(simulation): add flat-shape filter test + getSimulationOutcome handler coverage Two tests identified as missing during PR #2220 review: 1. isMaritimeChokeEnergyCandidate flat-shape tests — covers the || candidate.topBucketId normalization added in the P1/P2 review pass. The existing tests only used the nested marketContext.topBucketId shape; this adds the flat root-field shape that arrives from the simulation-package.json JSON (selectedTheaters entries have topBucketId at root). 2. getSimulationOutcome handler structural tests — verifies the isOutcomePointer guard, found:false NOT_FOUND return, found:true success path, note population on runId mismatch, and redis_unavailable error string. Follows the readSrc static-analysis pattern used elsewhere in server-handlers.test.mjs (handler imports Redis so full integration test would require a test Redis instance).	2026-03-25 13:55:59 +04:00
Elie Habib	f87c8c71c4	feat(forecast): Phase 2 simulation package read path (#2219) * feat(forecast): Phase 2 simulation package read path (getSimulationPackage RPC + Redis existence key) - writeSimulationPackage now writes forecast:simulation-package:latest to Redis after successful R2 write, containing { runId, pkgKey, schemaVersion, theaterCount, generatedAt } with TTL matching TRACE_REDIS_TTL_SECONDS (60 days) - New getSimulationPackage RPC handler reads Redis key, returns pointer metadata without requiring an R2 fetch (zero R2 cost for existence check) - Wired into ForecastServiceHandler and server/gateway.ts cache tier (medium) - Proto: GetSimulationPackage RPC + get_simulation_package.proto message definitions - api/health.js: simulationPackageLatest added to STANDALONE_KEYS + ON_DEMAND_KEYS - Tests: SIMULATION_PACKAGE_LATEST_KEY constant + writeSimulationPackage null-guard test Closes todo #017 (Phase 2 prerequisites for MiroFish integration) * chore(generated): regenerate proto types for GetSimulationPackage RPC * fix(simulation-rpc): distinguish Redis failure from not-found; signal runId mismatch - Add `error` field to GetSimulationPackageResponse: populated with "redis_unavailable" on Redis errors so callers can distinguish a healthy not-found (found=false, error="") from a Redis failure (found=false, error="redis_unavailable"). Adds console.warn on error. - Add `note` field: populated when req.runId is supplied but does not match the latest package's runId, signalling that per-run filtering is not yet active (Phase 3). - Add proto comment on run_id: "Currently ignored; reserved for Phase 3" - Add milliseconds annotation to generated_at description. - Simplify handler: extract NOT_FOUND constant, remove SimulationPackagePointer interface, remove || '' / || 0 guards on guaranteed-present fields. - Regenerate all buf-generated files. Fixes todos #018 (runId silently ignored) and #019 (error indistinguishable from not-found). 
Also resolves todos #022 (simplifications) and #023 (OpenAPI required fields / generatedAt unit annotation). * fix(simulation-rpc): change cache tier from medium to slow (aligns with deep-run update frequency) * fix(simulation-rpc): fix key prefixing, make Redis errors reachable, no-cache not-found Three P1 regressions caught in external review: 1. Key prefix bug: getCachedJson() applies preview:<sha>: prefix in non-production environments, but writeSimulationPackage writes the raw key via a direct Redis command. In preview/dev the RPC always returned found:false even when the package existed. Fix: new getRawJson() in redis.ts always uses the unprefixed key AND throws on failure instead of swallowing errors. 2. redis_unavailable unreachable: getCachedJson swallows fetch failures and missing credentials by returning null, so the catch block for redis_unavailable was dead code. getRawJson() throws on HTTP errors and missing credentials, making the error: "redis_unavailable" contract actually reachable. 3. Negative-cache stampede: slow tier caches every 200 GET. A request before any deep run wrote a package returned { found:false } which the CDN cached for up to 1h, breaking post-run discovery. Fix: markNoCacheResponse() on both not-found and error paths so they are served fresh on every request.	2026-03-24 22:45:22 +04:00
Elie Habib	b7e6333877	feat(deep-forecast): Phase 1 simulation package export contract (#2204 ) * feat(deep-forecast): Phase 1 simulation package export contract Add buildSimulationPackageFromDeepSnapshot and writeSimulationPackage to produce simulation-package.json alongside deep-snapshot.json on every eligible fast run, completing Phase 1 of the WorldMonitor → MiroFish bridge defined in docs/internal/wm-mirofish-gap.md. Phase 1 scope: maritime chokepoint + energy/logistics theaters only. A candidate qualifies if its routeFacilityKey is a known chokepoint in CHOKEPOINT_MARKET_REGIONS and the top bucket is energy or freight, or the commodityKey is an energy commodity. Package shape (schemaVersion: v1): - selectedTheaters: top 1–3 qualifying candidates with theater ID, route, commodity, bucket, channel, and rankingScore - simulationRequirement: deterministic template per theater (no LLM, fully cacheable), built from label, stateKind, route, commodity, channel, and criticalSignalTypes - structuralWorld: filtered stateUnits, worldSignals, transmission edges, market buckets, situationClusters, situationFamilies touching the theater - entities: extracted from actorRegistry (forecastId overlap), stateUnit actors, and evidence table actor entries; classified into 7 entity classes (state_actor, military_or_security_actor, regulator_or_central_bank, exporter_or_importer, logistics_operator, market_participant, media_or_public_bloc); falls back to anchor set if extraction finds nothing - eventSeeds: headline evidence → live_news, disruption-keyword signal evidence → observed_disruption; T+0h timing; relative timing format - constraints: route_chokepoint_status (hard if criticalSignalLift ≥ 0.25), commodity_exposure (always hard), market_admissibility (soft, channel routing), known_invalidators (soft, when contradictionScore ≥ 0.10) - evaluationTargets: deterministic escalation/containment/spillover path questions + T+24h/T+48h/T+72h timing markers per theater Also adds 6 
missing chokepoints to CHOKEPOINT_MARKET_REGIONS: Baltic Sea, Danish Straits, Strait of Gibraltar, Panama Canal, Lombok Strait, Cape of Good Hope. writeSimulationPackage fires-and-forgets after writeDeepForecastSnapshot so it does not add latency to the critical seed path. 17 new unit tests covering: theater filter, package shape, simulationRequirement content, eventSeeds, constraints (hard/soft), evaluationTargets structure, entity extraction, key format, and 3-theater cap. 🤖 Generated with Claude Sonnet 4.6 via Claude Code (https://claude.ai/claude-code) + Compound Engineering v2.49.0 Co-Authored-By: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com> * fix(simulation-package): address P1+P2 code review issues from PR #2204 P1 fixes: - inferEntityClassFromName: use word-boundary regex to prevent "force" substring false positives (e.g. "workforce", "Salesforce") - buildSimulationPackageEntities: key Map entries by candidateStateId instead of dominantRegion to prevent collision across theaters sharing the same region - writeSimulationPackage call site: pass priorWorldState so actorRegistry is available to buildSimulationPackageFromDeepSnapshot P2 fixes: - buildSimulationRequirementText: apply sanitizeForPrompt to theater.label, stateKind, topChannel, and critTypes before string interpolation (stored prompt injection risk) - buildSimulationPackageEventSeeds: apply sanitizeForPrompt to entry.text before .slice(0, 200) - isMaritimeChokeEnergyCandidate: replace new Set() allocation per call with Array.includes for 2-element arrays - buildSimulationPackageEntities: convert allForecastIds to Set before actor registry loop (O(n²) → O(n)) - buildSimulationPackageEvaluationTargets: add missing candidate guard with console.warn when candidate is undefined for theater - selectedTheaters map: add label fallback to dominantRegion / 'unknown theater' to prevent "undefined" in simulationRequirement Tests: 6 new unit tests covering the word-boundary fix, entity key collision, 
injection stripping, and undefined label guard --------- Co-authored-by: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>	2026-03-24 20:40:52 +04:00
Elie Habib	226cebf9bc	feat(deep-forecast): Phase 2+3 scoring recalibration + autoresearch prompt self-improvement (#2178 ) * fix(deep-forecast): lower acceptance threshold 0.60→0.50 to match real score distribution computeDeepPathAcceptanceScore formula: pathScore×0.55 + quality×0.20 + coherence×0.15 With pathScore≈0.65, quality≈0.30, coherence≈0.55: 0.358 + 0.060 + 0.083 = 0.50 The 0.60 threshold was calibrated before understanding that reportableQualityScore is constrained by world-state simulation geometry (not hypothesis quality), and coherence loses 0.15 for generic candidates without routeFacilityKey. The threshold was structurally unreachable with typical expanded paths. Verified end-to-end: deep worker now returns [DeepForecast] completed. Also updates T6 gateDetails assertion and renames the rejection-floor test to correctly describe the new behavior (strong inputs should be accepted). 111/111 tests pass. * feat(deep-forecast): autoresearch prompt self-improvement loop + T9/T10 tests - Add scoreImpactExpansionQuality() locked scorer: commodity rate (35%), variable diversity (35%), chain coverage (20%), mapped rate (10%) - Add runImpactExpansionPromptRefinement(): rate-limited LLM critic loop (30min cooldown) that reads learned section from Redis, scores current run, generates critique if composite < 0.62, tests on same candidates, commits to forecast:prompt:impact-expansion:learned if score improves - buildImpactExpansionSystemPrompt() now accepts learnedSection param, appends it after core rules with separator so model sees prior examples - buildImpactExpansionCandidateHash() includes learnedFingerprint to bust cache when learned section changes - processDeepForecastTask reads learnedSection from Redis before LLM call, runs refinement after both completed and no_material_change paths - Export scoreImpactExpansionQuality + runImpactExpansionPromptRefinement - T9: high commodity rate + chain coverage → composite > 0.70 - T10: no commodity + no chain 
coverage → composite < 0.40 - 113/113 tests pass * fix(deep-forecast): raise autoresearch threshold 0.62→0.80 + fix JSON parse - Threshold 0.62 was too low: commodity=1.00 + chain=1.00 compensated for diversity=0.50 (all same chains), keeping composite at 0.775 → no critique - Raise to 0.80 so diversity<0.70 triggers critique even with good commodity/chain - Fix JSON parser to extract first {…} block (handles Gemini code-fence wrapping) - Add per-hypothesis log in refinement breakdown for observability - Add refinementQualityThreshold to gateDetails for self-documenting artifacts - Verified: critique fires on diversity=0.50 run, committed Hormuz/Baltic/Suez region-specific chain examples (score 0.592→0.650) * feat(deep-forecast): per-candidate parallel LLM calls replace batch expansion Previously: all candidates → one batch LLM call → LLM averages context → identical route_disruption → inflation_pass_through chains for all candidates. Now: each candidate → its own focused LLM call (parallel Promise.all) → LLM reasons about specific stateKind/region/routeFacility for that candidate. Results (3 candidates, 3 parallel calls): - composite: 0.592 → 0.831 (+0.24) - commodity: 0.17 → 1.00 (all mapped have specific commodity) - diversity: 0.50 → 0.83 (energy_export_stress, importer_balance_stress appearing alongside route_disruption — genuinely different chains) - Baseline updated: 0.831 (above 0.80 threshold → no critique needed) Also threads learnedSection through extractSingleImpactExpansionCandidate so the learned examples from autoresearch apply to each focused call. Per-candidate cache keys (already existed) now serve as primary cache. 
* fix(tests): update recovery test for per-candidate LLM call flow - Change stage mock from impact_expansion → impact_expansion_single (batch primary path removed, per-candidate is now primary) - Assert parseMode === per_candidate instead of parseStage /^recovered_/ (recovered_ prefix was only set by old batch_repair path) - 2257/2257 tests pass * fix(deep-forecast): add Red Sea, Persian Gulf, South China Sea to chokepoint map Candidate packets had routeFacilityKey=none for Red Sea / Persian Gulf / Baltic Sea signals because prediction titles say "Red Sea maritime disruption" not "Bab el-Mandeb" or "Strait of Hormuz". CHOKEPOINT_MARKET_REGIONS only had sub-facility names (Bab el-Mandeb, Suez Canal) as keys, not the sea regions themselves. Fix: add Red Sea, Persian Gulf, Arabian Sea, Black Sea, South China Sea, Mediterranean Sea as direct keys so region-level candidate titles resolve. Result: LLM user prompt now shows routeFacilityKey=Red Sea / Persian Gulf / Baltic Sea per candidate — giving each focused call the geographic context needed to generate route-specific chains. 
- Autoresearch baseline updated 0.932→0.965 on this run - T8 extended with Red Sea, Persian Gulf, South China Sea assertions - 2257/2257 tests pass * feat(deep-forecast): free-form hypothesis schema + remove registry constraint - Bump IMPACT_EXPANSION_REGISTRY_VERSION to v4 - Add hypothesisKey, description, geography, affectedAssets, marketImpact, causalLink fields to normalizeImpactHypothesisDraft (keep legacy fields for backward compat) - Rewrite buildImpactExpansionSystemPrompt: remove IMPACT_VARIABLE_REGISTRY constraint table, use free-form ImpactHypothesis schema with geographic/commodity specificity rules - Rewrite evaluateImpactHypothesisRejection: use effective key (hypothesisKey || variableKey) for dedup; legacy registry check only for old cached responses without hypothesisKey - Update validateImpactHypotheses scoring: add geographyScore, commodityScore, causalLinkScore, assetScore terms; channelCoherence/bucketCoherence only apply to legacy responses - Update parent-must-be-mapped invariant to use hypothesisKey || variableKey as effective key - Update mapImpactHypothesesToWorldSignals: use effective key for dedup and sourceKey; prefer description/geography over legacy fields - Update buildImpactPathsForCandidate: match on hypothesisKey || variableKey for parent lookup - Update buildImpactPathId: use hypothesisKey || variableKey for hash inputs - Rewrite scoreImpactExpansionQuality: add geographyRate and assetRate metrics; update composite weights - Update buildImpactPromptCritiqueSystemPrompt/UserPrompt: use hypothesisKey-based chain format in examples - Add new fields to buildImpactExpansionBundleFromPaths push calls - Update T7 test assertion: MUST be the exact hypothesisKey instead of variableKey string * fix(deep-forecast): update breakdown log to show free-form hypothesis fields * feat(deep-forecast): add commodityDiversity metric to autoresearch scorer - commodityDiversity = unique commodities / nCandidates (weight 0.35) Penalizes runs 
where all candidates default to same commodity. 3 candidates all crude_oil → diversity=0.33 → composite ~0.76 → critique fires. - Rebalanced composite weights: comDiversity 0.35, geo 0.20, keyDiversity 0.15, chain 0.10, commodityRate 0.10, asset 0.05, mappedRate 0.05 - Breakdown log now shows comDiversity + geo + keyDiversity - Critique prompt updated: commodity_monoculture failure mode, diagnosis targets commodity homogeneity - T9: added commodityDiversity=1.0 assertion (2 unique commodities across 2 candidates) * refactor(deep-forecast): replace commodityDiversity with directCommodityDiversity + directGeoDiversity + candidateSpreadScore Problem: measuring diversity on all mapped hypotheses misses the case where one candidate generates 10 implications while others generate 0, or where all candidates converge on the same commodity due to dominating signals. Fix: score at the DIRECT hypothesis level (root causes only) and add a candidate-spread metric: - directCommodityDiversity: unique commodities among direct hypotheses / nCandidates. Measures breadth at the root-cause level. 3 candidates all crude_oil → 0.33 → composite ~0.77 → critique fires. - directGeoDiversity: unique primary geographies among direct hypotheses / nCandidates. First segment of compound geography strings (e.g. 'Red Sea, Suez Canal' → 'red sea') to avoid double-counting. - candidateSpreadScore: normalized inverse-HHI. 1.0 = perfectly even distribution across candidates. One candidate with 10 implications and others with 0 → scores near 0 → critique fires. Weight rationale: comDiversity 0.35, geoDiversity 0.20, spread 0.15, chain 0.15, comRate 0.08, assetRate 0.04, mappedRate 0.03. 
Verified: Run 2 Baltic/Hormuz/Brazil → freight/crude_oil/USD spread=0.98 ✓ * feat(deep-forecast): add convergence object to R2 debug artifact Surface autoresearch loop outcome per run: converged (bool), finalComposite, critiqueIterations (0 or 1), refinementCommitted, and perCandidateMappedCount (candidateStateId → count). After 5+ runs the artifact alone answers whether the pipeline is improving. Architectural changes: - runImpactExpansionPromptRefinement now returns { iterationCount, committed } at all exit paths instead of undefined - Call hoisted before writeForecastTraceArtifacts so the result flows into the debug payload via dataForWrite.refinementResult - buildImpactExpansionDebugPayload assembles convergence from validation + refinementResult; exported for direct testing - Fix: stale diversityScore reference replaced with directCommodityDiversity Tests: T-conv-1 (converged=true), T-conv-2 (converged=false + iterations=1), T-conv-3 (perCandidateMappedCount grouping) — 116/116 pass * fix(deep-forecast): address P1+P2 review issues from convergence observability PR P1-A: sanitize LLM-returned proposed_addition before Redis write (prompt injection guard via sanitizeProposedLlmAddition — strips directive-phrase lines) P1-B: restore fire-and-forget for runImpactExpansionPromptRefinement; compute critiqueIterations from quality score (predicted) instead of awaiting result, eliminating 15-30s critical-path latency on poor-quality runs P1-C: processDeepForecastTask now returns convergence object to callers; add convergence_quality_met warn check to evaluateForecastRunArtifacts P1-D: cap concurrent LLM calls in extractImpactExpansionBundle to 3 (manual batching — no p-limit) to respect provider rate limits P2-1: hash full learnedSection in buildImpactExpansionCandidateHash (was sliced to 80 chars, causing cache collisions on long learned sections) P2-2: add exitReason field to all runImpactExpansionPromptRefinement return paths P2-3: sanitizeForPrompt strips 
directive injection phrases; new sanitizeProposedLlmAddition applies line-level filtering before Redis write P2-4: add comment explaining intentional bidirectional affectedAssets/assetsOrSectors coalescing in normalizeImpactHypothesisDraft P2-5: extract makeConvTestData helper in T-conv tests; remove refinementCommitted assertions (field removed from convergence shape) P2-6: convergence_quality_met check added to evaluateForecastRunArtifacts (warn) 🤖 Generated with Claude Sonnet 4.6 via Claude Code (https://claude.ai/claude-code) + Compound Engineering v2.49.0 Co-Authored-By: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com> * fix(docs): add blank lines around lists in plan (MD032) * fix(deep-forecast): address P1+P2 reviewer issues in convergence observability P1-1: mapImpactHypothesesToWorldSignals used free-form marketImpact values (price_spike, shortage, credit_stress, risk_off) verbatim as signal channel types, producing unknown types that buildMarketTransmissionGraph cannot consume. Add IMPACT_SIGNAL_CHANNELS set + resolveImpactChannel() to map free-form strings to the nearest valid channel before signal materialization. P1-2: sanitizeForPrompt had directive-phrase stripping added that was too broad for a function called on headlines, evidence tables, case files, and geopolitical summaries. Reverted to original safe sanitizer (newline/control char removal only). Directive stripping remains in sanitizeProposedLlmAddition where it is scoped to Redis-bound LLM-generated additions only. P2: Renamed convergence.critiqueIterations to predictedCritiqueIterations to make clear this is a prediction from the quality score, not a measured count from actual refinement behavior (refinement is fire-and-forget after artifact write). Updated T-conv-1/2 test assertions to match. 
* feat(deep-forecast): inject live news headlines into evidence table Wire inputs.newsInsights / inputs.newsDigest through the candidate selection pipeline so buildImpactExpansionEvidenceTable receives up to 3 commodity-relevant live headlines as 'live_news' evidence entries. Changes: - IMPACT_COMMODITY_LEXICON: extend fertilizer pattern (fertiliser, nitrogen, phosphate, npk); add food_grains and shipping_freight entries - filterNewsHeadlinesByState: new pure helper that scores headlines by alert status, LNG/energy/route/sanctions signal match, lexicon commodity match, and source corroboration count (min score 2 to include) - buildImpactExpansionEvidenceTable: add newsItems param, inject live_news entries, raise cap 8→11 - buildImpactExpansionCandidate: add newsInsights/newsDigest params, compute newsItems via filterNewsHeadlinesByState - selectImpactExpansionCandidates: add newsInsights/newsDigest to options - Call site: pass inputs.newsInsights/newsDigest at seed time - Export filterNewsHeadlinesByState, buildImpactExpansionEvidenceTable - 9 new tests (T-news-1 through T-lex-3): all pass, 125 total pass 🤖 Generated with Claude Sonnet 4.6 (200K context) via Claude Code (https://claude.ai/claude-code) + Compound Engineering v2.49.0 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(deep-forecast): remove hardcoded LNG boost from filterNewsHeadlinesByState The LNG+2 score was commodity-specific and inconsistent with the intent: headline scoring should be generic, not biased toward any named commodity. The function already handles the state's detected commodity dynamically via lexEntry.pattern (IMPACT_COMMODITY_LEXICON). LNG headlines still score via CRITICAL_NEWS_ENERGY_RE (+1) and CRITICAL_NEWS_ROUTE_RE (+1) when relevant to the state's region. All 125 tests pass. 
* fix(deep-forecast): address all P1+P2 code review findings from PR #2178 P1 fixes (block-merge): - Lower third_order mapped floor 0.74→0.70 (max achievable via 0.72 multiplier was 0.72) - Guard runImpactExpansionPromptRefinement against empty validation (no_mapped exit) - Replace block-list sanitizeProposedLlmAddition with pattern-based allowlist (HTML/JS/directive takeover) - Fix TOCTOU on PROMPT_LAST_ATTEMPT_KEY: claim slot before quality check, not after LLM call P2 fixes: - Fix learned section overflow: use slice(-MAX) to preserve tail, not discard all prior content - Add safe_haven_bid and global_crude_spread_stress branches to resolveImpactChannel - quality_met path now sets rate-limit key (prevents 3 Redis GETs per good run) - Hoist extractNewsClusterItems outside stateUnit map in selectImpactExpansionCandidates - Export PROMPT_LEARNED_KEY, PROMPT_BASELINE_KEY, PROMPT_LAST_ATTEMPT_KEY + read/clear helpers All 125 tests pass. * fix(todos): add blank lines around lists/headings in todo files (markdownlint) * fix(todos): fix markdownlint blanks-around-headings/lists in all todo files --------- Co-authored-by: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>	2026-03-24 18:52:02 +04:00
Elie Habib	c9d4fc49a5	fix(forecast): fix impact expansion LLM prompt + pre-filter validation diagnostic (#2169 ) * fix(forecast): fix impact expansion LLM prompt + add pre-filter validation diagnostic Problem: LLM generates 8 hypotheses but 6 get rejected post-hoc because the prompt gave 3 flat unlinked enum lists without per-variable constraints, causing the LLM to hallucinate plausible but invalid channel/bucket combinations. Only 2 direct-only hypotheses survived, producing no expanded paths -> completed_no_material_change on every deep run. Also fixes two related bugs: - Duplicate canonical state unit labels (same region + stateKind) blocked the deep worker. Added label dedup filter in buildCanonicalStateUnits. - When all hypotheses failed state-ID filtering in materializeImpactExpansion, impactExpansionSummary showed 0 hypotheses with no rejection reasons, losing all diagnostic data. Changes: - Bump IMPACT_EXPANSION_REGISTRY_VERSION v1 -> v2 to invalidate cached LLM responses so the model re-generates with new constraint guidance - Add buildRegistryConstraintTable() that serializes IMPACT_VARIABLE_REGISTRY and MARKET_BUCKET_ALLOWED_CHANNELS into a compact constraint block - Rewrite buildImpactExpansionSystemPrompt() to replace 3 flat enum lists with the structured constraint table, bucket-channel dual gate rule, and explicit MiroFish causal chain guidance (direct -> second_order -> third_order example) - Add validation field to all 3 return paths in evaluateDeepForecastPaths so pre-state-filter rejection data is always preserved - Add hypothesisValidation to buildImpactExpansionDebugPayload capturing totalHypotheses, validatedCount, mappedCount, rejectionReasonCounts, and per-hypothesis rejection identity (candidateIndex, candidateStateId, variableKey, channel, targetBucket, order, rejectionReason) - Filter duplicate state unit labels post-finalization in buildCanonicalStateUnits - Export buildRegistryConstraintTable, IMPACT_VARIABLE_REGISTRY, 
MARKET_BUCKET_ALLOWED_CHANNELS for testability - Add 4 new tests: mapped=0 early return has validation, no-expanded-accepted path has validation, hypothesisValidation flows through buildForecastTraceArtifacts, buildRegistryConstraintTable matches registry * refactor(forecast): memoize constraint table, clarify dedup filter, tighten test A2 - Extract IMPACT_EXPANSION_REGISTRY_CONSTRAINT_TABLE const so the constraint table is built once instead of on every prompt invocation - Replace IIFE dedup filter in buildCanonicalStateUnits with an explicit seenLabels Set for readability - Add format-assumption comment to buildRegistryConstraintTable - Rename test A2 and add mapped > 0 assertion to pin it past the mapped=0 early-return path * fix(forecast): disambiguate duplicate state unit labels instead of dropping units The previous dedup filter silently dropped state units whose labels collided after finalizeStateUnit(), losing their id, forecastIds, signals, and deep-candidate eligibility. Two distinct clusters can score below the merge threshold (< 5.5) while still resolving to the same formatStateUnitLabel() output when they share the same leadRegion and stateKind. The filter kept the first and discarded the rest, suppressing valid deep paths. Fix: replace the .filter() with a .map() that disambiguates collision labels using dominantDomain suffix, falling back to the last 4 chars of the unit id if domain-based disambiguation would itself collide. The seenLabels Set tracks all assigned labels to prevent any secondary collision. The snapshot validator (and deep worker) no longer sees duplicate labels, and no units are dropped. Also export buildCanonicalStateUnits for direct test coverage. * feat(deep-forecast): phase 2 scoring recalibration + prompt excellence Fixes cascade of three scoring gates that caused every deep run to return completed_no_material_change despite valid LLM hypothesis generation. 
Changes: - Lower second_order validation floors (mapped: 0.66→0.58, internal: 0.58→0.50) and raise multiplier (0.85→0.88) so typical LLM quality (strength~0.75, conf~0.75, 2 refs, specificityScore=0.2) now reaches mapped status - Binary evidenceSupport: refs >= 2 → 1, else → 0 (enforces 2-ref requirement at scoring layer, not just prompt text; 1-ref hypotheses stay trace_only) - Parent-must-be-mapped invariant: post-validation pass downgrades mapped second_order/third_order whose dependsOnKey has no mapped parent - Lower pathScore threshold from 0.66 to 0.50 to allow barely-mapped pairs through to expanded path generation - Add 6 missing maritime chokepoints to CHOKEPOINT_MARKET_REGIONS - Bump IMPACT_EXPANSION_REGISTRY_VERSION v2 → v3 (invalidates stale LLM cache) - Prompt v3: explicit dependsOnKey pairing, 2-evidence citation rule, confidence calibration guidance, direct+second_order pair structure - Add scoringBreakdown (all hypotheses with scoring factors) and gateDetails (active thresholds) to debug artifact for observability feedback loop - Export buildImpactExpansionSystemPrompt, extractImpactRouteFacilityKey, extractImpactCommodityKey for testability - 8 new tests (T1-T8) covering all phase 2 changes; 111/111 pass	2026-03-24 08:31:41 +04:00
Elie Habib	2e0bc86d81	feat(forecast): add replayable deep forecast lifecycle (#2161) * feat(forecast): add replayable deep forecast lifecycle * fix(forecast): serialize replay snapshot market index	2026-03-23 23:59:21 +04:00
Elie Habib	cdc77bcbaa	feat(forecast): add queued deep forecast search (#2155) * feat(forecast): add queued deep forecast search * test(forecast): cover deep path scoring	2026-03-23 21:36:46 +04:00
Elie Habib	d29fd4e03f	fix(forecast): harden reliability recovery pipeline (#2149)	2026-03-23 19:28:26 +04:00
Elie Habib	636ace7b2c	feat(forecast): add impact expansion simulation layer (#2138) * feat(forecast): add impact expansion simulation layer * fix(forecast): correct impact bucket coherence gate	2026-03-23 15:19:06 +04:00
Elie Habib	00f9ce7c19	fix(forecast): preserve llm narratives on publish refresh (#2134)	2026-03-23 13:50:57 +04:00
Elie Habib	a202b8ebcc	feat(consumer-prices): global All view as default, market selector, per-market cache keys (#2128) * fix(seeders): apply gold standard TTL-extend+retry pattern to Aviation, NOTAM, Cyber, PositiveEvents * feat(consumer-prices): default to All — global comparison table as landing view - DEFAULT_MARKET = 'all' so panel opens with the global view - 🌍 All pill added at front of market bar - All view fetches all 9 markets in parallel via fetchAllMarketsOverview() and renders a comparison table: Market / Index / WoW / Spread / Updated - Clicking any market row drills into that market's full tab view - SINGLE_MARKETS exported for use in All-view iteration - CSS: .cp-global-table and row styles	2026-03-23 10:58:37 +04:00
Elie Habib	166fc58e92	fix(forecast): tighten state coherence and promotion (#2120) * fix(forecast): tighten state coherence and promotion * fix(forecast): harden coherence follow-ups	2026-03-23 10:19:17 +04:00
Elie Habib	1058b648a1	feat(forecast): derive market domains from state units (#2116) * feat(forecast): derive market domains from state units * fix(forecast): cover state-derived backfill path	2026-03-23 09:26:13 +04:00
Elie Habib	ea991dc7ce	fix(forecast): unclog market promotion and state selection (#2082)	2026-03-23 01:19:28 +04:00
Elie Habib	5e8a106999	feat(forecast): extract critical news signals (#2064) * feat(forecast): extract critical news signals * fix(forecast): harden critical signal extraction * feat(forecast): add structured urgent signal extraction * docs(env): document critical forecast llm overrides	2026-03-22 22:39:00 +04:00
Elie Habib	a24ea45983	feat(forecast): compress situations into state units (#2037)	2026-03-22 10:11:41 +04:00
Elie Habib	7eef3fd9ca	feat(forecast): enrich energy transmission signals (#2021)	2026-03-21 23:05:20 +04:00
Elie Habib	3b762492fe	feat(forecast): deepen market transmission simulation (#1996)	2026-03-21 17:16:31 +04:00
Elie Habib	41591e33a9	feat(forecast): add macro market signals (#1980 )	2026-03-21 12:26:52 +04:00
Elie Habib	5b987ea434	feat(forecast): drive simulation from market state (#1976 )	2026-03-21 11:09:04 +04:00
Elie Habib	3670716daa	feat(forecast): add market transmission state (#1971 )	2026-03-21 09:48:38 +04:00
Elie Habib	8d86607d21	feat(forecast): drive selection from causal memory (#1958 )	2026-03-21 01:20:42 +04:00
Elie Habib	f56f11a596	feat(forecast): add simulation memory replay state (#1945 )	2026-03-20 20:37:11 +04:00
Elie Habib	8e8db1b40f	feat(forecast): calibrate interaction effect promotion (#1936 )	2026-03-20 18:49:58 +04:00
Elie Habib	070248b792	fix(forecast): guarantee military forecast inclusion in publish selection pool (#1917 ) Military detector forecasts (ADS-B flight tracking + theater posture API) structurally score near zero on readiness metrics that require LLM-enriched caseFile content (supporting evidence, news headlines, calibration, triggers). This causes them to rank below the target count threshold every run despite a valid elevated posture signal. Add a domain guarantee post-pass after the 3 selection loops: if no military forecast was selected and we have room below MAX_TARGET_PUBLISHED_FORECASTS, inject the highest-scoring eligible military forecast. This does not displace any already-selected forecast and respects all existing family/situation caps. Diagnosis: Baltic theater at postureLevel='elevated' with 6 active flights generates a military forecast (prob=0.41, confidence=0.30, score=0.136) but gets buried behind 15+ well-grounded situation cluster forecasts at score 0.4+. Tests: 3 new assertions in 'military domain guarantee in publish selection'.	2026-03-20 14:04:08 +04:00
Elie Habib	01366fcc00	fix(forecast): block generic-actor cross-theater interactions + raise enrichment budget (#1916 ) * fix(forecast): block generic-actor cross-theater interactions + raise enrichment budget Root cause: actor registry uses name:category as key (e.g. "Incumbent leadership:state"), causing unrelated situations (Israel conflict, Taiwan political) to share the same actor ID and fire sharedActor=true in pushInteraction. This propagated into the reportable ledger and surfaced as junk effects like Israel→Taiwan at 80% confidence. Two-pronged fix: 1. Specificity gate in pushInteraction: sharedActor now requires avgSpecificity >= 0.75. Generic blueprint actors ("Incumbent leadership" ~0.68, "Civil protection authorities" ~0.73) no longer qualify as structural cross-situation links. Named domain-specific actors ("Threat actors:adversarial" ~0.95) continue to qualify. 2. MACRO_REGION_MAP + isCrossTheaterPair + gate in buildCrossSituationEffects: for cross-theater pairs (different macro-regions) with non-exempt channels, requires sharedActor=true AND avgActorSpecificity >= 0.90. Exempt channels: cyber_disruption, market_repricing (legitimately global). Same-macro-region pairs (Brazil/Mexico both AMERICAS) are unaffected. Verified against live run 1773983083084-bu6b1f: BLOCKED: Israel→Taiwan (MENA/EAST_ASIA, spec 0.68) BLOCKED: Israel→US political (MENA/AMERICAS, spec 0.68) BLOCKED: Cuba→Iran (AMERICAS/MENA, spec 0.73) BLOCKED: Brazil→Israel (AMERICAS/MENA, spec 0.85 < 0.90) ALLOWED: China→US cyber_disruption (exempt channel) ALLOWED: Brazil→Mexico (same AMERICAS) Also raises ENRICHMENT_COMBINED_MAX from 3 to 5 (total budget 6→8), targeting enrichedRate improvement from ~38% to ~60%. * fix(plans): fix markdown lint errors in forecast semantic quality plan * fix(plans): fix remaining markdown lint error in plan file	2026-03-20 13:26:58 +04:00
Elie Habib	46cd3728d6	fix(forecast): tighten reportable effect quality (#1902 ) * fix(forecast): tighten reportable effect quality * fix(forecast): preserve structural political carryover * chore(forecast): document effect grouping heuristics	2026-03-20 00:44:21 +04:00
Elie Habib	8768d10b7f	fix(forecast): tighten interaction semantics (#1896 ) * fix(forecast): tighten interaction semantics * fix(forecast): narrow maritime family inference * fix(forecast): keep full reportable interaction graph	2026-03-19 23:34:46 +04:00
Elie Habib	e434769e37	feat(forecast): add simulation action ledger (#1891 ) * feat(forecast): add simulation action ledger * fix(forecast): preserve directional interaction effects	2026-03-19 21:01:47 +04:00
Elie Habib	486f5f799f	fix(forecast): tighten family effect credibility (#1880 ) * fix(forecast): tighten family effect credibility * fix(forecast): respect domain effect thresholds	2026-03-19 18:24:40 +04:00
Elie Habib	08cc2723cc	fix(forecast): wire per-situation simulation into per-forecast worldState (#1879 ) buildForecastTraceArtifacts was building worldState after tracedPredictions, so simulation data was never available to buildForecastTraceRecord. Each forecast's caseFile.worldState had situationId/familyId/simulationSummary all undefined, making the 3-round MiroFish simulation invisible at the forecast level. Fix: - Compute worldState before tracing (so simulationState is ready) - Build forecastId → situationSimulation lookup from worldState.simulationState - Pass lookup into buildForecastTraceRecord; inject situationId, familyId, familyLabel, simulationSummary, simulationPosture, simulationPostureScore into caseFile.worldState for each matched forecast - Add regression assertion to forecast-trace-export tests All 194 forecast tests pass.	2026-03-19 17:19:49 +04:00
Elie Habib	2deccac691	fix(forecast): allocate publish output by family (#1868 ) * fix(forecast): allocate publish output by family * fix(forecast): backfill deferred family selections	2026-03-19 11:42:12 +04:00
Elie Habib	ee0f124b3f	feat(forecast): add family spillover engine (#1866 ) * feat(forecast): add family spillover engine * fix(forecast): require direct spillover links * fix(forecast): stabilize family spillover wiring	2026-03-19 10:52:06 +04:00
Elie Habib	0b338afed8	fix(forecast): calibrate simulation posture scoring (#1860 ) * fix(forecast): calibrate simulation posture scoring * fix(forecast): version and rebalance simulation scoring	2026-03-19 09:48:41 +04:00
Elie Habib	15e2a6fccb	feat(forecast): drive simulation rounds from actor actions (#1858 )	2026-03-19 09:08:04 +04:00
Elie Habib	214b17d757	fix(forecast): align published and candidate state surfaces (#1852 ) * fix(forecast): align published and candidate state surfaces * fix(forecast): preserve projected published situations	2026-03-19 08:24:26 +04:00
Elie Habib	568408b0ca	fix(forecast): tighten simulation effect links (#1851 )	2026-03-19 03:55:19 +04:00
Elie Habib	10958a397b	feat(forecast): synthesize reports from simulation state (#1850 )	2026-03-19 03:40:22 +04:00
Elie Habib	a40c0a11fb	feat(forecast): add simulation state transitions (#1847 )	2026-03-19 03:25:52 +04:00
Elie Habib	a3492e8b4e	fix(forecast): refine world-state situation clustering (#1843 )	2026-03-19 02:55:22 +04:00
Elie Habib	47e942011b	fix(forecast): improve situation-aware publish quality (#1840 )	2026-03-19 02:11:27 +04:00
Elie Habib	2b228da916	fix(forecast): dedupe situation-overlap forecasts (#1807 ) * fix(forecast): dedupe situation-overlap forecasts * fix(forecast): reuse situation clusters in publish flow * fix(forecast): reuse publish trace context	2026-03-18 17:57:03 +04:00
Elie Habib	e58a608262	fix(forecast): make trace writing single-writer (#1801 ) * fix(forecast): make trace writing single-writer * fix(forecast): preserve chained refresh requests	2026-03-18 11:00:43 +04:00
Elie Habib	527002f873	fix(forecast): improve trace enrichment diagnostics (#1797 ) * fix(forecast): avoid duplicate prior world-state read * feat(forecast): record llm enrichment failure reasons * fix(forecast): preserve latest pointer continuity fallback	2026-03-18 09:55:28 +04:00
Elie Habib	6b1ea49397	feat(forecast): add report continuity history (#1788 ) * feat(forecast): cluster situations in world state * feat(forecast): add report continuity history * fix(forecast): stabilize report continuity matching	2026-03-18 00:27:03 +04:00
Elie Habib	c1f8aa516b	feat(forecast): add situation clustering to world state (#1785 ) * feat(forecast): cluster situations in world state * fix(forecast): stabilize situation continuity ids	2026-03-17 22:47:58 +04:00
Elie Habib	3ba56997af	feat(forecast): add world-state report synthesis (#1780 )	2026-03-17 19:19:28 +04:00
Elie Habib	0296758398	feat(forecast): add world-state continuity to main (#1779 ) * feat(forecast): add actor continuity to world state * fix(forecast): report full actor continuity counts * feat(forecast): add branch continuity to world state	2026-03-17 18:49:58 +04:00
Elie Habib	8cdca53bd8	feat(forecast): persist run-level world state (#1773 ) * feat(forecast): persist run-level world state * fix(forecast): align world state artifacts	2026-03-17 18:21:56 +04:00
Elie Habib	e2f0811330	fix(forecast): tighten quality and enrichment balance (#1761 )	2026-03-17 14:03:13 +04:00
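
The two gates described in commit 01366fcc00 (#1916) can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the thresholds (0.75, 0.90) and the exempt channels come from the commit message, while the region table contents, the function signatures, the `political_pressure` channel name, and the handling of unmapped regions are assumptions made for the example.

```javascript
// Hypothetical region table: the real MACRO_REGION_MAP contents are not shown in the log.
const MACRO_REGION_MAP = {
  Israel: 'MENA',
  Iran: 'MENA',
  Taiwan: 'EAST_ASIA',
  China: 'EAST_ASIA',
  'United States': 'AMERICAS',
  Cuba: 'AMERICAS',
  Brazil: 'AMERICAS',
  Mexico: 'AMERICAS',
};

// Channels the commit message describes as legitimately global.
const EXEMPT_CHANNELS = new Set(['cyber_disruption', 'market_repricing']);

// Thresholds quoted in the commit message.
const SHARED_ACTOR_MIN_SPECIFICITY = 0.75;
const CROSS_THEATER_MIN_SPECIFICITY = 0.90;

// Gate 1: a shared actor only counts as a structural link if it is specific enough.
function qualifiesAsSharedActor(avgSpecificity) {
  return avgSpecificity >= SHARED_ACTOR_MIN_SPECIFICITY;
}

// Assumption: a pair with an unmapped region is treated as same-theater.
function isCrossTheaterPair(source, target) {
  const a = MACRO_REGION_MAP[source];
  const b = MACRO_REGION_MAP[target];
  return Boolean(a && b && a !== b);
}

// Gate 2: cross-theater pairs on non-exempt channels need a highly specific shared actor.
function allowCrossSituationEffect({ source, target, channel, avgActorSpecificity }) {
  if (!isCrossTheaterPair(source, target)) return true;
  if (EXEMPT_CHANNELS.has(channel)) return true;
  return (
    qualifiesAsSharedActor(avgActorSpecificity) &&
    avgActorSpecificity >= CROSS_THEATER_MIN_SPECIFICITY
  );
}

// Cases modeled on the verification list in the commit message.
const cases = [
  { source: 'Israel', target: 'Taiwan', channel: 'political_pressure', avgActorSpecificity: 0.68 },
  { source: 'Brazil', target: 'Israel', channel: 'political_pressure', avgActorSpecificity: 0.85 },
  { source: 'China', target: 'United States', channel: 'cyber_disruption', avgActorSpecificity: 0.68 },
  { source: 'Brazil', target: 'Mexico', channel: 'political_pressure', avgActorSpecificity: 0.68 },
];
for (const c of cases) {
  console.log(`${c.source} -> ${c.target}: ${allowCrossSituationEffect(c) ? 'ALLOWED' : 'BLOCKED'}`);
}
```

Keeping the exempt-channel check ahead of the specificity check mirrors the commit's description: cyber and market channels pass regardless of actor specificity, and same-region pairs never reach the stricter 0.90 threshold.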
