* feat(simulation): MiroFish Phase 2 — theater-limited simulation runner

  Adds the simulation execution layer that consumes simulation-package.json and produces simulation-outcome.json for maritime chokepoint + energy/logistics theaters, closing the WorldMonitor → MiroFish handoff loop.

  Changes:
  - scripts/seed-forecasts.mjs: 2-round LLM simulation runner (prompt builders, JSON extractor, runTheaterSimulation, writeSimulationOutcome, task queue with NX dedup lock, runSimulationWorker poll loop)
  - scripts/process-simulation-tasks.mjs: standalone worker entry point
  - proto: GetSimulationOutcome RPC + make generate
  - server/worldmonitor/forecast/v1/get-simulation-outcome.ts: RPC handler
  - server/gateway.ts: slow tier for get-simulation-outcome
  - api/health.js: simulationOutcomeLatest in STANDALONE + ON_DEMAND keys
  - tests: 14 new tests for simulation runner functions

* fix(simulation): address P1/P2 code review findings from PR #2220

  Security (P1 #018):
  - sanitizeForPrompt() applied to all entity/seed fields interpolated into the Round 1 prompt (entityId, class, stance, seedId, type, timing)
  - sanitizeForPrompt() applied to actorId and entityIds in the Round 2 prompt
  - sanitizeForPrompt() + length caps applied to all LLM array fields written to R2 (dominantReactions, stabilizers, invalidators, keyActors, timingMarkers)

  Validation (P1 #019):
  - Added validateRunId() regex guard
  - Applied in enqueueSimulationTask() and the processNextSimulationTask() loop

  Type safety (P1 #020):
  - Added isOutcomePointer() and isPackagePointer() type guards in the TS handlers
  - Replaced unsafe as-casts with runtime-validated guards in both handlers

  Correctness (P2 #022):
  - Log a warning when pkgPointer.runId does not match the task runId

  Architecture (P2 #024):
  - isMaritimeChokeEnergyCandidate() accepts both flat and nested topBucketId
  - Call site simplified to pass the theater directly

  Performance (P2 #025):
  - SIMULATION_ROUND1_MAX_TOKENS raised from 1800 to 2200
  - Added a max-3 initialReactions instruction to the Round 1 prompt

  Maintainability (P2 #026):
  - Simulation pointer keys exported from server/_shared/cache-keys.ts
  - Both TS handlers import from the shared location

  Documentation (P2 #027):
  - Strengthened the runId no-op description in the proto and OpenAPI spec

* fix(todos): add blank lines around lists in markdown todo files

* style(api): reformat openapi yaml to match linter output

* test(simulation): add flat-shape filter test + getSimulationOutcome handler coverage

  Two tests identified as missing during the PR #2220 review:
  1. isMaritimeChokeEnergyCandidate flat-shape tests — cover the `|| candidate.topBucketId` normalization added in the P1/P2 review pass. The existing tests only used the nested marketContext.topBucketId shape; this adds the flat root-field shape that arrives from the simulation-package.json JSON (selectedTheaters entries have topBucketId at the root).
  2. getSimulationOutcome handler structural tests — verify the isOutcomePointer guard, the found:false NOT_FOUND return, the found:true success path, note population on runId mismatch, and the redis_unavailable error string. Follows the readSrc static-analysis pattern used elsewhere in server-handlers.test.mjs (the handler imports Redis, so a full integration test would require a test Redis instance).
| status | priority | issue_id | tags |
|---|---|---|---|
| complete | p2 | 025 | |
# Round 1 token budget (1800) may be too tight for fully-populated theaters
## Problem Statement
SIMULATION_ROUND1_MAX_TOKENS = 1800 is the output token cap for Round 1 LLM calls. With a fully populated theater (10 entities, 8 seeds, constraints, eval targets, a simulation requirement, plus the ~350-token JSON response template), the system prompt alone consumes ~1,005 tokens, leaving ~795 tokens for the response. A minimal valid Round 1 response (3 paths with labels, summaries, and 3 initialReactions each) costs ~700-900 tokens. At the high end of entity/seed density, the model will truncate its JSON mid-object, causing round1_parse_failed and marking the theater as failed silently, with no token-exhaustion signal in the diagnostic.
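For illustration, a minimal sketch (not project code) of why a truncated response surfaces as a generic parse failure rather than a distinct token-exhaustion signal: cutting a JSON document mid-object makes it unparseable, and a try/catch around the parse can only report that parsing failed.

```javascript
// Illustration only: a response cut off at the token cap fails JSON.parse,
// so the runner can only report round1_parse_failed, not "ran out of tokens".
const fullResponse = '{"paths": [{"label": "escalation", "initialReactions": ["a", "b"]}]}';
const truncated = fullResponse.slice(0, 40); // cut mid-object, as at the token cap

function tryParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch {
    return { ok: false, error: "round1_parse_failed" };
  }
}

console.log(tryParse(fullResponse).ok);  // true
console.log(tryParse(truncated).error);  // "round1_parse_failed"
```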
## Findings
F-1 (HIGH): Token budget vs. prompt size analysis:
- Static template text: ~350 tokens
- 10 entities at ~20 tokens each: ~200 tokens
- 8 event seeds at ~25 tokens each: ~200 tokens
- simulationRequirement + constraints + evalTargets: ~255 tokens
- Total input: ~1,005 tokens
- Output budget remaining: 795 tokens
- Minimal valid Round 1 response (3 paths, 3 reactions each): ~700-900 tokens
- Margin: -105 to +95 tokens — essentially zero at max density
SIMULATION_ROUND2_MAX_TOKENS = 2500 is adequate for Round 2 (shorter input, richer output).
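The budget arithmetic above can be sanity-checked with a small script; every per-item token count here is this review's assumption, not a measured value.

```javascript
// Back-of-envelope check mirroring the estimates above.
// All counts are assumed averages from this review, not measurements.
const estimate = {
  template: 350,        // static template text
  entities: 10 * 20,    // 10 entities at ~20 tokens each
  seeds: 8 * 25,        // 8 event seeds at ~25 tokens each
  requirementEtc: 255,  // simulationRequirement + constraints + evalTargets
};
const inputTokens = Object.values(estimate).reduce((a, b) => a + b, 0); // 1005
const outputBudget = (cap) => cap - inputTokens;

console.log(outputBudget(1800)); // 795 — inside the ~700-900 response cost range
console.log(outputBudget(2200)); // 1195 — comfortable margin
```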
## Proposed Solutions
### Option A: Raise `SIMULATION_ROUND1_MAX_TOKENS` to 2200 + cap initialReactions in prompt (Recommended)
```javascript
const SIMULATION_ROUND1_MAX_TOKENS = 2200; // was 1800

// In the buildSimulationRound1SystemPrompt INSTRUCTIONS section, add:
// - Maximum 3 initialReactions per path
```
This provides a 1,195-token output margin (2200 - 1005), which comfortably fits 3 paths × 3 reactions. The initialReactions cap aligns with existing behavior (only 3 are used in Round 2 path summaries).
Effort: Trivial | Risk: Very Low — increases LLM output budget, no structural change
### Option B: Dynamic token calculation based on entity/seed count
Calculate prompt token estimate and adjust maxTokens accordingly. More precise but adds complexity with no meaningful benefit given the fixed slice limits.
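For completeness, a hedged sketch of what Option B might look like, reusing the assumed per-item estimates from the findings. The constant names and the helper are hypothetical, not taken from the codebase.

```javascript
// Sketch of Option B (not implemented). BASE covers the static template
// (~350) plus requirement/constraints/evalTargets (~255); per-item costs
// are the assumed averages from the findings above.
const BASE = 605;
const PER_ENTITY = 20;
const PER_SEED = 25;
const RESPONSE_CEILING = 900; // high end of a minimal valid Round 1 response

function round1MaxTokens(entityCount, seedCount, safetyMargin = 300) {
  const promptEstimate = BASE + entityCount * PER_ENTITY + seedCount * PER_SEED;
  return promptEstimate + RESPONSE_CEILING + safetyMargin;
}

console.log(round1MaxTokens(10, 8)); // 2205 — at max density, close to the fixed 2200 cap
```

At the fixed slice limits (10 entities, 8 seeds) this lands almost exactly on Option A's constant, which is why the dynamic version adds complexity without meaningful benefit.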
## Acceptance Criteria
- `SIMULATION_ROUND1_MAX_TOKENS` raised from 1800 to 2200
- INSTRUCTIONS block in `buildSimulationRound1SystemPrompt` includes "- Maximum 3 initialReactions per path"
- Existing tests pass (prompt builder tests check content, not token count)
## Technical Details
- File: `scripts/seed-forecasts.mjs` — `SIMULATION_ROUND1_MAX_TOKENS` (~line 38), `buildSimulationRound1SystemPrompt` INSTRUCTIONS section (~line 15445)
## Work Log
- 2026-03-24: Found by compound-engineering:review:performance-oracle in PR #2220 review