worldmonitor/todos/025-complete-p2-simulation-round1-token-budget-too-tight.md
Elie Habib 01f6057389 feat(simulation): MiroFish Phase 2 — theater-limited simulation runner (#2220)
* feat(simulation): MiroFish Phase 2 — theater-limited simulation runner

Adds the simulation execution layer that consumes simulation-package.json
and produces simulation-outcome.json for maritime chokepoint + energy/logistics
theaters, closing the WorldMonitor → MiroFish handoff loop.

Changes:
- scripts/seed-forecasts.mjs: 2-round LLM simulation runner (prompt builders,
  JSON extractor, runTheaterSimulation, writeSimulationOutcome, task queue
  with NX dedup lock, runSimulationWorker poll loop)
- scripts/process-simulation-tasks.mjs: standalone worker entry point
- proto: GetSimulationOutcome RPC + make generate
- server/worldmonitor/forecast/v1/get-simulation-outcome.ts: RPC handler
- server/gateway.ts: slow tier for get-simulation-outcome
- api/health.js: simulationOutcomeLatest in STANDALONE + ON_DEMAND keys
- tests: 14 new tests for simulation runner functions

* fix(simulation): address P1/P2 code review findings from PR #2220

Security (P1 #018):
- sanitizeForPrompt() applied to all entity/seed fields interpolated into
  Round 1 prompt (entityId, class, stance, seedId, type, timing)
- sanitizeForPrompt() applied to actorId and entityIds in Round 2 prompt
- sanitizeForPrompt() + length caps applied to all LLM array fields written
  to R2 (dominantReactions, stabilizers, invalidators, keyActors, timingMarkers)

Validation (P1 #019):
- Added validateRunId() regex guard
- Applied in enqueueSimulationTask() and processNextSimulationTask() loop

Type safety (P1 #020):
- Added isOutcomePointer() and isPackagePointer() type guards in TS handlers
- Replaced unsafe as-casts with runtime-validated guards in both handlers

Correctness (P2 #022):
- Log warning when pkgPointer.runId does not match task runId

Architecture (P2 #024):
- isMaritimeChokeEnergyCandidate() accepts both flat and nested topBucketId
- Call site simplified to pass theater directly

Performance (P2 #025):
- SIMULATION_ROUND1_MAX_TOKENS raised from 1800 to 2200
- Added max 3 initialReactions instruction to Round 1 prompt

Maintainability (P2 #026):
- Simulation pointer keys exported from server/_shared/cache-keys.ts
- Both TS handlers import from shared location

Documentation (P2 #027):
- Strengthened runId no-op description in proto and OpenAPI spec

* fix(todos): add blank lines around lists in markdown todo files

* style(api): reformat openapi yaml to match linter output

* test(simulation): add flat-shape filter test + getSimulationOutcome handler coverage

Two tests identified as missing during PR #2220 review:

1. isMaritimeChokeEnergyCandidate flat-shape tests — covers the || candidate.topBucketId
   normalization added in the P1/P2 review pass. The existing tests only used the nested
   marketContext.topBucketId shape; this adds the flat root-field shape that arrives from
   the simulation-package.json JSON (selectedTheaters entries have topBucketId at root).

2. getSimulationOutcome handler structural tests — verifies the isOutcomePointer guard,
   found:false NOT_FOUND return, found:true success path, note population on runId mismatch,
   and redis_unavailable error string. Follows the readSrc static-analysis pattern used
   elsewhere in server-handlers.test.mjs (handler imports Redis so full integration test
   would require a test Redis instance).
2026-03-25 13:55:59 +04:00


status: complete
priority: p2
issue_id: 025
tags: code-review, performance, simulation-runner, llm

Round 1 token budget (1800) may be too tight for fully-populated theaters

Problem Statement

SIMULATION_ROUND1_MAX_TOKENS = 1800 is the output token cap for Round 1 LLM calls. With a fully-populated theater (10 entities, 8 seeds, constraints, eval targets, simulation requirement, plus the ~350-token JSON response template), the system prompt alone consumes ~1,005 tokens, leaving ~795 tokens for the response. A minimal valid Round 1 response (3 paths with labels, summaries, and 3 initialReactions each) costs ~700-900 tokens. At the high end of entity/seed density, the model will truncate its JSON mid-object, causing round1_parse_failed and marking the theater as failed, silently and with no token-exhaustion signal in the diagnostic.
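The silent-failure mode could be made diagnosable even before raising the cap. A minimal sketch (helper name, stop-reason string, and error strings are assumptions, not the runner's actual diagnostics) that separates token exhaustion from genuinely malformed JSON by checking the stop reason reported by the LLM client:

```javascript
// Hypothetical sketch: classify a Round 1 parse failure using the stop
// reason from the LLM client, so hitting the output cap surfaces as
// token exhaustion instead of a generic round1_parse_failed.
function classifyRound1Response(stopReason, rawText) {
  try {
    return { ok: true, parsed: JSON.parse(rawText) };
  } catch {
    // Truncation at the cap almost always leaves unterminated JSON behind.
    const reason =
      stopReason === "max_tokens"
        ? "round1_token_exhausted"
        : "round1_parse_failed";
    return { ok: false, reason };
  }
}

// A response cut off mid-object at the token cap:
classifyRound1Response("max_tokens", '{"paths":[{"label":"A","summa');
// → { ok: false, reason: "round1_token_exhausted" }
```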

Findings

F-1 (HIGH): Token budget vs. prompt size analysis:

  • Static template text: ~350 tokens
  • 10 entities at ~20 tokens each: ~200 tokens
  • 8 event seeds at ~25 tokens each: ~200 tokens
  • simulationRequirement + constraints + evalTargets: ~255 tokens
  • Total input: ~1,005 tokens
  • Output budget remaining: 795 tokens
  • Minimal valid Round 1 response (3 paths, 3 reactions each): ~700-900 tokens
  • Margin: -105 to +95 tokens — essentially zero at max density

SIMULATION_ROUND2_MAX_TOKENS = 2500 is adequate for Round 2 (shorter input, richer output).
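The F-1 arithmetic can be written out directly. A small sketch (the constants are the F-1 estimates, not measured token counts, and the helper name is hypothetical):

```javascript
// Token-budget arithmetic from F-1 (all per-item costs are estimates).
const STATIC_TEMPLATE_TOKENS = 350;
const TOKENS_PER_ENTITY = 20;
const TOKENS_PER_SEED = 25;
const REQUIREMENT_BLOCK_TOKENS = 255; // simulationRequirement + constraints + evalTargets

function estimateRound1Margin(maxTokens, entityCount, seedCount, responseCost) {
  const inputEstimate =
    STATIC_TEMPLATE_TOKENS +
    entityCount * TOKENS_PER_ENTITY +
    seedCount * TOKENS_PER_SEED +
    REQUIREMENT_BLOCK_TOKENS;
  return maxTokens - inputEstimate - responseCost;
}

estimateRound1Margin(1800, 10, 8, 900); // → -105: over budget at max density
estimateRound1Margin(1800, 10, 8, 700); // → 95: barely fits at the low end
```

The -105 to +95 spread reproduces the margin row in the F-1 breakdown.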

Proposed Solutions

Option A: Raise the Round 1 cap and bound initialReactions

const SIMULATION_ROUND1_MAX_TOKENS = 2200; // was 1800

// In buildSimulationRound1SystemPrompt INSTRUCTIONS section, add:
// - Maximum 3 initialReactions per path

This provides a 1,195-token output margin (2200 - 1005) which comfortably fits 3 paths × 3 reactions. The initialReactions cap aligns with existing behavior (only 3 are used in Round 2 path summaries).

Effort: Trivial | Risk: Very Low — increases LLM output budget, no structural change

Option B: Dynamic token calculation based on entity/seed count

Calculate prompt token estimate and adjust maxTokens accordingly. More precise but adds complexity with no meaningful benefit given the fixed slice limits.
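For concreteness, one shape Option B could take (helper name, field names, and the response budget are assumptions, reusing the F-1 per-item estimates):

```javascript
// Hypothetical dynamic cap: scale maxTokens with theater density rather
// than using a fixed SIMULATION_ROUND1_MAX_TOKENS constant.
function round1MaxTokens(theater) {
  const inputEstimate =
    350 + // static template text
    (theater.entities?.length ?? 0) * 20 +
    (theater.eventSeeds?.length ?? 0) * 25 +
    255; // simulationRequirement + constraints + evalTargets
  const RESPONSE_BUDGET = 1200; // headroom for 3 paths × 3 reactions
  return inputEstimate + RESPONSE_BUDGET;
}

round1MaxTokens({ entities: new Array(10), eventSeeds: new Array(8) });
// → 2205, nearly identical to Option A's fixed 2200 at max density
```

That the dynamic result lands within 5 tokens of the fixed cap at max density is why the added complexity buys no meaningful benefit here.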

Acceptance Criteria

  • SIMULATION_ROUND1_MAX_TOKENS raised from 1800 to 2200
  • INSTRUCTIONS block in buildSimulationRound1SystemPrompt includes "- Maximum 3 initialReactions per path"
  • Existing tests pass (prompt builder tests check content, not token count)
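The second criterion can be checked with the same content-based assertion style the existing prompt-builder tests use. The builder below is a stub standing in for the real buildSimulationRound1SystemPrompt in seed-forecasts.mjs; only the assertion shape is the point:

```javascript
// Stub of the real prompt builder, reduced to its INSTRUCTIONS block, to
// illustrate the content assertion the acceptance criterion implies.
function buildSimulationRound1SystemPromptStub() {
  return [
    "INSTRUCTIONS",
    "- Respond with valid JSON only",
    "- Maximum 3 initialReactions per path", // the new instruction under test
  ].join("\n");
}

const prompt = buildSimulationRound1SystemPromptStub();
prompt.includes("- Maximum 3 initialReactions per path"); // → true
```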

Technical Details

  • File: scripts/seed-forecasts.mjs (SIMULATION_ROUND1_MAX_TOKENS at ~line 38; buildSimulationRound1SystemPrompt INSTRUCTIONS section at ~line 15445)

Work Log

  • 2026-03-24: Found by compound-engineering:review:performance-oracle in PR #2220 review