Files
worldmonitor/todos/023-complete-p2-simulation-llm-array-fields-unsanitized-r2.md
Elie Habib 01f6057389 feat(simulation): MiroFish Phase 2 — theater-limited simulation runner (#2220)
* feat(simulation): MiroFish Phase 2 — theater-limited simulation runner

Adds the simulation execution layer that consumes simulation-package.json
and produces simulation-outcome.json for maritime chokepoint + energy/logistics
theaters, closing the WorldMonitor → MiroFish handoff loop.

Changes:
- scripts/seed-forecasts.mjs: 2-round LLM simulation runner (prompt builders,
  JSON extractor, runTheaterSimulation, writeSimulationOutcome, task queue
  with NX dedup lock, runSimulationWorker poll loop)
- scripts/process-simulation-tasks.mjs: standalone worker entry point
- proto: GetSimulationOutcome RPC + make generate
- server/worldmonitor/forecast/v1/get-simulation-outcome.ts: RPC handler
- server/gateway.ts: slow tier for get-simulation-outcome
- api/health.js: simulationOutcomeLatest in STANDALONE + ON_DEMAND keys
- tests: 14 new tests for simulation runner functions

* fix(simulation): address P1/P2 code review findings from PR #2220

Security (P1 #018):
- sanitizeForPrompt() applied to all entity/seed fields interpolated into
  Round 1 prompt (entityId, class, stance, seedId, type, timing)
- sanitizeForPrompt() applied to actorId and entityIds in Round 2 prompt
- sanitizeForPrompt() + length caps applied to all LLM array fields written
  to R2 (dominantReactions, stabilizers, invalidators, keyActors, timingMarkers)

Validation (P1 #019):
- Added validateRunId() regex guard
- Applied in enqueueSimulationTask() and processNextSimulationTask() loop

Type safety (P1 #020):
- Added isOutcomePointer() and isPackagePointer() type guards in TS handlers
- Replaced unsafe as-casts with runtime-validated guards in both handlers

Correctness (P2 #022):
- Log warning when pkgPointer.runId does not match task runId

Architecture (P2 #024):
- isMaritimeChokeEnergyCandidate() accepts both flat and nested topBucketId
- Call site simplified to pass theater directly

Performance (P2 #025):
- SIMULATION_ROUND1_MAX_TOKENS raised 1800 to 2200
- Added max 3 initialReactions instruction to Round 1 prompt

Maintainability (P2 #026):
- Simulation pointer keys exported from server/_shared/cache-keys.ts
- Both TS handlers import from shared location

Documentation (P2 #027):
- Strengthened runId no-op description in proto and OpenAPI spec

* fix(todos): add blank lines around lists in markdown todo files

* style(api): reformat openapi yaml to match linter output

* test(simulation): add flat-shape filter test + getSimulationOutcome handler coverage

Two tests identified as missing during PR #2220 review:

1. isMaritimeChokeEnergyCandidate flat-shape tests — covers the || candidate.topBucketId
   normalization added in the P1/P2 review pass. The existing tests only used the nested
   marketContext.topBucketId shape; this adds the flat root-field shape that arrives from
   the simulation-package.json JSON (selectedTheaters entries have topBucketId at root).

2. getSimulationOutcome handler structural tests — verifies the isOutcomePointer guard,
   found:false NOT_FOUND return, found:true success path, note population on runId mismatch,
   and redis_unavailable error string. Follows the readSrc static-analysis pattern used
   elsewhere in server-handlers.test.mjs (handler imports Redis so full integration test
   would require a test Redis instance).
2026-03-25 13:55:59 +04:00

3.1 KiB

status, priority, issue_id, tags
status priority issue_id tags
complete p2 023
code-review
security
simulation-runner
data-integrity

LLM output arrays written to R2 without per-element sanitization or length limits

Problem Statement

In processNextSimulationTask, several LLM output arrays are written to R2 using only map(String).slice(0, N) — which ensures items are strings but applies no length cap per element and no sanitization. A single oversized or injection-containing LLM output item (e.g., a stabilizers entry of 50,000 characters) is written directly to R2 and later served to clients without truncation. Additionally, the timingMarkers sourced from result.round2?.paths?.[0] use a different (non-sanitized) path compared to the per-path timingMarkers processing that correctly applies sanitizeForPrompt.

Findings

F-1 (MEDIUM):

// scripts/seed-forecasts.mjs ~line 15766
dominantReactions: (result.round1?.dominantReactions || []).map(String).slice(0, 6),
stabilizers: (result.round2?.stabilizers || []).map(String).slice(0, 6),
invalidators: (result.round2?.invalidators || []).map(String).slice(0, 6),
keyActors: Array.isArray(p.keyActors) ? p.keyActors.map(String).slice(0, 6) : [],
// No per-element length limit or sanitization — each string can be arbitrarily long

F-2 (MEDIUM):

// ~line 15769 — timingMarkers from round2 paths[0] (different code path than per-path markers)
timingMarkers: (result.round2?.paths?.[0]?.timingMarkers || []).slice(0, 4),
// Individual marker objects NOT sanitized — but per-path timingMarkers at ~15757 DO sanitize

Proposed Solutions

dominantReactions: (result.round1?.dominantReactions || [])
  .map((s) => sanitizeForPrompt(String(s)).slice(0, 120)).slice(0, 6),
stabilizers: (result.round2?.stabilizers || [])
  .map((s) => sanitizeForPrompt(String(s)).slice(0, 120)).slice(0, 6),
invalidators: (result.round2?.invalidators || [])
  .map((s) => sanitizeForPrompt(String(s)).slice(0, 120)).slice(0, 6),
keyActors: Array.isArray(p.keyActors)
  ? p.keyActors.map((s) => sanitizeForPrompt(String(s)).slice(0, 80)).slice(0, 6)
  : [],
// For timingMarkers at ~15769 — apply same sanitization as per-path version:
timingMarkers: (result.round2?.paths?.[0]?.timingMarkers || []).slice(0, 4)
  .map((m) => ({ event: sanitizeForPrompt(m.event || '').slice(0, 80), timing: String(m.timing || 'T+0h').slice(0, 10) })),

Effort: Small | Risk: Low

Acceptance Criteria

  • dominantReactions, stabilizers, invalidators elements capped at 120 chars each with sanitizeForPrompt
  • keyActors elements capped at 80 chars each with sanitizeForPrompt
  • timingMarkers at the theater-result level uses the same sanitization as per-path version
  • Test: LLM output with a 10,000-char stabilizers[0] is truncated to ≤120 chars in R2 artifact

Technical Details

  • File: scripts/seed-forecasts.mjsprocessNextSimulationTask (~lines 15752, 15766-15769)

Work Log

  • 2026-03-24: Found by compound-engineering:review:security-sentinel in PR #2220 review