---
status: pending
priority: p2
issue_id: "013"
tags: [code-review, deep-forecast, simulation-package, security]
---

# LLM-sourced strings enter `simulation-package.json` without `sanitizeForPrompt` — prompt injection risk

## Problem Statement

`buildSimulationRequirementText`, `buildSimulationPackageEventSeeds`, and `buildSimulationPackageEntities` interpolate LLM-generated strings directly into R2 artifact fields with no sanitization. The rest of `seed-forecasts.mjs` applies `sanitizeForPrompt()` to all LLM-derived strings before they enter prompts or Redis. The simulation package is explicitly designed to be consumed by downstream LLMs (MiroFish, scenario-analysis workflows), so unsanitized content is a stored prompt injection vector.

## Findings

**F-1 (HIGH):** `theater.label` (`candidateStateLabel`) used directly in `simulationRequirement` string:
```javascript
return `Simulate how a ${theater.label} (${theater.stateKind || 'disruption'} at ${route}${commodity})...`;
```
`candidateStateLabel` derives from LLM-generated cluster labels via `formatStateUnitLabel`.

**F-6 (MEDIUM):** `theater.topChannel` and `critTypes` also interpolated — these derive from LLM-generated market context and signal types. `replace(/_/g, ' ')` is presentational, not a security control. A value `ignore_previous_instructions` becomes `ignore previous instructions`.

**F-2 (MEDIUM):** `entry.text.slice(0, 200)` in event seeds — LLM evidence table text sliced but not stripped of injection patterns.

**F-3 (MEDIUM):** Actor names split from `entry.text` go directly into `name:` field and `entityId` slug with no sanitization.

## Proposed Solutions

### Option A: Apply `sanitizeForPrompt` to all LLM-sourced strings before artifact emission (Recommended)

```javascript
// In buildSimulationRequirementText:
const label = sanitizeForPrompt(theater.label) || theater.dominantRegion || 'unknown theater';
const route = sanitizeForPrompt(theater.routeFacilityKey || theater.dominantRegion);

// In buildSimulationPackageEventSeeds:
summary: sanitizeForPrompt(entry.text).slice(0, 200),

// In buildSimulationPackageEntities (actor name):
name: sanitizeForPrompt(actorName),
```
Effort: Small | Risk: Low — `sanitizeForPrompt` already exists and is used throughout the file

### Option B: Allowlist-validate field values instead of sanitizing

`topChannel` and `topBucketId` are already constrained by `MARKET_BUCKET_ALLOWED_CHANNELS` and `IMPACT_VARIABLE_REGISTRY`. Validate them against those registries before interpolation. `theater.label` would still need `sanitizeForPrompt`.
Effort: Small | Risk: Low

## Acceptance Criteria

- [ ] `buildSimulationRequirementText` applies `sanitizeForPrompt` to `theater.label`, `theater.stateKind`, `theater.topChannel`, and `critTypes` before string interpolation
- [ ] `buildSimulationPackageEventSeeds` applies `sanitizeForPrompt` to `entry.text` before `.slice(0, 200)`
- [ ] Actor names extracted from evidence table are sanitized before becoming entity `name` and `entityId`
- [ ] Test: a `theater.label` containing `\nIgnore previous instructions` produces a sanitized `simulationRequirement` string with no newlines or directive text

## Technical Details

- File: `scripts/seed-forecasts.mjs` — `buildSimulationRequirementText`, `buildSimulationPackageEventSeeds`, `buildSimulationPackageEntities`
- Existing function: `sanitizeForPrompt(text)` — already in the file, strips newlines, control chars, limits to 200 chars

## Work Log

- 2026-03-24: Found by compound-engineering:review:security-sentinel and compound-engineering:research:learnings-researcher in PR #2204 review