worldmonitor/scripts/_llm-json.mjs
Elie Habib 81d773d2bb feat(intelligence): DeepEar QW-1/QW-3/QW-4 — Polymarket context, dual-model routing, robust JSON (#2421)
* feat(intelligence): DeepEar QW-1/QW-3/QW-4 — Polymarket context injection, dual-model routing, robust JSON parsing

QW-1: Inject crowd-calibrated Polymarket/Kalshi odds into deduction prompts
- Fetch prediction:markets-bootstrap:v1 from Redis and keyword-score markets
  against user query; top-7 matches appended as structured context block
- Hash prediction context into cache key so odds movements trigger fresh LLM calls
- Sanitize market titles with sanitizeHeadline() before prompt injection
- deductSituation now uses callLlmReasoning (explicit reasoning-tier routing)
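A minimal sketch of the QW-1 selection step. It assumes markets are objects shaped `{ title, yesPrice }` and that "keyword-score" means counting query tokens that also appear in a market title. `tokenize`, `scoreMarkets`, `fnv1a` and `buildCacheKey` are hypothetical names. FNV-1a stands in for whatever hash the real code uses, and the `sanitizeHeadline()` step is omitted.

```javascript
// Hypothetical sketch: keyword-score prediction markets against a query,
// keep the top N, and fold the resulting context into the cache key.
// Market shape { title, yesPrice } and all names here are assumptions.

// Lowercase, split on non-alphanumerics, drop tokens of length <= 1.
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 1);
}

// Score = number of distinct query tokens that also occur in the title.
function scoreMarkets(query, markets, topN = 7) {
  const queryTokens = new Set(tokenize(query));
  return markets
    .map((market) => {
      const titleTokens = new Set(tokenize(market.title));
      let score = 0;
      for (const t of queryTokens) if (titleTokens.has(t)) score++;
      return { market, score };
    })
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score) // stable sort keeps input order on ties
    .slice(0, topN)
    .map((s) => s.market);
}

// 32-bit FNV-1a, a stand-in for the real hash function.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, '0');
}

// Hashing the odds context into the key means a price change forces a fresh LLM call.
function buildCacheKey(query, markets) {
  const context = markets
    .map((m) => `${m.title}: ${Math.round(m.yesPrice * 100)}% yes`)
    .join('\n');
  return `deduct:${fnv1a(query)}:${fnv1a(context)}`;
}
```

Keeping the query hash and the context hash as separate key segments makes it easy to see which of the two changed when a cache miss occurs.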

QW-3: Dual-model LLM routing via env vars
- callLlmTool (LLM_TOOL_PROVIDER/MODEL, default groq) for extraction tasks
- callLlmReasoning (LLM_REASONING_PROVIDER/MODEL, default openrouter) for synthesis
- Private callLlmProfile factory eliminates duplicated wrapper bodies
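The routing described above could look roughly like the following sketch. Only the env var prefixes and the `groq`/`openrouter` defaults come from the commit message. Reading `LLM_TOOL_PROVIDER/MODEL` as `LLM_TOOL_PROVIDER` plus `LLM_TOOL_MODEL` is an interpretation, and `PROVIDER_SET`'s contents, `makeLlmProfile`, and the injected `transport` are assumptions. The provider check mirrors the typo fallback that the later review fix describes.

```javascript
// Hypothetical sketch: env-driven dual-model routing through one shared factory.
// PROVIDER_SET contents and the returned profile shape are assumptions.
const PROVIDER_SET = new Set(['groq', 'openrouter']);

// Use the env value only if it names a known provider; otherwise warn and fall back.
function resolveProvider(envValue, defaultProvider, envName) {
  if (!envValue) return defaultProvider;
  if (PROVIDER_SET.has(envValue)) return envValue;
  console.warn(`[llm] unknown ${envName}="${envValue}", falling back to ${defaultProvider}`);
  return defaultProvider;
}

// One factory body, parameterized by env var names and a default provider.
function makeLlmProfile({ providerEnv, modelEnv, defaultProvider }, env) {
  const provider = resolveProvider(env[providerEnv], defaultProvider, providerEnv);
  const model = env[modelEnv] || undefined; // undefined: let the provider choose
  return {
    provider,
    model,
    // `transport` stands in for the real HTTP client call (hypothetical).
    call: (prompt, transport) => transport({ provider, model, prompt }),
  };
}

const TOOL_PROFILE = {
  providerEnv: 'LLM_TOOL_PROVIDER',
  modelEnv: 'LLM_TOOL_MODEL',
  defaultProvider: 'groq',
};
const REASONING_PROFILE = {
  providerEnv: 'LLM_REASONING_PROVIDER',
  modelEnv: 'LLM_REASONING_MODEL',
  defaultProvider: 'openrouter',
};
```

In this sketch, `callLlmTool` and `callLlmReasoning` would be `makeLlmProfile(TOOL_PROFILE, process.env).call` and `makeLlmProfile(REASONING_PROFILE, process.env).call`. Passing `env` explicitly keeps the factory testable without mutating `process.env`.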

QW-4: Shared _llm-json.mjs utility for robust LLM JSON parsing
- cleanJsonText: strips C-style comments and trailing commas before JSON.parse
- extractFirstJsonObject/Array: private extractFirstDelimited walker (was 2x23 LOC)
- Removes duplicate function definitions from seed-forecasts.mjs
- All JSON.parse call sites in tryParseImpactExpansionCandidate and
  tryParseStructuredCandidate now use cleanJsonText for comment/comma tolerance

* fix(intelligence): address PR #2421 review findings

- P1: getCachedJson(PREDICTION_BOOTSTRAP_KEY, true) — seed keys must be
  read raw (unprefixed); without this the prediction-odds block never
  activates outside production
- P2: validate LLM_TOOL_PROVIDER / LLM_REASONING_PROVIDER env value
  against PROVIDER_SET before use; log a warning and fall back to
  defaultProvider on typo instead of silently rerouting
- Filter: lower word-length threshold from >3 to >1 so short but
  critical terms like "war", "oil", "EU", "AI", "Fed", "USD" match
  prediction markets
- Suggestion: bucket yesPrice to 5% increments before building the
  context string to reduce 1%-movement cache churn (20 bands vs 100)
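The last two review fixes can be illustrated with a small sketch. `queryTerms`, `bucketYesPrice` and `contextLine` are hypothetical helpers. The bucketing is shown as round-to-nearest 5%, which is an assumption, because the commit message does not say whether prices are rounded or floored.

```javascript
// Hypothetical helpers illustrating the word-length filter and 5% price bucketing.

// Keep query terms strictly longer than `minExclusiveLen` characters.
function queryTerms(query, minExclusiveLen) {
  return query
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > minExclusiveLen);
}

// Snap a 0..1 probability to the nearest multiple of `step` percent.
function bucketYesPrice(yesPrice, step = 5) {
  return Math.round((yesPrice * 100) / step) * step;
}

// Only the bucketed price reaches the prompt, so moves within a band
// leave the context string (and any hash of it) unchanged.
function contextLine(title, yesPrice) {
  return `${title}: ${bucketYesPrice(yesPrice)}% yes`;
}
```

With the old `> 3` threshold, `queryTerms('Fed, oil and the EU', 3)` keeps nothing, while `> 1` keeps all five terms. Bucketing does not eliminate churn at band edges (a move from 62% to 63% still crosses from the 60 band to the 65 band), but it makes such crossings less frequent.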

* fix(intelligence): guard empty prediction context header when all titles sanitize to empty
2026-03-28 15:03:54 +04:00


/**
 * Shared LLM JSON extraction utilities.
 *
 * Handles the common failure modes of LLM JSON output:
 * - C-style line and block comments (// ..., /* ... *\/)
 * - Trailing commas before } and ]
 * - Partial outputs (brace/bracket extraction as fallback)
 *
 * Note: cleanJsonText is regex-based and does not track string boundaries,
 * so it also strips `//` and the rest of its line inside JSON string values,
 * truncating URLs such as "https://...". Accepted as a trade-off for LLM output.
 */
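
/**
 * Strip C-style comments and trailing commas from a JSON-like string.
 */
export function cleanJsonText(text) {
  return text
    .replace(/\/\*[\s\S]*?\*\//g, '')   // block comments (non-greedy, may span lines)
    .replace(/\/\/[^\n]*/g, '')         // line comments up to end of line
    .replace(/,(\s*[}\]])/g, '$1')      // trailing commas; whitespace is kept
    .trim();
}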
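
/**
 * Return the first balanced `open`...`close` span of the cleaned text,
 * ignoring delimiters inside double-quoted strings. If the span never
 * closes (e.g. truncated output), return everything from the first `open`
 * onward; if `open` does not occur at all, return ''.
 */
function extractFirstDelimited(text, open, close) {
  const cleaned = cleanJsonText(text);
  const start = cleaned.indexOf(open);
  if (start === -1) return '';
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < cleaned.length; i++) {
    const char = cleaned[i];
    if (escaped) { escaped = false; continue; }  // skip the char after a backslash
    if (char === '\\') { escaped = true; continue; }
    if (char === '"') { inString = !inString; continue; }
    if (inString) continue;                       // delimiters in strings don't count
    if (char === open) depth++;
    if (char === close && --depth === 0) return cleaned.slice(start, i + 1);
  }
  // Unbalanced input: return the partial tail so callers can still attempt a parse.
  return cleaned.slice(start);
}

export const extractFirstJsonObject = (text) => extractFirstDelimited(text, '{', '}');
export const extractFirstJsonArray = (text) => extractFirstDelimited(text, '[', ']');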
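If URL-bearing string values ever matter, one option is a string-aware cleaner that applies the same three rules only outside string literals. The sketch below is hypothetical: `cleanJsonTextStringAware` is not part of this module. It reuses the backslash/quote tracking that `extractFirstDelimited` already does, and it drops a trailing comma at the moment a `}` or `]` is emitted. Because comments are never copied to the output, a comma separated from its closer only by a comment is still removed.

```javascript
// Hypothetical string-aware variant of cleanJsonText: removes // and /* */
// comments and trailing commas only outside JSON string literals.
function cleanJsonTextStringAware(text) {
  let out = '';
  let inString = false;
  let escaped = false;
  for (let i = 0; i < text.length; i++) {
    const char = text[i];
    if (inString) {
      // Inside a string: copy verbatim, tracking escapes and the closing quote.
      out += char;
      if (escaped) escaped = false;
      else if (char === '\\') escaped = true;
      else if (char === '"') inString = false;
      continue;
    }
    if (char === '"') {
      inString = true;
      out += char;
      continue;
    }
    if (char === '/' && text[i + 1] === '/') {
      // Line comment: skip to the newline; the loop increment then copies it.
      const nl = text.indexOf('\n', i);
      if (nl === -1) break;
      i = nl - 1;
      continue;
    }
    if (char === '/' && text[i + 1] === '*') {
      // Block comment: skip through the closing */ (drop the rest if unterminated).
      const end = text.indexOf('*/', i + 2);
      if (end === -1) break;
      i = end + 1;
      continue;
    }
    if (char === '}' || char === ']') {
      // Outside a string, a comma followed only by whitespace is a trailing comma.
      out = out.replace(/,(\s*)$/, '$1');
    }
    out += char;
  }
  return out.trim();
}
```

On input such as `{ "url": "https://example.com/a", /* note */ "tags": ["a", "b",], }`, this variant keeps the URL intact, while the regex-based `cleanJsonText` would cut the line at `//example.com`. The cost is a hand-written scanner in place of three regex passes.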