worldmonitor

mirror of https://github.com/koala73/worldmonitor.git synced 2026-04-25 17:14:57 +02:00

Author	SHA1	Message	Date
Elie Habib	305dc5ef36	feat(digest-dedup): Phase A — embedding-based dedup scaffolding (no-op) (#3200 ) * feat(digest-dedup): Phase A — embedding-based dedup scaffolding (no-op) Replaces the inline Jaccard story-dedup in seed-digest-notifications with an orchestrator that can run Jaccard, shadow, or full embedding modes. Ships with DIGEST_DEDUP_MODE=jaccard as the default so production behaviour is unchanged until Phase C shadow + Phase D flip. New modules (scripts/lib/): - brief-dedup-consts.mjs tunables + cache prefix + __constants bag - brief-dedup-jaccard.mjs verbatim 0.55-threshold extract (fallback) - entity-gazetteer.mjs cities/regions gazetteer + common-caps - brief-embedding.mjs OpenRouter /embeddings client with Upstash cache, all-or-nothing timeout, cosineSimilarity - brief-dedup-embed.mjs complete-link clustering + entity veto (pure) - brief-dedup.mjs orchestrator, env read at call entry, shadow archive, structured log line Operator tools (scripts/tools/): - calibrate-dedup-threshold.mjs offline calibration runner + histogram - golden-pair-validator.mjs live-embedder drift detector (nightly CI) - shadow-sample.mjs Sample A/B CSV emitter over SCAN archive Tests: - brief-dedup-jaccard.test.mjs migrated from regex-harness to direct import plus orchestrator parity tests (22) - brief-dedup-embedding.test.mjs 9 plan scenarios incl. 10-permutation property test, complete-link non-chain (21) - brief-dedup-golden.test.mjs 20-pair mocked canary (21) Workflows: - .github/workflows/dedup-golden-pairs.yml nightly live-embedder canary (07:17 UTC), opens issue on drift Deviation from plan: the shouldVeto("Iran closes Hormuz", "Tehran shuts Hormuz") case can't return true under a single coherent classification (country-in-A vs capital-in-B sit on different sides of the actor/location boundary). Gazetteer follows the plan's "countries are actors" intent; the test is updated to assert false with a comment pointing at the irreducible capital-country coreference limitation. 
Verification: - npm run test:data 5825/5825 pass - tests/edge-functions 171/171 pass - typecheck + typecheck:api clean - biome check on new files clean - lint:md 0 errors Phase B (calibration), Phase C (shadow), and Phase D (flip) are subsequent PRs. * refactor(digest-dedup): address review findings 193-199 Fresh-eyes review found 3 P1s, 3 P2s, and a P3 bundle across kieran-typescript, security-sentinel, performance-oracle, architecture- strategist, and code-simplicity reviewers. Fixes below; all 64 dedup tests + 5825 data tests + 171 edge-function tests still green. P1 #193 - dedup regex + redis pipeline duplication - Extract defaultRedisPipeline into scripts/lib/_upstash-pipeline.mjs; both orchestrator and embedding client import from there. - normalizeForEmbedding now delegates to stripSourceSuffix from the Jaccard module so the outlet allow-list is single-sourced. P1 #194 - embedding timeout floor + negative-budget path - callEmbeddingsApi throws EmbeddingTimeoutError when timeoutMs<=0 instead of opening a doomed 250ms fetch. - Removed Math.max(250, ...) floor that let wall-clock cap overshoot. P1 #195 - dead env getters - Deleted getMode / isRemoteEmbedEnabled / isEntityVetoEnabled / getCosineThreshold / getWallClockMs from brief-dedup-consts.mjs (zero callers; orchestrator reimplements inline). P2 #196 - orchestrator cleanup bundle - Removed re-exports at bottom of brief-dedup.mjs. - Extracted materializeCluster into brief-dedup-jaccard.mjs; both the fallback and orchestrator use the shared helper. - Deleted clusterWithEntityVeto wrapper; orchestrator inlines the vetoFn wiring at the single call site. - Shadow mode now runs Jaccard exactly once per tick (was twice). - Fallback warn line carries reason=ErrorName so operators can filter timeout vs provider vs shape errors. - Invalid DIGEST_DEDUP_MODE values emit a warn once per run (vs silently falling to jaccard). 
P2 #197 - workflow + shadow-sample hardening - dedup-golden-pairs.yml body composition no longer relies on a heredoc that would command-substitute validator stdout. Switched to printf with sanitised LOG_TAIL (printable ASCII only) and --body-file so crafted fixture text cannot escape into the runner. - shadow-sample.mjs Upstash helper enforces a hardcoded command allowlist (SCAN | GET | EXISTS). P2 #198 - test + observability polish - Scenarios 2 and 3 deep-equal returned clusters against the Jaccard expected shape, not just length. Also assert the reason= field. P3 #199 - nits - Removed __constants test-bag; jaccard tests use named imports. - Renamed deps.apiKey to deps._apiKey in embedding client. - Added @pre JSDoc on diffClustersByHash about unique-hash contract. - Deferred: mocked golden-pair test removal, gazetteer JSON migration, scripts/tools AGENTS.md doc note. Todos 193-199 moved from pending to complete. Verification: - npm run test:data 5825/5825 pass - tests/edge-functions 171/171 pass - typecheck + typecheck:api clean - biome check on changed files clean * fix(digest-dedup): address Greptile P2 findings on PR #3200 1. brief-embedding.mjs: wrap fetch lookup as `(...args) => globalThis.fetch(...args)` instead of aliasing bare `fetch`. Aliasing captures the binding at module-load time, so later instrumentation / Edge-runtime shims don't see the wrapper — same class of bug as the banned `fetch.bind(globalThis)` pattern flagged in AGENTS.md. 2. dedup-golden-pairs.yml: `gh issue create --label "..." || true` silently swallowed the failure when any of dedup/canary/p1 labels didn't pre-exist, breaking the drift alert channel while leaving the job red in the Actions UI. Switched to repeated `--label` flags + `--create-label` so any missing label is auto-created on first drift, and dropped the `|| true` so a legitimate failure (network / auth) surfaces instead of hiding. 
Both fixes are P2-style per Greptile (confidence 5/5, no P0/P1); applied pre-merge so the nightly canary is usable from day one. * fix(digest-dedup): two P1s found on PR #3200 P1 — canary classifier must match production Nightly golden-pair validator was checking a hardcoded threshold (default 0.60) and always applied the entity veto, while the actual dedup path at runtime reads DIGEST_DEDUP_COSINE_THRESHOLD and DIGEST_DEDUP_ENTITY_VETO_ENABLED from env at every call. A Phase C/D env flip could make the canary green while prod was wrong or red while prod was healthy, defeating the whole point of a drift detector. Fix: - golden-pair-validator.mjs now calls readOrchestratorConfig(process.env) — the same helper the orchestrator uses — so any classifier knob added later is picked up automatically. The threshold and veto- enabled flags are sourced from env by default; a --threshold CLI flag still overrides for manual calibration sweeps. - dedup-golden-pairs.yml sources DIGEST_DEDUP_COSINE_THRESHOLD and DIGEST_DEDUP_ENTITY_VETO_ENABLED from GitHub repo variables (vars.), which operators must keep in lockstep with Railway. The workflow_dispatch threshold input now defaults to empty; the scheduled canary always uses the production-parity config. - Validator log line prints the effective config + source so nightly output makes the classifier visible. P1 — shadow archive writes were fail-open `defaultRedisPipeline()` returns null on timeout / auth / HTTP failure. `writeShadowArchive()` only had a try/catch, so the null result was silently treated as success. A Phase C rollout could log clean "mode=shadow … disagreements=X" lines every tick while the Upstash archive received zero writes — and Sample B labelling would then find no batches, silently killing calibration. Fix: - writeShadowArchive now inspects the pipeline return. null result, non-array response, per-command {error}, or a cell without {result: "OK"} all return {ok: false, reason}. 
- Orchestrator emits a warn line with the failure reason, and the structured log line carries archive_write=ok|failed so operators can grep for failed ticks. - Regression test in brief-dedup-embedding.test.mjs simulates the null-pipeline contract and asserts both the warn and the structured field land. Verification: - test:data 5825/5825 pass - dedup suites 65/65 pass (new: archive-fail regression) - typecheck + api clean - biome check clean on changed files fix(digest-dedup): two more P1s found on PR #3200 P1 — canary must also honour DIGEST_DEDUP_MODE + REMOTE_EMBED_ENABLED The prior round fixed the threshold/veto knobs but left the canary running embeddings regardless of whether production could actually reach the embed path. If Railway has DIGEST_DEDUP_MODE=jaccard or DIGEST_DEDUP_REMOTE_EMBED_ENABLED=0, production never calls the classifier, so a drift signal is meaningless — or worse, a live OpenRouter issue flags the canary while prod is obliviously fine. Fix: - golden-pair-validator.mjs reads mode + remoteEmbedEnabled from the same readOrchestratorConfig() helper the orchestrator uses. When either says "embed path inactive in prod", the validator logs an explicit skip line and exits 0. The nightly workflow then shows green, which is the correct signal ("nothing to drift against"). - A --force CLI flag remains for manual dispatch during staged rollouts. - dedup-golden-pairs.yml sources DIGEST_DEDUP_MODE and DIGEST_DEDUP_REMOTE_EMBED_ENABLED from GitHub repo variables alongside the threshold and veto-enabled knobs, so all four classifier gates stay in lockstep with Railway. - Validator log line now prints mode + remoteEmbedEnabled so the canary output surfaces which classifier it validated. P1 — shadow-sample Sample A was biased by SCAN order enumerate-and-dedup added every seen pair to a dedup key BEFORE filtering by agreement. 
If the same pair appeared in an agreeing batch first and a disagreeing batch later, the disagreeing occurrence was silently dropped. SCAN order is unspecified, so Sample A could omit real disagreement pairs. Fix: - Extracted the enumeration into a pure `enumeratePairs(archives, mode)` export so the logic is testable. Mode filter runs BEFORE the dedup check: agreeing pairs are skipped entirely under --mode disagreements, so any later disagreeing occurrence can still claim the dedup slot. - Added tests/brief-dedup-shadow-sample.test.mjs with 5 regression cases: agreement-then-disagreement, reversed order (symmetry), always-agreed omission, population enumeration, cross-batch dedup. - isMain guard added so importing the module for tests does not kick off the CLI scan path. Verification: - test:data 5825/5825 pass - dedup suites 70/70 pass (5 new shadow-sample regressions) - typecheck + api clean - biome check clean on changed files Operator follow-up before Phase C: Set all FOUR dedup repo variables in GitHub alongside Railway: DIGEST_DEDUP_MODE, DIGEST_DEDUP_REMOTE_EMBED_ENABLED, DIGEST_DEDUP_COSINE_THRESHOLD, DIGEST_DEDUP_ENTITY_VETO_ENABLED * refactor(digest-dedup): Railway is the single source of truth for dedup config Fair user pushback: asking operators to set four DIGEST_DEDUP_* values in BOTH Railway (where the cron runs) AND GitHub repo variables (where the canary runs) is architectural debt. Two copies of the same truth will always drift. Solution: the digest cron publishes its resolved config to Upstash on every tick under brief:dedup:config:v1 (2h TTL). The nightly golden-pair canary reads that key instead of env vars. Railway stays the sole source of truth; no parallel repo variables to maintain. A missing/expired key signals "cron hasn't run" and the canary skips with exit 0 — better than validating against hardcoded defaults that might diverge from prod. Changes: - brief-dedup-consts.mjs: new ACTIVE_CONFIG_KEY + TTL constants. 
- brief-dedup.mjs: new publishActiveConfig() fires at the start of every deduplicateStories() call (before the mode short-circuit, so jaccard ticks also publish a "mode=jaccard" signal the canary can read). Fire-and-forget; archive-write error semantics still apply if the operator wants stricter tracking. - golden-pair-validator.mjs: removed readOrchestratorConfig(env) path. Now calls fetchActiveConfigFromUpstash() and either validates against that config, skips when the embed path is inactive, or skips when the key is missing (with --force override for manual dispatch). - dedup-golden-pairs.yml: dropped the four DIGEST_DEDUP_* env lines and the corresponding repo-variable dependency. Only the three Upstash + OpenRouter secrets remain. - tests: two new regressions assert config is published on every tick (shadow AND jaccard modes) with the right shape + TTL. Operator onboarding now takes one action: set the four DIGEST_DEDUP_* variables on the Railway seed-digest-notifications service. Nothing to set in GitHub beyond the existing OPENROUTER_API_KEY / UPSTASH_* secrets. Verification: - test:data 5825/5825 pass - dedup suites 72/72 pass (2 new config-publish regressions) - typecheck + api clean - biome check clean on changed files * refactor(digest-dedup): ship embed directly, drop phases/canary/shadow User feedback: "i dont need multiple phases and shit, we go directly to embed". Fair. Ripping out the overengineering I accumulated: DELETED - .github/workflows/dedup-golden-pairs.yml (nightly canary) - scripts/tools/golden-pair-validator.mjs - scripts/tools/shadow-sample.mjs - scripts/tools/calibrate-dedup-threshold.mjs - tests/fixtures/brief-dedup-golden-pairs.json - tests/brief-dedup-golden.test.mjs - tests/brief-dedup-shadow-sample.test.mjs SIMPLIFIED - brief-dedup.mjs: removed shadow mode, publishActiveConfig, writeShadowArchive, diffClustersByHash, jaccardRepsToClusterHashes, and the DIGEST_DEDUP_REMOTE_EMBED_ENABLED knob. 
MODE is now binary: `embed` (default) or `jaccard` (instant kill switch). - brief-dedup-consts.mjs: dropped SHADOW_ARCHIVE_, ACTIVE_CONFIG_. - Default flipped: DIGEST_DEDUP_MODE unset = embed (prod path). Railway deploy with OPENROUTER_API_KEY set = embeddings live on next cron tick. Set MODE=jaccard on Railway to revert instantly. Orchestrator still falls back to Jaccard on any embed-path failure (timeout, provider outage, missing API key, bad response). Fallback warn carries reason=<ErrorName>. The cron never fails because embeddings flaked. All 64 dedup tests + 5825 data tests still green. Net diff: -1,407 lines. Operator single action: set OPENROUTER_API_KEY on Railway's seed-digest-notifications service (already present) and ship. No GH Actions, no shadow archives, no labelling sprints. If the 0.60 threshold turns out wrong, tune DIGEST_DEDUP_COSINE_THRESHOLD on Railway — takes effect on next tick, no redeploy. * fix(digest-dedup): multi-word location phrases in the entity veto Extractor was whitespace-tokenising and only single-token matching against LOCATION_GAZETTEER, silently making every multi-word entry unreachable: extractEntities("Houthis strike ship in Red Sea") → { locations: [], actors: ['houthis','red','sea'] } ✗ shouldVeto("Houthis strike ship in Red Sea", "US escorts convoy in Red Sea") → false ✗ With MODE=embed as the default, that turned off the main anti-overmerge safety rail for bodies of water, regions, and compound city names — exactly the P07-Hormuz / Houthis-Red-Sea headlines the veto was designed to cover. Fix: greedy longest-phrase scan with a sliding window. At each token position try the longest multi-word phrase first (down to 2), require first AND last tokens to be capitalised (so lowercase prose like "the middle east" doesn't falsely match while headline "Middle East" does), lowercase connectors in between are fine ("Strait of Hormuz" → phrase "strait of hormuz" ✓). Falls back to single-token lookup when no multi-word phrase fits. 
Now: extractEntities("Houthis strike ship in Red Sea") → { locations: ['red sea'], actors: ['houthis'] } ✓ shouldVeto(Red-Sea-Houthis, Red-Sea-US) → true ✓ Complexity still O(N · MAX_PHRASE_LEN) — MAX_PHRASE_LEN is 4 (longest gazetteer entry: "ho chi minh city"), so this is effectively O(N). Added 5 regression tests covering Red Sea, South China Sea, Strait of Hormuz (lowercase-connector case), Abu Dhabi, and New York, plus the Houthis-vs-US veto reproducer from the P1. All 5825 data tests + 45 dedup tests green; lint + typecheck clean.	2026-04-19 13:49:48 +04:00
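The greedy longest-phrase scan described in the last fix of the commit above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's code: the tiny gazetteer, the `clean` helper, and the rule that every other capitalised token counts as an actor are all hypothetical stand-ins (the real module also has a common-caps filter that is not modelled here).

```javascript
// Hypothetical miniature gazetteer; the real LOCATION_GAZETTEER is far larger.
const LOCATION_GAZETTEER = new Set([
  'red sea', 'south china sea', 'strait of hormuz', 'middle east',
  'abu dhabi', 'new york', 'ho chi minh city', 'tehran',
]);
const MAX_PHRASE_LEN = 4; // longest entry: "ho chi minh city"

const isCapitalised = (token) => /^[A-Z]/.test(token);
// Assumption: strip punctuation so "Sea," still matches "sea".
const clean = (token) => token.replace(/[^\p{L}\p{N}'-]/gu, '');

function extractEntities(title) {
  const tokens = title.split(/\s+/).map(clean).filter(Boolean);
  const locations = [];
  const actors = [];
  let i = 0;
  while (i < tokens.length) {
    let matched = 0;
    // Try the longest window first, down to two tokens.
    for (let len = Math.min(MAX_PHRASE_LEN, tokens.length - i); len >= 2; len--) {
      // First AND last token must be capitalised; connectors in between may be lowercase.
      if (!isCapitalised(tokens[i]) || !isCapitalised(tokens[i + len - 1])) continue;
      const phrase = tokens.slice(i, i + len).join(' ').toLowerCase();
      if (LOCATION_GAZETTEER.has(phrase)) {
        locations.push(phrase);
        matched = len;
        break;
      }
    }
    if (matched) {
      i += matched; // consume the whole phrase so "red" and "sea" are not re-read as actors
      continue;
    }
    // Fall back to single-token lookup.
    const token = tokens[i];
    if (isCapitalised(token)) {
      const lower = token.toLowerCase();
      if (LOCATION_GAZETTEER.has(lower)) locations.push(lower);
      else actors.push(lower); // simplification: any other capitalised token is an actor
    }
    i += 1;
  }
  return { locations, actors };
}

console.log(extractEntities('Houthis strike ship in Red Sea'));
```

With this sketch the headline from the commit message yields `{ locations: ['red sea'], actors: ['houthis'] }`, while lowercase prose such as "the middle east" produces no location because neither end of the window is capitalised. Each position tries at most `MAX_PHRASE_LEN` windows, which is where the O(N · MAX_PHRASE_LEN) bound comes from.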
Sebastien Melki	72e3e3ee3b	fix(auth): remove isProUser() gate so all visitors see Sign In button (#3115) The setupAuthWidget() call was gated behind isProUser(), which created a deadlock: new users without legacy API keys could never see the Sign In button and thus could never authenticate. Removes the guard in both the main app and the pro landing page pricing section. Closes #034 Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 19:08:29 +04:00
Elie Habib	7dfdc819a9	Phase 0: Regional Intelligence snapshot writer foundation (#2940)	2026-04-11 17:55:39 +04:00
Elie Habib	60e727679c	feat(supply-chain): Sprint E — scenario visual completion + service parity (#2910 ) * feat(supply-chain): Sprint E — scenario visual completion + service parity - E1: fetchSectorDependency exported from supply-chain service index - E2: PRO gate + all-renderer dispatch in MapContainer.activateScenario - E3: scenario summary banner in SupplyChainPanel (dismiss wired) - E4: "Simulate Closure" trigger button in expanded chokepoint cards - E5: affectedIso2s heat layer in DeckGLMap (GeoJsonLayer, red tint) - E6: SVG renderer setScenarioState (best-effort iso2 fill) - E7: Globe renderer scenario polygons via flushPolygons - E8: integration tests for scenario run/status endpoints * fix(supply-chain): address PR #2910 review findings (P1 + P2 + P3) - Wire setOnScenarioActivate + setOnDismissScenario in panel-layout.ts (todo #155) - Rename shadow variable t→tmpl in SCENARIO_TEMPLATES.find (todo #152) - Add statusResp.ok guard in scenario polling loop (todo #153) - Replace status.result! 
non-null assertion with shape guard (todo #154) - Add AbortController to prevent concurrent polling races (todo #162) - Add polygonStrokeColor scenario branch (transparent) in GlobeMap (todo #156) - Re-export SCENARIO_TEMPLATES via src/config/scenario-templates.ts (todo #157) - Cache affectedIso2Set in DeckGLMap.setScenarioState (todo #158) - Add scenario paths to PREMIUM_RPC_PATHS for auth injection (todo #160) - Show template name in scenario banner instead of raw ID (todo #163) * fix(supply-chain): address PR #2910 review findings - Add auth headers to scenario fetch calls in SupplyChainPanel - Reset button state on scenario dismiss - Poll status immediately on first iteration (no 2s delay) - Pre-compute scenario polygons in GlobeMap.setScenarioState - Use scenarioId for DeckGL updateTriggers precision * fix(supply-chain): wire panel instance to MapContainer, stop button click propagation - Call setSupplyChainPanel() in panel-layout.ts so scenario banner renders - Add stopPropagation() to Simulate Closure button to prevent card collapse	2026-04-10 21:31:26 +04:00
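The polling fixes listed in the commit above (an `ok` guard, a shape guard instead of a non-null assertion, abort handling, and an immediate first poll) might combine along these lines. It is a sketch only: `pollScenarioStatus`, the injected `fetchStatus` callback, the `'done'` status value, and the option names are hypothetical, and only `affectedIso2s` is a field name taken from the commit text.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// fetchStatus is an injected function returning a fetch-like response ({ ok, json() }).
async function pollScenarioStatus(fetchStatus, { signal, intervalMs = 2000, maxPolls = 30 } = {}) {
  for (let attempt = 0; attempt < maxPolls; attempt++) {
    if (attempt > 0) await sleep(intervalMs); // first poll is immediate, later polls wait
    if (signal?.aborted) return null;         // a newer activation superseded this loop
    const resp = await fetchStatus();
    if (!resp.ok) continue;                   // ok guard: skip failed status calls
    const body = await resp.json();
    if (body?.status === 'done') {            // 'done' is an assumed status value
      const result = body.result;
      // Shape guard instead of `status.result!`: accept only the expected structure.
      return result && Array.isArray(result.affectedIso2s) ? result : null;
    }
  }
  return null; // gave up after maxPolls attempts
}
```

A caller would create one `AbortController` per activation and abort the previous one before starting a new poll, so two loops never race to update the same map state.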
Elie Habib	6e401ad02f	feat(supply-chain): Global Shipping Intelligence — Sprint 0 + Sprint 1 (#2870 ) * feat(supply-chain): Sprint 0 — chokepoint registry, HS2 sectors, war_risk_tier - src/config/chokepoint-registry.ts: single source of truth for all 13 canonical chokepoints with displayName, relayName, portwatchName, corridorRiskName, baselineId, shockModelSupported, routeIds, lat/lon - src/config/hs2-sectors.ts: static dictionary for all 99 HS2 chapters with category, shockModelSupported (true only for HS27), cargoType - server/worldmonitor/supply-chain/v1/_chokepoint-ids.ts: migrated to derive CANONICAL_CHOKEPOINTS from chokepoint-registry; no data duplication - src/config/geo.ts + src/types/index.ts: added chokepointId field to StrategicWaterway interface and all 13 STRATEGIC_WATERWAYS entries - src/components/MapPopup.ts: switched chokepoint matching from fragile name.toLowerCase() to direct chokepointId === id comparison - server/worldmonitor/intelligence/v1/_shock-compute.ts: migrated from old IDs (hormuz/malacca/babelm) to canonical IDs (hormuz_strait/malacca_strait/ bab_el_mandeb); same for CHOKEPOINT_LNG_EXPOSURE - proto/worldmonitor/supply_chain/v1/supply_chain_data.proto: added WarRiskTier enum + war_risk_tier field (field 16) on ChokepointInfo - get-chokepoint-status.ts: populates warRiskTier from ChokepointConfig.threatLevel via new threatLevelToWarRiskTier() helper (FREE field, no PRO gate) * feat(supply-chain): Sprint 1 — country chokepoint exposure index + sector ring S1.1: scripts/shared/country-port-clusters.json ~130 country → {nearestRouteIds, coastSide} mappings derived from trade route waypoints; covers all 6 seeded Comtrade reporters plus major trading nations. S1.2: scripts/seed-hs2-chokepoint-exposure.mjs Daily cron seeder. Pure computation — reads country-port-clusters.json, scores each country against CHOKEPOINT_REGISTRY route overlap, writes supply-chain:exposure:{iso2}:{hs2}:v1 keys + seed-meta (24h TTL). 
S1.3: RPC get-country-chokepoint-index (PRO-gated, request-varying) - proto: GetCountryChokepointIndexRequest/Response + ChokepointExposureEntry - handler: isCallerPremium gate; cachedFetchJson 24h; on-demand for any iso2 - cache-keys.ts: CHOKEPOINT_EXPOSURE_KEY(iso2, hs2) constant - health.js: chokepointExposure SEED_META entry (48h threshold) - gateway.ts: slow-browser cache tier - service client: fetchCountryChokepointIndex() exported S1.4: Chokepoint popup HS2 sector ring chart (PRO-gated) Static trade-sector breakdown (IEA/UNCTAD estimates) per 9 major chokepoints. SVG donut ring + legend shown for PRO users; blurred lockout + gate-hit analytics for free users. Wired into renderWaterwayPopup(). 🤖 Generated with Claude Sonnet 4.6 via Claude Code (https://claude.com/claude-code) + Compound Engineering v2.49.0 Co-Authored-By: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com> * fix(tests): update energy-shock-v2 tests to use canonical chokepoint IDs CHOKEPOINT_EXPOSURE and CHOKEPOINT_LNG_EXPOSURE keys were migrated from short IDs (hormuz, malacca, babelm) to canonical registry IDs (hormuz_strait, malacca_strait, bab_el_mandeb) in Sprint 0. Test fixtures were not updated at the time; fix them now. * fix(tests): update energy-shock-seed chokepoint ID to canonical form VALID_CHOKEPOINTS changed to canonical IDs in Sprint 0; the seed test that checks valid IDs was not updated alongside it. * fix(cache-keys): reword JSDoc comment to avoid confusing bootstrap test regex The comment "NOT in BOOTSTRAP_CACHE_KEYS" caused the bootstrap.test.mjs regex to match the comment rather than the actual export declaration, resulting in 0 entries found. Rephrase to "excluded from bootstrap". 
* fix(supply-chain): address P1 review findings for chokepoint exposure index - Add get-country-chokepoint-index to PREMIUM_RPC_PATHS (CDN bypass) - Validate iso2/hs2 params before Redis key construction (cache injection) - Fix seeder TTL to 172800s (2× interval) and extend TTL on skipped lock - Fix CHOKEPOINT_EXPOSURE_SEED_META_KEY to match seeder write key - Render placeholder sectors behind blur gate (DOM data leakage) - Document get-country-chokepoint-index in widget agent system prompts * fix(lint): resolve Biome CI failures - Add biome.json overrides to silence noVar in HTML inline scripts, disable linting for public/ vendor/build artifacts and pro-test/ - Remove duplicate NG and MW keys from country-port-clusters.json - Use import attributes (with) instead of deprecated assert syntax * fix(build): drop JSON import attribute — esbuild rejects `with` syntax --------- Co-authored-by: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>	2026-04-09 17:06:03 +04:00
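Validating `iso2` and `hs2` before building the Redis key, as the P1 fix above describes, could look something like the sketch below. The key template `supply-chain:exposure:{iso2}:{hs2}:v1` comes from the commit message; the function name, the exact accepted formats (uppercase two-letter code, one or two digit chapter between 1 and 99), and the error type are assumptions.

```javascript
// Hypothetical helper: rejects anything that could smuggle extra key segments
// or glob characters into the cache key.
function chokepointExposureKey(iso2, hs2) {
  if (typeof iso2 !== 'string' || !/^[A-Z]{2}$/.test(iso2)) {
    throw new RangeError('iso2 must be two uppercase letters');
  }
  if (typeof hs2 !== 'string' || !/^\d{1,2}$/.test(hs2)) {
    throw new RangeError('hs2 must be a one or two digit chapter');
  }
  const chapter = Number(hs2);
  if (chapter < 1 || chapter > 99) {
    throw new RangeError('hs2 chapter must be between 1 and 99');
  }
  return `supply-chain:exposure:${iso2}:${hs2}:v1`;
}

console.log(chokepointExposureKey('SG', '27'));
```

Because both parameters are matched against anchored patterns, an input such as `SG:*` or `27:v1:evil` fails before any key is constructed.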
Elie Habib	2fddee6b05	feat(simulation): add keyActorRoles to fix actor overlap bonus vocabulary mismatch (#2582 ) * feat(simulation): add keyActorRoles field to fix actor overlap bonus vocabulary mismatch The +0.04 actor overlap bonus never reliably fired in production because stateSummary.actors uses role-category strings ('Commodity traders', 'Policy officials') while simulation keyActors uses named geo-political entities ('Iran', 'Houthi'). 53 production runs audited showed the bonus fired once out of 53. Fix: add keyActorRoles?: string[] to SimulationTopPath. The Round 2 prompt now includes a CANDIDATE ACTOR ROLES section with theater-local role vocab seeded from candidatePacket.stateSummary.actors. The LLM copies matching roles into keyActorRoles. applySimulationMerge scores overlap against keyActorRoles when actorSource=stateSummary, preserving the existing keyActors entity-overlap path for the affectedAssets fallback. - buildSimulationPackageFromDeepSnapshot: add actorRoles[] to each theater from candidate.stateSummary.actors (theater-scoped, no cross-theater noise) - buildSimulationRound2SystemPrompt: inject CANDIDATE ACTOR ROLES section with exact-copy instruction and keyActorRoles in JSON template - tryParseSimulationRoundPayload: extract keyActorRoles from round 2 output - mergedPaths.map(): filter keyActorRoles against theater.actorRoles guardrail - computeSimulationAdjustment: dual-path overlap — roleOverlapCount for stateSummary, keyActorsOverlapCount for affectedAssets (backwards compat) - summarizeImpactPathScore: project roleOverlapCount + keyActorsOverlapCount into path-scorecards.json simDetail New fields: roleOverlapCount, keyActorsOverlapCount in SimulationAdjustmentDetail and ScorecardSimDetail. actorOverlapCount preserved as backwards-compat alias. Tests: 308 pass (was 301 before). 
New tests T-P1/T-P2/T-P3 (prompt/parser), T-RO1/T-RO2/T-RO3 (role overlap logic), T-PKG1 (pkg builder actorRoles), plus fixture updates for T2/T-F/T-G/T-J/T-K/T-N2/T-SC-4. 🤖 Generated with Claude Sonnet 4.6 via Claude Code (https://claude.ai/claude-code) + Compound Engineering v2.49.0 Co-Authored-By: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com> * fix(simulation): address CE review findings from PR #2582 - Add SimulationPackageTheater interface to seed-forecasts.types.d.ts (actorRoles was untyped under @ts-check) - Add keyActorRoles to uiTheaters Redis projection in writeSimulationOutcome (field was stripped from Redis snapshot; only visible in R2 artifact) - Extract keyActorRoles IIFE to named sanitizeKeyActorRoles() function; hoist allowedRoles Set computation out of per-path loop - Harden bonusOverlap ternary: explicit branch for actorSource='none' prevents silent fallthrough if new actorSource values are added - Eliminate roleOverlap intermediate array in computeSimulationAdjustment - Add U+2028/U+2029 Unicode line-separator stripping to sanitizeForPrompt - Apply sanitizeForPrompt at tryParseSimulationRoundPayload parse boundary; add JSDoc to newly-exported function All 308 tests pass, typecheck + typecheck:api clean. * fix(sim): restore const sanitized in sanitizeKeyActorRoles after early-return guard Prior edit added `if (!allowedRoles.length) return []` but accidentally removed the `const sanitized = ...` line, leaving the filter on line below referencing an undefined variable. Restores the full function body: if (!allowedRoles.length) return []; const sanitized = (Array.isArray(rawRoles) ? rawRoles : []) .map((s) => sanitizeForPrompt(String(s)).slice(0, 80)); const allowedNorm = new Set(allowedRoles.map(normalizeActorName)); return sanitized.filter((s) => allowedNorm.has(normalizeActorName(s))).slice(0, 8); 308/308 tests pass. --------- Co-authored-by: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>	2026-04-01 08:53:13 +04:00
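The restored `sanitizeKeyActorRoles` body quoted in the commit above can be exercised in isolation with stand-in helpers. In this sketch the function body follows the commit text, while `sanitizeForPrompt` and `normalizeActorName` are simplified, hypothetical versions of helpers whose real implementations are not shown in this log.

```javascript
// Hypothetical stand-ins for the real helpers.
const sanitizeForPrompt = (s) =>
  s.replace(/[\u0000-\u001f\u007f\u2028\u2029]/g, ' ').trim(); // control chars + line separators
const normalizeActorName = (s) => s.toLowerCase().replace(/\s+/g, ' ').trim();

// Body as quoted in the commit message.
function sanitizeKeyActorRoles(rawRoles, allowedRoles) {
  if (!allowedRoles.length) return [];
  const sanitized = (Array.isArray(rawRoles) ? rawRoles : [])
    .map((s) => sanitizeForPrompt(String(s)).slice(0, 80));
  const allowedNorm = new Set(allowedRoles.map(normalizeActorName));
  return sanitized.filter((s) => allowedNorm.has(normalizeActorName(s))).slice(0, 8);
}

const theaterRoles = ['Commodity traders', 'Policy officials'];
console.log(sanitizeKeyActorRoles(['Commodity traders', 'Iran', 'policy officials\u2028'], theaterRoles));
```

Here the named entity `Iran` is dropped because it is not in the theater's role vocabulary, which is the guardrail that keeps the role-overlap path from mixing the two actor vocabularies.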
Elie Habib	14a31c4283	feat(mcp): OAuth 2.0 Authorization Server for claude.ai connector (#2418 ) * feat(mcp): add OAuth 2.0 Authorization Server for claude.ai connector Implements spec-compliant MCP authentication so claude.ai's remote connector (which requires OAuth Client ID + Secret, no custom headers) can authenticate. - public/.well-known/oauth-authorization-server: RFC 8414 discovery document - api/oauth/token.js: client_credentials grant, issues UUID Bearer token in Redis TTL 3600s - api/_oauth-token.js: resolveApiKeyFromBearer() looks up token in Redis - api/mcp.ts: 3-tier auth (Bearer OAuth first, then ?key=, then X-WorldMonitor-Key); switch to getPublicCorsHeaders; surface error messages in catch - vercel.json: rewrite /oauth/token, exclude oauth from SPA, CORS headers - tests: update SPA no-cache pattern Supersedes PR #2417. Usage: URL=worldmonitor.app/mcp, Client ID=worldmonitor, Client Secret=<API key> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs: fix markdown lint in OAuth plan (blank lines around lists) * fix(oauth): address all P1+P2 code review findings for MCP OAuth endpoint - Add per-IP rate limiting (10 req/min) to /oauth/token via Upstash slidingWindow - Return HTTP 401 + WWW-Authenticate header when Bearer token is invalid/expired - Add Cache-Control: no-store + Pragma: no-cache to token response (RFC 6749 §5.1) - Simplify _oauth-token.js to delegate to readJsonFromUpstash (removes duplicated Redis boilerplate) - Remove dead code from token.js: parseBasicAuth, JSON body path, clientId/issuedAt fields - Add Content-Type: application/json header for /.well-known/oauth-authorization-server - Remove response_types_supported (only applies to authorization endpoint, not client_credentials) Closes: todos 075, 076, 077, 078, 079 🤖 Generated with claude-sonnet-4-6 via Claude Code (https://claude.ai/claude-code) + Compound Engineering v2.40.0 Co-Authored-By: claude-sonnet-4-6 (200K context) <noreply@anthropic.com> * chore(review): 
fresh review findings — todos 081-086, mark 075/077/078/079 complete * fix(mcp): remove ?key= URL param auth + mask internal errors - Remove ?key= query param auth path — API keys in URLs appear in Vercel/CF access logs, browser history, Referer headers. OAuth client_credentials (same PR) already covers clients that cannot set custom headers. Only two auth paths remain: Bearer OAuth and X-WorldMonitor-Key header. - Revert err.message disclosure: catch block was accidentally exposing internal service URLs/IPs via err.message. Restore original hardcoded string, add console.error for server-side visibility. Resolves: todos 081, 082 * fix(oauth): resolve all P2/P3 review findings (todos 076, 080, 083-086) - 076: no-credentials path in mcp.ts now returns HTTP 401 + WWW-Authenticate instead of rpcError (200) - 080: store key fingerprint (sha256 first 16 hex chars) in Redis, not plaintext key - 083: replace Array.includes() with timingSafeIncludes() (constant-time HMAC comparison) in token.js and mcp.ts - 084: resolveApiKeyFromBearer uses direct fetch that throws on Redis errors (500 not 401 on infra failure) - 085: token.js imports getClientIp, getPublicCorsHeaders, jsonResponse from shared helpers; removes local duplicates - 086: mcp.ts auth chain restructured to check Bearer header first, passes token string to resolveApiKeyFromBearer (eliminates double header read + unconditional await) * test(mcp): update auth test to expect HTTP 401 for missing credentials Align with todo 076 fix: no-credentials path now returns 401 + WWW-Authenticate instead of JSON-RPC 200 response. Also asserts WWW-Authenticate header presence. 
* chore: mark todos 076, 080, 083-086 complete * fix(mcp): harden OAuth error paths and fix rate limit cross-user collision - Wrap resolveApiKeyFromBearer() in try/catch in mcp.ts; Redis/network errors now return 503 + Retry-After: 5 instead of crashing the handler - Wrap storeToken() fetch in try/catch in oauth/token.js; network errors return false so the existing if (!stored) path returns 500 cleanly - Re-key token endpoint rate limit by sha256(clientSecret).slice(0,8) instead of IP; prevents cross-user 429s when callers share Anthropic's shared outbound IPs (Claude remote MCP connector) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-28 14:53:32 +04:00
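The final auth chain that this commit converges on (Bearer OAuth first, then the `X-WorldMonitor-Key` header, with 401 and 503 on the error paths) might be structured as in the sketch below. The status codes and headers follow the commit message; the `authenticate` function, its return shape, the `WWW-Authenticate` parameter strings, and the injected `resolveApiKeyFromBearer` / `isValidApiKey` callbacks are hypothetical simplifications.

```javascript
// headers: any object with a get(name) method using lower-cased names (assumption).
async function authenticate(headers, { resolveApiKeyFromBearer, isValidApiKey }) {
  const auth = headers.get('authorization') ?? '';
  if (auth.startsWith('Bearer ')) {
    const token = auth.slice('Bearer '.length).trim();
    let apiKey;
    try {
      apiKey = await resolveApiKeyFromBearer(token); // throws on Redis/network failure
    } catch {
      // Infrastructure failure is not the caller's fault: 503 with a retry hint.
      return { status: 503, headers: { 'Retry-After': '5' } };
    }
    if (!apiKey) {
      return { status: 401, headers: { 'WWW-Authenticate': 'Bearer error="invalid_token"' } };
    }
    return { status: 200, apiKey };
  }
  const headerKey = headers.get('x-worldmonitor-key');
  if (headerKey && isValidApiKey(headerKey)) {
    return { status: 200, apiKey: headerKey };
  }
  // No usable credentials: 401 plus a challenge, rather than a JSON-RPC 200 error.
  return { status: 401, headers: { 'WWW-Authenticate': 'Bearer realm="worldmonitor"' } };
}
```

Checking the Bearer header first means the token lookup only runs when a token is actually present, which matches the commit's note about removing the unconditional await.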
Elie Habib	f783bf2d2d	fix(intelligence): analytical frameworks follow-up — P1 security + P2 correctness fixes (#2386 ) * fix(intelligence): include framework/systemAppend hash in cache keys (todos 041, 045, 051) * fix(intelligence): gate framework/systemAppend on server-side PRO check (todo 042) * fix(skills): exact hostname allowlist + redirect:manual to prevent SSRF (todos 043, 054) * fix(intelligence): sanitize systemAppend against prompt injection before LLM (todo 044) * fix(intelligence): use framework field in DeductionPanel, fix InsightsPanel double increment (todos 046, 047) * fix(intelligence): settings export, hot-path cache, country-brief debounce (todos 048, 049, 050) * fix(intelligence): i18n, FrameworkSelector note, stripThinkingTags dedup, UUID IDs (todos 052, 055, 056, 057) - i18n Analysis Frameworks settings section (en + fr locales, replace all hardcoded English strings with t() calls) - FrameworkSelector: replace panelId==='insights' hardcode with note? option; both InsightsPanel and DailyMarketBriefPanel pass note - stripThinkingTags: remove inline duplicate in summarize-article.ts, import from _shared/llm; add Strip unterminated comment so tests can locate the section - Replace Date.now() IDs for imported frameworks with crypto.randomUUID() - Drop 'not supported in phase 1' phrasing to 'not supported' - test: fix summarize-reasoning Fix 2 suite to read from llm.ts - test: add premium-check-stub and wire into redis-caching country intel brief importPatchedTsModule so test can resolve the new import * fix(security): address P1 review findings from PR #2386 - premium-check: require `required: true` from validateApiKey so trusted browser origins (worldmonitor.app, Vercel previews, localhost) are not treated as PRO callers; fixes free-user bypass of framework/systemAppend gate - llm: replace weak sanitizeSystemAppend with sanitizeForPrompt from llm-sanitize.js; all callLlm callers now get model-delimiter and control-char stripping, not just phrase 
blocklist - get-country-intel-brief: apply sanitizeForPrompt to contextSnapshot before injecting into user prompt; fixes unsanitized query-param injection Closes todos 060, 061, 062 (P1 — blocked merge of #2386). * chore(todos): mark P1 todos 060-062 complete * fix(agentskills): address Greptile P2 review comments - hoist ALLOWED_AGENTSKILLS_HOSTS Set to module scope (was reallocated per-request) - add res.type === 'opaqueredirect' check alongside the 3xx guard; Edge Runtime returns status=0 for opaque redirects so the status range check alone is dead code	2026-03-28 01:10:02 +04:00
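The SSRF hardening described in this commit (exact hostname allowlist, module-scope Set, and the `opaqueredirect` check next to the 3xx guard) can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the allowlist entries and the helper names `isAllowedHost` and `isRedirectResponse` are assumptions, since the commit message names only `ALLOWED_AGENTSKILLS_HOSTS`.

```javascript
// Hoisted to module scope so the Set is built once, not per request.
// The entries are hypothetical; the commit does not list them.
const ALLOWED_AGENTSKILLS_HOSTS = new Set(['agentskills.io', 'www.agentskills.io']);

// Exact hostname match: a suffix or substring check would let
// "agentskills.io.evil.example" through.
function isAllowedHost(rawUrl) {
  let parsed;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return false; // unparseable input is rejected outright
  }
  return ALLOWED_AGENTSKILLS_HOSTS.has(parsed.hostname);
}

// With fetch(..., { redirect: 'manual' }) some runtimes report a redirect as
// status 0 with type 'opaqueredirect', so the 3xx range check alone can miss it.
function isRedirectResponse(res) {
  const is3xx = res.status >= 300 && res.status < 400;
  return is3xx || res.type === 'opaqueredirect';
}
```

A caller would reject the request when `isAllowedHost(url)` is false and treat any response for which `isRedirectResponse(res)` is true as an error rather than following it.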
Elie Habib	110ab402c4	feat(intelligence): analytical framework selector for AI panels (#2380 ) * feat(frameworks): add settings section and import modal - Add Analysis Frameworks group to preferences-content.ts between Intelligence and Media sections - Per-panel active framework display (read-only, 4 panels) - Skill library list with built-in badge, Rename and Delete actions for imported frameworks - Import modal with two tabs: From agentskills.io (fetch + preview) and Paste JSON - All error cases handled inline: network, domain validation, missing instructions, invalid JSON, duplicate name, instructions too long, rate limit - Add api/skills/fetch-agentskills.ts edge function (proxy to agentskills.io) - Add analysis-framework-store.ts (loadFrameworkLibrary, saveImportedFramework, deleteImportedFramework, renameImportedFramework, getActiveFrameworkForPanel) - Add fw-* CSS classes to main.css matching dark panel aesthetic * feat(panels): wire analytical framework store into InsightsPanel, CountryDeepDive, DailyMarketBrief, DeductionPanel - InsightsPanel: append active framework to geoContext in updateFromClient(); subscribe in constructor, unsubscribe in destroy() - CountryIntelManager: pass framework as query param to fetchCountryIntelBrief(); subscribe to re-open brief on framework change; unsubscribe in destroy() - DataLoaderManager: add dailyBriefGeneration counter for stale-result guard; pass frameworkAppend to buildDailyMarketBrief(); subscribe to framework changes to force refresh; unsubscribe in destroy() - daily-market-brief service: add frameworkAppend? 
field to BuildDailyMarketBriefOptions; append to extendedContext before summarize call - DeductionPanel: append active framework to geoContext in handleSubmit() before RPC call * feat(frameworks): add FrameworkSelector UI component - Create FrameworkSelector component with premium/locked states - Premium: select dropdown with all framework options, change triggers setActiveFrameworkForPanel - Locked: disabled select + PRO badge, click calls showGatedCta(FREE_TIER) - InsightsPanel: adds asterisk note (client-generated analysis hint) - Wire into InsightsPanel, DailyMarketBriefPanel, DeductionPanel (via this.header) - Wire into CountryDeepDivePanel header right-side (no Panel base, panel=null) - Add framework-selector CSS to main.css * fix(frameworks): make new proto fields optional in generated types * fix(frameworks): extract firstMsg to satisfy strict null checks in tsconfig.api.json * fix(docs): add blank lines around lists/headings to pass markdownlint * fix(frameworks): add required proto string fields to call sites after make generate * chore(review): add code review todos 041-057 for PR #2380 7 review agents (TypeScript, Security, Architecture, Performance, Simplicity, Agent-Native, Learnings) identified 17 findings across 5 P1, 8 P2, and 4 P3 categories.	2026-03-27 23:36:44 +04:00
Elie Habib	e7ba05553d	fix(health): disease outbreaks CDC/Outbreak feeds, VPD tracker seed, BOOTSTRAP_KEYS gold standard (#2378 ) * feat(panels): Disease Outbreaks, Shipping Stress, Social Velocity, nuclear test site monitoring - Add HealthService proto with ListDiseaseOutbreaks RPC (WHO + ProMED RSS) - Add GetShippingStress RPC to SupplyChainService (Yahoo Finance carrier ETFs) - Add GetSocialVelocity RPC to IntelligenceService (Reddit r/worldnews + r/geopolitics) - Enrich earthquake seed with Haversine nuclear test-site proximity scoring - Add 5 nuclear test sites to NUCLEAR_FACILITIES (Punggye-ri, Lop Nur, Novaya Zemlya, Nevada NTS, Semipalatinsk) - Add shipping stress + social velocity seed loops to ais-relay.cjs - Add seed-disease-outbreaks.mjs Railway cron script - Wire all new RPCs: edge functions, handlers, gateway cache tiers, health.js STANDALONE_KEYS/SEED_META * fix(relay): apply gold standard retry/TTL-extend pattern to shipping-stress and social-velocity seeders * fix(review): address all PR #2375 review findings - health.js: shippingStress maxStaleMin 30→45 (3x interval), socialVelocity 20→30 (3x interval) - health.js: remove shippingStress/diseaseOutbreaks/socialVelocity from ON_DEMAND_KEYS (relay/cron seeds, not on-demand) - cache-keys.ts: add shippingStress, diseaseOutbreaks, socialVelocity to BOOTSTRAP_CACHE_KEYS - ais-relay.cjs: stressScore formula 50→40 (neutral market = moderate, not elevated) - ais-relay.cjs: fetchedAt Date.now() (consistent with other seeders) - ais-relay.cjs: deduplicate cross-subreddit article URLs in social velocity loop - seed-disease-outbreaks.mjs: WHO URL → specific DON RSS endpoint (not dead general news feed) - seed-disease-outbreaks.mjs: validate() requires outbreaks.length >= 1 (reject empty array) - seed-disease-outbreaks.mjs: stable id using hash(link) not array index - seed-disease-outbreaks.mjs: RSS regexes use [\s\S]? 
for CDATA multiline content - seed-earthquakes.mjs: Lop Nur coordinates corrected (41.39,89.03 not 41.75,88.35) - seed-earthquakes.mjs: sourceVersion bumped to usgs-4.5-day-nuclear-v1 - earthquake.proto: fields 8-11 marked optional (distinguish not-enriched from enriched=false/0) - buf generate: regenerate seismology service stubs revert(cache-keys): don't add new keys to bootstrap without frontend consumers * fix(panels): address all P1/P2/P3 review findings for PR #2375 - proto: add INT64_ENCODING_NUMBER annotation + sebuf import to get_shipping_stress.proto (run make generate) - bootstrap: register shippingStress (fast), socialVelocity (fast), diseaseOutbreaks (slow) in api/bootstrap.js + cache-keys.ts - relay: update WIDGET_SYSTEM_PROMPT with new bootstrap keys and live RPCs for health/supply-chain/intelligence - seeder: remove broken ProMED feed URL (promedmail.org/feed/ returns HTML 404); add 500K size guard to fetchRssItems; replace private COUNTRY_CODE_MAP with shared geo-extract.mjs; remove permanently-empty location field; bump sourceVersion to who-don-rss-v2 - handlers: remove dead .catch from all 3 new RPC handlers; fix stressLevel fallback to low; fix fetchedAt fallback to 0 - services: add fetchShippingStress, disease-outbreaks.ts, social-velocity.ts with getHydratedData consumers * fix(health): move seeded keys to BOOTSTRAP_KEYS, add VPD tracker seed and feeds - Reclassify diseaseOutbreaks, shippingStress, socialVelocity from STANDALONE_KEYS to BOOTSTRAP_KEYS so health endpoint reports CRIT (not WARN) when their seeds miss a cycle - Add vpdTrackerRealtime and vpdTrackerHistorical to BOOTSTRAP_KEYS with SEED_META entries (maxStaleMin: 2880 = 2x daily interval) - Fix seed-disease-outbreaks: add CDC and Outbreak News Today feeds alongside WHO, populate location field from title parsing, fix TTL to 259200s (3x daily interval per gold standard) - Add seed-vpd-tracker.mjs: scrapes Think Global Health VPD Tracker bundle (1,827 realtime alerts + 25,960 
historical WHO records), writes both Redis keys in one runSeed call via extraKeys - Add review todos 049-059 from PR #2375 code review	2026-03-27 22:47:24 +04:00
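The TTL and staleness numbers quoted in this commit follow a simple multiple-of-interval rule. The sketch below only reproduces that arithmetic; the helper names `seedTtlSeconds` and `maxStaleMinutes` are hypothetical and do not come from the repository.

```javascript
const SECONDS_PER_MINUTE = 60;

// "Gold standard" TTL as described in the commit: 3x the seed interval.
// The default multiplier of 3 mirrors the "3x daily interval" wording.
function seedTtlSeconds(intervalMinutes, multiplier = 3) {
  return intervalMinutes * SECONDS_PER_MINUTE * multiplier;
}

// Staleness budget for the health endpoint. The commit uses different
// multipliers (2x for the VPD tracker, 3x for relay seeds), so it is a parameter.
function maxStaleMinutes(intervalMinutes, multiplier) {
  return intervalMinutes * multiplier;
}

const DAILY_MINUTES = 24 * 60;

console.log(seedTtlSeconds(DAILY_MINUTES));     // disease-outbreaks TTL, 3x daily
console.log(maxStaleMinutes(DAILY_MINUTES, 2)); // VPD tracker maxStaleMin, 2x daily
console.log(maxStaleMinutes(15, 3));            // shippingStress, assuming a 15-minute interval
```

The 15-minute shipping-stress interval is inferred from "30→45 (3x interval)" and is an assumption rather than a value stated in the log.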
Elie Habib	41d964265e	fix(map): sanctions layer invisible on WebGL map (#2267 ) * fix(health): extend EMPTY_DATA_OK_KEYS check to bootstrap loop Greptile correctly identified the fix was a no-op: EMPTY_DATA_OK_KEYS was only consulted in the STANDALONE_KEYS loop, not BOOTSTRAP_KEYS. All three calendar keys are bootstrap keys, so critCount was still incremented. Mirror the same seedStale branching already present in the standalone loop into the bootstrap loop so EMPTY_DATA_OK members get OK/STALE_SEED instead of EMPTY/EMPTY_DATA/critCount++. * fix(map): implement sanctions choropleth layer in DeckGLMap The sanctions layer toggle did nothing on the WebGL map — DeckGLMap had no rendering logic, only a help text label. Only Map.ts (2D SVG) had the updateCountryFills() implementation. Add createSanctionsChoroplethLayer() as a GeoJsonLayer using SANCTIONED_COUNTRIES_ALPHA2 (new export) since countriesGeoJsonData keys by ISO3166-1-Alpha-2, not the numeric IDs used by Map.ts/TopoJSON. Wire it into the layer pipeline after the CII choropleth. Alpha values match Map.ts: severe=89, high=64, moderate=51 (0-255). * Revert "fix(health): extend EMPTY_DATA_OK_KEYS check to bootstrap loop" This reverts commit `cc1405f495`.	2026-03-26 08:52:48 +04:00
Elie Habib	2939b1f4a1	feat(finance-panels): add 7 macro/market panels + Daily Brief context (issues #2245-#2253) (#2258 ) * feat(fear-greed): add regime state label, action stance badge, divergence warnings Closes #2245 * feat(finance-panels): add 7 new finance panels + Daily Brief macro context Implements issues #2245 (F&G Regime), #2246 (Sector Heatmap bars), #2247 (MacroTiles), #2248 (FSI), #2249 (Yield Curve), #2250 (Earnings Calendar), #2251 (Economic Calendar), #2252 (COT Positioning), #2253 (Daily Brief prompt extension). New panels: - MacroTilesPanel: CPI YoY, Unemployment, GDP, Fed Rate tiles via FRED - FSIPanel: Financial Stress Indicator gauge (HYG/TLT/VIX/HY-spread) - YieldCurvePanel: SVG yield curve chart with inverted/normal badge - EarningsCalendarPanel: Finnhub earnings calendar with BMO/AMC/BEAT/MISS - EconomicCalendarPanel: FOMC/CPI/NFP events with impact badges - CotPositioningPanel: CFTC disaggregated COT positioning bars - MarketPanel: adds sorted bar chart view above sector heatmap grid New RPCs: - ListEarningsCalendar (market/v1) - GetCotPositioning (market/v1) - GetEconomicCalendar (economic/v1) Seed scripts: - seed-earnings-calendar.mjs (Finnhub, 14-day window, TTL 12h) - seed-economic-calendar.mjs (Finnhub, 30-day window, TTL 12h) - seed-cot.mjs (CFTC disaggregated text file, TTL 7d) - seed-economy.mjs: adds yield curve tenors DGS1MO/3MO/6MO/1/2/5/30 - seed-fear-greed.mjs: adds FSI computation + sector performance Daily Brief: extends buildDailyMarketBrief with optional regime, yield curve, and sector context fed to the LLM summarization prompt. All panels default enabled in FINANCE_PANELS, disabled in FULL_PANELS. 
🤖 Generated with Claude Sonnet 4.6 via Claude Code (https://claude.ai/claude-code) + Compound Engineering v2.40.0 Co-Authored-By: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com> * fix(finance-panels): address code review P1/P2 findings P1 - Security/Correctness: - EconomicCalendarPanel: add escapeHtml on all 7 Finnhub-sourced fields - EconomicCalendarPanel: fix panel contract (public fetchData():boolean, remove constructor self-init, add retry callbacks to all showError calls) - YieldCurvePanel: fix NaN in xPos() when count <= 1 (divide-by-zero) - seed-earnings-calendar: move Finnhub API key from URL to X-Finnhub-Token header - seed-economic-calendar: move Finnhub API key from URL to X-Finnhub-Token header - seed-earnings-calendar: add isMain guard around runSeed() call - health.js + bootstrap.js: register earningsCalendar, econCalendar, cotPositioning keys - health.js dataSize(): add earnings + instruments to property name list P2 - Quality: - FSIPanel: change !resp.fsiValue → resp.fsiValue <= 0 (rejects valid zero) - data-loader: fix Promise.allSettled type inference via indexed destructure - seed-fear-greed: allowlist cnnLabel against known values before writing to Redis - seed-economic-calendar: remove unused sleep import - seed-earnings-calendar + econ-calendar: increase TTL 43200 → 129600 (36h = 3x interval) - YieldCurvePanel: use SERIES_IDS const in RPC call (single source of truth) * fix(bootstrap): remove on-demand panel keys from bootstrap.js earningsCalendar, econCalendar, cotPositioning panels fetch via RPC on demand — they have no getHydratedData consumer in src/ and must not be in api/bootstrap.js. They remain in api/health.js BOOTSTRAP_KEYS for staleness monitoring. 
* fix(compound-engineering): fix markdown lint error in local settings * fix(finance-panels): resolve all P3 code-review findings - 030: MacroTilesPanel: add `deltaFormat?` field to MacroTile interface, define per-tile delta formatters (CPI pp, GDP localeString+B), replace fragile tile.id switch in tileHtml with fmt = deltaFormat ?? format - 031: FSIPanel: check getHydratedData('fearGreedIndex') at top of fetchData(); extract fsi/vix/hySpread from headerMetrics and render synchronously; fall back to live RPC only when bootstrap absent - 032: All 6 finance panels: extract lazy module-level client singletons (EconomicServiceClient or MarketServiceClient) so the client is constructed at most once per panel module lifetime, not on every fetchData - 033: get-fred-series-batch: add BAMLC0A0CM and SOFR to ALLOWED_SERIES (both seeded by seed-economy.mjs but previously unreachable via RPC) * fix(finance-panels): health.js SEED_META, FSI calibration, seed-cot catch handler - health.js: add SEED_META entries for earningsCalendar (1440min), econCalendar (1440min), cotPositioning (14400min) — without these, stopped seeds only alarm CRIT:EMPTY after TTL expiry instead of earlier WARN:STALE_SEED - seed-cot.mjs: replace bare await with .catch() handler consistent with other seeds - seed-fear-greed.mjs: recalibrate FSI thresholds to match formula output range (Low>=1.5, Moderate>=0.8, Elevated>=0.3; old values >=0.08/0.05/0.03 were calibrated for [0,0.15] but formula yields ~1-2 in normal conditions) - FSIPanel.ts: fix gauge fillPct range to [0, 2.5] matching recalibrated thresholds - todos: fix MD022/MD032 markdown lint errors in P3 review files --------- Co-authored-by: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>	2026-03-26 08:03:09 +04:00
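The recalibrated FSI thresholds and the gauge range from this commit can be read as a small classifier plus a clamp. This is a sketch under stated assumptions: the function names are hypothetical, the label for values below 0.3 is not given in the commit (it is called `'High'` here purely for illustration), and the gauge is assumed to fill linearly over [0, 2.5].

```javascript
// Thresholds quoted in the commit: Low>=1.5, Moderate>=0.8, Elevated>=0.3.
// Higher formula output means lower stress.
function fsiLabel(value) {
  if (value >= 1.5) return 'Low';
  if (value >= 0.8) return 'Moderate';
  if (value >= 0.3) return 'Elevated';
  return 'High'; // assumed label; the commit does not name this bucket
}

// Gauge fill as a percentage of the [0, 2.5] range named in the commit.
// Linear scaling is an assumption.
const GAUGE_MAX = 2.5;
function gaugeFillPct(value) {
  const clamped = Math.min(Math.max(value, 0), GAUGE_MAX);
  return (clamped / GAUGE_MAX) * 100;
}
```

Under the old thresholds (>=0.08/0.05/0.03), any value in the formula's normal range of roughly 1 to 2 would land in the top bucket, which is the mismatch the recalibration addresses.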
Elie Habib	01f6057389	feat(simulation): MiroFish Phase 2 — theater-limited simulation runner (#2220 ) * feat(simulation): MiroFish Phase 2 — theater-limited simulation runner Adds the simulation execution layer that consumes simulation-package.json and produces simulation-outcome.json for maritime chokepoint + energy/logistics theaters, closing the WorldMonitor → MiroFish handoff loop. Changes: - scripts/seed-forecasts.mjs: 2-round LLM simulation runner (prompt builders, JSON extractor, runTheaterSimulation, writeSimulationOutcome, task queue with NX dedup lock, runSimulationWorker poll loop) - scripts/process-simulation-tasks.mjs: standalone worker entry point - proto: GetSimulationOutcome RPC + make generate - server/worldmonitor/forecast/v1/get-simulation-outcome.ts: RPC handler - server/gateway.ts: slow tier for get-simulation-outcome - api/health.js: simulationOutcomeLatest in STANDALONE + ON_DEMAND keys - tests: 14 new tests for simulation runner functions * fix(simulation): address P1/P2 code review findings from PR #2220 Security (P1 #018): - sanitizeForPrompt() applied to all entity/seed fields interpolated into Round 1 prompt (entityId, class, stance, seedId, type, timing) - sanitizeForPrompt() applied to actorId and entityIds in Round 2 prompt - sanitizeForPrompt() + length caps applied to all LLM array fields written to R2 (dominantReactions, stabilizers, invalidators, keyActors, timingMarkers) Validation (P1 #019): - Added validateRunId() regex guard - Applied in enqueueSimulationTask() and processNextSimulationTask() loop Type safety (P1 #020): - Added isOutcomePointer() and isPackagePointer() type guards in TS handlers - Replaced unsafe as-casts with runtime-validated guards in both handlers Correctness (P2 #022): - Log warning when pkgPointer.runId does not match task runId Architecture (P2 #024): - isMaritimeChokeEnergyCandidate() accepts both flat and nested topBucketId - Call site simplified to pass theater directly Performance (P2 #025): - 
SIMULATION_ROUND1_MAX_TOKENS raised 1800 to 2200 - Added max 3 initialReactions instruction to Round 1 prompt Maintainability (P2 #026): - Simulation pointer keys exported from server/_shared/cache-keys.ts - Both TS handlers import from shared location Documentation (P2 #027): - Strengthened runId no-op description in proto and OpenAPI spec * fix(todos): add blank lines around lists in markdown todo files * style(api): reformat openapi yaml to match linter output * test(simulation): add flat-shape filter test + getSimulationOutcome handler coverage Two tests identified as missing during PR #2220 review: 1. isMaritimeChokeEnergyCandidate flat-shape tests: covers the || candidate.topBucketId normalization added in the P1/P2 review pass. The existing tests only used the nested marketContext.topBucketId shape; this adds the flat root-field shape that arrives from the simulation-package.json JSON (selectedTheaters entries have topBucketId at root). 2. getSimulationOutcome handler structural tests: verifies the isOutcomePointer guard, found:false NOT_FOUND return, found:true success path, note population on runId mismatch, and redis_unavailable error string. Follows the readSrc static-analysis pattern used elsewhere in server-handlers.test.mjs (handler imports Redis so full integration test would require a test Redis instance).	2026-03-25 13:55:59 +04:00
Elie Habib	092efd4fe9	fix(panels): always fire background RPC refresh after bootstrap render (#2208) * fix(panels): always fire background RPC refresh after bootstrap render Bootstrap hydration (getHydratedData) is one-shot: once rendered from it, panels never refresh and can show stale or partial data indefinitely. Affected panels: MacroSignals, ETFFlows, Stablecoins, FuelPrices, GulfEconomies, GroceryBasket, BigMac. Pattern: render from bootstrap immediately for fast first paint, then fire a background RPC call that silently updates the panel with live data. Errors during background refresh are suppressed when bootstrap data is already visible (no error flash over valid data). * fix(panels): guard background RPC refresh against empty response overwriting bootstrap Empty RPC responses (200 + empty array) no longer overwrite valid bootstrap data with error/unavailable state across all 7 affected panels: - ETFFlowsPanel, StablecoinPanel: wrap this.data assignment in `if (fresh.xxx?.length || !this.data)` guard - FuelPricesPanel, GulfEconomiesPanel, GroceryBasketPanel, BigMacPanel: add `!data.xxx?.length` check in background .then() before calling render - MacroSignalsPanel: return false early when error suppressed to skip redundant renderPanel() call * fix(hormuz): fix noUncheckedIndexedAccess TypeScript errors * fix(todos): add blank lines around headings (markdownlint MD022) * fix(hormuz): add missing hormuz-tracker service + fix implicit any in HormuzPanel * revert: remove HormuzPanel.ts from this branch (belongs in PR #2210)	2026-03-24 20:23:40 +04:00
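The bootstrap-then-refresh pattern and its empty-response guard can be illustrated with a stripped-down panel. This is a sketch only: `PanelSketch`, `items`, and `applyFresh` are hypothetical names standing in for the real panels and their `xxx` fields, and the RPC call itself is left out so the guard can be read in isolation.

```javascript
class PanelSketch {
  constructor(bootstrapItems) {
    // First paint comes from bootstrap data when it exists.
    this.data = bootstrapItems && bootstrapItems.length ? { items: bootstrapItems } : null;
    this.renderCount = 0;
    if (this.data) this.render();
  }

  render() {
    this.renderCount += 1;
  }

  // Called from the background RPC's .then(). Mirrors the guard quoted in the
  // commit: accept fresh data if it is non-empty, or if there is nothing on
  // screen yet. An empty 200 response never replaces visible bootstrap data.
  applyFresh(fresh) {
    if (fresh.items?.length || !this.data) {
      this.data = fresh;
      this.render();
      return true;
    }
    return false;
  }
}

const panel = new PanelSketch(['bootstrap-row']);
panel.applyFresh({ items: [] });               // ignored, bootstrap stays visible
panel.applyFresh({ items: ['live-1', 'live-2'] }); // replaces bootstrap data
```

When no bootstrap data was available, the `!this.data` branch still lets an empty response through, so the panel can show its normal unavailable state instead of staying blank.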
Elie Habib	226cebf9bc	feat(deep-forecast): Phase 2+3 scoring recalibration + autoresearch prompt self-improvement (#2178 ) * fix(deep-forecast): lower acceptance threshold 0.60→0.50 to match real score distribution computeDeepPathAcceptanceScore formula: pathScore×0.55 + quality×0.20 + coherence×0.15 With pathScore≈0.65, quality≈0.30, coherence≈0.55: 0.358 + 0.060 + 0.083 = 0.50 The 0.60 threshold was calibrated before understanding that reportableQualityScore is constrained by world-state simulation geometry (not hypothesis quality), and coherence loses 0.15 for generic candidates without routeFacilityKey. The threshold was structurally unreachable with typical expanded paths. Verified end-to-end: deep worker now returns [DeepForecast] completed. Also updates T6 gateDetails assertion and renames the rejection-floor test to correctly describe the new behavior (strong inputs should be accepted). 111/111 tests pass. * feat(deep-forecast): autoresearch prompt self-improvement loop + T9/T10 tests - Add scoreImpactExpansionQuality() locked scorer: commodity rate (35%), variable diversity (35%), chain coverage (20%), mapped rate (10%) - Add runImpactExpansionPromptRefinement(): rate-limited LLM critic loop (30min cooldown) that reads learned section from Redis, scores current run, generates critique if composite < 0.62, tests on same candidates, commits to forecast:prompt:impact-expansion:learned if score improves - buildImpactExpansionSystemPrompt() now accepts learnedSection param, appends it after core rules with separator so model sees prior examples - buildImpactExpansionCandidateHash() includes learnedFingerprint to bust cache when learned section changes - processDeepForecastTask reads learnedSection from Redis before LLM call, runs refinement after both completed and no_material_change paths - Export scoreImpactExpansionQuality + runImpactExpansionPromptRefinement - T9: high commodity rate + chain coverage → composite > 0.70 - T10: no commodity + no chain 
coverage → composite < 0.40 - 113/113 tests pass * fix(deep-forecast): raise autoresearch threshold 0.62→0.80 + fix JSON parse - Threshold 0.62 was too low: commodity=1.00 + chain=1.00 compensated for diversity=0.50 (all same chains), keeping composite at 0.775 → no critique - Raise to 0.80 so diversity<0.70 triggers critique even with good commodity/chain - Fix JSON parser to extract first {…} block (handles Gemini code-fence wrapping) - Add per-hypothesis log in refinement breakdown for observability - Add refinementQualityThreshold to gateDetails for self-documenting artifacts - Verified: critique fires on diversity=0.50 run, committed Hormuz/Baltic/Suez region-specific chain examples (score 0.592→0.650) * feat(deep-forecast): per-candidate parallel LLM calls replace batch expansion Previously: all candidates → one batch LLM call → LLM averages context → identical route_disruption → inflation_pass_through chains for all candidates. Now: each candidate → its own focused LLM call (parallel Promise.all) → LLM reasons about specific stateKind/region/routeFacility for that candidate. Results (3 candidates, 3 parallel calls): - composite: 0.592 → 0.831 (+0.24) - commodity: 0.17 → 1.00 (all mapped have specific commodity) - diversity: 0.50 → 0.83 (energy_export_stress, importer_balance_stress appearing alongside route_disruption — genuinely different chains) - Baseline updated: 0.831 (above 0.80 threshold → no critique needed) Also threads learnedSection through extractSingleImpactExpansionCandidate so the learned examples from autoresearch apply to each focused call. Per-candidate cache keys (already existed) now serve as primary cache. 
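The two threshold changes in this squash (acceptance 0.60→0.50 and autoresearch 0.62→0.80) both come down to weighted sums. The sketch below reproduces only the terms and weights quoted in the commit messages; the real `computeDeepPathAcceptanceScore` and `scoreImpactExpansionQuality` may include terms not shown there, and the function names here are deliberately different to mark them as illustrations. The mapped rate of 0.5 is an assumption chosen because it reproduces the quoted composite of 0.775; the commit does not state it.

```javascript
// Terms quoted in the commit: pathScore×0.55 + quality×0.20 + coherence×0.15.
function acceptanceScoreSketch({ pathScore, quality, coherence }) {
  return pathScore * 0.55 + quality * 0.20 + coherence * 0.15;
}

// Original locked scorer weights: commodity rate 35%, variable diversity 35%,
// chain coverage 20%, mapped rate 10%.
function expansionCompositeSketch({ commodityRate, diversity, chainCoverage, mappedRate }) {
  return commodityRate * 0.35 + diversity * 0.35 + chainCoverage * 0.20 + mappedRate * 0.10;
}

// Typical expanded path from the commit message.
const typicalAcceptance = acceptanceScoreSketch({ pathScore: 0.65, quality: 0.30, coherence: 0.55 });

// Run where good commodity/chain scores masked poor diversity.
const maskedComposite = expansionCompositeSketch({
  commodityRate: 1.0,
  diversity: 0.5,
  chainCoverage: 1.0,
  mappedRate: 0.5, // assumed, see lead-in
});

console.log(typicalAcceptance.toFixed(3), maskedComposite.toFixed(3));
```

Read this way, the typical acceptance score sits near 0.50, well under the old 0.60 gate, and the masked composite sits between the old 0.62 critique threshold and the new 0.80 one, which is why raising the threshold makes the critique fire on low-diversity runs.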
* fix(tests): update recovery test for per-candidate LLM call flow - Change stage mock from impact_expansion → impact_expansion_single (batch primary path removed, per-candidate is now primary) - Assert parseMode === per_candidate instead of parseStage /^recovered_/ (recovered_ prefix was only set by old batch_repair path) - 2257/2257 tests pass * fix(deep-forecast): add Red Sea, Persian Gulf, South China Sea to chokepoint map Candidate packets had routeFacilityKey=none for Red Sea / Persian Gulf / Baltic Sea signals because prediction titles say "Red Sea maritime disruption" not "Bab el-Mandeb" or "Strait of Hormuz". CHOKEPOINT_MARKET_REGIONS only had sub-facility names (Bab el-Mandeb, Suez Canal) as keys, not the sea regions themselves. Fix: add Red Sea, Persian Gulf, Arabian Sea, Black Sea, South China Sea, Mediterranean Sea as direct keys so region-level candidate titles resolve. Result: LLM user prompt now shows routeFacilityKey=Red Sea / Persian Gulf / Baltic Sea per candidate — giving each focused call the geographic context needed to generate route-specific chains. 
- Autoresearch baseline updated 0.932→0.965 on this run - T8 extended with Red Sea, Persian Gulf, South China Sea assertions - 2257/2257 tests pass * feat(deep-forecast): free-form hypothesis schema + remove registry constraint - Bump IMPACT_EXPANSION_REGISTRY_VERSION to v4 - Add hypothesisKey, description, geography, affectedAssets, marketImpact, causalLink fields to normalizeImpactHypothesisDraft (keep legacy fields for backward compat) - Rewrite buildImpactExpansionSystemPrompt: remove IMPACT_VARIABLE_REGISTRY constraint table, use free-form ImpactHypothesis schema with geographic/commodity specificity rules - Rewrite evaluateImpactHypothesisRejection: use effective key (hypothesisKey || variableKey) for dedup; legacy registry check only for old cached responses without hypothesisKey - Update validateImpactHypotheses scoring: add geographyScore, commodityScore, causalLinkScore, assetScore terms; channelCoherence/bucketCoherence only apply to legacy responses - Update parent-must-be-mapped invariant to use hypothesisKey || variableKey as effective key - Update mapImpactHypothesesToWorldSignals: use effective key for dedup and sourceKey; prefer description/geography over legacy fields - Update buildImpactPathsForCandidate: match on hypothesisKey || variableKey for parent lookup - Update buildImpactPathId: use hypothesisKey || variableKey for hash inputs - Rewrite scoreImpactExpansionQuality: add geographyRate and assetRate metrics; update composite weights - Update buildImpactPromptCritiqueSystemPrompt/UserPrompt: use hypothesisKey-based chain format in examples - Add new fields to buildImpactExpansionBundleFromPaths push calls - Update T7 test assertion: MUST be the exact hypothesisKey instead of variableKey string * fix(deep-forecast): update breakdown log to show free-form hypothesis fields * feat(deep-forecast): add commodityDiversity metric to autoresearch scorer - commodityDiversity = unique commodities / nCandidates (weight 0.35) Penalizes runs 
where all candidates default to same commodity. 3 candidates all crude_oil → diversity=0.33 → composite ~0.76 → critique fires. - Rebalanced composite weights: comDiversity 0.35, geo 0.20, keyDiversity 0.15, chain 0.10, commodityRate 0.10, asset 0.05, mappedRate 0.05 - Breakdown log now shows comDiversity + geo + keyDiversity - Critique prompt updated: commodity_monoculture failure mode, diagnosis targets commodity homogeneity - T9: added commodityDiversity=1.0 assertion (2 unique commodities across 2 candidates) * refactor(deep-forecast): replace commodityDiversity with directCommodityDiversity + directGeoDiversity + candidateSpreadScore Problem: measuring diversity on all mapped hypotheses misses the case where one candidate generates 10 implications while others generate 0, or where all candidates converge on the same commodity due to dominating signals. Fix: score at the DIRECT hypothesis level (root causes only) and add a candidate-spread metric: - directCommodityDiversity: unique commodities among direct hypotheses / nCandidates. Measures breadth at the root-cause level. 3 candidates all crude_oil → 0.33 → composite ~0.77 → critique fires. - directGeoDiversity: unique primary geographies among direct hypotheses / nCandidates. First segment of compound geography strings (e.g. 'Red Sea, Suez Canal' → 'red sea') to avoid double-counting. - candidateSpreadScore: normalized inverse-HHI. 1.0 = perfectly even distribution across candidates. One candidate with 10 implications and others with 0 → scores near 0 → critique fires. Weight rationale: comDiversity 0.35, geoDiversity 0.20, spread 0.15, chain 0.15, comRate 0.08, assetRate 0.04, mappedRate 0.03. 
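The candidateSpreadScore described above is a normalized inverse Herfindahl-Hirschman index over per-candidate hypothesis counts. One way to compute such a score is sketched below. The normalization `(1/HHI - 1) / (n - 1)` is an assumption (the commit says only "normalized inverse-HHI"), as are the function name and the handling of the degenerate cases.

```javascript
// counts[i] = number of mapped hypotheses produced by candidate i.
function candidateSpreadSketch(counts) {
  const n = counts.length;
  const total = counts.reduce((sum, c) => sum + c, 0);
  if (total === 0) return 0; // nothing mapped: treat as no spread (assumption)
  if (n <= 1) return 1;      // a single candidate cannot be uneven (assumption)

  // HHI = sum of squared shares; 1/n when even, 1 when fully concentrated.
  const hhi = counts.reduce((sum, c) => {
    const share = c / total;
    return sum + share * share;
  }, 0);

  // 1/HHI is the "effective number" of contributing candidates, in [1, n].
  // Rescale to [0, 1] so 1.0 means perfectly even and 0 means one candidate only.
  return (1 / hhi - 1) / (n - 1);
}

console.log(candidateSpreadSketch([3, 3, 3]));  // even distribution
console.log(candidateSpreadSketch([10, 0, 0])); // one candidate dominates
console.log(candidateSpreadSketch([4, 3, 3]));  // slightly uneven
```

Under this normalization a run where one candidate produces every implication scores exactly 0 and an even run scores 1, which matches the behaviour the commit describes for the critique trigger.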
Verified: Run 2 Baltic/Hormuz/Brazil → freight/crude_oil/USD spread=0.98 ✓ * feat(deep-forecast): add convergence object to R2 debug artifact Surface autoresearch loop outcome per run: converged (bool), finalComposite, critiqueIterations (0 or 1), refinementCommitted, and perCandidateMappedCount (candidateStateId → count). After 5+ runs the artifact alone answers whether the pipeline is improving. Architectural changes: - runImpactExpansionPromptRefinement now returns { iterationCount, committed } at all exit paths instead of undefined - Call hoisted before writeForecastTraceArtifacts so the result flows into the debug payload via dataForWrite.refinementResult - buildImpactExpansionDebugPayload assembles convergence from validation + refinementResult; exported for direct testing - Fix: stale diversityScore reference replaced with directCommodityDiversity Tests: T-conv-1 (converged=true), T-conv-2 (converged=false + iterations=1), T-conv-3 (perCandidateMappedCount grouping) — 116/116 pass * fix(deep-forecast): address P1+P2 review issues from convergence observability PR P1-A: sanitize LLM-returned proposed_addition before Redis write (prompt injection guard via sanitizeProposedLlmAddition — strips directive-phrase lines) P1-B: restore fire-and-forget for runImpactExpansionPromptRefinement; compute critiqueIterations from quality score (predicted) instead of awaiting result, eliminating 15-30s critical-path latency on poor-quality runs P1-C: processDeepForecastTask now returns convergence object to callers; add convergence_quality_met warn check to evaluateForecastRunArtifacts P1-D: cap concurrent LLM calls in extractImpactExpansionBundle to 3 (manual batching — no p-limit) to respect provider rate limits P2-1: hash full learnedSection in buildImpactExpansionCandidateHash (was sliced to 80 chars, causing cache collisions on long learned sections) P2-2: add exitReason field to all runImpactExpansionPromptRefinement return paths P2-3: sanitizeForPrompt strips 
directive injection phrases; new sanitizeProposedLlmAddition applies line-level filtering before Redis write P2-4: add comment explaining intentional bidirectional affectedAssets/assetsOrSectors coalescing in normalizeImpactHypothesisDraft P2-5: extract makeConvTestData helper in T-conv tests; remove refinementCommitted assertions (field removed from convergence shape) P2-6: convergence_quality_met check added to evaluateForecastRunArtifacts (warn) 🤖 Generated with Claude Sonnet 4.6 via Claude Code (https://claude.ai/claude-code) + Compound Engineering v2.49.0 Co-Authored-By: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com> * fix(docs): add blank lines around lists in plan (MD032) * fix(deep-forecast): address P1+P2 reviewer issues in convergence observability P1-1: mapImpactHypothesesToWorldSignals used free-form marketImpact values (price_spike, shortage, credit_stress, risk_off) verbatim as signal channel types, producing unknown types that buildMarketTransmissionGraph cannot consume. Add IMPACT_SIGNAL_CHANNELS set + resolveImpactChannel() to map free-form strings to the nearest valid channel before signal materialization. P1-2: sanitizeForPrompt had directive-phrase stripping added that was too broad for a function called on headlines, evidence tables, case files, and geopolitical summaries. Reverted to original safe sanitizer (newline/control char removal only). Directive stripping remains in sanitizeProposedLlmAddition where it is scoped to Redis-bound LLM-generated additions only. P2: Renamed convergence.critiqueIterations to predictedCritiqueIterations to make clear this is a prediction from the quality score, not a measured count from actual refinement behavior (refinement is fire-and-forget after artifact write). Updated T-conv-1/2 test assertions to match. 
* feat(deep-forecast): inject live news headlines into evidence table Wire inputs.newsInsights / inputs.newsDigest through the candidate selection pipeline so buildImpactExpansionEvidenceTable receives up to 3 commodity-relevant live headlines as 'live_news' evidence entries. Changes: - IMPACT_COMMODITY_LEXICON: extend fertilizer pattern (fertiliser, nitrogen, phosphate, npk); add food_grains and shipping_freight entries - filterNewsHeadlinesByState: new pure helper that scores headlines by alert status, LNG/energy/route/sanctions signal match, lexicon commodity match, and source corroboration count (min score 2 to include) - buildImpactExpansionEvidenceTable: add newsItems param, inject live_news entries, raise cap 8→11 - buildImpactExpansionCandidate: add newsInsights/newsDigest params, compute newsItems via filterNewsHeadlinesByState - selectImpactExpansionCandidates: add newsInsights/newsDigest to options - Call site: pass inputs.newsInsights/newsDigest at seed time - Export filterNewsHeadlinesByState, buildImpactExpansionEvidenceTable - 9 new tests (T-news-1 through T-lex-3): all pass, 125 total pass 🤖 Generated with Claude Sonnet 4.6 (200K context) via Claude Code (https://claude.ai/claude-code) + Compound Engineering v2.49.0 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(deep-forecast): remove hardcoded LNG boost from filterNewsHeadlinesByState The LNG+2 score was commodity-specific and inconsistent with the intent: headline scoring should be generic, not biased toward any named commodity. The function already handles the state's detected commodity dynamically via lexEntry.pattern (IMPACT_COMMODITY_LEXICON). LNG headlines still score via CRITICAL_NEWS_ENERGY_RE (+1) and CRITICAL_NEWS_ROUTE_RE (+1) when relevant to the state's region. All 125 tests pass. 
* fix(deep-forecast): address all P1+P2 code review findings from PR #2178 P1 fixes (block-merge): - Lower third_order mapped floor 0.74→0.70 (max achievable via 0.72 multiplier was 0.72) - Guard runImpactExpansionPromptRefinement against empty validation (no_mapped exit) - Replace block-list sanitizeProposedLlmAddition with pattern-based allowlist (HTML/JS/directive takeover) - Fix TOCTOU on PROMPT_LAST_ATTEMPT_KEY: claim slot before quality check, not after LLM call P2 fixes: - Fix learned section overflow: use slice(-MAX) to preserve tail, not discard all prior content - Add safe_haven_bid and global_crude_spread_stress branches to resolveImpactChannel - quality_met path now sets rate-limit key (prevents 3 Redis GETs per good run) - Hoist extractNewsClusterItems outside stateUnit map in selectImpactExpansionCandidates - Export PROMPT_LEARNED_KEY, PROMPT_BASELINE_KEY, PROMPT_LAST_ATTEMPT_KEY + read/clear helpers All 125 tests pass. * fix(todos): add blank lines around lists/headings in todo files (markdownlint) * fix(todos): fix markdownlint blanks-around-headings/lists in all todo files --------- Co-authored-by: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>	2026-03-24 18:52:02 +04:00

15 Commits