mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-25 17:14:57 +02:00
main
15 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
305dc5ef36 |
feat(digest-dedup): Phase A — embedding-based dedup scaffolding (no-op) (#3200)
* feat(digest-dedup): Phase A — embedding-based dedup scaffolding (no-op)
Replaces the inline Jaccard story-dedup in seed-digest-notifications
with an orchestrator that can run Jaccard, shadow, or full embedding
modes. Ships with DIGEST_DEDUP_MODE=jaccard as the default so
production behaviour is unchanged until Phase C shadow + Phase D flip.
New modules (scripts/lib/):
- brief-dedup-consts.mjs tunables + cache prefix + __constants bag
- brief-dedup-jaccard.mjs verbatim 0.55-threshold extract (fallback)
- entity-gazetteer.mjs cities/regions gazetteer + common-caps
- brief-embedding.mjs OpenRouter /embeddings client with Upstash
cache, all-or-nothing timeout, cosineSimilarity
- brief-dedup-embed.mjs complete-link clustering + entity veto (pure)
- brief-dedup.mjs orchestrator, env read at call entry,
shadow archive, structured log line
Operator tools (scripts/tools/):
- calibrate-dedup-threshold.mjs offline calibration runner + histogram
- golden-pair-validator.mjs live-embedder drift detector (nightly CI)
- shadow-sample.mjs Sample A/B CSV emitter over SCAN archive
Tests:
- brief-dedup-jaccard.test.mjs migrated from regex-harness to direct
import plus orchestrator parity tests (22)
- brief-dedup-embedding.test.mjs 9 plan scenarios incl. 10-permutation
property test, complete-link non-chain (21)
- brief-dedup-golden.test.mjs 20-pair mocked canary (21)
Workflows:
- .github/workflows/dedup-golden-pairs.yml nightly live-embedder canary
(07:17 UTC), opens issue on drift
Deviation from plan: the shouldVeto("Iran closes Hormuz", "Tehran
shuts Hormuz") case can't return true under a single coherent
classification (country-in-A vs capital-in-B sit on different sides
of the actor/location boundary). Gazetteer follows the plan's
"countries are actors" intent; the test is updated to assert false
with a comment pointing at the irreducible capital-country
coreference limitation.
Verification:
- npm run test:data 5825/5825 pass
- tests/edge-functions 171/171 pass
- typecheck + typecheck:api clean
- biome check on new files clean
- lint:md 0 errors
Phase B (calibration), Phase C (shadow), and Phase D (flip) are
subsequent PRs.
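Illustration (not part of the commit): a minimal sketch of cosine similarity plus complete-link clustering, to show why complete-link avoids chain merges. Only the name `cosineSimilarity` comes from the message above. `completeLinkCluster`, the `vetoFn` hook, and the greedy single pass are assumptions made for this example, and the real module may cluster differently (a greedy pass like this one depends on input order).

```javascript
// Illustrative sketch only; not the code in brief-dedup-embed.mjs.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Complete-link rule: an item joins a cluster only if it is similar enough
// to EVERY member, so A~B and B~C never merge A with a dissimilar C.
// vetoFn(i, j) is a hypothetical stand-in for the entity veto.
function completeLinkCluster(vectors, threshold, vetoFn = () => false) {
  const clusters = [];
  for (let i = 0; i < vectors.length; i++) {
    const home = clusters.find((members) =>
      members.every(
        (j) => cosineSimilarity(vectors[i], vectors[j]) >= threshold && !vetoFn(i, j),
      ),
    );
    if (home) home.push(i);
    else clusters.push([i]);
  }
  return clusters;
}

const vectors = [
  [1, 0],       // A
  [0.8, 0.6],   // B: close to A (cosine about 0.8)
  [0.28, 0.96], // C: close to B (about 0.8) but far from A (about 0.28)
];
const clusters = completeLinkCluster(vectors, 0.6);
```

Under a single-link rule C would chain into the A/B cluster through B; here it stays separate because it fails the check against A.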
* refactor(digest-dedup): address review findings 193-199
Fresh-eyes review found 3 P1s, 3 P2s, and a P3 bundle across
kieran-typescript, security-sentinel, performance-oracle, architecture-
strategist, and code-simplicity reviewers. Fixes below; all 64 dedup
tests + 5825 data tests + 171 edge-function tests still green.
P1 #193 - dedup regex + redis pipeline duplication
- Extract defaultRedisPipeline into scripts/lib/_upstash-pipeline.mjs;
both orchestrator and embedding client import from there.
- normalizeForEmbedding now delegates to stripSourceSuffix from the
Jaccard module so the outlet allow-list is single-sourced.
P1 #194 - embedding timeout floor + negative-budget path
- callEmbeddingsApi throws EmbeddingTimeoutError when timeoutMs<=0
instead of opening a doomed 250ms fetch.
- Removed Math.max(250, ...) floor that let wall-clock cap overshoot.
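Illustration (not part of the commit): a small sketch of the negative-budget fix described for #194. `EmbeddingTimeoutError` and `timeoutMs` are named in the message; `flooredTimeout` and `checkedTimeout` are hypothetical helpers written only to contrast the two behaviours.

```javascript
// Illustrative sketch; helper names are hypothetical.
class EmbeddingTimeoutError extends Error {
  constructor(message) {
    super(message);
    this.name = 'EmbeddingTimeoutError';
  }
}

// Old behaviour as described: a 250 ms floor could outlive the wall-clock cap.
function flooredTimeout(remainingMs) {
  return Math.max(250, remainingMs);
}

// New behaviour as described: refuse to start a request with no budget left.
function checkedTimeout(remainingMs) {
  if (!(remainingMs > 0)) {
    throw new EmbeddingTimeoutError(`no embedding budget left (timeoutMs=${remainingMs})`);
  }
  return remainingMs;
}

const overshoot = flooredTimeout(-40); // still schedules a 250 ms request
let failure = null;
try {
  checkedTimeout(-40);
} catch (err) {
  failure = err; // surfaces as a named timeout error instead
}
```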
P1 #195 - dead env getters
- Deleted getMode / isRemoteEmbedEnabled / isEntityVetoEnabled /
getCosineThreshold / getWallClockMs from brief-dedup-consts.mjs
(zero callers; orchestrator reimplements inline).
P2 #196 - orchestrator cleanup bundle
- Removed re-exports at bottom of brief-dedup.mjs.
- Extracted materializeCluster into brief-dedup-jaccard.mjs; both
the fallback and orchestrator use the shared helper.
- Deleted clusterWithEntityVeto wrapper; orchestrator inlines the
vetoFn wiring at the single call site.
- Shadow mode now runs Jaccard exactly once per tick (was twice).
- Fallback warn line carries reason=ErrorName so operators can
filter timeout vs provider vs shape errors.
- Invalid DIGEST_DEDUP_MODE values emit a warn once per run (vs
silently falling to jaccard).
P2 #197 - workflow + shadow-sample hardening
- dedup-golden-pairs.yml body composition no longer relies on a
heredoc that would command-substitute validator stdout. Switched
to printf with sanitised LOG_TAIL (printable ASCII only) and
--body-file so crafted fixture text cannot escape into the runner.
- shadow-sample.mjs Upstash helper enforces a hardcoded command
allowlist (SCAN | GET | EXISTS).
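Illustration (not part of the commit): one way a hardcoded command allowlist could look. The three command names come from the message; `assertAllowedCommand`, the array-shaped command, and the case-insensitive match are assumptions for this sketch.

```javascript
// Illustrative sketch; the helper name and command shape are hypothetical.
const ALLOWED_COMMANDS = new Set(['SCAN', 'GET', 'EXISTS']);

function assertAllowedCommand(command) {
  // A command is assumed to be an array such as ['GET', 'some:key'].
  const name = String(command[0] ?? '').toUpperCase();
  if (!ALLOWED_COMMANDS.has(name)) {
    throw new Error(`command not allowed: ${name || '(empty)'}`);
  }
  return command;
}

const readOk = assertAllowedCommand(['get', 'brief:dedup:example']);
let rejected = null;
try {
  assertAllowedCommand(['DEL', 'brief:dedup:example']);
} catch (err) {
  rejected = err.message;
}
```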
P2 #198 - test + observability polish
- Scenarios 2 and 3 deep-equal returned clusters against the Jaccard
expected shape, not just length. Also assert the reason= field.
P3 #199 - nits
- Removed __constants test-bag; jaccard tests use named imports.
- Renamed deps.apiKey to deps._apiKey in embedding client.
- Added @pre JSDoc on diffClustersByHash about unique-hash contract.
- Deferred: mocked golden-pair test removal, gazetteer JSON migration,
scripts/tools AGENTS.md doc note.
Todos 193-199 moved from pending to complete.
Verification:
- npm run test:data 5825/5825 pass
- tests/edge-functions 171/171 pass
- typecheck + typecheck:api clean
- biome check on changed files clean
* fix(digest-dedup): address Greptile P2 findings on PR #3200
1. brief-embedding.mjs: wrap fetch lookup as
`(...args) => globalThis.fetch(...args)` instead of aliasing bare
`fetch`. Aliasing captures the binding at module-load time, so
later instrumentation / Edge-runtime shims don't see the wrapper —
same class of bug as the banned `fetch.bind(globalThis)` pattern
flagged in AGENTS.md.
2. dedup-golden-pairs.yml: `gh issue create --label "..." || true`
silently swallowed the failure when any of dedup/canary/p1 labels
didn't pre-exist, breaking the drift alert channel while leaving
the job red in the Actions UI. Switched to repeated `--label`
flags + `--create-label` so any missing label is auto-created on
first drift, and dropped the `|| true` so a legitimate failure
(network / auth) surfaces instead of hiding.
Both fixes are P2-style per Greptile (confidence 5/5, no P0/P1);
applied pre-merge so the nightly canary is usable from day one.
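Illustration (not part of the commit): the difference between aliasing `fetch` and wrapping the lookup, shown with stub functions so nothing touches the network. The stubs and variable names are made up for this sketch.

```javascript
// Illustrative sketch using stubs in place of a real fetch.
const original = globalThis.fetch;
globalThis.fetch = () => 'original';

const aliased = globalThis.fetch;                       // binding captured right now
const wrapped = (...args) => globalThis.fetch(...args); // looked up on every call

// Later, instrumentation or a runtime shim replaces the global.
globalThis.fetch = () => 'instrumented';

const aliasedResult = aliased(); // still calls the old function
const wrappedResult = wrapped(); // sees the replacement

globalThis.fetch = original; // restore whatever was there before
```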
* fix(digest-dedup): two P1s found on PR #3200
P1 — canary classifier must match production
Nightly golden-pair validator was checking a hardcoded threshold
(default 0.60) and always applied the entity veto, while the actual
dedup path at runtime reads DIGEST_DEDUP_COSINE_THRESHOLD and
DIGEST_DEDUP_ENTITY_VETO_ENABLED from env at every call. A Phase
C/D env flip could make the canary green while prod was wrong or
red while prod was healthy, defeating the whole point of a drift
detector.
Fix:
- golden-pair-validator.mjs now calls readOrchestratorConfig(process.env)
— the same helper the orchestrator uses — so any classifier knob
added later is picked up automatically. The threshold and veto-
enabled flags are sourced from env by default; a --threshold CLI
flag still overrides for manual calibration sweeps.
- dedup-golden-pairs.yml sources DIGEST_DEDUP_COSINE_THRESHOLD and
DIGEST_DEDUP_ENTITY_VETO_ENABLED from GitHub repo variables (vars.*),
which operators must keep in lockstep with Railway. The
workflow_dispatch threshold input now defaults to empty; the
scheduled canary always uses the production-parity config.
- Validator log line prints the effective config + source so nightly
output makes the classifier visible.
P1 — shadow archive writes were fail-open
`defaultRedisPipeline()` returns null on timeout / auth / HTTP
failure. `writeShadowArchive()` only had a try/catch, so the null
result was silently treated as success. A Phase C rollout could
log clean "mode=shadow … disagreements=X" lines every tick while
the Upstash archive received zero writes — and Sample B labelling
would then find no batches, silently killing calibration.
Fix:
- writeShadowArchive now inspects the pipeline return. null result,
non-array response, per-command {error}, or a cell without
{result: "OK"} all return {ok: false, reason}.
- Orchestrator emits a warn line with the failure reason, and the
structured log line carries archive_write=ok|failed so operators
can grep for failed ticks.
- Regression test in brief-dedup-embedding.test.mjs simulates the
null-pipeline contract and asserts both the warn and the structured
field land.
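Illustration (not part of the commit): a sketch of fail-closed inspection of a pipeline response, following the four failure cases listed above. `inspectPipelineResult` and the reason strings are hypothetical; the real `writeShadowArchive` may name and order its checks differently.

```javascript
// Illustrative sketch; function name and reason strings are hypothetical.
function inspectPipelineResult(result) {
  if (result === null || result === undefined) {
    return { ok: false, reason: 'null_result' };
  }
  if (!Array.isArray(result)) {
    return { ok: false, reason: 'non_array_response' };
  }
  for (const cell of result) {
    if (cell && cell.error) return { ok: false, reason: 'command_error' };
    if (!cell || cell.result !== 'OK') return { ok: false, reason: 'unexpected_cell' };
  }
  return { ok: true };
}

const success = inspectPipelineResult([{ result: 'OK' }, { result: 'OK' }]);
const timedOut = inspectPipelineResult(null);
const partial = inspectPipelineResult([{ result: 'OK' }, { error: 'ERR auth' }]);
```

A caller would then log `archive_write=ok` or `archive_write=failed` depending on `ok`, as the message describes.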
Verification:
- test:data 5825/5825 pass
- dedup suites 65/65 pass (new: archive-fail regression)
- typecheck + api clean
- biome check clean on changed files
* fix(digest-dedup): two more P1s found on PR #3200
P1 — canary must also honour DIGEST_DEDUP_MODE + REMOTE_EMBED_ENABLED
The prior round fixed the threshold/veto knobs but left the canary
running embeddings regardless of whether production could actually
reach the embed path. If Railway has DIGEST_DEDUP_MODE=jaccard or
DIGEST_DEDUP_REMOTE_EMBED_ENABLED=0, production never calls the
classifier, so a drift signal is meaningless — or worse, a live
OpenRouter issue flags the canary while prod is obliviously fine.
Fix:
- golden-pair-validator.mjs reads mode + remoteEmbedEnabled from the
same readOrchestratorConfig() helper the orchestrator uses. When
either says "embed path inactive in prod", the validator logs an
explicit skip line and exits 0. The nightly workflow then shows
green, which is the correct signal ("nothing to drift against").
- A --force CLI flag remains for manual dispatch during staged
rollouts.
- dedup-golden-pairs.yml sources DIGEST_DEDUP_MODE and
DIGEST_DEDUP_REMOTE_EMBED_ENABLED from GitHub repo variables
alongside the threshold and veto-enabled knobs, so all four
classifier gates stay in lockstep with Railway.
- Validator log line now prints mode + remoteEmbedEnabled so the
canary output surfaces which classifier it validated.
P1 — shadow-sample Sample A was biased by SCAN order
enumerate-and-dedup added every seen pair to a dedup key BEFORE
filtering by agreement. If the same pair appeared in an agreeing
batch first and a disagreeing batch later, the disagreeing
occurrence was silently dropped. SCAN order is unspecified, so
Sample A could omit real disagreement pairs.
Fix:
- Extracted the enumeration into a pure `enumeratePairs(archives, mode)`
export so the logic is testable. Mode filter runs BEFORE the dedup
check: agreeing pairs are skipped entirely under
--mode disagreements, so any later disagreeing occurrence can
still claim the dedup slot.
- Added tests/brief-dedup-shadow-sample.test.mjs with 5 regression
cases: agreement-then-disagreement, reversed order (symmetry),
always-agreed omission, population enumeration, cross-batch dedup.
- isMain guard added so importing the module for tests does not
kick off the CLI scan path.
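Illustration (not part of the commit): a sketch of the filter-before-dedup ordering. The signature `enumeratePairs(archives, mode)` is taken from the message; the archive shape (`{ pairs: [{ a, b, agree }] }`), the `pairKey` helper, and the `'population'` mode name are assumptions for this example.

```javascript
// Illustrative sketch; data shapes are hypothetical.
function pairKey(a, b) {
  return a < b ? `${a}|${b}` : `${b}|${a}`; // order-independent key
}

function enumeratePairs(archives, mode) {
  const seen = new Set();
  const out = [];
  for (const archive of archives) {
    for (const pair of archive.pairs) {
      // Mode filter runs BEFORE the dedup check, so an agreeing occurrence
      // never claims the slot a later disagreeing occurrence needs.
      if (mode === 'disagreements' && pair.agree) continue;
      const key = pairKey(pair.a, pair.b);
      if (seen.has(key)) continue;
      seen.add(key);
      out.push(pair);
    }
  }
  return out;
}

const agreeing = { pairs: [{ a: 'h1', b: 'h2', agree: true }] };
const disagreeing = { pairs: [{ a: 'h2', b: 'h1', agree: false }] };

const forward = enumeratePairs([agreeing, disagreeing], 'disagreements');
const reversed = enumeratePairs([disagreeing, agreeing], 'disagreements');
const population = enumeratePairs([agreeing, disagreeing], 'population');
```

Both scan orders yield the same single disagreeing pair, which is the symmetry the regression cases check for.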
Verification:
- test:data 5825/5825 pass
- dedup suites 70/70 pass (5 new shadow-sample regressions)
- typecheck + api clean
- biome check clean on changed files
Operator follow-up before Phase C:
Set all FOUR dedup repo variables in GitHub alongside Railway:
DIGEST_DEDUP_MODE, DIGEST_DEDUP_REMOTE_EMBED_ENABLED,
DIGEST_DEDUP_COSINE_THRESHOLD, DIGEST_DEDUP_ENTITY_VETO_ENABLED
* refactor(digest-dedup): Railway is the single source of truth for dedup config
Fair user pushback: asking operators to set four DIGEST_DEDUP_*
values in BOTH Railway (where the cron runs) AND GitHub repo
variables (where the canary runs) is architectural debt. Two
copies of the same truth will always drift.
Solution: the digest cron publishes its resolved config to Upstash
on every tick under brief:dedup:config:v1 (2h TTL). The nightly
golden-pair canary reads that key instead of env vars. Railway
stays the sole source of truth; no parallel repo variables to
maintain. A missing/expired key signals "cron hasn't run" and
the canary skips with exit 0 — better than validating against
hardcoded defaults that might diverge from prod.
Changes:
- brief-dedup-consts.mjs: new ACTIVE_CONFIG_KEY + TTL constants.
- brief-dedup.mjs: new publishActiveConfig() fires at the start of
every deduplicateStories() call (before the mode short-circuit,
so jaccard ticks also publish a "mode=jaccard" signal the canary
can read). Fire-and-forget; archive-write error semantics still
apply if the operator wants stricter tracking.
- golden-pair-validator.mjs: removed readOrchestratorConfig(env)
path. Now calls fetchActiveConfigFromUpstash() and either
validates against that config, skips when the embed path is
inactive, or skips when the key is missing (with --force
override for manual dispatch).
- dedup-golden-pairs.yml: dropped the four DIGEST_DEDUP_* env lines
and the corresponding repo-variable dependency. Only the three
Upstash + OpenRouter secrets remain.
- tests: two new regressions assert config is published on every
tick (shadow AND jaccard modes) with the right shape + TTL.
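Illustration (not part of the commit): a sketch of the canary's three-way decision once it reads the published config. `decideCanaryAction`, the returned `action`/`reason` values, and the config field names (`mode`, `remoteEmbedEnabled`) are hypothetical; the message only states the behaviour (validate, skip when the embed path is inactive, skip when the key is missing, `--force` override).

```javascript
// Illustrative sketch; names and return shape are hypothetical.
function decideCanaryAction(activeConfig, { force = false } = {}) {
  if (force) return { action: 'validate', reason: 'forced' };
  if (activeConfig === null || activeConfig === undefined) {
    // Missing or expired key: the cron has not published recently.
    return { action: 'skip', reason: 'config_missing' };
  }
  if (activeConfig.mode === 'jaccard' || !activeConfig.remoteEmbedEnabled) {
    return { action: 'skip', reason: 'embed_inactive' };
  }
  return { action: 'validate', reason: 'embed_active' };
}

const whenMissing = decideCanaryAction(null);
const whenJaccard = decideCanaryAction({ mode: 'jaccard', remoteEmbedEnabled: true });
const whenShadow = decideCanaryAction({ mode: 'shadow', remoteEmbedEnabled: true });
```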
Operator onboarding now takes one action: set the four
DIGEST_DEDUP_* variables on the Railway seed-digest-notifications
service. Nothing to set in GitHub beyond the existing
OPENROUTER_API_KEY / UPSTASH_* secrets.
Verification:
- test:data 5825/5825 pass
- dedup suites 72/72 pass (2 new config-publish regressions)
- typecheck + api clean
- biome check clean on changed files
* refactor(digest-dedup): ship embed directly, drop phases/canary/shadow
User feedback: "i dont need multiple phases and shit, we go directly
to embed". Fair. Ripping out the overengineering I accumulated:
DELETED
- .github/workflows/dedup-golden-pairs.yml (nightly canary)
- scripts/tools/golden-pair-validator.mjs
- scripts/tools/shadow-sample.mjs
- scripts/tools/calibrate-dedup-threshold.mjs
- tests/fixtures/brief-dedup-golden-pairs.json
- tests/brief-dedup-golden.test.mjs
- tests/brief-dedup-shadow-sample.test.mjs
SIMPLIFIED
- brief-dedup.mjs: removed shadow mode, publishActiveConfig,
writeShadowArchive, diffClustersByHash, jaccardRepsToClusterHashes,
and the DIGEST_DEDUP_REMOTE_EMBED_ENABLED knob. MODE is now
binary: `embed` (default) or `jaccard` (instant kill switch).
- brief-dedup-consts.mjs: dropped SHADOW_ARCHIVE_*, ACTIVE_CONFIG_*.
- Default flipped: DIGEST_DEDUP_MODE unset = embed (prod path).
Railway deploy with OPENROUTER_API_KEY set = embeddings live on
next cron tick. Set MODE=jaccard on Railway to revert instantly.
Orchestrator still falls back to Jaccard on any embed-path failure
(timeout, provider outage, missing API key, bad response). Fallback
warn carries reason=<ErrorName>. The cron never fails because
embeddings flaked. All 64 dedup tests + 5825 data tests still green.
Net diff: -1,407 lines.
Operator single action: set OPENROUTER_API_KEY on Railway's
seed-digest-notifications service (already present) and ship. No
GH Actions, no shadow archives, no labelling sprints. If the 0.60
threshold turns out wrong, tune DIGEST_DEDUP_COSINE_THRESHOLD on
Railway — takes effect on next tick, no redeploy.
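Illustration (not part of the commit): a sketch of the binary mode switch with Jaccard fallback on any embed-path failure. `dedupeWithFallback`, its injected `embedCluster` / `jaccardCluster` / `warn` dependencies, the `path` field, and the log wording are hypothetical; only the `reason=<ErrorName>` convention comes from the message.

```javascript
// Illustrative sketch; function and dependency names are hypothetical.
async function dedupeWithFallback(stories, { mode, embedCluster, jaccardCluster, warn }) {
  if (mode === 'jaccard') {
    return { clusters: jaccardCluster(stories), path: 'jaccard' }; // kill switch
  }
  try {
    return { clusters: await embedCluster(stories), path: 'embed' };
  } catch (err) {
    // Any embed failure degrades to Jaccard instead of failing the cron.
    warn(`embed path failed, falling back to jaccard reason=${err.name}`);
    return { clusters: jaccardCluster(stories), path: 'jaccard-fallback' };
  }
}

const warnings = [];
const demo = dedupeWithFallback(['story a', 'story b'], {
  mode: 'embed',
  embedCluster: async () => {
    const err = new Error('provider too slow');
    err.name = 'EmbeddingTimeoutError';
    throw err;
  },
  jaccardCluster: (stories) => stories.map((s) => [s]),
  warn: (line) => warnings.push(line),
});
```

`demo` resolves to the Jaccard result, and the single warning line carries the error name for filtering.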
* fix(digest-dedup): multi-word location phrases in the entity veto
Extractor was whitespace-tokenising and only single-token matching
against LOCATION_GAZETTEER, silently making every multi-word entry
unreachable:
extractEntities("Houthis strike ship in Red Sea")
→ { locations: [], actors: ['houthis','red','sea'] } ✗
shouldVeto("Houthis strike ship in Red Sea",
"US escorts convoy in Red Sea") → false ✗
With MODE=embed as the default, that turned off the main
anti-overmerge safety rail for bodies of water, regions, and
compound city names — exactly the P07-Hormuz / Houthis-Red-Sea
headlines the veto was designed to cover.
Fix: greedy longest-phrase scan with a sliding window. At each
token position try the longest multi-word phrase first (down to
2), require first AND last tokens to be capitalised (so lowercase
prose like "the middle east" doesn't falsely match while headline
"Middle East" does), lowercase connectors in between are fine
("Strait of Hormuz" → phrase "strait of hormuz" ✓). Falls back to
single-token lookup when no multi-word phrase fits.
Now:
extractEntities("Houthis strike ship in Red Sea")
→ { locations: ['red sea'], actors: ['houthis'] } ✓
shouldVeto(Red-Sea-Houthis, Red-Sea-US) → true ✓
Complexity still O(N · MAX_PHRASE_LEN) — MAX_PHRASE_LEN is 4
(longest gazetteer entry: "ho chi minh city"), so this is
effectively O(N).
Added 5 regression tests covering Red Sea, South China Sea,
Strait of Hormuz (lowercase-connector case), Abu Dhabi, and
New York, plus the Houthis-vs-US veto reproducer from the P1.
All 5825 data tests + 45 dedup tests green; lint + typecheck clean.
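Illustration (not part of the commit): a sketch of the greedy longest-phrase scan described above, covering locations only. `LOCATION_GAZETTEER` and `MAX_PHRASE_LEN` are named in the message; the gazetteer contents here are a tiny made-up subset, `extractLocations` is a hypothetical name, and the sketch skips punctuation stripping and actor extraction.

```javascript
// Illustrative sketch; gazetteer contents and function name are hypothetical.
const LOCATION_GAZETTEER = new Set([
  'red sea', 'south china sea', 'strait of hormuz',
  'abu dhabi', 'new york', 'ho chi minh city', 'middle east', 'tehran',
]);
const MAX_PHRASE_LEN = 4;

const isCapitalised = (token) => /^[A-Z]/.test(token);

function extractLocations(headline) {
  const tokens = headline.split(/\s+/).filter(Boolean);
  const locations = [];
  let i = 0;
  while (i < tokens.length) {
    let consumed = 0;
    // Try the longest window first, down to two tokens.
    for (let len = Math.min(MAX_PHRASE_LEN, tokens.length - i); len >= 2; len--) {
      const window = tokens.slice(i, i + len);
      // First AND last token must be capitalised; connectors may be lowercase.
      if (!isCapitalised(window[0]) || !isCapitalised(window[len - 1])) continue;
      const phrase = window.join(' ').toLowerCase();
      if (LOCATION_GAZETTEER.has(phrase)) {
        locations.push(phrase);
        consumed = len;
        break;
      }
    }
    if (consumed === 0) {
      // Fall back to a single-token lookup.
      const single = tokens[i].toLowerCase();
      if (LOCATION_GAZETTEER.has(single)) locations.push(single);
      consumed = 1;
    }
    i += consumed;
  }
  return locations;
}

const redSea = extractLocations('Houthis strike ship in Red Sea');
const hormuz = extractLocations('Tanker seized in Strait of Hormuz');
const prose = extractLocations('unrest across the middle east');
```

Each position tries at most `MAX_PHRASE_LEN` windows, which matches the O(N · MAX_PHRASE_LEN) bound stated in the message.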
|
||
|
|
72e3e3ee3b |
fix(auth): remove isProUser() gate so all visitors see Sign In button (#3115)
The setupAuthWidget() call was gated behind isProUser(), which created a deadlock: new users without legacy API keys could never see the Sign In button and thus could never authenticate.
Removes the guard in both the main app and the pro landing page pricing section.
Closes #034
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
7dfdc819a9 | Phase 0: Regional Intelligence snapshot writer foundation (#2940) | ||
|
|
60e727679c |
feat(supply-chain): Sprint E — scenario visual completion + service parity (#2910)
* feat(supply-chain): Sprint E - scenario visual completion + service parity
- E1: fetchSectorDependency exported from supply-chain service index
- E2: PRO gate + all-renderer dispatch in MapContainer.activateScenario
- E3: scenario summary banner in SupplyChainPanel (dismiss wired)
- E4: "Simulate Closure" trigger button in expanded chokepoint cards
- E5: affectedIso2s heat layer in DeckGLMap (GeoJsonLayer, red tint)
- E6: SVG renderer setScenarioState (best-effort iso2 fill)
- E7: Globe renderer scenario polygons via flushPolygons
- E8: integration tests for scenario run/status endpoints
* fix(supply-chain): address PR #2910 review findings (P1 + P2 + P3)
- Wire setOnScenarioActivate + setOnDismissScenario in panel-layout.ts (todo #155)
- Rename shadow variable t→tmpl in SCENARIO_TEMPLATES.find (todo #152)
- Add statusResp.ok guard in scenario polling loop (todo #153)
- Replace status.result! non-null assertion with shape guard (todo #154)
- Add AbortController to prevent concurrent polling races (todo #162)
- Add polygonStrokeColor scenario branch (transparent) in GlobeMap (todo #156)
- Re-export SCENARIO_TEMPLATES via src/config/scenario-templates.ts (todo #157)
- Cache affectedIso2Set in DeckGLMap.setScenarioState (todo #158)
- Add scenario paths to PREMIUM_RPC_PATHS for auth injection (todo #160)
- Show template name in scenario banner instead of raw ID (todo #163)
* fix(supply-chain): address PR #2910 review findings
- Add auth headers to scenario fetch calls in SupplyChainPanel
- Reset button state on scenario dismiss
- Poll status immediately on first iteration (no 2s delay)
- Pre-compute scenario polygons in GlobeMap.setScenarioState
- Use scenarioId for DeckGL updateTriggers precision
* fix(supply-chain): wire panel instance to MapContainer, stop button click propagation
- Call setSupplyChainPanel() in panel-layout.ts so scenario banner renders
- Add stopPropagation() to Simulate Closure button to prevent card collapse |
||
|
|
6e401ad02f |
feat(supply-chain): Global Shipping Intelligence — Sprint 0 + Sprint 1 (#2870)
* feat(supply-chain): Sprint 0 — chokepoint registry, HS2 sectors, war_risk_tier
- src/config/chokepoint-registry.ts: single source of truth for all 13
canonical chokepoints with displayName, relayName, portwatchName,
corridorRiskName, baselineId, shockModelSupported, routeIds, lat/lon
- src/config/hs2-sectors.ts: static dictionary for all 99 HS2 chapters
with category, shockModelSupported (true only for HS27), cargoType
- server/worldmonitor/supply-chain/v1/_chokepoint-ids.ts: migrated to
derive CANONICAL_CHOKEPOINTS from chokepoint-registry; no data duplication
- src/config/geo.ts + src/types/index.ts: added chokepointId field to
StrategicWaterway interface and all 13 STRATEGIC_WATERWAYS entries
- src/components/MapPopup.ts: switched chokepoint matching from fragile
name.toLowerCase() to direct chokepointId === id comparison
- server/worldmonitor/intelligence/v1/_shock-compute.ts: migrated from old
IDs (hormuz/malacca/babelm) to canonical IDs (hormuz_strait/malacca_strait/
bab_el_mandeb); same for CHOKEPOINT_LNG_EXPOSURE
- proto/worldmonitor/supply_chain/v1/supply_chain_data.proto: added
WarRiskTier enum + war_risk_tier field (field 16) on ChokepointInfo
- get-chokepoint-status.ts: populates warRiskTier from ChokepointConfig.threatLevel
via new threatLevelToWarRiskTier() helper (FREE field, no PRO gate)
* feat(supply-chain): Sprint 1 — country chokepoint exposure index + sector ring
S1.1: scripts/shared/country-port-clusters.json
~130 country → {nearestRouteIds, coastSide} mappings derived from trade route
waypoints; covers all 6 seeded Comtrade reporters plus major trading nations.
S1.2: scripts/seed-hs2-chokepoint-exposure.mjs
Daily cron seeder. Pure computation — reads country-port-clusters.json,
scores each country against CHOKEPOINT_REGISTRY route overlap, writes
supply-chain:exposure:{iso2}:{hs2}:v1 keys + seed-meta (24h TTL).
S1.3: RPC get-country-chokepoint-index (PRO-gated, request-varying)
- proto: GetCountryChokepointIndexRequest/Response + ChokepointExposureEntry
- handler: isCallerPremium gate; cachedFetchJson 24h; on-demand for any iso2
- cache-keys.ts: CHOKEPOINT_EXPOSURE_KEY(iso2, hs2) constant
- health.js: chokepointExposure SEED_META entry (48h threshold)
- gateway.ts: slow-browser cache tier
- service client: fetchCountryChokepointIndex() exported
S1.4: Chokepoint popup HS2 sector ring chart (PRO-gated)
Static trade-sector breakdown (IEA/UNCTAD estimates) per 9 major chokepoints.
SVG donut ring + legend shown for PRO users; blurred lockout + gate-hit
analytics for free users. Wired into renderWaterwayPopup().
🤖 Generated with Claude Sonnet 4.6 via Claude Code (https://claude.com/claude-code) + Compound Engineering v2.49.0
Co-Authored-By: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>
* fix(tests): update energy-shock-v2 tests to use canonical chokepoint IDs
CHOKEPOINT_EXPOSURE and CHOKEPOINT_LNG_EXPOSURE keys were migrated from
short IDs (hormuz, malacca, babelm) to canonical registry IDs
(hormuz_strait, malacca_strait, bab_el_mandeb) in Sprint 0.
Test fixtures were not updated at the time; fix them now.
* fix(tests): update energy-shock-seed chokepoint ID to canonical form
VALID_CHOKEPOINTS changed to canonical IDs in Sprint 0; the seed test
that checks valid IDs was not updated alongside it.
* fix(cache-keys): reword JSDoc comment to avoid confusing bootstrap test regex
The comment "NOT in BOOTSTRAP_CACHE_KEYS" caused the bootstrap.test.mjs
regex to match the comment rather than the actual export declaration,
resulting in 0 entries found. Rephrase to "excluded from bootstrap".
* fix(supply-chain): address P1 review findings for chokepoint exposure index
- Add get-country-chokepoint-index to PREMIUM_RPC_PATHS (CDN bypass)
- Validate iso2/hs2 params before Redis key construction (cache injection)
- Fix seeder TTL to 172800s (2× interval) and extend TTL on skipped lock
- Fix CHOKEPOINT_EXPOSURE_SEED_META_KEY to match seeder write key
- Render placeholder sectors behind blur gate (DOM data leakage)
- Document get-country-chokepoint-index in widget agent system prompts
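Illustration (not part of the commit): a sketch of validating `iso2` / `hs2` before they reach a Redis key. The key pattern `supply-chain:exposure:{iso2}:{hs2}:v1` comes from the Sprint 1 notes above; `buildExposureKey`, the regexes, the upper-casing of `iso2`, and the two-digit `hs2` form are assumptions for this example.

```javascript
// Illustrative sketch; helper name and exact validation rules are hypothetical.
const ISO2_RE = /^[A-Z]{2}$/;
const HS2_RE = /^(0[1-9]|[1-9][0-9])$/; // chapters 01 through 99

function buildExposureKey(iso2, hs2) {
  const country = String(iso2 ?? '').trim().toUpperCase();
  const chapter = String(hs2 ?? '').trim();
  // Reject anything that could smuggle ':' or glob characters into the key.
  if (!ISO2_RE.test(country)) throw new RangeError('invalid iso2');
  if (!HS2_RE.test(chapter)) throw new RangeError('invalid hs2');
  return `supply-chain:exposure:${country}:${chapter}:v1`;
}

const key = buildExposureKey('us', '27');
let injectionError = null;
try {
  buildExposureKey('US:*', '27');
} catch (err) {
  injectionError = err.message;
}
```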
* fix(lint): resolve Biome CI failures
- Add biome.json overrides to silence noVar in HTML inline scripts,
disable linting for public/ vendor/build artifacts and pro-test/
- Remove duplicate NG and MW keys from country-port-clusters.json
- Use import attributes (with) instead of deprecated assert syntax
* fix(build): drop JSON import attribute — esbuild rejects `with` syntax
---------
Co-authored-by: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>
|
||
|
|
2fddee6b05 |
feat(simulation): add keyActorRoles to fix actor overlap bonus vocabulary mismatch (#2582)
* feat(simulation): add keyActorRoles field to fix actor overlap bonus vocabulary mismatch
The +0.04 actor overlap bonus never reliably fired in production because
stateSummary.actors uses role-category strings ('Commodity traders',
'Policy officials') while simulation keyActors uses named geo-political
entities ('Iran', 'Houthi'). 53 production runs audited showed the bonus
fired once out of 53.
Fix: add keyActorRoles?: string[] to SimulationTopPath. The Round 2 prompt
now includes a CANDIDATE ACTOR ROLES section with theater-local role vocab
seeded from candidatePacket.stateSummary.actors. The LLM copies matching
roles into keyActorRoles. applySimulationMerge scores overlap against
keyActorRoles when actorSource=stateSummary, preserving the existing
keyActors entity-overlap path for the affectedAssets fallback.
- buildSimulationPackageFromDeepSnapshot: add actorRoles[] to each theater
from candidate.stateSummary.actors (theater-scoped, no cross-theater noise)
- buildSimulationRound2SystemPrompt: inject CANDIDATE ACTOR ROLES section
with exact-copy instruction and keyActorRoles in JSON template
- tryParseSimulationRoundPayload: extract keyActorRoles from round 2 output
- mergedPaths.map(): filter keyActorRoles against theater.actorRoles guardrail
- computeSimulationAdjustment: dual-path overlap — roleOverlapCount for
stateSummary, keyActorsOverlapCount for affectedAssets (backwards compat)
- summarizeImpactPathScore: project roleOverlapCount + keyActorsOverlapCount
into path-scorecards.json simDetail
New fields: roleOverlapCount, keyActorsOverlapCount in SimulationAdjustmentDetail
and ScorecardSimDetail. actorOverlapCount preserved as backwards-compat alias.
Tests: 308 pass (was 301 before). New tests T-P1/T-P2/T-P3 (prompt/parser),
T-RO1/T-RO2/T-RO3 (role overlap logic), T-PKG1 (pkg builder actorRoles),
plus fixture updates for T2/T-F/T-G/T-J/T-K/T-N2/T-SC-4.
🤖 Generated with Claude Sonnet 4.6 via Claude Code (https://claude.ai/claude-code) + Compound Engineering v2.49.0
Co-Authored-By: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>
* fix(simulation): address CE review findings from PR #2582
- Add SimulationPackageTheater interface to seed-forecasts.types.d.ts
(actorRoles was untyped under @ts-check)
- Add keyActorRoles to uiTheaters Redis projection in writeSimulationOutcome
(field was stripped from Redis snapshot; only visible in R2 artifact)
- Extract keyActorRoles IIFE to named sanitizeKeyActorRoles() function;
hoist allowedRoles Set computation out of per-path loop
- Harden bonusOverlap ternary: explicit branch for actorSource='none'
prevents silent fallthrough if new actorSource values are added
- Eliminate roleOverlap intermediate array in computeSimulationAdjustment
- Add U+2028/U+2029 Unicode line-separator stripping to sanitizeForPrompt
- Apply sanitizeForPrompt at tryParseSimulationRoundPayload parse boundary;
add JSDoc to newly-exported function
All 308 tests pass, typecheck + typecheck:api clean.
* fix(sim): restore const sanitized in sanitizeKeyActorRoles after early-return guard
Prior edit added `if (!allowedRoles.length) return []` but accidentally removed
the `const sanitized = ...` line, leaving the filter on line below referencing an
undefined variable. Restores the full function body:
if (!allowedRoles.length) return [];
const sanitized = (Array.isArray(rawRoles) ? rawRoles : [])
.map((s) => sanitizeForPrompt(String(s)).slice(0, 80));
const allowedNorm = new Set(allowedRoles.map(normalizeActorName));
return sanitized.filter((s) => allowedNorm.has(normalizeActorName(s))).slice(0, 8);
308/308 tests pass.
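Illustration (not part of the commit): the restored function body wrapped so it runs standalone. The body is copied from the message above; the parameter list `(rawRoles, allowedRoles)` and the two helper stand-ins (`sanitizeForPrompt`, `normalizeActorName`) are assumptions, since their real implementations are not shown here.

```javascript
// Stand-ins for helpers that live elsewhere in the repo (assumed behaviour).
const sanitizeForPrompt = (s) => s.replace(/[\u0000-\u001f\u007f\u2028\u2029]/g, ' ').trim();
const normalizeActorName = (s) => s.toLowerCase().replace(/\s+/g, ' ').trim();

// Parameter list is assumed; the body follows the commit message.
function sanitizeKeyActorRoles(rawRoles, allowedRoles) {
  if (!allowedRoles.length) return [];
  const sanitized = (Array.isArray(rawRoles) ? rawRoles : [])
    .map((s) => sanitizeForPrompt(String(s)).slice(0, 80));
  const allowedNorm = new Set(allowedRoles.map(normalizeActorName));
  return sanitized.filter((s) => allowedNorm.has(normalizeActorName(s))).slice(0, 8);
}

const theaterRoles = ['Commodity traders', 'Policy officials'];
const kept = sanitizeKeyActorRoles(['commodity  traders', 'Iran'], theaterRoles);
const noGuardrail = sanitizeKeyActorRoles(['Commodity traders'], []);
```

Roles outside the theater's allowed list (here the entity name 'Iran') are dropped, which is the guardrail the earlier commit describes.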
---------
Co-authored-by: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>
|
||
|
|
14a31c4283 |
feat(mcp): OAuth 2.0 Authorization Server for claude.ai connector (#2418)
* feat(mcp): add OAuth 2.0 Authorization Server for claude.ai connector
Implements spec-compliant MCP authentication so claude.ai's remote connector (which requires OAuth Client ID + Secret, no custom headers) can authenticate.
- public/.well-known/oauth-authorization-server: RFC 8414 discovery document
- api/oauth/token.js: client_credentials grant, issues UUID Bearer token in Redis TTL 3600s
- api/_oauth-token.js: resolveApiKeyFromBearer() looks up token in Redis
- api/mcp.ts: 3-tier auth (Bearer OAuth first, then ?key=, then X-WorldMonitor-Key); switch to getPublicCorsHeaders; surface error messages in catch
- vercel.json: rewrite /oauth/token, exclude oauth from SPA, CORS headers
- tests: update SPA no-cache pattern
Supersedes PR #2417.
Usage: URL=worldmonitor.app/mcp, Client ID=worldmonitor, Client Secret=<API key>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs: fix markdown lint in OAuth plan (blank lines around lists)
* fix(oauth): address all P1+P2 code review findings for MCP OAuth endpoint
- Add per-IP rate limiting (10 req/min) to /oauth/token via Upstash slidingWindow
- Return HTTP 401 + WWW-Authenticate header when Bearer token is invalid/expired
- Add Cache-Control: no-store + Pragma: no-cache to token response (RFC 6749 §5.1)
- Simplify _oauth-token.js to delegate to readJsonFromUpstash (removes duplicated Redis boilerplate)
- Remove dead code from token.js: parseBasicAuth, JSON body path, clientId/issuedAt fields
- Add Content-Type: application/json header for /.well-known/oauth-authorization-server
- Remove response_types_supported (only applies to authorization endpoint, not client_credentials)
Closes: todos 075, 076, 077, 078, 079
🤖 Generated with claude-sonnet-4-6 via Claude Code (https://claude.ai/claude-code) + Compound Engineering v2.40.0
Co-Authored-By: claude-sonnet-4-6 (200K context) <noreply@anthropic.com>
* chore(review): fresh review findings: todos 081-086, mark 075/077/078/079 complete
* fix(mcp): remove ?key= URL param auth + mask internal errors
- Remove ?key= query param auth path: API keys in URLs appear in Vercel/CF access logs, browser history, Referer headers. OAuth client_credentials (same PR) already covers clients that cannot set custom headers. Only two auth paths remain: Bearer OAuth and X-WorldMonitor-Key header.
- Revert err.message disclosure: catch block was accidentally exposing internal service URLs/IPs via err.message. Restore original hardcoded string, add console.error for server-side visibility.
Resolves: todos 081, 082
* fix(oauth): resolve all P2/P3 review findings (todos 076, 080, 083-086)
- 076: no-credentials path in mcp.ts now returns HTTP 401 + WWW-Authenticate instead of rpcError (200)
- 080: store key fingerprint (sha256 first 16 hex chars) in Redis, not plaintext key
- 083: replace Array.includes() with timingSafeIncludes() (constant-time HMAC comparison) in token.js and mcp.ts
- 084: resolveApiKeyFromBearer uses direct fetch that throws on Redis errors (500 not 401 on infra failure)
- 085: token.js imports getClientIp, getPublicCorsHeaders, jsonResponse from shared helpers; removes local duplicates
- 086: mcp.ts auth chain restructured to check Bearer header first, passes token string to resolveApiKeyFromBearer (eliminates double header read + unconditional await)
* test(mcp): update auth test to expect HTTP 401 for missing credentials
Align with todo 076 fix: no-credentials path now returns 401 + WWW-Authenticate instead of JSON-RPC 200 response. Also asserts WWW-Authenticate header presence.
* chore: mark todos 076, 080, 083-086 complete
* fix(mcp): harden OAuth error paths and fix rate limit cross-user collision
- Wrap resolveApiKeyFromBearer() in try/catch in mcp.ts; Redis/network errors now return 503 + Retry-After: 5 instead of crashing the handler
- Wrap storeToken() fetch in try/catch in oauth/token.js; network errors return false so the existing if (!stored) path returns 500 cleanly
- Re-key token endpoint rate limit by sha256(clientSecret).slice(0,8) instead of IP; prevents cross-user 429s when callers share Anthropic's shared outbound IPs (Claude remote MCP connector)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> |
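Illustration (not part of the commit): a sketch of the final two-path auth order (Bearer first, then the `X-WorldMonitor-Key` header, otherwise 401 with `WWW-Authenticate`). `pickCredential`, the return shape, and the exact `WWW-Authenticate` value are hypothetical; the header lookup accepts anything with a `.get(name)` method, and a `Map` with lowercase keys stands in for a real `Headers` object.

```javascript
// Illustrative sketch; function name, return shape and realm are hypothetical.
function pickCredential(headers) {
  const auth = headers.get('authorization') ?? '';
  const match = /^Bearer\s+(\S+)$/i.exec(auth);
  if (match) return { kind: 'bearer', value: match[1] }; // checked first
  const apiKey = headers.get('x-worldmonitor-key');
  if (apiKey) return { kind: 'api-key', value: apiKey };
  // No credentials at all: answer 401 rather than a 200 JSON-RPC error.
  return {
    kind: 'none',
    status: 401,
    headers: { 'WWW-Authenticate': 'Bearer realm="worldmonitor"' },
  };
}

const viaBearer = pickCredential(new Map([
  ['authorization', 'Bearer tok123'],
  ['x-worldmonitor-key', 'legacy-key'],
]));
const viaHeader = pickCredential(new Map([['x-worldmonitor-key', 'legacy-key']]));
const anonymous = pickCredential(new Map());
```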
||
|
|
f783bf2d2d |
fix(intelligence): analytical frameworks follow-up — P1 security + P2 correctness fixes (#2386)
* fix(intelligence): include framework/systemAppend hash in cache keys (todos 041, 045, 051)
* fix(intelligence): gate framework/systemAppend on server-side PRO check (todo 042)
* fix(skills): exact hostname allowlist + redirect:manual to prevent SSRF (todos 043, 054)
* fix(intelligence): sanitize systemAppend against prompt injection before LLM (todo 044)
* fix(intelligence): use framework field in DeductionPanel, fix InsightsPanel double increment (todos 046, 047)
* fix(intelligence): settings export, hot-path cache, country-brief debounce (todos 048, 049, 050)
* fix(intelligence): i18n, FrameworkSelector note, stripThinkingTags dedup, UUID IDs (todos 052, 055, 056, 057)
- i18n Analysis Frameworks settings section (en + fr locales, replace all hardcoded English strings with t() calls)
- FrameworkSelector: replace panelId==='insights' hardcode with note? option; both InsightsPanel and DailyMarketBriefPanel pass note
- stripThinkingTags: remove inline duplicate in summarize-article.ts, import from _shared/llm; add Strip unterminated comment so tests can locate the section
- Replace Date.now() IDs for imported frameworks with crypto.randomUUID()
- Drop 'not supported in phase 1' phrasing to 'not supported'
- test: fix summarize-reasoning Fix 2 suite to read from llm.ts
- test: add premium-check-stub and wire into redis-caching country intel brief importPatchedTsModule so test can resolve the new import
* fix(security): address P1 review findings from PR #2386
- premium-check: require `required: true` from validateApiKey so trusted browser origins (worldmonitor.app, Vercel previews, localhost) are not treated as PRO callers; fixes free-user bypass of framework/systemAppend gate
- llm: replace weak sanitizeSystemAppend with sanitizeForPrompt from llm-sanitize.js; all callLlm callers now get model-delimiter and control-char stripping, not just phrase blocklist
- get-country-intel-brief: apply sanitizeForPrompt to contextSnapshot before injecting into user prompt; fixes unsanitized query-param injection
Closes todos 060, 061, 062 (P1, blocked merge of #2386).
* chore(todos): mark P1 todos 060-062 complete
* fix(agentskills): address Greptile P2 review comments
- hoist ALLOWED_AGENTSKILLS_HOSTS Set to module scope (was reallocated per-request)
- add res.type === 'opaqueredirect' check alongside the 3xx guard; Edge Runtime returns status=0 for opaque redirects so the status range check alone is dead code |
||
|
|
110ab402c4 |
feat(intelligence): analytical framework selector for AI panels (#2380)
* feat(frameworks): add settings section and import modal
- Add Analysis Frameworks group to preferences-content.ts between Intelligence and Media sections
- Per-panel active framework display (read-only, 4 panels)
- Skill library list with built-in badge, Rename and Delete actions for imported frameworks
- Import modal with two tabs: From agentskills.io (fetch + preview) and Paste JSON
- All error cases handled inline: network, domain validation, missing instructions, invalid JSON, duplicate name, instructions too long, rate limit
- Add api/skills/fetch-agentskills.ts edge function (proxy to agentskills.io)
- Add analysis-framework-store.ts (loadFrameworkLibrary, saveImportedFramework, deleteImportedFramework, renameImportedFramework, getActiveFrameworkForPanel)
- Add fw-* CSS classes to main.css matching dark panel aesthetic
* feat(panels): wire analytical framework store into InsightsPanel, CountryDeepDive, DailyMarketBrief, DeductionPanel
- InsightsPanel: append active framework to geoContext in updateFromClient(); subscribe in constructor, unsubscribe in destroy()
- CountryIntelManager: pass framework as query param to fetchCountryIntelBrief(); subscribe to re-open brief on framework change; unsubscribe in destroy()
- DataLoaderManager: add dailyBriefGeneration counter for stale-result guard; pass frameworkAppend to buildDailyMarketBrief(); subscribe to framework changes to force refresh; unsubscribe in destroy()
- daily-market-brief service: add frameworkAppend? field to BuildDailyMarketBriefOptions; append to extendedContext before summarize call
- DeductionPanel: append active framework to geoContext in handleSubmit() before RPC call
* feat(frameworks): add FrameworkSelector UI component
- Create FrameworkSelector component with premium/locked states
- Premium: select dropdown with all framework options, change triggers setActiveFrameworkForPanel
- Locked: disabled select + PRO badge, click calls showGatedCta(FREE_TIER)
- InsightsPanel: adds asterisk note (client-generated analysis hint)
- Wire into InsightsPanel, DailyMarketBriefPanel, DeductionPanel (via this.header)
- Wire into CountryDeepDivePanel header right-side (no Panel base, panel=null)
- Add framework-selector CSS to main.css
* fix(frameworks): make new proto fields optional in generated types
* fix(frameworks): extract firstMsg to satisfy strict null checks in tsconfig.api.json
* fix(docs): add blank lines around lists/headings to pass markdownlint
* fix(frameworks): add required proto string fields to call sites after make generate
* chore(review): add code review todos 041-057 for PR #2380
7 review agents (TypeScript, Security, Architecture, Performance, Simplicity, Agent-Native, Learnings) identified 17 findings across 5 P1, 8 P2, and 4 P3 categories.
|
||
|
|
e7ba05553d |
fix(health): disease outbreaks CDC/Outbreak feeds, VPD tracker seed, BOOTSTRAP_KEYS gold standard (#2378)
* feat(panels): Disease Outbreaks, Shipping Stress, Social Velocity, nuclear test site monitoring
- Add HealthService proto with ListDiseaseOutbreaks RPC (WHO + ProMED RSS)
- Add GetShippingStress RPC to SupplyChainService (Yahoo Finance carrier ETFs)
- Add GetSocialVelocity RPC to IntelligenceService (Reddit r/worldnews + r/geopolitics)
- Enrich earthquake seed with Haversine nuclear test-site proximity scoring
- Add 5 nuclear test sites to NUCLEAR_FACILITIES (Punggye-ri, Lop Nur, Novaya Zemlya, Nevada NTS, Semipalatinsk)
- Add shipping stress + social velocity seed loops to ais-relay.cjs
- Add seed-disease-outbreaks.mjs Railway cron script
- Wire all new RPCs: edge functions, handlers, gateway cache tiers, health.js STANDALONE_KEYS/SEED_META
* fix(relay): apply gold standard retry/TTL-extend pattern to shipping-stress and social-velocity seeders
* fix(review): address all PR #2375 review findings
- health.js: shippingStress maxStaleMin 30→45 (3x interval), socialVelocity 20→30 (3x interval)
- health.js: remove shippingStress/diseaseOutbreaks/socialVelocity from ON_DEMAND_KEYS (relay/cron seeds, not on-demand)
- cache-keys.ts: add shippingStress, diseaseOutbreaks, socialVelocity to BOOTSTRAP_CACHE_KEYS
- ais-relay.cjs: stressScore formula 50→40 (neutral market = moderate, not elevated)
- ais-relay.cjs: fetchedAt Date.now() (consistent with other seeders)
- ais-relay.cjs: deduplicate cross-subreddit article URLs in social velocity loop
- seed-disease-outbreaks.mjs: WHO URL → specific DON RSS endpoint (not dead general news feed)
- seed-disease-outbreaks.mjs: validate() requires outbreaks.length >= 1 (reject empty array)
- seed-disease-outbreaks.mjs: stable id using hash(link) not array index
- seed-disease-outbreaks.mjs: RSS regexes use [\s\S]*? for CDATA multiline content
- seed-earthquakes.mjs: Lop Nur coordinates corrected (41.39,89.03 not 41.75,88.35)
- seed-earthquakes.mjs: sourceVersion bumped to usgs-4.5-day-nuclear-v1
- earthquake.proto: fields 8-11 marked optional (distinguish not-enriched from enriched=false/0)
- buf generate: regenerate seismology service stubs
* revert(cache-keys): don't add new keys to bootstrap without frontend consumers
* fix(panels): address all P1/P2/P3 review findings for PR #2375
- proto: add INT64_ENCODING_NUMBER annotation + sebuf import to get_shipping_stress.proto (run make generate)
- bootstrap: register shippingStress (fast), socialVelocity (fast), diseaseOutbreaks (slow) in api/bootstrap.js + cache-keys.ts
- relay: update WIDGET_SYSTEM_PROMPT with new bootstrap keys and live RPCs for health/supply-chain/intelligence
- seeder: remove broken ProMED feed URL (promedmail.org/feed/ returns HTML 404); add 500K size guard to fetchRssItems; replace private COUNTRY_CODE_MAP with shared geo-extract.mjs; remove permanently-empty location field; bump sourceVersion to who-don-rss-v2
- handlers: remove dead .catch from all 3 new RPC handlers; fix stressLevel fallback to low; fix fetchedAt fallback to 0
- services: add fetchShippingStress, disease-outbreaks.ts, social-velocity.ts with getHydratedData consumers
* fix(health): move seeded keys to BOOTSTRAP_KEYS, add VPD tracker seed and feeds
- Reclassify diseaseOutbreaks, shippingStress, socialVelocity from STANDALONE_KEYS to BOOTSTRAP_KEYS so health endpoint reports CRIT (not WARN) when their seeds miss a cycle
- Add vpdTrackerRealtime and vpdTrackerHistorical to BOOTSTRAP_KEYS with SEED_META entries (maxStaleMin: 2880 = 2x daily interval)
- Fix seed-disease-outbreaks: add CDC and Outbreak News Today feeds alongside WHO, populate location field from title parsing, fix TTL to 259200s (3x daily interval per gold standard)
- Add seed-vpd-tracker.mjs: scrapes Think Global Health VPD Tracker bundle (1,827 realtime alerts + 25,960 historical WHO records), writes both Redis keys in one runSeed call via extraKeys
- Add review todos 049-059 from PR #2375 code review
|
||
|
|
41d964265e |
fix(map): sanctions layer invisible on WebGL map (#2267)
* fix(health): extend EMPTY_DATA_OK_KEYS check to bootstrap loop
Greptile correctly identified the fix was a no-op: EMPTY_DATA_OK_KEYS was
only consulted in the STANDALONE_KEYS loop, not BOOTSTRAP_KEYS. All three
calendar keys are bootstrap keys, so critCount was still incremented.
Mirror the same seedStale branching already present in the standalone loop
into the bootstrap loop so EMPTY_DATA_OK members get OK/STALE_SEED instead
of EMPTY/EMPTY_DATA/critCount++.
* fix(map): implement sanctions choropleth layer in DeckGLMap
The sanctions layer toggle did nothing on the WebGL map — DeckGLMap had
no rendering logic, only a help text label. Only Map.ts (2D SVG) had the
updateCountryFills() implementation.
Add createSanctionsChoroplethLayer() as a GeoJsonLayer using
SANCTIONED_COUNTRIES_ALPHA2 (new export) since countriesGeoJsonData keys
by ISO3166-1-Alpha-2, not the numeric IDs used by Map.ts/TopoJSON.
Wire it into the layer pipeline after the CII choropleth.
Alpha values match Map.ts: severe=89, high=64, moderate=51 (0-255).
* Revert "fix(health): extend EMPTY_DATA_OK_KEYS check to bootstrap loop"
This reverts commit
|
||
|
|
2939b1f4a1 |
feat(finance-panels): add 7 macro/market panels + Daily Brief context (issues #2245-#2253) (#2258)
* feat(fear-greed): add regime state label, action stance badge, divergence warnings
Closes #2245
* feat(finance-panels): add 7 new finance panels + Daily Brief macro context
Implements issues #2245 (F&G Regime), #2246 (Sector Heatmap bars), #2247 (MacroTiles), #2248 (FSI), #2249 (Yield Curve), #2250 (Earnings Calendar), #2251 (Economic Calendar), #2252 (COT Positioning), #2253 (Daily Brief prompt extension).
New panels:
- MacroTilesPanel: CPI YoY, Unemployment, GDP, Fed Rate tiles via FRED
- FSIPanel: Financial Stress Indicator gauge (HYG/TLT/VIX/HY-spread)
- YieldCurvePanel: SVG yield curve chart with inverted/normal badge
- EarningsCalendarPanel: Finnhub earnings calendar with BMO/AMC/BEAT/MISS
- EconomicCalendarPanel: FOMC/CPI/NFP events with impact badges
- CotPositioningPanel: CFTC disaggregated COT positioning bars
- MarketPanel: adds sorted bar chart view above sector heatmap grid
New RPCs:
- ListEarningsCalendar (market/v1)
- GetCotPositioning (market/v1)
- GetEconomicCalendar (economic/v1)
Seed scripts:
- seed-earnings-calendar.mjs (Finnhub, 14-day window, TTL 12h)
- seed-economic-calendar.mjs (Finnhub, 30-day window, TTL 12h)
- seed-cot.mjs (CFTC disaggregated text file, TTL 7d)
- seed-economy.mjs: adds yield curve tenors DGS1MO/3MO/6MO/1/2/5/30
- seed-fear-greed.mjs: adds FSI computation + sector performance
Daily Brief: extends buildDailyMarketBrief with optional regime, yield curve, and sector context fed to the LLM summarization prompt.
All panels default enabled in FINANCE_PANELS, disabled in FULL_PANELS.
🤖 Generated with Claude Sonnet 4.6 via Claude Code (https://claude.ai/claude-code) + Compound Engineering v2.40.0
Co-Authored-By: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>
* fix(finance-panels): address code review P1/P2 findings
P1 - Security/Correctness:
- EconomicCalendarPanel: add escapeHtml on all 7 Finnhub-sourced fields
- EconomicCalendarPanel: fix panel contract (public fetchData():boolean, remove constructor self-init, add retry callbacks to all showError calls)
- YieldCurvePanel: fix NaN in xPos() when count <= 1 (divide-by-zero)
- seed-earnings-calendar: move Finnhub API key from URL to X-Finnhub-Token header
- seed-economic-calendar: move Finnhub API key from URL to X-Finnhub-Token header
- seed-earnings-calendar: add isMain guard around runSeed() call
- health.js + bootstrap.js: register earningsCalendar, econCalendar, cotPositioning keys
- health.js dataSize(): add earnings + instruments to property name list
P2 - Quality:
- FSIPanel: change !resp.fsiValue → resp.fsiValue <= 0 (rejects valid zero)
- data-loader: fix Promise.allSettled type inference via indexed destructure
- seed-fear-greed: allowlist cnnLabel against known values before writing to Redis
- seed-economic-calendar: remove unused sleep import
- seed-earnings-calendar + econ-calendar: increase TTL 43200 → 129600 (36h = 3x interval)
- YieldCurvePanel: use SERIES_IDS const in RPC call (single source of truth)
* fix(bootstrap): remove on-demand panel keys from bootstrap.js
earningsCalendar, econCalendar, cotPositioning panels fetch via RPC on demand — they have no getHydratedData consumer in src/ and must not be in api/bootstrap.js. They remain in api/health.js BOOTSTRAP_KEYS for staleness monitoring.
* fix(compound-engineering): fix markdown lint error in local settings
* fix(finance-panels): resolve all P3 code-review findings
- 030: MacroTilesPanel: add `deltaFormat?` field to MacroTile interface, define per-tile delta formatters (CPI pp, GDP localeString+B), replace fragile tile.id switch in tileHtml with fmt = deltaFormat ?? format
- 031: FSIPanel: check getHydratedData('fearGreedIndex') at top of fetchData(); extract fsi/vix/hySpread from headerMetrics and render synchronously; fall back to live RPC only when bootstrap absent
- 032: All 6 finance panels: extract lazy module-level client singletons (EconomicServiceClient or MarketServiceClient) so the client is constructed at most once per panel module lifetime, not on every fetchData
- 033: get-fred-series-batch: add BAMLC0A0CM and SOFR to ALLOWED_SERIES (both seeded by seed-economy.mjs but previously unreachable via RPC)
* fix(finance-panels): health.js SEED_META, FSI calibration, seed-cot catch handler
- health.js: add SEED_META entries for earningsCalendar (1440min), econCalendar (1440min), cotPositioning (14400min) — without these, stopped seeds only alarm CRIT:EMPTY after TTL expiry instead of earlier WARN:STALE_SEED
- seed-cot.mjs: replace bare await with .catch() handler consistent with other seeds
- seed-fear-greed.mjs: recalibrate FSI thresholds to match formula output range (Low>=1.5, Moderate>=0.8, Elevated>=0.3; old values >=0.08/0.05/0.03 were calibrated for [0,0.15] but formula yields ~1-2 in normal conditions)
- FSIPanel.ts: fix gauge fillPct range to [0, 2.5] matching recalibrated thresholds
- todos: fix MD022/MD032 markdown lint errors in P3 review files
---------
Co-authored-by: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>
|
||
|
|
01f6057389 |
feat(simulation): MiroFish Phase 2 — theater-limited simulation runner (#2220)
* feat(simulation): MiroFish Phase 2 — theater-limited simulation runner
Adds the simulation execution layer that consumes simulation-package.json and produces simulation-outcome.json for maritime chokepoint + energy/logistics theaters, closing the WorldMonitor → MiroFish handoff loop.
Changes:
- scripts/seed-forecasts.mjs: 2-round LLM simulation runner (prompt builders, JSON extractor, runTheaterSimulation, writeSimulationOutcome, task queue with NX dedup lock, runSimulationWorker poll loop)
- scripts/process-simulation-tasks.mjs: standalone worker entry point
- proto: GetSimulationOutcome RPC + make generate
- server/worldmonitor/forecast/v1/get-simulation-outcome.ts: RPC handler
- server/gateway.ts: slow tier for get-simulation-outcome
- api/health.js: simulationOutcomeLatest in STANDALONE + ON_DEMAND keys
- tests: 14 new tests for simulation runner functions
* fix(simulation): address P1/P2 code review findings from PR #2220
Security (P1 #018):
- sanitizeForPrompt() applied to all entity/seed fields interpolated into Round 1 prompt (entityId, class, stance, seedId, type, timing)
- sanitizeForPrompt() applied to actorId and entityIds in Round 2 prompt
- sanitizeForPrompt() + length caps applied to all LLM array fields written to R2 (dominantReactions, stabilizers, invalidators, keyActors, timingMarkers)
Validation (P1 #019):
- Added validateRunId() regex guard
- Applied in enqueueSimulationTask() and processNextSimulationTask() loop
Type safety (P1 #020):
- Added isOutcomePointer() and isPackagePointer() type guards in TS handlers
- Replaced unsafe as-casts with runtime-validated guards in both handlers
Correctness (P2 #022):
- Log warning when pkgPointer.runId does not match task runId
Architecture (P2 #024):
- isMaritimeChokeEnergyCandidate() accepts both flat and nested topBucketId
- Call site simplified to pass theater directly
Performance (P2 #025):
- SIMULATION_ROUND1_MAX_TOKENS raised 1800 to 2200
- Added max 3 initialReactions instruction to Round 1 prompt
Maintainability (P2 #026):
- Simulation pointer keys exported from server/_shared/cache-keys.ts
- Both TS handlers import from shared location
Documentation (P2 #027):
- Strengthened runId no-op description in proto and OpenAPI spec
* fix(todos): add blank lines around lists in markdown todo files
* style(api): reformat openapi yaml to match linter output
* test(simulation): add flat-shape filter test + getSimulationOutcome handler coverage
Two tests identified as missing during PR #2220 review:
1. isMaritimeChokeEnergyCandidate flat-shape tests — covers the || candidate.topBucketId normalization added in the P1/P2 review pass. The existing tests only used the nested marketContext.topBucketId shape; this adds the flat root-field shape that arrives from the simulation-package.json JSON (selectedTheaters entries have topBucketId at root).
2. getSimulationOutcome handler structural tests — verifies the isOutcomePointer guard, found:false NOT_FOUND return, found:true success path, note population on runId mismatch, and redis_unavailable error string. Follows the readSrc static-analysis pattern used elsewhere in server-handlers.test.mjs (handler imports Redis so full integration test would require a test Redis instance).
|
||
|
|
092efd4fe9 |
fix(panels): always fire background RPC refresh after bootstrap render (#2208)
* fix(panels): always fire background RPC refresh after bootstrap render
Bootstrap hydration (getHydratedData) is one-shot — once rendered from it, panels never refresh and can show stale or partial data indefinitely.
Affected panels: MacroSignals, ETFFlows, Stablecoins, FuelPrices, GulfEconomies, GroceryBasket, BigMac.
Pattern: render from bootstrap immediately for fast first paint, then fire a background RPC call that silently updates the panel with live data. Errors during background refresh are suppressed when bootstrap data is already visible (no error flash over valid data).
* fix(panels): guard background RPC refresh against empty response overwriting bootstrap
Empty RPC responses (200 + empty array) no longer overwrite valid bootstrap data with error/unavailable state across all 7 affected panels:
- ETFFlowsPanel, StablecoinPanel: wrap this.data assignment in `if (fresh.xxx?.length || !this.data)` guard
- FuelPricesPanel, GulfEconomiesPanel, GroceryBasketPanel, BigMacPanel: add `!data.xxx?.length` check in background .then() before calling render
- MacroSignalsPanel: return false early when error suppressed to skip redundant renderPanel() call
* fix(hormuz): fix noUncheckedIndexedAccess TypeScript errors
* fix(todos): add blank lines around headings (markdownlint MD022)
* fix(hormuz): add missing hormuz-tracker service + fix implicit any in HormuzPanel
* revert: remove HormuzPanel.ts from this branch (belongs in PR #2210)
|
||
|
|
226cebf9bc |
feat(deep-forecast): Phase 2+3 scoring recalibration + autoresearch prompt self-improvement (#2178)
* fix(deep-forecast): lower acceptance threshold 0.60→0.50 to match real score distribution
computeDeepPathAcceptanceScore formula: pathScore×0.55 + quality×0.20 + coherence×0.15
With pathScore≈0.65, quality≈0.30, coherence≈0.55:
0.358 + 0.060 + 0.083 = 0.50
The 0.60 threshold was calibrated before understanding that reportableQualityScore
is constrained by world-state simulation geometry (not hypothesis quality), and
coherence loses 0.15 for generic candidates without routeFacilityKey. The threshold
was structurally unreachable with typical expanded paths.
Verified end-to-end: deep worker now returns [DeepForecast] completed.
Also updates T6 gateDetails assertion and renames the rejection-floor test to
correctly describe the new behavior (strong inputs should be accepted).
111/111 tests pass.
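
The threshold arithmetic in this commit can be checked with a small sketch. It assumes only the three weighted terms quoted in the message; `acceptanceScoreSketch` is a hypothetical name, and the real computeDeepPathAcceptanceScore may carry further terms, since the quoted weights sum to 0.90.

```javascript
// Hypothetical re-creation of the three weighted terms quoted in the commit message.
// The real computeDeepPathAcceptanceScore may include other terms; only these
// three weights appear in the text.
function acceptanceScoreSketch({ pathScore, quality, coherence }) {
  return pathScore * 0.55 + quality * 0.20 + coherence * 0.15;
}

// "Typical" inputs taken from the commit message.
const typicalInputs = { pathScore: 0.65, quality: 0.30, coherence: 0.55 };
const typicalScore = acceptanceScoreSketch(typicalInputs);

// Compare with a small tolerance because the typical score sits right on the
// new 0.50 boundary and floating-point rounding could push it either way.
const EPSILON = 1e-9;
const passesOldThreshold = typicalScore + EPSILON >= 0.60;
const passesNewThreshold = typicalScore + EPSILON >= 0.50;

console.log(typicalScore.toFixed(3), passesOldThreshold, passesNewThreshold);
```

With these inputs the sketch lands on 0.50, which clears the new threshold but not the old 0.60 one.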
* feat(deep-forecast): autoresearch prompt self-improvement loop + T9/T10 tests
- Add scoreImpactExpansionQuality() locked scorer: commodity rate (35%),
variable diversity (35%), chain coverage (20%), mapped rate (10%)
- Add runImpactExpansionPromptRefinement(): rate-limited LLM critic loop
(30min cooldown) that reads learned section from Redis, scores current
run, generates critique if composite < 0.62, tests on same candidates,
commits to forecast:prompt:impact-expansion:learned if score improves
- buildImpactExpansionSystemPrompt() now accepts learnedSection param,
appends it after core rules with separator so model sees prior examples
- buildImpactExpansionCandidateHash() includes learnedFingerprint to
bust cache when learned section changes
- processDeepForecastTask reads learnedSection from Redis before LLM
call, runs refinement after both completed and no_material_change paths
- Export scoreImpactExpansionQuality + runImpactExpansionPromptRefinement
- T9: high commodity rate + chain coverage → composite > 0.70
- T10: no commodity + no chain coverage → composite < 0.40
- 113/113 tests pass
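
As a rough illustration of the weighted scorer described in this commit, the sketch below combines the four listed weights into one composite. The metric names, the clamping to [0, 1], and the sample run values are assumptions made for the example; the real scoreImpactExpansionQuality is not shown in the log.

```javascript
// Weights as listed in the commit message; the property names are illustrative.
const INITIAL_SCORER_WEIGHTS = {
  commodityRate: 0.35,
  variableDiversity: 0.35,
  chainCoverage: 0.20,
  mappedRate: 0.10,
};

// Weighted sum of metrics, each clamped to [0, 1] (the clamp is an assumption).
function compositeSketch(metrics) {
  let total = 0;
  for (const [name, weight] of Object.entries(INITIAL_SCORER_WEIGHTS)) {
    const value = metrics[name] ?? 0;
    total += weight * Math.min(1, Math.max(0, value));
  }
  return total;
}

// Critique threshold quoted in this commit.
const INITIAL_CRITIQUE_THRESHOLD = 0.62;

// Hypothetical run: strong commodity and chain metrics, weak diversity.
const sampleRun = { commodityRate: 1.0, variableDiversity: 0.5, chainCoverage: 1.0, mappedRate: 0.5 };
const sampleComposite = compositeSketch(sampleRun);
const critiqueFires = sampleComposite < INITIAL_CRITIQUE_THRESHOLD;

console.log(sampleComposite.toFixed(3), critiqueFires);
```

Under these assumed inputs the composite is 0.775, so no critique is generated at a 0.62 threshold even though diversity is only 0.50.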
* fix(deep-forecast): raise autoresearch threshold 0.62→0.80 + fix JSON parse
- Threshold 0.62 was too low: commodity=1.00 + chain=1.00 compensated for
diversity=0.50 (all same chains), keeping composite at 0.775 → no critique
- Raise to 0.80 so diversity<0.70 triggers critique even with good commodity/chain
- Fix JSON parser to extract first {…} block (handles Gemini code-fence wrapping)
- Add per-hypothesis log in refinement breakdown for observability
- Add refinementQualityThreshold to gateDetails for self-documenting artifacts
- Verified: critique fires on diversity=0.50 run, committed Hormuz/Baltic/Suez
region-specific chain examples (score 0.592→0.650)
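
The "extract first {…} block" fix can be sketched as a brace-matching scan. This is one possible approach, not necessarily the one used in the repo; `extractFirstJsonObject` is a hypothetical helper name.

```javascript
// Return the first balanced {...} block in `text`, parsed as JSON, or null.
// Braces inside JSON string literals are ignored while matching.
function extractFirstJsonObject(text) {
  const start = text.indexOf('{');
  if (start === -1) return null;
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === '{') depth++;
    else if (ch === '}') {
      depth--;
      if (depth === 0) {
        try {
          return JSON.parse(text.slice(start, i + 1));
        } catch {
          return null; // balanced braces, but not valid JSON
        }
      }
    }
  }
  return null; // never balanced
}

// Simulate a model response wrapped in a Markdown code fence.
const fence = '`'.repeat(3);
const fencedResponse =
  fence + 'json\n{"diagnosis": "same chains", "nested": {"note": "a } inside a string"}}\n' + fence;
const parsedCritique = extractFirstJsonObject(fencedResponse);

console.log(parsedCritique);
```

Scanning for balanced braces tolerates any wrapper text before or after the object, which is why it copes with code-fence wrapping without special-casing the fence itself.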
* feat(deep-forecast): per-candidate parallel LLM calls replace batch expansion
Previously: all candidates → one batch LLM call → LLM averages context →
identical route_disruption → inflation_pass_through chains for all candidates.
Now: each candidate → its own focused LLM call (parallel Promise.all) →
LLM reasons about specific stateKind/region/routeFacility for that candidate.
Results (3 candidates, 3 parallel calls):
- composite: 0.592 → 0.831 (+0.24)
- commodity: 0.17 → 1.00 (all mapped have specific commodity)
- diversity: 0.50 → 0.83 (energy_export_stress, importer_balance_stress
appearing alongside route_disruption — genuinely different chains)
- Baseline updated: 0.831 (above 0.80 threshold → no critique needed)
Also threads learnedSection through extractSingleImpactExpansionCandidate
so the learned examples from autoresearch apply to each focused call.
Per-candidate cache keys (already existed) now serve as primary cache.
* fix(tests): update recovery test for per-candidate LLM call flow
- Change stage mock from impact_expansion → impact_expansion_single
(batch primary path removed, per-candidate is now primary)
- Assert parseMode === per_candidate instead of parseStage /^recovered_/
(recovered_ prefix was only set by old batch_repair path)
- 2257/2257 tests pass
* fix(deep-forecast): add Red Sea, Persian Gulf, South China Sea to chokepoint map
Candidate packets had routeFacilityKey=none for Red Sea / Persian Gulf /
Baltic Sea signals because prediction titles say "Red Sea maritime disruption"
not "Bab el-Mandeb" or "Strait of Hormuz". CHOKEPOINT_MARKET_REGIONS only
had sub-facility names (Bab el-Mandeb, Suez Canal) as keys, not the sea
regions themselves.
Fix: add Red Sea, Persian Gulf, Arabian Sea, Black Sea, South China Sea,
Mediterranean Sea as direct keys so region-level candidate titles resolve.
Result: LLM user prompt now shows routeFacilityKey=Red Sea / Persian Gulf /
Baltic Sea per candidate — giving each focused call the geographic context
needed to generate route-specific chains.
- Autoresearch baseline updated 0.932→0.965 on this run
- T8 extended with Red Sea, Persian Gulf, South China Sea assertions
- 2257/2257 tests pass
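
A minimal sketch of why region-level keys matter: if the facility key is resolved by matching known names against the prediction title, a title that only says "Red Sea" resolves to nothing until "Red Sea" itself is a key. The key list and the substring matching below are assumptions for illustration; the actual contents and lookup logic of CHOKEPOINT_MARKET_REGIONS are not shown in the log.

```javascript
// Illustrative subset of keys only. The first three stand in for the older
// sub-facility entries, the last three for the region-level keys added by the fix.
const CHOKEPOINT_KEYS_SKETCH = [
  'Bab el-Mandeb',
  'Suez Canal',
  'Strait of Hormuz',
  'Red Sea',
  'Persian Gulf',
  'South China Sea',
];

// Hypothetical resolver: first key whose name occurs in the title, else 'none'.
function resolveRouteFacilityKeySketch(title, keys = CHOKEPOINT_KEYS_SKETCH) {
  const lower = title.toLowerCase();
  const match = keys.find((key) => lower.includes(key.toLowerCase()));
  return match ?? 'none';
}

const subFacilityOnly = CHOKEPOINT_KEYS_SKETCH.slice(0, 3);
const beforeFix = resolveRouteFacilityKeySketch('Red Sea maritime disruption', subFacilityOnly);
const afterFix = resolveRouteFacilityKeySketch('Red Sea maritime disruption');

console.log(beforeFix, afterFix);
```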
* feat(deep-forecast): free-form hypothesis schema + remove registry constraint
- Bump IMPACT_EXPANSION_REGISTRY_VERSION to v4
- Add hypothesisKey, description, geography, affectedAssets, marketImpact, causalLink fields to normalizeImpactHypothesisDraft (keep legacy fields for backward compat)
- Rewrite buildImpactExpansionSystemPrompt: remove IMPACT_VARIABLE_REGISTRY constraint table, use free-form ImpactHypothesis schema with geographic/commodity specificity rules
- Rewrite evaluateImpactHypothesisRejection: use effective key (hypothesisKey || variableKey) for dedup; legacy registry check only for old cached responses without hypothesisKey
- Update validateImpactHypotheses scoring: add geographyScore, commodityScore, causalLinkScore, assetScore terms; channelCoherence/bucketCoherence only apply to legacy responses
- Update parent-must-be-mapped invariant to use hypothesisKey || variableKey as effective key
- Update mapImpactHypothesesToWorldSignals: use effective key for dedup and sourceKey; prefer description/geography over legacy fields
- Update buildImpactPathsForCandidate: match on hypothesisKey || variableKey for parent lookup
- Update buildImpactPathId: use hypothesisKey || variableKey for hash inputs
- Rewrite scoreImpactExpansionQuality: add geographyRate and assetRate metrics; update composite weights
- Update buildImpactPromptCritiqueSystemPrompt/UserPrompt: use hypothesisKey-based chain format in examples
- Add new fields to buildImpactExpansionBundleFromPaths push calls
- Update T7 test assertion: MUST be the exact hypothesisKey instead of variableKey string
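
Several bullets above rely on the same "effective key" rule, hypothesisKey || variableKey. A minimal sketch of dedup under that rule follows; the helper names and the sample key 'red_sea_freight_spike' are hypothetical.

```javascript
// Effective key as described in the commit: prefer the free-form hypothesisKey,
// fall back to the legacy variableKey for older cached responses.
function effectiveKeySketch(hypothesis) {
  return hypothesis.hypothesisKey || hypothesis.variableKey;
}

// Keep the first hypothesis seen for each effective key; drop keyless entries.
function dedupeByEffectiveKey(hypotheses) {
  const seen = new Set();
  const kept = [];
  for (const hypothesis of hypotheses) {
    const key = effectiveKeySketch(hypothesis);
    if (!key || seen.has(key)) continue;
    seen.add(key);
    kept.push(hypothesis);
  }
  return kept;
}

const mixedHypotheses = [
  { hypothesisKey: 'red_sea_freight_spike' },                   // new free-form shape
  { variableKey: 'route_disruption' },                          // legacy shape
  { hypothesisKey: 'red_sea_freight_spike', description: 'x' }, // duplicate of the first
  { hypothesisKey: '', variableKey: 'route_disruption' },       // empty key falls back, duplicate
];
const dedupedHypotheses = dedupeByEffectiveKey(mixedHypotheses);

console.log(dedupedHypotheses.map(effectiveKeySketch));
```

Using `||` rather than `??` means an empty-string hypothesisKey also falls back to the legacy key, which matches the expression quoted in the commit.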
* fix(deep-forecast): update breakdown log to show free-form hypothesis fields
* feat(deep-forecast): add commodityDiversity metric to autoresearch scorer
- commodityDiversity = unique commodities / nCandidates (weight 0.35)
Penalizes runs where all candidates default to same commodity.
3 candidates all crude_oil → diversity=0.33 → composite ~0.76 → critique fires.
- Rebalanced composite weights: comDiversity 0.35, geo 0.20, keyDiversity 0.15, chain 0.10, commodityRate 0.10, asset 0.05, mappedRate 0.05
- Breakdown log now shows comDiversity + geo + keyDiversity
- Critique prompt updated: commodity_monoculture failure mode, diagnosis targets commodity homogeneity
- T9: added commodityDiversity=1.0 assertion (2 unique commodities across 2 candidates)
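
The commodityDiversity arithmetic can be reproduced as below. The weights come from the commit message; treating every other metric as a perfect 1.0, and capping the ratio at 1, are assumptions chosen to show where the quoted "composite ~0.76" could come from.

```javascript
// unique commodities / nCandidates, capped at 1 (the cap is an assumption).
function commodityDiversitySketch(commodities, nCandidates) {
  if (nCandidates <= 0) return 0;
  const unique = new Set(commodities.filter(Boolean));
  return Math.min(1, unique.size / nCandidates);
}

// Rebalanced weights as listed in the commit message.
const REBALANCED_WEIGHTS = {
  comDiversity: 0.35,
  geo: 0.20,
  keyDiversity: 0.15,
  chain: 0.10,
  commodityRate: 0.10,
  asset: 0.05,
  mappedRate: 0.05,
};

function rebalancedComposite(metrics) {
  return Object.entries(REBALANCED_WEIGHTS)
    .reduce((sum, [name, weight]) => sum + weight * (metrics[name] ?? 0), 0);
}

// Three candidates that all default to crude_oil.
const monocultureDiversity = commodityDiversitySketch(['crude_oil', 'crude_oil', 'crude_oil'], 3);

// Assumption: every other metric is a perfect 1.0.
const monocultureComposite = rebalancedComposite({
  comDiversity: monocultureDiversity,
  geo: 1, keyDiversity: 1, chain: 1, commodityRate: 1, asset: 1, mappedRate: 1,
});
const monocultureCritiqueFires = monocultureComposite < 0.80;

console.log(monocultureDiversity.toFixed(2), monocultureComposite.toFixed(3), monocultureCritiqueFires);
```

Even with every other metric perfect, a single shared commodity keeps the composite below the 0.80 threshold, so the critique fires.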
* refactor(deep-forecast): replace commodityDiversity with directCommodityDiversity + directGeoDiversity + candidateSpreadScore
Problem: measuring diversity on all mapped hypotheses misses the case where
one candidate generates 10 implications while others generate 0, or where
all candidates converge on the same commodity due to dominating signals.
Fix: score at the DIRECT hypothesis level (root causes only) and add
a candidate-spread metric:
- directCommodityDiversity: unique commodities among direct hypotheses /
nCandidates. Measures breadth at the root-cause level. 3 candidates all
crude_oil → 0.33 → composite ~0.77 → critique fires.
- directGeoDiversity: unique primary geographies among direct hypotheses /
nCandidates. First segment of compound geography strings (e.g.
'Red Sea, Suez Canal' → 'red sea') to avoid double-counting.
- candidateSpreadScore: normalized inverse-HHI. 1.0 = perfectly even
distribution across candidates. One candidate with 10 implications and
others with 0 → scores near 0 → critique fires.
Weight rationale: comDiversity 0.35, geoDiversity 0.20, spread 0.15,
chain 0.15, comRate 0.08, assetRate 0.04, mappedRate 0.03.
Verified: Run 2 Baltic/Hormuz/Brazil → freight/crude_oil/USD spread=0.98 ✓
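
Two of the metrics above lend themselves to a short sketch: taking the first segment of a compound geography string, and a normalized inverse-HHI spread score. The normalization used here, rescaling 1/HHI from [1, n] to [0, 1], is one common choice and an assumption; the commit does not give the exact formula.

```javascript
// First segment of a compound geography string, lower-cased,
// e.g. 'Red Sea, Suez Canal' -> 'red sea'.
function primaryGeographySketch(geography) {
  return geography.split(',')[0].trim().toLowerCase();
}

// Normalized inverse Herfindahl-Hirschman index over per-candidate counts.
// HHI = sum of squared shares; 1/HHI is the "effective number" of candidates.
// Assumed normalization: (1/HHI - 1) / (n - 1), so even spread -> 1, one-sided -> 0.
function candidateSpreadSketch(counts) {
  const n = counts.length;
  const total = counts.reduce((sum, c) => sum + c, 0);
  if (n <= 1 || total === 0) return 0;
  const hhi = counts.reduce((sum, c) => sum + (c / total) ** 2, 0);
  return (1 / hhi - 1) / (n - 1);
}

const evenSpread = candidateSpreadSketch([3, 3, 3]);
const lopsidedSpread = candidateSpreadSketch([10, 0, 0]);
const primary = primaryGeographySketch('Red Sea, Suez Canal');

console.log(evenSpread.toFixed(2), lopsidedSpread.toFixed(2), primary);
```

Under this normalization a perfectly even distribution scores 1.0 and a single dominant candidate scores 0, which matches the behaviour described in the commit.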
* feat(deep-forecast): add convergence object to R2 debug artifact
Surface autoresearch loop outcome per run: converged (bool), finalComposite,
critiqueIterations (0 or 1), refinementCommitted, and perCandidateMappedCount
(candidateStateId → count). After 5+ runs the artifact alone answers whether
the pipeline is improving.
Architectural changes:
- runImpactExpansionPromptRefinement now returns { iterationCount, committed }
at all exit paths instead of undefined
- Call hoisted before writeForecastTraceArtifacts so the result flows into the
debug payload via dataForWrite.refinementResult
- buildImpactExpansionDebugPayload assembles convergence from validation +
refinementResult; exported for direct testing
- Fix: stale diversityScore reference replaced with directCommodityDiversity
Tests: T-conv-1 (converged=true), T-conv-2 (converged=false + iterations=1),
T-conv-3 (perCandidateMappedCount grouping) — 116/116 pass
* fix(deep-forecast): address P1+P2 review issues from convergence observability PR
P1-A: sanitize LLM-returned proposed_addition before Redis write (prompt injection
guard via sanitizeProposedLlmAddition — strips directive-phrase lines)
P1-B: restore fire-and-forget for runImpactExpansionPromptRefinement; compute
critiqueIterations from quality score (predicted) instead of awaiting result,
eliminating 15-30s critical-path latency on poor-quality runs
P1-C: processDeepForecastTask now returns convergence object to callers; add
convergence_quality_met warn check to evaluateForecastRunArtifacts
P1-D: cap concurrent LLM calls in extractImpactExpansionBundle to 3 (manual
batching — no p-limit) to respect provider rate limits
P2-1: hash full learnedSection in buildImpactExpansionCandidateHash (was sliced
to 80 chars, causing cache collisions on long learned sections)
P2-2: add exitReason field to all runImpactExpansionPromptRefinement return paths
P2-3: sanitizeForPrompt strips directive injection phrases; new
sanitizeProposedLlmAddition applies line-level filtering before Redis write
P2-4: add comment explaining intentional bidirectional affectedAssets/assetsOrSectors
coalescing in normalizeImpactHypothesisDraft
P2-5: extract makeConvTestData helper in T-conv tests; remove refinementCommitted
assertions (field removed from convergence shape)
P2-6: convergence_quality_met check added to evaluateForecastRunArtifacts (warn)
🤖 Generated with Claude Sonnet 4.6 via Claude Code (https://claude.ai/claude-code) + Compound Engineering v2.49.0
Co-Authored-By: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>
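
The "manual batching" concurrency cap from P1-D might look roughly like the sketch below: items are processed in slices of three, and each slice is awaited before the next starts. `mapInBatches` and `fakeLlmCall` are hypothetical names; the real extractImpactExpansionBundle is not shown in the log.

```javascript
// Run `worker` over `items` with at most `batchSize` calls in flight at once.
// Each batch is awaited in full before the next begins (simple, no dependency).
async function mapInBatches(items, batchSize, worker) {
  const results = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    const batchResults = await Promise.all(batch.map((item) => worker(item)));
    results.push(...batchResults);
  }
  return results;
}

// Stand-in for a per-candidate LLM call that records peak concurrency.
let inFlight = 0;
let peakInFlight = 0;
async function fakeLlmCall(candidateId) {
  inFlight += 1;
  peakInFlight = Math.max(peakInFlight, inFlight);
  await new Promise((resolve) => setTimeout(resolve, 5));
  inFlight -= 1;
  return `expanded:${candidateId}`;
}

mapInBatches([1, 2, 3, 4, 5, 6, 7], 3, fakeLlmCall).then((results) => {
  console.log(results.length, 'results, peak concurrency', peakInFlight);
});
```

Batching is simpler than a sliding-window limiter such as p-limit, at the cost of idle slots while the slowest call in each batch finishes.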
* fix(docs): add blank lines around lists in plan (MD032)
* fix(deep-forecast): address P1+P2 reviewer issues in convergence observability
P1-1: mapImpactHypothesesToWorldSignals used free-form marketImpact values
(price_spike, shortage, credit_stress, risk_off) verbatim as signal channel
types, producing unknown types that buildMarketTransmissionGraph cannot
consume. Add IMPACT_SIGNAL_CHANNELS set + resolveImpactChannel() to map
free-form strings to the nearest valid channel before signal materialization.
P1-2: sanitizeForPrompt had directive-phrase stripping added that was too
broad for a function called on headlines, evidence tables, case files, and
geopolitical summaries. Reverted to original safe sanitizer (newline/control
char removal only). Directive stripping remains in sanitizeProposedLlmAddition
where it is scoped to Redis-bound LLM-generated additions only.
P2: Renamed convergence.critiqueIterations to predictedCritiqueIterations to
make clear this is a prediction from the quality score, not a measured count
from actual refinement behavior (refinement is fire-and-forget after artifact
write). Updated T-conv-1/2 test assertions to match.
* feat(deep-forecast): inject live news headlines into evidence table
Wire inputs.newsInsights / inputs.newsDigest through the candidate
selection pipeline so buildImpactExpansionEvidenceTable receives up to
3 commodity-relevant live headlines as 'live_news' evidence entries.
Changes:
- IMPACT_COMMODITY_LEXICON: extend fertilizer pattern (fertiliser,
nitrogen, phosphate, npk); add food_grains and shipping_freight entries
- filterNewsHeadlinesByState: new pure helper that scores headlines by
alert status, LNG/energy/route/sanctions signal match, lexicon commodity
match, and source corroboration count (min score 2 to include)
- buildImpactExpansionEvidenceTable: add newsItems param, inject
live_news entries, raise cap 8→11
- buildImpactExpansionCandidate: add newsInsights/newsDigest params,
compute newsItems via filterNewsHeadlinesByState
- selectImpactExpansionCandidates: add newsInsights/newsDigest to options
- Call site: pass inputs.newsInsights/newsDigest at seed time
- Export filterNewsHeadlinesByState, buildImpactExpansionEvidenceTable
- 9 new tests (T-news-1 through T-lex-3): all pass, 125 total pass
🤖 Generated with Claude Sonnet 4.6 (200K context) via Claude Code (https://claude.ai/claude-code) + Compound Engineering v2.49.0
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(deep-forecast): remove hardcoded LNG boost from filterNewsHeadlinesByState
The LNG+2 score was commodity-specific and inconsistent with the
intent: headline scoring should be generic, not biased toward any
named commodity. The function already handles the state's detected
commodity dynamically via lexEntry.pattern (IMPACT_COMMODITY_LEXICON).
LNG headlines still score via CRITICAL_NEWS_ENERGY_RE (+1) and
CRITICAL_NEWS_ROUTE_RE (+1) when relevant to the state's region.
All 125 tests pass.
* fix(deep-forecast): address all P1+P2 code review findings from PR #2178
P1 fixes (block-merge):
- Lower third_order mapped floor 0.74→0.70 (max achievable via 0.72 multiplier was 0.72)
- Guard runImpactExpansionPromptRefinement against empty validation (no_mapped exit)
- Replace block-list sanitizeProposedLlmAddition with pattern-based allowlist (HTML/JS/directive takeover)
- Fix TOCTOU on PROMPT_LAST_ATTEMPT_KEY: claim slot before quality check, not after LLM call
P2 fixes:
- Fix learned section overflow: use slice(-MAX) to preserve tail, not discard all prior content
- Add safe_haven_bid and global_crude_spread_stress branches to resolveImpactChannel
- quality_met path now sets rate-limit key (prevents 3 Redis GETs per good run)
- Hoist extractNewsClusterItems outside stateUnit map in selectImpactExpansionCandidates
- Export PROMPT_LEARNED_KEY, PROMPT_BASELINE_KEY, PROMPT_LAST_ATTEMPT_KEY + read/clear helpers
All 125 tests pass.
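
The TOCTOU fix, claiming the rate-limit slot before the quality check rather than after the LLM call, can be sketched with an in-memory stand-in for an atomic set-if-absent. The store class, the key string, and the 'rate_limited' and 'run_critique' labels are hypothetical; 'quality_met', the 30min cooldown, and the 0.80 threshold come from the commit log.

```javascript
// In-memory stand-in for an atomic "set if absent with expiry" operation.
class InMemoryStoreSketch {
  constructor() {
    this.entries = new Map();
  }
  // Returns true only for the caller that wins the slot.
  setIfAbsent(key, value, ttlMs, nowMs) {
    const existing = this.entries.get(key);
    if (existing && existing.expiresAt > nowMs) return false;
    this.entries.set(key, { value, expiresAt: nowMs + ttlMs });
    return true;
  }
}

const COOLDOWN_MS = 30 * 60 * 1000; // 30min cooldown, from the commit log
const QUALITY_THRESHOLD = 0.80;     // critique threshold, from the commit log
const LAST_ATTEMPT_KEY_SKETCH = 'prompt:last-attempt'; // hypothetical key string

function tryStartRefinement(store, composite, nowMs) {
  // Claim first: a second worker arriving while the first is still busy
  // sees the slot taken instead of racing past a stale "no attempt yet" read.
  if (!store.setIfAbsent(LAST_ATTEMPT_KEY_SKETCH, String(nowMs), COOLDOWN_MS, nowMs)) {
    return 'rate_limited';
  }
  // The slot stays claimed even on the good-quality path, so good runs
  // also short-circuit later attempts within the cooldown.
  if (composite >= QUALITY_THRESHOLD) return 'quality_met';
  return 'run_critique';
}

const store = new InMemoryStoreSketch();
const firstWorker = tryStartRefinement(store, 0.59, 0);
const secondWorker = tryStartRefinement(store, 0.59, 1000);
const afterCooldown = tryStartRefinement(store, 0.85, COOLDOWN_MS + 1);

console.log(firstWorker, secondWorker, afterCooldown);
```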
* fix(todos): add blank lines around lists/headings in todo files (markdownlint)
* fix(todos): fix markdownlint blanks-around-headings/lists in all todo files
---------
Co-authored-by: Claude Sonnet 4.6 (200K context) <noreply@anthropic.com>
|