eliott/worldmonitor - worldmonitor - lab48

mirror of https://github.com/koala73/worldmonitor.git synced 2026-04-25 17:14:57 +02:00

Author	SHA1	Message	Date
Elie Habib	34dfc9a451	fix(news): ground LLM surfaces on real RSS description end-to-end (#3370)

* feat(news/parser): extract RSS/Atom description for LLM grounding (U1)

Add a description field to ParsedItem, extracted from description/content:encoded (RSS) or summary/content (Atom), picking the longest non-empty candidate after HTML-strip + entity-decode + whitespace-normalize. Clip to 400 chars. Reject a description that is empty, shorter than 40 chars after stripping, or normalize-equal to the headline; downstream consumers fall back to the cleaned headline on '', preserving current behavior for feeds without a description. The CDATA end is anchored to the closing tag so internal ]]> sequences do not truncate the match. Cached rss:feed:v1 rows stay compatible during the 1h TTL bleed because the field is additive.

Part of fix: pipe RSS description end-to-end so LLM surfaces stop hallucinating named actors (docs/plans/2026-04-24-001-...). Covers R1, R7.

* feat(news/story-track): persist description on story:track:v1 HSET (U2)

Append description to the story:track:v1 HSET only when non-empty. Additive: no key version bump. Old rows and rows from feeds without a description return undefined on HGETALL, letting downstream readers fall back to the cleaned headline (R6). Extract buildStoryTrackHsetFields as a pure helper so the inclusion gate is unit-testable without Redis. Update the contract comment in cache-keys.ts so the next reader of the schema sees description as an optional field. Covers R2, R6.

* feat(proto): NewsItem.snippet + SummarizeArticleRequest.bodies (U3)

Add two additive proto fields so the article description can ride to every LLM-adjacent consumer without a breaking change:
- NewsItem.snippet (field 12): RSS/Atom description, HTML-stripped, ≤400 chars, empty when unavailable. Wired on toProtoItem.
- SummarizeArticleRequest.bodies (field 8): optional article bodies paired 1:1 with headlines for prompt grounding. An empty array is today's headline-only behavior.

Regenerated TS client/server stubs and OpenAPI YAML/JSON via sebuf v0.11.1 (PATH=~/go/bin required: Homebrew's protoc-gen-openapiv3 is an older pre-bundle-mode build that collides on duplicate emission). Added pre-emptive bodies:[] placeholders at the two existing SummarizeArticle call sites in src/services/summarization.ts; U6 replaces them with real article bodies once the SummarizeArticle handler reads the field. Covers R3, R5.

* feat(brief/digest): forward RSS description end-to-end through brief envelope (U4)

The digest accumulator reader (seed-digest-notifications.mjs::buildDigest) now plumbs the optional `description` field from each story:track:v1 HGETALL into the digest story object. The brief adapter (brief-compose.mjs::digestStoryToUpstreamTopStory) prefers the real RSS description over the cleaned headline; when the upstream row has no description (old rows in the 48h bleed, feeds that don't carry one), it falls back to the cleaned headline so today's behavior is preserved (R6). This is the upstream half of the description cache path. U5 lands the LLM-side grounding + cache-prefix bump so Gemini actually sees the article body instead of hallucinating a named actor from the headline. Covers R4 (upstream half), R6.

* feat(brief/llm): RSS grounding + sanitisation + 4 cache prefix bumps (U5)

The actual fix for the headline-only named-actor hallucination class: Gemini 2.5 Flash now receives the real article body as grounding context, so it paraphrases what the article says instead of filling in role-label headlines from parametric priors ("Iran's new supreme leader" → "Ali Khamenei" was the 2026-04-24 reproduction; with grounding, it becomes the actor actually named in the article).

Changes:
- buildStoryDescriptionPrompt interpolates a `Context: <body>` line between the metadata block and the "One editorial sentence" instruction when description is non-empty AND not normalise-equal to the headline. Clips to 400 chars as a second belt-and-braces after the U1 parser cap.
  No Context line → identical prompt to pre-fix (R6 preserved).
- sanitizeStoryForPrompt extended to cover `description`. Closes the asymmetry where whyMatters was sanitised and description wasn't: untrusted RSS bodies now flow through the same injection-marker neutraliser before prompt interpolation. generateStoryDescription wraps the story in sanitizeStoryForPrompt before calling the builder, matching generateWhyMatters.
- Four cache prefixes bumped atomically to evict pre-grounding rows:
  scripts/lib/brief-llm.mjs:
    brief:llm:description:v1 → v2 (Railway, description path)
    brief:llm:whymatters:v2 → v3 (Railway, whyMatters fallback)
  api/internal/brief-why-matters.ts:
    brief:llm:whymatters:v6 → v7 (edge, primary)
    brief:llm:whymatters:shadow:v4 → shadow:v5 (edge, shadow)
  hashBriefStory already includes description in the 6-field material (v5 contract), so identity naturally drifts; the prefix bump is the belt-and-braces that guarantees a clean cold-start on the first tick.
- Tests: 8 new + 2 prefix-match updates in tests/brief-llm.test.mjs, covering Context-line injection, empty/dup-of-headline rejection, the 400-char clip, sanitisation of adversarial descriptions, the v2 write, and legacy v1 rows going dark (forced cold-start).
Covers R4 + new sanitisation requirement.

* feat(news/summarize): accept bodies + bump summary cache v5→v6 (U6)

SummarizeArticle now grounds on per-headline article bodies when callers supply them, so the dashboard "News summary" path stops hallucinating across unrelated headlines when the upstream RSS carried context. Three coordinated changes:
1. The SummarizeArticleRequest handler reads req.bodies, sanitises each entry through sanitizeForPrompt (same trust treatment as geoContext, since bodies are untrusted RSS text), clips to 400 chars, and pads to the headlines length so pair-wise identity is stable.
2. buildArticlePrompts accepts optional bodies and interleaves a ` Context: <body>` line under each numbered headline that has a non-empty body.
   Skipped in translate mode (headline[0]-only) and when all bodies are empty, yielding a byte-identical prompt to pre-U6 for every current caller (R6 preserved).
3. summary-cache-key bumps CACHE_VERSION v5→v6 so the pre-grounding rows (produced from headline-only prompts) cold-start cleanly. Extends canonicalizeSummaryInputs + buildSummaryCacheKey with a pair-wise bodies segment `:bd<hash>`; the prefix is `:bd` rather than `:b` to avoid colliding with `:brief:` when pattern-matching keys. Translate mode is headline[0]-only and intentionally does not shift on bodies.
Dedup reorder preserved: the handler re-pairs bodies to the deduplicated top-5 via findIndex, so the layout matches without breaking cache identity.
New tests: 7 on buildArticlePrompts (bodies interleave, partial fill, translate-mode skip, clip, short-array tolerance), 8 on buildSummaryCacheKey (pair-wise sort, cache-bust on body drift, translate skip). Existing summary-cache-key assertions updated v5→v6. Covers R3, R4.

* feat(consumers): surface RSS snippet across dashboard, email, relay, MCP + audit (U7)

Thread the RSS description from the ingestion path (U1-U5) into every user-facing LLM-adjacent surface. Audit the notification producers so RSS-origin and domain-origin events stay on distinct contracts.

Dashboard (proto snippet → client → panel):
- src/types/index.ts: NewsItem.snippet?: string (client-side field).
- src/app/data-loader.ts: the proto→client mapper propagates p.snippet.
- src/components/NewsPanel.ts: renders snippet as a truncated (~200 chars, word-boundary ellipsis) `.item-snippet` line under each headline.
- NewsPanel.currentBodies tracks per-headline bodies paired 1:1 with currentHeadlines; they are passed as options.bodies to generateSummary so the server-side SummarizeArticle LLM grounds on the article body.

Summary plumbing:
- src/services/summarization.ts threads bodies through SummarizeOptions → generateSummary → runApiChain → tryApiProvider; the cache key now includes bodies (via U6's buildSummaryCacheKey signature).

MCP world-brief:
- api/mcp.ts pairs headlines with their RSS snippets and POSTs `bodies` to /api/news/v1/summarize-article, so the MCP tool surface is no longer starved of article context.

Email digest:
- scripts/seed-digest-notifications.mjs: plain-text formatDigest appends a ~200-char truncated snippet line under each story; HTML formatDigestHtml renders a dim-grey description div between title and meta. Both are gated on a non-empty description (R6: empty → today's behavior).

Real-time alerts:
- src/services/breaking-news-alerts.ts: BreakingAlert gains an optional description; checkBatchForBreakingAlerts reads item.snippet; dispatchAlert includes `description` in the /api/notify payload when present.

Notification relay:
- scripts/notification-relay.cjs: formatMessage is gated on NOTIFY_RELAY_INCLUDE_SNIPPET=1 (default off). When on, RSS-origin payloads render a `> <snippet>` context line under the title. When off, or when payload.description is absent, output is byte-identical to pre-U7.

Audit (RSS vs domain):
- tests/notification-relay-payload-audit.test.mjs enforces file-level @notification-source tags on every producer, rejects `description:` in domain-origin payload blocks, and verifies that the relay codepath gates snippet rendering behind the flag.
- Tag added to ais-relay.cjs (domain), seed-aviation.mjs (domain), alert-emitter.mjs (domain), breaking-news-alerts.ts (rss).

Deferred (explicitly flagged in the plan): InsightsPanel + cluster-producer plumbing (bodies default to []; this will unlock gradually once the news:insights:v1 producer also carries primarySnippet). Covers R5, R6.

* docs+test: grounding-path note + bump pinned CACHE_VERSION v5→v6 (U8)

Final verification for the RSS-description-end-to-end fix:
- docs/architecture.mdx: one-paragraph "News Grounding Pipeline" subsection tracing parser → story:track:v1.description → NewsItem.snippet → brief / SummarizeArticle / dashboard / email / relay / MCP, with the empty-description R6 fallback rule called out explicitly.
- tests/summarize-reasoning.test.mjs: Fix-4 static-analysis pin updated to match the v6 bump from U6. Without this, the summary cache bump silently regressed CI's pinned-version assertion.

Final sweep (2026-04-24):
- grep -rn 'brief:llm:description:v1' → only in the U5 legacy-row test simulation (by design: proves the v2 bump forces cold-start).
- grep -rn 'brief:llm:whymatters:v2/v6/shadow:v4' → no live references.
- grep -rn 'summary:v5' → no references.
- CACHE_VERSION = 'v6' in src/utils/summary-cache-key.ts.
- Full tsx --test sweep across all tests/*.test.{mjs,mts}: 6747/6747 pass.
- npm run typecheck + typecheck:api: both clean.
Covers R4, R6, R7.

* fix(rss-description): address /ce:review findings before merge

14 fixes from structured code review across 13 reviewer personas.

Correctness-critical (P1: fixes that prevent R6/U7 contract violations):
- NewsPanel signature covers currentBodies, so view-mode toggles that leave headlines identical but change bodies now invalidate in-flight summaries. Without this, switching renderItems → renderClusters mid-summary let a grounded response arrive under a stale (now-orphaned) cache key.
- summarize-article.ts re-pairs bodies with headlines BEFORE dedup via a single zip-sanitize-filter-dedup pass. Previously bodies[] was indexed by position in the light-sanitized headlines while findIndex looked up the full-sanitized array, so any headline that sanitizeHeadlines emptied mispaired every subsequent body, grounding the LLM on the wrong story.
- The client skips the pre-chain cache lookup when bodies are present, since the client builds keys from RAW bodies while the server sanitizes first. The keys diverge on injection content, so the lookup would silently miss the server's authoritative cache on every call.

Test + audit hardening:
- The legacy v1 eviction test now uses the real hashBriefStory(story()) suffix instead of a literal "somehash", so a bug where the reader still queried the v1 prefix at the real key would actually be caught.
- tests/summary-cache-key.test.mts adds 400-char clip identity coverage so the canonicalizer's clip and any downstream clip can't silently drift.
- tests/news-rss-description-extract.test.mts renames the well-formed CDATA test and adds a new test documenting the malformed-]]> fallback behavior (plain regex captures, article content survives).

Safe_auto cleanups:
- Deleted the dead SNIPPET_PUSH_MAX constant in notification-relay.cjs.
- The BETA-mode Groq warm call now passes bodies, warming the right cache slot.
- seed-digest shares a local normalize-equality helper for the description != headline comparison, matching the parser's contract.
- The pair-wise sort in summary-cache-key tie-breaks on body so duplicate headlines produce a stable order across runs.
- buildSummaryCacheKey gained JSDoc documenting the client/server contract and the bodies parameter semantics.
- The MCP get_world_brief tool description now mentions RSS article-body grounding so calling agents see the current contract.
- _shared.ts: the `opts.bodies![i]!` double-bang is replaced with `?? ''`.
- extractRawTagBody regexes are cached in a module-level Map, mirroring the existing TAG_REGEX_CACHE pattern.

Deferred to follow-up (tracked in the PR description / a separate issue):
- Promote a shared MAX_BODY constant across the 5 clip sites
- Promote a shared truncateForDisplay helper across 4 render sites
- Collapse NewsPanel.{currentHeadlines, currentBodies} → Array<{title, snippet}>
- Promote sanitizeStoryForPrompt to shared/brief-llm-core.js
- Split list-feed-digest.ts parser helpers into a sibling -utils.ts
- Strengthen audit test: forward-sweep + behavioral gate test

Tests: 6749/6749 pass. Typecheck clean on both configs.

* fix(summarization): thread bodies through browser T5 path (Codex #2)

Addresses the second of two Codex-raised findings on PR #3370: the PR threaded bodies through the server-side API provider chain (Ollama → Groq → OpenRouter → /api/news/v1/summarize-article), but the local browser T5 path at tryBrowserT5 was still summarising from headlines alone. In BETA_MODE that ungrounded path runs BEFORE the grounded server providers; in normal mode it remains the last fallback. Whenever T5-small won, the dashboard summary surface regressed to the headline-only path, which is the exact hallucination class this PR exists to eliminate.

Fix: tryBrowserT5 accepts an optional `bodies` parameter and interleaves each body with its paired headline via a `headline — body` separator in the combined text (clipped to 200 chars per body to stay within T5-small's ~512-token context window). All three call sites (BETA warm, BETA cold, normal-mode fallback) now pass the bodies threaded down from generateSummary options.bodies. When bodies is empty/omitted, the combined text is byte-identical to pre-fix (R6 preserved).

On Codex finding #1 (story:track:v1 additive-only HSET keeps a body from an earlier mention of the same normalized title), declining to change.
The current rule ("if this mention has a body, overwrite; otherwise leave the prior body alone") is defensible: a body from mention A is not falsified by mention B being body-less (a wire reprint doesn't invalidate the original source's body). A feed that publishes a corrected headline creates a new normalized-title hash, so no stale body carries forward. The failure window is narrow (a live story evolving while keeping the same title through hours of body-less wire reprints), and the 7-day STORY_TTL is the backstop. Opening a follow-up issue to revisit the semantics if real-world evidence surfaces a stale-grounding case.

* fix(story-track): description always-written to overwrite stale bodies (Codex #1)

Revisiting Codex finding #1 on PR #3370 after re-review. The previous response declined the fix with reasoning; on reflection, that argument over-defended the current behavior.

Problem: buildStoryTrackHsetFields previously wrote `description` only when non-empty. Because story:track:v1 rows are collapsed by normalized-title hash, an earlier mention's body would persist for up to STORY_TTL (7 days) across subsequent body-less mentions of the same story. Consumers reading `track.description` via HGETALL could not distinguish "this mention's body" from "some mention's body from the last week", silently grounding the brief / whyMatters / SummarizeArticle LLMs on text the current mention never supplied. That violates the grounding contract advertised to every downstream surface in this PR.

Fix: HSET `description` unconditionally on every mention: an empty string when the current item has no body, the real body when it does. An empty value overwrites any prior mention's body, so the row is always authoritative for the current cycle. Consumers continue to treat an empty description as "fall back to cleaned headline" (R6 preserved). The 7-day STORY_TTL and normalized-title hash semantics are unchanged.

Trade-off accepted: a valid body from Feed A (NYT) is wiped when Feed B (a body-less AP wire reprint) arrives for the same normalized title, even though Feed A's body is factually correct. Rationale: the alternative, keeping Feed A's body indefinitely, means the user sees Feed A's body attributed (by proximity) to an AP mention at a later timestamp, which is at minimum misleading and at worst carries retracted/corrected details. Honest absence beats unlabeled presence.

Tests: new stale-body overwrite sequence test (T0 body → T1 empty → T2 new body); existing "writes description when non-empty" preserved; existing "omits when empty" inverted to "writes empty, overwriting". The cache-keys.ts contract comment is updated to mark description as always-written rather than optional.	2026-04-24 16:25:14 +04:00
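The U1 extraction rules (longest candidate after HTML-strip, entity-decode and whitespace-normalize; 400-char clip; reject anything empty, under 40 chars, or normalize-equal to the headline) can be sketched as a pure function. This is a hypothetical illustration, not the parser from the PR: `cleanText`, `normalizeForCompare` and `pickDescription` are made-up names, only a handful of named entities are decoded, and the headline comparison uses a simplified ASCII-only normalisation.

```typescript
// Hypothetical sketch of the U1 rules, not the project's parser.
const MAX_DESCRIPTION_CHARS = 400;
const MIN_DESCRIPTION_CHARS = 40;

// Only a few common entities are decoded here (simplification).
const ENTITIES: Record<string, string> = {
  "&amp;": "&",
  "&lt;": "<",
  "&gt;": ">",
  "&quot;": '"',
  "&#39;": "'",
  "&nbsp;": " ",
};

// Unwrap CDATA (greedy, so the match ends at the final "]]>"), strip tags,
// decode entities, then collapse whitespace.
function cleanText(raw: string): string {
  const unwrapped = raw.replace(/^\s*<!\[CDATA\[([\s\S]*)\]\]>\s*$/, "$1");
  const noTags = unwrapped.replace(/<[^>]*>/g, " ");
  const decoded = noTags.replace(/&(?:amp|lt|gt|quot|#39|nbsp);/g, (m) => ENTITIES[m] ?? m);
  return decoded.replace(/\s+/g, " ").trim();
}

// Simplified, ASCII-only comparison key for "normalize-equal to the headline".
function normalizeForCompare(s: string): string {
  return s.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Returns "" when nothing usable exists, so callers fall back to the headline.
function pickDescription(candidates: string[], headline: string): string {
  let best = "";
  for (const raw of candidates) {
    const text = cleanText(raw);
    if (text.length > best.length) best = text;
  }
  if (best.length < MIN_DESCRIPTION_CHARS) return "";
  if (normalizeForCompare(best) === normalizeForCompare(headline)) return "";
  return best.slice(0, MAX_DESCRIPTION_CHARS);
}
```

The empty-string return is the design point: every downstream consumer described above treats '' as "use the cleaned headline", so feeds without a usable body keep today's behavior.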
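The always-write rule from the Codex #1 fix can be modelled without Redis, which is the point of keeping buildStoryTrackHsetFields pure. The sketch below is hypothetical: apart from `description`, the field names (`title`, `link`, `lastSeen`) are invented, and a `Map` stands in for the story:track:v1 hash that HSET would update.

```typescript
interface StoryMention {
  title: string;
  link: string;
  seenAt: number;
  description?: string;
}

// Pure helper in the spirit of buildStoryTrackHsetFields: `description` is
// always present, so an empty string overwrites a body left behind by an
// earlier mention that collapsed onto the same normalized-title row.
function buildStoryTrackHsetFieldsSketch(m: StoryMention): Record<string, string> {
  return {
    title: m.title,
    link: m.link,
    lastSeen: String(m.seenAt),
    description: m.description ?? "",
  };
}

// Stand-in for HSET: overwrite each given field, leave other fields untouched.
function hset(row: Map<string, string>, fields: Record<string, string>): void {
  for (const key of Object.keys(fields)) {
    const value = fields[key];
    if (value !== undefined) row.set(key, value);
  }
}

// Reader side (R6): an empty or missing description falls back to the headline.
function groundingText(row: Map<string, string>, cleanedHeadline: string): string {
  return row.get("description") || cleanedHeadline;
}
```

Replaying T0 body → T1 empty → T2 new body against one `Map` mirrors the sequence the new test describes: after T1 the reader falls back to the headline instead of the stale T0 body.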
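The U6 prompt change (a ` Context: <body>` line under each numbered headline, clipped to 400 chars, skipped in translate mode and when every body is empty) reduces to a small formatting function. This is a hypothetical sketch rather than buildArticlePrompts itself: the function name, numbering format and three-space indent are assumptions, and bodies are assumed to have been sanitized by the caller already.

```typescript
type PromptMode = "brief" | "analysis" | "translate";
const BODY_CLIP_CHARS = 400;

// Hypothetical formatter: numbered headlines, each optionally followed by an
// indented Context line carrying its paired (already sanitized) body.
function formatHeadlineList(headlines: string[], bodies: string[] = [], mode: PromptMode = "brief"): string {
  // Bodies are ignored in translate mode and when none is non-empty, so the
  // output is then exactly the plain headline-only list.
  const useBodies = mode !== "translate" && bodies.some((b) => b.trim() !== "");
  return headlines
    .map((headline, i) => {
      const line = `${i + 1}. ${headline}`;
      // bodies may be shorter than headlines; missing entries count as empty.
      const body = useBodies ? (bodies[i] ?? "").trim().slice(0, BODY_CLIP_CHARS) : "";
      return body ? `${line}\n   Context: ${body}` : line;
    })
    .join("\n");
}
```

Gating on "any body non-empty" is what keeps the prompt byte-identical for every caller that has not started passing bodies.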
Elie Habib	f783bf2d2d	fix(intelligence): analytical frameworks follow-up, P1 security + P2 correctness fixes (#2386)

* fix(intelligence): include framework/systemAppend hash in cache keys (todos 041, 045, 051)
* fix(intelligence): gate framework/systemAppend on server-side PRO check (todo 042)
* fix(skills): exact hostname allowlist + redirect:manual to prevent SSRF (todos 043, 054)
* fix(intelligence): sanitize systemAppend against prompt injection before LLM (todo 044)
* fix(intelligence): use framework field in DeductionPanel, fix InsightsPanel double increment (todos 046, 047)
* fix(intelligence): settings export, hot-path cache, country-brief debounce (todos 048, 049, 050)
* fix(intelligence): i18n, FrameworkSelector note, stripThinkingTags dedup, UUID IDs (todos 052, 055, 056, 057)
- i18n for the Analysis Frameworks settings section (en + fr locales; replace all hardcoded English strings with t() calls)
- FrameworkSelector: replace the panelId==='insights' hardcode with a `note?` option; both InsightsPanel and DailyMarketBriefPanel pass note
- stripThinkingTags: remove the inline duplicate in summarize-article.ts and import from _shared/llm; add a "Strip unterminated" comment so tests can locate the section
- Replace Date.now() IDs for imported frameworks with crypto.randomUUID()
- Change the 'not supported in phase 1' phrasing to 'not supported'
- test: fix the summarize-reasoning Fix 2 suite to read from llm.ts
- test: add premium-check-stub and wire it into the redis-caching country intel brief importPatchedTsModule so the test can resolve the new import
* fix(security): address P1 review findings from PR #2386
- premium-check: require `required: true` from validateApiKey so trusted browser origins (worldmonitor.app, Vercel previews, localhost) are not treated as PRO callers; fixes a free-user bypass of the framework/systemAppend gate
- llm: replace the weak sanitizeSystemAppend with sanitizeForPrompt from llm-sanitize.js; all callLlm callers now get model-delimiter and control-char stripping, not just a phrase blocklist
- get-country-intel-brief: apply sanitizeForPrompt to contextSnapshot before injecting it into the user prompt; fixes unsanitized query-param injection
Closes todos 060, 061, 062 (P1, blocked merge of #2386).
* chore(todos): mark P1 todos 060-062 complete
* fix(agentskills): address Greptile P2 review comments
- hoist the ALLOWED_AGENTSKILLS_HOSTS Set to module scope (it was reallocated per request)
- add a res.type === 'opaqueredirect' check alongside the 3xx guard; Edge Runtime returns status=0 for opaque redirects, so the status-range check alone is dead code	2026-03-28 01:10:02 +04:00
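The agentskills hardening combines an exact hostname match with `redirect: "manual"`, plus an `opaqueredirect` check because, per the commit, Edge Runtime reports status 0 for manual redirects. A minimal sketch under assumptions: the allowlisted host is a placeholder (the real ALLOWED_AGENTSKILLS_HOSTS entries are not shown), HTTPS-only is an added assumption, and `fetchLike` is injected so the logic runs without a network.

```typescript
// Placeholder host: the real ALLOWED_AGENTSKILLS_HOSTS entries are not in the commit.
// Built once at module scope, as the P2 fix does for the real Set.
const ALLOWED_HOSTS: ReadonlySet<string> = new Set(["skills.example.com"]);

// Exact hostname match (no suffix or substring checks), HTTPS only (assumption).
function isAllowedSkillUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false;
  }
  return url.protocol === "https:" && ALLOWED_HOSTS.has(url.hostname);
}

interface ResponseLike {
  status: number;
  type: string;
  text(): Promise<string>;
}
type FetchLike = (url: string, init: { redirect: "manual" }) => Promise<ResponseLike>;

// With redirect: "manual", browsers (and Edge Runtime, per the commit) return an
// opaque-redirect response with status 0, so the 3xx check alone never fires.
function isRedirect(res: { status: number; type: string }): boolean {
  return res.type === "opaqueredirect" || (res.status >= 300 && res.status < 400);
}

async function fetchSkill(raw: string, fetchLike: FetchLike): Promise<string> {
  if (!isAllowedSkillUrl(raw)) throw new Error(`host not allowlisted: ${raw}`);
  const res = await fetchLike(raw, { redirect: "manual" });
  if (isRedirect(res)) throw new Error("refusing to follow redirect");
  if (res.status !== 200) throw new Error(`unexpected status ${res.status}`);
  return res.text();
}
```

Parsing with `URL` before comparing matters: userinfo tricks such as `https://skills.example.com@evil.test/` resolve to the attacker's hostname and are rejected.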
Elie Habib	363cf5e71c	perf(api): convert POST RPCs to GET for CDN caching (#795)

* perf(api): convert classify-event to GET and add summarize-article cache endpoint for CDN caching

classify-event (7.9M calls/wk) was POST, which bypassed all CDN caching. Converting it to GET with the static cache tier (1hr) enables Cloudflare edge caching. Degraded responses (no API key, empty title, errors) are marked no-cache to prevent caching error states.

summarize-article requests carry headline lists that repeat across calls but are too large for URL params. Added a new GetSummarizeArticleCache GET endpoint that looks up Redis by a deterministic cache key. The client computes the key via the shared buildSummaryCacheKey(), tries GET first (CDN-cacheable), and falls back to the existing POST on a miss. The shared module ensures client/server key parity.

* fix(types): wire missing DeductSituation and ListGulfQuotes RPCs, fix tsc errors
- Added DeductSituation RPC to intelligence/v1/service.proto (messages existed, RPC declaration was missing)
- Added ListGulfQuotes proto + RPC to market/v1/service.proto (handler existed, proto was missing)
- Fixed scrapedAt type mismatch in conflict/index.ts (int64 → string)
- Added @ts-nocheck to generated files with codegen type bugs
- Regenerated all sebuf client/server code

* fix(types): fix int64→string type mismatch in list-iran-events.ts	2026-03-02 22:01:32 +04:00
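The GET-first pattern only works if client and server derive the same key from the same inputs, which is why buildSummaryCacheKey lives in a shared module. The sketch below is hypothetical: the key layout, the canonicalisation (trim, lowercase, sort), the 32-bit FNV-1a hash and the Cache-Control values are stand-ins chosen for a dependency-free example, and the two lookups are injected functions rather than the real RPC clients.

```typescript
// 32-bit FNV-1a: small and deterministic, enough for an illustration.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

// Imported by both client and server, so both compute identical keys.
function summaryCacheKeySketch(headlines: string[], mode: string): string {
  const canonical = headlines
    .map((h) => h.trim().toLowerCase())
    .sort()
    .join("\n");
  return `summary:${mode}:${fnv1a(canonical)}`;
}

// Client flow: CDN-cacheable GET first, the original POST only on a miss.
async function getOrGenerate(
  key: string,
  getCached: (key: string) => Promise<string | null>,
  generateViaPost: () => Promise<string>,
): Promise<string> {
  const cached = await getCached(key);
  return cached !== null ? cached : generateViaPost();
}

// Degraded results must not be edge-cached (header values are assumptions).
function cacheControlFor(degraded: boolean): string {
  return degraded ? "no-cache" : "public, max-age=3600";
}
```

Sorting during canonicalisation means the same headline set hits the same key regardless of display order, which is what lets repeated dashboard loads share one CDN entry.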
Elie Habib	c2a06401c0	fix(ollama): strip thinking tokens, raise max_tokens, fix panel summary cache (#456)

* fix(ollama): strip thinking tokens, raise max_tokens, fix panel summary cache (#450)
- Add OLLAMA_MAX_TOKENS env var (clamped to 50-2000, default 300) so thinking models have enough budget for actual summaries instead of truncated reasoning
- Strip <|begin_of_thought|>/<|end_of_thought|> tags (terminated + unterminated)
- Add a mode-scoped min-length gate: reject outputs shorter than 20 chars for brief/analysis
- Extend the TASK_NARRATION regex with first/step/my-task/to-summarize patterns
- Fix the client-side summary cache: store the headline signature in the value, validate it on read, auto-dismiss stale summaries on headline change, and discard in-flight results when headlines change during generation
- Add tests for the new patterns and negative cases (39/39 pass)

* fix: hide summary container on stale in-flight discard, fix comment
- Add a hideSummary() call when the headline signature changes mid-generation, preventing a stuck "Generating summary..." overlay
- Fix stale comment: cache version is v5, not v4	2026-02-27 14:41:56 +04:00
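Three of the #450 changes are small pure functions. A sketch under stated assumptions: the helper names are hypothetical, the env value is passed in as a string rather than read from the environment, and unterminated-tag handling assumes everything after an unclosed `<|begin_of_thought|>` is reasoning to discard.

```typescript
type Mode = "brief" | "analysis" | "translate";

// OLLAMA_MAX_TOKENS: parse, fall back to 300, clamp to [50, 2000].
function clampMaxTokens(raw: string | undefined): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  if (!Number.isFinite(parsed)) return 300;
  return Math.min(2000, Math.max(50, parsed));
}

const OPEN = "<|begin_of_thought|>";
const CLOSE = "<|end_of_thought|>";

// Remove terminated thought blocks, then drop an unterminated trailing one.
function stripThoughtTags(text: string): string {
  let out = text;
  for (;;) {
    const start = out.indexOf(OPEN);
    if (start === -1) break;
    const end = out.indexOf(CLOSE, start + OPEN.length);
    if (end === -1) {
      out = out.slice(0, start); // unterminated: treat the rest as reasoning
      break;
    }
    out = out.slice(0, start) + out.slice(end + CLOSE.length);
  }
  return out.trim();
}

// Mode-scoped gate: only brief/analysis reject outputs under 20 chars.
function passesMinLength(summary: string, mode: Mode): boolean {
  return mode === "translate" || summary.trim().length >= 20;
}
```

Each pass of the loop removes one opening marker, so it always terminates, and plain string searches avoid having to escape the `|` characters in a regex.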
Elie Habib	f24b04fbd7	fix: prevent entity conflation in pane summarization (#339)

Small models (llama3.1:8b via Ollama) were merging entities from unrelated headlines into incoherent summaries (e.g. "Iran's president, Marco Rubio"). Root cause: 8-10 unrelated story headlines were sent to the model with a "summarize the top story" prompt, causing the model to conflate actors across separate stories.

Changes:
- Rewrite all summarization prompts (brief, analysis, fallback) to state explicitly that the headlines are SEPARATE stories and that the model must pick ONE and never merge facts from different headlines
- Reduce the headline count from 8-10 to 5 across the pipeline (InsightsPanel, NewsPanel, server dedup, browser T5, cache key)
- Remove biasing examples from system prompts ("Iran's regime...")
- Remove the "CRITICAL FOCAL POINTS" directive that encouraged the model to force-inject geo context entities into unrelated stories
- Bump CACHE_VERSION to v5 to invalidate stale entries

https://claude.ai/code/session_014t5oXq7c3b7oh5bZCtQFrp

Co-authored-by: Claude <noreply@anthropic.com>	2026-02-24 22:09:33 +00:00
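The #339 prompt rewrite and the cap of 5 headlines can be illustrated with a hypothetical prompt builder. The prompt wording and the case-insensitive dedup are assumptions modelled on the commit's description, not the project's actual prompt text.

```typescript
const MAX_HEADLINES = 5;

// Trim, drop empties and case-insensitive duplicates, keep at most 5.
function selectHeadlines(headlines: string[]): string[] {
  const seen = new Set<string>();
  const picked: string[] = [];
  for (const raw of headlines) {
    const headline = raw.trim();
    const key = headline.toLowerCase();
    if (key === "" || seen.has(key)) continue;
    seen.add(key);
    picked.push(headline);
    if (picked.length === MAX_HEADLINES) break;
  }
  return picked;
}

// Illustrative wording: state up front that the items are unrelated stories.
function buildBriefPrompt(headlines: string[]): string {
  const list = selectHeadlines(headlines)
    .map((h, i) => `${i + 1}. ${h}`)
    .join("\n");
  return [
    "The numbered headlines below are SEPARATE, unrelated stories.",
    "Pick ONE story and summarize only that story.",
    "Never combine people, places or facts from different headlines.",
    "",
    list,
  ].join("\n");
}
```

Fewer, clearly separated items give a small model less material to cross-wire, which is the failure mode the commit describes.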
Elie Habib	10e50e080d	fix: strip Ollama reasoning tokens and plain-text thinking from summaries (#299)

Models like DeepSeek-R1 and QwQ output chain-of-thought as plain text even with think:false. This caused summaries like "We need to summarize the top story..." instead of actual news content.
- Remove the message.reasoning fallback that used thinking tokens as the summary
- Extend tag stripping to <|thinking|>, <reasoning>, <reflection> formats
- Add hasReasoningPreamble() to reject task narration and prompt echoes
- Gate reasoning detection to brief/analysis modes (translate unaffected)
- Bump CACHE_VERSION v3→v4 to invalidate polluted cached summaries
- Add 28 unit tests covering all edge cases	2026-02-24 05:52:00 +00:00
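A sketch of the #299 filtering, assuming XML-style tag pairs: `<think>` is assumed to be the pre-existing format, the pipe-delimited `<|thinking|>` variant is left out because the commit doesn't show its closing form, and the preamble patterns are illustrative stand-ins for the project's hasReasoningPreamble() rules, not its actual regexes.

```typescript
type SummaryMode = "brief" | "analysis" | "translate";

// XML-style reasoning wrappers; <think> is assumed to be the pre-existing case.
const REASONING_TAGS = ["think", "thinking", "reasoning", "reflection"];

function stripReasoningTags(text: string): string {
  let out = text;
  for (const tag of REASONING_TAGS) {
    out = out.replace(new RegExp(`<${tag}>[\\s\\S]*?</${tag}>`, "gi"), "");
  }
  return out.trim();
}

// Illustrative patterns for task narration and prompt echoes at the start of output.
const PREAMBLE_PATTERNS: RegExp[] = [
  /^we need to (summari[sz]e|write|pick)\b/i,
  /^let me (think|summari[sz]e|analy[sz]e)\b/i,
  /^(okay|ok|alright),? (so|let me|let's)\b/i,
  /^summari[sz]e the top story\b/i,
];

function hasReasoningPreambleSketch(text: string): boolean {
  const head = text.replace(/^\s+/, "");
  return PREAMBLE_PATTERNS.some((re) => re.test(head));
}

// Tags are stripped in every mode; preamble rejection applies only to
// brief/analysis, so translations are never discarded by it.
function acceptSummary(raw: string, mode: SummaryMode): string | null {
  const cleaned = stripReasoningTags(raw);
  if (mode !== "translate" && hasReasoningPreambleSketch(cleaned)) return null;
  return cleaned === "" ? null : cleaned;
}
```

Returning `null` rather than a trimmed fragment lets the caller fall through to the next provider instead of caching a polluted summary.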