Commit Graph

23 Commits

Author SHA1 Message Date
Elie Habib
34dfc9a451 fix(news): ground LLM surfaces on real RSS description end-to-end (#3370)
* feat(news/parser): extract RSS/Atom description for LLM grounding (U1)

Add description field to ParsedItem, extract from the first non-empty of
description/content:encoded (RSS) or summary/content (Atom), picking the
longest after HTML-strip + entity-decode + whitespace-normalize. Clip to
400 chars. Reject empty, <40 chars after strip, or normalize-equal to the
headline — downstream consumers fall back to the cleaned headline on '',
preserving current behavior for feeds without a description.

CDATA end is anchored to the closing tag so internal ]]> sequences do not
truncate the match. Preserves cached rss:feed:v1 row compatibility during
the 1h TTL bleed since the field is additive.
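A minimal sketch of the extraction rules above, assuming hypothetical helper names (`extractDescription`, `stripHtml`, etc. are illustrative, not the parser's actual identifiers):

```typescript
const MAX_DESC = 400;
const MIN_DESC = 40;

// CDATA capture anchored to the closing tag: the greedy body plus the literal
// "]]></description>" means an internal "]]>" cannot end the match early.
const CDATA_DESC = /<description>\s*<!\[CDATA\[([\s\S]*)\]\]>\s*<\/description>/;

const stripHtml = (s: string): string => s.replace(/<[^>]*>/g, ' ');
const normalize = (s: string): string => s.replace(/\s+/g, ' ').trim();

// Entity decoding trimmed to a handful of entities for illustration.
const decodeEntities = (s: string): string =>
  s
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, '&');

function extractDescription(candidates: string[], headline: string): string {
  // Pick the longest candidate after clean-up (description/content:encoded
  // for RSS, summary/content for Atom).
  const cleaned = candidates
    .map((c) => normalize(decodeEntities(stripHtml(c))))
    .sort((a, b) => b.length - a.length)[0] ?? '';
  if (cleaned.length < MIN_DESC) return '';        // empty or too short after strip
  if (cleaned.toLowerCase() === normalize(headline).toLowerCase()) return ''; // dup of headline
  return cleaned.slice(0, MAX_DESC);               // clip to 400 chars
}
```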

Part of fix: pipe RSS description end-to-end so LLM surfaces stop
hallucinating named actors (docs/plans/2026-04-24-001-...).

Covers R1, R7.

* feat(news/story-track): persist description on story:track:v1 HSET (U2)

Append description to the story:track:v1 HSET only when non-empty. Additive
— no key version bump. Old rows and rows from feeds without a description
return undefined on HGETALL, letting downstream readers fall back to the
cleaned headline (R6).

Extract buildStoryTrackHsetFields as a pure helper so the inclusion gate is
unit-testable without Redis.
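A sketch of the inclusion gate with assumed field names (the real helper carries whatever the tracker writes today; note that at this point in the series description is conditionally included, and a later commit in this PR inverts the gate to an unconditional write):

```typescript
// Hypothetical shape of the pure helper; StoryMention and its fields are
// assumptions for illustration, not the repository's actual types.
interface StoryMention {
  title: string;
  link: string;
  description?: string;
}

// Flat field/value pairs for the story:track:v1 HSET. description is appended
// only when non-empty, so old readers see undefined on HGETALL and fall back
// to the cleaned headline (R6).
function buildStoryTrackHsetFields(m: StoryMention): Record<string, string> {
  const fields: Record<string, string> = { title: m.title, link: m.link };
  if (m.description && m.description.trim()) {
    fields.description = m.description;
  }
  return fields;
}
```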

Update the contract comment in cache-keys.ts so the next reader of the
schema sees description as an optional field.

Covers R2, R6.

* feat(proto): NewsItem.snippet + SummarizeArticleRequest.bodies (U3)

Add two additive proto fields so the article description can ride to every
LLM-adjacent consumer without a breaking change:

- NewsItem.snippet (field 12): RSS/Atom description, HTML-stripped,
  ≤400 chars, empty when unavailable. Wired on toProtoItem.
- SummarizeArticleRequest.bodies (field 8): optional article bodies
  paired 1:1 with headlines for prompt grounding. Empty array is today's
  headline-only behavior.

Regenerated TS client/server stubs and OpenAPI YAML/JSON via sebuf v0.11.1
(PATH=~/go/bin required — Homebrew's protoc-gen-openapiv3 is an older
pre-bundle-mode build that collides on duplicate emission).

Pre-emptive bodies:[] placeholders at the two existing SummarizeArticle
call sites in src/services/summarization.ts; U6 replaces them with real
article bodies once SummarizeArticle handler reads the field.

Covers R3, R5.

* feat(brief/digest): forward RSS description end-to-end through brief envelope (U4)

Digest accumulator reader (seed-digest-notifications.mjs::buildDigest) now
plumbs the optional `description` field off each story:track:v1 HGETALL into
the digest story object. The brief adapter (brief-compose.mjs::
digestStoryToUpstreamTopStory) prefers the real RSS description over the
cleaned headline; when the upstream row has no description (old rows in the
48h bleed, feeds that don't carry one), we fall back to the cleaned headline
so today's behavior is preserved (R6).
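The preference rule above reduces to a small fallback, sketched here with assumed names (the real digestStoryToUpstreamTopStory in brief-compose.mjs carries many more fields):

```typescript
interface DigestStory {
  cleanedHeadline: string;
  description?: string; // undefined for old rows in the 48h bleed or body-less feeds
}

// Prefer the real RSS description; empty or absent falls back to the
// cleaned headline so pre-fix behavior is preserved (R6).
function upstreamDescription(story: DigestStory): string {
  return story.description && story.description.trim()
    ? story.description
    : story.cleanedHeadline;
}
```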

This is the upstream half of the description cache path. U5 lands the LLM-
side grounding + cache-prefix bump so Gemini actually sees the article body
instead of hallucinating a named actor from the headline.

Covers R4 (upstream half), R6.

* feat(brief/llm): RSS grounding + sanitisation + 4 cache prefix bumps (U5)

The actual fix for the headline-only named-actor hallucination class:
Gemini 2.5 Flash now receives the real article body as grounding context,
so it paraphrases what the article says instead of filling role-label
headlines from parametric priors ("Iran's new supreme leader" → "Ali
Khamenei" was the 2026-04-24 reproduction; with grounding, it becomes
the actual article-named actor).

Changes:

- buildStoryDescriptionPrompt interpolates a `Context: <body>` line
  between the metadata block and the "One editorial sentence" instruction
  when description is non-empty AND not normalise-equal to the headline.
  Clips to 400 chars as a second belt-and-braces after the U1 parser cap.
  No Context line → identical prompt to pre-fix (R6 preserved).

- sanitizeStoryForPrompt extended to cover `description`. Closes the
  asymmetry where whyMatters was sanitised and description wasn't —
  untrusted RSS bodies now flow through the same injection-marker
  neutraliser before prompt interpolation. generateStoryDescription wraps
  the story in sanitizeStoryForPrompt before calling the builder,
  matching generateWhyMatters.

- Four cache prefixes bumped atomically to evict pre-grounding rows:
    scripts/lib/brief-llm.mjs:
      brief:llm:description:v1 → v2  (Railway, description path)
      brief:llm:whymatters:v2 → v3   (Railway, whyMatters fallback)
    api/internal/brief-why-matters.ts:
      brief:llm:whymatters:v6 → v7                (edge, primary)
      brief:llm:whymatters:shadow:v4 → shadow:v5  (edge, shadow)
  hashBriefStory already includes description in the 6-field material
  (v5 contract) so identity naturally drifts; the prefix bump is the
  belt-and-braces that guarantees a clean cold-start on first tick.

- Tests: 8 new + 2 prefix-match updates on tests/brief-llm.test.mjs.
  Covers Context-line injection, empty/dup-of-headline rejection,
  400-char clip, sanitisation of adversarial descriptions, v2 write,
  and legacy-v1 row dark (forced cold-start).

Covers R4 + new sanitisation requirement.
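The Context-line gate described in the first bullet can be sketched as follows; the prompt strings and helper shape here are illustrative assumptions, not the repository's actual prompt:

```typescript
const CLIP = 400;
const norm = (s: string) => s.replace(/\s+/g, ' ').trim().toLowerCase();

function buildStoryDescriptionPrompt(headline: string, description: string): string {
  const meta = `Headline: ${headline}`;
  const instruction = 'One editorial sentence.';
  // Gate: non-empty AND not normalise-equal to the headline; clip to 400
  // chars as a second belt-and-braces after the parser cap.
  const useContext = description.trim() !== '' && norm(description) !== norm(headline);
  const contextLine = useContext ? `\nContext: ${description.slice(0, CLIP)}` : '';
  // With no Context line the prompt is byte-identical to the pre-fix prompt (R6).
  return `${meta}${contextLine}\n${instruction}`;
}
```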

* feat(news/summarize): accept bodies + bump summary cache v5→v6 (U6)

SummarizeArticle now grounds on per-headline article bodies when callers
supply them, so the dashboard "News summary" path stops hallucinating
across unrelated headlines when the upstream RSS carried context.

Three coordinated changes:

1. SummarizeArticleRequest handler reads req.bodies, sanitises each entry
   through sanitizeForPrompt (same trust treatment as geoContext — bodies
   are untrusted RSS text), clips to 400 chars, and pads to the headlines
   length so pair-wise identity is stable.

2. buildArticlePrompts accepts optional bodies and interleaves a
   `    Context: <body>` line under each numbered headline that has a
   non-empty body. Skipped in translate mode (headline[0]-only) and when
   all bodies are empty — yielding a byte-identical prompt to pre-U6
   for every current caller (R6 preserved).

3. summary-cache-key bumps CACHE_VERSION v5→v6 so the pre-grounding rows
   (produced from headline-only prompts) cold-start cleanly. Extends
   canonicalizeSummaryInputs + buildSummaryCacheKey with a pair-wise
   bodies segment `:bd<hash>`; the prefix is `:bd` rather than `:b` to
   avoid colliding with `:brief:` when pattern-matching keys. Translate
   mode is headline[0]-only and intentionally does not shift on bodies.

Dedup reorder preserved: the handler re-pairs bodies to the deduplicated
top-5 via findIndex, so layout matches without breaking cache identity.
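A hedged sketch of the `:bd<hash>` key segment from item 3. The hash function, prefix placement, and the rule of omitting the segment when all bodies are empty are assumptions for illustration, not the real buildSummaryCacheKey:

```typescript
// Stand-in 32-bit FNV-1a style hash, hex encoded. NOT the repo's hash.
function tinyHash(s: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}

function buildSummaryCacheKey(headlines: string[], bodies: string[] = []): string {
  const base = `summary:v6:${tinyHash(headlines.join('\u0000'))}`;
  // Bodies segment only when at least one body is non-empty. The ":bd"
  // prefix (rather than ":b") avoids colliding with ":brief:" when
  // pattern-matching keys.
  if (bodies.some((b) => b && b.trim())) {
    return `${base}:bd${tinyHash(bodies.join('\u0000'))}`;
  }
  return base;
}
```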

New tests: 7 on buildArticlePrompts (bodies interleave, partial fill,
translate-mode skip, clip, short-array tolerance), 8 on
buildSummaryCacheKey (pair-wise sort, cache-bust on body drift, translate
skip). Existing summary-cache-key assertions updated v5→v6.

Covers R3, R4.

* feat(consumers): surface RSS snippet across dashboard, email, relay, MCP + audit (U7)

Thread the RSS description from the ingestion path (U1-U5) into every
user-facing LLM-adjacent surface. Audit the notification producers so
RSS-origin and domain-origin events stay on distinct contracts.

Dashboard (proto snippet → client → panel):
- src/types/index.ts NewsItem.snippet?:string (client-side field).
- src/app/data-loader.ts proto→client mapper propagates p.snippet.
- src/components/NewsPanel.ts renders snippet as a truncated (~200 chars,
  word-boundary ellipsis) `.item-snippet` line under each headline.
- NewsPanel.currentBodies tracks per-headline bodies paired 1:1 with
  currentHeadlines; passed as options.bodies to generateSummary so the
  server-side SummarizeArticle LLM grounds on the article body.
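The ~200-char word-boundary truncation used for the snippet line might look like this; the helper name is hypothetical:

```typescript
// Clip display text at the last word boundary inside the window, appending
// an ellipsis. Short inputs pass through untouched.
function truncateForDisplay(text: string, max = 200): string {
  if (text.length <= max) return text;
  const cut = text.slice(0, max);
  const lastSpace = cut.lastIndexOf(' ');
  return (lastSpace > 0 ? cut.slice(0, lastSpace) : cut) + '…';
}
```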

Summary plumbing:
- src/services/summarization.ts threads bodies through SummarizeOptions
  → generateSummary → runApiChain → tryApiProvider; cache key now includes
  bodies (via U6's buildSummaryCacheKey signature).

MCP world-brief:
- api/mcp.ts pairs headlines with their RSS snippets and POSTs `bodies`
  to /api/news/v1/summarize-article so the MCP tool surface is no longer
  starved.

Email digest:
- scripts/seed-digest-notifications.mjs plain-text formatDigest appends
  a ~200-char truncated snippet line under each story; HTML formatDigestHtml
  renders a dim-grey description div between title and meta. Both gated
  on non-empty description (R6 — empty → today's behavior).

Real-time alerts:
- src/services/breaking-news-alerts.ts BreakingAlert gains optional
  description; checkBatchForBreakingAlerts reads item.snippet; dispatchAlert
  includes `description` in the /api/notify payload when present.

Notification relay:
- scripts/notification-relay.cjs formatMessage gated on
  NOTIFY_RELAY_INCLUDE_SNIPPET=1 (default off). When on, RSS-origin
  payloads render a `> <snippet>` context line under the title. When off
  or payload.description absent, output is byte-identical to pre-U7.

Audit (RSS vs domain):
- tests/notification-relay-payload-audit.test.mjs enforces file-level
  @notification-source tags on every producer, rejects `description:` in
  domain-origin payload blocks, and verifies the relay codepath gates
  snippet rendering under the flag.
- Tag added to ais-relay.cjs (domain), seed-aviation.mjs (domain),
  alert-emitter.mjs (domain), breaking-news-alerts.ts (rss).

Deferred (plan explicitly flags): InsightsPanel + cluster-producer
plumbing (bodies default to [] — will unlock gradually once news:insights:v1
producer also carries primarySnippet).

Covers R5, R6.

* docs+test: grounding-path note + bump pinned CACHE_VERSION v5→v6 (U8)

Final verification for the RSS-description-end-to-end fix:

- docs/architecture.mdx — one-paragraph "News Grounding Pipeline"
  subsection tracing parser → story:track:v1.description → NewsItem.snippet
  → brief / SummarizeArticle / dashboard / email / relay / MCP, with the
  empty-description R6 fallback rule called out explicitly.
- tests/summarize-reasoning.test.mjs — Fix-4 static-analysis pin updated
  to match the v6 bump from U6. Without this the summary cache bump silently
  regressed CI's pinned-version assertion.

Final sweep (2026-04-24):
- grep -rn 'brief:llm:description:v1' → only in the U5 legacy-row test
  simulation (by design: proves the v2 bump forces cold-start).
- grep -rn 'brief:llm:whymatters:v2/v6/shadow:v4' → no live references.
- grep -rn 'summary:v5' → no references.
- CACHE_VERSION = 'v6' in src/utils/summary-cache-key.ts.
- Full tsx --test sweep across all tests/*.test.{mjs,mts}: 6747/6747 pass.
- npm run typecheck + typecheck:api: both clean.

Covers R4, R6, R7.

* fix(rss-description): address /ce:review findings before merge

14 fixes from structured code review across 13 reviewer personas.

Correctness-critical (P1 — fixes that prevent R6/U7 contract violations):
- NewsPanel signature covers currentBodies so view-mode toggles that leave
  headlines identical but bodies different now invalidate in-flight summaries.
  Without this, switching renderItems → renderClusters mid-summary let a
  grounded response arrive under a stale (now-orphaned) cache key.
- summarize-article.ts re-pairs bodies with headlines BEFORE dedup via a
  single zip-sanitize-filter-dedup pass. Previously bodies[] was indexed by
  position in light-sanitized headlines while findIndex looked up the
  full-sanitized array — any headline that sanitizeHeadlines emptied
  mispaired every subsequent body, grounding the LLM on the wrong story.
- Client skips the pre-chain cache lookup when bodies are present, since
  client builds keys from RAW bodies while server sanitizes first. The
  keys diverge on injection content, which would silently miss the
  server's authoritative cache every call.

Test + audit hardening:
- Legacy v1 eviction test now uses the real hashBriefStory(story()) suffix
  instead of a literal "somehash", so a bug where the reader still queried
  the v1 prefix at the real key would actually be caught.
- tests/summary-cache-key.test.mts adds 400-char clip identity coverage so
  the canonicalizer's clip and any downstream clip can't silently drift.
- tests/news-rss-description-extract.test.mts renames the well-formed
  CDATA test and adds a new test documenting the malformed-]]> fallback
  behavior (plain regex captures, article content survives).

Safe_auto cleanups:
- Deleted dead SNIPPET_PUSH_MAX constant in notification-relay.cjs.
- BETA-mode groq warm call now passes bodies, warming the right cache slot.
- seed-digest shares a local normalize-equality helper for description !=
  headline comparison, matching the parser's contract.
- Pair-wise sort in summary-cache-key tie-breaks on body so duplicate
  headlines produce stable order across runs.
- buildSummaryCacheKey gained JSDoc documenting the client/server contract
  and the bodies parameter semantics.
- MCP get_world_brief tool description now mentions RSS article-body
  grounding so calling agents see the current contract.
- _shared.ts `opts.bodies![i]!` double-bang replaced with `?? ''`.
- extractRawTagBody regexes cached in module-level Map, mirroring the
  existing TAG_REGEX_CACHE pattern.

Deferred to follow-up (tracked for PR description / separate issue):
- Promote shared MAX_BODY constant across the 5 clip sites
- Promote shared truncateForDisplay helper across 4 render sites
- Collapse NewsPanel.{currentHeadlines, currentBodies} → Array<{title, snippet}>
- Promote sanitizeStoryForPrompt to shared/brief-llm-core.js
- Split list-feed-digest.ts parser helpers into sibling -utils.ts
- Strengthen audit test: forward-sweep + behavioral gate test

Tests: 6749/6749 pass. Typecheck clean on both configs.

* fix(summarization): thread bodies through browser T5 path (Codex #2)

Addresses the second of two Codex-raised findings on PR #3370:

The PR threaded bodies through the server-side API provider chain
(Ollama → Groq → OpenRouter → /api/news/v1/summarize-article) but the
local browser T5 path at tryBrowserT5 was still summarising from
headlines alone. In BETA_MODE that ungrounded path runs BEFORE the
grounded server providers; in normal mode it remains the last
fallback. Whenever T5-small won, the dashboard summary surface
regressed to the headline-only path — the exact hallucination class
this PR exists to eliminate.

Fix: tryBrowserT5 accepts an optional `bodies` parameter and
interleaves each body with its paired headline via a `headline —
body` separator in the combined text (clipped to 200 chars per body
to stay within T5-small's ~512-token context window). All three call
sites (BETA warm, BETA cold, normal-mode fallback) now pass the
bodies threaded down from generateSummary options.bodies.

When bodies is empty/omitted, the combined text is byte-identical to
pre-fix (R6 preserved).
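The interleave can be sketched as below, assuming a hypothetical helper name (the real logic lives inside tryBrowserT5's combined-text construction):

```typescript
// Pair each headline with its body via the "headline — body" separator,
// clipping each body to 200 chars for T5-small's context window. Empty or
// omitted bodies yield the pre-fix headline-only line (R6).
function buildT5Input(headlines: string[], bodies: string[] = []): string {
  return headlines
    .map((h, i) => {
      const body = (bodies[i] ?? '').trim();
      return body ? `${h} — ${body.slice(0, 200)}` : h;
    })
    .join('. ');
}
```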

On Codex finding #1 (story:track:v1 additive-only HSET keeps a body
from an earlier mention of the same normalized title), declining to
change. The current rule — "if this mention has a body, overwrite;
otherwise leave the prior body alone" — is defensible: a body from
mention A is not falsified by mention B being body-less (a wire
reprint doesn't invalidate the original source's body). A feed that
publishes a corrected headline creates a new normalized-title hash,
so no stale body carries forward. The failure window is narrow (live
story evolving while keeping the same title through hours of
body-less wire reprints) and the 7-day STORY_TTL is the backstop.
Opening a follow-up issue to revisit semantics if real-world evidence
surfaces a stale-grounding case.

* fix(story-track): description always-written to overwrite stale bodies (Codex #1)

Revisiting Codex finding #1 on PR #3370 after re-review. The previous
response declined the fix with reasoning; on reflection the argument
was over-defending the current behavior.

Problem: buildStoryTrackHsetFields previously wrote `description` only
when non-empty. Because story:track:v1 rows are collapsed by
normalized-title hash, an earlier mention's body would persist for up
to STORY_TTL (7 days) on subsequent body-less mentions of the same
story. Consumers reading `track.description` via HGETALL could not
distinguish "this mention's body" from "some mention's body from the
last week," silently grounding brief / whyMatters / SummarizeArticle
LLMs on text the current mention never supplied. That violates the
grounding contract advertised to every downstream surface in this PR.

Fix: HSET `description` unconditionally on every mention — empty
string when the current item has no body, real body when it does. An
empty value overwrites any prior mention's body so the row is always
authoritative for the current cycle. Consumers continue to treat
empty description as "fall back to cleaned headline" (R6 preserved).
The 7-day STORY_TTL and normalized-title hash semantics are unchanged.
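The always-write rule reduces to a sketch like this (names assumed; contrast with the earlier U2 gate, which skipped the field when empty):

```typescript
// description is set on every mention: the real body when the current item
// has one, '' otherwise. An empty value overwrites any prior mention's body
// so the row is always authoritative for the current cycle; consumers treat
// '' as "fall back to cleaned headline" (R6).
function buildStoryTrackHsetFields(
  title: string,
  link: string,
  description: string | undefined,
): Record<string, string> {
  return {
    title,
    link,
    description: description?.trim() ? description : '',
  };
}
```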

Trade-off accepted: a valid body from Feed A (NYT) is wiped when Feed
B (AP body-less wire reprint) arrives for the same normalized title,
even though Feed A's body is factually correct. Rationale: the
alternative — keeping Feed A's body indefinitely — means the user
sees Feed A's body attributed (by proximity) to an AP mention at a
later timestamp, which is at minimum misleading and at worst carries
retracted/corrected details. Honest absence beats unlabeled presence.

Tests: new stale-body overwrite sequence test (T0 body → T1 empty →
T2 new body), existing "writes description when non-empty" preserved,
existing "omits when empty" inverted to "writes empty, overwriting."
cache-keys.ts contract comment updated to mark description as
always-written rather than optional.
2026-04-24 16:25:14 +04:00
Elie Habib
8ea4c8f163 feat(digest-dedup): replayable per-story input log (opt-in, no behaviour change) (#3330)
* feat(digest-dedup): replayable per-story input log (opt-in, no behaviour change)

Ship the measurement layer before picking any recall-lift strategy.

Why: the current dedup path embeds titles only, so brief-wire headlines
that share a real event but drop the geographic anchor (e.g. "Alleged
Coup: defendant arrives in court" vs "Trial opens after Nigeria charges
six over 2025 coup plot") can slip past the 0.60 cosine threshold. To
tune recall without regressing precision we need a replayable per-tick
dataset — one record per story with the exact fields any downstream
candidate (title+slug, LLM-canonicalise, text-embedding-3-large, cross-
encoder re-rank, etc.) would need to score.

This PR ships ONLY the log. Zero behaviour change:
  - Opt-in via DIGEST_DEDUP_REPLAY_LOG=1 (default OFF).
  - Writer is best-effort: all errors swallowed + warned, never affects
    digest delivery. No throw path.
  - Records include hash, originalIndex, isRep, clusterId, raw +
    normalised title, link, severity/score/mentions/phase/sources,
    embeddingCacheKey, hasEmbedding sidecar flag, and the tick's config
    snapshot (mode, clustering, cosineThreshold, topicThreshold, veto).
  - clusterId derives from rep.mergedHashes (already set by
    materializeCluster) so the orchestrator is untouched.
  - Storage: Upstash list keyed by {variant}:{lang}:{sensitivity}:{date}
    with 30-day EXPIRE. Date suffix caps per-key growth; retention
    covers the labelling cadence + cross-candidate comparison window.
  - Env flag is '1'-only (fail-closed on typos, same pattern as
    DIGEST_DEDUP_MODE).
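The key shape and flag gate above can be sketched as follows (the `digest:replay-log:v1:` prefix appears later in this log; the function names and config plumbing are assumptions, and the actual writer issues Upstash RPUSH + EXPIRE):

```typescript
const REPLAY_TTL_SECONDS = 30 * 24 * 60 * 60; // 30-day retention window

// One Upstash list per {variant}:{lang}:{sensitivity}:{date}; the date
// suffix caps per-key growth.
function buildReplayLogKey(
  variant: string,
  lang: string,
  sensitivity: string,
  date: string, // YYYY-MM-DD
): string {
  return `digest:replay-log:v1:${variant}:${lang}:${sensitivity}:${date}`;
}

// '1'-only flag parsing: any other value fails closed (same pattern as
// DIGEST_DEDUP_MODE).
function replayLogEnabled(env: Record<string, string | undefined>): boolean {
  return env.DIGEST_DEDUP_REPLAY_LOG === '1';
}
```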

Activation path (post-merge): flip DIGEST_DEDUP_REPLAY_LOG=1 on the
seed-digest-notifications Railway service. Watch one cron tick for the
RPUSH + EXPIRE pair (or a single warn line if creds/upstream flake),
then leave running for at least one week to accumulate calibration data.

Tests: 21 unit tests covering flag parsing, key shape + sanitisation,
record field correctness (isRep, clusterId, embeddingCacheKey,
hasEmbedding, tickConfig), pipeline null/throw handling, malformed
input. Existing 77 dedup tests unchanged and still green.

* fix(digest-dedup): capture topicGroupingEnabled in replay tickConfig

Review catch (PR #3330): the tickConfig snapshot omitted
topicGroupingEnabled even though readOrchestratorConfig returns it and
the digest's post-dedup topic ordering gates on it. A tick run with
DIGEST_DEDUP_TOPIC_GROUPING=0 serialised identically to a default
tick, making those runs non-replayable for the calibration work this
log is meant to enable.

Add topicGroupingEnabled to the recorded tickConfig. One-line schema
fix + regression test asserting topic-grouping-off ticks serialise
distinctly from default.

22/22 tests pass.

* fix(digest-dedup): await replay-log write to survive explicit process.exit

Review catch (PR #3330): the fire-and-forget `void writeReplayLog(...)`
call could be dropped on the explicit-exit paths — the brief-compose
failure gate at line 1539 and main().catch at line 1545 both call
process.exit(1). Unlike natural exit, process.exit does not drain
in-flight promises, so the last N ticks' replay records could be
silently lost on runs where measurement fidelity matters most.

Fix: await the writeReplayLog call. Safe because:
  - writeReplayLog returns synchronously when the flag is off
    (replayLogEnabled check is the first thing it does)
  - It has a top-level try/catch that always returns a result object
  - The Upstash pipeline call has a 10s timeout ceiling
  - buildDigest already awaits many Upstash calls (dedup, compose,
    render) so one more is not a hot-path concern

Comment block added above the call explains why the await is
deliberate — so a future refactor doesn't revert it to void thinking
it's a leftover.

No test change: existing writeReplayLog unit tests already cover the
disabled / empty / success / error paths. The fix is a single-keyword
change in a caller that was already guaranteed-safe by the callee's
contract.

* refactor(digest-dedup): address Greptile P2 review comments on replay log

Three non-blocking polish items from the automated review, bundled
because they all touch the same new module and none change behaviour.

1. tsMs captured BEFORE deduplicateStories (seed-digest-notifications.mjs).
   Previously sampled after dedup returned, so briefTickId reflected
   dedup-completion time rather than tick-start. For downstream readers
   the natural reading of "briefTickId" is when the tick began
   processing; moved the Date.now() call to match that expectation.
   Drift is maybe 100ms-2s on cold-cache embed calls — small, but
   moving it is free.

2. buildReplayLogKey emptiness check now strips ':' and '-' in addition
   to '_'. A pathological ruleId of ':::' previously passed through
   verbatim, producing keys like `digest:replay-log:v1::::2026-04-23`
   that confuse redis-cli's namespace tooling (SCAN / KEYS / tab
   completion). The new guard falls back to "unknown" on any input
   that's all separators. Added a regression test covering the
   ':::' / '---' / '___' / mixed cases.

3. tickConfig is now a per-record shallow copy instead of a shared
   reference. Storage is unaffected (writeReplayLog serialises each
   record via JSON.stringify independently) but an in-memory consumer
   that mutated one record's tickConfig for experimentation would have
   silently affected all other records in the same batch. Added a
   regression test asserting mutation doesn't leak across records.

Tests: 24/24 pass (22 prior + 2 new regression). Typecheck + lint clean.
2026-04-23 11:50:19 +04:00
Elie Habib
1958b34f55 fix(digest-dedup): CLUSTERING typo fallback fails closed to complete-link (#3331)
DIGEST_DEDUP_CLUSTERING previously fell to 'single' on unrecognised
values, which silently defeated the documented kill switch. A typo
like `DIGEST_DEDUP_CLUSTERING=complet` during an over-merge incident
would stick with the aggressive single-link merger instead of rolling
back to the conservative complete-link algorithm.

Mirror the DIGEST_DEDUP_MODE typo pattern (PR #3247):
  - Unrecognised value → fall to 'complete' (SAFE / conservative).
  - Surface the raw value via new `invalidClusteringRaw` config field.
  - Emit a warn line on the dedup orchestrator's entry path so operators
    see the typo alongside the kill-switch-took-effect message.

Valid values 'single' (default), 'complete', unset, empty, and any
case variation all behave unchanged. Only true typos change behaviour
— and the new behaviour is the kill-switch-safe one.
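The fail-closed parse can be sketched as below; `invalidClusteringRaw` follows the commit text, while the surrounding config shape is an assumption:

```typescript
type Clustering = 'single' | 'complete';

function parseClustering(raw: string | undefined): {
  clustering: Clustering;
  invalidClusteringRaw?: string;
} {
  const v = (raw ?? '').trim().toLowerCase();
  if (v === '' || v === 'single') return { clustering: 'single' }; // default
  if (v === 'complete') return { clustering: 'complete' };
  // Typo: fall to the conservative complete-link algorithm and surface the
  // raw value so the orchestrator can emit its warn line.
  return { clustering: 'complete', invalidClusteringRaw: raw };
}
```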

Tests: updated the existing case that codified the old behaviour plus
added coverage for (a) multiple typo variants falling to complete with
invalidClusteringRaw set, (b) case-insensitive valid values not
triggering the typo path, and (c) the orchestrator emitting the warn
line even on the jaccard-kill-switch codepath (since CLUSTERING intent
applies to both modes).

81/81 dedup tests pass.
2026-04-23 11:25:05 +04:00
Elie Habib
29306008e4 fix(email): route Intelligence Brief off the alerts@ mailbox (#3321)
* fix(email): route Intelligence Brief off the alerts@ mailbox

The daily "WorldMonitor Intelligence Brief" email was shipping from
`alerts@worldmonitor.app` with a display name that — if the Railway env
override dropped the `Name <…>` wrapper — Gmail/Outlook fell back to
rendering the local-part ("alerts" / "alert") as the sender name.
Recipients saw a scary-looking "alert" in their inbox for what is
actually a curated editorial read.

Split the sender so editorial mail can't share the `alerts@` mailbox
with incident pushes:

- New env var `RESEND_FROM_BRIEF` (default `WorldMonitor Brief
  <brief@worldmonitor.app>`) consumed by seed-digest-notifications.mjs.
- Falls back to `RESEND_FROM_EMAIL`, then to the built-in default, so
  existing deploys keep working and the rollout is a single Railway
  env flip on the digest service.
- notification-relay.cjs (realtime push alerts) intentionally keeps
  `RESEND_FROM_EMAIL` / `alerts@` — accurate for that path.
- .env.example documents the display-name rule so the bare-address
  trap can't re-introduce the bug.

Rollout: set `RESEND_FROM_BRIEF=WorldMonitor Brief <brief@worldmonitor.app>`
on the `seed-digest-notifications` Railway service. Domain-level Resend
verification already covers the new local-part; no DNS change needed.

* fix(email): runtime normalize sender to prevent bare-address regression

PR review feedback from codex:

  > P2 — RESEND_FROM_BRIEF is consumed verbatim, so an operator can
  > still set brief@worldmonitor.app without a display name and
  > recreate the same Gmail/Outlook rendering bug for the daily brief.
  > Today that protection is only documentation in .env.example, not
  > runtime enforcement.

Add a small shared helper `scripts/lib/resend-from.cjs` that coerces a
bare email address into a "Name <addr>" wrapper with a loud warning
log, and wire it into the digest path.

- Bare-address input (e.g. `brief@worldmonitor.app`) is rewritten to
  `WorldMonitor Brief <brief@worldmonitor.app>` so Gmail/Outlook stop
  falling back to the local-part as the display name.
- Coercion emits a single `console.warn` line per boot so operators
  see the signal in Railway logs and can fix the underlying env.
- Fail-safe (not fail-closed) — a misconfigured env does NOT take the
  cron down.

Also resolves the P3 doc-vs-runtime divergence by reverting
.env.example's RESEND_FROM_EMAIL default from "WorldMonitor Alerts
<...>" back to "WorldMonitor <...>" to match the existing
notification-relay.cjs runtime default. The realtime-alert path will
get the same normalizer treatment in a follow-up PR that cohesively
touches notification-relay.cjs + Dockerfile.relay.

tests: 7 new cases in tests/resend-sender-normalize.test.mjs covering
empty/null/whitespace input, wrapped passthrough, trim, bare-address
coercion, warning emission, no-warning on wrapped, console.warn default
sink. Runs under `npm run test:data`.
2026-04-23 08:51:27 +04:00
Elie Habib
e878baec52 fix(digest): DIGEST_ONLY_USER self-expiring (mandatory until= suffix, 48h cap) (#3271)
* fix(digest): DIGEST_ONLY_USER self-expiring (mandatory until= suffix, 48h cap)

Review finding on PR #3255: DIGEST_ONLY_USER was a sticky production
footgun. If an operator forgot to unset after a one-off validation,
the cron silently filtered every other user out indefinitely while
still completing normally (exit 0) — prolonged partial outage with
"green" runs.

Fix: mandatory `|until=<ISO8601>` suffix within 48h of now. Otherwise
the flag is IGNORED with a loud warn, fan-out proceeds normally.
Active filter emits a structured console.warn at run start listing
expiry + remaining minutes.

Valid:
  DIGEST_ONLY_USER=user_xxx|until=2026-04-22T18:00Z

Rejected (→ loud warn, normal fan-out):
- Legacy bare `user_xxx` (missing required suffix)
- Unparseable ISO
- Expiry > 48h (forever-test mistake)
- Expiry in past (auto-disable)

Parser extracted to `scripts/lib/digest-only-user.mjs` (testable
without importing seed-digest-notifications.mjs which has no isMain
guard).
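A hedged sketch of the parser's contract (the real one is scripts/lib/digest-only-user.mjs; the return shape here is an assumption, and the strict-ISO regex guard added in the follow-up commit is omitted):

```typescript
const MAX_WINDOW_MS = 48 * 60 * 60 * 1000; // 48h cap

// null: flag unset, normal fan-out. {ignored}: flag present but rejected
// (loud warn, normal fan-out). {userId, untilMs}: filter active.
function parseDigestOnlyUser(
  raw: string | undefined,
  now: number = Date.now(),
): { userId: string; untilMs: number } | { ignored: true; reason: string } | null {
  if (!raw || !raw.trim()) return null;
  const parts = raw.split('|');
  const suffix = parts.length === 2 ? parts[1] : undefined;
  if (!suffix || !suffix.startsWith('until=')) {
    return { ignored: true, reason: 'expected exactly one "|until=<ISO8601>" suffix' };
  }
  const untilMs = Date.parse(suffix.slice('until='.length));
  if (Number.isNaN(untilMs)) return { ignored: true, reason: 'unparseable ISO timestamp' };
  if (untilMs <= now) return { ignored: true, reason: 'expiry in the past (auto-disable)' };
  if (untilMs - now > MAX_WINDOW_MS) return { ignored: true, reason: 'expiry beyond the 48h cap' };
  return { userId: parts[0] ?? '', untilMs };
}
```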

Tests: 17 cases covering unset / reject / active branches, ISO
variants, boundaries, and the 48h cap. 6066 total pass. typecheck × 2
clean.

Breaking change on the flag's format, but it shipped 2h before this
finding with no prod usage — tightening now is cheaper than after
an incident.

* chore(digest): address /ce:review P2s on DIGEST_ONLY_USER parser

Two style fixes flagged by Greptile on PR #3271:

1. Misleading multi-pipe error message.
   `user_xxx|until=<iso>|extra` returned "missing mandatory suffix",
   which points the operator toward adding a suffix that is already
   present (confused operator might try `user_xxx|until=...|until=...`).
   Now distinguishes parts.length===1 ("missing suffix") from >2
   ("expected exactly one '|' separator, got N").

2. Date.parse is lenient — accepts RFC 2822, locale strings, "April 22".
   The documented contract is strict ISO 8601; the 48h cap catches
   accidental-valid dates but the documentation lied. Added a regex
   guard up-front that enforces the ISO 8601 shape
   (YYYY-MM-DD optionally followed by time + TZ). Rejects the 6
   Date-parseable-but-not-ISO fixtures before Date.parse runs.

Both regressions pinned in tests/digest-only-user.test.mjs (18 pass,
was 17). typecheck × 2 clean.
2026-04-21 22:36:30 +04:00
Elie Habib
ec35cf4158 feat(brief): analyst prompt v2 — multi-sentence, grounded, story description (#3269)
* feat(brief): analyst prompt v2 — multi-sentence, grounded, includes story description

Shadow-diff of 12 prod stories on 2026-04-21 showed v1 analyst output
indistinguishable from legacy Gemini: identical single-sentence
abstraction ("destabilize / systemic / sovereign risk repricing") with
no named actors, metrics, or dates — in several cases Gemini was MORE
specific.

Root cause: 18–30 word cap compressed context specifics out.

v2 loosens three dials at once so we can settle the A/B:

1. New system prompt WHY_MATTERS_ANALYST_SYSTEM_V2 — 2–3 sentences,
   40–70 words, implicit SITUATION→ANALYSIS→(optional) WATCH arc,
   MUST cite one specific named actor / metric / date / place from
   the context. Analyst path only; gemini path stays on v1.

2. New parser parseWhyMattersV2 — accepts 100–500 chars, rejects
   preamble boilerplate + leaked section labels + markdown.

3. Story description plumbed through — endpoint body accepts optional
   story.description (≤ 1000 chars, body cap bumped 4 KB → 8 KB).
   Cron forwards it when upstream has one (skipped when it equals the
   headline — no new signal).

Cache + shadow bumped v3 → v4 / v1 → v2 so fresh output lands on the
first post-deploy cron tick. maxTokens 180 → 260 for ~3× output length.

If shadow-diff 24h after deploy still shows no delta vs gemini, kill
is BRIEF_WHY_MATTERS_PRIMARY=gemini on Vercel (instant, no redeploy).

Tests: 6059 pass (was 6022 + 37 new). typecheck × 2 clean.

* fix(brief): stop truncating v2 multi-sentence output + description in cache hash

Two P1s caught in PR #3269 review.

P1a — cron reparsed endpoint output with v1 single-sentence parser,
silently dropping sentences 2+3 of v2 analyst output. The endpoint had
ALREADY validated the string (parseWhyMattersV2 for analyst path;
parseWhyMatters for gemini). Re-parsing with v1 took only the first
sentence — the exact regression #3269 was meant to fix.

Fix: trust the endpoint. Replace re-parse with bounds check (30–500
chars) + stub-echo reject. Added regression test asserting multi-
sentence output reaches the envelope unchanged.

P1b — `story.description` flowed into the analyst prompt but NOT into
the cache hash. Two requests with identical core fields but different
descriptions collided on one cache slot → second caller got prose
grounded in the FIRST caller's description.

Fix: add `description` as the 6th field of `hashBriefStory`. Bump
endpoint cache v4→v5 and shadow v2→v3 so buggy 5-field entries are
dropped. Updated the parity sentinel in brief-llm-core.test.mjs to
match 6-field semantics. Added regression tests covering different-
descriptions-differ and present-vs-absent-differ.
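The collision fix can be sketched like this — field order and the sentinel are assumptions; the real hashBriefStory lives in the repo:

```javascript
// Illustrative 6-field cache hash: description joins the input so requests
// differing only by description no longer share a cache slot. A sentinel
// keeps "description absent" distinct from an empty description.
import { createHash } from 'node:crypto';

function hashBriefStory(story) {
  const fields = [
    story.headline, story.source, story.threatLevel,
    story.category, story.country,
    story.description === undefined ? '\u0001<absent>' : story.description,
  ];
  return createHash('sha256').update(fields.join('\u0000')).digest('hex').slice(0, 16);
}
```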

Tests: 6083 pass. typecheck × 2 clean.
2026-04-21 22:25:54 +04:00
Elie Habib
2f19d96357 feat(brief): route whyMatters through internal analyst-context endpoint (#3248)
* feat(brief): route whyMatters through internal analyst-context endpoint

The brief's "why this is important" callout currently calls Gemini on
only {headline, source, threatLevel, category, country} with no live
state. The LLM can't know whether a ceasefire is on day 2 or day 50,
that IMF flagged >90% gas dependency in UAE/Qatar/Bahrain, or what
today's forecasts look like. Output is generic prose instead of the
situational analysis WMAnalyst produces when given live context.

This PR adds an internal Vercel edge endpoint that reuses a trimmed
variant of the analyst context (country-brief, risk scores, top-3
forecasts, macro signals, market data — no GDELT, no digest-search)
and ships it through a one-sentence LLM call with the existing
WHY_MATTERS_SYSTEM prompt. The endpoint owns its own Upstash cache
(v3 prefix, 6h TTL), supports a shadow mode that runs both paths in
parallel for offline diffing, and is auth'd via RELAY_SHARED_SECRET.

Three-layer graceful degradation (endpoint → legacy Gemini-direct →
stub) keeps the brief shipping on any failure.
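The degradation chain can be sketched as below — callback names are placeholders, not the cron's actual exports:

```javascript
// Hedged sketch of the three-layer fallback: analyst endpoint first,
// legacy Gemini-direct second, static stub last. The brief always ships.
async function resolveWhyMatters(story, { callEndpoint, callGeminiDirect, stub }) {
  try {
    const text = await callEndpoint(story);
    if (text) return { whyMatters: text, producedBy: 'analyst-endpoint' };
  } catch { /* endpoint miss/error: fall through */ }
  try {
    const text = await callGeminiDirect(story);
    if (text) return { whyMatters: text, producedBy: 'gemini-direct' };
  } catch { /* legacy path failed too: fall through */ }
  return { whyMatters: stub, producedBy: 'stub' };
}
```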

Env knobs:
- BRIEF_WHY_MATTERS_PRIMARY=analyst|gemini (default: analyst; unrecognised value → gemini)
- BRIEF_WHY_MATTERS_SHADOW=0|1 (default: 1; only '0' disables)
- BRIEF_WHY_MATTERS_SHADOW_SAMPLE_PCT=0..100 (default: 100)
- BRIEF_WHY_MATTERS_ENDPOINT_URL (Railway, optional override)

Cache keys:
- brief:llm:whymatters:v3:{hash16} — envelope {whyMatters, producedBy,
  at}, 6h TTL. Endpoint-owned.
- brief:llm:whymatters:shadow:v1:{hash16} — {analyst, gemini, chosen,
  at}, 7d TTL. Fire-and-forget.
- brief:llm:whymatters:v2:{hash16} — legacy. Cron's fallback path
  still reads/writes during the rollout window; expires in ≤24h.

Tests: 6022 pass (existing 5915 + 12 core + 36 endpoint + misc).
typecheck + typecheck:api + biome on changed files clean.

Plan (Codex-approved after 4 rounds):
docs/plans/2026-04-21-001-feat-brief-why-matters-analyst-endpoint-plan.md

* fix(brief): address /ce:review round 1 findings on PR #3248

Fixes 5 findings from multi-agent review, 2 of them P1:

- #241 P1: `.gitignore !api/internal/**` was too broad — it re-included
  `.env`, `.env.local`, and any future secret file dropped into that
  directory. Narrowed to explicit source extensions (`*.ts`, `*.js`,
  `*.mjs`) so parent `.env` / secrets rules stay in effect inside
  api/internal/.

- #242 P1: `Dockerfile.digest-notifications` did not COPY
  `shared/brief-llm-core.js` + `.d.ts`. Cron would have crashed at
  container start with ERR_MODULE_NOT_FOUND. Added alongside
  brief-envelope + brief-filter COPY lines.

- #243 P2: Cron dropped the endpoint's source/producedBy ground-truth
  signal, violating PR #3247's own round-3 memory
  (feedback_gate_on_ground_truth_not_configured_state.md). Added
  structured log at the call site: `[brief-llm] whyMatters source=<src>
  producedBy=<pb> hash=<h>`. Endpoint response now includes `hash` so
  log + shadow-record pairs can be cross-referenced.

- #244 P2: Defense-in-depth prompt-injection hardening. Story fields
  flowed verbatim into both LLM prompts, bypassing the repo's
  sanitizeForPrompt convention. Added sanitizeStoryFields helper and
  applied in both analyst and gemini paths.

- #245 P2: Removed redundant `validate` option from callLlmReasoning.
  With only openrouter configured in prod, a parse-reject walked the
  provider chain, then fell through to the other path (same provider),
  then the cron's own fallback (same model) — 3x billing on one reject.
  Post-call parseWhyMatters check already handles rejection cleanly.

Deferred to P3 follow-ups (todos 246-248): singleflight, v2 sunset,
misc polish (country-normalize LOC, JSDoc pruning, shadow waitUntil,
auto-sync mirror, context-assembly caching).

Tests: 6022 pass. typecheck + typecheck:api clean.

* fix(brief-why-matters): ctx.waitUntil for shadow write + sanitize legacy fallback

Two P2 findings on PR #3248:

1. Shadow record was fire-and-forget without ctx.waitUntil on an Edge
   function. Vercel can terminate the isolate after response return,
   so the background redisPipeline write completes unreliably — i.e.
   the rollout-validation signal the shadow keys were supposed to
   provide was flaky in production.

   Fix: accept an optional EdgeContext 2nd arg. Build the shadow
   promise up front (so it starts executing immediately) then register
   it with ctx.waitUntil when present. Falls back to plain unawaited
   execution when ctx is absent (local harness / tests).

2. scripts/lib/brief-llm.mjs legacy fallback path called
   buildWhyMattersPrompt(story) on raw fields with no sanitization.
   The analyst endpoint sanitizes before its own prompt build, but
   the fallback is exactly what runs when the endpoint misses /
   errors — so hostile headlines / sources reached the LLM verbatim
   on that path.

   Fix: local sanitizeStoryForPrompt wrapper imports sanitizeForPrompt
   from server/_shared/llm-sanitize.js (existing pattern — see
   scripts/seed-digest-notifications.mjs:41). Wraps story fields
   before buildWhyMattersPrompt. Cache key unchanged (hash is over raw
   story), so cache parity with the analyst endpoint's v3 entries is
   preserved.
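The pattern from (1) can be sketched as below — names are illustrative, not the endpoint's actual exports:

```javascript
// Build the shadow-write promise up front so it starts immediately, then
// register it with ctx.waitUntil when the Edge runtime supplies a context;
// without ctx it stays plain fire-and-forget (local harness / tests).
function scheduleShadowWrite(writeFn, ctx) {
  const pending = writeFn().catch(() => {}); // fire-and-forget: swallow errors
  if (ctx && typeof ctx.waitUntil === 'function') {
    ctx.waitUntil(pending); // keeps the isolate alive until the write lands
  }
  return pending;
}
```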

Regression guard: new test asserts the fallback prompt strips
"ignore previous instructions", "### Assistant:" line prefixes, and
`<|im_start|>` tokens when injection-crafted fields arrive.

Typecheck + typecheck:api clean. 6023 / 6023 data tests pass.

* fix(digest-cron): COPY server/_shared/llm-sanitize into digest-notifications image

Reviewer P1 on PR #3248: my previous commit (4eee22083) added
`import sanitizeForPrompt from server/_shared/llm-sanitize.js` to
scripts/lib/brief-llm.mjs, but Dockerfile.digest-notifications cherry-
picks server/_shared/* files and doesn't copy llm-sanitize. Import is
top-level/static — the container would crash at module load with
ERR_MODULE_NOT_FOUND the moment seed-digest-notifications.mjs pulls in
scripts/lib/brief-llm.mjs. Not just on fallback — every startup.

Fix: add `COPY server/_shared/llm-sanitize.js server/_shared/llm-sanitize.d.ts`
next to the existing brief-render COPY line. Module is pure string
manipulation with zero transitive imports — nothing else needs to land.

Cites feedback_validation_docker_ship_full_scripts_dir.md in the comment
next to the COPY; the cherry-pick convention keeps biting when new
cross-dir imports land in scripts/lib/ or scripts/shared/.

Can't regression-test at build time from this branch without a
docker-build CI job, but the symptom is deterministic — local runs
remain green (they resolve against the live filesystem); only the
container crashes. Post-merge, Railway redeploy of seed-digest-
notifications should show a clean `Starting Container` log line
instead of the MODULE_NOT_FOUND crash my prior commit would have caused.
2026-04-21 14:03:27 +04:00
Elie Habib
4d9ae3b214 feat(digest): topic-grouped brief ordering (size-first) (#3247) 2026-04-21 08:58:02 +04:00
Elie Habib
d1ebc84c6c feat(digest-dedup): single-link clustering (F1 0.73 vs 0.53 complete-link) (#3234)
Problem
-------
The post-threshold-tuning brief at
/api/brief/user_3BovQ1tYlaz2YIGYAdDPXGFBgKy/2026-04-20-1532 still
showed 4 copies of "US seizes Iranian ship", 3 copies of the Hormuz
closure, and 2 copies of the oil-price story — despite running the
calibrated 0.55 threshold.

Root cause: complete-link is too strict for wire-headline clustering.
Pairwise cosines in the 4-way ship-seizure cluster:

    1 <-> 5: 0.632    5 <-> 8: 0.692
    1 <-> 8: 0.500    5 <-> 10: 0.656
    1 <-> 10: 0.554   8 <-> 10: 0.510

Complete-link requires EVERY pair to clear threshold. Pair 1<->8 at
0.500 fails so the whole 4-way cluster can't form, and all 4 stories
bubble up as separate reps, eating 4 slots of the 12-story brief.

Measured on the 12 real titles from that brief:

    Algorithm                 | Clusters | F1    | P    | R
    --------------------------|----------|-------|------|------
    complete-link @ 0.55 (was)|        7 | 0.526 | 0.56 | 0.50
    complete-link @ 0.50      |        6 | 0.435 | 0.38 | 0.50
    single-link   @ 0.55      |        4 | 0.435 | 0.28 | 1.00  over-merge
    single-link   @ 0.60      |        6 | 0.727 | 0.67 | 0.80  winner

Change
------
scripts/lib/brief-dedup-embed.mjs:
  New singleLinkCluster(items, {cosineThreshold, vetoFn}) using
  union-find. Chain merges through strong intermediates when a
  direct pair is weak; respects the entity veto (blocked pairs
  don't union). O(N^2 alpha(N)); permutation-invariant by
  construction.
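A minimal union-find sketch of that pass — the production singleLinkCluster takes embedded items; here the cosine function is injected for clarity:

```javascript
// Single-link clustering via union-find: any pair clearing the threshold
// unions its items, so clusters chain through strong intermediates even when
// a direct pair is weak. vetoFn can block a union even when cosine passes.
function singleLinkCluster(items, { cosineThreshold, vetoFn = () => false, cosine }) {
  const parent = items.map((_, i) => i);
  const find = (x) => (parent[x] === x ? x : (parent[x] = find(parent[x])));
  for (let i = 0; i < items.length; i++) {
    for (let j = i + 1; j < items.length; j++) {
      if (cosine(items[i], items[j]) < cosineThreshold) continue;
      if (vetoFn(items[i], items[j])) continue; // blocked pairs don't union
      parent[find(i)] = find(j);
    }
  }
  const groups = new Map(); // root index -> cluster members
  items.forEach((item, i) => {
    const root = find(i);
    if (!groups.has(root)) groups.set(root, []);
    groups.get(root).push(item);
  });
  return [...groups.values()];
}
```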

scripts/lib/brief-dedup.mjs:
  New DIGEST_DEDUP_CLUSTERING env var (default 'single', set
  'complete' to revert). readOrchestratorConfig returns 'clustering'
  field. Dispatch at call site picks the right function. Structured
  log line now includes clustering=<algo>.

tests/brief-dedup-embedding.test.mjs:
  +8 regressions:
    - singleLinkCluster chains the 4-way through a bridge
    - veto blocks unions even when cosine passes
    - permutation-invariance property test (5 shuffles)
    - empty-input
    - DIGEST_DEDUP_CLUSTERING default is 'single'
    - DIGEST_DEDUP_CLUSTERING=complete kill switch works
    - unrecognised values fall back to 'single'
    - log line includes clustering=<algo>

Bridge-pollution risk note
--------------------------
The original plan rejected single-link to avoid the Jaccard-era
"bridge pollution" (A~B=0.6, B~C=0.6, A~C=0.3 all chain through a
mixed-topic B). With text-embedding-3-small at cosine >= 0.60, a
bridge must be semantically real — the probe showed a 37% F1 bump
with no new FPs on the production case. Setting
DIGEST_DEDUP_CLUSTERING=complete on Railway is the instant rollback
if a bad day ever surfaces chaining.

Operator activation
-------------------
After merge, on Railway seed-digest-notifications service:

    DIGEST_DEDUP_COSINE_THRESHOLD=0.60

No other changes needed — clustering=single is the default.

Verification
------------
- npm run test:data           5825/5825 pass
- tests/brief-dedup-embedding  53/53   pass (45 existing + 8 new)
- typecheck + typecheck:api   clean
- biome check on changed files clean

Post-Deploy Monitoring & Validation
-----------------------------------
- Grep '[digest] dedup mode=embed clustering=single' in Railway logs
  — confirms the new algo is live
- Expect clusters= to drop further on bulk ticks (stories=700+):
  current ~23 on 84-story ticks -> expected ~15-18
- Manually open next brief post-deploy, visually verify ship-seizure
  / Hormuz / oil stories no longer duplicate
- Rollback: DIGEST_DEDUP_CLUSTERING=complete on Railway (instant,
  no deploy), next cron tick reverts to old behaviour
- Validation window: 24h
- Owner: koala73

Related
-------
- #3200 embedding-based dedup (introduced complete-link)
- #3224 DIGEST_SCORE_MIN floor (the low-importance half of the fix)
2026-04-20 16:21:20 +04:00
Elie Habib
38e6892995 fix(brief): per-run slot URL so same-day digests link to distinct briefs (#3205)
* fix(brief): per-run slot URL so same-day digests link to distinct briefs

Digest emails at 8am and 1pm on the same day pointed to byte-identical
magazine URLs because the URL was keyed on YYYY-MM-DD in the user tz.
Each compose run overwrote the single daily envelope in place, and the
composer rolling 24h story window meant afternoon output often looked
identical to morning. Readers clicking an older email got whatever the
latest cron happened to write.

Slot format is now YYYY-MM-DD-HHMM (local tz, per compose run). The
magazine URL, carousel URLs, and Redis key all carry the slot, and each
digest dispatch gets its own frozen envelope that lives out the 7d TTL.
envelope.data.date stays YYYY-MM-DD for rendering "19 April 2026".
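The slot construction can be sketched with Intl (helper name illustrative; the composer's actual implementation may differ):

```javascript
// Build the per-run YYYY-MM-DD-HHMM slot in the user's local timezone.
function composeSlot(date, timeZone) {
  const parts = new Intl.DateTimeFormat('en-CA', {
    timeZone, year: 'numeric', month: '2-digit', day: '2-digit',
    hour: '2-digit', minute: '2-digit', hour12: false,
  }).formatToParts(date);
  const get = (type) => parts.find((p) => p.type === type).value;
  return `${get('year')}-${get('month')}-${get('day')}-${get('hour')}${get('minute')}`;
}
```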

The digest cron also writes a brief:latest:{userId} pointer (7d TTL,
overwritten each compose) so the dashboard panel and share-url endpoint
can locate the most recent brief without knowing the slot. The
previous date-probing strategy does not work once keys carry HHMM.

No back-compat for the old YYYY-MM-DD format: the verifier rejects it,
the composer only ever writes the new shape, and any in-flight
notifications signed under the old format will 403 on click. Acceptable
at the rollout boundary per product decision.

* fix(brief): carve middleware bot allowlist to accept slot-format carousel path

BRIEF_CAROUSEL_PATH_RE in middleware.ts was still matching only the
pre-slot YYYY-MM-DD segment, so every slot-based carousel URL emitted
by the digest cron (YYYY-MM-DD-HHMM) would miss the social allowlist
and fall into the generic bot gate. Telegram/Slack/Discord/LinkedIn
image fetchers would 403 on sendMediaGroup, breaking previews for the
new digest links.

CI missed this because tests/middleware-bot-gate.test.mts still
exercised the old /YYYY-MM-DD/ path shape. Swap the fixture to the
slot format and add a regression asserting the pre-slot shape is now
rejected, so legacy links cannot silently re-enter the allowlist after
the rollout.
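The segment shape the regex must now distinguish can be sketched as below — the real BRIEF_CAROUSEL_PATH_RE anchors this inside a fuller carousel path:

```javascript
// Accept the slot shape (YYYY-MM-DD-HHMM), reject the pre-slot date-only
// shape. Illustrative segment-only regex, not the middleware's actual one.
const SLOT_SEGMENT_RE = /^\d{4}-\d{2}-\d{2}-\d{4}$/;
```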

* fix(brief): preserve caller-requested slot + correct no-brief share-url error

Two contract bugs in the slot rollout that silently misled callers:

1. GET /api/latest-brief?slot=X where X has no envelope was returning
   { status: 'composing', issueDate: <today UTC> } — which reads as
   "today's brief is composing" instead of "the specific slot you
   asked about doesn't exist". A caller probing a known historical
   slot would get a completely unrelated "today" signal. Now we echo
   the requested slot back (issueSlot + issueDate derived from its
   date portion) when the caller supplied ?slot=, and keep the
   UTC-today placeholder only for the no-param path.

2. POST /api/brief/share-url with no slot and no latest-pointer was
   falling into the generic invalid_slot_shape 400 branch. That is
   not an input-shape problem; it is "no brief exists yet for this
   user". Return 404 brief_not_found — the same code the
   existing-envelope check returns — so callers get one coherent
   contract: either the brief exists and is shareable, or it doesn't
   and you get 404.
2026-04-19 14:15:59 +04:00
Elie Habib
305dc5ef36 feat(digest-dedup): Phase A — embedding-based dedup scaffolding (no-op) (#3200)
* feat(digest-dedup): Phase A — embedding-based dedup scaffolding (no-op)

Replaces the inline Jaccard story-dedup in seed-digest-notifications
with an orchestrator that can run Jaccard, shadow, or full embedding
modes. Ships with DIGEST_DEDUP_MODE=jaccard as the default so
production behaviour is unchanged until Phase C shadow + Phase D flip.

New modules (scripts/lib/):
- brief-dedup-consts.mjs       tunables + cache prefix + __constants bag
- brief-dedup-jaccard.mjs      verbatim 0.55-threshold extract (fallback)
- entity-gazetteer.mjs         cities/regions gazetteer + common-caps
- brief-embedding.mjs          OpenRouter /embeddings client with Upstash
                               cache, all-or-nothing timeout, cosineSimilarity
- brief-dedup-embed.mjs        complete-link clustering + entity veto (pure)
- brief-dedup.mjs              orchestrator, env read at call entry,
                               shadow archive, structured log line
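The cosineSimilarity exposed by the embedding client is the textbook formula; a self-contained sketch (not the repo's exact implementation):

```javascript
// Cosine similarity over two equal-length embedding vectors:
// dot(a, b) / (|a| * |b|), with a zero-norm guard.
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  const denom = Math.sqrt(na) * Math.sqrt(nb);
  return denom === 0 ? 0 : dot / denom;
}
```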

Operator tools (scripts/tools/):
- calibrate-dedup-threshold.mjs  offline calibration runner + histogram
- golden-pair-validator.mjs      live-embedder drift detector (nightly CI)
- shadow-sample.mjs              Sample A/B CSV emitter over SCAN archive

Tests:
- brief-dedup-jaccard.test.mjs    migrated from regex-harness to direct
                                   import plus orchestrator parity tests (22)
- brief-dedup-embedding.test.mjs  9 plan scenarios incl. 10-permutation
                                   property test, complete-link non-chain (21)
- brief-dedup-golden.test.mjs     20-pair mocked canary (21)

Workflows:
- .github/workflows/dedup-golden-pairs.yml  nightly live-embedder canary
                                             (07:17 UTC), opens issue on drift

Deviation from plan: the shouldVeto("Iran closes Hormuz", "Tehran
shuts Hormuz") case can't return true under a single coherent
classification (country-in-A vs capital-in-B sit on different sides
of the actor/location boundary). Gazetteer follows the plan's
"countries are actors" intent; the test is updated to assert false
with a comment pointing at the irreducible capital-country
coreference limitation.

Verification:
- npm run test:data          5825/5825 pass
- tests/edge-functions        171/171 pass
- typecheck + typecheck:api  clean
- biome check on new files    clean
- lint:md                     0 errors

Phase B (calibration), Phase C (shadow), and Phase D (flip) are
subsequent PRs.

* refactor(digest-dedup): address review findings 193-199

Fresh-eyes review found 3 P1s, 3 P2s, and a P3 bundle across
kieran-typescript, security-sentinel, performance-oracle, architecture-
strategist, and code-simplicity reviewers. Fixes below; all 64 dedup
tests + 5825 data tests + 171 edge-function tests still green.

P1 #193 - dedup regex + redis pipeline duplication
- Extract defaultRedisPipeline into scripts/lib/_upstash-pipeline.mjs;
  both orchestrator and embedding client import from there.
- normalizeForEmbedding now delegates to stripSourceSuffix from the
  Jaccard module so the outlet allow-list is single-sourced.

P1 #194 - embedding timeout floor + negative-budget path
- callEmbeddingsApi throws EmbeddingTimeoutError when timeoutMs<=0
  instead of opening a doomed 250ms fetch.
- Removed Math.max(250, ...) floor that let wall-clock cap overshoot.

P1 #195 - dead env getters
- Deleted getMode / isRemoteEmbedEnabled / isEntityVetoEnabled /
  getCosineThreshold / getWallClockMs from brief-dedup-consts.mjs
  (zero callers; orchestrator reimplements inline).

P2 #196 - orchestrator cleanup bundle
- Removed re-exports at bottom of brief-dedup.mjs.
- Extracted materializeCluster into brief-dedup-jaccard.mjs; both
  the fallback and orchestrator use the shared helper.
- Deleted clusterWithEntityVeto wrapper; orchestrator inlines the
  vetoFn wiring at the single call site.
- Shadow mode now runs Jaccard exactly once per tick (was twice).
- Fallback warn line carries reason=ErrorName so operators can
  filter timeout vs provider vs shape errors.
- Invalid DIGEST_DEDUP_MODE values emit a warn once per run (vs
  silently falling to jaccard).

P2 #197 - workflow + shadow-sample hardening
- dedup-golden-pairs.yml body composition no longer relies on a
  heredoc that would command-substitute validator stdout. Switched
  to printf with sanitised LOG_TAIL (printable ASCII only) and
  --body-file so crafted fixture text cannot escape into the runner.
- shadow-sample.mjs Upstash helper enforces a hardcoded command
  allowlist (SCAN | GET | EXISTS).

P2 #198 - test + observability polish
- Scenarios 2 and 3 deep-equal returned clusters against the Jaccard
  expected shape, not just length. Also assert the reason= field.

P3 #199 - nits
- Removed __constants test-bag; jaccard tests use named imports.
- Renamed deps.apiKey to deps._apiKey in embedding client.
- Added @pre JSDoc on diffClustersByHash about unique-hash contract.
- Deferred: mocked golden-pair test removal, gazetteer JSON migration,
  scripts/tools AGENTS.md doc note.

Todos 193-199 moved from pending to complete.

Verification:
- npm run test:data            5825/5825 pass
- tests/edge-functions          171/171 pass
- typecheck + typecheck:api    clean
- biome check on changed files clean

* fix(digest-dedup): address Greptile P2 findings on PR #3200

1. brief-embedding.mjs: wrap the fetch lookup as
   `(...args) => globalThis.fetch(...args)` instead of aliasing bare
   `fetch`. Aliasing captures the binding at module-load time, so calls
   through the alias bypass any instrumentation / Edge-runtime shims
   installed later — the same class of bug as the banned
   `fetch.bind(globalThis)` pattern flagged in AGENTS.md.

2. dedup-golden-pairs.yml: `gh issue create --label "..." || true`
   silently swallowed the failure when any of dedup/canary/p1 labels
   didn't pre-exist, breaking the drift alert channel while leaving
   the job red in the Actions UI. Switched to repeated `--label`
   flags + `--create-label` so any missing label is auto-created on
   first drift, and dropped the `|| true` so a legitimate failure
   (network / auth) surfaces instead of hiding.

Both fixes are P2-style per Greptile (confidence 5/5, no P0/P1);
applied pre-merge so the nightly canary is usable from day one.
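The late-binding lookup from (1) can be sketched as:

```javascript
// Resolve globalThis.fetch at call time, so shims or instrumentation
// installed after module load are still honoured.
const doFetch = (...args) => globalThis.fetch(...args);

// The buggy pattern captures the binding once, at module-load time:
//   const doFetchFrozen = fetch; // later shims would be bypassed
```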

* fix(digest-dedup): two P1s found on PR #3200

P1 — canary classifier must match production
Nightly golden-pair validator was checking a hardcoded threshold
(default 0.60) and always applied the entity veto, while the actual
dedup path at runtime reads DIGEST_DEDUP_COSINE_THRESHOLD and
DIGEST_DEDUP_ENTITY_VETO_ENABLED from env at every call. A Phase
C/D env flip could make the canary green while prod was wrong or
red while prod was healthy, defeating the whole point of a drift
detector.

Fix:
- golden-pair-validator.mjs now calls readOrchestratorConfig(process.env)
  — the same helper the orchestrator uses — so any classifier knob
  added later is picked up automatically. The threshold and veto-
  enabled flags are sourced from env by default; a --threshold CLI
  flag still overrides for manual calibration sweeps.
- dedup-golden-pairs.yml sources DIGEST_DEDUP_COSINE_THRESHOLD and
  DIGEST_DEDUP_ENTITY_VETO_ENABLED from GitHub repo variables (vars.*),
  which operators must keep in lockstep with Railway. The
  workflow_dispatch threshold input now defaults to empty; the
  scheduled canary always uses the production-parity config.
- Validator log line prints the effective config + source so nightly
  output makes the classifier visible.

P1 — shadow archive writes were fail-open
`defaultRedisPipeline()` returns null on timeout / auth / HTTP
failure. `writeShadowArchive()` only had a try/catch, so the null
result was silently treated as success. A Phase C rollout could
log clean "mode=shadow … disagreements=X" lines every tick while
the Upstash archive received zero writes — and Sample B labelling
would then find no batches, silently killing calibration.

Fix:
- writeShadowArchive now inspects the pipeline return. null result,
  non-array response, per-command {error}, or a cell without
  {result: "OK"} all return {ok: false, reason}.
- Orchestrator emits a warn line with the failure reason, and the
  structured log line carries archive_write=ok|failed so operators
  can grep for failed ticks.
- Regression test in brief-dedup-embedding.test.mjs simulates the
  null-pipeline contract and asserts both the warn and the structured
  field land.

Verification:
- test:data           5825/5825 pass
- dedup suites         65/65   pass (new: archive-fail regression)
- typecheck + api     clean
- biome check         clean on changed files

* fix(digest-dedup): two more P1s found on PR #3200

P1 — canary must also honour DIGEST_DEDUP_MODE + REMOTE_EMBED_ENABLED
The prior round fixed the threshold/veto knobs but left the canary
running embeddings regardless of whether production could actually
reach the embed path. If Railway has DIGEST_DEDUP_MODE=jaccard or
DIGEST_DEDUP_REMOTE_EMBED_ENABLED=0, production never calls the
classifier, so a drift signal is meaningless — or worse, a live
OpenRouter issue flags the canary while prod is obliviously fine.

Fix:
- golden-pair-validator.mjs reads mode + remoteEmbedEnabled from the
  same readOrchestratorConfig() helper the orchestrator uses. When
  either says "embed path inactive in prod", the validator logs an
  explicit skip line and exits 0. The nightly workflow then shows
  green, which is the correct signal ("nothing to drift against").
- A --force CLI flag remains for manual dispatch during staged
  rollouts.
- dedup-golden-pairs.yml sources DIGEST_DEDUP_MODE and
  DIGEST_DEDUP_REMOTE_EMBED_ENABLED from GitHub repo variables
  alongside the threshold and veto-enabled knobs, so all four
  classifier gates stay in lockstep with Railway.
- Validator log line now prints mode + remoteEmbedEnabled so the
  canary output surfaces which classifier it validated.

P1 — shadow-sample Sample A was biased by SCAN order
enumerate-and-dedup added every seen pair to a dedup key BEFORE
filtering by agreement. If the same pair appeared in an agreeing
batch first and a disagreeing batch later, the disagreeing
occurrence was silently dropped. SCAN order is unspecified, so
Sample A could omit real disagreement pairs.

Fix:
- Extracted the enumeration into a pure `enumeratePairs(archives, mode)`
  export so the logic is testable. Mode filter runs BEFORE the dedup
  check: agreeing pairs are skipped entirely under
  --mode disagreements, so any later disagreeing occurrence can
  still claim the dedup slot.
- Added tests/brief-dedup-shadow-sample.test.mjs with 5 regression
  cases: agreement-then-disagreement, reversed order (symmetry),
  always-agreed omission, population enumeration, cross-batch dedup.
- isMain guard added so importing the module for tests does not
  kick off the CLI scan path.

Verification:
- test:data           5825/5825 pass
- dedup suites         70/70   pass (5 new shadow-sample regressions)
- typecheck + api     clean
- biome check         clean on changed files

Operator follow-up before Phase C:
Set all FOUR dedup repo variables in GitHub alongside Railway:
  DIGEST_DEDUP_MODE, DIGEST_DEDUP_REMOTE_EMBED_ENABLED,
  DIGEST_DEDUP_COSINE_THRESHOLD, DIGEST_DEDUP_ENTITY_VETO_ENABLED

* refactor(digest-dedup): Railway is the single source of truth for dedup config

Fair user pushback: asking operators to set four DIGEST_DEDUP_*
values in BOTH Railway (where the cron runs) AND GitHub repo
variables (where the canary runs) is architectural debt. Two
copies of the same truth will always drift.

Solution: the digest cron publishes its resolved config to Upstash
on every tick under brief:dedup:config:v1 (2h TTL). The nightly
golden-pair canary reads that key instead of env vars. Railway
stays the sole source of truth; no parallel repo variables to
maintain. A missing/expired key signals "cron hasn't run" and
the canary skips with exit 0 — better than validating against
hardcoded defaults that might diverge from prod.

Changes:
- brief-dedup-consts.mjs: new ACTIVE_CONFIG_KEY + TTL constants.
- brief-dedup.mjs: new publishActiveConfig() fires at the start of
  every deduplicateStories() call (before the mode short-circuit,
  so jaccard ticks also publish a "mode=jaccard" signal the canary
  can read). Fire-and-forget; archive-write error semantics still
  apply if the operator wants stricter tracking.
- golden-pair-validator.mjs: removed readOrchestratorConfig(env)
  path. Now calls fetchActiveConfigFromUpstash() and either
  validates against that config, skips when the embed path is
  inactive, or skips when the key is missing (with --force
  override for manual dispatch).
- dedup-golden-pairs.yml: dropped the four DIGEST_DEDUP_* env lines
  and the corresponding repo-variable dependency. Only the three
  Upstash + OpenRouter secrets remain.
- tests: two new regressions assert config is published on every
  tick (shadow AND jaccard modes) with the right shape + TTL.

Operator onboarding now takes one action: set the four
DIGEST_DEDUP_* variables on the Railway seed-digest-notifications
service. Nothing to set in GitHub beyond the existing
OPENROUTER_API_KEY / UPSTASH_* secrets.

Verification:
- test:data           5825/5825 pass
- dedup suites         72/72   pass (2 new config-publish regressions)
- typecheck + api     clean
- biome check         clean on changed files

* refactor(digest-dedup): ship embed directly, drop phases/canary/shadow

User feedback: "i dont need multiple phases and shit, we go directly
to embed". Fair. Ripping out the overengineering I accumulated:

DELETED
- .github/workflows/dedup-golden-pairs.yml (nightly canary)
- scripts/tools/golden-pair-validator.mjs
- scripts/tools/shadow-sample.mjs
- scripts/tools/calibrate-dedup-threshold.mjs
- tests/fixtures/brief-dedup-golden-pairs.json
- tests/brief-dedup-golden.test.mjs
- tests/brief-dedup-shadow-sample.test.mjs

SIMPLIFIED
- brief-dedup.mjs: removed shadow mode, publishActiveConfig,
  writeShadowArchive, diffClustersByHash, jaccardRepsToClusterHashes,
  and the DIGEST_DEDUP_REMOTE_EMBED_ENABLED knob. MODE is now
  binary: `embed` (default) or `jaccard` (instant kill switch).
- brief-dedup-consts.mjs: dropped SHADOW_ARCHIVE_*, ACTIVE_CONFIG_*.
- Default flipped: DIGEST_DEDUP_MODE unset = embed (prod path).
  Railway deploy with OPENROUTER_API_KEY set = embeddings live on
  next cron tick. Set MODE=jaccard on Railway to revert instantly.

Orchestrator still falls back to Jaccard on any embed-path failure
(timeout, provider outage, missing API key, bad response). Fallback
warn carries reason=<ErrorName>. The cron never fails because
embeddings flaked. All 64 dedup tests + 5825 data tests still green.

Net diff: -1,407 lines.

Operator single action: set OPENROUTER_API_KEY on Railway's
seed-digest-notifications service (already present) and ship. No
GH Actions, no shadow archives, no labelling sprints. If the 0.60
threshold turns out wrong, tune DIGEST_DEDUP_COSINE_THRESHOLD on
Railway — takes effect on next tick, no redeploy.

* fix(digest-dedup): multi-word location phrases in the entity veto

Extractor was whitespace-tokenising and only single-token matching
against LOCATION_GAZETTEER, silently making every multi-word entry
unreachable:

  extractEntities("Houthis strike ship in Red Sea")
    → { locations: [], actors: ['houthis','red','sea'] }   ✗
  shouldVeto("Houthis strike ship in Red Sea",
             "US escorts convoy in Red Sea")  → false       ✗

With MODE=embed as the default, that turned off the main
anti-overmerge safety rail for bodies of water, regions, and
compound city names — exactly the P07-Hormuz / Houthis-Red-Sea
headlines the veto was designed to cover.

Fix: greedy longest-phrase scan with a sliding window. At each
token position try the longest multi-word phrase first (down to
2), require first AND last tokens to be capitalised (so lowercase
prose like "the middle east" doesn't falsely match while headline
"Middle East" does), lowercase connectors in between are fine
("Strait of Hormuz" → phrase "strait of hormuz" ✓). Falls back to
single-token lookup when no multi-word phrase fits.

Now:
  extractEntities("Houthis strike ship in Red Sea")
    → { locations: ['red sea'], actors: ['houthis'] }       ✓
  shouldVeto(Red-Sea-Houthis, Red-Sea-US) → true             ✓

Complexity still O(N · MAX_PHRASE_LEN) — MAX_PHRASE_LEN is 4
(longest gazetteer entry: "ho chi minh city"), so this is
effectively O(N).
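A self-contained sketch of the scan — the gazetteer and capitalisation check are simplified; the real extractor lives in scripts/lib/entity-gazetteer.mjs:

```javascript
// Greedy longest-phrase scan: at each token try the longest multi-word
// phrase first (down to 2 tokens), requiring first AND last tokens to be
// capitalised so lowercase prose doesn't falsely match; lowercase
// connectors in between ("Strait of Hormuz") are fine. Falls back to
// single-token lookup. Illustrative gazetteer subset only.
const GAZETTEER = new Set(['red sea', 'strait of hormuz', 'abu dhabi', 'hormuz']);
const MAX_PHRASE_LEN = 4;

function extractLocations(headline) {
  const tokens = headline.split(/\s+/).filter(Boolean);
  const found = [];
  let i = 0;
  while (i < tokens.length) {
    let consumed = 1;
    for (let len = Math.min(MAX_PHRASE_LEN, tokens.length - i); len >= 2; len--) {
      const first = tokens[i], last = tokens[i + len - 1];
      if (!/^[A-Z]/.test(first) || !/^[A-Z]/.test(last)) continue;
      const phrase = tokens.slice(i, i + len).join(' ').toLowerCase();
      if (GAZETTEER.has(phrase)) { found.push(phrase); consumed = len; break; }
    }
    if (consumed === 1) {
      const single = tokens[i].toLowerCase();
      if (GAZETTEER.has(single)) found.push(single);
    }
    i += consumed;
  }
  return found;
}
```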

Added 5 regression tests covering Red Sea, South China Sea,
Strait of Hormuz (lowercase-connector case), Abu Dhabi, and
New York, plus the Houthis-vs-US veto reproducer from the P1 report.
All 5825 data tests + 45 dedup tests green; lint + typecheck clean.
2026-04-19 13:49:48 +04:00
Elie Habib
81536cb395 feat(brief): source links, LLM descriptions, strip suffix (envelope v2) (#3181)
* feat(brief): source links, LLM descriptions, strip publisher suffix (envelope v2)

Three coordinated fixes to the magazine content pipeline.

1. Headlines were ending with " - AP News" / " | Reuters" etc. because
   the composer passed RSS titles through verbatim. Added
   stripHeadlineSuffix() in brief-compose.mjs: a conservative, case-
   insensitive match that fires only when the trailing token equals
   primarySource, so a real subtitle that happens to contain a dash
   still survives.

2. Story descriptions were the headline verbatim. Added
   generateStoryDescription to brief-llm.mjs, plumbed into
   enrichBriefEnvelopeWithLLM: one additional LLM call per story,
   cached 24h on a v1 key covering headline, source, severity,
   category, country. Cache hits are revalidated via
   parseStoryDescription so a bad row cannot flow to the envelope.
   Falls through to the cleaned headline on any failure.

3. Source attribution was plain text, no outgoing link. Bumped
   BRIEF_ENVELOPE_VERSION to 2, added BriefStory.sourceUrl. The
   composer now plumbs story:track:v1.link through
   digestStoryToUpstreamTopStory, UpstreamTopStory.primaryLink,
   filterTopStories, BriefStory.sourceUrl. The renderer wraps the
   Source line in an anchor with target=_blank, rel=noopener
   noreferrer, and UTM params (utm_source=worldmonitor,
   utm_medium=brief, utm_campaign=<issueDate>, utm_content=story-
   <rank>). UTM appending is idempotent, publisher-attributed URLs
   keep their own utm_source.
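
Fixes 1 and 3 can be sketched as follows; function names and
signatures are modelled on this message, not copied from
brief-compose.mjs, so treat them as illustrative:

```javascript
// Fix 1 sketch: conservative, case-insensitive suffix strip. Only fires
// when the text after the FINAL " - " / " | " separator equals the
// primary source, so a real subtitle containing a dash survives.
function stripHeadlineSuffix(title, primarySource) {
  const m = title.match(/^(.*\S)\s+[-|]\s+([^-|]+)$/);
  if (!m) return title;
  const [, head, tail] = m;
  return tail.trim().toLowerCase() === primarySource.trim().toLowerCase()
    ? head
    : title;
}

// Fix 3 sketch: idempotent UTM appending. An existing utm_source (ours
// from a prior pass, or a publisher's own attribution) leaves the URL
// untouched.
function appendUtm(rawUrl, issueDate, rank) {
  const url = new URL(rawUrl);
  if (url.searchParams.has('utm_source')) return url.toString();
  url.searchParams.set('utm_source', 'worldmonitor');
  url.searchParams.set('utm_medium', 'brief');
  url.searchParams.set('utm_campaign', issueDate);
  url.searchParams.set('utm_content', `story-${rank}`);
  return url.toString();
}
```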

Envelope validation gains a validateSourceUrl step (https/http only,
no userinfo credentials, parseable absolute URL). Stories without a
valid upstream link are dropped by filterTopStories rather than
shipping with an unlinked source.
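
A minimal sketch of that validation step, assuming the three rules
named above are the whole check (the real validator may do more):

```javascript
// Hypothetical validateSourceUrl: https/http only, no userinfo
// credentials, must parse as an absolute URL.
function validateSourceUrl(candidate) {
  if (typeof candidate !== 'string' || candidate.length === 0) return false;
  let url;
  try {
    url = new URL(candidate); // throws on relative or unparseable input
  } catch {
    return false;
  }
  if (url.protocol !== 'https:' && url.protocol !== 'http:') return false;
  if (url.username || url.password) return false; // reject userinfo credentials
  return true;
}
```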

Tests: 30 renderer tests to 38; new assertions cover UTM presence on
every anchor, HTML-escaping of ampersands in hrefs, pre-existing UTM
preservation, and all four validator rejection modes. New composer
tests cover suffix stripping, link plumb-through, and v2 drop-on-no-
link behaviour. New LLM tests for generateStoryDescription cover
cache hit/miss, revalidation of bad rows, 24h TTL, and null-on-
failure.

* fix(brief): v1 back-compat window on renderer + consolidate story hash helper

Two P1/P2 review findings on #3181.

P1 (v1 back-compat). Bumping BRIEF_ENVELOPE_VERSION from 1 to 2 made every
v1 envelope still resident in Redis under the 7-day TTL fail
assertBriefEnvelope. The hosted /api/brief route would 404 "expired"
and the /api/latest-brief preview would downgrade to "composing",
breaking already-issued links from the preceding week.

Fix: renderer now accepts SUPPORTED_ENVELOPE_VERSIONS = Set([1, 2])
on READ. BRIEF_ENVELOPE_VERSION stays at 2 and is the only version
the composer ever writes. BriefStory.sourceUrl is required when
version === 2 and absent on v1; when rendering a v1 story the source
line degrades to plain text (no anchor), matching pre-v2 appearance.
When the TTL window passes the set can shrink to [2] in a follow-up.
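
A minimal sketch of the read-side gate and the v1 degradation,
assuming an envelope shape of { version, stories } from this message;
the real renderer also runs field validation and adds target/rel/UTM
to the anchor, all omitted here:

```javascript
const BRIEF_ENVELOPE_VERSION = 2; // the only version the composer writes
const SUPPORTED_ENVELOPE_VERSIONS = new Set([1, 2]); // accepted on READ

function assertEnvelopeVersion(envelope) {
  if (!SUPPORTED_ENVELOPE_VERSIONS.has(envelope.version)) {
    throw new Error(`unsupported brief envelope version: ${envelope.version}`);
  }
  return envelope;
}

// v1 stories carry no sourceUrl; the source line degrades to plain
// text, matching pre-v2 appearance.
function renderSourceLine(envelope, story) {
  if (envelope.version >= 2 && story.sourceUrl) {
    return `<a href="${story.sourceUrl}">${story.source}</a>`;
  }
  return story.source;
}
```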

P2 (hash dedup). hashStoryDescription was byte-identical to hashStory,
inviting silent drift if one prompt gains a field the other forgets.
Consolidated into hashBriefStory. Cache key separation remains via
the distinct prefixes (brief:llm:whymatters:v2:/brief:llm:description:v1:).

Tests: adds 3 v1 back-compat assertions (plain source line, field
validation still runs, defensive sourceUrl check), updates the
version-mismatch assertion to match the new supported-set message.
161/161 pass (was 158). Full test:data 5706/5706.
2026-04-18 21:49:17 +04:00
Elie Habib
c2356890da feat(brief): Phase 3b — LLM whyMatters + editorial digest prose via Gemini (#3172)
* feat(brief): Phase 3b — LLM whyMatters + editorial digest prose via Gemini

Replaces the Phase 3a stubs with editorial output from Gemini 2.5
Flash via the existing OpenRouter-backed callLLM chain. Two LLM
pathways, different caching semantics:

  whyMatters (per story): 1 editorial sentence, 18-30 words, global
  stakes. Cache brief:llm:whymatters:v1:{sha256(headline|source|severity)}
  with 24h TTL shared ACROSS users (whyMatters is not personalised).
  Bounded concurrency 5 so a 12-story brief doesn't open 12 parallel
  sockets to OpenRouter.

  digest prose (per user): JSON { lead, threads[], signals[] }
  replacing the stubs. Cache brief:llm:digest:v1:{userId}:{sensitivity}
  :{poolHash} with 4h TTL per-user. Pool hash is order-insensitive
  so rank shuffling doesn't invalidate.

Provider pinned to OpenRouter (google/gemini-2.5-flash) via
skipProviders: ['ollama', 'groq'] per explicit user direction.

Null-safe all the way down. If the LLM is unreachable, parse fails,
or cache throws, enrichBriefEnvelopeWithLLM returns the baseline
envelope with its stubs intact. The brief always ships. Kill switch
BRIEF_LLM_ENABLED is distinct from AI_DIGEST_ENABLED so the brief's
editorial prose and the email's AI summary can be toggled
independently during provider outages.

Files:
  scripts/lib/brief-llm.mjs (new) — pure prompt/parse helpers + IO
    generateWhyMatters/generateDigestProse + envelope enrichment
  scripts/seed-digest-notifications.mjs — BRIEF_LLM_ENABLED flag,
    briefLlmDeps closure, enrichment inserted between compose + SETEX
  tests/brief-llm.test.mjs (new, 34 cases)

End-to-end verification: the enriched envelope passes
assertBriefEnvelope() — the renderer's strict validator is the gate
between composer and api/brief, so we prove the enriched envelope
still validates.

156/156 brief tests pass. Both tsconfigs typecheck clean.

* fix(brief): address three P1 review findings on Phase 3b

All three findings are about cache-key correctness + envelope safety.

P1-A — whyMatters cache key under-specifies the prompt.
  hashStory keyed on headline|source|threatLevel, but the prompt also
  carries category + country. Upstream classification or geocoding
  corrections that leave those three fields unchanged would return
  pre-correction prose for a materially different prompt. Bumped to
  v2 key space (pre-fix rows ignored, re-LLM once on rollout). Added
  regression tests for category + country busting the cache.

P1-B — digest prose cache key under-specifies the prompt.
  hashDigestInput sorted stories and hashed headline|threatLevel only.
  The actual prompt includes ranked order + category + country + source.
  v2 hash now canonicalises to JSON of the fields in the prompt's
  ranked order. Test inverted to lock the corrected behaviour
  (reordering MUST miss the cache). Added a test for category change
  invalidating.
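
A minimal sketch of the v2 canonicalisation, assuming the five
prompt-visible fields named in this commit (headline, threatLevel,
category, country, source); the real hashDigestInput may differ:

```javascript
import { createHash } from 'node:crypto';

// v2: canonical JSON over the prompt-visible fields, in ranked order,
// so reordering or a category/country correction busts the cache.
function hashDigestInputV2(rankedStories) {
  const canonical = rankedStories.map((s) => ({
    headline: s.headline,
    threatLevel: s.threatLevel,
    category: s.category,
    country: s.country,
    source: s.source,
  }));
  return createHash('sha256').update(JSON.stringify(canonical)).digest('hex');
}
```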

P1-C — malformed cached digest poisons the envelope at SETEX time.
  On cache hit generateDigestProse accepted any object with a string
  lead, skipping the full shape check. enrichBriefEnvelopeWithLLM then
  wrote prose.threads/.signals into the envelope, and the cron SETEXed
  unvalidated. A bad cache row would 404 /api/brief at render time.

  Two-layer fix:
    1. Extracted validateDigestProseShape(obj) — same strictness
       parseDigestProse ran on fresh output. generateDigestProse now
       runs it on cache hits too, and returns a normalised copy.
    2. Cron now re-runs assertBriefEnvelope on the ENRICHED envelope
       before SETEX. On assertion failure it falls back to the
       unenriched baseline (already passed assertion on construction).

  Regression test: malformed cached row is rejected on hit and the
  LLM is called again to overwrite.

Tests: 8 new regression cases locking all three findings. Total brief
test suite now 185/185 green. Both tsconfigs typecheck clean.

Cache-key version bumps (v1 -> v2) trigger one-off cache miss on
deploy. Editorial prose re-LLM'd on the next cron tick per user.

* fix(brief): address two P2 review findings on #3172

P2-A: misleading test name 'different users share the cache' asserted
the opposite (per-user isolation). Renamed to 'different users do NOT
share the digest cache even when the story pool is identical' so a
future reader can't refactor away the per-user key on a misreading.

P2-B: signal length validator only capped bytes (< 220 chars), so a
30-word signal could pass even though the prompt says '<=14 words'.
Added a word-count filter with an 18-word ceiling (14 + 4 margin for
model drift / hyphenated compounds). Regression test locks the
behaviour: signals with >14-word drift are dropped, short imperatives
pass.
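
The two-part validator can be sketched as below; the 220-char byte cap
and 18-word ceiling come from this message, everything else is an
assumption:

```javascript
const MAX_SIGNAL_CHARS = 220; // pre-existing byte cap
const MAX_SIGNAL_WORDS = 18;  // 14-word prompt budget + 4 words drift margin

function filterSignals(signals) {
  return signals.filter((s) => {
    if (typeof s !== 'string' || s.length === 0) return false;
    if (s.length >= MAX_SIGNAL_CHARS) return false;
    // Word-count ceiling: a 30-word signal fits in 220 chars but still
    // violates the prompt's "<=14 words" instruction, so drop it.
    const words = s.trim().split(/\s+/).length;
    return words <= MAX_SIGNAL_WORDS;
  });
}
```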

43/43 brief-llm tests pass. Both tsconfigs typecheck clean.
2026-04-18 19:37:33 +04:00
Elie Habib
01c607c27c fix(brief): compose magazine stories from digest accumulator, not news:insights (#3168)
Root cause of the "email digest lists 30 critical events, brief shows
2 random Bellingcat stories" mismatch reported today: the email and
the brief read from two unrelated Redis keys.

  email digest -> digest:accumulator:v1:{variant}:{lang}
                  live per-variant ZSET of 30+ ingested stories,
                  hydrated from story:track:v1:{hash} + sources.
                  written by list-feed-digest on every ingest cycle.

  brief        -> news:insights:v1
                  global 8-story summary written by seed-insights.
                  After sensitivity=critical filter only 2 survive.
                  A completely different pool on a different cadence.

The brief was shipping from the wrong source, so a user who had just
read "UNICEF / Hormuz / Rohingya" in their email would open their
brief and see unrelated Bellingcat pieces.

Fix: brief now composes from the same digest accumulator the email
reads. scripts/lib/brief-compose.mjs exposes a new
composeBriefFromDigestStories(rule, digestStories, insightsNumbers,
{nowMs}) that maps the digest story shape ({hash, title, severity,
sources[], ...}) through a small adapter into the upstream brief-
filter shape, applies the user's sensitivity gate, and assembles the
envelope. news:insights:v1 is still read — but only for the
clusters / multi-source counters on the stats page. A failed
insights fetch now returns zeroed stats instead of aborting brief
composition, because the stories (not the numbers) are what matter.

seed-digest-notifications:
- composeBriefsForRun now calls buildDigest(candidate, windowStart)
  per rule instead of using a single global pool
- memoizes buildDigest by (variant, lang, windowStart) to keep the
  per-user loop from issuing N identical ZRANGE+HGETALL round-trips
- BRIEF_STORY_WINDOW_MS = 24h — a weekly-cadence user still expects
  a fresh brief in the dashboard every day, independent of email
  cadence
- composeBriefForRule kept as @deprecated so tests that stub
  news:insights directly don't break; all live traffic uses the
  digest path
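
The memo described above can be sketched as follows, with buildDigest
standing in for the real ZRANGE+HGETALL-backed builder (the helper
name here is invented):

```javascript
// Memoize by the (variant, lang, windowStart) triple so N users on the
// same pool share one round-trip per cron run.
function memoizeBuildDigest(buildDigest) {
  const cache = new Map();
  return (variant, lang, windowStart) => {
    const key = `${variant}\u0000${lang}\u0000${windowStart}`;
    if (!cache.has(key)) cache.set(key, buildDigest(variant, lang, windowStart));
    return cache.get(key);
  };
}
```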

Tests: new tests/brief-from-digest-stories.test.mjs (12 cases) locks
the mapping — empty input, source selection, sensitivity pass/drop,
12-story cap, moderate→medium severity aliasing, category/country
defaults, stats-number passthrough, determinism.

122/122 brief tests pass; both tsconfigs typecheck clean.

Operator note: today's wrong brief at brief:user_...:2026-04-18 was
already DELed manually. The next cron tick under this code composes
a correct one from the same pool the email used.
2026-04-18 15:47:08 +04:00
Elie Habib
711636c7b6 feat(brief): consolidate composer into digest cron (retire standalone service) (#3157)
* feat(brief): consolidate composer into digest cron (retire standalone service)

Merges the Phase 3a standalone Railway composer into the existing
digest cron. End state: one cron (seed-digest-notifications.mjs)
writes brief:{userId}:{issueDate} for every eligible user AND
dispatches the digest to their configured channels with a signed
magazine URL appended. Net -1 Railway service.

User's architectural note: "there is no reason to have 1 digest
preparing all and sending, then another doing a duplicate". This
delivers that — infrastructure consolidation, same send cadence,
single source of truth for brief envelopes.

File moves / deletes:

- scripts/seed-brief-composer.mjs → scripts/lib/brief-compose.mjs
  Pure-helpers library: no main(), no env guards, no cron. Exports
  composeBriefForRule + groupEligibleRulesByUser + dedupeRulesByUser
  (shim) + shouldExitNonZero + date helpers + extractInsights.
- Dockerfile.seed-brief-composer → deleted.
- The seed-brief-composer Railway service is retired (user confirmed
  they would delete it manually).

New files:

- scripts/lib/brief-url-sign.mjs — plain .mjs port of the sign path
  in server/_shared/brief-url.ts (Web Crypto only, no node:crypto).
- tests/brief-url-sign.test.mjs — parity tests that confirm tokens
  minted by the scripts-side signer verify via the edge-side verifier
  and produce byte-identical output for identical input.

Digest cron (scripts/seed-digest-notifications.mjs):

- Reads news:insights:v1 once per run, composes per-user brief
  envelopes, SETEX brief:{userId}:{issueDate} via body-POST pipeline.
- Signs magazine URL per user (BRIEF_URL_SIGNING_SECRET +
  WORLDMONITOR_PUBLIC_BASE_URL new env requirements, see pre-merge).
- Injects magazineUrl into buildChannelBodies for every channel
  (email, telegram, slack, discord) as a "📖 Open your WorldMonitor
  Brief magazine" footer CTA.
- Email HTML gets a dedicated data-brief-cta-slot near the top of
  the body with a styled button.
- Compose failures NEVER block the digest send — the digest cron's
  existing behaviour is preserved when the brief pipeline has issues.
- Brief compose extracted to its own functions (composeBriefsForRun
  + composeAndStoreBriefForUser) to keep main's biome complexity at
  baseline (64 — was 63 before; inline would have pushed to 117).

Tests: 98/98 across the brief suite. New parity tests confirm cross-
module signer agreement.

PRE-MERGE: add BRIEF_URL_SIGNING_SECRET and WORLDMONITOR_PUBLIC_BASE_URL
to the digest-notifications Railway service env (same values already
set on Vercel for Phase 2). Without them, brief compose is auto-
disabled and the digest falls back to its current behaviour — safe to
deploy before env is set.

* fix(brief): digest Dockerfile + propagate compose failure to exit code

Addresses two seventh-round review findings on PR #3157.

1. Cross-directory imports + current Railway build root (todo 230).
   The consolidated digest cron imports from ../api, ../shared, and
   (transitively via scripts/lib/brief-compose.mjs) ../server/_shared.
   The running digest-notifications Railway service builds from the
   scripts/ root — those parent paths are outside the deploy tree
   and would 500 on next rebuild with ERR_MODULE_NOT_FOUND.

   New Dockerfile.digest-notifications (repo-root build context)
   COPYs exactly the modules the cron needs: scripts/ contents,
   scripts/lib/, shared/brief-envelope.*, shared/brief-filter.*,
   server/_shared/brief-render.*, api/_upstash-json.js,
   api/_seed-envelope.js. Tight list to keep the watch surface small.
   Pattern matches the retired Dockerfile.seed-brief-composer + the
   existing Dockerfile.relay.

2. Silent compose failures (todo 231). composeBriefsForRun logged
   counters but never exited non-zero. An Upstash outage or missing
   signing secret silently dropped every brief write while Railway
   showed the cron green. The retired standalone composer exited 1
   on structural failures; that observability was lost in the
   consolidation.

   Changed the compose fn to return {briefByUser, composeSuccess,
   composeFailed}. Main captures the counters, runs the full digest
   send loop first (compose-layer breakage must NEVER block user-
   visible digest delivery), then calls shouldExitNonZero at the
   very end. Exit-on-failure gives ops the Railway-red signal
   without touching send behaviour.

   Also: a total read failure of news:insights:v1 (catch branch)
   now counts as 1 compose failure so the gate trips on insights-
   key infra breakage, not just per-user write failures.

Tests unchanged (98/98). Typecheck + node --check clean. Biome
complexity ticks 63→65 — same pre-existing bucket, already tolerated
by CI; no new blocker.

PRE-MERGE Railway work still pending: set BRIEF_URL_SIGNING_SECRET
+ WORLDMONITOR_PUBLIC_BASE_URL on the digest-notifications service,
AND switch its dockerfilePath to /Dockerfile.digest-notifications
before merging. Without the dockerfilePath switch, the next rebuild
fails.

* fix(brief): Dockerfile type:module + explicit missing-secret tripwire

Addresses two eighth-round review findings on PR #3157.

1. ESM .js files parse as CommonJS in the container (todo 232).
   Dockerfile.digest-notifications COPYs shared/*.js,
   server/_shared/*.js, api/*.js — all ESM because the repo-root
   package.json has "type":"module". But the image never copies the
   root package.json, so Node's nearest-pjson walk inside /app/
   reaches / without finding one and defaults to CommonJS. First
   `export` statement throws `SyntaxError: Unexpected token 'export'`
   at startup.

   Fix: write a minimal /app/package.json with {"type":"module"}
   early in the build. Avoids dragging the full root package.json
   into the image while still giving Node the ESM hint it needs for
   repo-owned .js files.

2. Missing BRIEF_URL_SIGNING_SECRET silently tolerated (todo 233).
   The old gate folded "operator-disabled" (BRIEF_COMPOSE_ENABLED=0)
   and "required secret missing in rollout" into the same boolean
   via AND. A production deploy that forgot the env var would skip
   brief compose without any failure signal — Railway green, no
   briefs, no CTA in digests, nobody notices.

   Split the two states: BRIEF_COMPOSE_DISABLED_BY_OPERATOR (explicit
   kill switch, silent) and BRIEF_SIGNING_SECRET_MISSING (the misconfig
   we care about). When the secret is missing without the operator
   flag, composeBriefsForRun returns composeFailed=1 on first call
   so the end-of-run exit gate trips and Railway flags the run red.
   Digest send still proceeds — compose-layer issues never block
   notifications.

Tests: 98/98. Syntax + node --check clean.

* fix(brief): address 2 remaining P2 review comments on PR #3157

Greptile review (2026-04-18T05:04Z) flagged three P2 items. The
first (shouldExitNonZero never wired into cron) was already fixed in
commit 35a46aa34. This commit addresses the other two.

1. composeBriefForRule: issuedAt used Date.now() instead of the
   caller-supplied nowMs. Under the digest cron the delta is
   milliseconds and harmless, but it broke the function's
   determinism contract — same input must produce same output for
   tests + retries. Now uses the passed nowMs.

2. buildChannelBodies: magazineUrl embedded raw inside Telegram HTML
   <a href="..."> and Slack <URL|text> syntax. The URL is HMAC-
   signed and shape-validated upstream (userId regex + YYYY-MM-DD
   date), so injection is practically impossible — but the email
   CTA (injectBriefCta) escapes per-target and channel footers
   should match that discipline. Added:
     - Telegram: escape &, <, >, " to HTML entities
     - Slack: strip <, >, | (mrkdwn metacharacters)
   Discord and plain-text paths unchanged — Discord links tolerate
   raw URLs, plain text has no metacharacters to escape.

Tests: 98/98 still pass (deterministic issuedAt change was
transparent to existing assertions because tests already pass nowMs
explicitly via the issuedAt fixture field).
2026-04-18 12:30:08 +04:00
Sebastien Melki
3314db2664 fix(relay): treat quiet hours start === end as disabled, not 24/7 (#3061) (#3066)
* fix(relay): treat quiet hours start === end as disabled, not 24/7 (#3061)

When quietHoursStart equalled quietHoursEnd, the midnight-spanning
branch evaluated `hour >= N || hour < N`, which is true for all hours,
silently suppressing all non-critical alerts permanently.

Add an early return for start === end in the relay and reject the
combination in Convex validation.
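
A minimal sketch of the corrected check; the real helper lives in
scripts/lib/quiet-hours.cjs and also handles timezones, which this
hour-only version omits:

```javascript
function isInQuietHours(hour, start, end) {
  if (start === end) return false; // equal bounds mean "disabled", not 24/7
  if (start < end) return hour >= start && hour < end; // same-day window
  return hour >= start || hour < end;                  // spans midnight
}
```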

Closes #3061

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: cross-check quiet hours start/end against persisted value on single-field updates

Addresses Greptile review: validateQuietHoursArgs only caught start===end
when both arrived in the same call. Now the mutation handlers also check
against the DB record to prevent sequential single-field updates from
creating a start===end state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: gate quiet hours start===end check on effectiveEnabled

Only enforce the start !== end invariant when quiet hours are effectively
enabled. This allows users with legacy start===end records to disable
quiet hours, change timezone/override, or recover from old bad state
without getting locked out.

Addresses koala73's P1 review feedback on #3066.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(relay): extract quiet-hours + consolidate equality check, add tests

- Move isInQuietHours/toLocalHour to scripts/lib/quiet-hours.cjs so they
  are testable without importing the full relay (which has top-level
  side effects and env requirements).
- Drop the unconditional start===end check from validateQuietHoursArgs;
  the effectiveEnabled-guarded check in setQuietHours /
  setQuietHoursForUser is now the single source of truth. Previously a
  user disabling quiet hours with start===end would be rejected even
  though the values are irrelevant when disabled.
- Add tests/quiet-hours.test.mjs covering: disabled, start===end
  regression (#3061), midnight-spanning window, same-day window,
  inclusive/exclusive bounds, invalid timezone, timezone handling,
  defaults.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Elie Habib <elie.habib@gmail.com>
2026-04-14 13:14:53 +04:00
Sebastien Melki
50765023ea feat: Cloud Preferences Sync polish + encryption key rotation (#3067)
* feat: cloud prefs sync polish + encryption key rotation (#2906)

1. Add sync status indicator in Preferences panel — shows current state
   (synced/pending/error/offline) with colored dot, last-sync timestamp,
   and manual "Sync now" button for conflict/error recovery.

2. Support AES key rotation in scripts/lib/crypto.cjs — encrypt() always
   uses the latest ENCRYPTION_KEY_V{n} env var, decrypt() reads the version
   prefix to select the matching key. Fully backwards-compatible with the
   existing NOTIFICATION_ENCRYPTION_KEY single-key setup.

3. Add tests exercising the applyMigrations() schema-version plumbing and
   the new multi-version key rotation (12 tests, all passing).
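
A minimal sketch of the versioned encrypt/decrypt scheme, assuming a
v{n}:iv:tag:ciphertext wire format; the real scripts/lib/crypto.cjs
also stays backwards-compatible with the single
NOTIFICATION_ENCRYPTION_KEY setup, which is omitted here:

```javascript
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';

// Highest ENCRYPTION_KEY_V{n} present in env wins; \d{1,2} reflects the
// review fix raising the version ceiling from 9 to 99.
function latestKeyVersion(env) {
  let latest = 0;
  for (const name of Object.keys(env)) {
    const m = name.match(/^ENCRYPTION_KEY_V(\d{1,2})$/);
    if (m) latest = Math.max(latest, Number(m[1]));
  }
  return latest;
}

function encrypt(plaintext, env) {
  const v = latestKeyVersion(env);
  const key = Buffer.from(env[`ENCRYPTION_KEY_V${v}`], 'hex'); // 32-byte key
  const iv = randomBytes(12);
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const ct = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  // Version prefix lets decrypt() pick the matching key after rotation.
  return `v${v}:${iv.toString('hex')}:${cipher.getAuthTag().toString('hex')}:${ct.toString('hex')}`;
}

function decrypt(payload, env) {
  const [ver, ivHex, tagHex, ctHex] = payload.split(':');
  const key = Buffer.from(env[`ENCRYPTION_KEY_${ver.toUpperCase()}`], 'hex');
  const decipher = createDecipheriv('aes-256-gcm', key, Buffer.from(ivHex, 'hex'));
  decipher.setAuthTag(Buffer.from(tagHex, 'hex'));
  return Buffer.concat([
    decipher.update(Buffer.from(ctHex, 'hex')),
    decipher.final(),
  ]).toString('utf8');
}
```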

Closes #2906

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address Greptile review feedback

- Raise the key version ceiling from 9 to 99 to avoid a silent footgun
- Extract SYNC_STATE_LABELS/COLORS to shared constants (no drift risk)
- Gate Cloud Sync UI on isCloudSyncEnabled() so it doesn't render
  when VITE_CLOUD_PREFS_ENABLED is off

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 19:44:25 +04:00
Elie Habib
24f23ba67a fix(digest): skip Groq + fix Telegram 400 from oversized messages (#3002)
* fix(digest): skip Groq (always 429) and fix Telegram 400 from oversized messages

Groq consistently rate-limits on digest runs, adding ~1s latency before
falling through to OpenRouter. Skip it via new callLLM skipProviders opt.

Telegram sendMessage rejects with 400 when digest text exceeds the 4096
char limit (30 stories + AI summary = ~5600 chars). Truncate at last
newline before the limit and close any unclosed HTML tags so truncation
mid-tag doesn't also cause a parse error. Log the Telegram error response
body so future 400s are diagnosable.

* fix: strip partial HTML tag before rebalancing in sanitizeTelegramHtml

The previous order appended closing tags first, then stripped the trailing
partial tag, so truncation mid-tag (e.g. 'x <b>hello</') still produced
malformed HTML. Reverse the order: strip partial tag, then close unclosed.

* fix: re-check length after sanitize in truncateTelegramHtml

Closing tags appended by sanitize can push a near-limit message over 4096.
Recurse into truncation if sanitized output exceeds the limit.
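
The end state of this commit chain (cut at the last newline, strip the
partial tag first, then close unclosed tags, then re-check the length)
can be sketched as below; tag handling is simplified versus the real
sanitizeTelegramHtml, and the loop replaces the recursion described
above so the sketch provably terminates:

```javascript
const TELEGRAM_LIMIT = 4096;

function sanitizeTelegramHtml(text) {
  // 1. Strip a trailing partial tag FIRST (e.g. 'x <b>hello</').
  let out = text.replace(/<[^>]*$/, '');
  // 2. Then close any tags left unclosed by the cut.
  const open = [];
  for (const [, close, name] of out.matchAll(/<(\/?)([a-z]+)[^>]*>/g)) {
    if (close) {
      if (open[open.length - 1] === name) open.pop();
    } else {
      open.push(name);
    }
  }
  for (const name of open.reverse()) out += `</${name}>`;
  return out;
}

function truncateTelegramHtml(text) {
  if (text.length <= TELEGRAM_LIMIT) return text;
  let budget = TELEGRAM_LIMIT;
  while (budget > 0) {
    let cut = text.slice(0, budget);
    const nl = cut.lastIndexOf('\n');
    if (nl > 0) cut = cut.slice(0, nl); // truncate at the last newline
    const out = sanitizeTelegramHtml(cut);
    if (out.length <= TELEGRAM_LIMIT) return out;
    budget = cut.length - 1; // appended closers pushed us over: shrink, retry
  }
  return '';
}
```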
2026-04-12 12:00:05 +04:00
Elie Habib
00320c26cf feat(notifications): proactive intelligence agent (Phase 4) (#2889)
* feat(notifications): proactive intelligence agent (Phase 4)

New Railway cron (every 6 hours) that detects signal landscape changes
and generates proactive intelligence briefs before events break.

Reads ~8 Redis signal keys (CII risk, GPS interference, unrest, sanctions,
cyber threats, thermal anomalies, weather, commodities), computes a
landscape snapshot, diffs against the previous run, and generates an
LLM brief when the diff score exceeds threshold.

Key features:
- Signal landscape diff with weighted scoring (new risk countries = 2pts,
  GPS zone changes = 1pt per zone, commodity movers >3% = 1pt)
- Server-side convergence detection: countries with 3+ signal types flagged
- First run stores baseline only (no false-positive brief)
- Delivers via all 5 channels (Telegram, Slack, Discord, Email, Webhook)
- PROACTIVE_INTEL_ENABLED=0 env var to disable
- Skips users without saved preferences or deliverable channels

Requires: Railway cron service configuration (every 6 hours)

* fix(proactive): fetch all enabled rules + expand convergence to all signal types

1. Replace /relay/digest-rules (digest-mode only) with ConvexHttpClient
   query alertRules:getByEnabled to include ALL enabled rules, not just
   digest-mode users. Proactive briefs now reach real-time users too.
2. Expand convergence detection from 3 signal families (risk, unrest,
   sanctions) to all 7 (add GPS interference, cyber, thermal, weather).
   Track signal TYPES per country (Set), not event counts, so convergence
   means 3+ distinct signal categories, not 3+ events from one category.
3. Include signal type names in convergence zone output for LLM context
   and webhook payload.

* fix(proactive): check channels before LLM + deactivate stale channels

1. Move channel fetch + deliverability check BEFORE user prefs and LLM
   call to avoid wasting LLM calls on users with no verified channels
2. Add deactivateChannel() calls on 404/410/403 responses in all delivery
   helpers (Telegram, Slack, Discord, Webhook), matching the pattern in
   notification-relay.cjs and seed-digest-notifications.mjs

* fix(proactive): preserve landscape on transient failures + drop Telegram Markdown

1. Don't advance landscape baseline when channel fetch or LLM fails,
   so the brief retries on the next run instead of permanently suppressing
   the change window
2. Remove parse_mode: 'Markdown' from Telegram sendMessage to avoid 400
   errors from unescaped characters in LLM output (matches digest pattern)

* fix(proactive): only advance landscape baseline after successful delivery

* fix(proactive): abort on degraded signals + don't advance on prefs failure

1. Track loaded signal key count. Abort run if <60% of keys loaded
   to prevent false diffs from degraded Redis snapshots becoming
   the new baseline.
2. Don't advance landscape when fetchUserPreferences() returns null
   (could be transient failure, not just "no saved prefs"). Retries
   next run instead of permanently suppressing the brief.

* fix(notifications): distinguish no-prefs from fetch-error in user-context

fetchUserPreferences() now returns { data, error } instead of bare null.
error=true means transient failure (retry next run, don't advance baseline).
data=null + error=false means user has no saved preferences (skip + advance).

Proactive script: retry on error, skip+advance on no-prefs.
Digest script: updated to destructure new return shape (behavior unchanged,
  both cases skip AI summary).

* fix(proactive): address all Greptile review comments

P1: Add link-local (169.254) and 0.0.0.0 to isPrivateIP SSRF check
P1: Log channel-fetch failures (was silent catch{})
P2: Remove unused createHash import and BRIEF_TTL constant
P2: Remove dead ?? 'full' fallback (rule.variant validated above)
P2: Add HTTPS enforcement to sendSlack/sendDiscord (matching sendWebhook)
2026-04-10 08:08:27 +04:00
Elie Habib
fa64e2f61f feat(notifications): AI-enriched digest delivery (#2876)
* feat(notifications): AI-enriched digest delivery (Phase 1)

Add personalized LLM-generated executive summaries to digest
notifications. When AI_DIGEST_ENABLED=1 (default), the digest cron
fetches user preferences (watchlist, panels, frameworks), generates a
tailored intelligence brief via Groq/OpenRouter, and prepends it to the
story list in both text and HTML formats.

New infrastructure:
- convex/userPreferences: internalQuery for service-to-service access
- convex/http: /relay/user-preferences endpoint (RELAY_SHARED_SECRET auth)
- scripts/lib/llm-chain.cjs: shared Ollama->Groq->OpenRouter provider chain
- scripts/lib/user-context.cjs: user preference extraction + LLM prompt formatting

AI summary is cached (1h TTL) per stories+userContext hash. Falls back
to raw digest on LLM failure (no regression). Subject line changes to
"Intelligence Brief" when AI summary is present.

* feat(notifications): per-user AI digest opt-out toggle

AI executive summary in digests is now optional per user via
alertRules.aiDigestEnabled (default true). Users can toggle it off in
Settings > Notifications > Digest > "AI executive summary".

Schema: added aiDigestEnabled to alertRules table
Backend: Convex mutations, HTTP relay, edge function all forward the field
Frontend: toggle in digest settings section with descriptive copy
Digest cron: skips LLM call when rule.aiDigestEnabled === false

* fix(notifications): address PR review — cache key, HTML replacement, UA

1. Add variant to AI summary cache key to prevent cross-variant poisoning
2. Use replacer function in html.replace() to avoid $-pattern corruption
   from LLM output containing dollar amounts ($500M, $1T)
3. Use service UA (worldmonitor-llm/1.0) instead of Chrome UA for LLM calls

* fix(notifications): skip AI summary without prefs + fix HTML regex

1. Return null from generateAISummary() when fetchUserPreferences()
   returns null, so users without saved preferences get raw digest
   instead of a generic LLM summary
2. Fix HTML replace regex to match actual padding value (40px 32px 0)
   so the executive summary block is inserted in email HTML

* fix(notifications): channel check before LLM, omission-safe aiDigest, richer cache key

1. Move channel fetch + deliverability check BEFORE AI summary generation
   so users with no verified channels don't burn LLM calls every cron run
2. Only patch aiDigestEnabled when explicitly provided (not undefined),
   preventing stale frontend tabs from silently clearing an opt-out
3. Include severity, phase, and sources in story hash for cache key
   so the summary invalidates when those fields change
2026-04-09 21:35:26 +04:00
Elie Habib
751820c1cc feat(prefs): Phase 0 + 1 — sync primitives, Convex schema & preferences API (#2505) 2026-03-29 16:02:56 +04:00
Elie Habib
79ec6e601b feat(prefs): Phase 0 — sync primitives and notification scaffolding (#2503) 2026-03-29 13:57:34 +04:00
Elie Habib
3702463321 Add thermal escalation seeded service (#1747)
* feat(thermal): add thermal escalation seeded service

Cherry-picked from codex/thermal-escalation-phase1 and retargeted
to main. Includes thermal escalation seed script, RPC handler,
proto definitions, bootstrap/health/seed-health wiring, gateway
cache tier, client service, and tests.

* fix(thermal): wire data-loader, fix typing, recalculate summary

Wire fetchThermalEscalations into data-loader.ts with panel forwarding,
freshness tracking, and variant gating. Fix seed-health intervalMin from
90 to 180 to match 3h TTL. Replace 8 as-any casts with typed interface.
Recalculate summary counts after maxItems slice.

* fix(thermal): enforce maxItems on hydrated data + fix bootstrap keys

Codex P2: hydration branch now slices clusters to maxItems before
mapping, matching the RPC fallback behavior.

Also add thermalEscalation to bootstrap.js BOOTSTRAP_CACHE_KEYS and
SLOW_KEYS (was lost during conflict resolution).

* fix(thermal): recalculate summary on sliced hydrated clusters

When maxItems truncates the cluster array from bootstrap hydration,
the summary was still using the original full-set counts. Now
recalculates clusterCount, elevatedCount, spikeCount, etc. on the
sliced array, matching the handler's behavior.
2026-03-17 14:24:26 +04:00