Commit Graph

31 Commits

Author SHA1 Message Date
Elie Habib
2f5445284b fix(brief): single canonical synthesis brain — eliminate email/brief lead divergence (#3396)
* feat(brief-llm): canonical synthesis prompt + v3 cache key

Extends generateDigestProse to be the single source of truth for
brief executive-summary synthesis (canonicalises what was previously
split between brief-llm's generateDigestProse and
seed-digest-notifications.mjs's generateAISummary). Ports Brain B's
prompt features into buildDigestPrompt:

- ctx={profile, greeting, isPublic} parameter (back-compat: 4-arg
  callers behave like today)
- per-story severity uppercased + short-hash prefix [h:XXXX] so the
  model can emit rankedStoryHashes for stable re-ranking
- profile lines + greeting opener appear only when ctx.isPublic !== true

validateDigestProseShape gains optional rankedStoryHashes (≥4-char
strings, capped to MAX_STORIES_PER_USER × 2). v2-shaped rows still
pass — field defaults to [].

hashDigestInput v3:
- material includes profile-SHA, greeting bucket, isPublic flag,
  per-story hash
- isPublic=true substitutes literal 'public' for userId in the cache
  key so all share-URL readers of the same (date, sensitivity, pool)
  hit ONE cache row (no PII in public cache key)
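
A minimal sketch of the owner substitution described above. The helper
name `digestCacheKey`, the key layout, and the pre-computed
`materialHash` argument are assumptions for illustration; the real
hashDigestInput hashes more fields than shown here.

```javascript
// Hypothetical helper: name and key layout are assumptions, not the repo's code.
const DIGEST_CACHE_PREFIX = 'brief:llm:digest:v3';

function digestCacheKey({ userId, sensitivity, materialHash, isPublic }) {
  // Public reads share one row: the literal 'public' replaces userId,
  // so no user identifier lands in a public cache key.
  const owner = isPublic === true ? 'public' : userId;
  return `${DIGEST_CACHE_PREFIX}:${owner}:${sensitivity}:${materialHash}`;
}
```

Two share-URL readers with the same sensitivity and pool would resolve
to the same key under this layout, while a personalised read for either
user would not.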

Adds generateDigestProsePublic(stories, sensitivity, deps) wrapper —
no userId param by design — for the share-URL surface.

Cache prefix bumped brief:llm:digest:v2 → v3. v2 rows expire on TTL.
Per the v1→v2 precedent (see hashDigestInput comment), one-tick cost
on rollout is acceptable for cache-key correctness.

Tests: 72/72 passing in tests/brief-llm.test.mjs (8 new for the v3
behaviors), full data suite 6952/6952.

Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Step 1, Codex-approved (5 rounds).

* feat(brief): envelope v3 — adds digest.publicLead for share-URL surface

Bumps BRIEF_ENVELOPE_VERSION 2 → 3. Adds optional
BriefDigest.publicLead — non-personalised executive lead generated
by generateDigestProsePublic (already in this branch from the
previous commit) for the public share-URL surface. Personalised
`lead` is the canonical synthesis for authenticated channels;
publicLead is its profile-stripped sibling so api/brief/public/*
never serves user-specific content (watched assets/regions).

SUPPORTED_ENVELOPE_VERSIONS = [1, 2, 3] keeps v1 + v2 envelopes
in the 7-day TTL window readable through the rollout — the
composer only ever writes the current version, but readers must
tolerate older shapes that haven't expired yet. Same rollout
pattern used at the v1 → v2 bump.

Renderer changes (server/_shared/brief-render.js):
- ALLOWED_DIGEST_KEYS gains 'publicLead' (closed-key-set still
  enforced; v2 envelopes pass because publicLead === undefined is
  the v2 shape).
- assertBriefEnvelope: new isNonEmptyString check on publicLead
  when present. Type contract enforced; absence is OK.

Tests (tests/brief-magazine-render.test.mjs):
- New describe block "v3 publicLead field": v3 envelope renders;
  malformed publicLead rejected; v2 envelope still passes; ad-hoc
  digest keys (e.g. synthesisLevel) still rejected — confirming
  the closed-key-set defense holds for the cron-local-only fields
  the orchestrator must NOT persist.
- BRIEF_ENVELOPE_VERSION pin updated 2 → 3 with rollout-rationale
  comment.

Test results: 182 brief-related tests pass; full data suite
6956/6956.

Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Step 2, Codex Round-3 Medium #2.

* feat(brief): synthesis splice + rankedStoryHashes pre-cap re-order

Plumbs the canonical synthesis output (lead, threads, signals,
publicLead, rankedStoryHashes from generateDigestProse) through the
pure composer so the orchestration layer can hand pre-resolved data
into envelope.digest. Composer stays sync / no I/O — Codex Round-2
High #2 honored.

Changes:

scripts/lib/brief-compose.mjs:
- digestStoryToUpstreamTopStory now emits `hash` (the digest story's
  stable identifier, falls back to titleHash when absent). Without
  this, rankedStoryHashes from the LLM has nothing to match against.
- composeBriefFromDigestStories accepts opts.synthesis = {lead,
  threads, signals, rankedStoryHashes?, publicLead?}. When passed,
  splices into envelope.digest after the stub is built. Partial
  synthesis (e.g. only `lead` populated) keeps stub defaults for the
  other fields — graceful degradation when L2 fallback fires.

shared/brief-filter.js:
- filterTopStories accepts optional rankedStoryHashes. New helper
  applyRankedOrder re-orders stories by short-hash prefix match
  BEFORE the cap is applied, so the model's editorial judgment of
  importance survives MAX_STORIES_PER_USER. Stable for ties; stories
  not in the ranking come after in original order. Empty/missing
  ranking is a no-op (legacy callers unchanged).
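
The re-ordering rule above can be sketched as follows. This is an
illustration written from the description, not the code in
shared/brief-filter.js; the story shape (`hash` as a string) is an
assumption.

```javascript
// Sketch only: the repo's applyRankedOrder may differ in details.
function applyRankedOrder(stories, rankedStoryHashes) {
  if (!Array.isArray(rankedStoryHashes) || rankedStoryHashes.length === 0) {
    return stories; // no ranking: legacy order unchanged
  }
  const rankOf = (story) => {
    const hash = typeof story.hash === 'string' ? story.hash : '';
    // The model emits a short prefix; the story carries the full hash.
    const idx = rankedStoryHashes.findIndex(
      (prefix) => typeof prefix === 'string' && prefix.length > 0 && hash.startsWith(prefix),
    );
    return idx === -1 ? Number.POSITIVE_INFINITY : idx;
  };
  // Keep the original index so ties and unranked stories stay in input order.
  return stories
    .map((story, i) => ({ story, rank: rankOf(story), i }))
    .sort((a, b) => (a.rank === b.rank ? a.i - b.i : a.rank - b.rank))
    .map((entry) => entry.story);
}
```

A caller would apply the cap afterwards, for example
`applyRankedOrder(pool, ranked).slice(0, maxStories)`, so a highly
ranked story late in the pool is not cut before it is considered.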

shared/brief-filter.d.ts:
- filterTopStories signature gains rankedStoryHashes?: string[].
- UpstreamTopStory gains hash?: unknown (carried through from
  digestStoryToUpstreamTopStory).

Tests added (tests/brief-from-digest-stories.test.mjs):
- synthesis substitutes lead/threads/signals/publicLead.
- legacy 4-arg callers (no synthesis) keep stub lead.
- partial synthesis (only lead) keeps stub threads/signals.
- rankedStoryHashes re-orders pool before cap.
- short-hash prefix match (model emits 8 chars; story carries full).
- unranked stories go after in original order.

Test results: 33/33 in brief-from-digest-stories; 182/182 across all
brief tests; full data suite 6956/6956.

Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Step 3, Codex Round-2 Low + Round-2 High #2.

* feat(brief): single canonical synthesis per user; rewire all channels

Restructures the digest cron's per-user compose + send loops to
produce ONE canonical synthesis per user per issueSlot, so the lead
text shown by every channel (email HTML, plain-text, Telegram, Slack,
Discord, webhook) and by the magazine is byte-identical. This
eliminates the "two-brain" divergence that was producing different
exec summaries on different surfaces (observed 2026-04-25 0802).

Architecture:

composeBriefsForRun (orchestration):
- Pre-annotates every eligible rule with lastSentAt + isDue once,
  before the per-user pass. Same getLastSentAt helper the send loop
  uses so compose + send agree on lastSentAt for every rule.

composeAndStoreBriefForUser (per-user):
- Two-pass winner walk: try DUE rules first (sortedDue), fall back
  to ALL eligible rules (sortedAll) for compose-only ticks.
  Preserves today's dashboard refresh contract for weekly /
  twice_daily users on non-due ticks (Codex Round-4 High #1).
- Within each pass, walk by compareRules priority and pick the
  FIRST candidate with a non-empty pool — mirrors today's behavior
  at scripts/seed-digest-notifications.mjs:1044 and prevents the
  "highest-priority but empty pool" edge case (Codex Round-4
  Medium #2).
- Three-level synthesis fallback chain:
    L1: generateDigestProse(fullPool, ctx={profile,greeting,!public})
    L2: generateDigestProse(envelope-sized slice, ctx={})
    L3: stub from assembleStubbedBriefEnvelope
  Distinct log lines per fallback level so ops can quantify
  failure-mode distribution.
- Generates publicLead in parallel via generateDigestProsePublic
  (no userId param; cache-shared across all share-URL readers).
- Splices synthesis into envelope via composer's optional
  `synthesis` arg (Step 3); rankedStoryHashes re-orders the pool
  BEFORE the cap so editorial importance survives MAX_STORIES.
- synthesisLevel stored in the cron-local briefByUser entry — NOT
  persisted in the envelope (renderer's assertNoExtraKeys would
  reject; Codex Round-2 Medium #5).
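
The three-level fallback chain above could look roughly like this.
`generateProse` and `buildStub` are injected stand-ins for
generateDigestProse and assembleStubbedBriefEnvelope with assumed
signatures, and the log wording is hypothetical.

```javascript
// Sketch under stated assumptions; not the orchestration code itself.
async function runSynthesisWithFallbackSketch({
  fullPool, slice, ctx, generateProse, buildStub, log = console.log,
}) {
  const attempts = [
    { level: 1, stories: fullPool, ctx },  // L1: full pool, personalised ctx
    { level: 2, stories: slice, ctx: {} }, // L2: envelope-sized slice, empty ctx
  ];
  for (const attempt of attempts) {
    try {
      const prose = await generateProse(attempt.stories, attempt.ctx);
      if (prose) return { synthesis: prose, synthesisLevel: attempt.level };
      log(`[digest] synthesis L${attempt.level} returned nothing`);
    } catch (err) {
      log(`[digest] synthesis L${attempt.level} failed: ${err.message}`);
    }
  }
  // L3: deterministic stub, no LLM involved.
  log('[digest] synthesis fell back to L3 stub');
  return { synthesis: buildStub(), synthesisLevel: 3 };
}
```

Returning synthesisLevel next to the prose lets the caller keep it in
cron-local state (for the subject line) without writing it into the
envelope.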

Send loop:
- Reads lastSentAt via shared getLastSentAt helper (single source
  of truth with compose flow).
- briefLead = brief?.envelope?.data?.digest?.lead — the canonical
  lead. Passed to buildChannelBodies (text/Telegram/Slack/Discord),
  injectEmailSummary (HTML email), and sendWebhook (webhook
  payload's `summary` field). All-channel parity (Codex Round-1
  Medium #6).
- Subject ternary reads cron-local synthesisLevel: 1 or 2 →
  "Intelligence Brief", 3 → "Digest" (preserves today's UX for
  fallback paths; Codex Round-1 Missing #5).

Removed:
- generateAISummary() — the second LLM call that produced the
  divergent email lead. ~85 lines.
- AI_SUMMARY_CACHE_TTL constant — no longer referenced. The
  digest:ai-summary:v1:* cache rows expire on their existing 1h
  TTL (no cleanup pass).

Helpers added:
- getLastSentAt(rule) — extracted Upstash GET for digest:last-sent
  so compose + send both call one source of truth.
- buildSynthesisCtx(rule, nowMs) — formats profile + greeting for
  the canonical synthesis call. Preserves all of today's prefs-fetch
  failure-mode behavior.

Composer:
- compareRules now exported from scripts/lib/brief-compose.mjs so
  the cron can sort each pass identically to groupEligibleRulesByUser.

Test results: full data suite 6962/6962 (was 6956 pre-Step 4; +6
new compose-synthesis tests from Step 3).

Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Steps 4 + 4b. Codex-approved (5 rounds).

* fix(brief-render): public-share lead fail-safe — never leak personalised lead

Public-share render path (api/brief/public/[hash].ts → renderer
publicMode=true) MUST NEVER serve the personalised digest.lead
because that string can carry profile context — watched assets,
saved-region names, etc. — written by generateDigestProse with
ctx.profile populated.

Previously: redactForPublic redacted user.name and stories.whyMatters
but passed digest.lead through unchanged. Codex Round-2 High
(security finding).

Now (v3 envelope contract):
- redactForPublic substitutes digest.lead = digest.publicLead when
  the v3 envelope carries one (generated by generateDigestProsePublic
  with profile=null, cache-shared across all public readers).
- When publicLead is absent (v2 envelope still in TTL window OR v3
  envelope where publicLead generation failed), redactForPublic sets
  digest.lead to empty string.
- renderDigestGreeting: when lead is empty, OMIT the <blockquote>
  pull-quote entirely. Page still renders complete (greeting +
  horizontal rule), just without the italic lead block.
- NEVER falls back to the original personalised lead.

assertBriefEnvelope still validates publicLead's contract (when
present, must be a non-empty string) BEFORE redactForPublic runs,
so a malformed publicLead throws before any leak risk.

Tests added (tests/brief-magazine-render.test.mjs):
- v3 envelope renders publicLead in pull-quote, personalised lead
  text never appears.
- v2 envelope (no publicLead) omits pull-quote; rest of page
  intact.
- empty-string publicLead rejected by validator (defensive).
- private render still uses personalised lead.

Test results: 68 brief-magazine-render tests pass; full data suite
remains green from prior commit.

Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Step 5, Codex Round-2 High (security).

* feat(digest): brief lead parity log + extra acceptance tests

Adds the parity-contract observability line and supplementary
acceptance tests for the canonical synthesis path.

Parity log (per send, after successful delivery):
  [digest] brief lead parity user=<id> rule=<v>:<s>:<lang>
    synthesis_level=<1|2|3> exec_len=<n> brief_lead_len=<n>
    channels_equal=<bool> public_lead_len=<n>

When channels_equal=false an extra WARN line fires —
"PARITY REGRESSION user=… — email lead != envelope lead." Sentry's
existing console-breadcrumb hook lifts this without an explicit
captureMessage call. Plan acceptance criterion A5.

Tests added (tests/brief-llm.test.mjs, +9):
- generateDigestProsePublic: two distinct callers with identical
  (sensitivity, story-pool) hit the SAME cache row (per Codex
  Round-2 Medium #4 — "no PII in public cache key").
- public + private writes never collide on cache key (defensive).
- greeting bucket change re-keys the personalised cache (Brain B
  parity).
- profile change re-keys the personalised cache.
- v3 cache prefix used (no v2 writes).

Test results: 77/77 in brief-llm; full data suite 6971/6971
(was 6962 pre-Step-7; +9 new public-cache tests).

Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Steps 6 (partial) + 7. Acceptance A5, A6.g, A6.f.

* test(digest): backfill A6.h/i/l/m acceptance tests via helper extraction

* fix(brief): close two correctness regressions on multi-rule + public surface

Two findings from human review of the canonical-synthesis PR:

1. Public-share redaction leaked personalised signals + threads.
   The new prompt explicitly personalises both `lead` and `signals`
   ("personalise lead and signals"), but redactForPublic only
   substituted `lead` — leaving `signals` and `threads` intact.
   Public renderer's hasSignals gate would emit the signals page
   whenever `digest.signals.length > 0`, exposing watched-asset /
   region phrasing to anonymous readers. Same privacy bug class
   the original PR was meant to close, just on different fields.

2. Multi-rule users got cross-pool lead/storyList mismatch.
   composeAndStoreBriefForUser picks ONE winning rule for the
   canonical envelope. The send loop then injected that ONE
   `briefLead` into every due rule's channel body — even though
   each rule's storyList came from its own (per-rule) digest pool.
   Multi-rule users (e.g. `full` + `finance`) ended up with email
   bodies leading on geopolitics while listing finance stories.
   Cross-rule editorial mismatch reintroduced after the cross-
   surface fix.

Fix 1 — public signals + threads:
- Envelope shape: BriefDigest gains `publicSignals?: string[]` +
  `publicThreads?: BriefThread[]` (sibling fields to publicLead).
  Renderer's ALLOWED_DIGEST_KEYS extended; assertBriefEnvelope
  validates them when present.
- generateDigestProsePublic already returned a full prose object
  (lead + signals + threads) — orchestration now captures all
  three instead of just `.lead`. Composer splices each into its
  envelope slot.
- redactForPublic substitutes:
    digest.lead    ← publicLead (or empty → omits pull-quote)
    digest.signals ← publicSignals (or empty → omits signals page)
    digest.threads ← publicThreads (or category-derived stub via
                     new derivePublicThreadsStub helper — never
                     falls back to the personalised threads)
- New tests cover all three substitutions + their fail-safes.
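
A sketch of the three substitutions and their fail-safes. The function
name `redactDigestForPublic` is hypothetical, and
derivePublicThreadsStub is passed in as a parameter because its
implementation is not shown in this log.

```javascript
// Sketch only: field handling follows the description above.
function redactDigestForPublic(digest, derivePublicThreadsStub) {
  const hasText = (v) => typeof v === 'string' && v.length > 0;
  const hasItems = (v) => Array.isArray(v) && v.length > 0;
  return {
    ...digest,
    // Empty lead makes the renderer omit the pull-quote.
    lead: hasText(digest.publicLead) ? digest.publicLead : '',
    // Empty signals makes the renderer omit the signals page.
    signals: hasItems(digest.publicSignals) ? digest.publicSignals : [],
    // Threads fall back to a stub, never to the personalised threads.
    threads: hasItems(digest.publicThreads)
      ? digest.publicThreads
      : derivePublicThreadsStub(),
  };
}
```

None of the three branches reads the personalised `lead`, `signals`, or
`threads`, so a missing public field degrades to an omission rather
than a leak.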

Fix 2 — per-rule synthesis in send loop:
- Each due rule independently calls runSynthesisWithFallback over
  ITS OWN pool + ctx. Channel body lead is internally consistent
  with the storyList (both from the same pool).
- Cache absorbs the cost: when this is the winner rule, the
  synthesis hits the cache row written during the compose pass
  (same userId/sensitivity/pool/ctx) — no extra LLM call. Only
  multi-rule users with non-overlapping pools incur additional
  LLM calls.
- magazineUrl still points at the winner's envelope (single brief
  per user per slot — `(userId, issueSlot)` URL contract). Channel
  lead vs magazine lead may differ for non-winner rule sends;
  documented as acceptable trade-off (URL/key shape change to
  support per-rule magazines is out of scope for this PR).
- Parity log refined: adds `winner_match=<bool>` field. The
  PARITY REGRESSION warning now fires only when winner_match=true
  AND the channel lead differs from the envelope lead (the actual
  contract regression). Non-winner sends with legitimately
  different leads no longer spam the alert.
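
The refined alert condition reduces to a small predicate. The function
and parameter names here are hypothetical; only the rule itself comes
from the description above.

```javascript
// Sketch: warn only when the winner rule's channel lead diverges from the envelope lead.
function isParityRegression({ winnerMatch, channelLead, envelopeLead }) {
  return winnerMatch === true && channelLead !== envelopeLead;
}
```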

Test results:
- tests/brief-magazine-render.test.mjs: 75/75 (+7 new for public
  signals/threads + validator + private-mode-ignores-public-fields)
- Full data suite: 6995/6995 (was 6988; +7 net)
- typecheck + typecheck:api: clean

Plan: docs/plans/2026-04-25-002-fix-brief-email-two-brain-divergence-plan.md
Addresses 2 review findings on PR #3396 not anticipated in the
5-round Codex review.

* fix(brief): unify compose+send window, fall through filter-rejection

Address two residual risks in PR #3396 (single-canonical-brain refactor):

Risk 1 — canonical lead synthesized from a fixed 24h pool while the
send loop ships stories from `lastSentAt ?? 24h`. For weekly users
that meant a 24h-pool lead bolted onto a 7d email body — the same
cross-surface divergence the refactor was meant to eliminate, just in
a different shape. Twice-daily users hit a 12h-vs-24h variant.

Fix: extract the window formula to `digestWindowStartMs(lastSentAt,
nowMs, defaultLookbackMs)` in digest-orchestration-helpers.mjs and
call it from BOTH the compose path's digestFor closure AND the send
loop. The compose path now derives windowStart per-candidate from
`cand.lastSentAt`, identical to what the send loop will use for that
rule. Removed the now-unused BRIEF_STORY_WINDOW_MS constant.
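
A sketch of the shared window formula, inferred from the
`lastSentAt ?? 24h` description above; the helper in
digest-orchestration-helpers.mjs remains the source of truth.

```javascript
// Sketch: window starts at lastSentAt when known, else at now minus the lookback.
function digestWindowStartMs(lastSentAt, nowMs, defaultLookbackMs) {
  // ?? rather than || so a stored lastSentAt of 0 is honoured
  // instead of being treated as missing.
  return lastSentAt ?? (nowMs - defaultLookbackMs);
}
```

Because compose and send both call the same function with the same
`lastSentAt`, a weekly rule gets a 7d window on both sides instead of
24h on one and 7d on the other.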

Side-effect: digestFor now receives the full annotated candidate
(`cand`) instead of just the rule, so it can reach `cand.lastSentAt`.
Backwards-compatible at the helper level — pickWinningCandidateWithPool
forwards `cand` instead of `cand.rule`.

The cache memo hit rate drops because lastSentAt varies per rule, but
correctness outweighs a few extra Upstash GETs.

Risk 2 — pickWinningCandidateWithPool returned the first candidate
with a non-empty raw pool as winner. If composeBriefFromDigestStories
then dropped every story (URL/headline/shape filters), the caller
bailed without trying lower-priority candidates. Pre-PR behaviour was
to keep walking. This regressed multi-rule users whose top-priority
rule's pool happens to be entirely filter-rejected.

Fix: optional `tryCompose(cand, stories)` callback on
pickWinningCandidateWithPool. When provided, the helper calls it after
the non-empty pool check; falsy return → log filter-rejected and walk
to the next candidate; truthy → returns `{winner, stories,
composeResult}` so the caller can reuse the result. Without the
callback, legacy semantics preserved (existing tests + callers
unaffected).
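
The fall-through walk could be sketched like this. It assumes
candidates arrive pre-sorted by priority and that `digestFor(cand)`
resolves to that candidate's story pool; the real helper also logs the
filter-rejected case, which is omitted here.

```javascript
// Sketch only: mirrors the described contract, not the repo's implementation.
async function pickWinningCandidateWithPoolSketch(candidates, digestFor, tryCompose) {
  for (const cand of candidates) {
    const stories = await digestFor(cand);
    if (!Array.isArray(stories) || stories.length === 0) continue; // empty pool
    if (typeof tryCompose !== 'function') {
      return { winner: cand, stories }; // legacy semantics: first non-empty pool wins
    }
    const composeResult = tryCompose(cand, stories);
    if (composeResult) return { winner: cand, stories, composeResult };
    // Falsy result: every story was filter-rejected, so keep walking.
  }
  return null; // all candidates empty or rejected
}
```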

Caller composeAndStoreBriefForUser passes a no-synthesis compose call
as tryCompose — cheap pure-JS, no I/O. Synthesis only runs once after
the winner is locked in, so the perf cost is one extra compose per
filter-rejected candidate, no extra LLM round-trips.

Tests:
- 10 new cases in tests/digest-orchestration-helpers.test.mjs
  covering: digestFor receiving full candidate; tryCompose
  fall-through to lower-priority; all-rejected returns null;
  composeResult forwarded; legacy semantics without tryCompose;
  digestWindowStartMs lastSentAt-vs-default branches; weekly +
  twice-daily window parity assertions; epoch-zero ?? guard.
- Updated tests/digest-cache-key-sensitivity.test.mjs static-shape
  regex to match the new `cand.rule.sensitivity` cache-key shape
  (intent unchanged: cache key MUST include sensitivity).

Stacked on PR #3396 — targets feat/brief-two-brain-divergence.
2026-04-25 16:22:31 +04:00
Elie Habib
9c14820c69 fix(digest): brief filter-drop instrumentation + cache-key correctness (#3387)
* fix(digest): include sensitivity in digestFor cache key

buildDigest filters by rule.sensitivity BEFORE dedup, but digestFor
memoized only on (variant, lang, windowStart). Stricter-sensitivity
users in a shared bucket inherited the looser populator's pool,
producing the wrong story set and defeating downstream topic-grouping
adjacency once filterTopStories re-applied sensitivity.

Solution 1 from docs/plans/2026-04-24-004-fix-brief-topic-adjacency-defects-plan.md.

* feat(digest): instrument per-user filterTopStories drops

Adds an optional onDrop metrics callback to filterTopStories and threads
it through composeBriefFromDigestStories. The seeder aggregates counts
per composed brief and emits one structured log line per user per tick:

  [digest] brief filter drops user=<id> sensitivity=<s> in=<count>
    dropped_severity=<n> dropped_url=<n> dropped_headline=<n>
    dropped_shape=<n> out=<count>

Decides whether the conditional Solution 3 (post-filter regroup) is
warranted by quantifying how often post-group filter drops puncture
multi-member topics in production. No behaviour change for callers
that omit onDrop.

Solution 0 from docs/plans/2026-04-24-004-fix-brief-topic-adjacency-defects-plan.md.

* fix(digest): close two Sol-0 instrumentation gaps from code review

Review surfaced two P2 gaps in the filter-drop telemetry that weakened
its diagnostic purpose for Sol-3 gating:

1. Cap-truncation silent drop: filterTopStories broke on
   `out.length >= maxStories` BEFORE the onDrop emit sites, so up to
   (DIGEST_MAX_ITEMS - MAX_STORIES_PER_USER) stories per user were
   invisible. Added a 'cap' reason to DropMetricsFn and emit one event
   per skipped story so `in - out - sum(dropped_*) == 0` reconciles.

2. Wipeout invisibility: composeAndStoreBriefForUser only logged drop
   stats for the WINNING candidate. When every candidate composed to
   null, the log line never fired — exactly the wipeout case Sol-0
   was meant to surface. Now tracks per-candidate drops and emits an
   aggregate `outcome=wipeout` line covering all attempts.
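
For the cap-truncation fix in item 1, a minimal sketch of a filter loop
whose drop events reconcile with its input and output counts. The names
`filterWithDropAccounting` and `rejectionReason` are hypothetical, and
the event shape `{ reason }` is an assumption.

```javascript
// Sketch: every story that does not reach `out` produces exactly one onDrop event.
function filterWithDropAccounting(stories, { maxStories, rejectionReason, onDrop }) {
  const out = [];
  for (const story of stories) {
    if (out.length >= maxStories) {
      // Report the skipped story instead of breaking out silently.
      if (onDrop) onDrop({ reason: 'cap' });
      continue;
    }
    const reason = rejectionReason(story); // null when admitted
    if (reason) {
      if (onDrop) onDrop({ reason });
      continue;
    }
    out.push(story);
  }
  return out;
}
```

With this shape, `in - out - sum(dropped_*)` is zero by construction,
which is the invariant the telemetry line relies on.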

Also tightens the digest-cache-key sensitivity regex test to anchor
inside the cache-key template literal (it would otherwise match the
unrelated `chosenCandidate.sensitivity ?? 'high'` in the new log line).

PR review residuals from
docs/plans/2026-04-24-004-fix-brief-topic-adjacency-defects-plan.md
ce-code-review run 20260424-232911-37a2d5df.

* chore: ignore .context/ ce-code-review run artifacts

The ce-code-review skill writes per-run artifacts (reviewer JSON,
synthesis.md, metadata.json) under .context/compound-engineering/.
These are local-only — neither tracked nor linted.

* fix(digest): emit per-attempt filter-drop rows, not per-user

Addresses two PR #3387 review findings:

- P2: Earlier candidates that composed to null (wiped out by post-group
  filtering) had their dropStats silently discarded when a later
  candidate shipped — exactly the signal Sol-0 was meant to surface.
- P3: outcome=wipeout row was labeled with allCandidateDrops[0]
  .sensitivity, misleading when candidates within one user have
  different sensitivities.

Fix: emit one structured row per attempted candidate, tagged with that
candidate's own sensitivity and variant. Outcome is shipped|rejected.
A wipeout is now detectable as "all rows for this user are rejected
within the tick" — no aggregate-row ambiguity. Removes the
allCandidateDrops accumulator entirely.

* fix(digest): align composeBriefFromDigestStories sensitivity default to 'high'

Addresses PR #3387 review (P2): composeBriefFromDigestStories defaulted
to `?? 'all'` while buildDigest, the digestFor cache key, and the new
per-attempt log line all default to `?? 'high'`. The mismatch is
harmless in production (the live cron path pre-filters the pool) but:

- A non-prefiltered caller with undefined sensitivity would silently
  ship medium/low stories.
- Per-attempt telemetry labels the attempt as `sensitivity=high` while
  compose actually applied 'all' — operators are misled.

Aligning compose to 'high' makes the four sites agree and the telemetry
honest. Production output is byte-identical (input pool was already
'high'-filtered upstream).

Adds 3 regression tests asserting the new default: critical/high admitted,
medium/low dropped, and onDrop fires reason=severity for the dropped
levels (locks in alignment with per-attempt telemetry).

* fix(digest): align remaining sensitivity defaults to 'high'

Addresses PR #3387 review (P2 + P3): three more sites still defaulted
missing sensitivity to 'all' while compose/buildDigest/cache/log now
treat it as 'high'.

P2 — compareRules (scripts/lib/brief-compose.mjs:35-36): the rank
function used to default to 'all', placing legacy undefined-sensitivity
rules FIRST in the candidate order. Compose then applied a 'high'
filter to them, shipping a narrow brief while an explicit 'all' rule
for the same user was never tried. Aligned to 'high' so the rank
matches what compose actually applies.

P3 — enrichBriefEnvelopeWithLLM (scripts/lib/brief-llm.mjs:526):
the digest prompt and cache key still used 'all' for legacy rules,
misleading personalization ("Reader sensitivity level: all" while the
brief contains only critical/high stories) and busting the cache for
legacy vs explicit-'all' rows that should share entries.

Also aligns the @deprecated composeBriefForRule (line 164) for
consistency, since tests still import it.

3 new regression tests in tests/brief-composer-rule-dedup.test.mjs
lock in the new ranking: explicit 'all' beats undefined-sensitivity,
undefined-sensitivity ties with explicit 'high' (decided by updatedAt),
and groupEligibleRulesByUser candidate order respects the rank.

6853/6853 tests pass (was 6850 → +3).
2026-04-25 00:23:29 +04:00
Elie Habib
34dfc9a451 fix(news): ground LLM surfaces on real RSS description end-to-end (#3370)
* feat(news/parser): extract RSS/Atom description for LLM grounding (U1)

Add description field to ParsedItem, extract from the first non-empty of
description/content:encoded (RSS) or summary/content (Atom), picking the
longest after HTML-strip + entity-decode + whitespace-normalize. Clip to
400 chars. Reject empty, <40 chars after strip, or normalize-equal to the
headline — downstream consumers fall back to the cleaned headline on '',
preserving current behavior for feeds without a description.
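
The selection rules above, sketched for inputs that are already
HTML-stripped and entity-decoded. The function names are hypothetical,
and the case-insensitive comparison against the headline is an
assumption about what "normalize-equal" means.

```javascript
// Sketch only: thresholds come from the description; normalisation is assumed.
const MAX_DESCRIPTION_CHARS = 400;
const MIN_DESCRIPTION_CHARS = 40;

function normalizeText(s) {
  return s.replace(/\s+/g, ' ').trim();
}

function pickDescription(candidates, headline) {
  // Longest non-empty candidate after whitespace normalisation.
  const longest = candidates
    .filter((c) => typeof c === 'string')
    .map(normalizeText)
    .reduce((best, c) => (c.length > best.length ? c : best), '');
  if (longest.length < MIN_DESCRIPTION_CHARS) return '';
  if (longest.toLowerCase() === normalizeText(headline).toLowerCase()) return '';
  return longest.slice(0, MAX_DESCRIPTION_CHARS);
}
```

An empty return value is the signal for downstream consumers to fall
back to the cleaned headline.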

CDATA end is anchored to the closing tag so internal ]]> sequences do not
truncate the match. Preserves cached rss:feed:v1 row compatibility during
the 1h TTL bleed since the field is additive.

Part of fix: pipe RSS description end-to-end so LLM surfaces stop
hallucinating named actors (docs/plans/2026-04-24-001-...).

Covers R1, R7.

* feat(news/story-track): persist description on story:track:v1 HSET (U2)

Append description to the story:track:v1 HSET only when non-empty. Additive
— no key version bump. Old rows and rows from feeds without a description
return undefined on HGETALL, letting downstream readers fall back to the
cleaned headline (R6).

Extract buildStoryTrackHsetFields as a pure helper so the inclusion gate is
unit-testable without Redis.

Update the contract comment in cache-keys.ts so the next reader of the
schema sees description as an optional field.

Covers R2, R6.

* feat(proto): NewsItem.snippet + SummarizeArticleRequest.bodies (U3)

Add two additive proto fields so the article description can ride to every
LLM-adjacent consumer without a breaking change:

- NewsItem.snippet (field 12): RSS/Atom description, HTML-stripped,
  ≤400 chars, empty when unavailable. Wired on toProtoItem.
- SummarizeArticleRequest.bodies (field 8): optional article bodies
  paired 1:1 with headlines for prompt grounding. Empty array is today's
  headline-only behavior.

Regenerated TS client/server stubs and OpenAPI YAML/JSON via sebuf v0.11.1
(PATH=~/go/bin required — Homebrew's protoc-gen-openapiv3 is an older
pre-bundle-mode build that collides on duplicate emission).

Pre-emptive bodies:[] placeholders at the two existing SummarizeArticle
call sites in src/services/summarization.ts; U6 replaces them with real
article bodies once SummarizeArticle handler reads the field.

Covers R3, R5.

* feat(brief/digest): forward RSS description end-to-end through brief envelope (U4)

Digest accumulator reader (seed-digest-notifications.mjs::buildDigest) now
plumbs the optional `description` field off each story:track:v1 HGETALL into
the digest story object. The brief adapter (brief-compose.mjs::
digestStoryToUpstreamTopStory) prefers the real RSS description over the
cleaned headline; when the upstream row has no description (old rows in the
48h bleed, feeds that don't carry one), we fall back to the cleaned headline
so today's behavior is preserved (R6).

This is the upstream half of the description cache path. U5 lands the LLM-
side grounding + cache-prefix bump so Gemini actually sees the article body
instead of hallucinating a named actor from the headline.

Covers R4 (upstream half), R6.

* feat(brief/llm): RSS grounding + sanitisation + 4 cache prefix bumps (U5)

The actual fix for the headline-only named-actor hallucination class:
Gemini 2.5 Flash now receives the real article body as grounding context,
so it paraphrases what the article says instead of filling role-label
headlines from parametric priors ("Iran's new supreme leader" → "Ali
Khamenei" was the 2026-04-24 reproduction; with grounding, it becomes
the actual article-named actor).

Changes:

- buildStoryDescriptionPrompt interpolates a `Context: <body>` line
  between the metadata block and the "One editorial sentence" instruction
  when description is non-empty AND not normalise-equal to the headline.
  Clips to 400 chars as a second belt-and-braces after the U1 parser cap.
  No Context line → identical prompt to pre-fix (R6 preserved).

- sanitizeStoryForPrompt extended to cover `description`. Closes the
  asymmetry where whyMatters was sanitised and description wasn't —
  untrusted RSS bodies now flow through the same injection-marker
  neutraliser before prompt interpolation. generateStoryDescription wraps
  the story in sanitizeStoryForPrompt before calling the builder,
  matching generateWhyMatters.

- Four cache prefixes bumped atomically to evict pre-grounding rows:
    scripts/lib/brief-llm.mjs:
      brief:llm:description:v1 → v2  (Railway, description path)
      brief:llm:whymatters:v2 → v3   (Railway, whyMatters fallback)
    api/internal/brief-why-matters.ts:
      brief:llm:whymatters:v6 → v7                (edge, primary)
      brief:llm:whymatters:shadow:v4 → shadow:v5  (edge, shadow)
  hashBriefStory already includes description in the 6-field material
  (v5 contract) so identity naturally drifts; the prefix bump is the
  belt-and-braces that guarantees a clean cold-start on first tick.

- Tests: 8 new + 2 prefix-match updates on tests/brief-llm.test.mjs.
  Covers Context-line injection, empty/dup-of-headline rejection,
  400-char clip, sanitisation of adversarial descriptions, v2 write,
  and legacy-v1 row dark (forced cold-start).

Covers R4 + new sanitisation requirement.

* feat(news/summarize): accept bodies + bump summary cache v5→v6 (U6)

SummarizeArticle now grounds on per-headline article bodies when callers
supply them, so the dashboard "News summary" path stops hallucinating
across unrelated headlines when the upstream RSS carried context.

Three coordinated changes:

1. SummarizeArticleRequest handler reads req.bodies, sanitises each entry
   through sanitizeForPrompt (same trust treatment as geoContext — bodies
   are untrusted RSS text), clips to 400 chars, and pads to the headlines
   length so pair-wise identity is stable.

2. buildArticlePrompts accepts optional bodies and interleaves a
   `    Context: <body>` line under each numbered headline that has a
   non-empty body. Skipped in translate mode (headline[0]-only) and when
   all bodies are empty — yielding a byte-identical prompt to pre-U6
   for every current caller (R6 preserved).

3. summary-cache-key bumps CACHE_VERSION v5→v6 so the pre-grounding rows
   (produced from headline-only prompts) cold-start cleanly. Extends
   canonicalizeSummaryInputs + buildSummaryCacheKey with a pair-wise
   bodies segment `:bd<hash>`; the prefix is `:bd` rather than `:b` to
   avoid colliding with `:brief:` when pattern-matching keys. Translate
   mode is headline[0]-only and intentionally does not shift on bodies.
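
Steps 1 and 2 can be sketched together. `normalizeBodies` and
`formatHeadlines` are hypothetical names, the `1. ` numbering format is
an assumption, and `sanitize` stands in for sanitizeForPrompt.

```javascript
// Sketch only: pads bodies to the headline count, then interleaves Context lines.
const MAX_BODY_CHARS = 400;

function normalizeBodies(headlines, bodies, sanitize = (s) => s) {
  const source = Array.isArray(bodies) ? bodies : [];
  // One entry per headline, so index i always refers to headline i.
  return headlines.map((_, i) => {
    const raw = typeof source[i] === 'string' ? source[i] : '';
    return sanitize(raw).slice(0, MAX_BODY_CHARS);
  });
}

function formatHeadlines(headlines, bodies) {
  const paired = normalizeBodies(headlines, bodies);
  const lines = [];
  headlines.forEach((headline, i) => {
    lines.push(`${i + 1}. ${headline}`);
    if (paired[i].trim().length > 0) lines.push(`    Context: ${paired[i]}`);
  });
  return lines.join('\n');
}
```

When every body is empty, the output contains no Context lines at all,
which is what keeps the prompt unchanged for headline-only callers.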

Dedup reorder preserved: the handler re-pairs bodies to the deduplicated
top-5 via findIndex, so layout matches without breaking cache identity.

New tests: 7 on buildArticlePrompts (bodies interleave, partial fill,
translate-mode skip, clip, short-array tolerance), 8 on
buildSummaryCacheKey (pair-wise sort, cache-bust on body drift, translate
skip). Existing summary-cache-key assertions updated v5→v6.

Covers R3, R4.

* feat(consumers): surface RSS snippet across dashboard, email, relay, MCP + audit (U7)

Thread the RSS description from the ingestion path (U1-U5) into every
user-facing LLM-adjacent surface. Audit the notification producers so
RSS-origin and domain-origin events stay on distinct contracts.

Dashboard (proto snippet → client → panel):
- src/types/index.ts NewsItem.snippet?:string (client-side field).
- src/app/data-loader.ts proto→client mapper propagates p.snippet.
- src/components/NewsPanel.ts renders snippet as a truncated (~200 chars,
  word-boundary ellipsis) `.item-snippet` line under each headline.
- NewsPanel.currentBodies tracks per-headline bodies paired 1:1 with
  currentHeadlines; passed as options.bodies to generateSummary so the
  server-side SummarizeArticle LLM grounds on the article body.
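
A word-boundary truncation like the one NewsPanel applies to snippets might look like the sketch below. `truncateAtWord` is a hypothetical name and the exact cut rule is an assumption; the source only states "~200 chars, word-boundary ellipsis".

```typescript
// Hypothetical helper: clip to `max` chars, back up to the last space, add an ellipsis.
function truncateAtWord(text: string, max = 200): string {
  const clean = text.trim();
  if (clean.length <= max) return clean;
  const slice = clean.slice(0, max);
  const lastSpace = slice.lastIndexOf(" ");
  // With no space to back up to, fall back to a hard cut.
  const cut = lastSpace > 0 ? slice.slice(0, lastSpace) : slice;
  return `${cut}…`;
}
```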

Summary plumbing:
- src/services/summarization.ts threads bodies through SummarizeOptions
  → generateSummary → runApiChain → tryApiProvider; cache key now includes
  bodies (via U6's buildSummaryCacheKey signature).

MCP world-brief:
- api/mcp.ts pairs headlines with their RSS snippets and POSTs `bodies`
  to /api/news/v1/summarize-article so the MCP tool surface is no longer
  starved.

Email digest:
- scripts/seed-digest-notifications.mjs plain-text formatDigest appends
  a ~200-char truncated snippet line under each story; HTML formatDigestHtml
  renders a dim-grey description div between title and meta. Both gated
  on non-empty description (R6 — empty → today's behavior).

Real-time alerts:
- src/services/breaking-news-alerts.ts BreakingAlert gains optional
  description; checkBatchForBreakingAlerts reads item.snippet; dispatchAlert
  includes `description` in the /api/notify payload when present.

Notification relay:
- scripts/notification-relay.cjs formatMessage gated on
  NOTIFY_RELAY_INCLUDE_SNIPPET=1 (default off). When on, RSS-origin
  payloads render a `> <snippet>` context line under the title. When off
  or payload.description absent, output is byte-identical to pre-U7.
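
The flag gate can be sketched like this. `formatMessageSketch` and the `RelayPayload` shape are assumptions for illustration; the real formatMessage in notification-relay.cjs is not shown in this log. The env is passed in so the rule is easy to test.

```typescript
interface RelayPayload {
  title: string;
  description?: string;
}

// Hypothetical stand-in for formatMessage: the snippet renders only under the flag.
function formatMessageSketch(
  payload: RelayPayload,
  env: Record<string, string | undefined>,
): string {
  const includeSnippet = env.NOTIFY_RELAY_INCLUDE_SNIPPET === "1"; // default off
  const snippet = (payload.description ?? "").trim();
  if (!includeSnippet || snippet.length === 0) return payload.title;
  return `${payload.title}\n> ${snippet}`;
}
```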

Audit (RSS vs domain):
- tests/notification-relay-payload-audit.test.mjs enforces file-level
  @notification-source tags on every producer, rejects `description:` in
  domain-origin payload blocks, and verifies the relay codepath gates
  snippet rendering under the flag.
- Tag added to ais-relay.cjs (domain), seed-aviation.mjs (domain),
  alert-emitter.mjs (domain), breaking-news-alerts.ts (rss).

Deferred (plan explicitly flags): InsightsPanel + cluster-producer
plumbing (bodies default to [] — will unlock gradually once news:insights:v1
producer also carries primarySnippet).

Covers R5, R6.

* docs+test: grounding-path note + bump pinned CACHE_VERSION v5→v6 (U8)

Final verification for the RSS-description-end-to-end fix:

- docs/architecture.mdx — one-paragraph "News Grounding Pipeline"
  subsection tracing parser → story:track:v1.description → NewsItem.snippet
  → brief / SummarizeArticle / dashboard / email / relay / MCP, with the
  empty-description R6 fallback rule called out explicitly.
- tests/summarize-reasoning.test.mjs — Fix-4 static-analysis pin updated
  to match the v6 bump from U6. Without this update, the summary cache
  bump breaks CI's pinned-version assertion.

Final sweep (2026-04-24):
- grep -rn 'brief:llm:description:v1' → only in the U5 legacy-row test
  simulation (by design: proves the v2 bump forces cold-start).
- grep -rn 'brief:llm:whymatters:v2/v6/shadow:v4' → no live references.
- grep -rn 'summary:v5' → no references.
- CACHE_VERSION = 'v6' in src/utils/summary-cache-key.ts.
- Full tsx --test sweep across all tests/*.test.{mjs,mts}: 6747/6747 pass.
- npm run typecheck + typecheck:api: both clean.

Covers R4, R6, R7.

* fix(rss-description): address /ce:review findings before merge

14 fixes from structured code review across 13 reviewer personas.

Correctness-critical (P1 — fixes that prevent R6/U7 contract violations):
- NewsPanel signature covers currentBodies so view-mode toggles that leave
  headlines identical but bodies different now invalidate in-flight summaries.
  Without this, switching renderItems → renderClusters mid-summary let a
  grounded response arrive under a stale (now-orphaned) cache key.
- summarize-article.ts re-pairs bodies with headlines BEFORE dedup via a
  single zip-sanitize-filter-dedup pass. Previously bodies[] was indexed by
  position in light-sanitized headlines while findIndex looked up the
  full-sanitized array — any headline that sanitizeHeadlines emptied
  mispaired every subsequent body, grounding the LLM on the wrong story.
- Client skips the pre-chain cache lookup when bodies are present, since
  client builds keys from RAW bodies while server sanitizes first. The
  keys diverge on injection content, which would silently miss the
  server's authoritative cache every call.
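
The single zip-sanitize-filter-dedup pass from the second P1 fix can be illustrated as below. This is a sketch under assumptions: `sanitize` is a toy stand-in for the repo's real sanitizers, and the case-insensitive dedup key is a guess. The point is that each body travels with its headline, so an emptied headline drops its own body instead of shifting every later one.

```typescript
type Pair = { headline: string; body: string };

// Toy stand-in (assumption): the real code uses sanitizeHeadlines / sanitizeForPrompt.
const sanitize = (s: string): string => s.replace(/<[^>]*>/g, "").trim();

function pairSanitizeDedup(headlines: string[], bodies: string[] = [], limit = 5): Pair[] {
  const seen = new Set<string>();
  const out: Pair[] = [];
  headlines.forEach((raw, i) => {
    const headline = sanitize(raw);
    if (headline.length === 0) return; // emptied headline: its body is dropped with it
    const key = headline.toLowerCase();
    if (seen.has(key)) return; // duplicate headline: keep the first pair only
    seen.add(key);
    out.push({ headline, body: sanitize(bodies[i] ?? "").slice(0, 400) });
  });
  return out.slice(0, limit);
}
```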

Test + audit hardening:
- Legacy v1 eviction test now uses the real hashBriefStory(story()) suffix
  instead of a literal "somehash", so a bug where the reader still queried
  the v1 prefix at the real key would actually be caught.
- tests/summary-cache-key.test.mts adds 400-char clip identity coverage so
  the canonicalizer's clip and any downstream clip can't silently drift.
- tests/news-rss-description-extract.test.mts renames the well-formed
  CDATA test and adds a new test documenting the malformed-]]> fallback
  behavior (plain regex captures, article content survives).

Safe_auto cleanups:
- Deleted dead SNIPPET_PUSH_MAX constant in notification-relay.cjs.
- BETA-mode groq warm call now passes bodies, warming the right cache slot.
- seed-digest shares a local normalize-equality helper for description !=
  headline comparison, matching the parser's contract.
- Pair-wise sort in summary-cache-key tie-breaks on body so duplicate
  headlines produce stable order across runs.
- buildSummaryCacheKey gained JSDoc documenting the client/server contract
  and the bodies parameter semantics.
- MCP get_world_brief tool description now mentions RSS article-body
  grounding so calling agents see the current contract.
- _shared.ts `opts.bodies![i]!` double-bang replaced with `?? ''`.
- extractRawTagBody regexes cached in module-level Map, mirroring the
  existing TAG_REGEX_CACHE pattern.
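
The pair-wise sort with a body tie-break, mentioned in the cleanups above, could be written as in this sketch. `sortPairs` is a hypothetical name and plain code-unit comparison is an assumption; the real canonicalizer may normalise differently.

```typescript
type Pair = { headline: string; body: string };

// Sort by headline, then by body, so duplicate headlines get a stable order
// no matter how the input was arranged.
function sortPairs(pairs: Pair[]): Pair[] {
  return [...pairs].sort((a, b) => {
    if (a.headline !== b.headline) return a.headline < b.headline ? -1 : 1;
    if (a.body !== b.body) return a.body < b.body ? -1 : 1;
    return 0;
  });
}
```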

Deferred to follow-up (tracked for PR description / separate issue):
- Promote shared MAX_BODY constant across the 5 clip sites
- Promote shared truncateForDisplay helper across 4 render sites
- Collapse NewsPanel.{currentHeadlines, currentBodies} → Array<{title, snippet}>
- Promote sanitizeStoryForPrompt to shared/brief-llm-core.js
- Split list-feed-digest.ts parser helpers into sibling -utils.ts
- Strengthen audit test: forward-sweep + behavioral gate test

Tests: 6749/6749 pass. Typecheck clean on both configs.

* fix(summarization): thread bodies through browser T5 path (Codex #2)

Addresses the second of two Codex-raised findings on PR #3370:

The PR threaded bodies through the server-side API provider chain
(Ollama → Groq → OpenRouter → /api/news/v1/summarize-article) but the
local browser T5 path at tryBrowserT5 was still summarising from
headlines alone. In BETA_MODE that ungrounded path runs BEFORE the
grounded server providers; in normal mode it remains the last
fallback. Whenever T5-small won, the dashboard summary surface
regressed to the headline-only path — the exact hallucination class
this PR exists to eliminate.

Fix: tryBrowserT5 accepts an optional `bodies` parameter and
interleaves each body with its paired headline via a `headline —
body` separator in the combined text (clipped to 200 chars per body
to stay within T5-small's ~512-token context window). All three call
sites (BETA warm, BETA cold, normal-mode fallback) now pass the
bodies threaded down from generateSummary options.bodies.

When bodies is empty/omitted, the combined text is byte-identical to
pre-fix (R6 preserved).
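
A rough sketch of how the combined T5 input might be built. `combineForT5` and the `". "` joiner between headlines are assumptions; the dash separator and the 200-char body clip are taken from the description above.

```typescript
const SEPARATOR = " \u2014 "; // the "headline (dash) body" separator described above

// Hypothetical helper: pair each headline with its clipped body when one exists.
function combineForT5(headlines: string[], bodies: string[] = []): string {
  return headlines
    .map((headline, i) => {
      const body = (bodies[i] ?? "").trim().slice(0, 200);
      return body.length > 0 ? `${headline}${SEPARATOR}${body}` : headline;
    })
    .join(". ");
}
```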

On Codex finding #1 (story:track:v1 additive-only HSET keeps a body
from an earlier mention of the same normalized title), declining to
change. The current rule — "if this mention has a body, overwrite;
otherwise leave the prior body alone" — is defensible: a body from
mention A is not falsified by mention B being body-less (a wire
reprint doesn't invalidate the original source's body). A feed that
publishes a corrected headline creates a new normalized-title hash,
so no stale body carries forward. The failure window is narrow (live
story evolving while keeping the same title through hours of
body-less wire reprints) and the 7-day STORY_TTL is the backstop.
Opening a follow-up issue to revisit semantics if real-world evidence
surfaces a stale-grounding case.

* fix(story-track): description always-written to overwrite stale bodies (Codex #1)

Revisiting Codex finding #1 on PR #3370 after re-review. The previous
response declined the fix with reasoning; on reflection the argument
was over-defending the current behavior.

Problem: buildStoryTrackHsetFields previously wrote `description` only
when non-empty. Because story:track:v1 rows are collapsed by
normalized-title hash, an earlier mention's body would persist for up
to STORY_TTL (7 days) on subsequent body-less mentions of the same
story. Consumers reading `track.description` via HGETALL could not
distinguish "this mention's body" from "some mention's body from the
last week," silently grounding brief / whyMatters / SummarizeArticle
LLMs on text the current mention never supplied. That violates the
grounding contract advertised to every downstream surface in this PR.

Fix: HSET `description` unconditionally on every mention — empty
string when the current item has no body, real body when it does. An
empty value overwrites any prior mention's body so the row is always
authoritative for the current cycle. Consumers continue to treat
empty description as "fall back to cleaned headline" (R6 preserved).
The 7-day STORY_TTL and normalized-title hash semantics are unchanged.
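
The always-write rule can be sketched with an in-memory row standing in for the Redis hash. `buildDescriptionField` and `applyHset` are hypothetical names; the real buildStoryTrackHsetFields writes more fields than shown here.

```typescript
interface FeedItem {
  title: string;
  description?: string;
}

// Always emit `description`, using "" when the current mention has no body.
function buildDescriptionField(item: FeedItem): Record<string, string> {
  return { description: (item.description ?? "").trim() };
}

// In-memory stand-in for HSET: later fields overwrite earlier ones.
function applyHset(row: Record<string, string>, fields: Record<string, string>): Record<string, string> {
  return { ...row, ...fields };
}
```

Replaying T0 body, T1 empty, T2 new body through `applyHset` leaves the row holding "", not the T0 body, after T1.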

Trade-off accepted: a valid body from Feed A (NYT) is wiped when Feed
B (AP body-less wire reprint) arrives for the same normalized title,
even though Feed A's body is factually correct. Rationale: the
alternative — keeping Feed A's body indefinitely — means the user
sees Feed A's body attributed (by proximity) to an AP mention at a
later timestamp, which is at minimum misleading and at worst carries
retracted/corrected details. Honest absence beats unlabeled presence.

Tests: new stale-body overwrite sequence test (T0 body → T1 empty →
T2 new body), existing "writes description when non-empty" preserved,
existing "omits when empty" inverted to "writes empty, overwriting."
cache-keys.ts contract comment updated to mark description as
always-written rather than optional.
2026-04-24 16:25:14 +04:00
Elie Habib
8ea4c8f163 feat(digest-dedup): replayable per-story input log (opt-in, no behaviour change) (#3330)
* feat(digest-dedup): replayable per-story input log (opt-in, no behaviour change)

Ship the measurement layer before picking any recall-lift strategy.

Why: the current dedup path embeds titles only, so brief-wire headlines
that share a real event but drop the geographic anchor (e.g. "Alleged
Coup: defendant arrives in court" vs "Trial opens after Nigeria charges
six over 2025 coup plot") can slip past the 0.60 cosine threshold. To
tune recall without regressing precision we need a replayable per-tick
dataset — one record per story with the exact fields any downstream
candidate (title+slug, LLM-canonicalise, text-embedding-3-large, cross-
encoder re-rank, etc.) would need to score.

This PR ships ONLY the log. Zero behaviour change:
  - Opt-in via DIGEST_DEDUP_REPLAY_LOG=1 (default OFF).
  - Writer is best-effort: all errors swallowed + warned, never affects
    digest delivery. No throw path.
  - Records include hash, originalIndex, isRep, clusterId, raw +
    normalised title, link, severity/score/mentions/phase/sources,
    embeddingCacheKey, hasEmbedding sidecar flag, and the tick's config
    snapshot (mode, clustering, cosineThreshold, topicThreshold, veto).
  - clusterId derives from rep.mergedHashes (already set by
    materializeCluster) so the orchestrator is untouched.
  - Storage: Upstash list keyed by {variant}:{lang}:{sensitivity}:{date}
    with 30-day EXPIRE. Date suffix caps per-key growth; retention
    covers the labelling cadence + cross-candidate comparison window.
  - Env flag is '1'-only (fail-closed on typos, same pattern as
    DIGEST_DEDUP_MODE).
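
The '1'-only flag and the best-effort writer contract can be sketched as below. `writeReplayLogSketch`, the `WriteResult` shape, and the injected `push` callback (standing in for the Upstash RPUSH + EXPIRE pipeline) are assumptions, not the module's real API.

```typescript
type Env = Record<string, string | undefined>;
type WriteResult = { wrote: number; error?: string };

// '1'-only: "true", "yes", " 1" and typos all leave the log off (fail-closed).
function replayLogEnabled(env: Env): boolean {
  return env.DIGEST_DEDUP_REPLAY_LOG === "1";
}

// `push` is a hypothetical stand-in for the storage pipeline.
async function writeReplayLogSketch(
  env: Env,
  records: unknown[],
  push: (lines: string[]) => Promise<void>,
): Promise<WriteResult> {
  if (!replayLogEnabled(env) || records.length === 0) return { wrote: 0 };
  try {
    await push(records.map((r) => JSON.stringify(r)));
    return { wrote: records.length };
  } catch (err) {
    // Best-effort: report the failure, never rethrow into digest delivery.
    return { wrote: 0, error: err instanceof Error ? err.message : String(err) };
  }
}
```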

Activation path (post-merge): flip DIGEST_DEDUP_REPLAY_LOG=1 on the
seed-digest-notifications Railway service. Watch one cron tick for the
RPUSH + EXPIRE pair (or a single warn line if creds/upstream flake),
then leave running for at least one week to accumulate calibration data.

Tests: 21 unit tests covering flag parsing, key shape + sanitisation,
record field correctness (isRep, clusterId, embeddingCacheKey,
hasEmbedding, tickConfig), pipeline null/throw handling, malformed
input. Existing 77 dedup tests unchanged and still green.

* fix(digest-dedup): capture topicGroupingEnabled in replay tickConfig

Review catch (PR #3330): the tickConfig snapshot omitted
topicGroupingEnabled even though readOrchestratorConfig returns it and
the digest's post-dedup topic ordering gates on it. A tick run with
DIGEST_DEDUP_TOPIC_GROUPING=0 serialised identically to a default
tick, making those runs non-replayable for the calibration work this
log is meant to enable.

Add topicGroupingEnabled to the recorded tickConfig. One-line schema
fix + regression test asserting topic-grouping-off ticks serialise
distinctly from default.

22/22 tests pass.

* fix(digest-dedup): await replay-log write to survive explicit process.exit

Review catch (PR #3330): the fire-and-forget `void writeReplayLog(...)`
call could be dropped on the explicit-exit paths — the brief-compose
failure gate at line 1539 and main().catch at line 1545 both call
process.exit(1). Unlike natural exit, process.exit does not drain
in-flight promises, so the last N ticks' replay records could be
silently lost on runs where measurement fidelity matters most.

Fix: await the writeReplayLog call. Safe because:
  - writeReplayLog returns synchronously when the flag is off
    (replayLogEnabled check is the first thing it does)
  - It has a top-level try/catch that always returns a result object
  - The Upstash pipeline call has a 10s timeout ceiling
  - buildDigest already awaits many Upstash calls (dedup, compose,
    render) so one more is not a hot-path concern

Comment block added above the call explains why the await is
deliberate — so a future refactor doesn't revert it to void thinking
it's a leftover.

No test change: existing writeReplayLog unit tests already cover the
disabled / empty / success / error paths. The fix is a single-keyword
change in a caller that was already guaranteed-safe by the callee's
contract.

* refactor(digest-dedup): address Greptile P2 review comments on replay log

Three non-blocking polish items from the automated review, bundled
because they all touch the same new module and none change behaviour.

1. tsMs captured BEFORE deduplicateStories (seed-digest-notifications.mjs).
   Previously sampled after dedup returned, so briefTickId reflected
   dedup-completion time rather than tick-start. For downstream readers
   the natural reading of "briefTickId" is when the tick began
   processing; moved the Date.now() call to match that expectation.
   Drift is maybe 100ms-2s on cold-cache embed calls — small, but
   moving it is free.

2. buildReplayLogKey emptiness check now strips ':' and '-' in addition
   to '_'. A pathological ruleId of ':::' previously passed through
   verbatim, producing keys like `digest:replay-log:v1::::2026-04-23`
   that confuse redis-cli's namespace tooling (SCAN / KEYS / tab
   completion). The new guard falls back to "unknown" on any input
   that's all separators. Added a regression test covering the
   ':::' / '---' / '___' / mixed cases.

3. tickConfig is now a per-record shallow copy instead of a shared
   reference. Storage is unaffected (writeReplayLog serialises each
   record via JSON.stringify independently) but an in-memory consumer
   that mutated one record's tickConfig for experimentation would have
   silently affected all other records in the same batch. Added a
   regression test asserting mutation doesn't leak across records.
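
Items 2 and 3 can be illustrated together. This is a sketch only: the key layout, `safeKeyPart`, `attachTickConfig`, and the two-field `TickConfig` are assumptions made for the example, not the real module's shapes.

```typescript
// All-separator or empty input (":::", "---", "___", ":-_") falls back to "unknown".
function safeKeyPart(raw: string): string {
  const trimmed = raw.trim();
  return trimmed.replace(/[:_-]/g, "").length === 0 ? "unknown" : trimmed;
}

// Assumed layout: prefix, rule segment, date.
function buildReplayLogKeySketch(ruleId: string, date: string): string {
  return `digest:replay-log:v1:${safeKeyPart(ruleId)}:${date}`;
}

interface TickConfig {
  mode: string;
  cosineThreshold: number;
}

// Each record gets its own shallow copy, so mutating one cannot leak into another.
function attachTickConfig<T extends object>(
  stories: T[],
  tickConfig: TickConfig,
): Array<T & { tickConfig: TickConfig }> {
  return stories.map((s) => ({ ...s, tickConfig: { ...tickConfig } }));
}
```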

Tests: 24/24 pass (22 prior + 2 new regression). Typecheck + lint clean.
2026-04-23 11:50:19 +04:00
Elie Habib
29306008e4 fix(email): route Intelligence Brief off the alerts@ mailbox (#3321)
* fix(email): route Intelligence Brief off the alerts@ mailbox

The daily "WorldMonitor Intelligence Brief" email was shipping from
`alerts@worldmonitor.app`. If the Railway env override dropped the
`Name <…>` wrapper, Gmail/Outlook fell back to rendering the local-part
("alerts" / "alert") as the sender name.
Recipients saw a scary-looking "alert" in their inbox for what is
actually a curated editorial read.

Split the sender so editorial mail can't share the `alerts@` mailbox
with incident pushes:

- New env var `RESEND_FROM_BRIEF` (default `WorldMonitor Brief
  <brief@worldmonitor.app>`) consumed by seed-digest-notifications.mjs.
- Falls back to `RESEND_FROM_EMAIL`, then to the built-in default, so
  existing deploys keep working and the rollout is a single Railway
  env flip on the digest service.
- notification-relay.cjs (realtime push alerts) intentionally keeps
  `RESEND_FROM_EMAIL` / `alerts@` — accurate for that path.
- .env.example documents the display-name rule so the bare-address
  trap can't re-introduce the bug.

Rollout: set `RESEND_FROM_BRIEF=WorldMonitor Brief <brief@worldmonitor.app>`
on the `seed-digest-notifications` Railway service. Domain-level Resend
verification already covers the new local-part; no DNS change needed.

* fix(email): runtime normalize sender to prevent bare-address regression

PR review feedback from codex:

  > P2 — RESEND_FROM_BRIEF is consumed verbatim, so an operator can
  > still set brief@worldmonitor.app without a display name and
  > recreate the same Gmail/Outlook rendering bug for the daily brief.
  > Today that protection is only documentation in .env.example, not
  > runtime enforcement.

Add a small shared helper `scripts/lib/resend-from.cjs` that coerces a
bare email address into a "Name <addr>" wrapper with a loud warning
log, and wire it into the digest path.

- Bare-address input (e.g. `brief@worldmonitor.app`) is rewritten to
  `WorldMonitor Brief <brief@worldmonitor.app>` so Gmail/Outlook stop
  falling back to the local-part as the display name.
- Coercion emits a single `console.warn` line per boot so operators
  see the signal in Railway logs and can fix the underlying env.
- Fail-safe (not fail-closed) — a misconfigured env does NOT take the
  cron down.
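
A minimal sketch of the coercion rule, assuming a simplified signature. `normalizeSenderSketch` is a hypothetical name, and the real helper in scripts/lib/resend-from.cjs may differ, for example by warning once per boot rather than on every call as this sketch does.

```typescript
// Returns null for empty input so the caller can fall through to its own default.
function normalizeSenderSketch(
  raw: string | null | undefined,
  defaultName: string,
  warn: (msg: string) => void = console.warn,
): string | null {
  const value = (raw ?? "").trim();
  if (value.length === 0) return null;
  // Already in "Name <addr>" form: pass through (trimmed).
  if (value.includes("<") && value.includes(">")) return value;
  const wrapped = `${defaultName} <${value}>`;
  warn(`[resend] bare sender "${value}" coerced to "${wrapped}"`);
  return wrapped;
}
```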

Also resolves the P3 doc-vs-runtime divergence by reverting
.env.example's RESEND_FROM_EMAIL default from "WorldMonitor Alerts
<...>" back to "WorldMonitor <...>" to match the existing
notification-relay.cjs runtime default. The realtime-alert path will
get the same normalizer treatment in a follow-up PR that cohesively
touches notification-relay.cjs + Dockerfile.relay.

tests: 7 new cases in tests/resend-sender-normalize.test.mjs covering
empty/null/whitespace input, wrapped passthrough, trim, bare-address
coercion, warning emission, no-warning on wrapped, console.warn default
sink. Runs under `npm run test:data`.
2026-04-23 08:51:27 +04:00
Elie Habib
e878baec52 fix(digest): DIGEST_ONLY_USER self-expiring (mandatory until= suffix, 48h cap) (#3271)
* fix(digest): DIGEST_ONLY_USER self-expiring (mandatory until= suffix, 48h cap)

Review finding on PR #3255: DIGEST_ONLY_USER was a sticky production
footgun. If an operator forgot to unset after a one-off validation,
the cron silently filtered every other user out indefinitely while
still completing normally (exit 0) — prolonged partial outage with
"green" runs.

Fix: mandatory `|until=<ISO8601>` suffix within 48h of now. Otherwise
the flag is IGNORED with a loud warn, fan-out proceeds normally.
Active filter emits a structured console.warn at run start listing
expiry + remaining minutes.

Valid:
  DIGEST_ONLY_USER=user_xxx|until=2026-04-22T18:00Z

Rejected (→ loud warn, normal fan-out):
- Legacy bare `user_xxx` (missing required suffix)
- Unparseable ISO
- Expiry > 48h (forever-test mistake)
- Expiry in past (auto-disable)

Parser extracted to `scripts/lib/digest-only-user.mjs` (testable
without importing seed-digest-notifications.mjs which has no isMain
guard).

Tests: 17 cases covering unset / reject / active branches, ISO
variants, boundaries, and the 48h cap. 6066 total pass. typecheck × 2
clean.

Breaking change on the flag's format, but it shipped 2h before this
finding with no prod usage — tightening now is cheaper than after
an incident.

* chore(digest): address /ce:review P2s on DIGEST_ONLY_USER parser

Two style fixes flagged by Greptile on PR #3271:

1. Misleading multi-pipe error message.
   `user_xxx|until=<iso>|extra` returned "missing mandatory suffix",
   which points the operator toward adding a suffix that is already
   present (confused operator might try `user_xxx|until=...|until=...`).
   Now distinguishes parts.length===1 ("missing suffix") from >2
   ("expected exactly one '|' separator, got N").

2. Date.parse is lenient — accepts RFC 2822, locale strings, "April 22".
   The documented contract is strict ISO 8601; the 48h cap catches
   accidental-valid dates but the documentation lied. Added a regex
   guard up-front that enforces the ISO 8601 shape
   (YYYY-MM-DD optionally followed by time + TZ). Rejects the 6
   Date-parseable-but-not-ISO fixtures before Date.parse runs.
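
Putting the rules from both commits together, the parser could look roughly like this. It is a sketch, not the contents of scripts/lib/digest-only-user.mjs: the result shape, the regex, the reason strings, and the choice to count separators in the multi-pipe message are all assumptions. `nowMs` is injected so the 48h window is testable.

```typescript
type ParseResult =
  | { kind: "unset" }
  | { kind: "reject"; reason: string }
  | { kind: "active"; userId: string; untilMs: number };

// YYYY-MM-DD, optionally followed by a time with an explicit zone (assumed shape).
const ISO_SHAPE = /^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?$/;
const MAX_WINDOW_MS = 48 * 60 * 60 * 1000;

function parseDigestOnlyUser(raw: string | undefined, nowMs: number): ParseResult {
  const value = (raw ?? "").trim();
  if (value === "") return { kind: "unset" };
  const parts = value.split("|");
  if (parts.length === 1) return { kind: "reject", reason: "missing mandatory |until=<ISO8601> suffix" };
  if (parts.length > 2) {
    return { kind: "reject", reason: `expected exactly one '|' separator, got ${parts.length - 1}` };
  }
  const userId = (parts[0] ?? "").trim();
  const suffix = (parts[1] ?? "").trim();
  if (userId === "" || !suffix.startsWith("until=")) {
    return { kind: "reject", reason: "expected <userId>|until=<ISO8601>" };
  }
  const iso = suffix.slice("until=".length);
  // Shape check first: Date.parse alone also accepts RFC 2822 and locale strings.
  if (!ISO_SHAPE.test(iso)) return { kind: "reject", reason: "until= is not ISO 8601" };
  const untilMs = Date.parse(iso);
  if (Number.isNaN(untilMs)) return { kind: "reject", reason: "until= is not a real date" };
  if (untilMs <= nowMs) return { kind: "reject", reason: "expiry is in the past" };
  if (untilMs - nowMs > MAX_WINDOW_MS) return { kind: "reject", reason: "expiry is more than 48h away" };
  return { kind: "active", userId, untilMs };
}
```

Every reject branch is meant to map to "warn loudly and fan out normally", so a stale or mistyped flag cannot silently filter users.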

Both regressions pinned in tests/digest-only-user.test.mjs (18 pass,
was 17). typecheck × 2 clean.
2026-04-21 22:36:30 +04:00
Elie Habib
ec35cf4158 feat(brief): analyst prompt v2 — multi-sentence, grounded, story description (#3269)
* feat(brief): analyst prompt v2 — multi-sentence, grounded, includes story description

Shadow-diff of 12 prod stories on 2026-04-21 showed v1 analyst output
indistinguishable from legacy Gemini: identical single-sentence
abstraction ("destabilize / systemic / sovereign risk repricing") with
no named actors, metrics, or dates — in several cases Gemini was MORE
specific.

Root cause: the 18–30 word cap squeezed the context-specific details out.

v2 loosens three dials at once so we can settle the A/B:

1. New system prompt WHY_MATTERS_ANALYST_SYSTEM_V2 — 2–3 sentences,
   40–70 words, implicit SITUATION→ANALYSIS→(optional) WATCH arc,
   MUST cite one specific named actor / metric / date / place from
   the context. Analyst path only; gemini path stays on v1.

2. New parser parseWhyMattersV2 — accepts 100–500 chars, rejects
   preamble boilerplate + leaked section labels + markdown.

3. Story description plumbed through — endpoint body accepts optional
   story.description (≤ 1000 chars, body cap bumped 4 KB → 8 KB).
   Cron forwards it when upstream has one (skipped when it equals the
   headline — no new signal).
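
A loose sketch of the kind of checks item 2 describes. Only the 100–500 character bounds come from the text above; the three regexes are invented examples of "preamble boilerplate", "leaked section labels", and "markdown", and the real parseWhyMattersV2 very likely uses different patterns.

```typescript
// All three patterns are illustrative assumptions, not the repo's real rules.
const PREAMBLE = /^(here is|here's|sure[,!])/i;
const SECTION_LABEL = /\b(SITUATION|ANALYSIS|WATCH)\s*:/;
const MARKDOWN = /(\*\*|^#{1,6}\s|^\s*[-*]\s|`)/m;

function parseWhyMattersV2Sketch(raw: string): string | null {
  const text = raw.trim();
  if (text.length < 100 || text.length > 500) return null;
  if (PREAMBLE.test(text) || SECTION_LABEL.test(text) || MARKDOWN.test(text)) return null;
  return text;
}
```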

Cache + shadow bumped v3 → v4 / v1 → v2 so fresh output lands on the
first post-deploy cron tick. maxTokens 180 → 260 for ~3× output length.

If shadow-diff 24h after deploy still shows no delta vs gemini, the kill
switch is BRIEF_WHY_MATTERS_PRIMARY=gemini on Vercel (instant, no redeploy).

Tests: 6059 pass (6022 prior + 37 new). typecheck × 2 clean.

* fix(brief): stop truncating v2 multi-sentence output + description in cache hash

Two P1s caught in PR #3269 review.

P1a — cron reparsed endpoint output with v1 single-sentence parser,
silently dropping sentences 2+3 of v2 analyst output. The endpoint had
ALREADY validated the string (parseWhyMattersV2 for analyst path;
parseWhyMatters for gemini). Re-parsing with v1 took only the first
sentence, which is the exact regression #3269 was meant to fix.

Fix: trust the endpoint. Replace re-parse with bounds check (30–500
chars) + stub-echo reject. Added regression test asserting multi-
sentence output reaches the envelope unchanged.

P1b — `story.description` flowed into the analyst prompt but NOT into
the cache hash. Two requests with identical core fields but different
descriptions collided on one cache slot → second caller got prose
grounded in the FIRST caller's description.

Fix: add `description` as the 6th field of `hashBriefStory`. Bump
endpoint cache v4→v5 and shadow v2→v3 so buggy 5-field entries are
dropped. Updated the parity sentinel in brief-llm-core.test.mjs to
match 6-field semantics. Added regression tests covering different-
descriptions-differ and present-vs-absent-differ.
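
The P1b collision and its fix can be shown with a toy hash. Everything here is illustrative: `hashStorySketch`, the `||` joiner, and the FNV-1a stand-in are assumptions (the real hashBriefStory is not shown in this log and presumably uses a stronger digest). The only point is that `description` must be part of the hashed material.

```typescript
interface BriefStory {
  headline: string;
  source: string;
  threatLevel: string;
  category: string;
  country: string;
  description?: string;
}

// FNV-1a (32-bit) as a dependency-free stand-in for the real hash function.
function fnv1a(input: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// Six fields: the five core fields plus description as the sixth.
function hashStorySketch(s: BriefStory): string {
  const material = [s.headline, s.source, s.threatLevel, s.category, s.country, s.description ?? ""];
  return fnv1a(material.join("||"));
}
```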

Tests: 6083 pass. typecheck × 2 clean.
2026-04-21 22:25:54 +04:00
Elie Habib
048bb8bb52 fix(brief): unblock whyMatters analyst endpoint (middleware 403) + DIGEST_ONLY_USER filter (#3255)
* fix(brief): unblock whyMatters analyst endpoint + add DIGEST_ONLY_USER filter

Three changes, all operational for PR #3248's brief-why-matters feature.

1. middleware.ts PUBLIC_API_PATHS allowlist
Railway logs post-#3248 merge showed every cron call to
/api/internal/brief-why-matters returning 403 — middleware's "short
UA" guard (~L183) rejects Node undici's default UA before the
endpoint's own Bearer-auth runs. The feature never executed in prod;
three-layer fallback silently shipped legacy Gemini output. Same
class as /api/seed-contract-probe (2026-04-15). Endpoint still
carries its own subtle-crypto HMAC auth, so bypassing the UA gate
is safe.

2. Explicit UA on callAnalystWhyMatters fetch
Defense-in-depth. Explicit 'worldmonitor-digest-notifications/1.0'
keeps the endpoint reachable if PUBLIC_API_PATHS is ever refactored,
and makes cron traffic distinguishable from ops curl in logs.

3. DIGEST_ONLY_USER=user_xxx filter
Operator single-user test flag. Set on Railway to run compose + send
for one user on the next tick (then unset) — validates new features
end-to-end without fanning out. Empty/unset = normal fan-out. Applied
right after rule fetch so both compose and dispatch paths respect it.

Regression tests: 15 new cases in tests/middleware-bot-gate.test.mts
pin every PUBLIC_API_PATHS entry against 3 triggers (empty/short/curl
UA) plus a negative sibling-path suite so a future prefix-match
refactor can't silently unblock /api/internal/.
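
The ordering that matters here, an exact-match allowlist checked before the UA guards, can be sketched as follows. The `BOT_UA` pattern, the 10-character threshold, and `gateSketch` are hypothetical; the real middleware.ts guards are not shown in this log.

```typescript
// Exact-match set: sibling paths under /api/internal/ are NOT covered.
const PUBLIC_API_PATHS = new Set<string>(["/api/internal/brief-why-matters"]);
const BOT_UA = /curl|wget|python-requests/i; // hypothetical pattern
const MIN_UA_LENGTH = 10; // hypothetical threshold for the "short UA" guard

function gateSketch(path: string, userAgent: string): 200 | 403 {
  // Allowlisted endpoints carry their own auth, so they skip the UA guards.
  if (PUBLIC_API_PATHS.has(path)) return 200;
  if (userAgent.length < MIN_UA_LENGTH || BOT_UA.test(userAgent)) return 403;
  return 200;
}
```

Using `Set.has` rather than a prefix test is what keeps `/api/internal/` as a whole behind the gate.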

Tests: 6043 pass. typecheck + typecheck:api clean. biome: pre-existing
main() complexity warning bumped 74→78 by the filter block (unchanged
in character from pre-PR).

* test(middleware): expand sibling-path negatives to cover all 3 trigger UAs

Greptile flagged: `SIBLING_PATHS` was only tested with `EMPTY_UA`. Under
the current middleware chain this is sufficient (sibling paths hit the
short-UA OR BOT_UA 403 regardless), but it doesn't pin *which* guard
fires. A future refactor that moves `PUBLIC_API_PATHS.has(path)` later
in the chain could let a curl or undici UA pass on a sibling path
without this suite failing.

Fix: iterate the 3 sibling paths against all 3 trigger UAs (empty,
short/undici, curl). Every combination must still 403 regardless of
which guard catches it. 6 new test cases.

Tests: 35 pass in the middleware-bot-gate suite (was 29).
2026-04-21 19:41:58 +04:00
Elie Habib
2f19d96357 feat(brief): route whyMatters through internal analyst-context endpoint (#3248)
* feat(brief): route whyMatters through internal analyst-context endpoint

The brief's "why this is important" callout currently calls Gemini on
only {headline, source, threatLevel, category, country} with no live
state. The LLM can't know whether a ceasefire is on day 2 or day 50,
that IMF flagged >90% gas dependency in UAE/Qatar/Bahrain, or what
today's forecasts look like. Output is generic prose instead of the
situational analysis WMAnalyst produces when given live context.

This PR adds an internal Vercel edge endpoint that reuses a trimmed
variant of the analyst context (country-brief, risk scores, top-3
forecasts, macro signals, market data — no GDELT, no digest-search)
and ships it through a one-sentence LLM call with the existing
WHY_MATTERS_SYSTEM prompt. The endpoint owns its own Upstash cache
(v3 prefix, 6h TTL), supports a shadow mode that runs both paths in
parallel for offline diffing, and is auth'd via RELAY_SHARED_SECRET.

Three-layer graceful degradation (endpoint → legacy Gemini-direct →
stub) keeps the brief shipping on any failure.

Env knobs:
- BRIEF_WHY_MATTERS_PRIMARY=analyst|gemini (default: analyst; typo → gemini)
- BRIEF_WHY_MATTERS_SHADOW=0|1 (default: 1; only '0' disables)
- BRIEF_WHY_MATTERS_SHADOW_SAMPLE_PCT=0..100 (default: 100)
- BRIEF_WHY_MATTERS_ENDPOINT_URL (Railway, optional override)
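
The three asymmetric defaults above are easy to get wrong, so here is a small sketch of one way to read them. `readWhyMattersConfig` is a hypothetical name; treating an empty string as unset and clamping out-of-range percentages are assumptions.

```typescript
type Env = Record<string, string | undefined>;

interface WhyMattersConfig {
  primary: "analyst" | "gemini";
  shadow: boolean;
  samplePct: number;
}

function readWhyMattersConfig(env: Env): WhyMattersConfig {
  // Unset means analyst; any unrecognised value (typo) falls back to gemini.
  const rawPrimary = (env.BRIEF_WHY_MATTERS_PRIMARY ?? "").trim();
  const primary = rawPrimary === "" || rawPrimary === "analyst" ? "analyst" : "gemini";
  // Shadow is on unless the value is exactly '0'.
  const shadow = env.BRIEF_WHY_MATTERS_SHADOW !== "0";
  const rawPct = (env.BRIEF_WHY_MATTERS_SHADOW_SAMPLE_PCT ?? "").trim();
  const parsed = rawPct === "" ? 100 : Number(rawPct);
  const samplePct = Number.isFinite(parsed) ? Math.min(100, Math.max(0, parsed)) : 100;
  return { primary, shadow, samplePct };
}
```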

Cache keys:
- brief:llm:whymatters:v3:{hash16} — envelope {whyMatters, producedBy,
  at}, 6h TTL. Endpoint-owned.
- brief:llm:whymatters:shadow:v1:{hash16} — {analyst, gemini, chosen,
  at}, 7d TTL. Fire-and-forget.
- brief:llm:whymatters:v2:{hash16} — legacy. Cron's fallback path
  still reads/writes during the rollout window; expires in ≤24h.

Tests: 6022 pass (existing 5915 + 12 core + 36 endpoint + misc).
typecheck + typecheck:api + biome on changed files clean.

Plan (Codex-approved after 4 rounds):
docs/plans/2026-04-21-001-feat-brief-why-matters-analyst-endpoint-plan.md

* fix(brief): address /ce:review round 1 findings on PR #3248

Fixes 5 findings from multi-agent review, 2 of them P1:

- #241 P1: `.gitignore !api/internal/**` was too broad — it re-included
  `.env`, `.env.local`, and any future secret file dropped into that
  directory. Narrowed to explicit source extensions (`*.ts`, `*.js`,
  `*.mjs`) so parent `.env` / secrets rules stay in effect inside
  api/internal/.

- #242 P1: `Dockerfile.digest-notifications` did not COPY
  `shared/brief-llm-core.js` + `.d.ts`. Cron would have crashed at
  container start with ERR_MODULE_NOT_FOUND. Added alongside
  brief-envelope + brief-filter COPY lines.

- #243 P2: Cron dropped the endpoint's source/producedBy ground-truth
  signal, violating PR #3247's own round-3 memory
  (feedback_gate_on_ground_truth_not_configured_state.md). Added
  structured log at the call site: `[brief-llm] whyMatters source=<src>
  producedBy=<pb> hash=<h>`. Endpoint response now includes `hash` so
  log + shadow-record pairs can be cross-referenced.

- #244 P2: Defense-in-depth prompt-injection hardening. Story fields
  flowed verbatim into both LLM prompts, bypassing the repo's
  sanitizeForPrompt convention. Added sanitizeStoryFields helper and
  applied in both analyst and gemini paths.

- #245 P2: Removed redundant `validate` option from callLlmReasoning.
  With only openrouter configured in prod, a parse-reject walked the
  provider chain, then fell through to the other path (same provider),
  then the cron's own fallback (same model) — 3x billing on one reject.
  Post-call parseWhyMatters check already handles rejection cleanly.

Deferred to P3 follow-ups (todos 246-248): singleflight, v2 sunset,
misc polish (country-normalize LOC, JSDoc pruning, shadow waitUntil,
auto-sync mirror, context-assembly caching).

Tests: 6022 pass. typecheck + typecheck:api clean.

* fix(brief-why-matters): ctx.waitUntil for shadow write + sanitize legacy fallback

Two P2 findings on PR #3248:

1. Shadow record was fire-and-forget without ctx.waitUntil on an Edge
   function. Vercel can terminate the isolate after response return,
   so the background redisPipeline write completes unreliably — i.e.
   the rollout-validation signal the shadow keys were supposed to
   provide was flaky in production.

   Fix: accept an optional EdgeContext 2nd arg. Build the shadow
   promise up front (so it starts executing immediately) then register
   it with ctx.waitUntil when present. Falls back to plain unawaited
   execution when ctx is absent (local harness / tests).

2. scripts/lib/brief-llm.mjs legacy fallback path called
   buildWhyMattersPrompt(story) on raw fields with no sanitization.
   The analyst endpoint sanitizes before its own prompt build, but
   the fallback is exactly what runs when the endpoint misses /
   errors — so hostile headlines / sources reached the LLM verbatim
   on that path.

   Fix: local sanitizeStoryForPrompt wrapper imports sanitizeForPrompt
   from server/_shared/llm-sanitize.js (existing pattern — see
   scripts/seed-digest-notifications.mjs:41). Wraps story fields
   before buildWhyMattersPrompt. Cache key unchanged (hash is over raw
   story), so cache parity with the analyst endpoint's v3 entries is
   preserved.
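
A minimal sketch of the waitUntil pattern from finding 1, under stated assumptions: `scheduleShadowWrite` and `writeShadow` are hypothetical names standing in for the real handler code, and `ctx` is any object exposing `waitUntil(promise)`. It is not the endpoint's actual implementation.

```javascript
// Hypothetical helper: `writeShadow` stands in for the real redisPipeline write.
function scheduleShadowWrite(writeShadow, ctx) {
  // The async IIFE calls writeShadow() synchronously, so the write starts
  // right away; a synchronous throw becomes a rejection handled below.
  const shadowPromise = (async () => writeShadow())().catch((err) => {
    console.warn('[shadow] write failed:', err && err.message ? err.message : err);
  });

  // On an Edge runtime, registering the promise keeps the isolate alive
  // until the write settles.
  if (ctx && typeof ctx.waitUntil === 'function') {
    ctx.waitUntil(shadowPromise);
    return 'registered';
  }

  // Local harness / tests: no ctx, plain unawaited execution.
  return 'unawaited';
}
```

The return value exists only so the two branches are observable in a test; a real handler would likely return nothing here.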

Regression guard: new test asserts the fallback prompt strips
"ignore previous instructions", "### Assistant:" line prefixes, and
`<|im_start|>` tokens when injection-crafted fields arrive.

Typecheck + typecheck:api clean. 6023 / 6023 data tests pass.

* fix(digest-cron): COPY server/_shared/llm-sanitize into digest-notifications image

Reviewer P1 on PR #3248: my previous commit (4eee22083) added
`import sanitizeForPrompt from server/_shared/llm-sanitize.js` to
scripts/lib/brief-llm.mjs, but Dockerfile.digest-notifications cherry-
picks server/_shared/* files and doesn't copy llm-sanitize. Import is
top-level/static — the container would crash at module load with
ERR_MODULE_NOT_FOUND the moment seed-digest-notifications.mjs pulls in
scripts/lib/brief-llm.mjs. Not just on fallback — every startup.

Fix: add `COPY server/_shared/llm-sanitize.js server/_shared/llm-sanitize.d.ts`
next to the existing brief-render COPY line. Module is pure string
manipulation with zero transitive imports — nothing else needs to land.

Cites feedback_validation_docker_ship_full_scripts_dir.md in the comment
next to the COPY; the cherry-pick convention keeps biting when new
cross-dir imports land in scripts/lib/ or scripts/shared/.

Can't regression-test at build time from this branch without a
docker-build CI job, but the symptom is deterministic — local runs
remain green (they resolve against the live filesystem); only the
container crashes. Post-merge, Railway redeploy of seed-digest-
notifications should show a clean `Starting Container` log line
instead of the MODULE_NOT_FOUND crash my prior commit would have caused.
2026-04-21 14:03:27 +04:00
Elie Habib
4d9ae3b214 feat(digest): topic-grouped brief ordering (size-first) (#3247) 2026-04-21 08:58:02 +04:00
Elie Habib
1a2295157e feat(digest): DIGEST_SCORE_MIN absolute score floor for the brief (#3224)
* feat(digest): DIGEST_SCORE_MIN absolute score floor for the brief

Problem
-------
The 2026-04-20 08:00 brief contained 12 stories, 7 of which were
duplicates of 4 events, alongside low-importance filler (niche
commodity, domestic crime). notification-relay's IMPORTANCE_SCORE_MIN
gate (#3223, set to 63) only applies to the realtime fanout path.
The digest cron reads the same story:track:*.currentScore but has
NO absolute score floor — it just ranks and slices(0, 30), so on
slow news days low-importance items bubble up to fill slots.

Change
------
scripts/seed-digest-notifications.mjs:
  - New getDigestScoreMin() reads DIGEST_SCORE_MIN env at call time
    (Railway flips apply on the next cron tick, no redeploy).
  - Default 0 = no-op, so this PR is behaviour-neutral until the
    env var is set on Railway.
  - Filter runs AFTER deduplicateStories() so it drops clusters by
    the REPRESENTATIVE's score (which is the highest-scoring member
    of its cluster per materializeCluster's sort).
  - One-line operator log when the floor fires:
      [digest] score floor dropped N of M clusters (DIGEST_SCORE_MIN=X)

tests/digest-score-floor.test.mjs (6 regressions):
  - getDigestScoreMin reads from process.env (not a module const)
  - default is 0 (no-op)
  - rejects non-integer / negative values (degrades to 0)
  - filter runs AFTER dedup, BEFORE slice(0, DIGEST_MAX_ITEMS)
  - short-circuits when floor is 0 (no wasted filter pass)
  - log line emits "dropped N of M clusters"
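
The contract above can be sketched roughly as follows. This is an illustration, not the cron's code: `applyScoreFloor` is a hypothetical name, the cluster shape (`currentScore` on the representative) is assumed, and whether the real comparison is `>=` or `>` is not stated in this message.

```javascript
// Reads the env var on every call, so a Railway flip applies on the next tick.
function getDigestScoreMin(env = process.env) {
  const raw = env.DIGEST_SCORE_MIN;
  if (raw === undefined || raw === '') return 0; // unset = no-op
  const n = Number(raw);
  // Non-integer or negative values degrade to 0 (floor disabled).
  return Number.isInteger(n) && n >= 0 ? n : 0;
}

// Hypothetical helper: runs on already-deduplicated clusters,
// before any slice(0, DIGEST_MAX_ITEMS).
function applyScoreFloor(clusters, floor, log = console.log) {
  if (floor <= 0) return clusters; // short-circuit, no filter pass
  const kept = clusters.filter((c) => c.currentScore >= floor); // assumed: inclusive floor
  const dropped = clusters.length - kept.length;
  if (dropped > 0) {
    log(`[digest] score floor dropped ${dropped} of ${clusters.length} clusters (DIGEST_SCORE_MIN=${floor})`);
  }
  return kept;
}
```

Passing `env` and `log` as parameters is only a testing convenience in this sketch.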

Operator activation
-------------------
Set on Railway seed-digest-notifications service:

    DIGEST_SCORE_MIN=63

Start at 63 to match the realtime gate, then nudge up/down based
on the log lines over ~24h. Unset = off (pre-PR behaviour).

Why not bundle a cosine-threshold bump
--------------------------------------
The cosine-threshold tuning (0.60 -> 0.55 per the threshold probe)
is an env-only flip already supported by the dedup orchestrator.
Bundling an env-default change into this PR would slow rollback.
Operator sets DIGEST_DEDUP_COSINE_THRESHOLD=0.55 on Railway as a
separate action; this PR stays scoped to the score floor.

Verification
------------
- npm run test:data            5825/5825 pass
- tests/digest-score-floor      6/6 pass
- tests/edge-functions        171/171 pass
- typecheck + typecheck:api   clean
- biome check on changed files clean (pre-existing main() complexity
  warning on this file is unchanged)
- lint:md                      0 errors
- version:check                OK

Post-Deploy Monitoring & Validation
-----------------------------------
- **What to monitor** after setting DIGEST_SCORE_MIN on Railway:
  - `[digest] score floor dropped` lines — expect ~5-25% of
    clusters dropped on bulk-send ticks (stories=700+)
  - `[digest] Cron run complete: N digest(s) sent` stays > 0
- **Expected healthy behaviour**
  - 0-5 clusters dropped on normal ~80-story ticks
  - 50-200 dropped on bulk 700+ story ticks
  - brief still reports 10-30 stories for PRO users
- **Failure signals / rollback**
  - 0 digests sent for 24h after flipping the var
  - user-visible brief now has < 10 stories
  - Rollback: unset DIGEST_SCORE_MIN on Railway dashboard (instant,
    no deploy), next cron tick reverts to unfiltered behaviour
- **Validation window**: 24h
- **Owner**: koala73

Related
-------
- #3218 LLM prompt upgrade (source of importanceScore quality)
- #3221 geopolitical scope for critical
- #3223 notification-relay realtime gate (mirror knob)
- #3200 embedding-based dedup (the other half of brief quality)

* fix(digest): return null (not []) when score floor drains every cluster

Greptile P2 finding on PR #3224.

When DIGEST_SCORE_MIN is set high enough to filter every cluster,
buildDigest previously returned [] (empty array). The caller's
`if (!stories)` guard only catches falsy values, so [] slipped
past the "No stories in window" skip-log and the run reached
formatDigest([], nowMs) which returns null, then silently continued
at the !storyListPlain check.

Flow was still correct (no digest sent) but operators lost the
observability signal to distinguish "floor too high" from "no news
today" from "dedup ate everything".

Fix:
- buildDigest now returns null when the post-floor list is empty,
  matching the pre-dedup-empty path. Caller's existing !stories
  guard fires the canonical skip-log.
- Emits a distinct `[digest] score floor dropped ALL N clusters
  (DIGEST_SCORE_MIN=X) — skipping user` line BEFORE the return,
  so operators can spot an over-aggressive floor in the logs.
- Test added covering both the null-return contract and the
  distinct "dropped ALL" log line.
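
Why [] slipped past the guard, as a small sketch: an empty array is truthy in JavaScript, so `!stories` only fires on null/undefined. `floorOrNull` is a hypothetical stand-in for the tail of buildDigest, and the distinct "dropped ALL" log line is omitted here.

```javascript
// Hypothetical stand-in for the end of buildDigest: normalise
// "nothing survived the floor" to null so the caller's guard fires.
function floorOrNull(clusters, floor) {
  const kept = floor > 0
    ? clusters.filter((c) => c.currentScore >= floor) // assumed: inclusive floor
    : clusters;
  return kept.length === 0 ? null : kept;
}

// The caller-side guard described above.
function shouldSkipUser(stories) {
  return !stories;
}
```

With the old [] return, `shouldSkipUser([])` is false and the run continues; with null it is true and the canonical skip-log path is taken.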

7/7 digest-score-floor tests pass.
2026-04-20 10:19:03 +04:00
Elie Habib
38e6892995 fix(brief): per-run slot URL so same-day digests link to distinct briefs (#3205)
* fix(brief): per-run slot URL so same-day digests link to distinct briefs

Digest emails at 8am and 1pm on the same day pointed to byte-identical
magazine URLs because the URL was keyed on YYYY-MM-DD in the user tz.
Each compose run overwrote the single daily envelope in place, and the
composer's rolling 24h story window meant afternoon output often looked
identical to morning. Readers clicking an older email got whatever the
latest cron happened to write.

Slot format is now YYYY-MM-DD-HHMM (local tz, per compose run). The
magazine URL, carousel URLs, and Redis key all carry the slot, and each
digest dispatch gets its own frozen envelope that lives out the 7d TTL.
envelope.data.date stays YYYY-MM-DD for rendering "19 April 2026".
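
A rough sketch of the slot shape, assuming the slot is derived from the compose time in the user's IANA timezone. `slotFor` and `issueDateFromSlot` are hypothetical names, not the composer's actual helpers.

```javascript
// Build a YYYY-MM-DD-HHMM slot for a compose time in the user's timezone.
function slotFor(date, timeZone) {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    year: 'numeric',
    month: '2-digit',
    day: '2-digit',
    hour: '2-digit',
    minute: '2-digit',
    hourCycle: 'h23', // 00-23, avoids a "24" hour at midnight
  }).formatToParts(date);
  const get = (type) => parts.find((p) => p.type === type).value.padStart(2, '0');
  return `${get('year')}-${get('month')}-${get('day')}-${get('hour')}${get('minute')}`;
}

// Strict slot shape: the old YYYY-MM-DD form is rejected.
const SLOT_RE = /^(\d{4}-\d{2}-\d{2})-(\d{4})$/;

// envelope.data.date keeps the date portion only.
function issueDateFromSlot(slot) {
  const m = SLOT_RE.exec(slot);
  return m ? m[1] : null;
}

// 08:00 and 13:00 Dubai time on the same day yield distinct slots:
// slotFor(new Date('2026-04-19T04:00:00Z'), 'Asia/Dubai') -> '2026-04-19-0800'
// slotFor(new Date('2026-04-19T09:00:00Z'), 'Asia/Dubai') -> '2026-04-19-1300'
```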

The digest cron also writes a brief:latest:{userId} pointer (7d TTL,
overwritten each compose) so the dashboard panel and share-url endpoint
can locate the most recent brief without knowing the slot. The
previous date-probing strategy does not work once keys carry HHMM.

No back-compat for the old YYYY-MM-DD format: the verifier rejects it,
the composer only ever writes the new shape, and any in-flight
notifications signed under the old format will 403 on click. Acceptable
at the rollout boundary per product decision.

* fix(brief): carve middleware bot allowlist to accept slot-format carousel path

BRIEF_CAROUSEL_PATH_RE in middleware.ts was still matching only the
pre-slot YYYY-MM-DD segment, so every slot-based carousel URL emitted
by the digest cron (YYYY-MM-DD-HHMM) would miss the social allowlist
and fall into the generic bot gate. Telegram/Slack/Discord/LinkedIn
image fetchers would 403 on sendMediaGroup, breaking previews for the
new digest links.

CI missed this because tests/middleware-bot-gate.test.mts still
exercised the old /YYYY-MM-DD/ path shape. Swap the fixture to the
slot format and add a regression asserting the pre-slot shape is now
rejected, so legacy links cannot silently leak the allowlist after
the rollout.
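
For illustration, a slot-aware allowlist pattern might look like the sketch below. The exact URL layout and page values are assumptions (the route is described elsewhere in this log as api/brief/carousel/[userId]/[issueDate]/[page]); the real BRIEF_CAROUSEL_PATH_RE in middleware.ts may differ.

```javascript
// Hypothetical stand-in for BRIEF_CAROUSEL_PATH_RE.
// Assumed shape: /api/brief/carousel/<userId>/<YYYY-MM-DD-HHMM>/<page 0..2>
const CAROUSEL_PATH_RE_SKETCH =
  /^\/api\/brief\/carousel\/[^/]+\/\d{4}-\d{2}-\d{2}-\d{4}\/[0-2]\/?$/;

function isAllowlistedCarouselPath(pathname) {
  return CAROUSEL_PATH_RE_SKETCH.test(pathname);
}
```

Under this sketch, a slot-format path passes and the pre-slot /YYYY-MM-DD/ shape is rejected, which is the pair of cases the updated fixture and the new regression cover.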

* fix(brief): preserve caller-requested slot + correct no-brief share-url error

Two contract bugs in the slot rollout that silently misled callers:

1. GET /api/latest-brief?slot=X where X has no envelope was returning
   { status: 'composing', issueDate: <today UTC> } — which reads as
   "today's brief is composing" instead of "the specific slot you
   asked about doesn't exist". A caller probing a known historical
   slot would get a completely unrelated "today" signal. Now we echo
   the requested slot back (issueSlot + issueDate derived from its
   date portion) when the caller supplied ?slot=, and keep the
   UTC-today placeholder only for the no-param path.

2. POST /api/brief/share-url with no slot and no latest-pointer was
   falling into the generic invalid_slot_shape 400 branch. That is
   not an input-shape problem; it is "no brief exists yet for this
   user". Return 404 brief_not_found — the same code the
   existing-envelope check returns — so callers get one coherent
   contract: either the brief exists and is shareable, or it doesn't
   and you get 404.
2026-04-19 14:15:59 +04:00
Elie Habib
305dc5ef36 feat(digest-dedup): Phase A — embedding-based dedup scaffolding (no-op) (#3200)
* feat(digest-dedup): Phase A — embedding-based dedup scaffolding (no-op)

Replaces the inline Jaccard story-dedup in seed-digest-notifications
with an orchestrator that can run Jaccard, shadow, or full embedding
modes. Ships with DIGEST_DEDUP_MODE=jaccard as the default so
production behaviour is unchanged until Phase C shadow + Phase D flip.

New modules (scripts/lib/):
- brief-dedup-consts.mjs       tunables + cache prefix + __constants bag
- brief-dedup-jaccard.mjs      verbatim 0.55-threshold extract (fallback)
- entity-gazetteer.mjs         cities/regions gazetteer + common-caps
- brief-embedding.mjs          OpenRouter /embeddings client with Upstash
                               cache, all-or-nothing timeout, cosineSimilarity
- brief-dedup-embed.mjs        complete-link clustering + entity veto (pure)
- brief-dedup.mjs              orchestrator, env read at call entry,
                               shadow archive, structured log line

Operator tools (scripts/tools/):
- calibrate-dedup-threshold.mjs  offline calibration runner + histogram
- golden-pair-validator.mjs      live-embedder drift detector (nightly CI)
- shadow-sample.mjs              Sample A/B CSV emitter over SCAN archive

Tests:
- brief-dedup-jaccard.test.mjs    migrated from regex-harness to direct
                                   import plus orchestrator parity tests (22)
- brief-dedup-embedding.test.mjs  9 plan scenarios incl. 10-permutation
                                   property test, complete-link non-chain (21)
- brief-dedup-golden.test.mjs     20-pair mocked canary (21)

Workflows:
- .github/workflows/dedup-golden-pairs.yml  nightly live-embedder canary
                                             (07:17 UTC), opens issue on drift

Deviation from plan: the shouldVeto("Iran closes Hormuz", "Tehran
shuts Hormuz") case can't return true under a single coherent
classification (country-in-A vs capital-in-B sit on different sides
of the actor/location boundary). Gazetteer follows the plan's
"countries are actors" intent; the test is updated to assert false
with a comment pointing at the irreducible capital-country
coreference limitation.

Verification:
- npm run test:data          5825/5825 pass
- tests/edge-functions        171/171 pass
- typecheck + typecheck:api  clean
- biome check on new files    clean
- lint:md                     0 errors

Phase B (calibration), Phase C (shadow), and Phase D (flip) are
subsequent PRs.

* refactor(digest-dedup): address review findings 193-199

Fresh-eyes review found 3 P1s, 3 P2s, and a P3 bundle across
kieran-typescript, security-sentinel, performance-oracle, architecture-
strategist, and code-simplicity reviewers. Fixes below; all 64 dedup
tests + 5825 data tests + 171 edge-function tests still green.

P1 #193 - dedup regex + redis pipeline duplication
- Extract defaultRedisPipeline into scripts/lib/_upstash-pipeline.mjs;
  both orchestrator and embedding client import from there.
- normalizeForEmbedding now delegates to stripSourceSuffix from the
  Jaccard module so the outlet allow-list is single-sourced.

P1 #194 - embedding timeout floor + negative-budget path
- callEmbeddingsApi throws EmbeddingTimeoutError when timeoutMs<=0
  instead of opening a doomed 250ms fetch.
- Removed Math.max(250, ...) floor that let wall-clock cap overshoot.

P1 #195 - dead env getters
- Deleted getMode / isRemoteEmbedEnabled / isEntityVetoEnabled /
  getCosineThreshold / getWallClockMs from brief-dedup-consts.mjs
  (zero callers; orchestrator reimplements inline).

P2 #196 - orchestrator cleanup bundle
- Removed re-exports at bottom of brief-dedup.mjs.
- Extracted materializeCluster into brief-dedup-jaccard.mjs; both
  the fallback and orchestrator use the shared helper.
- Deleted clusterWithEntityVeto wrapper; orchestrator inlines the
  vetoFn wiring at the single call site.
- Shadow mode now runs Jaccard exactly once per tick (was twice).
- Fallback warn line carries reason=ErrorName so operators can
  filter timeout vs provider vs shape errors.
- Invalid DIGEST_DEDUP_MODE values emit a warn once per run (vs
  silently falling to jaccard).

P2 #197 - workflow + shadow-sample hardening
- dedup-golden-pairs.yml body composition no longer relies on a
  heredoc that would command-substitute validator stdout. Switched
  to printf with sanitised LOG_TAIL (printable ASCII only) and
  --body-file so crafted fixture text cannot escape into the runner.
- shadow-sample.mjs Upstash helper enforces a hardcoded command
  allowlist (SCAN | GET | EXISTS).

P2 #198 - test + observability polish
- Scenarios 2 and 3 deep-equal returned clusters against the Jaccard
  expected shape, not just length. Also assert the reason= field.

P3 #199 - nits
- Removed __constants test-bag; jaccard tests use named imports.
- Renamed deps.apiKey to deps._apiKey in embedding client.
- Added @pre JSDoc on diffClustersByHash about unique-hash contract.
- Deferred: mocked golden-pair test removal, gazetteer JSON migration,
  scripts/tools AGENTS.md doc note.

Todos 193-199 moved from pending to complete.

Verification:
- npm run test:data            5825/5825 pass
- tests/edge-functions          171/171 pass
- typecheck + typecheck:api    clean
- biome check on changed files clean

* fix(digest-dedup): address Greptile P2 findings on PR #3200

1. brief-embedding.mjs: wrap fetch lookup as
   `(...args) => globalThis.fetch(...args)` instead of aliasing bare
   `fetch`. Aliasing captures the binding at module-load time, so
   later instrumentation / Edge-runtime shims don't see the wrapper —
   same class of bug as the banned `fetch.bind(globalThis)` pattern
   flagged in AGENTS.md.

2. dedup-golden-pairs.yml: `gh issue create --label "..." || true`
   silently swallowed the failure when any of dedup/canary/p1 labels
   didn't pre-exist, breaking the drift alert channel while leaving
   the job red in the Actions UI. Switched to repeated `--label`
   flags + `--create-label` so any missing label is auto-created on
   first drift, and dropped the `|| true` so a legitimate failure
   (network / auth) surfaces instead of hiding.
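
The difference in item 1 can be shown with a small sketch; `makeFetchers` is a hypothetical name used only for this demonstration.

```javascript
// Contrast the two ways of holding on to fetch.
function makeFetchers() {
  // Aliasing: captures whatever globalThis.fetch is right now.
  const aliased = globalThis.fetch;
  // Late-bound wrapper: looks globalThis.fetch up on every call,
  // so shims installed later are still honoured.
  const lateBound = (...args) => globalThis.fetch(...args);
  return { aliased, lateBound };
}
```

If instrumentation replaces globalThis.fetch after the module has loaded, `aliased` keeps calling the old function while `lateBound` picks up the replacement.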

Both fixes are P2-style per Greptile (confidence 5/5, no P0/P1);
applied pre-merge so the nightly canary is usable from day one.

* fix(digest-dedup): two P1s found on PR #3200

P1 — canary classifier must match production
Nightly golden-pair validator was checking a hardcoded threshold
(default 0.60) and always applied the entity veto, while the actual
dedup path at runtime reads DIGEST_DEDUP_COSINE_THRESHOLD and
DIGEST_DEDUP_ENTITY_VETO_ENABLED from env at every call. A Phase
C/D env flip could make the canary green while prod was wrong or
red while prod was healthy, defeating the whole point of a drift
detector.

Fix:
- golden-pair-validator.mjs now calls readOrchestratorConfig(process.env)
  — the same helper the orchestrator uses — so any classifier knob
  added later is picked up automatically. The threshold and veto-
  enabled flags are sourced from env by default; a --threshold CLI
  flag still overrides for manual calibration sweeps.
- dedup-golden-pairs.yml sources DIGEST_DEDUP_COSINE_THRESHOLD and
  DIGEST_DEDUP_ENTITY_VETO_ENABLED from GitHub repo variables (vars.*),
  which operators must keep in lockstep with Railway. The
  workflow_dispatch threshold input now defaults to empty; the
  scheduled canary always uses the production-parity config.
- Validator log line prints the effective config + source so nightly
  output makes the classifier visible.

P1 — shadow archive writes were fail-open
`defaultRedisPipeline()` returns null on timeout / auth / HTTP
failure. `writeShadowArchive()` only had a try/catch, so the null
result was silently treated as success. A Phase C rollout could
log clean "mode=shadow … disagreements=X" lines every tick while
the Upstash archive received zero writes — and Sample B labelling
would then find no batches, silently killing calibration.

Fix:
- writeShadowArchive now inspects the pipeline return. null result,
  non-array response, per-command {error}, or a cell without
  {result: "OK"} all return {ok: false, reason}.
- Orchestrator emits a warn line with the failure reason, and the
  structured log line carries archive_write=ok|failed so operators
  can grep for failed ticks.
- Regression test in brief-dedup-embedding.test.mjs simulates the
  null-pipeline contract and asserts both the warn and the structured
  field land.
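
A sketch of the fail-closed check described above. `inspectPipelineResult` and the reason strings are hypothetical; only the four failure conditions come from this message.

```javascript
// Classify a pipeline return value. Anything other than an array of
// { result: 'OK' } cells counts as a failed archive write.
function inspectPipelineResult(res) {
  if (res === null || res === undefined) {
    return { ok: false, reason: 'null_result' }; // timeout / auth / HTTP failure
  }
  if (!Array.isArray(res)) {
    return { ok: false, reason: 'non_array_response' };
  }
  for (let i = 0; i < res.length; i++) {
    const cell = res[i];
    if (cell && cell.error) {
      return { ok: false, reason: `command_error:${i}` };
    }
    if (!cell || cell.result !== 'OK') {
      return { ok: false, reason: `unexpected_cell:${i}` };
    }
  }
  return { ok: true };
}
```

The orchestrator would then map `ok` to archive_write=ok|failed on the structured log line. How an empty array should be treated is not covered by this message; the sketch lets it pass.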

Verification:
- test:data           5825/5825 pass
- dedup suites         65/65   pass (new: archive-fail regression)
- typecheck + api     clean
- biome check         clean on changed files

* fix(digest-dedup): two more P1s found on PR #3200

P1 — canary must also honour DIGEST_DEDUP_MODE + REMOTE_EMBED_ENABLED
The prior round fixed the threshold/veto knobs but left the canary
running embeddings regardless of whether production could actually
reach the embed path. If Railway has DIGEST_DEDUP_MODE=jaccard or
DIGEST_DEDUP_REMOTE_EMBED_ENABLED=0, production never calls the
classifier, so a drift signal is meaningless — or worse, a live
OpenRouter issue flags the canary while prod is obliviously fine.

Fix:
- golden-pair-validator.mjs reads mode + remoteEmbedEnabled from the
  same readOrchestratorConfig() helper the orchestrator uses. When
  either says "embed path inactive in prod", the validator logs an
  explicit skip line and exits 0. The nightly workflow then shows
  green, which is the correct signal ("nothing to drift against").
- A --force CLI flag remains for manual dispatch during staged
  rollouts.
- dedup-golden-pairs.yml sources DIGEST_DEDUP_MODE and
  DIGEST_DEDUP_REMOTE_EMBED_ENABLED from GitHub repo variables
  alongside the threshold and veto-enabled knobs, so all four
  classifier gates stay in lockstep with Railway.
- Validator log line now prints mode + remoteEmbedEnabled so the
  canary output surfaces which classifier it validated.

P1 — shadow-sample Sample A was biased by SCAN order
enumerate-and-dedup added every seen pair to a dedup key BEFORE
filtering by agreement. If the same pair appeared in an agreeing
batch first and a disagreeing batch later, the disagreeing
occurrence was silently dropped. SCAN order is unspecified, so
Sample A could omit real disagreement pairs.

Fix:
- Extracted the enumeration into a pure `enumeratePairs(archives, mode)`
  export so the logic is testable. Mode filter runs BEFORE the dedup
  check: agreeing pairs are skipped entirely under
  --mode disagreements, so any later disagreeing occurrence can
  still claim the dedup slot.
- Added tests/brief-dedup-shadow-sample.test.mjs with 5 regression
  cases: agreement-then-disagreement, reversed order (symmetry),
  always-agreed omission, population enumeration, cross-batch dedup.
- isMain guard added so importing the module for tests does not
  kick off the CLI scan path.
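
The ordering fix can be sketched as below. The archive shape (`{ pairs: [{ a, b, agree }] }`) and the mode names are assumptions made for this illustration; only the "filter before dedup" ordering is taken from this message.

```javascript
// Enumerate unique pairs across archived batches.
// mode: 'disagreements' (Sample A) or 'population' (everything).
function enumeratePairs(archives, mode) {
  const seen = new Set();
  const out = [];
  for (const archive of archives) {
    for (const pair of archive.pairs) {
      // Mode filter runs BEFORE the dedup check, so an agreeing
      // occurrence never claims the slot under 'disagreements'.
      if (mode === 'disagreements' && pair.agree) continue;
      // Order-insensitive key: (a, b) and (b, a) are the same pair.
      const key = [pair.a, pair.b].sort().join('|');
      if (seen.has(key)) continue;
      seen.add(key);
      out.push({ key, agree: pair.agree });
    }
  }
  return out;
}
```

Because the filter comes first, the result no longer depends on which batch SCAN happens to return first.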

Verification:
- test:data           5825/5825 pass
- dedup suites         70/70   pass (5 new shadow-sample regressions)
- typecheck + api     clean
- biome check         clean on changed files

Operator follow-up before Phase C:
Set all FOUR dedup repo variables in GitHub alongside Railway:
  DIGEST_DEDUP_MODE, DIGEST_DEDUP_REMOTE_EMBED_ENABLED,
  DIGEST_DEDUP_COSINE_THRESHOLD, DIGEST_DEDUP_ENTITY_VETO_ENABLED

* refactor(digest-dedup): Railway is the single source of truth for dedup config

Fair user pushback: asking operators to set four DIGEST_DEDUP_*
values in BOTH Railway (where the cron runs) AND GitHub repo
variables (where the canary runs) is architectural debt. Two
copies of the same truth will always drift.

Solution: the digest cron publishes its resolved config to Upstash
on every tick under brief:dedup:config:v1 (2h TTL). The nightly
golden-pair canary reads that key instead of env vars. Railway
stays the sole source of truth; no parallel repo variables to
maintain. A missing/expired key signals "cron hasn't run" and
the canary skips with exit 0 — better than validating against
hardcoded defaults that might diverge from prod.

Changes:
- brief-dedup-consts.mjs: new ACTIVE_CONFIG_KEY + TTL constants.
- brief-dedup.mjs: new publishActiveConfig() fires at the start of
  every deduplicateStories() call (before the mode short-circuit,
  so jaccard ticks also publish a "mode=jaccard" signal the canary
  can read). Fire-and-forget; archive-write error semantics still
  apply if the operator wants stricter tracking.
- golden-pair-validator.mjs: removed readOrchestratorConfig(env)
  path. Now calls fetchActiveConfigFromUpstash() and either
  validates against that config, skips when the embed path is
  inactive, or skips when the key is missing (with --force
  override for manual dispatch).
- dedup-golden-pairs.yml: dropped the four DIGEST_DEDUP_* env lines
  and the corresponding repo-variable dependency. Only the three
  Upstash + OpenRouter secrets remain.
- tests: two new regressions assert config is published on every
  tick (shadow AND jaccard modes) with the right shape + TTL.

Operator onboarding now takes one action: set the four
DIGEST_DEDUP_* variables on the Railway seed-digest-notifications
service. Nothing to set in GitHub beyond the existing
OPENROUTER_API_KEY / UPSTASH_* secrets.

Verification:
- test:data           5825/5825 pass
- dedup suites         72/72   pass (2 new config-publish regressions)
- typecheck + api     clean
- biome check         clean on changed files

* refactor(digest-dedup): ship embed directly, drop phases/canary/shadow

User feedback: "i dont need multiple phases and shit, we go directly
to embed". Fair. Ripping out the overengineering I accumulated:

DELETED
- .github/workflows/dedup-golden-pairs.yml (nightly canary)
- scripts/tools/golden-pair-validator.mjs
- scripts/tools/shadow-sample.mjs
- scripts/tools/calibrate-dedup-threshold.mjs
- tests/fixtures/brief-dedup-golden-pairs.json
- tests/brief-dedup-golden.test.mjs
- tests/brief-dedup-shadow-sample.test.mjs

SIMPLIFIED
- brief-dedup.mjs: removed shadow mode, publishActiveConfig,
  writeShadowArchive, diffClustersByHash, jaccardRepsToClusterHashes,
  and the DIGEST_DEDUP_REMOTE_EMBED_ENABLED knob. MODE is now
  binary: `embed` (default) or `jaccard` (instant kill switch).
- brief-dedup-consts.mjs: dropped SHADOW_ARCHIVE_*, ACTIVE_CONFIG_*.
- Default flipped: DIGEST_DEDUP_MODE unset = embed (prod path).
  Railway deploy with OPENROUTER_API_KEY set = embeddings live on
  next cron tick. Set MODE=jaccard on Railway to revert instantly.

Orchestrator still falls back to Jaccard on any embed-path failure
(timeout, provider outage, missing API key, bad response). Fallback
warn carries reason=<ErrorName>. The cron never fails because
embeddings flaked. All 64 dedup tests + 5825 data tests still green.

Net diff: -1,407 lines.

Operator single action: set OPENROUTER_API_KEY on Railway's
seed-digest-notifications service (already present) and ship. No
GH Actions, no shadow archives, no labelling sprints. If the 0.60
threshold turns out wrong, tune DIGEST_DEDUP_COSINE_THRESHOLD on
Railway — takes effect on next tick, no redeploy.

* fix(digest-dedup): multi-word location phrases in the entity veto

Extractor was whitespace-tokenising and only single-token matching
against LOCATION_GAZETTEER, silently making every multi-word entry
unreachable:

  extractEntities("Houthis strike ship in Red Sea")
    → { locations: [], actors: ['houthis','red','sea'] }   ✗
  shouldVeto("Houthis strike ship in Red Sea",
             "US escorts convoy in Red Sea")  → false       ✗

With MODE=embed as the default, that turned off the main
anti-overmerge safety rail for bodies of water, regions, and
compound city names — exactly the P07-Hormuz / Houthis-Red-Sea
headlines the veto was designed to cover.

Fix: greedy longest-phrase scan with a sliding window. At each
token position try the longest multi-word phrase first (down to
2), require first AND last tokens to be capitalised (so lowercase
prose like "the middle east" doesn't falsely match while headline
"Middle East" does), lowercase connectors in between are fine
("Strait of Hormuz" → phrase "strait of hormuz" ✓). Falls back to
single-token lookup when no multi-word phrase fits.

Now:
  extractEntities("Houthis strike ship in Red Sea")
    → { locations: ['red sea'], actors: ['houthis'] }       ✓
  shouldVeto(Red-Sea-Houthis, Red-Sea-US) → true             ✓

Complexity still O(N · MAX_PHRASE_LEN) — MAX_PHRASE_LEN is 4
(longest gazetteer entry: "ho chi minh city"), so this is
effectively O(N).
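
A simplified sketch of the greedy longest-phrase scan. The gazetteer below is a tiny illustrative subset, and the actor rule (any leftover capitalised token) is a deliberate simplification; the real extractor in entity-gazetteer.mjs also handles common capitalised words and is not reproduced here.

```javascript
// Illustrative subset only; the real LOCATION_GAZETTEER is larger.
const GAZETTEER_SKETCH = new Set([
  'red sea', 'south china sea', 'strait of hormuz',
  'abu dhabi', 'new york', 'middle east', 'ho chi minh city',
]);
const MAX_PHRASE_LEN = 4; // longest entry: "ho chi minh city"
const isCapitalised = (t) => /^[A-Z]/.test(t);

function extractEntities(headline) {
  const tokens = headline
    .split(/\s+/)
    .map((t) => t.replace(/^[^\p{L}]+|[^\p{L}]+$/gu, '')) // trim punctuation
    .filter(Boolean);
  const locations = [];
  const actors = [];
  let i = 0;
  while (i < tokens.length) {
    let consumed = 0;
    // Try the longest phrase first, down to 2 tokens.
    for (let len = Math.min(MAX_PHRASE_LEN, tokens.length - i); len >= 2; len--) {
      // First AND last token must be capitalised; connectors may be lowercase.
      if (!isCapitalised(tokens[i]) || !isCapitalised(tokens[i + len - 1])) continue;
      const phrase = tokens.slice(i, i + len).join(' ').toLowerCase();
      if (GAZETTEER_SKETCH.has(phrase)) {
        locations.push(phrase);
        consumed = len;
        break;
      }
    }
    if (consumed === 0) {
      // Single-token fallback.
      const lower = tokens[i].toLowerCase();
      if (GAZETTEER_SKETCH.has(lower)) locations.push(lower);
      else if (isCapitalised(tokens[i])) actors.push(lower); // simplification
      consumed = 1;
    }
    i += consumed;
  }
  return { locations, actors };
}
```

Each position tries at most MAX_PHRASE_LEN - 1 phrase lengths, which is where the O(N · MAX_PHRASE_LEN) bound comes from.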

Added 5 regression tests covering Red Sea, South China Sea,
Strait of Hormuz (lowercase-connector case), Abu Dhabi, and
New York, plus the Houthis-vs-US veto reproducer from the P1.
All 5825 data tests + 45 dedup tests green; lint + typecheck clean.
2026-04-19 13:49:48 +04:00
Elie Habib
122204f691 feat(brief): Phase 8 — Telegram carousel images via Satori + resvg-wasm (#3174)
* feat(brief): Phase 8 — Telegram carousel images via Satori + resvg-wasm

Implements the Phase 8 carousel renderer (Option B): server-side PNG
generation in a Vercel edge function using Satori (JSX to SVG) +
@resvg/resvg-wasm (SVG to PNG). Zero new Railway infra, zero
Chromium, same edge runtime that already serves the magazine HTML.

Files:

  server/_shared/brief-carousel-render.ts (new)
    Pure renderer: (BriefEnvelope, CarouselPage) -> Uint8Array PNG.
    Three layouts (cover/threads/story), 1200x630 OG size.
    Satori + resvg + WASM are lazy-imported so Node tests don't trip
    over '?url' asset imports and the 800KB wasm doesn't ship in
    every bundle. Font: Noto Serif regular, fetched once from Google
    Fonts and memoised on the edge isolate.

  api/brief/carousel/[userId]/[issueDate]/[page].ts (new)
    Public edge function reusing the magazine route's HMAC token —
    same signer, same (userId, issueDate) binding, so one token
    unlocks magazine HTML AND all three carousel images. Returns
    image/png with 7d immutable cache headers. 404 on invalid page
    index, 403 on bad token, 404 on Redis miss, 503 on missing
    signing secret. Render failure falls back to a 1x1 transparent
    PNG so Telegram's sendMediaGroup doesn't 500 the brief.

  scripts/seed-digest-notifications.mjs
    carouselUrlsFrom(magazineUrl) derives the 3 signed carousel
    URLs from the already-signed magazine URL. sendTelegramBriefCarousel
    calls Telegram's sendMediaGroup with those URLs + short caption.
    Runs before the existing sendTelegram(text) so the carousel is
    the header and the text the body — long-form stories remain
    forwardable as text. Best-effort: carousel failure doesn't
    block text delivery.

  package.json + package-lock.json
    satori ^0.10.14 + @resvg/resvg-wasm ^2.6.2.

Tests (tests/brief-carousel.test.mjs, 9 cases):
  - pageFromIndex mapping + out-of-range
  - carouselUrlsFrom: valid URL, localhost origin preserved, missing
    token, wrong path, invalid issueDate, garbage input
  - Drift guard: cron must still declare the same helper + template
    string. If it drifts, test fails with a pointer to move the impl
    into a shared module.

PNG render itself isn't unit-tested — Satori + WASM need a
browser/edge runtime. Covered by smoke validation step in the
deploy monitoring plan.

Both tsconfigs typecheck clean. 152/152 brief tests pass.

Scope boundaries (deferred):
  - Slack + Discord image attachments (different payload shapes)
  - notification-relay.cjs brief_ready dispatch (real-time route)
  - Redis caching of rendered PNG (edge Cache-Control is enough for
    MVP)

* fix(brief): address two P1 review findings on Phase 8 carousel

P1-A: 200 placeholder PNG cached 7d on render failure.
Route config said runtime: 'edge' but a comment contradicted it
claiming Node semantics. More importantly, any render/init failure
(WASM load, Satori, Google Fonts) was converted to a 1x1 transparent
PNG returned with Cache-Control: public, max-age=7d, immutable.
Telegram's media fetcher and Vercel's CDN would cache that blank
for the full brief TTL per chat message — one cold-start mismatch
= every reader of that brief sees blank carousel previews for a
week.

Fix: deleted errorPng(). Render failure now returns 503 with
Cache-Control: no-store. sendMediaGroup fails cleanly for that
carousel (the digest cron already treats it as best-effort and
still sends the long-form text message), next cron tick re-renders
from a fresh isolate. Self-healing across ticks. Contradictory
comment about Node runtime removed.
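
The caching rule behind the P1-A fix, as a sketch: `carouselResponse` is a hypothetical helper, a null `png` stands for "render failed", and the success header spells the 7d lifetime as 604800 seconds, which is an assumption about the real header string.

```javascript
// Success is cacheable for the brief's lifetime; failure must never be cached.
function carouselResponse(png) {
  if (!png) {
    // Render/init failure: 503 + no-store so CDNs and media fetchers retry.
    return new Response('render_failed', {
      status: 503,
      headers: { 'Cache-Control': 'no-store' },
    });
  }
  return new Response(png, {
    status: 200,
    headers: {
      'Content-Type': 'image/png',
      'Cache-Control': 'public, max-age=604800, immutable', // assumed 7d in seconds
    },
  });
}
```

Returning an error status instead of a placeholder image is what lets the next request re-render rather than serve a cached blank.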

P1-B: Google Fonts as silent hard dependency.
The renderer claimed 'safe embedded/fallback path' in comments but
no fallback existed. loadFont() fetches Noto Serif from gstatic.com
and rethrows on any failure. Combined with P1-A's old 200-cache-7d
path, a transient CDN blip would lock in a blank carousel for a
week.

Fix: updated comments to honestly declare the CDN dependency plus
document the self-healing semantics now that P1-A's fix no longer
caches the failure. If Google Fonts reliability becomes a problem,
swap the fetch for a bundled base64 TTF — noted as the escape hatch.

Tests (tests/brief-carousel.test.mjs): 2 new regression cases.
11/11 carousel tests pass. Both tsconfigs typecheck clean locally.

Note on currently-red CI: failures are NOT typecheck errors — npm
ci dies fetching libvips for sharp (504 Gateway Time-out from
GitHub releases). sharp is a transitive dep via @xenova/transformers,
pre-existing, not touched by this PR. Transient infra flake.

* fix(brief): switch carousel to Node + @resvg/resvg-js (fixes deploy block)

Vercel edge bundler fails the carousel deploy with:
  'Edge Function is referencing unsupported modules:
   @resvg/resvg-wasm/index_bg.wasm?url'

The ?url asset-import syntax is a Vite-ism that Vercel's edge
bundler doesn't resolve. Two ways out: find a Vercel-blessed edge
WASM import incantation, or switch to Node runtime with the native
@resvg/resvg-js binding. The second is simpler, faster per request,
and avoids the whole WASM-in-edge-bundler rabbit hole.

Changes:
  - package.json: @resvg/resvg-wasm -> @resvg/resvg-js ^2.6.2
  - api/brief/carousel/.../[page].ts: runtime 'edge' -> 'nodejs20.x'
  - server/_shared/brief-carousel-render.ts: drop initWasm path,
    dynamic-import resvg-js in ensureLibs(). Satori and resvg load
    in parallel via Promise.all, shaving ~30ms off cold start.

Also addresses the P2 finding from review: the old ensureLibsAndWasm
had a concurrent-cold-start race where two callers could reach
'await initWasm()' simultaneously. Replaced the boolean flag with a
shared _libsLoadPromise so concurrent callers await the same load.
On failure the promise resets so the NEXT request retries rather
than poisoning the isolate for its lifetime.
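
A minimal sketch of the shared-promise pattern described above. The loader body is a stand-in (the real ensureLibs() dynamic-imports Satori and resvg-js); loadLibs and loadCount are hypothetical names used only for illustration.

```javascript
// Hypothetical stand-in for the real dynamic imports of satori + resvg-js.
let loadCount = 0;
async function loadLibs() {
  loadCount += 1;
  return { satori: 'satori-stub', resvg: 'resvg-stub' };
}

let _libsLoadPromise = null;

// Not declared async on purpose: every concurrent caller must receive the
// SAME promise object, so only one load is ever in flight.
function ensureLibs() {
  if (_libsLoadPromise === null) {
    _libsLoadPromise = loadLibs().catch((err) => {
      // Reset so the NEXT request retries instead of reusing a rejected promise.
      _libsLoadPromise = null;
      throw err;
    });
  }
  return _libsLoadPromise;
}
```

Callers simply `await ensureLibs()`; two cold-start requests arriving together share one load.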

Cold start ~700ms (Satori + resvg-js native init), warm ~40ms.
Carousel images are not latency-critical — fetched by Telegram's
media service, CDN-cached 7d.

Both tsconfigs typecheck clean. 11/11 carousel tests pass.

* fix(brief): carousel runtime = 'nodejs' (was 'nodejs20.x', rejected by Vercel)

Vercel's functions config validator rejects 'nodejs20.x' at deploy
time:

  unsupported "runtime" value in config: "nodejs20.x"
  (must be one of: ["edge","experimental-edge","nodejs"])

The Node version comes from the project's default (currently Node 20
via package.json engines + Vercel project settings), not from the
runtime string. Use 'nodejs' — unversioned — and let the platform
resolve it.

11/11 carousel tests pass.

* fix(brief): swap carousel font from woff2 to woff (Satori can't parse woff2)

Review on PR #3174: the FONT_URL was pointing at a gstatic.com woff2
file. Satori parses ttf / otf / woff v1, NOT woff2. Every render
would have thrown on font decode, the route would have returned 503,
and the carousel would never have delivered a single image.

Fix: point FONT_URL at @fontsource's Noto Serif Latin 400 WOFF v1
via jsdelivr. WOFF v1 is a TrueType wrapper that Satori parses
natively (verified: file says 'Web Open Font Format, TrueType,
version 1.1'). Same cold-start semantics as before — one fetch per
warm isolate, memoised.

Regression test: asserts FONT_URL ends in ttf/otf/woff and explicitly
rejects any .woff2 suffix. A future swap that silently reintroduces
woff2 now fails CI loudly instead of shipping a permanently-broken
renderer.
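
The suffix check behind that regression test could look roughly like this. isSatoriParsableFontUrl is a hypothetical helper name, and the URLs in the usage note are placeholders, not the real FONT_URL.

```javascript
// Hypothetical helper: accepts only font formats Satori can parse
// (ttf / otf / woff v1) and explicitly rejects woff2.
function isSatoriParsableFontUrl(url) {
  const path = new URL(url).pathname.toLowerCase();
  if (path.endsWith('.woff2')) return false;
  return /\.(ttf|otf|woff)$/.test(path);
}
```

For example, `isSatoriParsableFontUrl('https://example.com/fonts/noto-serif-400.woff')` is true, while the same path ending in `.woff2` is false.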

12/12 carousel tests pass. Both tsconfigs typecheck clean.
2026-04-18 20:27:41 +04:00
Elie Habib
c2356890da feat(brief): Phase 3b — LLM whyMatters + editorial digest prose via Gemini (#3172)
* feat(brief): Phase 3b — LLM whyMatters + editorial digest prose via Gemini

Replaces the Phase 3a stubs with editorial output from Gemini 2.5
Flash via the existing OpenRouter-backed callLLM chain. Two LLM
pathways, different caching semantics:

  whyMatters (per story): 1 editorial sentence, 18-30 words, global
  stakes. Cache brief:llm:whymatters:v1:{sha256(headline|source|severity)}
  with 24h TTL shared ACROSS users (whyMatters is not personalised).
  Bounded concurrency 5 so a 12-story brief doesn't open 12 parallel
  sockets to OpenRouter.

  digest prose (per user): JSON { lead, threads[], signals[] }
  replacing the stubs. Cache brief:llm:digest:v1:{userId}:{sensitivity}
  :{poolHash} with 4h TTL per-user. Pool hash is order-insensitive
  so rank shuffling doesn't invalidate.

Provider pinned to OpenRouter (google/gemini-2.5-flash) via
skipProviders: ['ollama', 'groq'] per explicit user direction.

Null-safe all the way down. If the LLM is unreachable, parse fails,
or cache throws, enrichBriefEnvelopeWithLLM returns the baseline
envelope with its stubs intact. The brief always ships. Kill switch
BRIEF_LLM_ENABLED is distinct from AI_DIGEST_ENABLED so the brief's
editorial prose and the email's AI summary can be toggled
independently during provider outages.

Files:
  scripts/lib/brief-llm.mjs (new) — pure prompt/parse helpers + IO
    generateWhyMatters/generateDigestProse + envelope enrichment
  scripts/seed-digest-notifications.mjs — BRIEF_LLM_ENABLED flag,
    briefLlmDeps closure, enrichment inserted between compose + SETEX
  tests/brief-llm.test.mjs (new, 34 cases)

End-to-end verification: the enriched envelope passes
assertBriefEnvelope() — the renderer's strict validator is the gate
between composer and api/brief, so we prove the enriched envelope
still validates.

156/156 brief tests pass. Both tsconfigs typecheck clean.

* fix(brief): address three P1 review findings on Phase 3b

All three findings are about cache-key correctness + envelope safety.

P1-A — whyMatters cache key under-specifies the prompt.
  hashStory keyed on headline|source|threatLevel, but the prompt also
  carries category + country. Upstream classification or geocoding
  corrections that leave those three fields unchanged would return
  pre-correction prose for a materially different prompt. Bumped to
  v2 key space (pre-fix rows ignored, re-LLM once on rollout). Added
  regression tests for category + country busting the cache.

P1-B — digest prose cache key under-specifies the prompt.
  hashDigestInput sorted stories and hashed headline|threatLevel only.
  The actual prompt includes ranked order + category + country + source.
  v2 hash now canonicalises to JSON of the fields in the prompt's
  ranked order. Test inverted to lock the corrected behaviour
  (reordering MUST miss the cache). Added a test for category change
  invalidating.
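
  A sketch of the P1-B idea, assuming story objects carry headline,
  threatLevel, category, country and source fields (property names are
  assumptions). It only builds the canonical material; the real
  hashDigestInput hashes it, which is omitted here.

```javascript
// Hypothetical sketch: canonical cache material for the digest prompt.
// Stories are serialised in ranked order (no sorting), and every field
// the prompt uses is included, so reordering or reclassifying a story
// produces different material and therefore a cache miss.
function digestCacheMaterial(rankedStories) {
  return JSON.stringify(
    rankedStories.map((s) => [s.headline, s.threatLevel, s.category, s.country, s.source]),
  );
}
```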

P1-C — malformed cached digest poisons the envelope at SETEX time.
  On cache hit generateDigestProse accepted any object with a string
  lead, skipping the full shape check. enrichBriefEnvelopeWithLLM then
  wrote prose.threads/.signals into the envelope, and the cron SETEXed
  unvalidated. A bad cache row would 404 /api/brief at render time.

  Two-layer fix:
    1. Extracted validateDigestProseShape(obj) — same strictness
       parseDigestProse ran on fresh output. generateDigestProse now
       runs it on cache hits too, and returns a normalised copy.
    2. Cron now re-runs assertBriefEnvelope on the ENRICHED envelope
       before SETEX. On assertion failure it falls back to the
       unenriched baseline (already passed assertion on construction).

  Regression test: malformed cached row is rejected on hit and the
  LLM is called again to overwrite.

Tests: 8 new regression cases locking all three findings. Total brief
test suite now 185/185 green. Both tsconfigs typecheck clean.

Cache-key version bumps (v1 -> v2) trigger one-off cache miss on
deploy. Editorial prose re-LLM'd on the next cron tick per user.

* fix(brief): address two P2 review findings on #3172

P2-A: misleading test name 'different users share the cache' asserted
the opposite (per-user isolation). Renamed to 'different users do NOT
share the digest cache even when the story pool is identical' so a
future reader can't refactor away the per-user key on a misreading.

P2-B: signal length validator only capped character count (< 220 chars), so a
30-word signal could pass even though the prompt says '<=14 words'.
Added a word-count filter with an 18-word ceiling (14 + 4 margin for
model drift / hyphenated compounds). Regression test locks the
behaviour: signals with >14-word drift are dropped, short imperatives
pass.
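
A sketch of the combined character and word ceilings described above. keepSignal and the constant names are hypothetical; the 220-char and 18-word limits are taken from this message.

```javascript
const MAX_SIGNAL_CHARS = 220; // existing cap: length must stay below this
const MAX_SIGNAL_WORDS = 18;  // prompt asks for <=14 words, plus a margin of 4

// Hypothetical filter: returns true when a signal should be kept.
function keepSignal(signal) {
  if (typeof signal !== 'string') return false;
  const trimmed = signal.trim();
  if (trimmed.length === 0 || trimmed.length >= MAX_SIGNAL_CHARS) return false;
  // Split on runs of whitespace so double spaces do not inflate the count.
  return trimmed.split(/\s+/).length <= MAX_SIGNAL_WORDS;
}
```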

43/43 brief-llm tests pass. Both tsconfigs typecheck clean.
2026-04-18 19:37:33 +04:00
Elie Habib
01c607c27c fix(brief): compose magazine stories from digest accumulator, not news:insights (#3168)
Root cause of the "email digest lists 30 critical events, brief shows
2 random Bellingcat stories" mismatch reported today: the email and
the brief read from two unrelated Redis keys.

  email digest -> digest:accumulator:v1:{variant}:{lang}
                  live per-variant ZSET of 30+ ingested stories,
                  hydrated from story:track:v1:{hash} + sources.
                  written by list-feed-digest on every ingest cycle.

  brief        -> news:insights:v1
                  global 8-story summary written by seed-insights.
                  After sensitivity=critical filter only 2 survive.
                  A completely different pool on a different cadence.

The brief was shipping from the wrong source, so a user who had just
read "UNICEF / Hormuz / Rohingya" in their email would open their
brief and see unrelated Bellingcat pieces.

Fix: brief now composes from the same digest accumulator the email
reads. scripts/lib/brief-compose.mjs exposes a new
composeBriefFromDigestStories(rule, digestStories, insightsNumbers,
{nowMs}) that maps the digest story shape ({hash, title, severity,
sources[], ...}) through a small adapter into the upstream brief-
filter shape, applies the user's sensitivity gate, and assembles the
envelope. news:insights:v1 is still read — but only for the
clusters / multi-source counters on the stats page. A failed
insights fetch now returns zeroed stats instead of aborting brief
composition, because the stories (not the numbers) are what matter.

seed-digest-notifications:
- composeBriefsForRun now calls buildDigest(candidate, windowStart)
  per rule instead of using a single global pool
- memoizes buildDigest by (variant, lang, windowStart) to keep the
  per-user loop from issuing N identical ZRANGE+HGETALL round-trips
- BRIEF_STORY_WINDOW_MS = 24h — a weekly-cadence user still expects
  a fresh brief in the dashboard every day, independent of email
  cadence
- composeBriefForRule kept as @deprecated so tests that stub
  news:insights directly don't break; all live traffic uses the
  digest path
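
The memoisation in the second bullet can be sketched as follows. memoizeBuildDigest is a hypothetical wrapper name, and the rule shape ({ variant, lang }) is an assumption; the sketch only shows keying on (variant, lang, windowStart).

```javascript
// Hypothetical wrapper: caches buildDigest results per (variant, lang,
// windowStart) so rules that share a pool reuse one result. If buildDigest
// returns a promise, the promise itself is cached and shared.
function memoizeBuildDigest(buildDigest) {
  const cache = new Map();
  return (rule, windowStart) => {
    const key = `${rule.variant}:${rule.lang}:${windowStart}`;
    if (!cache.has(key)) cache.set(key, buildDigest(rule, windowStart));
    return cache.get(key);
  };
}
```

The cache is meant to live for a single cron run, so a fresh wrapper would be created per run.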

Tests: new tests/brief-from-digest-stories.test.mjs (12 cases) locks
the mapping — empty input, source selection, sensitivity pass/drop,
12-story cap, moderate→medium severity aliasing, category/country
defaults, stats-number passthrough, determinism.

122/122 brief tests pass; both tsconfigs typecheck clean.

Operator note: today's wrong brief at brief:user_...:2026-04-18 was
already DELed manually. The next cron tick under this code composes
a correct one from the same pool the email used.
2026-04-18 15:47:08 +04:00
Elie Habib
711636c7b6 feat(brief): consolidate composer into digest cron (retire standalone service) (#3157)
* feat(brief): consolidate composer into digest cron (retire standalone service)

Merges the Phase 3a standalone Railway composer into the existing
digest cron. End state: one cron (seed-digest-notifications.mjs)
writes brief:{userId}:{issueDate} for every eligible user AND
dispatches the digest to their configured channels with a signed
magazine URL appended. Net -1 Railway service.

User's architectural note: "there is no reason to have 1 digest
preparing all and sending, then another doing a duplicate". This
delivers that — infrastructure consolidation, same send cadence,
single source of truth for brief envelopes.

File moves / deletes:

- scripts/seed-brief-composer.mjs → scripts/lib/brief-compose.mjs
  Pure-helpers library: no main(), no env guards, no cron. Exports
  composeBriefForRule + groupEligibleRulesByUser + dedupeRulesByUser
  (shim) + shouldExitNonZero + date helpers + extractInsights.
- Dockerfile.seed-brief-composer → deleted.
- The seed-brief-composer Railway service is retired (user confirmed
  they would delete it manually).

New files:

- scripts/lib/brief-url-sign.mjs — plain .mjs port of the sign path
  in server/_shared/brief-url.ts (Web Crypto only, no node:crypto).
- tests/brief-url-sign.test.mjs — parity tests that confirm tokens
  minted by the scripts-side signer verify via the edge-side verifier
  and produce byte-identical output for identical input.

Digest cron (scripts/seed-digest-notifications.mjs):

- Reads news:insights:v1 once per run, composes per-user brief
  envelopes, SETEX brief:{userId}:{issueDate} via body-POST pipeline.
- Signs magazine URL per user (BRIEF_URL_SIGNING_SECRET +
  WORLDMONITOR_PUBLIC_BASE_URL new env requirements, see pre-merge).
- Injects magazineUrl into buildChannelBodies for every channel
  (email, telegram, slack, discord) as a "📖 Open your WorldMonitor
  Brief magazine" footer CTA.
- Email HTML gets a dedicated data-brief-cta-slot near the top of
  the body with a styled button.
- Compose failures NEVER block the digest send — the digest cron's
  existing behaviour is preserved when the brief pipeline has issues.
- Brief compose extracted to its own functions (composeBriefsForRun
  + composeAndStoreBriefForUser) to keep main's biome complexity at
  baseline (64 — was 63 before; inline would have pushed to 117).

Tests: 98/98 across the brief suite. New parity tests confirm cross-
module signer agreement.

PRE-MERGE: add BRIEF_URL_SIGNING_SECRET and WORLDMONITOR_PUBLIC_BASE_URL
to the digest-notifications Railway service env (same values already
set on Vercel for Phase 2). Without them, brief compose is auto-
disabled and the digest falls back to its current behaviour — safe to
deploy before env is set.

* fix(brief): digest Dockerfile + propagate compose failure to exit code

Addresses two seventh-round review findings on PR #3157.

1. Cross-directory imports + current Railway build root (todo 230).
   The consolidated digest cron imports from ../api, ../shared, and
   (transitively via scripts/lib/brief-compose.mjs) ../server/_shared.
   The running digest-notifications Railway service builds from the
   scripts/ root, so those parent paths are outside the deploy tree
   and the next rebuild would fail with ERR_MODULE_NOT_FOUND.

   New Dockerfile.digest-notifications (repo-root build context)
   COPYs exactly the modules the cron needs: scripts/ contents,
   scripts/lib/, shared/brief-envelope.*, shared/brief-filter.*,
   server/_shared/brief-render.*, api/_upstash-json.js,
   api/_seed-envelope.js. Tight list to keep the watch surface small.
   Pattern matches the retired Dockerfile.seed-brief-composer + the
   existing Dockerfile.relay.

2. Silent compose failures (todo 231). composeBriefsForRun logged
   counters but never exited non-zero. An Upstash outage or missing
   signing secret silently dropped every brief write while Railway
   showed the cron green. The retired standalone composer exited 1
   on structural failures; that observability was lost in the
   consolidation.

   Changed the compose fn to return {briefByUser, composeSuccess,
   composeFailed}. Main captures the counters, runs the full digest
   send loop first (compose-layer breakage must NEVER block user-
   visible digest delivery), then calls shouldExitNonZero at the
   very end. Exit-on-failure gives ops the Railway-red signal
   without touching send behaviour.

   Also: a total read failure of news:insights:v1 (catch branch)
   now counts as 1 compose failure so the gate trips on insights-
   key infra breakage, not just per-user write failures.

Tests unchanged (98/98). Typecheck + node --check clean. Biome
complexity ticks 63→65 — same pre-existing bucket, already tolerated
by CI; no new blocker.

PRE-MERGE Railway work still pending: set BRIEF_URL_SIGNING_SECRET
+ WORLDMONITOR_PUBLIC_BASE_URL on the digest-notifications service,
AND switch its dockerfilePath to /Dockerfile.digest-notifications
before merging. Without the dockerfilePath switch, the next rebuild
fails.

* fix(brief): Dockerfile type:module + explicit missing-secret tripwire

Addresses two eighth-round review findings on PR #3157.

1. ESM .js files parse as CommonJS in the container (todo 232).
   Dockerfile.digest-notifications COPYs shared/*.js,
   server/_shared/*.js, api/*.js — all ESM because the repo-root
   package.json has "type":"module". But the image never copies the
   root package.json, so Node's nearest-pjson walk inside /app/
   reaches / without finding one and defaults to CommonJS. First
   `export` statement throws `SyntaxError: Unexpected token 'export'`
   at startup.

   Fix: write a minimal /app/package.json with {"type":"module"}
   early in the build. Avoids dragging the full root package.json
   into the image while still giving Node the ESM hint it needs for
   repo-owned .js files.

2. Missing BRIEF_URL_SIGNING_SECRET silently tolerated (todo 233).
   The old gate folded "operator-disabled" (BRIEF_COMPOSE_ENABLED=0)
   and "required secret missing in rollout" into the same boolean
   via AND. A production deploy that forgot the env var would skip
   brief compose without any failure signal — Railway green, no
   briefs, no CTA in digests, nobody notices.

   Split the two states: BRIEF_COMPOSE_DISABLED_BY_OPERATOR (explicit
   kill switch, silent) and BRIEF_SIGNING_SECRET_MISSING (the misconfig
   we care about). When the secret is missing without the operator
   flag, composeBriefsForRun returns composeFailed=1 on first call
   so the end-of-run exit gate trips and Railway flags the run red.
   Digest send still proceeds — compose-layer issues never block
   notifications.

Tests: 98/98. Syntax + node --check clean.

* fix(brief): address 2 remaining P2 review comments on PR #3157

Greptile review (2026-04-18T05:04Z) flagged three P2 items. The
first (shouldExitNonZero never wired into cron) was already fixed in
commit 35a46aa34. This commit addresses the other two.

1. composeBriefForRule: issuedAt used Date.now() instead of the
   caller-supplied nowMs. Under the digest cron the delta is
   milliseconds and harmless, but it broke the function's
   determinism contract — same input must produce same output for
   tests + retries. Now uses the passed nowMs.

2. buildChannelBodies: magazineUrl embedded raw inside Telegram HTML
   <a href="..."> and Slack <URL|text> syntax. The URL is HMAC-
   signed and shape-validated upstream (userId regex + YYYY-MM-DD
   date), so injection is practically impossible — but the email
   CTA (injectBriefCta) escapes per-target and channel footers
   should match that discipline. Added:
     - Telegram: escape &, <, >, " to HTML entities
     - Slack: strip <, >, | (mrkdwn metacharacters)
   Discord and plain-text paths unchanged — Discord links tolerate
   raw URLs, plain text has no metacharacters to escape.
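
   The two escaping rules above, as a rough sketch. The function names are
   hypothetical; only the character sets come from this message.

```javascript
// Hypothetical helper: escape a URL for use inside a Telegram HTML href.
// '&' is replaced first so the entities added afterwards are not re-escaped.
function escapeTelegramHref(url) {
  return url
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: strip the characters that delimit Slack's <URL|text> syntax.
function sanitizeSlackUrl(url) {
  return url.replace(/[<>|]/g, '');
}
```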

Tests: 98/98 still pass (deterministic issuedAt change was
transparent to existing assertions because tests already pass nowMs
explicitly via the issuedAt fixture field).
2026-04-18 12:30:08 +04:00
Elie Habib
c109ddefad fix(digest): time-appropriate greeting in AI brief (#3018)
* fix(digest): use time-appropriate greeting based on user's local hour

The AI digest summary always opened with "Good morning" regardless of
when it was delivered. Now computes the user's local hour from their
configured digestTimezone and passes the correct greeting (morning,
afternoon, evening) to the LLM prompt.

* fix(digest): include greeting in AI summary cache key

Prevents a morning-cached summary from being replayed in the evening.
The greeting segment in the cache key ensures each time-of-day window
gets its own cached summary.

* fix(digest): log warning when timezone resolution fails in greeting

Addresses PR review: toLocalHour returning -1 silently falls back to
"Good evening". Now logs a warning with the bad timezone for Railway
observability.
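
A sketch of the greeting selection, assuming Intl.DateTimeFormat is used to resolve the local hour. The hour boundaries (5, 12, 17) are assumptions for illustration; the -1 sentinel and the "Good evening" fallback are from this message.

```javascript
// Returns the hour (0-23) in the given IANA timezone, or -1 when the
// timezone cannot be resolved (Intl throws RangeError for unknown zones).
function toLocalHour(date, timeZone) {
  try {
    const parts = new Intl.DateTimeFormat('en-US', {
      timeZone,
      hour: 'numeric',
      hourCycle: 'h23',
    }).formatToParts(date);
    const hour = Number(parts.find((p) => p.type === 'hour').value);
    return Number.isInteger(hour) ? hour : -1;
  } catch {
    return -1;
  }
}

// Boundaries are assumed. A -1 from toLocalHour falls through to the
// evening greeting, which is why the caller logs a warning in that case.
function greetingFor(hour) {
  if (hour >= 5 && hour < 12) return 'Good morning';
  if (hour >= 12 && hour < 17) return 'Good afternoon';
  return 'Good evening';
}
```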
2026-04-12 20:40:17 +04:00
Elie Habib
24f23ba67a fix(digest): skip Groq + fix Telegram 400 from oversized messages (#3002)
* fix(digest): skip Groq (always 429) and fix Telegram 400 from oversized messages

Groq consistently rate-limits on digest runs, adding ~1s latency before
falling through to OpenRouter. Skip it via new callLLM skipProviders opt.

Telegram sendMessage rejects with 400 when digest text exceeds the 4096
char limit (30 stories + AI summary = ~5600 chars). Truncate at last
newline before the limit and close any unclosed HTML tags so truncation
mid-tag doesn't also cause a parse error. Log the Telegram error response
body so future 400s are diagnosable.

* fix: strip partial HTML tag before rebalancing in sanitizeTelegramHtml

The previous order appended closing tags first, then stripped the trailing
partial tag, so truncation mid-tag (e.g. 'x <b>hello</') still produced
malformed HTML. Reverse the order: strip partial tag, then close unclosed.

* fix: re-check length after sanitize in truncateTelegramHtml

Closing tags appended by sanitize can push a near-limit message over 4096.
Recurse into truncation if sanitized output exceeds the limit.
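
The three steps above (strip partial tag, close unclosed tags, re-check length) can be sketched like this. It is an illustration under simplifying assumptions, not the shipped code: tags are matched with a plain regex, and a loop with a shrinking budget stands in for the recursion.

```javascript
const TELEGRAM_LIMIT = 4096;

function sanitizeTelegramHtml(html) {
  // 1. Strip a trailing partial tag such as "</" or "<b" left by truncation.
  let out = html.replace(/<[^>]*$/, '');
  // 2. Walk the complete tags and track which ones are still open.
  const open = [];
  for (const m of out.matchAll(/<(\/?)([a-z]+)[^>]*>/gi)) {
    const name = m[2].toLowerCase();
    if (m[1]) {
      const i = open.lastIndexOf(name);
      if (i !== -1) open.splice(i, 1);
    } else {
      open.push(name);
    }
  }
  // 3. Close whatever is still open, innermost first.
  for (let i = open.length - 1; i >= 0; i -= 1) out += `</${open[i]}>`;
  return out;
}

function truncateTelegramHtml(html, limit = TELEGRAM_LIMIT) {
  if (html.length <= limit) return html;
  let budget = limit;
  while (budget > 0) {
    // Prefer cutting at the last newline inside the budget.
    const cut = html.lastIndexOf('\n', budget);
    const safe = sanitizeTelegramHtml(html.slice(0, cut > 0 ? cut : budget));
    if (safe.length <= limit) return safe;
    // Appended closing tags pushed us over the limit: shrink and retry.
    budget -= safe.length - limit;
  }
  return '';
}
```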
2026-04-12 12:00:05 +04:00
Elie Habib
3d7e60ca7d fix(digest): never skip AI summary when userPreferences are missing (#2939)
* fix(digest): never skip AI summary when userPreferences are missing

Users who enabled the AI executive summary toggle on their notification
rule still received digest emails without the summary. The Railway log
pinpointed it:

  [digest] No preferences for user_... skipping AI summary
  [digest] Email delivered to ...

Root cause chain:
  convex/http.ts:591            /relay/user-preferences returns literal
                                null when no userPreferences row exists
                                for (userId, variant).
  scripts/lib/user-context.cjs  fetchUserPreferences forwards that as
                                { data: null, error: false }.
  scripts/seed-digest-notifications.mjs:458
                                generateAISummary bails with return null.

The AI-summary toggle lives on the alertRules table. userPreferences is
a SEPARATE table (the SPA app-settings blob: watchlist, airports,
panels). A user can have an alertRule (with aiDigestEnabled: true)
without having ever saved userPreferences, or only under a different
variant. Missing prefs must NOT silently disable the feature the user
explicitly enabled. The correct behavior is to degrade to a
non-personalized summary.

Fix: remove the early return in generateAISummary. Call
extractUserContext(null), which already returns a safe empty context,
and formatUserProfile(ctx, 'full') returns "Variant: full" alone. The
LLM then generates a generic daily brief instead of nothing. An info
log still reports the missing-prefs case for observability.

Regression coverage: tests/user-context.test.mjs (new, 10 cases) locks
in that extractUserContext(null|undefined|{}|"") returns the empty
shape and formatUserProfile(emptyCtx, variant) returns exactly
"Variant: {variant}". Any future refactor that reintroduces the
null-bail will fail the tests.

Note: the same log also shows the rule fired at 13:01 Dubai instead
of 8 AM / 8 PM. That is a separate issue in isDue or rule-save flow
and needs more log data to diagnose; not included here.

* fix(digest): distinguish transient prefs fetch failure from missing row

Addresses Greptile P2 review feedback on PR #2939.

fetchUserPreferences returns { data, error } where:
  error: true  = transient fetch failure (network, non-OK HTTP, env missing)
  error: false = the (userId, variant) row genuinely does not exist

The previous log treated both cases identically as "No stored preferences",
which was misleading when the real cause was an unreachable Convex endpoint.
Behavior is unchanged (both still degrade to a non-personalized summary),
only the log line differentiates them so transient fetch failures are
visible in observability.
2026-04-11 17:10:06 +04:00
Elie Habib
c8084e3c29 fix(digest): render AI summary markdown across all channels (#2935)
2026-04-11 09:39:27 +04:00
Elie Habib
b99ceacc37 fix(emails): redesign intelligence brief email template (#2933)
- Width: 600px -> 680px (less constrained)
- Outer background: #111 (not full black #0a0a0a)
- Header: logo + date on same row, cleaner layout
- AI Summary: injected via slot below header, above stats
  (was above the header via regex, wrong order)
- Stats cards: separated with gaps, individual borders
- Footer: cleaned up, added Discord, fixed X handle
- Summary slot system replaces fragile regex injection
2026-04-11 08:35:11 +04:00
Elie Habib
e14dc5b103 feat(notifications): PRO entitlement check before delivery (#2899)
* feat(notifications): PRO entitlement check before delivery in relay/digest/proactive

All three notification delivery paths now verify the user has PRO tier
before sending. Uses a new /relay/entitlement Convex HTTP endpoint
(RELAY_SHARED_SECRET auth) with 15min Redis cache per user.

Fail-open design: if entitlement service is unreachable, delivery
proceeds (don't block paying users on a service hiccup). Cache shared
across relay, digest, and proactive scripts via relay:entitlement:{userId}.

Prevents downgraded users from continuing to receive notifications
after their subscription expires.

Requires Convex deploy for the new /relay/entitlement route.

* fix(notifications): show second delivery time for twice_daily digest mode

When user selects "Twice daily", show "Also sends at X PM/AM" hint below
the hour selector so they know when the second digest arrives.
Updates dynamically when hour or mode changes.
2026-04-10 15:34:52 +04:00
Elie Habib
d85bee163e fix(seeds): remove military-bases from static-ref bundle (#2893)
* fix(seeds): remove military-bases from static-ref bundle

seed-military-bases.mjs is a one-time batch upload tool requiring
--env and --sha CLI args. Without args it exits with code 1. It writes
no seed-meta so the freshness gate always tries to run it, always
fails, causing the bundle to exit non-zero.

* fix(notifications): wrap Telegram/Slack/Discord sends in try/catch

Unhandled fetch timeout (ETIMEDOUT) in sendTelegram crashed the entire
digest cron run. Email delivery succeeded but subsequent Telegram
delivery threw an uncaught error, killing the process.

Wrapped all three webhook-style send functions (Telegram, Slack, Discord)
in try/catch so network timeouts log a warning and return false instead
of crashing. sendEmail already had this pattern.
2026-04-10 14:10:30 +04:00
Elie Habib
00320c26cf feat(notifications): proactive intelligence agent (Phase 4) (#2889)
* feat(notifications): proactive intelligence agent (Phase 4)

New Railway cron (every 6 hours) that detects signal landscape changes
and generates proactive intelligence briefs before events break.

Reads ~8 Redis signal keys (CII risk, GPS interference, unrest, sanctions,
cyber threats, thermal anomalies, weather, commodities), computes a
landscape snapshot, diffs against the previous run, and generates an
LLM brief when the diff score exceeds threshold.

Key features:
- Signal landscape diff with weighted scoring (new risk countries = 2pts,
  GPS zone changes = 1pt per zone, commodity movers >3% = 1pt)
- Server-side convergence detection: countries with 3+ signal types flagged
- First run stores baseline only (no false-positive brief)
- Delivers via all 5 channels (Telegram, Slack, Discord, Email, Webhook)
- PROACTIVE_INTEL_ENABLED=0 env var to disable
- Skips users without saved preferences or deliverable channels

Requires: Railway cron service configuration (every 6 hours)

* fix(proactive): fetch all enabled rules + expand convergence to all signal types

1. Replace /relay/digest-rules (digest-mode only) with ConvexHttpClient
   query alertRules:getByEnabled to include ALL enabled rules, not just
   digest-mode users. Proactive briefs now reach real-time users too.
2. Expand convergence detection from 3 signal families (risk, unrest,
   sanctions) to all 7 (add GPS interference, cyber, thermal, weather).
   Track signal TYPES per country (Set), not event counts, so convergence
   means 3+ distinct signal categories, not 3+ events from one category.
3. Include signal type names in convergence zone output for LLM context
   and webhook payload.
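
   A sketch of item 2, assuming each signal is a { country, type } record
   (the record shape and the function name are assumptions). Counting
   distinct types in a Set is what separates "3+ categories" from
   "3+ events of one category".

```javascript
// Hypothetical sketch: flag countries where at least minTypes DISTINCT
// signal types converge. Repeated events of one type count once.
function findConvergenceZones(signals, minTypes = 3) {
  const typesByCountry = new Map();
  for (const { country, type } of signals) {
    if (!typesByCountry.has(country)) typesByCountry.set(country, new Set());
    typesByCountry.get(country).add(type);
  }
  const zones = [];
  for (const [country, types] of typesByCountry) {
    // Signal type names are included for LLM context and the webhook payload.
    if (types.size >= minTypes) zones.push({ country, signalTypes: [...types].sort() });
  }
  return zones;
}
```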

* fix(proactive): check channels before LLM + deactivate stale channels

1. Move channel fetch + deliverability check BEFORE user prefs and LLM
   call to avoid wasting LLM calls on users with no verified channels
2. Add deactivateChannel() calls on 404/410/403 responses in all delivery
   helpers (Telegram, Slack, Discord, Webhook), matching the pattern in
   notification-relay.cjs and seed-digest-notifications.mjs

* fix(proactive): preserve landscape on transient failures + drop Telegram Markdown

1. Don't advance landscape baseline when channel fetch or LLM fails,
   so the brief retries on the next run instead of permanently suppressing
   the change window
2. Remove parse_mode: 'Markdown' from Telegram sendMessage to avoid 400
   errors from unescaped characters in LLM output (matches digest pattern)

* fix(proactive): only advance landscape baseline after successful delivery

* fix(proactive): abort on degraded signals + don't advance on prefs failure

1. Track loaded signal key count. Abort run if <60% of keys loaded
   to prevent false diffs from degraded Redis snapshots becoming
   the new baseline.
2. Don't advance landscape when fetchUserPreferences() returns null
   (could be transient failure, not just "no saved prefs"). Retries
   next run instead of permanently suppressing the brief.

* fix(notifications): distinguish no-prefs from fetch-error in user-context

fetchUserPreferences() now returns { data, error } instead of bare null.
error=true means transient failure (retry next run, don't advance baseline).
data=null + error=false means user has no saved preferences (skip + advance).

Proactive script: retry on error, skip+advance on no-prefs.
Digest script: updated to destructure new return shape (behavior unchanged,
  both cases skip AI summary).
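
The three outcomes for the proactive script can be sketched as a small decision function. proactivePrefsAction and its return labels are hypothetical; the { data, error } semantics are as stated above.

```javascript
// Hypothetical sketch of how the proactive script can branch on the
// { data, error } shape returned by fetchUserPreferences().
function proactivePrefsAction({ data, error }) {
  if (error) return 'retry';                    // transient failure: keep the baseline
  if (data === null) return 'skip-and-advance'; // user has no saved preferences
  return 'proceed';                             // preferences available
}
```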

* fix(proactive): address all Greptile review comments

P1: Add link-local (169.254.0.0/16) and 0.0.0.0 to isPrivateIP SSRF check
P1: Log channel-fetch failures (was silent catch{})
P2: Remove unused createHash import and BRIEF_TTL constant
P2: Remove dead ?? 'full' fallback (rule.variant validated above)
P2: Add HTTPS enforcement to sendSlack/sendDiscord (matching sendWebhook)
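
A sketch of an IPv4-only check covering the ranges named in the first P1 item. The range list and the fail-closed handling of unparsable input are assumptions about what such a check would contain, not a copy of the real isPrivateIP.

```javascript
// Hypothetical IPv4-only sketch of an SSRF guard. Anything that is not a
// well-formed dotted-quad is treated as private (fail closed).
function isPrivateIP(ip) {
  const octets = String(ip).split('.');
  const valid =
    octets.length === 4 && octets.every((o) => /^\d{1,3}$/.test(o) && Number(o) <= 255);
  if (!valid) return true;
  const [a, b] = octets.map(Number);
  return (
    a === 0 ||                           // 0.0.0.0/8, includes 0.0.0.0
    a === 10 ||                          // 10.0.0.0/8
    a === 127 ||                         // loopback
    (a === 169 && b === 254) ||          // link-local 169.254.0.0/16
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168)             // 192.168.0.0/16
  );
}
```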
2026-04-10 08:08:27 +04:00
Elie Habib
12203a4f51 feat(notifications): generic webhook channel (Phase 3) (#2887)
* feat(notifications): generic webhook channel (Phase 3)

Add webhook as a 5th notification channel type. Users provide an HTTPS
URL, WorldMonitor POSTs structured JSON payloads to it. Enables
integration with Zapier, n8n, IFTTT, and custom pipelines.

Schema: webhook variant in notificationChannels with webhookEnvelope
(AES-256-GCM encrypted URL), webhookLabel, webhookSecret fields.

Relay: sendWebhook() with SSRF protection (DNS resolve + private IP
check), HTTPS-only enforcement, auto-deactivation on 404/410/403.

Digest cron: sendWebhook() delivers digest as structured JSON with
stories array, AI summary, and story count.

Requires Convex deploy for schema changes.

* fix(notifications): webhook UI, label persistence, SSRF fail-closed

Address review findings on PR #2887:

1. Add webhook to settings UI: channel row with URL input, label field,
   connect/cancel/save buttons, icon, and connected state display
2. Forward webhookLabel through edge function -> Convex relay -> mutation,
   persist in notificationChannels table (was silently discarded)
3. Fix digest sendWebhook SSRF: dns.resolve4().catch(()=>[]) fails open
   on IPv6-only hosts; now fails closed like the relay version

* fix(notifications): validate webhook URL at connect time + add webhookLabel to public mutation

1. Edge function now validates webhook URLs before encrypting: HTTPS required,
   private/local hostnames rejected (localhost, 127.*, 10.*, 192.168.*, etc.)
   Invalid URLs caught at connect time rather than silently failing on delivery.
2. Public setChannel mutation now accepts and persists webhookLabel,
   matching the internal mutation and schema.

* fix(notifications): include held alert details in webhook quiet-hours batch

Webhook batch_on_wake delivery now sends full alert details (eventType,
severity, title per alert) instead of just the batch subject line,
matching the information density of Slack/Discord/Email delivery.
2026-04-09 23:22:44 +04:00
Elie Habib
fa64e2f61f feat(notifications): AI-enriched digest delivery (#2876)
* feat(notifications): AI-enriched digest delivery (Phase 1)

Add personalized LLM-generated executive summaries to digest
notifications. When AI_DIGEST_ENABLED=1 (default), the digest cron
fetches user preferences (watchlist, panels, frameworks), generates a
tailored intelligence brief via Groq/OpenRouter, and prepends it to the
story list in both text and HTML formats.

New infrastructure:
- convex/userPreferences: internalQuery for service-to-service access
- convex/http: /relay/user-preferences endpoint (RELAY_SHARED_SECRET auth)
- scripts/lib/llm-chain.cjs: shared Ollama->Groq->OpenRouter provider chain
- scripts/lib/user-context.cjs: user preference extraction + LLM prompt formatting

The AI summary is cached (1h TTL), keyed by a hash of stories + userContext.
On LLM failure the cron falls back to the raw digest (no regression). The
subject line changes to "Intelligence Brief" when an AI summary is present.

* feat(notifications): per-user AI digest opt-out toggle

AI executive summary in digests is now optional per user via
alertRules.aiDigestEnabled (default true). Users can toggle it off in
Settings > Notifications > Digest > "AI executive summary".

Schema: added aiDigestEnabled to alertRules table
Backend: Convex mutations, HTTP relay, edge function all forward the field
Frontend: toggle in digest settings section with descriptive copy
Digest cron: skips LLM call when rule.aiDigestEnabled === false

* fix(notifications): address PR review — cache key, HTML replacement, UA

1. Add variant to AI summary cache key to prevent cross-variant poisoning
2. Use replacer function in html.replace() to avoid $-pattern corruption
   from LLM output containing dollar amounts ($500M, $1T)
3. Use service UA (worldmonitor-llm/1.0) instead of Chrome UA for LLM calls
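
To illustrate item 2: when the second argument to `String.prototype.replace` is a string, sequences such as `$1` and `$&` are interpreted as replacement patterns. The template, placeholder and summary text below are made up for the example and are not the actual email HTML.

```javascript
// Made-up template and LLM output, for illustration only.
const template = '<td>{{SUMMARY}}</td>';
const summary = 'Deal valued at $1T';
const placeholder = /(\{\{SUMMARY\}\})/; // one capture group, so "$1" is meaningful

// String replacement: "$1" is expanded to the first capture group.
const corrupted = template.replace(placeholder, summary);
// corrupted === '<td>Deal valued at {{SUMMARY}}T</td>'

// Replacer function: the return value is inserted verbatim.
const safe = template.replace(placeholder, () => summary);
// safe === '<td>Deal valued at $1T</td>'
```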

* fix(notifications): skip AI summary without prefs + fix HTML regex

1. Return null from generateAISummary() when fetchUserPreferences()
   returns null, so users without saved preferences get raw digest
   instead of a generic LLM summary
2. Fix HTML replace regex to match actual padding value (40px 32px 0)
   so the executive summary block is inserted in email HTML

* fix(notifications): channel check before LLM, omission-safe aiDigest, richer cache key

1. Move channel fetch + deliverability check BEFORE AI summary generation
   so users with no verified channels don't burn LLM calls every cron run
2. Only patch aiDigestEnabled when explicitly provided (not undefined),
   preventing stale frontend tabs from silently clearing an opt-out
3. Include severity, phase, and sources in story hash for cache key
   so the summary invalidates when those fields change
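
A sketch of the omission-safe patch from item 2, assuming the mutation builds a partial update object from its arguments. `buildRulePatch` and the argument shape are hypothetical.

```javascript
// Hypothetical patch builder: a field is only written when the caller
// explicitly supplied it, so an older client that never sends
// aiDigestEnabled cannot overwrite a stored opt-out.
function buildRulePatch(args) {
  const patch = {};
  if (args.aiDigestEnabled !== undefined) {
    patch.aiDigestEnabled = args.aiDigestEnabled;
  }
  return patch;
}
```

Note that an explicit `false` is still written; only `undefined` is skipped.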
2026-04-09 21:35:26 +04:00
Elie Habib
65b4655dc6 fix(digest): namespace accumulator by language, add per-severity caps (#2826)
* fix(digest): namespace accumulator by language, add per-severity caps

Root cause: digest:accumulator:v1:${variant} was shared across all
languages. A buildDigest("full", "de") request wrote German stories
to the same accumulator the English digest cron consumed, leaking
non-English headlines into English email digests.

Fix (3 layers):
1. Accumulator key is now language-aware:
   digest:accumulator:v1:${variant}:${lang}
   writeStoryTracking receives lang and writes to the correct key.
   Cron reads from the lang-specific key (defaults to 'en').

2. Defense-in-depth: lang is stored on story:track:v1:* hash records.
   Cron filters stories where track.lang !== target lang.

3. Per-severity display caps use named constants:
   CRITICAL=Infinity, HIGH=15, MEDIUM=10 (was hardcoded 10 for all).
   Both text and HTML formatters use the same constants.
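
A rough sketch of layers 1 and 3. The key format and cap values come from the description above; the names `accumulatorKey`, `DISPLAY_CAPS` and `capBySeverity`, and the lowercase severity strings, are assumptions.

```javascript
// Layer 1: the accumulator key includes the language, defaulting to 'en'.
function accumulatorKey(variant, lang = 'en') {
  return `digest:accumulator:v1:${variant}:${lang}`;
}

// Layer 3: named per-severity display caps shared by all formatters.
const DISPLAY_CAPS = { critical: Infinity, high: 15, medium: 10 };

// Keep at most DISPLAY_CAPS[severity] stories per severity, preserving order.
// Severities without a configured cap pass through unchanged in this sketch.
function capBySeverity(stories) {
  const shown = {};
  return stories.filter((story) => {
    const cap = DISPLAY_CAPS[story.severity];
    if (cap === undefined) return true;
    shown[story.severity] = (shown[story.severity] || 0) + 1;
    return shown[story.severity] <= cap;
  });
}
```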

* fix(digest): remove track.lang cron filter, rely solely on accumulator key

track.lang is written by writeStoryTracking via HSET which overwrites
on every call. If the same normalized title appears in multiple
languages within the 48h TTL, the last writer wins the lang field.
Using it as a cron-side filter creates a race where valid stories get
dropped. The accumulator key namespacing (variant:lang) is the sole
language isolation mechanism.
2026-04-08 17:28:21 +04:00
Elie Habib
998c554a6f feat(payments): subscription welcome email + admin notification (#2809)
* feat(payments): subscription welcome email + admin notification

On subscription.active webhook:
1. Send branded welcome email to user via Resend (matches WM design)
2. Send admin notification to elie@worldmonitor.app with plan, email, userId

Also removed the Dodo customer block from checkout creation since
Dodo locks prefilled customer fields as read-only, preventing users
from editing their email/name during payment.

* fix(payments): correct email feature cards per plan tier + fix plan name mapping

Pro plans showed "Full API Access" which is false (apiAccess: false in catalog).
Now shows plan-appropriate features: Pro gets dashboards/alerts, API plans get
API access. Also aligned PLAN_DISPLAY keys with actual catalog planKeys
(api_starter, api_starter_annual, api_business, enterprise).

* fix(payments): address Greptile review on subscription emails

P1: Throw on Resend failure so Convex retries transient errors (5xx,
429, network) instead of silently dropping emails.

P2: Only send the welcome email for brand-new subscriptions, not
reactivations. The `existing` variable already in scope distinguishes the two.

P2: Log a warning when customer email is missing from the webhook
payload so dropped emails are visible in logs.

* fix(emails): replace placeholder logo and remove Someone.ceo branding

All 3 email templates (subscription welcome, register-interest, daily
digest) used a Unicode circle character as a placeholder logo and
"by Someone.ceo" as a subtitle. Replaced with the actual hosted
WorldMonitor favicon and removed the Someone.ceo line.
2026-04-08 08:05:32 +04:00
Elie Habib
2e3de91192 fix(digest): deduplicate near-identical stories in notifications (#2724)
* fix(digest): deduplicate near-identical stories in notification digests

RSS feeds from Reuters, AP, BBC, etc. publish the same event with
slightly different headlines ("search underway" vs "search under way",
"- reuters.com" vs "- Reuters"). Each variant got a unique title hash,
flooding digests with 5-10 copies of the same story.

Two-layer fix, plus a display cleanup:
- Notification script: Jaccard word-overlap clustering (>55% threshold)
  merges near-duplicates, keeps highest-scored representative, sums
  mention counts and merges sources.
- Server normalizeTitle: strips source attribution suffixes before
  hashing so "Title - reuters.com" and "Title - Reuters" share one
  accumulator entry going forward.
- Display: strip source suffixes from formatted titles (source already
  shown separately as [Reuters US, ...]).
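
The clustering step could be sketched as below. This is a simplified greedy version under stated assumptions: each story has `title`, `score`, `mentions` and `sources` fields (hypothetical names), and a story joins the first cluster whose seed title overlaps by more than the 55% threshold.

```javascript
// Split a title into a set of lowercase word tokens.
function tokenize(title) {
  return new Set(title.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|.
function jaccard(a, b) {
  let intersection = 0;
  for (const word of a) if (b.has(word)) intersection += 1;
  const union = a.size + b.size - intersection;
  return union === 0 ? 0 : intersection / union;
}

// Greedy clustering: compare each story to the first member of every cluster.
function deduplicateStories(stories, threshold = 0.55) {
  const clusters = [];
  for (const story of stories) {
    const words = tokenize(story.title);
    const cluster = clusters.find((c) => jaccard(c.words, words) > threshold);
    if (!cluster) {
      clusters.push({ words, best: story, mentions: story.mentions, sources: new Set(story.sources) });
      continue;
    }
    cluster.mentions += story.mentions;                        // sum mention counts
    story.sources.forEach((s) => cluster.sources.add(s));      // merge sources
    if (story.score > cluster.best.score) cluster.best = story; // keep highest-scored
  }
  return clusters.map((c) => ({ ...c.best, mentions: c.mentions, sources: [...c.sources] }));
}
```

For "search underway for missing plane off coast" vs "search under way for missing plane off coast" the token sets share 6 of 9 distinct words, a similarity of about 0.67, so the two headlines merge.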

* fix(digest): fetch sources for all merged hashes, not just representative

deduplicateStories() runs before SMEMBERS fetch, so merged stories'
source sets were lost. Now carries mergedHashes array through clustering
and fetches story:sources for every hash in the cluster, unioning them.

* style: drop redundant /i flag from normalizeTitle regexes

Input is already lowercased by .toLowerCase() before the regex runs.
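
An illustrative sketch of suffix stripping on already-lowercased input, which is why no `/i` flag is needed. The regex and the `KNOWN_SOURCES` list are assumptions for the example and are not the patterns used in `normalizeTitle`.

```javascript
// Hypothetical list of source names that appear as " - <name>" suffixes.
const KNOWN_SOURCES = ['reuters', 'ap news', 'bbc news'];
// Matches a trailing " - example.com" or " | example.co.uk" style suffix.
const DOMAIN_SUFFIX = /\s+[-|]\s+(?:[a-z0-9-]+\.)+[a-z]{2,}$/;

function normalizeTitle(title) {
  // Lowercase first, so the patterns below only need to match lowercase text.
  let t = title.toLowerCase().trim().replace(DOMAIN_SUFFIX, '');
  for (const source of KNOWN_SOURCES) {
    const suffix = ` - ${source}`;
    if (t.endsWith(suffix)) {
      t = t.slice(0, -suffix.length);
      break;
    }
  }
  return t.replace(/\s+/g, ' ').trim();
}
```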
2026-04-05 13:47:53 +04:00
Elie Habib
c51717e76a feat(digest): daily digest notification mode (#2614)
* feat(digest): add daily digest notification mode (Enhancement 2)

- convex/schema.ts: add digestMode/digestHour/digestTimezone to alertRules
- convex/alertRules.ts: setDigestSettings mutation, setDigestSettingsForUser
  internal mutation, getDigestRules internal query
- convex/http.ts: GET /relay/digest-rules for Railway cron; set-digest-settings
  action in /relay/notification-channels
- cache-keys.ts: DIGEST_LAST_SENT_KEY + DIGEST_ACCUMULATOR_TTL (48h); fix
  accumulator EXPIRE to use 48h instead of 7-day STORY_TTL
- notification-relay.cjs: skip digest-mode rules in processEvent — prevents
  daily/weekly users from receiving both real-time and digest messages
- seed-digest-notifications.mjs: new Railway cron (every 30 min) — queries
  due rules, ZRANGEBYSCORE accumulator, batch HGETALL story tracks, derives
  phase, formats digest per channel, updates digest:last-sent
- notification-channels.ts: DigestMode type, digest fields on AlertRule,
  setDigestSettings() client function
- api/notification-channels.ts: set-digest-settings action

* fix(digest): correct twice_daily scheduling and only advance lastSent on confirmed delivery

isDue() only checked a single hour slot, so twice_daily users got one digest per day
instead of two. Now checks both primaryHour and (primaryHour+12)%24 for twice_daily.

All four send functions returned void and errors were swallowed, causing dispatched=true
to be set unconditionally. Replaced with boolean returns and anyDelivered guard so
lastSentKey is only written when at least one channel confirms a 2xx delivery.
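
The scheduling fix and the delivery guard can be sketched as follows. `dueHours`, `isDueHour` and `shouldAdvanceLastSent` are hypothetical names, and the sketch ignores the day-of-week and last-sent checks that a real `isDue()` would also need.

```javascript
// Hours (0-23) at which a rule should fire for a given mode.
function dueHours(mode, primaryHour) {
  return mode === 'twice_daily'
    ? [primaryHour, (primaryHour + 12) % 24]
    : [primaryHour];
}

function isDueHour(mode, primaryHour, currentHour) {
  return dueHours(mode, primaryHour).includes(currentHour);
}

// Each send function returns true only on a confirmed delivery;
// lastSent advances only if at least one channel succeeded.
function shouldAdvanceLastSent(sendResults) {
  return sendResults.some((delivered) => delivered === true);
}
```

For example, a `twice_daily` rule with `primaryHour = 18` is due at 18:00 and 06:00, since (18 + 12) % 24 = 6.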

* fix(digest): add discord to deactivate allowlist, bounds-check digestHour, minor cleanup

/relay/deactivate was rejecting channelType="discord" with 400, so stale Discord
webhooks were never auto-deactivated. Added "discord" to the validation guard.

Added 0-23 integer bounds check for digestHour in both setDigestSettings mutations
to reject bad values at the DB layer rather than silently storing them.

Removed unused createHash import and added AbortSignal.timeout(10000) to
upstashRest to match upstashPipeline and prevent cron hangs.

* fix(daily-digest): add DIGEST_CRON_ENABLED guard, IANA timezone validation, and Digest Mode UI

- seed-digest-notifications.mjs: exit 0 when DIGEST_CRON_ENABLED=0 so Railway
  cron does not error on intentionally disabled runs
- convex/alertRules.ts: validate digestTimezone via Intl.DateTimeFormat; throw
  ConvexError with descriptive message for invalid IANA strings
- preferences-content.ts: add Digest Mode section with mode select (realtime/
  daily/twice_daily/weekly), delivery hour select, and timezone input; details
  panel hidden in realtime mode; wired to setDigestSettings with 800ms debounce

Fixes gaps F, G, I from docs/plans/2026-04-02-003-fix-news-alerts-pr-gaps-plan.md
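
A minimal sketch of the two input checks described in this series: the 0-23 integer bound on digestHour and IANA timezone validation via `Intl.DateTimeFormat`. The helper names are hypothetical, and the sketch returns booleans where the mutations throw a ConvexError.

```javascript
// Intl.DateTimeFormat throws a RangeError for an unknown time zone name.
function isValidTimezone(tz) {
  if (typeof tz !== 'string' || tz.length === 0) return false;
  try {
    new Intl.DateTimeFormat('en-US', { timeZone: tz });
    return true;
  } catch {
    return false;
  }
}

// digestHour must be an integer hour of the day.
function isValidDigestHour(hour) {
  return Number.isInteger(hour) && hour >= 0 && hour <= 23;
}
```

The explicit string check matters because passing `undefined` as `timeZone` silently falls back to the runtime's default zone instead of throwing.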

* fix(digest): close digest blackhole and wire timezone validation through internal mutation

- convex/alertRules.ts: add IANA timezone validation to setDigestSettingsForUser
  (internalMutation called by http.ts); the public mutation already validated but
  the edge/relay path bypassed it
- preferences-content.ts: add VITE_DIGEST_CRON_ENABLED browser flag; when =0,
  disable the digest mode select and show only Real-time with a note so users
  cannot enter a blackhole state where the relay skips their rule and the cron
  never runs

Addresses P1 and P2 review findings on #2614

* fix(digest): restore missing > closing the usDigestDetails div opening tag

* feat(digest): redesign email to match WorldMonitor design system

Dark theme (#0a0a0a bg, #111 cards), #4ade80 green accent, 4px top bar,
table-based logo header, severity-bucketed story cards with colored left
borders, stats row (total/critical/high), green CTA button. Plain text
fallback preserved for Telegram/Slack/Discord channels.

* test(digest): add rollout-flag and timezone-validation regression tests

Covers three paths flagged as untested by reviewers:
- VITE_DIGEST_CRON_ENABLED gates digest-mode options and usDigestDetails visibility
- setDigestSettings (public) validates digestTimezone via Intl.DateTimeFormat
- setDigestSettingsForUser (internalMutation) also validates digestTimezone
  to prevent silent bypass through the edge-to-Convex path
2026-04-02 22:17:24 +04:00