fix(brief): address Greptile P1 + P4 review on merged PR #3396 (#3401)

P1 — False-positive PARITY REGRESSION for AI-digest opt-out users
  (scripts/seed-digest-notifications.mjs)

When rule.aiDigestEnabled === false, briefLead is intentionally
null (no summary in channel bodies), but envLead still reads the
envelope's stub lead. The string comparison null !== '<stub lead>'
fired channels_equal=false on every tick for every opted-out user
— flooding the parity log with noise and risking the PARITY
REGRESSION alert becoming useless.

The WARN was already gated by `briefLead && envLead` (so no Sentry
flood), but the LOG line still misled operators counting
channels_equal=false. Gate the entire parity-log block on the same
condition that governs briefLead population:

  if (AI_DIGEST_ENABLED && rule.aiDigestEnabled !== false) {
    // … parity log + warn …
  }

Opt-out users now produce no parity-log line at all (correct —
there's no canonical synthesis to compare against).

P4 — greetingBucket '' fallback semantics
  (scripts/lib/brief-llm.mjs)

Doc-only — Greptile flagged that unrecognised greetings collapse
to '' (a single bucket). Added a comment clarifying this is
intentional: '' is a stable fourth bucket, not a sentinel for
"missing data". A user whose greeting flips between recognised
and unrecognised values gets different cache keys, which is
correct (those produce visibly different leads).

Other Greptile findings (no code change — replied via PR comment):
- P2 (double-fetch in [...sortedDue, ...sortedAll]): already
  addressed in helper extraction commit df3563080 of PR #3396 —
  see `seen` Set dedupe at scripts/lib/digest-orchestration-helpers.mjs:103.
- P2 (parity check no-op for opted-in): outdated as written —
  after 5d10cee86's per-rule synthesis, briefLead is per-rule and
  envLead is winner-rule's envelope.lead. They diverge for
  non-winner rules (legitimate); agree for winner rules (cache-
  shared via generateDigestProse). The check still serves its
  documented purpose for cache-drift detection.

Stacked on the merged PR #3396; opens as a follow-up since the
parent branch is now closed.

Test results: 7012/7012 (was 7006 pre-rebase onto post-merge main).
This commit is contained in:
Elie Habib
2026-04-25 16:43:50 +04:00
committed by GitHub
parent 7e68b30eb8
commit 92dd046820
2 changed files with 51 additions and 32 deletions

View File

@@ -344,6 +344,16 @@ const DIGEST_PROSE_SYSTEM_BASE =
* locales; the bucket collapses them to one of three slots so the
* cache key only changes when the time-of-day window changes.
*
* Unrecognised greetings (locale-specific phrases the keyword
* heuristic doesn't match, empty strings after locale changes,
* non-string inputs) collapse to the literal `''` slot. This is
* INTENTIONAL — it's a stable fourth bucket, not a sentinel for
* "missing data". A user whose greeting flips between a recognised
* value (e.g. "Good morning") and an unrecognised one (e.g. a
* locale-specific phrase) will get different cache keys, which is
* correct: those produce visibly different leads. Greptile P2 on
* PR #3396 raised the visibility, kept the behaviour.
*
* @param {string|null|undefined} greeting
* @returns {'morning' | 'afternoon' | 'evening' | ''}
*/

View File

@@ -1767,7 +1767,13 @@ async function main() {
console.log(
`[digest] Sent ${stories.length} stories to ${rule.userId} (${rule.variant}, ${rule.digestMode})`,
);
// Parity observability. Two distinct properties to track:
// Parity observability. Gated on AI_DIGEST_ENABLED + per-rule
// aiDigestEnabled — without this guard, opt-out users (briefLead
// is intentionally null) trigger PARITY REGRESSION every tick
// (null !== '<envelope stub lead>'), flooding Sentry with
// false positives. Greptile P1 on PR #3396.
//
// Two distinct properties to track:
//
// 1. CHANNEL parity (load-bearing): for ONE send, every channel
// body of THIS rule (email HTML + plain text + Telegram +
@@ -1777,41 +1783,44 @@ async function main() {
//
// 2. WINNER parity (informational): when `winner_match=true`,
// THIS rule is the same one the magazine envelope was
// composed from — so channel lead == magazine lead. When
// `winner_match=false`, this is a non-winner rule send;
// channel lead reflects this rule's pool while the magazine
// URL points at the winner's editorial. Expected divergence,
// not a regression.
// composed from — so channel lead == magazine lead (cache-
// shared via generateDigestProse). When `winner_match=false`,
// this is a non-winner rule send; channel lead reflects this
// rule's pool while the magazine URL points at the winner's
// editorial. Expected divergence, not a regression.
//
// PARITY REGRESSION fires only when winner_match=true AND the
// channel lead differs from the envelope lead (the canonical-
// synthesis contract has actually broken).
const envLead = brief?.envelope?.data?.digest?.lead ?? '';
const winnerVariant = brief?.chosenVariant ?? '';
const winnerMatch = winnerVariant === (rule.variant ?? 'full');
const channelsEqual = briefLead === envLead;
const publicLead = brief?.envelope?.data?.digest?.publicLead ?? '';
console.log(
`[digest] brief lead parity user=${rule.userId} ` +
`rule=${rule.variant ?? 'full'}:${rule.sensitivity ?? 'high'}:${rule.lang ?? 'en'} ` +
`winner_match=${winnerMatch} ` +
`synthesis_level=${synthesisLevel} ` +
`exec_len=${(briefLead ?? '').length} ` +
`brief_lead_len=${envLead.length} ` +
`channels_equal=${channelsEqual} ` +
`public_lead_len=${publicLead.length}`,
);
if (winnerMatch && !channelsEqual && briefLead && envLead) {
// Sentry alert candidate — winner_match=true means this rule
// composed the envelope, so its channel lead MUST match the
// envelope lead. Mismatch = canonical-synthesis cache drift
// or code regression. Logged loudly so Sentry's console-
// breadcrumb hook surfaces it without an explicit
// captureMessage call.
console.warn(
`[digest] PARITY REGRESSION user=${rule.userId} — winner-rule channel lead != envelope lead. ` +
`Investigate: cache drift between compose pass and send pass?`,
// synthesis cache row has drifted between compose and send
// passes — a real contract break).
if (AI_DIGEST_ENABLED && rule.aiDigestEnabled !== false) {
const envLead = brief?.envelope?.data?.digest?.lead ?? '';
const winnerVariant = brief?.chosenVariant ?? '';
const winnerMatch = winnerVariant === (rule.variant ?? 'full');
const channelsEqual = briefLead === envLead;
const publicLead = brief?.envelope?.data?.digest?.publicLead ?? '';
console.log(
`[digest] brief lead parity user=${rule.userId} ` +
`rule=${rule.variant ?? 'full'}:${rule.sensitivity ?? 'high'}:${rule.lang ?? 'en'} ` +
`winner_match=${winnerMatch} ` +
`synthesis_level=${synthesisLevel} ` +
`exec_len=${(briefLead ?? '').length} ` +
`brief_lead_len=${envLead.length} ` +
`channels_equal=${channelsEqual} ` +
`public_lead_len=${publicLead.length}`,
);
if (winnerMatch && !channelsEqual && briefLead && envLead) {
// Sentry alert candidate — winner_match=true means this rule
// composed the envelope, so its channel lead MUST match the
// envelope lead. Mismatch = canonical-synthesis cache drift
// or code regression. Logged loudly so Sentry's console-
// breadcrumb hook surfaces it without an explicit
// captureMessage call.
console.warn(
`[digest] PARITY REGRESSION user=${rule.userId} — winner-rule channel lead != envelope lead. ` +
`Investigate: cache drift between compose pass and send pass?`,
);
}
}
}
}