7 Commits

Author SHA1 Message Date
Elie Habib
93eca7bbbf fix(digest): dense-fill topicOf with -1 sentinel + surface missed indices
Greptile P2 on PR #3247: `new Array(top.length)` creates a sparse array.
If a future injected clusterFn doesn't cover every input index,
topicOf[i] would be undefined, which then silently poisons the phase-1
aggregates (topicSize[undefined] / topicMax[undefined]) and degrades
the topic sort without any observable failure.

Fill with -1 so absence is unambiguous, then validate after clusterFn
runs and throw if any index is still -1. The outer try/catch captures
the error and returns {reps: top, topicCount: top.length, error}
matching the existing contract — primary order is preserved, no crash.

No behavior change today: singleLinkCluster's union-find guarantees
every index is covered. This just guards the invariant for future
clusterFn injections.
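
A minimal sketch of the guard described above. The helper name `assignTopics` and the assumption that `clusterFn` returns arrays of input indices are illustrative only; the real groupTopicsPostDedup may be shaped differently.

```javascript
// Hypothetical helper: the name and the clusterFn return shape are assumptions.
function assignTopics(top, clusterFn) {
  try {
    // Dense fill: every slot starts at the -1 sentinel instead of a hole.
    const topicOf = new Array(top.length).fill(-1);
    const clusters = clusterFn(top); // assumed: array of arrays of indices
    clusters.forEach((members, topicId) => {
      for (const i of members) topicOf[i] = topicId;
    });
    // Validate: any index still at -1 was never covered by clusterFn.
    const missed = [];
    topicOf.forEach((t, i) => {
      if (t === -1) missed.push(i);
    });
    if (missed.length > 0) {
      throw new Error(`clusterFn missed indices: ${missed.join(',')}`);
    }
    return { reps: top, topicOf, topicCount: clusters.length, error: null };
  } catch (error) {
    // The error is returned, not rethrown; primary order is preserved.
    return { reps: top, topicCount: top.length, error };
  }
}
```

With a sparse `new Array(n)` the missed slot would read as `undefined` and pass silently; with the sentinel the gap is reported by index.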
2026-04-21 08:52:49 +04:00
Elie Habib
f5205bdb57 fix(digest): truthful typo warn + gate grouping on actual embeddings
Two P2 bugs from round-3 review:

P2a: warn text lied. After the typo→jaccard fix, the warn still said
"defaulting to embed", which is the opposite of what the code now does.
During an outage, operators reading the warn get told the wrong thing.
Updated to "falling back to jaccard (safe rollback path)" to match the
actual behavior.

P2b: shouldGroupTopics gate used stale signal. Gating on cfg.mode ===
'embed' worked for configured-jaccard (kill switch) but NOT for runtime
Jaccard fallback. When the embed path throws inside deduplicateStories,
it falls back to Jaccard but cfg.mode is still 'embed'. The gate passed,
groupTopicsPostDedup ran with an empty embeddingByHash, and the caller
emitted a misleading "topic grouping failed: missing embedding" warn
ON TOP of the legitimate "falling back to Jaccard" warn.

Ground-truth fix: gate on `embeddingByHash.size > 0`. The Map is the
authoritative signal for "primary embed path produced vectors" —
populated only on success, empty in both fallback paths (configured
and runtime). One gate, both paths clean.
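
The gate can be sketched as below. The function signature is an assumption made for illustration; only the `embeddingByHash.size > 0` condition comes from the text above.

```javascript
// Hypothetical signature; only the Map-size condition is from the commit text.
function shouldGroupTopics(topicGroupingEnabled, embeddingByHash) {
  // The Map is populated only when the embed path succeeded, so an empty
  // Map covers both the configured-jaccard and the runtime-fallback cases.
  return topicGroupingEnabled && embeddingByHash.size > 0;
}
```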

Added regression test: "runtime Jaccard fallback returns empty
embeddingByHash + empty logSummary" — proves the ground-truth invariant
the caller relies on, so a future change can't re-introduce the leak.

Tests: 5915 pass (+1 new). typecheck, typecheck:api, biome clean.
2026-04-21 08:35:10 +04:00
Elie Habib
d234452e5c fix(digest): two-phase topic sort + typo-safe mode fallback
Two P1/P2 bugs found in post-PR-#3247 review:

P1: groupTopicsPostDedup did NOT guarantee contiguous topic blocks.
The global sort key (topicSize, topicMax, repScore, titleHashHex) fell
through to per-rep repScore when two topics tied on size and max,
interleaving members. Two same-size-same-max topics {A90,A80} and
{B90,B70} output as [A90,B90,A80,B70] — broken contiguity, breaking
the editorial promise of the PR.

Fix: two-phase sort. Phase 1 orders TOPICS by
  (topicSize DESC, topicMax DESC, topicTieHash ASC)
where topicTieHash = min titleHash among the topic's members — a
topic-level invariant, not a rep-level one. Phase 2 orders members
within each topic by (repScore DESC, titleHashHex ASC). Concatenate
in topic order. Members of the same topic CANNOT interleave with any
other topic's members. Added regression test with the exact fixture.
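
A minimal sketch of the two-phase sort, run on the fixture from the bug description. The rep field names (`topic`, `score`, `hash`) are assumptions for illustration, not the real Rep shape.

```javascript
// Sketch only: rep fields (topic, score, hash) are assumed names.
function twoPhaseOrder(reps) {
  // Group reps by topic id.
  const byTopic = new Map();
  for (const rep of reps) {
    if (!byTopic.has(rep.topic)) byTopic.set(rep.topic, []);
    byTopic.get(rep.topic).push(rep);
  }
  const cmpStr = (a, b) => (a < b ? -1 : a > b ? 1 : 0);
  // Phase 1: compute topic-level keys and order the topics.
  const topics = [...byTopic.values()].map((members) => ({
    members,
    size: members.length,
    max: Math.max(...members.map((m) => m.score)),
    tieHash: members.map((m) => m.hash).reduce((a, b) => (a < b ? a : b)),
  }));
  topics.sort(
    (a, b) => b.size - a.size || b.max - a.max || cmpStr(a.tieHash, b.tieHash),
  );
  // Phase 2: order members inside each topic, then concatenate.
  return topics.flatMap((t) =>
    [...t.members].sort((a, b) => b.score - a.score || cmpStr(a.hash, b.hash)),
  );
}

const fixture = [
  { id: 'A90', topic: 'A', score: 90, hash: 'a1' },
  { id: 'B90', topic: 'B', score: 90, hash: 'b1' },
  { id: 'A80', topic: 'A', score: 80, hash: 'a2' },
  { id: 'B70', topic: 'B', score: 70, hash: 'b2' },
];
const orderedIds = twoPhaseOrder(fixture).map((r) => r.id);
// orderedIds is ['A90', 'A80', 'B90', 'B70']: both topics stay contiguous.
```

Because the phase-1 key uses only topic-level values, a tie between topics can never fall through to a per-rep comparison.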

P2: DIGEST_DEDUP_MODE typos failed OPEN to embed, not Jaccard. File
header documented "non-{embed,jaccard} → jaccard with warn" but
readOrchestratorConfig mapped typos to mode='embed'. Operator scenario:
during an embed outage, an operator sets DIGEST_DEDUP_MODE=jacard (typo)
and the kill switch silently stays off.

Fix: typo / unrecognised value resolves to mode='jaccard' (safer),
matching the documented contract. invalidModeRaw warn still fires so
operators see the typo. Added 3 parsing tests (typo, garbage, empty).
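
The fail-closed parsing can be sketched as follows. `parseDedupMode` is a hypothetical stand-in for the relevant part of readOrchestratorConfig. Treating only an undefined variable as "unset", and doing no trimming or case-folding, are assumptions of this sketch rather than documented behaviour.

```javascript
// Hypothetical parser; the real readOrchestratorConfig returns more fields.
function parseDedupMode(env) {
  const raw = env.DIGEST_DEDUP_MODE;
  // Assumption: only an undefined variable counts as unset (default embed).
  if (raw === undefined) return { mode: 'embed', invalidModeRaw: null };
  if (raw === 'embed' || raw === 'jaccard') {
    return { mode: raw, invalidModeRaw: null };
  }
  // Typo or unrecognised value: resolve to the safer kill-switch path and
  // keep the raw value so the caller can emit the warn.
  return { mode: 'jaccard', invalidModeRaw: raw };
}
```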

Tests: 5914 pass (was 5910 + 4 new). typecheck, typecheck:api, biome clean.
2026-04-21 07:29:20 +04:00
Elie Habib
38541c1075 chore(digest): address /ce:review round 1 findings
Fixes 5 findings raised by the multi-agent review of PR #3247:

- #234 P2: Drop dead `deps.log` shim from `deduplicateStories` — caller
  now owns the log line so the param rotted (no test used it post-#3247).
  Removed from JSDoc + signature logic.

- #236 P3: Add defensive warn when `winningIdx === undefined` during
  sidecar `embeddingByHash` population. Shouldn't fire with the current
  `materializeCluster` contract, but catches a future refactor where a
  synthesised rep would silently skip topic grouping.

- #237 P3: Skip `groupTopicsPostDedup` when `cfg.mode === 'jaccard'` —
  the kill-switch path returns an empty `embeddingByHash`, and running
  the secondary pass on it would log a noisy "missing embedding" warn
  every tick. Gate the call site; passthrough primary order.

- #240a P3: Remove dead `top ?? []` fallback after `!Array.isArray(top)`
  already handled the falsy case. Replaced with an explicit Array check
  for the rare "falsy but also not-array" input (defence-in-depth).

- #240b P3: Delete two redundant test blocks — the `titleHashHex tiebreak`
  2-rep fixture (permutation-invariance at 15-rep scale already covers
  this invariant) and the `caller log-line format (regex splice)` describe
  block (the regex lives in seed-digest-notifications.mjs, not brief-dedup;
  the full-flow envelope-cleanliness test exercises the caller end-to-end).

Deferred to follow-up: #235 (structured logParts vs regex splice), #238
(plumb cfg to avoid double env-read), #239 (repo-wide env .trim pattern).

Tests: 5910 pass (was 5913; -3 redundant tests removed). typecheck,
typecheck:api, biome all clean.
2026-04-21 07:20:12 +04:00
Elie Habib
8fe6284c4f feat(digest): topic-grouped brief ordering (size-first)
Brief composer currently surfaces top-N stories in raw currentScore DESC
order. On topic-dominant news days (e.g. 2026-04-20 20:00 brief) related
stories scatter — 4 Hormuz angles at positions 1/3/8/11 with unrelated
stories wedged between them.

Adds a secondary clustering pass over the already-sliced top-30 reps at a
looser cosine threshold (default 0.45), then re-orders them by a total key:
(topicSize DESC, topicMax DESC, repScore DESC, titleHashHex ASC). The
dominant thread leads; within-thread order is score DESC; ties are
deterministic. Hidden behind DIGEST_DEDUP_TOPIC_GROUPING (default 1 —
kill switch = '0', no deploy).

Design notes:
- Post-slice placement bounds work to N ≤ 30 (~0.4 ms) and avoids
  reshuffling reps that never surface.
- Sidecar Map<hash, number[]> returned from deduplicateStories — no
  hidden __embedding fields on the user-facing Rep (would otherwise
  risk leaking into the brief envelope).
- groupTopicsPostDedup is pure: no I/O, no logging, injected
  clusterFn for testability. Errors are RETURNED (not thrown) so a
  helper bug cannot cascade into the outer Jaccard fallback boundary.
- Caller owns logging: deduplicateStories returns logSummary; caller
  splices ` topics=N ` after `clusters=M ` via a simple regex, emits
  one log line per tick.
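
The caller-side splice from the last design note might look like the sketch below. The helper name and the sample log line are assumptions; the actual regex lives in seed-digest-notifications.mjs and may differ.

```javascript
// Hypothetical helper; the exact regex in the caller is not shown here.
function spliceTopics(logSummary, topicCount) {
  // Insert `topics=N ` right after the first `clusters=M ` token.
  return logSummary.replace(/(clusters=\d+ )/, `$1topics=${topicCount} `);
}

// Assumed log shape, for illustration only.
const sampleLine = 'mode=embed clustering=single stories=84 clusters=23 threshold=0.6';
const splicedLine = spliceTopics(sampleLine, 5);
// splicedLine: 'mode=embed clustering=single stories=84 clusters=23 topics=5 threshold=0.6'
```

If the summary has no `clusters=M ` token (for example an empty logSummary), the replace is a no-op and the line is emitted unchanged.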

Env:
- DIGEST_DEDUP_TOPIC_GROUPING   = '0' disables (default on)
- DIGEST_DEDUP_TOPIC_THRESHOLD  = float in (0,1], default 0.45
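
A hedged sketch of how these two variables could be parsed. Both helper names are hypothetical, and falling back to the default on an out-of-range or non-numeric threshold is an assumption of this sketch.

```javascript
// Hypothetical helpers; fallback-on-invalid is an assumption.
function parseTopicThreshold(env, fallback = 0.45) {
  const raw = env.DIGEST_DEDUP_TOPIC_THRESHOLD;
  if (raw === undefined || raw === '') return fallback;
  const value = Number(raw);
  // Accept only finite floats in the half-open interval (0, 1].
  return Number.isFinite(value) && value > 0 && value <= 1 ? value : fallback;
}

function isTopicGroupingEnabled(env) {
  // Only the literal '0' disables; unset means on.
  return env.DIGEST_DEDUP_TOPIC_GROUPING !== '0';
}
```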

Tests: 55 in tests/brief-dedup-embedding.test.mjs (was 33, +22 new):
size-first ordering, topicMax tiebreak, within-topic score, titleHash
determinism, kill switch, permutation invariance, empty/singleton,
injected clusterer throws, missing embedding, materialized-rep keying,
envelope cleanliness (JSON.stringify has no _embedding / __ /
embeddingByHash), log-line splice regex, pre-slice input size, and
6 env-parsing cases.

Full verification: npm run test:data (5913 pass), typecheck,
typecheck:api, biome check on changed files — all clean. Pre-existing
main() complexity (74) unchanged.
2026-04-21 06:50:52 +04:00
Elie Habib
d1ebc84c6c feat(digest-dedup): single-link clustering (F1 0.73 vs 0.53 complete-link) (#3234)
Problem
-------
The post-threshold-tuning brief at
/api/brief/user_3BovQ1tYlaz2YIGYAdDPXGFBgKy/2026-04-20-1532 still
showed 4 copies of "US seizes Iranian ship", 3 copies of the Hormuz
closure, and 2 copies of the oil-price story — despite running the
calibrated 0.55 threshold.

Root cause: complete-link is too strict for wire-headline clustering.
Pairwise cosines in the 4-way ship-seizure cluster:

    1 <-> 5: 0.632    5 <-> 8: 0.692
    1 <-> 8: 0.500    5 <-> 10: 0.656
    1 <-> 10: 0.554   8 <-> 10: 0.510

Complete-link requires EVERY pair to clear threshold. Pair 1<->8 at
0.500 fails so the whole 4-way cluster can't form, and all 4 stories
bubble up as separate reps, eating 4 slots of the 12-story brief.
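
The difference can be reproduced on the six cosines quoted above. This is an illustrative sketch only: it works on a hard-coded similarity table, omits the entity veto and path compression, and is not the singleLinkCluster implementation.

```javascript
// Pairwise cosines from the ship-seizure cluster quoted above.
const cos = {
  '1-5': 0.632, '1-8': 0.5, '1-10': 0.554,
  '5-8': 0.692, '5-10': 0.656, '8-10': 0.51,
};
const storyIds = [1, 5, 8, 10];

// Complete-link requirement: every pair must clear the threshold.
function allPairsClear(threshold) {
  return Object.values(cos).every((c) => c >= threshold);
}

// Single-link via union-find: union any pair that clears the threshold,
// then count the distinct roots.
function singleLinkGroupCount(threshold) {
  const parent = new Map(storyIds.map((id) => [id, id]));
  const find = (x) => {
    while (parent.get(x) !== x) x = parent.get(x);
    return x;
  };
  for (const [key, c] of Object.entries(cos)) {
    if (c < threshold) continue;
    const [a, b] = key.split('-').map(Number);
    parent.set(find(a), find(b));
  }
  return new Set(storyIds.map(find)).size;
}

const completeOk = allPairsClear(0.55);          // false: 1<->8 is 0.500
const singleGroups = singleLinkGroupCount(0.6);  // 1: all four chain through story 5
```

At 0.60 only the pairs 1-5, 5-8 and 5-10 clear the threshold, and story 5 bridges the other three into one cluster.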

Measured on the 12 real titles from that brief:

    Algorithm                 | Clusters | F1    | P    | R
    --------------------------|----------|-------|------|------
    complete-link @ 0.55 (was)|        7 | 0.526 | 0.56 | 0.50
    complete-link @ 0.50      |        6 | 0.435 | 0.38 | 0.50
    single-link   @ 0.55      |        4 | 0.435 | 0.28 | 1.00  over-merge
    single-link   @ 0.60      |        6 | 0.727 | 0.67 | 0.80  winner

Change
------
scripts/lib/brief-dedup-embed.mjs:
  New singleLinkCluster(items, {cosineThreshold, vetoFn}) using
  union-find. Chain merges through strong intermediates when a
  direct pair is weak; respects the entity veto (blocked pairs
  don't union). O(N^2 alpha(N)); permutation-invariant by
  construction.

scripts/lib/brief-dedup.mjs:
  New DIGEST_DEDUP_CLUSTERING env var (default 'single', set
  'complete' to revert). readOrchestratorConfig returns 'clustering'
  field. Dispatch at call site picks the right function. Structured
  log line now includes clustering=<algo>.

tests/brief-dedup-embedding.test.mjs:
  +8 regressions:
    - singleLinkCluster chains the 4-way through a bridge
    - veto blocks unions even when cosine passes
    - permutation-invariance property test (5 shuffles)
    - empty-input
    - DIGEST_DEDUP_CLUSTERING default is 'single'
    - DIGEST_DEDUP_CLUSTERING=complete kill switch works
    - unrecognised values fall back to 'single'
    - log line includes clustering=<algo>

Bridge-pollution risk note
--------------------------
The original plan rejected single-link to avoid the Jaccard-era
"bridge pollution" (A~B=0.6, B~C=0.6, A~C=0.3 all chain through a
mixed-topic B). With text-embedding-3-small at cosine >= 0.60, a
bridge must be semantically real — the probe showed a 37% F1 bump
with no new FPs on the production case. Setting
DIGEST_DEDUP_CLUSTERING=complete on Railway is the instant rollback
if a bad day ever surfaces chaining.

Operator activation
-------------------
After merge, on Railway seed-digest-notifications service:

    DIGEST_DEDUP_COSINE_THRESHOLD=0.60

No other changes needed — clustering=single is the default.

Verification
------------
- npm run test:data           5825/5825 pass
- tests/brief-dedup-embedding  53/53   pass (45 existing + 8 new)
- typecheck + typecheck:api   clean
- biome check on changed files clean

Post-Deploy Monitoring & Validation
-----------------------------------
- Grep '[digest] dedup mode=embed clustering=single' in Railway logs
  — confirms the new algo is live
- Expect clusters= to drop further on bulk ticks (stories=700+):
  current ~23 on 84-story ticks -> expected ~15-18
- Manually open next brief post-deploy, visually verify ship-seizure
  / Hormuz / oil stories no longer duplicate
- Rollback: DIGEST_DEDUP_CLUSTERING=complete on Railway (instant,
  no deploy), next cron tick reverts to old behaviour
- Validation window: 24h
- Owner: koala73

Related
-------
- #3200 embedding-based dedup (introduced complete-link)
- #3224 DIGEST_SCORE_MIN floor (the low-importance half of the fix)
2026-04-20 16:21:20 +04:00
Elie Habib
305dc5ef36 feat(digest-dedup): Phase A — embedding-based dedup scaffolding (no-op) (#3200)
* feat(digest-dedup): Phase A — embedding-based dedup scaffolding (no-op)

Replaces the inline Jaccard story-dedup in seed-digest-notifications
with an orchestrator that can run Jaccard, shadow, or full embedding
modes. Ships with DIGEST_DEDUP_MODE=jaccard as the default so
production behaviour is unchanged until Phase C shadow + Phase D flip.

New modules (scripts/lib/):
- brief-dedup-consts.mjs       tunables + cache prefix + __constants bag
- brief-dedup-jaccard.mjs      verbatim 0.55-threshold extract (fallback)
- entity-gazetteer.mjs         cities/regions gazetteer + common-caps
- brief-embedding.mjs          OpenRouter /embeddings client with Upstash
                               cache, all-or-nothing timeout, cosineSimilarity
- brief-dedup-embed.mjs        complete-link clustering + entity veto (pure)
- brief-dedup.mjs              orchestrator, env read at call entry,
                               shadow archive, structured log line

Operator tools (scripts/tools/):
- calibrate-dedup-threshold.mjs  offline calibration runner + histogram
- golden-pair-validator.mjs      live-embedder drift detector (nightly CI)
- shadow-sample.mjs              Sample A/B CSV emitter over SCAN archive

Tests:
- brief-dedup-jaccard.test.mjs    migrated from regex-harness to direct
                                   import plus orchestrator parity tests (22)
- brief-dedup-embedding.test.mjs  9 plan scenarios incl. 10-permutation
                                   property test, complete-link non-chain (21)
- brief-dedup-golden.test.mjs     20-pair mocked canary (21)

Workflows:
- .github/workflows/dedup-golden-pairs.yml  nightly live-embedder canary
                                             (07:17 UTC), opens issue on drift

Deviation from plan: the shouldVeto("Iran closes Hormuz", "Tehran
shuts Hormuz") case can't return true under a single coherent
classification (country-in-A vs capital-in-B sit on different sides
of the actor/location boundary). Gazetteer follows the plan's
"countries are actors" intent; the test is updated to assert false
with a comment pointing at the irreducible capital-country
coreference limitation.

Verification:
- npm run test:data          5825/5825 pass
- tests/edge-functions        171/171 pass
- typecheck + typecheck:api  clean
- biome check on new files    clean
- lint:md                     0 errors

Phase B (calibration), Phase C (shadow), and Phase D (flip) are
subsequent PRs.

* refactor(digest-dedup): address review findings 193-199

Fresh-eyes review found 3 P1s, 3 P2s, and a P3 bundle across
kieran-typescript, security-sentinel, performance-oracle, architecture-
strategist, and code-simplicity reviewers. Fixes below; all 64 dedup
tests + 5825 data tests + 171 edge-function tests still green.

P1 #193 - dedup regex + redis pipeline duplication
- Extract defaultRedisPipeline into scripts/lib/_upstash-pipeline.mjs;
  both orchestrator and embedding client import from there.
- normalizeForEmbedding now delegates to stripSourceSuffix from the
  Jaccard module so the outlet allow-list is single-sourced.

P1 #194 - embedding timeout floor + negative-budget path
- callEmbeddingsApi throws EmbeddingTimeoutError when timeoutMs<=0
  instead of opening a doomed 250ms fetch.
- Removed Math.max(250, ...) floor that let wall-clock cap overshoot.

P1 #195 - dead env getters
- Deleted getMode / isRemoteEmbedEnabled / isEntityVetoEnabled /
  getCosineThreshold / getWallClockMs from brief-dedup-consts.mjs
  (zero callers; orchestrator reimplements inline).

P2 #196 - orchestrator cleanup bundle
- Removed re-exports at bottom of brief-dedup.mjs.
- Extracted materializeCluster into brief-dedup-jaccard.mjs; both
  the fallback and orchestrator use the shared helper.
- Deleted clusterWithEntityVeto wrapper; orchestrator inlines the
  vetoFn wiring at the single call site.
- Shadow mode now runs Jaccard exactly once per tick (was twice).
- Fallback warn line carries reason=ErrorName so operators can
  filter timeout vs provider vs shape errors.
- Invalid DIGEST_DEDUP_MODE values emit a warn once per run (vs
  silently falling to jaccard).

P2 #197 - workflow + shadow-sample hardening
- dedup-golden-pairs.yml body composition no longer relies on a
  heredoc that would command-substitute validator stdout. Switched
  to printf with sanitised LOG_TAIL (printable ASCII only) and
  --body-file so crafted fixture text cannot escape into the runner.
- shadow-sample.mjs Upstash helper enforces a hardcoded command
  allowlist (SCAN | GET | EXISTS).

P2 #198 - test + observability polish
- Scenarios 2 and 3 deep-equal returned clusters against the Jaccard
  expected shape, not just length. Also assert the reason= field.

P3 #199 - nits
- Removed __constants test-bag; jaccard tests use named imports.
- Renamed deps.apiKey to deps._apiKey in embedding client.
- Added @pre JSDoc on diffClustersByHash about unique-hash contract.
- Deferred: mocked golden-pair test removal, gazetteer JSON migration,
  scripts/tools AGENTS.md doc note.

Todos 193-199 moved from pending to complete.

Verification:
- npm run test:data            5825/5825 pass
- tests/edge-functions          171/171 pass
- typecheck + typecheck:api    clean
- biome check on changed files clean

* fix(digest-dedup): address Greptile P2 findings on PR #3200

1. brief-embedding.mjs: wrap fetch lookup as
   `(...args) => globalThis.fetch(...args)` instead of aliasing bare
   `fetch`. Aliasing captures the binding at module-load time, so
   later instrumentation / Edge-runtime shims don't see the wrapper —
   same class of bug as the banned `fetch.bind(globalThis)` pattern
   flagged in AGENTS.md.

2. dedup-golden-pairs.yml: `gh issue create --label "..." || true`
   silently swallowed the failure when any of dedup/canary/p1 labels
   didn't pre-exist, breaking the drift alert channel while leaving
   the job red in the Actions UI. Switched to repeated `--label`
   flags + `--create-label` so any missing label is auto-created on
   first drift, and dropped the `|| true` so a legitimate failure
   (network / auth) surfaces instead of hiding.
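
The first fix can be illustrated with a stubbed `globalThis.fetch`. The variable names are made up for this sketch, and no network call is made.

```javascript
const originalFetch = globalThis.fetch;

// Aliasing captures whatever globalThis.fetch is at this moment.
const aliasedFetch = globalThis.fetch;
// Wrapping looks up globalThis.fetch again on every call.
const wrappedFetch = (...args) => globalThis.fetch(...args);

// Simulate instrumentation installed after module load.
globalThis.fetch = () => 'instrumented';
const viaWrapper = wrappedFetch('https://example.invalid');
const aliasSeesShim = aliasedFetch === globalThis.fetch;

globalThis.fetch = originalFetch; // restore the original binding
// viaWrapper is 'instrumented'; aliasSeesShim is false.
```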

Both fixes are P2-style per Greptile (confidence 5/5, no P0/P1);
applied pre-merge so the nightly canary is usable from day one.

* fix(digest-dedup): two P1s found on PR #3200

P1 — canary classifier must match production
Nightly golden-pair validator was checking a hardcoded threshold
(default 0.60) and always applied the entity veto, while the actual
dedup path at runtime reads DIGEST_DEDUP_COSINE_THRESHOLD and
DIGEST_DEDUP_ENTITY_VETO_ENABLED from env at every call. A Phase
C/D env flip could make the canary green while prod was wrong, or
red while prod was healthy, defeating the whole point of a drift
detector.

Fix:
- golden-pair-validator.mjs now calls readOrchestratorConfig(process.env)
  — the same helper the orchestrator uses — so any classifier knob
  added later is picked up automatically. The threshold and veto-
  enabled flags are sourced from env by default; a --threshold CLI
  flag still overrides for manual calibration sweeps.
- dedup-golden-pairs.yml sources DIGEST_DEDUP_COSINE_THRESHOLD and
  DIGEST_DEDUP_ENTITY_VETO_ENABLED from GitHub repo variables (vars.*),
  which operators must keep in lockstep with Railway. The
  workflow_dispatch threshold input now defaults to empty; the
  scheduled canary always uses the production-parity config.
- Validator log line prints the effective config + source so nightly
  output makes the classifier visible.

P1 — shadow archive writes were fail-open
`defaultRedisPipeline()` returns null on timeout / auth / HTTP
failure. `writeShadowArchive()` only had a try/catch, so the null
result was silently treated as success. A Phase C rollout could
log clean "mode=shadow … disagreements=X" lines every tick while
the Upstash archive received zero writes — and Sample B labelling
would then find no batches, silently killing calibration.

Fix:
- writeShadowArchive now inspects the pipeline return. null result,
  non-array response, per-command {error}, or a cell without
  {result: "OK"} all return {ok: false, reason}.
- Orchestrator emits a warn line with the failure reason, and the
  structured log line carries archive_write=ok|failed so operators
  can grep for failed ticks.
- Regression test in brief-dedup-embedding.test.mjs simulates the
  null-pipeline contract and asserts both the warn and the structured
  field land.
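
The return-value inspection can be sketched as below. `inspectPipelineResult` is a hypothetical helper and the reason strings are invented for illustration; only the four failure conditions come from the description above.

```javascript
// Hypothetical checker; reason strings are assumptions.
function inspectPipelineResult(result) {
  // defaultRedisPipeline() returns null on timeout / auth / HTTP failure.
  if (result === null || result === undefined) {
    return { ok: false, reason: 'null-result' };
  }
  if (!Array.isArray(result)) {
    return { ok: false, reason: 'non-array-response' };
  }
  for (const cell of result) {
    if (cell && cell.error) return { ok: false, reason: 'command-error' };
    if (!cell || cell.result !== 'OK') {
      return { ok: false, reason: 'unexpected-cell' };
    }
  }
  return { ok: true };
}
```

A try/catch alone never sees these cases, because a null return is not an exception.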

Verification:
- test:data           5825/5825 pass
- dedup suites         65/65   pass (new: archive-fail regression)
- typecheck + api     clean
- biome check         clean on changed files

* fix(digest-dedup): two more P1s found on PR #3200

P1 — canary must also honour DIGEST_DEDUP_MODE + REMOTE_EMBED_ENABLED
The prior round fixed the threshold/veto knobs but left the canary
running embeddings regardless of whether production could actually
reach the embed path. If Railway has DIGEST_DEDUP_MODE=jaccard or
DIGEST_DEDUP_REMOTE_EMBED_ENABLED=0, production never calls the
classifier, so a drift signal is meaningless — or worse, a live
OpenRouter issue flags the canary while prod is obliviously fine.

Fix:
- golden-pair-validator.mjs reads mode + remoteEmbedEnabled from the
  same readOrchestratorConfig() helper the orchestrator uses. When
  either says "embed path inactive in prod", the validator logs an
  explicit skip line and exits 0. The nightly workflow then shows
  green, which is the correct signal ("nothing to drift against").
- A --force CLI flag remains for manual dispatch during staged
  rollouts.
- dedup-golden-pairs.yml sources DIGEST_DEDUP_MODE and
  DIGEST_DEDUP_REMOTE_EMBED_ENABLED from GitHub repo variables
  alongside the threshold and veto-enabled knobs, so all four
  classifier gates stay in lockstep with Railway.
- Validator log line now prints mode + remoteEmbedEnabled so the
  canary output surfaces which classifier it validated.

P1 — shadow-sample Sample A was biased by SCAN order
enumerate-and-dedup added every seen pair to a dedup key BEFORE
filtering by agreement. If the same pair appeared in an agreeing
batch first and a disagreeing batch later, the disagreeing
occurrence was silently dropped. SCAN order is unspecified, so
Sample A could omit real disagreement pairs.

Fix:
- Extracted the enumeration into a pure `enumeratePairs(archives, mode)`
  export so the logic is testable. Mode filter runs BEFORE the dedup
  check: agreeing pairs are skipped entirely under
  --mode disagreements, so any later disagreeing occurrence can
  still claim the dedup slot.
- Added tests/brief-dedup-shadow-sample.test.mjs with 5 regression
  cases: agreement-then-disagreement, reversed order (symmetry),
  always-agreed omission, population enumeration, cross-batch dedup.
- isMain guard added so importing the module for tests does not
  kick off the CLI scan path.
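
The ordering fix can be sketched as follows. The archive shape (`batch.pairs` with `a`, `b`, `agree`) is an assumption for illustration and is not the real `enumeratePairs` contract. Any mode other than `'disagreements'` enumerates every pair in this sketch.

```javascript
// Sketch only: the batch/pair shape is assumed, not the real archive format.
function enumeratePairsSketch(archives, mode) {
  const seen = new Set();
  const out = [];
  for (const batch of archives) {
    for (const pair of batch.pairs) {
      // Mode filter first: agreeing pairs never claim a dedup slot.
      if (mode === 'disagreements' && pair.agree) continue;
      const key = [pair.a, pair.b].sort().join('|');
      if (seen.has(key)) continue;
      seen.add(key);
      out.push(pair);
    }
  }
  return out;
}
```

With the filter placed after the `seen` check instead, a pair that agreed in an earlier batch would block its later disagreeing occurrence, which is the SCAN-order bias described above.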

Verification:
- test:data           5825/5825 pass
- dedup suites         70/70   pass (5 new shadow-sample regressions)
- typecheck + api     clean
- biome check         clean on changed files

Operator follow-up before Phase C:
Set all FOUR dedup repo variables in GitHub alongside Railway:
  DIGEST_DEDUP_MODE, DIGEST_DEDUP_REMOTE_EMBED_ENABLED,
  DIGEST_DEDUP_COSINE_THRESHOLD, DIGEST_DEDUP_ENTITY_VETO_ENABLED

* refactor(digest-dedup): Railway is the single source of truth for dedup config

Fair user pushback: asking operators to set four DIGEST_DEDUP_*
values in BOTH Railway (where the cron runs) AND GitHub repo
variables (where the canary runs) is architectural debt. Two
copies of the same truth will always drift.

Solution: the digest cron publishes its resolved config to Upstash
on every tick under brief:dedup:config:v1 (2h TTL). The nightly
golden-pair canary reads that key instead of env vars. Railway
stays the sole source of truth; no parallel repo variables to
maintain. A missing/expired key signals "cron hasn't run" and
the canary skips with exit 0 — better than validating against
hardcoded defaults that might diverge from prod.

Changes:
- brief-dedup-consts.mjs: new ACTIVE_CONFIG_KEY + TTL constants.
- brief-dedup.mjs: new publishActiveConfig() fires at the start of
  every deduplicateStories() call (before the mode short-circuit,
  so jaccard ticks also publish a "mode=jaccard" signal the canary
  can read). Fire-and-forget; archive-write error semantics still
  apply if the operator wants stricter tracking.
- golden-pair-validator.mjs: removed readOrchestratorConfig(env)
  path. Now calls fetchActiveConfigFromUpstash() and either
  validates against that config, skips when the embed path is
  inactive, or skips when the key is missing (with --force
  override for manual dispatch).
- dedup-golden-pairs.yml: dropped the four DIGEST_DEDUP_* env lines
  and the corresponding repo-variable dependency. Only the three
  Upstash + OpenRouter secrets remain.
- tests: two new regressions assert config is published on every
  tick (shadow AND jaccard modes) with the right shape + TTL.

Operator onboarding now takes one action: set the four
DIGEST_DEDUP_* variables on the Railway seed-digest-notifications
service. Nothing to set in GitHub beyond the existing
OPENROUTER_API_KEY / UPSTASH_* secrets.

Verification:
- test:data           5825/5825 pass
- dedup suites         72/72   pass (2 new config-publish regressions)
- typecheck + api     clean
- biome check         clean on changed files

* refactor(digest-dedup): ship embed directly, drop phases/canary/shadow

User feedback: "i dont need multiple phases and shit, we go directly
to embed". Fair. Ripping out the overengineering I accumulated:

DELETED
- .github/workflows/dedup-golden-pairs.yml (nightly canary)
- scripts/tools/golden-pair-validator.mjs
- scripts/tools/shadow-sample.mjs
- scripts/tools/calibrate-dedup-threshold.mjs
- tests/fixtures/brief-dedup-golden-pairs.json
- tests/brief-dedup-golden.test.mjs
- tests/brief-dedup-shadow-sample.test.mjs

SIMPLIFIED
- brief-dedup.mjs: removed shadow mode, publishActiveConfig,
  writeShadowArchive, diffClustersByHash, jaccardRepsToClusterHashes,
  and the DIGEST_DEDUP_REMOTE_EMBED_ENABLED knob. MODE is now
  binary: `embed` (default) or `jaccard` (instant kill switch).
- brief-dedup-consts.mjs: dropped SHADOW_ARCHIVE_*, ACTIVE_CONFIG_*.
- Default flipped: DIGEST_DEDUP_MODE unset = embed (prod path).
  Railway deploy with OPENROUTER_API_KEY set = embeddings live on
  next cron tick. Set MODE=jaccard on Railway to revert instantly.

Orchestrator still falls back to Jaccard on any embed-path failure
(timeout, provider outage, missing API key, bad response). Fallback
warn carries reason=<ErrorName>. The cron never fails because
embeddings flaked. All 64 dedup tests + 5825 data tests still green.

Net diff: -1,407 lines.

Operator single action: set OPENROUTER_API_KEY on Railway's
seed-digest-notifications service (already present) and ship. No
GH Actions, no shadow archives, no labelling sprints. If the 0.60
threshold turns out wrong, tune DIGEST_DEDUP_COSINE_THRESHOLD on
Railway — takes effect on next tick, no redeploy.

* fix(digest-dedup): multi-word location phrases in the entity veto

Extractor was whitespace-tokenising and only single-token matching
against LOCATION_GAZETTEER, silently making every multi-word entry
unreachable:

  extractEntities("Houthis strike ship in Red Sea")
    → { locations: [], actors: ['houthis','red','sea'] }   ✗
  shouldVeto("Houthis strike ship in Red Sea",
             "US escorts convoy in Red Sea")  → false       ✗

With MODE=embed as the default, that turned off the main
anti-overmerge safety rail for bodies of water, regions, and
compound city names — exactly the P07-Hormuz / Houthis-Red-Sea
headlines the veto was designed to cover.

Fix: greedy longest-phrase scan with a sliding window. At each
token position try the longest multi-word phrase first (down to
2), require first AND last tokens to be capitalised (so lowercase
prose like "the middle east" doesn't falsely match while headline
"Middle East" does), lowercase connectors in between are fine
("Strait of Hormuz" → phrase "strait of hormuz" ✓). Falls back to
single-token lookup when no multi-word phrase fits.

Now:
  extractEntities("Houthis strike ship in Red Sea")
    → { locations: ['red sea'], actors: ['houthis'] }       ✓
  shouldVeto(Red-Sea-Houthis, Red-Sea-US) → true             ✓

Complexity still O(N · MAX_PHRASE_LEN) — MAX_PHRASE_LEN is 4
(longest gazetteer entry: "ho chi minh city"), so this is
effectively O(N).
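
A minimal sketch of the greedy longest-phrase scan, using a tiny stand-in gazetteer. It covers locations only, skips punctuation handling, and does not check capitalisation on the single-token fallback; those simplifications are assumptions of the sketch, not properties of the real extractor.

```javascript
// Tiny stand-in; the real LOCATION_GAZETTEER is much larger.
const SAMPLE_GAZETTEER = new Set([
  'red sea', 'strait of hormuz', 'middle east', 'ho chi minh city', 'tehran',
]);
const MAX_PHRASE_LEN = 4;
const isCapitalised = (tok) => /^[A-Z]/.test(tok);

function extractLocations(title) {
  const tokens = title.split(/\s+/).filter(Boolean);
  const locations = [];
  let i = 0;
  while (i < tokens.length) {
    let matched = 0;
    // Try the longest window first, down to 2 tokens.
    const maxLen = Math.min(MAX_PHRASE_LEN, tokens.length - i);
    for (let len = maxLen; len >= 2; len--) {
      const span = tokens.slice(i, i + len);
      // First AND last token must be capitalised; connectors may be lowercase.
      if (!isCapitalised(span[0]) || !isCapitalised(span[len - 1])) continue;
      const phrase = span.join(' ').toLowerCase();
      if (SAMPLE_GAZETTEER.has(phrase)) {
        locations.push(phrase);
        matched = len;
        break;
      }
    }
    if (matched === 0) {
      // Fall back to single-token lookup.
      const single = tokens[i].toLowerCase();
      if (SAMPLE_GAZETTEER.has(single)) locations.push(single);
      matched = 1;
    }
    i += matched; // skip past a matched phrase so its tokens are not reused
  }
  return locations;
}
```

Each position tries at most MAX_PHRASE_LEN - 1 windows plus one single-token lookup, which is where the O(N · MAX_PHRASE_LEN) bound comes from.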

Added 5 regression tests covering Red Sea, South China Sea,
Strait of Hormuz (lowercase-connector case), Abu Dhabi, and
New York, plus the Houthis-vs-US veto reproducer from the P1.
All 5825 data tests + 45 dedup tests green; lint + typecheck clean.
2026-04-19 13:49:48 +04:00