mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-25 17:14:57 +02:00
93eca7bbbf0f8dab501ba6e76fa9db21c2d4dff3
7 Commits
93eca7bbbf
fix(digest): dense-fill topicOf with -1 sentinel + surface missed indices
Greptile P2 on PR #3247: `new Array(top.length)` creates a sparse array.
If a future injected clusterFn doesn't cover every input index,
topicOf[i] would be undefined, which then silently poisons the phase-1
aggregates (topicSize[undefined] / topicMax[undefined]) and degrades
the topic sort without any observable failure.

Fill with -1 so absence is unambiguous, then validate after clusterFn
runs and throw if any index is still -1. The outer try/catch captures
the error and returns {reps: top, topicCount: top.length, error},
matching the existing contract — primary order is preserved, no crash.

No behavior change today: singleLinkCluster's union-find guarantees
every index is covered. This just guards the invariant for future
clusterFn injections.
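A minimal sketch of the sentinel pattern described above (the surrounding function name and shapes are hypothetical; only `topicOf`, `clusterFn`, and the -1 fill follow the commit):

```javascript
// Dense -1 fill: any index clusterFn misses stays -1 and is surfaced
// as an error instead of silently poisoning downstream aggregates.
function assignTopics(top, clusterFn) {
  const topicOf = new Array(top.length).fill(-1); // dense, never sparse
  const clusters = clusterFn(top); // groups of input indices
  clusters.forEach((members, topicId) => {
    for (const i of members) topicOf[i] = topicId;
  });
  const missed = topicOf
    .map((t, i) => (t === -1 ? i : null))
    .filter((i) => i !== null);
  if (missed.length > 0) {
    throw new Error(`clusterFn missed indices: ${missed.join(',')}`);
  }
  return topicOf;
}
```

An outer try/catch (as in the commit) would turn that throw into a returned error while keeping primary order.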
f5205bdb57
fix(digest): truthful typo warn + gate grouping on actual embeddings
Two P2 bugs from round-3 review:

P2a: warn text lied. After the typo→jaccard fix, the warn still said
"defaulting to embed", which is the opposite of what the code now does.
During an outage, operators reading the warn get told the wrong thing.
Updated to "falling back to jaccard (safe rollback path)" to match the
actual behavior.

P2b: shouldGroupTopics gate used a stale signal. Gating on
cfg.mode === 'embed' worked for configured-jaccard (kill switch) but
NOT for runtime Jaccard fallback. When the embed path throws inside
deduplicateStories, it falls back to Jaccard but cfg.mode is still
'embed'. The gate passed, groupTopicsPostDedup ran with an empty
embeddingByHash, and the caller emitted a misleading "topic grouping
failed: missing embedding" warn ON TOP of the legitimate "falling back
to Jaccard" warn.

Ground-truth fix: gate on `embeddingByHash.size > 0`. The Map is the
authoritative signal for "primary embed path produced vectors" —
populated only on success, empty in both fallback paths (configured
and runtime). One gate, both paths clean.

Added regression test: "runtime Jaccard fallback returns empty
embeddingByHash + empty logSummary" — proves the ground-truth invariant
the caller relies on, so a future change can't re-introduce the leak.

Tests: 5915 pass (+1 new). typecheck, typecheck:api, biome clean.
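The stale gate versus the ground-truth gate can be contrasted in a small sketch (`cfg` and `embeddingByHash` follow the names in the message; the functions themselves are illustrative):

```javascript
// Stale gate: still true after a runtime Jaccard fallback, because the
// configured mode never changes when the embed path throws.
function shouldGroupTopicsStale(cfg) {
  return cfg.mode === 'embed';
}

// Ground-truth gate: the Map is populated only when the primary embed
// path produced vectors, so it is empty on BOTH fallback paths.
function shouldGroupTopics(embeddingByHash) {
  return embeddingByHash.size > 0;
}
```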
d234452e5c
fix(digest): two-phase topic sort + typo-safe mode fallback
Two P1/P2 bugs found in post-PR-#3247 review:
P1: groupTopicsPostDedup did NOT guarantee contiguous topic blocks.
The global sort key (topicSize, topicMax, repScore, titleHashHex) fell
through to per-rep repScore when two topics tied on size and max,
interleaving members. Two same-size-same-max topics {A90,A80} and
{B90,B70} output as [A90,B90,A80,B70] — broken contiguity, breaking
the editorial promise of the PR.
Fix: two-phase sort. Phase 1 orders TOPICS by
(topicSize DESC, topicMax DESC, topicTieHash ASC)
where topicTieHash = min titleHash among the topic's members — a
topic-level invariant, not a rep-level one. Phase 2 orders members
within each topic by (repScore DESC, titleHashHex ASC). Concatenate
in topic order. Members of the same topic CANNOT interleave with any
other topic's members. Added regression test with the exact fixture.
P2: DIGEST_DEDUP_MODE typos failed OPEN to embed, not Jaccard. File
header documented "non-{embed,jaccard} → jaccard with warn" but
readOrchestratorConfig mapped typos to mode='embed'. Operator scenario:
during an embed outage sets DIGEST_DEDUP_MODE=jacard — kill switch
silently stays off.
Fix: typo / unrecognised value resolves to mode='jaccard' (safer),
matching the documented contract. invalidModeRaw warn still fires so
operators see the typo. Added 3 parsing tests (typo, garbage, empty).
Tests: 5914 pass (was 5910 + 4 new). typecheck, typecheck:api, biome clean.
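The two-phase sort above can be sketched minimally. The rep shape `{topic, score, titleHash}` is hypothetical (the real code derives topic membership from the clusterer), but the keys match the commit:

```javascript
// Phase 1: order TOPICS by (size DESC, max DESC, min-titleHash ASC).
// Phase 2: order members by (score DESC, titleHash ASC). Concatenating
// in topic order makes interleaving impossible by construction.
function twoPhaseSort(reps) {
  const byTopic = new Map();
  for (const r of reps) {
    if (!byTopic.has(r.topic)) byTopic.set(r.topic, []);
    byTopic.get(r.topic).push(r);
  }
  const topics = [...byTopic.values()].sort((a, b) => {
    if (a.length !== b.length) return b.length - a.length;   // topicSize DESC
    const maxA = Math.max(...a.map((r) => r.score));
    const maxB = Math.max(...b.map((r) => r.score));
    if (maxA !== maxB) return maxB - maxA;                   // topicMax DESC
    const tieA = a.map((r) => r.titleHash).sort()[0];        // topic-level invariant
    const tieB = b.map((r) => r.titleHash).sort()[0];
    return tieA < tieB ? -1 : 1;                             // topicTieHash ASC
  });
  return topics.flatMap((members) =>
    members.sort((x, y) => y.score - x.score || (x.titleHash < y.titleHash ? -1 : 1)),
  );
}
```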
38541c1075
chore(digest): address /ce:review round 1 findings
Fixes 5 findings raised by the multi-agent review of PR #3247:

- #234 P2: Drop dead `deps.log` shim from `deduplicateStories` — caller
  now owns the log line so the param rotted (no test used it
  post-#3247). Removed from JSDoc + signature logic.
- #236 P3: Add defensive warn when `winningIdx === undefined` during
  sidecar `embeddingByHash` population. Shouldn't fire with the current
  `materializeCluster` contract, but catches a future refactor where a
  synthesised rep would silently skip topic grouping.
- #237 P3: Skip `groupTopicsPostDedup` when `cfg.mode === 'jaccard'` —
  the kill-switch path returns an empty `embeddingByHash`, and running
  the secondary pass on it would log a noisy "missing embedding" warn
  every tick. Gate the call site; passthrough primary order.
- #240a P3: Remove dead `top ?? []` fallback after `!Array.isArray(top)`
  already handled the falsy case. Replaced with an explicit Array check
  for the rare "falsy but also not-array" input (defence-in-depth).
- #240b P3: Delete two redundant test blocks — the `titleHashHex
  tiebreak` 2-rep fixture (permutation-invariance at 15-rep scale
  already covers this invariant) and the `caller log-line format (regex
  splice)` describe block (the regex lives in
  seed-digest-notifications.mjs, not brief-dedup; the full-flow
  envelope-cleanliness test exercises the caller end-to-end).

Deferred to follow-up: #235 (structured logParts vs regex splice), #238
(plumb cfg to avoid double env-read), #239 (repo-wide env .trim
pattern).

Tests: 5910 pass (was 5913; -3 redundant tests removed). typecheck,
typecheck:api, biome all clean.
8fe6284c4f
feat(digest): topic-grouped brief ordering (size-first)
Brief composer currently surfaces top-N stories in raw currentScore
DESC order. On topic-dominant news days (e.g. 2026-04-20 20:00 brief)
related stories scatter — 4 Hormuz angles at positions 1/3/8/11 with
unrelated stories wedged between them.

Secondary clustering pass on already-sliced top-30 reps at a looser
cosine threshold (default 0.45), then re-orders by a total key:
(topicSize DESC, topicMax DESC, repScore DESC, titleHashHex ASC). The
dominant thread leads; within-thread order is score DESC; ties are
deterministic. Hidden behind DIGEST_DEDUP_TOPIC_GROUPING (default 1 —
kill switch = '0', no deploy).

Design notes:
- Post-slice placement bounds work to N ≤ 30 (~0.4 ms) and avoids
  reshuffling reps that never surface.
- Sidecar Map<hash, number[]> returned from deduplicateStories — no
  hidden __embedding fields on the user-facing Rep (would otherwise
  risk leaking into the brief envelope).
- groupTopicsPostDedup is pure: no I/O, no logging, injected clusterFn
  for testability. Errors are RETURNED (not thrown) so a helper bug
  cannot cascade into the outer Jaccard fallback boundary.
- Caller owns logging: deduplicateStories returns logSummary; caller
  splices ` topics=N ` after `clusters=M ` via a simple regex, emits
  one log line per tick.

Env:
- DIGEST_DEDUP_TOPIC_GROUPING = '0' disables (default on)
- DIGEST_DEDUP_TOPIC_THRESHOLD = float in (0,1], default 0.45

Tests: 55 in tests/brief-dedup-embedding.test.mjs (was 33, +22 new):
size-first ordering, topicMax tiebreak, within-topic score, titleHash
determinism, kill switch, permutation invariance, empty/singleton,
injected clusterer throws, missing embedding, materialized-rep keying,
envelope cleanliness (JSON.stringify has no _embedding / __ /
embeddingByHash), log-line splice regex, pre-slice input size, and 6
env-parsing cases.

Full verification: npm run test:data (5913 pass), typecheck,
typecheck:api, biome check on changed files — all clean. Pre-existing
main() complexity (74) unchanged.
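The errors-are-returned contract from the design notes can be sketched as follows (names follow the message; the internals are a stand-in for the real grouping logic):

```javascript
// A helper bug degrades to passthrough in primary order instead of
// cascading into the outer Jaccard fallback boundary. The caller
// inspects `error` and logs; the tick never crashes.
function groupTopicsPostDedup(reps, embeddingByHash, clusterFn) {
  try {
    const clusters = clusterFn(reps, embeddingByHash);
    return { reps: clusters.flat(), topicCount: clusters.length, error: null };
  } catch (err) {
    return { reps, topicCount: reps.length, error: err }; // passthrough
  }
}
```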
d1ebc84c6c
feat(digest-dedup): single-link clustering (F1 0.73 vs 0.53 complete-link) (#3234)
Problem
-------
The post-threshold-tuning brief at
/api/brief/user_3BovQ1tYlaz2YIGYAdDPXGFBgKy/2026-04-20-1532 still
showed 4 copies of "US seizes Iranian ship", 3 copies of the Hormuz
closure, and 2 copies of the oil-price story — despite running the
calibrated 0.55 threshold.
Root cause: complete-link is too strict for wire-headline clustering.
Pairwise cosines in the 4-way ship-seizure cluster:
1 <-> 5: 0.632 5 <-> 8: 0.692
1 <-> 8: 0.500 5 <-> 10: 0.656
1 <-> 10: 0.554 8 <-> 10: 0.510
Complete-link requires EVERY pair to clear threshold. Pair 1<->8 at
0.500 fails so the whole 4-way cluster can't form, and all 4 stories
bubble up as separate reps, eating 4 slots of the 12-story brief.
Measured on the 12 real titles from that brief:
Algorithm                  | Clusters | F1    | P    | R
---------------------------|----------|-------|------|-----------------
complete-link @ 0.55 (was) | 7        | 0.526 | 0.56 | 0.50
complete-link @ 0.50       | 6        | 0.435 | 0.38 | 0.50
single-link @ 0.55         | 4        | 0.435 | 0.28 | 1.00  over-merge
single-link @ 0.60         | 6        | 0.727 | 0.67 | 0.80  winner
Change
------
scripts/lib/brief-dedup-embed.mjs:
New singleLinkCluster(items, {cosineThreshold, vetoFn}) using
union-find. Chain merges through strong intermediates when a
direct pair is weak; respects the entity veto (blocked pairs
don't union). O(N^2 alpha(N)); permutation-invariant by
construction.
scripts/lib/brief-dedup.mjs:
New DIGEST_DEDUP_CLUSTERING env var (default 'single', set
'complete' to revert). readOrchestratorConfig returns 'clustering'
field. Dispatch at call site picks the right function. Structured
log line now includes clustering=<algo>.
tests/brief-dedup-embedding.test.mjs:
+8 regressions:
- singleLinkCluster chains the 4-way through a bridge
- veto blocks unions even when cosine passes
- permutation-invariance property test (5 shuffles)
- empty-input
- DIGEST_DEDUP_CLUSTERING default is 'single'
- DIGEST_DEDUP_CLUSTERING=complete kill switch works
- unrecognised values fall back to 'single'
- log line includes clustering=<algo>
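A minimal union-find sketch of the single-link behaviour described above (item shape and the inline cosine over unit vectors are assumptions, not the module's real API):

```javascript
// Single-link via union-find: any pair clearing the threshold unions,
// unless the veto blocks it. Chaining through strong intermediates
// falls out of union-find; result is permutation-invariant.
function singleLinkCluster(items, { cosineThreshold, vetoFn = () => false }) {
  const parent = items.map((_, i) => i);
  const find = (x) => (parent[x] === x ? x : (parent[x] = find(parent[x])));
  const union = (a, b) => { parent[find(a)] = find(b); };
  const cosine = (u, v) => u.reduce((s, ui, k) => s + ui * v[k], 0); // unit vectors
  for (let i = 0; i < items.length; i++) {
    for (let j = i + 1; j < items.length; j++) {
      if (vetoFn(items[i], items[j])) continue;  // blocked pairs never union
      if (cosine(items[i].vec, items[j].vec) >= cosineThreshold) union(i, j);
    }
  }
  const groups = new Map();
  items.forEach((_, i) => {
    const root = find(i);
    if (!groups.has(root)) groups.set(root, []);
    groups.get(root).push(i);
  });
  return [...groups.values()];
}
```

With a weak direct pair (A↔C below threshold) but strong A↔B and B↔C, all three land in one cluster, which is exactly the chaining complete-link forbids.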
Bridge-pollution risk note
--------------------------
The original plan rejected single-link to avoid the Jaccard-era
"bridge pollution" (A~B=0.6, B~C=0.6, A~C=0.3 all chain through a
mixed-topic B). With text-embedding-3-small at cosine >= 0.60, a
bridge must be semantically real — the probe showed a 37% F1 bump
with no new FPs on the production case. Setting
DIGEST_DEDUP_CLUSTERING=complete on Railway is the instant rollback
if a bad day ever surfaces chaining.
Operator activation
-------------------
After merge, on Railway seed-digest-notifications service:
DIGEST_DEDUP_COSINE_THRESHOLD=0.60
No other changes needed — clustering=single is the default.
Verification
------------
- npm run test:data 5825/5825 pass
- tests/brief-dedup-embedding 53/53 pass (45 existing + 8 new)
- typecheck + typecheck:api clean
- biome check on changed files clean
Post-Deploy Monitoring & Validation
-----------------------------------
- Grep '[digest] dedup mode=embed clustering=single' in Railway logs
— confirms the new algo is live
- Expect clusters= to drop further on bulk ticks (stories=700+):
current ~23 on 84-story ticks -> expected ~15-18
- Manually open next brief post-deploy, visually verify ship-seizure
/ Hormuz / oil stories no longer duplicate
- Rollback: DIGEST_DEDUP_CLUSTERING=complete on Railway (instant,
no deploy), next cron tick reverts to old behaviour
- Validation window: 24h
- Owner: koala73
Related
-------
- #3200 embedding-based dedup (introduced complete-link)
- #3224 DIGEST_SCORE_MIN floor (the low-importance half of the fix)
305dc5ef36
feat(digest-dedup): Phase A — embedding-based dedup scaffolding (no-op) (#3200)
* feat(digest-dedup): Phase A — embedding-based dedup scaffolding (no-op)
Replaces the inline Jaccard story-dedup in seed-digest-notifications
with an orchestrator that can run Jaccard, shadow, or full embedding
modes. Ships with DIGEST_DEDUP_MODE=jaccard as the default so
production behaviour is unchanged until Phase C shadow + Phase D flip.
New modules (scripts/lib/):
- brief-dedup-consts.mjs tunables + cache prefix + __constants bag
- brief-dedup-jaccard.mjs verbatim 0.55-threshold extract (fallback)
- entity-gazetteer.mjs cities/regions gazetteer + common-caps
- brief-embedding.mjs OpenRouter /embeddings client with Upstash
cache, all-or-nothing timeout, cosineSimilarity
- brief-dedup-embed.mjs complete-link clustering + entity veto (pure)
- brief-dedup.mjs orchestrator, env read at call entry,
shadow archive, structured log line
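For reference, the conventional cosine similarity the thresholds above refer to (a sketch only; the module's actual implementation may differ):

```javascript
// cos(u, v) = u·v / (|u||v|); 1 for identical direction, 0 orthogonal.
function cosineSimilarity(u, v) {
  let dot = 0, nu = 0, nv = 0;
  for (let k = 0; k < u.length; k++) {
    dot += u[k] * v[k];
    nu += u[k] * u[k];
    nv += v[k] * v[k];
  }
  return dot / (Math.sqrt(nu) * Math.sqrt(nv));
}
```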
Operator tools (scripts/tools/):
- calibrate-dedup-threshold.mjs offline calibration runner + histogram
- golden-pair-validator.mjs live-embedder drift detector (nightly CI)
- shadow-sample.mjs Sample A/B CSV emitter over SCAN archive
Tests:
- brief-dedup-jaccard.test.mjs migrated from regex-harness to direct
import plus orchestrator parity tests (22)
- brief-dedup-embedding.test.mjs 9 plan scenarios incl. 10-permutation
property test, complete-link non-chain (21)
- brief-dedup-golden.test.mjs 20-pair mocked canary (21)
Workflows:
- .github/workflows/dedup-golden-pairs.yml nightly live-embedder canary
(07:17 UTC), opens issue on drift
Deviation from plan: the shouldVeto("Iran closes Hormuz", "Tehran
shuts Hormuz") case can't return true under a single coherent
classification (country-in-A vs capital-in-B sit on different sides
of the actor/location boundary). Gazetteer follows the plan's
"countries are actors" intent; the test is updated to assert false
with a comment pointing at the irreducible capital-country
coreference limitation.
Verification:
- npm run test:data 5825/5825 pass
- tests/edge-functions 171/171 pass
- typecheck + typecheck:api clean
- biome check on new files clean
- lint:md 0 errors
Phase B (calibration), Phase C (shadow), and Phase D (flip) are
subsequent PRs.
* refactor(digest-dedup): address review findings 193-199
Fresh-eyes review found 3 P1s, 3 P2s, and a P3 bundle across
kieran-typescript, security-sentinel, performance-oracle, architecture-
strategist, and code-simplicity reviewers. Fixes below; all 64 dedup
tests + 5825 data tests + 171 edge-function tests still green.
P1 #193 - dedup regex + redis pipeline duplication
- Extract defaultRedisPipeline into scripts/lib/_upstash-pipeline.mjs;
both orchestrator and embedding client import from there.
- normalizeForEmbedding now delegates to stripSourceSuffix from the
Jaccard module so the outlet allow-list is single-sourced.
P1 #194 - embedding timeout floor + negative-budget path
- callEmbeddingsApi throws EmbeddingTimeoutError when timeoutMs<=0
instead of opening a doomed 250ms fetch.
- Removed Math.max(250, ...) floor that let wall-clock cap overshoot.
P1 #195 - dead env getters
- Deleted getMode / isRemoteEmbedEnabled / isEntityVetoEnabled /
getCosineThreshold / getWallClockMs from brief-dedup-consts.mjs
(zero callers; orchestrator reimplements inline).
P2 #196 - orchestrator cleanup bundle
- Removed re-exports at bottom of brief-dedup.mjs.
- Extracted materializeCluster into brief-dedup-jaccard.mjs; both
the fallback and orchestrator use the shared helper.
- Deleted clusterWithEntityVeto wrapper; orchestrator inlines the
vetoFn wiring at the single call site.
- Shadow mode now runs Jaccard exactly once per tick (was twice).
- Fallback warn line carries reason=ErrorName so operators can
filter timeout vs provider vs shape errors.
- Invalid DIGEST_DEDUP_MODE values emit a warn once per run (vs
silently falling to jaccard).
P2 #197 - workflow + shadow-sample hardening
- dedup-golden-pairs.yml body composition no longer relies on a
heredoc that would command-substitute validator stdout. Switched
to printf with sanitised LOG_TAIL (printable ASCII only) and
--body-file so crafted fixture text cannot escape into the runner.
- shadow-sample.mjs Upstash helper enforces a hardcoded command
allowlist (SCAN | GET | EXISTS).
P2 #198 - test + observability polish
- Scenarios 2 and 3 deep-equal returned clusters against the Jaccard
expected shape, not just length. Also assert the reason= field.
P3 #199 - nits
- Removed __constants test-bag; jaccard tests use named imports.
- Renamed deps.apiKey to deps._apiKey in embedding client.
- Added @pre JSDoc on diffClustersByHash about unique-hash contract.
- Deferred: mocked golden-pair test removal, gazetteer JSON migration,
scripts/tools AGENTS.md doc note.
Todos 193-199 moved from pending to complete.
Verification:
- npm run test:data 5825/5825 pass
- tests/edge-functions 171/171 pass
- typecheck + typecheck:api clean
- biome check on changed files clean
* fix(digest-dedup): address Greptile P2 findings on PR #3200
1. brief-embedding.mjs: wrap fetch lookup as
`(...args) => globalThis.fetch(...args)` instead of aliasing bare
`fetch`. Aliasing captures the binding at module-load time, so
later instrumentation / Edge-runtime shims don't see the wrapper —
same class of bug as the banned `fetch.bind(globalThis)` pattern
flagged in AGENTS.md.
2. dedup-golden-pairs.yml: `gh issue create --label "..." || true`
silently swallowed the failure when any of dedup/canary/p1 labels
didn't pre-exist, breaking the drift alert channel while leaving
the job red in the Actions UI. Switched to repeated `--label`
flags + `--create-label` so any missing label is auto-created on
first drift, and dropped the `|| true` so a legitimate failure
(network / auth) surfaces instead of hiding.
Both fixes are P2-style per Greptile (confidence 5/5, no P0/P1);
applied pre-merge so the nightly canary is usable from day one.
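The binding-capture difference in fix 1 can be demonstrated in isolation (a contrived sketch with hypothetical names; the real client simply wraps the call the same way):

```javascript
// Aliasing captures the binding at module-load time; the arrow wrapper
// resolves globalThis.fetch on every call, so later shims are visible.
const aliased = globalThis.fetch;                        // captured now
const wrapped = (...args) => globalThis.fetch(...args);  // looked up per call

const realFetch = globalThis.fetch;
globalThis.fetch = () => 'shimmed';       // e.g. an Edge-runtime shim installed later
const viaWrapper = wrapped();             // the wrapper sees the shim
const aliasIsStale = aliased !== globalThis.fetch; // the alias holds the old binding
globalThis.fetch = realFetch;             // restore
```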
* fix(digest-dedup): two P1s found on PR #3200
P1 — canary classifier must match production
Nightly golden-pair validator was checking a hardcoded threshold
(default 0.60) and always applied the entity veto, while the actual
dedup path at runtime reads DIGEST_DEDUP_COSINE_THRESHOLD and
DIGEST_DEDUP_ENTITY_VETO_ENABLED from env at every call. A Phase
C/D env flip could make the canary green while prod was wrong or
red while prod was healthy, defeating the whole point of a drift
detector.
Fix:
- golden-pair-validator.mjs now calls readOrchestratorConfig(process.env)
— the same helper the orchestrator uses — so any classifier knob
added later is picked up automatically. The threshold and veto-
enabled flags are sourced from env by default; a --threshold CLI
flag still overrides for manual calibration sweeps.
- dedup-golden-pairs.yml sources DIGEST_DEDUP_COSINE_THRESHOLD and
DIGEST_DEDUP_ENTITY_VETO_ENABLED from GitHub repo variables (vars.*),
which operators must keep in lockstep with Railway. The
workflow_dispatch threshold input now defaults to empty; the
scheduled canary always uses the production-parity config.
- Validator log line prints the effective config + source so nightly
output makes the classifier visible.
P1 — shadow archive writes were fail-open
`defaultRedisPipeline()` returns null on timeout / auth / HTTP
failure. `writeShadowArchive()` only had a try/catch, so the null
result was silently treated as success. A Phase C rollout could
log clean "mode=shadow … disagreements=X" lines every tick while
the Upstash archive received zero writes — and Sample B labelling
would then find no batches, silently killing calibration.
Fix:
- writeShadowArchive now inspects the pipeline return. null result,
non-array response, per-command {error}, or a cell without
{result: "OK"} all return {ok: false, reason}.
- Orchestrator emits a warn line with the failure reason, and the
structured log line carries archive_write=ok|failed so operators
can grep for failed ticks.
- Regression test in brief-dedup-embedding.test.mjs simulates the
null-pipeline contract and asserts both the warn and the structured
field land.
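The fail-closed inspection can be sketched as follows (the return shapes — null on transport failure, per-command `{error}` cells, `{result: "OK"}` on success — follow the commit; the function name is a stand-in):

```javascript
// Treat anything other than an array of {result: "OK"} cells as a
// failed archive write, with a greppable reason.
function inspectPipelineResult(result) {
  if (result === null) return { ok: false, reason: 'null-pipeline' };
  if (!Array.isArray(result)) return { ok: false, reason: 'non-array-response' };
  for (const cell of result) {
    if (cell && cell.error) return { ok: false, reason: `command-error:${cell.error}` };
    if (!cell || cell.result !== 'OK') return { ok: false, reason: 'not-ok-cell' };
  }
  return { ok: true };
}
```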
Verification:
- test:data 5825/5825 pass
- dedup suites 65/65 pass (new: archive-fail regression)
- typecheck + api clean
- biome check clean on changed files
* fix(digest-dedup): two more P1s found on PR #3200
P1 — canary must also honour DIGEST_DEDUP_MODE + REMOTE_EMBED_ENABLED
The prior round fixed the threshold/veto knobs but left the canary
running embeddings regardless of whether production could actually
reach the embed path. If Railway has DIGEST_DEDUP_MODE=jaccard or
DIGEST_DEDUP_REMOTE_EMBED_ENABLED=0, production never calls the
classifier, so a drift signal is meaningless — or worse, a live
OpenRouter issue flags the canary while prod is obliviously fine.
Fix:
- golden-pair-validator.mjs reads mode + remoteEmbedEnabled from the
same readOrchestratorConfig() helper the orchestrator uses. When
either says "embed path inactive in prod", the validator logs an
explicit skip line and exits 0. The nightly workflow then shows
green, which is the correct signal ("nothing to drift against").
- A --force CLI flag remains for manual dispatch during staged
rollouts.
- dedup-golden-pairs.yml sources DIGEST_DEDUP_MODE and
DIGEST_DEDUP_REMOTE_EMBED_ENABLED from GitHub repo variables
alongside the threshold and veto-enabled knobs, so all four
classifier gates stay in lockstep with Railway.
- Validator log line now prints mode + remoteEmbedEnabled so the
canary output surfaces which classifier it validated.
P1 — shadow-sample Sample A was biased by SCAN order
enumerate-and-dedup added every seen pair to a dedup key BEFORE
filtering by agreement. If the same pair appeared in an agreeing
batch first and a disagreeing batch later, the disagreeing
occurrence was silently dropped. SCAN order is unspecified, so
Sample A could omit real disagreement pairs.
Fix:
- Extracted the enumeration into a pure `enumeratePairs(archives, mode)`
export so the logic is testable. Mode filter runs BEFORE the dedup
check: agreeing pairs are skipped entirely under
--mode disagreements, so any later disagreeing occurrence can
still claim the dedup slot.
- Added tests/brief-dedup-shadow-sample.test.mjs with 5 regression
cases: agreement-then-disagreement, reversed order (symmetry),
always-agreed omission, population enumeration, cross-batch dedup.
- isMain guard added so importing the module for tests does not
kick off the CLI scan path.
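The filter-before-dedup ordering can be sketched as follows (the archive and pair shapes are assumptions; only the mode-before-dedup ordering follows the commit):

```javascript
// Mode filter runs BEFORE the dedup check, so an agreeing occurrence
// seen first cannot claim the dedup slot away from a later
// disagreeing occurrence of the same pair.
function enumeratePairs(archives, mode) {
  const seen = new Set();
  const out = [];
  for (const batch of archives) {
    for (const pair of batch.pairs) {
      if (mode === 'disagreements' && pair.agreed) continue; // filter first
      const key = [pair.a, pair.b].sort().join('|');         // symmetric key
      if (seen.has(key)) continue;                           // dedup second
      seen.add(key);
      out.push(pair);
    }
  }
  return out;
}
```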
Verification:
- test:data 5825/5825 pass
- dedup suites 70/70 pass (5 new shadow-sample regressions)
- typecheck + api clean
- biome check clean on changed files
Operator follow-up before Phase C:
Set all FOUR dedup repo variables in GitHub alongside Railway:
DIGEST_DEDUP_MODE, DIGEST_DEDUP_REMOTE_EMBED_ENABLED,
DIGEST_DEDUP_COSINE_THRESHOLD, DIGEST_DEDUP_ENTITY_VETO_ENABLED
* refactor(digest-dedup): Railway is the single source of truth for dedup config
Fair user pushback: asking operators to set four DIGEST_DEDUP_*
values in BOTH Railway (where the cron runs) AND GitHub repo
variables (where the canary runs) is architectural debt. Two
copies of the same truth will always drift.
Solution: the digest cron publishes its resolved config to Upstash
on every tick under brief:dedup:config:v1 (2h TTL). The nightly
golden-pair canary reads that key instead of env vars. Railway
stays the sole source of truth; no parallel repo variables to
maintain. A missing/expired key signals "cron hasn't run" and
the canary skips with exit 0 — better than validating against
hardcoded defaults that might diverge from prod.
Changes:
- brief-dedup-consts.mjs: new ACTIVE_CONFIG_KEY + TTL constants.
- brief-dedup.mjs: new publishActiveConfig() fires at the start of
every deduplicateStories() call (before the mode short-circuit,
so jaccard ticks also publish a "mode=jaccard" signal the canary
can read). Fire-and-forget; archive-write error semantics still
apply if the operator wants stricter tracking.
- golden-pair-validator.mjs: removed readOrchestratorConfig(env)
path. Now calls fetchActiveConfigFromUpstash() and either
validates against that config, skips when the embed path is
inactive, or skips when the key is missing (with --force
override for manual dispatch).
- dedup-golden-pairs.yml: dropped the four DIGEST_DEDUP_* env lines
and the corresponding repo-variable dependency. Only the three
Upstash + OpenRouter secrets remain.
- tests: two new regressions assert config is published on every
tick (shadow AND jaccard modes) with the right shape + TTL.
Operator onboarding now takes one action: set the four
DIGEST_DEDUP_* variables on the Railway seed-digest-notifications
service. Nothing to set in GitHub beyond the existing
OPENROUTER_API_KEY / UPSTASH_* secrets.
Verification:
- test:data 5825/5825 pass
- dedup suites 72/72 pass (2 new config-publish regressions)
- typecheck + api clean
- biome check clean on changed files
* refactor(digest-dedup): ship embed directly, drop phases/canary/shadow
User feedback: "i dont need multiple phases and shit, we go directly
to embed". Fair. Ripping out the overengineering I accumulated:
DELETED
- .github/workflows/dedup-golden-pairs.yml (nightly canary)
- scripts/tools/golden-pair-validator.mjs
- scripts/tools/shadow-sample.mjs
- scripts/tools/calibrate-dedup-threshold.mjs
- tests/fixtures/brief-dedup-golden-pairs.json
- tests/brief-dedup-golden.test.mjs
- tests/brief-dedup-shadow-sample.test.mjs
SIMPLIFIED
- brief-dedup.mjs: removed shadow mode, publishActiveConfig,
writeShadowArchive, diffClustersByHash, jaccardRepsToClusterHashes,
and the DIGEST_DEDUP_REMOTE_EMBED_ENABLED knob. MODE is now
binary: `embed` (default) or `jaccard` (instant kill switch).
- brief-dedup-consts.mjs: dropped SHADOW_ARCHIVE_*, ACTIVE_CONFIG_*.
- Default flipped: DIGEST_DEDUP_MODE unset = embed (prod path).
Railway deploy with OPENROUTER_API_KEY set = embeddings live on
next cron tick. Set MODE=jaccard on Railway to revert instantly.
Orchestrator still falls back to Jaccard on any embed-path failure
(timeout, provider outage, missing API key, bad response). Fallback
warn carries reason=<ErrorName>. The cron never fails because
embeddings flaked. All 64 dedup tests + 5825 data tests still green.
Net diff: -1,407 lines.
Operator single action: set OPENROUTER_API_KEY on Railway's
seed-digest-notifications service (already present) and ship. No
GH Actions, no shadow archives, no labelling sprints. If the 0.60
threshold turns out wrong, tune DIGEST_DEDUP_COSINE_THRESHOLD on
Railway — takes effect on next tick, no redeploy.
* fix(digest-dedup): multi-word location phrases in the entity veto
Extractor was whitespace-tokenising and only single-token matching
against LOCATION_GAZETTEER, silently making every multi-word entry
unreachable:
extractEntities("Houthis strike ship in Red Sea")
→ { locations: [], actors: ['houthis','red','sea'] } ✗
shouldVeto("Houthis strike ship in Red Sea",
"US escorts convoy in Red Sea") → false ✗
With MODE=embed as the default, that turned off the main
anti-overmerge safety rail for bodies of water, regions, and
compound city names — exactly the P07-Hormuz / Houthis-Red-Sea
headlines the veto was designed to cover.
Fix: greedy longest-phrase scan with a sliding window. At each
token position try the longest multi-word phrase first (down to
2), require first AND last tokens to be capitalised (so lowercase
prose like "the middle east" doesn't falsely match while headline
"Middle East" does), lowercase connectors in between are fine
("Strait of Hormuz" → phrase "strait of hormuz" ✓). Falls back to
single-token lookup when no multi-word phrase fits.
Now:
extractEntities("Houthis strike ship in Red Sea")
→ { locations: ['red sea'], actors: ['houthis'] } ✓
shouldVeto(Red-Sea-Houthis, Red-Sea-US) → true ✓
Complexity still O(N · MAX_PHRASE_LEN) — MAX_PHRASE_LEN is 4
(longest gazetteer entry: "ho chi minh city"), so this is
effectively O(N).
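The greedy longest-phrase scan can be sketched with a toy gazetteer (the Set contents and function name here are illustrative; the capitalisation rules follow the commit):

```javascript
const GAZETTEER = new Set(['red sea', 'strait of hormuz', 'abu dhabi', 'middle east']);
const MAX_PHRASE_LEN = 4;

// At each position try the longest phrase first (down to 2 tokens);
// first AND last tokens must be capitalised, lowercase connectors in
// between are fine. Falls back to single-token lookup.
function extractLocations(title) {
  const tokens = title.split(/\s+/);
  const capped = (t) => /^[A-Z]/.test(t);
  const found = [];
  let i = 0;
  while (i < tokens.length) {
    let matched = 0;
    for (let len = Math.min(MAX_PHRASE_LEN, tokens.length - i); len >= 2; len--) {
      const slice = tokens.slice(i, i + len);
      if (!capped(slice[0]) || !capped(slice[len - 1])) continue;
      const phrase = slice.join(' ').toLowerCase();
      if (GAZETTEER.has(phrase)) { found.push(phrase); matched = len; break; }
    }
    if (matched === 0 && GAZETTEER.has(tokens[i].toLowerCase())) {
      found.push(tokens[i].toLowerCase()); // single-token fallback
      matched = 1;
    }
    i += Math.max(1, matched);
  }
  return found;
}
```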
Added 5 regression tests covering Red Sea, South China Sea,
Strait of Hormuz (lowercase-connector case), Abu Dhabi, and
New York, plus the Houthis-vs-US veto reproducer from the P1.
All 5825 data tests + 45 dedup tests green; lint + typecheck clean.