mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-25 17:14:57 +02:00
* feat(digest-dedup): replayable per-story input log (opt-in, no behaviour change)
Ship the measurement layer before picking any recall-lift strategy.
Why: the current dedup path embeds titles only, so brief-wire headlines
that share a real event but drop the geographic anchor (e.g. "Alleged
Coup: defendant arrives in court" vs "Trial opens after Nigeria charges
six over 2025 coup plot") can slip past the 0.60 cosine threshold. To
tune recall without regressing precision we need a replayable per-tick
dataset — one record per story with the exact fields any downstream
candidate (title+slug, LLM-canonicalise, text-embedding-3-large, cross-
encoder re-rank, etc.) would need to score.
This PR ships ONLY the log. Zero behaviour change:
- Opt-in via DIGEST_DEDUP_REPLAY_LOG=1 (default OFF).
- Writer is best-effort: all errors swallowed + warned, never affects
digest delivery. No throw path.
- Records include hash, originalIndex, isRep, clusterId, raw +
normalised title, link, severity/score/mentions/phase/sources,
embeddingCacheKey, hasEmbedding sidecar flag, and the tick's config
snapshot (mode, clustering, cosineThreshold, topicThreshold, veto).
- clusterId derives from rep.mergedHashes (already set by
materializeCluster) so the orchestrator is untouched.
- Storage: Upstash list keyed by {variant}:{lang}:{sensitivity}:{date}
with 30-day EXPIRE. Date suffix caps per-key growth; retention
covers the labelling cadence + cross-candidate comparison window.
- Env flag is '1'-only (fail-closed on typos, same pattern as
DIGEST_DEDUP_MODE).
Activation path (post-merge): flip DIGEST_DEDUP_REPLAY_LOG=1 on the
seed-digest-notifications Railway service. Watch one cron tick for the
RPUSH + EXPIRE pair (or a single warn line if creds/upstream flake),
then leave running for at least one week to accumulate calibration data.
Tests: 21 unit tests covering flag parsing, key shape + sanitisation,
record field correctness (isRep, clusterId, embeddingCacheKey,
hasEmbedding, tickConfig), pipeline null/throw handling, malformed
input. Existing 77 dedup tests unchanged and still green.
* fix(digest-dedup): capture topicGroupingEnabled in replay tickConfig
Review catch (PR #3330): the tickConfig snapshot omitted
topicGroupingEnabled even though readOrchestratorConfig returns it and
the digest's post-dedup topic ordering gates on it. A tick run with
DIGEST_DEDUP_TOPIC_GROUPING=0 serialised identically to a default
tick, making those runs non-replayable for the calibration work this
log is meant to enable.
Add topicGroupingEnabled to the recorded tickConfig. One-line schema
fix + regression test asserting topic-grouping-off ticks serialise
distinctly from default.
22/22 tests pass.
* fix(digest-dedup): await replay-log write to survive explicit process.exit
Review catch (PR #3330): the fire-and-forget `void writeReplayLog(...)`
call could be dropped on the explicit-exit paths — the brief-compose
failure gate at line 1539 and main().catch at line 1545 both call
process.exit(1). Unlike natural exit, process.exit does not drain
in-flight promises, so the last N ticks' replay records could be
silently lost on runs where measurement fidelity matters most.
Fix: await the writeReplayLog call. Safe because:
- writeReplayLog returns synchronously when the flag is off
(replayLogEnabled check is the first thing it does)
- It has a top-level try/catch that always returns a result object
- The Upstash pipeline call has a 10s timeout ceiling
- buildDigest already awaits many Upstash calls (dedup, compose,
render) so one more is not a hot-path concern
Comment block added above the call explains why the await is
deliberate — so a future refactor doesn't revert it to void thinking
it's a leftover.
No test change: existing writeReplayLog unit tests already cover the
disabled / empty / success / error paths. The fix is a single-keyword
change in a caller that was already guaranteed-safe by the callee's
contract.
* refactor(digest-dedup): address Greptile P2 review comments on replay log
Three non-blocking polish items from the automated review, bundled
because they all touch the same new module and none change behaviour.
1. tsMs captured BEFORE deduplicateStories (seed-digest-notifications.mjs).
Previously sampled after dedup returned, so briefTickId reflected
dedup-completion time rather than tick-start. For downstream readers
the natural reading of "briefTickId" is when the tick began
processing; moved the Date.now() call to match that expectation.
Drift is maybe 100ms-2s on cold-cache embed calls — small, but
moving it is free.
2. buildReplayLogKey emptiness check now strips ':' and '-' in addition
to '_'. A pathological ruleId of ':::' previously passed through
verbatim, producing keys like `digest:replay-log:v1::::2026-04-23`
that confuse redis-cli's namespace tooling (SCAN / KEYS / tab
completion). The new guard falls back to "unknown" on any input
that's all separators. Added a regression test covering the
':::' / '---' / '___' / mixed cases.
3. tickConfig is now a per-record shallow copy instead of a shared
reference. Storage is unaffected (writeReplayLog serialises each
record via JSON.stringify independently) but an in-memory consumer
that mutated one record's tickConfig for experimentation would have
silently affected all other records in the same batch. Added a
regression test asserting mutation doesn't leak across records.
Tests: 24/24 pass (22 prior + 2 new regression). Typecheck + lint clean.