feat(scoring): composite importance score + story tracking infrastructure (#2604)

* feat(scoring): add composite importance score + story tracking infrastructure

- Extract SOURCE_TIERS/getSourceTier to server/_shared/source-tiers.ts so
  server handlers can import it without pulling in client-only modules;
  src/config/feeds.ts re-exports for backward compatibility
- Add story tracking Redis key helpers to cache-keys.ts
  (story:track:v1, story:sources:v1, story:peak:v1, digest:accumulator:v1)
- Export SEVERITY_SCORES from _classifier.ts for server-side score math
- Add upstashPipeline() to redis.ts for arbitrary batched Redis writes
- Add importanceScore/corroborationCount/storyPhase fields to proto,
  generated TS, and src/types NewsItem
- Add StoryMeta message and StoryPhase enum to proto
- In list-feed-digest.ts:
  - Build corroboration map across full corpus BEFORE per-category truncation
  - Compute importanceScore (severity 40% + tier 20% + corroboration 30%
    + recency 10%) per item
  - Sort by importanceScore desc before truncating at MAX_ITEMS_PER_CATEGORY
  - Write story:track / story:sources / story:peak / digest:accumulator
    to Redis in 80-story pipeline batches after each digest build
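
The weighting above can be sketched as follows. This is a minimal illustration, not the code in list-feed-digest.ts: the `ScoreInputs` shape, the `clamp` helper, and the assumption that each component is already normalized to 0-100 are hypothetical; only the four percentages come from this commit message.

```typescript
// Weights taken from the commit message (40/20/30/10). The constant name
// mirrors SCORE_WEIGHTS mentioned later in this PR, but its shape is assumed.
const SCORE_WEIGHTS = {
  severity: 0.4,
  sourceTier: 0.2,
  corroboration: 0.3,
  recency: 0.1,
} as const;

// Hypothetical input shape: every component pre-normalized to the 0-100 range.
interface ScoreInputs {
  severity: number;
  sourceTier: number;
  corroboration: number;
  recency: number;
}

// Keep a component inside 0-100 so one bad input cannot push the total out of range.
function clamp(value: number): number {
  return Math.min(100, Math.max(0, value));
}

// Weighted sum of the four components, rounded to an integer in 0-100.
function importanceScore(inputs: ScoreInputs): number {
  const raw =
    clamp(inputs.severity) * SCORE_WEIGHTS.severity +
    clamp(inputs.sourceTier) * SCORE_WEIGHTS.sourceTier +
    clamp(inputs.corroboration) * SCORE_WEIGHTS.corroboration +
    clamp(inputs.recency) * SCORE_WEIGHTS.recency;
  return Math.round(raw);
}
```

Under these assumptions, an item with severity 80, tier 50, corroboration 20, and recency 100 scores 32 + 10 + 6 + 10 = 58.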

Score gate in notification-relay.cjs follows in the next PR (shadow mode,
behind IMPORTANCE_SCORE_LIVE flag). RELAY_GATES_READY removal of
/api/notify comes after 48h shadow comparison confirms parity.

* fix(scoring): add storyPhase field + regenerate proto types

- Add storyPhase to ParsedItem and toProtoItem (defaults UNSPECIFIED)
- Regenerate service_server.ts: required fields, StoryPhase type relocated
- Regenerate service_client.ts and OpenAPI docs from buf generate
- Fix typecheck:api error on missing required storyPhase in NewsItem

* fix(scoring): address all code review findings from PR #2604

P1:
- await writeStoryTracking instead of fire-and-forget to prevent
  silent data loss on edge isolate teardown
- remove duplicate upstashPipeline; use existing runRedisPipeline
- strip non-https links before Redis write (XSS prevention)
- implement storyPhase read path: HGETALL batch + computePhase()
  so BREAKING/DEVELOPING/SUSTAINED/FADING badges are now live
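
One way the non-https link filter could look is sketched below. The function name `safeHttpsLink` and the choice to return an empty string for rejected links are assumptions made for illustration; the commit message only states that non-https links are stripped before the Redis write.

```typescript
// Returns the link unchanged (after URL normalization) only when it parses as
// an absolute https: URL. Anything else, including javascript: and data: URLs,
// plain http, and unparseable strings, becomes an empty string.
// Hypothetical helper; the real implementation may drop the field differently.
function safeHttpsLink(link: string): string {
  try {
    const url = new URL(link);
    return url.protocol === "https:" ? url.toString() : "";
  } catch {
    // new URL() throws a TypeError on relative or malformed input.
    return "";
  }
}
```

Checking the parsed `protocol` rather than a string prefix avoids being fooled by leading whitespace or mixed-case schemes.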

P2/P3:
- extend STORY_TTL 48h → 7 days (sustained stories no longer reset)
- extract SCORE_WEIGHTS named constants with rationale comment
- move SEVERITY_SCORES out of _classifier.ts into list-feed-digest.ts
- add normalizeTitle comment referencing todo #102
- pre-compute title hashes once, share between phase read + write

* fix(scoring): correct enrichment-before-scoring and write-before-read ordering

Two sequencing bugs:

1. enrichWithAiCache ran after truncation (post-slice), so items whose
   threat level was upgraded by the LLM cache could have already been
   cut from the top-20, and downgraded items kept inflated scores.
   Fix: enrich ALL items from the full corpus before scoring, so
   importanceScore always uses the final post-LLM classification level.

2. Phase HGETALL read happened before writeStoryTracking, meaning
   first-time stories had no Redis entry and always returned UNSPECIFIED
   instead of BREAKING, and all existing stories lagged one cycle behind.
   Fix: write tracking first, then read back for phase assignment.
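
The first ordering fix can be illustrated with a small sketch. Everything here is hypothetical scaffolding: the `Item` shape, the `LEVEL_SCORE` values, and the `enrich` callback stand in for the real NewsItem, SEVERITY_SCORES, and enrichWithAiCache, and the score uses severity alone for brevity. The point is only the sequence: enrich the full corpus, then score, then sort, then truncate.

```typescript
// Hypothetical item shape; the real NewsItem has more fields.
interface Item {
  title: string;
  level: "low" | "medium" | "high" | "critical";
  importanceScore: number;
}

// Assumed severity values for illustration; the real table is not shown in this commit message.
const LEVEL_SCORE: Record<Item["level"], number> = {
  low: 25,
  medium: 50,
  high: 75,
  critical: 100,
};

// Enrich every item first so scoring sees the final classification level,
// then sort by score descending and only then cut to maxItems.
function rankAndTruncate(
  items: Item[],
  enrich: (item: Item) => Item,
  maxItems: number,
): Item[] {
  return items
    .map(enrich)
    .map((item) => ({ ...item, importanceScore: LEVEL_SCORE[item.level] }))
    .sort((a, b) => b.importanceScore - a.importanceScore)
    .slice(0, maxItems);
}

// Usage: item C starts as "low" but the (stand-in) enrichment upgrades it to "critical".
const corpus: Item[] = [
  { title: "A", level: "high", importanceScore: 0 },
  { title: "B", level: "medium", importanceScore: 0 },
  { title: "C", level: "low", importanceScore: 0 },
];
const upgradeC = (item: Item): Item =>
  item.title === "C" ? { ...item, level: "critical" } : item;

// With enrichment applied before truncation, C survives the cut and ranks first.
const top = rankAndTruncate(corpus, upgradeC, 2);
```

Had the slice run before enrichment, C would have been cut as the lowest-ranked item and its upgrade would never have been seen.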
Elie Habib
2026-04-02 20:46:04 +04:00
committed by GitHub
parent 3569da0cd3
commit 8d8cf56ce2
10 changed files with 606 additions and 311 deletions


@@ -293,6 +293,25 @@ components:
         locationName:
           type: string
           description: Human-readable location name.
+        importanceScore:
+          type: integer
+          format: int32
+          description: |-
+            Composite importance score (0-100): severity × 40% + source tier × 20% + corroboration × 30% + recency × 10%.
+            Absent (0) when not yet scored.
+        corroborationCount:
+          type: integer
+          format: int32
+          description: Number of distinct sources that reported the same story in this digest cycle.
+        storyPhase:
+          type: string
+          enum:
+            - STORY_PHASE_UNSPECIFIED
+            - STORY_PHASE_BREAKING
+            - STORY_PHASE_DEVELOPING
+            - STORY_PHASE_SUSTAINED
+            - STORY_PHASE_FADING
+          description: StoryPhase represents the lifecycle stage of a tracked news story.
       required:
         - source
         - title