Files
worldmonitor/docs/api/NewsService.openapi.json
Elie Habib 6c017998d3 feat(e3): story persistence tracking (#2620)
* feat(e3): story persistence tracking

Adds cross-cycle story tracking layer to the RSS digest pipeline:

- Proto: StoryMeta message + StoryPhase enum on NewsItem (fields 9-11).
  importanceScore and corroborationCount stubs added for E1.
- list-feed-digest.ts: builds corroboration map across ALL items before
  truncation; batch-reads existing story:track hashes from Redis; writes
  HINCRBY/HSET/HSETNX/SADD/EXPIRE per story in 80-story pipeline chunks;
  attaches StoryMeta (firstSeen, mentionCount, sourceCount, phase) to
  each proto item using read-back data.
- cache-keys.ts: STORY_TRACK_KEY_PREFIX, STORY_SOURCES_KEY_PREFIX,
  DIGEST_ACCUMULATOR_KEY_PREFIX, STORY_TRACKING_TTL_S.
- src/types/index.ts: StoryMeta, StoryPhase, NewsItem extended.
- data-loader.ts: protoItemToNewsItem maps STORY_PHASE_* → client phase.
- NewsPanel.ts: BREAKING/DEVELOPING/ONGOING phase badges in item rows.

New story first appearance: phase=BREAKING. After 2 mentions within 2h:
DEVELOPING. After 6+ mentions or >2h: SUSTAINED. If score drops below
50% of peak: FADING (used by E1; defaults to SUSTAINED for now).

Redis keys per story (48h TTL):
  story:track:v1:<hash16>   → hash (firstSeen,lastSeen,mentionCount,...)
  story:sources:v1:<hash16> → set  (feed names, for cross-source count)

* fix(e3): correct storyMeta staleness and mentionCount semantics

P1 — storyMeta was always one cycle behind because storyTracks was read
before writeStoryTracking ran. Fix: keep read-before-write but compute
storyMeta from merged in-memory state (stale.mentionCount + 1, fresh
sourceCount from corroborationMap). New stories get mentionCount=1 and
phase=BREAKING in the same cycle they first appear — no extra Redis
round-trip needed.

P2 — mentionCount incremented once per item occurrence, so a story seen
in 3 sources in its first cycle was immediately stored as mentionCount=3.
Fix: deduplicate by titleHash in writeStoryTracking so each unique story
gets exactly one HINCRBY per digest cycle regardless of source count.
SADD still collects all sources for the set key.

* fix(e3): Unicode hash collision, ALERT badge regression, FADING comment

P1 — normalizeTitle used [^\w\s] without the u flag; \w is ASCII-only
so every Arabic/CJK/Cyrillic title stripped to "" and shared one Redis
hash. Fixed: use /[^\p{L}\p{N}\s]/gu (Unicode property escapes require
the u flag).

P1 — ALERT badge was gated on !item.storyMeta, suppressing the indicator
for any tracked story regardless of isAlert. Phase and alert are
orthogonal signals; ALERT now renders unconditionally when isAlert=true.

P2 — FADING branch is intentionally inactive until E1 ships real scores
(currentScore/peakScore placeholder 0 via HSETNX). Added comment to
document the intentional ordering.

* fix(news-alerts): skip sustained/fading stories in breaking alert selectBest

Sustained and fading story phases are already well-covered by the feed;
only breaking and developing phases warrant a banner interrupt. Items
without storyMeta (phase unspecified) pass through unchanged.

Fixes gap C from docs/plans/2026-04-02-003-fix-news-alerts-pr-gaps-plan.md

* fix(e3): remove rebase artifacts from list-feed-digest

Removes a stray closing brace, duplicate ASCII normalizeTitle
(Unicode-aware version from the fix commit is correct), and
a leftover storyPhase assignment that references a removed field.

All typecheck and typecheck:api pass clean.
2026-04-02 21:16:35 +04:00

1 line
10 KiB
JSON
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
{"components":{"schemas":{"CategoriesEntry":{"properties":{"key":{"type":"string"},"value":{"$ref":"#/components/schemas/CategoryBucket"}},"type":"object"},"CategoryBucket":{"properties":{"items":{"items":{"$ref":"#/components/schemas/NewsItem"},"type":"array"}},"type":"object"},"Error":{"description":"Error is returned when a handler encounters an error. It contains a simple error message that the developer can customize.","properties":{"message":{"description":"Error message (e.g., 'user not found', 'database connection failed')","type":"string"}},"type":"object"},"FeedStatusesEntry":{"properties":{"key":{"type":"string"},"value":{"type":"string"}},"type":"object"},"FieldViolation":{"description":"FieldViolation describes a single validation error for a specific field.","properties":{"description":{"description":"Human-readable description of the validation violation (e.g., 'must be a valid email address', 'required field missing')","type":"string"},"field":{"description":"The field path that failed validation (e.g., 'user.email' for nested fields). For header validation, this will be the header name (e.g., 'X-API-Key')","type":"string"}},"required":["field","description"],"type":"object"},"GeoCoordinates":{"description":"GeoCoordinates represents a geographic location using WGS84 coordinates.","properties":{"latitude":{"description":"Latitude in decimal degrees (-90 to 90).","format":"double","maximum":90,"minimum":-90,"type":"number"},"longitude":{"description":"Longitude in decimal degrees (-180 to 180).","format":"double","maximum":180,"minimum":-180,"type":"number"}},"type":"object"},"GetSummarizeArticleCacheRequest":{"description":"GetSummarizeArticleCacheRequest looks up a pre-computed summary by cache key.","properties":{"cacheKey":{"description":"Deterministic cache key computed by buildSummaryCacheKey().","type":"string"}},"type":"object"},"ListFeedDigestRequest":{"properties":{"lang":{"description":"ISO 639-1 language code (en, fr, ar, etc.)","type":"string"},"variant":{"description":"Site variant: full, tech, finance, happy","type":"string"}},"type":"object"},"ListFeedDigestResponse":{"properties":{"categories":{"additionalProperties":{"$ref":"#/components/schemas/CategoryBucket"},"description":"Per-category buckets — keys match category names from feed config","type":"object"},"feedStatuses":{"additionalProperties":{"type":"string"},"description":"Per-feed status — only non-ok states emitted; absent key implies ok.\n Values: empty (feed returned 0 items), timeout (timed out during fetch).","type":"object"},"generatedAt":{"description":"ISO 8601 timestamp of when this digest was generated","type":"string"}},"type":"object"},"NewsItem":{"description":"NewsItem represents a single news article from RSS feed aggregation.","properties":{"corroborationCount":{"description":"Number of distinct sources that reported the same story in this digest cycle.","format":"int32","type":"integer"},"importanceScore":{"description":"Composite importance score (0-100): severity × 40% + source tier × 20% + corroboration × 30% + recency × 10%.","format":"int32","type":"integer"},"isAlert":{"description":"Whether this article triggered an alert condition.","type":"boolean"},"link":{"description":"Article URL.","type":"string"},"location":{"$ref":"#/components/schemas/GeoCoordinates"},"locationName":{"description":"Human-readable location name.","type":"string"},"publishedAt":{"description":"Publication time, as Unix epoch milliseconds.. Warning: Values \u003e 2^53 may lose precision in JavaScript","format":"int64","type":"integer"},"source":{"description":"Source feed name.","minLength":1,"type":"string"},"storyMeta":{"$ref":"#/components/schemas/StoryMeta"},"threat":{"$ref":"#/components/schemas/ThreatClassification"},"title":{"description":"Article headline.","minLength":1,"type":"string"}},"required":["source","title"],"type":"object"},"StoryMeta":{"description":"StoryMeta carries cross-cycle persistence data attached to each news item.","properties":{"firstSeen":{"description":"Epoch ms when the story first appeared in any digest cycle.. Warning: Values \u003e 2^53 may lose precision in JavaScript","format":"int64","type":"integer"},"mentionCount":{"description":"Total number of digest cycles in which this story appeared.","format":"int32","type":"integer"},"phase":{"description":"StoryPhase represents the lifecycle stage of a tracked news story.","enum":["STORY_PHASE_UNSPECIFIED","STORY_PHASE_BREAKING","STORY_PHASE_DEVELOPING","STORY_PHASE_SUSTAINED","STORY_PHASE_FADING"],"type":"string"},"sourceCount":{"description":"Number of unique sources that reported this story (cached from Redis Set).","format":"int32","type":"integer"}},"type":"object"},"SummarizeArticleRequest":{"description":"SummarizeArticleRequest specifies parameters for LLM article summarization.","properties":{"geoContext":{"description":"Geographic signal context to include in the prompt.","type":"string"},"headlines":{"items":{"description":"Headlines to summarize (max 8 used).","minItems":1,"type":"string"},"minItems":1,"type":"array"},"lang":{"description":"Output language code, default \"en\".","type":"string"},"mode":{"description":"Summarization mode: \"brief\", \"analysis\", \"translate\", \"\" (default).","type":"string"},"provider":{"description":"LLM provider: \"ollama\", \"groq\", \"openrouter\"","minLength":1,"type":"string"},"systemAppend":{"description":"Optional system prompt append for analytical framework instructions.","type":"string"},"variant":{"description":"Variant: \"full\", \"tech\", or target language for translate mode.","type":"string"}},"required":["provider"],"type":"object"},"SummarizeArticleResponse":{"description":"SummarizeArticleResponse contains the LLM summarization result.","properties":{"error":{"description":"Error message if the request failed.","type":"string"},"errorType":{"description":"Error type/name (e.g. \"TypeError\").","type":"string"},"fallback":{"description":"Whether the client should try the next provider in the fallback chain.","type":"boolean"},"model":{"description":"Model identifier used for generation.","type":"string"},"provider":{"description":"Provider that produced the result (or \"cache\").","type":"string"},"status":{"description":"SummarizeStatus indicates the outcome of a summarization request.","enum":["SUMMARIZE_STATUS_UNSPECIFIED","SUMMARIZE_STATUS_SUCCESS","SUMMARIZE_STATUS_CACHED","SUMMARIZE_STATUS_SKIPPED","SUMMARIZE_STATUS_ERROR"],"type":"string"},"statusDetail":{"description":"Human-readable detail for non-success statuses (skip reason, etc.).","type":"string"},"summary":{"description":"The generated summary text.","type":"string"},"tokens":{"description":"Token count from the LLM response.","format":"int32","type":"integer"}},"type":"object"},"ThreatClassification":{"description":"ThreatClassification represents an AI-assessed threat level for a news item.","properties":{"category":{"description":"Event category.","type":"string"},"confidence":{"description":"Confidence score (0.0 to 1.0).","format":"double","maximum":1,"minimum":0,"type":"number"},"level":{"description":"ThreatLevel represents the assessed threat level of a news event.","enum":["THREAT_LEVEL_UNSPECIFIED","THREAT_LEVEL_LOW","THREAT_LEVEL_MEDIUM","THREAT_LEVEL_HIGH","THREAT_LEVEL_CRITICAL"],"type":"string"},"source":{"description":"Classification source — \"keyword\", \"ml\", or \"llm\".","type":"string"}},"type":"object"},"ValidationError":{"description":"ValidationError is returned when request validation fails. It contains a list of field violations describing what went wrong.","properties":{"violations":{"description":"List of validation violations","items":{"$ref":"#/components/schemas/FieldViolation"},"type":"array"}},"required":["violations"],"type":"object"}}},"info":{"title":"NewsService API","version":"1.0.0"},"openapi":"3.1.0","paths":{"/api/news/v1/list-feed-digest":{"get":{"description":"ListFeedDigest returns a pre-aggregated digest of all RSS feeds for a site variant.","operationId":"ListFeedDigest","parameters":[{"description":"Site variant: full, tech, finance, happy","in":"query","name":"variant","required":false,"schema":{"type":"string"}},{"description":"ISO 639-1 language code (en, fr, ar, etc.)","in":"query","name":"lang","required":false,"schema":{"type":"string"}}],"responses":{"200":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/ListFeedDigestResponse"}}},"description":"Successful response"},"400":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/ValidationError"}}},"description":"Validation error"},"default":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Error"}}},"description":"Error response"}},"summary":"ListFeedDigest","tags":["NewsService"]}},"/api/news/v1/summarize-article":{"post":{"description":"SummarizeArticle generates an LLM summary with provider selection and fallback support.","operationId":"SummarizeArticle","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/SummarizeArticleRequest"}}},"required":true},"responses":{"200":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/SummarizeArticleResponse"}}},"description":"Successful response"},"400":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/ValidationError"}}},"description":"Validation error"},"default":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Error"}}},"description":"Error response"}},"summary":"SummarizeArticle","tags":["NewsService"]}},"/api/news/v1/summarize-article-cache":{"get":{"description":"GetSummarizeArticleCache looks up a cached summary by deterministic key (CDN-cacheable GET).","operationId":"GetSummarizeArticleCache","parameters":[{"description":"Deterministic cache key computed by buildSummaryCacheKey().","in":"query","name":"cache_key","required":false,"schema":{"type":"string"}}],"responses":{"200":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/SummarizeArticleResponse"}}},"description":"Successful response"},"400":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/ValidationError"}}},"description":"Validation error"},"default":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Error"}}},"description":"Error response"}},"summary":"GetSummarizeArticleCache","tags":["NewsService"]}}}}