Files
worldmonitor/docs/api/NewsService.openapi.yaml
Elie Habib 6c017998d3 feat(e3): story persistence tracking (#2620)
* feat(e3): story persistence tracking

Adds cross-cycle story tracking layer to the RSS digest pipeline:

- Proto: StoryMeta message + StoryPhase enum on NewsItem (fields 9-11).
  importanceScore and corroborationCount stubs added for E1.
- list-feed-digest.ts: builds corroboration map across ALL items before
  truncation; batch-reads existing story:track hashes from Redis; writes
  HINCRBY/HSET/HSETNX/SADD/EXPIRE per story in 80-story pipeline chunks;
  attaches StoryMeta (firstSeen, mentionCount, sourceCount, phase) to
  each proto item using read-back data.
- cache-keys.ts: STORY_TRACK_KEY_PREFIX, STORY_SOURCES_KEY_PREFIX,
  DIGEST_ACCUMULATOR_KEY_PREFIX, STORY_TRACKING_TTL_S.
- src/types/index.ts: StoryMeta, StoryPhase, NewsItem extended.
- data-loader.ts: protoItemToNewsItem maps STORY_PHASE_* → client phase.
- NewsPanel.ts: BREAKING/DEVELOPING/ONGOING phase badges in item rows.

New story first appearance: phase=BREAKING. After 2 mentions within 2h:
DEVELOPING. After 6+ mentions or >2h: SUSTAINED. If score drops below
50% of peak: FADING (used by E1; defaults to SUSTAINED for now).

Redis keys per story (48h TTL):
  story:track:v1:<hash16>   → hash (firstSeen,lastSeen,mentionCount,...)
  story:sources:v1:<hash16> → set  (feed names, for cross-source count)

* fix(e3): correct storyMeta staleness and mentionCount semantics

P1 — storyMeta was always one cycle behind because storyTracks was read
before writeStoryTracking ran. Fix: keep read-before-write but compute
storyMeta from merged in-memory state (stale.mentionCount + 1, fresh
sourceCount from corroborationMap). New stories get mentionCount=1 and
phase=BREAKING in the same cycle they first appear — no extra Redis
round-trip needed.

P2 — mentionCount incremented once per item occurrence, so a story seen
in 3 sources in its first cycle was immediately stored as mentionCount=3.
Fix: deduplicate by titleHash in writeStoryTracking so each unique story
gets exactly one HINCRBY per digest cycle regardless of source count.
SADD still collects all sources for the set key.

* fix(e3): Unicode hash collision, ALERT badge regression, FADING comment

P1 — normalizeTitle used [^\w\s] without the u flag; \w is ASCII-only
so every Arabic/CJK/Cyrillic title stripped to "" and shared one Redis
hash. Fixed: use /[^\p{L}\p{N}\s]/gu (Unicode property escapes require
the u flag).

P1 — ALERT badge was gated on !item.storyMeta, suppressing the indicator
for any tracked story regardless of isAlert. Phase and alert are
orthogonal signals; ALERT now renders unconditionally when isAlert=true.

P2 — FADING branch is intentionally inactive until E1 ships real scores
(currentScore/peakScore placeholder 0 via HSETNX). Added comment to
document the intentional ordering.

* fix(news-alerts): skip sustained/fading stories in breaking alert selectBest

Sustained and fading story phases are already well-covered by the feed;
only breaking and developing phases warrant a banner interrupt. Items
without storyMeta (phase unspecified) pass through unchanged.

Fixes gap C from docs/plans/2026-04-02-003-fix-news-alerts-pr-gaps-plan.md

* fix(e3): remove rebase artifacts from list-feed-digest

Removes a stray closing brace, duplicate ASCII normalizeTitle
(Unicode-aware version from the fix commit is correct), and
a leftover storyPhase assignment that references a removed field.

All typecheck and typecheck:api pass clean.
2026-04-02 21:16:35 +04:00

376 lines
16 KiB
YAML
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
openapi: 3.1.0
info:
title: NewsService API
version: 1.0.0
paths:
/api/news/v1/summarize-article:
post:
tags:
- NewsService
summary: SummarizeArticle
description: SummarizeArticle generates an LLM summary with provider selection and fallback support.
operationId: SummarizeArticle
requestBody:
content:
application/json:
schema:
$ref: '#/components/schemas/SummarizeArticleRequest'
required: true
responses:
"200":
description: Successful response
content:
application/json:
schema:
$ref: '#/components/schemas/SummarizeArticleResponse'
"400":
description: Validation error
content:
application/json:
schema:
$ref: '#/components/schemas/ValidationError'
default:
description: Error response
content:
application/json:
schema:
$ref: '#/components/schemas/Error'
/api/news/v1/summarize-article-cache:
get:
tags:
- NewsService
summary: GetSummarizeArticleCache
description: GetSummarizeArticleCache looks up a cached summary by deterministic key (CDN-cacheable GET).
operationId: GetSummarizeArticleCache
parameters:
- name: cache_key
in: query
description: Deterministic cache key computed by buildSummaryCacheKey().
required: false
schema:
type: string
responses:
"200":
description: Successful response
content:
application/json:
schema:
$ref: '#/components/schemas/SummarizeArticleResponse'
"400":
description: Validation error
content:
application/json:
schema:
$ref: '#/components/schemas/ValidationError'
default:
description: Error response
content:
application/json:
schema:
$ref: '#/components/schemas/Error'
/api/news/v1/list-feed-digest:
get:
tags:
- NewsService
summary: ListFeedDigest
description: ListFeedDigest returns a pre-aggregated digest of all RSS feeds for a site variant.
operationId: ListFeedDigest
parameters:
- name: variant
in: query
description: 'Site variant: full, tech, finance, happy'
required: false
schema:
type: string
- name: lang
in: query
description: ISO 639-1 language code (en, fr, ar, etc.)
required: false
schema:
type: string
responses:
"200":
description: Successful response
content:
application/json:
schema:
$ref: '#/components/schemas/ListFeedDigestResponse'
"400":
description: Validation error
content:
application/json:
schema:
$ref: '#/components/schemas/ValidationError'
default:
description: Error response
content:
application/json:
schema:
$ref: '#/components/schemas/Error'
components:
schemas:
Error:
type: object
properties:
message:
type: string
description: Error message (e.g., 'user not found', 'database connection failed')
description: Error is returned when a handler encounters an error. It contains a simple error message that the developer can customize.
FieldViolation:
type: object
properties:
field:
type: string
description: The field path that failed validation (e.g., 'user.email' for nested fields). For header validation, this will be the header name (e.g., 'X-API-Key')
description:
type: string
description: Human-readable description of the validation violation (e.g., 'must be a valid email address', 'required field missing')
required:
- field
- description
description: FieldViolation describes a single validation error for a specific field.
ValidationError:
type: object
properties:
violations:
type: array
items:
$ref: '#/components/schemas/FieldViolation'
description: List of validation violations
required:
- violations
description: ValidationError is returned when request validation fails. It contains a list of field violations describing what went wrong.
SummarizeArticleRequest:
type: object
properties:
provider:
type: string
minLength: 1
description: 'LLM provider: "ollama", "groq", "openrouter"'
headlines:
type: array
items:
type: string
minItems: 1
description: Headlines to summarize (max 8 used).
minItems: 1
mode:
type: string
description: 'Summarization mode: "brief", "analysis", "translate", "" (default).'
geoContext:
type: string
description: Geographic signal context to include in the prompt.
variant:
type: string
description: 'Variant: "full", "tech", or target language for translate mode.'
lang:
type: string
description: Output language code, default "en".
systemAppend:
type: string
description: Optional system prompt append for analytical framework instructions.
required:
- provider
description: SummarizeArticleRequest specifies parameters for LLM article summarization.
SummarizeArticleResponse:
type: object
properties:
summary:
type: string
description: The generated summary text.
model:
type: string
description: Model identifier used for generation.
provider:
type: string
description: Provider that produced the result (or "cache").
tokens:
type: integer
format: int32
description: Token count from the LLM response.
fallback:
type: boolean
description: Whether the client should try the next provider in the fallback chain.
error:
type: string
description: Error message if the request failed.
errorType:
type: string
description: Error type/name (e.g. "TypeError").
status:
type: string
enum:
- SUMMARIZE_STATUS_UNSPECIFIED
- SUMMARIZE_STATUS_SUCCESS
- SUMMARIZE_STATUS_CACHED
- SUMMARIZE_STATUS_SKIPPED
- SUMMARIZE_STATUS_ERROR
description: SummarizeStatus indicates the outcome of a summarization request.
statusDetail:
type: string
description: Human-readable detail for non-success statuses (skip reason, etc.).
description: SummarizeArticleResponse contains the LLM summarization result.
GetSummarizeArticleCacheRequest:
type: object
properties:
cacheKey:
type: string
description: Deterministic cache key computed by buildSummaryCacheKey().
description: GetSummarizeArticleCacheRequest looks up a pre-computed summary by cache key.
ListFeedDigestRequest:
type: object
properties:
variant:
type: string
description: 'Site variant: full, tech, finance, happy'
lang:
type: string
description: ISO 639-1 language code (en, fr, ar, etc.)
ListFeedDigestResponse:
type: object
properties:
categories:
type: object
additionalProperties:
$ref: '#/components/schemas/CategoryBucket'
description: Per-category buckets — keys match category names from feed config
feedStatuses:
type: object
additionalProperties:
type: string
description: |-
Per-feed status — only non-ok states emitted; absent key implies ok.
Values: empty (feed returned 0 items), timeout (timed out during fetch).
generatedAt:
type: string
description: ISO 8601 timestamp of when this digest was generated
CategoriesEntry:
type: object
properties:
key:
type: string
value:
$ref: '#/components/schemas/CategoryBucket'
FeedStatusesEntry:
type: object
properties:
key:
type: string
value:
type: string
CategoryBucket:
type: object
properties:
items:
type: array
items:
$ref: '#/components/schemas/NewsItem'
NewsItem:
type: object
properties:
source:
type: string
minLength: 1
description: Source feed name.
title:
type: string
minLength: 1
description: Article headline.
link:
type: string
description: Article URL.
publishedAt:
type: integer
format: int64
description: 'Publication time, as Unix epoch milliseconds.. Warning: Values > 2^53 may lose precision in JavaScript'
isAlert:
type: boolean
description: Whether this article triggered an alert condition.
threat:
$ref: '#/components/schemas/ThreatClassification'
location:
$ref: '#/components/schemas/GeoCoordinates'
locationName:
type: string
description: Human-readable location name.
importanceScore:
type: integer
format: int32
description: 'Composite importance score (0-100): severity × 40% + source tier × 20% + corroboration × 30% + recency × 10%.'
corroborationCount:
type: integer
format: int32
description: Number of distinct sources that reported the same story in this digest cycle.
storyMeta:
$ref: '#/components/schemas/StoryMeta'
required:
- source
- title
description: NewsItem represents a single news article from RSS feed aggregation.
ThreatClassification:
type: object
properties:
level:
type: string
enum:
- THREAT_LEVEL_UNSPECIFIED
- THREAT_LEVEL_LOW
- THREAT_LEVEL_MEDIUM
- THREAT_LEVEL_HIGH
- THREAT_LEVEL_CRITICAL
description: ThreatLevel represents the assessed threat level of a news event.
category:
type: string
description: Event category.
confidence:
type: number
maximum: 1
minimum: 0
format: double
description: Confidence score (0.0 to 1.0).
source:
type: string
description: Classification source — "keyword", "ml", or "llm".
description: ThreatClassification represents an AI-assessed threat level for a news item.
GeoCoordinates:
type: object
properties:
latitude:
type: number
maximum: 90
minimum: -90
format: double
description: Latitude in decimal degrees (-90 to 90).
longitude:
type: number
maximum: 180
minimum: -180
format: double
description: Longitude in decimal degrees (-180 to 180).
description: GeoCoordinates represents a geographic location using WGS84 coordinates.
StoryMeta:
type: object
properties:
firstSeen:
type: integer
format: int64
description: 'Epoch ms when the story first appeared in any digest cycle.. Warning: Values > 2^53 may lose precision in JavaScript'
mentionCount:
type: integer
format: int32
description: Total number of digest cycles in which this story appeared.
sourceCount:
type: integer
format: int32
description: Number of unique sources that reported this story (cached from Redis Set).
phase:
type: string
enum:
- STORY_PHASE_UNSPECIFIED
- STORY_PHASE_BREAKING
- STORY_PHASE_DEVELOPING
- STORY_PHASE_SUSTAINED
- STORY_PHASE_FADING
description: StoryPhase represents the lifecycle stage of a tracked news story.
description: StoryMeta carries cross-cycle persistence data attached to each news item.