mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-25 17:14:57 +02:00
* feat(e3): story persistence tracking
Adds cross-cycle story tracking layer to the RSS digest pipeline:
- Proto: StoryMeta message + StoryPhase enum on NewsItem (fields 9-11).
importanceScore and corroborationCount stubs added for E1.
- list-feed-digest.ts: builds corroboration map across ALL items before
truncation; batch-reads existing story:track hashes from Redis; writes
HINCRBY/HSET/HSETNX/SADD/EXPIRE per story in 80-story pipeline chunks;
attaches StoryMeta (firstSeen, mentionCount, sourceCount, phase) to
each proto item using read-back data.
- cache-keys.ts: STORY_TRACK_KEY_PREFIX, STORY_SOURCES_KEY_PREFIX,
DIGEST_ACCUMULATOR_KEY_PREFIX, STORY_TRACKING_TTL_S.
- src/types/index.ts: StoryMeta, StoryPhase, NewsItem extended.
- data-loader.ts: protoItemToNewsItem maps STORY_PHASE_* → client phase.
- NewsPanel.ts: BREAKING/DEVELOPING/ONGOING phase badges in item rows.
New story first appearance: phase=BREAKING. After 2 mentions within 2h:
DEVELOPING. After 6+ mentions or >2h: SUSTAINED. If score drops below
50% of peak: FADING (used by E1; defaults to SUSTAINED for now).
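The phase rules above can be sketched as a small pure function. This is a minimal illustration, not the code from list-feed-digest.ts: the `StoryTrack` shape, the `classifyPhase` name, the string phase values, and the order of the checks are assumptions. The FADING branch is guarded by `peakScore > 0`, so it stays inactive while scores are the placeholder 0.

```typescript
// Hypothetical types and names for illustration; not taken from the repository.
type StoryPhase = "BREAKING" | "DEVELOPING" | "SUSTAINED" | "FADING";

interface StoryTrack {
  firstSeen: number;    // epoch ms of the first digest cycle that saw the story
  mentionCount: number; // number of digest cycles in which the story appeared
  currentScore: number; // placeholder 0 until E1 ships real scores
  peakScore: number;    // placeholder 0 until E1 ships real scores
}

const TWO_HOURS_MS = 2 * 60 * 60 * 1000;

function classifyPhase(track: StoryTrack, now: number): StoryPhase {
  const ageMs = now - track.firstSeen;

  // Inactive while peakScore is 0 (assumed ordering: checked first).
  if (track.peakScore > 0 && track.currentScore < track.peakScore * 0.5) {
    return "FADING";
  }
  // 6+ mentions, or older than 2h.
  if (track.mentionCount >= 6 || ageMs > TWO_HOURS_MS) {
    return "SUSTAINED";
  }
  // At least 2 mentions while still inside the 2h window.
  if (track.mentionCount >= 2) {
    return "DEVELOPING";
  }
  // First appearance.
  return "BREAKING";
}
```

With scores left at 0, a story moves from BREAKING to DEVELOPING on its second cycle and to SUSTAINED once it reaches six mentions or passes the 2h mark.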
Redis keys per story (48h TTL):
story:track:v1:<hash16> → hash (firstSeen,lastSeen,mentionCount,...)
story:sources:v1:<hash16> → set (feed names, for cross-source count)
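To make the key layout concrete, the sketch below builds the per-story commands as plain arrays instead of calling a Redis client. The constant names come from cache-keys.ts as listed above, and the prefix values and 48h TTL follow the key layout just given. The `StoryWrite` shape, the two helper functions, which hash field each command touches, and the EXPIRE on both keys are assumptions.

```typescript
// Constant names from the commit; values inferred from the key layout above.
const STORY_TRACK_KEY_PREFIX = "story:track:v1:";
const STORY_SOURCES_KEY_PREFIX = "story:sources:v1:";
const STORY_TRACKING_TTL_S = 48 * 60 * 60; // 48h
const PIPELINE_CHUNK_SIZE = 80;            // stories per pipeline chunk

type RedisCommand = (string | number)[];

// Hypothetical input shape for one deduplicated story.
interface StoryWrite {
  titleHash: string; // the <hash16> part of the key
  sources: string[]; // feed names that carried the story this cycle
  now: number;       // epoch ms of the current digest cycle
}

// Assumed mapping of commands to fields; the real handler may differ.
function buildStoryCommands(w: StoryWrite): RedisCommand[] {
  const trackKey = STORY_TRACK_KEY_PREFIX + w.titleHash;
  const sourcesKey = STORY_SOURCES_KEY_PREFIX + w.titleHash;
  return [
    ["HINCRBY", trackKey, "mentionCount", 1],
    ["HSET", trackKey, "lastSeen", w.now],
    ["HSETNX", trackKey, "firstSeen", w.now], // only set on first appearance
    ["SADD", sourcesKey, ...w.sources],
    ["EXPIRE", trackKey, STORY_TRACKING_TTL_S],
    ["EXPIRE", sourcesKey, STORY_TRACKING_TTL_S],
  ];
}

// Split a list into fixed-size chunks, e.g. 80 stories per pipeline call.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}
```

A caller would run `chunk(stories, PIPELINE_CHUNK_SIZE)` and send the concatenated commands of each chunk as one pipeline request.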
* fix(e3): correct storyMeta staleness and mentionCount semantics
P1 — storyMeta was always one cycle behind because storyTracks was read
before writeStoryTracking ran. Fix: keep read-before-write but compute
storyMeta from merged in-memory state (stale.mentionCount + 1, fresh
sourceCount from corroborationMap). New stories get mentionCount=1 and
phase=BREAKING in the same cycle they first appear — no extra Redis
round-trip needed.
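A minimal sketch of the merged in-memory state described in P1, assuming a hypothetical `mergeStoryMeta` helper and simplified types: the track read before the write is treated as stale by exactly one cycle, so the current cycle is added in memory rather than read back from Redis.

```typescript
// Hypothetical shapes for illustration only.
interface StaleTrack {
  firstSeen: number;    // epoch ms, as stored before this cycle's write
  mentionCount: number; // count before this cycle's HINCRBY
}

interface MergedStoryMeta {
  firstSeen: number;
  mentionCount: number;
  sourceCount: number;
}

function mergeStoryMeta(
  stale: StaleTrack | undefined, // undefined when the story is new
  freshSourceCount: number,      // taken from this cycle's corroboration map
  now: number,
): MergedStoryMeta {
  return {
    firstSeen: stale ? stale.firstSeen : now,
    // The read happened before the write, so add this cycle's mention here.
    mentionCount: (stale ? stale.mentionCount : 0) + 1,
    sourceCount: freshSourceCount,
  };
}
```

A new story therefore gets `mentionCount` 1 in the cycle it first appears, which is what lets it be labelled BREAKING immediately.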
P2 — mentionCount incremented once per item occurrence, so a story seen
in 3 sources in its first cycle was immediately stored as mentionCount=3.
Fix: deduplicate by titleHash in writeStoryTracking so each unique story
gets exactly one HINCRBY per digest cycle regardless of source count.
SADD still collects all sources for the set key.
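The deduplication in P2 can be illustrated by grouping items by `titleHash` before any write is issued. The `DigestItem` shape and the `groupSourcesByTitleHash` name are assumptions: each key in the result corresponds to one HINCRBY, while the value lists every source for the SADD.

```typescript
// Hypothetical item shape; only the two fields needed for grouping.
interface DigestItem {
  titleHash: string;
  source: string;
}

// One entry per unique story; the value is the list of distinct sources.
function groupSourcesByTitleHash(items: DigestItem[]): Record<string, string[]> {
  const grouped: Record<string, string[]> = {};
  items.forEach((item) => {
    const sources = grouped[item.titleHash] || [];
    if (sources.indexOf(item.source) === -1) {
      sources.push(item.source);
    }
    grouped[item.titleHash] = sources;
  });
  return grouped;
}
```

A story carried by three feeds in one cycle yields a single key with three sources, so its stored mentionCount grows by 1 instead of 3.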
* fix(e3): Unicode hash collision, ALERT badge regression, FADING comment
P1 — normalizeTitle used [^\w\s] without the u flag; \w is ASCII-only
so every Arabic/CJK/Cyrillic title stripped to "" and shared one Redis
hash. Fixed: use /[^\p{L}\p{N}\s]/gu (Unicode property escapes require
the u flag).
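The collision is easy to reproduce in isolation. In the sketch below, only the two character classes come from the commit; the function names, the lowercasing, and the whitespace collapsing are assumptions about the rest of normalizeTitle. The Unicode pattern is built with the `RegExp` constructor, which is equivalent to the literal `/[^\p{L}\p{N}\s]/gu`.

```typescript
// ASCII-only: \w matches [A-Za-z0-9_], so non-Latin letters are stripped.
const NON_WORD_ASCII = /[^\w\s]/g;
// Unicode-aware: keeps letters and numbers from any script (needs the u flag).
const NON_WORD_UNICODE = new RegExp("[^\\p{L}\\p{N}\\s]", "gu");

// Assumed surrounding steps: lowercase, strip, collapse whitespace, trim.
function normalizeWith(pattern: RegExp, title: string): string {
  return title.toLowerCase().replace(pattern, "").replace(/\s+/g, " ").trim();
}

function normalizeTitleAscii(title: string): string {
  return normalizeWith(NON_WORD_ASCII, title);
}

function normalizeTitleUnicode(title: string): string {
  return normalizeWith(NON_WORD_UNICODE, title);
}
```

With the ASCII pattern, a Cyrillic title and a CJK title both normalize to the empty string and so hash to the same key; with the Unicode pattern they stay distinct.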
P1 — ALERT badge was gated on !item.storyMeta, suppressing the indicator
for any tracked story regardless of isAlert. Phase and alert are
orthogonal signals; ALERT now renders unconditionally when isAlert=true.
P2 — FADING branch is intentionally inactive until E1 ships real scores
(currentScore/peakScore placeholder 0 via HSETNX). Added comment to
document the intentional ordering.
* fix(news-alerts): skip sustained/fading stories in breaking alert selectBest
Sustained and fading story phases are already well-covered by the feed;
only breaking and developing phases warrant a banner interrupt. Items
without storyMeta (phase unspecified) pass through unchanged.
Fixes gap C from docs/plans/2026-04-02-003-fix-news-alerts-pr-gaps-plan.md
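The selection rule in this fix can be sketched as a predicate applied before selectBest picks a banner candidate. The `AlertCandidate` shape, the lowercase client phase names, and the `isBannerEligible` name are assumptions made for illustration.

```typescript
// Hypothetical client-side phase names and item shape.
type ClientPhase = "breaking" | "developing" | "sustained" | "fading";

interface AlertCandidate {
  title: string;
  storyMeta?: { phase: ClientPhase };
}

function isBannerEligible(item: AlertCandidate): boolean {
  // Items without storyMeta (phase unspecified) pass through unchanged.
  if (!item.storyMeta) {
    return true;
  }
  // Only breaking and developing stories warrant a banner interrupt.
  return item.storyMeta.phase === "breaking" || item.storyMeta.phase === "developing";
}
```

Filtering the candidate list with `candidates.filter(isBannerEligible)` before ranking keeps sustained and fading stories in the feed but out of the banner.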
* fix(e3): remove rebase artifacts from list-feed-digest
Removes a stray closing brace, duplicate ASCII normalizeTitle
(Unicode-aware version from the fix commit is correct), and
a leftover storyPhase assignment that references a removed field.
All typecheck and typecheck:api pass clean.
376 lines
16 KiB
YAML
openapi: 3.1.0
info:
    title: NewsService API
    version: 1.0.0
paths:
    /api/news/v1/summarize-article:
        post:
            tags:
                - NewsService
            summary: SummarizeArticle
            description: SummarizeArticle generates an LLM summary with provider selection and fallback support.
            operationId: SummarizeArticle
            requestBody:
                content:
                    application/json:
                        schema:
                            $ref: '#/components/schemas/SummarizeArticleRequest'
                required: true
            responses:
                "200":
                    description: Successful response
                    content:
                        application/json:
                            schema:
                                $ref: '#/components/schemas/SummarizeArticleResponse'
                "400":
                    description: Validation error
                    content:
                        application/json:
                            schema:
                                $ref: '#/components/schemas/ValidationError'
                default:
                    description: Error response
                    content:
                        application/json:
                            schema:
                                $ref: '#/components/schemas/Error'
    /api/news/v1/summarize-article-cache:
        get:
            tags:
                - NewsService
            summary: GetSummarizeArticleCache
            description: GetSummarizeArticleCache looks up a cached summary by deterministic key (CDN-cacheable GET).
            operationId: GetSummarizeArticleCache
            parameters:
                - name: cache_key
                  in: query
                  description: Deterministic cache key computed by buildSummaryCacheKey().
                  required: false
                  schema:
                    type: string
            responses:
                "200":
                    description: Successful response
                    content:
                        application/json:
                            schema:
                                $ref: '#/components/schemas/SummarizeArticleResponse'
                "400":
                    description: Validation error
                    content:
                        application/json:
                            schema:
                                $ref: '#/components/schemas/ValidationError'
                default:
                    description: Error response
                    content:
                        application/json:
                            schema:
                                $ref: '#/components/schemas/Error'
    /api/news/v1/list-feed-digest:
        get:
            tags:
                - NewsService
            summary: ListFeedDigest
            description: ListFeedDigest returns a pre-aggregated digest of all RSS feeds for a site variant.
            operationId: ListFeedDigest
            parameters:
                - name: variant
                  in: query
                  description: 'Site variant: full, tech, finance, happy'
                  required: false
                  schema:
                    type: string
                - name: lang
                  in: query
                  description: ISO 639-1 language code (en, fr, ar, etc.)
                  required: false
                  schema:
                    type: string
            responses:
                "200":
                    description: Successful response
                    content:
                        application/json:
                            schema:
                                $ref: '#/components/schemas/ListFeedDigestResponse'
                "400":
                    description: Validation error
                    content:
                        application/json:
                            schema:
                                $ref: '#/components/schemas/ValidationError'
                default:
                    description: Error response
                    content:
                        application/json:
                            schema:
                                $ref: '#/components/schemas/Error'
components:
    schemas:
        Error:
            type: object
            properties:
                message:
                    type: string
                    description: Error message (e.g., 'user not found', 'database connection failed')
            description: Error is returned when a handler encounters an error. It contains a simple error message that the developer can customize.
        FieldViolation:
            type: object
            properties:
                field:
                    type: string
                    description: The field path that failed validation (e.g., 'user.email' for nested fields). For header validation, this will be the header name (e.g., 'X-API-Key')
                description:
                    type: string
                    description: Human-readable description of the validation violation (e.g., 'must be a valid email address', 'required field missing')
            required:
                - field
                - description
            description: FieldViolation describes a single validation error for a specific field.
        ValidationError:
            type: object
            properties:
                violations:
                    type: array
                    items:
                        $ref: '#/components/schemas/FieldViolation'
                    description: List of validation violations
            required:
                - violations
            description: ValidationError is returned when request validation fails. It contains a list of field violations describing what went wrong.
        SummarizeArticleRequest:
            type: object
            properties:
                provider:
                    type: string
                    minLength: 1
                    description: 'LLM provider: "ollama", "groq", "openrouter"'
                headlines:
                    type: array
                    items:
                        type: string
                        minItems: 1
                    description: Headlines to summarize (max 8 used).
                    minItems: 1
                mode:
                    type: string
                    description: 'Summarization mode: "brief", "analysis", "translate", "" (default).'
                geoContext:
                    type: string
                    description: Geographic signal context to include in the prompt.
                variant:
                    type: string
                    description: 'Variant: "full", "tech", or target language for translate mode.'
                lang:
                    type: string
                    description: Output language code, default "en".
                systemAppend:
                    type: string
                    description: Optional system prompt append for analytical framework instructions.
            required:
                - provider
            description: SummarizeArticleRequest specifies parameters for LLM article summarization.
        SummarizeArticleResponse:
            type: object
            properties:
                summary:
                    type: string
                    description: The generated summary text.
                model:
                    type: string
                    description: Model identifier used for generation.
                provider:
                    type: string
                    description: Provider that produced the result (or "cache").
                tokens:
                    type: integer
                    format: int32
                    description: Token count from the LLM response.
                fallback:
                    type: boolean
                    description: Whether the client should try the next provider in the fallback chain.
                error:
                    type: string
                    description: Error message if the request failed.
                errorType:
                    type: string
                    description: Error type/name (e.g. "TypeError").
                status:
                    type: string
                    enum:
                        - SUMMARIZE_STATUS_UNSPECIFIED
                        - SUMMARIZE_STATUS_SUCCESS
                        - SUMMARIZE_STATUS_CACHED
                        - SUMMARIZE_STATUS_SKIPPED
                        - SUMMARIZE_STATUS_ERROR
                    description: SummarizeStatus indicates the outcome of a summarization request.
                statusDetail:
                    type: string
                    description: Human-readable detail for non-success statuses (skip reason, etc.).
            description: SummarizeArticleResponse contains the LLM summarization result.
        GetSummarizeArticleCacheRequest:
            type: object
            properties:
                cacheKey:
                    type: string
                    description: Deterministic cache key computed by buildSummaryCacheKey().
            description: GetSummarizeArticleCacheRequest looks up a pre-computed summary by cache key.
        ListFeedDigestRequest:
            type: object
            properties:
                variant:
                    type: string
                    description: 'Site variant: full, tech, finance, happy'
                lang:
                    type: string
                    description: ISO 639-1 language code (en, fr, ar, etc.)
        ListFeedDigestResponse:
            type: object
            properties:
                categories:
                    type: object
                    additionalProperties:
                        $ref: '#/components/schemas/CategoryBucket'
                    description: Per-category buckets — keys match category names from feed config
                feedStatuses:
                    type: object
                    additionalProperties:
                        type: string
                    description: |-
                        Per-feed status — only non-ok states emitted; absent key implies ok.
                         Values: empty (feed returned 0 items), timeout (timed out during fetch).
                generatedAt:
                    type: string
                    description: ISO 8601 timestamp of when this digest was generated
        CategoriesEntry:
            type: object
            properties:
                key:
                    type: string
                value:
                    $ref: '#/components/schemas/CategoryBucket'
        FeedStatusesEntry:
            type: object
            properties:
                key:
                    type: string
                value:
                    type: string
        CategoryBucket:
            type: object
            properties:
                items:
                    type: array
                    items:
                        $ref: '#/components/schemas/NewsItem'
        NewsItem:
            type: object
            properties:
                source:
                    type: string
                    minLength: 1
                    description: Source feed name.
                title:
                    type: string
                    minLength: 1
                    description: Article headline.
                link:
                    type: string
                    description: Article URL.
                publishedAt:
                    type: integer
                    format: int64
                    description: 'Publication time, as Unix epoch milliseconds. Warning: Values > 2^53 may lose precision in JavaScript'
                isAlert:
                    type: boolean
                    description: Whether this article triggered an alert condition.
                threat:
                    $ref: '#/components/schemas/ThreatClassification'
                location:
                    $ref: '#/components/schemas/GeoCoordinates'
                locationName:
                    type: string
                    description: Human-readable location name.
                importanceScore:
                    type: integer
                    format: int32
                    description: 'Composite importance score (0-100): severity × 40% + source tier × 20% + corroboration × 30% + recency × 10%.'
                corroborationCount:
                    type: integer
                    format: int32
                    description: Number of distinct sources that reported the same story in this digest cycle.
                storyMeta:
                    $ref: '#/components/schemas/StoryMeta'
            required:
                - source
                - title
            description: NewsItem represents a single news article from RSS feed aggregation.
        ThreatClassification:
            type: object
            properties:
                level:
                    type: string
                    enum:
                        - THREAT_LEVEL_UNSPECIFIED
                        - THREAT_LEVEL_LOW
                        - THREAT_LEVEL_MEDIUM
                        - THREAT_LEVEL_HIGH
                        - THREAT_LEVEL_CRITICAL
                    description: ThreatLevel represents the assessed threat level of a news event.
                category:
                    type: string
                    description: Event category.
                confidence:
                    type: number
                    maximum: 1
                    minimum: 0
                    format: double
                    description: Confidence score (0.0 to 1.0).
                source:
                    type: string
                    description: Classification source — "keyword", "ml", or "llm".
            description: ThreatClassification represents an AI-assessed threat level for a news item.
        GeoCoordinates:
            type: object
            properties:
                latitude:
                    type: number
                    maximum: 90
                    minimum: -90
                    format: double
                    description: Latitude in decimal degrees (-90 to 90).
                longitude:
                    type: number
                    maximum: 180
                    minimum: -180
                    format: double
                    description: Longitude in decimal degrees (-180 to 180).
            description: GeoCoordinates represents a geographic location using WGS84 coordinates.
        StoryMeta:
            type: object
            properties:
                firstSeen:
                    type: integer
                    format: int64
                    description: 'Epoch ms when the story first appeared in any digest cycle. Warning: Values > 2^53 may lose precision in JavaScript'
                mentionCount:
                    type: integer
                    format: int32
                    description: Total number of digest cycles in which this story appeared.
                sourceCount:
                    type: integer
                    format: int32
                    description: Number of unique sources that reported this story (cached from Redis Set).
                phase:
                    type: string
                    enum:
                        - STORY_PHASE_UNSPECIFIED
                        - STORY_PHASE_BREAKING
                        - STORY_PHASE_DEVELOPING
                        - STORY_PHASE_SUSTAINED
                        - STORY_PHASE_FADING
                    description: StoryPhase represents the lifecycle stage of a tracked news story.
            description: StoryMeta carries cross-cycle persistence data attached to each news item.