mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-26 01:24:59 +02:00
* feat(e3): story persistence tracking
Adds cross-cycle story tracking layer to the RSS digest pipeline:
- Proto: StoryMeta message + StoryPhase enum on NewsItem (fields 9-11).
importanceScore and corroborationCount stubs added for E1.
- list-feed-digest.ts: builds corroboration map across ALL items before
truncation; batch-reads existing story:track hashes from Redis; writes
HINCRBY/HSET/HSETNX/SADD/EXPIRE per story in 80-story pipeline chunks;
attaches StoryMeta (firstSeen, mentionCount, sourceCount, phase) to
each proto item using read-back data.
- cache-keys.ts: STORY_TRACK_KEY_PREFIX, STORY_SOURCES_KEY_PREFIX,
DIGEST_ACCUMULATOR_KEY_PREFIX, STORY_TRACKING_TTL_S.
- src/types/index.ts: StoryMeta, StoryPhase, NewsItem extended.
- data-loader.ts: protoItemToNewsItem maps STORY_PHASE_* → client phase.
- NewsPanel.ts: BREAKING/DEVELOPING/ONGOING phase badges in item rows.
New story first appearance: phase=BREAKING. After 2 mentions within 2h:
DEVELOPING. After 6+ mentions or >2h: SUSTAINED. If score drops below
50% of peak: FADING (used by E1; defaults to SUSTAINED for now).
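
The phase rules above can be sketched as a pure function. This is a minimal illustration, not the code in list-feed-digest.ts: the StoryTrack shape, the computePhase name, the order of the checks, and the boundary handling (>= 2 mentions, strictly more than 2h) are all assumptions. Phase names are taken from the StoryPhase enum in the generated OpenAPI schema.

```typescript
// Phase names follow the StoryPhase enum in the OpenAPI schema.
type StoryPhase =
  | 'STORY_PHASE_BREAKING'
  | 'STORY_PHASE_DEVELOPING'
  | 'STORY_PHASE_SUSTAINED'
  | 'STORY_PHASE_FADING';

// Hypothetical in-memory shape of a tracked story (assumption).
interface StoryTrack {
  firstSeen: number;    // epoch ms of first appearance
  mentionCount: number; // digest cycles in which the story appeared
  currentScore: number; // placeholder 0 until E1 ships real scores
  peakScore: number;    // placeholder 0 until E1 ships real scores
}

const TWO_HOURS_MS = 2 * 60 * 60 * 1000;

function computePhase(track: StoryTrack, nowMs: number): StoryPhase {
  const ageMs = nowMs - track.firstSeen;
  // FADING stays inactive while peakScore is the placeholder 0.
  if (track.peakScore > 0 && track.currentScore < track.peakScore * 0.5) {
    return 'STORY_PHASE_FADING';
  }
  // 6+ mentions, or older than 2h, counts as sustained.
  if (track.mentionCount >= 6 || ageMs > TWO_HOURS_MS) {
    return 'STORY_PHASE_SUSTAINED';
  }
  // At least 2 mentions while still inside the 2h window.
  if (track.mentionCount >= 2) {
    return 'STORY_PHASE_DEVELOPING';
  }
  // First appearance.
  return 'STORY_PHASE_BREAKING';
}
```

With placeholder scores of 0 the FADING branch never fires, which matches the "defaults to SUSTAINED for now" behaviour described above.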
Redis keys per story (48h TTL):
  story:track:v1:<hash16>   → hash (firstSeen, lastSeen, mentionCount, ...)
  story:sources:v1:<hash16> → set (feed names, for cross-source count)
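
The per-story write sequence and the 80-story chunking can be sketched without a Redis client by building plain command tuples. The constant names come from cache-keys.ts as listed above, but their values here are inferred from the key patterns and the 48h TTL. StoryWrite, commandsForStory, chunk and buildPipelines are hypothetical helpers, and the exact set of hash fields written is an assumption.

```typescript
// A Redis command as a plain tuple: name followed by its arguments.
type RedisCommand = Array<string | number>;

// Values inferred from the key patterns and TTL in the commit message.
const STORY_TRACK_KEY_PREFIX = 'story:track:v1:';
const STORY_SOURCES_KEY_PREFIX = 'story:sources:v1:';
const STORY_TRACKING_TTL_S = 48 * 60 * 60;
const STORIES_PER_CHUNK = 80;

// Hypothetical input: one entry per unique story in this digest cycle.
interface StoryWrite {
  titleHash: string; // the <hash16> part of the key
  sources: string[]; // feed names; assumed non-empty (SADD needs a member)
}

function commandsForStory(story: StoryWrite, nowMs: number): RedisCommand[] {
  const trackKey = STORY_TRACK_KEY_PREFIX + story.titleHash;
  const sourcesKey = STORY_SOURCES_KEY_PREFIX + story.titleHash;
  return [
    ['HINCRBY', trackKey, 'mentionCount', 1],  // one increment per cycle
    ['HSET', trackKey, 'lastSeen', nowMs],     // always overwritten
    ['HSETNX', trackKey, 'firstSeen', nowMs],  // only set on first sight
    ['SADD', sourcesKey, ...story.sources],    // set dedupes feed names
    ['EXPIRE', trackKey, STORY_TRACKING_TTL_S],
    ['EXPIRE', sourcesKey, STORY_TRACKING_TTL_S],
  ];
}

// Split a list into consecutive groups of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// One flat command list per pipeline, each covering up to 80 stories.
function buildPipelines(stories: StoryWrite[], nowMs: number): RedisCommand[][] {
  return chunk(stories, STORIES_PER_CHUNK).map((group) => {
    const commands: RedisCommand[] = [];
    group.forEach((story) => {
      commandsForStory(story, nowMs).forEach((cmd) => commands.push(cmd));
    });
    return commands;
  });
}
```

Each inner array could then be sent as one pipeline request; how the handler actually talks to Redis is not shown in this commit message.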
* fix(e3): correct storyMeta staleness and mentionCount semantics
P1 — storyMeta was always one cycle behind because storyTracks was read
before writeStoryTracking ran. Fix: keep read-before-write but compute
storyMeta from merged in-memory state (stale.mentionCount + 1, fresh
sourceCount from corroborationMap). New stories get mentionCount=1 and
phase=BREAKING in the same cycle they first appear — no extra Redis
round-trip needed.
P2 — mentionCount incremented once per item occurrence, so a story seen
in 3 sources in its first cycle was immediately stored as mentionCount=3.
Fix: deduplicate by titleHash in writeStoryTracking so each unique story
gets exactly one HINCRBY per digest cycle regardless of source count.
SADD still collects all sources for the set key.
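
Both fixes can be illustrated together: group items by titleHash so each story counts once per cycle, then merge the stale Redis read with the current cycle in memory. This is a sketch under assumptions; DigestItem, StaleTrack, MergedMeta and the two function names are hypothetical and only mirror the behaviour described above.

```typescript
// Hypothetical minimal item shape: one entry per (story, source) occurrence.
interface DigestItem {
  titleHash: string;
  source: string;
}

// What was read from Redis before this cycle's writes (one cycle stale).
interface StaleTrack {
  firstSeen: number;
  mentionCount: number;
}

interface MergedMeta {
  firstSeen: number;
  mentionCount: number;
  sourceCount: number;
}

// titleHash -> distinct sources seen in this cycle.
function buildCorroborationMap(items: DigestItem[]): Map<string, Set<string>> {
  const map = new Map<string, Set<string>>();
  items.forEach((item) => {
    const sources = map.get(item.titleHash) || new Set<string>();
    sources.add(item.source);
    map.set(item.titleHash, sources);
  });
  return map;
}

function mergeStoryMeta(
  items: DigestItem[],
  staleTracks: Map<string, StaleTrack>,
  nowMs: number,
): Map<string, MergedMeta> {
  const merged = new Map<string, MergedMeta>();
  // Iterating the map (not the items) yields one entry per unique story,
  // so a story reported by 3 sources still adds exactly 1 mention.
  buildCorroborationMap(items).forEach((sources, titleHash) => {
    const stale = staleTracks.get(titleHash);
    merged.set(titleHash, {
      firstSeen: stale ? stale.firstSeen : nowMs,
      mentionCount: (stale ? stale.mentionCount : 0) + 1,
      sourceCount: sources.size,
    });
  });
  return merged;
}
```

A story that is new in this cycle comes out with mentionCount=1 without a second Redis read, which is the point of the P1 fix.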
* fix(e3): Unicode hash collision, ALERT badge regression, FADING comment
P1 — normalizeTitle used [^\w\s] without the u flag; \w is ASCII-only
so every Arabic/CJK/Cyrillic title stripped to "" and shared one Redis
hash. Fixed: use /[^\p{L}\p{N}\s]/gu (Unicode property escapes require
the u flag).
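
The collision is easy to reproduce side by side. Only the two character classes come from this commit; the lower-casing, whitespace collapsing and function names below are assumptions made to keep the sketch self-contained.

```typescript
// Old behaviour: \w matches only [A-Za-z0-9_], so non-Latin letters are removed.
function normalizeTitleAscii(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^\w\s]/g, '')
    .replace(/\s+/g, ' ')
    .trim();
}

// Fixed behaviour: keep any Unicode letter or number; requires the u flag.
function normalizeTitleUnicode(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, '')
    .replace(/\s+/g, ' ')
    .trim();
}

const cyrillic = 'Привет, мир!';
const japanese = '東京で地震';

// Both titles collapse to the same empty string under the ASCII version,
// so they would hash to the same Redis key.
const asciiCollision =
  normalizeTitleAscii(cyrillic) === '' && normalizeTitleAscii(japanese) === '';

// The Unicode version keeps them distinct.
const unicodeDistinct =
  normalizeTitleUnicode(cyrillic) !== normalizeTitleUnicode(japanese);
```

One side effect worth noting: \w also matches the underscore, while \p{L} and \p{N} do not, so the Unicode version strips underscores that the ASCII version kept.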
P1 — ALERT badge was gated on !item.storyMeta, suppressing the indicator
for any tracked story regardless of isAlert. Phase and alert are
orthogonal signals; ALERT now renders unconditionally when isAlert=true.
P2 — FADING branch is intentionally inactive until E1 ships real scores
(currentScore/peakScore placeholder 0 via HSETNX). Added comment to
document the intentional ordering.
* fix(news-alerts): skip sustained/fading stories in breaking alert selectBest
Sustained and fading story phases are already well-covered by the feed;
only breaking and developing phases warrant a banner interrupt. Items
without storyMeta (phase unspecified) pass through unchanged.
Fixes gap C from docs/plans/2026-04-02-003-fix-news-alerts-pr-gaps-plan.md
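
The selection rule amounts to a small predicate applied before selectBest picks a candidate. The sketch below is illustrative only: AlertCandidate, isBannerEligible and filterBannerCandidates are hypothetical names, and the phase strings are the proto enum values from the OpenAPI schema rather than whatever client-side names the real code uses.

```typescript
type StoryPhase =
  | 'STORY_PHASE_UNSPECIFIED'
  | 'STORY_PHASE_BREAKING'
  | 'STORY_PHASE_DEVELOPING'
  | 'STORY_PHASE_SUSTAINED'
  | 'STORY_PHASE_FADING';

// Hypothetical candidate shape; storyMeta is optional on untracked items.
interface AlertCandidate {
  title: string;
  storyMeta?: { phase: StoryPhase };
}

function isBannerEligible(item: AlertCandidate): boolean {
  // Items without storyMeta, or with an unspecified phase, pass through.
  if (!item.storyMeta || item.storyMeta.phase === 'STORY_PHASE_UNSPECIFIED') {
    return true;
  }
  // Only breaking and developing stories warrant a banner interrupt.
  return (
    item.storyMeta.phase === 'STORY_PHASE_BREAKING' ||
    item.storyMeta.phase === 'STORY_PHASE_DEVELOPING'
  );
}

function filterBannerCandidates(items: AlertCandidate[]): AlertCandidate[] {
  return items.filter(isBannerEligible);
}
```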
* fix(e3): remove rebase artifacts from list-feed-digest
Removes a stray closing brace, a duplicate ASCII normalizeTitle
(the Unicode-aware version from the fix commit is the correct one), and
a leftover storyPhase assignment that referenced a removed field.
Both typecheck and typecheck:api pass cleanly.
1 line
10 KiB
JSON
{"components":{"schemas":{"CategoriesEntry":{"properties":{"key":{"type":"string"},"value":{"$ref":"#/components/schemas/CategoryBucket"}},"type":"object"},"CategoryBucket":{"properties":{"items":{"items":{"$ref":"#/components/schemas/NewsItem"},"type":"array"}},"type":"object"},"Error":{"description":"Error is returned when a handler encounters an error. It contains a simple error message that the developer can customize.","properties":{"message":{"description":"Error message (e.g., 'user not found', 'database connection failed')","type":"string"}},"type":"object"},"FeedStatusesEntry":{"properties":{"key":{"type":"string"},"value":{"type":"string"}},"type":"object"},"FieldViolation":{"description":"FieldViolation describes a single validation error for a specific field.","properties":{"description":{"description":"Human-readable description of the validation violation (e.g., 'must be a valid email address', 'required field missing')","type":"string"},"field":{"description":"The field path that failed validation (e.g., 'user.email' for nested fields). 
For header validation, this will be the header name (e.g., 'X-API-Key')","type":"string"}},"required":["field","description"],"type":"object"},"GeoCoordinates":{"description":"GeoCoordinates represents a geographic location using WGS84 coordinates.","properties":{"latitude":{"description":"Latitude in decimal degrees (-90 to 90).","format":"double","maximum":90,"minimum":-90,"type":"number"},"longitude":{"description":"Longitude in decimal degrees (-180 to 180).","format":"double","maximum":180,"minimum":-180,"type":"number"}},"type":"object"},"GetSummarizeArticleCacheRequest":{"description":"GetSummarizeArticleCacheRequest looks up a pre-computed summary by cache key.","properties":{"cacheKey":{"description":"Deterministic cache key computed by buildSummaryCacheKey().","type":"string"}},"type":"object"},"ListFeedDigestRequest":{"properties":{"lang":{"description":"ISO 639-1 language code (en, fr, ar, etc.)","type":"string"},"variant":{"description":"Site variant: full, tech, finance, happy","type":"string"}},"type":"object"},"ListFeedDigestResponse":{"properties":{"categories":{"additionalProperties":{"$ref":"#/components/schemas/CategoryBucket"},"description":"Per-category buckets — keys match category names from feed config","type":"object"},"feedStatuses":{"additionalProperties":{"type":"string"},"description":"Per-feed status — only non-ok states emitted; absent key implies ok.\n Values: empty (feed returned 0 items), timeout (timed out during fetch).","type":"object"},"generatedAt":{"description":"ISO 8601 timestamp of when this digest was generated","type":"string"}},"type":"object"},"NewsItem":{"description":"NewsItem represents a single news article from RSS feed aggregation.","properties":{"corroborationCount":{"description":"Number of distinct sources that reported the same story in this digest cycle.","format":"int32","type":"integer"},"importanceScore":{"description":"Composite importance score (0-100): severity × 40% + source tier × 20% + 
corroboration × 30% + recency × 10%.","format":"int32","type":"integer"},"isAlert":{"description":"Whether this article triggered an alert condition.","type":"boolean"},"link":{"description":"Article URL.","type":"string"},"location":{"$ref":"#/components/schemas/GeoCoordinates"},"locationName":{"description":"Human-readable location name.","type":"string"},"publishedAt":{"description":"Publication time, as Unix epoch milliseconds.. Warning: Values \u003e 2^53 may lose precision in JavaScript","format":"int64","type":"integer"},"source":{"description":"Source feed name.","minLength":1,"type":"string"},"storyMeta":{"$ref":"#/components/schemas/StoryMeta"},"threat":{"$ref":"#/components/schemas/ThreatClassification"},"title":{"description":"Article headline.","minLength":1,"type":"string"}},"required":["source","title"],"type":"object"},"StoryMeta":{"description":"StoryMeta carries cross-cycle persistence data attached to each news item.","properties":{"firstSeen":{"description":"Epoch ms when the story first appeared in any digest cycle.. 
Warning: Values \u003e 2^53 may lose precision in JavaScript","format":"int64","type":"integer"},"mentionCount":{"description":"Total number of digest cycles in which this story appeared.","format":"int32","type":"integer"},"phase":{"description":"StoryPhase represents the lifecycle stage of a tracked news story.","enum":["STORY_PHASE_UNSPECIFIED","STORY_PHASE_BREAKING","STORY_PHASE_DEVELOPING","STORY_PHASE_SUSTAINED","STORY_PHASE_FADING"],"type":"string"},"sourceCount":{"description":"Number of unique sources that reported this story (cached from Redis Set).","format":"int32","type":"integer"}},"type":"object"},"SummarizeArticleRequest":{"description":"SummarizeArticleRequest specifies parameters for LLM article summarization.","properties":{"geoContext":{"description":"Geographic signal context to include in the prompt.","type":"string"},"headlines":{"items":{"description":"Headlines to summarize (max 8 used).","minItems":1,"type":"string"},"minItems":1,"type":"array"},"lang":{"description":"Output language code, default \"en\".","type":"string"},"mode":{"description":"Summarization mode: \"brief\", \"analysis\", \"translate\", \"\" (default).","type":"string"},"provider":{"description":"LLM provider: \"ollama\", \"groq\", \"openrouter\"","minLength":1,"type":"string"},"systemAppend":{"description":"Optional system prompt append for analytical framework instructions.","type":"string"},"variant":{"description":"Variant: \"full\", \"tech\", or target language for translate mode.","type":"string"}},"required":["provider"],"type":"object"},"SummarizeArticleResponse":{"description":"SummarizeArticleResponse contains the LLM summarization result.","properties":{"error":{"description":"Error message if the request failed.","type":"string"},"errorType":{"description":"Error type/name (e.g. 
\"TypeError\").","type":"string"},"fallback":{"description":"Whether the client should try the next provider in the fallback chain.","type":"boolean"},"model":{"description":"Model identifier used for generation.","type":"string"},"provider":{"description":"Provider that produced the result (or \"cache\").","type":"string"},"status":{"description":"SummarizeStatus indicates the outcome of a summarization request.","enum":["SUMMARIZE_STATUS_UNSPECIFIED","SUMMARIZE_STATUS_SUCCESS","SUMMARIZE_STATUS_CACHED","SUMMARIZE_STATUS_SKIPPED","SUMMARIZE_STATUS_ERROR"],"type":"string"},"statusDetail":{"description":"Human-readable detail for non-success statuses (skip reason, etc.).","type":"string"},"summary":{"description":"The generated summary text.","type":"string"},"tokens":{"description":"Token count from the LLM response.","format":"int32","type":"integer"}},"type":"object"},"ThreatClassification":{"description":"ThreatClassification represents an AI-assessed threat level for a news item.","properties":{"category":{"description":"Event category.","type":"string"},"confidence":{"description":"Confidence score (0.0 to 1.0).","format":"double","maximum":1,"minimum":0,"type":"number"},"level":{"description":"ThreatLevel represents the assessed threat level of a news event.","enum":["THREAT_LEVEL_UNSPECIFIED","THREAT_LEVEL_LOW","THREAT_LEVEL_MEDIUM","THREAT_LEVEL_HIGH","THREAT_LEVEL_CRITICAL"],"type":"string"},"source":{"description":"Classification source — \"keyword\", \"ml\", or \"llm\".","type":"string"}},"type":"object"},"ValidationError":{"description":"ValidationError is returned when request validation fails. 
It contains a list of field violations describing what went wrong.","properties":{"violations":{"description":"List of validation violations","items":{"$ref":"#/components/schemas/FieldViolation"},"type":"array"}},"required":["violations"],"type":"object"}}},"info":{"title":"NewsService API","version":"1.0.0"},"openapi":"3.1.0","paths":{"/api/news/v1/list-feed-digest":{"get":{"description":"ListFeedDigest returns a pre-aggregated digest of all RSS feeds for a site variant.","operationId":"ListFeedDigest","parameters":[{"description":"Site variant: full, tech, finance, happy","in":"query","name":"variant","required":false,"schema":{"type":"string"}},{"description":"ISO 639-1 language code (en, fr, ar, etc.)","in":"query","name":"lang","required":false,"schema":{"type":"string"}}],"responses":{"200":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/ListFeedDigestResponse"}}},"description":"Successful response"},"400":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/ValidationError"}}},"description":"Validation error"},"default":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Error"}}},"description":"Error response"}},"summary":"ListFeedDigest","tags":["NewsService"]}},"/api/news/v1/summarize-article":{"post":{"description":"SummarizeArticle generates an LLM summary with provider selection and fallback support.","operationId":"SummarizeArticle","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/SummarizeArticleRequest"}}},"required":true},"responses":{"200":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/SummarizeArticleResponse"}}},"description":"Successful response"},"400":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/ValidationError"}}},"description":"Validation error"},"default":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Error"}}},"description":"Error 
response"}},"summary":"SummarizeArticle","tags":["NewsService"]}},"/api/news/v1/summarize-article-cache":{"get":{"description":"GetSummarizeArticleCache looks up a cached summary by deterministic key (CDN-cacheable GET).","operationId":"GetSummarizeArticleCache","parameters":[{"description":"Deterministic cache key computed by buildSummaryCacheKey().","in":"query","name":"cache_key","required":false,"schema":{"type":"string"}}],"responses":{"200":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/SummarizeArticleResponse"}}},"description":"Successful response"},"400":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/ValidationError"}}},"description":"Validation error"},"default":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Error"}}},"description":"Error response"}},"summary":"GetSummarizeArticleCache","tags":["NewsService"]}}}}