Files
worldmonitor/docs/api/NewsService.openapi.json
Elie Habib 8d8cf56ce2 feat(scoring): composite importance score + story tracking infrastructure (#2604)
* feat(scoring): add composite importance score + story tracking infrastructure

- Extract SOURCE_TIERS/getSourceTier to server/_shared/source-tiers.ts so
  server handlers can import it without pulling in client-only modules;
  src/config/feeds.ts re-exports for backward compatibility
- Add story tracking Redis key helpers to cache-keys.ts
  (story:track:v1, story:sources:v1, story:peak:v1, digest:accumulator:v1)
- Export SEVERITY_SCORES from _classifier.ts for server-side score math
- Add upstashPipeline() to redis.ts for arbitrary batched Redis writes
- Add importanceScore/corroborationCount/storyPhase fields to proto,
  generated TS, and src/types NewsItem
- Add StoryMeta message and StoryPhase enum to proto
- In list-feed-digest.ts:
  - Build corroboration map across full corpus BEFORE per-category truncation
  - Compute importanceScore (severity 40% + tier 20% + corroboration 30%
    + recency 10%) per item
  - Sort by importanceScore desc before truncating at MAX_ITEMS_PER_CATEGORY
  - Write story:track / story:sources / story:peak / digest:accumulator
    to Redis in 80-story pipeline batches after each digest build

Score gate in notification-relay.cjs follows in the next PR (shadow mode,
behind IMPORTANCE_SCORE_LIVE flag). RELAY_GATES_READY removal of
/api/notify comes after 48h shadow comparison confirms parity.

* fix(scoring): add storyPhase field + regenerate proto types

- Add storyPhase to ParsedItem and toProtoItem (defaults UNSPECIFIED)
- Regenerate service_server.ts: required fields, StoryPhase type relocated
- Regenerate service_client.ts and OpenAPI docs from buf generate
- Fix typecheck:api error on missing required storyPhase in NewsItem

* fix(scoring): address all code review findings from PR #2604

P1:
- await writeStoryTracking instead of fire-and-forget to prevent
  silent data loss on edge isolate teardown
- remove duplicate upstashPipeline; use existing runRedisPipeline
- strip non-https links before Redis write (XSS prevention)
- implement storyPhase read path: HGETALL batch + computePhase()
  so BREAKING/DEVELOPING/SUSTAINED/FADING badges are now live

P2/P3:
- extend STORY_TTL 48h → 7 days (sustained stories no longer reset)
- extract SCORE_WEIGHTS named constants with rationale comment
- move SEVERITY_SCORES out of _classifier.ts into list-feed-digest.ts
- add normalizeTitle comment referencing todo #102
- pre-compute title hashes once, share between phase read + write

* fix(scoring): correct enrichment-before-scoring and write-before-read ordering

Two sequencing bugs:

1. enrichWithAiCache ran after truncation (post-slice), so items whose
   threat level was upgraded by the LLM cache could have already been
   cut from the top-20, and downgraded items kept inflated scores.
   Fix: enrich ALL items from the full corpus before scoring, so
   importanceScore always uses the final post-LLM classification level.

2. Phase HGETALL read happened before writeStoryTracking, meaning
   first-time stories had no Redis entry and always returned UNSPECIFIED
   instead of BREAKING, and all existing stories lagged one cycle behind.
   Fix: write tracking first, then read back for phase assignment.
2026-04-02 20:46:04 +04:00

1 line
9.6 KiB
JSON
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
{"components":{"schemas":{"CategoriesEntry":{"properties":{"key":{"type":"string"},"value":{"$ref":"#/components/schemas/CategoryBucket"}},"type":"object"},"CategoryBucket":{"properties":{"items":{"items":{"$ref":"#/components/schemas/NewsItem"},"type":"array"}},"type":"object"},"Error":{"description":"Error is returned when a handler encounters an error. It contains a simple error message that the developer can customize.","properties":{"message":{"description":"Error message (e.g., 'user not found', 'database connection failed')","type":"string"}},"type":"object"},"FeedStatusesEntry":{"properties":{"key":{"type":"string"},"value":{"type":"string"}},"type":"object"},"FieldViolation":{"description":"FieldViolation describes a single validation error for a specific field.","properties":{"description":{"description":"Human-readable description of the validation violation (e.g., 'must be a valid email address', 'required field missing')","type":"string"},"field":{"description":"The field path that failed validation (e.g., 'user.email' for nested fields). For header validation, this will be the header name (e.g., 'X-API-Key')","type":"string"}},"required":["field","description"],"type":"object"},"GeoCoordinates":{"description":"GeoCoordinates represents a geographic location using WGS84 coordinates.","properties":{"latitude":{"description":"Latitude in decimal degrees (-90 to 90).","format":"double","maximum":90,"minimum":-90,"type":"number"},"longitude":{"description":"Longitude in decimal degrees (-180 to 180).","format":"double","maximum":180,"minimum":-180,"type":"number"}},"type":"object"},"GetSummarizeArticleCacheRequest":{"description":"GetSummarizeArticleCacheRequest looks up a pre-computed summary by cache key.","properties":{"cacheKey":{"description":"Deterministic cache key computed by buildSummaryCacheKey().","type":"string"}},"type":"object"},"ListFeedDigestRequest":{"properties":{"lang":{"description":"ISO 639-1 language code (en, fr, ar, etc.)","type":"string"},"variant":{"description":"Site variant: full, tech, finance, happy","type":"string"}},"type":"object"},"ListFeedDigestResponse":{"properties":{"categories":{"additionalProperties":{"$ref":"#/components/schemas/CategoryBucket"},"description":"Per-category buckets — keys match category names from feed config","type":"object"},"feedStatuses":{"additionalProperties":{"type":"string"},"description":"Per-feed status — only non-ok states emitted; absent key implies ok.\n Values: empty (feed returned 0 items), timeout (timed out during fetch).","type":"object"},"generatedAt":{"description":"ISO 8601 timestamp of when this digest was generated","type":"string"}},"type":"object"},"NewsItem":{"description":"NewsItem represents a single news article from RSS feed aggregation.","properties":{"corroborationCount":{"description":"Number of distinct sources that reported the same story in this digest cycle.","format":"int32","type":"integer"},"importanceScore":{"description":"Composite importance score (0-100): severity × 40% + source tier × 20% + corroboration × 30% + recency × 10%.\n Absent (0) when not yet scored.","format":"int32","type":"integer"},"isAlert":{"description":"Whether this article triggered an alert condition.","type":"boolean"},"link":{"description":"Article URL.","type":"string"},"location":{"$ref":"#/components/schemas/GeoCoordinates"},"locationName":{"description":"Human-readable location name.","type":"string"},"publishedAt":{"description":"Publication time, as Unix epoch milliseconds.. Warning: Values \u003e 2^53 may lose precision in JavaScript","format":"int64","type":"integer"},"source":{"description":"Source feed name.","minLength":1,"type":"string"},"storyPhase":{"description":"StoryPhase represents the lifecycle stage of a tracked news story.","enum":["STORY_PHASE_UNSPECIFIED","STORY_PHASE_BREAKING","STORY_PHASE_DEVELOPING","STORY_PHASE_SUSTAINED","STORY_PHASE_FADING"],"type":"string"},"threat":{"$ref":"#/components/schemas/ThreatClassification"},"title":{"description":"Article headline.","minLength":1,"type":"string"}},"required":["source","title"],"type":"object"},"SummarizeArticleRequest":{"description":"SummarizeArticleRequest specifies parameters for LLM article summarization.","properties":{"geoContext":{"description":"Geographic signal context to include in the prompt.","type":"string"},"headlines":{"items":{"description":"Headlines to summarize (max 8 used).","minItems":1,"type":"string"},"minItems":1,"type":"array"},"lang":{"description":"Output language code, default \"en\".","type":"string"},"mode":{"description":"Summarization mode: \"brief\", \"analysis\", \"translate\", \"\" (default).","type":"string"},"provider":{"description":"LLM provider: \"ollama\", \"groq\", \"openrouter\"","minLength":1,"type":"string"},"systemAppend":{"description":"Optional system prompt append for analytical framework instructions.","type":"string"},"variant":{"description":"Variant: \"full\", \"tech\", or target language for translate mode.","type":"string"}},"required":["provider"],"type":"object"},"SummarizeArticleResponse":{"description":"SummarizeArticleResponse contains the LLM summarization result.","properties":{"error":{"description":"Error message if the request failed.","type":"string"},"errorType":{"description":"Error type/name (e.g. \"TypeError\").","type":"string"},"fallback":{"description":"Whether the client should try the next provider in the fallback chain.","type":"boolean"},"model":{"description":"Model identifier used for generation.","type":"string"},"provider":{"description":"Provider that produced the result (or \"cache\").","type":"string"},"status":{"description":"SummarizeStatus indicates the outcome of a summarization request.","enum":["SUMMARIZE_STATUS_UNSPECIFIED","SUMMARIZE_STATUS_SUCCESS","SUMMARIZE_STATUS_CACHED","SUMMARIZE_STATUS_SKIPPED","SUMMARIZE_STATUS_ERROR"],"type":"string"},"statusDetail":{"description":"Human-readable detail for non-success statuses (skip reason, etc.).","type":"string"},"summary":{"description":"The generated summary text.","type":"string"},"tokens":{"description":"Token count from the LLM response.","format":"int32","type":"integer"}},"type":"object"},"ThreatClassification":{"description":"ThreatClassification represents an AI-assessed threat level for a news item.","properties":{"category":{"description":"Event category.","type":"string"},"confidence":{"description":"Confidence score (0.0 to 1.0).","format":"double","maximum":1,"minimum":0,"type":"number"},"level":{"description":"ThreatLevel represents the assessed threat level of a news event.","enum":["THREAT_LEVEL_UNSPECIFIED","THREAT_LEVEL_LOW","THREAT_LEVEL_MEDIUM","THREAT_LEVEL_HIGH","THREAT_LEVEL_CRITICAL"],"type":"string"},"source":{"description":"Classification source — \"keyword\", \"ml\", or \"llm\".","type":"string"}},"type":"object"},"ValidationError":{"description":"ValidationError is returned when request validation fails. It contains a list of field violations describing what went wrong.","properties":{"violations":{"description":"List of validation violations","items":{"$ref":"#/components/schemas/FieldViolation"},"type":"array"}},"required":["violations"],"type":"object"}}},"info":{"title":"NewsService API","version":"1.0.0"},"openapi":"3.1.0","paths":{"/api/news/v1/list-feed-digest":{"get":{"description":"ListFeedDigest returns a pre-aggregated digest of all RSS feeds for a site variant.","operationId":"ListFeedDigest","parameters":[{"description":"Site variant: full, tech, finance, happy","in":"query","name":"variant","required":false,"schema":{"type":"string"}},{"description":"ISO 639-1 language code (en, fr, ar, etc.)","in":"query","name":"lang","required":false,"schema":{"type":"string"}}],"responses":{"200":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/ListFeedDigestResponse"}}},"description":"Successful response"},"400":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/ValidationError"}}},"description":"Validation error"},"default":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Error"}}},"description":"Error response"}},"summary":"ListFeedDigest","tags":["NewsService"]}},"/api/news/v1/summarize-article":{"post":{"description":"SummarizeArticle generates an LLM summary with provider selection and fallback support.","operationId":"SummarizeArticle","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/SummarizeArticleRequest"}}},"required":true},"responses":{"200":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/SummarizeArticleResponse"}}},"description":"Successful response"},"400":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/ValidationError"}}},"description":"Validation error"},"default":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Error"}}},"description":"Error response"}},"summary":"SummarizeArticle","tags":["NewsService"]}},"/api/news/v1/summarize-article-cache":{"get":{"description":"GetSummarizeArticleCache looks up a cached summary by deterministic key (CDN-cacheable GET).","operationId":"GetSummarizeArticleCache","parameters":[{"description":"Deterministic cache key computed by buildSummaryCacheKey().","in":"query","name":"cache_key","required":false,"schema":{"type":"string"}}],"responses":{"200":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/SummarizeArticleResponse"}}},"description":"Successful response"},"400":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/ValidationError"}}},"description":"Validation error"},"default":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Error"}}},"description":"Error response"}},"summary":"GetSummarizeArticleCache","tags":["NewsService"]}}}}