worldmonitor/docs/api/NewsService.openapi.yaml
Elie Habib 8d8cf56ce2 feat(scoring): composite importance score + story tracking infrastructure (#2604)
* feat(scoring): add composite importance score + story tracking infrastructure

- Extract SOURCE_TIERS/getSourceTier to server/_shared/source-tiers.ts so
  server handlers can import it without pulling in client-only modules;
  src/config/feeds.ts re-exports for backward compatibility
- Add story tracking Redis key helpers to cache-keys.ts
  (story:track:v1, story:sources:v1, story:peak:v1, digest:accumulator:v1)
- Export SEVERITY_SCORES from _classifier.ts for server-side score math
- Add upstashPipeline() to redis.ts for arbitrary batched Redis writes
- Add importanceScore/corroborationCount/storyPhase fields to proto,
  generated TS, and src/types NewsItem
- Add StoryMeta message and StoryPhase enum to proto
- In list-feed-digest.ts:
  - Build corroboration map across full corpus BEFORE per-category truncation
  - Compute importanceScore (severity 40% + tier 20% + corroboration 30%
    + recency 10%) per item
  - Sort by importanceScore desc before truncating at MAX_ITEMS_PER_CATEGORY
  - Write story:track / story:sources / story:peak / digest:accumulator
    to Redis in 80-story pipeline batches after each digest build
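
The scoring steps above can be sketched roughly as follows. Only the four weights come from this commit message. The SEVERITY_SCORES values, the 1-4 tier range, the five-source corroboration cap, the 24h linear recency decay, and this normalizeTitle are all assumptions made for illustration, and the sketch ranks a single list rather than per-category buckets.

```typescript
// Weights come from the commit message; every normalization below is an assumption.
const SCORE_WEIGHTS = { severity: 0.4, sourceTier: 0.2, corroboration: 0.3, recency: 0.1 };

type ThreatLevel = 'low' | 'medium' | 'high' | 'critical';

// Hypothetical per-level scores on a 0-100 scale; the real SEVERITY_SCORES are not shown here.
const SEVERITY_SCORES: Record<ThreatLevel, number> = { low: 25, medium: 50, high: 75, critical: 100 };

interface DigestItem {
  source: string;
  title: string;
  level: ThreatLevel;
  sourceTier: number;  // assumed: 1 (most trusted) to 4 (least trusted)
  publishedAt: number; // Unix epoch milliseconds
}

// Hypothetical stand-in for the real normalizeTitle.
function normalizeTitle(title: string): string {
  return title.toLowerCase().replace(/[^a-z0-9]+/g, ' ').trim();
}

// Count distinct sources per normalized title across the full corpus.
function buildCorroborationMap(items: DigestItem[]): Map<string, number> {
  const sources = new Map<string, Set<string>>();
  for (const item of items) {
    const key = normalizeTitle(item.title);
    const seen = sources.get(key) ?? new Set<string>();
    seen.add(item.source);
    sources.set(key, seen);
  }
  const counts = new Map<string, number>();
  sources.forEach((seen, key) => counts.set(key, seen.size));
  return counts;
}

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Weighted sum of four 0-100 components, rounded to an integer in 0-100.
function computeImportanceScore(item: DigestItem, corroborationCount: number, now: number): number {
  const severity = SEVERITY_SCORES[item.level];
  const tier = clamp((5 - item.sourceTier) * 25, 0, 100);     // tier 1 -> 100, tier 4 -> 25
  const corroboration = clamp(corroborationCount, 0, 5) * 20; // saturates at 5 sources
  const ageHours = (now - item.publishedAt) / (60 * 60 * 1000);
  const recency = clamp(100 - (ageHours / 24) * 100, 0, 100); // linear decay over 24h
  return Math.round(
    severity * SCORE_WEIGHTS.severity +
      tier * SCORE_WEIGHTS.sourceTier +
      corroboration * SCORE_WEIGHTS.corroboration +
      recency * SCORE_WEIGHTS.recency,
  );
}

// Score the full corpus first, sort descending, and only then truncate.
function rankAndTruncate(items: DigestItem[], maxItems: number, now: number) {
  const counts = buildCorroborationMap(items);
  return items
    .map((item) => {
      const corroborationCount = counts.get(normalizeTitle(item.title)) ?? 1;
      return { ...item, corroborationCount, importanceScore: computeImportanceScore(item, corroborationCount, now) };
    })
    .sort((a, b) => b.importanceScore - a.importanceScore)
    .slice(0, maxItems);
}
```

Counting sources before truncation matters because a story reported by several low-ranked feeds would otherwise lose its corroboration credit once those items are cut.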

The score gate in notification-relay.cjs follows in the next PR (shadow mode,
behind the IMPORTANCE_SCORE_LIVE flag). The RELAY_GATES_READY removal of
/api/notify comes after a 48h shadow comparison confirms parity.

* fix(scoring): add storyPhase field + regenerate proto types

- Add storyPhase to ParsedItem and toProtoItem (defaults UNSPECIFIED)
- Regenerate service_server.ts: required fields, StoryPhase type relocated
- Regenerate service_client.ts and OpenAPI docs from buf generate
- Fix typecheck:api error on missing required storyPhase in NewsItem

* fix(scoring): address all code review findings from PR #2604

P1:
- await writeStoryTracking instead of fire-and-forget to prevent
  silent data loss on edge isolate teardown
- remove duplicate upstashPipeline; use existing runRedisPipeline
- strip non-https links before Redis write (XSS prevention)
- implement storyPhase read path: HGETALL batch + computePhase()
  so BREAKING/DEVELOPING/SUSTAINED/FADING badges are now live

P2/P3:
- extend STORY_TTL 48h → 7 days (sustained stories no longer reset)
- extract SCORE_WEIGHTS named constants with rationale comment
- move SEVERITY_SCORES out of _classifier.ts into list-feed-digest.ts
- add normalizeTitle comment referencing todo #102
- pre-compute title hashes once, share between phase read + write
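
The "strip non-https links" fix above might look something like this minimal sketch. The function name sanitizeLink and the choice to return an empty string for rejected links are assumptions; the actual handler may drop or rewrite such links differently.

```typescript
// Hypothetical helper: keep a link only if it parses as an absolute https URL.
// Anything else (javascript:, data:, http:, relative or malformed input) becomes ''.
function sanitizeLink(link: string): string {
  try {
    const url = new URL(link);
    return url.protocol === 'https:' ? url.toString() : '';
  } catch {
    // new URL() throws on relative or malformed input.
    return '';
  }
}
```

Checking the parsed protocol rather than a string prefix avoids being fooled by leading whitespace or mixed-case schemes, since the URL parser normalizes both.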

* fix(scoring): correct enrichment-before-scoring and write-before-read ordering

Two sequencing bugs:

1. enrichWithAiCache ran after truncation (post-slice), so items whose
   threat level was upgraded by the LLM cache could have already been
   cut from the top-20, and downgraded items kept inflated scores.
   Fix: enrich ALL items from the full corpus before scoring, so
   importanceScore always uses the final post-LLM classification level.

2. Phase HGETALL read happened before writeStoryTracking, meaning
   first-time stories had no Redis entry and always returned UNSPECIFIED
   instead of BREAKING, and all existing stories lagged one cycle behind.
   Fix: write tracking first, then read back for phase assignment.
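
The second ordering bug can be illustrated with an in-memory stand-in for Redis. Everything here is hypothetical: the StoryTrack shape, the Map used in place of the story:track hashes, and the phase thresholds in computePhase are assumptions chosen only to show why a read before the first write yields UNSPECIFIED.

```typescript
type StoryPhase = 'UNSPECIFIED' | 'BREAKING' | 'DEVELOPING' | 'SUSTAINED' | 'FADING';

// Hypothetical tracking record; the real hash fields are not shown in this commit message.
interface StoryTrack {
  firstSeen: number;
  lastSeen: number;
  mentionCount: number;
}

// In-memory stand-in for the story:track Redis hashes.
const trackStore = new Map<string, StoryTrack>();

const HOUR_MS = 60 * 60 * 1000;

function writeStoryTracking(titleHash: string, now: number): void {
  const prev = trackStore.get(titleHash);
  trackStore.set(
    titleHash,
    prev
      ? { ...prev, lastSeen: now, mentionCount: prev.mentionCount + 1 }
      : { firstSeen: now, lastSeen: now, mentionCount: 1 },
  );
}

// Assumed thresholds: not seen for 12h -> FADING; otherwise phase follows story age.
function computePhase(track: StoryTrack | undefined, now: number): StoryPhase {
  if (!track) return 'UNSPECIFIED';
  if (now - track.lastSeen > 12 * HOUR_MS) return 'FADING';
  const age = now - track.firstSeen;
  if (age < 2 * HOUR_MS) return 'BREAKING';
  if (age < 24 * HOUR_MS) return 'DEVELOPING';
  return 'SUSTAINED';
}

const cycleTime = 1700000000000;

// Old order: read first. A first-time story has no entry yet.
const phaseBeforeWrite = computePhase(trackStore.get('story-hash'), cycleTime);

// Fixed order: write tracking, then read back for phase assignment.
writeStoryTracking('story-hash', cycleTime);
const phaseAfterWrite = computePhase(trackStore.get('story-hash'), cycleTime);
```

Under these assumptions phaseBeforeWrite is UNSPECIFIED and phaseAfterWrite is BREAKING, which matches the symptom and fix described in item 2.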
2026-04-02 20:46:04 +04:00

360 lines
15 KiB
YAML

openapi: 3.1.0
info:
  title: NewsService API
  version: 1.0.0
paths:
  /api/news/v1/summarize-article:
    post:
      tags:
        - NewsService
      summary: SummarizeArticle
      description: SummarizeArticle generates an LLM summary with provider selection and fallback support.
      operationId: SummarizeArticle
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/SummarizeArticleRequest'
        required: true
      responses:
        "200":
          description: Successful response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/SummarizeArticleResponse'
        "400":
          description: Validation error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ValidationError'
        default:
          description: Error response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Error'
  /api/news/v1/summarize-article-cache:
    get:
      tags:
        - NewsService
      summary: GetSummarizeArticleCache
      description: GetSummarizeArticleCache looks up a cached summary by deterministic key (CDN-cacheable GET).
      operationId: GetSummarizeArticleCache
      parameters:
        - name: cache_key
          in: query
          description: Deterministic cache key computed by buildSummaryCacheKey().
          required: false
          schema:
            type: string
      responses:
        "200":
          description: Successful response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/SummarizeArticleResponse'
        "400":
          description: Validation error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ValidationError'
        default:
          description: Error response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Error'
  /api/news/v1/list-feed-digest:
    get:
      tags:
        - NewsService
      summary: ListFeedDigest
      description: ListFeedDigest returns a pre-aggregated digest of all RSS feeds for a site variant.
      operationId: ListFeedDigest
      parameters:
        - name: variant
          in: query
          description: 'Site variant: full, tech, finance, happy'
          required: false
          schema:
            type: string
        - name: lang
          in: query
          description: ISO 639-1 language code (en, fr, ar, etc.)
          required: false
          schema:
            type: string
      responses:
        "200":
          description: Successful response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ListFeedDigestResponse'
        "400":
          description: Validation error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ValidationError'
        default:
          description: Error response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Error'
components:
  schemas:
    Error:
      type: object
      properties:
        message:
          type: string
          description: Error message (e.g., 'user not found', 'database connection failed')
      description: Error is returned when a handler encounters an error. It contains a simple error message that the developer can customize.
    FieldViolation:
      type: object
      properties:
        field:
          type: string
          description: The field path that failed validation (e.g., 'user.email' for nested fields). For header validation, this will be the header name (e.g., 'X-API-Key')
        description:
          type: string
          description: Human-readable description of the validation violation (e.g., 'must be a valid email address', 'required field missing')
      required:
        - field
        - description
      description: FieldViolation describes a single validation error for a specific field.
    ValidationError:
      type: object
      properties:
        violations:
          type: array
          items:
            $ref: '#/components/schemas/FieldViolation'
          description: List of validation violations
      required:
        - violations
      description: ValidationError is returned when request validation fails. It contains a list of field violations describing what went wrong.
    SummarizeArticleRequest:
      type: object
      properties:
        provider:
          type: string
          minLength: 1
          description: 'LLM provider: "ollama", "groq", "openrouter"'
        headlines:
          type: array
          items:
            type: string
            minItems: 1
            description: Headlines to summarize (max 8 used).
          minItems: 1
        mode:
          type: string
          description: 'Summarization mode: "brief", "analysis", "translate", "" (default).'
        geoContext:
          type: string
          description: Geographic signal context to include in the prompt.
        variant:
          type: string
          description: 'Variant: "full", "tech", or target language for translate mode.'
        lang:
          type: string
          description: Output language code, default "en".
        systemAppend:
          type: string
          description: Optional system prompt append for analytical framework instructions.
      required:
        - provider
      description: SummarizeArticleRequest specifies parameters for LLM article summarization.
    SummarizeArticleResponse:
      type: object
      properties:
        summary:
          type: string
          description: The generated summary text.
        model:
          type: string
          description: Model identifier used for generation.
        provider:
          type: string
          description: Provider that produced the result (or "cache").
        tokens:
          type: integer
          format: int32
          description: Token count from the LLM response.
        fallback:
          type: boolean
          description: Whether the client should try the next provider in the fallback chain.
        error:
          type: string
          description: Error message if the request failed.
        errorType:
          type: string
          description: Error type/name (e.g. "TypeError").
        status:
          type: string
          enum:
            - SUMMARIZE_STATUS_UNSPECIFIED
            - SUMMARIZE_STATUS_SUCCESS
            - SUMMARIZE_STATUS_CACHED
            - SUMMARIZE_STATUS_SKIPPED
            - SUMMARIZE_STATUS_ERROR
          description: SummarizeStatus indicates the outcome of a summarization request.
        statusDetail:
          type: string
          description: Human-readable detail for non-success statuses (skip reason, etc.).
      description: SummarizeArticleResponse contains the LLM summarization result.
    GetSummarizeArticleCacheRequest:
      type: object
      properties:
        cacheKey:
          type: string
          description: Deterministic cache key computed by buildSummaryCacheKey().
      description: GetSummarizeArticleCacheRequest looks up a pre-computed summary by cache key.
    ListFeedDigestRequest:
      type: object
      properties:
        variant:
          type: string
          description: 'Site variant: full, tech, finance, happy'
        lang:
          type: string
          description: ISO 639-1 language code (en, fr, ar, etc.)
    ListFeedDigestResponse:
      type: object
      properties:
        categories:
          type: object
          additionalProperties:
            $ref: '#/components/schemas/CategoryBucket'
          description: Per-category buckets — keys match category names from feed config
        feedStatuses:
          type: object
          additionalProperties:
            type: string
          description: |-
            Per-feed status — only non-ok states emitted; absent key implies ok.
            Values: empty (feed returned 0 items), timeout (timed out during fetch).
        generatedAt:
          type: string
          description: ISO 8601 timestamp of when this digest was generated
    CategoriesEntry:
      type: object
      properties:
        key:
          type: string
        value:
          $ref: '#/components/schemas/CategoryBucket'
    FeedStatusesEntry:
      type: object
      properties:
        key:
          type: string
        value:
          type: string
    CategoryBucket:
      type: object
      properties:
        items:
          type: array
          items:
            $ref: '#/components/schemas/NewsItem'
    NewsItem:
      type: object
      properties:
        source:
          type: string
          minLength: 1
          description: Source feed name.
        title:
          type: string
          minLength: 1
          description: Article headline.
        link:
          type: string
          description: Article URL.
        publishedAt:
          type: integer
          format: int64
          description: 'Publication time, as Unix epoch milliseconds. Warning: Values > 2^53 may lose precision in JavaScript'
        isAlert:
          type: boolean
          description: Whether this article triggered an alert condition.
        threat:
          $ref: '#/components/schemas/ThreatClassification'
        location:
          $ref: '#/components/schemas/GeoCoordinates'
        locationName:
          type: string
          description: Human-readable location name.
        importanceScore:
          type: integer
          format: int32
          description: |-
            Composite importance score (0-100): severity × 40% + source tier × 20% + corroboration × 30% + recency × 10%.
            Absent (0) when not yet scored.
        corroborationCount:
          type: integer
          format: int32
          description: Number of distinct sources that reported the same story in this digest cycle.
        storyPhase:
          type: string
          enum:
            - STORY_PHASE_UNSPECIFIED
            - STORY_PHASE_BREAKING
            - STORY_PHASE_DEVELOPING
            - STORY_PHASE_SUSTAINED
            - STORY_PHASE_FADING
          description: StoryPhase represents the lifecycle stage of a tracked news story.
      required:
        - source
        - title
      description: NewsItem represents a single news article from RSS feed aggregation.
    ThreatClassification:
      type: object
      properties:
        level:
          type: string
          enum:
            - THREAT_LEVEL_UNSPECIFIED
            - THREAT_LEVEL_LOW
            - THREAT_LEVEL_MEDIUM
            - THREAT_LEVEL_HIGH
            - THREAT_LEVEL_CRITICAL
          description: ThreatLevel represents the assessed threat level of a news event.
        category:
          type: string
          description: Event category.
        confidence:
          type: number
          maximum: 1
          minimum: 0
          format: double
          description: Confidence score (0.0 to 1.0).
        source:
          type: string
          description: Classification source — "keyword", "ml", or "llm".
      description: ThreatClassification represents an AI-assessed threat level for a news item.
    GeoCoordinates:
      type: object
      properties:
        latitude:
          type: number
          maximum: 90
          minimum: -90
          format: double
          description: Latitude in decimal degrees (-90 to 90).
        longitude:
          type: number
          maximum: 180
          minimum: -180
          format: double
          description: Longitude in decimal degrees (-180 to 180).
      description: GeoCoordinates represents a geographic location using WGS84 coordinates.
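
A consumer of the ListFeedDigest endpoint described by this spec might handle the response roughly as sketched below. The interfaces are hand-written approximations of the schemas above, not the generated client types. The helper names buildDigestUrl and topItems are hypothetical, and the base URL in any call is a placeholder.

```typescript
// Hand-written approximations of the NewsItem / ListFeedDigestResponse schemas (assumed subset).
interface NewsItem {
  source: string;
  title: string;
  link?: string;
  importanceScore?: number; // 0 means "not yet scored" per the schema description
  corroborationCount?: number;
  storyPhase?: string;
}

interface CategoryBucket {
  items?: NewsItem[];
}

interface ListFeedDigestResponse {
  categories?: Record<string, CategoryBucket>;
  feedStatuses?: Record<string, string>;
  generatedAt?: string;
}

// Build the GET URL with the two optional query parameters from the spec.
function buildDigestUrl(baseUrl: string, params: { variant?: string; lang?: string }): string {
  const url = new URL('/api/news/v1/list-feed-digest', baseUrl);
  if (params.variant) url.searchParams.set('variant', params.variant);
  if (params.lang) url.searchParams.set('lang', params.lang);
  return url.toString();
}

// Flatten all category buckets and return the highest-scored items first.
function topItems(digest: ListFeedDigestResponse, limit: number): NewsItem[] {
  const all: NewsItem[] = [];
  const categories = digest.categories ?? {};
  for (const name of Object.keys(categories)) {
    for (const item of categories[name].items ?? []) {
      all.push(item);
    }
  }
  return all
    .sort((a, b) => (b.importanceScore ?? 0) - (a.importanceScore ?? 0))
    .slice(0, limit);
}
```

Treating a missing importanceScore as 0 follows the schema note that unscored items report 0, so they sort last rather than breaking the comparison.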