test(usage): assert telemetry payload + identity resolver + operator guide

- tests/usage-telemetry-emission.test.mts stubs globalThis.fetch to
  capture the Axiom ingest POST body and asserts the four review-flagged
  fields end-to-end through the gateway: domain on /api/v2/<svc>/* (was
  "v2"), customer_id on legacy premium bearer success (was null/anon),
  tier on entitlement-gated success via the Convex fallback path (was 0),
  plus a ctx-optional regression guard
- server/__tests__/usage-identity.test.ts unit-tests the pure
  buildUsageIdentity() resolver across every auth_kind branch, tier
  coercion, and the secret-handling invariant (raw enterprise key never
  lands in any output field)
- docs/architecture/usage-telemetry.md is the operator + dev guide:
  field reference, architecture, configuration, failure modes, local
  workflow, eight Axiom APL recipes, and runbooks for adding fields /
  new gateway return paths

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Sebastien Melki
2026-04-25 15:53:44 +03:00
parent dbcea4d0c4
commit 53d295f176
3 changed files with 853 additions and 0 deletions

View File

@@ -0,0 +1,351 @@
# Usage telemetry (Axiom)
Operator + developer guide to the gateway's per-request usage telemetry pipeline.
Implements the requirements in `docs/brainstorms/2026-04-24-axiom-api-observability-requirements.md`.
---
## What it is
Every inbound API request that hits `createDomainGateway()` emits one structured
event to Axiom describing **who** called **what**, **how it was authenticated**,
**what it cost**, and **how it was served**. Deep fetch helpers
(`fetchJson`, `cachedFetchJsonWithMeta`) emit a second event type per upstream
call so customer × provider attribution is reconstructible.
It is **observability only** — never on the request-critical path. The whole
sink runs inside `ctx.waitUntil(...)` with a 1.5s timeout, no retries, and a
circuit breaker that trips on 5% failure ratio over a 5-minute window.
## What you get out of it
Two event types in dataset `wm_api_usage`:
### `request` (one per inbound request)
| Field | Example | Notes |
|--------------------|-------------------------------------------|----------------------------------------------|
| `event_type` | `"request"` | |
| `request_id` | `"req_xxx"` | from `x-request-id` or generated |
| `route` | `/api/market/v1/analyze-stock` | |
| `domain` | `"market"` | strips leading `vN` for `/api/v2/<svc>/…` |
| `method`, `status` | `"GET"`, `200` | |
| `duration_ms` | `412` | wall-clock at the gateway |
| `req_bytes`, `res_bytes` | | response counted only on 200/304 GET |
| `customer_id` | Clerk user ID, org ID, enterprise slug, or widget key | `null` only for anon |
| `principal_id` | user ID or **hash** of API/widget key | never the raw secret |
| `auth_kind` | `clerk_jwt` \| `user_api_key` \| `enterprise_api_key` \| `widget_key` \| `anon` | |
| `tier` | `0` free / `1` pro / `2` api / `3` enterprise | `0` if unknown |
| `cache_tier` | `fast` \| `medium` \| `slow` \| `slow-browser` \| `static` \| `daily` \| `no-store` | only on 200/304 |
| `country`, `execution_region` | `"US"`, `"iad1"` | Vercel-provided |
| `execution_plane` | `"vercel-edge"` | |
| `origin_kind` | `api-key` \| `oauth` \| `browser-same-origin` \| `browser-cross-origin` \| `null` | derived from headers by `deriveOriginKind()``mcp` and `internal-cron` exist in the `OriginKind` type for upstream/future use but are not currently emitted on the request path |
| `ua_hash` | SHA-256 of the UA | hashed so PII doesn't land in Axiom |
| `sentry_trace_id` | `"abc123…"` | join key into Sentry |
| `reason` | `ok` \| `origin_403` \| `rate_limit_429` | |
### `upstream` (one per outbound fetch from a request handler)
| Field | Example |
|----------------------|--------------------------|
| `request_id` | links back to the parent |
| `provider`, `host` | `"yahoo-finance"`, `"query1.finance.yahoo.com"` |
| `operation` | logical op name set by the helper |
| `status`, `duration_ms`, `request_bytes`, `response_bytes` | |
| `cache_status` | `miss` \| `fresh` \| `stale-while-revalidate` \| `neg-sentinel` |
| `customer_id`, `route`, `tier` | inherited from the inbound request via AsyncLocalStorage |
## What it answers
A non-exhaustive list — copy-paste APL queries are in the **Analysis** section below.
- Per-customer request volume, p50/p95 latency, error rate
- Per-route premium-vs-free traffic mix
- CDN cache-tier distribution per route (calibrate `RPC_CACHE_TIER`)
- Top-of-funnel for noisy abusers (`auth_kind=anon` × `country` × `route`)
- Upstream provider cost per customer (`upstream` join `request` on `request_id`)
- Bearer-vs-API-key vs anon ratio per premium route
- Region heatmaps (`execution_region` × `route`)
---
## Architecture
```
┌─────────────────────────────────────────────────────┐
│ Vercel Edge handler │
│ │
request ──► │ createDomainGateway() │
│ auth resolution → usage:UsageIdentityInput │
│ runWithUsageScope({ ctx, customerId, route, … }) │
│ └─ user handler ── fetchJson / cachedFetch... ─┼─► upstream
│ (reads scope, emits │ API
│ upstream event) │
│ emitRequest(...) at every return point ──────────┼────► Axiom
│ └─ ctx.waitUntil(emitUsageEvents(...)) │ wm_api_usage
└─────────────────────────────────────────────────────┘
```
Code map:
| Concern | File |
|----------------------------------------|--------------------------------------------|
| Gateway emit points + identity accumulator | `server/gateway.ts` |
| Identity resolver (pure) | `server/_shared/usage-identity.ts` |
| Event shapes, builders, Axiom sink, breaker, ALS scope | `server/_shared/usage.ts` |
| Upstream-event emission from fetch helpers | `server/_shared/cached-fetch.ts`, `server/_shared/fetch-json.ts` |
Key invariants:
1. **Builders accept allowlisted primitives only** — they never accept
`Request`, `Response`, or untyped objects, so future field additions can't
leak by structural impossibility.
2. **`emitRequest()` fires at every gateway return path** — origin block,
OPTIONS, 401/403/404/405, rate-limit 429, ETag 304, success 200, error 500.
Adding a new return path requires adding the emit, or telemetry coverage
silently regresses.
3. **`principal_id` is a hash for secret-bearing auth** (API key, widget key)
so raw secrets never land in Axiom.
4. **Telemetry failure must not affect API availability or latency** — sink is
fire-and-forget with timeout + breaker; any error path drops the event with
a 1%-sampled `console.warn`.
---
## Configuration
Two env vars control the pipeline. Both are independent of every other system.
| Var | Required for | Behavior when missing |
|--------------------|--------------|-------------------------------------------|
| `USAGE_TELEMETRY` | Emission | Set to `1` to enable. Anything else → emission is a no-op (zero network calls, zero allocations of the event payload). |
| `AXIOM_API_TOKEN` | Delivery | Events build but `sendToAxiom` short-circuits to a 1%-sampled `[usage-telemetry] drop { reason: 'no-token' }` warning. |
Vercel project setup:
1. Axiom → create dataset **`wm_api_usage`** (the constant in
`server/_shared/usage.ts:18`; rename if you want a different name).
2. Axiom → Settings → API Tokens → create an **Ingest** token scoped to that
dataset. Copy the `xaat-…` value.
3. Vercel → Project → Settings → Environment Variables, add for the desired
environments (Production / Preview):
```
USAGE_TELEMETRY=1
AXIOM_API_TOKEN=xaat-...
```
4. Redeploy. Axiom infers schema from the first events — no upfront schema
work needed.
### Failure modes (deploy-with-Axiom-down is safe)
| Scenario | Behavior |
|---------------------------------------|------------------------------------------------------|
| `USAGE_TELEMETRY` unset | emit is a no-op, identity object is still built but discarded |
| `USAGE_TELEMETRY=1`, no token | event built, `fetch` skipped, sampled warn |
| Axiom returns non-2xx | `recordSample(false)`, sampled warn |
| Axiom timeout (>1.5s) | `AbortController` aborts, sampled warn |
| ≥5% failure ratio over 5min (≥20 samples) | breaker trips → all sends short-circuit until ratio recovers |
| Direct gateway caller passes no `ctx` | emit is a no-op (the `ctx?.waitUntil` guard) |
### Kill switch
There is no in-code feature flag separate from the env vars. To disable in
production: set `USAGE_TELEMETRY=0` (or unset it) and redeploy. Existing
in-flight requests drain on the next isolate cycle.
---
## Local development & testing
### Smoke test without Axiom
Just run the dev server with neither env var set. Hit any route. The path is
fully exercised — only the Axiom POST is skipped.
```sh
vercel dev
curl http://localhost:3000/api/seismology/v1/list-earthquakes
```
In any non-`production` build, the response carries an `x-usage-telemetry`
header. Use it as a wiring check:
```sh
curl -sI http://localhost:3000/api/seismology/v1/list-earthquakes | grep -i x-usage
# x-usage-telemetry: off # USAGE_TELEMETRY unset
# x-usage-telemetry: ok # enabled, breaker closed
# x-usage-telemetry: degraded # breaker tripped — Axiom is failing
```
### End-to-end with a real Axiom dataset
```sh
USAGE_TELEMETRY=1 AXIOM_API_TOKEN=xaat-... vercel dev
curl http://localhost:3000/api/market/v1/list-market-quotes?symbols=AAPL
```
Then in Axiom:
```kusto
['wm_api_usage']
| where _time > ago(2m)
| project _time, route, status, customer_id, auth_kind, tier, duration_ms
```
### Automated tests
Three suites cover the pipeline:
1. **Identity unit tests** — `server/__tests__/usage-identity.test.ts` cover the
pure `buildUsageIdentity()` resolver across every `auth_kind` branch.
2. **Gateway emit assertions** — `tests/usage-telemetry-emission.test.mts`
stubs `globalThis.fetch` to capture the Axiom POST body and asserts the
`domain`, `customer_id`, `auth_kind`, and `tier` fields end-to-end through
the gateway.
3. **Auth-path regression tests** — `tests/premium-stock-gateway.test.mts` and
`tests/gateway-cdn-origin-policy.test.mts` exercise the gateway without a
`ctx` argument, locking in the "telemetry must not break direct callers"
invariant.
Run them:
```sh
npx tsx --test tests/usage-telemetry-emission.test.mts \
tests/premium-stock-gateway.test.mts \
tests/gateway-cdn-origin-policy.test.mts
npx vitest run server/__tests__/usage-identity.test.ts
```
---
## Analysis recipes (Axiom APL)
All queries assume dataset `wm_api_usage`. Adjust time windows as needed.
### Per-customer request volume + error rate
```kusto
['wm_api_usage']
| where event_type == "request" and _time > ago(24h)
| summarize requests = count(),
errors_5xx = countif(status >= 500),
errors_4xx = countif(status >= 400 and status < 500),
p95_ms = percentile(duration_ms, 95)
by customer_id
| order by requests desc
```
### p50 / p95 latency per route
```kusto
['wm_api_usage']
| where event_type == "request" and _time > ago(1h)
| summarize p50 = percentile(duration_ms, 50),
p95 = percentile(duration_ms, 95),
n = count()
by route
| where n > 50
| order by p95 desc
```
### Premium vs free traffic mix per route
```kusto
['wm_api_usage']
| where event_type == "request" and _time > ago(24h)
| extend tier_bucket = case(tier >= 2, "api+ent", tier == 1, "pro", "free/anon")
| summarize n = count() by route, tier_bucket
| evaluate pivot(tier_bucket, sum(n))
| order by route asc
```
### CDN cache-tier mix per route — calibrates `RPC_CACHE_TIER`
```kusto
['wm_api_usage']
| where event_type == "request" and status == 200 and method == "GET" and _time > ago(24h)
| summarize n = count() by route, cache_tier
| evaluate pivot(cache_tier, sum(n))
| order by route asc
```
A route dominated by `slow-browser` that *should* be CDN-cached is a hint to
add an entry to `RPC_CACHE_TIER` in `server/gateway.ts`.
### Anonymous abuse hotspots
```kusto
['wm_api_usage']
| where event_type == "request" and auth_kind == "anon" and _time > ago(1h)
| summarize n = count() by route, country
| where n > 100
| order by n desc
```
### Upstream cost per customer (provider attribution)
```kusto
['wm_api_usage']
| where event_type == "upstream" and _time > ago(24h)
| summarize calls = count(),
response_bytes_mb = sum(response_bytes) / 1024.0 / 1024.0,
p95_ms = percentile(duration_ms, 95)
by customer_id, provider
| order by calls desc
```
### Cache hit ratio per provider (correctness signal)
```kusto
['wm_api_usage']
| where event_type == "upstream" and _time > ago(24h)
| summarize n = count() by provider, cache_status
| evaluate pivot(cache_status, sum(n))
| extend hit_ratio = (fresh + coalesce(['stale-while-revalidate'], 0)) * 1.0 / (fresh + miss + coalesce(['stale-while-revalidate'], 0))
| order by hit_ratio asc
```
### Sentry × Axiom join
When Sentry surfaces an exception, copy its trace ID and:
```kusto
['wm_api_usage']
| where sentry_trace_id == "<paste from Sentry>"
```
…to see the exact request envelope (route, customer, latency, cache outcome).
### Telemetry health watch
```kusto
['wm_api_usage']
| where _time > ago(1h)
| summarize events_per_min = count() by bin(_time, 1m)
| order by _time asc
```
A drop to zero with no corresponding traffic drop = breaker tripped or
Vercel/Axiom integration broken — pair it with the `[usage-telemetry] drop`
warns in Vercel logs to find the cause.
---
## Adding new telemetry fields
1. Add the field to `RequestEvent` (or `UpstreamEvent`) in
`server/_shared/usage.ts`.
2. Extend the corresponding builder (`buildRequestEvent` /
`buildUpstreamEvent`) — only allowlisted primitives, no untyped objects.
3. If the value comes from gateway state, set it on the `usage` accumulator
in `gateway.ts`. Otherwise plumb it through the builder call sites.
4. Axiom auto-discovers the new column on the next ingest. No schema migration.
5. Update this doc's field table.
## Adding a new gateway return path
If you add a new `return new Response(...)` inside `createDomainGateway()`,
**you must call `emitRequest(status, reason, cacheTier, resBytes?)` immediately
before it.** Telemetry coverage is enforced by code review, not lint. The
`reason` field uses the existing `RequestReason` union — extend it if the
return represents a new failure class.

View File

@@ -0,0 +1,143 @@
/**
* Pure-resolver tests for buildUsageIdentity().
*
* The resolver maps gateway-internal auth state to the four telemetry identity
* fields (auth_kind, principal_id, customer_id, tier). It is intentionally
* pure — no JWT verification, no key hashing of secrets, no I/O — so the
* branch matrix is trivially testable here.
*/
import { describe, expect, test } from 'vitest';
import { buildUsageIdentity, type UsageIdentityInput } from '../_shared/usage-identity';
function baseInput(overrides: Partial<UsageIdentityInput> = {}): UsageIdentityInput {
return {
sessionUserId: null,
isUserApiKey: false,
enterpriseApiKey: null,
widgetKey: null,
clerkOrgId: null,
userApiKeyCustomerRef: null,
tier: null,
...overrides,
};
}
describe('buildUsageIdentity — auth_kind branches', () => {
test('user_api_key takes precedence over every other signal', () => {
const ident = buildUsageIdentity(baseInput({
isUserApiKey: true,
sessionUserId: 'user_123',
userApiKeyCustomerRef: 'customer_abc',
enterpriseApiKey: 'should-be-ignored',
widgetKey: 'should-be-ignored',
tier: 2,
}));
expect(ident.auth_kind).toBe('user_api_key');
expect(ident.principal_id).toBe('user_123');
expect(ident.customer_id).toBe('customer_abc');
expect(ident.tier).toBe(2);
});
test('user_api_key falls back to sessionUserId for customer_id when no explicit ref', () => {
const ident = buildUsageIdentity(baseInput({
isUserApiKey: true,
sessionUserId: 'user_123',
tier: 1,
}));
expect(ident.customer_id).toBe('user_123');
});
test('clerk_jwt: customer_id prefers org over user when org is present', () => {
const ident = buildUsageIdentity(baseInput({
sessionUserId: 'user_123',
clerkOrgId: 'org_acme',
tier: 1,
}));
expect(ident.auth_kind).toBe('clerk_jwt');
expect(ident.principal_id).toBe('user_123');
expect(ident.customer_id).toBe('org_acme');
expect(ident.tier).toBe(1);
});
test('clerk_jwt: customer_id falls back to user when no org', () => {
const ident = buildUsageIdentity(baseInput({
sessionUserId: 'user_123',
}));
expect(ident.customer_id).toBe('user_123');
expect(ident.tier).toBe(0);
});
test('enterprise_api_key: principal_id is hashed, not raw', () => {
const ident = buildUsageIdentity(baseInput({
enterpriseApiKey: 'wm_super_secret_key',
tier: 3,
}));
expect(ident.auth_kind).toBe('enterprise_api_key');
expect(ident.principal_id).not.toBe('wm_super_secret_key');
expect(ident.principal_id).toMatch(/^[0-9a-z]+$/);
// Customer is the unmapped sentinel until a real entry is added to ENTERPRISE_KEY_TO_CUSTOMER
expect(ident.customer_id).toBe('enterprise-unmapped');
expect(ident.tier).toBe(3);
});
test('widget_key: customer_id is the widget key itself, principal_id is hashed', () => {
const ident = buildUsageIdentity(baseInput({
widgetKey: 'widget_pub_xyz',
}));
expect(ident.auth_kind).toBe('widget_key');
expect(ident.customer_id).toBe('widget_pub_xyz');
expect(ident.principal_id).not.toBe('widget_pub_xyz');
expect(ident.principal_id).toMatch(/^[0-9a-z]+$/);
expect(ident.tier).toBe(0);
});
test('anon: every field null, tier always zero', () => {
const ident = buildUsageIdentity(baseInput());
expect(ident.auth_kind).toBe('anon');
expect(ident.principal_id).toBeNull();
expect(ident.customer_id).toBeNull();
expect(ident.tier).toBe(0);
});
test('anon: tier coerces to 0 even if input.tier was set (defensive)', () => {
// No identity signal but a leftover tier value should not show up as a mystery free row.
const ident = buildUsageIdentity(baseInput({ tier: 99 }));
expect(ident.tier).toBe(0);
});
});
describe('buildUsageIdentity — tier handling', () => {
test('null tier coerces to 0 for non-anon kinds', () => {
const ident = buildUsageIdentity(baseInput({ sessionUserId: 'u', tier: null }));
expect(ident.tier).toBe(0);
});
test('zero tier is preserved (not promoted)', () => {
const ident = buildUsageIdentity(baseInput({ sessionUserId: 'u', tier: 0 }));
expect(ident.tier).toBe(0);
});
test('integer tiers pass through unchanged', () => {
for (const t of [0, 1, 2, 3]) {
const ident = buildUsageIdentity(baseInput({ sessionUserId: 'u', tier: t }));
expect(ident.tier).toBe(t);
}
});
});
describe('buildUsageIdentity — secret handling', () => {
test('enterprise key never appears verbatim in any output field', () => {
const secret = 'wm_ent_LEAKY_VALUE_DO_NOT_LOG';
const ident = buildUsageIdentity(baseInput({ enterpriseApiKey: secret }));
expect(JSON.stringify(ident)).not.toContain(secret);
});
test('widget key appears as customer_id (intentional — widget keys are public)', () => {
// Widget keys are embeds installed on third-party sites; treating them as
// customer attribution is the contract documented in usage-identity.ts:73-79.
const ident = buildUsageIdentity(baseInput({ widgetKey: 'widget_public_xyz' }));
expect(ident.customer_id).toBe('widget_public_xyz');
});
});

View File

@@ -0,0 +1,359 @@
/**
* Asserts the Axiom telemetry payload emitted by createDomainGateway() —
* specifically the four fields the round-1 Codex review flagged:
*
* - domain (must be 'shipping' for /api/v2/shipping/* routes, not 'v2')
* - customer_id (must be populated on legacy premium bearer-token success)
* - auth_kind (must reflect the resolved identity, not stay 'anon')
* - tier (recorded when entitlement-gated routes succeed; covered indirectly
* by the legacy bearer success case via the Dodo `tier` branch)
*
* Strategy: enable telemetry (USAGE_TELEMETRY=1 + AXIOM_API_TOKEN=fake), stub
* globalThis.fetch to intercept the Axiom ingest POST, and pass a real ctx
* whose waitUntil collects the in-flight Promises so we can await them after
* the gateway returns.
*/
import assert from 'node:assert/strict';
import { createServer, type Server } from 'node:http';
import { afterEach, before, after, describe, it } from 'node:test';
import { generateKeyPair, exportJWK, SignJWT } from 'jose';
import { createDomainGateway, type GatewayCtx } from '../server/gateway.ts';
interface CapturedEvent {
event_type: string;
domain: string;
route: string;
status: number;
customer_id: string | null;
auth_kind: string;
tier: number;
}
function makeRecordingCtx(): { ctx: GatewayCtx; settled: Promise<void[]> } {
const pending: Promise<unknown>[] = [];
const ctx: GatewayCtx = {
waitUntil: (p) => { pending.push(p); },
};
return {
ctx,
get settled() { return Promise.all(pending) as Promise<void[]>; },
} as { ctx: GatewayCtx; settled: Promise<void[]> };
}
function installAxiomFetchSpy(
originalFetch: typeof fetch,
opts: { entitlementsResponse?: unknown } = {},
): {
events: CapturedEvent[];
restore: () => void;
} {
const events: CapturedEvent[] = [];
globalThis.fetch = (async (input: RequestInfo | URL, init?: RequestInit) => {
const url = typeof input === 'string' ? input : input instanceof URL ? input.toString() : input.url;
if (url.includes('api.axiom.co')) {
const body = init?.body ? JSON.parse(init.body as string) as CapturedEvent[] : [];
for (const ev of body) events.push(ev);
return new Response('{}', { status: 200 });
}
if (url.includes('/api/internal-entitlements')) {
return new Response(JSON.stringify(opts.entitlementsResponse ?? null), {
status: 200,
headers: { 'Content-Type': 'application/json' },
});
}
return originalFetch(input as Request | string | URL, init);
}) as typeof fetch;
return { events, restore: () => { globalThis.fetch = originalFetch; } };
}
const ORIGINAL_FETCH = globalThis.fetch;
const ORIGINAL_USAGE_FLAG = process.env.USAGE_TELEMETRY;
const ORIGINAL_AXIOM_TOKEN = process.env.AXIOM_API_TOKEN;
const ORIGINAL_VALID_KEYS = process.env.WORLDMONITOR_VALID_KEYS;
afterEach(() => {
globalThis.fetch = ORIGINAL_FETCH;
if (ORIGINAL_USAGE_FLAG == null) delete process.env.USAGE_TELEMETRY;
else process.env.USAGE_TELEMETRY = ORIGINAL_USAGE_FLAG;
if (ORIGINAL_AXIOM_TOKEN == null) delete process.env.AXIOM_API_TOKEN;
else process.env.AXIOM_API_TOKEN = ORIGINAL_AXIOM_TOKEN;
if (ORIGINAL_VALID_KEYS == null) delete process.env.WORLDMONITOR_VALID_KEYS;
else process.env.WORLDMONITOR_VALID_KEYS = ORIGINAL_VALID_KEYS;
});
describe('gateway telemetry payload — domain extraction', () => {
it("emits domain='shipping' for /api/v2/shipping/* routes (not 'v2')", async () => {
process.env.USAGE_TELEMETRY = '1';
process.env.AXIOM_API_TOKEN = 'test-token';
const spy = installAxiomFetchSpy(ORIGINAL_FETCH);
const handler = createDomainGateway([
{
method: 'GET',
path: '/api/v2/shipping/route-intelligence',
handler: async () => new Response('{"ok":true}', { status: 200 }),
},
]);
const recorder = makeRecordingCtx();
const res = await handler(
new Request('https://worldmonitor.app/api/v2/shipping/route-intelligence', {
headers: { Origin: 'https://worldmonitor.app' },
}),
recorder.ctx,
);
// Anonymous → 401 (premium path, missing API key + no bearer)
assert.equal(res.status, 401);
await recorder.settled;
spy.restore();
assert.equal(spy.events.length, 1, 'expected exactly one telemetry event');
const ev = spy.events[0]!;
assert.equal(ev.domain, 'shipping', `domain should strip leading vN segment, got '${ev.domain}'`);
assert.equal(ev.route, '/api/v2/shipping/route-intelligence');
assert.equal(ev.auth_kind, 'anon');
assert.equal(ev.customer_id, null);
assert.equal(ev.tier, 0);
});
it("emits domain='market' for the standard /api/<domain>/v1/<rpc> layout", async () => {
process.env.USAGE_TELEMETRY = '1';
process.env.AXIOM_API_TOKEN = 'test-token';
const spy = installAxiomFetchSpy(ORIGINAL_FETCH);
const handler = createDomainGateway([
{
method: 'GET',
path: '/api/market/v1/list-market-quotes',
handler: async () => new Response('{"ok":true}', { status: 200 }),
},
]);
const recorder = makeRecordingCtx();
const res = await handler(
new Request('https://worldmonitor.app/api/market/v1/list-market-quotes?symbols=AAPL', {
headers: { Origin: 'https://worldmonitor.app' },
}),
recorder.ctx,
);
assert.equal(res.status, 200);
await recorder.settled;
spy.restore();
assert.equal(spy.events.length, 1);
assert.equal(spy.events[0]!.domain, 'market');
});
});
describe('gateway telemetry payload — bearer identity propagation', () => {
let privateKey: CryptoKey;
let jwksServer: Server;
let jwksPort: number;
before(async () => {
const { publicKey, privateKey: pk } = await generateKeyPair('RS256');
privateKey = pk;
const publicJwk = await exportJWK(publicKey);
publicJwk.kid = 'telemetry-key-1';
publicJwk.alg = 'RS256';
publicJwk.use = 'sig';
const jwks = { keys: [publicJwk] };
jwksServer = createServer((req, res) => {
if (req.url === '/.well-known/jwks.json') {
res.writeHead(200, { 'Content-Type': 'application/json' });
res.end(JSON.stringify(jwks));
} else {
res.writeHead(404);
res.end();
}
});
await new Promise<void>((resolve) => jwksServer.listen(0, '127.0.0.1', () => resolve()));
const addr = jwksServer.address();
jwksPort = typeof addr === 'object' && addr ? addr.port : 0;
process.env.CLERK_JWT_ISSUER_DOMAIN = `http://127.0.0.1:${jwksPort}`;
});
after(async () => {
jwksServer?.close();
delete process.env.CLERK_JWT_ISSUER_DOMAIN;
});
function signToken(claims: Record<string, unknown>) {
return new SignJWT(claims)
.setProtectedHeader({ alg: 'RS256', kid: 'telemetry-key-1' })
.setIssuer(`http://127.0.0.1:${jwksPort}`)
.setAudience('convex')
.setSubject(claims.sub as string ?? 'user_test')
.setIssuedAt()
.setExpirationTime('1h')
.sign(privateKey);
}
it('records customer_id from a successful legacy premium bearer call', async () => {
process.env.USAGE_TELEMETRY = '1';
process.env.AXIOM_API_TOKEN = 'test-token';
const spy = installAxiomFetchSpy(ORIGINAL_FETCH);
const handler = createDomainGateway([
{
method: 'GET',
path: '/api/resilience/v1/get-resilience-score',
handler: async () => new Response('{"ok":true}', { status: 200 }),
},
]);
const token = await signToken({ sub: 'user_pro', plan: 'pro' });
const recorder = makeRecordingCtx();
const res = await handler(
new Request('https://worldmonitor.app/api/resilience/v1/get-resilience-score?countryCode=US', {
headers: {
Origin: 'https://worldmonitor.app',
Authorization: `Bearer ${token}`,
},
}),
recorder.ctx,
);
assert.equal(res.status, 200);
await recorder.settled;
spy.restore();
assert.equal(spy.events.length, 1, 'expected exactly one telemetry event');
const ev = spy.events[0]!;
// The whole point of fix #2: pre-fix this would have been null/anon.
assert.equal(ev.customer_id, 'user_pro', 'customer_id should be the bearer subject');
assert.equal(ev.auth_kind, 'clerk_jwt');
assert.equal(ev.domain, 'resilience');
assert.equal(ev.status, 200);
});
it("records tier=2 for an entitlement-gated success (the path the round-1 P2 fix targets)", async () => {
// /api/market/v1/analyze-stock requires tier 2 in ENDPOINT_ENTITLEMENTS.
// Pre-fix: usage.tier stayed null → emitted as 0. Post-fix: gateway re-reads
// entitlements after checkEntitlement allows the request, so tier=2 lands on
// the wire. We exercise this by stubbing the Convex entitlements fallback —
// Redis returns null without UPSTASH env, then getEntitlements falls through
// to the Convex HTTP path which we intercept via the same fetch spy.
process.env.USAGE_TELEMETRY = '1';
process.env.AXIOM_API_TOKEN = 'test-token';
process.env.CONVEX_SITE_URL = 'https://convex.test';
process.env.CONVEX_SERVER_SHARED_SECRET = 'test-shared-secret';
const fakeEntitlements = {
planKey: 'api_starter',
features: {
tier: 2,
apiAccess: true,
apiRateLimit: 1000,
maxDashboards: 10,
prioritySupport: false,
exportFormats: ['json'],
},
validUntil: Date.now() + 60_000,
};
const spy = installAxiomFetchSpy(ORIGINAL_FETCH, { entitlementsResponse: fakeEntitlements });
const handler = createDomainGateway([
{
method: 'GET',
path: '/api/market/v1/analyze-stock',
handler: async () => new Response('{"ok":true}', { status: 200 }),
},
]);
// plan: 'api' so the legacy bearer-role short-circuit (`session.role === 'pro'`)
// does NOT fire — we want the entitlement-check path that populates usage.tier.
const token = await signToken({ sub: 'user_api', plan: 'api' });
const recorder = makeRecordingCtx();
const res = await handler(
new Request('https://worldmonitor.app/api/market/v1/analyze-stock?symbol=AAPL', {
headers: {
Origin: 'https://worldmonitor.app',
Authorization: `Bearer ${token}`,
},
}),
recorder.ctx,
);
assert.equal(res.status, 200, 'entitlement-gated request with sufficient tier should succeed');
await recorder.settled;
spy.restore();
delete process.env.CONVEX_SITE_URL;
delete process.env.CONVEX_SERVER_SHARED_SECRET;
assert.equal(spy.events.length, 1);
const ev = spy.events[0]!;
assert.equal(ev.tier, 2, `tier should reflect resolved entitlement, got ${ev.tier}`);
assert.equal(ev.customer_id, 'user_api');
assert.equal(ev.auth_kind, 'clerk_jwt');
assert.equal(ev.domain, 'market');
assert.equal(ev.route, '/api/market/v1/analyze-stock');
});
it('still emits with auth_kind=anon when the bearer is invalid', async () => {
process.env.USAGE_TELEMETRY = '1';
process.env.AXIOM_API_TOKEN = 'test-token';
const spy = installAxiomFetchSpy(ORIGINAL_FETCH);
const handler = createDomainGateway([
{
method: 'GET',
path: '/api/resilience/v1/get-resilience-score',
handler: async () => new Response('{"ok":true}', { status: 200 }),
},
]);
const recorder = makeRecordingCtx();
const res = await handler(
new Request('https://worldmonitor.app/api/resilience/v1/get-resilience-score?countryCode=US', {
headers: {
Origin: 'https://worldmonitor.app',
Authorization: 'Bearer not-a-real-token',
},
}),
recorder.ctx,
);
assert.equal(res.status, 401);
await recorder.settled;
spy.restore();
assert.equal(spy.events.length, 1);
const ev = spy.events[0]!;
assert.equal(ev.auth_kind, 'anon');
assert.equal(ev.customer_id, null);
});
});
describe('gateway telemetry payload — ctx-optional safety', () => {
it('handler(req) without ctx still resolves cleanly even with telemetry on', async () => {
process.env.USAGE_TELEMETRY = '1';
process.env.AXIOM_API_TOKEN = 'test-token';
const spy = installAxiomFetchSpy(ORIGINAL_FETCH);
const handler = createDomainGateway([
{
method: 'GET',
path: '/api/market/v1/list-market-quotes',
handler: async () => new Response('{"ok":true}', { status: 200 }),
},
]);
const res = await handler(
new Request('https://worldmonitor.app/api/market/v1/list-market-quotes?symbols=AAPL', {
headers: { Origin: 'https://worldmonitor.app' },
}),
);
assert.equal(res.status, 200);
spy.restore();
// No ctx → emit short-circuits → no events delivered. The point is that
// the handler does not throw "Cannot read properties of undefined".
assert.equal(spy.events.length, 0);
});
});