mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-25 17:14:57 +02:00
* feat(seed-contract): PR 1 foundation — envelope helpers + contract validators + static conformance test
Adds the foundational pieces for the unified seed contract rollout described in
docs/plans/2026-04-14-002-fix-runseed-zero-record-lockout-plan.md. Behavior-
preserving by construction: legacy-shape Redis values unwrap as { _seed: null,
data: raw } and pass through every helper unchanged.
New files:
- scripts/_seed-envelope-source.mjs — single source of truth for unwrapEnvelope,
stripSeedEnvelope, buildEnvelope.
- api/_seed-envelope.js — edge-safe mirror (AGENTS.md:80 forbids api/* importing
from server/).
- server/_shared/seed-envelope.ts — TS mirror with SeedMeta, SeedEnvelope,
UnwrapResult types.
- scripts/_seed-contract.mjs — SeedContractError + validateDescriptor (10
required fields, 10 optional, unknown-field rejection) + resolveRecordCount
(non-negative integer or throw).
- scripts/verify-seed-envelope-parity.mjs — diffs function bodies between the
two JS copies; TS copy guarded by tsc.
- tests/seed-envelope.test.mjs — 14 tests for the three helpers (null,
legacy-passthrough, stringified JSON, round-trip).
- tests/seed-contract.test.mjs — 25 tests for validateDescriptor/
resolveRecordCount + a soft-warn conformance scan that STATICALLY parses
scripts/seed-*.mjs (never dynamic import — several seeders process.exit() at
module load). Currently logs 91 seeders awaiting declareRecords migration.
Wiring (minimal, behavior-preserving):
- api/health.js: imports unwrapEnvelope; routes readSeedMeta's parsed value
through it. Legacy meta has no _seed wrapper → passes through unchanged.
- scripts/_bundle-runner.mjs: readSectionFreshness prefers envelope at
section.canonicalKey when present, falls back to the existing
seed-meta:<key> read via section.seedMetaKey (unchanged path today since no
bundle defines canonicalKey yet).
No seeder modified. No writes changed. All 5279 existing data tests still
green; both typechecks clean; parity verifier green; 39 new tests pass.
PR 2 will migrate seeders, bundles, and readers to envelope semantics. PR 3
removes the legacy path and hard-fails the conformance test.
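To illustrate the legacy-passthrough rule described above, here is a minimal sketch of the three helpers. The envelope layout ({ _seed, data }), the null handling, and the recordCount meta field are assumptions made for this example; the real implementations live in scripts/_seed-envelope-source.mjs and may differ.

```javascript
// Hypothetical sketch, not the repository's implementation.
// Assumed envelope layout: { _seed: <meta object>, data: <payload> }.

function unwrapEnvelope(raw) {
  if (raw == null) return { _seed: null, data: null };
  let value = raw;
  if (typeof value === 'string') {
    // Stored values may arrive as stringified JSON; keep the raw string
    // when it does not parse.
    try { value = JSON.parse(value); } catch { return { _seed: null, data: raw }; }
  }
  const isEnvelope =
    value !== null && typeof value === 'object' && !Array.isArray(value) &&
    value._seed !== null && typeof value._seed === 'object' && 'data' in value;
  // Legacy shapes (no _seed wrapper) pass through unchanged as `data`.
  return isEnvelope
    ? { _seed: value._seed, data: value.data }
    : { _seed: null, data: value };
}

function stripSeedEnvelope(raw) {
  return unwrapEnvelope(raw).data;
}

function buildEnvelope(meta, data) {
  return { _seed: { ...meta }, data };
}
```

Under these assumptions a reader can call unwrapEnvelope on every value and branch on `_seed === null` to detect a key that has not been migrated yet.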
* fix(seed-contract): address PR #3095 review — metaTtlSeconds opt, bundle fallback, strict conformance mode
Review findings applied:
P1 — metaTtlSeconds missing from OPTIONAL_FIELDS whitelist.
scripts/seed-jodi-gas.mjs:250 passes metaTtlSeconds to runSeed(); field is
consumed by _seed-utils writeSeedMeta. Without it in the whitelist, PR 2's
validateDescriptor wiring would throw 'unknown field' the moment jodi-gas
migrates. Added with a 'removed in PR 3' note.
P2 — Bundle canonicalKey short-circuit over-runs during migration.
readSectionFreshness previously returned null if canonicalKey had no envelope
yet, even when a legacy seed-meta key was also declared — making every cron
re-run the section. Fixed to fall through to seedMetaKey on null envelope so
the transition state is safe.
P3 — Conformance soft-warn signal was invisible in CI.
tests/seed-contract.test.mjs now emits a t.diagnostic summary line
('N/M seeders export declareRecords') visible on every run and gates hard-fail
behind SEED_CONTRACT_STRICT=1 so PR 3 can flip to strict without more code.
Nitpick — parity regex missed 'export async function'.
Added '(?:async\s+)?' to scripts/verify-seed-envelope-parity.mjs function
extraction regex.
Verified: 39 tests green, parity verifier green, strict mode correctly
hard-fails with 91 seeders missing (expected during PR 1).
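The effect of the '(?:async\s+)?' addition can be checked in isolation. The snippet below reuses the extraction pattern from scripts/verify-seed-envelope-parity.mjs; the helper name exportedFunctionNames and the sample source (including loadEnvelope) are made up for the example.

```javascript
// Pattern as it appears in scripts/verify-seed-envelope-parity.mjs.
const pattern = /export\s+(?:async\s+)?function\s+(\w+)\s*(?:<[^>]+>)?\s*\(/g;

// Hypothetical helper: collect the names of exported function declarations.
function exportedFunctionNames(source) {
  return [...source.matchAll(pattern)].map((m) => m[1]);
}

const sample = [
  'export function unwrapEnvelope(raw) {}',
  'export async function loadEnvelope(key) {}', // hypothetical async helper
  'function internalOnly() {}',
].join('\n');

console.log(exportedFunctionNames(sample));
// → [ 'unwrapEnvelope', 'loadEnvelope' ]
```

Without the optional async group, the second declaration would not match and an async helper would be skipped by the parity check.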
* fix(seed-contract): address review round 2 — NaN/empty-string validation, Error cause, parity CI wiring
P2 — Non-finite ttlSeconds/maxStaleMin bypassed validation.
`typeof NaN === 'number'` and `NaN > 0 === false` meant a NaN duration
passed the old typeof+<=0 checks and would have poisoned TTLs once
validateDescriptor is wired into runSeed. Now gated by Number.isFinite,
which rejects NaN and ±Infinity. Tests added for NaN/Infinity on both
fields.
P2 — Empty/whitespace-only strings for domain/resource/canonicalKey/sourceVersion
bypassed validation. Added .trim() === '' rejection + tests per field.
This mattered because canonicalKey='' would have landed writes at the
empty key and seed-meta under a blank resource namespace.
P3 — SeedContractError silently dropped the Error v2 cause option.
Constructor now forwards { cause } through super() so err.cause works
with standard tooling (Node's default stack printer, Sentry chained-cause
serialization). resolveRecordCount's manual err.cause = err assignment
was replaced with the options-bag form. Test added for both constructor
direct-use and the resolveRecordCount wrap path.
P3 — Parity verifier was not on an automated path. Added
tests/seed-envelope-parity.test.mjs which spawns scripts/verify-seed-envelope-parity.mjs
via execFile; non-zero exit (drift) → test fails. Now runs as part of
`npm run test:data` (tsx --test tests/*.test.mjs). Drift injection
confirmed: sed -i modifying api/_seed-envelope.js makes the test fail
with 'Command failed' from execFile.
51 tests total (was 39). All green on clean tree.
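The validation rules from this round can be sketched as follows. Only the SeedContractError name comes from the change itself; the two assert helpers and their messages are hypothetical, and the real validateDescriptor may structure these checks differently.

```javascript
// Hypothetical sketch; only the SeedContractError name is from the source.
class SeedContractError extends Error {
  constructor(message, options = {}) {
    // Forward the options bag so the built-in Error constructor sets `cause`.
    super(message, options.cause !== undefined ? { cause: options.cause } : undefined);
    this.name = 'SeedContractError';
  }
}

// Number.isFinite rejects NaN, ±Infinity and non-numbers in one check;
// the `<= 0` test then rejects zero and negatives.
function assertPositiveFinite(field, value) {
  if (!Number.isFinite(value) || value <= 0) {
    throw new SeedContractError(`${field} must be a positive finite number`);
  }
}

// Reject non-strings and empty or whitespace-only strings.
function assertNonEmptyString(field, value) {
  if (typeof value !== 'string' || value.trim() === '') {
    throw new SeedContractError(`${field} must be a non-empty string`);
  }
}
```

A plain `typeof value === 'number' && value > 0` check would still reject NaN here, but the old check was phrased as typeof plus `<= 0`, which NaN slips past because every comparison with NaN is false.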
* fix(seed-contract): conformance test checks full descriptor, not just declareRecords
Previous conformance check green-lit any seeder that exported
declareRecords, even if the runSeed(...) call-site omitted other
validateDescriptor-required opts (validateFn, ttlSeconds, sourceVersion,
schemaVersion, maxStaleMin). That would have produced a false readiness
signal for PR 3's strict flip: test goes green, but wiring
validateDescriptor() into runSeed in PR 2 would still throw at runtime
across the fleet.
Examples verified on the PR head:
- scripts/seed-cot.mjs:188-192 — no sourceVersion/schemaVersion/maxStaleMin
- scripts/seed-market-breadth.mjs:121-124 — same
- scripts/seed-jodi-gas.mjs:248-253 — no schemaVersion/maxStaleMin
Now the conformance test:
1. AST-lite extracts the runSeed(...) call site with balanced parens,
tolerating strings and comments.
2. Checks every REQUIRED_OPTS_FIELDS entry (validateFn, declareRecords,
ttlSeconds, sourceVersion, schemaVersion, maxStaleMin) is present as
an object key in that call-site.
3. Emits a per-file diagnostic listing missing fields.
4. Migration signal is now accurate: 0/91 seeders fully satisfy the
descriptor (was claiming 0/91 missing just declareRecords). Matches
the underlying validateDescriptor behavior.
Verified: strict mode (SEED_CONTRACT_STRICT=1) surfaces 'opt:schemaVersion,
opt:maxStaleMin' as missing fields per seeder — actionable for PR 2
migration work. 51 tests total (unchanged count; behavior change is in
which seeders the one conformance test considers migrated).
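A rough sketch of the per-field presence check in step 2 follows. The field list is taken from the description above; the helper name missingDescriptorFields, the regex, and the sample call text are assumptions, and the actual test may detect keys differently.

```javascript
// Field list from the commit message; the helper itself is hypothetical.
const REQUIRED_OPTS_FIELDS = [
  'validateFn', 'declareRecords', 'ttlSeconds',
  'sourceVersion', 'schemaVersion', 'maxStaleMin',
];

// Return the required fields that do not appear as an object key
// (`name:` or shorthand `name,` / `name }`) in the runSeed(...) call text.
// This is a textual heuristic, so a field used only as a value can still
// produce a false positive.
function missingDescriptorFields(callBody) {
  return REQUIRED_OPTS_FIELDS.filter((field) => {
    const asKey = new RegExp(`(^|[^\\w$.])${field}\\s*(:|,|\\})`);
    return !asKey.test(callBody);
  });
}

// Hypothetical call-site text, not copied from any seeder.
const call = 'runSeed({ validateFn, declareRecords, ttlSeconds: 3600 })';
console.log(missingDescriptorFields(call));
// → [ 'sourceVersion', 'schemaVersion', 'maxStaleMin' ]
```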
* fix(seed-contract): strip comments/strings before parsing runSeed() call site
The conformance scanner located the first 'runSeed(' substring in the raw
source, which caught commented-out mentions upstream of the real call.
Offending files where this produced false 'incomplete' diagnoses:
- scripts/seed-bis-data.mjs:209 // runSeed() calls process.exit(0)…
real call at :220
- scripts/seed-economy.mjs:788 header comment mentioning runSeed()
real call at :891
Three files had the same pattern. Under strict mode these would have been
false hard failures in PR 3 even when the real descriptor was migrated.
Fix:
- stripCommentsAndStrings(src) produces a view where block comments, line
comments, and string/template literals are replaced with spaces (line
feeds preserved). Indices stay aligned with the original source so
extractRunSeedCall can match against the stripped view and then slice
the original source for the real call body.
- descriptorFieldsPresent() also runs its field-presence regex against
the stripped call body so '// TODO: validateFn' inside the call doesn't
fool the check.
- hasRunSeedCall() uses the stripped view too, which correctly excludes
5 seeders that only mentioned runSeed in comments. Count dropped
91→86 real callers.
Added 4 targeted tests covering:
- runSeed() inside a line comment ahead of the real call
- runSeed() inside a block comment
- runSeed() inside a string literal ("don't call runSeed() directly")
- descriptor field names inside an inline comment don't count as present
Verified on the actual files: seed-bis-data.mjs first real runSeed( in
stripped source is at line 220 (was line 209 before fix).
40 tests total, all green.
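The index-preserving strip described in the fix can be approximated as below. This is a simplified sketch rather than the helper from the test file: it ignores regex literals and template-literal interpolation, and its internals are assumptions.

```javascript
// Hypothetical sketch of an index-preserving strip.
function stripCommentsAndStrings(src) {
  const out = src.split('');
  // Blank out [from, to) but keep line feeds so line numbers stay aligned.
  const blank = (from, to) => {
    for (let k = from; k < to && k < out.length; k++) {
      if (out[k] !== '\n') out[k] = ' ';
    }
  };
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    const next = src[i + 1];
    if (ch === '/' && next === '/') {
      const nl = src.indexOf('\n', i);
      const end = nl < 0 ? src.length : nl;
      blank(i, end);
      i = end;
    } else if (ch === '/' && next === '*') {
      const close = src.indexOf('*/', i + 2);
      const end = close < 0 ? src.length : close + 2;
      blank(i, end);
      i = end;
    } else if (ch === '"' || ch === "'" || ch === '`') {
      let j = i + 1;
      while (j < src.length && src[j] !== ch) {
        j += src[j] === '\\' ? 2 : 1; // skip escaped characters
      }
      const end = Math.min(j + 1, src.length);
      blank(i, end);
      i = end;
    } else {
      i++;
    }
  }
  return out.join('');
}
```

Because the output has the same length and the same line feeds as the input, an offset found in the stripped view (for example the first 'runSeed(') can be used directly to slice the original source.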
* fix(seed-contract): parity verifier survives unbalanced braces in string/template literals
Addresses Greptile P2 on PR #3095: the body extractor in
scripts/verify-seed-envelope-parity.mjs counted raw { and } on every
character. A future helper body that legitimately contains
`const marker = '{'` would have thrown the depth count off at the literal
brace and overrun the body, and a literal '}' would have truncated it,
silently masking drift in the rest of the function.
Extracted the scan into scanBalanced(source, start, open, close) which
skips characters inside line comments, block comments, and string /
template literals (with escape handling and template-literal ${} recursion
for interpolation). Call sites in extractFunctions updated to use the new
scanner for both the arg-list parens and the function body braces.
Made extractFunctions and scanBalanced exported so the new test file
can exercise them directly. Gated main() behind an isMain check so
importing the module from tests doesn't trigger process.exit.
New tests in tests/seed-envelope-parity.test.mjs:
- extractFunctions tolerates unbalanced braces in string literals
- same for template literals
- same for braces inside block comments
- same for braces inside line comments
- scanBalanced respects backslash-escapes inside strings
- scanBalanced recurses into template-literal ${} interpolation
Also addresses the other two Greptile P2s which were already fixed in
earlier commits on this branch:
- Empty-string gap (99646dd9a): .trim()==='' rejection added
- SeedContractError cause drop (99646dd9a): constructor forwards cause
through super's options bag per Error v2 spec
61 tests green. Both typechecks clean.
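For reference, the failure mode that motivated scanBalanced is easy to reproduce with a naive counter. The function below is written for this illustration only and is not the verifier's previous code.

```javascript
// Naive scanner for illustration: counts every brace, including those inside
// string literals. `start` must be the index of the opening `{`.
function naiveBodyEnd(source, start) {
  let depth = 0;
  for (let i = start; i < source.length; i++) {
    if (source[i] === '{') depth++;
    else if (source[i] === '}') depth--;
    if (depth === 0) return i + 1; // one past the brace that closed the body
  }
  return source.length;
}

const fn = "function f() { const marker = '}'; return 1; }";
const open = fn.indexOf('{');
const naive = fn.slice(open, naiveBodyEnd(fn, open));
console.log(naive); // the body is cut off at the brace inside the string
```

Everything after the quoted brace is dropped from the extracted body, so a change to `return 1` in one copy would go unnoticed by a comparison built on this scanner.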
188 lines
7.2 KiB
JavaScript
#!/usr/bin/env node
// Verify that the seed-envelope helper files stay in sync.
//
// The source of truth is scripts/_seed-envelope-source.mjs. Two mirrored copies
// live at:
//   - api/_seed-envelope.js (edge-safe, for api/*.js)
//   - server/_shared/seed-envelope.ts (TypeScript, for server/ and scripts/)
//
// The check is function-by-function: every function exported from the source
// must appear in api/_seed-envelope.js with an identical runtime body (after
// normalizing comments and whitespace). The TypeScript copy carries additional
// type declarations and is guarded by tsc rather than diffed here; see
// "Parity scope" below.
//
// Exit 1 with a diff on drift.

import { readFile } from 'node:fs/promises';
import { fileURLToPath } from 'node:url';
import { dirname, resolve } from 'node:path';

const here = dirname(fileURLToPath(import.meta.url));
const repoRoot = resolve(here, '..');

// Parity scope.
//
// Source of truth: scripts/_seed-envelope-source.mjs (plain JS, hand-authored).
// Must-match copy: api/_seed-envelope.js (plain JS, hand-authored).
//
// The TypeScript copy at server/_shared/seed-envelope.ts is type-checked by
// `tsc` and reviewed manually. It is NOT diffed here because TS-specific casts
// (`as any`, `as SeedMeta`, etc.) can't be stripped without introducing their
// own bug class. The drift risk on the TS file is mitigated by (a) the header
// comment in that file forbidding direct edits, (b) the typecheck guard, and
// (c) code review. If we ever need stricter enforcement, a separate AST-aware
// comparator can run over the TS file.
const SOURCE = resolve(repoRoot, 'scripts/_seed-envelope-source.mjs');
const EDGE = resolve(repoRoot, 'api/_seed-envelope.js');

/**
 * Extract bare function bodies from a source file, keyed by name.
 * Returns a Map<name, body> where body is the function's implementation with
 * comments removed and whitespace normalized. Nothing else is stripped, so
 * type annotations inside a body would register as drift.
 *
 * Exported so tests can exercise brace/string edge cases directly.
 */
export function extractFunctions(source) {
  const fns = new Map();
  // Match: export function NAME<generics?>(args): returnType? { body }
  // We capture NAME and the brace-balanced body.
  const pattern = /export\s+(?:async\s+)?function\s+(\w+)\s*(?:<[^>]+>)?\s*\(/g;
  let match;
  while ((match = pattern.exec(source)) != null) {
    const name = match[1];
    const afterParen = match.index + match[0].length;
    // Balance the arg-list parens, skipping string / template / comment bodies.
    // scanBalanced expects `start` to point at (or before) the opening
    // delimiter; `afterParen` is one past it, so step back.
    let i = scanBalanced(source, afterParen - 1, '(', ')');
    // Skip to opening { (may cross return-type annotations that contain `:`).
    while (i < source.length && source[i] !== '{') i++;
    if (i >= source.length) continue;
    // Balance the function body's braces using the same string/comment-aware
    // scanner. A raw brace inside a string literal like `const marker = '{'`
    // used to throw `depth` off and either truncate or overrun the body.
    const bodyStart = i + 1;
    // `i` points at the opening `{`, which is exactly what scanBalanced wants.
    i = scanBalanced(source, i, '{', '}');
    const bodyEnd = i - 1;
    const body = source.slice(bodyStart, bodyEnd);
    // Bodies must be VERBATIM identical across the compared files (parity rule).
    // Type annotations are only permitted OUTSIDE function bodies: signatures,
    // top-level interfaces, etc. We compare normalized (whitespace/comments
    // collapsed) bodies but never strip type syntax from inside them.
    fns.set(name, normalize(body));
  }
  return fns;
}

/**
 * Scan from `start` (which must point AT or just before the opening delimiter),
 * balancing `open`/`close` while skipping characters inside line comments,
 * block comments, and string / template literals. Returns the index one past
 * the matching close delimiter. If input is malformed we return `source.length`
 * so the caller still produces a (truncated) body rather than an infinite loop.
 */
export function scanBalanced(source, start, open, close) {
  let i = start;
  // Align `i` to the opening delimiter if it isn't already.
  while (i < source.length && source[i] !== open) i++;
  if (i >= source.length) return source.length;
  let depth = 1;
  i++;
  while (i < source.length && depth > 0) {
    const ch = source[i];
    const next = source[i + 1];
    if (ch === '/' && next === '/') {
      const nl = source.indexOf('\n', i);
      i = nl < 0 ? source.length : nl;
      continue;
    }
    if (ch === '/' && next === '*') {
      const c = source.indexOf('*/', i + 2);
      i = c < 0 ? source.length : c + 2;
      continue;
    }
    if (ch === '"' || ch === "'" || ch === '`') {
      let j = i + 1;
      while (j < source.length && source[j] !== ch) {
        if (source[j] === '\\' && j + 1 < source.length) { j += 2; continue; }
        // Template-literal interpolation `${ ... }`: recurse to skip matched
        // braces inside the interpolation so an expression like `${{a:1}}`
        // doesn't leak a stray `}` into our outer body balance.
        if (ch === '`' && source[j] === '$' && source[j + 1] === '{') {
          j = scanBalanced(source, j + 1, '{', '}');
          continue;
        }
        j++;
      }
      i = j + 1;
      continue;
    }
    if (ch === open) depth++;
    else if (ch === close) depth--;
    i++;
  }
  return i;
}

function normalize(s) {
  return s
    .replace(/\/\/[^\n]*/g, '')
    .replace(/\/\*[\s\S]*?\*\//g, '')
    .replace(/\s+/g, ' ')
    .trim();
}

const EXPECTED_EXPORTS = ['unwrapEnvelope', 'stripSeedEnvelope', 'buildEnvelope'];

async function main() {
  const [sourceSrc, edgeSrc] = await Promise.all([
    readFile(SOURCE, 'utf8'),
    readFile(EDGE, 'utf8'),
  ]);

  const sourceFns = extractFunctions(sourceSrc);
  const edgeFns = extractFunctions(edgeSrc);

  const errors = [];

  for (const name of EXPECTED_EXPORTS) {
    if (!sourceFns.has(name)) errors.push(`source missing export: ${name}`);
    if (!edgeFns.has(name)) errors.push(`api/_seed-envelope.js missing export: ${name}`);
  }

  if (errors.length) {
    console.error('Missing exports:');
    for (const e of errors) console.error(` ${e}`);
    process.exit(1);
  }

  for (const name of EXPECTED_EXPORTS) {
    const src = sourceFns.get(name);
    const edge = edgeFns.get(name);
    if (src !== edge) {
      errors.push(`drift: api/_seed-envelope.js::${name} differs from source.\n source: ${src}\n edge: ${edge}`);
    }
  }

  if (errors.length) {
    console.error('Seed-envelope parity check FAILED:');
    for (const e of errors) console.error(`\n ${e}`);
    process.exit(1);
  }

  console.log('seed-envelope parity: OK (3 exports verified across source + edge). TS mirror checked by tsc.');
}

// isMain guard: only run the verifier when invoked directly as a CLI. Tests
// import this module to exercise extractFunctions/scanBalanced in isolation,
// and running main() on import would trigger process.exit from the test
// process. Compare resolved filesystem paths rather than URL suffixes so the
// check also holds for Windows paths and paths with percent-encoded characters.
const isMain = Boolean(process.argv[1]) && fileURLToPath(import.meta.url) === resolve(process.argv[1]);
if (isMain) {
  main().catch((err) => {
    console.error('verify-seed-envelope-parity: unexpected error', err);
    process.exit(1);
  });
}