mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-25 17:14:57 +02:00
* feat(energy-atlas): GEM pipeline data import — gas 75→297, oil 75→334 (parity-push closure)

Closes the ~3.6× pipeline-scale gap that PR #3397's import infrastructure was built for. Per docs/methodology/pipelines.mdx operator runbook.

Source releases (CC-BY 4.0, attribution preserved in registry envelope):
- GEM-GGIT-Gas-Pipelines-2025-11.xlsx
  SHA256: f56d8b14400e558f06e53a4205034d3d506fc38c5ae6bf58000252f87b1845e6
  URL: https://globalenergymonitor.org/wp-content/uploads/2025/11/GEM-GGIT-Gas-Pipelines-2025-11.xlsx
- GEM-GOIT-Oil-NGL-Pipelines-2025-03.xlsx
  SHA256: d1648d28aed99cfd2264047f1e944ddfccf50ce9feeac7de5db233c601dc3bb2
  URL: https://globalenergymonitor.org/wp-content/uploads/2025/03/GEM-GOIT-Oil-NGL-Pipelines-2025-03.xlsx

Pre-conversion: GeoJSON (geometry endpoints) + XLSX (column properties) → canonical operator-shape JSON via /tmp/gem-import/convert.py.

Filter knobs:
- status ∈ {operating, construction}
- length ≥ 750 km (gas) / 400 km (oil) — asymmetric per-fuel trunk-class
- capacity unit conversions: bcm/y native; MMcf/d, MMSCMD, mtpa, m3/day, bpd, Mb/d, kbd → bcm/y (gas) or bbl/d (oil) at canonical conversion factors.
- Country names → ISO 3166-1 alpha-2 via pycountry + alias table.

Merge results (via scripts/import-gem-pipelines.mjs --merge):
gas: +222 added, 15 duplicates skipped (haversine ≤ 5km AND token Jaccard ≥ 0.6)
oil: +259 added, 16 duplicates skipped
Final: 297 gas / 334 oil.

Hand-curated 75+75 preserved with full evidence; GEM rows ship physicalStateSource='gem', classifierConfidence=0.4, operatorStatement=null, sanctionRefs=[].

Floor bump: scripts/_pipeline-registry.mjs MIN_PIPELINES_PER_REGISTRY 8 → 200. Live counts (297/334) leave ~100 rows of jitter headroom so a partial re-import or coverage-narrowing release fails loud rather than halving the registry silently.

Tests:
- tests/pipelines-registry.test.mts: bumped synthetic-registry Array.from({length:8}) → length:210 to clear new floor; added 'gem' to the evidence-source whitelist for non-flowing badges (parity with the derivePipelinePublicBadge audit done in PR #3397 U1).
- tests/import-gem-pipelines.test.mjs: bumped registry-conformance loop 3 → 70 to clear new floor.
- 51/51 pipeline tests pass; tsc --noEmit clean.

vs peer reference site (281 gas + 265 oil): we now match (gas 297) and exceed (oil 334). Functional + visual + data parity for the energy variant is closed; remaining gaps are editorial-cadence (weekly briefing) which is intentionally out of scope per the parity-push plan.

* docs(energy-atlas): land GEM converter + expand methodology runbook for quarterly refresh

PR #3406 imported the data but didn't land the conversion script that produced it. This commit lands the converter at scripts/_gem-geojson-to-canonical.py so future operators can reproduce the import deterministically, and rewrites the docs/methodology/pipelines.mdx runbook to match what actually works:
- Use GeoJSON (not XLSX) — the XLSX has properties but no lat/lon columns; only the GIS .zip's GeoJSON has both. The original runbook said to download XLSX which would fail at the lat/lon validation step.
- Cadence: quarterly refresh, with concrete signals (peer-site comparison, 90-day calendar reminder).
- Source datasets: explicit GGIT (gas) + GOIT (oil/NGL) tracker names so future operators don't re-request the wrong dataset (the Extraction Tracker = wells/fields, NOT pipelines — ours requires the Infrastructure Trackers).
- Last-known-good URLs documented + URL pattern explained as fallback when GEM rotates per release.
- Filter knob defaults documented inline (gas ≥ 750km, oil ≥ 400km, status ∈ {operating, construction}, capacity unit conversion table).
- Failure-mode table mapping common errors to fixes.

Converter takes paths via env vars (GEM_GAS_GEOJSON, GEM_OIL_GEOJSON, GEM_DOWNLOADED_AT, GEM_SOURCE_VERSION) instead of hardcoded paths so it works for any release without code edits.

* fix(energy-atlas): close PR #3406 review findings — dedup + zero-length + test

Three Greptile findings on PR #3406:

P1 — Dedup miss (Dampier-Bunbury): Same physical pipeline existed in both registries — curated `dampier-bunbury` and GEM-imported `dampier-to-bunbury-natural-gas-pipeline-au` — because GEM digitized only the southern 60% of the line. The shared Bunbury terminus matched at 13.7 km but the average-endpoint distance was 287 km, well over the 5 km gate.
Fix: scripts/_pipeline-dedup.mjs adds a name-set-identity short-circuit — if Jaccard == 1.0 (after stopword removal) AND any of the 4 endpoint pairings is ≤ 25 km, treat as duplicate. The 25 km anchor preserves the existing "name collision in different ocean → still added" contract.
Added regression test: identical Dampier-Bunbury inputs → 0 added, 1 skipped, matched against `dampier-bunbury`.

P1 — Zero-length geometry (9 rows: Trans-Alaska, Enbridge Line 3, Ichthys, etc.): GEM source GeoJSON occasionally has a Point geometry or single-coord LineString, producing pipelines where startPoint == endPoint. They render as map-point artifacts and skew aggregate-length stats.
Fix (defense in depth):
- scripts/_gem-geojson-to-canonical.py drops at conversion time (`zero_length` reason in drop log).
- scripts/_pipeline-registry.mjs validateRegistry rejects defensively so even a hand-curated row with degenerate geometry fails loud.

P2 — Test repetition coupled to fixture row count: Hardcoded `for (let i = 0; i < 70; i++)` × 3 fixture rows = 210 silently breaks if fixture is trimmed below 3.
Fix: `Math.ceil(REGISTRY_FLOOR / fixture.length) + 5` derives reps from the floor and current fixture length.

Re-run --merge with all fixes applied:
gas: 75 → 293 (+218 added, 17 deduped — was 222/15 before; +2 catches via name-set-identity short-circuit; -2 zero-length never imported)
oil: 75 → 325 (+250 added, 18 deduped — was 259/16; +2 catches; -7 zero-length)

Tests: 74/74 pipeline tests pass; tsc --noEmit clean.
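
The dedup rule described in the commit message above (average endpoint distance within 5 km AND token Jaccard of at least 0.6, plus the name-set-identity short-circuit with a 25 km anchor) could be sketched roughly as follows. This is an illustration only, not the contents of scripts/_pipeline-dedup.mjs: the function names, the `{ name, start, end }` row shape, the stopword list, and the choice to evaluate both endpoint orientations are all assumptions.

```javascript
// Hypothetical sketch of the dedup gate; NOT the real scripts/_pipeline-dedup.mjs.
const EARTH_RADIUS_KM = 6371;
// Assumed stopword list; the real list is not shown in the source.
const STOPWORDS = new Set(['pipeline', 'gas', 'oil', 'natural', 'to', 'the']);

// Great-circle distance between two { lat, lon } points, in kilometres.
function haversineKm(a, b) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1, Math.sqrt(h)));
}

// Lowercased name tokens with stopwords removed.
function nameTokens(name) {
  return new Set(
    name.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t && !STOPWORDS.has(t)),
  );
}

// Jaccard similarity of two token sets: |A ∩ B| / |A ∪ B|.
function jaccard(a, b) {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  for (const t of a) if (b.has(t)) shared++;
  return shared / (a.size + b.size - shared);
}

function isDuplicate(existing, candidate) {
  const sim = jaccard(nameTokens(existing.name), nameTokens(candidate.name));
  // The four endpoint pairings between the two rows.
  const d = {
    ss: haversineKm(existing.start, candidate.start),
    ee: haversineKm(existing.end, candidate.end),
    se: haversineKm(existing.start, candidate.end),
    es: haversineKm(existing.end, candidate.start),
  };
  // Average endpoint distance, taking the better of the two orientations
  // (assumption: a reversed digitization should still match).
  const avgKm = Math.min((d.ss + d.ee) / 2, (d.se + d.es) / 2);
  if (avgKm <= 5 && sim >= 0.6) return true;
  // Name-set-identity short-circuit: identical token sets plus any one
  // endpoint pairing within 25 km.
  const closestKm = Math.min(d.ss, d.ee, d.se, d.es);
  return sim === 1 && closestKm <= 25;
}
```

With illustrative (not surveyed) coordinates, a curated `Dampier-Bunbury` row and a partially digitized `Dampier to Bunbury Natural Gas Pipeline` row that share only the southern terminus would fail the 5 km average gate but match through the short-circuit, while two identically named rows on different continents would still be treated as distinct.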
282 lines
10 KiB
JavaScript
// @ts-check
//
// Tests for scripts/import-gem-pipelines.mjs — the GEM Oil & Gas Infrastructure
// Tracker → registry-shape parser. Test-first per the plan's Execution note: the
// schema-sentinel + status/productClass/capacity-unit mapping is the highest-
// risk failure mode, so coverage for it lands before the implementation does.
//
// Fixture: tests/fixtures/gem-pipelines-sample.json — operator-shape JSON
// (Excel pre-converted externally; the parser is local-file-only, no xlsx
// dep, no runtime URL fetch).

import { strict as assert } from 'node:assert';
import { test, describe } from 'node:test';
import { readFileSync } from 'node:fs';
import { resolve, dirname } from 'node:path';
import { fileURLToPath } from 'node:url';
import { parseGemPipelines, REQUIRED_COLUMNS } from '../scripts/import-gem-pipelines.mjs';
import { validateRegistry } from '../scripts/_pipeline-registry.mjs';

const __dirname = dirname(fileURLToPath(import.meta.url));
const fixturePath = resolve(__dirname, 'fixtures/gem-pipelines-sample.json');
const fixture = JSON.parse(readFileSync(fixturePath, 'utf-8'));

describe('import-gem-pipelines — schema sentinel', () => {
  test('REQUIRED_COLUMNS is exported and non-empty', () => {
    assert.ok(Array.isArray(REQUIRED_COLUMNS));
    assert.ok(REQUIRED_COLUMNS.length >= 5);
  });

  test('throws on missing required column', () => {
    const broken = {
      ...fixture,
      pipelines: fixture.pipelines.map((p) => {
        const { name: _drop, ...rest } = p;
        return rest;
      }),
    };
    assert.throws(
      () => parseGemPipelines(broken),
      /missing|name|schema/i,
      'parser must throw on column drift, not silently accept',
    );
  });

  test('throws on non-object input', () => {
    assert.throws(() => parseGemPipelines(null), /input/i);
    assert.throws(() => parseGemPipelines([]), /input|pipelines/i);
  });

  test('throws when pipelines field is missing', () => {
    assert.throws(() => parseGemPipelines({ source: 'test' }), /pipelines/i);
  });
});

describe('import-gem-pipelines — fuel split', () => {
  test('splits gas + oil into two arrays', () => {
    const { gas, oil } = parseGemPipelines(fixture);
    assert.equal(gas.length, 3, 'fixture has 3 gas rows');
    assert.equal(oil.length, 3, 'fixture has 3 oil rows');
  });

  test('gas pipelines do NOT carry productClass (gas registry forbids it)', () => {
    const { gas } = parseGemPipelines(fixture);
    for (const p of gas) {
      assert.equal(p.productClass, undefined, `${p.name}: gas should not have productClass`);
    }
  });

  test('every oil pipeline declares a productClass from the enum', () => {
    const { oil } = parseGemPipelines(fixture);
    for (const p of oil) {
      assert.ok(
        ['crude', 'products', 'mixed'].includes(p.productClass),
        `${p.name} has invalid productClass: ${p.productClass}`,
      );
    }
  });
});

describe('import-gem-pipelines — status mapping', () => {
  test("'Operating' maps to physicalState='flowing'", () => {
    const { gas, oil } = parseGemPipelines(fixture);
    const op = [...gas, ...oil].filter((p) => p.name.includes('Operating'));
    assert.ok(op.length > 0);
    for (const p of op) {
      assert.equal(p.evidence.physicalState, 'flowing');
    }
  });

  test("'Construction' maps to physicalState='unknown' (planned/not commissioned)", () => {
    const { gas } = parseGemPipelines(fixture);
    const ctr = gas.find((p) => p.name.includes('Construction'));
    assert.ok(ctr);
    assert.equal(ctr.evidence.physicalState, 'unknown');
  });

  test("'Cancelled' / 'Mothballed' map to physicalState='offline'", () => {
    const { gas, oil } = parseGemPipelines(fixture);
    const cancelled = gas.find((p) => p.name.includes('Cancelled'));
    const mothballed = oil.find((p) => p.name.includes('Mothballed'));
    assert.ok(cancelled);
    assert.ok(mothballed);
    assert.equal(cancelled.evidence.physicalState, 'offline');
    assert.equal(mothballed.evidence.physicalState, 'offline');
  });
});

describe('import-gem-pipelines — productClass mapping', () => {
  test("'Crude Oil' product → productClass='crude'", () => {
    const { oil } = parseGemPipelines(fixture);
    const crude = oil.find((p) => p.name.includes('Crude Oil Trunk'));
    assert.ok(crude);
    assert.equal(crude.productClass, 'crude');
  });

  test("'Refined Products' product → productClass='products'", () => {
    const { oil } = parseGemPipelines(fixture);
    const refined = oil.find((p) => p.name.includes('Refined Products'));
    assert.ok(refined);
    assert.equal(refined.productClass, 'products');
  });
});

describe('import-gem-pipelines — capacity-unit conversion', () => {
  test('gas capacity in bcm/y is preserved unchanged', () => {
    const { gas } = parseGemPipelines(fixture);
    const opGas = gas.find((p) => p.name.includes('Operating'));
    assert.ok(opGas);
    assert.equal(opGas.capacityBcmYr, 24);
  });

  test('oil capacity in bbl/d is converted to capacityMbd (million barrels per day)', () => {
    const { oil } = parseGemPipelines(fixture);
    const crude = oil.find((p) => p.name.includes('Crude Oil Trunk'));
    assert.ok(crude);
    // Schema convention: the field is named `capacityMbd` (the customary
    // industry abbreviation) but the VALUE is in millions of barrels per
    // day, NOT thousands — matching the existing on-main hand-curated rows
    // (e.g. CPC pipeline ships as `capacityMbd: 1.4` for 1.4M bbl/d).
    // So 400_000 bbl/d ÷ 1_000_000 = 0.4 capacityMbd.
    assert.equal(crude.capacityMbd, 0.4);
  });

  test('oil capacity already in Mbd is preserved unchanged', () => {
    const { oil } = parseGemPipelines(fixture);
    const refined = oil.find((p) => p.name.includes('Refined Products'));
    assert.ok(refined);
    assert.equal(refined.capacityMbd, 0.65);
  });
});

describe('import-gem-pipelines — minimum-viable evidence', () => {
  test('every emitted candidate has physicalStateSource=gem', () => {
    const { gas, oil } = parseGemPipelines(fixture);
    for (const p of [...gas, ...oil]) {
      assert.equal(p.evidence.physicalStateSource, 'gem');
    }
  });

  test('every emitted candidate has classifierVersion=gem-import-v1', () => {
    const { gas, oil } = parseGemPipelines(fixture);
    for (const p of [...gas, ...oil]) {
      assert.equal(p.evidence.classifierVersion, 'gem-import-v1');
    }
  });

  test('every emitted candidate has classifierConfidence ≤ 0.5', () => {
    const { gas, oil } = parseGemPipelines(fixture);
    for (const p of [...gas, ...oil]) {
      assert.ok(p.evidence.classifierConfidence <= 0.5);
      assert.ok(p.evidence.classifierConfidence >= 0);
    }
  });

  test('every emitted candidate has empty sanctionRefs and null operatorStatement', () => {
    const { gas, oil } = parseGemPipelines(fixture);
    for (const p of [...gas, ...oil]) {
      assert.deepEqual(p.evidence.sanctionRefs, []);
      assert.equal(p.evidence.operatorStatement, null);
    }
  });
});

describe('import-gem-pipelines — registry-shape conformance', () => {
  // Compute the repeat count from the floor + the fixture row count so this
  // test stays correct if the fixture is trimmed or the floor is raised. The
  // hardcoded `for (let i = 0; i < 70; i++)` was fragile — Greptile P2 on PR
  // #3406. +5 over the floor leaves a safety margin without inflating the test.
  const REGISTRY_FLOOR = 200;

  test('emitted gas registry passes validateRegistry', () => {
    const { gas } = parseGemPipelines(fixture);
    const reps = Math.ceil(REGISTRY_FLOOR / gas.length) + 5;
    const repeated = [];
    for (let i = 0; i < reps; i++) {
      for (const p of gas) repeated.push({ ...p, id: `${p.id}-rep${i}` });
    }
    const reg = {
      pipelines: Object.fromEntries(repeated.map((p) => [p.id, p])),
    };
    assert.equal(validateRegistry(reg), true);
  });

  test('emitted oil registry passes validateRegistry', () => {
    const { oil } = parseGemPipelines(fixture);
    const reps = Math.ceil(REGISTRY_FLOOR / oil.length) + 5;
    const repeated = [];
    for (let i = 0; i < reps; i++) {
      for (const p of oil) repeated.push({ ...p, id: `${p.id}-rep${i}` });
    }
    const reg = {
      pipelines: Object.fromEntries(repeated.map((p) => [p.id, p])),
    };
    assert.equal(validateRegistry(reg), true);
  });
});

describe('import-gem-pipelines — determinism (review-fix #3)', () => {
  test('two parser runs on identical input produce identical output', () => {
    // Regression: pre-fix, lastEvidenceUpdate used new Date() per run, so
    // re-running parseGemPipelines on the same JSON on different days
    // produced different output → noisy diffs every quarterly re-import.
    // Now derived from envelope.downloadedAt, so output is byte-identical.
    const r1 = JSON.stringify(parseGemPipelines(fixture));
    const r2 = JSON.stringify(parseGemPipelines(fixture));
    assert.equal(r1, r2);
  });

  test('lastEvidenceUpdate derives from envelope.downloadedAt', () => {
    // Fixture has downloadedAt: 2026-04-25 → emitted as 2026-04-25T00:00:00Z.
    const { gas } = parseGemPipelines(fixture);
    for (const p of gas) {
      assert.equal(p.evidence.lastEvidenceUpdate, '2026-04-25T00:00:00Z');
    }
  });

  test('missing downloadedAt → epoch sentinel (loud failure, not silent today)', () => {
    // If the operator forgets the date field, the emitted timestamp should
    // be obviously wrong rather than today's wall clock — surfaces the
    // gap in code review of the data file.
    const noDate = { ...fixture };
    delete noDate.downloadedAt;
    delete noDate.sourceVersion;
    const { gas } = parseGemPipelines(noDate);
    for (const p of gas) {
      assert.equal(p.evidence.lastEvidenceUpdate, '1970-01-01T00:00:00Z');
    }
  });
});

describe('import-gem-pipelines — coordinate validity', () => {
  test('rows with invalid lat/lon are dropped (not silently kept with lat=0)', () => {
    const broken = {
      ...fixture,
      pipelines: [
        ...fixture.pipelines,
        {
          name: 'Test Bad Coords',
          operator: 'X',
          fuel: 'Natural Gas',
          product: '',
          fromCountry: 'XX',
          toCountry: 'YY',
          transitCountries: [],
          capacity: 5,
          capacityUnit: 'bcm/y',
          lengthKm: 100,
          status: 'Operating',
          startYear: 2020,
          startLat: 200, // out of range
          startLon: 0,
          endLat: 0,
          endLon: 0,
        },
      ],
    };
    const { gas } = parseGemPipelines(broken);
    const bad = gas.find((p) => p.name.includes('Bad Coords'));
    assert.equal(bad, undefined, 'row with out-of-range lat must be dropped, not coerced');
  });
});
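
Two conventions asserted by the tests in this file, the `capacityMbd` value being in millions of barrels per day and `lastEvidenceUpdate` falling back to an epoch sentinel, might be implemented along the following lines. This is a hedged sketch rather than the code in scripts/import-gem-pipelines.mjs: the helper names, the unit strings `'bbl/d'` and `'Mbd'`, and the `YYYY-MM-DD` shape check on `downloadedAt` are assumptions made for illustration.

```javascript
// Hypothetical helpers; names and unit strings are assumptions, not the
// real exports of scripts/import-gem-pipelines.mjs.
const EPOCH_SENTINEL = '1970-01-01T00:00:00Z';

// Normalize an oil capacity to the registry's `capacityMbd` field, whose
// value is in MILLIONS of barrels per day despite the "Mbd" name.
function toCapacityMbd(value, unit) {
  switch (unit) {
    case 'bbl/d':
      return value / 1_000_000; // 400_000 bbl/d becomes 0.4
    case 'Mbd':
      return value; // already in the registry convention
    default:
      // Fail loudly on unit drift rather than guessing a factor.
      throw new Error(`unsupported oil capacity unit: ${unit}`);
  }
}

// Derive a deterministic timestamp from the envelope's download date.
// Never reads the wall clock, so repeated runs give identical output.
function deriveLastEvidenceUpdate(envelope) {
  const day = envelope?.downloadedAt;
  if (typeof day === 'string' && /^\d{4}-\d{2}-\d{2}$/.test(day)) {
    return `${day}T00:00:00Z`;
  }
  // A missing or malformed date yields an obviously wrong timestamp.
  return EPOCH_SENTINEL;
}
```

Returning the sentinel instead of today's date keeps a forgotten `downloadedAt` visible in the data-file diff, which is the behaviour the determinism tests check for.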