mirror of
https://github.com/koala73/worldmonitor.git
synced 2026-04-25 17:14:57 +02:00
* feat(energy-atlas): GEM pipeline data import — gas 75→297, oil 75→334 (parity-push closure)

  Closes the ~3.6× pipeline-scale gap that PR #3397's import infrastructure was built for. Per the docs/methodology/pipelines.mdx operator runbook.

  Source releases (CC-BY 4.0, attribution preserved in registry envelope):
  - GEM-GGIT-Gas-Pipelines-2025-11.xlsx
    SHA256: f56d8b14400e558f06e53a4205034d3d506fc38c5ae6bf58000252f87b1845e6
    URL: https://globalenergymonitor.org/wp-content/uploads/2025/11/GEM-GGIT-Gas-Pipelines-2025-11.xlsx
  - GEM-GOIT-Oil-NGL-Pipelines-2025-03.xlsx
    SHA256: d1648d28aed99cfd2264047f1e944ddfccf50ce9feeac7de5db233c601dc3bb2
    URL: https://globalenergymonitor.org/wp-content/uploads/2025/03/GEM-GOIT-Oil-NGL-Pipelines-2025-03.xlsx

  Pre-conversion: GeoJSON (geometry endpoints) + XLSX (column properties) → canonical operator-shape JSON via /tmp/gem-import/convert.py.

  Filter knobs:
  - status ∈ {operating, construction}
  - length ≥ 750 km (gas) / 400 km (oil) — asymmetric per-fuel trunk-class thresholds
  - capacity unit conversions: bcm/y native; MMcf/d, MMSCMD, mtpa, m3/day, bpd, Mb/d, kbd → bcm/y (gas) or bbl/d (oil) at canonical conversion factors
  - country names → ISO 3166-1 alpha-2 via pycountry + alias table

  Merge results (via scripts/import-gem-pipelines.mjs --merge):
    gas: +222 added, 15 duplicates skipped (haversine ≤ 5 km AND token Jaccard ≥ 0.6)
    oil: +259 added, 16 duplicates skipped
  Final: 297 gas / 334 oil. Hand-curated 75+75 preserved with full evidence; GEM rows ship physicalStateSource='gem', classifierConfidence=0.4, operatorStatement=null, sanctionRefs=[].

  Floor bump: scripts/_pipeline-registry.mjs MIN_PIPELINES_PER_REGISTRY 8 → 200. Live counts (297/334) leave ~100 rows of jitter headroom, so a partial re-import or a coverage-narrowing release fails loud rather than silently halving the registry.
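The dedup gate named in the merge results (haversine ≤ 5 km AND token Jaccard ≥ 0.6) can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the code in scripts/import-gem-pipelines.mjs: the tokenization rules and the decision to compare start-to-start and end-to-end endpoints are guesses.

```typescript
// Hypothetical sketch of the duplicate gate: a candidate is skipped when
// its endpoints sit within 5 km of an existing row's (great-circle
// distance) AND the name-token Jaccard similarity is >= 0.6.
type Point = { lat: number; lon: number };

function haversineKm(a: Point, b: Point): number {
  const R = 6371; // mean Earth radius, km
  const toRad = (d: number) => (d * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(h));
}

function tokenJaccard(a: string, b: string): number {
  // Lowercase, split on non-alphanumerics, compare as sets.
  const tok = (s: string) =>
    new Set(s.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
  const ta = tok(a), tb = tok(b);
  const inter = Array.from(ta).filter(t => tb.has(t)).length;
  const union = new Set([...Array.from(ta), ...Array.from(tb)]).size;
  return union === 0 ? 0 : inter / union;
}

function isDuplicate(
  a: { name: string; startPoint: Point; endPoint: Point },
  b: { name: string; startPoint: Point; endPoint: Point },
): boolean {
  const close =
    haversineKm(a.startPoint, b.startPoint) <= 5 &&
    haversineKm(a.endPoint, b.endPoint) <= 5;
  return close && tokenJaccard(a.name, b.name) >= 0.6;
}
```

Note that a conjunctive gate like this is deliberately conservative: same-named lines in different basins fail the distance check and are still added.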
  Tests:
  - tests/pipelines-registry.test.mts: bumped the synthetic registry Array.from({length: 8}) → length: 210 to clear the new floor; added 'gem' to the evidence-source whitelist for non-flowing badges (parity with the derivePipelinePublicBadge audit done in PR #3397 U1).
  - tests/import-gem-pipelines.test.mjs: bumped the registry-conformance loop 3 → 70 to clear the new floor.
  - 51/51 pipeline tests pass; tsc --noEmit clean.

  vs the peer reference site (281 gas + 265 oil): we now exceed on both (gas 297 vs 281, oil 334 vs 265). Functional + visual + data parity for the energy variant is closed; the remaining gap is editorial cadence (weekly briefing), which is intentionally out of scope per the parity-push plan.

* docs(energy-atlas): land GEM converter + expand methodology runbook for quarterly refresh

  PR #3406 imported the data but didn't land the conversion script that produced it. This commit lands the converter at scripts/_gem-geojson-to-canonical.py so future operators can reproduce the import deterministically, and rewrites the docs/methodology/pipelines.mdx runbook to match what actually works:
  - Use GeoJSON (not XLSX) — the XLSX has properties but no lat/lon columns; only the GIS .zip's GeoJSON has both. The original runbook said to download the XLSX, which would fail at the lat/lon validation step.
  - Cadence: quarterly refresh, with concrete signals (peer-site comparison, 90-day calendar reminder).
  - Source datasets: explicit GGIT (gas) + GOIT (oil/NGL) tracker names so future operators don't re-request the wrong dataset (the Extraction Tracker covers wells/fields, NOT pipelines — ours requires the Infrastructure Trackers).
  - Last-known-good URLs documented, plus the URL pattern explained as a fallback when GEM rotates links per release.
  - Filter-knob defaults documented inline (gas ≥ 750 km, oil ≥ 400 km, status ∈ {operating, construction}, capacity unit conversion table).
  - Failure-mode table mapping common errors to fixes.
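The capacity-unit normalization the runbook documents might look like the sketch below. The factors shown are standard physical conversions (1 bcm ≈ 35.3147 bcf; a 365-day year; the oil-industry convention that M = thousand barrels), but they are assumptions here — the canonical table lives in the converter, and mtpa is omitted because its LNG-style factor depends on gas quality.

```typescript
// Illustrative normalization to the registries' canonical units:
// gas capacities to bcm/y, oil capacities to bbl/d.
const GAS_TO_BCM_PER_YEAR: Record<string, number> = {
  'bcm/y': 1,               // native unit, no conversion
  'MMcf/d': 365 / 35314.7,  // million cubic feet/day → bcm/y (1 bcm ≈ 35314.7 MMcf)
  'MMSCMD': 365 / 1000,     // million standard m³/day → bcm/y
  'm3/day': 365 / 1e9,      // cubic metres/day → bcm/y
};

const OIL_TO_BBL_PER_DAY: Record<string, number> = {
  'bpd': 1,     // barrels/day, native
  'kbd': 1000,  // thousand barrels/day
  'Mb/d': 1000, // thousand barrels/day (industry M = thousand)
};

function toCanonical(
  value: number,
  unit: string,
  table: Record<string, number>,
): number {
  const factor = table[unit];
  // Fail loud on unknown units rather than importing a garbage capacity.
  if (factor === undefined) throw new Error(`unknown capacity unit: ${unit}`);
  return value * factor;
}
```

Failing loud on an unrecognized unit matches the runbook's failure-mode philosophy: a new unit string in a GEM release should stop the import, not silently pass through unconverted.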
  The converter takes paths via env vars (GEM_GAS_GEOJSON, GEM_OIL_GEOJSON, GEM_DOWNLOADED_AT, GEM_SOURCE_VERSION) instead of hardcoded paths, so it works for any release without code edits.

* fix(energy-atlas): close PR #3406 review findings — dedup + zero-length + test

  Three Greptile findings on PR #3406:

  P1 — Dedup miss (Dampier-Bunbury): the same physical pipeline existed in both registries — curated `dampier-bunbury` and GEM-imported `dampier-to-bunbury-natural-gas-pipeline-au` — because GEM digitized only the southern 60% of the line. The shared Bunbury terminus matched at 13.7 km, but the average endpoint distance was 287 km, well over the 5 km gate. Fix: scripts/_pipeline-dedup.mjs adds a name-set-identity short-circuit — if token Jaccard == 1.0 (after stopword removal) AND any of the 4 endpoint pairings is ≤ 25 km, treat as a duplicate. The 25 km anchor preserves the existing "name collision in a different ocean → still added" contract. Added a regression test: identical Dampier-Bunbury inputs → 0 added, 1 skipped, matched against `dampier-bunbury`.

  P1 — Zero-length geometry (9 rows: Trans-Alaska, Enbridge Line 3, Ichthys, etc.): the GEM source GeoJSON occasionally has a Point geometry or a single-coordinate LineString, producing pipelines where startPoint == endPoint. They render as map-point artifacts and skew aggregate-length stats. Fix (defense in depth):
  - scripts/_gem-geojson-to-canonical.py drops them at conversion time (`zero_length` reason in the drop log).
  - scripts/_pipeline-registry.mjs validateRegistry rejects them defensively, so even a hand-curated row with degenerate geometry fails loud.

  P2 — Test repetition coupled to fixture row count: a hardcoded `for (let i = 0; i < 70; i++)` × 3 fixture rows = 210 silently breaks if the fixture is trimmed below 3 rows. Fix: `Math.ceil(REGISTRY_FLOOR / fixture.length) + 5` derives the repetitions from the floor and the current fixture length.
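The P2 derivation is small enough to show whole. REGISTRY_FLOOR below is a stand-in mirroring MIN_PIPELINES_PER_REGISTRY (200 after the bump); the helper name is illustrative, not the test's actual code.

```typescript
// Derive test repetitions from the floor instead of hardcoding 70:
// reps × fixture.length always clears REGISTRY_FLOOR with ~5 reps of
// margin, so trimming the fixture can no longer silently drop the
// synthetic registry below the validator's minimum.
const REGISTRY_FLOOR = 200; // stand-in for MIN_PIPELINES_PER_REGISTRY

function repsFor(fixtureLength: number): number {
  if (fixtureLength <= 0) throw new Error('fixture must not be empty');
  return Math.ceil(REGISTRY_FLOOR / fixtureLength) + 5;
}
```

For a 3-row fixture this yields 72 repetitions (216 synthetic rows), comfortably above the floor, and the count self-adjusts if the fixture grows or shrinks.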
  Re-run --merge with all fixes applied:
    gas: 75 → 293 (+218 added, 17 deduped — was 222/15 before; +2 catches via the name-set-identity short-circuit; -2 zero-length rows never imported)
    oil: 75 → 325 (+250 added, 18 deduped — was 259/16; +2 catches; -7 zero-length)
  Tests: 74/74 pipeline tests pass; tsc --noEmit clean.
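The name-set-identity short-circuit credited with the +2 dedup catches can be sketched as follows. The stopword list and helper names are illustrative assumptions (the real logic lives in scripts/_pipeline-dedup.mjs); the distance function is injected so the sketch stays independent of any particular haversine implementation.

```typescript
type Pt = { lat: number; lon: number };
type Row = { name: string; startPoint: Pt; endPoint: Pt };

// Hypothetical stopword list — the production list may differ.
const STOPWORDS = new Set(['to', 'the', 'natural', 'gas', 'oil', 'pipeline']);

function nameTokens(name: string): Set<string> {
  return new Set(
    name.toLowerCase().split(/[^a-z0-9]+/).filter(t => t && !STOPWORDS.has(t)),
  );
}

function nameSetIdentityDuplicate(
  a: Row,
  b: Row,
  distKm: (p: Pt, q: Pt) => number, // injected great-circle distance
): boolean {
  const ta = nameTokens(a.name);
  const tb = nameTokens(b.name);
  // Token Jaccard must be exactly 1.0: identical non-empty token sets.
  const identical =
    ta.size > 0 && ta.size === tb.size && Array.from(ta).every(t => tb.has(t));
  if (!identical) return false;
  // Any of the 4 endpoint pairings within 25 km anchors the match,
  // preserving the "same name in a different ocean → still added" contract.
  const pairings: Array<[Pt, Pt]> = [
    [a.startPoint, b.startPoint],
    [a.startPoint, b.endPoint],
    [a.endPoint, b.startPoint],
    [a.endPoint, b.endPoint],
  ];
  return pairings.some(([p, q]) => distKm(p, q) <= 25);
}
```

This catches the partial-digitization case: a GEM trace covering only part of a line shares one terminus with the curated row, so one pairing lands inside 25 km even though the averaged-endpoint gate fails.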
304 lines
11 KiB
TypeScript
// @ts-check
import { strict as assert } from 'node:assert';
import { test, describe } from 'node:test';
import { readFileSync } from 'node:fs';
import { resolve, dirname } from 'node:path';
import { fileURLToPath } from 'node:url';
import {
  validateRegistry,
  recordCount,
  GAS_CANONICAL_KEY,
  OIL_CANONICAL_KEY,
  VALID_OIL_PRODUCT_CLASSES,
  VALID_SOURCES,
} from '../scripts/_pipeline-registry.mjs';

const __dirname = dirname(fileURLToPath(import.meta.url));
const gasRaw = readFileSync(resolve(__dirname, '../scripts/data/pipelines-gas.json'), 'utf-8');
const oilRaw = readFileSync(resolve(__dirname, '../scripts/data/pipelines-oil.json'), 'utf-8');
const gas = JSON.parse(gasRaw) as { pipelines: Record<string, any> };
const oil = JSON.parse(oilRaw) as { pipelines: Record<string, any> };
describe('pipeline registries — schema', () => {
  test('gas registry passes validateRegistry', () => {
    assert.equal(validateRegistry(gas), true);
  });

  test('oil registry passes validateRegistry', () => {
    assert.equal(validateRegistry(oil), true);
  });

  test('canonical keys are stable strings', () => {
    assert.equal(GAS_CANONICAL_KEY, 'energy:pipelines:gas:v1');
    assert.equal(OIL_CANONICAL_KEY, 'energy:pipelines:oil:v1');
  });

  test('recordCount returns non-zero for both registries', () => {
    assert.ok(recordCount(gas) >= 8);
    assert.ok(recordCount(oil) >= 8);
  });
});
describe('pipeline registries — identity + geometry', () => {
  test('all ids are unique across gas + oil (no collisions)', () => {
    const gasIds = Object.keys(gas.pipelines);
    const oilIds = Object.keys(oil.pipelines);
    const overlap = gasIds.filter(id => oilIds.includes(id));
    assert.equal(overlap.length, 0, `overlapping ids: ${overlap.join(',')}`);
  });

  test('every pipeline.id matches its object key', () => {
    for (const [key, p] of Object.entries(gas.pipelines)) {
      assert.equal(p.id, key, `gas: ${key} -> id=${p.id}`);
    }
    for (const [key, p] of Object.entries(oil.pipelines)) {
      assert.equal(p.id, key, `oil: ${key} -> id=${p.id}`);
    }
  });

  test('every country code is ISO 3166-1 alpha-2', () => {
    const iso2 = /^[A-Z]{2}$/;
    const all = [...Object.values(gas.pipelines), ...Object.values(oil.pipelines)];
    for (const p of all) {
      assert.ok(iso2.test(p.fromCountry), `bad fromCountry on ${p.id}: ${p.fromCountry}`);
      assert.ok(iso2.test(p.toCountry), `bad toCountry on ${p.id}: ${p.toCountry}`);
      for (const t of p.transitCountries) {
        assert.ok(iso2.test(t), `bad transitCountry on ${p.id}: ${t}`);
      }
    }
  });

  test('endpoint coordinates are within Earth bounds', () => {
    const all = [...Object.values(gas.pipelines), ...Object.values(oil.pipelines)];
    for (const p of all) {
      assert.ok(p.startPoint.lat >= -90 && p.startPoint.lat <= 90, `${p.id} startPoint.lat OOB`);
      assert.ok(p.startPoint.lon >= -180 && p.startPoint.lon <= 180, `${p.id} startPoint.lon OOB`);
      assert.ok(p.endPoint.lat >= -90 && p.endPoint.lat <= 90, `${p.id} endPoint.lat OOB`);
      assert.ok(p.endPoint.lon >= -180 && p.endPoint.lon <= 180, `${p.id} endPoint.lon OOB`);
    }
  });
});
describe('pipeline registries — evidence', () => {
  test('non-flowing badges carry at least one evidence source', () => {
    const all = [...Object.values(gas.pipelines), ...Object.values(oil.pipelines)];
    for (const p of all) {
      if (p.evidence.physicalState === 'flowing') continue;
      const hasEvidence =
        p.evidence.operatorStatement != null ||
        p.evidence.sanctionRefs.length > 0 ||
        ['ais-relay', 'satellite', 'press', 'gem'].includes(p.evidence.physicalStateSource);
      assert.ok(hasEvidence, `${p.id} has no supporting evidence for state=${p.evidence.physicalState}`);
    }
  });

  test('classifier confidence is within 0..1', () => {
    const all = [...Object.values(gas.pipelines), ...Object.values(oil.pipelines)];
    for (const p of all) {
      const c = p.evidence.classifierConfidence;
      assert.ok(c >= 0 && c <= 1, `${p.id} bad classifierConfidence: ${c}`);
    }
  });

  test('sanctionRefs entries carry {authority, date, url}', () => {
    const all = [...Object.values(gas.pipelines), ...Object.values(oil.pipelines)];
    for (const p of all) {
      for (const ref of p.evidence.sanctionRefs) {
        assert.equal(typeof ref.authority, 'string', `${p.id} ref missing authority`);
        assert.equal(typeof ref.date, 'string', `${p.id} ref missing date`);
        assert.equal(typeof ref.url, 'string', `${p.id} ref missing url`);
        assert.ok(ref.url.startsWith('http'), `${p.id} ref url not http(s)`);
      }
    }
  });
});
describe('pipeline registries — commodity-capacity pairing', () => {
  test('gas pipelines have capacityBcmYr (not capacityMbd)', () => {
    for (const p of Object.values(gas.pipelines)) {
      assert.equal(p.commodityType, 'gas', `${p.id} should be commodityType=gas`);
      assert.equal(typeof p.capacityBcmYr, 'number', `${p.id} missing capacityBcmYr`);
      assert.ok(p.capacityBcmYr > 0, `${p.id} capacityBcmYr must be > 0`);
    }
  });

  test('oil pipelines have capacityMbd (not capacityBcmYr)', () => {
    for (const p of Object.values(oil.pipelines)) {
      assert.equal(p.commodityType, 'oil', `${p.id} should be commodityType=oil`);
      assert.equal(typeof p.capacityMbd, 'number', `${p.id} missing capacityMbd`);
      assert.ok(p.capacityMbd > 0, `${p.id} capacityMbd must be > 0`);
    }
  });
});
describe('pipeline registries — productClass', () => {
  test('every oil pipeline declares a productClass from the enum', () => {
    for (const p of Object.values(oil.pipelines)) {
      assert.ok(
        VALID_OIL_PRODUCT_CLASSES.has(p.productClass),
        `${p.id} has invalid productClass: ${p.productClass}`,
      );
    }
  });

  test('gas pipelines do not carry a productClass field', () => {
    for (const p of Object.values(gas.pipelines)) {
      assert.equal(
        p.productClass,
        undefined,
        `${p.id} should not have productClass (gas pipelines use commodity as their class)`,
      );
    }
  });

  test('validateRegistry rejects oil pipeline without productClass', () => {
    const oilSample = oil.pipelines[Object.keys(oil.pipelines)[0]!];
    const { productClass: _drop, ...stripped } = oilSample;
    const bad = {
      pipelines: Object.fromEntries(
        Array.from({ length: 210 }, (_, i) => [`p${i}`, { ...stripped, id: `p${i}` }]),
      ),
    };
    assert.equal(validateRegistry(bad), false);
  });

  test('validateRegistry rejects oil pipeline with unknown productClass', () => {
    const oilSample = oil.pipelines[Object.keys(oil.pipelines)[0]!];
    const bad = {
      pipelines: Object.fromEntries(
        Array.from({ length: 210 }, (_, i) => [
          `p${i}`,
          { ...oilSample, id: `p${i}`, productClass: 'diesel-only' },
        ]),
      ),
    };
    assert.equal(validateRegistry(bad), false);
  });

  test('validateRegistry rejects gas pipeline carrying productClass', () => {
    const gasSample = gas.pipelines[Object.keys(gas.pipelines)[0]!];
    const bad = {
      pipelines: Object.fromEntries(
        Array.from({ length: 210 }, (_, i) => [
          `p${i}`,
          { ...gasSample, id: `p${i}`, productClass: 'crude' },
        ]),
      ),
    };
    assert.equal(validateRegistry(bad), false);
  });
});
describe('pipeline registries — validateRegistry rejects bad input', () => {
  test('rejects empty object', () => {
    assert.equal(validateRegistry({}), false);
  });

  test('rejects null', () => {
    assert.equal(validateRegistry(null), false);
  });

  test('rejects a pipeline with no evidence', () => {
    const bad = {
      pipelines: Object.fromEntries(
        Array.from({ length: 210 }, (_, i) => [`p${i}`, {
          id: `p${i}`, name: 'x', operator: 'y', commodityType: 'gas',
          fromCountry: 'US', toCountry: 'CA', transitCountries: [],
          capacityBcmYr: 1, startPoint: { lat: 0, lon: 0 }, endPoint: { lat: 1, lon: 1 },
        }]),
      ),
    };
    assert.equal(validateRegistry(bad), false);
  });

  test('rejects below MIN_PIPELINES_PER_REGISTRY', () => {
    const bad = { pipelines: { onlyOne: gas.pipelines[Object.keys(gas.pipelines)[0]!] } };
    assert.equal(validateRegistry(bad), false);
  });
});
describe('pipeline registries — GEM source enum', () => {
  test('VALID_SOURCES exported and includes the existing six members plus gem', () => {
    // Same source-of-truth pattern as VALID_OIL_PRODUCT_CLASSES (PR #3383):
    // export the Set so future tests can't drift from the validator.
    assert.ok(VALID_SOURCES.has('operator'));
    assert.ok(VALID_SOURCES.has('regulator'));
    assert.ok(VALID_SOURCES.has('press'));
    assert.ok(VALID_SOURCES.has('satellite'));
    assert.ok(VALID_SOURCES.has('ais-relay'));
    assert.ok(VALID_SOURCES.has('gem'));
  });

  test('validateRegistry accepts GEM-sourced minimum-viable evidence (state=unknown)', () => {
    // GEM rows ship as state=unknown until classifier promotes them.
    // physicalStateSource='gem' is sufficient evidence per the audit.
    const gasSample = gas.pipelines[Object.keys(gas.pipelines)[0]!];
    const good = {
      pipelines: Object.fromEntries(
        Array.from({ length: 210 }, (_, i) => [`p${i}`, {
          ...gasSample,
          id: `p${i}`,
          evidence: {
            physicalState: 'unknown',
            physicalStateSource: 'gem',
            commercialState: 'unknown',
            operatorStatement: null,
            sanctionRefs: [],
            classifierVersion: 'gem-import-v1',
            classifierConfidence: 0.4,
            lastEvidenceUpdate: '2026-04-25T00:00:00Z',
          },
        }]),
      ),
    };
    assert.equal(validateRegistry(good), true);
  });

  test('validateRegistry accepts GEM-sourced offline row (state=offline + only source=gem)', () => {
    // Per plan U1 audit: 'gem' is evidence-bearing for non-flowing badges,
    // parity with press/satellite/ais-relay. An offline row with no operator
    // statement and no sanctionRefs but physicalStateSource='gem' should pass
    // validation (the public-badge derivation downstream will then map it
    // to "disputed" via the external-signal rule).
    const gasSample = gas.pipelines[Object.keys(gas.pipelines)[0]!];
    const good = {
      pipelines: Object.fromEntries(
        Array.from({ length: 210 }, (_, i) => [`p${i}`, {
          ...gasSample,
          id: `p${i}`,
          evidence: {
            physicalState: 'offline',
            physicalStateSource: 'gem',
            commercialState: 'unknown',
            operatorStatement: null,
            sanctionRefs: [],
            classifierVersion: 'gem-import-v1',
            classifierConfidence: 0.4,
            lastEvidenceUpdate: '2026-04-25T00:00:00Z',
          },
        }]),
      ),
    };
    assert.equal(validateRegistry(good), true);
  });

  test('validateRegistry still rejects unknown physicalStateSource values', () => {
    // Adding 'gem' must not loosen the enum — unknown sources still fail.
    const gasSample = gas.pipelines[Object.keys(gas.pipelines)[0]!];
    const bad = {
      pipelines: Object.fromEntries(
        Array.from({ length: 210 }, (_, i) => [`p${i}`, {
          ...gasSample,
          id: `p${i}`,
          evidence: {
            ...gasSample.evidence,
            physicalStateSource: 'rumor',
          },
        }]),
      ),
    };
    assert.equal(validateRegistry(bad), false);
  });
});