docs(resilience): PR 5.1 — sanctions construct audit (designated-party domicile question) (#3375)

* docs(resilience): PR 5.1 - sanctions construct audit (designated-party domicile question)

  PR 5.1 of cohort-audit plan 2026-04-24-002. Stacked on PR 5.3 (#3374) so the known-limitations.md section append is additive.

  Read-only static audit of scoreTradeSanctions + the sanctions:country-counts:v1 seed, framed around the Codex-reformulated construct question: should designated-party domicile count penalize resilience?

  Findings

  1. The count is "OFAC-designated-party domicile locations," NOT "sanctions against this country." Seeder (`scripts/seed-sanctions-pressure.mjs:85-93`) parses OFAC Advanced XML SDN + Consolidated, extracts each designated party's Locations, and increments `map[countryCode]` by 1 for every location country on that party.
  2. The count conflates three semantically distinct categories a resilience construct might treat differently:
     (a) Country-level sanction target (NK SDN listings): correct penalty
     (b) Domiciled sanctioned entity (RU bank in Moscow, post-2022): debatable, country hosts the actor
     (c) Transit / shell entity (UAE trading co listed under SDGT for Iran evasion; CY SPV for a Russian oligarch): country is NOT the target, but takes the penalty
  3. Observed GCC cohort impact: AE scores 54 vs KW/QA 82. The −28 gap is almost entirely driven by category (c) listings; AE is a financial hub where sanctioned parties incorporate shells.
  4. Three options documented for the construct decision (NOT decided in this PR):
     - Option 1: Keep flat count (status quo, defensible via secondary-sanctions / FATF argument)
     - Option 2: Program-weighted count. Weight DPRK/IRAN/SYRIA/etc. at 1.0, SDGT/SDNTK/CYBER/etc. at 0.3-0.5. Recommended; seeder already captures `programs` per entry, so the data is there, scorer just doesn't read it.
     - Option 3: Transit-hub exclusion list (AE, SG, HK, CY, VG, KY). Brittle + normative, not recommended.
  5. Recommendation documented: Option 2. Implementation deferred to a separate methodology-decision PR (outside auto-mode authority).

  Shipped

  - `docs/methodology/known-limitations.md`: new section extending the file: "tradeSanctions — designated-party domicile construct question." Covers what the count represents, the three categories with examples, observed GCC impact, three options w/ trade-offs, recommendation, follow-up audit list (entity-sample gated on API-key access), and file references.
  - `tests/resilience-sanctions-field-mapping.test.mts` (new): 10 regression-guard tests pinning CURRENT behavior:
    1-6. normalizeSanctionCount piecewise anchors: count=0→100, 1→90, 10→75, 50→50, 200→25, 500→≤1
    7. Monotonicity: strictly decreasing across the ramp
    8. Country absent from map defaults to count=0 → score 100 (intentional "no designated parties here" semantics)
    9. Seed outage (raw=null) → null score slot, NOT imputed (protects against silent data-outage scoring)
    10. Construct anchor: count=1 is exactly 10 points below count=0 (pins the "first listing drops 10" design choice)

  Verified

  - `npx tsx --test tests/resilience-sanctions-field-mapping.test.mts`: 10 pass / 0 fail
  - `npm run test:data`: 6721 pass / 0 fail
  - `npm run typecheck` / `typecheck:api`: green
  - `npm run lint` / `lint:md`: clean

* fix(resilience): PR 5.1 review - tighten count=500 assertion; clarify weightedBlend weights

  Addresses 2 P2 Greptile findings on #3375:

  1. Tighten count=500 assertion. Was `<= 1` with a comment stating the exact value is 0. That loose bound silently tolerates roundScore / boundary drift that would be the very signal this regression guard exists to catch. Changed to strict equality `=== 0`.
  2. Clarify the "zero weight" comment on the sanctions-only harness. The other slots DO contribute their declared weights (0.15 + 0.15 + 0.25 = 0.55) to weightedBlend's `totalWeight` denominator; only `availableWeight` (the score-computation denominator) drops to 0.45 because their score is null. The previous comment elided this distinction and could mislead a reader into thinking the null slots contributed nothing at all. Expanded to state exactly how `coverage` and `score` each behave.

  Verified

  - `npx tsx --test tests/resilience-sanctions-field-mapping.test.mts`: 10 pass / 0 fail (count=500 now pins the exact 0 floor)
2026-04-25 17:14:57 +02:00 · 2026-04-24 18:30:59 +04:00
parent a97ba83833
commit b4198a52c3
2 changed files with 294 additions and 0 deletions
--- a/docs/methodology/known-limitations.md
+++ b/docs/methodology/known-limitations.md
@@ -267,3 +267,148 @@ verifying.
  §PR 5.3
 - Test regression guards:
  `tests/resilience-foodwater-field-mapping.test.mts`
 ---
 ## tradeSanctions — "designated-party domicile" construct question (scoreTradeSanctions)
 **Dimension.** `tradeSanctions` (`scoreTradeSanctions`, weight 0.45
 of the blend for the sanctions sub-component; 1.0 for the dim in
 the `economic` domain).
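
To make the 0.45 figure concrete, here is a minimal sketch of how a null-aware weighted blend could combine the sanctions slot with its sibling slots. The sibling weights (restrictions 0.15, barriers 0.15, tariff 0.25) are taken from the comments in the regression-test file. The names `weightedBlendSketch` and `BlendSlot`, and the choice to return a `null` score when no slot has data, are illustrative assumptions and not the scorer's actual code.

```typescript
// Hypothetical slot shape; the real types in _dimension-scorers.ts may differ.
interface BlendSlot {
  score: number | null; // null = input unavailable (e.g. seed outage)
  weight: number;       // declared weight, counted even when score is null
}

interface BlendResult {
  score: number | null; // assumption: null when no slot carries a score
  coverage: number;     // availableWeight / totalWeight
}

function weightedBlendSketch(slots: BlendSlot[]): BlendResult {
  let totalWeight = 0;     // every declared weight
  let availableWeight = 0; // only weights whose slot has a score
  let weightedSum = 0;
  for (const slot of slots) {
    totalWeight += slot.weight;
    if (slot.score != null) {
      availableWeight += slot.weight;
      weightedSum += slot.score * slot.weight;
    }
  }
  return {
    score: availableWeight > 0 ? weightedSum / availableWeight : null,
    coverage: totalWeight > 0 ? availableWeight / totalWeight : 0,
  };
}

// Sanctions-only case: only the 0.45 slot carries a score.
const sanctionsOnly = weightedBlendSketch([
  { score: 90, weight: 0.45 },   // sanctions
  { score: null, weight: 0.15 }, // restrictions
  { score: null, weight: 0.15 }, // barriers
  { score: null, weight: 0.25 }, // tariff
]);

// Full outage: every slot is null, so nothing is imputed.
const fullOutage = weightedBlendSketch([
  { score: null, weight: 0.45 },
  { score: null, weight: 0.15 },
  { score: null, weight: 0.15 },
  { score: null, weight: 0.25 },
]);

console.log(sanctionsOnly, fullOutage);
```

In the sanctions-only case the blended score equals the sanctions score (up to floating-point error) while coverage is roughly 0.45; in the full-outage case coverage is 0, which is the state the outage regression test checks for.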
 **Source.** `sanctions:country-counts:v1`, a flat `ISO2 → count`
 map written by `scripts/seed-sanctions-pressure.mjs`. The seeder
 parses OFAC's Advanced XML (SDN + Consolidated lists), extracts
 each designated party's `Locations`, and increments
 `map[countryCode]` by 1 for every country listed in that
 party's locations.
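
The counting rule described above can be sketched as follows. This is not the seeder's code: `DesignatedParty`, `locationCountries`, and `buildCountryCountsSketch` are hypothetical names, the sample entries are invented for illustration, and counting each distinct country once per party is an assumption about how repeated locations are handled.

```typescript
// Hypothetical shape of one parsed OFAC entry; the seeder's real
// structure is not shown in this document.
interface DesignatedParty {
  name: string;
  locationCountries: string[]; // ISO2 codes taken from the party's Locations
}

function buildCountryCountsSketch(parties: DesignatedParty[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const party of parties) {
    // Assumption: each distinct country counts once per party.
    for (const iso2 of Array.from(new Set(party.locationCountries))) {
      counts[iso2] = (counts[iso2] ?? 0) + 1;
    }
  }
  return counts;
}

// Invented sample data, for illustration only.
const sampleCounts = buildCountryCountsSketch([
  { name: 'Bank A (illustrative)', locationCountries: ['RU'] },
  { name: 'Shell B (illustrative)', locationCountries: ['AE', 'IR'] },
  { name: 'SPV C (illustrative)', locationCountries: ['CY', 'RU'] },
]);

console.log(sampleCounts);
```

Note that the two multi-location entries each raise the count of two countries, which is the behavior the rest of this section questions.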
 ### What the count ACTUALLY represents
 The count is **"how many OFAC-designated parties list this
 country as a location"** — not "how many sanctions this country
 is under." A single designated entity's primary country gets +1;
 a shell that's domiciled in country X but operates via country Y
 will typically list both and increment both counts.
 Consequence: the count conflates three semantically distinct
 categories that a resilience construct might want to treat
 differently.
 | Category | Example | Current scorer impact | Construct question |
 |---|---|---|---|
 | (a) **Country-level sanction target** | North Korea SDN listings | +1 per designated entity/person inside the sanctioned state | Penalizing the state is the INTENDED signal — resilience is genuinely degraded by comprehensive sanctions |
 | (b) **Domiciled sanctioned entity** | Russian bank HQ'd in Moscow, designated post-2022 invasion | +1 per listing | The country's resilience is indirectly penalized for hosting the sanctioned actor — debatable |
 | (c) **Transit / shell entity listing** | UAE-based trading company designated under SDGT for Iran oil smuggling; Cyprus-registered SPV facilitating a Russian oligarch's asset transfer | +1 per listing even when the country itself is NOT the sanctions target | The country is penalized because it's a financial hub that shell entities incorporate in — construct-debatable |
 ### Observed effect in the 2026-04-24 cohort audit
 | Country | `tradeSanctions` dim score | Interpretation under current construct |
 |---|---|---|
 | KW | 82 | Low designated-party count (mostly a clean jurisdiction) |
 | QA | 82 | Low count |
 | AE | 54 | High count — dominated by category (c): Iran-evasion shell entities, Russian-asset SPVs |
 | SA | (similar) | Low count |
 AE's gap of −28 vs KW/QA is almost entirely driven by category
 (c) listings. Under the CURRENT scorer, AE's resilience is
 penalized for being a financial hub where sanctioned parties
 incorporate shells — regardless of whether the UAE state is
 complicit or targeted by the listing.
 ### Construct options (not decided here)
 This PR deliberately does NOT pick an option; the scoring
 implication is large enough that the decision belongs to a
 separate construct-discussion PR with cohort snapshots.
 **Option 1 — Keep the current flat count (status quo).**
 - Rationale: financial-sanctions exposure IS a real resilience
  risk even for transit-hub jurisdictions. A country that
  functions as a shell-entity jurisdiction ends up correlated
  with secondary-sanctions enforcement actions, correspondent-
  banking isolation, and FATF grey-listing pressure.
 - Cost: countries whose domestic policy is NOT what earned them
  the count (UAE-on-Iran, Cyprus-on-Russia) carry a score
  penalty for the behavior of entities that happen to have
  listed addresses there.
 **Option 2 — Weight by OFAC program category.**
 - Rationale: programs encode the nature of the designation.
  `DPRK`, `IRAN`, `SYRIA`, `VENEZUELA`, `CUBA` are
  country-comprehensive; `SDGT`, `SDNTK`, `CYBER`, `RUSSIA-EO`,
  `GLOMAG` are typically entity-specific.
 - Approach: weight category-(a) programs at 1.0 and category-
  (c)-ish programs at 0.3–0.5 based on a named mapping.
 - Cost: requires maintaining a program→category manifest, and
  program codes change over time. No new data collection is
  needed: the seeder already captures `programs` per entry (see
  `scripts/seed-sanctions-pressure.mjs` lines 95-108); the
  scorer just doesn't read it.
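
A minimal sketch of what an Option 2 count could look like, under stated assumptions. The program codes come from the rationale above; the 0.4 weight is a placeholder inside the proposed 0.3–0.5 band; taking the maximum weight across a party's programs and defaulting unmapped programs to 1.0 are illustrative choices, not decisions made by this audit. All names (`PROGRAM_WEIGHTS`, `partyWeight`, `buildWeightedCountsSketch`) are hypothetical.

```typescript
// Hypothetical manifest. Program lists follow the Option 2 text;
// 0.4 is a placeholder inside the proposed 0.3-0.5 band.
const PROGRAM_WEIGHTS: Record<string, number> = {
  DPRK: 1.0, IRAN: 1.0, SYRIA: 1.0, VENEZUELA: 1.0, CUBA: 1.0,
  SDGT: 0.4, SDNTK: 0.4, CYBER: 0.4, 'RUSSIA-EO': 0.4, GLOMAG: 0.4,
};

// Assumption: unmapped programs keep the status-quo weight of 1.0.
const DEFAULT_PROGRAM_WEIGHT = 1.0;

interface ProgramTaggedParty {
  programs: string[];          // OFAC program codes on the entry
  locationCountries: string[]; // ISO2 codes from the party's Locations
}

// Assumption: a party listed under several programs takes the highest weight.
function partyWeight(programs: string[]): number {
  if (programs.length === 0) return DEFAULT_PROGRAM_WEIGHT;
  return Math.max(
    ...programs.map((p) => PROGRAM_WEIGHTS[p] ?? DEFAULT_PROGRAM_WEIGHT),
  );
}

function buildWeightedCountsSketch(
  parties: ProgramTaggedParty[],
): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const party of parties) {
    const weight = partyWeight(party.programs);
    for (const iso2 of Array.from(new Set(party.locationCountries))) {
      counts[iso2] = (counts[iso2] ?? 0) + weight;
    }
  }
  return counts;
}

// Invented sample data, for illustration only.
const weightedCounts = buildWeightedCountsSketch([
  { programs: ['SDGT'], locationCountries: ['AE'] },
  { programs: ['SDGT'], locationCountries: ['AE', 'CY'] },
  { programs: ['DPRK'], locationCountries: ['KP'] },
]);

console.log(weightedCounts);
```

With these placeholder weights, two SDGT-only listings contribute about 0.8 to AE instead of 2, while the DPRK listing still contributes a full 1.0 to KP. The max-across-programs rule is one of several possible conventions and would need to be settled in the methodology-decision PR.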
 **Option 3 — Exclude transit-hub jurisdictions from the
 domicile-count signal.**
 - Rationale: a small number of jurisdictions (AE, SG, HK, CY,
  VG, KY) account for a disproportionate share of shell-entity
  listings. A hardcoded exclusion list would remove the
  category-(c) bias for those jurisdictions specifically.
 - Cost: hardcoded list is brittle + normative — who gets on it
  decides who "wins" the scoring change.
 ### Recommendation
 **Option 2** is the most defensible methodology change, and it
 can be built on data that is already being collected.
 The seeder captures `programs` per entry; a scorer update
 would read `sanctions:program-pressure:v1` or an extended
 `country-counts:v2` with per-program breakdowns and apply a
 rubric-mapped weight to each program.
 **This PR does NOT implement Option 2.** It:
 1. Documents the three categories explicitly (above)
 2. Pins the CURRENT `normalizeSanctionCount` piecewise scale
   with regression tests so a future scorer refactor cannot
   silently flip the behavior
 3. Flags the construct question for a methodology-decision PR
 ### Follow-up audit (requires API key / Redis access)
 Per the plan's §PR 5.1 task list, an entity-level sample audit
 of the raw OFAC data would classify 10 entries per country
 for AE, HK, SG, CY, TR, RU, IR, US into categories (a)/(b)/(c)
 and produce a calibration point for an Option-2 program-weight
 mapping. Out of scope for this doc-only PR.
 ### Regression-guard tests
 Pinned in
 `tests/resilience-sanctions-field-mapping.test.mts`:
 - `normalizeSanctionCount` piecewise anchors:
  `count=0 → 100`, `count=1 → 90`, `count=10 → 75`,
  `count=50 → 50`, `count=200 → 25`, `count=500 → 0` (clamped floor).
 - Monotonicity: more designated parties → lower score.
 - Scorer reads `sanctions:country-counts:v1[ISO2]` and defaults
  to 0 (score=100) when the country is absent from the map —
  intentional, since absence means "no designated parties
  located here," not "data missing."
 - `sanctionsRaw == null` (seed outage) → null score slot,
  NOT imputed — protects against silent data-outage scoring.
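
For readers who want to see the pinned scale in one place, the following is a minimal sketch that reproduces the anchor values listed above. It is not the scorer's implementation: the anchors and the 0.1-per-listing tail come from the test file's comments, while straight-line interpolation between anchors is an assumption about the interior of each segment, and any rounding applied by the real `roundScore` is omitted. The name `normalizeSanctionCountSketch` is hypothetical.

```typescript
// Anchor points (count, score) taken from the pinned tests. Linear
// interpolation between them is an assumption about interior values.
const ANCHORS: ReadonlyArray<readonly [number, number]> = [
  [1, 90],
  [10, 75],
  [50, 50],
  [200, 25],
];

function normalizeSanctionCountSketch(count: number): number {
  if (count <= 0) return 100; // no designated parties located here
  if (count > 200) {
    // Tail: 0.1 points per additional listing, clamped at the 0 floor.
    return Math.max(0, 25 - (count - 200) * 0.1);
  }
  for (let i = 1; i < ANCHORS.length; i++) {
    const [x0, y0] = ANCHORS[i - 1];
    const [x1, y1] = ANCHORS[i];
    if (count <= x1) {
      return y0 + ((count - x0) / (x1 - x0)) * (y1 - y0);
    }
  }
  return 25; // not reached for count <= 200; keeps the return type total
}

const anchorScores = [0, 1, 10, 50, 200, 500].map(normalizeSanctionCountSketch);
console.log(anchorScores); // [ 100, 90, 75, 50, 25, 0 ]
```

Under the tail formula the floor is reached at count=450 (25 / 0.1 = 250 listings past 200), so count=500 sits well inside the clamped region.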
 **References.**
 - Seeder: `scripts/seed-sanctions-pressure.mjs` lines 83-93
  (`buildCountryCounts`)
 - Scorer: `server/worldmonitor/resilience/v1/_dimension-scorers.ts`
  lines 263 (`RESILIENCE_SANCTIONS_KEY`),
  535 (`normalizeSanctionCount`), 1057 (`scoreTradeSanctions`)
 - OFAC SDN docs: https://ofac.treasury.gov/specially-designated-nationals-and-blocked-persons-list-sdn-human-readable-lists
 - Plan reference:
  `docs/plans/2026-04-24-002-fix-resilience-cohort-ranking-structural-audit-plan.md`
  §PR 5.1
 - Test regression guards:
  `tests/resilience-sanctions-field-mapping.test.mts`
--- a/tests/resilience-sanctions-field-mapping.test.mts
+++ b/tests/resilience-sanctions-field-mapping.test.mts
@@ -0,0 +1,149 @@
 // Regression guard for scoreTradeSanctions's normalizeSanctionCount
 // piecewise anchors and field-mapping contract.
 //
 // Context. PR 5.1 of plan 2026-04-24-002 (see
 // `docs/methodology/known-limitations.md#tradesanctions-designated-party-domicile-construct-question`)
 // documents the construct-ambiguity of counting OFAC-designated-party
 // domicile locations as a resilience signal. The audit proposes three
 // options for handling the transit-hub-shell-entity case but
 // intentionally does NOT implement a scoring change. This test file
 // pins the CURRENT scorer behavior so that a future methodology
 // decision (Option 2 = program-weighted count; Option 3 = transit-hub
 // exclusion; or status quo) updates these tests explicitly.
 //
 // Pinning protects against silent scorer refactors: if someone swaps
 // the piecewise scale, flips the imputation path, or changes how the
 // seed-outage null branch interacts with weightedBlend, this file
 // fails before the scoring change propagates to a live publication.
 import assert from 'node:assert/strict';
 import { describe, it } from 'node:test';
 import {
  scoreTradeSanctions,
  type ResilienceSeedReader,
 } from '../server/worldmonitor/resilience/v1/_dimension-scorers.ts';
 const TEST_ISO2 = 'XX';
 // Minimal synthetic reader: only the sanctions key is populated, so the
 // scorer's other slots (restrictions, barriers, tariff) drop to null
 // and contribute nothing to the blended score (their declared weights
 // still count toward coverage). Isolates the sanctions slot math.
 function sanctionsOnlyReader(sanctionsCount: number | null): ResilienceSeedReader {
  return async (key: string) => {
    if (key === 'sanctions:country-counts:v1') {
      return sanctionsCount == null ? null : { [TEST_ISO2]: sanctionsCount };
    }
    return null;
  };
 }
 describe('normalizeSanctionCount — piecewise anchors pinned', () => {
  // The scorer's piecewise scale (see _dimension-scorers.ts line 535):
  //   count=0      → 100
  //   count=1-10   → 90..75 (linear)
  //   count=11-50  → 75..50 (linear)
  //   count=51-200 → 50..25 (linear)
  //   count=201+   → 25..0  (linear at 0.1/step, clamped 0)
  //
  // The tests drive scoreTradeSanctions end-to-end with an otherwise-
  // empty reader so the sanctions slot is the only one contributing a
  // non-null score to the weightedBlend. Note the OTHER slots still
  // contribute their declared weights (restrictions 0.15, barriers
  // 0.15, tariff 0.25) to weightedBlend's `totalWeight` denominator —
  // they just don't contribute to `availableWeight` (the score-
  // computation denominator) because their score is null. So the
  // surfaced `coverage` value reflects the 0.45 sanctions weight over
  // the full 1.0 totalWeight; the surfaced `score` reflects the
  // sanctions-slot score alone (since it's the only non-null input).
  it('count=0 anchors at score 100 (no designated parties)', async () => {
    const result = await scoreTradeSanctions(TEST_ISO2, sanctionsOnlyReader(0));
    assert.equal(result.score, 100, `expected 100 at count=0, got ${result.score}`);
  });
  it('count=1 anchors at score 90 (first listing drops 10 points)', async () => {
    const result = await scoreTradeSanctions(TEST_ISO2, sanctionsOnlyReader(1));
    assert.equal(result.score, 90, `expected 90 at count=1, got ${result.score}`);
  });
  it('count=10 anchors at score 75 (end of the 1-10 ramp)', async () => {
    const result = await scoreTradeSanctions(TEST_ISO2, sanctionsOnlyReader(10));
    assert.equal(result.score, 75, `expected 75 at count=10, got ${result.score}`);
  });
  it('count=50 anchors at score 50 (end of the 11-50 ramp)', async () => {
    const result = await scoreTradeSanctions(TEST_ISO2, sanctionsOnlyReader(50));
    assert.equal(result.score, 50, `expected 50 at count=50, got ${result.score}`);
  });
  it('count=200 anchors at score 25 (end of the 51-200 ramp)', async () => {
    const result = await scoreTradeSanctions(TEST_ISO2, sanctionsOnlyReader(200));
    assert.equal(result.score, 25, `expected 25 at count=200, got ${result.score}`);
  });
  it('count=500 anchors at score 0 (high-count tail clamped to floor)', async () => {
    const result = await scoreTradeSanctions(TEST_ISO2, sanctionsOnlyReader(500));
    // At count=500: 25 - (500-200)*0.1 = 25 - 30 = -5 → clamped to 0
    // via `roundScore` which clamps to [0, 100]. Equality assertion
    // (not <= 1) so a future roundScore / boundary change that nudges
    // the result off 0 breaks the test loudly instead of silently.
    assert.equal(result.score, 0,
      `expected exactly 0 at count=500 (heavily-sanctioned state; clamped from -5); got ${result.score}`);
  });
  it('monotonic: more designated parties → strictly lower score', async () => {
    const scores = await Promise.all([0, 1, 10, 50, 200, 500].map(
      (n) => scoreTradeSanctions(TEST_ISO2, sanctionsOnlyReader(n)),
    ));
    for (let i = 1; i < scores.length; i++) {
      assert.ok(scores[i].score < scores[i - 1].score,
        `score must strictly decrease with count; got [${scores.map((s) => s.score).join(', ')}]`);
    }
  });
 });
 describe('scoreTradeSanctions — field-mapping + outage semantics', () => {
  it('country absent from sanctions map defaults to count=0 (score 100)', async () => {
    // The map is ISO2 → count. A country NOT in the map is semantically
    // "no designated parties located here" — NOT "data missing". The
    // scorer reads `sanctionsCounts[countryCode] ?? 0` (line 1070).
    const reader: ResilienceSeedReader = async (key) => {
      if (key === 'sanctions:country-counts:v1') {
        return { US: 100, RU: 800 }; // our test country XX is NOT in this map
      }
      return null;
    };
    const result = await scoreTradeSanctions(TEST_ISO2, reader);
    assert.equal(result.score, 100,
      `absent-from-map must score 100 (count=0 semantics); got ${result.score}`);
  });
  it('sanctions seed outage (raw=null) contributes null score slot — NOT imputed', async () => {
    // When the seed key is entirely absent (not just the country key),
    // `sanctionsRaw == null` and the slot goes to { score: null, weight: 0.45 }
    // (lines 1082-1083 of _dimension-scorers.ts). This is an intentional
    // fail-null behavior: we must NOT impute a score on seed outage,
    // because imputing would mask the outage. The other slots also drop
    // to null (nothing in our synthetic reader), so weightedBlend returns
    // coverage=0 — a clean zero-signal state that propagates as low
    // confidence at the dim level.
    const reader: ResilienceSeedReader = async () => null;
    const result = await scoreTradeSanctions(TEST_ISO2, reader);
    assert.equal(result.coverage, 0,
      `full-outage must produce coverage=0 (no impute-as-if-clean); got ${result.coverage}`);
  });
  it('construct-document anchor: count=1 differs from count=0 by exactly 10 points', async () => {
    // Pins the "first designated party drops the score by 10" design
    // choice. A future methodology PR that decides Option 2 (program-
    // weighted) or Option 3 (transit-hub exclusion) will necessarily
    // update this anchor if the weight-1 semantics change.
    const [zero, one] = await Promise.all([
      scoreTradeSanctions(TEST_ISO2, sanctionsOnlyReader(0)),
      scoreTradeSanctions(TEST_ISO2, sanctionsOnlyReader(1)),
    ]);
    assert.equal(zero.score - one.score, 10,
      `count=1 must be exactly 10 points below count=0; got ${zero.score - one.score}`);
  });
 });