release: add public snapshot tooling docs and privacy gates

bruno cesar
2026-02-28 21:03:57 -03:00
parent 2bf65096c5
commit 989e81cf20
14 changed files with 611 additions and 26 deletions


@@ -20,6 +20,10 @@ APP_ENV=dev
JWT_SECRET_KEY=change-me-generate-with-openssl-rand-hex-32
INVITE_CODE=
CORS_ORIGINS=http://localhost:3000
PUBLIC_MODE=false
PUBLIC_ALLOW_PERSON=false
PUBLIC_ALLOW_ENTITY_LOOKUP=false
PUBLIC_ALLOW_INVESTIGATIONS=false
# Frontend (dev only — production uses Caddy reverse proxy with relative paths)
VITE_API_URL=http://localhost:8000
@@ -30,13 +34,3 @@ VITE_API_URL=http://localhost:8000
# Optional ETL source tokens
# WORLD_BANK_API_KEY=
# EU_SANCTIONS_TOKEN=
# MCP Servers (Claude Code integration)
# 1. Add to ~/.zshrc:
# export NEO4J_PROD_PASSWORD="<from /opt/icarus/infra/.env on server>"
# export NEO4J_DEV_PASSWORD="icarus-dev-2026" (or your local Neo4j password)
# 2. Create SSH tunnel for prod Neo4j (bolt over SSH):
# ssh -f -N icarus-hetzner (uses ~/.ssh/config Host alias)
# This forwards localhost:7688 → server:7687 (encrypted)
# 3. Restart Claude Code — it reads .mcp.json automatically
# See .mcp.json for server definitions (neo4j-prod, neo4j-dev, ssh-hetzner)


@@ -71,3 +71,16 @@ jobs:
      - name: Audit ETL dependencies
        run: uvx pip-audit -r /tmp/etl-requirements.txt --strict

  public-privacy-gate:
    name: Public Privacy Gate
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Validate public privacy contract
        run: python scripts/check_public_privacy.py --repo-root .

.gitignore vendored

@@ -50,6 +50,7 @@ htmlcov/
# Data files (too large for git)
*.csv
!docs/source_registry_br_v1.csv
!docs/release/public_boundary_matrix.csv
*.tsv
*.parquet
*.dbc


@@ -1,24 +1,30 @@
# ICARUS
# World Transparency Graph (WTG) — Icarus Core
Ferramenta de análise de grafos de dados públicos brasileiros.
Plataforma global de análise de grafos de dados públicos.
Brazilian public data graph analysis tool.
Global public-data graph analysis platform.
[![CI](https://github.com/YOUR_ORG/icarus/actions/workflows/ci.yml/badge.svg)](https://github.com/YOUR_ORG/icarus/actions/workflows/ci.yml)
[![CI](https://github.com/brunoclz/world-transparency-graph/actions/workflows/ci.yml/badge.svg)](https://github.com/brunoclz/world-transparency-graph/actions/workflows/ci.yml)
[![License: AGPL v3](https://img.shields.io/badge/License-AGPL_v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)
---
## O que é / What it is
ICARUS ingere dados de registros públicos brasileiros (CNPJ, TSE, Portal da Transparência, CEIS/CNEP) em um grafo Neo4j e permite a exploração visual de conexões entre pessoas, empresas, contratos, eleições e sanções.
WTG (powered by Icarus Core) ingere dados de registros públicos e permite a exploração visual de conexões entre empresas, contratos, eleições e sanções.
ICARUS ingests Brazilian public records (CNPJ, TSE, Portal da Transparência, CEIS/CNEP) into a Neo4j graph and enables visual exploration of connections between people, companies, contracts, elections, and sanctions.
WTG (powered by Icarus Core) ingests public records and enables visual exploration of connections between companies, contracts, elections, and sanctions.
**Dados de registros públicos. Não constitui acusação.**
**Data patterns from public records. Not accusations.**
## Modelo de marca / Brand model
- Public product: **World Transparency Graph (WTG)**
- Civic movement: **BRCC**
- Institutional engine: **Icarus Core**
## Arquitetura / Architecture
```
@@ -105,21 +111,33 @@ make test-frontend # 20 testes TypeScript
| p10 | Ciclo doação-contrato | Donation-contract loop |
| p12 | Concentração de contratos | Contract concentration |
## Public mode contract
WTG Open must run with the public defaults:
- `PUBLIC_MODE=true`
- `PUBLIC_ALLOW_PERSON=false`
- `PUBLIC_ALLOW_ENTITY_LOOKUP=false`
- `PUBLIC_ALLOW_INVESTIGATIONS=false`
With these settings, public mode returns neither natural-person (PF) nodes (`Person`/`Partner`) nor personal properties.
## Endpoints da API / API endpoints
| Method | Route | Description |
|---|---|---|
| GET | `/health` | Health check |
| GET | `/api/v1/entity/{cpf_or_cnpj}` | Look up entity |
| GET | `/api/v1/entity/{id}/connections` | Entity connections |
| GET | `/api/v1/search?q=` | Full-text search |
| GET | `/api/v1/graph/{entity_id}` | Graph data |
| GET | `/api/v1/patterns/` | List patterns |
| GET | `/api/v1/patterns/{entity_id}` | Entity patterns |
| GET | `/api/v1/baseline/{entity_id}` | Peer comparison |
| POST | `/api/v1/investigations/` | Create investigation |
| GET | `/api/v1/investigations/` | List investigations |
| GET | `/api/v1/meta/sources` | Data sources |
| GET | `/api/v1/public/meta` | Aggregate metrics and source health |
| GET | `/api/v1/public/patterns/company/{cnpj_or_id}` | Public signals per company |
| GET | `/api/v1/public/graph/company/{cnpj_or_id}` | Public company subgraph |
### Advanced-only surface (internal deployment)
- `/api/v1/entity/*`
- `/api/v1/search`
- `/api/v1/graph/*`
- `/api/v1/patterns/*`
- `/api/v1/investigations/*`
## Estrutura / Project structure

data/demo/README.md Normal file

@@ -0,0 +1,14 @@
# Demo Data (Synthetic)
This directory is reserved for synthetic, public-safe demo data only.
Rules:
- No real CPF or personal identifiers.
- No `Person` / `Partner` labels.
- No operational metadata.
Use generator:
```bash
python3 scripts/generate_demo_dataset.py --output data/demo/synthetic_graph.json
```


@@ -0,0 +1,97 @@
{
  "meta": {
    "generated_at_utc": "2026-02-28T23:37:57.377723+00:00",
    "generator_version": "1.0.0",
    "source": "synthetic"
  },
  "nodes": [
    {
      "id": "company:alpha",
      "label": "Alpha Infra Ltda",
      "type": "Company",
      "properties": {
        "cnpj": "11.111.111/0001-11",
        "sector": "infrastructure",
        "city": "Sao Paulo"
      }
    },
    {
      "id": "company:beta",
      "label": "Beta Servicos SA",
      "type": "Company",
      "properties": {
        "cnpj": "22.222.222/0001-22",
        "sector": "services",
        "city": "Recife"
      }
    },
    {
      "id": "contract:001",
      "label": "Contrato Municipal 001",
      "type": "Contract",
      "properties": {
        "contract_id": "CT-001",
        "value": 1250000,
        "date": "2025-11-10"
      }
    },
    {
      "id": "sanction:001",
      "label": "Sancao Administrativa 001",
      "type": "Sanction",
      "properties": {
        "sanction_id": "SN-001",
        "date": "2025-08-12"
      }
    },
    {
      "id": "finance:001",
      "label": "Debito Fiscal 001",
      "type": "Finance",
      "properties": {
        "finance_id": "FN-001",
        "value": 320000,
        "date": "2025-09-03"
      }
    }
  ],
  "edges": [
    {
      "id": "rel:1",
      "source": "company:alpha",
      "target": "contract:001",
      "type": "VENCEU",
      "properties": {
        "source": "synthetic"
      }
    },
    {
      "id": "rel:2",
      "source": "company:alpha",
      "target": "sanction:001",
      "type": "SANCIONADA",
      "properties": {
        "source": "synthetic"
      }
    },
    {
      "id": "rel:3",
      "source": "company:alpha",
      "target": "finance:001",
      "type": "DEVE",
      "properties": {
        "source": "synthetic"
      }
    },
    {
      "id": "rel:4",
      "source": "company:alpha",
      "target": "company:beta",
      "type": "SOCIO_DE",
      "properties": {
        "source": "synthetic",
        "note": "company-level relation"
      }
    }
  ]
}


@@ -0,0 +1,29 @@
# Demo Dataset Contract (WTG Open)
## Objective
Provide a reproducible, public-safe demo graph with synthetic records only.
## Safety rules
- Synthetic data only. No real CPF, no real personal names, no real personal addresses.
- Company identifiers may use synthetic CNPJ-like values reserved for demonstration.
- Demo graph cannot include `Person` or `Partner` labels.
- Demo exports must never include private or operational metadata.
## Required files
- `data/demo/synthetic_graph.json`
- `data/demo/README.md`
- `scripts/generate_demo_dataset.py`
## JSON schema (minimum)
- `nodes[]`: `{id, label, type, properties}`
- `edges[]`: `{id, source, target, type, properties}`
- `meta`: `{generated_at_utc, generator_version, source: "synthetic"}`
## Acceptance checks
- No field name contains `cpf`, `doc_partial`, or `doc_raw`.
- No node label equals `Person` or `Partner`.
- CI privacy gate passes.
## Runtime target
- Dedicated demo Neo4j instance (non-production).
- Public API served with `PUBLIC_MODE=true`.
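
The acceptance checks listed in this contract can be sketched as a small validator. This is only an illustration under stated assumptions: the actual gate is `scripts/check_public_privacy.py`, while `validate_demo_graph` and `iter_keys` are hypothetical names, and the recursive walk over every key (not just node properties) is a stricter reading of "no field name contains".

```python
from __future__ import annotations

# Substrings and labels taken from the acceptance checks in this contract.
FORBIDDEN_KEY_PARTS = ("cpf", "doc_partial", "doc_raw")
FORBIDDEN_LABELS = {"Person", "Partner"}


def iter_keys(value: object):
    """Yield every dict key found anywhere inside a nested JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield str(key)
            yield from iter_keys(child)
    elif isinstance(value, list):
        for child in value:
            yield from iter_keys(child)


def validate_demo_graph(payload: dict) -> list[str]:
    """Return contract violations; an empty list means the payload passes."""
    errors: list[str] = []
    if payload.get("meta", {}).get("source") != "synthetic":
        errors.append('meta.source must be "synthetic"')
    # Assumption: the node "type" field carries the graph label, as in the demo JSON.
    for node in payload.get("nodes", []):
        if node.get("type") in FORBIDDEN_LABELS:
            errors.append(f"forbidden node label: {node.get('type')}")
    for key in iter_keys(payload):
        if any(part in key.lower() for part in FORBIDDEN_KEY_PARTS):
            errors.append(f"forbidden field name: {key}")
    return errors
```

Calling `validate_demo_graph(json.load(fh))` on `data/demo/synthetic_graph.json` would be one way to run these checks locally before the CI gate does.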


@@ -0,0 +1,35 @@
# Public Compliance Pack (WTG Open)
## 1. Data minimization baseline
- Public mode defaults:
  - `PUBLIC_MODE=true`
  - `PUBLIC_ALLOW_PERSON=false`
  - `PUBLIC_ALLOW_ENTITY_LOOKUP=false`
  - `PUBLIC_ALLOW_INVESTIGATIONS=false`
- Public endpoints are company-only and aggregated.
## 2. LGPD-compatible operating principles
- Purpose limitation: investigative transparency and civic oversight.
- Data minimization: no person-level lookup in public surface.
- Security by design: role-separated advanced environment.
- Transparency: source attribution and coverage caveats on every report.
## 3. Public terms of use requirements
- Tool presents connections in public records, not legal conclusions.
- Users cannot use the platform for harassment or doxxing.
- Abuse patterns trigger throttling and access restrictions.
## 4. Correction and takedown policy
- Accept correction requests with source evidence.
- Record decision logs with timestamp and rationale.
- Propagate approved corrections to next published snapshot.
## 5. Abuse response playbook
- Enforce strict rate limiting in public mode.
- Retain request logs for abuse analysis in legal window.
- Block abusive clients and rotate keys/tokens when needed.
## 6. Mandatory legal review gates before launch
- RIPD/DPIA draft reviewed by legal counsel.
- Terms of Use published.
- Public communication includes limitation statement.


@@ -0,0 +1,19 @@
path,classification,reason,action_for_public_repo
api/**,PUBLIC,Core API code is required for public edition,include
etl/**,PUBLIC,Core ingestion framework is required for public edition,include
frontend/**,PUBLIC,Public demo UI is required,include
infra/**,PUBLIC with review,Keep only generic local/dev deployment docs,include reviewed subset
docs/brand/**,PUBLIC,Brand governance for release,include
docs/demo/**,PUBLIC,Demo dataset contract and constraints,include
docs/legal/**,PUBLIC,Compliance and responsible-use baseline,include
docs/source_registry_br_v1.csv,PUBLIC,Program governance and transparency,include
docs/data-sources.md,PUBLIC,Source catalog visibility,include
CLAUDE.md,REMOVE_FROM_PUBLIC,Contains operational host and infrastructure paths,exclude
.mcp.json,REMOVE_FROM_PUBLIC,Contains local runtime MCP wiring,exclude
scripts/auto_finalize_pncp_backfill.sh,REMOVE_FROM_PUBLIC,Production operational finalizer tied to server paths,exclude
scripts/storage_capacity_report.sh,INTERNAL,Operational script with server assumptions,exclude by default
docs/shadow_rollout_runbook.md,REMOVE_FROM_PUBLIC,Operational production rollout details,exclude
docs/ingestion_priority_runbook.md,REMOVE_FROM_PUBLIC,Contains production data paths and credential procedures,exclude
docs/ops/storage_operations.md,REMOVE_FROM_PUBLIC,Contains production operational process and server paths,exclude
audit-results/**,REMOVE_FROM_PUBLIC,Operational evidence and internal logs,exclude
data/**,INTERNAL by default,Only data/demo synthetic subset allowed,include only data/demo
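
A boundary matrix in this shape is easy to consume programmatically. The sketch below groups paths by their `action_for_public_repo` value using only the standard library; `group_paths_by_action` is a hypothetical helper, not part of the repository, and the sample rows are copied from the matrix above.

```python
import csv
import io
from collections import defaultdict


def group_paths_by_action(csv_text: str) -> dict[str, list[str]]:
    """Map each action_for_public_repo value to the paths that carry it."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["action_for_public_repo"]].append(row["path"])
    return dict(grouped)


# Three rows copied from the matrix, enough to show the grouping.
SAMPLE = """path,classification,reason,action_for_public_repo
api/**,PUBLIC,Core API code is required for public edition,include
CLAUDE.md,REMOVE_FROM_PUBLIC,Contains operational host and infrastructure paths,exclude
.mcp.json,REMOVE_FROM_PUBLIC,Contains local runtime MCP wiring,exclude
"""

grouped = group_paths_by_action(SAMPLE)
print(grouped["exclude"])
```

Reading the full file instead of `SAMPLE` would give a single list of excluded paths to compare against the snapshot script.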


@@ -0,0 +1,27 @@
# Public vs Advanced Endpoint Matrix
## Public mode defaults
- `PUBLIC_MODE=true`
- `PUBLIC_ALLOW_PERSON=false`
- `PUBLIC_ALLOW_ENTITY_LOOKUP=false`
- `PUBLIC_ALLOW_INVESTIGATIONS=false`
## Endpoint behavior
| Endpoint | PUBLIC_MODE=false (advanced) | PUBLIC_MODE=true (default) |
|---|---|---|
| `GET /api/v1/entity/{cpf_or_cnpj}` | Allowed | `403` (`Entity lookup endpoint disabled in public mode`) |
| `GET /api/v1/entity/by-element-id/{id}` | Allowed | `403` (`Entity lookup endpoint disabled in public mode`) |
| `GET /api/v1/entity/{id}/connections` | Allowed | Person/Partner targets filtered out |
| `GET /api/v1/search` | Allowed | Person/Partner results filtered out |
| `GET /api/v1/graph/{entity_id}` | Allowed | Person/Partner center blocked, person nodes filtered |
| `GET /api/v1/patterns/{entity_id}` | Allowed | `403` when `PUBLIC_ALLOW_ENTITY_LOOKUP=false` |
| `GET /api/v1/investigations/*` | Allowed | `403` (`Investigation endpoints disabled in public mode`) |
| `GET /api/v1/public/meta` | Allowed | Allowed |
| `GET /api/v1/public/patterns/company/{cnpj_or_id}` | Allowed | Allowed |
| `GET /api/v1/public/graph/company/{cnpj_or_id}` | Allowed | Allowed |
## Exposure tier contract
- `public_safe`: company/contract/sanction/aggregate entities allowed on public surface.
- `restricted`: person-adjacent entities (not returned by default in public mode).
- `internal_only`: workspace/admin artifacts (`User`, `Investigation`, `Annotation`, `Tag`).
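
The matrix above can be read as a small decision function. The following is a minimal sketch, assuming the four flags are already parsed into booleans; `PublicFlags`, `status_for`, `visible_nodes`, and the route-family names are hypothetical and do not come from the API code, which this commit view does not show.

```python
from __future__ import annotations

from dataclasses import dataclass

RESTRICTED_LABELS = {"Person", "Partner"}


@dataclass(frozen=True)
class PublicFlags:
    """Hypothetical container for the four flags, with the public defaults."""
    public_mode: bool = True
    allow_person: bool = False
    allow_entity_lookup: bool = False
    allow_investigations: bool = False


def status_for(route_family: str, flags: PublicFlags) -> int:
    """Return the HTTP status the matrix implies for a route family."""
    if not flags.public_mode:
        return 200  # advanced deployment: everything allowed
    if route_family in {"entity_lookup", "entity_patterns"} and not flags.allow_entity_lookup:
        return 403
    if route_family == "investigations" and not flags.allow_investigations:
        return 403
    return 200


def visible_nodes(nodes: list[dict], flags: PublicFlags) -> list[dict]:
    """Drop Person/Partner nodes when public mode forbids person data."""
    if not flags.public_mode or flags.allow_person:
        return list(nodes)
    return [n for n in nodes if n.get("type") not in RESTRICTED_LABELS]
```

With the defaults, `status_for("entity_lookup", PublicFlags())` yields `403`, while the `/api/v1/public/*` families fall through to `200` in both modes.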


@@ -0,0 +1,55 @@
# Public Repo Release Checklist — World Transparency Graph
## 1) Prepare sanitized snapshot
```bash
bash scripts/prepare_public_snapshot.sh /Users/brunoclz/CORRUPTOS /tmp/world-transparency-graph-public
```
## 2) Initialize clean-history repo from snapshot
```bash
cd /tmp/world-transparency-graph-public
git init
git add .
git commit -m "Initial public edition release (WTG)"
```
## 3) Create GitHub repository (manual)
- Owner: `brunoclz`
- Name: `world-transparency-graph`
- Visibility: Public
- Do not auto-add README/License (already present)
## 4) Push initial release
```bash
git branch -M main
git remote add origin https://github.com/brunoclz/world-transparency-graph.git
git push -u origin main
```
## 5) Configure branch protection (GitHub UI)
Require all checks:
- `API (Python)`
- `ETL (Python)`
- `Frontend (TypeScript)`
- `Neutrality Audit`
- `Gitleaks`
- `Bandit (Python)`
- `Pip Audit (Python deps)`
- `Public Privacy Gate`
## 6) Configure environment defaults
- Set public deployment environment vars:
  - `PUBLIC_MODE=true`
  - `PUBLIC_ALLOW_PERSON=false`
  - `PUBLIC_ALLOW_ENTITY_LOOKUP=false`
  - `PUBLIC_ALLOW_INVESTIGATIONS=false`
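
One way to confirm these defaults on a deployment host is a short check against the process environment. This is a sketch only; `find_deviations` is a hypothetical helper, and the case-insensitive comparison is an assumption about how the application parses boolean values.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Expected values copied from the list in this step.
EXPECTED_PUBLIC_DEFAULTS = {
    "PUBLIC_MODE": "true",
    "PUBLIC_ALLOW_PERSON": "false",
    "PUBLIC_ALLOW_ENTITY_LOOKUP": "false",
    "PUBLIC_ALLOW_INVESTIGATIONS": "false",
}


def find_deviations(env: Mapping[str, str]) -> list[str]:
    """List every variable that is missing or differs from the public default."""
    problems: list[str] = []
    for name, expected in EXPECTED_PUBLIC_DEFAULTS.items():
        actual = env.get(name)
        if actual is None:
            problems.append(f"{name} is not set (expected {expected})")
        elif actual.strip().lower() != expected:
            problems.append(f"{name}={actual} (expected {expected})")
    return problems


if __name__ == "__main__":
    for problem in find_deviations(os.environ):
        print(problem)
```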
## 7) Final checks before launch
- `python scripts/check_public_privacy.py --repo-root .` => `PASS`
- Confirm no internal runbooks in public repo
- Confirm demo data is synthetic (`data/demo/synthetic_graph.json`)
## 8) Launch communication split
- Publish product announcement as **WTG**
- Publish movement announcement as **BRCC**
- Mention methodology limits and non-accusatory policy

scripts/check_public_privacy.py Executable file

@@ -0,0 +1,78 @@
#!/usr/bin/env python3
"""Public-surface privacy gate checks for WTG open release."""

from __future__ import annotations

import argparse
import json
import re
from pathlib import Path

CPF_RAW_RE = re.compile(r"(?<!\d)\d{11}(?!\d)")
CPF_FMT_RE = re.compile(r"\d{3}\.\d{3}\.\d{3}-\d{2}")

FORBIDDEN_IN_PUBLIC_QUERIES = (
    ":Person",
    ":Partner",
    ".cpf",
    "doc_partial",
    "doc_raw",
)


def check_public_queries(repo_root: Path) -> list[str]:
    errors: list[str] = []
    query_dir = repo_root / "api" / "src" / "icarus" / "queries"
    for path in sorted(query_dir.glob("public_*.cypher")):
        content = path.read_text(encoding="utf-8")
        for token in FORBIDDEN_IN_PUBLIC_QUERIES:
            if token in content:
                errors.append(f"{path}: forbidden token in public query: {token}")
    return errors


def check_demo_data(repo_root: Path) -> list[str]:
    errors: list[str] = []
    demo_dir = repo_root / "data" / "demo"
    for path in sorted(demo_dir.glob("*.json")):
        raw = path.read_text(encoding="utf-8")
        if CPF_RAW_RE.search(raw) or CPF_FMT_RE.search(raw):
            errors.append(f"{path}: possible CPF-like value found")
        payload = json.loads(raw)
        for node in payload.get("nodes", []):
            label = str(node.get("type", ""))
            if label in {"Person", "Partner"}:
                errors.append(f"{path}: forbidden demo label {label}")
            props = node.get("properties", {})
            if isinstance(props, dict):
                lowered = {str(k).lower() for k in props.keys()}
                if "cpf" in lowered or "doc_partial" in lowered or "doc_raw" in lowered:
                    errors.append(f"{path}: forbidden personal key in demo node")
    return errors


def main() -> int:
    parser = argparse.ArgumentParser(description="Run public privacy checks")
    parser.add_argument(
        "--repo-root",
        default=".",
        help="Repository root path",
    )
    args = parser.parse_args()

    repo_root = Path(args.repo_root).resolve()
    errors = [
        *check_public_queries(repo_root),
        *check_demo_data(repo_root),
    ]

    if errors:
        print("FAIL")
        for error in errors:
            print(f"- {error}")
        return 1

    print("PASS")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
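
To see what the two CPF patterns in this script accept and reject, the sketch below applies them to a few strings. The regexes are copied from the script; `looks_like_cpf` is a hypothetical wrapper, and the digit strings are made-up samples rather than real identifiers.

```python
import re

# Copied from scripts/check_public_privacy.py.
CPF_RAW_RE = re.compile(r"(?<!\d)\d{11}(?!\d)")
CPF_FMT_RE = re.compile(r"\d{3}\.\d{3}\.\d{3}-\d{2}")


def looks_like_cpf(text: str) -> bool:
    """True when the text contains an 11-digit run or a CPF-formatted value."""
    return bool(CPF_RAW_RE.search(text) or CPF_FMT_RE.search(text))


samples = [
    "12345678901",                       # exactly 11 digits: flagged
    "123.456.789-01",                    # CPF punctuation: flagged
    "11.111.111/0001-11",                # demo CNPJ format: not flagged
    "11111111000111",                    # 14 digits: lookarounds reject it
    "2026-02-28T23:37:57.377723+00:00",  # timestamp: no 11-digit run
]
results = {s: looks_like_cpf(s) for s in samples}
for sample, flagged in results.items():
    print(f"{sample!r}: {flagged}")
```

The lookarounds in `CPF_RAW_RE` are what keep unformatted 14-digit CNPJ values and microsecond timestamps from tripping the gate.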

scripts/generate_demo_dataset.py Executable file

@@ -0,0 +1,125 @@
#!/usr/bin/env python3
"""Generate a public-safe synthetic graph dataset for WTG demo."""

from __future__ import annotations

import argparse
import json
from datetime import UTC, datetime
from pathlib import Path


def build_payload() -> dict[str, object]:
    nodes = [
        {
            "id": "company:alpha",
            "label": "Alpha Infra Ltda",
            "type": "Company",
            "properties": {
                "cnpj": "11.111.111/0001-11",
                "sector": "infrastructure",
                "city": "Sao Paulo",
            },
        },
        {
            "id": "company:beta",
            "label": "Beta Servicos SA",
            "type": "Company",
            "properties": {
                "cnpj": "22.222.222/0001-22",
                "sector": "services",
                "city": "Recife",
            },
        },
        {
            "id": "contract:001",
            "label": "Contrato Municipal 001",
            "type": "Contract",
            "properties": {
                "contract_id": "CT-001",
                "value": 1250000,
                "date": "2025-11-10",
            },
        },
        {
            "id": "sanction:001",
            "label": "Sancao Administrativa 001",
            "type": "Sanction",
            "properties": {
                "sanction_id": "SN-001",
                "date": "2025-08-12",
            },
        },
        {
            "id": "finance:001",
            "label": "Debito Fiscal 001",
            "type": "Finance",
            "properties": {
                "finance_id": "FN-001",
                "value": 320000,
                "date": "2025-09-03",
            },
        },
    ]

    edges = [
        {
            "id": "rel:1",
            "source": "company:alpha",
            "target": "contract:001",
            "type": "VENCEU",
            "properties": {"source": "synthetic"},
        },
        {
            "id": "rel:2",
            "source": "company:alpha",
            "target": "sanction:001",
            "type": "SANCIONADA",
            "properties": {"source": "synthetic"},
        },
        {
            "id": "rel:3",
            "source": "company:alpha",
            "target": "finance:001",
            "type": "DEVE",
            "properties": {"source": "synthetic"},
        },
        {
            "id": "rel:4",
            "source": "company:alpha",
            "target": "company:beta",
            "type": "SOCIO_DE",
            "properties": {"source": "synthetic", "note": "company-level relation"},
        },
    ]

    return {
        "meta": {
            "generated_at_utc": datetime.now(UTC).isoformat(),
            "generator_version": "1.0.0",
            "source": "synthetic",
        },
        "nodes": nodes,
        "edges": edges,
    }


def main() -> int:
    parser = argparse.ArgumentParser(description="Generate synthetic graph demo dataset")
    parser.add_argument(
        "--output",
        default="data/demo/synthetic_graph.json",
        help="Output path for synthetic dataset",
    )
    args = parser.parse_args()

    output_path = Path(args.output)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    payload = build_payload()
    output_path.write_text(json.dumps(payload, ensure_ascii=True, indent=2) + "\n", encoding="utf-8")
    print(f"Wrote synthetic dataset to {output_path}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())


@@ -0,0 +1,80 @@
#!/usr/bin/env bash
set -euo pipefail
SRC_ROOT="${1:-$(pwd)}"
OUT_DIR="${2:-/tmp/world-transparency-graph-public-$(date +%Y%m%d_%H%M%S)}"
mkdir -p "$OUT_DIR"
# Include only intended public edition directories/files
rsync -a \
--exclude='**/.venv/***' \
--exclude='**/__pycache__/***' \
--exclude='**/.pytest_cache/***' \
--exclude='**/.mypy_cache/***' \
--exclude='**/.ruff_cache/***' \
--exclude='frontend/node_modules/***' \
--exclude='etl/data/***' \
--exclude='.env' \
--exclude='api/.env' \
--exclude='etl/.env' \
--exclude='frontend/.env' \
--exclude='**/dist/***' \
--exclude='**/build/***' \
--exclude='**/*.pyc' \
--include='api/' \
--include='api/***' \
--include='etl/' \
--include='etl/***' \
--include='frontend/' \
--include='frontend/***' \
--include='infra/' \
--include='infra/***' \
--include='docs/' \
--include='docs/brand/' \
--include='docs/brand/***' \
--include='docs/demo/' \
--include='docs/demo/***' \
--include='docs/legal/' \
--include='docs/legal/***' \
--include='docs/release/' \
--include='docs/release/public_boundary_matrix.csv' \
--include='docs/release/public_endpoint_matrix.md' \
--include='docs/release/public_repo_release_checklist.md' \
--include='docs/data-sources.md' \
--include='docs/source_registry_br_v1.csv' \
--include='docs/source_onboarding_contract.md' \
--include='.github/' \
--include='.github/***' \
--include='scripts/' \
--include='scripts/check_public_privacy.py' \
--include='scripts/generate_demo_dataset.py' \
--include='scripts/link_persons.cypher' \
--include='scripts/prepare_public_snapshot.sh' \
--include='data/' \
--include='data/demo/' \
--include='data/demo/***' \
--include='README.md' \
--include='LICENSE' \
--include='.env.example' \
--include='.gitignore' \
--include='.gitleaksignore' \
--exclude='*' \
"$SRC_ROOT/" "$OUT_DIR/"
# Explicit removals for internal-only artifacts
rm -f "$OUT_DIR/CLAUDE.md"
rm -f "$OUT_DIR/.mcp.json"
rm -f "$OUT_DIR/docs/shadow_rollout_runbook.md"
rm -f "$OUT_DIR/docs/ingestion_priority_runbook.md"
rm -f "$OUT_DIR/docs/ops/storage_operations.md"
rm -f "$OUT_DIR/scripts/auto_finalize_pncp_backfill.sh"
rm -rf "$OUT_DIR/audit-results"
# Ensure demo data exists
python3 "$OUT_DIR/scripts/generate_demo_dataset.py" --output "$OUT_DIR/data/demo/synthetic_graph.json" >/dev/null
# Run public privacy gate on generated snapshot
python3 "$OUT_DIR/scripts/check_public_privacy.py" --repo-root "$OUT_DIR"
printf 'Public snapshot prepared at: %s\n' "$OUT_DIR"
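
The explicit removals in this script can be double-checked after the snapshot is built. The sketch below is a hypothetical post-check, not part of the commit: `find_leaks` and `MUST_BE_ABSENT` are illustrative names, the path list mirrors the `rm` lines above, and the temporary directory only stands in for a real snapshot.

```python
import tempfile
from pathlib import Path

# Paths the snapshot script removes explicitly; none should survive.
MUST_BE_ABSENT = (
    "CLAUDE.md",
    ".mcp.json",
    "docs/shadow_rollout_runbook.md",
    "docs/ingestion_priority_runbook.md",
    "docs/ops/storage_operations.md",
    "scripts/auto_finalize_pncp_backfill.sh",
    "audit-results",
)


def find_leaks(snapshot_root: Path) -> list[str]:
    """Return the internal-only paths that still exist under the snapshot."""
    return [rel for rel in MUST_BE_ABSENT if (snapshot_root / rel).exists()]


# Demonstration against a throwaway directory instead of a real snapshot.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "README.md").write_text("ok\n", encoding="utf-8")
    clean_result = find_leaks(root)
    (root / "CLAUDE.md").write_text("internal\n", encoding="utf-8")
    leaky_result = find_leaks(root)

print(clean_result, leaky_result)
```

Pointing `find_leaks` at the directory printed by the script would give a quick manual confirmation for the "no internal runbooks" item in the release checklist.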