mirror of
https://github.com/kharonsec/br-acc
synced 2026-04-25 17:15:02 +02:00
release: add public snapshot tooling docs and privacy gates
14
.env.example
@@ -20,6 +20,10 @@ APP_ENV=dev
JWT_SECRET_KEY=change-me-generate-with-openssl-rand-hex-32
INVITE_CODE=
CORS_ORIGINS=http://localhost:3000
PUBLIC_MODE=false
PUBLIC_ALLOW_PERSON=false
PUBLIC_ALLOW_ENTITY_LOOKUP=false
PUBLIC_ALLOW_INVESTIGATIONS=false

# Frontend (dev only — production uses Caddy reverse proxy with relative paths)
VITE_API_URL=http://localhost:8000
@@ -30,13 +34,3 @@ VITE_API_URL=http://localhost:8000
# Optional ETL source tokens
# WORLD_BANK_API_KEY=
# EU_SANCTIONS_TOKEN=

# MCP Servers (Claude Code integration)
# 1. Add to ~/.zshrc:
# export NEO4J_PROD_PASSWORD="<from /opt/icarus/infra/.env on server>"
# export NEO4J_DEV_PASSWORD="icarus-dev-2026" (or your local Neo4j password)
# 2. Create SSH tunnel for prod Neo4j (bolt over SSH):
# ssh -f -N icarus-hetzner (uses ~/.ssh/config Host alias)
# This forwards localhost:7688 → server:7687 (encrypted)
# 3. Restart Claude Code — it reads .mcp.json automatically
# See .mcp.json for server definitions (neo4j-prod, neo4j-dev, ssh-hetzner)
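The four `PUBLIC_*` variables in `.env.example` reach the process as plain strings. A minimal sketch of how a service might parse them into typed flags is shown here; `PublicFlags`, `load_public_flags`, `env_bool`, and the accepted truthy spellings are assumptions for illustration and are not taken from the project's settings code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: these spellings count as "true"; any other value is false.
_TRUTHY = {"1", "true", "yes", "on"}


def env_bool(env: Mapping[str, str], name: str, default: bool = False) -> bool:
    """Read a boolean flag, falling back to `default` when unset or empty."""
    raw = env.get(name, "").strip().lower()
    if not raw:
        return default
    return raw in _TRUTHY


@dataclass(frozen=True)
class PublicFlags:
    public_mode: bool
    allow_person: bool
    allow_entity_lookup: bool
    allow_investigations: bool


def load_public_flags(env: Optional[Mapping[str, str]] = None) -> PublicFlags:
    """Build flags from the environment; defaults mirror .env.example (all false)."""
    source = os.environ if env is None else env
    return PublicFlags(
        public_mode=env_bool(source, "PUBLIC_MODE"),
        allow_person=env_bool(source, "PUBLIC_ALLOW_PERSON"),
        allow_entity_lookup=env_bool(source, "PUBLIC_ALLOW_ENTITY_LOOKUP"),
        allow_investigations=env_bool(source, "PUBLIC_ALLOW_INVESTIGATIONS"),
    )


if __name__ == "__main__":
    # A public deployment would set PUBLIC_MODE=true and leave the rest false.
    print(load_public_flags({"PUBLIC_MODE": "true"}))
```

Passing the mapping explicitly keeps the parser testable without touching the real environment.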
13
.github/workflows/security.yml
vendored
@@ -71,3 +71,16 @@ jobs:

      - name: Audit ETL dependencies
        run: uvx pip-audit -r /tmp/etl-requirements.txt --strict

  public-privacy-gate:
    name: Public Privacy Gate
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Validate public privacy contract
        run: python scripts/check_public_privacy.py --repo-root .
1
.gitignore
vendored
@@ -50,6 +50,7 @@ htmlcov/
# Data files (too large for git)
*.csv
!docs/source_registry_br_v1.csv
!docs/release/public_boundary_matrix.csv
*.tsv
*.parquet
*.dbc
50
README.md
@@ -1,24 +1,30 @@
# ICARUS
# World Transparency Graph (WTG) — Icarus Core

Ferramenta de análise de grafos de dados públicos brasileiros.
Plataforma global de análise de grafos de dados públicos.

Brazilian public data graph analysis tool.
Global public-data graph analysis platform.

[](https://github.com/YOUR_ORG/icarus/actions/workflows/ci.yml)
[](https://github.com/brunoclz/world-transparency-graph/actions/workflows/ci.yml)
[](https://www.gnu.org/licenses/agpl-3.0)

---

## O que é / What it is

ICARUS ingere dados de registros públicos brasileiros (CNPJ, TSE, Portal da Transparência, CEIS/CNEP) em um grafo Neo4j e permite a exploração visual de conexões entre pessoas, empresas, contratos, eleições e sanções.
WTG (powered by Icarus Core) ingere dados de registros públicos e permite a exploração visual de conexões entre empresas, contratos, eleições e sanções.

ICARUS ingests Brazilian public records (CNPJ, TSE, Portal da Transparência, CEIS/CNEP) into a Neo4j graph and enables visual exploration of connections between people, companies, contracts, elections, and sanctions.
WTG (powered by Icarus Core) ingests public records and enables visual exploration of connections between companies, contracts, elections, and sanctions.

**Dados de registros públicos. Não constitui acusação.**

**Data patterns from public records. Not accusations.**

## Modelo de marca / Brand model

- Public product: **World Transparency Graph (WTG)**
- Civic movement: **BRCC**
- Institutional engine: **Icarus Core**

## Arquitetura / Architecture

```
@@ -105,21 +111,33 @@ make test-frontend # 20 testes TypeScript
| p10 | Ciclo doação-contrato | Donation-contract loop |
| p12 | Concentração de contratos | Contract concentration |

## Public mode contract

WTG Open must run with public defaults:

- `PUBLIC_MODE=true`
- `PUBLIC_ALLOW_PERSON=false`
- `PUBLIC_ALLOW_ENTITY_LOOKUP=false`
- `PUBLIC_ALLOW_INVESTIGATIONS=false`

With these defaults, public mode returns no natural-person (PF) nodes (`Person`/`Partner`) and no personal properties.

## Endpoints da API / API endpoints

| Method | Route | Description |
|---|---|---|
| GET | `/health` | Health check |
| GET | `/api/v1/entity/{cpf_or_cnpj}` | Look up entity |
| GET | `/api/v1/entity/{id}/connections` | Entity connections |
| GET | `/api/v1/search?q=` | Full-text search |
| GET | `/api/v1/graph/{entity_id}` | Graph data |
| GET | `/api/v1/patterns/` | List patterns |
| GET | `/api/v1/patterns/{entity_id}` | Entity patterns |
| GET | `/api/v1/baseline/{entity_id}` | Peer comparison |
| POST | `/api/v1/investigations/` | Create investigation |
| GET | `/api/v1/investigations/` | List investigations |
| GET | `/api/v1/meta/sources` | Data sources |
| GET | `/api/v1/public/meta` | Aggregate metrics and source health |
| GET | `/api/v1/public/patterns/company/{cnpj_or_id}` | Public signals by company |
| GET | `/api/v1/public/graph/company/{cnpj_or_id}` | Public company subgraph |

### Advanced-only surface (internal deployment)

- `/api/v1/entity/*`
- `/api/v1/search`
- `/api/v1/graph/*`
- `/api/v1/patterns/*`
- `/api/v1/investigations/*`

## Estrutura / Project structure

14
data/demo/README.md
Normal file
@@ -0,0 +1,14 @@
# Demo Data (Synthetic)

This directory is reserved for synthetic, public-safe demo data only.

Rules:
- No real CPF or personal identifiers.
- No `Person` / `Partner` labels.
- No operational metadata.

Use the generator:

```bash
python3 scripts/generate_demo_dataset.py --output data/demo/synthetic_graph.json
```
97
data/demo/synthetic_graph.json
Normal file
@@ -0,0 +1,97 @@
{
  "meta": {
    "generated_at_utc": "2026-02-28T23:37:57.377723+00:00",
    "generator_version": "1.0.0",
    "source": "synthetic"
  },
  "nodes": [
    {
      "id": "company:alpha",
      "label": "Alpha Infra Ltda",
      "type": "Company",
      "properties": {
        "cnpj": "11.111.111/0001-11",
        "sector": "infrastructure",
        "city": "Sao Paulo"
      }
    },
    {
      "id": "company:beta",
      "label": "Beta Servicos SA",
      "type": "Company",
      "properties": {
        "cnpj": "22.222.222/0001-22",
        "sector": "services",
        "city": "Recife"
      }
    },
    {
      "id": "contract:001",
      "label": "Contrato Municipal 001",
      "type": "Contract",
      "properties": {
        "contract_id": "CT-001",
        "value": 1250000,
        "date": "2025-11-10"
      }
    },
    {
      "id": "sanction:001",
      "label": "Sancao Administrativa 001",
      "type": "Sanction",
      "properties": {
        "sanction_id": "SN-001",
        "date": "2025-08-12"
      }
    },
    {
      "id": "finance:001",
      "label": "Debito Fiscal 001",
      "type": "Finance",
      "properties": {
        "finance_id": "FN-001",
        "value": 320000,
        "date": "2025-09-03"
      }
    }
  ],
  "edges": [
    {
      "id": "rel:1",
      "source": "company:alpha",
      "target": "contract:001",
      "type": "VENCEU",
      "properties": {
        "source": "synthetic"
      }
    },
    {
      "id": "rel:2",
      "source": "company:alpha",
      "target": "sanction:001",
      "type": "SANCIONADA",
      "properties": {
        "source": "synthetic"
      }
    },
    {
      "id": "rel:3",
      "source": "company:alpha",
      "target": "finance:001",
      "type": "DEVE",
      "properties": {
        "source": "synthetic"
      }
    },
    {
      "id": "rel:4",
      "source": "company:alpha",
      "target": "company:beta",
      "type": "SOCIO_DE",
      "properties": {
        "source": "synthetic",
        "note": "company-level relation"
      }
    }
  ]
}
29
docs/demo/dataset-contract.md
Normal file
@@ -0,0 +1,29 @@
# Demo Dataset Contract (WTG Open)

## Objective
Provide a reproducible, public-safe demo graph with synthetic records only.

## Safety rules
- Synthetic data only. No real CPF, no real personal names, no real personal addresses.
- Company identifiers may use synthetic CNPJ-like values reserved for demonstration.
- Demo graph cannot include `Person` or `Partner` labels.
- Demo exports must never include private or operational metadata.

## Required files
- `data/demo/synthetic_graph.json`
- `data/demo/README.md`
- `scripts/generate_demo_dataset.py`

## JSON schema (minimum)
- `nodes[]`: `{id, label, type, properties}`
- `edges[]`: `{id, source, target, type, properties}`
- `meta`: `{generated_at_utc, generator_version, source: "synthetic"}`

## Acceptance checks
- No field name contains `cpf`, `doc_partial`, or `doc_raw`.
- No node label equals `Person` or `Partner`.
- CI privacy gate passes.

## Runtime target
- Dedicated demo Neo4j instance (non-production).
- Public API served with `PUBLIC_MODE=true`.
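The minimum schema and the first two acceptance checks in `docs/demo/dataset-contract.md` can be expressed as a small validator. The sketch below is one possible reading of the contract, assuming the payload has already been parsed from JSON; `validate_demo_payload` and its error strings are hypothetical and not part of the repository.

```python
from typing import Any

NODE_KEYS = {"id", "label", "type", "properties"}
EDGE_KEYS = {"id", "source", "target", "type", "properties"}
META_KEYS = {"generated_at_utc", "generator_version", "source"}
FORBIDDEN_LABELS = {"Person", "Partner"}
FORBIDDEN_KEY_PARTS = ("cpf", "doc_partial", "doc_raw")


def validate_demo_payload(payload: dict[str, Any]) -> list[str]:
    """Return a list of contract violations; an empty list means the payload passes."""
    errors: list[str] = []

    meta = payload.get("meta", {})
    missing = META_KEYS - set(meta)
    if missing:
        errors.append(f"meta: missing keys {sorted(missing)}")
    if meta.get("source") != "synthetic":
        errors.append("meta: source must be 'synthetic'")

    for kind, required in (("nodes", NODE_KEYS), ("edges", EDGE_KEYS)):
        for index, item in enumerate(payload.get(kind, [])):
            missing = required - set(item)
            if missing:
                errors.append(f"{kind}[{index}]: missing keys {sorted(missing)}")
            # Contract: no field name may contain cpf, doc_partial or doc_raw.
            for key in item.get("properties", {}):
                if any(part in key.lower() for part in FORBIDDEN_KEY_PARTS):
                    errors.append(f"{kind}[{index}]: forbidden field name {key!r}")

    for index, node in enumerate(payload.get("nodes", [])):
        if node.get("type") in FORBIDDEN_LABELS:
            errors.append(f"nodes[{index}]: forbidden label {node['type']}")

    return errors
```

This sketch uses a substring match on field names, following the wording "contains" in the acceptance checks.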
35
docs/legal/public-compliance-pack.md
Normal file
@@ -0,0 +1,35 @@
# Public Compliance Pack (WTG Open)

## 1. Data minimization baseline
- Public mode defaults:
  - `PUBLIC_MODE=true`
  - `PUBLIC_ALLOW_PERSON=false`
  - `PUBLIC_ALLOW_ENTITY_LOOKUP=false`
  - `PUBLIC_ALLOW_INVESTIGATIONS=false`
- Public endpoints are company-only and aggregated.

## 2. LGPD-compatible operating principles
- Purpose limitation: investigative transparency and civic oversight.
- Data minimization: no person-level lookup in public surface.
- Security by design: role-separated advanced environment.
- Transparency: source attribution and coverage caveats on every report.

## 3. Public terms of use requirements
- Tool presents connections in public records, not legal conclusions.
- Users cannot use the platform for harassment or doxxing.
- Abuse patterns trigger throttling and access restrictions.

## 4. Correction and takedown policy
- Accept correction requests with source evidence.
- Record decision logs with timestamp and rationale.
- Propagate approved corrections to next published snapshot.

## 5. Abuse response playbook
- Enforce strict rate limiting in public mode.
- Retain request logs for abuse analysis in legal window.
- Block abusive clients and rotate keys/tokens when needed.

## 6. Mandatory legal review gates before launch
- RIPD/DPIA draft reviewed by legal counsel.
- Terms of Use published.
- Public communication includes limitation statement.
19
docs/release/public_boundary_matrix.csv
Normal file
@@ -0,0 +1,19 @@
path,classification,reason,action_for_public_repo
api/**,PUBLIC,Core API code is required for public edition,include
etl/**,PUBLIC,Core ingestion framework is required for public edition,include
frontend/**,PUBLIC,Public demo UI is required,include
infra/**,PUBLIC with review,Keep only generic local/dev deployment docs,include reviewed subset
docs/brand/**,PUBLIC,Brand governance for release,include
docs/demo/**,PUBLIC,Demo dataset contract and constraints,include
docs/legal/**,PUBLIC,Compliance and responsible-use baseline,include
docs/source_registry_br_v1.csv,PUBLIC,Program governance and transparency,include
docs/data-sources.md,PUBLIC,Source catalog visibility,include
CLAUDE.md,REMOVE_FROM_PUBLIC,Contains operational host and infrastructure paths,exclude
.mcp.json,REMOVE_FROM_PUBLIC,Contains local runtime MCP wiring,exclude
scripts/auto_finalize_pncp_backfill.sh,REMOVE_FROM_PUBLIC,Production operational finalizer tied to server paths,exclude
scripts/storage_capacity_report.sh,INTERNAL,Operational script with server assumptions,exclude by default
docs/shadow_rollout_runbook.md,REMOVE_FROM_PUBLIC,Operational production rollout details,exclude
docs/ingestion_priority_runbook.md,REMOVE_FROM_PUBLIC,Contains production data paths and credential procedures,exclude
docs/ops/storage_operations.md,REMOVE_FROM_PUBLIC,Contains production operational process and server paths,exclude
audit-results/**,REMOVE_FROM_PUBLIC,Operational evidence and internal logs,exclude
data/**,INTERNAL by default,Only data/demo synthetic subset allowed,include only data/demo
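Because the boundary matrix is machine-readable, a prepared snapshot could be cross-checked against it. The sketch below is an illustration only: `excluded_paths_present` is a hypothetical helper, it considers only rows whose action is exactly `exclude`, and it treats a trailing `/**` as "the directory itself" rather than implementing full glob semantics.

```python
import csv
import io
import tempfile
from pathlib import Path


def excluded_paths_present(matrix_csv: str, snapshot_root: Path) -> list[str]:
    """Return matrix paths marked `exclude` that still exist in the snapshot."""
    found: list[str] = []
    for row in csv.DictReader(io.StringIO(matrix_csv)):
        if row["action_for_public_repo"].strip() != "exclude":
            continue  # "include", "exclude by default", etc. are skipped here
        pattern = row["path"].strip()
        # Assumption: "dir/**" is checked as the directory; other paths are literal.
        relative = pattern[:-3] if pattern.endswith("/**") else pattern
        if (snapshot_root / relative).exists():
            found.append(pattern)
    return found


if __name__ == "__main__":
    matrix = (
        "path,classification,reason,action_for_public_repo\n"
        "CLAUDE.md,REMOVE_FROM_PUBLIC,Contains operational host and infrastructure paths,exclude\n"
        "audit-results/**,REMOVE_FROM_PUBLIC,Operational evidence and internal logs,exclude\n"
        "api/**,PUBLIC,Core API code is required for public edition,include\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Simulate a leftover internal file in the snapshot directory.
        (root / "CLAUDE.md").write_text("leftover", encoding="utf-8")
        print(excluded_paths_present(matrix, root))
```

An empty result means none of the strictly excluded paths survived into the snapshot directory.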
27
docs/release/public_endpoint_matrix.md
Normal file
@@ -0,0 +1,27 @@
# Public vs Advanced Endpoint Matrix

## Public mode defaults
- `PUBLIC_MODE=true`
- `PUBLIC_ALLOW_PERSON=false`
- `PUBLIC_ALLOW_ENTITY_LOOKUP=false`
- `PUBLIC_ALLOW_INVESTIGATIONS=false`

## Endpoint behavior

| Endpoint | PUBLIC_MODE=false (advanced) | PUBLIC_MODE=true (public default) |
|---|---|---|
| `GET /api/v1/entity/{cpf_or_cnpj}` | Allowed | `403` (`Entity lookup endpoint disabled in public mode`) |
| `GET /api/v1/entity/by-element-id/{id}` | Allowed | `403` (`Entity lookup endpoint disabled in public mode`) |
| `GET /api/v1/entity/{id}/connections` | Allowed | Person/Partner targets filtered out |
| `GET /api/v1/search` | Allowed | Person/Partner results filtered out |
| `GET /api/v1/graph/{entity_id}` | Allowed | Person/Partner center blocked, person nodes filtered |
| `GET /api/v1/patterns/{entity_id}` | Allowed | `403` when `PUBLIC_ALLOW_ENTITY_LOOKUP=false` |
| `GET /api/v1/investigations/*` | Allowed | `403` (`Investigation endpoints disabled in public mode`) |
| `GET /api/v1/public/meta` | Allowed | Allowed |
| `GET /api/v1/public/patterns/company/{cnpj_or_id}` | Allowed | Allowed |
| `GET /api/v1/public/graph/company/{cnpj_or_id}` | Allowed | Allowed |

## Exposure tier contract
- `public_safe`: company/contract/sanction/aggregate entities allowed on public surface.
- `restricted`: person-adjacent entities (not returned by default in public mode).
- `internal_only`: workspace/admin artifacts (`User`, `Investigation`, `Annotation`, `Tag`).
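The endpoint matrix can be read as a decision function over the request path and the public-mode flags. The sketch below encodes that reading; it is not the project's router code. `Flags` and `endpoint_behavior` are hypothetical names, and the link between the entity and investigation rows and the `PUBLIC_ALLOW_ENTITY_LOOKUP` / `PUBLIC_ALLOW_INVESTIGATIONS` flags is an assumption, since the matrix states that link explicitly only for the patterns row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flags:
    # Defaults follow the "Public mode defaults" list in the matrix.
    public_mode: bool = True
    allow_entity_lookup: bool = False
    allow_investigations: bool = False


def endpoint_behavior(path: str, flags: Flags) -> str:
    """Return 'allowed', 'filtered', '403' or 'unspecified' for a GET path."""
    if not flags.public_mode or path.startswith("/api/v1/public/"):
        return "allowed"
    if path.startswith("/api/v1/investigations"):
        return "allowed" if flags.allow_investigations else "403"
    patterns_prefix = "/api/v1/patterns/"
    if path.startswith(patterns_prefix) and len(path) > len(patterns_prefix):
        return "allowed" if flags.allow_entity_lookup else "403"
    if path.startswith("/api/v1/entity/"):
        if path.endswith("/connections"):
            return "filtered"  # Person/Partner targets removed from the result
        return "allowed" if flags.allow_entity_lookup else "403"
    if path.startswith(("/api/v1/search", "/api/v1/graph/")):
        return "filtered"  # simplification: graph also blocks Person/Partner centers
    return "unspecified"  # the matrix does not list this path


if __name__ == "__main__":
    public = Flags()
    for p in ("/api/v1/entity/by-element-id/42", "/api/v1/search", "/api/v1/public/meta"):
        print(p, "->", endpoint_behavior(p, public))
```

Returning `unspecified` for unlisted paths keeps the sketch from implying behavior the matrix does not document.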
55
docs/release/public_repo_release_checklist.md
Normal file
@@ -0,0 +1,55 @@
# Public Repo Release Checklist — World Transparency Graph

## 1) Prepare sanitized snapshot
```bash
bash scripts/prepare_public_snapshot.sh /Users/brunoclz/CORRUPTOS /tmp/world-transparency-graph-public
```

## 2) Initialize clean-history repo from snapshot
```bash
cd /tmp/world-transparency-graph-public
git init
git add .
git commit -m "Initial public edition release (WTG)"
```

## 3) Create GitHub repository (manual)
- Owner: `brunoclz`
- Name: `world-transparency-graph`
- Visibility: Public
- Do not auto-add README/License (already present)

## 4) Push initial release
```bash
git branch -M main
git remote add origin https://github.com/brunoclz/world-transparency-graph.git
git push -u origin main
```

## 5) Configure branch protection (GitHub UI)
Require all checks:
- `API (Python)`
- `ETL (Python)`
- `Frontend (TypeScript)`
- `Neutrality Audit`
- `Gitleaks`
- `Bandit (Python)`
- `Pip Audit (Python deps)`
- `Public Privacy Gate`

## 6) Configure environment defaults
- Set public deployment environment vars:
  - `PUBLIC_MODE=true`
  - `PUBLIC_ALLOW_PERSON=false`
  - `PUBLIC_ALLOW_ENTITY_LOOKUP=false`
  - `PUBLIC_ALLOW_INVESTIGATIONS=false`

## 7) Final checks before launch
- `python scripts/check_public_privacy.py --repo-root .` => `PASS`
- Confirm no internal runbooks in public repo
- Confirm demo data is synthetic (`data/demo/synthetic_graph.json`)

## 8) Launch communication split
- Publish product announcement as **WTG**
- Publish movement announcement as **BRCC**
- Mention methodology limits and non-accusatory policy
78
scripts/check_public_privacy.py
Executable file
@@ -0,0 +1,78 @@
#!/usr/bin/env python3
"""Public-surface privacy gate checks for WTG open release."""

from __future__ import annotations

import argparse
import json
import re
from pathlib import Path

CPF_RAW_RE = re.compile(r"(?<!\d)\d{11}(?!\d)")
CPF_FMT_RE = re.compile(r"\d{3}\.\d{3}\.\d{3}-\d{2}")
FORBIDDEN_IN_PUBLIC_QUERIES = (
    ":Person",
    ":Partner",
    ".cpf",
    "doc_partial",
    "doc_raw",
)


def check_public_queries(repo_root: Path) -> list[str]:
    errors: list[str] = []
    query_dir = repo_root / "api" / "src" / "icarus" / "queries"
    for path in sorted(query_dir.glob("public_*.cypher")):
        content = path.read_text(encoding="utf-8")
        for token in FORBIDDEN_IN_PUBLIC_QUERIES:
            if token in content:
                errors.append(f"{path}: forbidden token in public query: {token}")
    return errors


def check_demo_data(repo_root: Path) -> list[str]:
    errors: list[str] = []
    demo_dir = repo_root / "data" / "demo"
    for path in sorted(demo_dir.glob("*.json")):
        raw = path.read_text(encoding="utf-8")
        if CPF_RAW_RE.search(raw) or CPF_FMT_RE.search(raw):
            errors.append(f"{path}: possible CPF-like value found")
        payload = json.loads(raw)
        for node in payload.get("nodes", []):
            label = str(node.get("type", ""))
            if label in {"Person", "Partner"}:
                errors.append(f"{path}: forbidden demo label {label}")
            props = node.get("properties", {})
            if isinstance(props, dict):
                lowered = {str(k).lower() for k in props.keys()}
                if "cpf" in lowered or "doc_partial" in lowered or "doc_raw" in lowered:
                    errors.append(f"{path}: forbidden personal key in demo node")
    return errors


def main() -> int:
    parser = argparse.ArgumentParser(description="Run public privacy checks")
    parser.add_argument(
        "--repo-root",
        default=".",
        help="Repository root path",
    )
    args = parser.parse_args()

    repo_root = Path(args.repo_root).resolve()
    errors = [
        *check_public_queries(repo_root),
        *check_demo_data(repo_root),
    ]
    if errors:
        print("FAIL")
        for error in errors:
            print(f"- {error}")
        return 1

    print("PASS")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
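The two CPF patterns in `scripts/check_public_privacy.py` decide whether the demo JSON fails the gate, so it helps to see what they do and do not match. The sketch below copies the two regular expressions from the script and runs them on illustrative strings; `looks_like_cpf` and the sample values are made up for this example.

```python
import re

# Same patterns as in scripts/check_public_privacy.py.
CPF_RAW_RE = re.compile(r"(?<!\d)\d{11}(?!\d)")
CPF_FMT_RE = re.compile(r"\d{3}\.\d{3}\.\d{3}-\d{2}")


def looks_like_cpf(text: str) -> bool:
    """True when the text contains an 11-digit run or a formatted CPF shape."""
    return bool(CPF_RAW_RE.search(text) or CPF_FMT_RE.search(text))


# Illustrative inputs (not real identifiers) and the expected verdict.
SAMPLES = {
    "12345678901": True,            # exactly 11 bare digits
    "123.456.789-01": True,         # formatted CPF shape
    "11.111.111/0001-11": False,    # formatted CNPJ as used in the demo data
    "11111111000111": False,        # 14 bare digits (CNPJ length)
    "2026-02-28T23:37:57.377723+00:00": False,  # timestamp from the demo meta
}

if __name__ == "__main__":
    for text, expected in SAMPLES.items():
        assert looks_like_cpf(text) is expected, text
    print("all samples behave as expected")
```

The lookarounds keep the raw pattern from firing inside longer digit runs, but any standalone 11-digit number in a demo file (a phone number, for instance) would still be flagged as CPF-like.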
125
scripts/generate_demo_dataset.py
Executable file
@@ -0,0 +1,125 @@
#!/usr/bin/env python3
"""Generate a public-safe synthetic graph dataset for WTG demo."""

from __future__ import annotations

import argparse
import json
from datetime import UTC, datetime
from pathlib import Path


def build_payload() -> dict[str, object]:
    nodes = [
        {
            "id": "company:alpha",
            "label": "Alpha Infra Ltda",
            "type": "Company",
            "properties": {
                "cnpj": "11.111.111/0001-11",
                "sector": "infrastructure",
                "city": "Sao Paulo",
            },
        },
        {
            "id": "company:beta",
            "label": "Beta Servicos SA",
            "type": "Company",
            "properties": {
                "cnpj": "22.222.222/0001-22",
                "sector": "services",
                "city": "Recife",
            },
        },
        {
            "id": "contract:001",
            "label": "Contrato Municipal 001",
            "type": "Contract",
            "properties": {
                "contract_id": "CT-001",
                "value": 1250000,
                "date": "2025-11-10",
            },
        },
        {
            "id": "sanction:001",
            "label": "Sancao Administrativa 001",
            "type": "Sanction",
            "properties": {
                "sanction_id": "SN-001",
                "date": "2025-08-12",
            },
        },
        {
            "id": "finance:001",
            "label": "Debito Fiscal 001",
            "type": "Finance",
            "properties": {
                "finance_id": "FN-001",
                "value": 320000,
                "date": "2025-09-03",
            },
        },
    ]

    edges = [
        {
            "id": "rel:1",
            "source": "company:alpha",
            "target": "contract:001",
            "type": "VENCEU",
            "properties": {"source": "synthetic"},
        },
        {
            "id": "rel:2",
            "source": "company:alpha",
            "target": "sanction:001",
            "type": "SANCIONADA",
            "properties": {"source": "synthetic"},
        },
        {
            "id": "rel:3",
            "source": "company:alpha",
            "target": "finance:001",
            "type": "DEVE",
            "properties": {"source": "synthetic"},
        },
        {
            "id": "rel:4",
            "source": "company:alpha",
            "target": "company:beta",
            "type": "SOCIO_DE",
            "properties": {"source": "synthetic", "note": "company-level relation"},
        },
    ]

    return {
        "meta": {
            "generated_at_utc": datetime.now(UTC).isoformat(),
            "generator_version": "1.0.0",
            "source": "synthetic",
        },
        "nodes": nodes,
        "edges": edges,
    }


def main() -> int:
    parser = argparse.ArgumentParser(description="Generate synthetic graph demo dataset")
    parser.add_argument(
        "--output",
        default="data/demo/synthetic_graph.json",
        help="Output path for synthetic dataset",
    )
    args = parser.parse_args()

    output_path = Path(args.output)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    payload = build_payload()
    output_path.write_text(json.dumps(payload, ensure_ascii=True, indent=2) + "\n", encoding="utf-8")
    print(f"Wrote synthetic dataset to {output_path}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
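Since the generated payload hard-codes node ids and then refers to them from `edges`, one extra sanity check a reader might add is that every edge points at a node that exists. The sketch below shows that check on a trimmed copy of the payload; `dangling_edges` is a hypothetical helper and is not part of the repository's generator or privacy gate.

```python
def dangling_edges(payload: dict) -> list[str]:
    """Return ids of edges whose source or target is not a known node id."""
    node_ids = {node["id"] for node in payload.get("nodes", [])}
    return [
        edge["id"]
        for edge in payload.get("edges", [])
        if edge["source"] not in node_ids or edge["target"] not in node_ids
    ]


if __name__ == "__main__":
    # Trimmed sample in the same shape as the generator's output.
    sample = {
        "nodes": [
            {"id": "company:alpha", "type": "Company"},
            {"id": "contract:001", "type": "Contract"},
        ],
        "edges": [
            {"id": "rel:1", "source": "company:alpha", "target": "contract:001"},
            # This edge refers to a node that the trimmed sample leaves out.
            {"id": "rel:4", "source": "company:alpha", "target": "company:beta"},
        ],
    }
    print(dangling_edges(sample))
```

Run against the full payload from `build_payload()`, the helper should return an empty list, because all four edges reference ids defined in `nodes`.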
80
scripts/prepare_public_snapshot.sh
Executable file
@@ -0,0 +1,80 @@
#!/usr/bin/env bash
set -euo pipefail

SRC_ROOT="${1:-$(pwd)}"
OUT_DIR="${2:-/tmp/world-transparency-graph-public-$(date +%Y%m%d_%H%M%S)}"

mkdir -p "$OUT_DIR"

# Include only intended public edition directories/files
rsync -a \
  --exclude='**/.venv/***' \
  --exclude='**/__pycache__/***' \
  --exclude='**/.pytest_cache/***' \
  --exclude='**/.mypy_cache/***' \
  --exclude='**/.ruff_cache/***' \
  --exclude='frontend/node_modules/***' \
  --exclude='etl/data/***' \
  --exclude='.env' \
  --exclude='api/.env' \
  --exclude='etl/.env' \
  --exclude='frontend/.env' \
  --exclude='**/dist/***' \
  --exclude='**/build/***' \
  --exclude='**/*.pyc' \
  --include='api/' \
  --include='api/***' \
  --include='etl/' \
  --include='etl/***' \
  --include='frontend/' \
  --include='frontend/***' \
  --include='infra/' \
  --include='infra/***' \
  --include='docs/' \
  --include='docs/brand/' \
  --include='docs/brand/***' \
  --include='docs/demo/' \
  --include='docs/demo/***' \
  --include='docs/legal/' \
  --include='docs/legal/***' \
  --include='docs/release/' \
  --include='docs/release/public_boundary_matrix.csv' \
  --include='docs/release/public_endpoint_matrix.md' \
  --include='docs/release/public_repo_release_checklist.md' \
  --include='docs/data-sources.md' \
  --include='docs/source_registry_br_v1.csv' \
  --include='docs/source_onboarding_contract.md' \
  --include='.github/' \
  --include='.github/***' \
  --include='scripts/' \
  --include='scripts/check_public_privacy.py' \
  --include='scripts/generate_demo_dataset.py' \
  --include='scripts/link_persons.cypher' \
  --include='scripts/prepare_public_snapshot.sh' \
  --include='data/' \
  --include='data/demo/' \
  --include='data/demo/***' \
  --include='README.md' \
  --include='LICENSE' \
  --include='.env.example' \
  --include='.gitignore' \
  --include='.gitleaksignore' \
  --exclude='*' \
  "$SRC_ROOT/" "$OUT_DIR/"

# Explicit removals for internal-only artifacts
rm -f "$OUT_DIR/CLAUDE.md"
rm -f "$OUT_DIR/.mcp.json"
rm -f "$OUT_DIR/docs/shadow_rollout_runbook.md"
rm -f "$OUT_DIR/docs/ingestion_priority_runbook.md"
rm -f "$OUT_DIR/docs/ops/storage_operations.md"
rm -f "$OUT_DIR/scripts/auto_finalize_pncp_backfill.sh"
rm -rf "$OUT_DIR/audit-results"

# Ensure demo data exists
python3 "$OUT_DIR/scripts/generate_demo_dataset.py" --output "$OUT_DIR/data/demo/synthetic_graph.json" >/dev/null

# Run public privacy gate on generated snapshot
python3 "$OUT_DIR/scripts/check_public_privacy.py" --repo-root "$OUT_DIR"

printf 'Public snapshot prepared at: %s\n' "$OUT_DIR"