150 Commits

Author SHA1 Message Date
bruno cesar
316703b2ac Add 4 new pipelines (TSE Bens/Filiados, CEPIM, BCB) + shared updates
- TSE Bens: candidate declared assets from BigQuery (DeclaredAsset nodes, DECLAROU_BEM rels)
- TSE Filiados: party membership from BigQuery (PartyMembership nodes, FILIADO_A rels)
- CEPIM: barred NGOs from Portal da Transparência (BarredNGO nodes, IMPEDIDA rels)
- BCB: central bank penalties (BCBPenalty nodes, BCB_PENALIZADA rels)
- Updated init.cypher, meta_stats, meta.py, tokens, graphConstants, graphExplorer, i18n
- 36 pipelines registered in runner.py, 1133 tests green

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 04:10:00 -03:00
bruno cesar
63ecc28105 Add 8 new pipelines (CPGF/Viagens/SIOP/PNCP/DOU/CVM Funds/Renuncias/SICONFI) + shared updates
Teams C/D/E: CPGF government credit cards, Viagens government travel, SIOP detailed amendments,
PNCP all-level procurement (REST API), DOU rewrite (IN XML), CVM Funds ownership,
Renuncias Fiscais tax waivers, SICONFI municipal finances. Each with download script,
tests, fixtures. Updated init.cypher (4 constraints, 11 indexes), meta API (31 sources),
frontend (tokens, icons, i18n). Runner now has 32 pipelines.

1034 tests green (190 API + 690 ETL + 154 frontend).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 03:48:02 -03:00
bruno cesar
c2353efcbe Add 5 new pipelines (PEP/CEAF/Leniency/OFAC/Holdings) + centralize deadlock retry + data catalog
Team A pipelines: CGU PEP registry, CEAF expelled servants, Leniência agreements,
OFAC SDN sanctions, Brasil.IO company holdings. Each with download script, tests,
fixtures. Promoted deadlock retry from 4 pipelines into Neo4jBatchLoader.
Created docs/data-sources.md master catalog (85+ sources). Updated init.cypher
(12 constraints, 15 indexes), meta API (24 sources), frontend (tokens, icons, i18n).

798 tests green (190 API + 454 ETL + 154 frontend).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 03:35:38 -03:00
bruno cesar
29a1502e8e Fix orphaned data: Câmara deputy_id fallback, Senado CPF enrichment, GlobalPEP name matching
Câmara: deputies without CPF now linked via deputy_id (~10K recovered).
Senado: senator lookup from Dados Abertos API enables CPF-first GASTOU +
name fallback without CANDIDATO_EM filter (~200K recoverable).
GlobalPEP: post-load Cypher script for 2-phase exact name matching.
Also: OpenRouter MCP config, triple-AI consensus skill.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 01:27:32 -03:00
bruno cesar
7e4383dc7e Fix servidores pipeline (hash IDs for masked CPFs) + load 774K servidores + bug fixes
Servidores pipeline redesign: LGPD-masked CPFs (6-digit partials) caused
136K merge collisions when used as Person/PublicOffice keys. Now uses
SHA-256 hash IDs (servidor_id, office_id) as merge keys, keeping
cpf_partial as a property for entity resolution matching.
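
A minimal sketch of the hash-ID idea described above. The commit does not say which fields feed the hash, so the helper `hash_id` and the field choice (name, partial CPF, org code) are assumptions for illustration only.

```python
import hashlib


def hash_id(*parts: str) -> str:
    """Join normalized parts and return their SHA-256 hex digest."""
    normalized = "|".join(p.strip().upper() for p in parts)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Two different people whose masked CPFs expose the same six digits.
row_a = {"name": "MARIA DA SILVA", "cpf_partial": "222333", "org": "26000"}
row_b = {"name": "JOAO DE SOUZA", "cpf_partial": "222333", "org": "26000"}

# Hypothetical choice of hashed fields: name + partial CPF + org code.
id_a = hash_id(row_a["name"], row_a["cpf_partial"], row_a["org"])
id_b = hash_id(row_b["name"], row_b["cpf_partial"], row_b["org"])

# Merging on cpf_partial alone would collapse both rows into one node;
# the hash IDs stay distinct, and cpf_partial remains a plain property.
print(id_a != id_b)
```

Because the inputs are normalized before hashing, re-running the load on the same source row should yield the same key, which is what a MERGE needs.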

Production results: 635K PublicOffice, 632K Person, 36K SAME_AS links.

Also fixes: deputy_supplier_loop.cypher path (DOOU→Person→Election),
senado.py GASTOU relationship creation, MCP server docs in CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 01:01:11 -03:00
bruno cesar
dc4b36f00a Fix TSE pipeline: MERGE Person by CPF instead of sq_candidato
Same person can run in multiple years with different sq_candidato IDs
but same CPF. Merging by sq_candidato violates the Person.cpf uniqueness
constraint. Now merges by CPF when available, sq_candidato as fallback.
Uses coalesce() in CANDIDATO_EM/DOOU queries to find persons by either key.
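
The key selection described above can be sketched in plain Python. The function name `person_merge_key` and the row layout are hypothetical; the actual pipeline does this inside its Cypher and transform code.

```python
from typing import Optional, Tuple


def person_merge_key(cpf: Optional[str], sq_candidato: str) -> Tuple[str, str]:
    """Prefer CPF as the merge key; fall back to sq_candidato when CPF is missing."""
    cpf = (cpf or "").strip()
    if cpf:
        return ("cpf", cpf)
    return ("sq_candidato", sq_candidato)


# Same person in two elections: different sq_candidato, same CPF.
run_2018 = person_merge_key("111.222.333-44", "250000600001")
run_2022 = person_merge_key("111.222.333-44", "250001700002")

# A candidate row with no CPF still gets a usable key.
no_cpf = person_merge_key(None, "250000600003")

print(run_2018 == run_2022, no_cpf)
```

Both runs resolve to the same `("cpf", ...)` key, so they would merge into one Person node instead of tripping the uniqueness constraint.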

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 22:32:32 -03:00
bruno cesar
3b415e883c Fix CVM/Senado/Camara pipeline data format issues
CVM: Portal restructured URLs (PAS→PROCESSO/SANCIONADOR). New format
uses NUP as ID, semicolon-delimited, latin-1 encoding, ZIP archive.
Pipeline now uses name-based matching (no CPF/CNPJ in new data).

Senado: CSVs have metadata row on line 1 — add skiprows=1.
Camara: CSVs use UTF-8 BOM — switch from latin-1 to utf-8-sig.
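
To illustrate the two CSV fixes above without pandas, here is a standard-library sketch. The helper `read_rows`, the sample bytes, and the column names are made up for the example; the Senado sample's encoding is also an assumption, since the commit does not state it.

```python
import csv
import io


def read_rows(raw: bytes, encoding: str, skip_rows: int = 0, delimiter: str = ";"):
    """Decode raw CSV bytes, drop leading metadata lines, return a list of dict rows."""
    stream = io.StringIO(raw.decode(encoding), newline="")
    for _ in range(skip_rows):
        stream.readline()  # discard a metadata line before the real header
    return list(csv.DictReader(stream, delimiter=delimiter))


# Camara-style file: UTF-8 with a byte-order mark in front of the header.
camara_raw = "\ufeffnome;valor\nFulano;10\n".encode("utf-8")
good = read_rows(camara_raw, "utf-8-sig")  # BOM is stripped by the codec
bad = read_rows(camara_raw, "latin-1")     # BOM bytes leak into the first column name

# Senado-style file: one metadata row above the header.
senado_raw = "ULTIMA ATUALIZACAO;01/01/2026\nSENADOR;VALOR\nFulano;10\n".encode("utf-8")
senado = read_rows(senado_raw, "utf-8", skip_rows=1)

print(list(good[0]), list(bad[0]), senado)
```

With the wrong codec the first header is no longer `nome`, so any lookup by that column name fails, which matches the symptom the fix addresses.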

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 18:28:09 -03:00
bruno cesar
00776d6399 Add 5 new pipelines (ICIJ, OpenSanctions, CVM, Câmara, Senado) + 5 pattern queries
New ETL pipelines:
- ICIJ OffshoreLeaks: OffshoreEntity/OffshoreOfficer nodes, OFFICER_OF/INTERMEDIARY_OF rels
- OpenSanctions: GlobalPEP nodes, GLOBAL_PEP_MATCH rels (FtM JSONL parser)
- CVM: CVMProceeding nodes, CVM_SANCIONADA rels, penalty value parsing
- Câmara: Expense nodes (CEAP), GASTOU/FORNECEU rels, deputy/supplier links
- Senado: Expense nodes (CEAPS), FORNECEU rels

New pattern queries:
- offshore_connection, deputy_supplier_loop, cvm_sanctioned_receiving
- global_pep_contracts, legislator_supplier_loop

Schema: 5 constraints, 10 indexes, fulltext expanded to 14 labels
Runner: 19 pipelines registered. Meta: 19 data sources.
Frontend: 5 entity types, 7 relationship types (i18n + tokens + graph)
142 new ETL tests (673 total, all green)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 18:15:52 -03:00
bruno cesar
271671ff3a Fix 9 query-schema alignment bugs + 3 AI-review findings
Entity ID alignment: add 6 missing ID fields (cnes_code, finance_id,
embargo_id, school_id, convenio_id, stats_id) to entity_by_id and all
6 investigation Cypher queries (WHERE + coalesce chains).

entity_by_element_id: add missing PublicOffice label.

pattern_self_dealing: fix Amendment field reads with dual-source
coalesce fallbacks (TransfereGov + Transparencia).

init.cypher + schema_init.cypher: replace dead indexes
(amendment_object→function, amendment_date→value_committed,
convenio_date→date_published), expand fulltext index to 9 node types
with 11 search fields including n.function.

seed-dev.cypher: fix all property names (id→contract_id/sanction_id,
value→valor, PublicOffice id→cpf), add Amendment node, fix
AUTOR_EMENDA target to Amendment.

search.py: add name extraction for Contract/Amendment/Convenio/Embargo
/PublicOffice types in search results.

21 new tests, 570 total green. Triple-AI validated (Claude + Codex).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 10:31:30 -03:00
bruno cesar
a4967ae1da Harden remaining elementId() queries + search + pattern error handling
- graph_expand, entity_connections, entity_timeline, entity_score, node_degree:
  add entity label allowlist to all elementId() lookups (same IDOR class as SEC-01)
- search.cypher: exclude User/Investigation/Annotation/Tag from full-text results
- pattern_service.py: separate TimeoutError from unexpected exceptions for visibility

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 03:51:18 -03:00
bruno cesar
8e73684a9a Fix validation findings: deploy.sh dry-run compat, ORDER BY alias, loader key sanitization
- deploy.sh: defer DOMAIN check until after arg parsing (dry-run works without DOMAIN)
- pattern_donation_amendment_loop: ORDER BY amendment_value (was a.value, null for new fields)
- loader.py: reject non-identifier keys to prevent Cypher injection from malformed data
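
The key check in the last item could look roughly like the sketch below. Property names cannot be passed as Cypher parameters, so they end up interpolated into the query text; validating them first is the guard. The helper name `safe_keys` and the exact identifier rule are assumptions, not the project's code.

```python
import re

# Letters, digits, underscore; must not start with a digit.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def safe_keys(row: dict) -> list:
    """Return the row's keys sorted, or raise if any key is not a plain identifier."""
    bad = [k for k in row if not _IDENTIFIER.fullmatch(str(k))]
    if bad:
        raise ValueError(f"refusing non-identifier keys: {bad!r}")
    return sorted(row)


ok = safe_keys({"name": "ACME", "cnpj": "11.222.333/0001-81"})

try:
    safe_keys({"name} DETACH DELETE n //": "x"})
    rejected = False
except ValueError:
    rejected = True

print(ok, rejected)
```

`fullmatch` is used instead of `match` with `$` so that a key ending in a newline is also rejected.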

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 03:49:03 -03:00
bruno cesar
dd6e1a041f Add 64 tests: coverage for datasus, dou, IDOR prevention, pages, hooks
API (+10): Cypher label-filter integrity tests, investigation coalesce chain,
  pattern parameter binding, patrimony div-by-zero guard
ETL (+27): datasus pipeline (11), dou pipeline (16) with fixtures
Frontend (+27): 7 page smoke tests (Landing, Register, Investigations,
  Baseline, GraphExplorer, SharedInvestigation, EntityAnalysis) + useGraphData hook

Total: 559 tests (179 API + 226 ETL + 154 Frontend)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 03:36:31 -03:00
bruno cesar
ce1da72c41 Fix ETL + infra bugs: loader key union, datasus normalization, sanctions NULL dates, deploy.sh DOMAIN
- loader.py: build SET from union of all row keys (not just first row)
- datasus.py: normalize atende_sus to '1'/'0' (was storing raw 'SIM'/'NAO')
- sanctions.py: store None for missing date_end (was empty string, broke IS NULL check)
- deploy.sh: fail fast if DOMAIN env var is not set
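
A small sketch of the first fix in this list, building the SET clause from the union of keys across a batch. The function `build_set_clause` and the sample rows are illustrative, not the loader's real interface.

```python
def build_set_clause(rows: list, key_field: str, alias: str = "n") -> str:
    """Build a Cypher SET fragment covering every property seen in the batch."""
    keys = sorted({k for row in rows for k in row} - {key_field})
    return ", ".join(f"{alias}.{k} = row.{k}" for k in keys)


rows = [
    {"id": 1, "name": "A"},
    {"id": 2, "name": "B", "uf": "SP"},  # 'uf' is absent from the first row
]

union_clause = build_set_clause(rows, "id")
first_row_only = build_set_clause(rows[:1], "id")  # the old behavior, for contrast

print(union_clause)
print(first_row_only)
```

Deriving the clause from the first row alone silently drops `uf` for every row in the batch; the union keeps it.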

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 03:30:58 -03:00
bruno cesar
e38004b682 Fix IDOR: add entity label allowlist to all elementId() Cypher branches
- entity_by_id.cypher: block User/Investigation node exposure via elementId
- investigation_add_entity.cypher: prevent linking internal nodes to investigations
- investigation_remove_entity.cypher: same label filter on remove path
- investigation_update.cypher: fix coalesce chain (was missing contract_id, sanction_id, amendment_id)

Confirmed by 3 independent AI audits (Claude, Codex, Gemini).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 03:30:42 -03:00
bruno cesar
d475b982d4 Add Phase 14 ETL pipelines (10), tests, fixtures, and utility scripts
10 new pipelines: bndes, comprasnet, datasus, dou, ibama, inep, pgfn, rais, tcu, transferegov
Date formatting transform, test fixtures for all sources
Download scripts (comprasnet, datasus, dou), audit tool, graph viz doc

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 03:15:07 -03:00
bruno cesar
9d86d3bc64 Update .gitignore: exclude PNGs, .playwright-mcp, .claude/plans, data dirs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 03:14:47 -03:00
bruno cesar
73be3bc29b Add 5 new pattern queries: donation-amendment loop, amendment-beneficiary contracts, debtor-health operator, sanctioned-health operator, shell-company contracts
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 03:14:35 -03:00
bruno cesar
e3ebbe6d76 Servidor entity matching: bypass LGPD-masked CPFs via partial CPF + name
Servidores have LGPD-masked CPFs (only 6 middle digits visible). This adds
two-layer SAME_AS matching to link 739K servidores to TSE/CNPJ persons:

- Phase 0: pre-compute cpf_middle6 on existing full-CPF Person nodes
- Phase 4: partial CPF + exact name match (confidence 0.95)
- Phase 5: unique name-only match for classified servidores (confidence 0.85)
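
The Phase 0 and Phase 4 steps can be sketched as follows. This assumes the mask looks like `***.222.333-**` and that the visible digits are positions 4 to 9 of the CPF; both are assumptions, as are the helper names. Phase 5 (unique name-only matching) needs a global uniqueness check and is not covered here.

```python
import re
from typing import Optional


def cpf_middle6(full_cpf: str) -> Optional[str]:
    """Phase 0: extract digits 4-9 from a full 11-digit CPF."""
    digits = re.sub(r"\D", "", full_cpf)
    return digits[3:9] if len(digits) == 11 else None


def masked_middle6(masked_cpf: str) -> Optional[str]:
    """Extract the six visible digits from a masked CPF."""
    digits = re.sub(r"\D", "", masked_cpf)
    return digits if len(digits) == 6 else None


def normalize(name: str) -> str:
    return " ".join(name.upper().split())


def match_confidence(full_cpf, full_name, masked_cpf, masked_name) -> Optional[float]:
    """Phase 4: partial CPF plus exact normalized name gives confidence 0.95."""
    middle = cpf_middle6(full_cpf)
    if middle is None or middle != masked_middle6(masked_cpf):
        return None
    if normalize(full_name) != normalize(masked_name):
        return None
    return 0.95


hit = match_confidence("111.222.333-44", "Maria da Silva", "***.222.333-**", "MARIA  DA SILVA")
miss = match_confidence("111.222.333-44", "Maria da Silva", "***.222.333-**", "JOAO DE SOUZA")
print(hit, miss)
```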

Integration tests against real Neo4j caught and fixed a Cypher bug: MERGE
cannot use list index (targets[0]) directly — needs WITH alias first.

Also: make link-persons target, cpf_middle6/cpf_partial indexes,
testcontainers conftest fix, neutrality fix in value_sanitization.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 02:46:51 -03:00
bruno cesar
b01ebd74e3 Historical data gaps: TSE 2002-2024, servidores, search crash fix
Download scripts rewritten for all TSE election years (2002-2024):
- 3 donation URL patterns (pre-2012, 2012-2014, 2018+)
- 3 column format eras (early/legacy/new) with auto-detection
- ReceitaCandidato.csv file discovery for 2002-2006 nested ZIPs
Data: 5.87M candidates, 28.7M donations, 774K servidores downloaded.

Search crash fix (React error #31): sanitize_props() converts Neo4j
complex types (lists, dicts, temporals) to JSON-safe scalars in
entity, search, and graph routers. Defense-in-depth String() wrapping
in EntityDetail.tsx.
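
A rough sketch of what a `sanitize_props()`-style helper might do. It uses standard-library `datetime` values as stand-ins for Neo4j temporal types; in this sketch, real driver objects would fall through to the `str()` branch. The body is an assumption based only on the description above.

```python
import datetime as dt
import json


def sanitize_props(props: dict) -> dict:
    """Convert non-scalar property values into JSON-safe scalars."""
    out = {}
    for key, value in props.items():
        if value is None or isinstance(value, (str, int, float, bool)):
            out[key] = value                      # already safe
        elif isinstance(value, (dt.date, dt.time)):
            out[key] = value.isoformat()          # date, datetime, and time
        elif isinstance(value, (list, tuple, dict)):
            out[key] = json.dumps(value, default=str, ensure_ascii=False)
        else:
            out[key] = str(value)                 # last-resort fallback
    return out


clean = sanitize_props({
    "name": "ACME",
    "tags": ["a", "b"],
    "since": dt.date(2020, 1, 2),
    "count": 3,
})
print(clean)
```

Every value in the result is a string, number, boolean, or `None`, so a frontend can render it directly instead of receiving an object as a child node.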

Transparencia download_transparencia.py: fixed column name casing
for Id_SERVIDOR_PORTAL (was uppercase).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 02:07:20 -03:00
bruno cesar
e031dc3f18 Phase 14b: Expand queries, landing, and frontend for 15 node/rel types
Cypher queries updated for all 15 node labels and 15 relationship types
(graph_expand, entity_connections, entity_timeline, entity_score,
schema_init). 3 new patterns: debtor_contracts, embargoed_receiving,
loan_debtor. Landing redesign with HeroGraph, FeatureIcons, typewriter
hook, 13 data sources. Frontend: 5 new data colors, 13 rel types in
graph store, i18n for new entity/relationship types. Schema: 5 new
uniqueness constraints + 3 new indexes. 473 tests green.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 02:07:02 -03:00
bruno cesar
2b88f9345d Fix graph CPU burn: memoize canvas callbacks, pause animation on settle + unmount
RC3: nodeCanvasObject + nodeCanvasObjectMode wrapped in useCallback — prevents
ForceGraph2D from re-initializing render pipeline on every React render.
RC4: pauseAnimation() on engine stop (halts RAF loop after layout settles) and
on unmount (prevents CPU burn after navigating away). Simulation tuned:
cooldownTime=4000, d3AlphaDecay=0.03, d3VelocityDecay=0.5 for faster settling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 17:11:56 -03:00
bruno cesar
fb7395d8ba Expand queries, schema, patterns, and frontend for 15 node/rel types post Phase 14 data load
Cypher: graph_expand, entity_connections, entity_timeline, entity_score, meta_stats, search
all handle Finance, Embargo, Education, Convenio, LaborStats, Health, Amendment nodes.
Schema: 5 uniqueness constraints + 3 indexes for new node types.
3 new patterns: debtor_contracts, embargoed_receiving, loan_debtor (8 total).
Frontend: 5 dataColors, 6 relColors, 5 SVG icons, 12 entity + 13 rel types in store/i18n.
Landing: 13 data sources, updated stats. Meta: 14 counts + 14 sources.
ETL: CNPJ pipeline rewrite (MERGE-safe, CNAE/address fields), runner registers 14 pipelines.
Docs: spec + context updated. 468 tests green, neutrality pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 17:10:01 -03:00
bruno cesar
f182fb7300 Phase 2 UI overhaul: Entity Analysis dossier, graph polish, exposure optimization
- Entity Analysis page with 4-tab layout (Graph, Connections, Timeline, Export)
- Exposure index with heuristic percentile functions (0.2s vs 21s peer sampling)
- Graph canvas: ResizeObserver sizing, d3 force tuning, relationship-colored edges,
  directional arrows, node glow effects, grid background, auto-fit on load
- Dark HUD tooltip with graph2ScreenCoords positioning fix
- New components: ScoreRing, InsightsPanel, ConnectionsList, TimelineView,
  EntityHeader, AnalysisNav, GraphToolbar, ZoomControls, GraphLegend, GraphMinimap,
  ContextMenu, NodeTooltip, CommandPalette, Button, Skeleton, StatusBar, Toast
- Landing page, Register page, Dashboard page, PublicShell
- Zustand stores: graphExplorer, toast
- Custom node/edge canvas rendering with LOD, icons, connection badges
- IBM Plex Mono/Sans self-hosted fonts
- Design tokens: node colors, relationship colors, z-index scale
- 40+ i18n keys (PT-BR + EN), keyboard shortcuts hook, command palette
- All 145 API tests + 127 frontend tests passing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 05:22:42 -03:00
bruno cesar
71c2cb90f7 Add Entity Analysis dossier page with exposure index, insights, timeline, connections views
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 04:26:13 -03:00
bruno cesar
fa7fd23deb Frontend fixes: investigation 401, navigation redesign, route compat, light mode, i18n
- Fix listInvestigations trailing slash (307 redirect -> 401)
- Add ExposureFactor, ExposureResponse, TimelineEvent, TimelineResponse, HealthResponse interfaces
- Add getEntityExposure, getEntityTimeline, getHealthStatus API functions
- Simplify nav to 3 items (Dashboard, Search, Investigations)
- Add /app/analysis/:entityId route with lazy-loaded EntityAnalysis placeholder
- Add /app/graph/:entityId -> /app/analysis/:entityId redirect
- Update all /app/graph/ references to /app/analysis/ (SearchResults, Dashboard, Patterns)
- Add light mode CSS variables with [data-theme="light"] selector
- Add theme toggle (Sun/Moon) in sidebar, persisted to localStorage
- StatusBar polls /api/v1/meta/health every 30s for connectivity status
- Fix keyboard shortcuts: allow Cmd+K in input/textarea fields
- Add title attrs to collapsed ControlsSidebar icons
- Fix ControlsSidebar label truncation (overflow: visible)
- Add favicon.svg
- Add error toast on Dashboard search failure
- Add aria-label to logout button
- Add 60+ i18n keys (analysis.*, nav.theme*) in PT-BR and EN

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 04:22:57 -03:00
bruno cesar
c8f4b6762e Add exposure index + timeline endpoints, fix investigation router prefix
- Investigation router: prefix-based routing (relative paths), shared route on separate mini-router
- FastAPI: redirect_slashes=False safety net
- Schema: added Sanction.date_start and Amendment.date indexes
- New models: ExposureFactor, ExposureResponse, TimelineEvent, TimelineResponse
- New Cypher: entity_score, entity_score_peers, entity_timeline (cursor-paginated)
- New service: score_service.py with weighted exposure index (connections/sources/financial/patterns/baseline)
- New endpoints: GET /entity/{id}/exposure, GET /entity/{id}/timeline
- Fix: meta.py dict type annotations (pre-existing mypy errors)
- Tests: 17 new tests (score_service + entity_timeline), all 143 pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 04:22:21 -03:00
bruno cesar
f2a31c6c40 Fix 5 audit concerns: entity_ids, orphan cleanup, depth param, frontend types, docs sync
- investigation_update.cypher: collect(e.id) → coalesce(cpf, cnpj, elementId)
- investigation_delete.cypher: cascade delete Tags + Annotations
- entity_connections.cypher + entity.py: wire depth param with variable-length path
- client.ts: types→entity_types, add id/confidence to GraphEdge interface
- GraphCanvas.tsx: read confidence from root instead of properties
- spec.md: PUT→PATCH, CSV future, last-update NYI, dispute future, 5 patterns
- Rules: test counts (124 API, 103 ETL, 65 frontend), 8 pages

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 01:33:07 -03:00
bruno cesar
c550d017fa Fix 8 audit blockers: IDOR, graph leaks, CPF masking, format normalization, frontend types, pattern query
Security:
- entity_by_element_id: label allowlist prevents IDOR on private nodes
- graph_expand/entity_connections: restrict rel types + exclude User/Investigation/Annotation/Tag
- main.py: log critical warning on weak/default JWT secret at startup
- neo4j_service: schema bootstrap no longer drops comment-prefixed statements

Data integrity:
- entity_lookup.cypher: dual-format CPF/CNPJ matching (digits-only + punctuated)
- entity.py: format helpers normalize input before lookup
- cpf_masking.py: public mask functions for reuse outside middleware
- investigation.py: explicit CPF masking in PDF export

Frontend:
- client.ts: EntityDetail interface aligned with backend (removed root name/document, added is_pep)
- EntityDetail.tsx: derive display name/document from properties dict

Pattern logic:
- pattern_contract_concentration: compute municipality total before entity filter

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 01:23:19 -03:00
bruno cesar
03356fe5ce Fix 6 Codex xhigh blockers: pattern Cypher, schema bootstrap, rate limits, auth, entity IDs, frontend types
WS1: Fix donation-contract pattern — reverse DOOU path direction
     (Company→Person not Company→Election), d.value→d.valor
WS2: Schema bootstrap — copy init.cypher to queries/, add ensure_schema()
     called in lifespan. All IF NOT EXISTS, idempotent.
WS3: Rate limiting — add default_limits to Limiter, per-endpoint decorators
     on auth (10/min), search (30/min), patterns (30/min)
WS4: Investigation ownership — GET investigation/annotations/tags now require
     CurrentUser, Cypher matches User→OWNS→Investigation. Exports pass user_id.
WS5: Entity ID standardization — search/graph return document_id via coalesce,
     investigation queries match on cpf/cnpj/contract_id/sanction_id/amendment_id
     OR elementId. New /by-element-id endpoint. Frontend EntityDetail auto-routes.
WS6: Frontend SourceAttribution — sources typed as {database} objects not strings,
     components use source.database, test mocks updated.

292 tests green (124 API + 103 ETL + 65 frontend), neutrality clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 23:50:28 -03:00
bruno cesar
d91eb2cd6d Fix CNPJ format mismatch — TSE raw digits, Transparencia short values
TSE pipeline stored company donor CNPJs as raw 14-digit strings while all
other pipelines used formatted XX.XXX.XXX/XXXX-XX. This caused ~17K duplicate
Company nodes and broke cross-source MERGE operations. Transparencia accepted
short/invalid CNPJs (e.g. "11"), creating 4,427 garbage Company nodes.

- tse.py: format_cnpj(donor_doc) instead of storing raw digits
- transparencia.py: reject CNPJ with len != 14 digits (covers empty + short)
- Tests: assert formatted CNPJ for company donors, add short-CNPJ skip test
- Neo4j migration: merged 937 duplicates, reformatted 16,360 in place, deleted
  4,427 garbage nodes (53,654,691 → 53,649,327 Company nodes)
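
For reference, a formatter along these lines would produce the `XX.XXX.XXX/XXXX-XX` shape and reject short values. This is a sketch, not the project's `format_cnpj`; its real signature and behavior are not shown in the commit, and this version does not verify check digits.

```python
import re
from typing import Optional


def format_cnpj(value: Optional[str]) -> Optional[str]:
    """Return the CNPJ as XX.XXX.XXX/XXXX-XX, or None unless it has exactly 14 digits."""
    digits = re.sub(r"\D", "", value or "")
    if len(digits) != 14:
        return None  # covers empty and short inputs such as "11"
    return f"{digits[:2]}.{digits[2:5]}.{digits[5:8]}/{digits[8:12]}-{digits[12:]}"


raw = format_cnpj("11222333000181")            # raw digits, as TSE stored them
already = format_cnpj("11.222.333/0001-81")    # already formatted input
short = format_cnpj("11")                      # the kind of value Transparencia accepted

print(raw, already, short)
```

Raw and pre-formatted inputs converge on the same string, so a MERGE on that value would hit the same Company node from either source.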

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 21:25:46 -03:00
bruno cesar
15f17cd821 Fix ETL field mapping: razao_social in TSE/sanctions, contract_id validation
- TSE pipeline: add razao_social to company donor nodes (was only name)
- Sanctions pipeline: add razao_social to Company nodes (not Person)
- Transparencia pipeline: skip contracts with empty CNPJ digits
- API search/graph: fallback to name when razao_social missing
- Root cause: 24.6K Company nodes from TSE/sanctions had null razao_social
  because those pipelines only set name, not razao_social
- Add 4 tests for razao_social mapping and contract_id validation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 20:32:12 -03:00
bruno cesar
dfd1a7da20 Fix pattern queries, add indexes, harden ETL loader
- Fix property mismatches in pattern_sanctioned_receiving (c.date, s.date_start/end)
- Redesign pattern_donation_contract (remove impossible AUTOR_EMENDA→Contract path)
- Fix pattern_self_dealing path through Amendment instead of direct Contract
- Add 3 indexes: Company.cnae_principal, Contract.contracting_org, Contract.date
- Guard loader.load_nodes/load_relationships against null/empty keys
- Apply format_cpf to CNPJ socios transforms (RF + simple formats)
- Add 3 tests for null guards and CPF formatting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 20:20:17 -03:00
bruno cesar
04d84d76c1 Add --start-phase flag to streaming ETL for phase resumption
Allows skipping completed phases when restarting a streaming load,
avoiding redundant MERGE operations on already-loaded data.
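
One way such a flag could be wired up with `argparse` is sketched below. The phase names and the 1-based numbering are assumptions; the commit only names the `--start-phase` flag.

```python
import argparse

# Hypothetical phase list; the real streaming ETL defines its own phases.
PHASES = ["companies", "partners", "relationships"]


def phases_to_run(argv: list) -> list:
    """Parse --start-phase (1-based) and return the phases still to execute."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--start-phase",
        type=int,
        default=1,
        choices=range(1, len(PHASES) + 1),
        help="skip phases before this one",
    )
    args = parser.parse_args(argv)
    return PHASES[args.start_phase - 1:]


full_run = phases_to_run([])
resumed = phases_to_run(["--start-phase", "2"])
print(full_run, resumed)
```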

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 20:04:24 -03:00
bruno cesar
13fc81b8cf Harden production deployment — memory tuning, backups, monitoring
- .env.example: document Neo4j memory settings for 40M+ node production
- docker-compose.prod.yml: remove misleading VITE_API_URL runtime env
  (Vite bakes env at build time; Caddy proxies relative paths correctly)
- deploy.sh: health check through Caddy (HTTPS) instead of direct API port
- deploy.yml: pin appleboy/ssh-action to commit hash (supply-chain safety)
- backup-cron.sh: installer for daily Neo4j dump backup at 03:00 UTC
- snapshot-volume.sh: Hetzner Cloud volume snapshot via hcloud CLI
- healthcheck-cron.sh: uptime monitor every 5 min with webhook alerts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 14:08:17 -03:00
bruno cesar
5846e789a9 Add Phase 10: data loading, investigation CRUD, baseline/edge/shared pages
Data loading infrastructure:
- Download scripts for TSE, Transparencia, Sanctions (shared _download_utils)
- TSE pipeline: sq_candidato support, company donor handling
- Transparencia pipeline: emenda/contrato extraction fixes
- Neo4jBatchLoader enhancements, runner pipeline ordering

Investigation CRUD completeness:
- Delete annotation/tag/entity Cypher queries (existence-check pattern)
- Investigation service: delete_annotation, delete_tag, remove_entity
- Router: full CRUD endpoints + auth guard on export + filename sanitization
- Frontend store: delete actions for annotations, tags, entities

New frontend features:
- Baseline comparison page (useBaseline hook, route, i18n)
- EdgeDetail panel for graph edge inspection (money-proportional width)
- SharedInvestigation page (public share token access)
- +27 tests (Search, Patterns, SearchResults, GraphCanvas, EntityDetail)

Bug fixes:
- Cypher delete queries: return literal 1 instead of count-after-delete
- createInvestigation: trailing slash to avoid 307 redirect
- apiFetch: handle 204 No Content without parsing body
- export_investigation: add auth guard, sanitize Content-Disposition filename

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 14:06:37 -03:00
bruno cesar
3ebf4e4d2d Fix BQ download: use Storage Read API, correct output path
list_rows() + selected_fields avoids temp tables and storage quota.
Parallel downloads (one per table). Fix default output to ../data/.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 11:42:01 -03:00
bruno cesar
1bb538bb5f Add BigQuery streaming for full CNPJ dataset
RF server unreachable — use Base dos Dados BQ mirror instead.
New download script streams BQ → CSV page-by-page (no OOM).
run_streaming() auto-detects RF vs BQ format, reuses same transforms.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 06:54:51 -03:00
bruno cesar
acdc8c297a Add PDF export for investigations — weasyprint + Jinja2 template
New endpoint GET /api/v1/investigations/{id}/export/pdf with lang param.
Jinja2 A4 template, weasyprint rendering, lazy import for test compat.
Frontend: export button + API client method + i18n keys. 6 new tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 05:42:31 -03:00
bruno cesar
18cbee1ab7 Add frontend auth tests — auth store (10) + Login page (8)
Covers login/register/logout/restore flows in auth store and
Login page rendering, form submission, error display, navigation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 05:42:24 -03:00
bruno cesar
5abe290b04 Phase 7: Frontend auth integration — login page, auth store, error boundary
Auth store (Zustand): login/register/logout/restore, localStorage persistence,
auto-injects Bearer token via api/client.ts on all requests.

Login page: email/password form with register toggle and invite code field.
Protected routes: /investigations requires auth, redirects to /login.

AppShell: shows user email + logout when authenticated, login link when not.
ErrorBoundary: React error boundary wrapping entire app in main.tsx.

i18n: auth keys added for PT-BR + EN.

Also fixes: ETL pandas-stubs for mypy, sanctions.py no-any-return,
ETL integration conftest graceful testcontainers import.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 05:04:21 -03:00
bruno cesar
393e7dc3f0 Phase 6: Auth, integration tests, deployment, ETL rewrite, frontend polish
Auth: JWT auth with python-jose + passlib, invite-code registration,
user model + 3 Cypher queries, auth router, owner-scoped investigations.
Rate limiting: slowapi on auth endpoints.

Integration tests: testcontainers-based tests for entity, graph, search.

Deployment: docker-compose.prod.yml, Caddyfile, backup + deploy scripts,
GitHub Actions deploy workflow, deploy docs.

ETL rewrite: CNPJ pipeline handles real Receita Federal CSV layout (37 cols),
chunked file reading, proper field mapping. Download + explore scripts.
Test fixtures with real CSV samples.

Frontend polish: Spinner component, responsive CSS improvements across
all pages, better navigation, visual refinements.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 04:59:39 -03:00
bruno cesar
f5f825c8bd Phase 5: Polish — security fixes, code review fixes, CI, README
Security: constrain tag entity match, mask password in seed script,
enforce graph depth + LIMIT 500, shared PEP_ROLES constant.
Code quality: fix SearchResponse field mismatch, PATCH vs PUT,
addEntity URL, replace assert with RuntimeError, extract inline
Cypher, add model field length limits, fix i18n in Zustand store,
neutrality fix in API description.
Infra: GitHub Actions CI (api, etl, frontend, neutrality audit).
Docs: bilingual README (PT-BR + EN).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 03:52:59 -03:00
bruno cesar
36c8b0d2f8 Phase 4: Investigation workspace — CRUD, annotations, tags, sharing, export
Backend: 13 Cypher queries, investigation service with 12 async ops,
router with 13 endpoints (CRUD, annotations, tags, share tokens, export).
Frontend: InvestigationPanel, InvestigationDetail (inline editing),
AnnotationEditor, TagManager (color picker), Timeline. Zustand store
with full state management. Bilingual i18n. 79 API + 20 frontend tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 03:43:30 -03:00
bruno cesar
87f46341a4 Phase 3: ETL pipelines — transforms, loader, 4 data pipelines, entity resolution, seed
Transforms: name normalization, CPF/CNPJ formatting+validation, deduplication.
Neo4j batch loader with UNWIND batching (10K default).
Pipelines: CNPJ (Receita Federal), TSE (elections+donations),
Transparência (contracts+salaries+amendments), Sanctions (CEIS/CNEP).
Entity resolution: splink 4 config for Person matching (optional dep).
Dev seed: fixture graph exercising all 5 analysis patterns.
63 ETL tests passing.
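
The UNWIND batching mentioned above can be sketched without a database connection. The helpers `batches` and `merge_query` are illustrative names; only the 10K default comes from the commit message, and a real loader would pass each batch as the `$rows` parameter to the Neo4j driver.

```python
from typing import Iterator, List


def batches(rows: List[dict], size: int = 10_000) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def merge_query(label: str, key: str) -> str:
    """Build one parameterized UNWIND + MERGE statement for a whole batch."""
    return f"UNWIND $rows AS row MERGE (n:{label} {{{key}: row.{key}}}) SET n += row"


rows = [{"cnpj": str(i), "name": f"Company {i}"} for i in range(25)]
sizes = [len(batch) for batch in batches(rows, size=10)]
query = merge_query("Company", "cnpj")

print(sizes)
print(query)
```

Each batch costs one round trip and one transaction instead of one per row, which is the usual reason for this pattern on bulk loads.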

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 03:38:12 -03:00
bruno cesar
1c6ca39050 Phase 2 complete: baseline comparison + frontend pattern components
Baseline service compares entities against CNAE sector and regional
peers. Frontend adds PatternCard, PatternResultCard, Patterns page
with bilingual i18n and route integration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 03:31:27 -03:00
bruno cesar
c6390bed05 Phase 2: Pattern analysis — 5 MVP corruption detection patterns
- Self-dealing amendments (p01): politician → family company → contract
- Patrimony incompatibility (p05): declared assets vs family company capital
- Sanctioned still receiving (p06): CEIS/CNEP companies winning contracts
- Donation-contract loop (p10): donate to campaign → win contracts
- Municipal contract concentration (p12): disproportionate contract share
- Pattern service: run single/all patterns with entity scoping
- Patterns router: GET /patterns/{entity_id}, GET /patterns/{entity_id}/{name}
- Bilingual metadata (PT-BR + EN), neutrality-checked
- 5 .cypher files, all parameterized
- 64 tests passing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 03:25:30 -03:00
bruno cesar
5cd0647eec Phase 1: Graph explorer — force-directed viz, controls, entity detail
- GraphCanvas: react-force-graph-2d with entity-colored nodes, confidence-based edges
- GraphControls: depth slider (1-4), entity type toggles with colored borders
- EntityDetail: side panel with properties, source badges, type coloring
- useGraphData hook: fetches/caches graph data from API
- GraphExplorer page: wires canvas + controls + detail panel
- i18n: added graph, entity type translations (PT-BR + EN)
- All feedback loops pass: eslint, tsc, vitest (12 tests)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 03:23:32 -03:00
bruno cesar
b468577f2b Phase 1: Frontend — design system, common components, search page
- Design tokens: entity type colors, spacing, typography as TS constants
- MoneyLabel: R$ formatting (Brazilian Real with locale)
- SourceBadge: data source colored badges
- ConfidenceBadge: solid/dashed visual for match confidence
- Disclaimer: neutral data disclaimer via i18n
- SearchBar: input with type filter dropdown
- SearchResults: result list with entity colors and source badges
- API client: typed fetch wrapper with search/entity/graph endpoints
- i18n: expanded PT-BR and EN translations
- 12 vitest tests passing (lint + type-check + vitest clean)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 03:21:30 -03:00
bruno cesar
0dd953898c Phase 1: API core — all endpoints, query service, CPF masking
- Neo4j query service: CypherLoader + parameterized executor
- Entity endpoints: /entity/{cpf_or_cnpj} lookup + /entity/{id}/connections
- Search endpoint: /search with fulltext index, pagination, type filtering
- Graph endpoint: /graph/{entity_id} with depth/type filtering, nodes + edges
- CPF masking middleware: scans responses, masks non-PEP CPFs, preserves CNPJ
- Pydantic models: EntityResponse, SearchResponse, GraphResponse with source attribution
- 5 .cypher query files (never inline Cypher)
- 58 unit tests passing (ruff + mypy + pytest clean)
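
The masking rule from the middleware item above (mask non-PEP CPFs, leave CNPJs alone) might be approximated with a regular expression. The mask shape `***.222.333-**` and the `is_pep` switch are assumptions made for this sketch; the real middleware scans whole responses.

```python
import re

# Formatted CPF: 3.3.3-2 digits. A CNPJ contains "/" and never fits this shape.
CPF_PATTERN = re.compile(r"\b\d{3}\.(\d{3})\.(\d{3})-\d{2}\b")


def mask_cpfs(text: str, is_pep: bool = False) -> str:
    """Hide the first three and last two CPF digits unless the entity is a PEP."""
    if is_pep:
        return text
    return CPF_PATTERN.sub(lambda m: f"***.{m.group(1)}.{m.group(2)}-**", text)


body = "CPF 111.222.333-44, CNPJ 11.222.333/0001-81"
masked = mask_cpfs(body)
pep = mask_cpfs(body, is_pep=True)
print(masked)
```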

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 03:21:15 -03:00
bruno cesar
127a3e6754 Phase 0: Foundation — project skeleton with all three packages
- git init, AGPL-3.0 license, .gitignore, .env.example, Makefile
- CLAUDE.md with neutrality rules, build commands, code style
- Custom agents: security-reviewer, code-reviewer, test-writer
- API: FastAPI skeleton with health check, meta endpoints, Neo4j driver
- ETL: Pipeline base class, CLI runner skeleton, splink + basedosdados deps
- Frontend: Vite + React 19 + React Router 7 + i18n (PT-BR/EN) + Zustand
- Infra: docker-compose.yml (Neo4j + API + Frontend), schema init.cypher
- All feedback loops pass: ruff, mypy, pytest, eslint, tsc, vitest

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 03:16:21 -03:00