br-acc

mirror of https://github.com/kharonsec/br-acc synced 2026-04-25 17:15:02 +02:00

Author	SHA1	Message	Date
bruno cesar	316703b2ac	Add 4 new pipelines (TSE Bens/Filiados, CEPIM, BCB) + shared updates - TSE Bens: candidate declared assets from BigQuery (DeclaredAsset nodes, DECLAROU_BEM rels) - TSE Filiados: party membership from BigQuery (PartyMembership nodes, FILIADO_A rels) - CEPIM: barred NGOs from Portal da Transparência (BarredNGO nodes, IMPEDIDA rels) - BCB: central bank penalties (BCBPenalty nodes, BCB_PENALIZADA rels) - Updated init.cypher, meta_stats, meta.py, tokens, graphConstants, graphExplorer, i18n - 36 pipelines registered in runner.py, 1133 tests green Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 04:10:00 -03:00
bruno cesar	63ecc28105	Add 8 new pipelines (CPGF/Viagens/SIOP/PNCP/DOU/CVM Funds/Renuncias/SICONFI) + shared updates Teams C/D/E: CPGF govt credit cards, Viagens govt travel, SIOP detailed amendments, PNCP all-level procurement (REST API), DOU rewrite (IN XML), CVM Funds ownership, Renuncias Fiscais tax waivers, SICONFI municipal finances. Each with download script, tests, fixtures. Updated init.cypher (4 constraints, 11 indexes), meta API (31 sources), frontend (tokens, icons, i18n). Runner now has 32 pipelines. 1034 tests green (190 API + 690 ETL + 154 frontend). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 03:48:02 -03:00
bruno cesar	c2353efcbe	Add 5 new pipelines (PEP/CEAF/Leniency/OFAC/Holdings) + centralize deadlock retry + data catalog Team A pipelines: CGU PEP registry, CEAF expelled servants, Leniência agreements, OFAC SDN sanctions, Brasil.IO company holdings. Each with download script, tests, fixtures. Promoted deadlock retry from 4 pipelines into Neo4jBatchLoader. Created docs/data-sources.md master catalog (85+ sources). Updated init.cypher (12 constraints, 15 indexes), meta API (24 sources), frontend (tokens, icons, i18n). 798 tests green (190 API + 454 ETL + 154 frontend). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 03:35:38 -03:00
bruno cesar	29a1502e8e	Fix orphaned data: Câmara deputy_id fallback, Senado CPF enrichment, GlobalPEP name matching Câmara: deputies without CPF now linked via deputy_id (~10K recovered). Senado: senator lookup from Dados Abertos API enables CPF-first GASTOU + name fallback without CANDIDATO_EM filter (~200K recoverable). GlobalPEP: post-load Cypher script for 2-phase exact name matching. Also: OpenRouter MCP config, triple-AI consensus skill. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 01:27:32 -03:00
bruno cesar	7e4383dc7e	Fix servidores pipeline (hash IDs for masked CPFs) + load 774K servidores + bug fixes Servidores pipeline redesign: LGPD-masked CPFs (6-digit partials) caused 136K merge collisions when used as Person/PublicOffice keys. Now uses SHA-256 hash IDs (servidor_id, office_id) as merge keys, keeping cpf_partial as a property for entity resolution matching. Production results: 635K PublicOffice, 632K Person, 36K SAME_AS links. Also fixes: deputy_supplier_loop.cypher path (DOOU→Person→Election), senado.py GASTOU relationship creation, MCP server docs in CLAUDE.md. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 01:01:11 -03:00
bruno cesar	dc4b36f00a	Fix TSE pipeline: MERGE Person by CPF instead of sq_candidato Same person can run in multiple years with different sq_candidato IDs but same CPF. Merging by sq_candidato violates the Person.cpf uniqueness constraint. Now merges by CPF when available, sq_candidato as fallback. Uses coalesce() in CANDIDATO_EM/DOOU queries to find persons by either key. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 22:32:32 -03:00
bruno cesar	3b415e883c	Fix CVM/Senado/Camara pipeline data format issues CVM: Portal restructured URLs (PAS→PROCESSO/SANCIONADOR). New format uses NUP as ID, semicolon-delimited, latin-1 encoding, ZIP archive. Pipeline now uses name-based matching (no CPF/CNPJ in new data). Senado: CSVs have metadata row on line 1 — add skiprows=1. Camara: CSVs use UTF-8 BOM — switch from latin-1 to utf-8-sig. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 18:28:09 -03:00
bruno cesar	00776d6399	Add 5 new pipelines (ICIJ, OpenSanctions, CVM, Câmara, Senado) + 5 pattern queries New ETL pipelines: - ICIJ OffshoreLeaks: OffshoreEntity/OffshoreOfficer nodes, OFFICER_OF/INTERMEDIARY_OF rels - OpenSanctions: GlobalPEP nodes, GLOBAL_PEP_MATCH rels (FtM JSONL parser) - CVM: CVMProceeding nodes, CVM_SANCIONADA rels, penalty value parsing - Câmara: Expense nodes (CEAP), GASTOU/FORNECEU rels, deputy/supplier links - Senado: Expense nodes (CEAPS), FORNECEU rels New pattern queries: - offshore_connection, deputy_supplier_loop, cvm_sanctioned_receiving - global_pep_contracts, legislator_supplier_loop Schema: 5 constraints, 10 indexes, fulltext expanded to 14 labels Runner: 19 pipelines registered. Meta: 19 data sources. Frontend: 5 entity types, 7 relationship types (i18n + tokens + graph) 142 new ETL tests (673 total, all green) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 18:15:52 -03:00
bruno cesar	271671ff3a	Fix 9 query-schema alignment bugs + 3 AI-review findings Entity ID alignment: add 6 missing ID fields (cnes_code, finance_id, embargo_id, school_id, convenio_id, stats_id) to entity_by_id and all 6 investigation Cypher queries (WHERE + coalesce chains). entity_by_element_id: add missing PublicOffice label. pattern_self_dealing: fix Amendment field reads with dual-source coalesce fallbacks (TransfereGov + Transparencia). init.cypher + schema_init.cypher: replace dead indexes (amendment_object→function, amendment_date→value_committed, convenio_date→date_published), expand fulltext index to 9 node types with 11 search fields including n.function. seed-dev.cypher: fix all property names (id→contract_id/sanction_id, value→valor, PublicOffice id→cpf), add Amendment node, fix AUTOR_EMENDA target to Amendment. search.py: add name extraction for Contract/Amendment/Convenio/Embargo/PublicOffice types in search results. 21 new tests, 570 total green. Triple-AI validated (Claude + Codex). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 10:31:30 -03:00
bruno cesar	a4967ae1da	Harden remaining elementId() queries + search + pattern error handling - graph_expand, entity_connections, entity_timeline, entity_score, node_degree: add entity label allowlist to all elementId() lookups (same IDOR class as SEC-01) - search.cypher: exclude User/Investigation/Annotation/Tag from full-text results - pattern_service.py: separate TimeoutError from unexpected exceptions for visibility Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 03:51:18 -03:00
bruno cesar	8e73684a9a	Fix validation findings: deploy.sh dry-run compat, ORDER BY alias, loader key sanitization - deploy.sh: defer DOMAIN check until after arg parsing (dry-run works without DOMAIN) - pattern_donation_amendment_loop: ORDER BY amendment_value (was a.value, null for new fields) - loader.py: reject non-identifier keys to prevent Cypher injection from malformed data Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 03:49:03 -03:00
bruno cesar	dd6e1a041f	Add 64 tests: coverage for datasus, dou, IDOR prevention, pages, hooks API (+10): Cypher label-filter integrity tests, investigation coalesce chain, pattern parameter binding, patrimony div-by-zero guard ETL (+27): datasus pipeline (11), dou pipeline (16) with fixtures Frontend (+27): 7 page smoke tests (Landing, Register, Investigations, Baseline, GraphExplorer, SharedInvestigation, EntityAnalysis) + useGraphData hook Total: 559 tests (179 API + 226 ETL + 154 Frontend) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 03:36:31 -03:00
bruno cesar	ce1da72c41	Fix ETL + infra bugs: loader key union, datasus normalization, sanctions NULL dates, deploy.sh DOMAIN - loader.py: build SET from union of all row keys (not just first row) - datasus.py: normalize atende_sus to '1'/'0' (was storing raw 'SIM'/'NAO') - sanctions.py: store None for missing date_end (was empty string, broke IS NULL check) - deploy.sh: fail fast if DOMAIN env var is not set Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 03:30:58 -03:00
bruno cesar	e38004b682	Fix IDOR: add entity label allowlist to all elementId() Cypher branches - entity_by_id.cypher: block User/Investigation node exposure via elementId - investigation_add_entity.cypher: prevent linking internal nodes to investigations - investigation_remove_entity.cypher: same label filter on remove path - investigation_update.cypher: fix coalesce chain (was missing contract_id, sanction_id, amendment_id) Confirmed by 3 independent AI audits (Claude, Codex, Gemini). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 03:30:42 -03:00
bruno cesar	d475b982d4	Add Phase 14 ETL pipelines (10), tests, fixtures, and utility scripts 10 new pipelines: bndes, comprasnet, datasus, dou, ibama, inep, pgfn, rais, tcu, transferegov Date formatting transform, test fixtures for all sources Download scripts (comprasnet, datasus, dou), audit tool, graph viz doc Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 03:15:07 -03:00
bruno cesar	9d86d3bc64	Update .gitignore: exclude PNGs, .playwright-mcp, .claude/plans, data dirs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 03:14:47 -03:00
bruno cesar	73be3bc29b	Add 5 new pattern queries: donation-amendment loop, amendment-beneficiary contracts, debtor-health operator, sanctioned-health operator, shell-company contracts Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 03:14:35 -03:00
bruno cesar	e3ebbe6d76	Servidor entity matching: bypass LGPD-masked CPFs via partial CPF + name Servidores have LGPD-masked CPFs (only 6 middle digits visible). This adds two-layer SAME_AS matching to link 739K servidores to TSE/CNPJ persons: - Phase 0: pre-compute cpf_middle6 on existing full-CPF Person nodes - Phase 4: partial CPF + exact name match (confidence 0.95) - Phase 5: unique name-only match for classified servidores (confidence 0.85) Integration tests against real Neo4j caught and fixed a Cypher bug: MERGE cannot use list index (targets[0]) directly — needs WITH alias first. Also: make link-persons target, cpf_middle6/cpf_partial indexes, testcontainers conftest fix, neutrality fix in value_sanitization.py. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 02:46:51 -03:00
bruno cesar	b01ebd74e3	Historical data gaps: TSE 2002-2024, servidores, search crash fix Download scripts rewritten for all TSE election years (2002-2024): - 3 donation URL patterns (pre-2012, 2012-2014, 2018+) - 3 column format eras (early/legacy/new) with auto-detection - ReceitaCandidato.csv file discovery for 2002-2006 nested ZIPs Data: 5.87M candidates, 28.7M donations, 774K servidores downloaded. Search crash fix (React error #31): sanitize_props() converts Neo4j complex types (lists, dicts, temporals) to JSON-safe scalars in entity, search, and graph routers. Defense-in-depth String() wrapping in EntityDetail.tsx. Transparencia download_transparencia.py: fixed column name casing for Id_SERVIDOR_PORTAL (was uppercase). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 02:07:20 -03:00
bruno cesar	e031dc3f18	Phase 14b: Expand queries, landing, and frontend for 15 node/rel types Cypher queries updated for all 15 node labels and 15 relationship types (graph_expand, entity_connections, entity_timeline, entity_score, schema_init). 3 new patterns: debtor_contracts, embargoed_receiving, loan_debtor. Landing redesign with HeroGraph, FeatureIcons, typewriter hook, 13 data sources. Frontend: 5 new data colors, 13 rel types in graph store, i18n for new entity/relationship types. Schema: 5 new uniqueness constraints + 3 new indexes. 473 tests green. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 02:07:02 -03:00
bruno cesar	2b88f9345d	Fix graph CPU burn: memoize canvas callbacks, pause animation on settle + unmount RC3: nodeCanvasObject + nodeCanvasObjectMode wrapped in useCallback — prevents ForceGraph2D from re-initializing render pipeline on every React render. RC4: pauseAnimation() on engine stop (halts RAF loop after layout settles) and on unmount (prevents CPU burn after navigating away). Simulation tuned: cooldownTime=4000, d3AlphaDecay=0.03, d3VelocityDecay=0.5 for faster settling. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 17:11:56 -03:00
bruno cesar	fb7395d8ba	Expand queries, schema, patterns, and frontend for 15 node/rel types post Phase 14 data load Cypher: graph_expand, entity_connections, entity_timeline, entity_score, meta_stats, search all handle Finance, Embargo, Education, Convenio, LaborStats, Health, Amendment nodes. Schema: 5 uniqueness constraints + 3 indexes for new node types. 3 new patterns: debtor_contracts, embargoed_receiving, loan_debtor (8 total). Frontend: 5 dataColors, 6 relColors, 5 SVG icons, 12 entity + 13 rel types in store/i18n. Landing: 13 data sources, updated stats. Meta: 14 counts + 14 sources. ETL: CNPJ pipeline rewrite (MERGE-safe, CNAE/address fields), runner registers 14 pipelines. Docs: spec + context updated. 468 tests green, neutrality pass. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 17:10:01 -03:00
bruno cesar	f182fb7300	Phase 2 UI overhaul: Entity Analysis dossier, graph polish, exposure optimization - Entity Analysis page with 4-tab layout (Graph, Connections, Timeline, Export) - Exposure index with heuristic percentile functions (0.2s vs 21s peer sampling) - Graph canvas: ResizeObserver sizing, d3 force tuning, relationship-colored edges, directional arrows, node glow effects, grid background, auto-fit on load - Dark HUD tooltip with graph2ScreenCoords positioning fix - New components: ScoreRing, InsightsPanel, ConnectionsList, TimelineView, EntityHeader, AnalysisNav, GraphToolbar, ZoomControls, GraphLegend, GraphMinimap, ContextMenu, NodeTooltip, CommandPalette, Button, Skeleton, StatusBar, Toast - Landing page, Register page, Dashboard page, PublicShell - Zustand stores: graphExplorer, toast - Custom node/edge canvas rendering with LOD, icons, connection badges - IBM Plex Mono/Sans self-hosted fonts - Design tokens: node colors, relationship colors, z-index scale - 40+ i18n keys (PT-BR + EN), keyboard shortcuts hook, command palette - All 145 API tests + 127 frontend tests passing Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 05:22:42 -03:00
bruno cesar	71c2cb90f7	Add Entity Analysis dossier page with exposure index, insights, timeline, connections views Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 04:26:13 -03:00
bruno cesar	fa7fd23deb	Frontend fixes: investigation 401, navigation redesign, route compat, light mode, i18n - Fix listInvestigations trailing slash (307 redirect -> 401) - Add ExposureFactor, ExposureResponse, TimelineEvent, TimelineResponse, HealthResponse interfaces - Add getEntityExposure, getEntityTimeline, getHealthStatus API functions - Simplify nav to 3 items (Dashboard, Search, Investigations) - Add /app/analysis/:entityId route with lazy-loaded EntityAnalysis placeholder - Add /app/graph/:entityId -> /app/analysis/:entityId redirect - Update all /app/graph/ references to /app/analysis/ (SearchResults, Dashboard, Patterns) - Add light mode CSS variables with [data-theme="light"] selector - Add theme toggle (Sun/Moon) in sidebar, persisted to localStorage - StatusBar polls /api/v1/meta/health every 30s for connectivity status - Fix keyboard shortcuts: allow Cmd+K in input/textarea fields - Add title attrs to collapsed ControlsSidebar icons - Fix ControlsSidebar label truncation (overflow: visible) - Add favicon.svg - Add error toast on Dashboard search failure - Add aria-label to logout button - Add 60+ i18n keys (analysis., nav.theme) in PT-BR and EN Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 04:22:57 -03:00
bruno cesar	c8f4b6762e	Add exposure index + timeline endpoints, fix investigation router prefix - Investigation router: prefix-based routing (relative paths), shared route on separate mini-router - FastAPI: redirect_slashes=False safety net - Schema: added Sanction.date_start and Amendment.date indexes - New models: ExposureFactor, ExposureResponse, TimelineEvent, TimelineResponse - New Cypher: entity_score, entity_score_peers, entity_timeline (cursor-paginated) - New service: score_service.py with weighted exposure index (connections/sources/financial/patterns/baseline) - New endpoints: GET /entity/{id}/exposure, GET /entity/{id}/timeline - Fix: meta.py dict type annotations (pre-existing mypy errors) - Tests: 17 new tests (score_service + entity_timeline), all 143 pass Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 04:22:21 -03:00
bruno cesar	f2a31c6c40	Fix 5 audit concerns: entity_ids, orphan cleanup, depth param, frontend types, docs sync - investigation_update.cypher: collect(e.id) → coalesce(cpf, cnpj, elementId) - investigation_delete.cypher: cascade delete Tags + Annotations - entity_connections.cypher + entity.py: wire depth param with variable-length path - client.ts: types→entity_types, add id/confidence to GraphEdge interface - GraphCanvas.tsx: read confidence from root instead of properties - spec.md: PUT→PATCH, CSV future, last-update NYI, dispute future, 5 patterns - Rules: test counts (124 API, 103 ETL, 65 frontend), 8 pages Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 01:33:07 -03:00
bruno cesar	c550d017fa	Fix 8 audit blockers: IDOR, graph leaks, CPF masking, format normalization, frontend types, pattern query Security: - entity_by_element_id: label allowlist prevents IDOR on private nodes - graph_expand/entity_connections: restrict rel types + exclude User/Investigation/Annotation/Tag - main.py: log critical warning on weak/default JWT secret at startup - neo4j_service: schema bootstrap no longer drops comment-prefixed statements Data integrity: - entity_lookup.cypher: dual-format CPF/CNPJ matching (digits-only + punctuated) - entity.py: format helpers normalize input before lookup - cpf_masking.py: public mask functions for reuse outside middleware - investigation.py: explicit CPF masking in PDF export Frontend: - client.ts: EntityDetail interface aligned with backend (removed root name/document, added is_pep) - EntityDetail.tsx: derive display name/document from properties dict Pattern logic: - pattern_contract_concentration: compute municipality total before entity filter Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 01:23:19 -03:00
bruno cesar	03356fe5ce	Fix 6 Codex xhigh blockers: pattern Cypher, schema bootstrap, rate limits, auth, entity IDs, frontend types WS1: Fix donation-contract pattern — reverse DOOU path direction (Company→Person not Company→Election), d.value→d.valor WS2: Schema bootstrap — copy init.cypher to queries/, add ensure_schema() called in lifespan. All IF NOT EXISTS, idempotent. WS3: Rate limiting — add default_limits to Limiter, per-endpoint decorators on auth (10/min), search (30/min), patterns (30/min) WS4: Investigation ownership — GET investigation/annotations/tags now require CurrentUser, Cypher matches User→OWNS→Investigation. Exports pass user_id. WS5: Entity ID standardization — search/graph return document_id via coalesce, investigation queries match on cpf/cnpj/contract_id/sanction_id/amendment_id OR elementId. New /by-element-id endpoint. Frontend EntityDetail auto-routes. WS6: Frontend SourceAttribution — sources typed as {database} objects not strings, components use source.database, test mocks updated. 292 tests green (124 API + 103 ETL + 65 frontend), neutrality clean. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 23:50:28 -03:00
bruno cesar	d91eb2cd6d	Fix CNPJ format mismatch — TSE raw digits, Transparencia short values TSE pipeline stored company donor CNPJs as raw 14-digit strings while all other pipelines used formatted XX.XXX.XXX/XXXX-XX. This caused ~17K duplicate Company nodes and broke cross-source MERGE operations. Transparencia accepted short/invalid CNPJs (e.g. "11"), creating 4,427 garbage Company nodes. - tse.py: format_cnpj(donor_doc) instead of storing raw digits - transparencia.py: reject CNPJ with len != 14 digits (covers empty + short) - Tests: assert formatted CNPJ for company donors, add short-CNPJ skip test - Neo4j migration: merged 937 duplicates, reformatted 16,360 in place, deleted 4,427 garbage nodes (53,654,691 → 53,649,327 Company nodes) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 21:25:46 -03:00
bruno cesar	15f17cd821	Fix ETL field mapping: razao_social in TSE/sanctions, contract_id validation - TSE pipeline: add razao_social to company donor nodes (was only name) - Sanctions pipeline: add razao_social to Company nodes (not Person) - Transparencia pipeline: skip contracts with empty CNPJ digits - API search/graph: fallback to name when razao_social missing - Root cause: 24.6K Company nodes from TSE/sanctions had null razao_social because those pipelines only set name, not razao_social - Add 4 tests for razao_social mapping and contract_id validation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 20:32:12 -03:00
bruno cesar	dfd1a7da20	Fix pattern queries, add indexes, harden ETL loader - Fix property mismatches in pattern_sanctioned_receiving (c.date, s.date_start/end) - Redesign pattern_donation_contract (remove impossible AUTOR_EMENDA→Contract path) - Fix pattern_self_dealing path through Amendment instead of direct Contract - Add 3 indexes: Company.cnae_principal, Contract.contracting_org, Contract.date - Guard loader.load_nodes/load_relationships against null/empty keys - Apply format_cpf to CNPJ socios transforms (RF + simple formats) - Add 3 tests for null guards and CPF formatting Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 20:20:17 -03:00
bruno cesar	04d84d76c1	Add --start-phase flag to streaming ETL for phase resumption Allows skipping completed phases when restarting a streaming load, avoiding redundant MERGE operations on already-loaded data. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 20:04:24 -03:00
bruno cesar	13fc81b8cf	Harden production deployment — memory tuning, backups, monitoring - .env.example: document Neo4j memory settings for 40M+ node production - docker-compose.prod.yml: remove misleading VITE_API_URL runtime env (Vite bakes env at build time; Caddy proxies relative paths correctly) - deploy.sh: health check through Caddy (HTTPS) instead of direct API port - deploy.yml: pin appleboy/ssh-action to commit hash (supply-chain safety) - backup-cron.sh: installer for daily Neo4j dump backup at 03:00 UTC - snapshot-volume.sh: Hetzner Cloud volume snapshot via hcloud CLI - healthcheck-cron.sh: uptime monitor every 5 min with webhook alerts Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 14:08:17 -03:00
bruno cesar	5846e789a9	Add Phase 10: data loading, investigation CRUD, baseline/edge/shared pages Data loading infrastructure: - Download scripts for TSE, Transparencia, Sanctions (shared _download_utils) - TSE pipeline: sq_candidato support, company donor handling - Transparencia pipeline: emenda/contrato extraction fixes - Neo4jBatchLoader enhancements, runner pipeline ordering Investigation CRUD completeness: - Delete annotation/tag/entity Cypher queries (existence-check pattern) - Investigation service: delete_annotation, delete_tag, remove_entity - Router: full CRUD endpoints + auth guard on export + filename sanitization - Frontend store: delete actions for annotations, tags, entities New frontend features: - Baseline comparison page (useBaseline hook, route, i18n) - EdgeDetail panel for graph edge inspection (money-proportional width) - SharedInvestigation page (public share token access) - +27 tests (Search, Patterns, SearchResults, GraphCanvas, EntityDetail) Bug fixes: - Cypher delete queries: return literal 1 instead of count-after-delete - createInvestigation: trailing slash to avoid 307 redirect - apiFetch: handle 204 No Content without parsing body - export_investigation: add auth guard, sanitize Content-Disposition filename Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 14:06:37 -03:00
bruno cesar	3ebf4e4d2d	Fix BQ download: use Storage Read API, correct output path list_rows() + selected_fields avoids temp tables and storage quota. Parallel downloads (one per table). Fix default output to ../data/. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 11:42:01 -03:00
bruno cesar	1bb538bb5f	Add BigQuery streaming for full CNPJ dataset RF server unreachable — use Base dos Dados BQ mirror instead. New download script streams BQ → CSV page-by-page (no OOM). run_streaming() auto-detects RF vs BQ format, reuses same transforms. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 06:54:51 -03:00
bruno cesar	acdc8c297a	Add PDF export for investigations — weasyprint + Jinja2 template New endpoint GET /api/v1/investigations/{id}/export/pdf with lang param. Jinja2 A4 template, weasyprint rendering, lazy import for test compat. Frontend: export button + API client method + i18n keys. 6 new tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 05:42:31 -03:00
bruno cesar	18cbee1ab7	Add frontend auth tests — auth store (10) + Login page (8) Covers login/register/logout/restore flows in auth store and Login page rendering, form submission, error display, navigation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 05:42:24 -03:00
bruno cesar	5abe290b04	Phase 7: Frontend auth integration — login page, auth store, error boundary Auth store (Zustand): login/register/logout/restore, localStorage persistence, auto-injects Bearer token via api/client.ts on all requests. Login page: email/password form with register toggle and invite code field. Protected routes: /investigations requires auth, redirects to /login. AppShell: shows user email + logout when authenticated, login link when not. ErrorBoundary: React error boundary wrapping entire app in main.tsx. i18n: auth keys added for PT-BR + EN. Also fixes: ETL pandas-stubs for mypy, sanctions.py no-any-return, ETL integration conftest graceful testcontainers import. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 05:04:21 -03:00
bruno cesar	393e7dc3f0	Phase 6: Auth, integration tests, deployment, ETL rewrite, frontend polish Auth: JWT auth with python-jose + passlib, invite-code registration, user model + 3 Cypher queries, auth router, owner-scoped investigations. Rate limiting: slowapi on auth endpoints. Integration tests: testcontainers-based tests for entity, graph, search. Deployment: docker-compose.prod.yml, Caddyfile, backup + deploy scripts, GitHub Actions deploy workflow, deploy docs. ETL rewrite: CNPJ pipeline handles real Receita Federal CSV layout (37 cols), chunked file reading, proper field mapping. Download + explore scripts. Test fixtures with real CSV samples. Frontend polish: Spinner component, responsive CSS improvements across all pages, better navigation, visual refinements. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 04:59:39 -03:00
bruno cesar	f5f825c8bd	Phase 5: Polish — security fixes, code review fixes, CI, README Security: constrain tag entity match, mask password in seed script, enforce graph depth + LIMIT 500, shared PEP_ROLES constant. Code quality: fix SearchResponse field mismatch, PATCH vs PUT, addEntity URL, replace assert with RuntimeError, extract inline Cypher, add model field length limits, fix i18n in Zustand store, neutrality fix in API description. Infra: GitHub Actions CI (api, etl, frontend, neutrality audit). Docs: bilingual README (PT-BR + EN). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 03:52:59 -03:00
bruno cesar	36c8b0d2f8	Phase 4: Investigation workspace — CRUD, annotations, tags, sharing, export Backend: 13 Cypher queries, investigation service with 12 async ops, router with 13 endpoints (CRUD, annotations, tags, share tokens, export). Frontend: InvestigationPanel, InvestigationDetail (inline editing), AnnotationEditor, TagManager (color picker), Timeline. Zustand store with full state management. Bilingual i18n. 79 API + 20 frontend tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 03:43:30 -03:00
bruno cesar	87f46341a4	Phase 3: ETL pipelines — transforms, loader, 4 data pipelines, entity resolution, seed Transforms: name normalization, CPF/CNPJ formatting+validation, deduplication. Neo4j batch loader with UNWIND batching (10K default). Pipelines: CNPJ (Receita Federal), TSE (elections+donations), Transparência (contracts+salaries+amendments), Sanctions (CEIS/CNEP). Entity resolution: splink 4 config for Person matching (optional dep). Dev seed: fixture graph exercising all 5 analysis patterns. 63 ETL tests passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 03:38:12 -03:00
bruno cesar	1c6ca39050	Phase 2 complete: baseline comparison + frontend pattern components Baseline service compares entities against CNAE sector and regional peers. Frontend adds PatternCard, PatternResultCard, Patterns page with bilingual i18n and route integration. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 03:31:27 -03:00
bruno cesar	c6390bed05	Phase 2: Pattern analysis — 5 MVP corruption detection patterns - Self-dealing amendments (p01): politician → family company → contract - Patrimony incompatibility (p05): declared assets vs family company capital - Sanctioned still receiving (p06): CEIS/CNEP companies winning contracts - Donation-contract loop (p10): donate to campaign → win contracts - Municipal contract concentration (p12): disproportionate contract share - Pattern service: run single/all patterns with entity scoping - Patterns router: GET /patterns/{entity_id}, GET /patterns/{entity_id}/{name} - Bilingual metadata (PT-BR + EN), neutrality-checked - 5 .cypher files, all parameterized - 64 tests passing Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 03:25:30 -03:00
bruno cesar	5cd0647eec	Phase 1: Graph explorer — force-directed viz, controls, entity detail - GraphCanvas: react-force-graph-2d with entity-colored nodes, confidence-based edges - GraphControls: depth slider (1-4), entity type toggles with colored borders - EntityDetail: side panel with properties, source badges, type coloring - useGraphData hook: fetches/caches graph data from API - GraphExplorer page: wires canvas + controls + detail panel - i18n: added graph, entity type translations (PT-BR + EN) - All feedback loops pass: eslint, tsc, vitest (12 tests) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 03:23:32 -03:00
bruno cesar	b468577f2b	Phase 1: Frontend — design system, common components, search page - Design tokens: entity type colors, spacing, typography as TS constants - MoneyLabel: R$ formatting (Brazilian Real with locale) - SourceBadge: data source colored badges - ConfidenceBadge: solid/dashed visual for match confidence - Disclaimer: neutral data disclaimer via i18n - SearchBar: input with type filter dropdown - SearchResults: result list with entity colors and source badges - API client: typed fetch wrapper with search/entity/graph endpoints - i18n: expanded PT-BR and EN translations - 12 vitest tests passing (lint + type-check + vitest clean) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 03:21:30 -03:00
bruno cesar	0dd953898c	Phase 1: API core — all endpoints, query service, CPF masking - Neo4j query service: CypherLoader + parameterized executor - Entity endpoints: /entity/{cpf_or_cnpj} lookup + /entity/{id}/connections - Search endpoint: /search with fulltext index, pagination, type filtering - Graph endpoint: /graph/{entity_id} with depth/type filtering, nodes + edges - CPF masking middleware: scans responses, masks non-PEP CPFs, preserves CNPJ - Pydantic models: EntityResponse, SearchResponse, GraphResponse with source attribution - 5 .cypher query files (never inline Cypher) - 58 unit tests passing (ruff + mypy + pytest clean) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 03:21:15 -03:00
bruno cesar	127a3e6754	Phase 0: Foundation — project skeleton with all three packages - git init, AGPL-3.0 license, .gitignore, .env.example, Makefile - CLAUDE.md with neutrality rules, build commands, code style - Custom agents: security-reviewer, code-reviewer, test-writer - API: FastAPI skeleton with health check, meta endpoints, Neo4j driver - ETL: Pipeline base class, CLI runner skeleton, splink + basedosdados deps - Frontend: Vite + React 19 + React Router 7 + i18n (PT-BR/EN) + Zustand - Infra: docker-compose.yml (Neo4j + API + Frontend), schema init.cypher - All feedback loops pass: ruff, mypy, pytest, eslint, tsc, vitest Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 03:16:21 -03:00
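
Commit d91eb2cd6d above describes normalizing raw 14-digit CNPJ strings to the XX.XXX.XXX/XXXX-XX form and rejecting values that do not contain exactly 14 digits. The following is a minimal sketch of that idea only. The function name `format_cnpj_sketch` and its None-on-invalid behavior are assumptions made for illustration, not the repository's actual `format_cnpj` helper, whose code is not shown in this log.

```python
import re
from typing import Optional


def format_cnpj_sketch(raw: Optional[str]) -> Optional[str]:
    """Hypothetical helper: return a CNPJ as XX.XXX.XXX/XXXX-XX.

    Returns None when the input does not contain exactly 14 digits,
    which covers empty strings and short values such as "11".
    Check-digit validation is deliberately left out of this sketch.
    """
    # Keep only the digits so raw and already-punctuated inputs are handled alike.
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) != 14:
        return None
    return f"{digits[:2]}.{digits[2:5]}.{digits[5:8]}/{digits[8:12]}-{digits[12:]}"
```

Under these assumptions, a caller would skip any record for which the helper returns None, so that every source ends up using one canonical string form as its merge key.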

150 Commits