worldmonitor/docs/harness-engineering-roadmap.md
Elie Habib fe67111dc9 feat: harness engineering P0 - linting, testing, architecture docs (#1587)
* feat: harness engineering P0 - linting, testing, architecture docs

Add foundational infrastructure for agent-first development:

- AGENTS.md: agent entry point with progressive disclosure to deeper docs
- ARCHITECTURE.md: 12-section system reference with source-file refs and ownership rule
- Biome 2.4.7 linter with project-tuned rules, CI workflow (lint-code.yml)
- Architectural boundary lint enforcing forward-only dependency direction (lint-boundaries.mjs)
- Unit test CI workflow (test.yml), all 1083 tests passing
- Fixed 9 pre-existing test failures (bootstrap sync, deploy-config headers, globe parity, redis mocks, geometry URL, import.meta.env null safety)
- Fixed 12 architectural boundary violations (types moved to proper layers)
- Added 3 missing cache tier entries in gateway.ts
- Synced cache-keys.ts with bootstrap.js
- Renamed docs/architecture.mdx to "Design Philosophy" with cross-references
- Deprecated legacy docs/Docs_To_Review/ARCHITECTURE.md
- Harness engineering roadmap tracking doc

* fix: address PR review feedback on harness-engineering-p0

- countries-geojson.test.mjs: skip gracefully when CDN unreachable
  instead of failing CI on network issues
- country-geometry-overrides.test.mts: relax timing assertion
  (250ms -> 2000ms) for constrained CI environments
- lint-boundaries.mjs: implement the documented api/ boundary check
  (was documented but missing, causing false green)

* fix(lint): scan api/ .ts files in boundary check

The api/ boundary check only scanned .js/.mjs files, missing the 25
sebuf RPC .ts edge functions. Now scans .ts files with correct rules:
- Legacy .js: fully self-contained (no server/ or src/ imports)
- RPC .ts: may import server/ and src/generated/ (bundled at deploy),
  but blocks imports from src/ application code

* fix(lint): detect import() type expressions in boundary lint

- Move AppContext back to app/app-context.ts (aggregate type that
  references components/services/utils belongs at the top, not types/)
- Move HappyContentCategory and TechHQ to types/ (simple enums/interfaces)
- Boundary lint now catches import('@/layer') expressions, not just
  from '@/layer' imports
- correlation-engine imports of AppContext marked boundary-ignore
  (type-only imports of top-level aggregate)
2026-03-14 21:29:21 +04:00

Harness Engineering Readiness Roadmap

Based on "Harness Engineering: Leveraging Codex in an Agent-First World" (OpenAI, Feb 2026)

Last updated: 2026-03-14

Current readiness: ~25%


Pillar Assessment

| # | Pillar | Status | Score |
|---|--------|--------|-------|
| 1 | Repo knowledge as system of record | Good | 7/10 |
| 2 | Enforced architecture | Good | 6/10 |
| 3 | Application legibility (agent observability) | Weak | 2/10 |
| 4 | Agent-to-agent review loops | None | 0/10 |
| 5 | Self-healing / garbage collection | None | 0/10 |
| 6 | Full feature loops | None | 0/10 |
| 7 | Doc linters / gardening | Partial | 4/10 |
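
The roadmap does not state how the ~25% readiness figure is derived. One plausible reading, which is an assumption and not the author's stated method, is an unweighted mean of the seven pillar scores. A minimal sketch of that arithmetic (the names `PILLAR_SCORES` and `readinessPercent` are hypothetical):

```typescript
// Scores copied from the Pillar Assessment table, each out of 10.
const PILLAR_SCORES: Array<[string, number]> = [
  ["Repo knowledge as system of record", 7],
  ["Enforced architecture", 6],
  ["Application legibility", 2],
  ["Agent-to-agent review loops", 0],
  ["Self-healing / garbage collection", 0],
  ["Full feature loops", 0],
  ["Doc linters / gardening", 4],
];

// Assumption: every pillar carries equal weight.
function readinessPercent(
  scores: Array<[string, number]>,
  maxPerPillar: number = 10,
): number {
  let total = 0;
  for (const [, score] of scores) total += score;
  return (total / (scores.length * maxPerPillar)) * 100;
}
```

With the scores in the table this gives 19/70, about 27%, which is in line with the rounded ~25% headline.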

Pillar 1: Repo Knowledge as System of Record

Principle: AGENTS.md is a table of contents, not the encyclopedia. Progressive disclosure. Anything outside the repo does not exist.

Done

  • AGENTS.md at repo root (table of contents, progressive disclosure)
  • ARCHITECTURE.md at repo root (system reference with source-file refs, ownership rule)
  • docs/architecture.mdx renamed to "Design Philosophy" (why decisions were made)
  • Legacy docs/Docs_To_Review/ARCHITECTURE.md deprecated with banner
  • Cross-references between all architecture docs
  • Proto contract system with CI freshness checks
  • Comprehensive Mintlify docs site with API reference

Remaining

  • Create docs/design-docs/ directory with index.md
  • Create docs/exec-plans/active/ and docs/exec-plans/completed/
  • Create docs/product-specs/ with index.md
  • Migrate relevant .claude/memory/ entries into repo-visible docs (conventions that apply to all contributors, not just Claude)
  • Add docs/generated/ for auto-generated reference docs (e.g., db-schema, cache-key inventory)

Pillar 2: Enforced Architecture

Principle: Documentation alone cannot maintain coherence. Custom linters enforce dependency direction, naming, file size, structured logging. Lint errors include remediation instructions for agents.

Done

  • TypeScript strict mode (noUncheckedIndexedAccess, noUnusedLocals, noUnusedParameters)
  • tsc --noEmit in CI and pre-push hook
  • Edge function self-containment check (esbuild bundle + import guardrail test)
  • Proto breaking-change detection (buf breaking)
  • Markdown linting in CI

Remaining

  • P0 (completed 2026-03-14): Add JS/TS linter (Biome 2.4.7): biome.json, npm run lint, CI workflow lint-code.yml, ~120 files auto-fixed
  • P0 (completed 2026-03-14): Architectural boundary lint: scripts/lint-boundaries.mjs, npm run lint:boundaries, CI enforced. Fixed 12 violations (moved types to proper layers). 3 pragmatic exceptions with boundary-ignore comments
  • Encode .claude/memory/ conventions as lint rules:
    • Ban fetch.bind(globalThis) (use deferred lambda)
    • Require cachedFetchJson() in new RPC handlers
    • Require seed-meta:<key> write in seed scripts
    • Require User-Agent header in server-side fetch
    • Require cache key includes request-varying params
  • File size limits with warnings
  • Structured logging enforcement in API handlers
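
To make the forward-only dependency rule concrete, here is a minimal sketch of a boundary check. It is not the contents of scripts/lint-boundaries.mjs: the layer order, the `findViolations` name, and the message wording are assumptions for illustration. It covers both `from '@/layer'` imports and `import('@/layer')` type expressions, skips lines carrying a boundary-ignore comment, and puts a remediation hint in each message so an agent can act on it.

```typescript
// Hypothetical layer order, lowest first. The real order is defined in
// scripts/lint-boundaries.mjs and may differ.
const LAYERS = ["types", "utils", "services", "components", "app"] as const;
type Layer = (typeof LAYERS)[number];

interface Violation {
  line: number; // 1-based line number within the scanned source
  from: Layer;
  to: Layer;
  message: string;
}

// Matches `from '@/layer/...'` as well as `import('@/layer/...')` expressions.
const IMPORT_RE = /(?:from\s+|import\s*\(\s*)['"]@\/([a-z-]+)(?:\/[^'"]*)?['"]/;

function rankOf(name: string): number {
  return (LAYERS as readonly string[]).indexOf(name); // -1 for unknown layers
}

function findViolations(fileLayer: Layer, source: string): Violation[] {
  const violations: Violation[] = [];
  const fromRank = rankOf(fileLayer);
  const lines = source.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const text = lines[i] ?? "";
    // A boundary-ignore comment marks a reviewed, deliberate exception.
    if (text.indexOf("boundary-ignore") !== -1) continue;
    const re = new RegExp(IMPORT_RE.source, "g");
    let match: RegExpExecArray | null;
    while ((match = re.exec(text)) !== null) {
      const target = match[1] ?? "";
      // Importing from a higher layer breaks the forward-only direction.
      if (rankOf(target) > fromRank) {
        violations.push({
          line: i + 1,
          from: fileLayer,
          to: target as Layer,
          message:
            `${fileLayer}/ must not import from ${target}/. ` +
            "Move the shared type to a lower layer, or add a " +
            "boundary-ignore comment with a reason.",
        });
      }
    }
  }
  return violations;
}
```

Under the assumed order, a file in types/ that imports `@/app/app-context` is flagged, while the same import inside app/ passes. Side-effect imports (`import '@/layer'`) are not covered by this sketch.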

Pillar 3: Application Legibility (Agent Observability)

Principle: Agents must be able to launch the app, navigate UI, capture screenshots, inspect DOM, and query logs/metrics/traces.

Done

  • Sentry error tracking in browser
  • api/health.js with per-key freshness monitoring
  • api/seed-health.js for seed loop monitoring
  • Playwright E2E test infrastructure (config, specs, visual regression)
  • Circuit breaker instrumentation

Remaining

  • P1: Expand Playwright E2E harness for agent-driven validation (launch app, navigate, screenshot, assert)
  • P1: Add structured JSON logging to API handlers (request ID, latency, error context)
  • Expose logs in a queryable format (even grep on Railway logs is a start)
  • Add performance budgets (startup time, critical path latency) as testable assertions
  • Wire Chrome DevTools Protocol for agent DOM inspection (desktop)
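
The structured logging item above names three fields: request ID, latency, and error context. A minimal sketch of what one log line could look like follows. The `RequestLog` shape and the function names are hypothetical, not an existing module in the repo. Emitting one JSON object per line keeps the output greppable and easy to parse.

```typescript
interface RequestLog {
  level: "info" | "error";
  requestId: string;
  route: string;
  status: number;
  latencyMs: number;
  error?: { name: string; message: string };
}

interface RequestLogInput {
  requestId: string;
  route: string;
  status: number;
  startedAtMs: number;
  finishedAtMs: number;
  error?: unknown;
}

// Build one log entry per handled request.
function buildRequestLog(input: RequestLogInput): RequestLog {
  const failed = input.error !== undefined || input.status >= 500;
  const entry: RequestLog = {
    level: failed ? "error" : "info",
    requestId: input.requestId,
    route: input.route,
    status: input.status,
    latencyMs: Math.max(0, Math.round(input.finishedAtMs - input.startedAtMs)),
  };
  if (input.error !== undefined) {
    // Keep only the fields that are safe and useful to query on.
    entry.error =
      input.error instanceof Error
        ? { name: input.error.name, message: input.error.message }
        : { name: "NonError", message: String(input.error) };
  }
  return entry;
}

// One JSON object per line, with no embedded newlines.
function toLogLine(entry: RequestLog): string {
  return JSON.stringify(entry);
}
```

A handler would record a start timestamp, run its logic, and write `toLogLine(buildRequestLog(...))` to standard output once per request.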

Pillar 4: Agent-to-Agent Review Loops

Principle: Agent reviews its own work locally. Additional agents review. Feedback loops run until reviews pass. Humans sometimes review PRs.

Done

  • Pre-push hook runs automated checks (typecheck, edge bundle, markdown lint)
  • CI runs typecheck on all PRs

Remaining

  • P2: Configure agent PR review in CI (check for architectural violations, convention adherence, test coverage)
  • Start with advisory comments, not blocking
  • Add self-review step: agent runs tests + lint before opening PR
  • Multi-agent review: different agents check different aspects (security, performance, conventions)

Pillar 5: Self-Healing / Garbage Collection

Principle: Background agents scan for violations and open refactoring PRs. Technical debt becomes incremental maintenance instead of large refactors.

Done

  • "Golden principles" partially encoded in AGENTS.md (key patterns, critical conventions)

Remaining

  • P3: Create convention violation scanner (dead code, banned patterns, missing seed-meta, cache key issues)
  • Background agent opens small refactoring PRs
  • Track tech debt in docs/exec-plans/tech-debt-tracker.md
  • Define "golden principles" document with shared utilities, data shape validation rules, anti-patterns
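
A convention violation scanner can start as a small table of banned patterns plus per-file-type requirements. The sketch below covers two conventions named in this roadmap: the ban on fetch.bind(globalThis) and the seed-meta:<key> write in seed scripts. The rule IDs, function names, and the plain-text check for seed-meta are assumptions for illustration; a production scanner would more likely work on the AST.

```typescript
interface Finding {
  rule: string;
  line: number; // 1-based; 0 means the finding applies to the whole file
  remediation: string;
}

interface BannedPattern {
  rule: string;
  pattern: RegExp; // must not use the "g" flag, so test() stays stateless
  remediation: string;
}

// Hypothetical rule table; extend with further conventions as needed.
const BANNED_PATTERNS: BannedPattern[] = [
  {
    rule: "no-fetch-bind-globalthis",
    pattern: /\bfetch\s*\.\s*bind\s*\(\s*globalThis\s*\)/,
    remediation:
      "Replace fetch.bind(globalThis) with a deferred lambda that calls fetch.",
  },
];

function scanBannedPatterns(source: string): Finding[] {
  const findings: Finding[] = [];
  const lines = source.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const text = lines[i] ?? "";
    for (const banned of BANNED_PATTERNS) {
      if (banned.pattern.test(text)) {
        findings.push({
          rule: banned.rule,
          line: i + 1,
          remediation: banned.remediation,
        });
      }
    }
  }
  return findings;
}

// Seed scripts get the banned-pattern scan plus a seed-meta requirement.
function scanSeedScript(source: string): Finding[] {
  const findings = scanBannedPatterns(source);
  if (source.indexOf("seed-meta:") === -1) {
    findings.push({
      rule: "require-seed-meta",
      line: 0,
      remediation: "Write a seed-meta:<key> entry as part of the seed run.",
    });
  }
  return findings;
}
```

Each finding carries its own remediation text, so a background agent can turn one finding into one small refactoring PR.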

Pillar 6: Full Feature Loops

Principle: Given a prompt, agent can validate repo state, reproduce bug, record video, implement fix, validate fix, open PR, address feedback, merge.

Done

  • Agents can open PRs via gh
  • Agents can run tests via npm run test:data
  • Git worktree support for isolated work

Remaining

  • P4: Agent bug reproduction harness (receive bug report, reproduce, record screenshot/video)
  • Agent self-merge pipeline for low-risk PRs (requires P0-P2 as safety net)
  • Agent escalation protocol (when to ask human vs. proceed)
  • Build failure auto-repair (agent detects CI failure, fixes, re-pushes)

Pillar 7: Doc Linters / Gardening

Principle: Dedicated linters validate documentation freshness, cross-links, structure. Background agent runs doc gardening tasks.

Done

  • markdownlint-cli2 in CI and pre-push
  • MDX lint for Mintlify compatibility
  • Ownership rule in ARCHITECTURE.md ("update in same PR")

Remaining

  • P3: Doc freshness linter (detect stale dates, broken internal links, orphaned docs)
  • Cross-link validator (ensure all doc references resolve)
  • Doc gardening agent (background task to fix stale docs, update counts, verify source-file refs)
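
The cross-link validator can be sketched as a pure function over an in-memory map of doc paths to contents, which keeps it easy to test before wiring it to the filesystem. The function names and the restriction to inline Markdown links are assumptions; reference-style links and heading anchors are not checked here.

```typescript
interface BrokenLink {
  file: string;
  line: number; // 1-based
  target: string; // link target exactly as written in the document
}

// Resolve a link target relative to the file that contains it.
function resolveRelative(fromFile: string, target: string): string {
  const parts: string[] =
    target.charAt(0) === "/" ? [] : fromFile.split("/").slice(0, -1);
  for (const segment of target.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") parts.pop();
    else parts.push(segment);
  }
  return parts.join("/");
}

// Inline Markdown links of the form [text](target).
const LINK_RE = /\[[^\]]*\]\(([^)\s]+)\)/;

function findBrokenLinks(docs: Record<string, string>): BrokenLink[] {
  const broken: BrokenLink[] = [];
  for (const file of Object.keys(docs)) {
    const lines = (docs[file] ?? "").split("\n");
    for (let i = 0; i < lines.length; i++) {
      const re = new RegExp(LINK_RE.source, "g");
      let match: RegExpExecArray | null;
      while ((match = re.exec(lines[i] ?? "")) !== null) {
        const target = match[1] ?? "";
        // External URLs, mail links, and same-page anchors are out of scope.
        if (/^([a-z]+:|#)/i.test(target)) continue;
        const path = target.split("#")[0] ?? "";
        const resolved = resolveRelative(file, path);
        if (!Object.prototype.hasOwnProperty.call(docs, resolved)) {
          broken.push({ file, line: i + 1, target });
        }
      }
    }
  }
  return broken;
}
```

Feeding it every .md and .mdx file in the repo gives a list of unresolved references. Because the map contains only docs, links to images or other non-doc files would be reported too unless those paths are added to it.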

Implementation Order

Phase 1 (P0) — Foundation          ← START HERE
├── Add Biome/ESLint linter
├── Add tests to CI
└── Architectural boundary rules

Phase 2 (P1) — Agent Observability
├── Expand Playwright harness
├── Structured logging
└── Encode memory conventions as lint rules

Phase 3 (P2) — Review Loops
├── Automated PR review agent
└── Golden principles doc

Phase 4 (P3) — Self-Healing
├── Convention violation scanner
├── Doc freshness linter
└── Tech debt tracker

Phase 5 (P4) — Full Autonomy
├── Bug reproduction harness
├── Self-merge pipeline
└── Progressive disclosure doc tree

Progress Log

| Date | Change | Pillar |
|------|--------|--------|
| 2026-03-14 | Created AGENTS.md (table of contents) | 1 |
| 2026-03-14 | Created ARCHITECTURE.md (system reference, Codex-approved) | 1 |
| 2026-03-14 | Renamed docs/architecture.mdx to "Design Philosophy", added cross-references | 1 |
| 2026-03-14 | Deprecated legacy docs/Docs_To_Review/ARCHITECTURE.md | 1 |
| 2026-03-14 | Added Biome 2.4.7 linter: biome.json, npm run lint, CI workflow, ~120 files auto-fixed | 2 |
| 2026-03-14 | Fixed all 9 failing test files (1005 tests, 0 failures), added CI workflow test.yml | 2 |
| 2026-03-14 | Architectural boundary lint: lint-boundaries.mjs, fixed 12 violations, 3 pragmatic exceptions, CI enforced | 2 |