paperclip/doc/plans/2026-04-08-agent-os-follow-up-plan.md
2026-04-09 06:14:12 -05:00

PAP-1229 Agent OS Follow-up Plan

Date: 2026-04-08
Related issue: PAP-1229
Companion analysis: doc/plans/2026-04-08-agent-os-technical-report.md

Goal

Turn the agent-os research into a low-risk Paperclip execution plan that preserves Paperclip's control-plane model while testing the few runtime ideas that appear worth adopting.

Decision summary

Paperclip should not absorb agent-os as a product model or orchestration layer.

Paperclip should evaluate agent-os in three narrow areas:

  1. optional agent runtime for selected local adapters
  2. capability-based runtime permission vocabulary
  3. snapshot-backed disposable execution roots

Everything else should stay out of scope unless those three experiments produce strong evidence.

Success condition

This work is successful when Paperclip has:

  • a clear yes/no answer on whether agent-os is worth supporting as an execution substrate
  • a concrete adapter/runtime experiment with measurable results
  • a proposed runtime capability model that fits current Paperclip adapters
  • a clear decision on whether snapshot-backed execution roots are worth integrating

Non-goals

Do not:

  • replace Paperclip heartbeats, issues, comments, approvals, or budgets with agent-os primitives
  • introduce Rust/sidecar requirements for all local execution paths
  • migrate all adapters at once
  • add runtime workflow/queue abstractions to Paperclip core

Existing Paperclip integration points

The plan should stay anchored to these existing surfaces:

  • packages/adapter-utils/src/types.ts
    • adapter contract, runtime service reporting, session metadata, and capability normalization targets
  • server/src/services/heartbeat.ts
    • execution entry point, log capture, issue comment summaries, and cost reporting
  • server/src/services/execution-workspaces.ts
    • current workspace lifecycle and git-oriented cleanup/readiness model
  • server/src/services/plugin-loader.ts
    • typed host capability boundary and extension loading patterns
  • local adapter implementations in packages/adapters/*/src/server/
    • current execution behavior to compare against an agent-os-backed path

Phase plan

Phase 0: constraints and experiment design

Objective:

  • make the evaluation falsifiable before writing integration code

Deliverables:

  • short experiment brief added to this document or a child issue
  • chosen first runtime target: pi_local or opencode_local
  • baseline metrics definition

Questions to lock down:

  • what exact developer experience should improve?
  • what security/isolation property do we expect to gain?
  • what failure modes are unacceptable?
  • is the prototype adapter-only, or a deeper internal runtime abstraction spike?

Exit criteria:

  • a single first target chosen
  • measurable comparison criteria agreed on

Recommended metrics:

  • cold start latency
  • session resume reliability across heartbeats
  • transcript/log quality
  • implementation complexity
  • operational complexity on local dev machines
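
The two quantitative metrics above (cold start latency and resume reliability) could be summarized the same way for the baseline adapter and the experimental path. The following is a minimal sketch under stated assumptions: `RunSample`, `MetricSummary`, `median`, and `summarize` are hypothetical names, and resume reliability is assumed to mean successful resumes divided by attempted resumes. Transcript quality and the two complexity metrics are qualitative and are left out.

```typescript
// Hypothetical sample record for one heartbeat run. Field names are
// illustrative and not part of any existing Paperclip type.
interface RunSample {
  coldStartMs: number;      // time from run start to first agent output
  resumeAttempted: boolean; // did this heartbeat try to resume a session?
  resumeSucceeded: boolean; // only meaningful when resumeAttempted is true
}

interface MetricSummary {
  medianColdStartMs: number;
  resumeReliability: number; // fraction in [0, 1]; 1 when nothing was attempted
}

// Median of a list of numbers; NaN for an empty list.
function median(values: number[]): number {
  if (values.length === 0) return NaN;
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Reduce a batch of samples to the two comparison numbers.
function summarize(samples: RunSample[]): MetricSummary {
  const attempts = samples.filter((s) => s.resumeAttempted);
  const successes = attempts.filter((s) => s.resumeSucceeded);
  return {
    medianColdStartMs: median(samples.map((s) => s.coldStartMs)),
    resumeReliability: attempts.length === 0 ? 1 : successes.length / attempts.length,
  };
}
```

Running `summarize` once over baseline samples and once over experimental samples gives two records that can be compared field by field.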

Phase 1: agentos_local spike

Objective:

  • prove that Paperclip can drive one local agent through an agent-os runtime without breaking heartbeat semantics

Suggested scope:

  • implement a new experimental adapter, agentos_local, or a feature-flagged runtime path under one existing adapter
  • start with pi_local or opencode_local
  • keep Paperclip's existing heartbeat, issue, workspace, and comment flow authoritative

Minimum implementation shape:

  • adapter accepts model/runtime config
  • server/src/services/heartbeat.ts still owns the run lifecycle
  • execution result still maps into the existing AdapterExecutionResult
  • session state still fits current sessionParams / sessionDisplayId flow
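
The mapping step in this shape can be illustrated with a small sketch. The real AdapterExecutionResult lives in packages/adapter-utils/src/types.ts and is not reproduced here; `SketchExecutionResult` is a stand-in in which only `sessionParams` and `sessionDisplayId` come from this plan. `RuntimeOutcome`, `toExecutionResult`, and the choice to keep the previous session on failure are assumptions made for illustration.

```typescript
// Stand-in for the real AdapterExecutionResult. Only sessionParams and
// sessionDisplayId are named in this plan; every other field is assumed.
interface SketchExecutionResult {
  ok: boolean;
  summary: string;
  sessionParams: Record<string, unknown> | null;
  sessionDisplayId: string | null;
}

// Hypothetical outcome reported by an agent-os-backed runtime.
interface RuntimeOutcome {
  exitCode: number;
  transcriptTail: string;
  sessionId?: string;
}

interface PreviousSession {
  sessionParams: Record<string, unknown> | null;
  sessionDisplayId: string | null;
}

// Map the runtime outcome into the adapter result. When the run fails or
// reports no session, the previous session is carried forward (an assumed
// policy) so the next heartbeat can still attempt a resume.
function toExecutionResult(outcome: RuntimeOutcome, previous: PreviousSession): SketchExecutionResult {
  const ok = outcome.exitCode === 0;
  if (ok && outcome.sessionId !== undefined) {
    return {
      ok,
      summary: outcome.transcriptTail,
      sessionParams: { sessionId: outcome.sessionId },
      sessionDisplayId: outcome.sessionId,
    };
  }
  return {
    ok,
    summary: ok ? outcome.transcriptTail : `runtime exited with code ${outcome.exitCode}`,
    sessionParams: previous.sessionParams,
    sessionDisplayId: previous.sessionDisplayId,
  };
}
```

Keeping the mapping in one pure function would make it easy to compare against the existing local adapter path without touching heartbeat code.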

What to verify:

  • checkout and heartbeat flow still work end to end
  • resume across multiple heartbeats works
  • logs/transcripts remain readable in the UI
  • failure paths surface cleanly in issue comments and run logs

Exit criteria:

  • one agent type can run reliably through the new path
  • documented comparison against the existing local adapter path
  • explicit recommendation: continue, pause, or abandon

Phase 2: capability-based runtime permissions

Objective:

  • introduce a Paperclip-native capability vocabulary without coupling the product to agent-os

Suggested scope:

  • extend the adapter config schema with a vocabulary for runtime permissions
  • prototype normalized capabilities such as:
    • fs.read
    • fs.write
    • network.fetch
    • network.listen
    • process.spawn
    • env.read
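
The capability names listed above could be expressed as a closed TypeScript vocabulary with a small normalizer. This is a sketch only: `RUNTIME_CAPABILITIES`, `RuntimeCapability`, `isRuntimeCapability`, and `normalizeCapabilities` are hypothetical names, and trimming and lowercasing the input is an assumed normalization rule rather than anything Paperclip defines.

```typescript
// Capability names taken from the list above.
const RUNTIME_CAPABILITIES = [
  "fs.read",
  "fs.write",
  "network.fetch",
  "network.listen",
  "process.spawn",
  "env.read",
] as const;

type RuntimeCapability = (typeof RUNTIME_CAPABILITIES)[number];

// Type guard: true only for strings in the closed vocabulary.
function isRuntimeCapability(value: string): value is RuntimeCapability {
  return (RUNTIME_CAPABILITIES as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical normalizer: splits raw config strings into recognized
// capabilities (deduplicated, input order kept) and unknown entries.
function normalizeCapabilities(raw: string[]): { granted: RuntimeCapability[]; unknown: string[] } {
  const granted: RuntimeCapability[] = [];
  const unknown: string[] = [];
  for (const entry of raw) {
    const name = entry.trim().toLowerCase();
    if (isRuntimeCapability(name)) {
      if (granted.indexOf(name) === -1) granted.push(name);
    } else {
      unknown.push(entry); // keep the original text for error reporting
    }
  }
  return { granted, unknown };
}
```

Returning unknown entries instead of throwing leaves the policy choice (reject, warn, or ignore) to server-side config validation.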

Integration targets:

  • packages/adapter-utils/src/types.ts
  • adapter config-schema support
  • server-side runtime config validation
  • future board-facing UI for permissions, if needed

What to avoid:

  • building a full human policy UI before the vocabulary is proven useful
  • forcing every adapter to implement capability enforcement immediately

Exit criteria:

  • documented capability schema
  • one adapter path using it meaningfully
  • clear compatibility story for non-agent-os adapters

Phase 3: snapshot-backed execution root experiment

Objective:

  • determine whether a layered/snapshotted root model measurably improves specific classes of Paperclip workloads

Suggested scope:

  • evaluate it only for disposable or non-repo-heavy tasks first
  • keep git worktree-based repo editing as the default for codebase tasks

Promising use cases:

  • routine-style runs
  • ephemeral preview/test environments
  • isolated document/artifact generation
  • tasks that do not need full git history or branch semantics
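
The split between worktree-based and snapshot-backed runs could be captured as an explicit selection rule. The sketch below is illustrative: `WorkloadProfile`, `ExecutionRootKind`, and `chooseExecutionRoot` are hypothetical, and Paperclip has no such workload descriptor today. It encodes the scope stated above: worktrees stay the default, and a snapshot root is only considered for disposable runs that touch neither the repository nor its history.

```typescript
// Hypothetical workload descriptor used only for this illustration.
interface WorkloadProfile {
  needsGitHistory: boolean; // branch semantics, commits, or diffs against history
  editsRepository: boolean; // changes are meant to land in the codebase
  disposable: boolean;      // the execution root can be discarded after the run
}

type ExecutionRootKind = "git_worktree" | "snapshot_root";

// Worktrees remain the default. A snapshot root is chosen only when the run
// is disposable and needs neither repo edits nor git history.
function chooseExecutionRoot(profile: WorkloadProfile): ExecutionRootKind {
  if (profile.needsGitHistory || profile.editsRepository) return "git_worktree";
  return profile.disposable ? "snapshot_root" : "git_worktree";
}
```

For example, an isolated document-generation run would be described as disposable with both git flags false, and would be the only kind of profile routed to a snapshot root.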

Integration targets:

  • server/src/services/execution-workspaces.ts
  • workspace realization paths called from server/src/services/heartbeat.ts

Exit criteria:

  • clear statement on which workload classes benefit
  • clear statement on which workloads should stay on worktrees
  • go/no-go decision for broader implementation

Phase 4: typed host tool evaluation

Objective:

  • identify where Paperclip should prefer explicit typed tools over ambient shell access

Suggested scope:

  • compare agent-os host-toolkit ideas with existing plugin and runtime-service surfaces
  • choose 1-2 sensitive operations that should become typed tools

Good candidates:

  • git metadata/status inspection
  • runtime service inspection
  • deployment/preview status retrieval
  • generated artifact publishing
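
To make the contrast with ambient shell access concrete, here is a sketch of what a typed result for the first candidate might look like. It assumes the tool wraps `git status --porcelain` (version 1 format: two status characters, a space, then the path) and returns structured entries instead of raw text. `GitStatusEntry` and `parsePorcelainStatus` are hypothetical names. Rename entries (`old -> new`) and quoted paths are not handled here.

```typescript
// One parsed line of `git status --porcelain` (v1) output.
interface GitStatusEntry {
  index: string;    // staged status code, e.g. "M", "A", "?" or " "
  worktree: string; // unstaged status code
  path: string;     // path text as printed by git
}

// Parse porcelain v1 output into typed entries. Each line is "XY path":
// two status characters, one space, then the path.
function parsePorcelainStatus(output: string): GitStatusEntry[] {
  const entries: GitStatusEntry[] = [];
  for (const line of output.split("\n")) {
    if (line.length < 4) continue; // skip blank or truncated lines
    entries.push({
      index: line.charAt(0),
      worktree: line.charAt(1),
      path: line.slice(3),
    });
  }
  return entries;
}
```

An agent given this tool sees a list of entries with fixed fields, so the host can decide exactly which git information is exposed without granting general shell access.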

Exit criteria:

  • one concrete proposal for typed-tool adoption in Paperclip
  • clear statement on whether this belongs in plugins, adapters, or core services

Recommended order:

  1. Phase 0
  2. Phase 1
  3. Phase 2
  4. Phase 3
  5. Phase 4

Reasoning:

  • Phase 1 is the fastest way to invalidate or validate the entire agent-os direction
  • Phase 2 is valuable even if Phase 1 is abandoned
  • Phase 3 should wait until there is confidence that the runtime approach is operationally worthwhile
  • Phase 4 is useful independently but should be informed by what Phase 1 and Phase 2 expose

Risks

Technical risk

  • agent-os introduces Rust sidecar and packaging complexity that may outweigh runtime benefits

Product risk

  • runtime experimentation could blur the boundary between Paperclip as control plane and Paperclip as execution platform

Integration risk

  • session semantics, log formatting, and failure behavior may degrade relative to current local adapters

Scope risk

  • a small runtime spike could expand into an adapter-system rewrite if not kept tightly bounded

Guardrails

To keep this effort controlled:

  • keep all experiments behind a clearly experimental adapter or feature flag
  • do not change issue/comment/approval/budget semantics to suit the runtime
  • measure against current local adapters instead of judging in isolation
  • stop after Phase 1 if the operational burden is already clearly too high
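
The first guardrail could be enforced with a single selection function that defaults to the existing path. This is a sketch under assumptions: `ExperimentConfig`, `RuntimePath`, and `selectRuntimePath` are hypothetical, and no such configuration exists in Paperclip today.

```typescript
// Hypothetical experiment configuration; both fields are assumptions.
interface ExperimentConfig {
  agentosRuntimeEnabled: boolean; // global switch, expected to be off by default
  optedInAdapters: string[];      // adapter types allowed to use the experiment
}

type RuntimePath = "existing_local" | "agentos_experimental";

// The experimental path is used only when the global flag is on AND the
// adapter type is explicitly opted in; everything else stays unchanged.
function selectRuntimePath(adapterType: string, config: ExperimentConfig): RuntimePath {
  const optedIn = config.optedInAdapters.indexOf(adapterType) !== -1;
  return config.agentosRuntimeEnabled && optedIn ? "agentos_experimental" : "existing_local";
}
```

Requiring both conditions means turning the global flag off returns every adapter to its current behavior in one step.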

Proposed next action

The next concrete action should be a small implementation spike issue:

  • title: Prototype experimental agentos_local runtime for one local adapter
  • target adapter: opencode_local unless pi_local is materially easier
  • expected output: code spike, short verification notes, and a continue/stop recommendation

If leadership wants planning only and no spike yet, this document is the handoff artifact for that decision.