8.0 KiB
PAP-1229 Agent OS Follow-up Plan
Date: 2026-04-08
Related issue: PAP-1229
Companion analysis: doc/plans/2026-04-08-agent-os-technical-report.md
Goal
Turn the agent-os research into a low-risk Paperclip execution plan that preserves Paperclip's control-plane model while testing the few runtime ideas that appear worth adopting.
Decision summary
Paperclip should not absorb agent-os as a product model or orchestration layer.
Paperclip should evaluate agent-os in three narrow areas:
- optional agent runtime for selected local adapters
- capability-based runtime permission vocabulary
- snapshot-backed disposable execution roots
Everything else should stay out of scope unless those three experiments produce strong evidence.
Success condition
This work is successful when Paperclip has:
- a clear yes/no answer on whether
agent-osis worth supporting as an execution substrate - a concrete adapter/runtime experiment with measurable results
- a proposed runtime capability model that fits current Paperclip adapters
- a clear decision on whether snapshot-backed execution roots are worth integrating
Non-goals
Do not:
- replace Paperclip heartbeats, issues, comments, approvals, or budgets with
agent-osprimitives - introduce Rust/sidecar requirements for all local execution paths
- migrate all adapters at once
- add runtime workflow/queue abstractions to Paperclip core
Existing Paperclip integration points
The plan should stay anchored to these existing surfaces:
packages/adapter-utils/src/types.ts- adapter contract, runtime service reporting, session metadata, and capability normalization targets
server/src/services/heartbeat.ts- execution entry point, log capture, issue comment summaries, and cost reporting
server/src/services/execution-workspaces.ts- current workspace lifecycle and git-oriented cleanup/readiness model
server/src/services/plugin-loader.ts- typed host capability boundary and extension loading patterns
- local adapter implementations in
packages/adapters/*/src/server/- current execution behavior to compare against an
agent-os-backed path
- current execution behavior to compare against an
Phase plan
Phase 0: constraints and experiment design
Objective:
- make the evaluation falsifiable before writing integration code
Deliverables:
- short experiment brief added to this document or a child issue
- chosen first runtime target:
pi_localoropencode_local - baseline metrics definition
Questions to lock down:
- what exact developer experience should improve
- what security/isolation property we expect to gain
- what failure modes are unacceptable
- whether the prototype is adapter-only or a deeper internal runtime abstraction spike
Exit criteria:
- a single first target chosen
- measurable comparison criteria agreed on
Recommended metrics:
- cold start latency
- session resume reliability across heartbeats
- transcript/log quality
- implementation complexity
- operational complexity on local dev machines
Phase 1: agentos_local spike
Objective:
- prove that Paperclip can drive one local agent through an
agent-osruntime without breaking heartbeat semantics
Suggested scope:
- implement a new experimental adapter,
agentos_local, or a feature-flagged runtime path under one existing adapter - start with
pi_localoropencode_local - keep Paperclip's existing heartbeat, issue, workspace, and comment flow authoritative
Minimum implementation shape:
- adapter accepts model/runtime config
server/src/services/heartbeat.tsstill owns run lifecycle- execution result still maps into existing
AdapterExecutionResult - session state still fits current
sessionParams/sessionDisplayIdflow
What to verify:
- checkout and heartbeat flow still work end to end
- resume across multiple heartbeats works
- logs/transcripts remain readable in the UI
- failure paths surface cleanly in issue comments and run logs
Exit criteria:
- one agent type can run reliably through the new path
- documented comparison against the existing local adapter path
- explicit recommendation: continue, pause, or abandon
Phase 2: capability-based runtime permissions
Objective:
- introduce a Paperclip-native capability vocabulary without coupling the product to
agent-os
Suggested scope:
- extend adapter config schema vocabulary for runtime permissions
- prototype normalized capabilities such as:
fs.readfs.writenetwork.fetchnetwork.listenprocess.spawnenv.read
Integration targets:
packages/adapter-utils/src/types.ts- adapter config-schema support
- server-side runtime config validation
- future board-facing UI for permissions, if needed
What to avoid:
- building a full human policy UI before the vocabulary is proven useful
- forcing every adapter to implement capability enforcement immediately
Exit criteria:
- documented capability schema
- one adapter path using it meaningfully
- clear compatibility story for non-
agent-osadapters
Phase 3: snapshot-backed execution root experiment
Objective:
- determine whether a layered/snapshotted root model improves some Paperclip workloads
Suggested scope:
- evaluate it only for disposable or non-repo-heavy tasks first
- keep git worktree-based repo editing as the default for codebase tasks
Promising use cases:
- routine-style runs
- ephemeral preview/test environments
- isolated document/artifact generation
- tasks that do not need full git history or branch semantics
Integration targets:
server/src/services/execution-workspaces.ts- workspace realization paths called from
server/src/services/heartbeat.ts
Exit criteria:
- clear statement on which workload classes benefit
- clear statement on which workloads should stay on worktrees
- go/no-go decision for broader implementation
Phase 4: typed host tool evaluation
Objective:
- identify where Paperclip should prefer explicit typed tools over ambient shell access
Suggested scope:
- compare
agent-oshost-toolkit ideas with existing plugin and runtime-service surfaces - choose 1-2 sensitive operations that should become typed tools
Good candidates:
- git metadata/status inspection
- runtime service inspection
- deployment/preview status retrieval
- generated artifact publishing
Exit criteria:
- one concrete proposal for typed-tool adoption in Paperclip
- clear statement on whether this belongs in plugins, adapters, or core services
Recommended sequencing
Recommended order:
- Phase 0
- Phase 1
- Phase 2
- Phase 3
- Phase 4
Reasoning:
- Phase 1 is the fastest way to invalidate or validate the entire
agent-osdirection - Phase 2 is valuable even if Phase 1 is abandoned
- Phase 3 should wait until there is confidence that the runtime approach is operationally worthwhile
- Phase 4 is useful independently but should be informed by what Phase 1 and Phase 2 expose
Risks
Technical risk
agent-osintroduces Rust sidecar and packaging complexity that may outweigh runtime benefits
Product risk
- runtime experimentation could blur the boundary between Paperclip as control plane and Paperclip as execution platform
Integration risk
- session semantics, log formatting, and failure behavior may degrade relative to current local adapters
Scope risk
- a small runtime spike could expand into an adapter-system rewrite if not kept tightly bounded
Guardrails
To keep this effort controlled:
- keep all experiments behind a clearly experimental adapter or feature flag
- do not change issue/comment/approval/budget semantics to suit the runtime
- measure against current local adapters instead of judging in isolation
- stop after Phase 1 if the operational burden is already clearly too high
Proposed next action
The next concrete action should be a small implementation spike issue:
- title:
Prototype experimental agentos_local runtime for one local adapter - target adapter:
opencode_localunlesspi_localis materially easier - expected output: code spike, short verification notes, and a continue/stop recommendation
If leadership wants planning only and no spike yet, this document is the handoff artifact for that decision.