eliott/paperclip

Fork 0

mirror of https://github.com/paperclipai/paperclip synced 2026-04-25 17:25:15 +02:00

Files

dotta 5758aba91e docs: add agent-os follow-up plan

2026-04-09 06:14:12 -05:00

8.0 KiB

Raw Permalink Blame History

PAP-1229 Agent OS Follow-up Plan

Date: 2026-04-08 Related issue: PAP-1229 Companion analysis: doc/plans/2026-04-08-agent-os-technical-report.md

Goal

Turn the agent-os research into a low-risk Paperclip execution plan that preserves Paperclip's control-plane model while testing the few runtime ideas that appear worth adopting.

Decision summary

Paperclip should not absorb agent-os as a product model or orchestration layer.

Paperclip should evaluate agent-os in three narrow areas:

optional agent runtime for selected local adapters
capability-based runtime permission vocabulary
snapshot-backed disposable execution roots

Everything else should stay out of scope unless those three experiments produce strong evidence.

Success condition

This work is successful when Paperclip has:

a clear yes/no answer on whether agent-os is worth supporting as an execution substrate
a concrete adapter/runtime experiment with measurable results
a proposed runtime capability model that fits current Paperclip adapters
a clear decision on whether snapshot-backed execution roots are worth integrating

Non-goals

Do not:

replace Paperclip heartbeats, issues, comments, approvals, or budgets with agent-os primitives
introduce Rust/sidecar requirements for all local execution paths
migrate all adapters at once
add runtime workflow/queue abstractions to Paperclip core

Existing Paperclip integration points

The plan should stay anchored to these existing surfaces:

packages/adapter-utils/src/types.ts
- adapter contract, runtime service reporting, session metadata, and capability normalization targets
server/src/services/heartbeat.ts
- execution entry point, log capture, issue comment summaries, and cost reporting
server/src/services/execution-workspaces.ts
- current workspace lifecycle and git-oriented cleanup/readiness model
server/src/services/plugin-loader.ts
- typed host capability boundary and extension loading patterns
local adapter implementations in packages/adapters/*/src/server/
- current execution behavior to compare against an agent-os-backed path

Phase plan

Phase 0: constraints and experiment design

Objective:

make the evaluation falsifiable before writing integration code

Deliverables:

short experiment brief added to this document or a child issue
chosen first runtime target: pi_local or opencode_local
baseline metrics definition

Questions to lock down:

what exact developer experience should improve
what security/isolation property we expect to gain
what failure modes are unacceptable
whether the prototype is adapter-only or a deeper internal runtime abstraction spike

Exit criteria:

a single first target chosen
measurable comparison criteria agreed on

Recommended metrics:

cold start latency
session resume reliability across heartbeats
transcript/log quality
implementation complexity
operational complexity on local dev machines

Phase 1: `agentos_local` spike

Objective:

prove that Paperclip can drive one local agent through an agent-os runtime without breaking heartbeat semantics

Suggested scope:

implement a new experimental adapter, agentos_local, or a feature-flagged runtime path under one existing adapter
start with pi_local or opencode_local
keep Paperclip's existing heartbeat, issue, workspace, and comment flow authoritative

Minimum implementation shape:

adapter accepts model/runtime config
server/src/services/heartbeat.ts still owns run lifecycle
execution result still maps into existing AdapterExecutionResult
session state still fits current sessionParams / sessionDisplayId flow

What to verify:

checkout and heartbeat flow still work end to end
resume across multiple heartbeats works
logs/transcripts remain readable in the UI
failure paths surface cleanly in issue comments and run logs

Exit criteria:

one agent type can run reliably through the new path
documented comparison against the existing local adapter path
explicit recommendation: continue, pause, or abandon

Phase 2: capability-based runtime permissions

Objective:

introduce a Paperclip-native capability vocabulary without coupling the product to agent-os

Suggested scope:

extend adapter config schema vocabulary for runtime permissions
prototype normalized capabilities such as:
- fs.read
- fs.write
- network.fetch
- network.listen
- process.spawn
- env.read

Integration targets:

packages/adapter-utils/src/types.ts
adapter config-schema support
server-side runtime config validation
future board-facing UI for permissions, if needed

What to avoid:

building a full human policy UI before the vocabulary is proven useful
forcing every adapter to implement capability enforcement immediately

Exit criteria:

documented capability schema
one adapter path using it meaningfully
clear compatibility story for non-agent-os adapters

Phase 3: snapshot-backed execution root experiment

Objective:

determine whether a layered/snapshotted root model improves some Paperclip workloads

Suggested scope:

evaluate it only for disposable or non-repo-heavy tasks first
keep git worktree-based repo editing as the default for codebase tasks

Promising use cases:

routine-style runs
ephemeral preview/test environments
isolated document/artifact generation
tasks that do not need full git history or branch semantics

Integration targets:

server/src/services/execution-workspaces.ts
workspace realization paths called from server/src/services/heartbeat.ts

Exit criteria:

clear statement on which workload classes benefit
clear statement on which workloads should stay on worktrees
go/no-go decision for broader implementation

Phase 4: typed host tool evaluation

Objective:

identify where Paperclip should prefer explicit typed tools over ambient shell access

Suggested scope:

compare agent-os host-toolkit ideas with existing plugin and runtime-service surfaces
choose 1-2 sensitive operations that should become typed tools

Good candidates:

git metadata/status inspection
runtime service inspection
deployment/preview status retrieval
generated artifact publishing

Exit criteria:

one concrete proposal for typed-tool adoption in Paperclip
clear statement on whether this belongs in plugins, adapters, or core services

Recommended sequencing

Recommended order:

Phase 0
Phase 1
Phase 2
Phase 3
Phase 4

Reasoning:

Phase 1 is the fastest way to invalidate or validate the entire agent-os direction
Phase 2 is valuable even if Phase 1 is abandoned
Phase 3 should wait until there is confidence that the runtime approach is operationally worthwhile
Phase 4 is useful independently but should be informed by what Phase 1 and Phase 2 expose

Risks

Technical risk

agent-os introduces Rust sidecar and packaging complexity that may outweigh runtime benefits

Product risk

runtime experimentation could blur the boundary between Paperclip as control plane and Paperclip as execution platform

Integration risk

session semantics, log formatting, and failure behavior may degrade relative to current local adapters

Scope risk

a small runtime spike could expand into an adapter-system rewrite if not kept tightly bounded

Guardrails

To keep this effort controlled:

keep all experiments behind a clearly experimental adapter or feature flag
do not change issue/comment/approval/budget semantics to suit the runtime
measure against current local adapters instead of judging in isolation
stop after Phase 1 if the operational burden is already clearly too high

Proposed next action

The next concrete action should be a small implementation spike issue:

title: Prototype experimental agentos_local runtime for one local adapter
target adapter: opencode_local unless pi_local is materially easier
expected output: code spike, short verification notes, and a continue/stop recommendation

If leadership wants planning only and no spike yet, this document is the handoff artifact for that decision.

8.0 KiB Raw Permalink Blame History

PAP-1229 Agent OS Follow-up Plan

Goal

Decision summary

Success condition

Non-goals

Existing Paperclip integration points

Phase plan

Phase 0: constraints and experiment design

Phase 1: agentos_local spike

Phase 2: capability-based runtime permissions

Phase 3: snapshot-backed execution root experiment

Phase 4: typed host tool evaluation

Recommended sequencing

Risks

Technical risk

Product risk

Integration risk

Scope risk

Guardrails

Proposed next action

8.0 KiB

Raw Permalink Blame History

Phase 1: `agentos_local` spike