feat(costs): add billing, quota, and budget control plane

refactor(quota): move provider quota logic into adapter layer, add unit tests
- Extract all Anthropic credential/API logic into claude-local/src/server/quota.ts
- Extract all OpenAI/WHAM credential/API logic into codex-local/src/server/quota.ts
- Add optional getQuotaWindows() to ServerAdapterModule in adapter-utils
- Rewrite quota-windows.ts as a 29-line thin aggregator with zero provider knowledge
- Wire getQuotaWindows into adapter registry for claude-local and codex-local
- Add 47 unit tests covering toPercent, secondsToWindowLabel, WHAM normalization, readClaudeToken, readCodexToken, fetchClaudeQuota, fetchCodexQuota, fetchWithTimeout
- Add 8 unit tests covering parseDateRange validation and byProvider pro-rata math

Adding a third provider now requires only touching that provider's adapter.
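
A minimal sketch of the shape this refactor describes, under stated assumptions: only `getQuotaWindows`, `ServerAdapterModule`, and the adapter names come from the commit message. `QuotaWindow`, `ProviderQuota`, their fields, and `collectQuotaWindows` are hypothetical names chosen for illustration and may not match the real code.

```typescript
// Hypothetical shapes: the field names are assumptions, not the real adapter-utils types.
interface QuotaWindow {
  label: string; // e.g. "5h" or "weekly"
  usedPercent: number; // 0..100
}

interface ServerAdapterModule {
  type: string;
  // Optional: adapters without quota support simply omit it.
  getQuotaWindows?: () => Promise<QuotaWindow[]>;
}

interface ProviderQuota {
  provider: string;
  windows: QuotaWindow[];
  error?: string;
}

// Thin aggregator: it has no provider-specific knowledge and only asks each
// registered adapter for its windows, then collects the results.
async function collectQuotaWindows(
  adapters: ServerAdapterModule[],
): Promise<ProviderQuota[]> {
  const results: ProviderQuota[] = [];
  for (const adapter of adapters) {
    if (!adapter.getQuotaWindows) continue; // adapter has no quota support
    try {
      const windows = await adapter.getQuotaWindows();
      results.push({ provider: adapter.type, windows });
    } catch (err) {
      // One failing provider should not hide the others.
      results.push({
        provider: adapter.type,
        windows: [],
        error: err instanceof Error ? err.message : String(err),
      });
    }
  }
  return results;
}
```

With this shape, supporting a third provider would mean implementing `getQuotaWindows` in that provider's adapter and registering it; the aggregator itself would stay unchanged.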
2026-05-07 07:32:02 +02:00 · 2026-03-14 22:06:02 -05:00 · 2026-03-14 22:00:43 -05:00
672 changed files with 158679 additions and 6273 deletions
--- a/.agents/skills/create-agent-adapter/SKILL.md
+++ b/.agents/skills/create-agent-adapter/SKILL.md
--- a/.agents/skills/pr-report/SKILL.md
+++ b/.agents/skills/pr-report/SKILL.md
@@ -0,0 +1,202 @@
+---
+name: pr-report
+description: >
+  Review a pull request or contribution deeply, explain it tutorial-style for a
+  maintainer, and produce a polished report artifact such as HTML or Markdown.
+  Use when asked to analyze a PR, explain a contributor's design decisions,
+  compare it with similar systems, or prepare a merge recommendation.
+---
+
+# PR Report Skill
+
+Produce a maintainer-grade review of a PR, branch, or large contribution.
+
+Default posture:
+
+- understand the change before judging it
+- explain the system as built, not just the diff
+- separate architectural problems from product-scope objections
+- make a concrete recommendation, not a vague impression
+
+## When to Use
+
+Use this skill when the user asks for things like:
+
+- "review this PR deeply"
+- "explain this contribution to me"
+- "make me a report or webpage for this PR"
+- "compare this design to similar systems"
+- "should I merge this?"
+
+## Outputs
+
+Common outputs:
+
+- standalone HTML report in `tmp/reports/...`
+- Markdown report in `report/` or another requested folder
+- short maintainer summary in chat
+
+If the user asks for a webpage, build a polished standalone HTML artifact with
+clear sections and readable visual hierarchy.
+
+Resources bundled with this skill:
+
+- `references/style-guide.md` for visual direction and report presentation rules
+- `assets/html-report-starter.html` for a reusable standalone HTML/CSS starter
+
+## Workflow
+
+### 1. Acquire and frame the target
+
+Work from local code when possible, not just the GitHub PR page.
+
+Gather:
+
+- target branch or worktree
+- diff size and changed subsystems
+- relevant repo docs, specs, and invariants
+- contributor intent if it is documented in PR text or design docs
+
+Start by answering: what is this change *trying* to become?
+
+### 2. Build a mental model of the system
+
+Do not stop at file-by-file notes. Reconstruct the design:
+
+- what new runtime or contract exists
+- which layers changed: db, shared types, server, UI, CLI, docs
+- lifecycle: install, startup, execution, UI, failure, disablement
+- trust boundary: what code runs where, under what authority
+
+For large contributions, include a tutorial-style section that teaches the
+system from first principles.
+
+### 3. Review like a maintainer
+
+Findings come first. Order by severity.
+
+Prioritize:
+
+- behavioral regressions
+- trust or security gaps
+- misleading abstractions
+- lifecycle and operational risks
+- coupling that will be hard to unwind
+- missing tests or unverifiable claims
+
+Always cite concrete file references when possible.
+
+### 4. Distinguish the objection type
+
+Be explicit about whether a concern is:
+
+- product direction
+- architecture
+- implementation quality
+- rollout strategy
+- documentation honesty
+
+Do not hide an architectural objection inside a scope objection.
+
+### 5. Compare to external precedents when needed
+
+If the contribution introduces a framework or platform concept, compare it to
+similar open-source systems.
+
+When comparing:
+
+- prefer official docs or source
+- focus on extension boundaries, context passing, trust model, and UI ownership
+- extract lessons, not just similarities
+
+Good comparison questions:
+
+- Who owns lifecycle?
+- Who owns UI composition?
+- Is context explicit or ambient?
+- Are plugins trusted code or sandboxed code?
+- Are extension points named and typed?
+
+### 6. Make the recommendation actionable
+
+Do not stop at "merge" or "do not merge."
+
+Choose one:
+
+- merge as-is
+- merge after specific redesign
+- salvage specific pieces
+- keep as design research
+
+If rejecting or narrowing, say what should be kept.
+
+Useful recommendation buckets:
+
+- keep the protocol/type model
+- redesign the UI boundary
+- narrow the initial surface area
+- defer third-party execution
+- ship a host-owned extension-point model first
+
+### 7. Build the artifact
+
+Suggested report structure:
+
+1. Executive summary
+2. What the PR actually adds
+3. Tutorial: how the system works
+4. Strengths
+5. Main findings
+6. Comparisons
+7. Recommendation
+
+For HTML reports:
+
+- use intentional typography and color
+- make navigation easy for long reports
+- favor strong section headings and small reference labels
+- avoid generic dashboard styling
+
+Before building from scratch, read `references/style-guide.md`.
+If a fast polished starter is helpful, begin from `assets/html-report-starter.html`
+and replace the placeholder content with the actual report.
+
+### 8. Verify before handoff
+
+Check:
+
+- artifact path exists
+- findings still match the actual code
+- any strings the user asked to keep out are absent from the generated output
+- if tests were not run, say so explicitly
+
+## Review Heuristics
+
+### Plugin and platform work
+
+Watch closely for:
+
+- docs claiming sandboxing while runtime executes trusted host processes
+- module-global state used to smuggle React context
+- hidden dependence on render order
+- plugins reaching into host internals instead of using explicit APIs
+- "capabilities" that are really policy labels on top of fully trusted code
+
+### Good signs
+
+- typed contracts shared across layers
+- explicit extension points
+- host-owned lifecycle
+- honest trust model
+- narrow first rollout with room to grow
+
+## Final Response
+
+In chat, summarize:
+
+- where the report is
+- your overall call
+- the top one or two reasons
+- whether verification or tests were skipped
+
+Keep the chat summary shorter than the report itself.
--- a/.agents/skills/pr-report/assets/html-report-starter.html
+++ b/.agents/skills/pr-report/assets/html-report-starter.html
@@ -0,0 +1,426 @@
+<!doctype html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1" />
+    <title>PR Report Starter</title>
+    <link rel="preconnect" href="https://fonts.googleapis.com" />
+    <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
+    <link
+      href="https://fonts.googleapis.com/css2?family=IBM+Plex+Sans:wght@400;500;600;700&family=Newsreader:opsz,wght@6..72,500;6..72,700&display=swap"
+      rel="stylesheet"
+    />
+    <style>
+      :root {
+        --bg: #f4efe5;
+        --paper: rgba(255, 251, 244, 0.88);
+        --paper-strong: #fffaf1;
+        --ink: #1f1b17;
+        --muted: #6a6257;
+        --line: rgba(31, 27, 23, 0.12);
+        --accent: #9c4729;
+        --accent-soft: rgba(156, 71, 41, 0.1);
+        --good: #2f6a42;
+        --warn: #946200;
+        --bad: #8c2f25;
+        --shadow: 0 22px 60px rgba(52, 37, 19, 0.1);
+        --radius: 20px;
+      }
+
+      * {
+        box-sizing: border-box;
+      }
+
+      html {
+        scroll-behavior: smooth;
+      }
+
+      body {
+        margin: 0;
+        color: var(--ink);
+        font-family: "IBM Plex Sans", sans-serif;
+        background:
+          radial-gradient(circle at top left, rgba(156, 71, 41, 0.12), transparent 34rem),
+          radial-gradient(circle at top right, rgba(47, 106, 66, 0.08), transparent 28rem),
+          linear-gradient(180deg, #efe6d6 0%, var(--bg) 48%, #ece5d8 100%);
+      }
+
+      .shell {
+        width: min(1360px, calc(100vw - 32px));
+        margin: 24px auto;
+        display: grid;
+        grid-template-columns: 280px minmax(0, 1fr);
+        gap: 24px;
+      }
+
+      .panel {
+        background: var(--paper);
+        backdrop-filter: blur(12px);
+        border: 1px solid var(--line);
+        border-radius: var(--radius);
+        box-shadow: var(--shadow);
+      }
+
+      .nav {
+        position: sticky;
+        top: 20px;
+        align-self: start;
+        padding: 22px;
+      }
+
+      .eyebrow {
+        letter-spacing: 0.12em;
+        text-transform: uppercase;
+        font-size: 11px;
+        font-weight: 700;
+        color: var(--accent);
+      }
+
+      .nav h1,
+      .hero h1,
+      h2,
+      h3 {
+        font-family: "Newsreader", serif;
+        line-height: 0.96;
+        margin: 0;
+      }
+
+      .nav h1 {
+        font-size: 2rem;
+        margin-top: 10px;
+      }
+
+      .nav p {
+        color: var(--muted);
+        font-size: 0.95rem;
+        line-height: 1.5;
+      }
+
+      .nav ul {
+        list-style: none;
+        padding: 0;
+        margin: 18px 0 0;
+        display: grid;
+        gap: 10px;
+      }
+
+      .nav a {
+        display: block;
+        color: var(--ink);
+        text-decoration: none;
+        padding: 10px 12px;
+        border-radius: 12px;
+        border: 1px solid transparent;
+        background: rgba(255, 255, 255, 0.35);
+      }
+
+      .nav a:hover {
+        border-color: var(--line);
+        background: rgba(255, 255, 255, 0.75);
+      }
+
+      .meta-block {
+        margin-top: 20px;
+        padding-top: 18px;
+        border-top: 1px solid var(--line);
+        color: var(--muted);
+        font-size: 0.86rem;
+        line-height: 1.5;
+      }
+
+      main {
+        display: grid;
+        gap: 24px;
+      }
+
+      section {
+        padding: 26px 28px 28px;
+      }
+
+      .hero {
+        padding: 28px;
+        overflow: hidden;
+        position: relative;
+      }
+
+      .hero::after {
+        content: "";
+        position: absolute;
+        inset: auto -3rem -6rem auto;
+        width: 18rem;
+        height: 18rem;
+        border-radius: 50%;
+        background: radial-gradient(circle, rgba(156, 71, 41, 0.14), transparent 68%);
+        pointer-events: none;
+      }
+
+      .hero h1 {
+        font-size: clamp(2.6rem, 5vw, 4.6rem);
+        max-width: 12ch;
+        margin-top: 12px;
+      }
+
+      .lede {
+        margin-top: 16px;
+        max-width: 70ch;
+        font-size: 1.05rem;
+        line-height: 1.65;
+        color: #2b2723;
+      }
+
+      .hero-grid,
+      .card-grid,
+      .two-col {
+        display: grid;
+        gap: 14px;
+      }
+
+      .hero-grid {
+        margin-top: 24px;
+        grid-template-columns: repeat(4, minmax(0, 1fr));
+      }
+
+      .card-grid {
+        grid-template-columns: repeat(2, minmax(0, 1fr));
+      }
+
+      .two-col {
+        grid-template-columns: repeat(2, minmax(0, 1fr));
+      }
+
+      .metric,
+      .card,
+      .finding {
+        padding: 18px;
+        background: rgba(255, 255, 255, 0.68);
+        border: 1px solid var(--line);
+        border-radius: 18px;
+      }
+
+      .metric .label {
+        color: var(--muted);
+        font-size: 0.82rem;
+        text-transform: uppercase;
+        letter-spacing: 0.08em;
+      }
+
+      .metric .value {
+        margin-top: 8px;
+        font-size: 1.45rem;
+        font-weight: 700;
+      }
+
+      h2 {
+        font-size: 2rem;
+        margin-bottom: 16px;
+      }
+
+      h3 {
+        font-size: 1.3rem;
+        margin-bottom: 10px;
+      }
+
+      p {
+        margin: 0 0 14px;
+        line-height: 1.65;
+      }
+
+      ul,
+      ol {
+        margin: 0;
+        padding-left: 20px;
+        line-height: 1.65;
+      }
+
+      li + li {
+        margin-top: 8px;
+      }
+
+      .badge-row {
+        display: flex;
+        flex-wrap: wrap;
+        gap: 8px;
+        margin: 18px 0 8px;
+      }
+
+      .badge {
+        display: inline-flex;
+        align-items: center;
+        gap: 8px;
+        padding: 8px 10px;
+        border-radius: 999px;
+        font-size: 0.82rem;
+        font-weight: 700;
+        border: 1px solid var(--line);
+        background: rgba(255, 255, 255, 0.68);
+      }
+
+      .badge.good {
+        color: var(--good);
+      }
+
+      .badge.warn {
+        color: var(--warn);
+      }
+
+      .badge.bad {
+        color: var(--bad);
+      }
+
+      .quote {
+        margin-top: 18px;
+        padding: 18px;
+        border-left: 4px solid var(--accent);
+        border-radius: 14px;
+        background: var(--accent-soft);
+      }
+
+      .severity {
+        display: inline-flex;
+        margin-bottom: 12px;
+        padding: 6px 10px;
+        border-radius: 999px;
+        font-size: 0.78rem;
+        font-weight: 700;
+        text-transform: uppercase;
+        letter-spacing: 0.08em;
+      }
+
+      .severity.high {
+        background: rgba(140, 47, 37, 0.12);
+        color: var(--bad);
+      }
+
+      .severity.medium {
+        background: rgba(148, 98, 0, 0.12);
+        color: var(--warn);
+      }
+
+      .severity.low {
+        background: rgba(47, 106, 66, 0.12);
+        color: var(--good);
+      }
+
+      .ref {
+        color: var(--muted);
+        font-size: 0.82rem;
+        line-height: 1.5;
+      }
+
+      @media (max-width: 980px) {
+        .shell {
+          grid-template-columns: 1fr;
+        }
+
+        .nav {
+          position: static;
+        }
+
+        .hero-grid,
+        .card-grid,
+        .two-col {
+          grid-template-columns: 1fr;
+        }
+
+        .hero h1 {
+          max-width: 100%;
+        }
+      }
+    </style>
+  </head>
+  <body>
+    <div class="shell">
+      <aside class="panel nav">
+        <div class="eyebrow">Maintainer Report</div>
+        <h1>Report Title</h1>
+        <p>Replace this with a concise description of what the report covers.</p>
+        <ul>
+          <li><a href="#summary">Summary</a></li>
+          <li><a href="#tutorial">Tutorial</a></li>
+          <li><a href="#findings">Findings</a></li>
+          <li><a href="#recommendation">Recommendation</a></li>
+        </ul>
+        <div class="meta-block">
+          Replace with project metadata, review date, or scope notes.
+        </div>
+      </aside>
+
+      <main>
+        <section class="panel hero" id="summary">
+          <div class="eyebrow">Executive Summary</div>
+          <h1>Use the hero for the clearest one-line judgment.</h1>
+          <p class="lede">
+            Replace this with the short explanation of what the contribution does, why it matters,
+            and what the core maintainer question is.
+          </p>
+          <div class="badge-row">
+            <span class="badge good">Strength</span>
+            <span class="badge warn">Tradeoff</span>
+            <span class="badge bad">Risk</span>
+          </div>
+          <div class="hero-grid">
+            <div class="metric">
+              <div class="label">Overall Call</div>
+              <div class="value">Placeholder</div>
+            </div>
+            <div class="metric">
+              <div class="label">Main Concern</div>
+              <div class="value">Placeholder</div>
+            </div>
+            <div class="metric">
+              <div class="label">Best Part</div>
+              <div class="value">Placeholder</div>
+            </div>
+            <div class="metric">
+              <div class="label">Weakest Part</div>
+              <div class="value">Placeholder</div>
+            </div>
+          </div>
+          <div class="quote">
+            Use this block for the thesis, a sharp takeaway, or a key cited point.
+          </div>
+        </section>
+
+        <section class="panel" id="tutorial">
+          <h2>Tutorial Section</h2>
+          <div class="two-col">
+            <div class="card">
+              <h3>Concept Card</h3>
+              <p>Use cards for mental models, subsystems, or comparison slices.</p>
+              <div class="ref">path/to/file.ts:10</div>
+            </div>
+            <div class="card">
+              <h3>Second Card</h3>
+              <p>Keep cards fairly dense. This template is about style, not fixed structure.</p>
+              <div class="ref">path/to/file.ts:20</div>
+            </div>
+          </div>
+        </section>
+
+        <section class="panel" id="findings">
+          <h2>Findings</h2>
+          <article class="finding">
+            <div class="severity high">High</div>
+            <h3>Finding Title</h3>
+            <p>Use findings for the sharpest judgment calls and risks.</p>
+            <div class="ref">path/to/file.ts:30</div>
+          </article>
+        </section>
+
+        <section class="panel" id="recommendation">
+          <h2>Recommendation</h2>
+          <div class="card-grid">
+            <div class="card">
+              <h3>Path Forward</h3>
+              <p>Use this area for merge guidance, salvage plan, or rollout advice.</p>
+            </div>
+            <div class="card">
+              <h3>What To Keep</h3>
+              <p>Call out the parts worth preserving even if the whole proposal should not land.</p>
+            </div>
+          </div>
+        </section>
+      </main>
+    </div>
+  </body>
+</html>
--- a/.agents/skills/pr-report/references/style-guide.md
+++ b/.agents/skills/pr-report/references/style-guide.md
@@ -0,0 +1,149 @@
+# PR Report Style Guide
+
+Use this guide when the user wants a report artifact, especially a webpage.
+
+## Goal
+
+Make the report feel like an editorial review, not an internal admin dashboard.
+The page should make a long technical argument easy to scan without looking
+generic or overdesigned.
+
+## Visual Direction
+
+Preferred tone:
+
+- editorial
+- warm
+- serious
+- high-contrast
+- handcrafted, not corporate SaaS
+
+Avoid:
+
+- default app-shell layouts
+- purple gradients on white
+- generic card dashboards
+- cramped pages with weak hierarchy
+- novelty fonts that hurt readability
+
+## Typography
+
+Recommended pattern:
+
+- one expressive serif or display face for major headings
+- one sturdy sans-serif for body copy and UI labels
+
+Good combinations:
+
+- Newsreader + IBM Plex Sans
+- Source Serif 4 + Instrument Sans
+- Fraunces + Public Sans
+- Libre Baskerville + Work Sans
+
+Rules:
+
+- headings should feel deliberate and large
+- body copy should stay comfortable for long reading
+- reference labels and badges should use smaller dense sans text
+
+## Layout
+
+Recommended structure:
+
+- a sticky side or top navigation for long reports
+- one strong hero summary at the top
+- panel or paper-like sections for each major topic
+- multi-column card grids for comparisons and strengths
+- single-column body text for findings and recommendations
+
+Use generous spacing. Long-form technical reports need breathing room.
+
+## Color
+
+Prefer muted paper-like backgrounds with one warm accent and one cool counterweight.
+
+Suggested token categories:
+
+- `--bg`
+- `--paper`
+- `--ink`
+- `--muted`
+- `--line`
+- `--accent`
+- `--good`
+- `--warn`
+- `--bad`
+
+The accent should highlight navigation, badges, and important labels. Do not
+let accent colors dominate body text.
+
+## Useful UI Elements
+
+Include small reusable styles for:
+
+- summary metrics
+- badges
+- quotes or callouts
+- finding cards
+- severity labels
+- reference labels
+- comparison cards
+- responsive two-column sections
+
+## Motion
+
+Keep motion restrained.
+
+Good:
+
+- soft fade/slide-in on first load
+- hover response on nav items or cards
+
+Bad:
+
+- constant animation
+- floating blobs
+- decorative motion with no reading benefit
+
+## Content Presentation
+
+Even when the user wants design polish, clarity stays primary.
+
+Good structure for long reports:
+
+1. executive summary
+2. what changed
+3. tutorial explanation
+4. strengths
+5. findings
+6. comparisons
+7. recommendation
+
+The exact headings can change. The important thing is to separate explanation
+from judgment.
+
+## References
+
+Reference labels should be visually quiet but easy to spot.
+
+Good pattern:
+
+- small muted text
+- monospace or compact sans
+- keep them close to the paragraph they support
+
+## Starter Usage
+
+If you need a fast polished base, start from:
+
+- `assets/html-report-starter.html`
+
+Customize:
+
+- fonts
+- color tokens
+- hero copy
+- section ordering
+- card density
+
+Do not preserve the placeholder sections if they do not fit the actual report.
--- a/.agents/skills/release-changelog/SKILL.md
+++ b/.agents/skills/release-changelog/SKILL.md
@@ -0,0 +1,178 @@
+---
+name: release-changelog
+description: >
+  Generate the stable Paperclip release changelog at releases/v{version}.md by
+  reading commits, changesets, and merged PR context since the last stable tag.
+---
+
+# Release Changelog Skill
+
+Generate the user-facing changelog for the **stable** Paperclip release.
+
+Output:
+
+- `releases/v{version}.md`
+
+Important rule:
+
+- even if there are canary releases such as `1.2.3-canary.0`, the changelog file stays `releases/v1.2.3.md`
+
+## Step 0 — Idempotency Check
+
+Before generating anything, check whether the file already exists:
+
+```bash
+ls releases/v{version}.md 2>/dev/null
+```
+
+If it exists:
+
+1. read it first
+2. present it to the reviewer
+3. ask whether to keep it, regenerate it, or update specific sections
+4. never overwrite it silently
+
+## Step 1 — Determine the Stable Range
+
+Find the last stable tag, then list the commits since it:
+
+```bash
+git tag --list 'v*' --sort=-version:refname | head -1
+git log v{last}..HEAD --oneline --no-merges
+```
+
+The planned stable version comes from one of:
+
+- an explicit maintainer request
+- the chosen bump type applied to the last stable tag
+- the release plan already agreed in `doc/RELEASING.md`
+
+Do not derive the changelog version from a canary tag or prerelease suffix.
+
+## Step 2 — Gather the Raw Inputs
+
+Collect release data from:
+
+1. git commits since the last stable tag
+2. `.changeset/*.md` files
+3. merged PRs via `gh` when available
+
+Useful commands:
+
+```bash
+git log v{last}..HEAD --oneline --no-merges
+git log v{last}..HEAD --format="%H %s" --no-merges
+ls .changeset/*.md | grep -v README.md
+gh pr list --state merged --search "merged:>={last-tag-date}" --json number,title,body,labels
+```
+
+## Step 3 — Detect Breaking Changes
+
+Look for:
+
+- destructive migrations
+- removed or changed API fields/endpoints
+- renamed or removed config keys
+- `major` changesets
+- `BREAKING:` or `BREAKING CHANGE:` commit signals
+
+Key commands:
+
+```bash
+git diff --name-only v{last}..HEAD -- packages/db/src/migrations/
+git diff v{last}..HEAD -- packages/db/src/schema/
+git diff v{last}..HEAD -- server/src/routes/ server/src/api/
+git log v{last}..HEAD --format="%s" | rg -n 'BREAKING CHANGE|BREAKING:|^[a-z]+(\([^)]*\))?!:' || true
+```
+
+If the requested bump is lower than the minimum required bump, flag that before the release proceeds.
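
A minimal sketch, not part of the skill file, of how the commit-signal check could be expressed in code. It checks for `BREAKING CHANGE`, `BREAKING:`, and a `!` before the colon of the subject prefix, with or without a scope such as `refactor(api)!:`. The scope handling, the function name `findBreakingSubjects`, and all sample subjects except the first are assumptions for illustration.

```typescript
// Breaking-change signals in a commit subject. The optional "(scope)" group
// is an assumption that follows the Conventional Commits "type(scope)!:" form.
const BREAKING_SIGNAL = /BREAKING CHANGE|BREAKING:|^[a-z]+(\([^)]*\))?!:/;

// Returns the commit subjects that carry a breaking-change signal.
function findBreakingSubjects(subjects: string[]): string[] {
  return subjects.filter((subject) => BREAKING_SIGNAL.test(subject));
}

// Sample subjects: the first is this PR's title, the rest are made up.
const sampleSubjects = [
  "feat(costs): add billing, quota, and budget control plane",
  "feat!: drop legacy config keys",
  "refactor(api)!: rename quota endpoint",
  "fix: handle empty date range BREAKING: response shape changed",
];

console.log(findBreakingSubjects(sampleSubjects));
```

Any subject this returns would argue for at least checking whether a `major` bump is required before the release proceeds.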
+
+## Step 4 — Categorize for Users
+
+Use these stable changelog sections:
+
+- `Breaking Changes`
+- `Highlights`
+- `Improvements`
+- `Fixes`
+- `Upgrade Guide` when needed
+
+Exclude purely internal refactors, CI changes, and docs-only work unless they materially affect users.
+
+Guidelines:
+
+- group related commits into one user-facing entry
+- write from the user perspective
+- keep highlights short and concrete
+- spell out upgrade actions for breaking changes
+
+### Inline PR and contributor attribution
+
+When a bullet item clearly maps to a merged pull request, add inline attribution at the
+end of the entry in this format:
+
+```
+- **Feature name** — Description. ([#123](https://github.com/paperclipai/paperclip/pull/123), @contributor1, @contributor2)
+```
+
+Rules:
+
+- Only add a PR link when you can confidently trace the bullet to a specific merged PR.
+  Use merge commit messages (`Merge pull request #N from user/branch`) to map PRs.
+- List the contributor(s) who authored the PR. Use GitHub usernames, not real names or emails.
+- If multiple PRs contributed to a single bullet, list them all: `([#10](url), [#12](url), @user1, @user2)`.
+- If you cannot determine the PR number or contributor with confidence, omit the attribution
+  parenthetical — do not guess.
+- Core maintainer commits that don't have an external PR can omit the parenthetical.
+
+## Step 5 — Write the File
+
+Template:
+
+```markdown
+# v{version}
+
+> Released: {YYYY-MM-DD}
+
+## Breaking Changes
+
+## Highlights
+
+## Improvements
+
+## Fixes
+
+## Upgrade Guide
+
+## Contributors
+
+Thank you to everyone who contributed to this release!
+
+@username1, @username2, @username3
+```
+
+Omit empty sections except `Highlights`, `Improvements`, and `Fixes`, which should usually exist.
+
+The `Contributors` section should always be included. List every person who authored
+commits in the release range, @-mentioning them by their **GitHub username** (not their
+real name or email). To find GitHub usernames:
+
+1. Extract usernames from merge commit messages: `git log v{last}..HEAD --oneline --merges` — the branch prefix (e.g. `from username/branch`) gives the GitHub username.
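+2. For noreply emails like `user@users.noreply.github.com`, the username is the part before `@`; if the address has the form `12345+user@users.noreply.github.com`, drop the numeric `12345+` prefix.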
+3. For contributors whose username is ambiguous, check `gh api users/{guess}` or the PR page.
+
+**Never expose contributor email addresses.** Use `@username` only.
+
+Exclude bot accounts (e.g. `lockfile-bot`, `dependabot`) from the list. List contributors
+in alphabetical order by GitHub username (case-insensitive).
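
A minimal sketch, not part of the skill file, of the contributor rules above: usernames only, bots excluded, case-insensitive alphabetical order. The function names and the `[bot]` suffix check are assumptions for illustration. The optional numeric prefix handles GitHub noreply addresses of the form `ID+username@users.noreply.github.com`.

```typescript
// Bot accounts named as examples in the skill; a real list may be longer.
const BOT_ACCOUNTS = new Set(["lockfile-bot", "dependabot"]);

// Extracts a GitHub username from a noreply address. Returns null for any
// other email, so a real address is never turned into a guessed username.
function usernameFromNoreply(email: string): string | null {
  const match = /^(?:\d+\+)?([^@+]+)@users\.noreply\.github\.com$/i.exec(email);
  return match?.[1] ?? null;
}

// Assumption: app accounts such as "dependabot[bot]" also count as bots.
function isBot(username: string): boolean {
  const lower = username.toLowerCase();
  return lower.endsWith("[bot]") || BOT_ACCOUNTS.has(lower);
}

// Builds the "@a, @b" line: de-duplicated, bots removed, and sorted
// alphabetically without regard to case.
function contributorsLine(usernames: string[]): string {
  const unique = new Map<string, string>();
  for (const username of usernames) {
    if (isBot(username)) continue;
    const key = username.toLowerCase();
    if (!unique.has(key)) unique.set(key, username);
  }
  return Array.from(unique.values())
    .sort((a, b) => {
      const x = a.toLowerCase();
      const y = b.toLowerCase();
      return x < y ? -1 : x > y ? 1 : 0;
    })
    .map((username) => `@${username}`)
    .join(", ");
}
```

Usernames that cannot be derived this way would still need the manual check via the merge commit or the PR page, as described in the numbered list.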
+
+## Step 6 — Review Before Release
+
+Before handing it off:
+
+1. confirm the heading is the stable version only
+2. confirm there is no `-canary` language in the title or filename
+3. confirm any breaking changes have an upgrade path
+4. present the draft for human sign-off
+
+This skill never publishes anything. It only prepares the stable changelog artifact.
--- a/.agents/skills/release/SKILL.md
+++ b/.agents/skills/release/SKILL.md
@@ -0,0 +1,261 @@
+---
+name: release
+description: >
+  Coordinate a full Paperclip release across engineering verification, npm,
+  GitHub, website publishing, and announcement follow-up. Use when leadership
+  asks to ship a release, not merely to discuss version bumps.
+---
+
+# Release Coordination Skill
+
+Run the full Paperclip release as a maintainer workflow, not just an npm publish.
+
+This skill coordinates:
+
+- stable changelog drafting via `release-changelog`
+- release-train setup via `scripts/release-start.sh`
+- prerelease canary publishing via `scripts/release.sh --canary`
+- Docker smoke testing via `scripts/docker-onboard-smoke.sh`
+- stable publishing via `scripts/release.sh`
+- pushing the stable branch commit and tag
+- GitHub Release creation via `scripts/create-github-release.sh`
+- website / announcement follow-up tasks
+
+## Trigger
+
+Use this skill when leadership asks for:
+
+- "do a release"
+- "ship the next patch/minor/major"
+- "release vX.Y.Z"
+
+## Preconditions
+
+Before proceeding, verify all of the following:
+
+1. `.agents/skills/release-changelog/SKILL.md` exists and is usable.
+2. The repo working tree is clean, including untracked files.
+3. There are commits since the last stable tag.
+4. The release SHA has passed the verification gate or is about to.
+5. If package manifests changed, the CI-owned `pnpm-lock.yaml` refresh is already merged on `master` before the release branch is cut.
+6. npm publish rights are available locally, or the GitHub release workflow is being used with trusted publishing.
+7. If running through Paperclip, you have issue context for status updates and follow-up task creation.
+
+If any precondition fails, stop and report the blocker.
+
+## Inputs
+
+Collect these inputs up front:
+
+- requested bump: `patch`, `minor`, or `major`
+- whether this run is a dry run or live release
+- whether the release is being run locally or from GitHub Actions
+- release issue / company context for website and announcement follow-up
+
+## Step 0 — Release Model
+
+Paperclip now uses this release model:
+
+1. Start or resume `release/X.Y.Z`
+2. Draft the **stable** changelog as `releases/vX.Y.Z.md`
+3. Publish one or more **prerelease canaries** such as `X.Y.Z-canary.0`
+4. Smoke test the canary via Docker
+5. Publish the stable version `X.Y.Z`
+6. Push the stable branch commit and tag
+7. Create the GitHub Release
+8. Merge `release/X.Y.Z` back to `master` without squash or rebase
+9. Complete website and announcement surfaces
+
+Critical consequence:
+
+- Canaries do **not** use promote-by-dist-tag anymore.
+- The changelog remains stable-only. Do not create `releases/vX.Y.Z-canary.N.md`.
+
+## Step 1 — Decide the Stable Version
+
+Start the release train first:
+
+```bash
+./scripts/release-start.sh {patch|minor|major}
+```
+
+Then run release preflight:
+
+```bash
+./scripts/release-preflight.sh canary {patch|minor|major}
+# or
+./scripts/release-preflight.sh stable {patch|minor|major}
+```
+
+Then use the last stable tag as the base:
+
+```bash
+LAST_TAG=$(git tag --list 'v*' --sort=-version:refname | head -1)
+git log "${LAST_TAG}..HEAD" --oneline --no-merges
+git diff --name-only "${LAST_TAG}..HEAD" -- packages/db/src/migrations/
+git diff "${LAST_TAG}..HEAD" -- packages/db/src/schema/
+git log "${LAST_TAG}..HEAD" --format="%s" | rg -n 'BREAKING CHANGE|BREAKING:|^[a-z]+(\([^)]*\))?!:' || true
+```
+
+Bump policy:
+
+- destructive migrations, removed APIs, breaking config changes -> `major`
+- additive migrations or clearly user-visible features -> at least `minor`
+- fixes only -> `patch`
+
+If the requested bump is too low, escalate it and explain why.
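
A minimal sketch, not part of the skill file, of the bump policy above as a function. The `ReleaseSignals` shape and the function names are hypothetical; how each signal is detected (migrations, API diffs, commit subjects) is left to the commands shown earlier in this step.

```typescript
type Bump = "patch" | "minor" | "major";

// Hypothetical summary of what inspecting the release range found.
interface ReleaseSignals {
  destructiveMigration: boolean;
  removedApi: boolean;
  breakingConfigChange: boolean;
  additiveMigration: boolean;
  userVisibleFeature: boolean;
}

const BUMP_RANK: Record<Bump, number> = { patch: 0, minor: 1, major: 2 };

// Smallest bump the policy allows for the given signals.
function minimumBump(signals: ReleaseSignals): Bump {
  if (signals.destructiveMigration || signals.removedApi || signals.breakingConfigChange) {
    return "major";
  }
  if (signals.additiveMigration || signals.userVisibleFeature) {
    return "minor";
  }
  return "patch";
}

// Returns the bump to use and whether the request had to be escalated.
function resolveBump(
  requested: Bump,
  signals: ReleaseSignals,
): { bump: Bump; escalated: boolean } {
  const required = minimumBump(signals);
  const escalated = BUMP_RANK[requested] < BUMP_RANK[required];
  return { bump: escalated ? required : requested, escalated };
}
```

When `escalated` is true, the release report should say which signal forced the higher bump, since the policy asks for an explanation and not just a different number.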
+
+## Step 2 — Draft the Stable Changelog
+
+Invoke `release-changelog` and generate:
+
+- `releases/vX.Y.Z.md`
+
+Rules:
+
+- review the draft with a human before publish
+- preserve manual edits if the file already exists
+- keep the heading and filename stable-only, for example `v1.2.3`
+- do not create a separate canary changelog file
+
+## Step 3 — Verify the Release SHA
+
+Run the standard gate:
+
+```bash
+pnpm -r typecheck
+pnpm test:run
+pnpm build
+```
+
+If the release will be run through GitHub Actions, the workflow can rerun this gate. Still report whether the local tree currently passes.
+
+The GitHub Actions release workflow installs with `pnpm install --frozen-lockfile`. Treat that as a release invariant, not a nuisance: if manifests changed and the lockfile refresh PR has not landed yet, stop and wait for `master` to contain the committed lockfile before shipping.
+
+## Step 4 — Publish a Canary
+
+Run from the `release/X.Y.Z` branch:
+
+```bash
+./scripts/release.sh {patch|minor|major} --canary --dry-run
+./scripts/release.sh {patch|minor|major} --canary
+```
+
+What this means:
+
+- npm receives `X.Y.Z-canary.N` under dist-tag `canary`
+- `latest` remains unchanged
+- no git tag is created
+- the script cleans the working tree afterward
+
+Guard:
+
+- if the current stable is `0.2.7`, the next patch canary is `0.2.8-canary.0`
+- the tooling must never publish `0.2.7-canary.N` after `0.2.7` is already stable
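
A minimal sketch, not part of the skill file, of the guard above: the canary version is derived from the next stable version, never from the current one. The function names are hypothetical, and the sketch assumes the caller supplies the canary ordinal; how `scripts/release.sh` actually picks the ordinal is not shown in this diff.

```typescript
type ReleaseBump = "patch" | "minor" | "major";

// Applies a bump to a plain X.Y.Z stable version; rejects prerelease input.
function bumpStable(stable: string, bump: ReleaseBump): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(stable);
  if (!match) throw new Error(`not a stable X.Y.Z version: ${stable}`);
  const major = Number(match[1]);
  const minor = Number(match[2]);
  const patch = Number(match[3]);
  if (bump === "major") return `${major + 1}.0.0`;
  if (bump === "minor") return `${major}.${minor + 1}.0`;
  return `${major}.${minor}.${patch + 1}`;
}

// Canary for the next stable version; the ordinal grows with each retry
// while the stable target stays the same.
function nextCanary(currentStable: string, bump: ReleaseBump, ordinal: number): string {
  if (!Number.isInteger(ordinal) || ordinal < 0) {
    throw new Error(`canary ordinal must be a non-negative integer: ${ordinal}`);
  }
  return `${bumpStable(currentStable, bump)}-canary.${ordinal}`;
}

console.log(nextCanary("0.2.7", "patch", 0)); // prints "0.2.8-canary.0"
```

Because the base is always bumped first, a result such as `0.2.7-canary.N` cannot be produced once `0.2.7` is the current stable version.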
+
+After publish, verify:
+
+```bash
+npm view paperclipai@canary version
+```
+
+The user install path is:
+
+```bash
+npx paperclipai@canary onboard
+```
+
+## Step 5 — Smoke Test the Canary
+
+Run:
+
+```bash
+PAPERCLIPAI_VERSION=canary ./scripts/docker-onboard-smoke.sh
+```
+
+Confirm:
+
+1. install succeeds
+2. onboarding completes
+3. server boots
+4. UI loads
+5. basic company/dashboard flow works
+
+If smoke testing fails:
+
+- stop the stable release
+- fix the issue
+- publish another canary
+- repeat the smoke test
+
+Each retry should create a higher canary ordinal, while the stable target version can stay the same.
+
+## Step 6 — Publish Stable
+
+Once the SHA is vetted, run:
+
+```bash
+./scripts/release.sh {patch|minor|major} --dry-run
+./scripts/release.sh {patch|minor|major}
+```
+
+Stable publish does this:
+
+- publishes `X.Y.Z` to npm under `latest`
+- creates the local release commit
+- creates the local git tag `vX.Y.Z`
+
+Stable publish does **not** push the release for you.
+
+## Step 7 — Push and Create GitHub Release
+
+After stable publish succeeds:
+
+```bash
+git push public-gh HEAD --follow-tags
+./scripts/create-github-release.sh X.Y.Z
+```
+
+Use the stable changelog file as the GitHub Release notes source.
+
+Then open the PR from `release/X.Y.Z` back to `master` and merge without squash or rebase.
+
+## Step 8 — Finish the Other Surfaces
+
+Create or verify follow-up work for:
+
+- website changelog publishing
+- launch post / social announcement
+- any release summary in Paperclip issue context
+
+These should reference the stable release, not the canary.
+
+## Failure Handling
+
+If the canary is bad:
+
+- publish another canary; do not ship stable
+
+If stable npm publish succeeds but push or GitHub release creation fails:
+
+- fix the git/GitHub issue immediately from the same checkout
+- do not republish the same version
+
+If `latest` is bad after stable publish:
+
+```bash
+./scripts/rollback-latest.sh <last-good-version>
+```
+
+Then fix forward with a new patch release.
+
+## Output
+
+When the skill completes, provide:
+
+- stable version and, if relevant, the final canary version tested
+- verification status
+- npm status
+- git tag / GitHub Release status
+- website / announcement follow-up status
+- rollback recommendation if anything is still partially complete
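Note on the canary guard in Step 4 of the release skill above: the rule "stable `0.2.7` plus a patch bump yields `0.2.8-canary.0`, never `0.2.7-canary.N`" can be sketched as a small version calculation. This is only an illustration under stated assumptions; `nextStableVersion` and `nextCanaryVersion` are hypothetical names, and the actual logic in `scripts/release.sh` is not shown in this diff and may differ.

```typescript
type Bump = "patch" | "minor" | "major";

// Compute the stable target for a bump, e.g. ("0.2.7", "patch") -> "0.2.8".
function nextStableVersion(currentStable: string, bump: Bump): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(currentStable);
  if (!match) throw new Error(`not a stable semver: ${currentStable}`);
  const major = Number(match[1]);
  const minor = Number(match[2]);
  const patch = Number(match[3]);
  if (bump === "major") return `${major + 1}.0.0`;
  if (bump === "minor") return `${major}.${minor + 1}.0`;
  return `${major}.${minor}.${patch + 1}`;
}

// Pick the next canary for the bumped target, given the canary versions
// already published (hypothetical input, e.g. read from the npm registry).
function nextCanaryVersion(
  currentStable: string,
  bump: Bump,
  published: string[],
): string {
  // Canaries always hang off the *next* stable version, never the current one.
  const prefix = `${nextStableVersion(currentStable, bump)}-canary.`;
  const ordinals = published
    .filter((v) => v.startsWith(prefix))
    .map((v) => v.slice(prefix.length))
    .filter((suffix) => /^\d+$/.test(suffix))
    .map(Number);
  // Each retry gets a higher ordinal while the stable target stays the same.
  const next = ordinals.length === 0 ? 0 : Math.max(...ordinals) + 1;
  return `${prefix}${next}`;
}
```

Because the prefix is derived from the bumped version, a canary for an already-stable version cannot be produced by this function, which is the property the guard asks for.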
--- a/.github/workflows/e2e.yml
+++ b/.github/workflows/e2e.yml
@@ -0,0 +1,44 @@
+name: E2E Tests
+
+on:
+  workflow_dispatch:
+    inputs:
+      skip_llm:
+        description: "Skip LLM-dependent assertions (default: true)"
+        type: boolean
+        default: true
+
+jobs:
+  e2e:
+    runs-on: ubuntu-latest
+    timeout-minutes: 30
+    env:
+      PAPERCLIP_E2E_SKIP_LLM: ${{ inputs.skip_llm && 'true' || 'false' }}
+      ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+    steps:
+      - uses: actions/checkout@v4
+
+      - uses: pnpm/action-setup@v4
+        with:
+          version: 9
+
+      - uses: actions/setup-node@v4
+        with:
+          node-version: 20
+          cache: pnpm
+
+      - run: pnpm install --frozen-lockfile
+      - run: pnpm build
+      - run: npx playwright install --with-deps chromium
+
+      - name: Run e2e tests
+        run: pnpm run test:e2e
+
+      - uses: actions/upload-artifact@v4
+        if: always()
+        with:
+          name: playwright-report
+          path: |
+            tests/e2e/playwright-report/
+            tests/e2e/test-results/
+          retention-days: 14
--- a/.github/workflows/pr-policy.yml
+++ b/.github/workflows/pr-policy.yml
@@ -0,0 +1,49 @@
+name: PR Policy
+
+on:
+  pull_request:
+    branches:
+      - master
+
+concurrency:
+  group: pr-policy-${{ github.event.pull_request.number }}
+  cancel-in-progress: true
+
+jobs:
+  policy:
+    runs-on: ubuntu-latest
+    timeout-minutes: 10
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+
+      - name: Setup pnpm
+        uses: pnpm/action-setup@v4
+        with:
+          version: 9.15.4
+          run_install: false
+
+      - name: Setup Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: 20
+
+      - name: Block manual lockfile edits
+        if: github.head_ref != 'chore/refresh-lockfile'
+        run: |
+          changed="$(git diff --name-only "${{ github.event.pull_request.base.sha }}" "${{ github.event.pull_request.head.sha }}")"
+          if printf '%s\n' "$changed" | grep -qx 'pnpm-lock.yaml'; then
+            echo "Do not commit pnpm-lock.yaml in pull requests. CI owns lockfile updates."
+            exit 1
+          fi
+
+      - name: Validate dependency resolution when manifests change
+        run: |
+          changed="$(git diff --name-only "${{ github.event.pull_request.base.sha }}" "${{ github.event.pull_request.head.sha }}")"
+          manifest_pattern='(^|/)package\.json$|^pnpm-workspace\.yaml$|^\.npmrc$|^\.?pnpmfile\.(cjs|js|mjs)$'
+          if printf '%s\n' "$changed" | grep -Eq "$manifest_pattern"; then
+            pnpm install --lockfile-only --ignore-scripts --no-frozen-lockfile
+          fi
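Note on the two policy steps in `pr-policy.yml` above: the decision they encode can be sketched as a pure function over the list of changed paths. This is a hedged illustration, not part of the workflow; `evaluatePolicy` and `PolicyResult` are hypothetical names. The optional leading dot before `pnpmfile` is an assumption of this sketch, added because pnpm's hook file is conventionally named `.pnpmfile.cjs`.

```typescript
// Paths whose change should trigger a dependency-resolution check.
// Modeled on the workflow's grep -E alternation; the `\.?` is this sketch's addition.
const MANIFEST_PATTERN =
  /(^|\/)package\.json$|^pnpm-workspace\.yaml$|^\.npmrc$|^\.?pnpmfile\.(cjs|js|mjs)$/;

// The only branch allowed to touch the lockfile.
const LOCKFILE_BRANCH = "chore/refresh-lockfile";

interface PolicyResult {
  blocked: boolean; // PR edits pnpm-lock.yaml outside the refresh branch
  needsResolution: boolean; // a manifest changed, so resolution must be validated
}

function evaluatePolicy(changedFiles: string[], headRef: string): PolicyResult {
  // Whole-path match on the root lockfile only, like `grep -x`.
  const lockfileEdited = changedFiles.includes("pnpm-lock.yaml");
  return {
    blocked: headRef !== LOCKFILE_BRANCH && lockfileEdited,
    needsResolution: changedFiles.some((file) => MANIFEST_PATTERN.test(file)),
  };
}
```

For example, `evaluatePolicy(["packages/db/package.json"], "feature/x")` reports `needsResolution: true` and `blocked: false`, which corresponds to the workflow running `pnpm install --lockfile-only` without failing the lockfile guard.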
--- a/.github/workflows/pr-verify.yml
+++ b/.github/workflows/pr-verify.yml
@@ -0,0 +1,42 @@
+name: PR Verify
+
+on:
+  pull_request:
+    branches:
+      - master
+
+concurrency:
+  group: pr-verify-${{ github.event.pull_request.number }}
+  cancel-in-progress: true
+
+jobs:
+  verify:
+    runs-on: ubuntu-latest
+    timeout-minutes: 20
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Setup pnpm
+        uses: pnpm/action-setup@v4
+        with:
+          version: 9.15.4
+
+      - name: Setup Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: 20
+          cache: pnpm
+
+      - name: Install dependencies
+        run: pnpm install --no-frozen-lockfile
+
+      - name: Typecheck
+        run: pnpm -r typecheck
+
+      - name: Run tests
+        run: pnpm test:run
+
+      - name: Build
+        run: pnpm build
--- a/.github/workflows/refresh-lockfile.yml
+++ b/.github/workflows/refresh-lockfile.yml
@@ -0,0 +1,81 @@
+name: Refresh Lockfile
+
+on:
+  push:
+    branches:
+      - master
+  workflow_dispatch:
+
+concurrency:
+  group: refresh-lockfile-master
+  cancel-in-progress: false
+
+jobs:
+  refresh:
+    runs-on: ubuntu-latest
+    timeout-minutes: 10
+    permissions:
+      contents: write
+      pull-requests: write
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Setup pnpm
+        uses: pnpm/action-setup@v4
+        with:
+          version: 9.15.4
+          run_install: false
+
+      - name: Setup Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: 20
+          cache: pnpm
+
+      - name: Refresh pnpm lockfile
+        run: pnpm install --lockfile-only --ignore-scripts --no-frozen-lockfile
+
+      - name: Fail on unexpected file changes
+        run: |
+          changed="$(git status --porcelain)"
+          if [ -z "$changed" ]; then
+            echo "Lockfile is already up to date."
+            exit 0
+          fi
+          if printf '%s\n' "$changed" | grep -Fvq ' pnpm-lock.yaml'; then
+            echo "Unexpected files changed during lockfile refresh:"
+            echo "$changed"
+            exit 1
+          fi
+
+      - name: Create or update pull request
+        env:
+          GH_TOKEN: ${{ github.token }}
+        run: |
+          if git diff --quiet -- pnpm-lock.yaml; then
+            echo "Lockfile unchanged, nothing to do."
+            exit 0
+          fi
+
+          BRANCH="chore/refresh-lockfile"
+          git config user.name "lockfile-bot"
+          git config user.email "lockfile-bot@users.noreply.github.com"
+
+          git checkout -B "$BRANCH"
+          git add pnpm-lock.yaml
+          git commit -m "chore(lockfile): refresh pnpm-lock.yaml"
+          git push --force origin "$BRANCH"
+
+          # Create PR if one doesn't already exist
+          existing=$(gh pr list --head "$BRANCH" --json number --jq '.[0].number')
+          if [ -z "$existing" ]; then
+            gh pr create \
+              --head "$BRANCH" \
+              --title "chore(lockfile): refresh pnpm-lock.yaml" \
+              --body "Auto-generated lockfile refresh after dependencies changed on master. This PR only updates pnpm-lock.yaml."
+            echo "Created new PR."
+          else
+            echo "PR #$existing already exists, branch updated via force push."
+          fi
--- a/.github/workflows/release.yml
+++ b/.github/workflows/release.yml
@@ -0,0 +1,132 @@
+name: Release
+
+on:
+  workflow_dispatch:
+    inputs:
+      channel:
+        description: Release channel
+        required: true
+        type: choice
+        default: canary
+        options:
+          - canary
+          - stable
+      bump:
+        description: Semantic version bump
+        required: true
+        type: choice
+        default: patch
+        options:
+          - patch
+          - minor
+          - major
+      dry_run:
+        description: Preview the release without publishing
+        required: true
+        type: boolean
+        default: true
+
+concurrency:
+  group: release-${{ github.ref }}
+  cancel-in-progress: false
+
+jobs:
+  verify:
+    if: startsWith(github.ref, 'refs/heads/release/')
+    runs-on: ubuntu-latest
+    timeout-minutes: 30
+    permissions:
+      contents: read
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+
+      - name: Setup pnpm
+        uses: pnpm/action-setup@v4
+        with:
+          version: 9.15.4
+
+      - name: Setup Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: 24
+          cache: pnpm
+
+      - name: Install dependencies
+        run: pnpm install --frozen-lockfile
+
+      - name: Typecheck
+        run: pnpm -r typecheck
+
+      - name: Run tests
+        run: pnpm test:run
+
+      - name: Build
+        run: pnpm build
+
+  publish:
+    if: startsWith(github.ref, 'refs/heads/release/')
+    needs: verify
+    runs-on: ubuntu-latest
+    timeout-minutes: 45
+    environment: npm-release
+    permissions:
+      contents: write
+      id-token: write
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+
+      - name: Setup pnpm
+        uses: pnpm/action-setup@v4
+        with:
+          version: 9.15.4
+
+      - name: Setup Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: 24
+          cache: pnpm
+
+      - name: Install dependencies
+        run: pnpm install --frozen-lockfile
+
+      - name: Configure git author
+        run: |
+          git config user.name "github-actions[bot]"
+          git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
+
+      - name: Run release script
+        env:
+          GITHUB_ACTIONS: "true"
+        run: |
+          args=("${{ inputs.bump }}")
+          if [ "${{ inputs.channel }}" = "canary" ]; then
+            args+=("--canary")
+          fi
+          if [ "${{ inputs.dry_run }}" = "true" ]; then
+            args+=("--dry-run")
+          fi
+          ./scripts/release.sh "${args[@]}"
+
+      - name: Push stable release branch commit and tag
+        if: inputs.channel == 'stable' && !inputs.dry_run
+        run: git push origin "HEAD:${GITHUB_REF_NAME}" --follow-tags
+
+      - name: Create GitHub Release
+        if: inputs.channel == 'stable' && !inputs.dry_run
+        env:
+          GH_TOKEN: ${{ github.token }}
+        run: |
+          version="$(git tag --points-at HEAD | grep '^v' | head -1 | sed 's/^v//')"
+          if [ -z "$version" ]; then
+            echo "Error: no v* tag points at HEAD after stable release." >&2
+            exit 1
+          fi
+          ./scripts/create-github-release.sh "$version"
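Note on the `publish` job in `release.yml` above: the mapping from workflow inputs to `scripts/release.sh` arguments, and the condition that gates the push and GitHub Release steps, can be sketched as follows. The names `buildReleaseArgs`, `shouldPushAndRelease`, and `versionFromTags` are hypothetical and exist only for illustration; the workflow itself does this in inline bash.

```typescript
type Channel = "canary" | "stable";
type Bump = "patch" | "minor" | "major";

interface ReleaseInputs {
  channel: Channel;
  bump: Bump;
  dryRun: boolean;
}

// Mirrors the bash array built in the "Run release script" step.
function buildReleaseArgs(inputs: ReleaseInputs): string[] {
  const args: string[] = [inputs.bump];
  if (inputs.channel === "canary") args.push("--canary");
  if (inputs.dryRun) args.push("--dry-run");
  return args;
}

// Push and GitHub Release happen only for a real (non-dry-run) stable publish.
function shouldPushAndRelease(inputs: ReleaseInputs): boolean {
  return inputs.channel === "stable" && !inputs.dryRun;
}

// Equivalent of: git tag --points-at HEAD | grep '^v' | head -1 | sed 's/^v//'
function versionFromTags(tagsAtHead: string[]): string | null {
  const tag = tagsAtHead.find((t) => t.startsWith("v"));
  return tag === undefined ? null : tag.slice(1);
}
```

With the workflow defaults (`canary`, `patch`, `dry_run: true`), `buildReleaseArgs` yields `["patch", "--canary", "--dry-run"]`, so a dispatch with untouched inputs previews a canary and publishes nothing.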
--- a/.gitignore
+++ b/.gitignore
@@ -36,4 +36,8 @@ tmp/
 *.tmp
 .vscode/
 .claude/settings.local.json
-.paperclip-local/
+.paperclip-local/
+
+# Playwright
+tests/e2e/test-results/
+tests/e2e/playwright-report/
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -78,6 +78,9 @@ If you change schema/API behavior, update all impacted layers:
 4. Do not replace strategic docs wholesale unless asked.
 Prefer additive updates. Keep `doc/SPEC.md` and `doc/SPEC-implementation.md` aligned.

+5. Keep plan docs dated and centralized.
+New plan documents belong in `doc/plans/` and should use `YYYY-MM-DD-slug.md` filenames.
+
 ## 6. Database Change Workflow

 When changing data model:
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -0,0 +1,41 @@
+# Contributing Guide
+
+Thanks for wanting to contribute!
+
+We really appreciate both small fixes and thoughtful larger changes.
+
+## Two Paths to Get Your Pull Request Accepted
+
+### Path 1: Small, Focused Changes (Fastest way to get merged)
+- Pick **one** clear thing to fix/improve
+- Touch the **smallest possible number of files**
+- Make sure the change is very targeted and easy to review
+- Make sure all automated checks pass (including Greptile comments)
+- Introduce no new lint/test failures
+
+These almost always get merged quickly when they're clean.
+
+### Path 2: Bigger or Impactful Changes
+- **First** talk about it in Discord → #dev channel  
+  → Describe what you're trying to solve  
+  → Share rough ideas / approach
+- Once there's rough agreement, build it
+- In your PR include:
+  - Before / After screenshots (or short video if UI/behavior change)
+  - Clear description of what & why
+  - Proof it works (manual testing notes)
+  - All tests passing
+  - All Greptile + other PR comments addressed
+
+PRs that follow this path are **much** more likely to be accepted, even when they're large.
+
+## General Rules (both paths)
+- Write clear commit messages
+- Keep PR title + description meaningful
+- One PR = one logical change (unless it's a small related group)
+- Run tests locally first
+- Be kind in discussions 😄
+
+Questions? Just ask in #dev — we're happy to help.
+
+Happy hacking!
--- a/Dockerfile
+++ b/Dockerfile
@@ -1,4 +1,4 @@
-FROM node:20-bookworm-slim AS base
+FROM node:lts-trixie-slim AS base
 RUN apt-get update \
  && apt-get install -y --no-install-recommends ca-certificates curl git \
  && rm -rf /var/lib/apt/lists/*
@@ -15,19 +15,28 @@ COPY packages/db/package.json packages/db/
 COPY packages/adapter-utils/package.json packages/adapter-utils/
 COPY packages/adapters/claude-local/package.json packages/adapters/claude-local/
 COPY packages/adapters/codex-local/package.json packages/adapters/codex-local/
+COPY packages/adapters/cursor-local/package.json packages/adapters/cursor-local/
+COPY packages/adapters/gemini-local/package.json packages/adapters/gemini-local/
+COPY packages/adapters/openclaw-gateway/package.json packages/adapters/openclaw-gateway/
+COPY packages/adapters/opencode-local/package.json packages/adapters/opencode-local/
+COPY packages/adapters/pi-local/package.json packages/adapters/pi-local/
+
 RUN pnpm install --frozen-lockfile

 FROM base AS build
 WORKDIR /app
 COPY --from=deps /app /app
 COPY . .
-RUN pnpm --filter @paperclip/ui build
-RUN pnpm --filter @paperclip/server build
+RUN pnpm --filter @paperclipai/ui build
+RUN pnpm --filter @paperclipai/server build
+RUN test -f server/dist/index.js || (echo "ERROR: server build output missing" && exit 1)

 FROM base AS production
 WORKDIR /app
-COPY --from=build /app /app
-RUN npm install --global --omit=dev @anthropic-ai/claude-code@latest @openai/codex@latest
+COPY --chown=node:node --from=build /app /app
+RUN npm install --global --omit=dev @anthropic-ai/claude-code@latest @openai/codex@latest opencode-ai \
+  && mkdir -p /paperclip \
+  && chown node:node /paperclip

 ENV NODE_ENV=production \
  HOME=/paperclip \
@@ -37,10 +46,11 @@ ENV NODE_ENV=production \
  PAPERCLIP_HOME=/paperclip \
  PAPERCLIP_INSTANCE_ID=default \
  PAPERCLIP_CONFIG=/paperclip/instances/default/config.json \
-  PAPERCLIP_DEPLOYMENT_MODE=local_trusted \
+  PAPERCLIP_DEPLOYMENT_MODE=authenticated \
  PAPERCLIP_DEPLOYMENT_EXPOSURE=private

 VOLUME ["/paperclip"]
 EXPOSE 3100

+USER node
 CMD ["node", "--import", "./server/node_modules/tsx/dist/loader.mjs", "server/dist/index.js"]
--- a/Dockerfile.onboard-smoke
+++ b/Dockerfile.onboard-smoke
@@ -0,0 +1,40 @@
+FROM ubuntu:24.04
+
+ARG NODE_MAJOR=20
+ARG PAPERCLIPAI_VERSION=latest
+ARG HOST_UID=10001
+
+ENV DEBIAN_FRONTEND=noninteractive \
+  PAPERCLIP_HOME=/paperclip \
+  PAPERCLIP_OPEN_ON_LISTEN=false \
+  HOST=0.0.0.0 \
+  PORT=3100 \
+  HOME=/home/paperclip \
+  LANG=en_US.UTF-8 \
+  LC_ALL=en_US.UTF-8 \
+  NPM_CONFIG_UPDATE_NOTIFIER=false \
+  NODE_MAJOR=${NODE_MAJOR} \
+  PAPERCLIPAI_VERSION=${PAPERCLIPAI_VERSION}
+
+RUN apt-get update \
+  && apt-get install -y --no-install-recommends ca-certificates curl gnupg locales \
+  && mkdir -p /etc/apt/keyrings \
+  && curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key \
+    | gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg \
+  && echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_${NODE_MAJOR}.x nodistro main" \
+    > /etc/apt/sources.list.d/nodesource.list \
+  && apt-get update \
+  && apt-get install -y --no-install-recommends nodejs \
+  && locale-gen en_US.UTF-8 \
+  && groupadd --gid 10001 paperclip \
+  && useradd --create-home --shell /bin/bash --uid "${HOST_UID}" --gid 10001 paperclip \
+  && mkdir -p /paperclip /home/paperclip/workspace \
+  && chown -R paperclip:paperclip /paperclip /home/paperclip \
+  && rm -rf /var/lib/apt/lists/*
+
+VOLUME ["/paperclip"]
+WORKDIR /home/paperclip/workspace
+EXPOSE 3100
+USER paperclip
+
+CMD ["bash", "-lc", "set -euo pipefail; mkdir -p \"$PAPERCLIP_HOME\"; npx --yes \"paperclipai@${PAPERCLIPAI_VERSION}\" onboard --yes --data-dir \"$PAPERCLIP_HOME\""]
--- a/LICENSE
+++ b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2025 Paperclip AI
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
--- a/README.md
+++ b/README.md
@@ -4,15 +4,15 @@

 <p align="center">
  <a href="#quickstart"><strong>Quickstart</strong></a> &middot;
-  <a href="https://paperclip.dev/docs"><strong>Docs</strong></a> &middot;
+  <a href="https://paperclip.ing/docs"><strong>Docs</strong></a> &middot;
  <a href="https://github.com/paperclipai/paperclip"><strong>GitHub</strong></a> &middot;
-  <a href="https://discord.gg/paperclip"><strong>Discord</strong></a>
+  <a href="https://discord.gg/m4HZY7xNG3"><strong>Discord</strong></a>
 </p>

 <p align="center">
  <a href="https://github.com/paperclipai/paperclip/blob/master/LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue" alt="MIT License" /></a>
  <a href="https://github.com/paperclipai/paperclip/stargazers"><img src="https://img.shields.io/github/stars/paperclipai/paperclip?style=flat" alt="Stars" /></a>
-  <a href="https://discord.gg/paperclip"><img src="https://img.shields.io/discord/000000000?label=discord" alt="Discord" /></a>
+  <a href="https://discord.gg/m4HZY7xNG3"><img src="https://img.shields.io/discord/000000000?label=discord" alt="Discord" /></a>
 </p>

 <br/>
@@ -174,7 +174,7 @@ Paperclip handles the hard orchestration details correctly.
 Open source. Self-hosted. No Paperclip account required.

 ```bash
-npx paperclipai onboard
+npx paperclipai onboard --yes
 ```

 Or manually:
@@ -218,7 +218,8 @@ By default, agents run on scheduled heartbeats and event-based triggers (task as
 ## Development

 ```bash
-pnpm dev              # Full dev (API + UI)
+pnpm dev              # Full dev (API + UI, watch mode)
+pnpm dev:once         # Full dev without file watching
 pnpm dev:server       # Server only
 pnpm build            # Build all
 pnpm typecheck        # Type checking
@@ -233,9 +234,13 @@ See [doc/DEVELOPING.md](doc/DEVELOPING.md) for the full development guide.

 ## Roadmap

- 🛒 **Clipmart** — Download and share entire company architectures
- 🧩 **Plugin System** — Embed custom plugins (e.g. Reporting, Knowledge Base) into Paperclip
- ☁️ **Cloud Agent Adapters** — Add more adapters for cloud-hosted agents
+- ⚪ Make OpenClaw onboarding easier
+- ⚪ Get cloud agents working, e.g. Cursor / e2b agents
+- ⚪ ClipMart - buy and sell entire agent companies
+- ⚪ Agent configurations that are easier to set up and understand
+- ⚪ Better support for harness engineering
+- ⚪ Plugin system (e.g. if you want to add a knowledge base, custom tracing, queues, etc.)
+- ⚪ Better docs

 <br/>

@@ -243,13 +248,11 @@ See [doc/DEVELOPING.md](doc/DEVELOPING.md) for the full development guide.

 We welcome contributions. See the [contributing guide](CONTRIBUTING.md) for details.

-<!-- TODO: add CONTRIBUTING.md -->
-
 <br/>

 ## Community

- [Discord](#) — Coming soon
+- [Discord](https://discord.gg/m4HZY7xNG3) — Join the community
 - [GitHub Issues](https://github.com/paperclipai/paperclip/issues) — bugs and feature requests
 - [GitHub Discussions](https://github.com/paperclipai/paperclip/discussions) — ideas and RFC

@@ -259,6 +262,10 @@ We welcome contributions. See the [contributing guide](CONTRIBUTING.md) for deta

 MIT &copy; 2026 Paperclip

+## Star History
+
+[![Star History Chart](https://api.star-history.com/image?repos=paperclipai/paperclip&type=date&legend=top-left)](https://www.star-history.com/?repos=paperclipai%2Fpaperclip&type=date&legend=top-left)
+
 <br/>

 ---
--- a/cli/CHANGELOG.md
+++ b/cli/CHANGELOG.md
@@ -1,5 +1,128 @@
 # paperclipai

+## 0.3.1
+
+### Patch Changes
+
+- Stable release preparation for 0.3.1
+- Updated dependencies
+  - @paperclipai/adapter-utils@0.3.1
+  - @paperclipai/adapter-claude-local@0.3.1
+  - @paperclipai/adapter-codex-local@0.3.1
+  - @paperclipai/adapter-cursor-local@0.3.1
+  - @paperclipai/adapter-gemini-local@0.3.1
+  - @paperclipai/adapter-openclaw-gateway@0.3.1
+  - @paperclipai/adapter-opencode-local@0.3.1
+  - @paperclipai/adapter-pi-local@0.3.1
+  - @paperclipai/db@0.3.1
+  - @paperclipai/shared@0.3.1
+  - @paperclipai/server@0.3.1
+
+## 0.3.0
+
+### Minor Changes
+
+- Stable release preparation for 0.3.0
+
+### Patch Changes
+
+- Updated dependencies [6077ae6]
+- Updated dependencies
+  - @paperclipai/shared@0.3.0
+  - @paperclipai/adapter-utils@0.3.0
+  - @paperclipai/adapter-claude-local@0.3.0
+  - @paperclipai/adapter-codex-local@0.3.0
+  - @paperclipai/adapter-cursor-local@0.3.0
+  - @paperclipai/adapter-openclaw-gateway@0.3.0
+  - @paperclipai/adapter-opencode-local@0.3.0
+  - @paperclipai/adapter-pi-local@0.3.0
+  - @paperclipai/db@0.3.0
+  - @paperclipai/server@0.3.0
+
+## 0.2.7
+
+### Patch Changes
+
+- Version bump (patch)
+- Updated dependencies
+  - @paperclipai/shared@0.2.7
+  - @paperclipai/adapter-utils@0.2.7
+  - @paperclipai/db@0.2.7
+  - @paperclipai/adapter-claude-local@0.2.7
+  - @paperclipai/adapter-codex-local@0.2.7
+  - @paperclipai/adapter-openclaw@0.2.7
+  - @paperclipai/server@0.2.7
+
+## 0.2.6
+
+### Patch Changes
+
+- Version bump (patch)
+- Updated dependencies
+  - @paperclipai/shared@0.2.6
+  - @paperclipai/adapter-utils@0.2.6
+  - @paperclipai/db@0.2.6
+  - @paperclipai/adapter-claude-local@0.2.6
+  - @paperclipai/adapter-codex-local@0.2.6
+  - @paperclipai/adapter-openclaw@0.2.6
+  - @paperclipai/server@0.2.6
+
+## 0.2.5
+
+### Patch Changes
+
+- Version bump (patch)
+- Updated dependencies
+  - @paperclipai/shared@0.2.5
+  - @paperclipai/adapter-utils@0.2.5
+  - @paperclipai/db@0.2.5
+  - @paperclipai/adapter-claude-local@0.2.5
+  - @paperclipai/adapter-codex-local@0.2.5
+  - @paperclipai/adapter-openclaw@0.2.5
+  - @paperclipai/server@0.2.5
+
+## 0.2.4
+
+### Patch Changes
+
+- Version bump (patch)
+- Updated dependencies
+  - @paperclipai/shared@0.2.4
+  - @paperclipai/adapter-utils@0.2.4
+  - @paperclipai/db@0.2.4
+  - @paperclipai/adapter-claude-local@0.2.4
+  - @paperclipai/adapter-codex-local@0.2.4
+  - @paperclipai/adapter-openclaw@0.2.4
+  - @paperclipai/server@0.2.4
+
+## 0.2.3
+
+### Patch Changes
+
+- Version bump (patch)
+- Updated dependencies
+  - @paperclipai/shared@0.2.3
+  - @paperclipai/adapter-utils@0.2.3
+  - @paperclipai/db@0.2.3
+  - @paperclipai/adapter-claude-local@0.2.3
+  - @paperclipai/adapter-codex-local@0.2.3
+  - @paperclipai/adapter-openclaw@0.2.3
+  - @paperclipai/server@0.2.3
+
+## 0.2.2
+
+### Patch Changes
+
+- Version bump (patch)
+- Updated dependencies
+  - @paperclipai/shared@0.2.2
+  - @paperclipai/adapter-utils@0.2.2
+  - @paperclipai/db@0.2.2
+  - @paperclipai/adapter-claude-local@0.2.2
+  - @paperclipai/adapter-codex-local@0.2.2
+  - @paperclipai/adapter-openclaw@0.2.2
+  - @paperclipai/server@0.2.2
+
 ## 0.2.1

 ### Patch Changes
--- a/cli/esbuild.config.mjs
+++ b/cli/esbuild.config.mjs
@@ -21,7 +21,7 @@ const workspacePaths = [
  "packages/adapter-utils",
  "packages/adapters/claude-local",
  "packages/adapters/codex-local",
-  "packages/adapters/openclaw",
+  "packages/adapters/openclaw-gateway",
 ];

 // Workspace packages that should NOT be bundled — they'll be published
--- a/cli/package.json
+++ b/cli/package.json
@@ -1,6 +1,6 @@
 {
  "name": "paperclipai",
-  "version": "0.2.1",
+  "version": "0.3.1",
  "description": "Paperclip CLI — orchestrate AI agent teams to run a business",
  "type": "module",
  "bin": {
@@ -36,7 +36,11 @@
    "@clack/prompts": "^0.10.0",
    "@paperclipai/adapter-claude-local": "workspace:*",
    "@paperclipai/adapter-codex-local": "workspace:*",
-    "@paperclipai/adapter-openclaw": "workspace:*",
+    "@paperclipai/adapter-cursor-local": "workspace:*",
+    "@paperclipai/adapter-gemini-local": "workspace:*",
+    "@paperclipai/adapter-opencode-local": "workspace:*",
+    "@paperclipai/adapter-pi-local": "workspace:*",
+    "@paperclipai/adapter-openclaw-gateway": "workspace:*",
    "@paperclipai/adapter-utils": "workspace:*",
    "@paperclipai/db": "workspace:*",
    "@paperclipai/server": "workspace:*",
@@ -44,6 +48,7 @@
    "drizzle-orm": "0.38.4",
    "dotenv": "^17.0.1",
    "commander": "^13.1.0",
+    "embedded-postgres": "^18.1.0-beta.16",
    "picocolors": "^1.1.1"
  },
  "devDependencies": {
--- a/cli/src/tests/agent-jwt-env.test.ts
+++ b/cli/src/tests/agent-jwt-env.test.ts
@@ -4,7 +4,9 @@ import path from "node:path";
 import { afterEach, beforeEach, describe, expect, it } from "vitest";
 import {
  ensureAgentJwtSecret,
+  mergePaperclipEnvEntries,
  readAgentJwtSecretFromEnv,
+  readPaperclipEnvEntries,
  resolveAgentJwtEnvFile,
 } from "../config/env.js";
 import { agentJwtSecretCheck } from "../checks/agent-jwt-secret-check.js";
@@ -58,4 +60,20 @@ describe("agent jwt env helpers", () => {
    const result = agentJwtSecretCheck(configPath);
    expect(result.status).toBe("pass");
  });
+
+  it("quotes hash-prefixed env values so dotenv round-trips them", () => {
+    const configPath = tempConfigPath();
+    const envPath = resolveAgentJwtEnvFile(configPath);
+
+    mergePaperclipEnvEntries(
+      {
+        PAPERCLIP_WORKTREE_COLOR: "#439edb",
+      },
+      envPath,
+    );
+
+    const contents = fs.readFileSync(envPath, "utf-8");
+    expect(contents).toContain('PAPERCLIP_WORKTREE_COLOR="#439edb"');
+    expect(readPaperclipEnvEntries(envPath).PAPERCLIP_WORKTREE_COLOR).toBe("#439edb");
+  });
 });
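Note on the new test in `agent-jwt-env.test.ts` above: an unquoted `#` in a dotenv-style file is read as the start of a comment, so a value such as `#439edb` has to be quoted on write to survive a read. The sketch below shows the idea with a deliberately simplified writer and reader. `formatEnvLine` and `parseEnvLine` are hypothetical; they are neither the helpers in `../config/env.js` nor the `dotenv` parser, and they assume values contain no double quotes.

```typescript
// Write one KEY=value line, quoting values that would otherwise be misread.
// Assumption: values that need quoting contain no double quote themselves.
function formatEnvLine(key: string, value: string): string {
  const needsQuotes = /[#\s]/.test(value);
  return needsQuotes ? `${key}="${value}"` : `${key}=${value}`;
}

// Simplified reader: a double-quoted value is taken verbatim, while an
// unquoted value stops at the first '#' (treated as a comment) and is trimmed.
function parseEnvLine(line: string): [string, string] | null {
  const match = /^([\w.-]+)=(?:"([^"]*)"|([^#\r\n]*))/.exec(line);
  if (!match) return null;
  const value = match[2] !== undefined ? match[2] : (match[3] ?? "").trim();
  return [match[1], value];
}
```

Under this simplified reader, `PAPERCLIP_WORKTREE_COLOR=#439edb` parses to an empty string, while the quoted form parses back to `#439edb`, which is the round trip the test asserts.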
--- a/cli/src/tests/allowed-hostname.test.ts
+++ b/cli/src/tests/allowed-hostname.test.ts
@@ -21,6 +21,12 @@ function writeBaseConfig(configPath: string) {
      mode: "embedded-postgres",
      embeddedPostgresDataDir: "/tmp/paperclip-db",
      embeddedPostgresPort: 54329,
+      backup: {
+        enabled: true,
+        intervalMinutes: 60,
+        retentionDays: 30,
+        dir: "/tmp/paperclip-backups",
+      },
    },
    logging: {
      mode: "file",
@@ -36,6 +42,7 @@ function writeBaseConfig(configPath: string) {
    },
    auth: {
      baseUrlMode: "auto",
+      disableSignUp: false,
    },
    storage: {
      provider: "local_disk",
@@ -68,4 +75,3 @@ describe("allowed-hostname command", () => {
    expect(raw.server.allowedHostnames).toEqual(["dotta-macbook-pro"]);
  });
 });
-
--- a/cli/src/tests/doctor.test.ts
+++ b/cli/src/tests/doctor.test.ts
@@ -0,0 +1,99 @@
+import fs from "node:fs";
+import os from "node:os";
+import path from "node:path";
+import { afterEach, beforeEach, describe, expect, it } from "vitest";
+import { doctor } from "../commands/doctor.js";
+import { writeConfig } from "../config/store.js";
+import type { PaperclipConfig } from "../config/schema.js";
+
+const ORIGINAL_ENV = { ...process.env };
+
+function createTempConfig(): string {
+  const root = fs.mkdtempSync(path.join(os.tmpdir(), "paperclip-doctor-"));
+  const configPath = path.join(root, ".paperclip", "config.json");
+  const runtimeRoot = path.join(root, "runtime");
+
+  const config: PaperclipConfig = {
+    $meta: {
+      version: 1,
+      updatedAt: "2026-03-10T00:00:00.000Z",
+      source: "configure",
+    },
+    database: {
+      mode: "embedded-postgres",
+      embeddedPostgresDataDir: path.join(runtimeRoot, "db"),
+      embeddedPostgresPort: 55432,
+      backup: {
+        enabled: true,
+        intervalMinutes: 60,
+        retentionDays: 30,
+        dir: path.join(runtimeRoot, "backups"),
+      },
+    },
+    logging: {
+      mode: "file",
+      logDir: path.join(runtimeRoot, "logs"),
+    },
+    server: {
+      deploymentMode: "local_trusted",
+      exposure: "private",
+      host: "127.0.0.1",
+      port: 3199,
+      allowedHostnames: [],
+      serveUi: true,
+    },
+    auth: {
+      baseUrlMode: "auto",
+      disableSignUp: false,
+    },
+    storage: {
+      provider: "local_disk",
+      localDisk: {
+        baseDir: path.join(runtimeRoot, "storage"),
+      },
+      s3: {
+        bucket: "paperclip",
+        region: "us-east-1",
+        prefix: "",
+        forcePathStyle: false,
+      },
+    },
+    secrets: {
+      provider: "local_encrypted",
+      strictMode: false,
+      localEncrypted: {
+        keyFilePath: path.join(runtimeRoot, "secrets", "master.key"),
+      },
+    },
+  };
+
+  writeConfig(config, configPath);
+  return configPath;
+}
+
+describe("doctor", () => {
+  beforeEach(() => {
+    process.env = { ...ORIGINAL_ENV };
+    delete process.env.PAPERCLIP_AGENT_JWT_SECRET;
+    delete process.env.PAPERCLIP_SECRETS_MASTER_KEY;
+    delete process.env.PAPERCLIP_SECRETS_MASTER_KEY_FILE;
+  });
+
+  afterEach(() => {
+    process.env = { ...ORIGINAL_ENV };
+  });
+
+  it("re-runs repairable checks so repaired failures do not remain blocking", async () => {
+    const configPath = createTempConfig();
+
+    const summary = await doctor({
+      config: configPath,
+      repair: true,
+      yes: true,
+    });
+
+    expect(summary.failed).toBe(0);
+    expect(summary.warned).toBe(0);
+    expect(process.env.PAPERCLIP_AGENT_JWT_SECRET).toBeTruthy();
+  });
+});
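Note on `doctor.test.ts` above: the test name says repairable checks are re-run so that a repaired failure no longer counts as blocking. One way to get that behavior is sketched below. It is a synchronous simplification under assumed types; `Check`, `CheckResult`, and `runChecks` are hypothetical, and the real `doctor` command is async and is not shown in this diff.

```typescript
type CheckStatus = "pass" | "warn" | "fail";

interface CheckResult {
  name: string;
  status: CheckStatus;
  repair?: () => void; // present only when the problem can be fixed automatically
}

type Check = () => CheckResult;

interface Summary {
  passed: number;
  warned: number;
  failed: number;
}

function runChecks(checks: Check[], options: { repair: boolean }): Summary {
  const summary: Summary = { passed: 0, warned: 0, failed: 0 };
  for (const check of checks) {
    let result = check();
    if (result.status !== "pass" && result.repair && options.repair) {
      result.repair();
      // Re-run the check so the summary reflects the state after the repair,
      // not the stale failure observed before it.
      result = check();
    }
    if (result.status === "pass") summary.passed += 1;
    else if (result.status === "warn") summary.warned += 1;
    else summary.failed += 1;
  }
  return summary;
}
```

Counting the second result rather than the first is what lets a run with `repair: true` finish with `failed === 0`, as the test expects once the missing JWT secret has been generated.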
--- a/cli/src/tests/worktree.test.ts
+++ b/cli/src/tests/worktree.test.ts
@@ -0,0 +1,472 @@
+import fs from "node:fs";
+import os from "node:os";
+import path from "node:path";
+import { execFileSync } from "node:child_process";
+import { afterEach, describe, expect, it, vi } from "vitest";
+import {
+  copyGitHooksToWorktreeGitDir,
+  copySeededSecretsKey,
+  rebindWorkspaceCwd,
+  resolveSourceConfigPath,
+  resolveGitWorktreeAddArgs,
+  resolveWorktreeMakeTargetPath,
+  worktreeInitCommand,
+  worktreeMakeCommand,
+} from "../commands/worktree.js";
+import {
+  buildWorktreeConfig,
+  buildWorktreeEnvEntries,
+  formatShellExports,
+  generateWorktreeColor,
+  resolveWorktreeSeedPlan,
+  resolveWorktreeLocalPaths,
+  rewriteLocalUrlPort,
+  sanitizeWorktreeInstanceId,
+} from "../commands/worktree-lib.js";
+import type { PaperclipConfig } from "../config/schema.js";
+
+const ORIGINAL_CWD = process.cwd();
+const ORIGINAL_ENV = { ...process.env };
+
+afterEach(() => {
+  process.chdir(ORIGINAL_CWD);
+  for (const key of Object.keys(process.env)) {
+    if (!(key in ORIGINAL_ENV)) delete process.env[key];
+  }
+  for (const [key, value] of Object.entries(ORIGINAL_ENV)) {
+    if (value === undefined) delete process.env[key];
+    else process.env[key] = value;
+  }
+});
+
+function buildSourceConfig(): PaperclipConfig {
+  return {
+    $meta: {
+      version: 1,
+      updatedAt: "2026-03-09T00:00:00.000Z",
+      source: "configure",
+    },
+    database: {
+      mode: "embedded-postgres",
+      embeddedPostgresDataDir: "/tmp/main/db",
+      embeddedPostgresPort: 54329,
+      backup: {
+        enabled: true,
+        intervalMinutes: 60,
+        retentionDays: 30,
+        dir: "/tmp/main/backups",
+      },
+    },
+    logging: {
+      mode: "file",
+      logDir: "/tmp/main/logs",
+    },
+    server: {
+      deploymentMode: "authenticated",
+      exposure: "private",
+      host: "127.0.0.1",
+      port: 3100,
+      allowedHostnames: ["localhost"],
+      serveUi: true,
+    },
+    auth: {
+      baseUrlMode: "explicit",
+      publicBaseUrl: "http://127.0.0.1:3100",
+      disableSignUp: false,
+    },
+    storage: {
+      provider: "local_disk",
+      localDisk: {
+        baseDir: "/tmp/main/storage",
+      },
+      s3: {
+        bucket: "paperclip",
+        region: "us-east-1",
+        prefix: "",
+        forcePathStyle: false,
+      },
+    },
+    secrets: {
+      provider: "local_encrypted",
+      strictMode: false,
+      localEncrypted: {
+        keyFilePath: "/tmp/main/secrets/master.key",
+      },
+    },
+  };
+}
+
+describe("worktree helpers", () => {
+  it("sanitizes instance ids", () => {
+    expect(sanitizeWorktreeInstanceId("feature/worktree-support")).toBe("feature-worktree-support");
+    expect(sanitizeWorktreeInstanceId("  ")).toBe("worktree");
+  });
+
+  it("resolves worktree:make target paths under the user home directory", () => {
+    expect(resolveWorktreeMakeTargetPath("paperclip-pr-432")).toBe(
+      path.resolve(os.homedir(), "paperclip-pr-432"),
+    );
+  });
+
+  it("rejects worktree:make names that are not safe directory/branch names", () => {
+    expect(() => resolveWorktreeMakeTargetPath("paperclip/pr-432")).toThrow(
+      "Worktree name must contain only letters, numbers, dots, underscores, or dashes.",
+    );
+  });
+
+  it("builds git worktree add args for new and existing branches", () => {
+    expect(
+      resolveGitWorktreeAddArgs({
+        branchName: "feature-branch",
+        targetPath: "/tmp/feature-branch",
+        branchExists: false,
+      }),
+    ).toEqual(["worktree", "add", "-b", "feature-branch", "/tmp/feature-branch", "HEAD"]);
+
+    expect(
+      resolveGitWorktreeAddArgs({
+        branchName: "feature-branch",
+        targetPath: "/tmp/feature-branch",
+        branchExists: true,
+      }),
+    ).toEqual(["worktree", "add", "/tmp/feature-branch", "feature-branch"]);
+  });
+
+  it("builds git worktree add args with a start point", () => {
+    expect(
+      resolveGitWorktreeAddArgs({
+        branchName: "my-worktree",
+        targetPath: "/tmp/my-worktree",
+        branchExists: false,
+        startPoint: "public-gh/master",
+      }),
+    ).toEqual(["worktree", "add", "-b", "my-worktree", "/tmp/my-worktree", "public-gh/master"]);
+  });
+
+  it("uses start point even when a local branch with the same name exists", () => {
+    expect(
+      resolveGitWorktreeAddArgs({
+        branchName: "my-worktree",
+        targetPath: "/tmp/my-worktree",
+        branchExists: true,
+        startPoint: "origin/main",
+      }),
+    ).toEqual(["worktree", "add", "-b", "my-worktree", "/tmp/my-worktree", "origin/main"]);
+  });
+
+  it("rewrites loopback auth URLs to the new port only", () => {
+    expect(rewriteLocalUrlPort("http://127.0.0.1:3100", 3110)).toBe("http://127.0.0.1:3110/");
+    expect(rewriteLocalUrlPort("https://paperclip.example", 3110)).toBe("https://paperclip.example");
+  });
+
+  it("builds isolated config and env paths for a worktree", () => {
+    const paths = resolveWorktreeLocalPaths({
+      cwd: "/tmp/paperclip-feature",
+      homeDir: "/tmp/paperclip-worktrees",
+      instanceId: "feature-worktree-support",
+    });
+    const config = buildWorktreeConfig({
+      sourceConfig: buildSourceConfig(),
+      paths,
+      serverPort: 3110,
+      databasePort: 54339,
+      now: new Date("2026-03-09T12:00:00.000Z"),
+    });
+
+    expect(config.database.embeddedPostgresDataDir).toBe(
+      path.resolve("/tmp/paperclip-worktrees", "instances", "feature-worktree-support", "db"),
+    );
+    expect(config.database.embeddedPostgresPort).toBe(54339);
+    expect(config.server.port).toBe(3110);
+    expect(config.auth.publicBaseUrl).toBe("http://127.0.0.1:3110/");
+    expect(config.storage.localDisk.baseDir).toBe(
+      path.resolve("/tmp/paperclip-worktrees", "instances", "feature-worktree-support", "data", "storage"),
+    );
+
+    const env = buildWorktreeEnvEntries(paths, {
+      name: "feature-worktree-support",
+      color: "#3abf7a",
+    });
+    expect(env.PAPERCLIP_HOME).toBe(path.resolve("/tmp/paperclip-worktrees"));
+    expect(env.PAPERCLIP_INSTANCE_ID).toBe("feature-worktree-support");
+    expect(env.PAPERCLIP_IN_WORKTREE).toBe("true");
+    expect(env.PAPERCLIP_WORKTREE_NAME).toBe("feature-worktree-support");
+    expect(env.PAPERCLIP_WORKTREE_COLOR).toBe("#3abf7a");
+    expect(formatShellExports(env)).toContain("export PAPERCLIP_INSTANCE_ID='feature-worktree-support'");
+  });
+
+  it("generates vivid worktree colors as hex", () => {
+    expect(generateWorktreeColor()).toMatch(/^#[0-9a-f]{6}$/);
+  });
+
+  it("uses minimal seed mode to keep app state but drop heavy runtime history", () => {
+    const minimal = resolveWorktreeSeedPlan("minimal");
+    const full = resolveWorktreeSeedPlan("full");
+
+    expect(minimal.excludedTables).toContain("heartbeat_runs");
+    expect(minimal.excludedTables).toContain("heartbeat_run_events");
+    expect(minimal.excludedTables).toContain("workspace_runtime_services");
+    expect(minimal.excludedTables).toContain("agent_task_sessions");
+    expect(minimal.nullifyColumns.issues).toEqual(["checkout_run_id", "execution_run_id"]);
+
+    expect(full.excludedTables).toEqual([]);
+    expect(full.nullifyColumns).toEqual({});
+  });
+
+  it("copies the source local_encrypted secrets key into the seeded worktree instance", () => {
+    const tempRoot = fs.mkdtempSync(path.join(os.tmpdir(), "paperclip-worktree-secrets-"));
+    const originalInlineMasterKey = process.env.PAPERCLIP_SECRETS_MASTER_KEY;
+    const originalKeyFile = process.env.PAPERCLIP_SECRETS_MASTER_KEY_FILE;
+    try {
+      delete process.env.PAPERCLIP_SECRETS_MASTER_KEY;
+      delete process.env.PAPERCLIP_SECRETS_MASTER_KEY_FILE;
+      const sourceConfigPath = path.join(tempRoot, "source", "config.json");
+      const sourceKeyPath = path.join(tempRoot, "source", "secrets", "master.key");
+      const targetKeyPath = path.join(tempRoot, "target", "secrets", "master.key");
+      fs.mkdirSync(path.dirname(sourceKeyPath), { recursive: true });
+      fs.writeFileSync(sourceKeyPath, "source-master-key", "utf8");
+
+      const sourceConfig = buildSourceConfig();
+      sourceConfig.secrets.localEncrypted.keyFilePath = sourceKeyPath;
+
+      copySeededSecretsKey({
+        sourceConfigPath,
+        sourceConfig,
+        sourceEnvEntries: {},
+        targetKeyFilePath: targetKeyPath,
+      });
+
+      expect(fs.readFileSync(targetKeyPath, "utf8")).toBe("source-master-key");
+    } finally {
+      if (originalInlineMasterKey === undefined) {
+        delete process.env.PAPERCLIP_SECRETS_MASTER_KEY;
+      } else {
+        process.env.PAPERCLIP_SECRETS_MASTER_KEY = originalInlineMasterKey;
+      }
+      if (originalKeyFile === undefined) {
+        delete process.env.PAPERCLIP_SECRETS_MASTER_KEY_FILE;
+      } else {
+        process.env.PAPERCLIP_SECRETS_MASTER_KEY_FILE = originalKeyFile;
+      }
+      fs.rmSync(tempRoot, { recursive: true, force: true });
+    }
+  });
+
+  it("writes the source inline secrets master key into the seeded worktree instance", () => {
+    const tempRoot = fs.mkdtempSync(path.join(os.tmpdir(), "paperclip-worktree-secrets-"));
+    try {
+      const sourceConfigPath = path.join(tempRoot, "source", "config.json");
+      const targetKeyPath = path.join(tempRoot, "target", "secrets", "master.key");
+
+      copySeededSecretsKey({
+        sourceConfigPath,
+        sourceConfig: buildSourceConfig(),
+        sourceEnvEntries: {
+          PAPERCLIP_SECRETS_MASTER_KEY: "inline-source-master-key",
+        },
+        targetKeyFilePath: targetKeyPath,
+      });
+
+      expect(fs.readFileSync(targetKeyPath, "utf8")).toBe("inline-source-master-key");
+    } finally {
+      fs.rmSync(tempRoot, { recursive: true, force: true });
+    }
+  });
+
+  it("persists the current agent jwt secret into the worktree env file", async () => {
+    const tempRoot = fs.mkdtempSync(path.join(os.tmpdir(), "paperclip-worktree-jwt-"));
+    const repoRoot = path.join(tempRoot, "repo");
+    const originalCwd = process.cwd();
+    const originalJwtSecret = process.env.PAPERCLIP_AGENT_JWT_SECRET;
+
+    try {
+      fs.mkdirSync(repoRoot, { recursive: true });
+      process.env.PAPERCLIP_AGENT_JWT_SECRET = "worktree-shared-secret";
+      process.chdir(repoRoot);
+
+      await worktreeInitCommand({
+        seed: false,
+        fromConfig: path.join(tempRoot, "missing", "config.json"),
+        home: path.join(tempRoot, ".paperclip-worktrees"),
+      });
+
+      const envPath = path.join(repoRoot, ".paperclip", ".env");
+      const envContents = fs.readFileSync(envPath, "utf8");
+      expect(envContents).toContain("PAPERCLIP_AGENT_JWT_SECRET=worktree-shared-secret");
+      expect(envContents).toContain("PAPERCLIP_WORKTREE_NAME=repo");
+      expect(envContents).toMatch(/PAPERCLIP_WORKTREE_COLOR="#[0-9a-f]{6}"/);
+    } finally {
+      process.chdir(originalCwd);
+      if (originalJwtSecret === undefined) {
+        delete process.env.PAPERCLIP_AGENT_JWT_SECRET;
+      } else {
+        process.env.PAPERCLIP_AGENT_JWT_SECRET = originalJwtSecret;
+      }
+      fs.rmSync(tempRoot, { recursive: true, force: true });
+    }
+  });
+
+  it("defaults the seed source config to the current repo-local Paperclip config", () => {
+    const tempRoot = fs.mkdtempSync(path.join(os.tmpdir(), "paperclip-worktree-source-config-"));
+    const repoRoot = path.join(tempRoot, "repo");
+    const localConfigPath = path.join(repoRoot, ".paperclip", "config.json");
+    const originalCwd = process.cwd();
+    const originalPaperclipConfig = process.env.PAPERCLIP_CONFIG;
+
+    try {
+      fs.mkdirSync(path.dirname(localConfigPath), { recursive: true });
+      fs.writeFileSync(localConfigPath, JSON.stringify(buildSourceConfig()), "utf8");
+      delete process.env.PAPERCLIP_CONFIG;
+      process.chdir(repoRoot);
+
+      expect(fs.realpathSync(resolveSourceConfigPath({}))).toBe(fs.realpathSync(localConfigPath));
+    } finally {
+      process.chdir(originalCwd);
+      if (originalPaperclipConfig === undefined) {
+        delete process.env.PAPERCLIP_CONFIG;
+      } else {
+        process.env.PAPERCLIP_CONFIG = originalPaperclipConfig;
+      }
+      fs.rmSync(tempRoot, { recursive: true, force: true });
+    }
+  });
+
+  it("preserves the source config path across worktree:make cwd changes", () => {
+    const tempRoot = fs.mkdtempSync(path.join(os.tmpdir(), "paperclip-worktree-source-override-"));
+    const sourceConfigPath = path.join(tempRoot, "source", "config.json");
+    const targetRoot = path.join(tempRoot, "target");
+    const originalCwd = process.cwd();
+    const originalPaperclipConfig = process.env.PAPERCLIP_CONFIG;
+
+    try {
+      fs.mkdirSync(path.dirname(sourceConfigPath), { recursive: true });
+      fs.mkdirSync(targetRoot, { recursive: true });
+      fs.writeFileSync(sourceConfigPath, JSON.stringify(buildSourceConfig()), "utf8");
+      delete process.env.PAPERCLIP_CONFIG;
+      process.chdir(targetRoot);
+
+      expect(resolveSourceConfigPath({ sourceConfigPathOverride: sourceConfigPath })).toBe(
+        path.resolve(sourceConfigPath),
+      );
+    } finally {
+      process.chdir(originalCwd);
+      if (originalPaperclipConfig === undefined) {
+        delete process.env.PAPERCLIP_CONFIG;
+      } else {
+        process.env.PAPERCLIP_CONFIG = originalPaperclipConfig;
+      }
+      fs.rmSync(tempRoot, { recursive: true, force: true });
+    }
+  });
+
+  it("rebinds same-repo workspace paths onto the current worktree root", () => {
+    expect(
+      rebindWorkspaceCwd({
+        sourceRepoRoot: "/Users/example/paperclip",
+        targetRepoRoot: "/Users/example/paperclip-pr-432",
+        workspaceCwd: "/Users/example/paperclip",
+      }),
+    ).toBe("/Users/example/paperclip-pr-432");
+
+    expect(
+      rebindWorkspaceCwd({
+        sourceRepoRoot: "/Users/example/paperclip",
+        targetRepoRoot: "/Users/example/paperclip-pr-432",
+        workspaceCwd: "/Users/example/paperclip/packages/db",
+      }),
+    ).toBe("/Users/example/paperclip-pr-432/packages/db");
+  });
+
+  it("does not rebind paths outside the source repo root", () => {
+    expect(
+      rebindWorkspaceCwd({
+        sourceRepoRoot: "/Users/example/paperclip",
+        targetRepoRoot: "/Users/example/paperclip-pr-432",
+        workspaceCwd: "/Users/example/other-project",
+      }),
+    ).toBeNull();
+  });
+
+  it("copies shared git hooks into a linked worktree git dir", () => {
+    const tempRoot = fs.mkdtempSync(path.join(os.tmpdir(), "paperclip-worktree-hooks-"));
+    const repoRoot = path.join(tempRoot, "repo");
+    const worktreePath = path.join(tempRoot, "repo-feature");
+
+    try {
+      fs.mkdirSync(repoRoot, { recursive: true });
+      execFileSync("git", ["init"], { cwd: repoRoot, stdio: "ignore" });
+      execFileSync("git", ["config", "user.email", "test@example.com"], { cwd: repoRoot, stdio: "ignore" });
+      execFileSync("git", ["config", "user.name", "Test User"], { cwd: repoRoot, stdio: "ignore" });
+      fs.writeFileSync(path.join(repoRoot, "README.md"), "# temp\n", "utf8");
+      execFileSync("git", ["add", "README.md"], { cwd: repoRoot, stdio: "ignore" });
+      execFileSync("git", ["commit", "-m", "Initial commit"], { cwd: repoRoot, stdio: "ignore" });
+
+      const sourceHooksDir = path.join(repoRoot, ".git", "hooks");
+      const sourceHookPath = path.join(sourceHooksDir, "pre-commit");
+      const sourceTokensPath = path.join(sourceHooksDir, "forbidden-tokens.txt");
+      fs.writeFileSync(sourceHookPath, "#!/usr/bin/env bash\nexit 0\n", { encoding: "utf8", mode: 0o755 });
+      fs.chmodSync(sourceHookPath, 0o755);
+      fs.writeFileSync(sourceTokensPath, "secret-token\n", "utf8");
+
+      execFileSync("git", ["worktree", "add", "--detach", worktreePath], { cwd: repoRoot, stdio: "ignore" });
+
+      const copied = copyGitHooksToWorktreeGitDir(worktreePath);
+      const worktreeGitDir = execFileSync("git", ["rev-parse", "--git-dir"], {
+        cwd: worktreePath,
+        encoding: "utf8",
+        stdio: ["ignore", "pipe", "ignore"],
+      }).trim();
+      const resolvedSourceHooksDir = fs.realpathSync(sourceHooksDir);
+      const resolvedTargetHooksDir = fs.realpathSync(path.resolve(worktreePath, worktreeGitDir, "hooks"));
+      const targetHookPath = path.join(resolvedTargetHooksDir, "pre-commit");
+      const targetTokensPath = path.join(resolvedTargetHooksDir, "forbidden-tokens.txt");
+
+      expect(copied).toMatchObject({
+        sourceHooksPath: resolvedSourceHooksDir,
+        targetHooksPath: resolvedTargetHooksDir,
+        copied: true,
+      });
+      expect(fs.readFileSync(targetHookPath, "utf8")).toBe("#!/usr/bin/env bash\nexit 0\n");
+      expect(fs.statSync(targetHookPath).mode & 0o111).not.toBe(0);
+      expect(fs.readFileSync(targetTokensPath, "utf8")).toBe("secret-token\n");
+    } finally {
+      execFileSync("git", ["worktree", "remove", "--force", worktreePath], { cwd: repoRoot, stdio: "ignore" });
+      fs.rmSync(tempRoot, { recursive: true, force: true });
+    }
+  });
+
+  it("creates and initializes a worktree from the top-level worktree:make command", async () => {
+    const tempRoot = fs.mkdtempSync(path.join(os.tmpdir(), "paperclip-worktree-make-"));
+    const repoRoot = path.join(tempRoot, "repo");
+    const fakeHome = path.join(tempRoot, "home");
+    const worktreePath = path.join(fakeHome, "paperclip-make-test");
+    const originalCwd = process.cwd();
+    const homedirSpy = vi.spyOn(os, "homedir").mockReturnValue(fakeHome);
+
+    try {
+      fs.mkdirSync(repoRoot, { recursive: true });
+      fs.mkdirSync(fakeHome, { recursive: true });
+      execFileSync("git", ["init"], { cwd: repoRoot, stdio: "ignore" });
+      execFileSync("git", ["config", "user.email", "test@example.com"], { cwd: repoRoot, stdio: "ignore" });
+      execFileSync("git", ["config", "user.name", "Test User"], { cwd: repoRoot, stdio: "ignore" });
+      fs.writeFileSync(path.join(repoRoot, "README.md"), "# temp\n", "utf8");
+      execFileSync("git", ["add", "README.md"], { cwd: repoRoot, stdio: "ignore" });
+      execFileSync("git", ["commit", "-m", "Initial commit"], { cwd: repoRoot, stdio: "ignore" });
+
+      process.chdir(repoRoot);
+
+      await worktreeMakeCommand("paperclip-make-test", {
+        seed: false,
+        home: path.join(tempRoot, ".paperclip-worktrees"),
+      });
+
+      expect(fs.existsSync(path.join(worktreePath, ".git"))).toBe(true);
+      expect(fs.existsSync(path.join(worktreePath, ".paperclip", "config.json"))).toBe(true);
+      expect(fs.existsSync(path.join(worktreePath, ".paperclip", ".env"))).toBe(true);
+    } finally {
+      process.chdir(originalCwd);
+      homedirSpy.mockRestore();
+      fs.rmSync(tempRoot, { recursive: true, force: true });
+    }
+  }, 20_000);
+});
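Reviewer note on the worktree helper tests above: two of the asserted behaviours are easy to misread. `rewriteLocalUrlPort` returns a normalized URL (with a trailing slash) for loopback hosts but hands non-loopback input back untouched, and `rebindWorkspaceCwd` returns `null` for any path outside the source repo root. The following is a minimal sketch that satisfies those assertions. It is not the code in `worktree.js` or `worktree-lib.js`; the function bodies, the `Sketch` names, and the loopback host list are assumptions made for illustration.

```typescript
import * as path from "node:path";

// Assumed loopback set; the real helper may recognise other hosts.
const LOOPBACK_HOSTS = new Set(["127.0.0.1", "localhost"]);

// Hypothetical stand-in for rewriteLocalUrlPort.
function rewriteLocalUrlPortSketch(rawUrl: string, port: number): string {
  const url = new URL(rawUrl);
  // Non-loopback URLs are returned verbatim, so no trailing slash is added.
  if (!LOOPBACK_HOSTS.has(url.hostname)) return rawUrl;
  url.port = String(port);
  // URL serialization adds the trailing "/" seen in the test expectation.
  return url.toString();
}

// Hypothetical stand-in for rebindWorkspaceCwd.
function rebindWorkspaceCwdSketch(input: {
  sourceRepoRoot: string;
  targetRepoRoot: string;
  workspaceCwd: string;
}): string | null {
  const relative = path.relative(input.sourceRepoRoot, input.workspaceCwd);
  // A path outside the source root climbs out with ".." (or is absolute on
  // another drive), so it is left alone.
  const escapesRoot =
    relative === ".." || relative.startsWith(`..${path.sep}`) || path.isAbsolute(relative);
  if (escapesRoot) return null;
  return path.resolve(input.targetRepoRoot, relative);
}
```

Comparing with `path.relative` instead of a string prefix matters here because the sibling directory `/Users/example/paperclip-pr-432` shares the prefix `/Users/example/paperclip` but is not inside it.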
--- a/cli/src/adapters/registry.ts
+++ b/cli/src/adapters/registry.ts
@@ -1,7 +1,11 @@
 import type { CLIAdapterModule } from "@paperclipai/adapter-utils";
 import { printClaudeStreamEvent } from "@paperclipai/adapter-claude-local/cli";
 import { printCodexStreamEvent } from "@paperclipai/adapter-codex-local/cli";
-import { printOpenClawStreamEvent } from "@paperclipai/adapter-openclaw/cli";
+import { printCursorStreamEvent } from "@paperclipai/adapter-cursor-local/cli";
+import { printGeminiStreamEvent } from "@paperclipai/adapter-gemini-local/cli";
+import { printOpenCodeStreamEvent } from "@paperclipai/adapter-opencode-local/cli";
+import { printPiStreamEvent } from "@paperclipai/adapter-pi-local/cli";
+import { printOpenClawGatewayStreamEvent } from "@paperclipai/adapter-openclaw-gateway/cli";
 import { processCLIAdapter } from "./process/index.js";
 import { httpCLIAdapter } from "./http/index.js";

@@ -15,13 +19,43 @@ const codexLocalCLIAdapter: CLIAdapterModule = {
  formatStdoutEvent: printCodexStreamEvent,
 };

-const openclawCLIAdapter: CLIAdapterModule = {
-  type: "openclaw",
-  formatStdoutEvent: printOpenClawStreamEvent,
+const openCodeLocalCLIAdapter: CLIAdapterModule = {
+  type: "opencode_local",
+  formatStdoutEvent: printOpenCodeStreamEvent,
+};
+
+const piLocalCLIAdapter: CLIAdapterModule = {
+  type: "pi_local",
+  formatStdoutEvent: printPiStreamEvent,
+};
+
+const cursorLocalCLIAdapter: CLIAdapterModule = {
+  type: "cursor",
+  formatStdoutEvent: printCursorStreamEvent,
+};
+
+const geminiLocalCLIAdapter: CLIAdapterModule = {
+  type: "gemini_local",
+  formatStdoutEvent: printGeminiStreamEvent,
+};
+
+const openclawGatewayCLIAdapter: CLIAdapterModule = {
+  type: "openclaw_gateway",
+  formatStdoutEvent: printOpenClawGatewayStreamEvent,
 };

 const adaptersByType = new Map<string, CLIAdapterModule>(
-  [claudeLocalCLIAdapter, codexLocalCLIAdapter, openclawCLIAdapter, processCLIAdapter, httpCLIAdapter].map((a) => [a.type, a]),
+  [
+    claudeLocalCLIAdapter,
+    codexLocalCLIAdapter,
+    openCodeLocalCLIAdapter,
+    piLocalCLIAdapter,
+    cursorLocalCLIAdapter,
+    geminiLocalCLIAdapter,
+    openclawGatewayCLIAdapter,
+    processCLIAdapter,
+    httpCLIAdapter,
+  ].map((a) => [a.type, a]),
 );

 export function getCLIAdapter(type: string): CLIAdapterModule {
--- a/cli/src/checks/database-check.ts
+++ b/cli/src/checks/database-check.ts
@@ -39,15 +39,7 @@ export async function databaseCheck(config: PaperclipConfig, configPath?: string
    const dataDir = resolveRuntimeLikePath(config.database.embeddedPostgresDataDir, configPath);
    const reportedPath = dataDir;
    if (!fs.existsSync(dataDir)) {
-      return {
-        name: "Database",
-        status: "warn",
-        message: `Embedded PostgreSQL data directory does not exist: ${reportedPath}`,
-        canRepair: true,
-        repair: () => {
-          fs.mkdirSync(reportedPath, { recursive: true });
-        },
-      };
+      fs.mkdirSync(reportedPath, { recursive: true });
    }

    return {
--- a/cli/src/checks/llm-check.ts
+++ b/cli/src/checks/llm-check.ts
@@ -5,20 +5,16 @@ export async function llmCheck(config: PaperclipConfig): Promise<CheckResult> {
  if (!config.llm) {
    return {
      name: "LLM provider",
-      status: "warn",
-      message: "No LLM provider configured",
-      canRepair: false,
-      repairHint: "Run `paperclipai configure --section llm` to set one up",
+      status: "pass",
+      message: "No LLM provider configured (optional)",
    };
  }

  if (!config.llm.apiKey) {
    return {
      name: "LLM provider",
-      status: "warn",
-      message: `${config.llm.provider} configured but no API key set`,
-      canRepair: false,
-      repairHint: "Run `paperclipai configure --section llm`",
+      status: "pass",
+      message: `${config.llm.provider} configured but no API key set (optional)`,
    };
  }

--- a/cli/src/checks/log-check.ts
+++ b/cli/src/checks/log-check.ts
@@ -8,15 +8,7 @@ export function logCheck(config: PaperclipConfig, configPath?: string): CheckRes
  const reportedDir = logDir;

  if (!fs.existsSync(logDir)) {
-    return {
-      name: "Log directory",
-      status: "warn",
-      message: `Log directory does not exist: ${reportedDir}`,
-      canRepair: true,
-      repair: () => {
-        fs.mkdirSync(reportedDir, { recursive: true });
-      },
-    };
+    fs.mkdirSync(reportedDir, { recursive: true });
  }

  try {
--- a/cli/src/checks/storage-check.ts
+++ b/cli/src/checks/storage-check.ts
@@ -7,16 +7,7 @@ export function storageCheck(config: PaperclipConfig, configPath?: string): Chec
  if (config.storage.provider === "local_disk") {
    const baseDir = resolveRuntimeLikePath(config.storage.localDisk.baseDir, configPath);
    if (!fs.existsSync(baseDir)) {
-      return {
-        name: "Storage",
-        status: "warn",
-        message: `Local storage directory does not exist: ${baseDir}`,
-        canRepair: true,
-        repair: () => {
-          fs.mkdirSync(baseDir, { recursive: true });
-        },
-        repairHint: "Run with --repair to create local storage directory",
-      };
+      fs.mkdirSync(baseDir, { recursive: true });
    }

    try {
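Reviewer note on the three check hunks above (database, log, storage): each one now creates the missing directory inline instead of returning a repairable warning. In the lines shown, the `mkdirSync` call in the log and storage checks sits before the `try` block, so a failure to create the directory would surface as a thrown exception, not as a check result, unless a caller outside this diff handles it. The sketch below shows one way to keep creation and verification inside the same guarded block. The result type, the function name, and the writability probe are assumptions; the real checks may verify something else inside their `try`.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical result shape, reduced to the fields this sketch needs.
type DirCheckSketchResult = { name: string; status: "pass" | "fail"; message: string };

function ensureWritableDirSketch(name: string, dir: string): DirCheckSketchResult {
  try {
    // With recursive: true this is a no-op when the directory already exists.
    fs.mkdirSync(dir, { recursive: true });
    fs.accessSync(dir, fs.constants.W_OK);
    return { name, status: "pass", message: `Directory is writable: ${dir}` };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return { name, status: "fail", message: `Cannot create or write to ${dir}: ${reason}` };
  }
}

// Usage: a missing nested directory is created and reported as a pass.
const sketchRoot = fs.mkdtempSync(path.join(os.tmpdir(), "paperclip-check-sketch-"));
const storageResult = ensureWritableDirSketch("Storage", path.join(sketchRoot, "data", "storage"));
console.log(storageResult.status);
fs.rmSync(sketchRoot, { recursive: true, force: true });
```

Because `mkdirSync` with `recursive: true` does not fail on an existing directory, the `existsSync` guard in the hunks is optional in this arrangement.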
--- a/cli/src/client/http.ts
+++ b/cli/src/client/http.ts
@@ -104,8 +104,10 @@ export class PaperclipApiClient {

 function buildUrl(apiBase: string, path: string): string {
  const normalizedPath = path.startsWith("/") ? path : `/${path}`;
+  const [pathname, ...queryParts] = normalizedPath.split("?");
  const url = new URL(apiBase);
-  url.pathname = `${url.pathname.replace(/\/+$/, "")}${normalizedPath}`;
+  url.pathname = `${url.pathname.replace(/\/+$/, "")}${pathname}`;
+  if (queryParts.length > 0) url.search = queryParts.join("?");
  return url.toString();
 }

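Reviewer note on the `buildUrl` hunk above: the path has to be split because assigning a string that contains `?` to `URL.pathname` percent-encodes the `?` as `%3F` instead of starting a query string. The sketch below reproduces that failure mode and a variant of the fix that splits at the first `?` only, so a literal `?` later in the query survives. The function name and the example URLs are illustrative assumptions.

```typescript
// Hypothetical stand-in for buildUrl, splitting at the first "?" only.
function buildUrlSketch(apiBase: string, requestPath: string): string {
  const normalized = requestPath.startsWith("/") ? requestPath : `/${requestPath}`;
  const queryStart = normalized.indexOf("?");
  const pathname = queryStart === -1 ? normalized : normalized.slice(0, queryStart);
  const query = queryStart === -1 ? "" : normalized.slice(queryStart + 1);

  const url = new URL(apiBase);
  // Drop trailing slashes from the base path before appending the request path.
  url.pathname = `${url.pathname.replace(/\/+$/, "")}${pathname}`;
  if (query) url.search = query;
  return url.toString();
}

// The pre-fix behaviour: the whole string goes through the pathname setter.
const naiveUrl = new URL("http://localhost:3100");
naiveUrl.pathname = "/api/agents/ceo?companyId=abc";

console.log(naiveUrl.toString());
console.log(buildUrlSketch("http://localhost:3100", "/api/agents/ceo?companyId=abc"));
```

This is the case the new `agent local-cli` command hits, since it appends `?companyId=...` to the agent lookup path.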
--- a/cli/src/commands/allowed-hostname.ts
+++ b/cli/src/commands/allowed-hostname.ts
@@ -26,6 +26,9 @@ export async function addAllowedHostname(host: string, opts: { config?: string }
    p.log.info(`Hostname ${pc.cyan(normalized)} is already allowed.`);
  } else {
    p.log.success(`Added allowed hostname: ${pc.cyan(normalized)}`);
+    p.log.message(
+      pc.dim("Restart the Paperclip server for this change to take effect."),
+    );
  }

  if (!(config.server.deploymentMode === "authenticated" && config.server.exposure === "private")) {
--- a/cli/src/commands/auth-bootstrap-ceo.ts
+++ b/cli/src/commands/auth-bootstrap-ceo.ts
@@ -3,6 +3,7 @@ import * as p from "@clack/prompts";
 import pc from "picocolors";
 import { and, eq, gt, isNull } from "drizzle-orm";
 import { createDb, instanceUserRoles, invites } from "@paperclipai/db";
+import { loadPaperclipEnvFile } from "../config/env.js";
 import { readConfig, resolveConfigPath } from "../config/store.js";

 function hashToken(token: string) {
@@ -13,7 +14,8 @@ function createInviteToken() {
  return `pcp_bootstrap_${randomBytes(24).toString("hex")}`;
 }

-function resolveDbUrl(configPath?: string) {
+function resolveDbUrl(configPath?: string, explicitDbUrl?: string) {
+  if (explicitDbUrl) return explicitDbUrl;
  const config = readConfig(configPath);
  if (process.env.DATABASE_URL) return process.env.DATABASE_URL;
  if (config?.database.mode === "postgres" && config.database.connectionString) {
@@ -28,6 +30,12 @@ function resolveDbUrl(configPath?: string) {

 function resolveBaseUrl(configPath?: string, explicitBaseUrl?: string) {
  if (explicitBaseUrl) return explicitBaseUrl.replace(/\/+$/, "");
+  const fromEnv =
+    process.env.PAPERCLIP_PUBLIC_URL?.trim() ||
+    process.env.PAPERCLIP_AUTH_PUBLIC_BASE_URL?.trim() ||
+    process.env.BETTER_AUTH_URL?.trim() ||
+    process.env.BETTER_AUTH_BASE_URL?.trim();
+  if (fromEnv) return fromEnv.replace(/\/+$/, "");
  const config = readConfig(configPath);
  if (config?.auth.baseUrlMode === "explicit" && config.auth.publicBaseUrl) {
    return config.auth.publicBaseUrl.replace(/\/+$/, "");
@@ -43,8 +51,10 @@ export async function bootstrapCeoInvite(opts: {
  force?: boolean;
  expiresHours?: number;
  baseUrl?: string;
+  dbUrl?: string;
 }) {
  const configPath = resolveConfigPath(opts.config);
+  loadPaperclipEnvFile(configPath);
  const config = readConfig(configPath);
  if (!config) {
    p.log.error(`No config found at ${configPath}. Run ${pc.cyan("paperclip onboard")} first.`);
@@ -56,7 +66,7 @@ export async function bootstrapCeoInvite(opts: {
    return;
  }

-  const dbUrl = resolveDbUrl(configPath);
+  const dbUrl = resolveDbUrl(configPath, opts.dbUrl);
  if (!dbUrl) {
    p.log.error(
      "Could not resolve database connection for bootstrap.",
@@ -65,6 +75,11 @@ export async function bootstrapCeoInvite(opts: {
  }

  const db = createDb(dbUrl);
+  const closableDb = db as typeof db & {
+    $client?: {
+      end?: (options?: { timeout?: number }) => Promise<void>;
+    };
+  };
  try {
    const existingAdminCount = await db
      .select()
@@ -112,5 +127,7 @@ export async function bootstrapCeoInvite(opts: {
  } catch (err) {
    p.log.error(`Could not create bootstrap invite: ${err instanceof Error ? err.message : String(err)}`);
    p.log.info("If using embedded-postgres, start the Paperclip server and run this command again.");
+  } finally {
+    await closableDb.$client?.end?.({ timeout: 5 }).catch(() => undefined);
  }
 }
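Reviewer note on the `resolveBaseUrl` hunk above: the resolution order is explicit flag, then environment, then config. The sketch below restates that order as a pure function so the precedence can be exercised without touching `process.env`. The input type and function name are assumptions, and whatever fallback the real function applies after the config lookup is outside the hunk and omitted here. The sketch skips blank environment values one variable at a time; a `??` chain would not, because `??` only falls through on `null` or `undefined`.

```typescript
// Hypothetical inputs; the real function reads process.env and the config file.
type BaseUrlSketchInputs = {
  explicitBaseUrl?: string;
  env: Record<string, string | undefined>;
  configPublicBaseUrl?: string;
};

// Variable names and their order are taken from the hunk.
const BASE_URL_ENV_KEYS = [
  "PAPERCLIP_PUBLIC_URL",
  "PAPERCLIP_AUTH_PUBLIC_BASE_URL",
  "BETTER_AUTH_URL",
  "BETTER_AUTH_BASE_URL",
] as const;

const stripTrailingSlashes = (value: string): string => value.replace(/\/+$/, "");

function resolveBaseUrlSketch(inputs: BaseUrlSketchInputs): string | undefined {
  // 1. An explicit value always wins.
  if (inputs.explicitBaseUrl) return stripTrailingSlashes(inputs.explicitBaseUrl);
  // 2. First non-blank environment variable, in the listed order.
  for (const key of BASE_URL_ENV_KEYS) {
    const value = inputs.env[key]?.trim();
    if (value) return stripTrailingSlashes(value);
  }
  // 3. Explicit public base URL from config.
  if (inputs.configPublicBaseUrl) return stripTrailingSlashes(inputs.configPublicBaseUrl);
  return undefined;
}
```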
--- a/cli/src/commands/client/agent.ts
+++ b/cli/src/commands/client/agent.ts
@@ -1,5 +1,13 @@
 import { Command } from "commander";
 import type { Agent } from "@paperclipai/shared";
+import {
+  removeMaintainerOnlySkillSymlinks,
+  resolvePaperclipSkillsDir,
+} from "@paperclipai/adapter-utils/server-utils";
+import fs from "node:fs/promises";
+import os from "node:os";
+import path from "node:path";
+import { fileURLToPath } from "node:url";
 import {
  addCommonClientOptions,
  formatInlineRecord,
@@ -13,6 +21,141 @@ interface AgentListOptions extends BaseClientOptions {
  companyId?: string;
 }

+interface AgentLocalCliOptions extends BaseClientOptions {
+  companyId?: string;
+  keyName?: string;
+  installSkills?: boolean;
+}
+
+interface CreatedAgentKey {
+  id: string;
+  name: string;
+  token: string;
+  createdAt: string;
+}
+
+interface SkillsInstallSummary {
+  tool: "codex" | "claude";
+  target: string;
+  linked: string[];
+  removed: string[];
+  skipped: string[];
+  failed: Array<{ name: string; error: string }>;
+}
+
+const __moduleDir = path.dirname(fileURLToPath(import.meta.url));
+
+function codexSkillsHome(): string {
+  const fromEnv = process.env.CODEX_HOME?.trim();
+  const base = fromEnv && fromEnv.length > 0 ? fromEnv : path.join(os.homedir(), ".codex");
+  return path.join(base, "skills");
+}
+
+function claudeSkillsHome(): string {
+  const fromEnv = process.env.CLAUDE_HOME?.trim();
+  const base = fromEnv && fromEnv.length > 0 ? fromEnv : path.join(os.homedir(), ".claude");
+  return path.join(base, "skills");
+}
+
+async function installSkillsForTarget(
+  sourceSkillsDir: string,
+  targetSkillsDir: string,
+  tool: "codex" | "claude",
+): Promise<SkillsInstallSummary> {
+  const summary: SkillsInstallSummary = {
+    tool,
+    target: targetSkillsDir,
+    linked: [],
+    removed: [],
+    skipped: [],
+    failed: [],
+  };
+
+  await fs.mkdir(targetSkillsDir, { recursive: true });
+  const entries = await fs.readdir(sourceSkillsDir, { withFileTypes: true });
+  summary.removed = await removeMaintainerOnlySkillSymlinks(
+    targetSkillsDir,
+    entries.filter((entry) => entry.isDirectory()).map((entry) => entry.name),
+  );
+  for (const entry of entries) {
+    if (!entry.isDirectory()) continue;
+    const source = path.join(sourceSkillsDir, entry.name);
+    const target = path.join(targetSkillsDir, entry.name);
+    const existing = await fs.lstat(target).catch(() => null);
+    if (existing) {
+      if (existing.isSymbolicLink()) {
+        let linkedPath: string | null = null;
+        try {
+          linkedPath = await fs.readlink(target);
+        } catch (err) {
+          await fs.unlink(target);
+          try {
+            await fs.symlink(source, target);
+            summary.linked.push(entry.name);
+            continue;
+          } catch (linkErr) {
+            summary.failed.push({
+              name: entry.name,
+              // Report both failures: the readlink error first, then the
+              // relink error. Either thrown value may not be an Error
+              // instance, so fall back to String() for those.
+              error: [err, linkErr]
+                .map((cause) => (cause instanceof Error ? cause.message : String(cause)))
+                .join("; then "),
+            });
+            continue;
+          }
+        }
+
+        const resolvedLinkedPath = path.isAbsolute(linkedPath)
+          ? linkedPath
+          : path.resolve(path.dirname(target), linkedPath);
+        const linkedTargetExists = await fs
+          .stat(resolvedLinkedPath)
+          .then(() => true)
+          .catch(() => false);
+
+        if (!linkedTargetExists) {
+          await fs.unlink(target);
+        } else {
+          summary.skipped.push(entry.name);
+          continue;
+        }
+      } else {
+        summary.skipped.push(entry.name);
+        continue;
+      }
+    }
+
+    try {
+      await fs.symlink(source, target);
+      summary.linked.push(entry.name);
+    } catch (err) {
+      summary.failed.push({
+        name: entry.name,
+        error: err instanceof Error ? err.message : String(err),
+      });
+    }
+  }
+
+  return summary;
+}
+
+function buildAgentEnvExports(input: {
+  apiBase: string;
+  companyId: string;
+  agentId: string;
+  apiKey: string;
+}): string {
+  const escaped = (value: string) => value.replace(/'/g, "'\"'\"'");
+  return [
+    `export PAPERCLIP_API_URL='${escaped(input.apiBase)}'`,
+    `export PAPERCLIP_COMPANY_ID='${escaped(input.companyId)}'`,
+    `export PAPERCLIP_AGENT_ID='${escaped(input.agentId)}'`,
+    `export PAPERCLIP_API_KEY='${escaped(input.apiKey)}'`,
+  ].join("\n");
+}
+
 export function registerAgentCommands(program: Command): void {
  const agent = program.command("agent").description("Agent operations");

@@ -71,4 +214,102 @@ export function registerAgentCommands(program: Command): void {
        }
      }),
  );
+
+  addCommonClientOptions(
+    agent
+      .command("local-cli")
+      .description(
+        "Create an agent API key, install local Paperclip skills for Codex/Claude, and print shell exports",
+      )
+      .argument("<agentRef>", "Agent ID or shortname/url-key")
+      .requiredOption("-C, --company-id <id>", "Company ID")
+      .option("--key-name <name>", "API key label", "local-cli")
+      .option(
+        "--no-install-skills",
+        "Skip installing Paperclip skills into ~/.codex/skills and ~/.claude/skills",
+      )
+      .action(async (agentRef: string, opts: AgentLocalCliOptions) => {
+        try {
+          const ctx = resolveCommandContext(opts, { requireCompany: true });
+          const query = new URLSearchParams({ companyId: ctx.companyId ?? "" });
+          const agentRow = await ctx.api.get<Agent>(
+            `/api/agents/${encodeURIComponent(agentRef)}?${query.toString()}`,
+          );
+          if (!agentRow) {
+            throw new Error(`Agent not found: ${agentRef}`);
+          }
+
+          const now = new Date().toISOString().replaceAll(":", "-");
+          const keyName = opts.keyName?.trim() ? opts.keyName.trim() : `local-cli-${now}`;
+          const key = await ctx.api.post<CreatedAgentKey>(`/api/agents/${agentRow.id}/keys`, { name: keyName });
+          if (!key) {
+            throw new Error("Failed to create API key");
+          }
+
+          const installSummaries: SkillsInstallSummary[] = [];
+          if (opts.installSkills !== false) {
+            const skillsDir = await resolvePaperclipSkillsDir(__moduleDir, [path.resolve(process.cwd(), "skills")]);
+            if (!skillsDir) {
+              throw new Error(
+                "Could not locate local Paperclip skills directory. Expected ./skills in the repo checkout.",
+              );
+            }
+
+            installSummaries.push(
+              await installSkillsForTarget(skillsDir, codexSkillsHome(), "codex"),
+              await installSkillsForTarget(skillsDir, claudeSkillsHome(), "claude"),
+            );
+          }
+
+          const exportsText = buildAgentEnvExports({
+            apiBase: ctx.api.apiBase,
+            companyId: agentRow.companyId,
+            agentId: agentRow.id,
+            apiKey: key.token,
+          });
+
+          if (ctx.json) {
+            printOutput(
+              {
+                agent: {
+                  id: agentRow.id,
+                  name: agentRow.name,
+                  urlKey: agentRow.urlKey,
+                  companyId: agentRow.companyId,
+                },
+                key: {
+                  id: key.id,
+                  name: key.name,
+                  createdAt: key.createdAt,
+                  token: key.token,
+                },
+                skills: installSummaries,
+                exports: exportsText,
+              },
+              { json: true },
+            );
+            return;
+          }
+
+          console.log(`Agent: ${agentRow.name} (${agentRow.id})`);
+          console.log(`API key created: ${key.name} (${key.id})`);
+          if (installSummaries.length > 0) {
+            for (const summary of installSummaries) {
+              console.log(
+                `${summary.tool}: linked=${summary.linked.length} removed=${summary.removed.length} skipped=${summary.skipped.length} failed=${summary.failed.length} target=${summary.target}`,
+              );
+              for (const failed of summary.failed) {
+                console.log(`  failed ${failed.name}: ${failed.error}`);
+              }
+            }
+          }
+          console.log("");
+          console.log("# Run this in your shell before launching codex/claude:");
+          console.log(exportsText);
+        } catch (err) {
+          handleCommandError(err);
+        }
+      }),
+    { includeCompany: false },
+  );
 }
--- a/cli/src/commands/client/plugin.ts
+++ b/cli/src/commands/client/plugin.ts
@@ -0,0 +1,374 @@
+import path from "node:path";
+import { Command } from "commander";
+import pc from "picocolors";
+import {
+  addCommonClientOptions,
+  handleCommandError,
+  printOutput,
+  resolveCommandContext,
+  type BaseClientOptions,
+} from "./common.js";
+
+// ---------------------------------------------------------------------------
+// Types mirroring server-side shapes
+// ---------------------------------------------------------------------------
+
+interface PluginRecord {
+  id: string;
+  pluginKey: string;
+  packageName: string;
+  version: string;
+  status: string;
+  displayName?: string;
+  lastError?: string | null;
+  installedAt: string;
+  updatedAt: string;
+}
+
+
+// ---------------------------------------------------------------------------
+// Option types
+// ---------------------------------------------------------------------------
+
+interface PluginListOptions extends BaseClientOptions {
+  status?: string;
+}
+
+interface PluginInstallOptions extends BaseClientOptions {
+  local?: boolean;
+  version?: string;
+}
+
+interface PluginUninstallOptions extends BaseClientOptions {
+  force?: boolean;
+}
+
+// ---------------------------------------------------------------------------
+// Helpers
+// ---------------------------------------------------------------------------
+
+/**
+ * Resolve a local path argument to an absolute path so the server can find the
+ * plugin on disk regardless of where the user ran the CLI.
+ */
+function resolvePackageArg(packageArg: string, isLocal: boolean): string {
+  if (!isLocal) return packageArg;
+  // Already absolute
+  if (path.isAbsolute(packageArg)) return packageArg;
+  // Expand leading ~ to home directory
+  if (packageArg.startsWith("~")) {
+    const home = process.env.HOME ?? process.env.USERPROFILE ?? "";
+    return path.resolve(home, packageArg.slice(1).replace(/^[\\/]/, ""));
+  }
+  return path.resolve(process.cwd(), packageArg);
+}
+
+function formatPlugin(p: PluginRecord): string {
+  const statusColor =
+    p.status === "ready"
+      ? pc.green(p.status)
+      : p.status === "error"
+        ? pc.red(p.status)
+        : p.status === "disabled"
+          ? pc.dim(p.status)
+          : pc.yellow(p.status);
+
+  const parts = [
+    `key=${pc.bold(p.pluginKey)}`,
+    `status=${statusColor}`,
+    `version=${p.version}`,
+    `id=${pc.dim(p.id)}`,
+  ];
+
+  if (p.lastError) {
+    parts.push(`error=${pc.red(p.lastError.slice(0, 80))}`);
+  }
+
+  return parts.join("  ");
+}
+
+// ---------------------------------------------------------------------------
+// Command registration
+// ---------------------------------------------------------------------------
+
+export function registerPluginCommands(program: Command): void {
+  const plugin = program.command("plugin").description("Plugin lifecycle management");
+
+  // -------------------------------------------------------------------------
+  // plugin list
+  // -------------------------------------------------------------------------
+  addCommonClientOptions(
+    plugin
+      .command("list")
+      .description("List installed plugins")
+      .option("--status <status>", "Filter by status (ready, error, disabled, installed, upgrade_pending)")
+      .action(async (opts: PluginListOptions) => {
+        try {
+          const ctx = resolveCommandContext(opts);
+          const qs = opts.status ? `?status=${encodeURIComponent(opts.status)}` : "";
+          const plugins = await ctx.api.get<PluginRecord[]>(`/api/plugins${qs}`);
+
+          if (ctx.json) {
+            printOutput(plugins, { json: true });
+            return;
+          }
+
+          const rows = plugins ?? [];
+          if (rows.length === 0) {
+            console.log(pc.dim("No plugins installed."));
+            return;
+          }
+
+          for (const p of rows) {
+            console.log(formatPlugin(p));
+          }
+        } catch (err) {
+          handleCommandError(err);
+        }
+      }),
+  );
+
+  // -------------------------------------------------------------------------
+  // plugin install <package-or-path>
+  // -------------------------------------------------------------------------
+  addCommonClientOptions(
+    plugin
+      .command("install <package>")
+      .description(
+        "Install a plugin from a local path or npm package.\n" +
+          "  Examples:\n" +
+          "    paperclipai plugin install ./my-plugin              # local path\n" +
+          "    paperclipai plugin install @acme/plugin-linear      # npm package\n" +
+          "    paperclipai plugin install @acme/plugin-linear@1.2  # pinned version",
+      )
+      .option("-l, --local", "Treat <package> as a local filesystem path", false)
+      .option("--version <version>", "Specific npm version to install (npm packages only)")
+      .action(async (packageArg: string, opts: PluginInstallOptions) => {
+        try {
+          const ctx = resolveCommandContext(opts);
+
+          // Auto-detect local paths: starts with ./, ../, or ~, or is an absolute path
+          const isLocal =
+            opts.local ||
+            packageArg.startsWith("./") ||
+            packageArg.startsWith("../") ||
+            path.isAbsolute(packageArg) ||
+            packageArg.startsWith("~");
+
+          const resolvedPackage = resolvePackageArg(packageArg, isLocal);
+
+          if (!ctx.json) {
+            console.log(
+              pc.dim(
+                isLocal
+                  ? `Installing plugin from local path: ${resolvedPackage}`
+                  : `Installing plugin: ${resolvedPackage}${opts.version ? `@${opts.version}` : ""}`,
+              ),
+            );
+          }
+
+          const installedPlugin = await ctx.api.post<PluginRecord>("/api/plugins/install", {
+            packageName: resolvedPackage,
+            version: opts.version,
+            isLocalPath: isLocal,
+          });
+
+          if (ctx.json) {
+            printOutput(installedPlugin, { json: true });
+            return;
+          }
+
+          if (!installedPlugin) {
+            console.log(pc.dim("Install returned no plugin record."));
+            return;
+          }
+
+          console.log(
+            pc.green(
+              `✓ Installed ${pc.bold(installedPlugin.pluginKey)} v${installedPlugin.version} (${installedPlugin.status})`,
+            ),
+          );
+
+          if (installedPlugin.lastError) {
+            console.log(pc.red(`  Warning: ${installedPlugin.lastError}`));
+          }
+        } catch (err) {
+          handleCommandError(err);
+        }
+      }),
+  );
+
+  // -------------------------------------------------------------------------
+  // plugin uninstall <plugin-key-or-id>
+  // -------------------------------------------------------------------------
+  addCommonClientOptions(
+    plugin
+      .command("uninstall <pluginKey>")
+      .description(
+        "Uninstall a plugin by its plugin key or database ID.\n" +
+          "  Use --force to hard-purge all state and config.",
+      )
+      .option("--force", "Purge all plugin state and config (hard delete)", false)
+      .action(async (pluginKey: string, opts: PluginUninstallOptions) => {
+        try {
+          const ctx = resolveCommandContext(opts);
+          const purge = opts.force === true;
+          const qs = purge ? "?purge=true" : "";
+
+          if (!ctx.json) {
+            console.log(
+              pc.dim(
+                purge
+                  ? `Uninstalling and purging plugin: ${pluginKey}`
+                  : `Uninstalling plugin: ${pluginKey}`,
+              ),
+            );
+          }
+
+          const result = await ctx.api.delete<PluginRecord | null>(
+            `/api/plugins/${encodeURIComponent(pluginKey)}${qs}`,
+          );
+
+          if (ctx.json) {
+            printOutput(result, { json: true });
+            return;
+          }
+
+          console.log(pc.green(`✓ Uninstalled ${pc.bold(pluginKey)}${purge ? " (purged)" : ""}`));
+        } catch (err) {
+          handleCommandError(err);
+        }
+      }),
+  );
+
+  // -------------------------------------------------------------------------
+  // plugin enable <plugin-key-or-id>
+  // -------------------------------------------------------------------------
+  addCommonClientOptions(
+    plugin
+      .command("enable <pluginKey>")
+      .description("Enable a disabled or errored plugin")
+      .action(async (pluginKey: string, opts: BaseClientOptions) => {
+        try {
+          const ctx = resolveCommandContext(opts);
+          const result = await ctx.api.post<PluginRecord>(
+            `/api/plugins/${encodeURIComponent(pluginKey)}/enable`,
+          );
+
+          if (ctx.json) {
+            printOutput(result, { json: true });
+            return;
+          }
+
+          console.log(pc.green(`✓ Enabled ${pc.bold(pluginKey)} — status: ${result?.status ?? "unknown"}`));
+        } catch (err) {
+          handleCommandError(err);
+        }
+      }),
+  );
+
+  // -------------------------------------------------------------------------
+  // plugin disable <plugin-key-or-id>
+  // -------------------------------------------------------------------------
+  addCommonClientOptions(
+    plugin
+      .command("disable <pluginKey>")
+      .description("Disable a running plugin without uninstalling it")
+      .action(async (pluginKey: string, opts: BaseClientOptions) => {
+        try {
+          const ctx = resolveCommandContext(opts);
+          const result = await ctx.api.post<PluginRecord>(
+            `/api/plugins/${encodeURIComponent(pluginKey)}/disable`,
+          );
+
+          if (ctx.json) {
+            printOutput(result, { json: true });
+            return;
+          }
+
+          console.log(pc.dim(`Disabled ${pc.bold(pluginKey)} — status: ${result?.status ?? "unknown"}`));
+        } catch (err) {
+          handleCommandError(err);
+        }
+      }),
+  );
+
+  // -------------------------------------------------------------------------
+  // plugin inspect <plugin-key-or-id>
+  // -------------------------------------------------------------------------
+  addCommonClientOptions(
+    plugin
+      .command("inspect <pluginKey>")
+      .description("Show full details for an installed plugin")
+      .action(async (pluginKey: string, opts: BaseClientOptions) => {
+        try {
+          const ctx = resolveCommandContext(opts);
+          const result = await ctx.api.get<PluginRecord>(
+            `/api/plugins/${encodeURIComponent(pluginKey)}`,
+          );
+
+          if (ctx.json) {
+            printOutput(result, { json: true });
+            return;
+          }
+
+          if (!result) {
+            console.log(pc.red(`Plugin not found: ${pluginKey}`));
+            process.exit(1);
+          }
+
+          console.log(formatPlugin(result));
+          if (result.lastError) {
+            console.log(`\n${pc.red("Last error:")}\n${result.lastError}`);
+          }
+        } catch (err) {
+          handleCommandError(err);
+        }
+      }),
+  );
+
+  // -------------------------------------------------------------------------
+  // plugin examples
+  // -------------------------------------------------------------------------
+  addCommonClientOptions(
+    plugin
+      .command("examples")
+      .description("List bundled example plugins available for local install")
+      .action(async (opts: BaseClientOptions) => {
+        try {
+          const ctx = resolveCommandContext(opts);
+          const examples = await ctx.api.get<
+            Array<{
+              packageName: string;
+              pluginKey: string;
+              displayName: string;
+              description: string;
+              localPath: string;
+              tag: string;
+            }>
+          >("/api/plugins/examples");
+
+          if (ctx.json) {
+            printOutput(examples, { json: true });
+            return;
+          }
+
+          const rows = examples ?? [];
+          if (rows.length === 0) {
+            console.log(pc.dim("No bundled examples available."));
+            return;
+          }
+
+          for (const ex of rows) {
+            console.log(
+              `${pc.bold(ex.displayName)}  ${pc.dim(ex.pluginKey)}\n` +
+                `  ${ex.description}\n` +
+                `  ${pc.cyan(`paperclipai plugin install ${ex.localPath}`)}`,
+            );
+          }
+        } catch (err) {
+          handleCommandError(err);
+        }
+      }),
+  );
+}
--- a/cli/src/commands/configure.ts
+++ b/cli/src/commands/configure.ts
@@ -10,6 +10,7 @@ import { defaultSecretsConfig, promptSecrets } from "../prompts/secrets.js";
 import { defaultStorageConfig, promptStorage } from "../prompts/storage.js";
 import { promptServer } from "../prompts/server.js";
 import {
+  resolveDefaultBackupDir,
  resolveDefaultEmbeddedPostgresDir,
  resolveDefaultLogsDir,
  resolvePaperclipInstanceId,
@@ -39,6 +40,12 @@ function defaultConfig(): PaperclipConfig {
      mode: "embedded-postgres",
      embeddedPostgresDataDir: resolveDefaultEmbeddedPostgresDir(instanceId),
      embeddedPostgresPort: 54329,
+      backup: {
+        enabled: true,
+        intervalMinutes: 60,
+        retentionDays: 30,
+        dir: resolveDefaultBackupDir(instanceId),
+      },
    },
    logging: {
      mode: "file",
@@ -54,6 +61,7 @@ function defaultConfig(): PaperclipConfig {
    },
    auth: {
      baseUrlMode: "auto",
+      disableSignUp: false,
    },
    storage: defaultStorageConfig(),
    secrets: defaultSecretsConfig(),
@@ -118,7 +126,7 @@ export async function configure(opts: {

    switch (section) {
      case "database":
-        config.database = await promptDatabase();
+        config.database = await promptDatabase(config.database);
        break;
      case "llm": {
        const llm = await promptLlm();
--- a/cli/src/commands/db-backup.ts
+++ b/cli/src/commands/db-backup.ts
@@ -0,0 +1,102 @@
+import path from "node:path";
+import * as p from "@clack/prompts";
+import pc from "picocolors";
+import { formatDatabaseBackupResult, runDatabaseBackup } from "@paperclipai/db";
+import {
+  expandHomePrefix,
+  resolveDefaultBackupDir,
+  resolvePaperclipInstanceId,
+} from "../config/home.js";
+import { readConfig, resolveConfigPath } from "../config/store.js";
+import { printPaperclipCliBanner } from "../utils/banner.js";
+
+type DbBackupOptions = {
+  config?: string;
+  dir?: string;
+  retentionDays?: number;
+  filenamePrefix?: string;
+  json?: boolean;
+};
+
+function resolveConnectionString(configPath?: string): { value: string; source: string } {
+  const envUrl = process.env.DATABASE_URL?.trim();
+  if (envUrl) return { value: envUrl, source: "DATABASE_URL" };
+
+  const config = readConfig(configPath);
+  if (config?.database.mode === "postgres" && config.database.connectionString?.trim()) {
+    return { value: config.database.connectionString.trim(), source: "config.database.connectionString" };
+  }
+
+  const port = config?.database.embeddedPostgresPort ?? 54329;
+  return {
+    value: `postgres://paperclip:paperclip@127.0.0.1:${port}/paperclip`,
+    source: `embedded-postgres@${port}`,
+  };
+}
+
+function normalizeRetentionDays(value: number | undefined, fallback: number): number {
+  const candidate = value ?? fallback;
+  if (!Number.isInteger(candidate) || candidate < 1) {
+    throw new Error(`Invalid retention days '${String(candidate)}'. Use a positive integer.`);
+  }
+  return candidate;
+}
+
+function resolveBackupDir(raw: string): string {
+  return path.resolve(expandHomePrefix(raw.trim()));
+}
+
+export async function dbBackupCommand(opts: DbBackupOptions): Promise<void> {
+  printPaperclipCliBanner();
+  p.intro(pc.bgCyan(pc.black(" paperclip db:backup ")));
+
+  const configPath = resolveConfigPath(opts.config);
+  const config = readConfig(opts.config);
+  const connection = resolveConnectionString(opts.config);
+  const defaultDir = resolveDefaultBackupDir(resolvePaperclipInstanceId());
+  const configuredDir = opts.dir?.trim() || config?.database.backup?.dir || defaultDir;
+  const backupDir = resolveBackupDir(configuredDir);
+  const retentionDays = normalizeRetentionDays(
+    opts.retentionDays,
+    config?.database.backup?.retentionDays ?? 30,
+  );
+  const filenamePrefix = opts.filenamePrefix?.trim() || "paperclip";
+
+  p.log.message(pc.dim(`Config: ${configPath}`));
+  p.log.message(pc.dim(`Connection source: ${connection.source}`));
+  p.log.message(pc.dim(`Backup dir: ${backupDir}`));
+  p.log.message(pc.dim(`Retention: ${retentionDays} day(s)`));
+
+  const spinner = p.spinner();
+  spinner.start("Creating database backup...");
+  try {
+    const result = await runDatabaseBackup({
+      connectionString: connection.value,
+      backupDir,
+      retentionDays,
+      filenamePrefix,
+    });
+    spinner.stop(`Backup saved: ${formatDatabaseBackupResult(result)}`);
+
+    if (opts.json) {
+      console.log(
+        JSON.stringify(
+          {
+            backupFile: result.backupFile,
+            sizeBytes: result.sizeBytes,
+            prunedCount: result.prunedCount,
+            backupDir,
+            retentionDays,
+            connectionSource: connection.source,
+          },
+          null,
+          2,
+        ),
+      );
+    }
+    p.outro(pc.green("Backup completed."));
+  } catch (err) {
+    spinner.stop(pc.red("Backup failed."));
+    throw err;
+  }
+}
--- a/cli/src/commands/doctor.ts
+++ b/cli/src/commands/doctor.ts
@@ -14,6 +14,7 @@ import {
  storageCheck,
  type CheckResult,
 } from "../checks/index.js";
+import { loadPaperclipEnvFile } from "../config/env.js";
 import { printPaperclipCliBanner } from "../utils/banner.js";

 const STATUS_ICON = {
@@ -31,6 +32,7 @@ export async function doctor(opts: {
  p.intro(pc.bgCyan(pc.black(" paperclip doctor ")));

  const configPath = resolveConfigPath(opts.config);
+  loadPaperclipEnvFile(configPath);
  const results: CheckResult[] = [];

  // 1. Config check (must pass before others)
@@ -64,28 +66,40 @@ export async function doctor(opts: {
  printResult(deploymentAuthResult);

  // 3. Agent JWT check
-  const jwtResult = agentJwtSecretCheck(opts.config);
-  results.push(jwtResult);
-  printResult(jwtResult);
-  await maybeRepair(jwtResult, opts);
+  results.push(
+    await runRepairableCheck({
+      run: () => agentJwtSecretCheck(opts.config),
+      configPath,
+      opts,
+    }),
+  );

  // 4. Secrets adapter check
-  const secretsResult = secretsCheck(config, configPath);
-  results.push(secretsResult);
-  printResult(secretsResult);
-  await maybeRepair(secretsResult, opts);
+  results.push(
+    await runRepairableCheck({
+      run: () => secretsCheck(config, configPath),
+      configPath,
+      opts,
+    }),
+  );

  // 5. Storage check
-  const storageResult = storageCheck(config, configPath);
-  results.push(storageResult);
-  printResult(storageResult);
-  await maybeRepair(storageResult, opts);
+  results.push(
+    await runRepairableCheck({
+      run: () => storageCheck(config, configPath),
+      configPath,
+      opts,
+    }),
+  );

  // 6. Database check
-  const dbResult = await databaseCheck(config, configPath);
-  results.push(dbResult);
-  printResult(dbResult);
-  await maybeRepair(dbResult, opts);
+  results.push(
+    await runRepairableCheck({
+      run: () => databaseCheck(config, configPath),
+      configPath,
+      opts,
+    }),
+  );

  // 7. LLM check
  const llmResult = await llmCheck(config);
@@ -93,10 +107,13 @@ export async function doctor(opts: {
  printResult(llmResult);

  // 8. Log directory check
-  const logResult = logCheck(config, configPath);
-  results.push(logResult);
-  printResult(logResult);
-  await maybeRepair(logResult, opts);
+  results.push(
+    await runRepairableCheck({
+      run: () => logCheck(config, configPath),
+      configPath,
+      opts,
+    }),
+  );

  // 9. Port check
  const portResult = await portCheck(config);
@@ -118,9 +135,9 @@ function printResult(result: CheckResult): void {
 async function maybeRepair(
  result: CheckResult,
  opts: { repair?: boolean; yes?: boolean },
-): Promise<void> {
-  if (result.status === "pass" || !result.canRepair || !result.repair) return;
-  if (!opts.repair) return;
+): Promise<boolean> {
+  if (result.status === "pass" || !result.canRepair || !result.repair) return false;
+  if (!opts.repair) return false;

  let shouldRepair = opts.yes;
  if (!shouldRepair) {
@@ -128,7 +145,7 @@ async function maybeRepair(
      message: `Repair "${result.name}"?`,
      initialValue: true,
    });
-    if (p.isCancel(answer)) return;
+    if (p.isCancel(answer)) return false;
    shouldRepair = answer;
  }

@@ -136,10 +153,30 @@ async function maybeRepair(
    try {
      await result.repair();
      p.log.success(`Repaired: ${result.name}`);
+      return true;
    } catch (err) {
      p.log.error(`Repair failed: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
+  return false;
+}
+
+async function runRepairableCheck(input: {
+  run: () => CheckResult | Promise<CheckResult>;
+  configPath: string;
+  opts: { repair?: boolean; yes?: boolean };
+}): Promise<CheckResult> {
+  let result = await input.run();
+  printResult(result);
+
+  const repaired = await maybeRepair(result, input.opts);
+  if (!repaired) return result;
+
+  // Repairs may create/update the adjacent .env file or other local resources.
+  loadPaperclipEnvFile(input.configPath);
+  result = await input.run();
+  printResult(result);
+  return result;
 }

 function printSummary(results: CheckResult[]): { passed: number; warned: number; failed: number } {
--- a/cli/src/commands/env.ts
+++ b/cli/src/commands/env.ts
@@ -118,6 +118,29 @@ function collectDeploymentEnvRows(config: PaperclipConfig | null, configPath: st
  const dbUrl = process.env.DATABASE_URL ?? config?.database?.connectionString ?? "";
  const databaseMode = config?.database?.mode ?? "embedded-postgres";
  const dbUrlSource: EnvSource = process.env.DATABASE_URL ? "env" : config?.database?.connectionString ? "config" : "missing";
+  const publicUrl =
+    process.env.PAPERCLIP_PUBLIC_URL ||
+    process.env.PAPERCLIP_AUTH_PUBLIC_BASE_URL ||
+    process.env.BETTER_AUTH_URL ||
+    process.env.BETTER_AUTH_BASE_URL ||
+    config?.auth?.publicBaseUrl ||
+    "";
+  const publicUrlSource: EnvSource =
+    process.env.PAPERCLIP_PUBLIC_URL
+      ? "env"
+      : process.env.PAPERCLIP_AUTH_PUBLIC_BASE_URL || process.env.BETTER_AUTH_URL || process.env.BETTER_AUTH_BASE_URL
+        ? "env"
+        : config?.auth?.publicBaseUrl
+          ? "config"
+          : "missing";
+  let trustedOriginsDefault = "";
+  if (publicUrl) {
+    try {
+      trustedOriginsDefault = new URL(publicUrl).origin;
+    } catch {
+      trustedOriginsDefault = "";
+    }
+  }

  const heartbeatInterval = process.env.HEARTBEAT_SCHEDULER_INTERVAL_MS ?? DEFAULT_HEARTBEAT_SCHEDULER_INTERVAL_MS;
  const heartbeatEnabled = process.env.HEARTBEAT_SCHEDULER_ENABLED ?? "true";
@@ -192,6 +215,24 @@ function collectDeploymentEnvRows(config: PaperclipConfig | null, configPath: st
      required: false,
      note: "HTTP listen port",
    },
+    {
+      key: "PAPERCLIP_PUBLIC_URL",
+      value: publicUrl,
+      source: publicUrlSource,
+      required: false,
+      note: "Canonical public URL for auth/callback/invite origin wiring",
+    },
+    {
+      key: "BETTER_AUTH_TRUSTED_ORIGINS",
+      value: process.env.BETTER_AUTH_TRUSTED_ORIGINS || trustedOriginsDefault,
+      source: process.env.BETTER_AUTH_TRUSTED_ORIGINS
+        ? "env"
+        : trustedOriginsDefault
+          ? "default"
+          : "missing",
+      required: false,
+      note: "Comma-separated auth origin allowlist (auto-derived from PAPERCLIP_PUBLIC_URL when possible)",
+    },
    {
      key: "PAPERCLIP_AGENT_JWT_TTL_SECONDS",
      value: process.env.PAPERCLIP_AGENT_JWT_TTL_SECONDS ?? DEFAULT_AGENT_JWT_TTL_SECONDS,
--- a/cli/src/commands/onboard.ts
+++ b/cli/src/commands/onboard.ts
@@ -1,5 +1,18 @@
 import * as p from "@clack/prompts";
+import path from "node:path";
 import pc from "picocolors";
+import {
+  AUTH_BASE_URL_MODES,
+  DEPLOYMENT_EXPOSURES,
+  DEPLOYMENT_MODES,
+  SECRET_PROVIDERS,
+  STORAGE_PROVIDERS,
+  type AuthBaseUrlMode,
+  type DeploymentExposure,
+  type DeploymentMode,
+  type SecretProvider,
+  type StorageProvider,
+} from "@paperclipai/shared";
 import { configExists, readConfig, resolveConfigPath, writeConfig } from "../config/store.js";
 import type { PaperclipConfig } from "../config/schema.js";
 import { ensureAgentJwtSecret, resolveAgentJwtEnvFile } from "../config/env.js";
@@ -12,6 +25,8 @@ import { defaultStorageConfig, promptStorage } from "../prompts/storage.js";
 import { promptServer } from "../prompts/server.js";
 import {
  describeLocalInstancePaths,
+  expandHomePrefix,
+  resolveDefaultBackupDir,
  resolveDefaultEmbeddedPostgresDir,
  resolveDefaultLogsDir,
  resolvePaperclipInstanceId,
@@ -28,32 +43,194 @@ type OnboardOptions = {
  invokedByRun?: boolean;
 };

-function quickstartDefaults(): Pick<PaperclipConfig, "database" | "logging" | "server" | "auth" | "storage" | "secrets"> {
+type OnboardDefaults = Pick<PaperclipConfig, "database" | "logging" | "server" | "auth" | "storage" | "secrets">;
+
+const ONBOARD_ENV_KEYS = [
+  "PAPERCLIP_PUBLIC_URL",
+  "DATABASE_URL",
+  "PAPERCLIP_DB_BACKUP_ENABLED",
+  "PAPERCLIP_DB_BACKUP_INTERVAL_MINUTES",
+  "PAPERCLIP_DB_BACKUP_RETENTION_DAYS",
+  "PAPERCLIP_DB_BACKUP_DIR",
+  "PAPERCLIP_DEPLOYMENT_MODE",
+  "PAPERCLIP_DEPLOYMENT_EXPOSURE",
+  "HOST",
+  "PORT",
+  "SERVE_UI",
+  "PAPERCLIP_ALLOWED_HOSTNAMES",
+  "PAPERCLIP_AUTH_BASE_URL_MODE",
+  "PAPERCLIP_AUTH_PUBLIC_BASE_URL",
+  "BETTER_AUTH_URL",
+  "BETTER_AUTH_BASE_URL",
+  "PAPERCLIP_STORAGE_PROVIDER",
+  "PAPERCLIP_STORAGE_LOCAL_DIR",
+  "PAPERCLIP_STORAGE_S3_BUCKET",
+  "PAPERCLIP_STORAGE_S3_REGION",
+  "PAPERCLIP_STORAGE_S3_ENDPOINT",
+  "PAPERCLIP_STORAGE_S3_PREFIX",
+  "PAPERCLIP_STORAGE_S3_FORCE_PATH_STYLE",
+  "PAPERCLIP_SECRETS_PROVIDER",
+  "PAPERCLIP_SECRETS_STRICT_MODE",
+  "PAPERCLIP_SECRETS_MASTER_KEY_FILE",
+] as const;
+
+function parseBooleanFromEnv(rawValue: string | undefined): boolean | null {
+  if (rawValue === undefined) return null;
+  const lower = rawValue.trim().toLowerCase();
+  if (lower === "true" || lower === "1" || lower === "yes") return true;
+  if (lower === "false" || lower === "0" || lower === "no") return false;
+  return null;
+}
+
+function parseNumberFromEnv(rawValue: string | undefined): number | null {
+  if (!rawValue || rawValue.trim().length === 0) return null;
+  const parsed = Number(rawValue);
+  if (!Number.isFinite(parsed)) return null;
+  return parsed;
+}
+
+function parseEnumFromEnv<T extends string>(rawValue: string | undefined, allowedValues: readonly T[]): T | null {
+  const value = rawValue?.trim();
+  return value && allowedValues.includes(value as T) ? (value as T) : null;
+}
+
+function resolvePathFromEnv(rawValue: string | undefined): string | null {
+  if (!rawValue || rawValue.trim().length === 0) return null;
+  return path.resolve(expandHomePrefix(rawValue.trim()));
+}
+
+function quickstartDefaultsFromEnv(): {
+  defaults: OnboardDefaults;
+  usedEnvKeys: string[];
+  ignoredEnvKeys: Array<{ key: string; reason: string }>;
+} {
  const instanceId = resolvePaperclipInstanceId();
-  return {
+  const defaultStorage = defaultStorageConfig();
+  const defaultSecrets = defaultSecretsConfig();
+  const databaseUrl = process.env.DATABASE_URL?.trim() || undefined;
+  const publicUrl =
+    process.env.PAPERCLIP_PUBLIC_URL?.trim() ||
+    process.env.PAPERCLIP_AUTH_PUBLIC_BASE_URL?.trim() ||
+    process.env.BETTER_AUTH_URL?.trim() ||
+    process.env.BETTER_AUTH_BASE_URL?.trim() ||
+    undefined;
+  const deploymentMode =
+    parseEnumFromEnv<DeploymentMode>(process.env.PAPERCLIP_DEPLOYMENT_MODE, DEPLOYMENT_MODES) ?? "local_trusted";
+  const deploymentExposureFromEnv = parseEnumFromEnv<DeploymentExposure>(
+    process.env.PAPERCLIP_DEPLOYMENT_EXPOSURE,
+    DEPLOYMENT_EXPOSURES,
+  );
+  const deploymentExposure =
+    deploymentMode === "local_trusted" ? "private" : (deploymentExposureFromEnv ?? "private");
+  const authPublicBaseUrl = publicUrl;
+  const authBaseUrlModeFromEnv = parseEnumFromEnv<AuthBaseUrlMode>(
+    process.env.PAPERCLIP_AUTH_BASE_URL_MODE,
+    AUTH_BASE_URL_MODES,
+  );
+  const authBaseUrlMode = authBaseUrlModeFromEnv ?? (authPublicBaseUrl ? "explicit" : "auto");
+  const allowedHostnamesFromEnv = process.env.PAPERCLIP_ALLOWED_HOSTNAMES
+    ? process.env.PAPERCLIP_ALLOWED_HOSTNAMES
+      .split(",")
+      .map((value) => value.trim().toLowerCase())
+      .filter((value) => value.length > 0)
+    : [];
+  const hostnameFromPublicUrl = publicUrl
+    ? (() => {
+      try {
+        return new URL(publicUrl).hostname.trim().toLowerCase();
+      } catch {
+        return null;
+      }
+    })()
+    : null;
+  const storageProvider =
+    parseEnumFromEnv<StorageProvider>(process.env.PAPERCLIP_STORAGE_PROVIDER, STORAGE_PROVIDERS) ??
+    defaultStorage.provider;
+  const secretsProvider =
+    parseEnumFromEnv<SecretProvider>(process.env.PAPERCLIP_SECRETS_PROVIDER, SECRET_PROVIDERS) ??
+    defaultSecrets.provider;
+  const databaseBackupEnabled = parseBooleanFromEnv(process.env.PAPERCLIP_DB_BACKUP_ENABLED) ?? true;
+  const databaseBackupIntervalMinutes = Math.max(
+    1,
+    parseNumberFromEnv(process.env.PAPERCLIP_DB_BACKUP_INTERVAL_MINUTES) ?? 60,
+  );
+  const databaseBackupRetentionDays = Math.max(
+    1,
+    parseNumberFromEnv(process.env.PAPERCLIP_DB_BACKUP_RETENTION_DAYS) ?? 30,
+  );
+  const defaults: OnboardDefaults = {
    database: {
-      mode: "embedded-postgres",
+      mode: databaseUrl ? "postgres" : "embedded-postgres",
+      ...(databaseUrl ? { connectionString: databaseUrl } : {}),
      embeddedPostgresDataDir: resolveDefaultEmbeddedPostgresDir(instanceId),
      embeddedPostgresPort: 54329,
+      backup: {
+        enabled: databaseBackupEnabled,
+        intervalMinutes: databaseBackupIntervalMinutes,
+        retentionDays: databaseBackupRetentionDays,
+        dir: resolvePathFromEnv(process.env.PAPERCLIP_DB_BACKUP_DIR) ?? resolveDefaultBackupDir(instanceId),
+      },
    },
    logging: {
      mode: "file",
      logDir: resolveDefaultLogsDir(instanceId),
    },
    server: {
-      deploymentMode: "local_trusted",
-      exposure: "private",
-      host: "127.0.0.1",
-      port: 3100,
-      allowedHostnames: [],
-      serveUi: true,
+      deploymentMode,
+      exposure: deploymentExposure,
+      host: process.env.HOST?.trim() || "127.0.0.1",
+      port: Number(process.env.PORT) || 3100,
+      allowedHostnames: Array.from(new Set([...allowedHostnamesFromEnv, ...(hostnameFromPublicUrl ? [hostnameFromPublicUrl] : [])])),
+      serveUi: parseBooleanFromEnv(process.env.SERVE_UI) ?? true,
    },
    auth: {
-      baseUrlMode: "auto",
+      baseUrlMode: authBaseUrlMode,
+      disableSignUp: false,
+      ...(authPublicBaseUrl ? { publicBaseUrl: authPublicBaseUrl } : {}),
+    },
+    storage: {
+      provider: storageProvider,
+      localDisk: {
+        baseDir:
+          resolvePathFromEnv(process.env.PAPERCLIP_STORAGE_LOCAL_DIR) ?? defaultStorage.localDisk.baseDir,
+      },
+      s3: {
+        bucket: process.env.PAPERCLIP_STORAGE_S3_BUCKET ?? defaultStorage.s3.bucket,
+        region: process.env.PAPERCLIP_STORAGE_S3_REGION ?? defaultStorage.s3.region,
+        endpoint: process.env.PAPERCLIP_STORAGE_S3_ENDPOINT ?? defaultStorage.s3.endpoint,
+        prefix: process.env.PAPERCLIP_STORAGE_S3_PREFIX ?? defaultStorage.s3.prefix,
+        forcePathStyle:
+          parseBooleanFromEnv(process.env.PAPERCLIP_STORAGE_S3_FORCE_PATH_STYLE) ??
+          defaultStorage.s3.forcePathStyle,
+      },
+    },
+    secrets: {
+      provider: secretsProvider,
+      strictMode: parseBooleanFromEnv(process.env.PAPERCLIP_SECRETS_STRICT_MODE) ?? defaultSecrets.strictMode,
+      localEncrypted: {
+        keyFilePath:
+          resolvePathFromEnv(process.env.PAPERCLIP_SECRETS_MASTER_KEY_FILE) ??
+          defaultSecrets.localEncrypted.keyFilePath,
+      },
    },
-    storage: defaultStorageConfig(),
-    secrets: defaultSecretsConfig(),
  };
+  const ignoredEnvKeys: Array<{ key: string; reason: string }> = [];
+  if (deploymentMode === "local_trusted" && process.env.PAPERCLIP_DEPLOYMENT_EXPOSURE !== undefined) {
+    ignoredEnvKeys.push({
+      key: "PAPERCLIP_DEPLOYMENT_EXPOSURE",
+      reason: "Ignored because deployment mode local_trusted always forces private exposure",
+    });
+  }
+
+  const ignoredKeySet = new Set(ignoredEnvKeys.map((entry) => entry.key));
+  const usedEnvKeys = ONBOARD_ENV_KEYS.filter(
+    (key) => process.env[key] !== undefined && !ignoredKeySet.has(key),
+  );
+  return { defaults, usedEnvKeys, ignoredEnvKeys };
+}
+
+function canCreateBootstrapInviteImmediately(config: Pick<PaperclipConfig, "database" | "server">): boolean {
+  return config.server.deploymentMode === "authenticated" && config.database.mode !== "embedded-postgres";
 }

 export async function onboard(opts: OnboardOptions): Promise<void> {
@@ -109,6 +286,7 @@ export async function onboard(opts: OnboardOptions): Promise<void> {
  }

  let llm: PaperclipConfig["llm"] | undefined;
+  const { defaults: derivedDefaults, usedEnvKeys, ignoredEnvKeys } = quickstartDefaultsFromEnv();
  let {
    database,
    logging,
@@ -116,11 +294,11 @@ export async function onboard(opts: OnboardOptions): Promise<void> {
    auth,
    storage,
    secrets,
-  } = quickstartDefaults();
+  } = derivedDefaults;

  if (setupMode === "advanced") {
    p.log.step(pc.bold("Database"));
-    database = await promptDatabase();
+    database = await promptDatabase(database);

    if (database.mode === "postgres" && database.connectionString) {
      const s = p.spinner();
@@ -184,13 +362,20 @@ export async function onboard(opts: OnboardOptions): Promise<void> {
    logging = await promptLogging();

    p.log.step(pc.bold("Server"));
-    ({ server, auth } = await promptServer());
+    ({ server, auth } = await promptServer({ currentServer: server, currentAuth: auth }));

    p.log.step(pc.bold("Storage"));
-    storage = await promptStorage(defaultStorageConfig());
+    storage = await promptStorage(storage);

    p.log.step(pc.bold("Secrets"));
-    secrets = defaultSecretsConfig();
+    const secretsDefaults = defaultSecretsConfig();
+    secrets = {
+      provider: secrets.provider ?? secretsDefaults.provider,
+      strictMode: secrets.strictMode ?? secretsDefaults.strictMode,
+      localEncrypted: {
+        keyFilePath: secrets.localEncrypted?.keyFilePath ?? secretsDefaults.localEncrypted.keyFilePath,
+      },
+    };
    p.log.message(
      pc.dim(
        `Using defaults: provider=${secrets.provider}, strictMode=${secrets.strictMode}, keyFile=${secrets.localEncrypted.keyFilePath}`,
@@ -198,9 +383,17 @@ export async function onboard(opts: OnboardOptions): Promise<void> {
    );
  } else {
    p.log.step(pc.bold("Quickstart"));
-    p.log.message(
-      pc.dim("Using local defaults: embedded database, no LLM provider, file storage, and local encrypted secrets."),
-    );
+    p.log.message(pc.dim("Using quickstart defaults."));
+    if (usedEnvKeys.length > 0) {
+      p.log.message(pc.dim(`Environment-aware defaults active (${usedEnvKeys.length} env var(s) detected).`));
+    } else {
+      p.log.message(
+        pc.dim("No environment overrides detected: embedded database, file storage, local encrypted secrets."),
+      );
+    }
+    for (const ignored of ignoredEnvKeys) {
+      p.log.message(pc.dim(`Ignored ${ignored.key}: ${ignored.reason}`));
+    }
  }

  const jwtSecret = ensureAgentJwtSecret(configPath);
@@ -261,7 +454,7 @@ export async function onboard(opts: OnboardOptions): Promise<void> {
    "Next commands",
  );

-  if (server.deploymentMode === "authenticated") {
+  if (canCreateBootstrapInviteImmediately({ database, server })) {
    p.log.step("Generating bootstrap CEO invite");
    await bootstrapCeoInvite({ config: configPath });
  }
@@ -284,5 +477,15 @@ export async function onboard(opts: OnboardOptions): Promise<void> {
    return;
  }

+  if (server.deploymentMode === "authenticated" && database.mode === "embedded-postgres") {
+    p.log.info(
+      [
+        "Bootstrap CEO invite will be created after the server starts.",
+        `Next: ${pc.cyan("paperclipai run")}`,
+        `Then: ${pc.cyan("paperclipai auth bootstrap-ceo")}`,
+      ].join("\n"),
+    );
+  }
+
  p.outro("You're all set!");
 }
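Review note: the quickstart defaults above resolve the public URL from the first non-empty variable in a fixed order and fall back to a default when a boolean is unrecognized. The sketch below restates that behavior in isolation. The helper names `firstNonEmpty` and `parseBoolean` and the example URL are assumptions made for illustration; only the env keys and the accepted literals come from the diff.

```typescript
// Hypothetical helpers; not part of the change set.
type Env = Record<string, string | undefined>;

const PUBLIC_URL_KEYS = [
  "PAPERCLIP_PUBLIC_URL",
  "PAPERCLIP_AUTH_PUBLIC_BASE_URL",
  "BETTER_AUTH_URL",
  "BETTER_AUTH_BASE_URL",
] as const;

// Returns the first trimmed, non-empty value, mirroring the `?.trim() || ...` chain.
function firstNonEmpty(env: Env, keys: readonly string[]): string | undefined {
  for (const key of keys) {
    const value = env[key]?.trim();
    if (value) return value;
  }
  return undefined;
}

// Accepts the same literals as parseBooleanFromEnv: true/1/yes and false/0/no.
function parseBoolean(raw: string | undefined): boolean | null {
  if (raw === undefined) return null;
  const lower = raw.trim().toLowerCase();
  if (lower === "true" || lower === "1" || lower === "yes") return true;
  if (lower === "false" || lower === "0" || lower === "no") return false;
  return null;
}

const exampleEnv: Env = {
  PAPERCLIP_PUBLIC_URL: "   ", // whitespace-only, so it is skipped
  BETTER_AUTH_URL: "https://pc.example.test",
};
const publicUrl = firstNonEmpty(exampleEnv, PUBLIC_URL_KEYS);
const serveUi = parseBoolean("maybe") ?? true; // unrecognized input falls back to the default
```

A whitespace-only variable is treated as unset here, which is why the later key wins in the example.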
--- a/cli/src/commands/run.ts
+++ b/cli/src/commands/run.ts
@@ -3,9 +3,13 @@ import path from "node:path";
 import { fileURLToPath, pathToFileURL } from "node:url";
 import * as p from "@clack/prompts";
 import pc from "picocolors";
+import { bootstrapCeoInvite } from "./auth-bootstrap-ceo.js";
 import { onboard } from "./onboard.js";
 import { doctor } from "./doctor.js";
+import { loadPaperclipEnvFile } from "../config/env.js";
 import { configExists, resolveConfigPath } from "../config/store.js";
+import type { PaperclipConfig } from "../config/schema.js";
+import { readConfig } from "../config/store.js";
 import {
  describeLocalInstancePaths,
  resolvePaperclipHomeDir,
@@ -19,6 +23,13 @@ interface RunOptions {
  yes?: boolean;
 }

+interface StartedServer {
+  apiUrl: string;
+  databaseUrl: string;
+  host: string;
+  listenPort: number;
+}
+
 export async function runCommand(opts: RunOptions): Promise<void> {
  const instanceId = resolvePaperclipInstanceId(opts.instance);
  process.env.PAPERCLIP_INSTANCE_ID = instanceId;
@@ -31,6 +42,7 @@ export async function runCommand(opts: RunOptions): Promise<void> {

  const configPath = resolveConfigPath(opts.config);
  process.env.PAPERCLIP_CONFIG = configPath;
+  loadPaperclipEnvFile(configPath);

  p.intro(pc.bgCyan(pc.black(" paperclipai run ")));
  p.log.message(pc.dim(`Home: ${paths.homeDir}`));
@@ -60,8 +72,41 @@ export async function runCommand(opts: RunOptions): Promise<void> {
    process.exit(1);
  }

+  const config = readConfig(configPath);
+  if (!config) {
+    p.log.error(`No config found at ${configPath}.`);
+    process.exit(1);
+  }
+
  p.log.step("Starting Paperclip server...");
-  await importServerEntry();
+  const startedServer = await importServerEntry();
+
+  if (shouldGenerateBootstrapInviteAfterStart(config)) {
+    p.log.step("Generating bootstrap CEO invite");
+    await bootstrapCeoInvite({
+      config: configPath,
+      dbUrl: startedServer.databaseUrl,
+      baseUrl: resolveBootstrapInviteBaseUrl(config, startedServer),
+    });
+  }
+}
+
+function resolveBootstrapInviteBaseUrl(
+  config: PaperclipConfig,
+  startedServer: StartedServer,
+): string {
+  const explicitBaseUrl =
+    process.env.PAPERCLIP_PUBLIC_URL?.trim() ||
+    process.env.PAPERCLIP_AUTH_PUBLIC_BASE_URL?.trim() ||
+    process.env.BETTER_AUTH_URL?.trim() ||
+    process.env.BETTER_AUTH_BASE_URL?.trim() ||
+    (config.auth.baseUrlMode === "explicit" ? config.auth.publicBaseUrl : undefined);
+
+  if (typeof explicitBaseUrl === "string" && explicitBaseUrl.trim().length > 0) {
+    return explicitBaseUrl.trim().replace(/\/+$/, "");
+  }
+
+  return startedServer.apiUrl.replace(/\/api$/, "");
 }

 function formatError(err: unknown): string {
@@ -84,6 +129,15 @@ function isModuleNotFoundError(err: unknown): boolean {
  return err.message.includes("Cannot find module");
 }

+function getMissingModuleSpecifier(err: unknown): string | null {
+  if (!(err instanceof Error)) return null;
+  const packageMatch = err.message.match(/Cannot find package '([^']+)' imported from/);
+  if (packageMatch?.[1]) return packageMatch[1];
+  const moduleMatch = err.message.match(/Cannot find module '([^']+)'/);
+  if (moduleMatch?.[1]) return moduleMatch[1];
+  return null;
+}
+
 function maybeEnableUiDevMiddleware(entrypoint: string): void {
  if (process.env.PAPERCLIP_UI_DEV_MIDDLEWARE !== undefined) return;
  const normalized = entrypoint.replaceAll("\\", "/");
@@ -92,24 +146,45 @@ function maybeEnableUiDevMiddleware(entrypoint: string): void {
  }
 }

-async function importServerEntry(): Promise<void> {
+async function importServerEntry(): Promise<StartedServer> {
  // Dev mode: try local workspace path (monorepo with tsx)
  const projectRoot = path.resolve(path.dirname(fileURLToPath(import.meta.url)), "../../..");
  const devEntry = path.resolve(projectRoot, "server/src/index.ts");
  if (fs.existsSync(devEntry)) {
    maybeEnableUiDevMiddleware(devEntry);
-    await import(pathToFileURL(devEntry).href);
-    return;
+    const mod = await import(pathToFileURL(devEntry).href);
+    return await startServerFromModule(mod, devEntry);
  }

  // Production mode: import the published @paperclipai/server package
  try {
-    await import("@paperclipai/server");
+    const mod = await import("@paperclipai/server");
+    return await startServerFromModule(mod, "@paperclipai/server");
  } catch (err) {
+    const missingSpecifier = getMissingModuleSpecifier(err);
+    const missingServerEntrypoint = !missingSpecifier || missingSpecifier === "@paperclipai/server";
+    if (isModuleNotFoundError(err) && missingServerEntrypoint) {
+      throw new Error(
+        `Could not locate a Paperclip server entrypoint.\n` +
+          `Tried: ${devEntry}, @paperclipai/server\n` +
+          `${formatError(err)}`,
+      );
+    }
    throw new Error(
-      `Could not locate a Paperclip server entrypoint.\n` +
-        `Tried: ${devEntry}, @paperclipai/server\n` +
+      `Paperclip server failed to start.\n` +
        `${formatError(err)}`,
    );
  }
 }
+
+function shouldGenerateBootstrapInviteAfterStart(config: PaperclipConfig): boolean {
+  return config.server.deploymentMode === "authenticated" && config.database.mode === "embedded-postgres";
+}
+
+async function startServerFromModule(mod: unknown, label: string): Promise<StartedServer> {
+  const startServer = (mod as { startServer?: () => Promise<StartedServer> }).startServer;
+  if (typeof startServer !== "function") {
+    throw new Error(`Paperclip server entrypoint did not export startServer(): ${label}`);
+  }
+  return await startServer();
+}
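Review note: `importServerEntry` now distinguishes "the server package is missing" from "the server failed while starting" by extracting the missing specifier from the error message. The sketch below copies `getMissingModuleSpecifier` from the diff and runs it against two hand-written messages. The example messages and the `/app/...` path are assumptions modelled on typical Node.js wording, not captured output.

```typescript
// Copied from the diff so the snippet runs on its own.
function getMissingModuleSpecifier(err: unknown): string | null {
  if (!(err instanceof Error)) return null;
  const packageMatch = err.message.match(/Cannot find package '([^']+)' imported from/);
  if (packageMatch?.[1]) return packageMatch[1];
  const moduleMatch = err.message.match(/Cannot find module '([^']+)'/);
  if (moduleMatch?.[1]) return moduleMatch[1];
  return null;
}

// Illustrative messages (assumed wording).
const missingServer = getMissingModuleSpecifier(
  new Error("Cannot find package '@paperclipai/server' imported from /app/cli/dist/index.js"),
);
const missingDependency = getMissingModuleSpecifier(new Error("Cannot find module 'pg'"));
const notAnError = getMissingModuleSpecifier("plain string");
```

Only the first case would be reported as a missing entrypoint. A missing transitive dependency such as the second case surfaces as "Paperclip server failed to start."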
--- a/cli/src/commands/worktree-lib.ts
+++ b/cli/src/commands/worktree-lib.ts
@@ -0,0 +1,274 @@
+import { randomInt } from "node:crypto";
+import path from "node:path";
+import type { PaperclipConfig } from "../config/schema.js";
+import { expandHomePrefix } from "../config/home.js";
+
+export const DEFAULT_WORKTREE_HOME = "~/.paperclip-worktrees";
+export const WORKTREE_SEED_MODES = ["minimal", "full"] as const;
+
+export type WorktreeSeedMode = (typeof WORKTREE_SEED_MODES)[number];
+
+export type WorktreeSeedPlan = {
+  mode: WorktreeSeedMode;
+  excludedTables: string[];
+  nullifyColumns: Record<string, string[]>;
+};
+
+const MINIMAL_WORKTREE_EXCLUDED_TABLES = [
+  "activity_log",
+  "agent_runtime_state",
+  "agent_task_sessions",
+  "agent_wakeup_requests",
+  "cost_events",
+  "heartbeat_run_events",
+  "heartbeat_runs",
+  "workspace_runtime_services",
+];
+
+const MINIMAL_WORKTREE_NULLIFIED_COLUMNS: Record<string, string[]> = {
+  issues: ["checkout_run_id", "execution_run_id"],
+};
+
+export type WorktreeLocalPaths = {
+  cwd: string;
+  repoConfigDir: string;
+  configPath: string;
+  envPath: string;
+  homeDir: string;
+  instanceId: string;
+  instanceRoot: string;
+  contextPath: string;
+  embeddedPostgresDataDir: string;
+  backupDir: string;
+  logDir: string;
+  secretsKeyFilePath: string;
+  storageDir: string;
+};
+
+export type WorktreeUiBranding = {
+  name: string;
+  color: string;
+};
+
+export function isWorktreeSeedMode(value: string): value is WorktreeSeedMode {
+  return (WORKTREE_SEED_MODES as readonly string[]).includes(value);
+}
+
+export function resolveWorktreeSeedPlan(mode: WorktreeSeedMode): WorktreeSeedPlan {
+  if (mode === "full") {
+    return {
+      mode,
+      excludedTables: [],
+      nullifyColumns: {},
+    };
+  }
+  return {
+    mode,
+    excludedTables: [...MINIMAL_WORKTREE_EXCLUDED_TABLES],
+    nullifyColumns: {
+      ...MINIMAL_WORKTREE_NULLIFIED_COLUMNS,
+    },
+  };
+}
+
+function nonEmpty(value: string | null | undefined): string | null {
+  return typeof value === "string" && value.trim().length > 0 ? value.trim() : null;
+}
+
+function isLoopbackHost(hostname: string): boolean {
+  const value = hostname.trim().toLowerCase();
+  return value === "127.0.0.1" || value === "localhost" || value === "::1" || value === "[::1]";
+}
+
+export function sanitizeWorktreeInstanceId(rawValue: string): string {
+  const trimmed = rawValue.trim().toLowerCase();
+  const normalized = trimmed
+    .replace(/[^a-z0-9_-]+/g, "-")
+    .replace(/-+/g, "-")
+    .replace(/^[-_]+|[-_]+$/g, "");
+  return normalized || "worktree";
+}
+
+export function resolveSuggestedWorktreeName(cwd: string, explicitName?: string): string {
+  return nonEmpty(explicitName) ?? path.basename(path.resolve(cwd));
+}
+
+function hslComponentToHex(n: number): string {
+  return Math.round(Math.max(0, Math.min(255, n)))
+    .toString(16)
+    .padStart(2, "0");
+}
+
+function hslToHex(hue: number, saturation: number, lightness: number): string {
+  const s = Math.max(0, Math.min(100, saturation)) / 100;
+  const l = Math.max(0, Math.min(100, lightness)) / 100;
+  const c = (1 - Math.abs((2 * l) - 1)) * s;
+  const h = ((hue % 360) + 360) % 360;
+  const x = c * (1 - Math.abs(((h / 60) % 2) - 1));
+  const m = l - (c / 2);
+
+  let r = 0;
+  let g = 0;
+  let b = 0;
+
+  if (h < 60) {
+    r = c;
+    g = x;
+  } else if (h < 120) {
+    r = x;
+    g = c;
+  } else if (h < 180) {
+    g = c;
+    b = x;
+  } else if (h < 240) {
+    g = x;
+    b = c;
+  } else if (h < 300) {
+    r = x;
+    b = c;
+  } else {
+    r = c;
+    b = x;
+  }
+
+  return `#${hslComponentToHex((r + m) * 255)}${hslComponentToHex((g + m) * 255)}${hslComponentToHex((b + m) * 255)}`;
+}
+
+export function generateWorktreeColor(): string {
+  return hslToHex(randomInt(0, 360), 68, 56);
+}
+
+export function resolveWorktreeLocalPaths(opts: {
+  cwd: string;
+  homeDir?: string;
+  instanceId: string;
+}): WorktreeLocalPaths {
+  const cwd = path.resolve(opts.cwd);
+  const homeDir = path.resolve(expandHomePrefix(opts.homeDir ?? DEFAULT_WORKTREE_HOME));
+  const instanceRoot = path.resolve(homeDir, "instances", opts.instanceId);
+  const repoConfigDir = path.resolve(cwd, ".paperclip");
+  return {
+    cwd,
+    repoConfigDir,
+    configPath: path.resolve(repoConfigDir, "config.json"),
+    envPath: path.resolve(repoConfigDir, ".env"),
+    homeDir,
+    instanceId: opts.instanceId,
+    instanceRoot,
+    contextPath: path.resolve(homeDir, "context.json"),
+    embeddedPostgresDataDir: path.resolve(instanceRoot, "db"),
+    backupDir: path.resolve(instanceRoot, "data", "backups"),
+    logDir: path.resolve(instanceRoot, "logs"),
+    secretsKeyFilePath: path.resolve(instanceRoot, "secrets", "master.key"),
+    storageDir: path.resolve(instanceRoot, "data", "storage"),
+  };
+}
+
+export function rewriteLocalUrlPort(rawUrl: string | undefined, port: number): string | undefined {
+  if (!rawUrl) return undefined;
+  try {
+    const parsed = new URL(rawUrl);
+    if (!isLoopbackHost(parsed.hostname)) return rawUrl;
+    parsed.port = String(port);
+    return parsed.toString();
+  } catch {
+    return rawUrl;
+  }
+}
+
+export function buildWorktreeConfig(input: {
+  sourceConfig: PaperclipConfig | null;
+  paths: WorktreeLocalPaths;
+  serverPort: number;
+  databasePort: number;
+  now?: Date;
+}): PaperclipConfig {
+  const { sourceConfig, paths, serverPort, databasePort } = input;
+  const nowIso = (input.now ?? new Date()).toISOString();
+
+  const source = sourceConfig;
+  const authPublicBaseUrl = rewriteLocalUrlPort(source?.auth.publicBaseUrl, serverPort);
+
+  return {
+    $meta: {
+      version: 1,
+      updatedAt: nowIso,
+      source: "configure",
+    },
+    ...(source?.llm ? { llm: source.llm } : {}),
+    database: {
+      mode: "embedded-postgres",
+      embeddedPostgresDataDir: paths.embeddedPostgresDataDir,
+      embeddedPostgresPort: databasePort,
+      backup: {
+        enabled: source?.database.backup.enabled ?? true,
+        intervalMinutes: source?.database.backup.intervalMinutes ?? 60,
+        retentionDays: source?.database.backup.retentionDays ?? 30,
+        dir: paths.backupDir,
+      },
+    },
+    logging: {
+      mode: source?.logging.mode ?? "file",
+      logDir: paths.logDir,
+    },
+    server: {
+      deploymentMode: source?.server.deploymentMode ?? "local_trusted",
+      exposure: source?.server.exposure ?? "private",
+      host: source?.server.host ?? "127.0.0.1",
+      port: serverPort,
+      allowedHostnames: source?.server.allowedHostnames ?? [],
+      serveUi: source?.server.serveUi ?? true,
+    },
+    auth: {
+      baseUrlMode: source?.auth.baseUrlMode ?? "auto",
+      ...(authPublicBaseUrl ? { publicBaseUrl: authPublicBaseUrl } : {}),
+      disableSignUp: source?.auth.disableSignUp ?? false,
+    },
+    storage: {
+      provider: source?.storage.provider ?? "local_disk",
+      localDisk: {
+        baseDir: paths.storageDir,
+      },
+      s3: {
+        bucket: source?.storage.s3.bucket ?? "paperclip",
+        region: source?.storage.s3.region ?? "us-east-1",
+        endpoint: source?.storage.s3.endpoint,
+        prefix: source?.storage.s3.prefix ?? "",
+        forcePathStyle: source?.storage.s3.forcePathStyle ?? false,
+      },
+    },
+    secrets: {
+      provider: source?.secrets.provider ?? "local_encrypted",
+      strictMode: source?.secrets.strictMode ?? false,
+      localEncrypted: {
+        keyFilePath: paths.secretsKeyFilePath,
+      },
+    },
+  };
+}
+
+export function buildWorktreeEnvEntries(
+  paths: WorktreeLocalPaths,
+  branding?: WorktreeUiBranding,
+): Record<string, string> {
+  return {
+    PAPERCLIP_HOME: paths.homeDir,
+    PAPERCLIP_INSTANCE_ID: paths.instanceId,
+    PAPERCLIP_CONFIG: paths.configPath,
+    PAPERCLIP_CONTEXT: paths.contextPath,
+    PAPERCLIP_IN_WORKTREE: "true",
+    ...(branding?.name ? { PAPERCLIP_WORKTREE_NAME: branding.name } : {}),
+    ...(branding?.color ? { PAPERCLIP_WORKTREE_COLOR: branding.color } : {}),
+  };
+}
+
+function shellEscape(value: string): string {
+  return `'${value.replaceAll("'", `'\"'\"'`)}'`;
+}
+
+export function formatShellExports(entries: Record<string, string>): string {
+  return Object.entries(entries)
+    .filter(([, value]) => typeof value === "string" && value.trim().length > 0)
+    .map(([key, value]) => `export ${key}=${shellEscape(value)}`)
+    .join("\n");
+}
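Review note: `formatShellExports` single-quotes every value and drops blank ones. The sketch below restates both helpers so the quoting rule can be checked in isolation. It is an approximation: it uses a global regex and `Object.keys` instead of `replaceAll` and `Object.entries`, which should behave the same for these inputs. The sample entries are made up.

```typescript
// Wraps the value in single quotes; an embedded ' becomes '"'"' (close, quoted ', reopen).
function shellEscape(value: string): string {
  return `'${value.replace(/'/g, `'"'"'`)}'`;
}

// Emits one `export KEY='value'` line per non-blank entry, in insertion order.
function formatShellExports(entries: Record<string, string>): string {
  return Object.keys(entries)
    .filter((key) => entries[key].trim().length > 0)
    .map((key) => `export ${key}=${shellEscape(entries[key])}`)
    .join("\n");
}

// Made-up sample values for illustration.
const exportsText = formatShellExports({
  PAPERCLIP_HOME: "/tmp/pc home",
  PAPERCLIP_WORKTREE_NAME: "it's-mine",
  PAPERCLIP_WORKTREE_COLOR: "",
});
```

Single quotes keep spaces and `$` literal in POSIX shells, so the only character that needs special handling is the single quote itself.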
--- a/cli/src/commands/worktree.ts
+++ b/cli/src/commands/worktree.ts
--- a/cli/src/config/env.ts
+++ b/cli/src/config/env.ts
@@ -22,20 +22,35 @@ function parseEnvFile(contents: string) {
  }
 }

+function formatEnvValue(value: string): string {
+  if (/^[A-Za-z0-9_./:@-]+$/.test(value)) {
+    return value;
+  }
+  return JSON.stringify(value);
+}
+
 function renderEnvFile(entries: Record<string, string>) {
  const lines = [
    "# Paperclip environment variables",
-    "# Generated by `paperclipai onboard`",
-    ...Object.entries(entries).map(([key, value]) => `${key}=${value}`),
+    "# Generated by Paperclip CLI commands",
+    ...Object.entries(entries).map(([key, value]) => `${key}=${formatEnvValue(value)}`),
    "",
  ];
  return lines.join("\n");
 }

+export function resolvePaperclipEnvFile(configPath?: string): string {
+  return resolveEnvFilePath(configPath);
+}
+
 export function resolveAgentJwtEnvFile(configPath?: string): string {
  return resolveEnvFilePath(configPath);
 }

+export function loadPaperclipEnvFile(configPath?: string): void {
+  loadAgentJwtEnvFile(resolveEnvFilePath(configPath));
+}
+
 export function loadAgentJwtEnvFile(filePath = resolveEnvFilePath()): void {
  if (loadedEnvFiles.has(filePath)) return;

@@ -78,13 +93,33 @@ export function ensureAgentJwtSecret(configPath?: string): { secret: string; cre
 }

 export function writeAgentJwtEnv(secret: string, filePath = resolveEnvFilePath()): void {
+  mergePaperclipEnvEntries({ [JWT_SECRET_ENV_KEY]: secret }, filePath);
+}
+
+export function readPaperclipEnvEntries(filePath = resolveEnvFilePath()): Record<string, string> {
+  if (!fs.existsSync(filePath)) return {};
+  return parseEnvFile(fs.readFileSync(filePath, "utf-8"));
+}
+
+export function writePaperclipEnvEntries(entries: Record<string, string>, filePath = resolveEnvFilePath()): void {
  const dir = path.dirname(filePath);
  fs.mkdirSync(dir, { recursive: true });
-
-  const current = fs.existsSync(filePath) ? parseEnvFile(fs.readFileSync(filePath, "utf-8")) : {};
-  current[JWT_SECRET_ENV_KEY] = secret;
-
-  fs.writeFileSync(filePath, renderEnvFile(current), {
+  fs.writeFileSync(filePath, renderEnvFile(entries), {
    mode: 0o600,
  });
 }
+
+export function mergePaperclipEnvEntries(
+  entries: Record<string, string>,
+  filePath = resolveEnvFilePath(),
+): Record<string, string> {
+  const current = readPaperclipEnvEntries(filePath);
+  const next = {
+    ...current,
+    ...Object.fromEntries(
+      Object.entries(entries).filter(([, value]) => typeof value === "string" && value.trim().length > 0),
+    ),
+  };
+  writePaperclipEnvEntries(next, filePath);
+  return next;
+}
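Review note: `mergePaperclipEnvEntries` keeps existing keys, ignores blank incoming values, and relies on `formatEnvValue` to quote anything outside a conservative character set. The sketch below models that merge in memory, without the file I/O. `mergeEntries` and the sample keys are hypothetical names used for illustration.

```typescript
// Same rule as formatEnvValue in the diff: plain tokens pass through, everything else is JSON-quoted.
function formatEnvValue(value: string): string {
  return /^[A-Za-z0-9_./:@-]+$/.test(value) ? value : JSON.stringify(value);
}

// Hypothetical in-memory counterpart of mergePaperclipEnvEntries (no disk access).
function mergeEntries(
  current: Record<string, string>,
  incoming: Record<string, string>,
): Record<string, string> {
  const next: Record<string, string> = { ...current };
  for (const key of Object.keys(incoming)) {
    const value = incoming[key];
    // Blank values never overwrite an existing entry.
    if (value.trim().length > 0) next[key] = value;
  }
  return next;
}

const merged = mergeEntries(
  { EXISTING_SECRET: "keep-me" },
  { EXISTING_SECRET: "   ", BASE_URL: "http://localhost:3100", NOTE: "hello world" },
);
const renderedLines = Object.keys(merged).map((key) => `${key}=${formatEnvValue(merged[key])}`);
```

Because blank values are filtered out, a caller cannot clear a key through this merge path; removing an entry would need `writePaperclipEnvEntries` with the full replacement set.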
--- a/cli/src/config/home.ts
+++ b/cli/src/config/home.ts
@@ -49,6 +49,10 @@ export function resolveDefaultStorageDir(instanceId?: string): string {
  return path.resolve(resolvePaperclipInstanceRoot(instanceId), "data", "storage");
 }

+export function resolveDefaultBackupDir(instanceId?: string): string {
+  return path.resolve(resolvePaperclipInstanceRoot(instanceId), "data", "backups");
+}
+
 export function expandHomePrefix(value: string): string {
  if (value === "~") return os.homedir();
  if (value.startsWith("~/")) return path.resolve(os.homedir(), value.slice(2));
@@ -64,6 +68,7 @@ export function describeLocalInstancePaths(instanceId?: string) {
    instanceRoot,
    configPath: resolveDefaultConfigPath(resolvedInstanceId),
    embeddedPostgresDataDir: resolveDefaultEmbeddedPostgresDir(resolvedInstanceId),
+    backupDir: resolveDefaultBackupDir(resolvedInstanceId),
    logDir: resolveDefaultLogsDir(resolvedInstanceId),
    secretsKeyFilePath: resolveDefaultSecretsKeyFilePath(resolvedInstanceId),
    storageDir: resolveDefaultStorageDir(resolvedInstanceId),
--- a/cli/src/config/schema.ts
+++ b/cli/src/config/schema.ts
@@ -2,6 +2,7 @@ export {
  paperclipConfigSchema,
  configMetaSchema,
  llmConfigSchema,
+  databaseBackupConfigSchema,
  databaseConfigSchema,
  loggingConfigSchema,
  serverConfigSchema,
@@ -13,6 +14,7 @@ export {
  secretsLocalEncryptedConfigSchema,
  type PaperclipConfig,
  type LlmConfig,
+  type DatabaseBackupConfig,
  type DatabaseConfig,
  type LoggingConfig,
  type ServerConfig,
--- a/cli/src/index.ts
+++ b/cli/src/index.ts
@@ -7,6 +7,7 @@ import { addAllowedHostname } from "./commands/allowed-hostname.js";
 import { heartbeatRun } from "./commands/heartbeat-run.js";
 import { runCommand } from "./commands/run.js";
 import { bootstrapCeoInvite } from "./commands/auth-bootstrap-ceo.js";
+import { dbBackupCommand } from "./commands/db-backup.js";
 import { registerContextCommands } from "./commands/client/context.js";
 import { registerCompanyCommands } from "./commands/client/company.js";
 import { registerIssueCommands } from "./commands/client/issue.js";
@@ -15,6 +16,9 @@ import { registerApprovalCommands } from "./commands/client/approval.js";
 import { registerActivityCommands } from "./commands/client/activity.js";
 import { registerDashboardCommands } from "./commands/client/dashboard.js";
 import { applyDataDirOverride, type DataDirOptionLike } from "./config/data-dir.js";
+import { loadPaperclipEnvFile } from "./config/env.js";
+import { registerWorktreeCommands } from "./commands/worktree.js";
+import { registerPluginCommands } from "./commands/client/plugin.js";

 const program = new Command();
 const DATA_DIR_OPTION_HELP =
@@ -23,7 +27,7 @@ const DATA_DIR_OPTION_HELP =
 program
  .name("paperclipai")
  .description("Paperclip CLI — setup, diagnose, and configure your instance")
-  .version("0.2.0");
+  .version("0.2.7");

 program.hook("preAction", (_thisCommand, actionCommand) => {
  const options = actionCommand.optsWithGlobals() as DataDirOptionLike;
@@ -32,6 +36,7 @@ program.hook("preAction", (_thisCommand, actionCommand) => {
    hasConfigOption: optionNames.has("config"),
    hasContextOption: optionNames.has("context"),
  });
+  loadPaperclipEnvFile(options.config);
 });

 program
@@ -70,6 +75,19 @@ program
  .option("-s, --section <section>", "Section to configure (llm, database, logging, server, storage, secrets)")
  .action(configure);

+program
+  .command("db:backup")
+  .description("Create a one-off database backup using current config")
+  .option("-c, --config <path>", "Path to config file")
+  .option("-d, --data-dir <path>", DATA_DIR_OPTION_HELP)
+  .option("--dir <path>", "Backup output directory (overrides config)")
+  .option("--retention-days <days>", "Retention window used for pruning", (value) => Number(value))
+  .option("--filename-prefix <prefix>", "Backup filename prefix", "paperclip")
+  .option("--json", "Print backup metadata as JSON")
+  .action(async (opts) => {
+    await dbBackupCommand(opts);
+  });
+
 program
  .command("allowed-hostname")
  .description("Allow a hostname for authenticated/private mode access")
@@ -118,6 +136,8 @@ registerAgentCommands(program);
 registerApprovalCommands(program);
 registerActivityCommands(program);
 registerDashboardCommands(program);
+registerWorktreeCommands(program);
+registerPluginCommands(program);

 const auth = program.command("auth").description("Authentication and bootstrap utilities");

--- a/cli/src/prompts/database.ts
+++ b/cli/src/prompts/database.ts
@@ -1,9 +1,26 @@
 import * as p from "@clack/prompts";
 import type { DatabaseConfig } from "../config/schema.js";
-import { resolveDefaultEmbeddedPostgresDir, resolvePaperclipInstanceId } from "../config/home.js";
+import {
+  resolveDefaultBackupDir,
+  resolveDefaultEmbeddedPostgresDir,
+  resolvePaperclipInstanceId,
+} from "../config/home.js";

-export async function promptDatabase(): Promise<DatabaseConfig> {
-  const defaultEmbeddedDir = resolveDefaultEmbeddedPostgresDir(resolvePaperclipInstanceId());
+export async function promptDatabase(current?: DatabaseConfig): Promise<DatabaseConfig> {
+  const instanceId = resolvePaperclipInstanceId();
+  const defaultEmbeddedDir = resolveDefaultEmbeddedPostgresDir(instanceId);
+  const defaultBackupDir = resolveDefaultBackupDir(instanceId);
+  const base: DatabaseConfig = current ?? {
+    mode: "embedded-postgres",
+    embeddedPostgresDataDir: defaultEmbeddedDir,
+    embeddedPostgresPort: 54329,
+    backup: {
+      enabled: true,
+      intervalMinutes: 60,
+      retentionDays: 30,
+      dir: defaultBackupDir,
+    },
+  };

  const mode = await p.select({
    message: "Database mode",
@@ -11,6 +28,7 @@ export async function promptDatabase(): Promise<DatabaseConfig> {
      { value: "embedded-postgres" as const, label: "Embedded PostgreSQL (managed locally)", hint: "recommended" },
      { value: "postgres" as const, label: "PostgreSQL (external server)" },
    ],
+    initialValue: base.mode,
  });

  if (p.isCancel(mode)) {
@@ -18,9 +36,14 @@ export async function promptDatabase(): Promise<DatabaseConfig> {
    process.exit(0);
  }

+  let connectionString: string | undefined = base.connectionString;
+  let embeddedPostgresDataDir = base.embeddedPostgresDataDir || defaultEmbeddedDir;
+  let embeddedPostgresPort = base.embeddedPostgresPort || 54329;
+
  if (mode === "postgres") {
-    const connectionString = await p.text({
+    const value = await p.text({
      message: "PostgreSQL connection string",
+      defaultValue: base.connectionString ?? "",
      placeholder: "postgres://user:pass@localhost:5432/paperclip",
      validate: (val) => {
        if (!val) return "Connection string is required for PostgreSQL mode";
@@ -28,48 +51,107 @@ export async function promptDatabase(): Promise<DatabaseConfig> {
      },
    });

-    if (p.isCancel(connectionString)) {
+    if (p.isCancel(value)) {
      p.cancel("Setup cancelled.");
      process.exit(0);
    }

-    return {
-      mode: "postgres",
-      connectionString,
-      embeddedPostgresDataDir: defaultEmbeddedDir,
-      embeddedPostgresPort: 54329,
-    };
+    connectionString = value;
+  } else {
+    const dataDir = await p.text({
+      message: "Embedded PostgreSQL data directory",
+      defaultValue: base.embeddedPostgresDataDir || defaultEmbeddedDir,
+      placeholder: defaultEmbeddedDir,
+    });
+
+    if (p.isCancel(dataDir)) {
+      p.cancel("Setup cancelled.");
+      process.exit(0);
+    }
+
+    embeddedPostgresDataDir = dataDir || defaultEmbeddedDir;
+
+    const portValue = await p.text({
+      message: "Embedded PostgreSQL port",
+      defaultValue: String(base.embeddedPostgresPort || 54329),
+      placeholder: "54329",
+      validate: (val) => {
+        const n = Number(val);
+        if (!Number.isInteger(n) || n < 1 || n > 65535) return "Port must be an integer between 1 and 65535";
+      },
+    });
+
+    if (p.isCancel(portValue)) {
+      p.cancel("Setup cancelled.");
+      process.exit(0);
+    }
+
+    embeddedPostgresPort = Number(portValue || "54329");
+    connectionString = undefined;
  }

-  const embeddedPostgresDataDir = await p.text({
-    message: "Embedded PostgreSQL data directory",
-    defaultValue: defaultEmbeddedDir,
-    placeholder: defaultEmbeddedDir,
+  const backupEnabled = await p.confirm({
+    message: "Enable automatic database backups?",
+    initialValue: base.backup.enabled,
  });
-
-  if (p.isCancel(embeddedPostgresDataDir)) {
+  if (p.isCancel(backupEnabled)) {
    p.cancel("Setup cancelled.");
    process.exit(0);
  }

-  const embeddedPostgresPort = await p.text({
-    message: "Embedded PostgreSQL port",
-    defaultValue: "54329",
-    placeholder: "54329",
+  const backupDirInput = await p.text({
+    message: "Backup directory",
+    defaultValue: base.backup.dir || defaultBackupDir,
+    placeholder: defaultBackupDir,
+    validate: (val) => (!val || val.trim().length === 0 ? "Backup directory is required" : undefined),
+  });
+  if (p.isCancel(backupDirInput)) {
+    p.cancel("Setup cancelled.");
+    process.exit(0);
+  }
+
+  const backupIntervalInput = await p.text({
+    message: "Backup interval (minutes)",
+    defaultValue: String(base.backup.intervalMinutes || 60),
+    placeholder: "60",
    validate: (val) => {
      const n = Number(val);
-      if (!Number.isInteger(n) || n < 1 || n > 65535) return "Port must be an integer between 1 and 65535";
+      if (!Number.isInteger(n) || n < 1) return "Interval must be a positive integer";
+      if (n > 10080) return "Interval must be 10080 minutes (7 days) or less";
+      return undefined;
    },
  });
+  if (p.isCancel(backupIntervalInput)) {
+    p.cancel("Setup cancelled.");
+    process.exit(0);
+  }

-  if (p.isCancel(embeddedPostgresPort)) {
+  const backupRetentionInput = await p.text({
+    message: "Backup retention (days)",
+    defaultValue: String(base.backup.retentionDays || 30),
+    placeholder: "30",
+    validate: (val) => {
+      const n = Number(val);
+      if (!Number.isInteger(n) || n < 1) return "Retention must be a positive integer";
+      if (n > 3650) return "Retention must be 3650 days or less";
+      return undefined;
+    },
+  });
+  if (p.isCancel(backupRetentionInput)) {
    p.cancel("Setup cancelled.");
    process.exit(0);
  }

  return {
-    mode: "embedded-postgres",
-    embeddedPostgresDataDir: embeddedPostgresDataDir || defaultEmbeddedDir,
-    embeddedPostgresPort: Number(embeddedPostgresPort || "54329"),
+    mode,
+    connectionString,
+    embeddedPostgresDataDir,
+    embeddedPostgresPort,
+    backup: {
+      enabled: backupEnabled,
+      intervalMinutes: Number(backupIntervalInput || "60"),
+      retentionDays: Number(backupRetentionInput || "30"),
+      dir: backupDirInput || defaultBackupDir,
+    },
  };
 }
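
Review note on `cli/src/prompts/database.ts`: the backup interval and retention prompts above apply the same rule, a positive integer with an upper bound. A minimal sketch of that rule as one shared helper follows. The names `validateBoundedInt`, `validateInterval`, and `validateRetention` are hypothetical and do not appear in the diff; only the bounds (10080 minutes, 3650 days) and the error messages are taken from it.

```typescript
// Hypothetical helper, not part of the diff. It returns an error message or
// undefined, the same shape as the validate callbacks in promptDatabase().
function validateBoundedInt(
  raw: string,
  label: string,
  max: number,
  maxHint: string,
): string | undefined {
  const n = Number(raw);
  // Number("") is 0 and Number("abc") is NaN, so both are rejected here.
  if (!Number.isInteger(n) || n < 1) return `${label} must be a positive integer`;
  if (n > max) return `${label} must be ${maxHint} or less`;
  return undefined;
}

// Bounds copied from the prompts in the diff.
const validateInterval = (raw: string): string | undefined =>
  validateBoundedInt(raw, "Interval", 10080, "10080 minutes (7 days)");

const validateRetention = (raw: string): string | undefined =>
  validateBoundedInt(raw, "Retention", 3650, "3650 days");
```

One consequence worth checking against the prompt library's behavior: an empty string is rejected by this rule, so whether pressing Enter accepts `defaultValue` depends on whether validation runs before or after the default is applied.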
--- a/cli/src/prompts/server.ts
+++ b/cli/src/prompts/server.ts
@@ -113,7 +113,7 @@ export async function promptServer(opts?: {
  }

  const port = Number(portStr) || 3100;
-  let auth: AuthConfig = { baseUrlMode: "auto" };
+  let auth: AuthConfig = { baseUrlMode: "auto", disableSignUp: false };
  if (deploymentMode === "authenticated" && exposure === "public") {
    const urlInput = await p.text({
      message: "Public base URL",
@@ -139,18 +139,26 @@ export async function promptServer(opts?: {
    }
    auth = {
      baseUrlMode: "explicit",
+      disableSignUp: false,
      publicBaseUrl: urlInput.trim().replace(/\/+$/, ""),
    };
  } else if (currentAuth?.baseUrlMode === "explicit" && currentAuth.publicBaseUrl) {
    auth = {
      baseUrlMode: "explicit",
+      disableSignUp: false,
      publicBaseUrl: currentAuth.publicBaseUrl,
    };
  }

  return {
-    server: { deploymentMode, exposure, host: hostStr.trim(), port, allowedHostnames, serveUi: true },
+    server: {
+      deploymentMode,
+      exposure,
+      host: hostStr.trim(),
+      port,
+      allowedHostnames,
+      serveUi: currentServer?.serveUi ?? true,
+    },
    auth,
  };
 }
-
--- a/cli/tsconfig.json
+++ b/cli/tsconfig.json
@@ -1,5 +1,5 @@
 {
-  "extends": "../tsconfig.json",
+  "extends": "../tsconfig.base.json",
  "compilerOptions": {
    "outDir": "dist",
    "rootDir": "src"
--- a/doc/CLI.md
+++ b/doc/CLI.md
@@ -116,6 +116,20 @@ pnpm paperclipai issue release <issue-id>
 ```sh
 pnpm paperclipai agent list --company-id <company-id>
 pnpm paperclipai agent get <agent-id>
+pnpm paperclipai agent local-cli <agent-id-or-shortname> --company-id <company-id>
+```
+
+`agent local-cli` is the quickest way to run local Claude/Codex manually as a Paperclip agent:
+
+- creates a new long-lived agent API key
+- installs missing Paperclip skills into `~/.codex/skills` and `~/.claude/skills`
+- prints `export ...` lines for `PAPERCLIP_API_URL`, `PAPERCLIP_COMPANY_ID`, `PAPERCLIP_AGENT_ID`, and `PAPERCLIP_API_KEY`
+
+Example for shortname-based local setup:
+
+```sh
+pnpm paperclipai agent local-cli codexcoder --company-id <company-id>
+pnpm paperclipai agent local-cli claudecoder --company-id <company-id>
 ```

 ## Approval Commands
--- a/doc/DATABASE.md
+++ b/doc/DATABASE.md
@@ -19,6 +19,14 @@ That's it. On first start the server:

 Data persists across restarts in `~/.paperclip/instances/default/db/`. To reset local dev data, delete that directory.

+If you need to apply pending migrations manually, run:
+
+```sh
+pnpm db:migrate
+```
+
+When `DATABASE_URL` is unset, this command targets the current embedded PostgreSQL instance for your active Paperclip config/instance.
+
 This mode is ideal for local development and one-command installs.

 Docker note: the Docker quickstart image also uses embedded PostgreSQL by default. Persist `/paperclip` to keep DB state across container restarts (see `doc/DOCKER.md`).
--- a/doc/DEVELOPING.md
+++ b/doc/DEVELOPING.md
@@ -15,6 +15,14 @@ Current implementation status:
 - Node.js 20+
 - pnpm 9+

+## Dependency Lockfile Policy
+
+GitHub Actions owns `pnpm-lock.yaml`.
+
+- Do not commit `pnpm-lock.yaml` in pull requests.
+- Pull request CI validates dependency resolution when manifests change.
+- Pushes to `master` regenerate `pnpm-lock.yaml` with `pnpm install --lockfile-only --no-frozen-lockfile`, commit it back if needed, and then run verification with `--frozen-lockfile`.
+
 ## Start Dev

 From repo root:
@@ -29,6 +37,8 @@ This starts:
 - API server: `http://localhost:3100`
 - UI: served by the API server in dev middleware mode (same origin as API)

+`pnpm dev` runs the server in watch mode and restarts on changes from workspace packages (including adapter packages). Use `pnpm dev:once` to run without file watching.
+
 Tailscale/private-auth dev mode:

 ```sh
@@ -114,6 +124,119 @@ When a local agent run has no resolved project/session workspace, Paperclip fall

 This path honors `PAPERCLIP_HOME` and `PAPERCLIP_INSTANCE_ID` in non-default setups.

+## Worktree-local Instances
+
+When developing from multiple git worktrees, do not point two Paperclip servers at the same embedded PostgreSQL data directory.
+
+Instead, create a repo-local Paperclip config plus an isolated instance for the worktree:
+
+```sh
+paperclipai worktree init
+# or create the git worktree and initialize it in one step:
+pnpm paperclipai worktree:make paperclip-pr-432
+```
+
+This command:
+
+- writes repo-local files at `.paperclip/config.json` and `.paperclip/.env`
+- creates an isolated instance under `~/.paperclip-worktrees/instances/<worktree-id>/`
+- when run inside a linked git worktree, mirrors the effective git hooks into that worktree's private git dir
+- picks a free app port and embedded PostgreSQL port
+- by default seeds the isolated DB in `minimal` mode from the current effective Paperclip instance/config (repo-local worktree config when present, otherwise the default instance) via a logical SQL snapshot
+
+Seed modes:
+
+- `minimal` keeps core app state like companies, projects, issues, comments, approvals, and auth state, preserves schema for all tables, but omits row data from heavy operational history such as heartbeat runs, wake requests, activity logs, runtime services, and agent session state
+- `full` makes a full logical clone of the source instance
+- `--no-seed` creates an empty isolated instance
+
+After `worktree init`, both the server and the CLI auto-load the repo-local `.paperclip/.env` when run inside that worktree, so normal commands like `pnpm dev`, `paperclipai doctor`, and `paperclipai db:backup` stay scoped to the worktree instance.
+
+That repo-local env also sets:
+
+- `PAPERCLIP_IN_WORKTREE=true`
+- `PAPERCLIP_WORKTREE_NAME=<worktree-name>`
+- `PAPERCLIP_WORKTREE_COLOR=<hex-color>`
+
+The server/UI use those values for worktree-specific branding such as the top banner and dynamically colored favicon.
+
+Print shell exports explicitly when needed:
+
+```sh
+paperclipai worktree env
+# or:
+eval "$(paperclipai worktree env)"
+```
+
+### Worktree CLI Reference
+
+**`pnpm paperclipai worktree init [options]`** — Create repo-local config/env and an isolated instance for the current worktree.
+
+| Option | Description |
+|---|---|
+| `--name <name>` | Display name used to derive the instance id |
+| `--instance <id>` | Explicit isolated instance id |
+| `--home <path>` | Home root for worktree instances (default: `~/.paperclip-worktrees`) |
+| `--from-config <path>` | Source config.json to seed from |
+| `--from-data-dir <path>` | Source PAPERCLIP_HOME used when deriving the source config |
+| `--from-instance <id>` | Source instance id (default: `default`) |
+| `--server-port <port>` | Preferred server port |
+| `--db-port <port>` | Preferred embedded Postgres port |
+| `--seed-mode <mode>` | Seed profile: `minimal` or `full` (default: `minimal`) |
+| `--no-seed` | Skip database seeding from the source instance |
+| `--force` | Replace existing repo-local config and isolated instance data |
+
+Examples:
+
+```sh
+paperclipai worktree init --no-seed
+paperclipai worktree init --seed-mode full
+paperclipai worktree init --from-instance default
+paperclipai worktree init --from-data-dir ~/.paperclip
+paperclipai worktree init --force
+```
+
+**`pnpm paperclipai worktree:make <name> [options]`** — Create `~/NAME` as a git worktree, then initialize an isolated Paperclip instance inside it. This combines `git worktree add` with `worktree init` in a single step.
+
+| Option | Description |
+|---|---|
+| `--start-point <ref>` | Remote ref to base the new branch on (e.g. `origin/main`) |
+| `--instance <id>` | Explicit isolated instance id |
+| `--home <path>` | Home root for worktree instances (default: `~/.paperclip-worktrees`) |
+| `--from-config <path>` | Source config.json to seed from |
+| `--from-data-dir <path>` | Source PAPERCLIP_HOME used when deriving the source config |
+| `--from-instance <id>` | Source instance id (default: `default`) |
+| `--server-port <port>` | Preferred server port |
+| `--db-port <port>` | Preferred embedded Postgres port |
+| `--seed-mode <mode>` | Seed profile: `minimal` or `full` (default: `minimal`) |
+| `--no-seed` | Skip database seeding from the source instance |
+| `--force` | Replace existing repo-local config and isolated instance data |
+
+Examples:
+
+```sh
+pnpm paperclipai worktree:make paperclip-pr-432
+pnpm paperclipai worktree:make my-feature --start-point origin/main
+pnpm paperclipai worktree:make experiment --no-seed
+```
+
+**`pnpm paperclipai worktree env [options]`** — Print shell exports for the current worktree-local Paperclip instance.
+
+| Option | Description |
+|---|---|
+| `-c, --config <path>` | Path to config file |
+| `--json` | Print JSON instead of shell exports |
+
+Examples:
+
+```sh
+pnpm paperclipai worktree env
+pnpm paperclipai worktree env --json
+eval "$(pnpm paperclipai worktree env)"
+```
+
+For project execution worktrees, Paperclip can also run a project-defined provision command after it creates or reuses an isolated git worktree. Configure this on the project's execution workspace policy (`workspaceStrategy.provisionCommand`). The command runs inside the derived worktree and receives `PAPERCLIP_WORKSPACE_*`, `PAPERCLIP_PROJECT_ID`, `PAPERCLIP_AGENT_ID`, and `PAPERCLIP_ISSUE_*` environment variables so each repo can bootstrap itself however it wants.
+
 ## Quick Health Checks

 In another terminal:
@@ -141,6 +264,36 @@ pnpm dev

 If you set `DATABASE_URL`, the server will use that instead of embedded PostgreSQL.

+## Automatic DB Backups
+
+Paperclip can run automatic DB backups on a timer. Defaults:
+
+- enabled
+- every 60 minutes
+- retain 30 days
+- backup dir: `~/.paperclip/instances/default/data/backups`
+
+Configure these in:
+
+```sh
+pnpm paperclipai configure --section database
+```
+
+Run a one-off backup manually:
+
+```sh
+pnpm paperclipai db:backup
+# or:
+pnpm db:backup
+```
+
+Environment overrides:
+
+- `PAPERCLIP_DB_BACKUP_ENABLED=true|false`
+- `PAPERCLIP_DB_BACKUP_INTERVAL_MINUTES=<minutes>`
+- `PAPERCLIP_DB_BACKUP_RETENTION_DAYS=<days>`
+- `PAPERCLIP_DB_BACKUP_DIR=/absolute/or/~/path`
+
 ## Secrets in Dev

 Agent env vars now support secret references. By default, secret values are stored with local encryption and only secret refs are persisted in agent config.
@@ -216,5 +369,61 @@ Agent-oriented invite onboarding now exposes machine-readable API docs:

 - `GET /api/invites/:token` returns invite summary plus onboarding and skills index links.
 - `GET /api/invites/:token/onboarding` returns onboarding manifest details (registration endpoint, claim endpoint template, skill install hints).
+- `GET /api/invites/:token/onboarding.txt` returns a plain-text onboarding doc intended for both human operators and agents (llm.txt-style handoff), including optional inviter message and suggested network host candidates.
 - `GET /api/skills/index` lists available skill documents.
 - `GET /api/skills/paperclip` returns the Paperclip heartbeat skill markdown.
+
+## OpenClaw Join Smoke Test
+
+Run the end-to-end OpenClaw join smoke harness:
+
+```sh
+pnpm smoke:openclaw-join
+```
+
+What it validates:
+
+- invite creation for agent-only join
+- agent join request using `adapterType=openclaw`
+- board approval + one-time API key claim semantics
+- callback delivery on wakeup to a dockerized OpenClaw-style webhook receiver
+
+Required permissions:
+
+- This script performs board-governed actions (create invite, approve join, wake up another agent).
+- In authenticated mode, run with board auth via `PAPERCLIP_AUTH_HEADER` or `PAPERCLIP_COOKIE`.
+
+Auth environment variables (only needed in authenticated mode):
+
+- `PAPERCLIP_AUTH_HEADER` (for example `Bearer ...`)
+- `PAPERCLIP_COOKIE` (session cookie header value)
+
+## OpenClaw Docker UI One-Command Script
+
+To boot OpenClaw in Docker and print a host-browser dashboard URL in one command:
+
+```sh
+pnpm smoke:openclaw-docker-ui
+```
+
+This script lives at `scripts/smoke/openclaw-docker-ui.sh` and automates clone/build/config/start for Compose-based local OpenClaw UI testing.
+
+Pairing behavior for this smoke script:
+
+- default `OPENCLAW_DISABLE_DEVICE_AUTH=1` (no Control UI pairing prompt for local smoke; no extra pairing env vars required)
+- set `OPENCLAW_DISABLE_DEVICE_AUTH=0` to require standard device pairing
+
+Model behavior for this smoke script:
+
+- defaults to OpenAI models (`openai/gpt-5.2` + OpenAI fallback) so it does not require Anthropic auth by default
+
+State behavior for this smoke script:
+
+- defaults to isolated config dir `~/.openclaw-paperclip-smoke`
+- resets smoke agent state each run by default (`OPENCLAW_RESET_STATE=1`) to avoid stale provider/auth drift
+
+Networking behavior for this smoke script:
+
+- auto-detects and prints a Paperclip host URL reachable from inside OpenClaw Docker
+- default container-side host alias is `host.docker.internal` (override with `PAPERCLIP_HOST_FROM_CONTAINER` / `PAPERCLIP_HOST_PORT`)
+- if Paperclip rejects container hostnames in authenticated/private mode, allow `host.docker.internal` via `pnpm paperclipai allowed-hostname host.docker.internal` and restart Paperclip
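
Review note on the "Automatic DB Backups" section of `doc/DEVELOPING.md`: the doc lists four environment overrides but not how they combine with the configured values. The sketch below shows one plausible resolution order, assuming an environment value wins when it is set and parses cleanly, and the config value is used otherwise. `BackupSettings` and `resolveBackupSettings` are hypothetical names; the fallback rules and the absence of `~` expansion are assumptions, not the server's confirmed behavior.

```typescript
// Hypothetical shape mirroring the `backup` block written by promptDatabase().
interface BackupSettings {
  enabled: boolean;
  intervalMinutes: number;
  retentionDays: number;
  dir: string;
}

// Assumed precedence: a set and valid environment value overrides the config.
function resolveBackupSettings(
  config: BackupSettings,
  env: Record<string, string | undefined>,
): BackupSettings {
  // Accept only positive integers; anything else falls back to the config.
  const positiveInt = (raw: string | undefined, fallback: number): number => {
    const n = Number(raw);
    return raw !== undefined && Number.isInteger(n) && n >= 1 ? n : fallback;
  };
  const enabledRaw = env.PAPERCLIP_DB_BACKUP_ENABLED;
  return {
    enabled: enabledRaw === undefined ? config.enabled : enabledRaw === "true",
    intervalMinutes: positiveInt(env.PAPERCLIP_DB_BACKUP_INTERVAL_MINUTES, config.intervalMinutes),
    retentionDays: positiveInt(env.PAPERCLIP_DB_BACKUP_RETENTION_DAYS, config.retentionDays),
    // Note: "~" is passed through unchanged in this sketch.
    dir: env.PAPERCLIP_DB_BACKUP_DIR ?? config.dir,
  };
}
```

Passing the environment in as a plain record, rather than reading a global, keeps the function easy to unit test.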
--- a/doc/DOCKER.md
+++ b/doc/DOCKER.md
@@ -42,6 +42,32 @@ Optional overrides:
 PAPERCLIP_PORT=3200 PAPERCLIP_DATA_DIR=./data/pc docker compose -f docker-compose.quickstart.yml up --build
 ```

+If you change the host port or use a non-local domain, set `PAPERCLIP_PUBLIC_URL` to the external URL you will use in the browser and in auth flows.
+
+## Authenticated Compose (Single Public URL)
+
+For authenticated deployments, set one canonical public URL and let Paperclip derive auth/callback defaults:
+
+```yaml
+services:
+  paperclip:
+    environment:
+      PAPERCLIP_DEPLOYMENT_MODE: authenticated
+      PAPERCLIP_DEPLOYMENT_EXPOSURE: private
+      PAPERCLIP_PUBLIC_URL: https://desk.koker.net
+```
+
+`PAPERCLIP_PUBLIC_URL` is used as the primary source for:
+
+- auth public base URL
+- Better Auth base URL defaults
+- bootstrap invite URL defaults
+- hostname allowlist defaults (hostname extracted from URL)
+
+Granular overrides remain available if needed (`PAPERCLIP_AUTH_PUBLIC_BASE_URL`, `BETTER_AUTH_URL`, `BETTER_AUTH_TRUSTED_ORIGINS`, `PAPERCLIP_ALLOWED_HOSTNAMES`).
+
+Set `PAPERCLIP_ALLOWED_HOSTNAMES` explicitly only when you need additional hostnames beyond the public URL host (for example Tailscale/LAN aliases or multiple private hostnames).
+
 ## Claude + Codex Local Adapters in Docker

 The image pre-installs:
@@ -66,3 +92,37 @@ Notes:

 - Without API keys, the app still runs normally.
 - Adapter environment checks in Paperclip will surface missing auth/CLI prerequisites.
+
+## Onboard Smoke Test (Ubuntu + npm only)
+
+Use this when you want to mimic a fresh machine that only has Ubuntu + npm and verify:
+
+- `npx paperclipai onboard --yes` completes
+- the server binds to `0.0.0.0:3100` so host access works
+- onboard/run banners and startup logs are visible in your terminal
+
+Build + run:
+
+```sh
+./scripts/docker-onboard-smoke.sh
+```
+
+Open: `http://localhost:3131` (default smoke host port)
+
+Useful overrides:
+
+```sh
+HOST_PORT=3200 PAPERCLIPAI_VERSION=latest ./scripts/docker-onboard-smoke.sh
+PAPERCLIP_DEPLOYMENT_MODE=authenticated PAPERCLIP_DEPLOYMENT_EXPOSURE=private ./scripts/docker-onboard-smoke.sh
+```
+
+Notes:
+
+- Persistent data is mounted at `./data/docker-onboard-smoke` by default.
+- Container runtime user id defaults to your local `id -u` so the mounted data dir stays writable while avoiding root runtime.
+- Smoke script defaults to `authenticated/private` mode so `HOST=0.0.0.0` can be exposed to the host.
+- Smoke script defaults host port to `3131` to avoid conflicts with local Paperclip on `3100`.
+- Smoke script also defaults `PAPERCLIP_PUBLIC_URL` to `http://localhost:<HOST_PORT>` so bootstrap invite URLs and auth callbacks use the reachable host port instead of the container's internal `3100`.
+- In authenticated mode, the smoke script defaults `SMOKE_AUTO_BOOTSTRAP=true` and drives the real bootstrap path automatically: it signs up a real user, runs `paperclipai auth bootstrap-ceo` inside the container to mint a real bootstrap invite, accepts that invite over HTTP, and verifies board session access.
+- Run the script in the foreground to watch the onboarding flow; stop with `Ctrl+C` after validation.
+- The image definition is in `Dockerfile.onboard-smoke`.
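
Review note on the "Authenticated Compose (Single Public URL)" section of `doc/DOCKER.md`: the doc says the hostname allowlist default is the hostname extracted from `PAPERCLIP_PUBLIC_URL`. A minimal sketch of that derivation follows, using the standard `URL` class. `deriveAuthDefaults` and its return shape are hypothetical; the trailing-slash trimming mirrors the `publicBaseUrl` handling in `cli/src/prompts/server.ts`, and the example URL is a placeholder.

```typescript
// Hypothetical helper: derives two of the defaults the doc attributes to
// PAPERCLIP_PUBLIC_URL. The real server may derive more, or differently.
function deriveAuthDefaults(publicUrl: string): {
  publicBaseUrl: string;
  allowedHostname: string;
} {
  const trimmed = publicUrl.trim();
  const parsed = new URL(trimmed); // throws a TypeError on malformed input
  return {
    // Same normalization as promptServer(): drop trailing slashes.
    publicBaseUrl: trimmed.replace(/\/+$/, ""),
    // `hostname` excludes the port, which suits a hostname allowlist.
    allowedHostname: parsed.hostname,
  };
}

const defaults = deriveAuthDefaults("https://paperclip.example.com/");
```

Using `hostname` rather than `host` means a non-default port in the public URL does not end up in the allowlist entry.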
--- a/doc/OPENCLAW_ONBOARDING.md
+++ b/doc/OPENCLAW_ONBOARDING.md
@@ -0,0 +1,94 @@
+Use this exact checklist.
+
+1. Start Paperclip in auth mode.
+```bash
+cd <paperclip-repo-root>
+pnpm dev --tailscale-auth
+```
+Then verify:
+```bash
+curl -sS http://127.0.0.1:3100/api/health | jq
+```
+
+2. Start a clean, stock OpenClaw Docker instance.
+```bash
+OPENCLAW_RESET_STATE=1 OPENCLAW_BUILD=1 ./scripts/smoke/openclaw-docker-ui.sh
+```
+Open the printed `Dashboard URL` (includes `#token=...`) in your browser.
+
+3. In Paperclip UI, go to `http://127.0.0.1:3100/CLA/company/settings`.
+
+4. Use the OpenClaw invite prompt flow.
+- In the Invites section, click `Generate OpenClaw Invite Prompt`.
+- Copy the generated prompt from `OpenClaw Invite Prompt`.
+- Paste it into OpenClaw main chat as one message.
+- If it stalls, send one follow-up: `How is onboarding going? Continue setup now.`
+
+Security/control note:
+- The OpenClaw invite prompt is created from a controlled endpoint:
+  - `POST /api/companies/{companyId}/openclaw/invite-prompt`
+  - board users with invite permission can call it
+  - agent callers are limited to the company CEO agent
+
+5. Approve the join request in Paperclip UI, then confirm the OpenClaw agent appears in CLA agents.
+
+6. Gateway preflight (required before task tests).
+- Confirm the created agent uses `openclaw_gateway` (not `openclaw`).
+- Confirm gateway URL is `ws://...` or `wss://...`.
+- Confirm gateway token is non-trivial (not empty / not 1-char placeholder).
+- The OpenClaw Gateway adapter UI should not expose `disableDeviceAuth` for normal onboarding.
+- Confirm pairing mode is explicit:
+  - required default: device auth enabled (`adapterConfig.disableDeviceAuth` false/absent) with persisted `adapterConfig.devicePrivateKeyPem`
+  - do not rely on `disableDeviceAuth` for normal onboarding
+- If you can run API checks with board auth:
+```bash
+AGENT_ID="<newly-created-agent-id>"
+curl -sS -H "Cookie: $PAPERCLIP_COOKIE" "http://127.0.0.1:3100/api/agents/$AGENT_ID" | jq '{adapterType,adapterConfig:{url:.adapterConfig.url,tokenLen:(.adapterConfig.headers["x-openclaw-token"] // .adapterConfig.headers["x-openclaw-auth"] // "" | length),disableDeviceAuth:(.adapterConfig.disableDeviceAuth // false),hasDeviceKey:(.adapterConfig.devicePrivateKeyPem // "" | length > 0)}}'
+```
+- Expected: `adapterType=openclaw_gateway`, `tokenLen >= 16`, `hasDeviceKey=true`, and `disableDeviceAuth=false`.
+
+Pairing handshake note:
+- Clean run expectation: first task should succeed without manual pairing commands.
+- The adapter attempts one automatic pairing approval + retry on first `pairing required` (when shared gateway auth token/password is valid).
+- If auto-pair cannot complete (for example token mismatch or no pending request), the first gateway run may still return `pairing required`.
+- This is a separate approval from Paperclip invite approval. You must approve the pending device in OpenClaw itself.
+- Approve it in OpenClaw, then retry the task.
+- For local docker smoke, you can approve from host:
+```bash
+docker exec openclaw-docker-openclaw-gateway-1 sh -lc 'openclaw devices approve --latest --json --url "ws://127.0.0.1:18789" --token "$(node -p \"require(process.env.HOME+\\\"/.openclaw/openclaw.json\\\").gateway.auth.token\")"'
+```
+- You can inspect pending vs paired devices:
+```bash
+docker exec openclaw-docker-openclaw-gateway-1 sh -lc 'TOK="$(node -e \"const fs=require(\\\"fs\\\");const c=JSON.parse(fs.readFileSync(\\\"/home/node/.openclaw/openclaw.json\\\",\\\"utf8\\\"));process.stdout.write(c.gateway?.auth?.token||\\\"\\\");\")\"; openclaw devices list --json --url \"ws://127.0.0.1:18789\" --token \"$TOK\"'
+```
+
+7. Case A (manual issue test).
+- Create an issue assigned to the OpenClaw agent.
+- Put instructions: “post comment `OPENCLAW_CASE_A_OK_<timestamp>` and mark done.”
+- Verify in UI: issue status becomes `done` and comment exists.
+
+8. Case B (message tool test).
+- Create another issue assigned to OpenClaw.
+- Instructions: “send `OPENCLAW_CASE_B_OK_<timestamp>` to main webchat via message tool, then comment same marker on issue, then mark done.”
+- Verify both:
+  - marker comment on issue
+  - marker text appears in OpenClaw main chat
+
+9. Case C (new session memory/skills test).
+- In OpenClaw, start a `/new` session.
+- Ask it to create a new CLA issue in Paperclip with the unique title `OPENCLAW_CASE_C_CREATED_<timestamp>`.
+- Verify in Paperclip UI that the new issue exists.
+
+10. Watch logs during test (optional but helpful):
+```bash
+docker compose -f /tmp/openclaw-docker/docker-compose.yml -f /tmp/openclaw-docker/.paperclip-openclaw.override.yml logs -f openclaw-gateway
+```
+
+11. Expected pass criteria.
+- Preflight: `openclaw_gateway` + non-placeholder token (`tokenLen >= 16`).
+- Pairing mode: stable `devicePrivateKeyPem` configured with device auth enabled (default path).
+- Case A: `done` + marker comment.
+- Case B: `done` + marker comment + main-chat message visible.
+- Case C: original task done and new issue created from `/new` session.
+
+If you want, I can also give you a single “observer mode” command that runs the stock smoke harness while you watch the same steps live in UI.
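
Review note on step 6 of `doc/OPENCLAW_ONBOARDING.md`: the gateway preflight is expressed as a `jq` filter plus a list of expected values. The sketch below restates those expectations as a function over the agent JSON, which may be easier to reuse in a smoke script. `AgentRecord` and `gatewayPreflight` are hypothetical names; the field paths are inferred from the `jq` filter in the checklist, not from a published API schema.

```typescript
// Hypothetical subset of the agent payload, inferred from the jq filter.
interface AgentRecord {
  adapterType: string;
  adapterConfig: {
    url?: string;
    headers?: Record<string, string | undefined>;
    disableDeviceAuth?: boolean;
    devicePrivateKeyPem?: string;
  };
}

// Returns a list of problems; an empty list means the preflight passed.
function gatewayPreflight(agent: AgentRecord): string[] {
  const problems: string[] = [];
  const cfg = agent.adapterConfig;
  // Same header fallback order as the jq filter.
  const token =
    cfg.headers?.["x-openclaw-token"] ?? cfg.headers?.["x-openclaw-auth"] ?? "";

  if (agent.adapterType !== "openclaw_gateway") {
    problems.push("adapterType must be openclaw_gateway");
  }
  if (!/^wss?:\/\//.test(cfg.url ?? "")) {
    problems.push("gateway URL must start with ws:// or wss://");
  }
  if (token.length < 16) {
    problems.push("gateway token must be at least 16 characters");
  }
  if (cfg.disableDeviceAuth === true) {
    problems.push("disableDeviceAuth must be false or absent");
  }
  if (!cfg.devicePrivateKeyPem) {
    problems.push("devicePrivateKeyPem must be persisted");
  }
  return problems;
}
```

Returning every problem at once, instead of failing on the first, matches how the checklist is read: all preflight items are confirmed before the task tests start.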
--- a/doc/PRODUCT.md
+++ b/doc/PRODUCT.md
@@ -94,3 +94,53 @@ Canonical mode design and command expectations live in `doc/DEPLOYMENT-MODES.md`
 ## Further Detail

 See [SPEC.md](./SPEC.md) for the full technical specification and [TASKS.md](./TASKS.md) for the task management data model.
+
+---
+
+Paperclip’s core identity is a **control plane for autonomous AI companies**, centered on **companies, org charts, goals, issues/comments, heartbeats, budgets, approvals, and board governance**. The public docs are also explicit about the current boundaries: **tasks/comments are the built-in communication model**, Paperclip is **not a chatbot**, and it is **not a code review tool**. The roadmap already points toward **easier onboarding, cloud agents, easier agent configuration, plugins, better docs, and ClipMart/ClipHub-style reusable companies/templates**.
+
+## What Paperclip should do vs. not do
+
+**Do**
+
+- Stay **board-level and company-level**. Users should manage goals, orgs, budgets, approvals, and outputs.
+- Make the first five minutes feel magical: install, answer a few questions, see a CEO do something real.
+- Keep work anchored to **issues/comments/projects/goals**, even if the surface feels conversational.
+- Treat **agency / internal team / startup** as the same underlying abstraction with different templates and labels.
+- Make outputs first-class: files, docs, reports, previews, links, screenshots.
+- Provide **hooks into engineering workflows**: worktrees, preview servers, PR links, external review tools.
+- Use **plugins** for edge cases like rich chat, knowledge bases, doc editors, custom tracing.
+
+**Do not**
+
+- Do not make the core product a general chat app. The current product definition is explicitly task/comment-centric and “not a chatbot,” and that boundary is valuable.
+- Do not build a complete Jira/GitHub replacement. The repo/docs already position Paperclip as organization orchestration rather than a pull-request review tool.
+- Do not build enterprise-grade RBAC first. The current V1 spec still treats multi-board governance and fine-grained human permissions as out of scope, so the first multi-user version should be coarse and company-scoped.
+- Do not lead with raw bash logs and transcripts. Default view should be human-readable intent/progress, with raw detail beneath.
+- Do not force users to understand provider/API-key plumbing unless absolutely necessary. There are active onboarding/auth issues already; friction here is clearly real.
+
+## Specific design goals
+
+1. **Time-to-first-success under 5 minutes**
+   A fresh user should go from install to “my CEO completed a first task” in one sitting.
+
+2. **Board-level abstraction always wins**
+   The default UI should answer: what is the company doing, who is doing it, why does it matter, what did it cost, and what needs my approval.
+
+3. **Conversation stays attached to work objects**
+   “Chat with CEO” should still resolve to strategy threads, decisions, tasks, or approvals.
+
+4. **Progressive disclosure**
+   Top layer: human-readable summary. Middle layer: checklist/steps/artifacts. Bottom layer: raw logs/tool calls/transcript.
+
+5. **Output-first**
+   Work is not done until the user can see the result: file, document, preview link, screenshot, plan, or PR.
+
+6. **Local-first, cloud-ready**
+   The mental model should not change between local solo use and shared/private or public/cloud deployment.
+
+7. **Safe autonomy**
+   Auto mode is allowed; hidden token burn is not.
+
+8. **Thin core, rich edges**
+   Put optional chat, knowledge, and special surfaces into plugins/extensions rather than bloating the control plane.
--- a/doc/PUBLISHING.md
+++ b/doc/PUBLISHING.md
@@ -1,196 +1,121 @@
 # Publishing to npm

-This document covers how to build and publish the `paperclipai` CLI package to npm.
+Low-level reference for how Paperclip packages are built for npm.

-## Prerequisites
+For the maintainer release workflow, use [doc/RELEASING.md](RELEASING.md). This document is only about packaging internals and the scripts that produce publishable artifacts.

- Node.js 20+
- pnpm 9.15+
- An npm account with publish access to the `paperclipai` package
- Logged in to npm: `npm login`
+## Current Release Entry Points

-## One-Command Publish
+Use these scripts instead of older one-off publish commands:

-The fastest way to publish — bumps version, builds, publishes, restores, commits, and tags in one shot:
+- [`scripts/release-start.sh`](../scripts/release-start.sh) to create or resume `release/X.Y.Z`
+- [`scripts/release-preflight.sh`](../scripts/release-preflight.sh) before any canary or stable release
+- [`scripts/release.sh`](../scripts/release.sh) for canary and stable npm publishes
+- [`scripts/rollback-latest.sh`](../scripts/rollback-latest.sh) to repoint `latest` during rollback
+- [`scripts/create-github-release.sh`](../scripts/create-github-release.sh) after pushing the stable branch tag

-```bash
-./scripts/bump-and-publish.sh patch          # 0.1.1 → 0.1.2
-./scripts/bump-and-publish.sh minor          # 0.1.1 → 0.2.0
-./scripts/bump-and-publish.sh major          # 0.1.1 → 1.0.0
-./scripts/bump-and-publish.sh 2.0.0          # set explicit version
-./scripts/bump-and-publish.sh patch --dry-run # everything except npm publish
-```
+## Why the CLI needs special packaging

-The script runs all 6 steps below in order. It requires a clean working tree and an active `npm login` session (unless `--dry-run`). After it finishes, push:
+The CLI package, `paperclipai`, imports code from workspace packages such as:

-```bash
-git push && git push origin v<version>
-```
+- `@paperclipai/server`
+- `@paperclipai/db`
+- `@paperclipai/shared`
+- adapter packages under `packages/adapters/`

-## Manual Step-by-Step
+Those workspace references use `workspace:*` during development. npm cannot install those references directly for end users, so the release build has to transform the CLI into a publishable standalone package.

-If you prefer to run each step individually:
+## `build-npm.sh`

-### Quick Reference
-
-```bash
-# Bump version
-./scripts/version-bump.sh patch      # 0.1.0 → 0.1.1
-
-# Build
-./scripts/build-npm.sh
-
-# Preview what will be published
-cd cli && npm pack --dry-run
-
-# Publish
-cd cli && npm publish --access public
-
-# Restore dev package.json
-mv cli/package.dev.json cli/package.json
-```
-
-## Step-by-Step
-
-### 1. Bump the version
-
-```bash
-./scripts/version-bump.sh <patch|minor|major|X.Y.Z>
-```
-
-This updates the version in two places:
-
- `cli/package.json` — the source of truth
- `cli/src/index.ts` — the Commander `.version()` call
-
-Examples:
-
-```bash
-./scripts/version-bump.sh patch    # 0.1.0 → 0.1.1
-./scripts/version-bump.sh minor    # 0.1.0 → 0.2.0
-./scripts/version-bump.sh major    # 0.1.0 → 1.0.0
-./scripts/version-bump.sh 1.2.3   # set explicit version
-```
-
-### 2. Build
+Run:

 ```bash
 ./scripts/build-npm.sh
 ```

-The build script runs five steps:
+This script does six things:

-1. **Forbidden token check** — scans tracked files for tokens listed in `.git/hooks/forbidden-tokens.txt`. If the file is missing (e.g. on a contributor's machine), the check passes silently. The script never prints which tokens it's searching for.
-2. **TypeScript type-check** — runs `pnpm -r typecheck` across all workspace packages.
-3. **esbuild bundle** — bundles the CLI entry point (`cli/src/index.ts`) and all workspace package code (`@paperclipai/*`) into a single file at `cli/dist/index.js`. External npm dependencies (express, postgres, etc.) are kept as regular imports.
-4. **Generate publishable package.json** — replaces `cli/package.json` with a version that has real npm dependency ranges instead of `workspace:*` references (see [package.dev.json](#packagedevjson) below).
-5. **Summary** — prints the bundle size and next steps.
+1. Runs the forbidden token check unless `--skip-checks` is supplied
+2. Runs `pnpm -r typecheck`
+3. Bundles the CLI entrypoint with esbuild into `cli/dist/index.js`
+4. Verifies the bundled entrypoint with `node --check`
+5. Rewrites `cli/package.json` into a publishable npm manifest and stores the dev copy as `cli/package.dev.json`
+6. Copies the repo `README.md` into `cli/README.md` for npm package metadata

-To skip the forbidden token check (e.g. in CI without the token list):
+`build-npm.sh` is used by the release script so that npm users install a real package rather than unresolved workspace dependencies.
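The manifest rewrite in step 5 can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the logic of `generate-npm-package-json.mjs`: the `Manifest` type, `isWorkspaceRange`, `toPublishableManifest`, and the sample version ranges are all hypothetical names and values chosen for illustration.

```typescript
// Hypothetical sketch: these names are illustrative, not taken from the Paperclip scripts.
type DependencyMap = Record<string, string>;

interface Manifest {
  name: string;
  version: string;
  dependencies?: DependencyMap;
}

// True for pnpm workspace protocol ranges such as "workspace:*".
function isWorkspaceRange(range: string): boolean {
  return range.startsWith("workspace:");
}

// Build a publishable CLI manifest: drop workspace references (assumed to be
// bundled into dist/index.js) and collect the external npm dependencies of the
// CLI and of every workspace package it pulls in.
function toPublishableManifest(cli: Manifest, workspacePackages: Manifest[]): Manifest {
  const external: DependencyMap = {};
  for (const pkg of [cli, ...workspacePackages]) {
    for (const [name, range] of Object.entries(pkg.dependencies ?? {})) {
      if (!isWorkspaceRange(range)) {
        external[name] = range; // in this sketch, later packages win on conflicts
      }
    }
  }
  return { ...cli, dependencies: external };
}

// Sample manifests with made-up versions and ranges.
const devCli: Manifest = {
  name: "paperclipai",
  version: "0.0.0",
  dependencies: { "@paperclipai/server": "workspace:*", commander: "^13.1.0" },
};
const serverPkg: Manifest = {
  name: "@paperclipai/server",
  version: "0.0.0",
  dependencies: { "@paperclipai/db": "workspace:*", express: "^5.1.0" },
};
const publishable = toPublishableManifest(devCli, [serverPkg]);
```

In this sketch `publishable.dependencies` keeps only `commander` and `express`. How the real script resolves conflicting ranges between packages is not described here, so the "later packages win" rule is an assumption.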
+
+## Publishable CLI layout
+
+During development, [`cli/package.json`](../cli/package.json) contains workspace references.
+
+During release preparation:
+
+- `cli/package.json` becomes a publishable manifest with external npm dependency ranges
+- `cli/package.dev.json` stores the development manifest temporarily
+- `cli/dist/index.js` contains the bundled CLI entrypoint
+- `cli/README.md` is copied in for npm metadata
+
+After release finalization, the release script restores the development manifest and removes the temporary README copy.
+
+## Package discovery
+
+The release tooling scans the workspace for public packages under:
+
+- `packages/`
+- `server/`
+- `cli/`
+
+`ui/` remains ignored for npm publishing because it is private.
+
+This matters because all public packages are versioned and published together as one release unit.
+
+## Canary packaging model
+
+Canaries are published as semver prereleases such as:
+
+- `1.2.3-canary.0`
+- `1.2.3-canary.1`
+
+They are published under the npm dist-tag `canary`.
+
+This means:
+
+- `npx paperclipai@canary onboard` can install them explicitly
+- `npx paperclipai onboard` continues to resolve `latest`
+- the stable changelog can stay at `releases/v1.2.3.md`
+
+## Stable packaging model
+
+Stable releases publish normal semver versions such as `1.2.3` under the npm dist-tag `latest`.
+
+The stable publish flow also creates the local release commit and git tag on `release/X.Y.Z`. Pushing that branch commit/tag, creating the GitHub Release, and merging the release branch back to `master` happen afterward as separate maintainer steps.
+
+## Rollback model
+
+Rollback does not unpublish packages.
+
+Instead, the maintainer should move the `latest` dist-tag back to the previous good stable version with:

 ```bash
-./scripts/build-npm.sh --skip-checks
+./scripts/rollback-latest.sh <stable-version>
 ```

-### 3. Preview (optional)
+That keeps history intact while restoring the default install path quickly.
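The canary, stable, and rollback models above all rest on one idea: published versions stay put, and dist-tags are movable pointers. The sketch below is a toy in-memory model of that idea, assuming a simplified registry entry. `RegistryEntry`, `resolve`, and `rollbackLatest` are hypothetical names, and no real npm API is called.

```typescript
// Illustrative model only: published versions are immutable, dist-tags are pointers.
interface RegistryEntry {
  versions: string[];
  distTags: Record<string, string>;
}

// Resolve an install spec such as "latest", "canary", or an exact version.
function resolve(entry: RegistryEntry, spec: string = "latest"): string {
  const tagged = entry.distTags[spec];
  if (tagged !== undefined) return tagged;
  if (entry.versions.includes(spec)) return spec;
  throw new Error(`no version matches "${spec}"`);
}

// Rollback repoints `latest` at an already-published version; nothing is removed.
function rollbackLatest(entry: RegistryEntry, goodVersion: string): RegistryEntry {
  if (!entry.versions.includes(goodVersion)) {
    throw new Error(`${goodVersion} was never published`);
  }
  return { ...entry, distTags: { ...entry.distTags, latest: goodVersion } };
}

const published: RegistryEntry = {
  versions: ["1.2.2", "1.2.3-canary.0", "1.2.3"],
  distTags: { latest: "1.2.3", canary: "1.2.3-canary.0" },
};
const rolledBack = rollbackLatest(published, "1.2.2");
```

Here `resolve(published)` yields `1.2.3` and `resolve(rolledBack)` yields `1.2.2`, while `rolledBack.versions` still lists all three versions.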

-See what npm will publish:
+## Notes for CI

-```bash
-cd cli && npm pack --dry-run
-```
+The repo includes a manual GitHub Actions release workflow at [`.github/workflows/release.yml`](../.github/workflows/release.yml).

-### 4. Publish
+Recommended CI release setup:

-```bash
-cd cli && npm publish --access public
-```
+- use npm trusted publishing via GitHub OIDC
+- require approval through the `npm-release` environment
+- run releases from `release/X.Y.Z`
+- use canary first, then stable

-### 5. Restore dev package.json
+## Related Files

-After publishing, restore the workspace-aware `package.json`:
-
-```bash
-mv cli/package.dev.json cli/package.json
-```
-
-### 6. Commit and tag
-
-```bash
-git add cli/package.json cli/src/index.ts
-git commit -m "chore: bump version to X.Y.Z"
-git tag vX.Y.Z
-```
-
-## package.dev.json
-
-During development, `cli/package.json` contains `workspace:*` references like:
-
-```json
-{
-  "dependencies": {
-    "@paperclipai/server": "workspace:*",
-    "@paperclipai/db": "workspace:*"
-  }
-}
-```
-
-These tell pnpm to resolve those packages from the local monorepo. This is great for development but **npm doesn't understand `workspace:*`** — publishing with these references would cause install failures for users.
-
-The build script solves this with a two-file swap:
-
-1. **Before building:** `cli/package.json` has `workspace:*` refs (the dev version).
-2. **During build (`build-npm.sh` step 4):**
-   - The dev `package.json` is copied to `package.dev.json` as a backup.
-   - `generate-npm-package-json.mjs` reads every workspace package's `package.json`, collects all their external npm dependencies, and writes a new `cli/package.json` with those real dependency ranges — no `workspace:*` refs.
-3. **After publishing:** you restore the dev version with `mv package.dev.json package.json`.
-
-The generated publishable `package.json` looks like:
-
-```json
-{
-  "name": "paperclipai",
-  "version": "0.1.0",
-  "bin": { "paperclipai": "./dist/index.js" },
-  "dependencies": {
-    "express": "^5.1.0",
-    "postgres": "^3.4.5",
-    "commander": "^13.1.0"
-  }
-}
-```
-
-`package.dev.json` is listed in `.gitignore` — it only exists temporarily on disk during the build/publish cycle.
-
-## How the bundle works
-
-The CLI is a monorepo package that imports code from `@paperclipai/server`, `@paperclipai/db`, `@paperclipai/shared`, and several adapter packages. These workspace packages don't exist on npm.
-
-**esbuild** bundles all workspace TypeScript code into a single `dist/index.js` file (~250kb). External npm packages (express, postgres, zod, etc.) are left as normal `import` statements — they get installed by npm when a user runs `npx paperclipai onboard`.
-
-The esbuild configuration lives at `cli/esbuild.config.mjs`. It automatically reads every workspace package's `package.json` to determine which dependencies are external (real npm packages) vs. internal (workspace code to bundle).
-
-## Forbidden token enforcement
-
-The build process includes the same forbidden-token check used by the git pre-commit hook. This catches any accidentally committed tokens before they reach npm.
-
- Token list: `.git/hooks/forbidden-tokens.txt` (one token per line, `#` comments supported)
- The file lives inside `.git/` and is never committed
- If the file is missing, the check passes — contributors without the list can still build
- The script never prints which tokens are being searched for
- Matches are printed so you know which files to fix, but not which token triggered it
-
-Run the check standalone:
-
-```bash
-pnpm check:tokens
-```
-
-## npm scripts reference
-
-| Script | Command | Description |
-|---|---|---|
-| `bump-and-publish` | `pnpm bump-and-publish <type>` | One-command bump + build + publish + commit + tag |
-| `build:npm` | `pnpm build:npm` | Full build (check + typecheck + bundle + package.json) |
-| `version:bump` | `pnpm version:bump <type>` | Bump CLI version |
-| `check:tokens` | `pnpm check:tokens` | Run forbidden token check only |
+- [`scripts/build-npm.sh`](../scripts/build-npm.sh)
+- [`scripts/generate-npm-package-json.mjs`](../scripts/generate-npm-package-json.mjs)
+- [`cli/esbuild.config.mjs`](../cli/esbuild.config.mjs)
+- [`doc/RELEASING.md`](RELEASING.md)
--- a/doc/RELEASING.md
+++ b/doc/RELEASING.md
@@ -0,0 +1,422 @@
+# Releasing Paperclip
+
+Maintainer runbook for shipping a full Paperclip release across npm, GitHub, and the website-facing changelog surface.
+
+The release model is branch-driven:
+
+1. Start a release train on `release/X.Y.Z`
+2. Draft the stable changelog on that branch
+3. Publish one or more canaries from that branch
+4. Publish stable from that same branch head
+5. Push the branch commit and tag
+6. Create the GitHub Release
+7. Merge `release/X.Y.Z` back to `master` without squash or rebase
+
+## Release Surfaces
+
+Every release has four separate surfaces:
+
+1. **Verification** — the exact git SHA passes typecheck, tests, and build
+2. **npm** — `paperclipai` and public workspace packages are published
+3. **GitHub** — the stable release gets a git tag and GitHub Release
+4. **Website / announcements** — the stable changelog is published externally and announced
+
+A release is done only when all four surfaces are handled.
+
+## Core Invariants
+
+- Canary and stable for `X.Y.Z` must come from the same `release/X.Y.Z` branch.
+- The release scripts must run from the matching `release/X.Y.Z` branch.
+- Once `vX.Y.Z` exists locally, on GitHub, or on npm, that release train is frozen.
+- Do not squash-merge or rebase-merge a release branch PR back to `master`.
+- The stable changelog is always `releases/vX.Y.Z.md`. Never create canary changelog files.
+
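+The reason for the merge rule is simple: the tagged commit must stay reachable from `master`. A squash or rebase merge puts new commits on `master` instead, so `vX.Y.Z` would point at a published commit that is not part of `master` history.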
+
+## TL;DR
+
+### 1. Start the release train
+
+Use this to compute the next version, create or resume the branch, create or resume a dedicated worktree, and push the branch to GitHub.
+
+```bash
+./scripts/release-start.sh patch
+```
+
+That script:
+
+- fetches the release remote and tags
+- computes the next stable version from the latest `v*` tag
+- creates or resumes `release/X.Y.Z`
+- creates or resumes a dedicated worktree
+- pushes the branch to the remote by default
+- refuses to reuse a frozen release train
+
+### 2. Draft the stable changelog
+
+From the release worktree:
+
+```bash
+VERSION=X.Y.Z
+claude --print --output-format stream-json --verbose --dangerously-skip-permissions --model claude-opus-4-6 "Use the release-changelog skill to draft or update releases/v${VERSION}.md for Paperclip. Read doc/RELEASING.md and .agents/skills/release-changelog/SKILL.md, then generate the stable changelog for v${VERSION} from commits since the last stable tag. Do not create a canary changelog."
+```
+
+### 3. Verify and publish a canary
+
+```bash
+./scripts/release-preflight.sh canary patch
+./scripts/release.sh patch --canary --dry-run
+./scripts/release.sh patch --canary
+PAPERCLIPAI_VERSION=canary ./scripts/docker-onboard-smoke.sh
+```
+
+Users install canaries with:
+
+```bash
+npx paperclipai@canary onboard
+```
+
+### 4. Publish stable
+
+```bash
+./scripts/release-preflight.sh stable patch
+./scripts/release.sh patch --dry-run
+./scripts/release.sh patch
+git push public-gh HEAD --follow-tags
+./scripts/create-github-release.sh X.Y.Z
+```
+
+Then open a PR from `release/X.Y.Z` to `master` and merge without squash or rebase.
+
+## Release Branches
+
+Paperclip uses one release branch per target stable version:
+
+- `release/0.3.0`
+- `release/0.3.1`
+- `release/1.0.0`
+
+Do not create separate per-canary branches like `canary/0.3.0-1`. A canary is just a prerelease snapshot of the same stable train.
+
+## Script Entry Points
+
+- [`scripts/release-start.sh`](../scripts/release-start.sh) — create or resume the release train branch/worktree
+- [`scripts/release-preflight.sh`](../scripts/release-preflight.sh) — validate branch, version plan, git/npm state, and verification gate
+- [`scripts/release.sh`](../scripts/release.sh) — publish canary or stable from the release branch
+- [`scripts/create-github-release.sh`](../scripts/create-github-release.sh) — create or update the GitHub Release after pushing the tag
+- [`scripts/rollback-latest.sh`](../scripts/rollback-latest.sh) — repoint `latest` to the last good stable version
+
+## Detailed Workflow
+
+### 1. Start or resume the release train
+
+Run:
+
+```bash
+./scripts/release-start.sh <patch|minor|major>
+```
+
+Useful options:
+
+```bash
+./scripts/release-start.sh patch --dry-run
+./scripts/release-start.sh minor --worktree-dir ../paperclip-release-0.4.0
+./scripts/release-start.sh patch --no-push
+```
+
+The script is intentionally idempotent:
+
+- if `release/X.Y.Z` already exists locally, it reuses it
+- if the branch already exists on the remote, it resumes it locally
+- if the branch is already checked out in another worktree, it points you there
+- if `vX.Y.Z` already exists locally, remotely, or on npm, it refuses to reuse that train
+
+### 2. Write the stable changelog early
+
+Create or update:
+
+- `releases/vX.Y.Z.md`
+
+That file is for the eventual stable release. It should not include `-canary` in the filename or heading.
+
+Recommended structure:
+
+- `Breaking Changes` when needed
+- `Highlights`
+- `Improvements`
+- `Fixes`
+- `Upgrade Guide` when needed
+- `Contributors` — @-mention every contributor by GitHub username (no emails)
+
+Package-level `CHANGELOG.md` files are generated as part of the release mechanics. They are not the main release narrative.
+
+### 3. Run release preflight
+
+From the `release/X.Y.Z` worktree:
+
+```bash
+./scripts/release-preflight.sh canary <patch|minor|major>
+# or
+./scripts/release-preflight.sh stable <patch|minor|major>
+```
+
+The preflight script checks all of the following before it runs the verification gate:
+
+- the worktree is clean, including untracked files
+- the current branch matches the computed `release/X.Y.Z`
+- the release train is not frozen
+- the target version is still free on npm
+- the target tag does not already exist locally or remotely
+- whether the remote release branch already exists
+- whether `releases/vX.Y.Z.md` is present
+
+Then it runs:
+
+```bash
+pnpm -r typecheck
+pnpm test:run
+pnpm build
+```
+
+### 4. Publish one or more canaries
+
+Run:
+
+```bash
+./scripts/release.sh <patch|minor|major> --canary --dry-run
+./scripts/release.sh <patch|minor|major> --canary
+```
+
+Result:
+
+- npm gets a prerelease such as `1.2.3-canary.0` under dist-tag `canary`
+- `latest` is unchanged
+- no git tag is created
+- no GitHub Release is created
+- the worktree returns to clean after the script finishes
+
+Guardrails:
+
+- the script refuses to run from the wrong branch
+- the script refuses to publish from a frozen train
+- the canary is always derived from the next stable version
+- if the stable notes file is missing, the script warns so the file is not forgotten before stable publish
+
+Concrete example:
+
+- if the latest stable is `0.2.7`, a patch canary targets `0.2.8-canary.0`
+- `0.2.7-canary.N` is invalid because `0.2.7` is already stable
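The rule that a canary is always derived from the next stable version can be written as a small calculation. This is a sketch of the arithmetic only, assuming plain `X.Y.Z` inputs. `nextStable` and `canaryVersion` are hypothetical helpers, not functions from `scripts/release.sh`.

```typescript
type Bump = "patch" | "minor" | "major";

// Compute the next stable version from the latest stable one.
function nextStable(latestStable: string, bump: Bump): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(latestStable);
  if (match === null) {
    throw new Error(`not a plain X.Y.Z version: ${latestStable}`);
  }
  const major = Number(match[1]);
  const minor = Number(match[2]);
  const patch = Number(match[3]);
  if (bump === "major") return `${major + 1}.0.0`;
  if (bump === "minor") return `${major}.${minor + 1}.0`;
  return `${major}.${minor}.${patch + 1}`;
}

// A canary is a prerelease of the NEXT stable version, never of the current one.
function canaryVersion(latestStable: string, bump: Bump, n: number): string {
  return `${nextStable(latestStable, bump)}-canary.${n}`;
}

const firstPatchCanary = canaryVersion("0.2.7", "patch", 0);
```

With the latest stable at `0.2.7`, `firstPatchCanary` is `0.2.8-canary.0`, matching the example above. The sketch rejects prerelease inputs such as `0.2.7-canary.1` because the base must be a stable version.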
+
+### 5. Smoke test the canary
+
+Run the actual install path in Docker:
+
+```bash
+PAPERCLIPAI_VERSION=canary ./scripts/docker-onboard-smoke.sh
+```
+
+Useful isolated variants:
+
+```bash
+HOST_PORT=3232 DATA_DIR=./data/release-smoke-canary PAPERCLIPAI_VERSION=canary ./scripts/docker-onboard-smoke.sh
+HOST_PORT=3233 DATA_DIR=./data/release-smoke-stable PAPERCLIPAI_VERSION=latest ./scripts/docker-onboard-smoke.sh
+```
+
+If you want to exercise onboarding from the current committed ref instead of npm, use:
+
+```bash
+./scripts/clean-onboard-ref.sh
+PAPERCLIP_PORT=3234 ./scripts/clean-onboard-ref.sh
+./scripts/clean-onboard-ref.sh HEAD
+```
+
+Minimum checks:
+
+- `npx paperclipai@canary onboard` installs
+- onboarding completes without crashes
+- the server boots
+- the UI loads
+- basic company creation and dashboard load work
+
+If smoke testing fails:
+
+1. stop the stable release
+2. fix the issue on the same `release/X.Y.Z` branch
+3. publish another canary
+4. rerun smoke testing
+
+### 6. Publish stable from the same release branch
+
+Once the branch head is vetted, run:
+
+```bash
+./scripts/release.sh <patch|minor|major> --dry-run
+./scripts/release.sh <patch|minor|major>
+```
+
+Stable publish:
+
+- publishes `X.Y.Z` to npm under `latest`
+- creates the local release commit
+- creates the local tag `vX.Y.Z`
+
+Stable publish refuses to proceed if:
+
+- the current branch is not `release/X.Y.Z`
+- the remote release branch does not exist yet
+- the stable notes file is missing
+- the target tag already exists locally or remotely
+- the stable version already exists on npm
+
+Those checks intentionally freeze the train after stable publish.
+
+### 7. Push the stable branch commit and tag
+
+After stable publish succeeds:
+
+```bash
+git push public-gh HEAD --follow-tags
+./scripts/create-github-release.sh X.Y.Z
+```
+
+The GitHub Release notes come from:
+
+- `releases/vX.Y.Z.md`
+
+### 8. Merge the release branch back to `master`
+
+Open a PR:
+
+- base: `master`
+- head: `release/X.Y.Z`
+
+Merge rule:
+
+- allowed: merge commit or fast-forward
+- forbidden: squash merge
+- forbidden: rebase merge
+
+Post-merge verification:
+
+```bash
+git fetch public-gh --tags
+git merge-base --is-ancestor "vX.Y.Z" "public-gh/master"
+```
+
+That command must succeed. If it fails, the published tagged commit is not reachable from `master`, which means the merge strategy was wrong.
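Why the merge strategy decides the outcome of that check can be seen on a toy commit graph. The sketch below does not call git. It models commits as a map from id to parent ids, and `isAncestor` only mimics the reachability question that `git merge-base --is-ancestor` answers. All commit ids are invented for illustration.

```typescript
// Toy commit graph: each commit id maps to its parent ids.
type CommitGraph = Record<string, string[]>;

// True when `ancestor` is reachable from `tip` by following parent links.
function isAncestor(graph: CommitGraph, ancestor: string, tip: string): boolean {
  const seen = new Set<string>();
  const stack: string[] = [tip];
  while (stack.length > 0) {
    const id = stack.pop() as string;
    if (id === ancestor) return true;
    if (seen.has(id)) continue;
    seen.add(id);
    stack.push(...(graph[id] ?? []));
  }
  return false;
}

// m1 is the old master tip; r1 and r2 are release-branch commits; r2 carries the tag.
const baseGraph: CommitGraph = { m1: [], r1: ["m1"], r2: ["r1"] };
const taggedCommit = "r2";

// Merge commit: the new master tip m2 has both m1 and r2 as parents.
const withMergeCommit: CommitGraph = { ...baseGraph, m2: ["m1", "r2"] };
// Squash merge: the new master tip s1 carries the changes but only m1 as parent.
const withSquashMerge: CommitGraph = { ...baseGraph, s1: ["m1"] };

const reachableAfterMerge = isAncestor(withMergeCommit, taggedCommit, "m2");
const reachableAfterSquash = isAncestor(withSquashMerge, taggedCommit, "s1");
```

In this model `reachableAfterMerge` is `true` and `reachableAfterSquash` is `false`, which is the failure the post-merge verification is meant to catch.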
+
+### 9. Finish the external surfaces
+
+After GitHub is correct:
+
+- publish the changelog on the website
+- write and send the announcement copy
+- ensure public docs and install guidance point to the stable version
+
+## GitHub Actions Release
+
+There is also a manual workflow at [`.github/workflows/release.yml`](../.github/workflows/release.yml).
+
+Use it from the Actions tab on the relevant `release/X.Y.Z` branch:
+
+1. Choose `Release`
+2. Choose `channel`: `canary` or `stable`
+3. Choose `bump`: `patch`, `minor`, or `major`
+4. Choose whether this is a `dry_run`
+5. Run it from the release branch, not from `master`
+
+The workflow:
+
+- reruns `typecheck`, `test:run`, and `build`
+- gates publish behind the `npm-release` environment
+- can publish canaries without touching `latest`
+- can publish stable, push the stable branch commit and tag, and create the GitHub Release
+
+It does not merge the release branch back to `master` for you.
+
+## Release Checklist
+
+### Before any publish
+
+- [ ] The release train exists on `release/X.Y.Z`
+- [ ] The working tree is clean, including untracked files
+- [ ] If package manifests changed, the CI-owned `pnpm-lock.yaml` refresh is already merged on `master` before the train is cut
+- [ ] The required verification gate passed on the exact branch head you want to publish
+- [ ] The bump type is correct for the user-visible impact
+- [ ] The stable changelog file exists or is ready at `releases/vX.Y.Z.md`
+- [ ] You know which previous stable version you would roll back to if needed
+
+### Before a stable
+
+- [ ] The candidate has already passed smoke testing
+- [ ] The remote `release/X.Y.Z` branch exists
+- [ ] You are ready to push the stable branch commit and tag immediately after npm publish
+- [ ] You are ready to create the GitHub Release immediately after the push
+- [ ] You are ready to open the PR back to `master`
+
+### After a stable
+
+- [ ] `npm view paperclipai@latest version` matches the new stable version
+- [ ] The git tag exists on GitHub
+- [ ] The GitHub Release exists and uses `releases/vX.Y.Z.md`
+- [ ] `vX.Y.Z` is reachable from `master`
+- [ ] The website changelog is updated
+- [ ] Announcement copy matches the stable release, not the canary
+
+## Failure Playbooks
+
+### If the canary publishes but the smoke test fails
+
+Do not publish stable.
+
+Instead:
+
+1. fix the issue on `release/X.Y.Z`
+2. publish another canary
+3. rerun smoke testing
+
+### If stable npm publish succeeds but push or GitHub release creation fails
+
+This is a partial release. npm is already live.
+
+Do this immediately:
+
+1. fix the git or GitHub issue from the same checkout
+2. push the stable branch commit and tag
+3. create the GitHub Release
+
+Do not republish the same version.
+
+### If `latest` is broken after stable publish
+
+Preview:
+
+```bash
+./scripts/rollback-latest.sh X.Y.Z --dry-run
+```
+
+Roll back:
+
+```bash
+./scripts/rollback-latest.sh X.Y.Z
+```
+
+This does not unpublish anything. It only moves the `latest` dist-tag back to the last good stable release, so pass that version as `X.Y.Z`, not the broken one.
+
+Then fix forward with a new patch release.
+
+### If the GitHub Release notes are wrong
+
+Re-run:
+
+```bash
+./scripts/create-github-release.sh X.Y.Z
+```
+
+If the release already exists, the script updates it.
+
+## Related Docs
+
+- [doc/PUBLISHING.md](PUBLISHING.md) — low-level npm build and packaging internals
+- [.agents/skills/release/SKILL.md](../.agents/skills/release/SKILL.md) — maintainer release coordination workflow
+- [.agents/skills/release-changelog/SKILL.md](../.agents/skills/release-changelog/SKILL.md) — stable changelog drafting workflow
--- a/doc/SPEC-implementation.md
+++ b/doc/SPEC-implementation.md
@@ -37,7 +37,7 @@ These decisions close open questions from `SPEC.md` for V1.
 | Visibility | Full visibility to board and all agents in same company |
 | Communication | Tasks + comments only (no separate chat system) |
 | Task ownership | Single assignee; atomic checkout required for `in_progress` transition |
-| Recovery | No automatic reassignment; stale work is surfaced, not silently fixed |
+| Recovery | No automatic reassignment; work recovery stays manual/explicit |
 | Agent adapters | Built-in `process` and `http` adapters |
 | Auth | Mode-dependent human auth (`local_trusted` implicit board in current code; authenticated mode uses sessions), API keys for agents |
 | Budget period | Monthly UTC calendar window |
@@ -106,7 +106,6 @@ A lightweight scheduler/worker in the server process handles:
 - heartbeat trigger checks
 - stuck run detection
 - budget threshold checks
- stale task reporting generation

 Separate queue infrastructure is not required for V1.

@@ -331,6 +330,34 @@ Operational policy:
  - `asset_id` uuid fk not null
  - `issue_comment_id` uuid fk null

+## 7.15 `documents` + `document_revisions` + `issue_documents`
+
+- `documents` stores editable text-first documents:
+  - `id` uuid pk
+  - `company_id` uuid fk not null
+  - `title` text null
+  - `format` text not null (`markdown`)
+  - `latest_body` text not null
+  - `latest_revision_id` uuid null
+  - `latest_revision_number` int not null
+  - `created_by_agent_id` uuid fk null
+  - `created_by_user_id` uuid/text fk null
+  - `updated_by_agent_id` uuid fk null
+  - `updated_by_user_id` uuid/text fk null
+- `document_revisions` stores append-only history:
+  - `id` uuid pk
+  - `company_id` uuid fk not null
+  - `document_id` uuid fk not null
+  - `revision_number` int not null
+  - `body` text not null
+  - `change_summary` text null
+- `issue_documents` links documents to issues with a stable workflow key:
+  - `id` uuid pk
+  - `company_id` uuid fk not null
+  - `issue_id` uuid fk not null
+  - `document_id` uuid fk not null
+  - `key` text not null (`plan`, `design`, `notes`, etc.)
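One way to read the relationship between `documents` and `document_revisions` is that every write appends a revision and then refreshes the `latest_*` columns. The sketch below illustrates that reading in memory. It is an assumption about the write path, not the server implementation, it covers only a subset of the columns listed above, and `appendRevision` is a hypothetical helper.

```typescript
// Subset of the columns listed above, as plain in-memory rows.
interface DocumentRow {
  id: string;
  latest_body: string;
  latest_revision_id: string | null;
  latest_revision_number: number;
}

interface DocumentRevisionRow {
  id: string;
  document_id: string;
  revision_number: number;
  body: string;
  change_summary: string | null;
}

// Append a revision and return the updated rows; the inputs are not mutated.
function appendRevision(
  doc: DocumentRow,
  history: DocumentRevisionRow[],
  revisionId: string,
  body: string,
  changeSummary: string | null = null,
): { doc: DocumentRow; history: DocumentRevisionRow[] } {
  const revision: DocumentRevisionRow = {
    id: revisionId,
    document_id: doc.id,
    revision_number: doc.latest_revision_number + 1,
    body,
    change_summary: changeSummary,
  };
  return {
    doc: {
      ...doc,
      latest_body: body,
      latest_revision_id: revision.id,
      latest_revision_number: revision.revision_number,
    },
    history: [...history, revision],
  };
}

const planDoc: DocumentRow = {
  id: "doc-1",
  latest_body: "# Plan v1",
  latest_revision_id: "rev-1",
  latest_revision_number: 1,
};
const planHistory: DocumentRevisionRow[] = [
  { id: "rev-1", document_id: "doc-1", revision_number: 1, body: "# Plan v1", change_summary: null },
];
const updatedPlan = appendRevision(planDoc, planHistory, "rev-2", "# Plan v2", "tighten scope");
```

After the call, `updatedPlan.doc.latest_revision_number` is `2` and `updatedPlan.history` holds both revisions, so earlier bodies remain readable.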
+
 ## 8. State Machines

 ## 8.1 Agent Status
@@ -442,6 +469,11 @@ All endpoints are under `/api` and return JSON.
 - `POST /companies/:companyId/issues`
 - `GET /issues/:issueId`
 - `PATCH /issues/:issueId`
+- `GET /issues/:issueId/documents`
+- `GET /issues/:issueId/documents/:key`
+- `PUT /issues/:issueId/documents/:key`
+- `GET /issues/:issueId/documents/:key/revisions`
+- `DELETE /issues/:issueId/documents/:key`
 - `POST /issues/:issueId/checkout`
 - `POST /issues/:issueId/release`
 - `POST /issues/:issueId/comments`
@@ -502,7 +534,6 @@ Dashboard payload must include:
 - open/in-progress/blocked/done issue counts
 - month-to-date spend and budget utilization
 - pending approvals count
- stale task count

 ## 10.9 Error Semantics

@@ -681,7 +712,6 @@ Required UX behaviors:
 - global company selector
 - quick actions: pause/resume agent, create task, approve/reject request
 - conflict toasts on atomic checkout failure
- clear stale-task indicators
 - no silent background failures; every failed run visible in UI

 ## 15. Operational Requirements
@@ -780,7 +810,6 @@ A release candidate is blocked unless these pass:

 - add company selector and org chart view
 - add approvals and cost pages
- add operational dashboard and stale-task surfacing

 ## Milestone 6: Hardening and Release

--- a/doc/experimental/issue-worktree-support.md
+++ b/doc/experimental/issue-worktree-support.md
@@ -0,0 +1,62 @@
+# Issue worktree support
+
+Status: experimental, runtime-only, not shipping as a user-facing feature yet.
+
+This branch contains the runtime and seeding work needed for issue-scoped worktrees:
+
+- project execution workspace policy support
+- issue-level execution workspace settings
+- git worktree realization for isolated issue execution
+- optional command-based worktree provisioning
+- seeded worktree fixes for secrets key compatibility
+- seeded project workspace rebinding to the current git worktree
+
+We are intentionally not shipping the UI for this yet. The runtime code remains in place, but the main UI entrypoints are hard-gated off for now.
+
+## What works today
+
+- projects can carry execution workspace policy in the backend
+- issues can carry execution workspace settings in the backend
+- heartbeat execution can realize isolated git worktrees
+- runtime can run a project-defined provision command inside the derived worktree
+- seeded worktree instances can keep local-encrypted secrets working
+- seeded worktree instances can rebind same-repo project workspace paths onto the current git worktree
+
+## Hidden UI entrypoints
+
+These are the current user-facing UI surfaces for the feature, now intentionally disabled:
+
+- project settings:
+  - `ui/src/components/ProjectProperties.tsx`
+  - execution workspace policy controls
+  - git worktree base ref / branch template / parent dir
+  - provision / teardown command inputs
+
+- issue creation:
+  - `ui/src/components/NewIssueDialog.tsx`
+  - isolated issue checkout toggle
+  - defaulting issue execution workspace settings from project policy
+
+- issue editing:
+  - `ui/src/components/IssueProperties.tsx`
+  - issue-level workspace mode toggle
+  - defaulting issue execution workspace settings when project changes
+
+- agent/runtime settings:
+  - `ui/src/adapters/runtime-json-fields.tsx`
+  - runtime services JSON field, which is part of the broader workspace-runtime support surface
+
+## Why the UI is hidden
+
+- the runtime behavior is still being validated
+- the workflow and operator ergonomics are not final
+- we do not want to expose a partially baked user-facing feature in issues, projects, or settings
+
+## Re-enable plan
+
+When this is ready to ship:
+
+- re-enable the gated UI sections in the files above
+- review wording and defaults for project and issue controls
+- decide which agent/runtime settings should remain advanced-only
+- add end-to-end product-level verification for the full UI workflow
--- a/doc/plans/2026-02-16-module-system.md
+++ b/doc/plans/2026-02-16-module-system.md
@@ -577,7 +577,7 @@ The Company Store is a registry for discovering and installing modules and templ
      "id": "startup-in-a-box",
      "name": "Startup in a Box",
      "description": "5-agent startup team",
-      "url": "https://store.paperclip.dev/templates/startup-in-a-box.json",
+      "url": "https://store.paperclip.ing/templates/startup-in-a-box.json",
      "tags": ["startup", "team"]
    }
  ]
--- a/doc/plans/2026-02-18-agent-authentication-implementation.md
+++ b/doc/plans/2026-02-18-agent-authentication-implementation.md
--- a/doc/plans/2026-02-18-agent-authentication.md
+++ b/doc/plans/2026-02-18-agent-authentication.md
@@ -127,7 +127,7 @@ Response:
  },
  "onboarding": {
    "instructions": "You are being invited to join Acme Corp as an employee agent...",
-    "skillUrl": "https://app.paperclip.dev/skills/paperclip/SKILL.md",
+    "skillUrl": "https://app.paperclip.ing/skills/paperclip/SKILL.md",
    "requiredFields": {
      "name": "Your display name",
      "adapterType": "How Paperclip should send you heartbeats",
--- a/doc/plans/2026-02-19-agent-mgmt-followup-plan.md
+++ b/doc/plans/2026-02-19-agent-mgmt-followup-plan.md
--- a/doc/plans/2026-02-19-ceo-agent-creation-and-hiring.md
+++ b/doc/plans/2026-02-19-ceo-agent-creation-and-hiring.md
--- a/doc/plans/2026-02-20-issue-run-orchestration-plan.md
+++ b/doc/plans/2026-02-20-issue-run-orchestration-plan.md
--- a/doc/plans/2026-02-20-storage-system-implementation.md
+++ b/doc/plans/2026-02-20-storage-system-implementation.md
--- a/doc/plans/2026-02-21-humans-and-permissions-implementation.md
+++ b/doc/plans/2026-02-21-humans-and-permissions-implementation.md
--- a/doc/plans/2026-02-21-humans-and-permissions.md
+++ b/doc/plans/2026-02-21-humans-and-permissions.md
--- a/doc/plans/2026-02-23-cursor-cloud-adapter.md
+++ b/doc/plans/2026-02-23-cursor-cloud-adapter.md
--- a/doc/plans/2026-02-23-deployment-auth-mode-consolidation.md
+++ b/doc/plans/2026-02-23-deployment-auth-mode-consolidation.md
--- a/doc/plans/2026-03-10-workspace-strategy-and-git-worktrees.md
+++ b/doc/plans/2026-03-10-workspace-strategy-and-git-worktrees.md
--- a/doc/plans/2026-03-11-agent-chat-ui-and-issue-backed-conversations.md
+++ b/doc/plans/2026-03-11-agent-chat-ui-and-issue-backed-conversations.md
@@ -0,0 +1,329 @@
+# Agent Chat UI and Issue-Backed Conversations
+
+## Context
+
+`PAP-475` asks two related questions:
+
+1. What UI kit should Paperclip use if we add a chat surface with an agent?
+2. How should chat fit the product without breaking the current issue-centric model?
+
+This is not only a component-library decision. In Paperclip today:
+
+- V1 explicitly says communication is `tasks + comments only`, with no separate chat system.
+- Issues already carry assignment, audit trail, billing code, project linkage, goal linkage, and active run linkage.
+- Live run streaming already exists on issue detail pages.
+- Agent sessions already persist by `taskKey`, and today `taskKey` falls back to `issueId`.
+- The OpenClaw gateway adapter already supports an issue-scoped session key strategy.
+
+That means the cheapest useful path is not "add a second messaging product inside Paperclip." It is "add a better conversational UI on top of issue and run primitives we already have."
+
+## Current Constraints From the Codebase
+
+### Durable work object
+
+The durable object in Paperclip is the issue, not a chat thread.
+
+- `IssueDetail` already combines comments, linked runs, live runs, and activity into one timeline.
+- `CommentThread` already renders markdown comments and supports reply/reassignment flows.
+- `LiveRunWidget` already renders streaming assistant/tool/system output for active runs.
+
+### Session behavior
+
+Session continuity is already task-shaped.
+
+- `heartbeat.ts` resolves the session `taskKey` from an explicit `taskKey` if present, then `taskId`, then `issueId`.
+- `agent_task_sessions` stores session state per company + agent + adapter + task key.
+- OpenClaw gateway supports `sessionKeyStrategy=issue|fixed|run`, and `issue` already matches the Paperclip mental model well.
+
+That means "chat with the CEO about this issue" naturally maps to one durable session per issue today without inventing a second session system.
+
+### Billing behavior
+
+Billing is already issue-aware.
+
+- `cost_events` can attach to `issueId`, `projectId`, `goalId`, and `billingCode`.
+- heartbeat context already propagates issue linkage into runs and cost rollups.
+
+If chat leaves the issue model, Paperclip would need a second billing story. That is avoidable.
+
+## UI Kit Recommendation
+
+### Recommendation: `assistant-ui`
+
+Use `assistant-ui` as the chat presentation layer.
+
+Why it fits Paperclip:
+
+- It is a real chat UI kit, not just a hook.
+- It is composable and aligned with shadcn-style primitives, which matches the current UI stack well.
+- It explicitly supports custom backends, which matters because Paperclip talks to agents through issue comments, heartbeats, and run streams rather than direct provider calls.
+- It gives us polished chat affordances quickly: message list, composer, streaming text, attachments, thread affordances, and markdown-oriented rendering.
+
+Why not make "the Vercel one" the primary choice:
+
+- Vercel AI SDK is stronger today than the older "just `useChat` over `/api/chat`" framing suggests. Its transport layer is flexible and can support custom protocols.
+- But AI SDK is still better understood here as a transport/runtime protocol layer than as the best end-user chat surface for Paperclip.
+- Paperclip does not need Vercel to own message state, persistence, or the backend contract. Paperclip already has its own issue, run, and session model.
+
+So the clean split is:
+
+- `assistant-ui` for UI primitives
+- Paperclip-owned runtime/store for state, persistence, and transport
+- optional AI SDK usage later only if we want its stream protocol or client transport abstraction
+
+## Product Options
+
+### Option A: Separate chat object
+
+Create a new top-level chat/thread model unrelated to issues.
+
+Pros:
+
+- clean mental model if users want freeform conversation
+- easy to hide from issue boards
+
+Cons:
+
+- breaks the current V1 product decision that communication is issue-centric
+- needs new persistence, billing, session, permissions, activity, and wakeup rules
+- creates a second "why does this exist?" object beside issues
+- makes "pick up an old chat" a separate retrieval problem
+
+Verdict: not recommended for V1.
+
+### Option B: Every chat is an issue
+
+Treat chat as a UI mode over an issue. The issue remains the durable record.
+
+Pros:
+
+- matches current product spec
+- billing, runs, comments, approvals, and activity already work
+- sessions already resume on issue identity
+- works with all adapters, including OpenClaw, without new agent auth or a second API surface
+
+Cons:
+
+- some chats are not really "tasks" in a board sense
+- onboarding and review conversations may clutter normal issue lists
+
+Verdict: best V1 foundation.
+
+### Option C: Hybrid with hidden conversation issues
+
+Back every conversation with an issue, but allow a conversation-flavored issue mode that is hidden from default execution boards unless promoted.
+
+Pros:
+
+- preserves the issue-centric backend
+- gives onboarding/review chat a cleaner UX
+- preserves billing and session continuity
+
+Cons:
+
+- requires extra UI rules and possibly a small schema or filtering addition
+- can become a disguised second system if not kept narrow
+
+Verdict: likely the right product shape after a basic issue-backed MVP.
+
+## Recommended Product Model
+
+### Phase 1 product decision
+
+For the first implementation, chat should be issue-backed.
+
+More specifically:
+
+- the board opens a chat surface for an issue
+- sending a message is a comment mutation on that issue
+- the assigned agent is woken through the existing issue-comment flow
+- streaming output comes from the existing live run stream for that issue
+- durable assistant output remains comments and run history, not an extra transcript store
+
+This keeps Paperclip honest about what it is:
+
+- the control plane stays issue-centric
+- chat is a better way to interact with issue work, not a new collaboration product
+
+### Onboarding and CEO conversations
+
+For onboarding, weekly reviews, and "chat with the CEO", use a conversation issue rather than a global chat tab.
+
+Suggested shape:
+
+- create a board-initiated issue assigned to the CEO
+- mark it as conversation-flavored in UI treatment
+- optionally hide it from normal issue boards by default later
+- keep all cost/run/session linkage on that issue
+
+This solves several concerns at once:
+
+- no separate API key or direct provider wiring is needed
+- the same CEO adapter is used
+- old conversations are recovered through normal issue history
+- the CEO can still create or update real child issues from the conversation
+
+## Session Model
+
+### V1
+
+Use one durable conversation session per issue.
+
+That already matches current behavior:
+
+- adapter task sessions persist against `taskKey`
+- `taskKey` already falls back to `issueId`
+- OpenClaw already supports an issue-scoped session key
+
+This means "resume the CEO conversation later" works by reopening the same issue and waking the same agent on the same issue.
+
+### What not to add yet
+
+Do not add multi-thread-per-issue chat in the first pass.
+
+If Paperclip later needs several parallel threads on one issue, then add an explicit conversation identity and derive:
+
+- `taskKey = issue:<issueId>:conversation:<conversationId>`
+- OpenClaw `sessionKey = paperclip:conversation:<conversationId>`
+
+Until that requirement becomes real, one issue == one durable conversation is the simpler and better rule.
+
+## Billing Model
+
+Chat should not invent a separate billing pipeline.
+
+All chat cost should continue to roll up through the issue:
+
+- `cost_events.issueId`
+- project and goal rollups through existing relationships
+- issue `billingCode` when present
+
+If a conversation is important enough to exist, it is important enough to have a durable issue-backed audit and cost trail.
+
+This is another reason ephemeral freeform chat should not be the default.
+
+## UI Architecture
+
+### Recommended stack
+
+1. Keep Paperclip as the source of truth for message history and run state.
+2. Add `assistant-ui` as the rendering/composer layer.
+3. Build a Paperclip runtime adapter that maps:
+   - issue comments -> user/assistant messages
+   - live run deltas -> streaming assistant messages
+   - issue attachments -> chat attachments
+4. Keep current markdown rendering and code-block support where possible.
+
+### Interaction flow
+
+1. Board opens issue detail in "Chat" mode.
+2. Existing comment history is mapped into chat messages.
+3. When the board sends a message:
+   - `POST /api/issues/{id}/comments`
+   - optionally interrupt the active run if the UX wants "send and replace current response"
+4. Existing issue comment wakeup logic wakes the assignee.
+5. Existing `/issues/{id}/live-runs` and `/issues/{id}/active-run` data feeds drive streaming.
+6. When the run completes, durable state remains in comments/runs/activity as it does now.
+
+### Why this fits the current code
+
+Paperclip already has most of the backend pieces:
+
+- issue comments
+- run timeline
+- run log and event streaming
+- markdown rendering
+- attachment support
+- assignee wakeups on comments
+
+The missing piece is mostly the presentation and the mapping layer, not a new backend domain.
+
+## Agent Scope
+
+Do not launch this as "chat with every agent."
+
+Start narrower:
+
+- onboarding chat with CEO
+- workflow/review chat with CEO
+- maybe selected exec roles later
+
+Reasons:
+
+- it keeps the feature from becoming a second inbox/chat product
+- it limits permission and UX questions early
+- it matches the stated product demand
+
+If direct chat with other agents becomes useful later, the same issue-backed pattern can expand cleanly.
+
+## Recommended Delivery Phases
+
+### Phase 1: Chat UI on existing issues
+
+- add a chat presentation mode to issue detail
+- use `assistant-ui`
+- map comments + live runs into the chat surface
+- no schema change
+- no new API surface
+
+This is the highest-leverage step because it tests whether the UX is actually useful before product model expansion.
+
+### Phase 2: Conversation-flavored issues for CEO chat
+
+- add a lightweight conversation classification
+- support creation of CEO conversation issues from onboarding and workflow entry points
+- optionally hide these from normal backlog/board views by default
+
+The smallest implementation could be a label or issue metadata flag. If it becomes important enough, then promote it to a first-class issue subtype later.
+
+### Phase 3: Promotion and thread splitting only if needed
+
+Only if we later see a real need:
+
+- allow promoting a conversation to a formal task issue
+- allow several threads per issue with explicit conversation identity
+
+This should be demand-driven, not designed up front.
+
+## Clear Recommendation
+
+If the question is "what should we use?", the answer is:
+
+- use `assistant-ui` for the chat UI
+- do not treat raw Vercel AI SDK UI hooks as the main product answer
+- keep chat issue-backed in V1
+- use the current issue comment + run + session + billing model rather than inventing a parallel chat subsystem
+
+If the question is "how should we think about chat in Paperclip?", the answer is:
+
+- chat is a mode of interacting with issue-backed agent work
+- not a separate product silo
+- not an excuse to stop tracing work, cost, and session history back to the issue
+
+## Implementation Notes
+
+### Immediate implementation target
+
+The most defensible first build is:
+
+- add a chat tab or chat-focused layout on issue detail
+- back it with the currently assigned agent on that issue
+- use `assistant-ui` primitives over existing comments and live run events
+
+### Defer these until proven necessary
+
+- standalone global chat objects
+- multi-thread chat inside one issue
+- chat with every agent in the org
+- a second persistence layer for message history
+- separate cost tracking for chats
+
+## References
+
+- V1 communication model: `doc/SPEC-implementation.md`
+- Current issue/comment/run UI: `ui/src/pages/IssueDetail.tsx`, `ui/src/components/CommentThread.tsx`, `ui/src/components/LiveRunWidget.tsx`
+- Session persistence and task key derivation: `server/src/services/heartbeat.ts`, `packages/db/src/schema/agent_task_sessions.ts`
+- OpenClaw session routing: `packages/adapters/openclaw-gateway/README.md`
+- assistant-ui docs: <https://www.assistant-ui.com/docs>
+- assistant-ui repo: <https://github.com/assistant-ui/assistant-ui>
+- AI SDK transport docs: <https://ai-sdk.dev/docs/ai-sdk-ui/transport>
--- a/doc/plans/2026-03-13-TOKEN-OPTIMIZATION-PLAN.md
+++ b/doc/plans/2026-03-13-TOKEN-OPTIMIZATION-PLAN.md
@@ -0,0 +1,397 @@
+# Token Optimization Plan
+
+Date: 2026-03-13  
+Related discussion: https://github.com/paperclipai/paperclip/discussions/449
+
+## Goal
+
+Reduce token consumption materially without reducing agent capability, control-plane visibility, or task completion quality.
+
+This plan is based on:
+
+- the current V1 control-plane design
+- the current adapter and heartbeat implementation
+- the linked user discussion
+- local runtime data from the default Paperclip instance on 2026-03-13
+
+## Executive Summary
+
+The discussion is directionally right about two things:
+
+1. We should preserve session and prompt-cache locality more aggressively.
+2. We should separate stable startup instructions from per-heartbeat dynamic context.
+
+But that is not enough on its own.
+
+After reviewing the code and local run data, the token problem appears to have four distinct causes:
+
+1. **Measurement inflation on sessioned adapters.** Some token counters, especially for `codex_local`, appear to be recorded as cumulative session totals instead of per-heartbeat deltas.
+2. **Avoidable session resets.** Task sessions are intentionally reset on timer wakes and manual wakes, which destroys cache locality for common heartbeat paths.
+3. **Repeated context reacquisition.** The `paperclip` skill tells agents to re-fetch assignments, issue details, ancestors, and full comment threads on every heartbeat. The API does not currently offer efficient delta-oriented alternatives.
+4. **Large static instruction surfaces.** Agent instruction files and globally injected skills are reintroduced at startup even when most of that content is unchanged and not needed for the current task.
+
+The correct approach is:
+
+1. fix telemetry so we can trust the numbers
+2. preserve reuse where it is safe
+3. make context retrieval incremental
+4. add session compaction/rotation so long-lived sessions do not become progressively more expensive
+
+## Validated Findings
+
+### 1. Token telemetry is at least partly overstated today
+
+Observed from the local default instance:
+
+- `heartbeat_runs`: 11,360 runs between 2026-02-18 and 2026-03-13
+- summed `usage_json.inputTokens`: `2,272,142,368,952`
+- summed `usage_json.cachedInputTokens`: `2,217,501,559,420`
+
+Those totals are not credible as true per-heartbeat usage for the observed prompt sizes.
+
+Supporting evidence:
+
+- `adapter.invoke.payload.prompt` averages were small:
+  - `codex_local`: ~193 chars average, 6,067 chars max
+  - `claude_local`: ~160 chars average, 1,160 chars max
+- despite that, many `codex_local` runs report millions of input tokens
+- one reused Codex session in local data spans 3,607 runs and recorded `inputTokens` growing up to `1,155,283,166`
+
+Interpretation:
+
+- for sessioned adapters, especially Codex, we are likely storing usage reported by the runtime as a **session total**, not a **per-run delta**
+- this undermines trend reporting, optimization work, and customer trust
+
+This does **not** mean there is no real token problem. It means we need a trustworthy baseline before we can judge optimization impact.
+
+### 2. Timer wakes currently throw away reusable task sessions
+
+In `server/src/services/heartbeat.ts`, `shouldResetTaskSessionForWake(...)` returns `true` for:
+
+- `wakeReason === "issue_assigned"`
+- `wakeSource === "timer"`
+- manual on-demand wakes
+
+That means many normal heartbeats skip saved task-session resume even when the workspace is stable.
+
+Local data supports the impact:
+
+- `timer/system` runs: 6,587 total
+- only 976 had a previous session
+- only 963 ended with the same session
+
+So timer wakes are the largest heartbeat path, and about 85% of them (5,611 of 6,587) started without a previous task session to resume.
+
+### 3. We repeatedly ask agents to reload the same task context
+
+The `paperclip` skill currently tells agents to do this on essentially every heartbeat:
+
+- fetch assignments
+- fetch issue details
+- fetch ancestor chain
+- fetch full issue comments
+
+Current API shape reinforces that pattern:
+
+- `GET /api/issues/:id/comments` returns the full thread
+- there is no `since`, cursor, digest, or summary endpoint for heartbeat consumption
+- `GET /api/issues/:id` returns full enriched issue context, not a minimal delta payload
+
+This is safe but expensive. It forces the model to repeatedly consume unchanged information.
+
+### 4. Static instruction payloads are not separated cleanly from dynamic heartbeat prompts
+
+The user discussion suggested a bootstrap prompt. That is the right direction.
+
+Current state:
+
+- the UI exposes `bootstrapPromptTemplate`
+- adapter execution paths do not currently use it
+- several adapters prepend `instructionsFilePath` content directly into the per-run prompt or system prompt
+
+Result:
+
+- stable instructions are re-sent or re-applied in the same path as dynamic heartbeat content
+- we are not deliberately optimizing for provider prompt caching
+
+### 5. We inject more skill surface than most agents need
+
+Local adapters inject repo skills into runtime skill directories.
+
+Important `codex_local` nuance:
+
+- Codex does not read skills directly from the active worktree.
+- Paperclip discovers repo skills from the current checkout, then symlinks them into `$CODEX_HOME/skills` or `~/.codex/skills`.
+- If an existing Paperclip skill symlink already points at another live checkout, the current implementation skips it instead of repointing it.
+- This can leave Codex using stale skill content from a different worktree even after Paperclip-side skill changes land.
+- That is both a correctness risk and a token-analysis risk, because runtime behavior may not reflect the instructions in the checkout being tested.
+
+Current repo skill sizes:
+
+- `skills/paperclip/SKILL.md`: 17,441 bytes
+- `.agents/skills/create-agent-adapter/SKILL.md`: 31,832 bytes
+- `skills/paperclip-create-agent/SKILL.md`: 4,718 bytes
+- `skills/para-memory-files/SKILL.md`: 3,978 bytes
+
+That is nearly 58 KB (57,969 bytes) of skill markdown before any company-specific instructions.
+
+Not all of that is necessarily loaded into model context every run, but it increases startup surface area and should be treated as a token budget concern.
+
+## Principles
+
+We should optimize tokens under these rules:
+
+1. **Do not lose functionality.** Agents must still be able to resume work safely, understand why tasks exist, and act within governance rules.
+2. **Prefer stable context over repeated context.** Unchanged instructions should not be resent through the most expensive path.
+3. **Prefer deltas over full reloads.** Heartbeats should consume only what changed since the last useful run.
+4. **Measure normalized deltas, not raw adapter claims.** Especially for sessioned CLIs.
+5. **Keep escape hatches.** Board/manual runs may still want a forced fresh session.
+
+## Plan
+
+## Phase 1: Make token telemetry trustworthy
+
+This should happen first.
+
+### Changes
+
+- Store both:
+  - raw adapter-reported usage
+  - Paperclip-normalized per-run usage
+- For sessioned adapters, compute normalized deltas against prior usage for the same persisted session.
+- Add explicit fields for:
+  - `sessionReused`
+  - `taskSessionReused`
+  - `promptChars`
+  - `instructionsChars`
+  - `hasInstructionsFile`
+  - `skillSetHash` or skill count
+  - `contextFetchMode` (`full`, `delta`, `summary`)
+- Add per-adapter parser tests that distinguish cumulative-session counters from per-run counters.
+
+### Why
+
+Without this, we cannot tell whether a reduction came from a real optimization or a reporting artifact.
+
+### Success criteria
+
+- per-run token totals stop exploding on long-lived sessions
+- a resumed session’s usage curve is believable and monotonic at the session level, but not double-counted at the run level
+- cost pages can show both raw and normalized numbers while we migrate
+
+## Phase 2: Preserve safe session reuse by default
+
+This is the highest-leverage behavior change.
+
+### Changes
+
+- Stop resetting task sessions on ordinary timer wakes.
+- Keep resetting on:
+  - explicit manual “fresh run” invocations
+  - assignment changes
+  - workspace mismatch
+  - model mismatch / invalid resume errors
+- Add an explicit wake flag like `forceFreshSession: true` when the board wants a reset.
+- Record why a session was reused or reset in run metadata.
+
+### Why
+
+Timer wakes are the dominant heartbeat path. Resetting them destroys both session continuity and prompt cache reuse.
+
+### Success criteria
+
+- timer wakes resume the prior task session in the large majority of stable-workspace cases
+- no increase in stale-session failures
+- lower normalized input tokens per timer heartbeat
+
+## Phase 3: Separate static bootstrap context from per-heartbeat context
+
+This is the right version of the discussion’s bootstrap idea.
+
+### Changes
+
+- Implement `bootstrapPromptTemplate` in adapter execution paths.
+- Use it only when starting a fresh session, not on resumed sessions.
+- Keep `promptTemplate` intentionally small and stable:
+  - who I am
+  - what triggered this wake
+  - which task/comment/approval to prioritize
+- Move long-lived setup text out of recurring per-run prompts where possible.
+- Add UI guidance and warnings when `promptTemplate` contains high-churn or large inline content.
+
+### Why
+
+Static instructions and dynamic wake context have different cache behavior and should be modeled separately.
+
+For `codex_local`, this also requires isolating the Codex skill home per worktree or teaching Paperclip to repoint its own skill symlinks when the source checkout changes. Otherwise prompt and skill improvements in the active worktree may not reach the running agent.
+
+### Success criteria
+
+- fresh-session prompts can remain richer without inflating every resumed heartbeat
+- resumed prompts become short and structurally stable
+- cache hit rates improve for session-preserving adapters
+
+## Phase 4: Make issue/task context incremental
+
+This is the biggest product change and likely the biggest real token saver after session reuse.
+
+### Changes
+
+Add heartbeat-oriented endpoints and skill behavior:
+
+- `GET /api/agents/me/inbox-lite`
+  - minimal assignment list
+  - issue id, identifier, status, priority, updatedAt, lastExternalCommentAt
+- `GET /api/issues/:id/heartbeat-context`
+  - compact issue state
+  - parent-chain summary
+  - latest execution summary
+  - change markers
+- `GET /api/issues/:id/comments?after=<cursor>` or `?since=<timestamp>`
+  - return only new comments
+- optional `GET /api/issues/:id/context-digest`
+  - server-generated compact summary for heartbeat use
+
+Update the `paperclip` skill so the default pattern becomes:
+
+1. fetch compact inbox
+2. fetch compact task context
+3. fetch only new comments unless this is the first read, a mention-triggered wake, or a cache miss
+4. fetch full thread only on demand
+
+### Why
+
+Today we are using full-fidelity board APIs as heartbeat APIs. That is convenient but token-inefficient.
+
+### Success criteria
+
+- after first task acquisition, most heartbeats consume only deltas
+- repeated blocked-task or long-thread work no longer replays the whole comment history
+- mention-triggered wakes still have enough context to respond correctly
+
+## Phase 5: Add session compaction and controlled rotation
+
+This protects against long-lived session bloat.
+
+### Changes
+
+- Add rotation thresholds per adapter/session:
+  - turns
+  - normalized input tokens
+  - age
+  - cache hit degradation
+- Before rotating, produce a structured carry-forward summary:
+  - current objective
+  - work completed
+  - open decisions
+  - blockers
+  - files/artifacts touched
+  - next recommended action
+- Persist that summary in task session state or runtime state.
+- Start the next session with:
+  - bootstrap prompt
+  - compact carry-forward summary
+  - current wake trigger
+
+### Why
+
+Even when reuse is desirable, some sessions become too expensive to keep alive indefinitely.
+
+### Success criteria
+
+- very long sessions stop growing without bound
+- rotating a session does not cause loss of task continuity
+- successful task completion rate stays flat or improves
+
+## Phase 6: Reduce unnecessary skill surface
+
+### Changes
+
+- Move from “inject all repo skills” to an allowlist per agent or per adapter.
+- Default local runtime skill set should likely be:
+  - `paperclip`
+- Add opt-in skills for specialized agents:
+  - `paperclip-create-agent`
+  - `para-memory-files`
+  - `create-agent-adapter`
+- Expose active skill set in agent config and run metadata.
+- For `codex_local`, either:
+  - run with a worktree-specific `CODEX_HOME`, or
+  - treat Paperclip-owned Codex skill symlinks as repairable when they point at a different checkout
+
+### Why
+
+Most agents do not need adapter-authoring or memory-system skills on every run.
+
+### Success criteria
+
+- smaller startup instruction surface
+- no loss of capability for specialist agents that explicitly need extra skills
+
+## Rollout Order
+
+Recommended order:
+
+1. telemetry normalization
+2. timer-wake session reuse
+3. bootstrap prompt implementation
+4. heartbeat delta APIs + `paperclip` skill rewrite
+5. session compaction/rotation
+6. skill allowlists
+
+## Acceptance Metrics
+
+We should treat this plan as successful only if we improve both efficiency and task outcomes.
+
+Primary metrics:
+
+- normalized input tokens per successful heartbeat
+- normalized input tokens per completed issue
+- cache-hit ratio for sessioned adapters
+- session reuse rate by invocation source
+- fraction of heartbeats that fetch full comment threads
+
+Guardrail metrics:
+
+- task completion rate
+- blocked-task rate
+- stale-session failure rate
+- manual intervention rate
+- issue reopen rate after agent completion
+
+Initial targets:
+
+- 30% to 50% reduction in normalized input tokens per successful resumed heartbeat
+- 80%+ session reuse on stable timer wakes
+- 80%+ reduction in full-thread comment reloads after first task read
+- no statistically meaningful regression in completion rate or failure rate
+
+## Concrete Engineering Tasks
+
+1. Add normalized usage fields and migration support for run analytics.
+2. Patch sessioned adapter accounting to compute deltas from prior session totals.
+3. Change `shouldResetTaskSessionForWake(...)` so timer wakes do not reset by default.
+4. Implement `bootstrapPromptTemplate` end-to-end in adapter execution.
+5. Add compact heartbeat context and incremental comment APIs.
+6. Rewrite `skills/paperclip/SKILL.md` around delta-fetch behavior.
+7. Add session rotation with carry-forward summaries.
+8. Replace global skill injection with explicit allowlists.
+9. Fix `codex_local` skill resolution so worktree-local skill changes reliably reach the runtime.
+
+## Recommendation
+
+Treat this as a two-track effort:
+
+- **Track A: correctness and no-regret wins**
+  - telemetry normalization
+  - timer-wake session reuse
+  - bootstrap prompt implementation
+- **Track B: structural token reduction**
+  - delta APIs
+  - skill rewrite
+  - session compaction
+  - skill allowlists
+
+If we only do Track A, we will improve things, but agents will still re-read too much unchanged task context.
+
+If we only do Track B without fixing telemetry first, we will not be able to prove the gains cleanly.
--- a/doc/plans/2026-03-13-agent-evals-framework.md
+++ b/doc/plans/2026-03-13-agent-evals-framework.md
@@ -0,0 +1,775 @@
+# Agent Evals Framework Plan
+
+Date: 2026-03-13
+
+## Context
+
+We need evals for the thing Paperclip actually ships:
+
+- agent behavior produced by adapter config
+- prompt templates and bootstrap prompts
+- skill sets and skill instructions
+- model choice
+- runtime policy choices that affect outcomes and cost
+
+We do **not** primarily need a fine-tuning pipeline.
+We need a regression framework that can answer:
+
+- if we change prompts or skills, do agents still do the right thing?
+- if we switch models, what got better, worse, or more expensive?
+- if we optimize tokens, did we preserve task outcomes?
+- can we grow the suite over time from real Paperclip usage?
+
+This plan is based on:
+
+- `doc/GOAL.md`
+- `doc/PRODUCT.md`
+- `doc/SPEC-implementation.md`
+- `docs/agents-runtime.md`
+- `doc/plans/2026-03-13-TOKEN-OPTIMIZATION-PLAN.md`
+- Discussion #449: <https://github.com/paperclipai/paperclip/discussions/449>
+- OpenAI eval best practices: <https://developers.openai.com/api/docs/guides/evaluation-best-practices>
+- Promptfoo docs: <https://www.promptfoo.dev/docs/configuration/test-cases/> and <https://www.promptfoo.dev/docs/providers/custom-api/>
+- LangSmith complex agent eval docs: <https://docs.langchain.com/langsmith/evaluate-complex-agent>
+- Braintrust dataset/scorer docs: <https://www.braintrust.dev/docs/annotate/datasets> and <https://www.braintrust.dev/docs/evaluate/write-scorers>
+
+## Recommendation
+
+Paperclip should take a **two-stage approach**:
+
+1. **Start with Promptfoo now** for narrow, prompt-and-skill behavior evals across models.
+2. **Grow toward a first-party, repo-local eval harness in TypeScript** for full Paperclip scenario evals.
+
+So the recommendation is no longer “skip Promptfoo.” It is:
+
+- use Promptfoo as the fastest bootstrap layer
+- keep eval cases and fixtures in this repo
+- avoid making Promptfoo config the deepest long-term abstraction
+
+More specifically:
+
+1. The canonical eval definitions should live in this repo under a top-level `evals/` directory.
+2. `v0` should use Promptfoo to run focused test cases across models and providers.
+3. The longer-term harness should run **real Paperclip scenarios** against seeded companies/issues/agents, not just raw prompt completions.
+4. The scoring model should combine:
+   - deterministic checks
+   - structured rubric scoring
+   - pairwise candidate-vs-baseline judging
+   - efficiency metrics from normalized usage/cost telemetry
+5. The framework should compare **bundles**, not just models.
+
+A bundle is:
+
+- adapter type
+- model id
+- prompt template(s)
+- bootstrap prompt template
+- skill allowlist / skill content version
+- relevant runtime flags
+
+That is the right unit because that is what actually changes behavior in Paperclip.
+
+## Why This Is The Right Shape
+
+### 1. We need to evaluate system behavior, not only prompt output
+
+Prompt-only tools are useful, but Paperclip’s real failure modes are often:
+
+- wrong issue chosen
+- wrong API call sequence
+- bad delegation
+- failure to respect approval boundaries
+- stale session behavior
+- over-reading context
+- claiming completion without producing artifacts or comments
+
+Those are control-plane behaviors. They require scenario setup, execution, and trace inspection.
+
+### 2. The repo is already TypeScript-first
+
+The existing monorepo already uses:
+
+- `pnpm`
+- `tsx`
+- `vitest`
+- TypeScript across server, UI, shared contracts, and adapters
+
+A TypeScript-first harness will fit the repo and CI better than introducing a Python-first test subsystem as the default path.
+
+Python can stay optional later for specialty scorers or research experiments.
+
+### 3. We need provider/model comparison without vendor lock-in
+
+OpenAI’s guidance is directionally right:
+
+- eval early and often
+- use task-specific evals
+- log everything
+- prefer pairwise/comparison-style judging over open-ended scoring
+
+But OpenAI’s Evals API is not the right primary control plane for Paperclip, because our target is explicitly multi-model and multi-provider.
+
+### 4. Hosted eval products are useful, and Promptfoo is the right bootstrap tool
+
+The current tradeoff:
+
+- Promptfoo is very good for local, repo-based prompt/provider matrices and CI integration.
+- LangSmith is strong on trajectory-style agent evals.
+- Braintrust has a clean dataset + scorer + experiment model and strong TypeScript support.
+
+The community suggestion is directionally right:
+
+- Promptfoo lets us start small
+- it supports simple assertions like contains / not-contains / regex / custom JS
+- it can run the same cases across multiple models
+- it supports OpenRouter
+- it can move into CI later
+
+That makes it the best `v0` tool for “did this prompt/skill/model change obviously regress?”
+
+But Paperclip should still avoid making a hosted platform or a third-party config format the core abstraction before we have our own stable eval model.
+
+The right move is:
+
+- start with Promptfoo for quick wins
+- keep the data portable and repo-owned
+- build a thin first-party harness around Paperclip concepts as the system grows
+- optionally export to or integrate with other tools later if useful
+
+## What We Should Evaluate
+
+We should split evals into four layers.
+
+### Layer 1: Deterministic contract evals
+
+These should require no judge model.
+
+Examples:
+
+- agent comments on the assigned issue
+- no mutation outside the agent’s company
+- approval-required actions do not bypass approval flow
+- task transitions are legal
+- output contains required structured fields
+- artifact links exist when the task required an artifact
+- no full-thread refetch on delta-only cases once the API supports it
+
+These are cheap, reliable, and should be the first line of defense.
+
+### Layer 2: Single-step behavior evals
+
+These test narrow behaviors in isolation.
+
+Examples:
+
+- chooses the correct issue from inbox
+- writes a reasonable first status comment
+- decides to ask for approval instead of acting directly
+- delegates to the correct report
+- recognizes blocked state and reports it clearly
+
+These are the closest thing to prompt evals, but still framed in Paperclip terms.
+
+### Layer 3: End-to-end scenario evals
+
+These run a full heartbeat or short sequence of heartbeats against a seeded scenario.
+
+Examples:
+
+- new assignment pickup
+- long-thread continuation
+- mention-triggered clarification
+- approval-gated hire request
+- manager escalation
+- workspace coding task that must leave a meaningful issue update
+
+These should evaluate both final state and trace quality.
+
+### Layer 4: Efficiency and regression evals
+
+These are not “did the answer look good?” evals. They are “did we preserve quality while improving cost/latency?” evals.
+
+Examples:
+
+- normalized input tokens per successful heartbeat
+- normalized tokens per completed issue
+- session reuse rate
+- full-thread reload rate
+- wall-clock duration
+- cost per successful scenario
+
+This layer is especially important for token optimization work.
+
+## Core Design
+
+### 1. Canonical object: `EvalCase`
+
+Each eval case should define:
+
+- scenario setup
+- target bundle(s)
+- execution mode
+- expected invariants
+- scoring rubric
+- tags/metadata
+
+Suggested shape:
+
+```ts
+type EvalCase = {
+  id: string;
+  description: string;
+  tags: string[];
+  setup: {
+    fixture: string;
+    agentId: string;
+    trigger: "assignment" | "timer" | "on_demand" | "comment" | "approval";
+  };
+  inputs?: Record<string, unknown>;
+  checks: {
+    hard: HardCheck[];
+    rubric?: RubricCheck[];
+    pairwise?: PairwiseCheck[];
+  };
+  metrics: MetricSpec[];
+};
+```
+
+The important part is that the case is about a Paperclip scenario, not a standalone prompt string.
+
+### 2. Canonical object: `EvalBundle`
+
+Suggested shape:
+
+```ts
+type EvalBundle = {
+  id: string;
+  adapter: string;
+  model: string;
+  promptTemplate: string;
+  bootstrapPromptTemplate?: string;
+  skills: string[];
+  flags?: Record<string, string | number | boolean>;
+};
+```
+
+Every comparison run should say which bundle was tested.
+
+This avoids the common mistake of saying “model X is better” when the real change was model + prompt + skills + runtime behavior.
+
+### 3. Canonical output: `EvalTrace`
+
+We should capture a normalized trace for scoring:
+
+- run ids
+- prompts actually sent
+- session reuse metadata
+- issue mutations
+- comments created
+- approvals requested
+- artifacts created
+- token/cost telemetry
+- timing
+- raw outputs
+
+The scorer layer should never need to scrape ad hoc logs.
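+
+One possible shape, mirroring the `EvalCase` and `EvalBundle` sketches. Every field name below is an assumption made for illustration; the list above only fixes which categories of data the trace should carry, not how they are spelled.
+
+```ts
+// Hypothetical shape: field names are illustrative, not an existing Paperclip type.
+type EvalTrace = {
+  runIds: string[];
+  promptsSent: string[]; // prompts actually sent, after templating
+  sessionReuse: { reused: boolean; sessionId?: string };
+  issueMutations: { issueId: string; field: string; from: unknown; to: unknown }[];
+  commentsCreated: { issueId: string; body: string }[];
+  approvalsRequested: { approvalId: string; action: string }[];
+  artifactsCreated: { artifactId: string; issueId?: string }[];
+  telemetry: { inputTokens: number; outputTokens: number; costUsd: number };
+  timing: { startedAt: string; finishedAt: string; wallClockMs: number };
+  rawOutputs: string[];
+};
+```
+
+Keeping mutations, comments, approvals, and artifacts as separate arrays lets each scorer read only the slice it needs.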
+
+## Scoring Framework
+
+### 1. Hard checks first
+
+Every eval should start with pass/fail checks that can invalidate the run immediately.
+
+Examples:
+
+- touched wrong company
+- skipped required approval
+- no issue update produced
+- returned malformed structured output
+- marked task done without required artifact
+
+If a hard check fails, the scenario fails regardless of style or judge score.
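+
+A minimal sketch of how such checks could be expressed as plain predicates over a trace. `TraceSlice`, the check ids, and `failedHardChecks` are hypothetical names; a real implementation would read from the full `EvalTrace`.
+
+```ts
+// Hypothetical minimal slice of a trace; a real EvalTrace would carry more fields.
+type TraceSlice = {
+  agentCompanyId: string;
+  mutatedCompanyIds: string[];
+  commentsCreated: number;
+  markedDone: boolean;
+  artifactRequired: boolean;
+  artifactsCreated: number;
+};
+
+type HardCheck = { id: string; passes: (trace: TraceSlice) => boolean };
+
+const hardChecks: HardCheck[] = [
+  // Every mutation must stay inside the agent's own company.
+  { id: "company_boundary", passes: (t) => t.mutatedCompanyIds.every((id) => id === t.agentCompanyId) },
+  // The run must leave at least one issue comment.
+  { id: "issue_update_produced", passes: (t) => t.commentsCreated > 0 },
+  // A task that requires an artifact may not be marked done without one.
+  { id: "done_requires_artifact", passes: (t) => !(t.markedDone && t.artifactRequired && t.artifactsCreated === 0) },
+];
+
+// Returns the ids of failed checks; any non-empty result fails the scenario.
+function failedHardChecks(trace: TraceSlice, checks: HardCheck[] = hardChecks): string[] {
+  return checks.filter((c) => !c.passes(trace)).map((c) => c.id);
+}
+```
+
+Because the checks are pure functions, they need no judge model and can run before any rubric or pairwise scoring.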
+
+### 2. Rubric scoring second
+
+Rubric scoring should use narrow criteria, not vague “how good was this?” prompts.
+
+Good rubric dimensions:
+
+- task understanding
+- governance compliance
+- useful progress communication
+- correct delegation
+- evidence of completion
+- concision / unnecessary verbosity
+
+Each rubric should be a small 0-1 or 0-2 decision, not a mushy 1-10 scale.
+
+### 3. Pairwise judging for candidate vs baseline
+
+OpenAI’s eval guidance is right that LLMs are better at discriminating between concrete options than at open-ended generation.
+
+So for non-deterministic quality checks, the default pattern should be:
+
+- run baseline bundle on the case
+- run candidate bundle on the same case
+- ask a judge model which is better on explicit criteria
+- allow `baseline`, `candidate`, or `tie`
+
+This is better than asking a judge for an absolute quality score with no anchor.
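+
+As a rough sketch, the per-case verdicts can be folded into the wins/losses/ties counts a comparison report would show. `PairwiseVerdict`, `PairwiseTally`, and `tallyVerdicts` are assumed names, and the judge call itself is deliberately left out.
+
+```ts
+// The three outcomes the judge is allowed to return for one case.
+type PairwiseVerdict = "baseline" | "candidate" | "tie";
+
+type PairwiseTally = { baseline: number; candidate: number; tie: number };
+
+// Counts how often each side won across a set of cases.
+function tallyVerdicts(verdicts: PairwiseVerdict[]): PairwiseTally {
+  const tally: PairwiseTally = { baseline: 0, candidate: 0, tie: 0 };
+  for (const verdict of verdicts) {
+    tally[verdict] += 1;
+  }
+  return tally;
+}
+```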
+
+### 4. Efficiency scoring is separate
+
+Do not bury efficiency inside a single blended quality score.
+
+Record it separately:
+
+- quality score
+- cost score
+- latency score
+
+Then compute a summary decision such as:
+
+- candidate is acceptable only if quality is non-inferior and efficiency is improved
+
+That is much easier to reason about than one magic number.
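+
+One way to encode that summary decision, as a sketch under stated assumptions: `qualityMargin` is a hypothetical tolerance for what counts as non-inferior, and "efficiency improved" is read here as "neither cost nor latency got worse, and at least one got strictly better".
+
+```ts
+// Scores are kept on separate axes rather than blended into one number.
+type Scores = { quality: number; costUsd: number; latencyMs: number };
+
+// qualityMargin (assumed parameter): largest quality drop still treated as non-inferior.
+function isAcceptable(baseline: Scores, candidate: Scores, qualityMargin = 0.02): boolean {
+  const nonInferior = candidate.quality >= baseline.quality - qualityMargin;
+  const noWorse = candidate.costUsd <= baseline.costUsd && candidate.latencyMs <= baseline.latencyMs;
+  const strictlyBetter = candidate.costUsd < baseline.costUsd || candidate.latencyMs < baseline.latencyMs;
+  return nonInferior && noWorse && strictlyBetter;
+}
+```
+
+Each of the three conditions can be reported on its own, so a rejected candidate shows which axis failed.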
+
+## Suggested Decision Rule
+
+For PR gating:
+
+1. No hard-check regressions.
+2. No significant regression on required scenario pass rate.
+3. No significant regression on key rubric dimensions.
+4. If the change is token-optimization-oriented, require efficiency improvement on target scenarios.
+
+For deeper comparison reports, show:
+
+- pass rate
+- pairwise wins/losses/ties
+- median normalized tokens
+- median wall-clock time
+- cost deltas
+
+## Dataset Strategy
+
+We should explicitly build the dataset from three sources.
+
+### 1. Hand-authored seed cases
+
+Start here.
+
+These should cover core product invariants:
+
+- assignment pickup
+- status update
+- blocked reporting
+- delegation
+- approval request
+- cross-company access denial
+- issue comment follow-up
+
+These are small, clear, and stable.
+
+### 2. Production-derived cases
+
+Per OpenAI’s guidance, we should log everything and mine real usage for eval cases.
+
+Paperclip should grow eval coverage by promoting real runs into cases when we see:
+
+- regressions
+- interesting failures
+- edge cases
+- high-value success patterns worth preserving
+
+The initial version can be manual:
+
+- take a real run
+- redact/normalize it
+- convert it into an `EvalCase`
+
+Later we can automate trace-to-case generation.
+
+### 3. Adversarial and guardrail cases
+
+These should intentionally probe failure modes:
+
+- approval bypass attempts
+- wrong-company references
+- stale context traps
+- irrelevant long threads
+- misleading instructions in comments
+- verbosity traps
+
+This is where Promptfoo-style red-team ideas can become useful later, but it is not the first slice.
+
+## Repo Layout
+
+Recommended initial layout:
+
+```text
+evals/
+  README.md
+  promptfoo/
+    promptfooconfig.yaml
+    prompts/
+    cases/
+  cases/
+    core/
+    approvals/
+    delegation/
+    efficiency/
+  fixtures/
+    companies/
+    issues/
+  bundles/
+    baseline/
+    experiments/
+  runners/
+    scenario-runner.ts
+    compare-runner.ts
+  scorers/
+    hard/
+    rubric/
+    pairwise/
+  judges/
+    rubric-judge.ts
+    pairwise-judge.ts
+  lib/
+    types.ts
+    traces.ts
+    metrics.ts
+  reports/
+    .gitignore
+```
+
+Why top-level `evals/`:
+
+- it makes evals feel first-class
+- it avoids hiding them inside `server/`, since they span adapters and runtime behavior
+- it leaves room for both TS and optional Python helpers later
+- it gives us a clean place for Promptfoo `v0` config plus the later first-party runner
+
+## Execution Model
+
+The harness should support three modes.
+
+### Mode A: Cheap local smoke
+
+Purpose:
+
+- run on PRs
+- keep cost low
+- catch obvious regressions
+
+Characteristics:
+
+- 5 to 20 cases
+- 1 or 2 bundles
+- mostly hard checks and narrow rubrics
+
+### Mode B: Candidate vs baseline compare
+
+Purpose:
+
+- evaluate a prompt/skill/model change before merge
+
+Characteristics:
+
+- paired runs
+- pairwise judging enabled
+- quality + efficiency diff report
+
+### Mode C: Nightly broader matrix
+
+Purpose:
+
+- compare multiple models and bundles
+- grow historical benchmark data
+
+Characteristics:
+
+- larger case set
+- multiple models
+- more expensive rubric/pairwise judging
+
+## CI and Developer Workflow
+
+Suggested commands:
+
+```sh
+pnpm evals:smoke
+pnpm evals:compare --baseline baseline/codex-default --candidate experiments/codex-lean-skillset
+pnpm evals:nightly
+```
+
+PR behavior:
+
+- run `evals:smoke` on prompt/skill/adapter/runtime changes
+- optionally trigger `evals:compare` for labeled PRs or manual runs
+
+Nightly behavior:
+
+- run larger matrix
+- save report artifact
+- surface trend lines on pass rate, pairwise wins, and efficiency
+
+## Framework Comparison
+
+### Promptfoo
+
+Best use for Paperclip:
+
+- prompt-level micro-evals
+- provider/model comparison
+- quick local CI integration
+- custom JS assertions and custom providers
+- bootstrap-layer evals for one skill or one agent workflow
+
+What changed in this recommendation:
+
+- Promptfoo is now the recommended **starting point**
+- especially for “one skill, a handful of cases, compare across models”
+
+Why it still should not be the only long-term system:
+
+- its primary abstraction is still prompt/provider/test-case oriented
+- Paperclip needs scenario setup, control-plane state inspection, and multi-step traces as first-class concepts
+
+Recommendation:
+
+- use Promptfoo first
+- store Promptfoo config and cases in-repo under `evals/promptfoo/`
+- use custom JS/TS assertions and, if needed later, a custom provider that calls Paperclip scenario runners
+- do not make Promptfoo YAML the only canonical Paperclip eval format once we outgrow prompt-level evals
+
+### LangSmith
+
+What it gets right:
+
+- final response evals
+- trajectory evals
+- single-step evals
+
+Why not the primary system today:
+
+- stronger fit for teams already centered on LangChain/LangGraph
+- introduces hosted/external workflow gravity before our own eval model is stable
+
+Recommendation:
+
+- copy the trajectory/final/single-step taxonomy
+- do not adopt the platform as the default requirement
+
+### Braintrust
+
+What it gets right:
+
+- TypeScript support
+- clean dataset/task/scorer model
+- production logging to datasets
+- experiment comparison over time
+
+Why not the primary system today:
+
+- still externalizes the canonical dataset and review workflow
+- we are not yet at the maturity where hosted experiment management should define the shape of the system
+
+Recommendation:
+
+- borrow its dataset/scorer/experiment mental model
+- revisit once we want hosted review and experiment history at scale
+
+### OpenAI Evals / Evals API
+
+What it gets right:
+
+- strong eval principles
+- emphasis on task-specific evals
+- continuous evaluation mindset
+
+Why not the primary system:
+
+- Paperclip must compare across models/providers
+- we do not want our primary eval runner coupled to one model vendor
+
+Recommendation:
+
+- use the guidance
+- do not use it as the core Paperclip eval runtime
+
+## First Implementation Slice
+
+The first version should be intentionally small.
+
+### Phase 0: Promptfoo bootstrap
+
+Build:
+
+- `evals/promptfoo/promptfooconfig.yaml`
+- 5 to 10 focused cases for one skill or one agent workflow
+- model matrix using the providers we care about most
+- mostly deterministic assertions:
+  - contains
+  - not-contains
+  - regex
+  - custom JS assertions
+
+Target scope:
+
+- one skill, or one narrow workflow such as assignment pickup / first status update
+- compare a small set of bundles across several models
+
+Success criteria:
+
+- we can run one command and compare outputs across models
+- prompt/skill regressions become visible quickly
+- the team gets signal before building heavier infrastructure
+
+### Phase 1: Skeleton and core cases
+
+Build:
+
+- `evals/` scaffold
+- `EvalCase`, `EvalBundle`, `EvalTrace` types
+- scenario runner for seeded local cases
+- 10 hand-authored core cases
+- hard checks only
+
+Target cases:
+
+- assigned issue pickup
+- write progress comment
+- ask for approval when required
+- respect company boundary
+- report blocked state
+- avoid marking done without artifact/comment evidence
+
+Success criteria:
+
+- a developer can run a local smoke suite
+- prompt/skill changes can fail the suite deterministically
+- Promptfoo `v0` cases either migrate into or coexist with this layer cleanly
+
+### Phase 2: Pairwise and rubric layer
+
+Build:
+
+- rubric scorer interface
+- pairwise judge runner
+- candidate vs baseline compare command
+- markdown/html report output
+
+Success criteria:
+
+- model/prompt bundle changes produce a readable diff report
+- we can tell “better”, “worse”, or “same” on curated scenarios
+
+### Phase 3: Efficiency integration
+
+Build:
+
+- normalized token/cost metrics integrated into eval traces
+- cost and latency comparisons
+- efficiency gates for token optimization work
+
+Dependency:
+
+- this should align with the telemetry normalization work in `2026-03-13-TOKEN-OPTIMIZATION-PLAN.md`
+
+Success criteria:
+
+- quality and efficiency can be judged together
+- token-reduction work no longer relies on anecdotal improvements
+
+### Phase 4: Production-case ingestion
+
+Build:
+
+- tooling to promote real runs into new eval cases
+- metadata tagging
+- failure corpus growth process
+
+Success criteria:
+
+- the eval suite grows from real product behavior instead of staying synthetic
+
+## Initial Case Categories
+
+We should start with these categories:
+
+1. `core.assignment_pickup`
+2. `core.progress_update`
+3. `core.blocked_reporting`
+4. `governance.approval_required`
+5. `governance.company_boundary`
+6. `delegation.correct_report`
+7. `threads.long_context_followup`
+8. `efficiency.no_unnecessary_reloads`
+
+That is enough to start catching the classes of regressions we actually care about.
+
+## Important Guardrails
+
+### 1. Do not rely on judge models alone
+
+Every important scenario needs deterministic checks first.
+
+### 2. Do not gate PRs on a single noisy score
+
+Use pass/fail invariants plus a small number of stable rubric or pairwise checks.
+
+### 3. Do not confuse benchmark score with product quality
+
+The suite must keep growing from real runs, otherwise it will become a toy benchmark.
+
+### 4. Do not evaluate only final output
+
+Trajectory matters for agents:
+
+- did they call the right Paperclip APIs?
+- did they ask for approval?
+- did they communicate progress?
+- did they choose the right issue?
+
+### 5. Do not make the framework vendor-shaped
+
+Our eval model should survive changes in:
+
+- judge provider
+- candidate provider
+- adapter implementation
+- hosted tooling choices
+
+## Open Questions
+
+1. Should the first scenario runner invoke the real server over HTTP, or call services directly in-process?
+   My recommendation: start in-process for speed, then add HTTP-mode coverage once the model stabilizes.
+
+2. Should we support Python scorers in v1?
+   My recommendation: no. Keep v1 all-TypeScript.
+
+3. Should we commit baseline outputs?
+   My recommendation: commit case definitions and bundle definitions, but keep run artifacts out of git.
+
+4. Should we add hosted experiment tracking immediately?
+   My recommendation: no. Revisit after the local harness proves useful.
+
+## Final Recommendation
+
+Start with Promptfoo for immediate, narrow model-and-prompt comparisons, then grow into a first-party `evals/` framework in TypeScript that evaluates **Paperclip scenarios and bundles**, not just prompts.
+
+Use this structure:
+
+- Promptfoo for `v0` bootstrap
+- deterministic hard checks as the foundation
+- rubric and pairwise judging for non-deterministic quality
+- normalized efficiency metrics as a separate axis
+- repo-local datasets that grow from real runs
+
+Use external tools selectively:
+
+- Promptfoo as the initial path for narrow prompt/provider tests
+- Braintrust or LangSmith later if we want hosted experiment management
+
+But keep the canonical eval model inside the Paperclip repo and aligned to Paperclip’s actual control-plane behaviors.
--- a/doc/plans/2026-03-13-features.md
+++ b/doc/plans/2026-03-13-features.md
@@ -0,0 +1,780 @@
+# Feature specs
+
+## 1) Guided onboarding + first-job magic
+
+The repo already has `onboard`, `doctor`, `run`, deployment modes, and even agent-oriented onboarding text/skills endpoints, but there are also current onboarding/auth validation issues and an open “onboard failed” report. That means this is not just polish; it is product-critical. ([GitHub][1])
+
+### Product decision
+
+Replace “configuration-first onboarding” with **interview-first onboarding**.
+
+### What we want
+
+- Ask 3–4 questions up front, not 20 settings.
+- Generate the right path automatically: local solo, shared private, or public cloud.
+- Detect what agent/runtime environment already exists.
+- Make it normal to have Claude/OpenClaw/Codex help complete setup.
+- End onboarding with a **real first task**, not a blank dashboard.
+
+### What we do not want
+
+- Provider jargon before value.
+- “Go find an API key” as the default first instruction.
+- A successful install that still leaves users unsure what to do next.
+
+### Proposed UX
+
+On first run, show an interview:
+
+```ts
+type OnboardingProfile = {
+  useCase: "startup" | "agency" | "internal_team";
+  companySource: "new" | "existing";
+  deployMode: "local_solo" | "shared_private" | "shared_public";
+  autonomyMode: "hands_on" | "hybrid" | "full_auto";
+  primaryRuntime: "claude_code" | "codex" | "openclaw" | "other";
+};
+```
+
+Questions:
+
+1. What are you building?
+2. Is this a new company, an existing company, or a service/agency team?
+3. Are you working solo on one machine, sharing privately with a team, or deploying publicly?
+4. Do you want full auto, hybrid, or tight manual control?
+
+Then Paperclip should:
+
+- detect installed CLIs/providers/subscriptions
+- recommend the matching deployment/auth mode
+- generate a local `onboarding.txt` / LLM handoff prompt
+- offer a button: **“Open this in Claude / copy setup prompt”**
+- create starter objects:
+
+  - company
+  - company goal
+  - CEO
+  - founding engineer or equivalent first report
+  - first suggested task
+
+### Backend / API
+
+- Add `GET /api/onboarding/recommendation`
+- Add `GET /api/onboarding/llm-handoff.txt`
+- Reuse existing invite/onboarding/skills patterns for local-first bootstrap
+- Persist onboarding answers into instance config for later defaults
+
+### Acceptance criteria
+
+- Fresh install with a supported local runtime completes without manual JSON/env editing.
+- User sees first live agent action before leaving onboarding.
+- A blank dashboard is no longer the default post-install state.
+- If a required dependency is missing, the error is prescriptive and fixable from the UI/CLI.
+
+### Non-goals
+
+- account creation
+- enterprise SSO
+- perfect provider auto-detection for every runtime
+
+---
+
+## 2) Board command surface, not generic chat
+
+There is a real tension here: the transcript says users want “chat with my CEO,” while the public product definition says Paperclip is **not a chatbot** and V1 communication is **tasks + comments only**. At the same time, the repo is already exploring plugin infrastructure and even a chat plugin via plugin SSE streaming. The clean resolution is: **make the core surface conversational, but keep the data model task/thread-centric; reserve full chat as an optional plugin**. ([GitHub][2])
+
+### Product decision
+
+Build a **Command Composer** backed by issues/comments/approvals, not a separate chat subsystem.
+
+### What we want
+
+- “Talk to the CEO” feeling for the user.
+- Every conversation ends up attached to a real company object.
+- Strategy discussion can produce issues, artifacts, and approvals.
+
+### What we do not want
+
+- A blank “chat with AI” home screen disconnected from the org.
+- Yet another agent-chat product.
+
+### Proposed UX
+
+Add a global composer with modes:
+
+```ts
+type ComposerMode = "ask" | "task" | "decision";
+type ThreadScope = "company" | "project" | "issue" | "agent";
+```
+
+Examples:
+
+- On dashboard: “Ask the CEO for a hiring plan” → creates a `strategy` issue/thread scoped to the company.
+- On agent page: “Tell the designer to make this cleaner” → appends an instruction comment to an issue or spawns a new delegated task.
+- On approval page: “Why are you asking to hire?” → appends a board comment to the approval context.
+
+Add issue kinds:
+
+```ts
+type IssueKind = "task" | "strategy" | "question" | "decision";
+```
+
+### Backend / data model
+
+Prefer extending existing `issues` rather than creating `chats`:
+
+- `issues.kind`
+- `issues.scope`
+- optional `issues.target_agent_id`
+- comment metadata: `comment.intent = hint | correction | board_question | board_decision`
+
+### Acceptance criteria
+
+- A user can “ask CEO” from the dashboard and receive a response in a company-scoped thread.
+- From that thread, the user can create/approve tasks with one click.
+- No separate chat database is required for v1 of this feature.
+
+### Non-goals
+
+- consumer chat UX
+- model marketplace
+- general-purpose assistant unrelated to company context
+
+---
+
+## 3) Live org visibility + explainability layer
+
+The core product promise is already visibility and governance, but right now the transcript makes clear that the UI is still too close to raw agent execution. The repo already has org charts, activity, heartbeat runs, costs, and agent detail surfaces; the missing piece is the explanatory layer above them. ([GitHub][1])
+
+### Product decision
+
+Default the UI to **human-readable operational summaries**, with raw logs one layer down.
+
+### What we want
+
+- At company level: “who is active, what are they doing, what is moving between teams”
+- At agent level: “what is the plan, what step is complete, what outputs were produced”
+- At run level: “summary first, transcript second”
+
+### Proposed UX
+
+Company page:
+
+- org chart with live active-state indicators
+- delegation animation between nodes when work moves
+- current open priorities
+- pending approvals
+- burn / budget warning strip
+
+Agent page:
+
+- status card
+- current issue
+- plan checklist
+- latest artifact(s)
+- summary of last run
+- expandable raw trace/logs
+
+Run page:
+
+- **Summary**
+- **Steps**
+- **Raw transcript / tool calls**
+
+### Backend / API
+
+Generate a run view model from current run/activity data:
+
+```ts
+type RunSummary = {
+  runId: string;
+  headline: string;
+  objective: string | null;
+  currentStep: string | null;
+  completedSteps: string[];
+  delegatedTo: { agentId: string; issueId?: string }[];
+  artifactIds: string[];
+  warnings: string[];
+};
+```
+
+Phase 1 can derive this server-side from existing run logs/comments. Persist only if needed later.
+
+### Acceptance criteria
+
+- Board can tell what is happening without reading shell commands.
+- Raw logs are still accessible, but not the default surface.
+- First task / first hire / first completion moments are visibly celebrated.
+
+### Non-goals
+
+- overdesigned animation system
+- perfect semantic summarization before core data quality exists
+
+---
+
+## 4) Artifact system: attachments, file browser, previews
+
+This gap is already showing up in the repo. Storage is present and attachment endpoints exist, but current issues show that attachments are still effectively image-centric and comment attachment rendering is incomplete. At the same time, the transcript wants plans, docs, files, and generated web pages surfaced cleanly. ([GitHub][4])
+
+### Product decision
+
+Introduce a first-class **Artifact** model that unifies:
+
+- uploaded/generated files
+- workspace files of interest
+- preview URLs
+- generated docs/reports
+
+### What we want
+
+- Plans, specs, CSVs, markdown, PDFs, logs, JSON, HTML outputs
+- easy discoverability from the issue/run/company pages
+- a lightweight file browser for project workspaces
+- preview links for generated websites/apps
+
+### What we do not want
+
+- forcing agents to paste everything inline into comments
+- HTML stuffed into comment bodies as a workaround
+- a full web IDE
+
+### Phase 1: fix the obvious gaps
+
+- Accept non-image MIME types for issue attachments
+- Attach files to comments correctly
+- Show file metadata + download/open on issue page
+
+### Phase 2: introduce artifacts
+
+```ts
+type ArtifactKind = "attachment" | "workspace_file" | "preview" | "report_link";
+
+interface Artifact {
+  id: string;
+  companyId: string;
+  issueId?: string;
+  runId?: string;
+  agentId?: string;
+  kind: ArtifactKind;
+  title: string;
+  mimeType?: string;
+  filename?: string;
+  sizeBytes?: number;
+  storageKind: "local_disk" | "s3" | "external_url";
+  contentPath?: string;
+  previewUrl?: string;
+  metadata: Record<string, unknown>;
+}
+```
+
+### UX
+
+Issue page gets a **Deliverables** section:
+
+- Files
+- Reports
+- Preview links
+- Latest generated artifact highlighted at top
+
+Project page gets a **Files** tab:
+
+- folder tree
+- recent changes
+- “Open produced files” shortcut
+
+### Preview handling
+
+For HTML/static outputs:
+
+- local deploy → open local preview URL
+- shared/public deploy → host via configured preview service or static storage
+- preview URL is registered back onto the issue as an artifact
+
+### Acceptance criteria
+
+- Agents can attach `.md`, `.txt`, `.json`, `.csv`, `.pdf`, and `.html`.
+- Users can open/download them from the issue page.
+- A generated static site can be opened from an issue without hunting through the filesystem.
+
+### Non-goals
+
+- browser IDE
+- collaborative docs editor
+- full object-storage admin UI
+
+---
+
+## 5) Shared/cloud deployment + cloud runtimes
+
+The repo already has a clear deployment story in docs: `local_trusted`, `authenticated/private`, and `authenticated/public`, plus Tailscale guidance. The roadmap explicitly calls out cloud agents like Cursor / e2b. That means the next step is not inventing a deployment model; it is making the shared/cloud path canonical and production-usable. ([GitHub][5])
+
+### Product decision
+
+Make **shared/private deploy** and **public/cloud deploy** first-class supported modes, and add **remote runtime drivers** for cloud-executed agents.
+
+### What we want
+
+- one instance a team can actually share
+- local-first path that upgrades to private/public without a mental model change
+- remote agent execution for non-local runtimes
+
+### Proposed architecture
+
+Separate **control plane** from **execution runtime** more explicitly:
+
+```ts
+type RuntimeDriver = "local_process" | "remote_sandbox" | "webhook";
+
+interface ExecutionHandle {
+  externalRunId: string;
+  status: "queued" | "running" | "completed" | "failed" | "cancelled";
+  previewUrl?: string;
+  logsUrl?: string;
+}
+```
+
+First remote driver: `remote_sandbox` for e2b-style execution.
+
+### Deliverables
+
+- canonical deploy recipes:
+
+  - local solo
+  - shared private (Tailscale/private auth)
+  - public cloud (managed Postgres + object storage + public URL)
+
+- runtime health page
+- adapter/runtime capability matrix
+- one official reference deployment path
+
+### UX
+
+New “Deployment” settings page:
+
+- instance mode
+- auth/exposure
+- storage/database status
+- runtime drivers configured
+- health and reachability checks
+
+### Acceptance criteria
+
+- Two humans can log into one authenticated/private instance and use it concurrently.
+- A public deployment can run agents via at least one remote runtime.
+- `doctor` catches missing public/private config and gives concrete fixes.
+
+### Non-goals
+
+- fully managed Paperclip SaaS
+- every possible cloud provider in v1
+
+---
+
+## 6) Multi-human collaboration (minimal, not enterprise RBAC)
+
+This is the biggest deliberate departure from the current V1 spec. Publicly, V1 still says “single human board operator” and puts role-based human granularity out of scope. But the transcript is right that shared use is necessary if Paperclip is going to be real for teams. The key is to do a **minimal collaboration model**, not a giant permission system. ([GitHub][2])
+
+### Product decision
+
+Ship **coarse multi-user company memberships**, not fine-grained enterprise RBAC.
+
+### Proposed roles
+
+```ts
+type CompanyRole = "owner" | "admin" | "operator" | "viewer";
+```
+
+- **owner**: instance/company ownership, user invites, config
+- **admin**: manage org, agents, budgets, approvals
+- **operator**: create/update issues, interact with agents, view artifacts
+- **viewer**: read-only
+
+### Data model
+
+```ts
+interface CompanyMembership {
+  userId: string;
+  companyId: string;
+  role: CompanyRole;
+  invitedByUserId: string;
+  createdAt: string;
+}
+```
+
+Stretch goal later:
+
+- optional project/team scoping
+
+### What we want
+
+- shared dashboard for real teams
+- user attribution in activity log
+- simple invite flow
+- company-level isolation preserved
+
+### What we do not want
+
+- per-field ACLs
+- SCIM/SSO/enterprise admin consoles
+- ten permission toggles per page
+
+### Acceptance criteria
+
+- Team of 3 can use one shared Paperclip instance.
+- Every user action is attributed correctly in activity.
+- Company membership boundaries are enforced.
+- Viewer cannot mutate; operator/admin can.
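+
+A sketch of how the coarse roles could map to a permission check. The `Action` names and the role-to-action table are assumptions read off the role descriptions above (with `owner` treated as a superset of `admin`); they are not an existing Paperclip API.
+
+```ts
+type CompanyRole = "owner" | "admin" | "operator" | "viewer";
+
+// Hypothetical coarse action set; deliberately small to avoid per-field ACLs.
+type Action = "read" | "mutate_issue" | "manage_org" | "invite_user";
+
+// Assumed mapping from role to allowed actions.
+const allowedActions: Record<CompanyRole, readonly Action[]> = {
+  owner: ["read", "mutate_issue", "manage_org", "invite_user"],
+  admin: ["read", "mutate_issue", "manage_org"],
+  operator: ["read", "mutate_issue"],
+  viewer: ["read"],
+};
+
+// True when the role is allowed to perform the action within its own company.
+function can(role: CompanyRole, action: Action): boolean {
+  return allowedActions[role].includes(action);
+}
+```
+
+A single lookup table keeps the model auditable and makes the "viewer cannot mutate" criterion a one-line test.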
+
+### Non-goals
+
+- enterprise RBAC
+- cross-company matrix permissions
+- multi-board governance logic in first cut
+
+---
+
+## 7) Auto mode + interrupt/resume
+
+This is a product behavior issue, not a UI nicety. If agents cannot keep working or accept course correction without restarting, the autonomy model feels fake.
+
+### Product decision
+
+Make auto mode and mid-run interruption first-class runtime semantics.
+
+### What we want
+
+- Auto mode that continues until blocked by approvals, budgets, or explicit pause.
+- Mid-run “you missed this” correction without losing session continuity.
+- Clear state when an agent is waiting, blocked, or paused.
+
+### Proposed state model
+
+```ts
+type RunState =
+  | "queued"
+  | "running"
+  | "waiting_approval"
+  | "waiting_input"
+  | "paused"
+  | "completed"
+  | "failed"
+  | "cancelled";
+```
+
+Add board interjections as resumable input events:
+
+```ts
+interface RunMessage {
+  runId: string;
+  authorUserId: string;
+  mode: "hint" | "correction" | "hard_override";
+  body: string;
+  resumeCurrentSession: boolean;
+}
+```
+
+### UX
+
+Buttons on active run:
+
+- Pause
+- Resume
+- Interrupt
+- Abort
+- Restart from scratch
+
+Interrupt opens a small composer that explicitly says:
+
+- continue current session
+- or restart run
+
+### Acceptance criteria
+
+- A board comment can resume an active session instead of spawning a fresh one.
+- Session ID remains stable for “continue” path.
+- UI clearly distinguishes blocked vs. waiting vs. paused.
+
+### Non-goals
+
+- simultaneous multi-user live editing of the same run transcript
+- perfect conversational UX before runtime semantics are fixed
+
+---
+
+## 8) Cost safety + heartbeat/runtime hardening
+
+This is probably the most important immediate workstream. The transcript says token burn is the highest pain, and the repo currently has active issues around budget enforcement evidence, onboarding/auth validation, and circuit-breaker style waste prevention. Public docs already promise hard budgets, and the issue tracker is pointing at the missing operational protections. ([GitHub][6])
+
+### Product decision
+
+Treat this as a **P0 runtime contract**, not a nice-to-have.
+
+### Part A: deterministic wake gating
+
+Do cheap, explicit work detection before invoking an LLM.
+
+```ts
+type WakeReason =
+  | "new_assignment"
+  | "new_comment"
+  | "mention"
+  | "approval_resolved"
+  | "scheduled_scan"
+  | "manual";
+```
+
+Rules:
+
+- if no new actionable input exists, do not call the model
+- scheduled scan should be a cheap policy check first, not a full reasoning pass
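+
+A minimal sketch of the gate, assuming the cheap policy check can be expressed as a few counters. `WorkSnapshot`, `hasActionableInput`, and `shouldInvokeModel` are hypothetical names, and letting a `manual` wake always proceed is an assumption rather than a stated rule.
+
+```ts
+type WakeReason =
+  | "new_assignment" | "new_comment" | "mention"
+  | "approval_resolved" | "scheduled_scan" | "manual";
+
+// Hypothetical result of a cheap, non-LLM query against the control plane.
+interface WorkSnapshot {
+  unreadAssignments: number;
+  unreadComments: number;
+  unreadMentions: number;
+  resolvedApprovals: number;
+}
+
+function hasActionableInput(s: WorkSnapshot): boolean {
+  return (
+    s.unreadAssignments + s.unreadComments + s.unreadMentions + s.resolvedApprovals > 0
+  );
+}
+
+// Returns true only when calling the model is justified.
+function shouldInvokeModel(reason: WakeReason, snapshot: WorkSnapshot): boolean {
+  // Assumption: a manual wake is an explicit operator request and always proceeds.
+  if (reason === "manual") return true;
+  // Every other reason, including scheduled scans, needs real new input.
+  return hasActionableInput(snapshot);
+}
+```
+
+Because the gate is a pure function of the snapshot, the "idle org generates no model calls" criterion can be covered by a plain unit test.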
+
+### Part B: budget contract
+
+Keep the existing public promise, but make it undeniable:
+
+- warning at 80%
+- auto-pause at 100%
+- visible audit trail
+- explicit board override to continue
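+
+The thresholds above can be sketched as one pure function. This is an illustration under stated assumptions: `BudgetInput`, `BudgetDecision`, and `evaluateBudget` are hypothetical names, amounts are assumed to be integer cents, and treating an active board override as "keep warning but continue" is one possible reading of the override rule.
+
+```ts
+type BudgetDecision = "ok" | "warn" | "pause";
+
+interface BudgetInput {
+  spentCents: number;
+  limitCents: number;
+  boardOverrideActive: boolean;
+}
+
+const WARN_PERCENT = 80;
+
+function evaluateBudget(input: BudgetInput): BudgetDecision {
+  const { spentCents, limitCents, boardOverrideActive } = input;
+  // Assumption: a non-positive limit means no budget is configured.
+  if (limitCents <= 0) return "ok";
+  // At or above 100%: pause unless the board has explicitly overridden.
+  if (spentCents >= limitCents) return boardOverrideActive ? "warn" : "pause";
+  // Integer form of spent / limit >= 0.8, which avoids float rounding.
+  if (spentCents * 100 >= limitCents * WARN_PERCENT) return "warn";
+  return "ok";
+}
+```
+
+Each non-`ok` decision would also be written to the audit trail, which is left out of the sketch.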
+
+### Part C: circuit breaker
+
+Add per-agent runtime guards:
+
+```ts
+interface CircuitBreakerConfig {
+  enabled: boolean;
+  maxConsecutiveNoProgress: number;
+  maxConsecutiveFailures: number;
+  tokenVelocityMultiplier: number;
+}
+```
+
+Trip when:
+
+- no issue/status/comment progress for N runs
+- N failures in a row
+- token spike vs rolling average
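+
+A sketch of how the three trip conditions might be evaluated against the config. `AgentRunStats`, `TripReason`, and `evaluateBreaker` are hypothetical, and the counters are assumed to be maintained elsewhere (for example by the cost recorder and session manager).
+
+```ts
+interface CircuitBreakerConfig {
+  enabled: boolean;
+  maxConsecutiveNoProgress: number;
+  maxConsecutiveFailures: number;
+  tokenVelocityMultiplier: number;
+}
+
+// Hypothetical per-agent counters kept by the runtime.
+interface AgentRunStats {
+  consecutiveNoProgress: number;
+  consecutiveFailures: number;
+  lastRunTokens: number;
+  rollingAvgTokens: number;
+}
+
+type TripReason = "no_progress" | "failures" | "token_spike";
+
+// Returns the first matching trip reason, or null when the agent may keep running.
+function evaluateBreaker(
+  config: CircuitBreakerConfig,
+  stats: AgentRunStats,
+): TripReason | null {
+  if (!config.enabled) return null;
+  if (stats.consecutiveNoProgress >= config.maxConsecutiveNoProgress) return "no_progress";
+  if (stats.consecutiveFailures >= config.maxConsecutiveFailures) return "failures";
+  // Skip the spike check until a rolling average exists.
+  if (
+    stats.rollingAvgTokens > 0 &&
+    stats.lastRunTokens > stats.rollingAvgTokens * config.tokenVelocityMultiplier
+  ) {
+    return "token_spike";
+  }
+  return null;
+}
+```
+
+Returning a reason instead of a boolean gives the audit/activity entry something concrete to display when a breaker pause occurs.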
+
+### Part D: refactor heartbeat service
+
+Split current orchestration into modules:
+
+- wake detector
+- checkout/lock manager
+- adapter runner
+- session manager
+- cost recorder
+- breaker evaluator
+- event streamer
+
+### Part E: regression suite
+
+Mandatory automated proofs for:
+
+- onboarding/auth matrix
+- 80/100 budget behavior
+- no cross-company auth leakage
+- no-spurious-wake idle behavior
+- active-run resume/interruption
+- remote runtime smoke
+
+### Acceptance criteria
+
+- Idle org with no new work does not generate model calls from heartbeat scans.
+- 80% shows warning only.
+- 100% pauses the agent and blocks continued execution until override.
+- Circuit breaker pause is visible in audit/activity.
+- Runtime modules have explicit contracts and are testable independently.
+
+### Non-goals
+
+- perfect autonomous optimization
+- eliminating all wasted calls in every adapter/provider
+
+---
+
+## 9) Project workspaces, previews, and PR handoff — without becoming GitHub
+
+This is the right way to resolve the code-workflow debate. The repo already has worktree-local instances, project `workspaceStrategy.provisionCommand`, and an RFC for adapter-level git worktree isolation. That is the correct architectural direction: **project execution policies and workspace isolation**, not built-in PR review. ([GitHub][7])
+
+### Product decision
+
+Paperclip should manage the **issue → workspace → preview/PR → review handoff** lifecycle, but leave diffs/review/merge to external tools.
+
+### Proposed config
+
+Prefer repo-local project config:
+
+```yaml
+# .paperclip/project.yml
+execution:
+  workspaceStrategy: shared | worktree | ephemeral_container
+  deliveryMode: artifact | preview | pull_request
+  provisionCommand: "pnpm install"
+  teardownCommand: "pnpm clean"
+  preview:
+    command: "pnpm dev --port $PAPERCLIP_PREVIEW_PORT"
+    healthPath: "/"
+    ttlMinutes: 120
+  vcs:
+    provider: github
+    repo: owner/repo
+    prPerIssue: true
+    baseBranch: main
+```
+
+### Rules
+
+- For non-code projects: `deliveryMode=artifact`
+- For UI/app work: `deliveryMode=preview`
+- For git-backed engineering projects: `deliveryMode=pull_request`
+- For git-backed projects with `prPerIssue=true`, one issue maps to one isolated branch/worktree
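+
+These rules amount to a small lookup plus one isolation check. The sketch below assumes a hypothetical `ProjectKind` classification that the proposal does not define; `DEFAULT_DELIVERY_MODE` and `needsIsolatedWorkspace` are likewise illustrative names.
+
+```ts
+type DeliveryMode = "artifact" | "preview" | "pull_request";
+
+// Hypothetical classification; the proposal only describes these kinds in prose.
+type ProjectKind = "non_code" | "ui_app" | "git_engineering";
+
+const DEFAULT_DELIVERY_MODE: Record<ProjectKind, DeliveryMode> = {
+  non_code: "artifact",
+  ui_app: "preview",
+  git_engineering: "pull_request",
+};
+
+// Subset of the `vcs` block from the proposed project config.
+interface VcsConfig {
+  prPerIssue: boolean;
+}
+
+// A missing `vcs` block is treated as "not git-backed" (assumption).
+function needsIsolatedWorkspace(vcs?: VcsConfig): boolean {
+  return vcs?.prPerIssue === true;
+}
+```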
+
+### UX
+
+Issue page shows:
+
+- workspace link/status
+- preview URL if available
+- PR URL if created
+- “Reopen preview” button with TTL
+- lifecycle:
+
+  - `todo`
+  - `in_progress`
+  - `in_review`
+  - `done`
+
+### What we want
+
+- safe parallel agent work on one repo
+- previewable output
+- external PR review
+- project-defined hooks, not hardcoded assumptions
+
+### What we do not want
+
+- built-in diff viewer
+- merge queue
+- Jira clone
+- mandatory PRs for non-code work
+
+### Acceptance criteria
+
+- Multiple engineer agents can work concurrently without workspace contamination.
+- When a project is in PR mode, the issue contains branch/worktree/preview/PR metadata.
+- Preview can be reopened on demand until TTL expires.
+
+### Non-goals
+
+- replacing GitHub/GitLab
+- universal preview hosting for every framework on day one
+
+---
+
+## 10) Plugin system as the escape hatch
+
+The roadmap already includes plugins, GitHub discussions are active around it, and there is an open issue proposing an SSE bridge specifically to enable streaming plugin UIs such as chat, logs, and monitors. This is exactly the right place for optional surfaces. ([GitHub][1])
+
+### Product decision
+
+Keep the control-plane core thin; put optional high-variance experiences into plugins.
+
+### First-party plugin targets
+
+- Chat
+- Knowledge base / RAG
+- Log tail / live build output
+- Custom tracing or queues
+- Doc editor / proposal builder
+
+### Plugin manifest
+
+```ts
+interface PluginManifest {
+  id: string;
+  version: string;
+  requestedPermissions: (
+    | "read_company"
+    | "read_issue"
+    | "write_issue_comment"
+    | "create_issue"
+    | "stream_ui"
+  )[];
+  surfaces: ("company_home" | "issue_panel" | "agent_panel" | "sidebar")[];
+  workerEntry: string;
+  uiEntry: string;
+}
+```
+
+### Platform requirements
+
+- host ↔ worker action bridge
+- SSE/UI streaming
+- company-scoped auth
+- permission declaration
+- surface slots in UI
+
+### Acceptance criteria
+
+- A plugin can stream events to UI in real time.
+- A chat plugin can converse without requiring chat to become the core Paperclip product.
+- Plugin permissions are company-scoped and auditable.
+
+### Non-goals
+
+- plugins mutating core schema directly
+- arbitrary privileged code execution without explicit permissions
+
+---
+
+## Priority order I would use
+
+Given the repo state and the transcript, I would sequence it like this:
+
+**P0**
+
+1. Cost safety + heartbeat hardening
+2. Guided onboarding + first-job magic
+3. Shared/cloud deployment foundation
+4. Artifact phase 1: non-image attachments + deliverables surfacing
+
+**P1**
+
+5. Board command surface
+6. Visibility/explainability layer
+7. Auto mode + interrupt/resume
+8. Minimal multi-user collaboration
+
+**P2**
+
+9. Project workspace / preview / PR lifecycle
+10. Plugin system + optional chat plugin
+11. Template/preset expansion for startup vs agency vs internal-team onboarding
+
+Why this order: the current repo is already getting pressure on onboarding failures, auth/onboarding validation, budget enforcement, and wasted token burn. If those are shaky, everything else feels impressive but unsafe. ([GitHub][3])
+
+## Bottom line
+
+The best synthesis is:
+
+- **Keep** Paperclip as the board-level control plane.
+- **Do not** make chat, code review, or workflow-building the core identity.
+- **Do** make the product feel conversational, visible, output-oriented, and shared.
+- **Do** make coding workflows an integration surface via workspaces/previews/PR links.
+- **Use plugins** for richer edges like chat and knowledge.
+
+That keeps the repo’s current product direction intact while solving almost every pain surfaced in the transcript.
+
+### Key references
+
+- README / positioning / roadmap / product boundaries. ([GitHub][1])
+- Product definition. ([GitHub][8])
+- V1 implementation spec and explicit non-goals. ([GitHub][2])
+- Core concepts and architecture. ([GitHub][9])
+- Deployment modes / Tailscale / local-to-cloud path. ([GitHub][5])
+- Developing guide: worktree-local instances, provision hooks, onboarding endpoints. ([GitHub][7])
+- Current issue pressure: onboarding failure, auth/onboarding validation, budget enforcement, circuit breaker, attachment gaps, plugin chat. ([GitHub][3])
+
+[1]: https://github.com/paperclipai/paperclip
+[2]: https://github.com/paperclipai/paperclip/blob/master/doc/SPEC-implementation.md
+[3]: https://github.com/paperclipai/paperclip/issues/704
+[4]: https://github.com/paperclipai/paperclip/blob/master/docs/deploy/tailscale-private-access.md
+[5]: https://github.com/paperclipai/paperclip/blob/master/docs/deploy/deployment-modes.md
+[6]: https://github.com/paperclipai/paperclip/issues/692
+[7]: https://github.com/paperclipai/paperclip/blob/master/doc/DEVELOPING.md
+[8]: https://github.com/paperclipai/paperclip/blob/master/doc/PRODUCT.md
+[9]: https://github.com/paperclipai/paperclip/blob/master/docs/start/core-concepts.md
--- a/doc/plans/2026-03-13-paperclip-skill-tightening-plan.md
+++ b/doc/plans/2026-03-13-paperclip-skill-tightening-plan.md
@@ -0,0 +1,186 @@
+# Paperclip Skill Tightening Plan
+
+## Status
+
+Deferred follow-up. Do not include in the current token-optimization PR beyond documenting the plan.
+
+## Why This Is Deferred
+
+The `paperclip` skill is part of the critical control-plane safety surface. Tightening it may reduce fresh-session token use, but it also carries prompt-regression risk. We do not yet have evals that would let us safely prove behavior preservation across assignment handling, checkout rules, comment etiquette, approval workflows, and escalation paths.
+
+The current PR should ship the lower-risk infrastructure wins first:
+
+- telemetry normalization
+- safe session reuse
+- incremental issue/comment context
+- bootstrap versus heartbeat prompt separation
+- Codex worktree isolation
+
+## Current Problem
+
+Fresh runs still spend substantial input tokens even after the context-path fixes. The remaining large startup cost appears to come from loading the full `paperclip` skill and related instruction surface into context at run start.
+
+The skill currently mixes three kinds of content in one file:
+
+- hot-path heartbeat procedure used on nearly every run
+- critical policy and safety invariants
+- rare workflow/reference material that most runs do not need
+
+That structure is safe but expensive.
+
+## Goals
+
+- reduce first-run instruction tokens without weakening agent safety
+- preserve all current Paperclip control-plane capabilities
+- keep common heartbeat behavior explicit and easy for agents to follow
+- move rare workflows and reference material out of the hot path
+- create a structure that can later be evaluated systematically
+
+## Non-Goals
+
+- changing Paperclip API semantics
+- removing required governance rules
+- deleting rare workflows
+- changing agent defaults in the current PR
+
+## Recommended Direction
+
+### 1. Split Hot Path From Lookup Material
+
+Restructure the skill into:
+
+- an always-loaded core section for the common heartbeat loop
+- on-demand material for infrequent workflows and deep reference
+
+The core should cover only what is needed on nearly every wake:
+
+- auth and required headers
+- inbox-first assignment retrieval
+- mandatory checkout behavior
+- `heartbeat-context` first
+- incremental comment retrieval rules
+- mention/self-assign exception
+- blocked-task dedup
+- status/comment/release expectations before exit
+
+### 2. Normalize The Skill Around One Canonical Procedure
+
+The same rules are currently expressed multiple times across:
+
+- heartbeat steps
+- critical rules
+- endpoint reference
+- workflow examples
+
+Refactor so each operational fact has one primary home:
+
+- procedure
+- invariant list
+- appendix/reference
+
+This reduces prompt weight and lowers the chance of internal instruction drift.
+
+### 3. Compress Prose Into High-Signal Instruction Forms
+
+Rewrite the hot path using compact operational forms:
+
+- short ordered checklist
+- flat invariant list
+- minimal examples only where ambiguity would be risky
+
+Reduce:
+
+- narrative explanation
+- repeated warnings already covered elsewhere
+- large example payloads for common operations
+- long endpoint matrices in the main body
+
+### 4. Move Rare Workflows Behind Explicit Triggers
+
+These workflows should remain available but should not dominate fresh-run context:
+
+- OpenClaw invite flow
+- project setup flow
+- planning `<plan/>` writeback flow
+- instructions-path update flow
+- detailed link-formatting examples
+
+Recommended approach:
+
+- keep a short pointer in the main skill
+- move detailed procedures into sibling skills or referenced docs that agents read only when needed
+
+### 5. Separate Policy From Reference
+
+The skill should distinguish:
+
+- mandatory operating rules
+- endpoint lookup/reference
+- business-process playbooks
+
+That separation makes it easier to evaluate prompt changes later and lets adapters or orchestration choose what must always be loaded.
+
+## Proposed Target Structure
+
+1. Purpose and authentication
+2. Compact heartbeat procedure
+3. Hard invariants
+4. Required comment/update style
+5. Triggered workflow index
+6. Appendix/reference
+
+## Rollout Plan
+
+### Phase 1. Inventory And Measure
+
+- annotate the current skill by section and estimate token weight
+- identify which sections are truly hot-path versus rare
+- capture representative runs to compare before/after prompt size and behavior
+
+### Phase 2. Structural Refactor Without Semantic Changes
+
+- rewrite the main skill into the target structure
+- preserve all existing rules and capabilities
+- move rare workflow details into referenced companion material
+- keep wording changes conservative
+
+### Phase 3. Validate Against Real Scenarios
+
+Run scenario checks for:
+
+- normal assigned heartbeat
+- comment-triggered wake
+- blocked-task dedup behavior
+- approval-resolution wake
+- delegation/subtask creation
+- board handoff back to user
+- plan-request handling
+
+### Phase 4. Decide Default Loading Strategy
+
+After validation, decide whether:
+
+- the entire main skill still loads by default, or
+- only the compact core loads by default and rare sections are fetched on demand
+
+Do not change this loading policy without validation.
+
+## Risks
+
+- prompt degradation on control-plane safety rules
+- agents forgetting rare but important workflows
+- accidental removal of repeated wording that was carrying useful behavior
+- introducing ambiguous instruction precedence between the core skill and companion materials
+
+## Preconditions Before Implementation
+
+- define acceptance scenarios for control-plane correctness
+- add at least lightweight eval or scripted scenario coverage for key Paperclip flows
+- confirm how adapter/bootstrap layering should load skill content versus references
+
+## Success Criteria
+
+- materially lower first-run input tokens for Paperclip-coordinated agents
+- no regression in checkout discipline, issue updates, blocked handling, or delegation
+- no increase in malformed API usage or ownership mistakes
+- agents still complete rare workflows correctly when explicitly asked
--- a/doc/plans/2026-03-13-plugin-kitchen-sink-example.md
+++ b/doc/plans/2026-03-13-plugin-kitchen-sink-example.md
@@ -0,0 +1,699 @@
+# Kitchen Sink Plugin Plan
+
+## Goal
+
+Add a new first-party example plugin, `Kitchen Sink (Example)`, that demonstrates every currently implemented Paperclip plugin API surface in one place.
+
+This plugin is meant to be:
+
+- a living reference implementation for contributors
+- a manual test harness for the plugin runtime
+- a discoverable demo of what plugins can actually do today
+
+It is not meant to be a polished end-user product plugin.
+
+## Why
+
+The current plugin system has a real API surface, but it is spread across:
+
+- SDK docs
+- SDK types
+- plugin spec prose
+- two example plugins that each show only a narrow slice
+
+That makes it hard to answer basic questions like:
+
+- what can plugins render?
+- what can plugin workers actually do?
+- which surfaces are real versus aspirational?
+- how should a new plugin be structured in this repo?
+
+The kitchen-sink plugin should answer those questions by example.
+
+## Success Criteria
+
+The plugin is successful if a contributor can install it and, without reading the SDK first, discover and exercise the current plugin runtime surface area from inside Paperclip.
+
+Concretely:
+
+- it installs from the bundled examples list
+- it exposes at least one demo for every implemented worker API surface
+- it exposes at least one demo for every host-mounted UI surface
+- it clearly labels local-only / trusted-only demos
+- it is safe enough for local development by default
+- it doubles as a regression harness for plugin runtime changes
+
+## Constraints
+
+- Keep it instance-installed, not company-installed.
+- Treat this as a trusted/local example plugin.
+- Do not rely on cloud-safe runtime assumptions.
+- Avoid destructive defaults.
+- Avoid irreversible mutations unless they are clearly labeled and easy to undo.
+
+## Source Of Truth For This Plan
+
+This plan is based on the currently implemented SDK/types/runtime, not only the long-horizon spec.
+
+Primary references:
+
+- `packages/plugins/sdk/README.md`
+- `packages/plugins/sdk/src/types.ts`
+- `packages/plugins/sdk/src/ui/types.ts`
+- `packages/shared/src/constants.ts`
+- `packages/shared/src/types/plugin.ts`
+
+## Current Surface Inventory
+
+### Worker/runtime APIs to demonstrate
+
+These are the concrete `ctx` clients currently exposed by the SDK:
+
+- `ctx.config`
+- `ctx.events`
+- `ctx.jobs`
+- `ctx.launchers`
+- `ctx.http`
+- `ctx.secrets`
+- `ctx.assets`
+- `ctx.activity`
+- `ctx.state`
+- `ctx.entities`
+- `ctx.projects`
+- `ctx.companies`
+- `ctx.issues`
+- `ctx.agents`
+- `ctx.goals`
+- `ctx.data`
+- `ctx.actions`
+- `ctx.streams`
+- `ctx.tools`
+- `ctx.metrics`
+- `ctx.logger`
+
+### UI surfaces to demonstrate
+
+Surfaces defined in the SDK:
+
+- `page`
+- `settingsPage`
+- `dashboardWidget`
+- `sidebar`
+- `sidebarPanel`
+- `detailTab`
+- `taskDetailView`
+- `projectSidebarItem`
+- `toolbarButton`
+- `contextMenuItem`
+- `commentAnnotation`
+- `commentContextMenuItem`
+
+### Current host confidence
+
+Confirmed or strongly indicated as mounted in the current app:
+
+- `page`
+- `settingsPage`
+- `dashboardWidget`
+- `detailTab`
+- `projectSidebarItem`
+- comment surfaces
+- launcher infrastructure
+
+Need explicit validation before claiming full demo coverage:
+
+- `sidebar`
+- `sidebarPanel`
+- `taskDetailView`
+- `toolbarButton` as direct slot, distinct from launcher placement
+- `contextMenuItem` as direct slot, distinct from comment menu and launcher placement
+
+The implementation should keep a small validation checklist for these before we call the plugin "complete".
+
+## Plugin Concept
+
+The plugin should be named:
+
+- display name: `Kitchen Sink (Example)`
+- package: `@paperclipai/plugin-kitchen-sink-example`
+- plugin id: `paperclip.kitchen-sink-example` or `paperclip-kitchen-sink-example`
+
+Recommendation: use `paperclip-kitchen-sink-example` to match current in-repo example naming style.
+
+Category mix:
+
+- `ui`
+- `automation`
+- `workspace`
+- `connector`
+
+That is intentionally broad because the point is coverage.
+
+## UX Shape
+
+The plugin should have one main full-page demo console plus smaller satellites on other surfaces.
+
+### 1. Plugin page
+
+Primary route: the plugin `page` surface should be the central dashboard for all demos.
+
+Recommended page sections:
+
+- `Overview`
+  - what this plugin demonstrates
+  - current capabilities granted
+  - current host context
+- `UI Surfaces`
+  - links explaining where each other surface should appear
+- `Data + Actions`
+  - buttons and forms for bridge-driven worker demos
+- `Events + Streams`
+  - emit event
+  - watch event log
+  - stream demo output
+- `Paperclip Domain APIs`
+  - companies
+  - projects/workspaces
+  - issues
+  - goals
+  - agents
+- `Local Workspace + Process`
+  - file listing
+  - file read/write scratch area
+  - child process demo
+- `Jobs + Webhooks + Tools`
+  - job status
+  - webhook URL and recent deliveries
+  - declared tools
+- `State + Entities + Assets`
+  - scoped state editor
+  - plugin entity inspector
+  - upload/generated asset demo
+- `Observability`
+  - metrics written
+  - activity log samples
+  - latest worker logs
+
+### 2. Dashboard widget
+
+A compact widget on the main dashboard should show:
+
+- plugin health
+- count of demos exercised
+- recent event/stream activity
+- shortcut to the full plugin page
+
+### 3. Project sidebar item
+
+Add a `Kitchen Sink` link under each project that deep-links into a project-scoped plugin tab.
+
+### 4. Detail tabs
+
+Use detail tabs to demonstrate entity-context rendering on:
+
+- `project`
+- `issue`
+- `agent`
+- `goal`
+
+Each tab should show:
+
+- the host context it received
+- the relevant entity fetch via worker bridge
+- one small action scoped to that entity
+
+### 5. Comment surfaces
+
+Use issue comment demos to prove comment-specific extension points:
+
+- `commentAnnotation`
+  - render parsed metadata below each comment
+  - show comment id, issue id, and a small derived status
+- `commentContextMenuItem`
+  - add a menu action like `Copy Context To Kitchen Sink`
+  - action writes a plugin entity or state record for later inspection
+
+### 6. Settings page
+
+Custom `settingsPage` should be intentionally simple and operational:
+
+- `About`
+- `Danger / Trust Model`
+- demo toggles
+- local process defaults
+- workspace scratch-path behavior
+- secret reference inputs
+- event/job/webhook sample config
+
+This plugin should also keep the generic plugin settings `Status` tab useful by writing health, logs, and metrics.
+
+## Feature Matrix
+
+Each implemented worker API should have a visible demo.
+
+### `ctx.config`
+
+Demo:
+
+- read live config
+- show config JSON
+- react to config changes without restart where possible
+
+### `ctx.events`
+
+Demos:
+
+- emit a plugin event
+- subscribe to plugin events
+- subscribe to a core Paperclip event such as `issue.created`
+- show recent received events in a timeline
+
+### `ctx.jobs`
+
+Demos:
+
+- one scheduled heartbeat-style demo job
+- one manual run button from the UI if host supports manual job trigger
+- show last run result and timestamps
+
+### `ctx.launchers`
+
+Demos:
+
+- declare launchers in manifest
+- optionally register one runtime launcher from the worker
+- show launcher metadata on the plugin page
+
+### `ctx.http`
+
+Demo:
+
+- make a simple outbound GET request to a safe endpoint
+- show status code, latency, and JSON result
+
+Recommendation: default to a Paperclip-local endpoint or a stable public echo endpoint so the demo does not become flaky.
+
+### `ctx.secrets`
+
+Demo:
+
+- operator enters a secret reference in config
+- plugin resolves it on demand
+- UI only shows masked result length / success status, never raw secret
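+
+A sketch of the masking step only. It deliberately does not call `ctx.secrets`, because the exact resolve signature is not shown in this plan; `MaskedSecretResult` and `describeSecret` are hypothetical helpers that take whatever value the worker resolved.
+
+```ts
+interface MaskedSecretResult {
+  ok: boolean;
+  length: number | null;
+  preview: string;
+}
+
+// Build a UI-safe summary of a resolved secret without echoing its value.
+function describeSecret(value: string | null | undefined): MaskedSecretResult {
+  if (value == null || value.length === 0) {
+    return { ok: false, length: null, preview: "(unresolved)" };
+  }
+  // Fixed-width mask: the preview never contains characters from the secret.
+  return { ok: true, length: value.length, preview: "********" };
+}
+```
+
+Only the returned object should cross the worker-to-UI bridge, so the raw value never leaves the worker.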
+
+### `ctx.assets`
+
+Demos:
+
+- generate a text asset from the UI
+- optionally upload a tiny JSON blob or screenshot-like text file
+- show returned asset URL
+
+### `ctx.activity`
+
+Demo:
+
+- button to write a plugin activity log entry against current company/entity
+
+### `ctx.state`
+
+Demos:
+
+- instance-scoped state
+- company-scoped state
+- project-scoped state
+- issue-scoped state
+- delete/reset controls
+
+Use a small state inspector/editor on the plugin page.
+
+### `ctx.entities`
+
+Demos:
+
+- create plugin-owned sample records
+- list/filter them
+- show one realistic use case such as "copied comments" or "demo sync records"
+
+### `ctx.projects`
+
+Demos:
+
+- list projects
+- list project workspaces
+- resolve primary workspace
+- resolve workspace for issue
+
+### `ctx.companies`
+
+Demo:
+
+- list companies and show current selected company
+
+### `ctx.issues`
+
+Demos:
+
+- list issues in current company
+- create issue
+- update issue status/title
+- list comments
+- create comment
+
+### `ctx.agents`
+
+Demos:
+
+- list agents
+- invoke one agent with a test prompt
+- pause/resume where safe
+
+Agent mutation controls should be behind an explicit warning.
+
+### `ctx.agents.sessions`
+
+Demos:
+
+- create agent chat session
+- send message
+- stream events back to the UI
+- close session
+
+This is a strong candidate for the best "wow" demo on the plugin page.
+
+### `ctx.goals`
+
+Demos:
+
+- list goals
+- create goal
+- update status/title
+
+### `ctx.data`
+
+Use throughout the plugin for all read-side bridge demos.
+
+### `ctx.actions`
+
+Use throughout the plugin for all mutation-side bridge demos.
+
+### `ctx.streams`
+
+Demos:
+
+- live event log stream
+- token-style stream from an agent session relay
+- fake progress stream for a long-running action
+
+### `ctx.tools`
+
+Demos:
+
+- declare 2-3 simple agent tools
+- tool 1: echo/diagnostics
+- tool 2: project/workspace summary
+- tool 3: create issue or write plugin state
+
+The plugin page should list declared tools and show example input payloads.
+
+### `ctx.metrics`
+
+Demo:
+
+- write a sample metric on each major demo action
+- surface a small recent metrics table in the plugin page
+
+### `ctx.logger`
+
+Demo:
+
+- every action logs structured entries
+- plugin settings `Status` page then doubles as the log viewer
+
+## Local Workspace And Process Demos
+
+The plugin SDK intentionally leaves file/process operations to the plugin itself once it has workspace metadata.
+
+The kitchen-sink plugin should demonstrate that explicitly.
+
+### Workspace demos
+
+- list files from a selected workspace
+- read a file
+- write to a plugin-owned scratch file
+- optionally search files with `rg` if available
+
+### Process demos
+
+- run a short-lived command like `pwd`, `ls`, or `git status`
+- stream stdout/stderr back to UI
+- show exit code and timing
+
+Important safeguards:
+
+- default commands must be read-only
+- no shell interpolation from arbitrary free-form input in v1
+- provide a curated command list or a strongly validated command form
+- clearly label this area as local-only and trusted-only
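+
+One way to satisfy the curated-list and no-shell safeguards is to map fixed command ids to fixed argument vectors and run them with Node's `execFile`, which does not spawn a shell by default. The sketch below is an assumption about structure, not the plugin's actual code: `CURATED_COMMANDS`, `lookupCuratedCommand`, and `runCuratedCommand` are hypothetical names, and the 5-second timeout is an arbitrary illustrative value.
+
+```ts
+import { execFile } from "node:child_process";
+
+interface CuratedCommand {
+  file: string;
+  args: readonly string[];
+}
+
+// Hypothetical curated list: fixed binaries with fixed, read-only arguments.
+const CURATED_COMMANDS = new Map<string, CuratedCommand>([
+  ["pwd", { file: "pwd", args: [] }],
+  ["ls", { file: "ls", args: ["-la"] }],
+  ["git-status", { file: "git", args: ["status", "--short"] }],
+]);
+
+function lookupCuratedCommand(id: string): CuratedCommand | undefined {
+  return CURATED_COMMANDS.get(id);
+}
+
+interface ProcessResult {
+  exitCode: number | null;
+  stdout: string;
+  stderr: string;
+  durationMs: number;
+}
+
+// Runs a curated command by id; unknown ids are rejected before any process starts.
+function runCuratedCommand(id: string, cwd: string): Promise<ProcessResult> {
+  const cmd = lookupCuratedCommand(id);
+  if (!cmd) return Promise.reject(new Error(`command not allowed: ${id}`));
+  const startedAt = Date.now();
+  return new Promise((resolve) => {
+    // No shell is involved, so user input is never interpolated into a command line.
+    execFile(
+      cmd.file,
+      [...cmd.args],
+      { cwd, timeout: 5_000, encoding: "utf8" },
+      (error, stdout, stderr) => {
+        // A numeric error code is the exit status; anything else (spawn failure, timeout) maps to null.
+        const exitCode = error ? (typeof error.code === "number" ? error.code : null) : 0;
+        resolve({ exitCode, stdout, stderr, durationMs: Date.now() - startedAt });
+      },
+    );
+  });
+}
+```
+
+The UI would send only the command id, for example `runCuratedCommand("git-status", workspacePath)`, so free-form text never reaches the process layer.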
+
+## Proposed Manifest Coverage
+
+The plugin should aim to declare:
+
+- `page`
+- `settingsPage`
+- `dashboardWidget`
+- `detailTab` for `project`, `issue`, `agent`, `goal`
+- `projectSidebarItem`
+- `commentAnnotation`
+- `commentContextMenuItem`
+
+Then, after host validation, add if supported:
+
+- `sidebar`
+- `sidebarPanel`
+- `taskDetailView`
+- `toolbarButton`
+- `contextMenuItem`
+
+It should also declare one or more `ui.launchers` entries to exercise launcher behavior independently of slot rendering.
+
+## Proposed Package Layout
+
+New package:
+
+- `packages/plugins/examples/plugin-kitchen-sink-example/`
+
+Expected files:
+
+- `package.json`
+- `README.md`
+- `tsconfig.json`
+- `src/index.ts`
+- `src/manifest.ts`
+- `src/worker.ts`
+- `src/ui/index.tsx`
+- `src/ui/components/...`
+- `src/ui/hooks/...`
+- `src/lib/...`
+- optional `scripts/build-ui.mjs` if UI bundling needs esbuild
+
+## Proposed Internal Architecture
+
+### Worker modules
+
+Recommended split:
+
+- `src/worker.ts`
+  - plugin definition and wiring
+- `src/worker/data.ts`
+  - `ctx.data.register(...)`
+- `src/worker/actions.ts`
+  - `ctx.actions.register(...)`
+- `src/worker/events.ts`
+  - event subscriptions and event log buffer
+- `src/worker/jobs.ts`
+  - scheduled job handlers
+- `src/worker/tools.ts`
+  - tool declarations and handlers
+- `src/worker/local-runtime.ts`
+  - file/process demos
+- `src/worker/demo-store.ts`
+  - helpers for state/entities/assets/metrics
+
+### UI modules
+
+Recommended split:
+
+- `src/ui/index.tsx`
+  - exported slot components
+- `src/ui/page/KitchenSinkPage.tsx`
+- `src/ui/settings/KitchenSinkSettingsPage.tsx`
+- `src/ui/widgets/KitchenSinkDashboardWidget.tsx`
+- `src/ui/tabs/ProjectKitchenSinkTab.tsx`
+- `src/ui/tabs/IssueKitchenSinkTab.tsx`
+- `src/ui/tabs/AgentKitchenSinkTab.tsx`
+- `src/ui/tabs/GoalKitchenSinkTab.tsx`
+- `src/ui/comments/KitchenSinkCommentAnnotation.tsx`
+- `src/ui/comments/KitchenSinkCommentMenuItem.tsx`
+- `src/ui/shared/...`
+
+## Configuration Schema
+
+The plugin should have a substantial but understandable `instanceConfigSchema`.
+
+Recommended config fields:
+
+- `enableDangerousDemos`
+- `enableWorkspaceDemos`
+- `enableProcessDemos`
+- `showSidebarEntry`
+- `showSidebarPanel`
+- `showProjectSidebarItem`
+- `showCommentAnnotation`
+- `showCommentContextMenuItem`
+- `showToolbarLauncher`
+- `defaultDemoCompanyId` optional
+- `secretRefExample`
+- `httpDemoUrl`
+- `processAllowedCommands`
+- `workspaceScratchSubdir`
+
+Defaults should keep risky behavior off.
+
+## Safety Defaults
+
+Default posture:
+
+- UI and read-only demos on
+- mutating domain demos on but explicitly labeled
+- process demos off by default
+- no arbitrary shell input by default
+- no raw secret rendering ever
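+
+The safe-by-default posture could be encoded as a defaults object merged under operator config. This sketch covers only a subset of the recommended fields; the `KitchenSinkRiskConfig` shape, `resolveRiskConfig`, and the rule that process demos require both flags plus a non-empty allowlist are assumptions for illustration.
+
+```ts
+// Subset of the recommended config fields that gate risky behavior.
+interface KitchenSinkRiskConfig {
+  enableDangerousDemos: boolean;
+  enableProcessDemos: boolean;
+  processAllowedCommands: string[];
+}
+
+// Risky behavior stays off unless an operator turns it on.
+const DEFAULT_RISK_CONFIG: KitchenSinkRiskConfig = {
+  enableDangerousDemos: false,
+  enableProcessDemos: false,
+  processAllowedCommands: [],
+};
+
+function resolveRiskConfig(
+  overrides: Partial<KitchenSinkRiskConfig>,
+): KitchenSinkRiskConfig {
+  return { ...DEFAULT_RISK_CONFIG, ...overrides };
+}
+
+// Assumed gate: both flags on and at least one curated command configured.
+function processDemosAllowed(config: KitchenSinkRiskConfig): boolean {
+  return (
+    config.enableDangerousDemos &&
+    config.enableProcessDemos &&
+    config.processAllowedCommands.length > 0
+  );
+}
+```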
+
+## Phased Build Plan
+
+### Phase 1: Core plugin skeleton
+
+- scaffold package
+- add manifest, worker, UI entrypoints
+- add README
+- make it appear in bundled examples list
+
+### Phase 2: Core, confirmed UI surfaces
+
+- plugin page
+- settings page
+- dashboard widget
+- project sidebar item
+- detail tabs
+
+### Phase 3: Core worker APIs
+
+- config
+- state
+- entities
+- companies/projects/issues/goals
+- data/actions
+- metrics/logger/activity
+
+### Phase 4: Real-time and automation APIs
+
+- streams
+- events
+- jobs
+- webhooks
+- agent sessions
+- tools
+
+### Phase 5: Local trusted runtime demos
+
+- workspace file demos
+- child process demos
+- guarded by config
+
+### Phase 6: Secondary UI surfaces
+
+- comment annotation
+- comment context menu item
+- launchers
+
+### Phase 7: Validation-only surfaces
+
+Validate whether the current host truly mounts:
+
+- `sidebar`
+- `sidebarPanel`
+- `taskDetailView`
+- direct-slot `toolbarButton`
+- direct-slot `contextMenuItem`
+
+If mounted, add demos.
+If not mounted, document them as SDK-defined but host-pending.
+
+## Documentation Deliverables
+
+The plugin should ship with a README that includes:
+
+- what it demonstrates
+- which surfaces are local-only
+- how to install it
+- where each UI surface should appear
+- a mapping from demo card to SDK API
+
+It should also be referenced from plugin docs as the "reference everything plugin".
+
+## Testing And Verification
+
+Minimum verification:
+
+- package typecheck/build
+- install from bundled example list
+- page loads
+- widget appears
+- project tab appears
+- comment surfaces render
+- settings page loads
+- key actions succeed
+
+Recommended manual checklist:
+
+- create issue from plugin
+- create goal from plugin
+- emit and receive plugin event
+- stream action output
+- open agent session and receive streamed reply
+- upload an asset
+- write plugin activity log
+- run a safe local process demo
+
+## Open Questions
+
+1. Should the process demo remain curated-command-only in the first pass?
+   Recommendation: yes.
+
+2. Should the plugin create throwaway "kitchen sink demo" issues/goals automatically?
+   Recommendation: no. Make creation explicit.
+
+3. Should we expose unsupported-but-typed surfaces in the UI even if host mounting is not wired?
+   Recommendation: yes, but label them as `SDK-defined / host validation pending`.
+
+4. Should agent mutation demos include pause/resume by default?
+   Recommendation: probably yes, but behind a warning block.
+
+5. Should this plugin be treated as a supported regression harness in CI later?
+   Recommendation: yes. Long term, this should be the plugin-runtime smoke test package.
+
+## Recommended Next Step
+
+If this plan looks right, the next implementation pass should start by building only:
+
+- package skeleton
+- page
+- settings page
+- dashboard widget
+- one project detail tab
+- one issue detail tab
+- the basic worker/action/data/state/event scaffolding
+
+That is enough to lock the architecture before filling in every demo surface.
--- a/doc/plans/2026-03-13-workspace-product-model-and-work-product.md
+++ b/doc/plans/2026-03-13-workspace-product-model-and-work-product.md
--- a/doc/plans/2026-03-14-billing-ledger-and-reporting.md
+++ b/doc/plans/2026-03-14-billing-ledger-and-reporting.md
@@ -0,0 +1,468 @@
+# Billing Ledger and Reporting
+
+## Context
+
+Paperclip currently stores model spend in `cost_events` and operational run state in `heartbeat_runs`.
+That split is fine, but the current reporting code tries to infer billing semantics by mixing both tables:
+
+- `cost_events` knows provider, model, tokens, and dollars
+- `heartbeat_runs.usage_json` knows some per-run billing metadata
+- `heartbeat_runs.usage_json` does **not** currently carry enough normalized billing dimensions to support honest provider-level reporting
+
+This becomes incorrect as soon as a company uses more than one provider, more than one billing channel, or more than one billing mode.
+
+Examples:
+
+- direct OpenAI API usage
+- Claude subscription usage with zero marginal dollars
+- subscription overage with dollars and tokens
+- OpenRouter billing where the biller is OpenRouter but the upstream provider is Anthropic or OpenAI
+
+The system needs to support:
+
+- dollar reporting
+- token reporting
+- subscription-included usage
+- subscription overage
+- direct metered API usage
+- future aggregator billing such as OpenRouter
+
+## Product Decision
+
+`cost_events` becomes the canonical billing and usage ledger for reporting.
+
+`heartbeat_runs` remains an operational execution log. It may keep mirrored billing metadata for debugging and transcripts, but reporting must not reconstruct billing semantics from `heartbeat_runs.usage_json`.
+
+## Decision: One Ledger Or Two
+
+We do **not** need two tables to solve the current PR's problem.
+For request-level inference reporting, `cost_events` is enough if it carries the right dimensions:
+
+- upstream provider
+- biller
+- billing type
+- model
+- token fields
+- billed amount
+
+That is why the first implementation pass extends `cost_events` instead of introducing a second table immediately.
+
+However, if Paperclip needs to account for the full billing surface of aggregators and managed AI platforms, then `cost_events` alone is not enough.
+Some charges are not cleanly representable as a single model inference event:
+
+- account top-ups and credit purchases
+- platform fees charged at purchase time
+- BYOK platform fees that are account-level or threshold-based
+- prepaid credit expirations, refunds, and adjustments
+- provisioned throughput commitments
+- fine-tuning, training, model import, and storage charges
+- gateway logging or other platform overhead that is not attributable to one prompt/response pair
+
+So the decision is:
+
+- near term: keep `cost_events` as the inference and usage ledger
+- next phase: add `finance_events` for non-inference financial events
+
+This is a deliberate split between:
+
+- usage and inference accounting
+- account-level and platform-level financial accounting
+
+That separation keeps request reporting honest without forcing us to fake invoice semantics onto rows that were never request-scoped.
+
+## External Motivation And Sources
+
+The need for this model is not theoretical.
+It follows directly from the billing systems of providers and aggregators Paperclip needs to support.
+
+### OpenRouter
+
+Source URLs:
+
+- https://openrouter.ai/docs/faq#credit-and-billing-systems
+- https://openrouter.ai/pricing
+
+Relevant billing behavior as of March 14, 2026:
+
+- OpenRouter passes through underlying inference pricing and deducts request cost from purchased credits.
+- OpenRouter charges a 5.5% fee with a $0.80 minimum when purchasing credits.
+- Crypto payments are charged a 5% fee.
+- BYOK has its own fee model after a free request threshold.
+- OpenRouter billing is aggregated at the OpenRouter account level even when the upstream provider is Anthropic, OpenAI, Google, or another provider.
+
+Implication for Paperclip:
+
+- request usage belongs in `cost_events`
+- credit purchases, purchase fees, BYOK fees, refunds, and expirations belong in `finance_events`
+- `biller=openrouter` must remain distinct from `provider=anthropic|openai|google|...`
+
+### Cloudflare AI Gateway Unified Billing
+
+Source URL:
+
+- https://developers.cloudflare.com/ai-gateway/features/unified-billing/
+
+Relevant billing behavior as of March 14, 2026:
+
+- Unified Billing lets users call multiple upstream providers while receiving a single Cloudflare bill.
+- Usage is paid from Cloudflare-loaded credits.
+- Cloudflare supports manual top-ups and auto top-up thresholds.
+- Spend limits can stop request processing on daily, weekly, or monthly boundaries.
+- Unified Billing traffic can use Cloudflare-managed credentials rather than the user's direct provider key.
+
+Implication for Paperclip:
+
+- request usage needs `biller=cloudflare`
+- upstream provider still needs to be preserved separately
+- Cloudflare credit loads and related account-level events are not inference rows and should not be forced into `cost_events`
+- quota and limits reporting must support biller-level controls, not just upstream provider limits
+
+### Amazon Bedrock
+
+Source URL:
+
+- https://aws.amazon.com/bedrock/pricing/
+
+Relevant billing behavior as of March 14, 2026:
+
+- Bedrock supports on-demand and batch pricing.
+- Bedrock pricing varies by region.
+- Some pricing tiers add premiums or discounts relative to standard pricing.
+- Provisioned throughput is commitment-based rather than request-based.
+- Custom model import uses Custom Model Units billed per minute, with monthly storage charges.
+- Imported model copies are billed in 5-minute windows once active.
+- Customization and fine-tuning introduce training and hosted-model charges beyond normal inference.
+
+Implication for Paperclip:
+
+- normal tokenized inference fits in `cost_events`
+- provisioned throughput, custom model unit charges, training, and storage charges require `finance_events`
+- region and pricing tier need to be first-class dimensions in the financial model
+
+## Ledger Boundary
+
+To keep the system coherent, the table boundary should be explicit.
+
+### `cost_events`
+
+Use `cost_events` for request-scoped usage and inference charges:
+
+- one row per billable or usage-bearing run event
+- provider/model/biller/billingType/tokens/cost
+- optionally tied to `heartbeat_run_id`
+- supports direct APIs, subscriptions, overage, OpenRouter-routed inference, Cloudflare-routed inference, and Bedrock on-demand inference
+
+### `finance_events`
+
+Use `finance_events` for account-scoped or platform-scoped financial events:
+
+- credit purchase
+- top-up
+- refund
+- fee
+- expiry
+- provisioned capacity
+- training
+- model import
+- storage
+- invoice adjustment
+
+These rows may or may not have a related model, provider, or run id.
+Trying to force them into `cost_events` would either create fake request rows or create null-heavy rows that mean something fundamentally different from inference usage.
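+
+A minimal sketch of that boundary as a discriminated union. The type names and the `eventKind` string values are assumptions for illustration; the plan only names the event categories, not their identifiers.
+
+```typescript
+// Hypothetical shapes; only the field meanings mirror the plan.
+interface CostEventInput {
+  ledger: "cost_events";
+  provider: string;
+  model: string;
+  costCents: number;
+  heartbeatRunId: string | null;
+}
+
+interface FinanceEventInput {
+  ledger: "finance_events";
+  eventKind: "credit_purchase" | "top_up" | "refund" | "fee" | "expiry";
+  biller: string;
+  amountCents: number;
+  // No model or run id is required: the charge was never request-scoped.
+  provider?: string;
+}
+
+type LedgerInput = CostEventInput | FinanceEventInput;
+
+function describe(input: LedgerInput): string {
+  switch (input.ledger) {
+    case "cost_events":
+      return `inference on ${input.provider}/${input.model}: ${input.costCents}c`;
+    case "finance_events":
+      return `${input.eventKind} billed by ${input.biller}: ${input.amountCents}c`;
+  }
+}
+```
+
+Keeping the two shapes separate means a credit top-up never needs a placeholder `model` or `heartbeatRunId` just to satisfy the inference row shape.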
+
+## Canonical Billing Dimensions
+
+Every persisted billing event should model four separate axes:
+
+1. Usage provider
+   The upstream provider whose model performed the work.
+   Examples: `openai`, `anthropic`, `google`.
+
+2. Biller
+   The system that charged for the usage.
+   Examples: `openai`, `anthropic`, `openrouter`, `cursor`, `chatgpt`.
+
+3. Billing type
+   The pricing mode applied to the event.
+   Initial canonical values:
+   - `metered_api`
+   - `subscription_included`
+   - `subscription_overage`
+   - `credits`
+   - `fixed`
+   - `unknown`
+
+4. Measures
+   Usage and billing must both be storable:
+   - `input_tokens`
+   - `output_tokens`
+   - `cached_input_tokens`
+   - `cost_cents`
+
+These dimensions are independent.
+For example, an event may be:
+
+- provider: `anthropic`
+- biller: `openrouter`
+- billing type: `metered_api`
+- tokens: non-zero
+- cost cents: non-zero
+
+Or:
+
+- provider: `anthropic`
+- biller: `anthropic`
+- billing type: `subscription_included`
+- tokens: non-zero
+- cost cents: `0`
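+
+The two examples above could be written as typed ledger rows roughly like this. The `LedgerEvent` name, the camelCase field names, and the token and cent figures are illustrative assumptions, not the actual shared types.
+
+```typescript
+// Canonical billing type values taken from the list above.
+type BillingType =
+  | "metered_api"
+  | "subscription_included"
+  | "subscription_overage"
+  | "credits"
+  | "fixed"
+  | "unknown";
+
+// Hypothetical row shape: one field per axis plus the measures.
+interface LedgerEvent {
+  provider: string; // upstream provider whose model did the work
+  biller: string; // system that charged for the usage
+  billingType: BillingType;
+  inputTokens: number;
+  outputTokens: number;
+  cachedInputTokens: number;
+  costCents: number;
+}
+
+// Anthropic model billed through OpenRouter at metered rates.
+const routed: LedgerEvent = {
+  provider: "anthropic",
+  biller: "openrouter",
+  billingType: "metered_api",
+  inputTokens: 1200,
+  outputTokens: 300,
+  cachedInputTokens: 0,
+  costCents: 4,
+};
+
+// Same provider on a subscription: tokens are recorded, dollars are zero.
+const included: LedgerEvent = {
+  ...routed,
+  biller: "anthropic",
+  billingType: "subscription_included",
+  costCents: 0,
+};
+```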
+
+## Schema Changes
+
+Extend `cost_events` with:
+
+- `heartbeat_run_id uuid null references heartbeat_runs.id`
+- `biller text not null default 'unknown'`
+- `billing_type text not null default 'unknown'`
+- `cached_input_tokens int not null default 0`
+
+Keep `provider` as the upstream usage provider.
+Do not overload `provider` to mean biller.
+
+Add a future `finance_events` table for account-level financial events with fields along these lines:
+
+- `company_id`
+- `occurred_at`
+- `event_kind`
+- `direction`
+- `biller`
+- `provider nullable`
+- `execution_adapter_type nullable`
+- `pricing_tier nullable`
+- `region nullable`
+- `model nullable`
+- `quantity nullable`
+- `unit nullable`
+- `amount_cents`
+- `currency`
+- `estimated`
+- `related_cost_event_id nullable`
+- `related_heartbeat_run_id nullable`
+- `external_invoice_id nullable`
+- `metadata_json nullable`
+
+Add indexes:
+
+- `(company_id, biller, occurred_at)`
+- `(company_id, provider, occurred_at)`
+- `(company_id, heartbeat_run_id)` if distinct-run reporting remains common
+
+## Shared Contract Changes
+
+### Shared types
+
+Add a shared billing type union and enrich cost types with:
+
+- `heartbeatRunId`
+- `biller`
+- `billingType`
+- `cachedInputTokens`
+
+Update reporting response types so the provider breakdown reflects the ledger directly rather than inferred run metadata.
+
+### Validators
+
+Extend `createCostEventSchema` to accept:
+
+- `heartbeatRunId`
+- `biller`
+- `billingType`
+- `cachedInputTokens`
+
+Defaults:
+
+- `biller` defaults to `provider`
+- `billingType` defaults to `unknown`
+- `cachedInputTokens` defaults to `0`
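+
+A sketch of how those defaults might be applied, written as a plain function rather than against the real `createCostEventSchema`, whose validation library is not shown here. `CreateCostEventInput` and `applyCostEventDefaults` are hypothetical names.
+
+```typescript
+// Hypothetical input shape: only `provider` and `costCents` are required here.
+interface CreateCostEventInput {
+  provider: string;
+  costCents: number;
+  heartbeatRunId?: string | null;
+  biller?: string;
+  billingType?: string;
+  cachedInputTokens?: number;
+}
+
+function applyCostEventDefaults(input: CreateCostEventInput) {
+  return {
+    ...input,
+    // Manual events are allowed to have no run attached.
+    heartbeatRunId: input.heartbeatRunId ?? null,
+    // With no explicit biller, assume the provider billed directly.
+    biller: input.biller ?? input.provider,
+    billingType: input.billingType ?? "unknown",
+    cachedInputTokens: input.cachedInputTokens ?? 0,
+  };
+}
+```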
+
+## Adapter Contract Changes
+
+Extend adapter execution results so they can report:
+
+- `biller`
+- richer billing type values
+
+Backwards compatibility:
+
+- existing adapter values `api` and `subscription` are treated as legacy aliases
+- map `api -> metered_api`
+- map `subscription -> subscription_included`
+
+Future adapters may emit the canonical values directly.
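+
+The alias mapping could be sketched as follows. `normalizeBillingType` is a hypothetical helper name, and falling back to `unknown` for unrecognized values is an assumption rather than a documented rule.
+
+```typescript
+type BillingType =
+  | "metered_api"
+  | "subscription_included"
+  | "subscription_overage"
+  | "credits"
+  | "fixed"
+  | "unknown";
+
+const CANONICAL: ReadonlySet<string> = new Set([
+  "metered_api",
+  "subscription_included",
+  "subscription_overage",
+  "credits",
+  "fixed",
+  "unknown",
+]);
+
+// Legacy adapter values and their canonical replacements.
+const LEGACY_ALIASES = new Map<string, BillingType>([
+  ["api", "metered_api"],
+  ["subscription", "subscription_included"],
+]);
+
+function normalizeBillingType(raw: string | null | undefined): BillingType {
+  if (!raw) return "unknown";
+  if (CANONICAL.has(raw)) return raw as BillingType;
+  // Assumption: anything unrecognized degrades to "unknown" instead of throwing.
+  return LEGACY_ALIASES.get(raw) ?? "unknown";
+}
+```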
+
+OpenRouter support will use:
+
+- `provider` = upstream provider when known
+- `biller` = `openrouter`
+- `billingType` = `metered_api` unless OpenRouter later exposes another billing mode
+
+Cloudflare Unified Billing support will use:
+
+- `provider` = upstream provider when known
+- `biller` = `cloudflare`
+- `billingType` = `credits` or `metered_api` depending on the normalized request billing contract
+
+Bedrock support will use:
+
+- `provider` = upstream provider or `aws_bedrock` depending on adapter shape
+- `biller` = `aws_bedrock`
+- `billingType` = request-scoped mode for inference rows
+- `finance_events` for provisioned, training, import, and storage charges
+
+## Write Path Changes
+
+### Heartbeat-created events
+
+When a heartbeat run produces usage or spend:
+
+1. normalize adapter billing metadata
+2. write a ledger row to `cost_events`
+3. attach `heartbeat_run_id`
+4. set `provider`, `biller`, `billing_type`, token fields, and `cost_cents`
+
+The write path should no longer depend on later inference from `heartbeat_runs`.
+
+### Manual API-created events
+
+Manual cost event creation remains supported.
+These events may have `heartbeatRunId = null`.
+
+Rules:
+
+- `provider` remains required
+- `biller` defaults to `provider`
+- `billingType` defaults to `unknown`
+
+## Reporting Changes
+
+### Server
+
+Refactor reporting queries to use `cost_events` only.
+
+#### `summary`
+
+- sum `cost_cents`
+
+#### `by-agent`
+
+- sum costs and token fields from `cost_events`
+- use `count(distinct heartbeat_run_id)` filtered by billing type for run counts
+- use token sums filtered by billing type for subscription usage
+
+#### `by-provider`
+
+- group by `provider`, `model`
+- sum costs and token fields directly from the ledger
+- derive billing-type slices from `cost_events.billing_type`
+- never pro-rate from unrelated `heartbeat_runs`
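+
+As an in-memory illustration of the `by-provider` grouping (the real implementation would presumably be a SQL query over `cost_events`), the following sketch groups ledger rows by provider and model and derives a subscription token slice from `billingType` alone. The row and result shapes are assumptions.
+
+```typescript
+interface LedgerRow {
+  provider: string;
+  model: string;
+  billingType: string;
+  inputTokens: number;
+  outputTokens: number;
+  costCents: number;
+}
+
+interface ProviderModelTotals {
+  provider: string;
+  model: string;
+  costCents: number;
+  inputTokens: number;
+  outputTokens: number;
+  subscriptionTokens: number;
+}
+
+function byProvider(rows: LedgerRow[]): ProviderModelTotals[] {
+  const groups = new Map<string, ProviderModelTotals>();
+  for (const row of rows) {
+    const key = JSON.stringify([row.provider, row.model]);
+    let totals = groups.get(key);
+    if (!totals) {
+      totals = {
+        provider: row.provider,
+        model: row.model,
+        costCents: 0,
+        inputTokens: 0,
+        outputTokens: 0,
+        subscriptionTokens: 0,
+      };
+      groups.set(key, totals);
+    }
+    totals.costCents += row.costCents;
+    totals.inputTokens += row.inputTokens;
+    totals.outputTokens += row.outputTokens;
+    // The slice comes from the row's own billing type, never from run metadata.
+    if (row.billingType === "subscription_included") {
+      totals.subscriptionTokens += row.inputTokens + row.outputTokens;
+    }
+  }
+  return Array.from(groups.values());
+}
+```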
+
+#### future `by-biller`
+
+- group by `biller`
+- this is the right view for invoice and subscription accountability
+
+#### `window-spend`
+
+- continue to use `cost_events`
+
+#### project attribution
+
+Keep current project attribution logic for now, but prefer `cost_events.heartbeat_run_id` as the join anchor whenever possible.
+
+## UI Changes
+
+### Principles
+
+- spend, usage, and quota are related but distinct
+- a missing quota fetch is not the same as "no quota"
+- provider and biller are different dimensions
+
+### Immediate UI changes
+
+1. Keep the current costs page structure.
+2. Make the provider cards accurate by reading only ledger-backed values.
+3. Show provider quota fetch errors explicitly instead of dropping them.
+
+### Follow-up UI direction
+
+The long-term board UI should expose:
+
+- Spend
+  Dollars by biller, provider, model, agent, project
+- Usage
+  Tokens by provider, model, agent, project
+- Quotas
+  Live provider or biller limits, credits, and reset windows
+- Financial events
+  Credit purchases, top-ups, fees, refunds, commitments, storage, and other non-inference charges
+
+## Migration Plan
+
+Migration behavior:
+
+- add new non-destructive columns with defaults
+- backfill existing rows:
+  - `biller = provider`
+  - `billing_type = 'unknown'`
+  - `cached_input_tokens = 0`
+  - `heartbeat_run_id = null`
+
+Do **not** attempt to backfill historical provider-level subscription attribution from `heartbeat_runs`.
+That data was never stored with the required dimensions.
+
+## Testing Plan
+
+Add or update tests for:
+
+1. heartbeat-created ledger rows persist `heartbeatRunId`, `biller`, `billingType`, and cached tokens
+2. legacy adapter billing values map correctly
+3. provider reporting uses ledger data only
+4. mixed-provider companies do not cross-attribute subscription usage
+5. zero-dollar subscription usage still appears in token reporting
+6. quota fetch failures render explicit UI state
+7. manual cost events still validate and write correctly
+8. biller reporting keeps upstream provider breakdowns separate
+9. OpenRouter-style rows can show `biller=openrouter` with non-OpenRouter upstream providers
+10. Cloudflare-style rows can show `biller=cloudflare` with preserved upstream provider identity
+11. future `finance_events` aggregation handles non-request charges without requiring a model or run id
+
+## Delivery Plan
+
+### Step 1
+
+- land the ledger contract and query rewrite
+- make the current costs page correct
+
+### Step 2
+
+- add biller-oriented reporting endpoints and UI
+
+### Step 3
+
+- wire OpenRouter and any future aggregator adapters to the same contract
+
+### Step 4
+
+- add `executionAdapterType` to persisted cost reporting if adapter-level grouping becomes a product requirement
+
+### Step 5
+
+- introduce `finance_events`
+- add non-inference accounting endpoints
+- add UI for platform/account charges alongside inference spend and usage
+
+## Non-Goals For This Change
+
+- multi-currency support
+- invoice reconciliation
+- provider-specific cost estimation beyond persisted billed cost
+- replacing `heartbeat_runs` as the operational run record
--- a/doc/plans/2026-03-14-budget-policies-and-enforcement.md
+++ b/doc/plans/2026-03-14-budget-policies-and-enforcement.md
@@ -0,0 +1,611 @@
+# Budget Policies and Enforcement
+
+## Context
+
+Paperclip already treats budgets as a core control-plane responsibility:
+
+- `doc/SPEC.md` gives the Board authority to set budgets, pause agents, pause work, and override any budget.
+- `doc/SPEC-implementation.md` says V1 must support monthly UTC budget windows, soft alerts, and hard auto-pause.
+- the current code only partially implements that intent.
+
+Today the system has narrow money-budget behavior:
+
+- companies track `budgetMonthlyCents` and `spentMonthlyCents`
+- agents track `budgetMonthlyCents` and `spentMonthlyCents`
+- `cost_events` ingestion increments those counters
+- when an agent exceeds its monthly budget, the agent is paused
+
+That leaves major product gaps:
+
+- no project budget model
+- no approval generated when budget is hit
+- no generic budget policy system
+- no project pause semantics tied to budget
+- no durable incident tracking to prevent duplicate alerts
+- no separation between enforceable spend budgets and advisory usage quotas
+
+This plan defines the concrete budgeting model Paperclip should implement next.
+
+## Product Goals
+
+Paperclip should let operators:
+
+1. Set budgets on agents and projects.
+2. Understand whether a budget is based on money or usage.
+3. Be warned before a budget is exhausted.
+4. Automatically pause work when a hard budget is hit.
+5. Approve, raise, or resume from a budget stop using obvious UI.
+6. See budget state on the dashboard, `/costs`, and scope detail pages.
+
+The system should make one thing very clear:
+
+- budgets are policy controls
+- quotas are usage visibility
+
+They are related, but they are not the same concept.
+
+## Product Decisions
+
+### V1 Budget Defaults
+
+For the next implementation pass, Paperclip should enforce these defaults:
+
+- agent budgets are recurring monthly budgets
+- project budgets are lifetime total budgets
+- hard-stop enforcement uses billed dollars, not tokens
+- monthly windows use UTC calendar months
+- project total budgets do not reset automatically
+
+This gives a clean mental model:
+
+- agents are ongoing workers, so monthly recurring budget is natural
+- projects are bounded workstreams, so lifetime cap is natural
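+
+The two window kinds could be resolved along these lines. `resolveWindow` is a hypothetical helper, and representing a lifetime window with `null` bounds is an assumption; the end bound is treated as exclusive.
+
+```typescript
+type WindowKind = "calendar_month_utc" | "lifetime";
+
+interface BudgetWindow {
+  start: Date | null; // inclusive
+  end: Date | null; // exclusive
+}
+
+function resolveWindow(kind: WindowKind, now: Date): BudgetWindow {
+  if (kind === "lifetime") {
+    // Assumption: lifetime budgets have no bounds and never reset.
+    return { start: null, end: null };
+  }
+  const year = now.getUTCFullYear();
+  const month = now.getUTCMonth();
+  return {
+    start: new Date(Date.UTC(year, month, 1)),
+    // Date.UTC rolls month 12 over into January of the next year.
+    end: new Date(Date.UTC(year, month + 1, 1)),
+  };
+}
+```
+
+Using the UTC accessors keeps the window independent of the server's local timezone.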
+
+### Metric To Enforce First
+
+The first enforceable metric should be `billed_cents`.
+
+Reasoning:
+
+- it works across providers, billers, and models
+- it maps directly to real financial risk
+- it handles overage and metered usage consistently
+- it avoids cross-provider token normalization problems
+- it applies cleanly even when future finance events are not token-based
+
+Token budgets should not be the first hard-stop policy.
+They should come later as advisory usage controls once the money-based system is solid.
+
+### Subscription Usage Decision
+
+Paperclip should separate subscription-included usage from billed spend:
+
+- `subscription_included`
+  - visible in reporting
+  - visible in usage summaries
+  - does not count against money budget
+- `subscription_overage`
+  - visible in reporting
+  - counts against money budget
+- `metered_api`
+  - visible in reporting
+  - counts against money budget
+
+This keeps the budget system honest:
+
+- users should not see "spend" rise for usage that did not incur marginal billed cost
+- users should still see the token usage and provider quota state
+
+### Soft Alert Versus Hard Stop
+
+Paperclip should have two threshold classes:
+
+- soft alert
+  - creates visible notification state
+  - does not create an approval
+  - does not pause work
+- hard stop
+  - pauses the affected scope automatically
+  - creates an approval requiring human resolution
+  - prevents additional heartbeats or task pickup in that scope
+
+Default thresholds:
+
+- soft alert at `80%`
+- hard stop at `100%`
+
+These should be configurable per policy later, but they are good defaults now.
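+
+A sketch of the threshold check with those defaults. `classifyThreshold` is a hypothetical name, and treating both boundaries as inclusive (exactly 80% is already a soft alert, exactly 100% is already a hard stop) is an assumption.
+
+```typescript
+type ThresholdState = "ok" | "soft" | "hard";
+
+// Amounts are integer cents. Assumes amountCents > 0; how a zero budget
+// should behave is not specified in this plan.
+function classifyThreshold(
+  observedCents: number,
+  amountCents: number,
+  warnPercent: number = 80,
+): ThresholdState {
+  if (observedCents >= amountCents) return "hard";
+  // Cross-multiply instead of dividing to stay in integer arithmetic.
+  if (observedCents * 100 >= amountCents * warnPercent) return "soft";
+  return "ok";
+}
+```
+
+For a $100.00 budget (`10000` cents), `7999` is `ok`, `8000` is `soft`, and `10000` is `hard`.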
+
+## Scope Model
+
+### Supported Scope Types
+
+Budget policies should support:
+
+- `company`
+- `agent`
+- `project`
+
+This plan focuses on finishing `agent` and `project` first while preserving the existing company budget behavior.
+
+### Recommended V1.5 Policy Presets
+
+- Company
+  - metric: `billed_cents`
+  - window: `calendar_month_utc`
+- Agent
+  - metric: `billed_cents`
+  - window: `calendar_month_utc`
+- Project
+  - metric: `billed_cents`
+  - window: `lifetime`
+
+Future extensions can add:
+
+- token advisory policies
+- daily or weekly spend windows
+- provider- or biller-scoped budgets
+- inherited delegated budgets down the org tree
+
+## Current Implementation Baseline
+
+The current codebase is not starting from zero, but the existing shape is too ad hoc to extend safely.
+
+### What Exists Today
+
+- company and agent monthly cents counters
+- cost ingestion that updates those counters
+- agent hard-stop pause on monthly budget overrun
+
+### What Is Missing
+
+- project budgets
+- generic budget policy persistence
+- generic threshold crossing detection
+- incident deduplication per scope/window
+- approval creation on hard-stop
+- project execution blocking
+- budget timeline and incident UI
+- distinction between advisory quota and enforceable budget
+
+## Proposed Data Model
+
+### 1. `budget_policies`
+
+Create a new table for canonical budget definitions.
+
+Suggested fields:
+
+- `id`
+- `company_id`
+- `scope_type`
+- `scope_id`
+- `metric`
+- `window_kind`
+- `amount`
+- `warn_percent`
+- `hard_stop_enabled`
+- `notify_enabled`
+- `is_active`
+- `created_by_user_id`
+- `updated_by_user_id`
+- `created_at`
+- `updated_at`
+
+Notes:
+
+- `scope_type` is one of `company | agent | project`
+- `scope_id` may be null only for company-level policies, and only if the company is treated as implied by `company_id`; otherwise keep it explicit
+- `metric` should start with `billed_cents`
+- `window_kind` starts with `calendar_month_utc | lifetime`
+- `amount` is stored in the natural unit of the metric
+
+### 2. `budget_incidents`
+
+Create a durable record of threshold crossings.
+
+Suggested fields:
+
+- `id`
+- `company_id`
+- `policy_id`
+- `scope_type`
+- `scope_id`
+- `metric`
+- `window_kind`
+- `window_start`
+- `window_end`
+- `threshold_type`
+- `amount_limit`
+- `amount_observed`
+- `status`
+- `approval_id` nullable
+- `activity_id` nullable
+- `resolved_at` nullable
+- `created_at`
+- `updated_at`
+
+Notes:
+
+- `threshold_type`: `soft | hard`
+- `status`: `open | acknowledged | resolved | dismissed`
+- one open incident per policy per threshold per window prevents duplicate approvals and alert spam
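+
+The deduplication rule in the last note can be sketched with an in-memory set. In the database this would more likely be a uniqueness constraint over open incidents; the `OpenIncidents` class and the key format are assumptions for illustration.
+
+```typescript
+interface IncidentKeyParts {
+  policyId: string;
+  thresholdType: "soft" | "hard";
+  windowStart: Date | null; // null is assumed to mean a lifetime window
+}
+
+function incidentKey(parts: IncidentKeyParts): string {
+  const windowPart = parts.windowStart ? parts.windowStart.toISOString() : "lifetime";
+  return [parts.policyId, parts.thresholdType, windowPart].join("|");
+}
+
+class OpenIncidents {
+  private readonly keys = new Set<string>();
+
+  // Returns true only the first time a policy/threshold/window is seen,
+  // so callers create at most one approval or alert per combination.
+  openOnce(parts: IncidentKeyParts): boolean {
+    const key = incidentKey(parts);
+    if (this.keys.has(key)) return false;
+    this.keys.add(key);
+    return true;
+  }
+}
+```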
+
+### 3. Project Pause State
+
+Projects need explicit pause semantics.
+
+Recommended approach:
+
+- extend project status or add a pause field so a project can be blocked by budget
+- preserve whether the project is paused due to budget versus manually paused
+
+Preferred shape:
+
+- keep project workflow status as-is
+- add execution-state fields:
+  - `execution_status`: `active | paused | archived`
+  - `pause_reason`: `manual | budget | system | null`
+
+If that is too large for the immediate pass, a smaller version is:
+
+- add `paused_at`
+- add `pause_reason`
+
+The key requirement is behavioral, not cosmetic:
+Paperclip must know that a project is budget-paused and enforce it.
+
+### 4. Compatibility With Existing Budget Columns
+
+Existing company and agent monthly budget columns should remain temporarily for compatibility.
+
+Migration plan:
+
+1. keep reading existing columns during transition
+2. create equivalent `budget_policies` rows
+3. switch enforcement and UI to policies
+4. later remove or deprecate legacy columns
+
+## Budget Engine
+
+Budget enforcement should move into a dedicated service.
+
+Current logic is buried inside cost ingestion.
+That is too narrow because budget checks must apply at more than one execution boundary.
+
+### Responsibilities
+
+New service: `budgetService`
+
+Responsibilities:
+
+- resolve applicable policies for a cost event
+- compute current window totals
+- detect threshold crossings
+- create incidents, activities, and approvals
+- pause affected scopes on hard-stop
+- provide preflight enforcement checks for execution entry points
+
+### Canonical Evaluation Flow
+
+When a new `cost_event` is written:
+
+1. persist the `cost_event`
+2. identify affected scopes
+   - company
+   - agent
+   - project
+3. fetch active policies for those scopes
+4. compute current observed amount for each policy window
+5. compare to thresholds
+6. create a soft incident if the soft threshold is crossed for the first time in the window
+7. create a hard incident if the hard threshold is crossed for the first time in the window
+8. if hard incident:
+   - pause the scope
+   - create approval
+   - create activity event
+   - emit notification state
+
+### Preflight Enforcement Checks
+
+Budget enforcement cannot rely only on post-hoc cost ingestion.
+
+Paperclip must also block execution before new work starts.
+
+Add budget checks to:
+
+- scheduler heartbeat dispatch
+- manual invoke endpoints
+- assignment-driven wakeups
+- queued run promotion
+- issue checkout or pickup paths where applicable
+
+If a scope is budget-paused:
+
+- do not start a new heartbeat
+- do not let the agent pick up additional work
+- present a clear reason in API and UI
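+
+A preflight guard along these lines could be shared by all of the entry points above. The `ScopeState` shape, the `budget_paused` code, and the message wording are assumptions; manual and system pauses would need their own handling and are left out of this sketch.
+
+```typescript
+type PauseReason = "manual" | "budget" | "system" | null;
+
+interface ScopeState {
+  scopeType: "company" | "agent" | "project";
+  scopeId: string;
+  paused: boolean;
+  pauseReason: PauseReason;
+}
+
+type PreflightResult =
+  | { allowed: true }
+  | { allowed: false; code: "budget_paused"; message: string };
+
+// Checks every scope the run would touch and returns an explicit reason
+// instead of silently skipping the work.
+function budgetPreflight(scopes: ScopeState[]): PreflightResult {
+  const blocked = scopes.find((s) => s.paused && s.pauseReason === "budget");
+  if (!blocked) return { allowed: true };
+  return {
+    allowed: false,
+    code: "budget_paused",
+    message: `${blocked.scopeType} ${blocked.scopeId} is paused by budget enforcement`,
+  };
+}
+```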
+
+### Active Run Behavior
+
+When a hard-stop is triggered while a run is already active:
+
+- mark scope paused immediately for future work
+- request graceful cancellation of the current run
+- allow normal cancellation timeout behavior
+- write activity explaining that pause came from budget enforcement
+
+This mirrors the general pause semantics already expected by the product.
+
+## Approval Model
+
+Budget hard-stops should create a first-class approval.
+
+### New Approval Type
+
+Add approval type:
+
+- `budget_override_required`
+
+Payload should include:
+
+- `scopeType`
+- `scopeId`
+- `scopeName`
+- `metric`
+- `windowKind`
+- `thresholdType`
+- `budgetAmount`
+- `observedAmount`
+- `windowStart`
+- `windowEnd`
+- `topDrivers`
+- `paused`
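+
+One possible typing of that payload, with a sample value. The field types are assumptions: in particular the shape of `topDrivers` is not specified in this plan, and the ISO-string window bounds and all sample values are illustrative.
+
+```typescript
+interface BudgetOverridePayload {
+  scopeType: "company" | "agent" | "project";
+  scopeId: string;
+  scopeName: string;
+  metric: "billed_cents";
+  windowKind: "calendar_month_utc" | "lifetime";
+  thresholdType: "soft" | "hard";
+  budgetAmount: number; // in the natural unit of the metric (cents here)
+  observedAmount: number;
+  windowStart: string | null; // ISO timestamp; null assumed for lifetime
+  windowEnd: string | null;
+  // Hypothetical shape: the largest contributors to the observed amount.
+  topDrivers: Array<{ label: string; amountCents: number }>;
+  paused: boolean;
+}
+
+const examplePayload: BudgetOverridePayload = {
+  scopeType: "agent",
+  scopeId: "agent-123",
+  scopeName: "Example Agent",
+  metric: "billed_cents",
+  windowKind: "calendar_month_utc",
+  thresholdType: "hard",
+  budgetAmount: 10000,
+  observedAmount: 10250,
+  windowStart: "2026-03-01T00:00:00.000Z",
+  windowEnd: "2026-04-01T00:00:00.000Z",
+  topDrivers: [{ label: "anthropic / model-a", amountCents: 8100 }],
+  paused: true,
+};
+```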
+
+### Resolution Actions
+
+The approval UI should support:
+
+- raise budget and resume
+- resume once without changing policy
+- keep paused
+
+Optional later action:
+
+- disable budget policy
+
+### Soft Alerts Do Not Need Approval
+
+Soft alerts should create:
+
+- activity event
+- dashboard alert
+- inbox notification or similar board-visible signal
+
+They should not create an approval by default.
+
+## Notification And Activity Model
+
+Budget events need obvious operator visibility.
+
+Required outputs:
+
+- activity log entry on threshold crossings
+- dashboard surface for active budget incidents
+- detail page banner on paused agent or project
+- `/costs` summary of active incidents and policy health
+
+Later channels:
+
+- email
+- webhook
+- Slack or other integrations
+
+## API Plan
+
+### Policy Management
+
+Add routes for:
+
+- list budget policies for company
+- create budget policy
+- update budget policy
+- archive or disable budget policy
+
+### Incident Surfaces
+
+Add routes for:
+
+- list active budget incidents
+- list incident history
+- get incident detail for a scope
+
+### Approval Resolution
+
+Budget approvals should use the existing approval system once the new approval type is added.
+
+Expected flows:
+
+- create approval on hard-stop
+- resolve approval by changing policy and resuming
+- resolve approval by resuming once
+
+### Execution Errors
+
+When work is blocked by budget, the API should return explicit errors.
+
+Examples:
+
+- agent invocation blocked because agent budget is paused
+- issue execution blocked because project budget is paused
+
+Do not silently no-op.
+
+## UI Plan
+
+Budgeting should be visible in the places where operators make decisions.
+
+### `/costs`
+
+Add a budget section that includes:
+
+- active budget incidents
+- policy list with scope, window, metric, and threshold state
+- progress bars for current period or total
+- clear distinction between:
+  - spend budget
+  - subscription quota
+- quick actions:
+  - raise budget
+  - open approval
+  - resume scope if permitted
+
+The page should make this visual distinction obvious:
+
+- Budget
+  - enforceable spend policy
+- Quota
+  - provider or subscription usage window
+
+### Agent Detail
+
+Add an agent budget card:
+
+- monthly budget amount
+- current month spend
+- remaining spend
+- status
+- warning or paused banner
+- link to approval if blocked
+
+### Project Detail
+
+Add a project budget card:
+
+- total budget amount
+- total spend to date
+- remaining spend
+- pause status
+- approval link
+
+Project detail should also show if issue execution is blocked because the project is budget-paused.
+
+### Dashboard
+
+Add a high-signal budget section:
+
+- active budget breaches
+- upcoming soft alerts
+- counts of paused agents and paused projects due to budget
+
+The operator should not have to visit `/costs` to learn that work has stopped.
+
+## Budget Math
+
+### What Counts Toward Budget
+
+For V1.5 enforcement, include:
+
+- `metered_api` cost events
+- `subscription_overage` cost events
+- any future request-scoped cost event with non-zero billed cents
+
+Do not include:
+
+- `subscription_included` cost events with zero billed cents
+- advisory quota rows
+- account-level finance events unless and until company-level financial budgets are added explicitly
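+
+The inclusion rules above reduce to a small filter over cost events. This is a sketch with hypothetical helper names; excluding every `subscription_included` row, whatever its `costCents`, is an assumption based on the subscription usage decision earlier in this plan.
+
+```typescript
+interface BudgetableEvent {
+  billingType: string;
+  costCents: number;
+}
+
+// Cents this event contributes to a money budget.
+function budgetCents(event: BudgetableEvent): number {
+  // Subscription-included usage stays visible in reporting but is not spend.
+  if (event.billingType === "subscription_included") return 0;
+  return event.costCents;
+}
+
+function windowSpendCents(events: BudgetableEvent[]): number {
+  return events.reduce((sum, event) => sum + budgetCents(event), 0);
+}
+```
+
+For example, a window holding a `metered_api` event of 500 cents, a `subscription_included` event of 0 cents, and a `subscription_overage` event of 250 cents totals 750 cents against the budget.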
+
+### Why Not Tokens First
+
+Token budgets should not be the first hard-stop because:
+
+- providers count tokens differently
+- cached tokens complicate simple totals
+- some future charges are not token-based
+- subscription tokens do not necessarily imply spend
+- money remains the cleanest cross-provider enforcement metric
+
+### Future Budget Metrics
+
+Future policy metrics can include:
+
+- `total_tokens`
+- `input_tokens`
+- `output_tokens`
+- `requests`
+- `finance_amount_cents`
+
+But they should enter only after the money-budget path is stable.
+
+## Migration Plan
+
+### Phase 1: Foundation
+
+- add `budget_policies`
+- add `budget_incidents`
+- add new approval type
+- add project pause metadata
+
+### Phase 2: Compatibility
+
+- backfill policies from existing company and agent monthly budget columns
+- keep legacy columns readable during migration
+
+### Phase 3: Enforcement
+
+- move budget logic into dedicated service
+- add hard-stop incident creation
+- add activity and approval creation
+- add execution guards on heartbeat and invoke paths
+
+### Phase 4: UI
+
+- `/costs` budget section
+- agent detail budget card
+- project detail budget card
+- dashboard incident summary
+
+### Phase 5: Cleanup
+
+- move all reads/writes to `budget_policies`
+- reduce legacy column reliance
+- decide whether to remove old budget columns
+
+## Tests
+
+Required coverage:
+
+- agent monthly budget soft alert at 80%
+- agent monthly budget hard-stop at 100%
+- project lifetime budget soft alert
+- project lifetime budget hard-stop
+- `subscription_included` usage does not consume money budget
+- `subscription_overage` does consume money budget
+- hard-stop creates one incident per threshold per window
+- hard-stop creates approval and pauses correct scope
+- paused project blocks new issue execution
+- paused agent blocks new heartbeat dispatch
+- policy update and resume clears or resolves active incident correctly
+- dashboard and `/costs` surface active incidents
+
+## Open Questions
+
+These should be explicitly deferred unless they block implementation:
+
+- Should project budgets also support monthly mode, or is lifetime enough for the first release?
+- Should company-level budgets eventually include `finance_events` such as OpenRouter top-up fees and Bedrock provisioned charges?
+- Should delegated budget editing be limited by org hierarchy in V1, or remain board-only in the UI even if the data model can support delegation later?
+- Do we need "resume once" immediately, or can first approval resolution be "raise budget and resume" plus "keep paused"?
+
+## Recommendation
+
+Implement the first coherent budgeting system with these rules:
+
+- Agent budget = monthly billed dollars
+- Project budget = lifetime billed dollars
+- Hard-stop = auto-pause + approval
+- Soft alert = visible warning, no approval
+- Subscription usage = visible quota and token reporting, not money-budget enforcement
+
+This solves the real operator problem without mixing together spend control, provider quota windows, and token accounting.
--- a/doc/plugins/PLUGIN_AUTHORING_GUIDE.md
+++ b/doc/plugins/PLUGIN_AUTHORING_GUIDE.md
@@ -0,0 +1,155 @@
+# Plugin Authoring Guide
+
+This guide describes the current, implemented way to create a Paperclip plugin in this repo.
+
+It is intentionally narrower than [PLUGIN_SPEC.md](./PLUGIN_SPEC.md). The spec includes future ideas; this guide only covers the alpha surface that exists now.
+
+## Current reality
+
+- Treat plugin workers and plugin UI as trusted code.
+- Plugin UI runs as same-origin JavaScript inside the main Paperclip app.
+- Worker-side host APIs are capability-gated.
+- Plugin UI is not sandboxed by manifest capabilities.
+- There is no host-provided shared React component kit for plugins yet.
+- `ctx.assets` is not supported in the current runtime.
+
+## Scaffold a plugin
+
+Use the scaffold package:
+
+```bash
+pnpm --filter @paperclipai/create-paperclip-plugin build
+node packages/plugins/create-paperclip-plugin/dist/index.js @yourscope/plugin-name --output ./packages/plugins/examples
+```
+
+For a plugin that lives outside the Paperclip repo:
+
+```bash
+pnpm --filter @paperclipai/create-paperclip-plugin build
+node packages/plugins/create-paperclip-plugin/dist/index.js @yourscope/plugin-name \
+  --output /absolute/path/to/plugin-repos \
+  --sdk-path /absolute/path/to/paperclip/packages/plugins/sdk
+```
+
+That creates a package with:
+
+- `src/manifest.ts`
+- `src/worker.ts`
+- `src/ui/index.tsx`
+- `tests/plugin.spec.ts`
+- `esbuild.config.mjs`
+- `rollup.config.mjs`
+
+Inside this monorepo, the scaffold uses `workspace:*` for `@paperclipai/plugin-sdk`.
+
+Outside this monorepo, the scaffold snapshots `@paperclipai/plugin-sdk` from the local Paperclip checkout into a `.paperclip-sdk/` tarball so you can build and test a plugin without publishing anything to npm first.
+
+## Recommended local workflow
+
+From the generated plugin folder:
+
+```bash
+pnpm install
+pnpm typecheck
+pnpm test
+pnpm build
+```
+
+For local development, install it into Paperclip from an absolute local path through the plugin manager or API. The server supports local filesystem installs and watches local-path plugins for file changes so worker restarts happen automatically after rebuilds.
+
+Example:
+
+```bash
+curl -X POST http://127.0.0.1:3100/api/plugins/install \
+  -H "Content-Type: application/json" \
+  -d '{"packageName":"/absolute/path/to/your-plugin","isLocalPath":true}'
+```
+
+## Supported alpha surface
+
+Worker:
+
+- config
+- events
+- jobs
+- launchers
+- http
+- secrets
+- activity
+- state
+- entities
+- projects and project workspaces
+- companies
+- issues and comments
+- agents and agent sessions
+- goals
+- data/actions
+- streams
+- tools
+- metrics
+- logger
+
+UI:
+
+- `usePluginData`
+- `usePluginAction`
+- `usePluginStream`
+- `usePluginToast`
+- `useHostContext`
+- typed slot props from `@paperclipai/plugin-sdk/ui`
+
+Mount surfaces currently wired in the host include:
+
+- `page`
+- `settingsPage`
+- `dashboardWidget`
+- `sidebar`
+- `sidebarPanel`
+- `detailTab`
+- `taskDetailView`
+- `projectSidebarItem`
+- `globalToolbarButton`
+- `toolbarButton`
+- `contextMenuItem`
+- `commentAnnotation`
+- `commentContextMenuItem`
+
+## Company routes
+
+Plugins may declare a `page` slot with `routePath` to own a company route like:
+
+```text
+/:companyPrefix/<routePath>
+```
+
+Rules:
+
+- `routePath` must be a single lowercase slug
+- it cannot collide with reserved host routes
+- it cannot duplicate another installed plugin page route
+
+## Publishing guidance
+
+- Use npm packages as the deployment artifact.
+- Treat repo-local example installs as a development workflow only.
+- Prefer keeping plugin UI self-contained inside the package.
+- Do not rely on host design-system components or undocumented app internals.
+- GitHub repository installs are not a first-class workflow today. For local development, use a checked-out local path. For production, publish to npm or a private npm-compatible registry.
+
+## Verification before handoff
+
+At minimum:
+
+```bash
+pnpm --filter <your-plugin-package> typecheck
+pnpm --filter <your-plugin-package> test
+pnpm --filter <your-plugin-package> build
+```
+
+If you changed host integration too, also run:
+
+```bash
+pnpm -r typecheck
+pnpm test:run
+pnpm build
+```
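Illustrative note on the "Company routes" rules in the guide above (not part of this change): the three `routePath` rules can be checked in order. This is a rough sketch only. The slug pattern, the function name `checkRoutePath`, and the reserved route names are assumptions, not the host's actual validation or route list.

```javascript
// Hypothetical sketch of the three `routePath` rules. The slug pattern and
// the reserved-route names are assumptions, not the host's actual lists.
const SLUG = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Returns null when the path is acceptable, otherwise a short reason string.
function checkRoutePath(routePath, reservedRoutes, installedRoutes) {
  if (!SLUG.test(routePath)) return "must be a single lowercase slug";
  if (reservedRoutes.has(routePath)) return "collides with a reserved host route";
  if (installedRoutes.has(routePath)) return "already used by another plugin page";
  return null;
}

const reserved = new Set(["issues", "agents", "costs"]); // illustrative only
const installed = new Set(["reports"]); // routes owned by other plugins

console.log(checkRoutePath("my-plugin", reserved, installed));
console.log(checkRoutePath("Reports/Weekly", reserved, installed));
console.log(checkRoutePath("costs", reserved, installed));
console.log(checkRoutePath("reports", reserved, installed));
```

Running a check like this in a plugin's own tests catches a bad slug before install time, though the host remains the source of truth for reserved routes.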
--- a/doc/plugins/PLUGIN_SPEC.md
+++ b/doc/plugins/PLUGIN_SPEC.md
--- a/doc/plugins/ideas-from-opencode.md
+++ b/doc/plugins/ideas-from-opencode.md
--- a/doc/spec/agent-runs.md
+++ b/doc/spec/agent-runs.md
@@ -249,7 +249,7 @@ Runs local `claude` CLI directly.
  "cwd": "/absolute/or/relative/path",
  "promptTemplate": "You are agent {{agent.id}} ...",
  "model": "optional-model-id",
-  "maxTurnsPerRun": 80,
+  "maxTurnsPerRun": 300,
  "dangerouslySkipPermissions": true,
  "env": {"KEY": "VALUE"},
  "extraArgs": [],
--- a/doc/spec/ui.md
+++ b/doc/spec/ui.md
@@ -114,7 +114,7 @@ No section header — these are always at the top, below the company header.
  My Issues
 ```

- **Inbox** — items requiring the board operator's attention. Badge count on the right. Includes: pending approvals, stale tasks, budget alerts, failed heartbeats. The number is the total unread/unresolved count.
+- **Inbox** — items requiring the board operator's attention. Badge count on the right. Includes: pending approvals, budget alerts, failed heartbeats. The number is the total unread/unresolved count.
 - **My Issues** — issues created by or assigned to the board operator.

 ### 3.3 Work Section
--- a/docker-compose.quickstart.yml
+++ b/docker-compose.quickstart.yml
@@ -10,5 +10,9 @@ services:
      PAPERCLIP_HOME: "/paperclip"
      OPENAI_API_KEY: "${OPENAI_API_KEY:-}"
      ANTHROPIC_API_KEY: "${ANTHROPIC_API_KEY:-}"
+      PAPERCLIP_DEPLOYMENT_MODE: "authenticated"
+      PAPERCLIP_DEPLOYMENT_EXPOSURE: "private"
+      PAPERCLIP_PUBLIC_URL: "${PAPERCLIP_PUBLIC_URL:-http://localhost:3100}"
+      BETTER_AUTH_SECRET: "${BETTER_AUTH_SECRET:?BETTER_AUTH_SECRET must be set}"
    volumes:
      - "${PAPERCLIP_DATA_DIR:-./data/docker-paperclip}:/paperclip"
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -5,6 +5,11 @@ services:
      POSTGRES_USER: paperclip
      POSTGRES_PASSWORD: paperclip
      POSTGRES_DB: paperclip
+    healthcheck:
+      test: ["CMD-SHELL", "pg_isready -U paperclip -d paperclip"]
+      interval: 2s
+      timeout: 5s
+      retries: 30
    ports:
      - "5432:5432"
    volumes:
@@ -18,8 +23,16 @@ services:
      DATABASE_URL: postgres://paperclip:paperclip@db:5432/paperclip
      PORT: "3100"
      SERVE_UI: "true"
+      PAPERCLIP_DEPLOYMENT_MODE: "authenticated"
+      PAPERCLIP_DEPLOYMENT_EXPOSURE: "private"
+      PAPERCLIP_PUBLIC_URL: "${PAPERCLIP_PUBLIC_URL:-http://localhost:3100}"
+      BETTER_AUTH_SECRET: "${BETTER_AUTH_SECRET:?BETTER_AUTH_SECRET must be set}"
+    volumes:
+      - paperclip-data:/paperclip
    depends_on:
-      - db
+      db:
+        condition: service_healthy

 volumes:
  pgdata:
+  paperclip-data:
--- a/docker/openclaw-smoke/Dockerfile
+++ b/docker/openclaw-smoke/Dockerfile
@@ -0,0 +1,8 @@
+FROM node:22-alpine
+
+WORKDIR /app
+COPY server.mjs /app/server.mjs
+
+EXPOSE 8787
+
+CMD ["node", "/app/server.mjs"]
--- a/docker/openclaw-smoke/server.mjs
+++ b/docker/openclaw-smoke/server.mjs
@@ -0,0 +1,103 @@
+import http from "node:http";
+
+const port = Number.parseInt(process.env.PORT ?? "8787", 10);
+const webhookPath = process.env.OPENCLAW_SMOKE_PATH?.trim() || "/webhook";
+const expectedAuthHeader = process.env.OPENCLAW_SMOKE_AUTH?.trim() || "";
+const maxBodyBytes = 1_000_000;
+const maxEvents = 200;
+
+const events = [];
+let nextId = 1;
+
+function writeJson(res, status, payload) {
+  res.statusCode = status;
+  res.setHeader("content-type", "application/json; charset=utf-8");
+  res.end(JSON.stringify(payload));
+}
+
+function readBody(req) {
+  return new Promise((resolve, reject) => {
+    const chunks = [];
+    let total = 0;
+    req.on("data", (chunk) => {
+      total += chunk.length;
+      if (total > maxBodyBytes) {
+        reject(new Error("payload_too_large"));
+        req.destroy();
+        return;
+      }
+      chunks.push(chunk);
+    });
+    req.on("end", () => {
+      resolve(Buffer.concat(chunks).toString("utf8"));
+    });
+    req.on("error", reject);
+  });
+}
+
+function trimEvents() {
+  if (events.length <= maxEvents) return;
+  events.splice(0, events.length - maxEvents);
+}
+
+const server = http.createServer(async (req, res) => {
+  const method = req.method ?? "GET";
+  const url = req.url ?? "/";
+
+  if (method === "GET" && url === "/health") {
+    writeJson(res, 200, { ok: true, webhookPath, events: events.length });
+    return;
+  }
+
+  if (method === "GET" && url === "/events") {
+    writeJson(res, 200, { count: events.length, events });
+    return;
+  }
+
+  if (method === "POST" && url === "/reset") {
+    events.length = 0;
+    writeJson(res, 200, { ok: true });
+    return;
+  }
+
+  if (method === "POST" && url === webhookPath) {
+    const authorization = req.headers.authorization ?? "";
+    if (expectedAuthHeader && authorization !== expectedAuthHeader) {
+      writeJson(res, 401, { error: "unauthorized" });
+      return;
+    }
+
+    try {
+      const raw = await readBody(req);
+      let body = null;
+      try {
+        body = raw.length > 0 ? JSON.parse(raw) : null;
+      } catch {
+        body = { raw };
+      }
+
+      const event = {
+        id: `evt-${nextId++}`,
+        receivedAt: new Date().toISOString(),
+        method,
+        path: url,
+        authorizationPresent: Boolean(authorization),
+        body,
+      };
+      events.push(event);
+      trimEvents();
+      writeJson(res, 200, { ok: true, received: true, eventId: event.id, count: events.length });
+    } catch (err) {
+      const code = err instanceof Error && err.message === "payload_too_large" ? 413 : 500;
+      writeJson(res, code, { error: err instanceof Error ? err.message : "unknown_error" });
+    }
+    return;
+  }
+
+  writeJson(res, 404, { error: "not_found" });
+});
+
+server.listen(port, "0.0.0.0", () => {
+  // eslint-disable-next-line no-console
+  console.log(`[openclaw-smoke] listening on :${port} path=${webhookPath}`);
+});
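Illustrative note on `server.mjs` above (not part of this change): a client for the smoke server only needs to POST JSON to the webhook path and then read `/events`. This sketch builds the request, assuming the defaults in the file (port `8787`, path `/webhook`). The helper name `buildWebhookRequest` and the payload are hypothetical.

```javascript
// Sketch of a client request for the smoke server above. The base URL and
// payload are placeholders; the path default mirrors server.mjs.
function buildWebhookRequest(baseUrl, payload, { path = "/webhook", auth = "" } = {}) {
  const headers = { "content-type": "application/json" };
  // The server compares the Authorization header verbatim against
  // OPENCLAW_SMOKE_AUTH, so send exactly the configured string.
  if (auth) headers.authorization = auth;
  return {
    url: new URL(path, baseUrl).toString(),
    init: { method: "POST", headers, body: JSON.stringify(payload) },
  };
}

const { url, init } = buildWebhookRequest("http://127.0.0.1:8787", { hello: "world" });
console.log(url, init.method);

// With the container running, Node 18+ could send it and read it back:
//   await fetch(url, init);
//   const { count, events } = await (await fetch("http://127.0.0.1:8787/events")).json();
```

Because the server matches `req.url` against the webhook path exactly, a request with a query string appended would fall through to the 404 branch.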
--- a/docs/adapters/claude-local.md
+++ b/docs/adapters/claude-local.md
@@ -20,7 +20,7 @@ The `claude_local` adapter runs Anthropic's Claude Code CLI locally. It supports
 | `env` | object | No | Environment variables (supports secret refs) |
 | `timeoutSec` | number | No | Process timeout (0 = no timeout) |
 | `graceSec` | number | No | Grace period before force-kill |
-| `maxTurnsPerRun` | number | No | Max agentic turns per heartbeat |
+| `maxTurnsPerRun` | number | No | Max agentic turns per heartbeat (defaults to `300`) |
 | `dangerouslySkipPermissions` | boolean | No | Skip permission prompts (dev only) |

 ## Prompt Templates
@@ -47,6 +47,14 @@ If resume fails with an unknown session error, the adapter automatically retries

 The adapter creates a temporary directory with symlinks to Paperclip skills and passes it via `--add-dir`. This makes skills discoverable without polluting the agent's working directory.

+For manual local CLI usage outside heartbeat runs (for example running as `claudecoder` directly), use:
+
+```sh
+pnpm paperclipai agent local-cli claudecoder --company-id <company-id>
+```
+
+This installs Paperclip skills in `~/.claude/skills`, creates an agent API key, and prints shell exports to run as that agent.
+
 ## Environment Test

 Use the "Test Environment" button in the UI to validate the adapter config. It checks:
--- a/docs/adapters/codex-local.md
+++ b/docs/adapters/codex-local.md
@@ -30,6 +30,16 @@ Codex uses `previous_response_id` for session continuity. The adapter serializes

 The adapter symlinks Paperclip skills into the global Codex skills directory (`~/.codex/skills`). Existing user skills are not overwritten.

+When Paperclip is running inside a managed worktree instance (`PAPERCLIP_IN_WORKTREE=true`), the adapter instead uses a worktree-isolated `CODEX_HOME` under the Paperclip instance so Codex skills, sessions, logs, and other runtime state do not leak across checkouts. It seeds that isolated home from the user's main Codex home for shared auth/config continuity.
+
+For manual local CLI usage outside heartbeat runs (for example running as `codexcoder` directly), use:
+
+```sh
+pnpm paperclipai agent local-cli codexcoder --company-id <company-id>
+```
+
+This installs any missing skills, creates an agent API key, and prints shell exports to run as that agent.
+
 ## Environment Test

 The environment test checks:
--- a/docs/adapters/creating-an-adapter.md
+++ b/docs/adapters/creating-an-adapter.md
@@ -5,6 +5,10 @@ summary: Guide to building a custom adapter

 Build a custom adapter to connect Paperclip to any agent runtime.

+<Tip>
+If you're using Claude Code, the `.agents/skills/create-agent-adapter` skill can guide you through the full adapter creation process interactively. Just ask Claude to create a new adapter and it will walk you through each step.
+</Tip>
+
 ## Package Structure

 ```
--- a/docs/adapters/gemini-local.md
+++ b/docs/adapters/gemini-local.md
@@ -0,0 +1,45 @@
+---
+title: Gemini Local
+summary: Gemini CLI local adapter setup and configuration
+---
+
+The `gemini_local` adapter runs Google's Gemini CLI locally. It supports session persistence with `--resume`, skills injection, and structured `stream-json` output parsing.
+
+## Prerequisites
+
+- Gemini CLI installed (`gemini` command available)
+- `GEMINI_API_KEY` or `GOOGLE_API_KEY` set, or local Gemini CLI auth configured
+
+## Configuration Fields
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `cwd` | string | Yes | Working directory for the agent process (absolute path; created automatically if missing when permissions allow) |
+| `model` | string | No | Gemini model to use. Defaults to `auto`. |
+| `promptTemplate` | string | No | Prompt used for all runs |
+| `instructionsFilePath` | string | No | Markdown instructions file prepended to the prompt |
+| `env` | object | No | Environment variables (supports secret refs) |
+| `timeoutSec` | number | No | Process timeout (0 = no timeout) |
+| `graceSec` | number | No | Grace period before force-kill |
+| `yolo` | boolean | No | Pass `--approval-mode yolo` for unattended operation |
+
+## Session Persistence
+
+The adapter persists Gemini session IDs between heartbeats. On the next wake, it resumes the existing conversation with `--resume` so the agent retains context.
+
+Session resume is cwd-aware: if the working directory changed since the last run, a fresh session starts instead.
+
+If resume fails with an unknown session error, the adapter automatically retries with a fresh session.
+
+## Skills Injection
+
+The adapter symlinks Paperclip skills into the Gemini global skills directory (`~/.gemini/skills`). Existing user skills are not overwritten.
+
+## Environment Test
+
+Use the "Test Environment" button in the UI to validate the adapter config. It checks:
+
+- Gemini CLI is installed and accessible
+- Working directory is absolute and available (auto-created if missing and permitted)
+- API key/auth hints (`GEMINI_API_KEY` or `GOOGLE_API_KEY`)
+- A live hello probe (`gemini --output-format json "Respond with hello."`) to verify CLI readiness
--- a/docs/adapters/overview.md
+++ b/docs/adapters/overview.md
@@ -20,6 +20,9 @@ When a heartbeat fires, Paperclip:
 |---------|----------|-------------|
 | [Claude Local](/adapters/claude-local) | `claude_local` | Runs Claude Code CLI locally |
 | [Codex Local](/adapters/codex-local) | `codex_local` | Runs OpenAI Codex CLI locally |
+| [Gemini Local](/adapters/gemini-local) | `gemini_local` | Runs Gemini CLI locally |
+| OpenCode Local | `opencode_local` | Runs OpenCode CLI locally (multi-provider `provider/model`) |
+| OpenClaw | `openclaw` | Sends wake payloads to an OpenClaw webhook |
 | [Process](/adapters/process) | `process` | Executes arbitrary shell commands |
 | [HTTP](/adapters/http) | `http` | Sends webhooks to external agents |

@@ -52,7 +55,7 @@ Three registries consume these modules:

 ## Choosing an Adapter

- **Need a coding agent?** Use `claude_local` or `codex_local`
+- **Need a coding agent?** Use `claude_local`, `codex_local`, `gemini_local`, or `opencode_local`
 - **Need to run a script or command?** Use `process`
 - **Need to call an external service?** Use `http`
 - **Need something custom?** [Create your own adapter](/adapters/creating-an-adapter)
--- a/docs/api/agents.md
+++ b/docs/api/agents.md
@@ -123,6 +123,18 @@ GET /api/companies/{companyId}/org

 Returns the full organizational tree for the company.

+## List Adapter Models
+
+```
+GET /api/companies/{companyId}/adapters/{adapterType}/models
+```
+
+Returns selectable models for an adapter type.
+
+- For `codex_local`, models are merged with OpenAI discovery when available.
+- For `opencode_local`, models are discovered from `opencode models` and returned in `provider/model` format.
+- `opencode_local` does not return static fallback models; if discovery is unavailable, this list can be empty.
+
 ## Config Revisions

 ```
--- a/docs/api/issues.md
+++ b/docs/api/issues.md
@@ -1,9 +1,9 @@
 ---
 title: Issues
-summary: Issue CRUD, checkout/release, comments, and attachments
+summary: Issue CRUD, checkout/release, comments, documents, and attachments
 ---

-Issues are the unit of work in Paperclip. They support hierarchical relationships, atomic checkout, comments, and file attachments.
+Issues are the unit of work in Paperclip. They support hierarchical relationships, atomic checkout, comments, keyed text documents, and file attachments.

 ## List Issues

@@ -29,6 +29,12 @@ GET /api/issues/{issueId}

 Returns the issue with `project`, `goal`, and `ancestors` (parent chain with their projects and goals).

+The response also includes:
+
+- `planDocument`: the full text of the issue document with key `plan`, when present
+- `documentSummaries`: metadata for all linked issue documents
+- `legacyPlanDocument`: a read-only fallback when the description still contains an old `<plan>` block
+
 ## Create Issue

 ```
@@ -100,6 +106,54 @@ POST /api/issues/{issueId}/comments

@-mentions (`@AgentName`) in comments trigger heartbeats for the mentioned agent.

+## Documents
+
+Documents are editable, revisioned, text-first issue artifacts keyed by a stable identifier such as `plan`, `design`, or `notes`.
+
+### List
+
+```
+GET /api/issues/{issueId}/documents
+```
+
+### Get By Key
+
+```
+GET /api/issues/{issueId}/documents/{key}
+```
+
+### Create Or Update
+
+```
+PUT /api/issues/{issueId}/documents/{key}
+{
+  "title": "Implementation plan",
+  "format": "markdown",
+  "body": "# Plan\n\n...",
+  "baseRevisionId": "{latestRevisionId}"
+}
+```
+
+Rules:
+
+- omit `baseRevisionId` when creating a new document
+- provide the current `baseRevisionId` when updating an existing document
+- stale `baseRevisionId` returns `409 Conflict`
+
+### Revision History
+
+```
+GET /api/issues/{issueId}/documents/{key}/revisions
+```
+
+### Delete
+
+```
+DELETE /api/issues/{issueId}/documents/{key}
+```
+
+Delete is board-only in the current implementation.
+
 ## Attachments

 ### Upload
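Illustrative note on the Documents section of `docs/api/issues.md` above (not part of this change): the `baseRevisionId` rules describe optimistic concurrency. The in-memory model below is a sketch of those rules, not the server implementation. The function `putDocument` and the revision id format are hypothetical. It also assumes that a missing `baseRevisionId` on an existing document is treated like a stale one, which the docs do not specify.

```javascript
// In-memory model of the documented rules, for illustration only.
// Assumption: a missing or mismatched baseRevisionId on an existing
// document is treated as stale (409). The docs only specify the stale case.
function putDocument(store, key, { body, baseRevisionId }) {
  const current = store.get(key);
  if (current && baseRevisionId !== current.revisionId) {
    return { ok: false, status: 409 };
  }
  const revisionNumber = (current?.revisionNumber ?? 0) + 1;
  const next = { body, revisionNumber, revisionId: `${key}-r${revisionNumber}` };
  store.set(key, next);
  return { ok: true, revisionId: next.revisionId };
}

const docs = new Map();
// Create: no baseRevisionId.
const created = putDocument(docs, "plan", { body: "# Plan" });
// Update: pass the current revision id.
const updated = putDocument(docs, "plan", { body: "# Plan v2", baseRevisionId: created.revisionId });
// A second writer still holding the first revision id is rejected.
const stale = putDocument(docs, "plan", { body: "# Plan v3", baseRevisionId: created.revisionId });

console.log(created.ok, updated.ok, stale.status); // true true 409
```

A client that receives `409 Conflict` would typically re-read the document, reapply its change on top of the latest body, and retry with the new revision id.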
--- a/docs/deploy/local-development.md
+++ b/docs/deploy/local-development.md
@@ -48,12 +48,20 @@ pnpm dev --tailscale-auth

 This binds the server to `0.0.0.0` for private-network access.

+Alias:
+
+```sh
+pnpm dev --authenticated-private
+```
+
 Allow additional private hostnames:

 ```sh
 pnpm paperclipai allowed-hostname dotta-macbook-pro
 ```

+For full setup and troubleshooting, see [Tailscale Private Access](/deploy/tailscale-private-access).
+
 ## Health Checks

 ```sh
--- a/docs/deploy/tailscale-private-access.md
+++ b/docs/deploy/tailscale-private-access.md
@@ -0,0 +1,77 @@
+---
+title: Tailscale Private Access
+summary: Run Paperclip with Tailscale-friendly host binding and connect from other devices
+---
+
+Use this when you want to access Paperclip over Tailscale (or a private LAN/VPN) instead of only `localhost`.
+
+## 1. Start Paperclip in private authenticated mode
+
+```sh
+pnpm dev --tailscale-auth
+```
+
+This configures:
+
+- `PAPERCLIP_DEPLOYMENT_MODE=authenticated`
+- `PAPERCLIP_DEPLOYMENT_EXPOSURE=private`
+- `PAPERCLIP_AUTH_BASE_URL_MODE=auto`
+- `HOST=0.0.0.0` (bind on all interfaces)
+
+Equivalent flag:
+
+```sh
+pnpm dev --authenticated-private
+```
+
+## 2. Find your reachable Tailscale address
+
+From the machine running Paperclip:
+
+```sh
+tailscale ip -4
+```
+
+You can also use your Tailscale MagicDNS hostname (for example `my-macbook.tailnet.ts.net`).
+
+## 3. Open Paperclip from another device
+
+Use the Tailscale IP or MagicDNS host with the Paperclip port:
+
+```txt
+http://<tailscale-host-or-ip>:3100
+```
+
+Example:
+
+```txt
+http://my-macbook.tailnet.ts.net:3100
+```
+
+## 4. Allow custom private hostnames when needed
+
+If you access Paperclip with a custom private hostname, add it to the allowlist:
+
+```sh
+pnpm paperclipai allowed-hostname my-macbook.tailnet.ts.net
+```
+
+## 5. Verify the server is reachable
+
+From a remote Tailscale-connected device:
+
+```sh
+curl http://<tailscale-host-or-ip>:3100/api/health
+```
+
+Expected result:
+
+```json
+{"status":"ok"}
+```
+
+## Troubleshooting
+
+- Login or redirect errors on a private hostname: add it with `paperclipai allowed-hostname`.
+- App only works on `localhost`: make sure you started with `--tailscale-auth` (or set `HOST=0.0.0.0` in private mode).
+- Can connect locally but not remotely: verify both devices are on the same Tailscale network and port `3100` is reachable.
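Illustrative note on step 5 of the Tailscale guide above (not part of this change): the same reachability check can be scripted. This is a small sketch, assuming the `/api/health` path and the `{"status":"ok"}` body shown in the guide. The helper names `healthUrl` and `isHealthy` are hypothetical.

```javascript
// Build the health URL for a Tailscale IP or MagicDNS hostname.
function healthUrl(hostOrIp, port = 3100) {
  return `http://${hostOrIp}:${port}/api/health`;
}

// True only when the body parses as JSON and reports status "ok".
function isHealthy(responseText) {
  try {
    return JSON.parse(responseText).status === "ok";
  } catch {
    return false; // non-JSON bodies (for example an HTML error page) are unhealthy
  }
}

console.log(healthUrl("my-macbook.tailnet.ts.net"));
console.log(isHealthy('{"status":"ok"}'));

// From another device on the tailnet, Node 18+ could run:
//   const res = await fetch(healthUrl("my-macbook.tailnet.ts.net"));
//   console.log(isHealthy(await res.text()));
```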
--- a/docs/docs.json
+++ b/docs/docs.json
@@ -9,6 +9,10 @@
    "dark": "#1D4ED8"
  },
  "favicon": "/favicon.svg",
+  "logo": {
+    "dark": "/images/logo-dark.svg",
+    "light": "/images/logo-light.svg"
+  },
  "topbarLinks": [
    {
      "name": "GitHub",
@@ -69,6 +73,7 @@
            "pages": [
              "deploy/overview",
              "deploy/local-development",
+              "deploy/tailscale-private-access",
              "deploy/docker",
              "deploy/deployment-modes",
              "deploy/database",