chore: improve worktree tooling and security docs

2026-04-25 17:25:15 +02:00 · 2026-04-10 22:26:30 -05:00
parent 548721248e
commit 8bdf4081ee
17 changed files with 1100 additions and 123 deletions
--- a/doc/plans/2026-03-17-memory-service-surface-api.md
+++ b/doc/plans/2026-03-17-memory-service-surface-api.md
@@ -7,10 +7,10 @@ Define a Paperclip memory service and surface API that can sit above multiple me
 - company scoping
 - auditability
 - provenance back to Paperclip work objects
- budget / cost visibility
+- budget and cost visibility
 - plugin-first extensibility

-This plan is based on the external landscape summarized in `doc/memory-landscape.md` and on the current Paperclip architecture in:
+This plan is based on the external landscape summarized in `doc/memory-landscape.md`, the AWS AgentCore comparison captured in [PAP-1274](/PAP/issues/PAP-1274), and the current Paperclip architecture in:

 - `doc/SPEC-implementation.md`
 - `doc/plugins/PLUGIN_SPEC.md`
@@ -19,23 +19,26 @@ This plan is based on the external landscape summarized in `doc/memory-landscape

 ## Recommendation In One Sentence

-Paperclip should not embed one opinionated memory engine into core. It should add a company-scoped memory control plane with a small normalized adapter contract, then let built-ins and plugins implement the provider-specific behavior.
+Paperclip should add a company-scoped memory control plane with company default plus agent override resolution, shared hook delivery, and full operation attribution, while leaving extraction and storage semantics to built-ins and plugins.

 ## Product Decisions

-### 1. Memory is company-scoped by default
+### 1. Memory resolution is company default plus agent override

 Every memory binding belongs to exactly one company.

-That binding can then be:
+Resolution order in V1:

- the company default
- an agent override
- a project override later if we need it
+- company default binding
+- optional per-agent override
+
+There is no per-project override in V1.
+
+Project context can still appear in scope and provenance so providers can use it for retrieval and partitioning, but projects do not participate in binding selection.

 No cross-company memory sharing in the initial design.
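
The resolution rule above can be sketched as a small lookup. This is an illustration only: the `BindingTarget` row shape, its field names, and `resolveBindingKey` are hypothetical, and the sketch assumes that an agent override, when present, takes precedence over the company default.

```ts
// Hypothetical row shape for a binding target; field names are illustrative.
interface BindingTarget {
  companyId: string;
  targetType: "company" | "agent";
  targetId: string;
  bindingKey: string;
}

// Returns the active binding key, or undefined when the company has no binding.
// Assumption: the agent override wins over the company default.
function resolveBindingKey(
  targets: BindingTarget[],
  companyId: string,
  agentId?: string,
): string | undefined {
  // Only targets owned by the same company are considered (no cross-company sharing).
  const scoped = targets.filter((t) => t.companyId === companyId);
  if (agentId !== undefined) {
    const override = scoped.find(
      (t) => t.targetType === "agent" && t.targetId === agentId,
    );
    if (override !== undefined) return override.bindingKey;
  }
  const companyDefault = scoped.find((t) => t.targetType === "company");
  return companyDefault?.bindingKey;
}

const sampleTargets: BindingTarget[] = [
  { companyId: "c1", targetType: "company", targetId: "c1", bindingKey: "local-markdown" },
  { companyId: "c1", targetType: "agent", targetId: "a1", bindingKey: "research-kb" },
  { companyId: "c2", targetType: "company", targetId: "c2", bindingKey: "local-markdown" },
];
```

Project ids never enter this function, which matches the decision that projects appear in scope and provenance but do not take part in binding selection.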

-### 2. Providers are selected by key
+### 2. Providers are selected by stable binding key

 Each configured memory provider gets a stable key inside a company, for example:

@@ -44,36 +47,53 @@ Each configured memory provider gets a stable key inside a company, for example:
 - `local-markdown`
 - `research-kb`

-Agents and services resolve the active provider by key, not by hard-coded vendor logic.
+Agents, tools, and background hooks resolve the active provider by key, not by hard-coded vendor logic.

 ### 3. Plugins are the primary provider path

 Built-ins are useful for a zero-config local path, but most providers should arrive through the existing Paperclip plugin runtime.

-That keeps the core small and matches the current direction that optional knowledge-like systems live at the edges.
+That keeps the core small and matches the broader Paperclip direction that specialized knowledge systems live at the edges.

-### 4. Paperclip owns routing, provenance, and accounting
+### 4. Paperclip owns routing, provenance, and policy

 Providers should not decide how Paperclip entities map to governance.

 Paperclip core should own:

+- binding resolution
 - who is allowed to call a memory operation
- which company / agent / project scope is active
- what issue / run / comment / document the operation belongs to
- how usage gets recorded
+- which company, agent, issue, project, run, and subject scope is active
+- what source object the operation belongs to
+- how usage and costs are attributed
+- how operators inspect what happened

-### 5. Automatic memory should be narrow at first
+### 5. Paperclip exposes shared hooks, providers own extraction
+
+Paperclip should emit a common set of memory hooks that built-ins, third-party adapters, and plugins can all use.
+
+Those hooks should pass structured Paperclip source objects plus normalized metadata. The provider then decides how to extract from those objects.
+
+Paperclip should not force one extraction pipeline or one canonical "memory text" transform before the provider sees the input.
+
+### 6. Automatic memory should start narrow, but the hook surface should be general

 Automatic capture is useful, but broad silent capture is dangerous.

-Initial automatic hooks should be:
+Initial built-in automatic hooks should be:

+- pre-run hydrate for agent context recall
 - post-run capture from agent runs
- issue comment / document capture when the binding enables it
- pre-run recall for agent context hydration
+- optional issue comment capture
+- optional issue document capture

-Everything else should start explicit.
+The hook registry itself should be general enough that other providers can subscribe to the same events without core changes.
+
+### 7. No approval gate for binding changes in the open-source product
+
+For the open-source version, changing memory bindings should not require approvals.
+
+Paperclip should still log those changes in activity and preserve full auditability. Approval-gated memory governance can remain an enterprise or future policy layer.

 ## Proposed Concepts

@@ -83,7 +103,7 @@ A built-in or plugin-supplied implementation that stores and retrieves memory.

 Examples:

- local markdown + vector index
+- local markdown plus semantic index
 - mem0 adapter
 - supermemory adapter
 - MemOS adapter
@@ -94,6 +114,15 @@ A company-scoped configuration record that points to a provider and carries prov

 This is the object selected by key.

+### Memory binding target
+
+A mapping from a Paperclip target to a binding.
+
+V1 targets:
+
+- `company`
+- `agent`
+
 ### Memory scope

 The normalized Paperclip scope passed into a provider request.
@@ -105,7 +134,9 @@ At minimum:
 - optional `projectId`
 - optional `issueId`
 - optional `runId`
- optional `subjectId` for external/user identity
+- optional `subjectId` for external or user identity
+- optional `sessionKey` for providers that organize memory around sessions
+- optional `namespace` for providers that need an explicit partition hint

 ### Memory source reference

@@ -121,24 +152,36 @@ Supported source kinds should include:
 - `manual_note`
 - `external_document`

+### Memory hook
+
+A normalized trigger emitted by Paperclip when something memory-relevant happens.
+
+Initial hook kinds:
+
+- `pre_run_hydrate`
+- `post_run_capture`
+- `issue_comment_capture`
+- `issue_document_capture`
+- `manual_capture`
+
 ### Memory operation

-A normalized write, query, browse, or delete action performed through Paperclip.
+A normalized capture, record-write, query, browse, get, correction, or delete action performed through Paperclip.

-Paperclip should log every operation, whether the provider is local or external.
+Paperclip should log every memory operation whether the provider is local, plugin-backed, or external.

 ## Required Adapter Contract

-The required core should be small enough to fit `memsearch`, `mem0`, `Memori`, `MemOS`, or `OpenViking`.
+The required core should be small enough to fit `memsearch`, `mem0`, `Memori`, `MemOS`, or `OpenViking`, but strong enough to satisfy Paperclip's attribution and inspectability requirements.

 ```ts
 export interface MemoryAdapterCapabilities {
  profile?: boolean;
-  browse?: boolean;
  correction?: boolean;
-  asyncIngestion?: boolean;
  multimodal?: boolean;
  providerManagedExtraction?: boolean;
+  asyncExtraction?: boolean;
+  providerNativeBrowse?: boolean;
 }

 export interface MemoryScope {
@@ -148,6 +191,8 @@ export interface MemoryScope {
  issueId?: string;
  runId?: string;
  subjectId?: string;
+  sessionKey?: string;
+  namespace?: string;
 }

 export interface MemorySourceRef {
@@ -168,10 +213,34 @@ export interface MemorySourceRef {
  externalRef?: string;
 }

+export interface MemoryHookContext {
+  hookKind:
+    | "pre_run_hydrate"
+    | "post_run_capture"
+    | "issue_comment_capture"
+    | "issue_document_capture"
+    | "manual_capture";
+  hookId: string;
+  triggeredAt: string;
+  actorAgentId?: string;
+  heartbeatRunId?: string;
+}
+
+export interface MemorySourcePayload {
+  text?: string;
+  mimeType?: string;
+  metadata?: Record<string, unknown>;
+  object?: Record<string, unknown>;
+}
+
 export interface MemoryUsage {
  provider: string;
+  biller?: string;
  model?: string;
+  billingType?: "metered_api" | "subscription_included" | "subscription_overage" | "unknown";
+  attributionMode?: "billed_directly" | "included_in_run" | "external_invoice" | "untracked";
  inputTokens?: number;
+  cachedInputTokens?: number;
  outputTokens?: number;
  embeddingTokens?: number;
  costCents?: number;
@@ -179,20 +248,32 @@ export interface MemoryUsage {
  details?: Record<string, unknown>;
 }

-export interface MemoryWriteRequest {
-  bindingKey: string;
-  scope: MemoryScope;
-  source: MemorySourceRef;
-  content: string;
-  metadata?: Record<string, unknown>;
-  mode?: "append" | "upsert" | "summarize";
-}
-
 export interface MemoryRecordHandle {
  providerKey: string;
  providerRecordId: string;
 }

+export interface MemoryCaptureRequest {
+  bindingKey: string;
+  scope: MemoryScope;
+  source: MemorySourceRef;
+  payload: MemorySourcePayload;
+  hook?: MemoryHookContext;
+  mode?: "capture_residue" | "capture_record";
+  metadata?: Record<string, unknown>;
+}
+
+export interface MemoryRecordWriteRequest {
+  bindingKey: string;
+  scope: MemoryScope;
+  source?: MemorySourceRef;
+  records: Array<{
+    text: string;
+    summary?: string;
+    metadata?: Record<string, unknown>;
+  }>;
+}
+
 export interface MemoryQueryRequest {
  bindingKey: string;
  scope: MemoryScope;
@@ -202,6 +283,14 @@ export interface MemoryQueryRequest {
  metadataFilter?: Record<string, unknown>;
 }

+export interface MemoryListRequest {
+  bindingKey: string;
+  scope: MemoryScope;
+  cursor?: string;
+  limit?: number;
+  metadataFilter?: Record<string, unknown>;
+}
+
 export interface MemorySnippet {
  handle: MemoryRecordHandle;
  text: string;
@@ -217,30 +306,149 @@ export interface MemoryContextBundle {
  usage?: MemoryUsage[];
 }

+export interface MemoryListPage {
+  items: MemorySnippet[];
+  nextCursor?: string;
+  usage?: MemoryUsage[];
+}
+
+export interface MemoryExtractionJob {
+  providerJobId: string;
+  status: "queued" | "running" | "succeeded" | "failed" | "cancelled";
+  hookKind?: MemoryHookContext["hookKind"];
+  source?: MemorySourceRef;
+  error?: string;
+  submittedAt?: string;
+  startedAt?: string;
+  finishedAt?: string;
+}
+
 export interface MemoryAdapter {
  key: string;
  capabilities: MemoryAdapterCapabilities;
-  write(req: MemoryWriteRequest): Promise<{
+  capture(req: MemoryCaptureRequest): Promise<{
+    records?: MemoryRecordHandle[];
+    jobs?: MemoryExtractionJob[];
+    usage?: MemoryUsage[];
+  }>;
+  upsertRecords(req: MemoryRecordWriteRequest): Promise<{
    records?: MemoryRecordHandle[];
    usage?: MemoryUsage[];
  }>;
  query(req: MemoryQueryRequest): Promise<MemoryContextBundle>;
+  list(req: MemoryListRequest): Promise<MemoryListPage>;
  get(handle: MemoryRecordHandle, scope: MemoryScope): Promise<MemorySnippet | null>;
  forget(handles: MemoryRecordHandle[], scope: MemoryScope): Promise<{ usage?: MemoryUsage[] }>;
 }
 ```

-This contract intentionally does not force a provider to expose its internal graph, filesystem, or ontology.
+This contract intentionally does not force a provider to expose its internal graph, file tree, or ontology. It does require enough structure for Paperclip to browse, attribute, and audit what happened.

 ## Optional Adapter Surfaces

 These should be capability-gated, not required:

- `browse(scope, filters)` for file-system / graph / timeline inspection
 - `correct(handle, patch)` for natural-language correction flows
 - `profile(scope)` when the provider can synthesize stable preferences or summaries
- `sync(source)` for connectors or background ingestion
+- `listExtractionJobs(scope, cursor)` when async extraction needs richer operator visibility
+- `retryExtractionJob(jobId)` when a provider supports re-drive
 - `explain(queryResult)` for providers that can expose retrieval traces
+- provider-native browse or graph surfaces exposed through plugin UI
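
As a rough illustration of capability gating, a caller might check both the declared capability and the presence of the method before using an optional surface. The `AdapterSlice` type, the `tryCorrect` helper, and the synchronous `correct` signature are simplifications invented for this sketch; the real contract is Promise-based and takes a record handle.

```ts
interface Capabilities {
  correction?: boolean;
  profile?: boolean;
}

// Hypothetical adapter slice: only the optional `correct` surface is modeled.
interface AdapterSlice {
  key: string;
  capabilities: Capabilities;
  correct?: (providerRecordId: string, patch: string) => string;
}

type CorrectionOutcome =
  | { ok: true; correctedText: string }
  | { ok: false; reason: string };

function tryCorrect(
  adapter: AdapterSlice,
  providerRecordId: string,
  patch: string,
): CorrectionOutcome {
  // Both the declared capability and the method itself must be present.
  if (adapter.capabilities.correction !== true || adapter.correct === undefined) {
    return { ok: false, reason: `${adapter.key} does not support correction` };
  }
  return { ok: true, correctedText: adapter.correct(providerRecordId, patch) };
}

const plainAdapter: AdapterSlice = { key: "local-markdown", capabilities: {} };
const correctingAdapter: AdapterSlice = {
  key: "research-kb",
  capabilities: { correction: true },
  correct: (_id, patch) => `corrected: ${patch}`,
};
```

Returning a structured refusal instead of throwing lets tools such as `memory.correct` report an unsupported operation without treating it as a provider failure.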
+
+## Lessons From AWS AgentCore Memory API
+
+AWS AgentCore Memory is a useful check on whether this plan is too abstract or missing important operational surfaces.
+
+The broad direction still looks right:
+
+- AWS splits memory into a control plane (`CreateMemory`, `UpdateMemory`, `ListMemories`) and a data plane (`CreateEvent`, `RetrieveMemoryRecords`, `GetMemoryRecord`, `ListMemoryRecords`)
+- AWS separates raw interaction capture from curated long-term memory records
+- AWS supports both provider-managed extraction and self-managed pipelines
+- AWS treats browse and list operations as first-class APIs, not ad hoc debugging helpers
+- AWS exposes extraction jobs instead of hiding asynchronous maintenance completely
+
+That lines up with the Paperclip plan at a high level: provider configuration, scoped writes, scoped retrieval, provider-managed extraction as a capability, and a browse and inspect surface.
+
+The concrete changes Paperclip should take from AWS are:
+
+### 1. Keep config APIs separate from runtime traffic
+
+The rollout should preserve a clean separation between:
+
+- control-plane APIs for binding CRUD, defaults, overrides, and capability metadata
+- runtime APIs and tools for capture, record writes, query, list, get, forget, and extraction status
+
+This keeps governance changes distinct from high-volume memory traffic.
+
+### 2. Distinguish capture from curated record writes
+
+AWS does not flatten everything into one write primitive. It distinguishes captured events from durable memory records.
+
+Paperclip should do the same:
+
+- `capture(...)` for raw run, comment, document, or activity residue
+- `upsertRecords(...)` for curated durable facts and notes
+
+That is a better fit for provider-managed extraction and for manual curation flows.
+
+### 3. Make list and browse first-class
+
+AWS exposes list and retrieve surfaces directly. Paperclip should not make browse optional at the portable layer.
+
+The minimum portable surface should include:
+
+- `query`
+- `list`
+- `get`
+
+Provider-native graph or file browsing can remain optional beyond that.
+
+### 4. Add pagination and cursors for operator inspection
+
+AWS consistently uses pagination on browse-heavy APIs.
+
+Paperclip should add cursor-based pagination to:
+
+- record listing
+- extraction job listing
+- memory operation explorer APIs
+
+Prompt hydration can continue to use `topK`, but operator surfaces need cursors.
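
A minimal sketch of how an operator surface might walk a cursor-paginated listing follows. `collectAll`, `makeFakeLister`, and the `maxPages` safety cap are hypothetical names for illustration, and the fake lister encodes the cursor as a string offset only because that is the simplest thing to demonstrate; real providers choose their own cursor format.

```ts
// Local copy of the page shape used by the list surface.
interface Page<T> {
  items: T[];
  nextCursor?: string;
}

type FetchPage<T> = (cursor: string | undefined, limit: number) => Promise<Page<T>>;

// Follows nextCursor until the provider stops returning one.
// maxPages is a hypothetical guard against a provider that never terminates.
async function collectAll<T>(
  fetchPage: FetchPage<T>,
  limit = 50,
  maxPages = 100,
): Promise<T[]> {
  const out: T[] = [];
  let cursor: string | undefined = undefined;
  for (let i = 0; i < maxPages; i++) {
    const page: Page<T> = await fetchPage(cursor, limit);
    out.push(...page.items);
    if (page.nextCursor === undefined) break;
    cursor = page.nextCursor;
  }
  return out;
}

// Fake in-memory lister for illustration; the cursor is the next offset as a string.
function makeFakeLister(records: string[]): FetchPage<string> {
  return async (cursor, limit) => {
    const start = cursor === undefined ? 0 : Number(cursor);
    const end = Math.min(start + limit, records.length);
    const page: Page<string> = { items: records.slice(start, end) };
    if (end < records.length) page.nextCursor = String(end);
    return page;
  };
}
```

The caller treats the cursor as opaque and only passes it back, which is what keeps provider-specific paging details out of core.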
+
+### 5. Add explicit session and namespace hints
+
+AWS uses `actorId`, `sessionId`, `namespace`, and `memoryStrategyId` heavily.
+
+Paperclip should keep its own control-plane-centric model, but the adapter contract needs obvious places to map those concepts:
+
+- `sessionKey`
+- `namespace`
+
+The provider adapter can map them to AWS or other vendor-specific identifiers without leaking those identifiers into core.
+
+### 6. Treat asynchronous extraction as a real operational surface
+
+AWS exposes extraction jobs explicitly. Paperclip should too.
+
+Operators should be able to see:
+
+- pending extraction work
+- failed extraction work
+- which hook or source caused the work
+- whether a retry is available
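
One way to derive that operator view from stored job rows is sketched below. `ExtractionJobRow`, `OperatorJobView`, and the `supportsRetry` flag are assumptions made for this example; `supportsRetry` stands in for "the provider exposes `retryExtractionJob`".

```ts
type JobStatus = "queued" | "running" | "succeeded" | "failed" | "cancelled";

// Hypothetical subset of a stored extraction job row.
interface ExtractionJobRow {
  providerJobId: string;
  status: JobStatus;
  hookKind?: string;
  error?: string;
}

interface OperatorJobView {
  providerJobId: string;
  state: "pending" | "failed";
  cause: string;
  retryAvailable: boolean;
}

// Keeps only work that still needs attention: pending and failed jobs.
function toOperatorView(jobs: ExtractionJobRow[], supportsRetry: boolean): OperatorJobView[] {
  const views: OperatorJobView[] = [];
  for (const job of jobs) {
    const cause = job.hookKind ?? "unknown";
    if (job.status === "queued" || job.status === "running") {
      views.push({ providerJobId: job.providerJobId, state: "pending", cause, retryAvailable: false });
    } else if (job.status === "failed") {
      // A retry is only offered for failed jobs, and only when the provider supports it.
      views.push({ providerJobId: job.providerJobId, state: "failed", cause, retryAvailable: supportsRetry });
    }
  }
  return views;
}

const sampleJobs: ExtractionJobRow[] = [
  { providerJobId: "j1", status: "queued", hookKind: "post_run_capture" },
  { providerJobId: "j2", status: "failed", hookKind: "issue_comment_capture", error: "timeout" },
  { providerJobId: "j3", status: "succeeded", hookKind: "post_run_capture" },
];
```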
+
+### 7. Keep Paperclip provenance primary
+
+Paperclip should continue to center:
+
+- `companyId`
+- `agentId`
+- `projectId`
+- `issueId`
+- `runId`
+- issue comments, documents, and activity as sources
+
+The lesson from AWS is to support clean mapping into provider-specific models, not to let provider identifiers take over the core product model.

 ## What Paperclip Should Persist

@@ -248,39 +456,67 @@ Paperclip should not mirror the full provider memory corpus into Postgres unless

 Paperclip core should persist:

- memory bindings and overrides
+- memory bindings
+- company default and agent override resolution targets
 - provider keys and capability metadata
 - normalized memory operation logs
- provider record handles returned by operations when available
 - source references back to issue comments, documents, runs, and activity
- usage and cost data
+- provider record handles returned by operations when available
+- hook delivery records and extraction job state
+- usage and cost attribution

-For external providers, the memory payload itself can remain in the provider.
+For external providers, the actual memory payload can remain in the provider.

 ## Hook Model

-### Automatic hooks
+### Shared hook surface
+
+Paperclip should expose one shared hook system for memory.
+
+That same system must be available to:
+
+- built-in memory providers
+- plugin-based memory providers
+- third-party adapter integrations that want to use memory hooks
+
+### What a hook delivers
+
+Each hook delivery should include:
+
+- resolved binding key
+- normalized `MemoryScope`
+- `MemorySourceRef`
+- structured source payload
+- hook metadata such as hook kind, trigger time, and related run id
+
+The payload should include structured objects where possible so the provider can decide how to extract and chunk.
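
The shared hook surface could look roughly like the registry below. This is a sketch under stated assumptions: `MemoryHookRegistry`, `HookDelivery`, `subscribe`, and `emit` are hypothetical names, delivery is synchronous for brevity, and the scope and source reference fields are omitted to keep the example short.

```ts
type MemoryHookKind =
  | "pre_run_hydrate"
  | "post_run_capture"
  | "issue_comment_capture"
  | "issue_document_capture"
  | "manual_capture";

// Trimmed delivery shape; scope and source reference are left out for brevity.
interface HookDelivery {
  bindingKey: string;
  hookKind: MemoryHookKind;
  triggeredAt: string;
  runId?: string;
  payload: { text?: string; object?: Record<string, unknown> };
}

type HookHandler = (delivery: HookDelivery) => void;

class MemoryHookRegistry {
  private readonly handlers = new Map<MemoryHookKind, HookHandler[]>();

  subscribe(kind: MemoryHookKind, handler: HookHandler): void {
    const existing = this.handlers.get(kind) ?? [];
    this.handlers.set(kind, [...existing, handler]);
  }

  // Delivers to every subscriber of the hook kind and returns how many were called.
  emit(delivery: HookDelivery): number {
    const subscribers = this.handlers.get(delivery.hookKind) ?? [];
    for (const handler of subscribers) handler(delivery);
    return subscribers.length;
  }
}

const registry = new MemoryHookRegistry();
const seen: string[] = [];
// A built-in and a plugin subscribe to the same event without core changes.
registry.subscribe("post_run_capture", (d) => { seen.push(`builtin:${d.bindingKey}`); });
registry.subscribe("post_run_capture", (d) => { seen.push(`plugin:${d.bindingKey}`); });

const delivered = registry.emit({
  bindingKey: "local-markdown",
  hookKind: "post_run_capture",
  triggeredAt: "2026-04-10T00:00:00Z",
  runId: "run-1",
  payload: { object: { summary: "done" } },
});
```

The payload carries a structured `object` rather than pre-flattened text, leaving extraction and chunking to each subscriber.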
+
+### Initial automatic hooks

 These should be low-risk and easy to reason about:

-1. `pre-run hydrate`
+1. `pre_run_hydrate`
   Before an agent run starts, Paperclip may call `query(... intent = "agent_preamble")` using the active binding.

-2. `post-run capture`
-   After a run finishes, Paperclip may write a summary or transcript-derived note tied to the run.
+2. `post_run_capture`
+   After a run finishes, Paperclip may call `capture(...)` with structured run output, excerpts, and provenance.

-3. `issue comment / document capture`
-   When enabled on the binding, Paperclip may capture selected issue comments or issue documents as memory sources.
+3. `issue_comment_capture`
+   When enabled on the binding, Paperclip may call `capture(...)` for selected issue comments.

-### Explicit hooks
+4. `issue_document_capture`
+   When enabled on the binding, Paperclip may call `capture(...)` for selected issue documents.

-These should be tool- or UI-driven first:
+### Explicit tools and APIs
+
+These should be tool-driven or UI-driven first:

 - `memory.search`
 - `memory.note`
 - `memory.forget`
 - `memory.correct`
- `memory.browse`
+- memory record list and get
+- extraction-job inspection

 ### Not automatic in the first version

@@ -309,34 +545,69 @@ The initial browse surface should support:

 - active binding by company and agent
 - recent memory operations
- recent write sources
+- recent write and capture sources
+- record list and record detail with source backlinks
 - query results with source backlinks
- filters by agent, issue, run, source kind, and date
- provider usage / cost / latency summaries
+- extraction job status
+- filters by agent, issue, project, run, source kind, and date
+- provider usage, cost, and latency summaries

 When a provider supports richer browsing, the plugin can add deeper views through the existing plugin UI surfaces.

 ## Cost And Evaluation

-Every adapter response should be able to return usage records.
+Paperclip should treat memory accounting as several related but distinct concerns:

-Paperclip should roll up:
+### 1. `memory_operations` is the authoritative audit trail

- memory inference tokens
- embedding tokens
- external provider cost
+Every memory action should create a normalized operation record that captures:
+
+- binding
+- scope
+- source provenance
+- operation type
+- success or failure
 - latency
- query count
- write count
+- usage details reported by the provider
+- attribution mode
+- related run, issue, and agent when available

-It should also record evaluation-oriented metrics where possible:
+This is where operators answer "what memory work happened and why?"
+
+### 2. `cost_events` remains the canonical spend ledger for billable metered usage
+
+The current `cost_events` model is already the canonical cost ledger for token and model spend, and `agent_runtime_state` plus `heartbeat_runs.usageJson` already roll up and summarize run usage.
+
+The recommendation is:
+
+- if a memory operation runs inside a normal Paperclip agent heartbeat and the model usage is already counted on that run, do not create a duplicate `cost_event`
+- instead, store the memory operation with `attributionMode = "included_in_run"` and link it to the related `heartbeatRunId`
+- if a memory provider makes a direct metered model call outside the agent run accounting path, the provider must report usage and Paperclip should create a `cost_event`
+- that direct `cost_event` should still link back to the memory operation, agent, company, and issue or run context when possible
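
The recommendation above can be read as a small decision function. The sketch is hedged: `MemoryOperationRecord`, `LedgerDecision`, and `decideLedgerEntry` are hypothetical, and the mapping from each `attributionMode` value to a ledger action is an assumption drawn from the rules in this section rather than a confirmed design.

```ts
type AttributionMode =
  | "billed_directly"
  | "included_in_run"
  | "external_invoice"
  | "untracked";

// Hypothetical subset of a memory operation row.
interface MemoryOperationRecord {
  operationId: string;
  attributionMode: AttributionMode;
  heartbeatRunId?: string;
  costCents?: number;
}

type LedgerDecision =
  | { action: "link_to_run"; heartbeatRunId: string }
  | { action: "create_cost_event"; operationId: string; costCents: number }
  | { action: "no_ledger_entry"; reason: string };

function decideLedgerEntry(op: MemoryOperationRecord): LedgerDecision {
  switch (op.attributionMode) {
    case "included_in_run":
      // Usage is already counted on the run, so no duplicate cost event is created.
      if (op.heartbeatRunId === undefined) {
        return { action: "no_ledger_entry", reason: "missing run link" };
      }
      return { action: "link_to_run", heartbeatRunId: op.heartbeatRunId };
    case "billed_directly":
      // Direct metered call outside run accounting: create a linked cost event.
      return { action: "create_cost_event", operationId: op.operationId, costCents: op.costCents ?? 0 };
    case "external_invoice":
      // Assumption: invoice-style spend is recorded elsewhere, not per operation.
      return { action: "no_ledger_entry", reason: "covered by invoice-level accounting" };
    case "untracked":
      return { action: "no_ledger_entry", reason: "provider reported no usage" };
  }
}
```

In every branch the memory operation record itself is still written; only the spend ledger entry is conditional.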
+
+### 3. `finance_events` should carry flat subscription or invoice-style costs
+
+If a memory service incurs:
+
+- monthly subscription cost
+- storage invoices
+- provider platform charges not tied to one request
+
+those should be represented as `finance_events`, not as synthetic per-query memory operations.
+
+That keeps usage telemetry separate from accounting entries like invoices and flat fees.
+
+### 4. Evaluation metrics still matter
+
+Paperclip should record evaluation-oriented metrics where possible:

 - recall hit rate
 - empty query rate
 - manual correction count
- per-binding success / failure counts
+- extraction failure count
+- per-binding success and failure counts

-This is important because a memory system that "works" but silently burns budget is not acceptable in Paperclip.
+This is important because a memory system that "works" but silently burns budget or silently fails extraction is not acceptable in Paperclip.

 ## Suggested Data Model Additions

@@ -344,23 +615,36 @@ At the control-plane level, the likely new core tables are:

 - `memory_bindings`
  - company-scoped key
-  - provider id / plugin id
+  - provider id or plugin id
  - config blob
  - enabled status

 - `memory_binding_targets`
-  - target type (`company`, `agent`, later `project`)
+  - target type (`company`, `agent`)
  - target id
  - binding id

 - `memory_operations`
  - company id
  - binding id
-  - operation type (`write`, `query`, `forget`, `browse`, `correct`)
+  - operation type (`capture`, `record_upsert`, `query`, `list`, `get`, `forget`, `correct`)
  - scope fields
  - source refs
-  - usage / latency / cost
-  - success / error
+  - usage, latency, and attribution mode
+  - related heartbeat run id
+  - related cost event id
+  - success or error
+
+- `memory_extraction_jobs`
+  - company id
+  - binding id
+  - operation id
+  - provider job id
+  - hook kind
+  - status
+  - source refs
+  - error
+  - submitted, started, and finished timestamps

 Provider-specific long-form state should stay in plugin state or the provider itself unless a built-in local provider needs its own schema.

@@ -382,45 +666,46 @@ The design should still treat that built-in as just another provider behind the
 ### Phase 1: Control-plane contract

 - add memory binding models and API types
- add plugin capability / registration surface for memory providers
- add operation logging and usage reporting
+- add company default plus agent override resolution
+- add plugin capability and registration surface for memory providers

-### Phase 2: One built-in + one plugin example
+### Phase 2: Hook delivery and operation audit
+
+- add shared memory hook emission in core
+- add operation logging, extraction job state, and usage attribution
+- add direct-provider cost and finance-event linkage rules
+
+### Phase 3: One built-in plus one plugin example

 - ship a local markdown-first provider
 - ship one hosted adapter example to validate the external-provider path

-### Phase 3: UI inspection
+### Phase 4: UI inspection

- add company / agent memory settings
+- add company and agent memory settings
 - add a memory operation explorer
+- add record list and detail surfaces
 - add source backlinks to issues and runs

-### Phase 4: Automatic hooks
-
- pre-run hydrate
- post-run capture
- selected issue comment / document capture
-
 ### Phase 5: Rich capabilities

 - correction flows
- provider-native browse / graph views
- project-level overrides if needed
+- provider-native browse or graph views
 - evaluation dashboards
+- retention and quota controls

-## Open Questions
+## Remaining Open Questions

- Should project overrides exist in V1 of the memory service, or should we force company default + agent override first?
- Do we want Paperclip-managed extraction pipelines at all, or should built-ins be the only place where Paperclip owns extraction?
- Should memory usage extend the current `cost_events` model directly, or should memory operations keep a parallel usage log and roll up into `cost_events` secondarily?
- Do we want provider install / binding changes to require approvals for some companies?
+- Which built-in local provider should ship first: pure markdown, markdown plus embeddings, or a lightweight local vector store?
+- How much source payload should Paperclip snapshot inside `memory_operations` for debugging without duplicating large transcripts?
+- Should correction flows mutate provider state directly, create superseding records, or both depending on provider capability?
+- What default retention and size limits should the local built-in enforce?

 ## Bottom Line

 The right abstraction is:

- Paperclip owns memory bindings, scopes, provenance, governance, and usage reporting.
+- Paperclip owns bindings, resolution, hooks, provenance, policy, and attribution.
 - Providers own extraction, ranking, storage, and provider-native memory semantics.

-That gives Paperclip a stable "memory service" without locking the product to one memory philosophy or one vendor.
+That gives Paperclip a stable memory service without locking the product to one memory philosophy or one vendor, and it integrates the AWS lessons without importing AWS's model into core.