get-shit-done/sdk/docs/caching.md
Tibsfox ae8c0e6b26 docs(sdk): recommend 1-hour cache TTL for system prompts (#2055)
2026-04-12 08:09:44 -04:00


Prompt Caching Best Practices

When building applications on the GSD SDK, system prompts that include workflow instructions (executor prompts, planner context, verification rules) are large and stable across requests. Prompt caching avoids re-processing these on every API call.

Use cache_control with a 1-hour TTL on system prompts that include GSD workflow content:

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const response = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  system: [
    {
      type: 'text',
      text: executorPrompt, // GSD workflow instructions — large, stable across requests
      cache_control: { type: 'ephemeral', ttl: '1h' }, // string '1h', not 3600; on older API versions the 1h TTL may require the extended-cache-ttl beta header
    },
  ],
  messages,
});

Why 1 hour instead of the default 5 minutes

GSD workflows involve human review pauses between phases — discussing results, checking verification output, deciding next steps. The default 5-minute TTL expires during these pauses, forcing full re-processing of the system prompt on the next request.

With a 1-hour TTL:

  • Cost: 2x write cost on cache miss (vs. 1.25x for 5-minute TTL)
  • Break-even: Pays for itself after 3 cache hits per hour
  • GSD usage pattern: Phase execution involves dozens of requests per hour, well above break-even
  • Cache refresh: Every cache hit resets the TTL at no cost, so active sessions maintain warm cache throughout
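The break-even arithmetic above can be checked numerically. The sketch below uses the cache pricing multipliers cited in the bullets (2x and 1.25x writes, 0.1x reads); the request counts are hypothetical, chosen to model one GSD phase-hour with review pauses:

```typescript
// Relative cost of serving a cached system prompt, in units of the
// base input-token price: each cache write costs `writeMultiplier`x,
// each cache hit reads the prompt at 0.1x.
function sessionCost(writes: number, hits: number, writeMultiplier: number): number {
  return writes * writeMultiplier + hits * 0.1;
}

// One hour, 31 requests. With a 1h TTL the cache is written once and
// survives review pauses, so the remaining 30 requests are hits:
const oneHour = sessionCost(1, 30, 2.0); // 2 + 3 = 5x

// With the 5m TTL, assume (hypothetically) pauses expire the cache
// 5 times, forcing 6 writes across the same hour:
const fiveMin = sessionCost(6, 25, 1.25); // 7.5 + 2.5 = 10x

// No caching: all 31 requests pay full price for the prompt.
const uncached = 31 * 1.0; // 31x
```

Under these assumptions the 1-hour TTL is the cheapest option despite its 2x write cost, consistent with the break-even claim of roughly 3 hits.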

Which prompts to cache

Prompt                       Cache?   Reason
Executor system prompt       Yes      Large (~10K tokens), identical across tasks in a phase
Planner system prompt        Yes      Large, stable within a planning session
Verifier system prompt       Yes      Large, stable within a verification session
User/task-specific content   No       Changes per request
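The table reduces to one rule: cache a prompt only if it is stable across requests. A minimal sketch of that rule (the helper name, the `stable` flag, and the placeholder prompt strings are illustrative, not part of the GSD SDK):

```typescript
// Shape of one entry in the `system` array of a Claude API request.
type SystemBlock = {
  type: 'text';
  text: string;
  cache_control?: { type: 'ephemeral'; ttl: '5m' | '1h' };
};

// Hypothetical helper: attach cache_control only to stable prompts —
// per the table, workflow prompts yes, per-task content no.
function toSystemBlock(text: string, stable: boolean): SystemBlock {
  return stable
    ? { type: 'text', text, cache_control: { type: 'ephemeral', ttl: '1h' } }
    : { type: 'text', text };
}

const executorPrompt = 'GSD executor workflow instructions ...'; // placeholder
const taskContent = 'Implement task 3 of phase 2';               // placeholder

const system = [
  toSystemBlock(executorPrompt, true), // large, stable: cache it
  toSystemBlock(taskContent, false),   // changes per request: do not
];
```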

SDK integration point

In session-runner.ts, the systemPrompt.append field carries the executor/planner prompt. When using the Claude API directly (outside the Agent SDK's query() helper), wrap this content with cache_control:

// In runPlanSession / runPhaseStepSession, the systemPrompt is:
systemPrompt: {
  type: 'preset',
  preset: 'claude_code',
  append: executorPrompt, // <-- this is the content to cache
}

// When calling the API directly, convert to:
system: [
  {
    type: 'text',
    text: executorPrompt,
    cache_control: { type: 'ephemeral', ttl: '1h' },
  },
]
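The conversion above can be wrapped in a small utility so every direct API call caches the appended content consistently. The config type mirrors the session-runner snippet; the function name is hypothetical, not part of the GSD SDK:

```typescript
// Agent SDK-style system prompt config, as used in session-runner.ts.
type PresetSystemPrompt = {
  type: 'preset';
  preset: 'claude_code';
  append: string; // executor/planner prompt — the content worth caching
};

// Hypothetical utility: convert to a direct-API `system` array with a
// 1-hour cache on the appended content.
function toCachedSystem(prompt: PresetSystemPrompt) {
  return [
    {
      type: 'text' as const,
      text: prompt.append,
      // ttl must be the string '1h' — integer forms are ignored and
      // the cache never activates.
      cache_control: { type: 'ephemeral' as const, ttl: '1h' as const },
    },
  ];
}

// Usage: system: toCachedSystem(config) in a client.messages.create call.
```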

References