feat(agents): enforce size budget + extract duplicated boilerplate (#2361) (#2362)

Adds tiered agent-size-budget test to prevent unbounded growth in agent definitions, which are loaded verbatim into context on every subagent dispatch. Extracts two duplicated blocks (mandatory-initial-read, project-skills-discovery) to shared references under get-shit-done/references/ and migrates the 5 top agents (planner, executor, debugger, verifier, phase-researcher) to @file includes.

Also fixes two broken relative @planner-source-audit.md references in gsd-planner.md that silently disabled the planner's source audit discipline.

Closes #2361

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 17:25:23 +02:00 · 2026-04-17 10:47:08 -04:00
parent 4a912e2e45
commit c5e77c8809
9 changed files with 162 additions and 45 deletions
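The broken-reference fix in this commit suggests a simple check: flag any `@file.md` include that is not anchored at `@~/`. The sketch below is illustrative only and is not part of this commit; `findBareRefs` and `BARE_REF` are hypothetical names, and it assumes every shared reference is meant to be written as an absolute `@~/...` path.

```javascript
// Hypothetical helper (not in this commit): find `@name.md` includes that are
// not anchored at `@~/`. Assumption: shared references should always use the
// absolute `@~/.claude/...` form.
const BARE_REF = /(^|\s)@(?!~\/)([\w./-]+\.md)\b/g;

function findBareRefs(markdown) {
  const hits = [];
  markdown.split('\n').forEach((line, index) => {
    // matchAll works on a copy of the regex, so the shared `g` flag is safe here.
    for (const match of line.matchAll(BARE_REF)) {
      hits.push({ line: index + 1, ref: match[2] });
    }
  });
  return hits;
}

const sample = [
  '## Multi-Source Coverage Audit',
  '@planner-source-audit.md for full format, examples, and gap-handling rules.',
  '@~/.claude/get-shit-done/references/planner-source-audit.md for constraint examples.',
].join('\n');

console.log(findBareRefs(sample));
// prints [ { line: 2, ref: 'planner-source-audit.md' } ]
```

Only the bare reference on line 2 is reported; the `@~/`-anchored include on line 3 passes.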
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,10 +6,17 @@ Format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 ## [Unreleased]
 ### Added
 - **Agent size-budget enforcement** — New `tests/agent-size-budget.test.cjs` enforces tiered line-count limits on every `gsd-*.md` agent (XL=1600, LARGE=1000, DEFAULT=500). Unbounded agent growth is paid in context on every subagent dispatch; the test prevents regressions and requires a deliberate PR rationale to raise a budget (#2361)
 - **Shared `references/mandatory-initial-read.md`** — Extracts the `<required_reading>` enforcement block that was duplicated across 5 top agents. Agents now include it via a single `@~/.claude/get-shit-done/references/mandatory-initial-read.md` line, using Claude Code's progressive-disclosure `@file` reference mechanism (#2361)
 - **Shared `references/project-skills-discovery.md`** — Extracts the 5-step project skills discovery checklist that was copy-pasted across 5 top agents with slight divergence. Single source of truth with a per-agent "Application" paragraph documenting how planners, executors, researchers, verifiers, and debuggers each apply the rules (#2361)
 ### Changed
 - **`gsd-debugger` philosophy extracted to shared reference** — The 76-line `<philosophy>` block containing evergreen debugging disciplines (user-as-reporter framing, meta-debugging, foundation principles, cognitive-bias table, systematic investigation, when-to-restart protocol) is now in `get-shit-done/references/debugger-philosophy.md` and pulled into the agent via a single `@file` include. Same content, lighter per-dispatch context footprint (#2363)
 - **`gsd-planner`, `gsd-executor`, `gsd-debugger`, `gsd-verifier`, `gsd-phase-researcher`** — Migrated to `@file` includes for the mandatory-initial-read and project-skills-discovery boilerplate. Reduces per-dispatch context load without changing behavior (#2361)
 ### Fixed
 - **Broken `@planner-source-audit.md` relative references in `gsd-planner.md`** — Two locations referenced `@planner-source-audit.md` (resolves relative to working directory, almost always missing) instead of the correct absolute `@~/.claude/get-shit-done/references/planner-source-audit.md`. The planner's source audit discipline was silently unenforced (#2361)
 - **Shell hooks falsely flagged as stale on every session** — `gsd-phase-boundary.sh`, `gsd-session-state.sh`, and `gsd-validate-commit.sh` now ship with a `# gsd-hook-version: {{GSD_VERSION}}` header; the installer substitutes `{{GSD_VERSION}}` in `.sh` hooks the same way it does for `.js` hooks; and the stale-hook detector in `gsd-check-update.js` now matches bash `#` comment syntax in addition to JS `//` syntax. All three changes are required together — neither the regex fix alone nor the install fix alone is sufficient to resolve the false positive (#2136, #2206, #2209, #2210, #2212)
 ## [1.36.0] - 2026-04-14
--- a/agents/gsd-debugger.md
+++ b/agents/gsd-debugger.md
@@ -21,8 +21,7 @@ You are spawned by:
 Your job: Find the root cause through hypothesis testing, maintain debug file state, optionally fix and verify (depending on mode).
-**CRITICAL: Mandatory Initial Read**
-If the prompt contains a `<required_reading>` block, you MUST use the `Read` tool to load every file listed there before performing any other actions. This is your primary context.
+@~/.claude/get-shit-done/references/mandatory-initial-read.md
 **Core responsibilities:**
 - Investigate autonomously (user reports symptoms, you find cause)
@@ -37,14 +36,9 @@ If the prompt contains a `<required_reading>` block, you MUST use the `Read` too
@~/.claude/get-shit-done/references/common-bug-patterns.md
 </required_reading>
-**Project skills:** Check `.claude/skills/` or `.agents/skills/` directory if either exists:
-1. List available skills (subdirectories)
-2. Read `SKILL.md` for each skill (lightweight index ~130 lines)
-3. Load specific `rules/*.md` files as needed during implementation
-4. Do NOT load full `AGENTS.md` files (100KB+ context cost)
-5. Follow skill rules relevant to the bug being investigated and the fix being applied.
-This ensures project-specific patterns, conventions, and best practices are applied during execution.
+**Project skills:** @~/.claude/get-shit-done/references/project-skills-discovery.md
+- Load `rules/*.md` as needed during **investigation and fix**.
+- Follow skill rules relevant to the bug being investigated and the fix being applied.
 <philosophy>
--- a/agents/gsd-executor.md
+++ b/agents/gsd-executor.md
@@ -18,8 +18,7 @@ Spawned by `/gsd-execute-phase` orchestrator.
 Your job: Execute the plan completely, commit each task, create SUMMARY.md, update STATE.md.
-**CRITICAL: Mandatory Initial Read**
-If the prompt contains a `<required_reading>` block, you MUST use the `Read` tool to load every file listed there before performing any other actions. This is your primary context.
+@~/.claude/get-shit-done/references/mandatory-initial-read.md
 </role>
 <documentation_lookup>
@@ -54,14 +53,9 @@ Before executing, discover project context:
 **Project instructions:** Read `./CLAUDE.md` if it exists in the working directory. Follow all project-specific guidelines, security requirements, and coding conventions.
-**Project skills:** Check `.claude/skills/` or `.agents/skills/` directory if either exists:
-1. List available skills (subdirectories)
-2. Read `SKILL.md` for each skill (lightweight index ~130 lines)
-3. Load specific `rules/*.md` files as needed during implementation
-4. Do NOT load full `AGENTS.md` files (100KB+ context cost)
-5. Follow skill rules relevant to your current task
-This ensures project-specific patterns, conventions, and best practices are applied during execution.
+**Project skills:** @~/.claude/get-shit-done/references/project-skills-discovery.md
+- Load `rules/*.md` as needed during **implementation**.
+- Follow skill rules relevant to the task you are about to commit.
 **CLAUDE.md enforcement:** If `./CLAUDE.md` exists, treat its directives as hard constraints during execution. Before committing each task, verify that code changes do not violate CLAUDE.md rules (forbidden patterns, required conventions, mandated tools). If a task action would contradict a CLAUDE.md directive, apply the CLAUDE.md rule — it takes precedence over plan instructions. Document any CLAUDE.md-driven adjustments as deviations (Rule 2: auto-add missing critical functionality).
 </project_context>
--- a/agents/gsd-phase-researcher.md
+++ b/agents/gsd-phase-researcher.md
@@ -16,8 +16,7 @@ You are a GSD phase researcher. You answer "What do I need to know to PLAN this
 Spawned by `/gsd-plan-phase` (integrated) or `/gsd-research-phase` (standalone).
-**CRITICAL: Mandatory Initial Read**
-If the prompt contains a `<required_reading>` block, you MUST use the `Read` tool to load every file listed there before performing any other actions. This is your primary context.
+@~/.claude/get-shit-done/references/mandatory-initial-read.md
 **Core responsibilities:**
 - Investigate the phase's technical domain
@@ -62,14 +61,9 @@ Before researching, discover project context:
 **Project instructions:** Read `./CLAUDE.md` if it exists in the working directory. Follow all project-specific guidelines, security requirements, and coding conventions.
-**Project skills:** Check `.claude/skills/` or `.agents/skills/` directory if either exists:
-1. List available skills (subdirectories)
-2. Read `SKILL.md` for each skill (lightweight index ~130 lines)
-3. Load specific `rules/*.md` files as needed during research
-4. Do NOT load full `AGENTS.md` files (100KB+ context cost)
-5. Research should account for project skill patterns
-This ensures research aligns with project-specific conventions and libraries.
+**Project skills:** @~/.claude/get-shit-done/references/project-skills-discovery.md
+- Load `rules/*.md` as needed during **research**.
+- Research output should account for project skill patterns and conventions.
 **CLAUDE.md enforcement:** If `./CLAUDE.md` exists, extract all actionable directives (required tools, forbidden patterns, coding conventions, testing rules, security requirements). Include a `## Project Constraints (from CLAUDE.md)` section in RESEARCH.md listing these directives so the planner can verify compliance. Treat CLAUDE.md directives with the same authority as locked decisions from CONTEXT.md — research should not recommend approaches that contradict them.
 </project_context>
--- a/agents/gsd-planner.md
+++ b/agents/gsd-planner.md
@@ -22,8 +22,7 @@ Spawned by:
 Your job: Produce PLAN.md files that Claude executors can implement without interpretation. Plans are prompts, not documents that become prompts.
-**CRITICAL: Mandatory Initial Read**
-If the prompt contains a `<required_reading>` block, you MUST use the `Read` tool to load every file listed there before performing any other actions. This is your primary context.
+@~/.claude/get-shit-done/references/mandatory-initial-read.md
 **Core responsibilities:**
 - **FIRST: Parse and honor user decisions from CONTEXT.md** (locked decisions are NON-NEGOTIABLE)
@@ -44,7 +43,9 @@ Before planning, discover project context:
 **Project instructions:** Read `./CLAUDE.md` if it exists in the working directory. Follow all project-specific guidelines, security requirements, and coding conventions.
-**Project skills:** Check `.claude/skills/` or `.agents/skills/` if either exists. Read `SKILL.md` for each skill (lightweight index), load specific `rules/*.md` as needed. Do NOT load full `AGENTS.md` files. Ensure plans reflect project skill patterns.
+**Project skills:** @~/.claude/get-shit-done/references/project-skills-discovery.md
+- Load `rules/*.md` as needed during **planning**.
+- Ensure plans account for project skill patterns and conventions.
 </project_context>
 <context_fidelity>
@@ -95,7 +96,7 @@ Do NOT silently omit features. Instead:
 ## Multi-Source Coverage Audit (MANDATORY in every plan set)
-@planner-source-audit.md for full format, examples, and gap-handling rules.
+@~/.claude/get-shit-done/references/planner-source-audit.md for full format, examples, and gap-handling rules.
 Audit ALL four source types before finalizing: **GOAL** (ROADMAP phase goal), **REQ** (phase_req_ids from REQUIREMENTS.md), **RESEARCH** (RESEARCH.md features/constraints), **CONTEXT** (D-XX decisions from CONTEXT.md).
@@ -107,7 +108,7 @@ Exclusions (not gaps): Deferred Ideas in CONTEXT.md, items scoped to other phase
 <planner_authority_limits>
 ## The Planner Does Not Decide What Is Too Hard
-@planner-source-audit.md for constraint examples.
+@~/.claude/get-shit-done/references/planner-source-audit.md for constraint examples.
 The planner has no authority to judge a feature as too difficult, omit features because they seem challenging, or use "complex/difficult/non-trivial" to justify scope reduction.
--- a/agents/gsd-verifier.md
+++ b/agents/gsd-verifier.md
@@ -16,8 +16,7 @@ You are a GSD phase verifier. You verify that a phase achieved its GOAL, not jus
 Your job: Goal-backward verification. Start from what the phase SHOULD deliver, verify it actually exists and works in the codebase.
-**CRITICAL: Mandatory Initial Read**
-If the prompt contains a `<required_reading>` block, you MUST use the `Read` tool to load every file listed there before performing any other actions. This is your primary context.
+@~/.claude/get-shit-done/references/mandatory-initial-read.md
 **Critical mindset:** Do NOT trust SUMMARY.md claims. SUMMARYs document what Claude SAID it did. You verify what ACTUALLY exists in the code. These often differ.
@@ -34,14 +33,9 @@ Before verifying, discover project context:
 **Project instructions:** Read `./CLAUDE.md` if it exists in the working directory. Follow all project-specific guidelines, security requirements, and coding conventions.
-**Project skills:** Check `.claude/skills/` or `.agents/skills/` directory if either exists:
-1. List available skills (subdirectories)
-2. Read `SKILL.md` for each skill (lightweight index ~130 lines)
-3. Load specific `rules/*.md` files as needed during verification
-4. Do NOT load full `AGENTS.md` files (100KB+ context cost)
-5. Apply skill rules when scanning for anti-patterns and verifying quality
-This ensures project-specific patterns, conventions, and best practices are applied during verification.
+**Project skills:** @~/.claude/get-shit-done/references/project-skills-discovery.md
+- Load `rules/*.md` as needed during **verification**.
+- Apply skill rules when scanning for anti-patterns and verifying quality.
 </project_context>
 <core_principle>
--- a/get-shit-done/references/mandatory-initial-read.md
+++ b/get-shit-done/references/mandatory-initial-read.md
@@ -0,0 +1,2 @@
 **CRITICAL: Mandatory Initial Read**
 If the prompt contains a `<required_reading>` block, you MUST use the `Read` tool to load every file listed there before performing any other actions. This is your primary context.
--- a/get-shit-done/references/project-skills-discovery.md
+++ b/get-shit-done/references/project-skills-discovery.md
@@ -0,0 +1,19 @@
 # Project Skills Discovery
 Before execution, check for project-defined skills and apply their rules.
 **Discovery steps (shared across all GSD agents):**
 1. Check `.claude/skills/` or `.agents/skills/` directory — if neither exists, skip.
 2. List available skills (subdirectories).
 3. Read `SKILL.md` for each skill (lightweight index, typically ~130 lines).
 4. Load specific `rules/*.md` files only as needed during the current task.
 5. Do NOT load full `AGENTS.md` files — they are large (100KB+) and cost significant context.
 **Application** — how to apply the loaded rules depends on the calling agent:
 - Planners account for project skill patterns and conventions in the plan.
 - Executors follow skill rules relevant to the task being implemented.
 - Researchers ensure research output accounts for project skill patterns.
 - Verifiers apply skill rules when scanning for anti-patterns and verifying quality.
 - Debuggers follow skill rules relevant to the bug being investigated and the fix being applied.
 The caller's agent file should specify which application applies.
--- a/tests/agent-size-budget.test.cjs
+++ b/tests/agent-size-budget.test.cjs
@@ -0,0 +1,112 @@
 /**
 * Agent size budget.
 *
 * Agent definitions in `agents/gsd-*.md` are loaded verbatim into Claude's
 * context on every subagent dispatch. Unbounded growth is paid on every call
 * across every workflow.
 *
 * Budgets are tiered to reflect the intent of each agent class:
 *   - XL       : top-level orchestrators that own end-to-end rubrics
 *   - LARGE    : multi-phase operators with branching workflows
 *   - DEFAULT  : focused single-purpose agents
 *
 * Raising a budget is a deliberate choice — adjust the constant, write a
 * rationale in the PR, and make sure the bloat is not duplicated content
 * that belongs in `get-shit-done/references/`.
 *
 * See: https://github.com/gsd-build/get-shit-done/issues/2361
 */
 const { test, describe } = require('node:test');
 const assert = require('node:assert/strict');
 const fs = require('fs');
 const path = require('path');
 const AGENTS_DIR = path.join(__dirname, '..', 'agents');
 const XL_BUDGET = 1600;
 const LARGE_BUDGET = 1000;
 const DEFAULT_BUDGET = 500;
 const XL_AGENTS = new Set([
  'gsd-debugger',
  'gsd-planner',
 ]);
 const LARGE_AGENTS = new Set([
  'gsd-phase-researcher',
  'gsd-verifier',
  'gsd-doc-writer',
  'gsd-plan-checker',
  'gsd-executor',
  'gsd-code-fixer',
  'gsd-codebase-mapper',
  'gsd-project-researcher',
  'gsd-roadmapper',
 ]);
 const ALL_AGENTS = fs.readdirSync(AGENTS_DIR)
  .filter(f => f.startsWith('gsd-') && f.endsWith('.md'))
  .map(f => f.replace('.md', ''));
 function budgetFor(agent) {
  if (XL_AGENTS.has(agent)) return { tier: 'XL', limit: XL_BUDGET };
  if (LARGE_AGENTS.has(agent)) return { tier: 'LARGE', limit: LARGE_BUDGET };
  return { tier: 'DEFAULT', limit: DEFAULT_BUDGET };
 }
 function lineCount(filePath) {
  const content = fs.readFileSync(filePath, 'utf-8');
  if (content.length === 0) return 0;
  const trailingNewline = content.endsWith('\n') ? 1 : 0;
  return content.split('\n').length - trailingNewline;
 }
 describe('SIZE: agent line-count budget', () => {
  for (const agent of ALL_AGENTS) {
    const { tier, limit } = budgetFor(agent);
    test(`${agent} (${tier}) stays under ${limit} lines`, () => {
      const filePath = path.join(AGENTS_DIR, agent + '.md');
      const lines = lineCount(filePath);
      assert.ok(
        lines <= limit,
        `${agent}.md has ${lines} lines — exceeds ${tier} budget of ${limit}. ` +
        `Extract shared boilerplate to get-shit-done/references/ or raise the budget ` +
        `in tests/agent-size-budget.test.cjs with a rationale.`
      );
    });
  }
 });
 describe('SIZE: every agent is classified', () => {
  test('every agent falls in exactly one tier', () => {
    for (const agent of ALL_AGENTS) {
      const inXL = XL_AGENTS.has(agent);
      const inLarge = LARGE_AGENTS.has(agent);
      assert.ok(
        !(inXL && inLarge),
        `${agent} is in both XL_AGENTS and LARGE_AGENTS — pick one`
      );
    }
  });
  test('every named XL agent exists', () => {
    for (const agent of XL_AGENTS) {
      const filePath = path.join(AGENTS_DIR, agent + '.md');
      assert.ok(
        fs.existsSync(filePath),
        `XL_AGENTS references ${agent}.md which does not exist — clean up the set`
      );
    }
  });
  test('every named LARGE agent exists', () => {
    for (const agent of LARGE_AGENTS) {
      const filePath = path.join(AGENTS_DIR, agent + '.md');
      assert.ok(
        fs.existsSync(filePath),
        `LARGE_AGENTS references ${agent}.md which does not exist — clean up the set`
      );
    }
  });
 });
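The counting rule in `lineCount` treats a trailing newline as a line terminator, not as the start of an extra empty line. A minimal sketch of that same rule applied to in-memory strings may make the edge cases easier to see. `countLines` is a hypothetical name used only for illustration; the test itself reads each agent file from disk.

```javascript
// Illustrative copy of the counting rule from lineCount(), operating on a
// string instead of a file path. `countLines` is a hypothetical name.
function countLines(content) {
  if (content.length === 0) return 0;
  // A final '\n' terminates the last line; it does not open a new one.
  const trailingNewline = content.endsWith('\n') ? 1 : 0;
  return content.split('\n').length - trailingNewline;
}

console.log(countLines(''));        // prints 0
console.log(countLines('a\nb'));    // prints 2
console.log(countLines('a\nb\n'));  // prints 2
console.log(countLines('\n'));      // prints 1
```

Under this rule a file with and without a final newline counts the same, so an editor setting that adds or strips the trailing newline does not push an agent over its budget.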