fix: break infinite summary-retry loop (#1633) (#2072)

* Initial plan

* fix: break infinite summary-retry loop (#1633)

Three-part fix:
1. Parser coercion: When LLM returns <observation> tags instead of <summary>,
   coerce observation content into summary fields (root cause fix)
2. Stronger summary prompt: Add clearer tag requirements with warnings
3. Circuit breaker: Track consecutive summary failures per session,
   skip further attempts after 3 failures to prevent unbounded prompt growth

Agent-Logs-Url: https://github.com/thedotmack/claude-mem/sessions/e345e8ec-bc97-4eaa-94bd-6e951fda8f77

Co-authored-by: thedotmack <683968+thedotmack@users.noreply.github.com>

* refactor: extract shared constants for summary mode marker and failure threshold

Addresses code review feedback: SUMMARY_MODE_MARKER and
MAX_CONSECUTIVE_SUMMARY_FAILURES are now defined once in sdk/prompts.ts
and imported by ResponseProcessor and SessionManager.

Agent-Logs-Url: https://github.com/thedotmack/claude-mem/sessions/e345e8ec-bc97-4eaa-94bd-6e951fda8f77

Co-authored-by: thedotmack <683968+thedotmack@users.noreply.github.com>

* fix: guard summary failure counter on summaryExpected (Greptile P1)

The circuit breaker counter previously incremented on any response
containing <observation> or <summary> tags — which matches virtually
every normal observation response. After 3 observations the breaker
would open and permanently block summarization, reproducing the
data-loss scenario #1633 was meant to prevent.

Gate the increment block on summaryExpected (already computed for
parseSummary coercion) so the counter only tracks actual summary
attempts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: cover circuit-breaker + apply review polish

- Use findLast / at(-1) for last-user-message lookup instead of
  filter + index (O(1) common case).
- Drop redundant `|| 0` fallback — field is required and initialized.
- Add comment noting counter is ephemeral by design.
- Add ResponseProcessor tests covering:
  * counter NOT incrementing on normal observation responses
    (regression guard for the Greptile P1)
  * counter incrementing when a summary was expected but missing
  * counter resetting to 0 on successful summary storage

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: iterate all observation blocks; don't count skip_summary as failure

Addresses CodeRabbit review on #2072:

- coerceObservationToSummary now iterates all <observation> blocks
  with a global regex and returns the first block that has title,
  narrative, or facts. Previously, an empty leading observation
  would short-circuit and discard populated follow-ups.

- Circuit-breaker counter now treats explicit <skip_summary/> as
  neutral — neither a failure nor a success — so a run that happens
  to end on a skip doesn't punish the session or mask a prior bad
  streak. Real failures (no summary, no skip) still increment.

- Tests added for both cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: reference SUMMARY_MODE_MARKER constant instead of hardcoded string

Addresses CodeRabbit nitpick: tests should pull the marker from the
canonical source so they don't silently drift when the constant is
renamed or edited.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: also coerce observations when <summary> has empty sub-tags

When the LLM wraps an empty <summary></summary> around real observation
content, the #1360 empty-subtag guard rejects the summary and returns
null — which would lose the observation content and resurrect the
#1633 retry loop. Fall back to coerceObservationToSummary in that
branch too, mirroring the unmatched-<summary> path.

Adds a test covering the empty-summary-wraps-observation case and
a guard test for empty summary with no observation content.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thedotmack <683968+thedotmack@users.noreply.github.com>
Co-authored-by: Alex Newman <thedotmack@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Copilot
2026-04-19 12:00:38 -07:00
committed by GitHub
parent beea7899b9
commit 8ec91e7ffa
7 changed files with 351 additions and 11 deletions

View File

@@ -8,10 +8,14 @@ import { describe, it, expect } from 'bun:test';
import { parseSummary } from '../../src/sdk/parser.js';
describe('parseSummary', () => {
it('returns null when no <summary> tag present', () => {
it('returns null when no <summary> tag present and coercion disabled', () => {
expect(parseSummary('<observation><title>foo</title></observation>')).toBeNull();
});
it('returns null when no <summary> or <observation> tags present', () => {
expect(parseSummary('Some plain text response without any XML tags')).toBeNull();
});
it('returns null when <summary> has no sub-tags (false positive — fix for #1360)', () => {
// This is the bug: observation response accidentally contains <summary>some text</summary>
expect(parseSummary('<observation>done <summary>some content here</summary></observation>')).toBeNull();
@@ -50,4 +54,96 @@ describe('parseSummary', () => {
it('returns null when skip_summary tag is present', () => {
expect(parseSummary('<skip_summary reason="no work done"/>')).toBeNull();
});
// Observation-to-summary coercion tests (#1633)
it('coerces <observation> with content into a summary when coerceFromObservation=true (#1633)', () => {
const result = parseSummary('<observation><title>foo</title></observation>', undefined, true);
expect(result).not.toBeNull();
expect(result?.request).toBe('foo');
expect(result?.completed).toBe('foo');
});
it('coerces observation with narrative into summary with investigated field (#1633)', () => {
const text = `<observation>
<type>refactor</type>
<title>UObjectArray refactored</title>
<narrative>Removed local XXXX and migrated to new pattern</narrative>
</observation>`;
const result = parseSummary(text, undefined, true);
expect(result).not.toBeNull();
expect(result?.request).toBe('UObjectArray refactored');
expect(result?.investigated).toBe('Removed local XXXX and migrated to new pattern');
});
it('coerces observation with facts into summary with learned field (#1633)', () => {
const text = `<observation>
<type>discovery</type>
<title>JWT token handling</title>
<facts>
<fact>Tokens expire after 1 hour</fact>
<fact>Refresh flow uses rotating keys</fact>
</facts>
</observation>`;
const result = parseSummary(text, undefined, true);
expect(result).not.toBeNull();
expect(result?.request).toBe('JWT token handling');
expect(result?.learned).toBe('Tokens expire after 1 hour; Refresh flow uses rotating keys');
});
it('coerces observation with subtitle into completed field (#1633)', () => {
const text = `<observation>
<type>config</type>
<title>Database migration</title>
<subtitle>Added new index for performance</subtitle>
</observation>`;
const result = parseSummary(text, undefined, true);
expect(result).not.toBeNull();
expect(result?.completed).toBe('Database migration — Added new index for performance');
});
it('returns null for empty observation even with coercion enabled (#1633)', () => {
const text = `<observation><type>config</type></observation>`;
expect(parseSummary(text, undefined, true)).toBeNull();
});
it('prefers <summary> tags over observation coercion when both present (#1633)', () => {
const text = `<observation><title>obs title</title></observation>
<summary><request>summary request</request></summary>`;
const result = parseSummary(text, undefined, true);
expect(result).not.toBeNull();
expect(result?.request).toBe('summary request');
});
it('falls back to observation coercion when <summary> matches but has empty sub-tags (#1633)', () => {
// LLM wraps an empty summary around real observation content — without the
// fallback, the empty-subtag guard (#1360) rejects the summary and we lose
// the observation content, resurrecting the retry loop.
const text = `<summary></summary>
<observation>
<title>the real work</title>
<narrative>what actually happened</narrative>
</observation>`;
const result = parseSummary(text, undefined, true);
expect(result).not.toBeNull();
expect(result?.request).toBe('the real work');
expect(result?.investigated).toBe('what actually happened');
});
it('empty <summary> with no observation content still returns null (coercion disabled)', () => {
const text = '<summary></summary>';
expect(parseSummary(text, undefined, true)).toBeNull();
});
it('skips empty leading observation blocks and coerces from the first populated one (#1633)', () => {
const text = `<observation><type>discovery</type></observation>
<observation>
<type>bugfix</type>
<title>second block has content</title>
<narrative>fixed the crash</narrative>
</observation>`;
const result = parseSummary(text, undefined, true);
expect(result).not.toBeNull();
expect(result?.request).toBe('second block has content');
expect(result?.investigated).toBe('fixed the crash');
});
});

View File

@@ -31,6 +31,7 @@ mock.module('../../../src/services/domain/ModeManager.js', () => ({
// Import after mocks
import { processAgentResponse } from '../../../src/services/worker/agents/ResponseProcessor.js';
import { SUMMARY_MODE_MARKER } from '../../../src/sdk/prompts.js';
import type { WorkerRef, StorageResult } from '../../../src/services/worker/agents/types.js';
import type { ActiveSession } from '../../../src/services/worker-types.js';
import type { DatabaseManager } from '../../../src/services/worker/DatabaseManager.js';
@@ -130,8 +131,9 @@ describe('ResponseProcessor', () => {
conversationHistory: [],
currentProvider: 'claude',
processingMessageIds: [], // CLAIM-CONFIRM pattern: track message IDs being processed
consecutiveSummaryFailures: 0,
...overrides,
};
} as ActiveSession;
}
describe('parsing observations from XML response', () => {
@@ -726,4 +728,103 @@ describe('ResponseProcessor', () => {
expect(session.lastSummaryStored).toBe(false);
});
});
describe('circuit breaker: consecutiveSummaryFailures counter (#1633)', () => {
const SUMMARY_PROMPT = `--- ${SUMMARY_MODE_MARKER} ---\nDo the summary now.`;
it('does NOT increment the counter on normal observation responses (P1 regression guard)', async () => {
// Session where the last user message is an OBSERVATION request, not a summary request.
// The counter must stay at 0 even though the response has <observation> tags and no summary.
mockStoreObservations.mockImplementation(() => ({
observationIds: [1],
summaryId: null,
createdAtEpoch: 1700000000000,
} as StorageResult));
const session = createMockSession({
conversationHistory: [{ role: 'user', content: 'record a new observation' }],
});
const obsResponse = `
<observation>
<type>discovery</type>
<title>found a thing</title>
<narrative>it happened</narrative>
<facts></facts>
<concepts></concepts>
<files_read></files_read>
<files_modified></files_modified>
</observation>
`;
// Drive multiple observation responses — counter must never increment.
for (let i = 0; i < 5; i++) {
await processAgentResponse(obsResponse, session, mockDbManager, mockSessionManager, mockWorker, 0, null, 'TestAgent');
}
expect(session.consecutiveSummaryFailures).toBe(0);
});
it('increments the counter when a summary was expected but none was stored', async () => {
mockStoreObservations.mockImplementation(() => ({
observationIds: [],
summaryId: null,
createdAtEpoch: 1700000000000,
} as StorageResult));
const session = createMockSession({
conversationHistory: [{ role: 'user', content: SUMMARY_PROMPT }],
});
// LLM returned nothing structured — no summary stored
const badResponse = 'I cannot comply with that request.';
await processAgentResponse(badResponse, session, mockDbManager, mockSessionManager, mockWorker, 0, null, 'TestAgent');
expect(session.consecutiveSummaryFailures).toBe(1);
});
it('does NOT increment the counter on intentional <skip_summary/> responses', async () => {
mockStoreObservations.mockImplementation(() => ({
observationIds: [],
summaryId: null,
createdAtEpoch: 1700000000000,
} as StorageResult));
const session = createMockSession({
consecutiveSummaryFailures: 1,
conversationHistory: [{ role: 'user', content: SUMMARY_PROMPT }],
});
const skipResponse = '<skip_summary reason="no meaningful work this session"/>';
await processAgentResponse(skipResponse, session, mockDbManager, mockSessionManager, mockWorker, 0, null, 'TestAgent');
// Skip is neutral — counter stays where it was, no spurious increment
expect(session.consecutiveSummaryFailures).toBe(1);
});
it('resets the counter to 0 when a summary is successfully stored', async () => {
mockStoreObservations.mockImplementation(() => ({
observationIds: [],
summaryId: 42,
createdAtEpoch: 1700000000000,
} as StorageResult));
const session = createMockSession({
consecutiveSummaryFailures: 2,
conversationHistory: [{ role: 'user', content: SUMMARY_PROMPT }],
});
const goodResponse = `
<summary>
<request>wrap it up</request>
<investigated>the thing</investigated>
<learned>the answer</learned>
<completed>the work</completed>
<next_steps>none</next_steps>
</summary>
`;
await processAgentResponse(goodResponse, session, mockDbManager, mockSessionManager, mockWorker, 0, null, 'TestAgent');
expect(session.consecutiveSummaryFailures).toBe(0);
});
});
});