Files
claude-mem/scripts/extraction

XML Extraction Scripts

Scripts to extract XML observations and summaries from Claude Code transcript files.

Scripts

filter-actual-xml.py

Recommended for import

Extracts only actual XML from assistant responses, filtering out:

  • Template/example XML (with placeholders like [...] or **field**:)
  • XML from tool_use blocks
  • XML from user messages

Output: ~/Scripts/claude-mem/actual_xml_only_with_timestamps.xml

Usage:

python3 scripts/extraction/filter-actual-xml.py

extract-all-xml.py

For debugging/analysis

Extracts ALL XML blocks from transcripts without filtering.

Output: ~/Scripts/claude-mem/all_xml_fragments_with_timestamps.xml

Usage:

python3 scripts/extraction/extract-all-xml.py

Workflow

  1. Extract XML from transcripts:

    cd ~/Scripts/claude-mem
    python3 scripts/extraction/filter-actual-xml.py
    
  2. Import to database:

    npm run import:xml
    
  3. Clean up duplicates (if needed):

    npm run cleanup:duplicates
    

Source Data

Scripts read from: ~/.claude/projects/-Users-alexnewman-Scripts-claude-mem/*.jsonl

These are Claude Code session transcripts stored in JSONL (JSON Lines) format.

Output Format

<?xml version="1.0" encoding="UTF-8"?>
<transcript_extracts>

<!-- Block 1 | 2025-10-19 03:03:23 UTC -->
<observation>
  <type>discovery</type>
  <title>Example observation</title>
  ...
</observation>

<!-- Block 2 | 2025-10-19 03:03:45 UTC -->
<summary>
  <request>What was accomplished</request>
  ...
</summary>

</transcript_extracts>

Each XML block includes a comment with:

  • Block number
  • Original timestamp from transcript