XML Extraction Scripts

Scripts to extract XML observations and summaries from Claude Code transcript files.

Scripts

`filter-actual-xml.py`

Recommended for import

Extracts only actual XML from assistant responses, filtering out:

Template/example XML (with placeholders like [...] or **field**:)
XML from tool_use blocks
XML from user messages

Output: ~/Scripts/claude-mem/actual_xml_only_with_timestamps.xml

Usage:

python3 scripts/extraction/filter-actual-xml.py

`extract-all-xml.py`

For debugging/analysis

Extracts ALL XML blocks from transcripts without filtering.

Output: ~/Scripts/claude-mem/all_xml_fragments_with_timestamps.xml

Usage:

python3 scripts/extraction/extract-all-xml.py

Workflow

Extract XML from transcripts:

cd ~/Scripts/claude-mem
python3 scripts/extraction/filter-actual-xml.py

Import to database:
```
npm run import:xml
```
Clean up duplicates (if needed):
```
npm run cleanup:duplicates
```

Source Data

Scripts read from: ~/.claude/projects/-Users-alexnewman-Scripts-claude-mem/*.jsonl

These are Claude Code session transcripts stored in JSONL (JSON Lines) format.

Output Format

<?xml version="1.0" encoding="UTF-8"?>
<transcript_extracts>

<!-- Block 1 | 2025-10-19 03:03:23 UTC -->
<observation>
  <type>discovery</type>
  <title>Example observation</title>
  ...
</observation>

<!-- Block 2 | 2025-10-19 03:03:45 UTC -->
<summary>
  <request>What was accomplished</request>
  ...
</summary>

</transcript_extracts>

Each XML block includes a comment with:

Block number
Original timestamp from transcript