mirror of
https://github.com/thedotmack/claude-mem
synced 2026-04-25 17:15:04 +02:00
XML Extraction Scripts
Scripts to extract XML observations and summaries from Claude Code transcript files.
Scripts
filter-actual-xml.py
Recommended for import
Extracts only actual XML from assistant responses, filtering out:
- Template/example XML (with placeholders like
[...]or**field**:) - XML from tool_use blocks
- XML from user messages
Output: ~/Scripts/claude-mem/actual_xml_only_with_timestamps.xml
Usage:
python3 scripts/extraction/filter-actual-xml.py
extract-all-xml.py
For debugging/analysis
Extracts ALL XML blocks from transcripts without filtering.
Output: ~/Scripts/claude-mem/all_xml_fragments_with_timestamps.xml
Usage:
python3 scripts/extraction/extract-all-xml.py
Workflow
-
Extract XML from transcripts:
cd ~/Scripts/claude-mem python3 scripts/extraction/filter-actual-xml.py -
Import to database:
npm run import:xml -
Clean up duplicates (if needed):
npm run cleanup:duplicates
Source Data
Scripts read from: ~/.claude/projects/-Users-alexnewman-Scripts-claude-mem/*.jsonl
These are Claude Code session transcripts stored in JSONL (JSON Lines) format.
Output Format
<?xml version="1.0" encoding="UTF-8"?>
<transcript_extracts>
<!-- Block 1 | 2025-10-19 03:03:23 UTC -->
<observation>
<type>discovery</type>
<title>Example observation</title>
...
</observation>
<!-- Block 2 | 2025-10-19 03:03:45 UTC -->
<summary>
<request>What was accomplished</request>
...
</summary>
</transcript_extracts>
Each XML block includes a comment with:
- Block number
- Original timestamp from transcript