---
title: "Unified Message Extractor"
component_type: script
version: "3.0.0"
audience: contributor
status: stable
summary: "Extract ALL data from Claude Code sessions - zero data loss"
keywords: ['extractor', 'database', 'session', 'deduplication', 'indexing']
tokens: ~500
created: 2025-12-22
updated: 2025-12-23
script_name: "unified-message-extractor.py"
language: python
executable: true
usage: "python3 scripts/unified-message-extractor.py [options]"
python_version: "3.10+"
dependencies: []
modifies_files: true
network_access: false
requires_auth: false
---
Unified Message Extractor for CODITECT v3.0
Extracts ALL data from:
- Native JSONL session files (~/.claude/projects/*.jsonl)
- Export TXT files (from /export command)
Captures ALL 6 entry types:
- user: User messages with todos, tool results, thinking metadata
- assistant: AI responses with token usage, model info, errors
- system: Compaction events, retries, errors
- queue-operation: Command queue operations
- summary: Conversation summaries
- file-history-snapshot: File backup data
Zero data loss - raw JSON preserved for all entries.
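As a rough illustration of the "raw JSON preserved" idea, the sketch below reads JSONL lines, tags each with one of the six entry types listed above, and keeps the original line text next to the parsed result. The `type` key, the `read_entries` helper, and the `unknown` fallback are assumptions made for this example, not the script's confirmed behavior.

```python
import json
from collections import Counter

# The six entry types named in this document.
KNOWN_TYPES = {
    "user", "assistant", "system",
    "queue-operation", "summary", "file-history-snapshot",
}


def read_entries(lines):
    """Yield one record per non-empty line, always keeping the raw text."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            parsed = None  # unparseable lines are kept, not dropped
        # Assumption: the entry type lives under a top-level "type" key.
        entry_type = parsed.get("type") if isinstance(parsed, dict) else None
        yield {
            "line": line_no,
            "type": entry_type if entry_type in KNOWN_TYPES else "unknown",
            "raw": line,  # original text stored verbatim
        }


sample = [
    '{"type": "user", "message": {"content": "hi"}}',
    '',
    '{"type": "summary", "summary": "short chat"}',
    'not json',
]
records = list(read_entries(sample))
counts = Counter(record["type"] for record in records)
```

Keeping the raw line alongside the classification means an entry of an unexpected shape can still be re-parsed later.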
Usage:
    python3 scripts/unified-message-extractor.py --jsonl SESSION.jsonl
    python3 scripts/unified-message-extractor.py --export EXPORT.txt
    python3 scripts/unified-message-extractor.py --batch --min-size 10
    python3 scripts/unified-message-extractor.py --merge  # Merge both stores
File: unified-message-extractor-v3.0.0-backup.py
Classes
JSONLExtractor
Extract messages from native Claude Code JSONL session files.
ExportTXTExtractor
Extract messages from Claude Code /export TXT files.
UnifiedMessageStore
Unified storage for deduplicated messages from all sources.
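One common way to deduplicate messages arriving from several sources is to key them by a content hash. The class below is a minimal in-memory sketch of that idea. `DedupStore` and its key layout are hypothetical and are not the actual `UnifiedMessageStore` interface.

```python
import hashlib


class DedupStore:
    """Hypothetical in-memory store that ignores repeated messages."""

    def __init__(self):
        self._messages = {}

    def add(self, role, content):
        """Store the message; return False if an identical one exists."""
        # Assumption: role plus content identifies a message.
        key = hashlib.sha256(f"{role}\x00{content}".encode("utf-8")).hexdigest()
        if key in self._messages:
            return False
        self._messages[key] = {"role": role, "content": content}
        return True

    def __len__(self):
        return len(self._messages)
```

With this layout, the same text seen in both a JSONL file and an export file is stored once, while identical text from a different role is kept separately.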
CustomFormatter
No description
Functions
find_context_storage()
Find the real context-storage directory, preventing duplicate databases.
is_claude_export_file(file_path)
Detect if a file is a Claude Code /export file by checking for signature patterns.
detect_file_type(file_path)
Auto-detect file type: 'jsonl', 'export', or None (unknown).
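A detection routine of this kind could look roughly like the sketch below: try to parse the first non-empty line as a JSON object, then fall back to searching the first lines for export markers. The marker string is a placeholder, because the real signature patterns are not given here, and `guess_file_type` is a hypothetical name.

```python
import json

# Placeholder only: the real signature patterns are not documented here.
HYPOTHETICAL_EXPORT_MARKERS = ("Claude Code",)


def guess_file_type(file_path, export_markers=HYPOTHETICAL_EXPORT_MARKERS):
    """Return 'jsonl', 'export', or None when the type is unknown."""
    try:
        with open(file_path, "r", encoding="utf-8", errors="replace") as handle:
            # Inspect only the first 20 lines so large files stay cheap.
            lines = [handle.readline() for _ in range(20)]
    except OSError:
        return None
    first = next((line for line in lines if line.strip()), "")
    try:
        if isinstance(json.loads(first), dict):
            return "jsonl"
    except json.JSONDecodeError:
        pass
    head = "".join(lines)
    if any(marker in head for marker in export_markers):
        return "export"
    return None
```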
create_unified_message(content, role, source_type, source_file, source_line, session_id, checkpoint, timestamp, metadata, token_usage, agent_context)
Create a unified message format from any source.
extract_token_usage(entry)
Extract token usage from a JSONL assistant message entry.
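The sketch below shows one plausible shape for this kind of extraction. It assumes that token counts sit under `entry["message"]["usage"]` with the field names shown. Both the nesting and the field list are assumptions for illustration, and `token_usage_from` is a hypothetical name.

```python
# Assumed field names; adjust to whatever the session files contain.
USAGE_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


def token_usage_from(entry):
    """Return a dict of token counts, or None if the entry has no usage."""
    message = entry.get("message")
    usage = message.get("usage") if isinstance(message, dict) else None
    if not isinstance(usage, dict):
        return None
    # Missing or null counters are reported as 0.
    return {name: int(usage.get(name) or 0) for name in USAGE_FIELDS}
```

Returning `None` rather than zeros for entries without a usage block lets callers tell "no usage recorded" apart from "zero tokens used".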
extract_agent_context(entry)
Extract agent context from a JSONL entry.
create_comprehensive_entry(entry, source_file, source_line)
Create a comprehensive entry that captures ALL data from any JSONL entry type.
process_jsonl(file_path, store)
Process a single JSONL file.
get_file_hash(file_path)
Compute SHA256 hash of file content.
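A SHA256 file hash can be computed with the standard library alone. The sketch below reads the file in chunks so that large session files need not fit in memory. The `sha256_of_file` name and the chunk size are choices made for this example.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file's bytes."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns empty bytes at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```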
archive_export_file(file_path, archive_dir)
Move processed export file to archive directory with hash-based deduplication.
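Hash-based deduplication during archiving can be sketched as follows: the archived file name embeds a prefix of the content hash, so a second file with identical bytes maps to the same target and is not archived again. The naming scheme, the return value, and the choice to delete the duplicate source are assumptions of this example.

```python
import hashlib
import shutil
from pathlib import Path


def archive_file(file_path, archive_dir):
    """Move a file into archive_dir; return (target_path, was_moved)."""
    src = Path(file_path)
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(src.read_bytes()).hexdigest()
    # Hypothetical naming scheme: original stem plus a short hash prefix.
    target = archive / f"{src.stem}-{digest[:12]}{src.suffix}"
    if target.exists():
        src.unlink()  # identical content is already archived
        return target, False
    shutil.move(str(src), str(target))
    return target, True
```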
process_export(file_path, store, archive, archive_dir)
Process a single export TXT file.
find_jsonl_files(min_size_mb)
Find all JSONL session files.
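A size-filtered search could be written roughly as below. The function name, the explicit `root` argument, and the recursive search are assumptions, since the actual search root and traversal depth are not described here.

```python
from pathlib import Path


def find_large_jsonl(root, min_size_mb=0.0):
    """Return sorted .jsonl paths under root that are at least min_size_mb."""
    min_bytes = int(min_size_mb * 1024 * 1024)
    matches = [
        path
        for path in Path(root).rglob("*.jsonl")
        if path.is_file() and path.stat().st_size >= min_bytes
    ]
    return sorted(matches)
```

For example, `find_large_jsonl(Path.home() / ".claude" / "projects", min_size_mb=10)` would list only session files of 10 MB or more under that directory.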
find_git_root()
Find the git repository root from current directory.
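Locating a repository root usually means walking upward until a `.git` entry appears. The sketch below does this with `pathlib`. It takes an explicit start directory for testability, which is an assumption of this example, and `git_root_from` is a hypothetical name.

```python
from pathlib import Path


def git_root_from(start):
    """Return the nearest ancestor of start containing .git, else None."""
    current = Path(start).resolve()
    for candidate in (current, *current.parents):
        # .git is a directory in normal clones and a file in worktrees.
        if (candidate / ".git").exists():
            return candidate
    return None
```

Calling `git_root_from(Path.cwd())` gives the "from current directory" behavior described above.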
find_export_files(search_paths, verify_content)
Find all Claude Code export TXT files.
main()
No description
Usage
python unified-message-extractor-v3.0.0-backup.py