Skip to main content

---

title: "Unified Message Extractor" component_type: script version: "3.0.0" audience: contributor status: stable summary: "Extract ALL data from Claude Code sessions - zero data loss" keywords: ['extractor', 'database', 'session', 'deduplication', 'indexing'] tokens: ~500 created: 2025-12-22 updated: 2025-12-23 script_name: "unified-message-extractor.py" language: python executable: true usage: "python3 scripts/unified-message-extractor.py [options]" python_version: "3.10+" dependencies: [] modifies_files: true network_access: false requires_auth: false

Unified Message Extractor for CODITECT v3.0

Extracts ALL data from:

  1. Native JSONL session files (~/.claude/projects/*.jsonl)
  2. Export TXT files (from /export command)

Captures ALL 6 entry types:

  • user: User messages with todos, tool results, thinking metadata
  • assistant: AI responses with token usage, model info, errors
  • system: Compaction events, retries, errors
  • queue-operation: Command queue operations
  • summary: Conversation summaries
  • file-history-snapshot: File backup data

Zero data loss - raw JSON preserved for all entries.

Usage: python3 scripts/unified-message-extractor.py --jsonl SESSION.jsonl python3 scripts/unified-message-extractor.py --export EXPORT.txt python3 scripts/unified-message-extractor.py --batch --min-size 10 python3 scripts/unified-message-extractor.py --merge # Merge both stores

File: unified-message-extractor-v3.0.0-backup.py

Classes

JSONLExtractor

Extract messages from native Claude Code JSONL session files.

ExportTXTExtractor

Extract messages from Claude Code /export TXT files.

UnifiedMessageStore

Unified storage for deduplicated messages from all sources.

CustomFormatter

No description

Functions

find_context_storage()

Find the real context-storage directory, preventing duplicate databases.

is_claude_export_file(file_path)

Detect if a file is a Claude Code /export file by checking for signature patterns.

detect_file_type(file_path)

Auto-detect file type: 'jsonl', 'export', or None (unknown).

create_unified_message(content, role, source_type, source_file, source_line, session_id, checkpoint, timestamp, metadata, token_usage, agent_context)

Create a unified message format from any source.

extract_token_usage(entry)

Extract token usage from a JSONL assistant message entry.

extract_agent_context(entry)

Extract agent context from a JSONL entry.

create_comprehensive_entry(entry, source_file, source_line)

Create a comprehensive entry that captures ALL data from any JSONL entry type.

process_jsonl(file_path, store)

Process a single JSONL file.

get_file_hash(file_path)

Compute SHA256 hash of file content.

archive_export_file(file_path, archive_dir)

Move processed export file to archive directory with hash-based deduplication.

process_export(file_path, store, archive, archive_dir)

Process a single export TXT file.

find_jsonl_files(min_size_mb)

Find all JSONL session files.

find_git_root()

Find the git repository root from current directory.

find_export_files(search_paths, verify_content)

Find all Claude Code export TXT files.

main()

No description

Usage

python unified-message-extractor-v3.0.0-backup.py