Claude Conversation Export Deduplicator
Hybrid deduplication system for Claude Code conversation exports that combines:
- Sequence number tracking (primary deduplication mechanism)
- Content hashing (secondary, catches exact duplicates)
- Append-only log (persistent storage with zero data loss)
- Idempotent processing (safe to re-run on same exports)
Solves the exponential growth problem in multi-day sessions:
- Day 1: 13KB export
- Day 2: 51KB export (cumulative)
- Day 3: 439KB export (cumulative with full history)
Expected storage reduction: 95%+ through deduplication.
Author: Claude + AZ1.AI License: MIT
File: conversation_deduplicator.py
Classes
ClaudeConversationDeduplicator
Hybrid deduplication for Claude conversation exports.
Functions
parse_claude_export_file(filepath)
Parse Claude Code conversation export file.
extract_session_id_from_filename(filepath)
Extract session ID from export filename.
process_export(conversation_id, export_data, dry_run)
Process a Claude conversation export, returning only new unique messages.
get_full_conversation(conversation_id)
Reconstruct full conversation from append-only log.
get_statistics(conversation_id)
Get statistics for a conversation.
get_all_conversations()
Get list of all conversation IDs in the system.
validate_integrity(conversation_id)
Validate data integrity for a conversation.
Usage
python conversation_deduplicator.py