
Claude Conversation Export Deduplicator

Hybrid deduplication system for Claude Code conversation exports that combines:

  • Sequence number tracking (primary deduplication mechanism)
  • Content hashing (secondary, catches exact duplicates)
  • Append-only log (persistent storage with zero data loss)
  • Idempotent processing (safe to re-run on same exports)
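
A minimal sketch of how the four mechanisms above might fit together. It assumes each message is a dict with a hypothetical integer `seq` field plus other fields such as `content`; the class name `HybridDeduplicator`, the in-memory log, and the field names are illustrative assumptions, not the module's actual implementation.

```python
import hashlib
import json


class HybridDeduplicator:
    """Illustrative in-memory stand-in; not the real ClaudeConversationDeduplicator."""

    def __init__(self):
        self.watermark = {}  # conversation_id -> highest seq processed so far
        self.hashes = {}     # conversation_id -> set of content hashes
        self.log = []        # append-only list of (conversation_id, message)

    @staticmethod
    def _hash(message):
        # Hash everything except the (assumed) seq field, so an identical
        # message is still caught if it reappears under a new sequence number.
        # Real messages would need a timestamp or similar field so that
        # legitimately repeated text is not collapsed.
        body = {k: v for k, v in message.items() if k != "seq"}
        payload = json.dumps(body, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def process(self, conversation_id, messages, dry_run=False):
        high = self.watermark.get(conversation_id, -1)
        seen = self.hashes.get(conversation_id, set())
        new_messages = []
        batch_hashes = set()
        for msg in sorted(messages, key=lambda m: m["seq"]):
            if msg["seq"] <= high:
                continue  # primary check: already covered by the watermark
            digest = self._hash(msg)
            if digest in seen or digest in batch_hashes:
                continue  # secondary check: exact duplicate content
            batch_hashes.add(digest)
            new_messages.append(msg)
        if not dry_run and messages:
            # Only ever append; earlier log entries are never rewritten.
            self.log.extend((conversation_id, m) for m in new_messages)
            self.hashes.setdefault(conversation_id, set()).update(batch_hashes)
            self.watermark[conversation_id] = max(
                high, max(m["seq"] for m in messages)
            )
        return new_messages
```

Feeding the same cumulative export twice returns an empty list the second time, which is what makes re-runs safe in this sketch.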

Solves the rapid growth of exports in multi-day sessions, where each new export repeats the full prior history:

  • Day 1: 13KB export
  • Day 2: 51KB export (cumulative)
  • Day 3: 439KB export (cumulative with full history)

Expected storage reduction: 95%+ through deduplication.

Author: Claude + AZ1.AI

License: MIT

File: conversation_deduplicator.py

Classes

ClaudeConversationDeduplicator

Hybrid deduplication for Claude conversation exports.

Functions

parse_claude_export_file(filepath)

Parse Claude Code conversation export file.

extract_session_id_from_filename(filepath)

Extract session ID from export filename.
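
The export filename format is not documented here, so the following is only a sketch under an assumption: that the session ID is a UUID embedded somewhere in the filename, with the bare file stem used as a fallback. The function name `extract_session_id_sketch` is hypothetical and chosen to avoid implying this is the module's own logic.

```python
import re
from pathlib import Path

# Assumed format: a standard 8-4-4-4-12 hexadecimal UUID inside the filename.
_UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def extract_session_id_sketch(filepath):
    """Return the first UUID found in the filename, else the file stem."""
    stem = Path(filepath).stem
    match = _UUID_RE.search(stem)
    return match.group(0).lower() if match else stem
```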

process_export(conversation_id, export_data, dry_run)

Process a Claude conversation export, returning only new unique messages.

get_full_conversation(conversation_id)

Reconstruct full conversation from append-only log.
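
As an illustration of reconstruction from an append-only log, the sketch below assumes the log is a JSON Lines file in which every record carries a `conversation_id` and a `message` with an integer `seq`. The on-disk format, the field names, and the helper names `append_messages` and `read_conversation` are assumptions for the example only.

```python
import json
from pathlib import Path


def append_messages(log_path, conversation_id, messages):
    """Append records to the end of the log; existing lines are never modified."""
    with open(log_path, "a", encoding="utf-8") as fh:
        for msg in messages:
            record = {"conversation_id": conversation_id, "message": msg}
            fh.write(json.dumps(record) + "\n")


def read_conversation(log_path, conversation_id):
    """Replay the log and return one conversation's messages ordered by seq."""
    path = Path(log_path)
    if not path.exists():
        return []
    messages = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            if record["conversation_id"] == conversation_id:
                messages.append(record["message"])
    return sorted(messages, key=lambda m: m["seq"])
```

Because records are only ever appended, the full conversation can be rebuilt at any time by replaying the file and filtering on the conversation ID.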

get_statistics(conversation_id)

Get statistics for a conversation.

get_all_conversations()

Get list of all conversation IDs in the system.

validate_integrity(conversation_id)

Validate data integrity for a conversation.
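
What the real integrity check verifies is not specified here. One plausible version, sketched below, assumes messages carry contiguous integer `seq` values and reports missing, duplicated, or skipped sequence numbers; the function name `check_integrity` and the returned list of problem strings are hypothetical.

```python
def check_integrity(messages):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    seen = set()
    for index, msg in enumerate(messages):
        seq = msg.get("seq")
        if seq is None:
            problems.append(f"record {index}: missing seq")
            continue
        if seq in seen:
            problems.append(f"record {index}: duplicate seq {seq}")
        seen.add(seq)
    if seen:
        # Assumes sequence numbers should be contiguous between min and max.
        gaps = sorted(set(range(min(seen), max(seen) + 1)) - seen)
        if gaps:
            problems.append(f"missing seq values: {gaps}")
    return problems
```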

Usage

python conversation_deduplicator.py