---
title: "Add core scripts to path"
component_type: script
version: "1.0.0"
audience: contributor
status: stable
summary: "CODITECT Conversation Export Deduplicator - CLI Tool"
keywords: ['deduplicate', 'export', 'review', 'validation']
tokens: ~500
created: 2025-12-22
updated: 2025-12-22
script_name: "deduplicate_export.py"
language: python
executable: true
usage: "python3 scripts/deduplicate_export.py [options]"
python_version: "3.10+"
dependencies: []
modifies_files: false
network_access: false
requires_auth: false
---
CODITECT Conversation Export Deduplicator - CLI Tool
User-friendly command-line interface for deduplicating conversation exports. Supports single-file processing, batch directory processing, session statistics, and integrity checks.
PRODUCTION-GRADE ERROR HANDLING:
- Custom exception hierarchy (7 exceptions)
- Dual logging (file + stdout)
- Atomic file operations (temp → rename pattern)
- Backup before modifications
- Data integrity verification (checksums)
- Hash collision detection
- Resource cleanup with finally blocks
- Input validation
- Standardized exit codes (0/1/130)
- User-friendly error messages
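As a rough illustration of how an exception hierarchy and the exit codes listed above can fit together, here is a minimal sketch. The class names come from the class list in this document. That the other exceptions subclass `DedupError`, the meaning assigned to each exit code, and the `run` helper are all assumptions made for the example, not the script's confirmed design.

```python
import sys


class DedupError(Exception):
    """Base exception for deduplication operations."""


class SourceFileError(DedupError):
    """Export file not found or unreadable."""


class HashCollisionError(DedupError):
    """Hash collision detected during deduplication."""


# Exit codes from the list above; the meanings are assumed from convention.
EXIT_SUCCESS = 0
EXIT_ERROR = 1
EXIT_INTERRUPTED = 130  # 128 + SIGINT (2), the usual shell convention


def run(task) -> int:
    """Hypothetical helper: run a callable and map its outcome to an exit code."""
    try:
        task()
    except KeyboardInterrupt:
        print("Interrupted by user", file=sys.stderr)
        return EXIT_INTERRUPTED
    except DedupError as exc:
        # Catching the base class covers every exception in the hierarchy.
        print(f"Error: {exc}", file=sys.stderr)
        return EXIT_ERROR
    return EXIT_SUCCESS
```

A script entry point could then end with `sys.exit(run(main))`, so that every failure path produces one of the three documented codes.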
Usage:
    deduplicate-export --file export.json --session-id my-session
    deduplicate-export --batch MEMORY-CONTEXT/exports/
    deduplicate-export --stats --session-id my-session
    deduplicate-export --integrity --storage-dir MEMORY-CONTEXT/dedup_state
Author: Claude + AZ1.AI
License: MIT
File: deduplicate_export.py
Classes
DedupError
Base exception for deduplication operations
SourceFileError
Export file not found or unreadable
HashCollisionError
Hash collision detected during deduplication
ProcessingError
Deduplication processing failure
BackupError
Backup creation or restoration failure
OutputError
Output file write failure
DataIntegrityError
Data integrity verification failure
GracefulExit
Handle graceful shutdown on SIGINT/SIGTERM
Colors
ANSI color codes for terminal output
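The `GracefulExit` class listed above is described only as handling SIGINT/SIGTERM. One common way to do that is to record the signal in a flag and let the processing loop decide when to stop. The sketch below assumes that approach; the `requested` attribute and `_handle` method are hypothetical names.

```python
import signal


class GracefulExit:
    """Record SIGINT/SIGTERM so long-running loops can stop cleanly."""

    def __init__(self) -> None:
        self.requested = False  # hypothetical attribute name
        # Handlers can only be installed from the main thread.
        signal.signal(signal.SIGINT, self._handle)
        signal.signal(signal.SIGTERM, self._handle)

    def _handle(self, signum, frame) -> None:
        # Only set a flag; the main loop decides when it is safe to stop.
        self.requested = True
```

A batch loop could check `graceful_exit.requested` between files and break out before starting the next one, so that no file is left half-processed.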
Functions
setup_logging(log_dir, verbose)
Configure dual logging (file + stdout).
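A minimal sketch of dual logging with the standard `logging` module follows. The signature mirrors the one listed above, but the logger name, log file name, format string, and the choice to let `verbose` control only the console level are assumptions.

```python
import logging
import sys
from pathlib import Path


def setup_logging(log_dir: Path, verbose: bool = False) -> logging.Logger:
    """Log to a file in log_dir and to stdout at the same time."""
    log_dir.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("deduplicate_export")  # assumed logger name
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    # The file handler keeps full detail regardless of verbosity.
    file_handler = logging.FileHandler(
        log_dir / "deduplicate_export.log", encoding="utf-8"
    )
    file_handler.setLevel(logging.DEBUG)
    file_handler.setFormatter(formatter)

    # The console handler shows DEBUG output only in verbose mode.
    console = logging.StreamHandler(sys.stdout)
    console.setLevel(logging.DEBUG if verbose else logging.INFO)
    console.setFormatter(formatter)

    logger.addHandler(file_handler)
    logger.addHandler(console)
    return logger
```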
compute_file_checksum(filepath)
Compute SHA-256 checksum of file.
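A SHA-256 file checksum can be computed with `hashlib` as sketched below. Reading in chunks and the `chunk_size` parameter are assumptions added so that large exports need not be loaded into memory; the script's own implementation may differ.

```python
import hashlib
from pathlib import Path


def compute_file_checksum(filepath: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(filepath, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```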
create_backup(filepath, logger)
Create timestamped backup of file.
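One way to create a timestamped backup is to copy the file next to the original with the time embedded in the name. The sketch below omits the `logger` parameter for brevity, and the naming scheme (`<name>.<timestamp>.bak`) is an assumption, not the script's documented format.

```python
import shutil
from datetime import datetime
from pathlib import Path


def create_backup(filepath: Path) -> Path:
    """Copy filepath to a timestamped sibling and return the backup path."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    # Assumed naming scheme: export.json -> export.json.20251222_101500.bak
    backup_path = filepath.with_name(f"{filepath.name}.{stamp}.bak")
    shutil.copy2(filepath, backup_path)  # copy2 also preserves file metadata
    return backup_path
```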
atomic_write(filepath, content, logger)
Atomically write content to file using temp + rename.
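The temp + rename pattern can be sketched as follows: write to a temporary file in the same directory, flush it to disk, then rename it over the target, so readers see either the old file or the complete new one. The `logger` parameter is omitted here, and details such as the `.tmp` suffix and the `fsync` call are assumptions.

```python
import os
import tempfile
from pathlib import Path


def atomic_write(filepath: Path, content: str) -> None:
    """Write content to a temp file, then rename it over filepath."""
    filepath = Path(filepath)
    # Create the temp file in the target directory so the rename stays
    # on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=filepath.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(content)
            handle.flush()
            os.fsync(handle.fileno())  # push data to disk before the rename
        # os.replace overwrites the target; the rename is atomic on POSIX.
        os.replace(tmp_name, filepath)
    finally:
        # After a successful replace the temp name no longer exists, so this
        # only removes leftovers from a failed write.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
```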
verify_data_integrity(original_file, processed_file, logger)
Verify data integrity after processing.
print_header(text)
Print colored header.
print_success(text)
Print success message.
print_error(text)
Print error message.
print_warning(text)
Print warning message.
print_info(text)
Print info message.
extract_session_id_from_filename(filepath)
Extract session ID from export filename.
parse_export_file(filepath, logger)
Parse export file and convert to standard format.
process_single_file(filepath, session_id, dedup, dry_run, verbose, logger, graceful_exit)
Process a single export file with comprehensive error handling.
process_batch(directory, dedup, dry_run, verbose, logger, graceful_exit)
Process all export files in a directory.
show_statistics(session_id, dedup)
Display statistics for a session.
Usage
python3 scripts/deduplicate_export.py [options]
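The flags shown in the usage examples (`--file`, `--batch`, `--stats`, `--integrity`, `--session-id`, `--storage-dir`) suggest an `argparse` parser along the following lines. This is a sketch only: the `build_parser` name, the help strings, and the choice to make the four modes mutually exclusive are assumptions, and the real script may accept further options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(
        prog="deduplicate-export",
        description="CODITECT Conversation Export Deduplicator",
    )
    # Assumption: exactly one operating mode is chosen per invocation.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--file", help="single export file to process")
    mode.add_argument("--batch", help="directory of export files to process")
    mode.add_argument("--stats", action="store_true", help="show session statistics")
    mode.add_argument("--integrity", action="store_true", help="run integrity checks")

    parser.add_argument("--session-id", help="session identifier")
    parser.add_argument("--storage-dir", help="deduplication state directory")
    return parser
```

Calling `build_parser().parse_args()` would then expose the values as `args.file`, `args.session_id`, and so on, since `argparse` converts dashes in option names to underscores.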