Skip to main content

---

title: "Add core scripts to path" component_type: script version: "1.0.0" audience: contributor status: stable summary: "CODITECT Conversation Export Deduplicator - CLI Tool" keywords: ['deduplicate', 'export', 'review', 'validation'] tokens: ~500 created: 2025-12-22 updated: 2025-12-22 script_name: "deduplicate_export.py" language: python executable: true usage: "python3 scripts/deduplicate_export.py [options]" python_version: "3.10+" dependencies: [] modifies_files: false network_access: false requires_auth: false

CODITECT Conversation Export Deduplicator - CLI Tool

User-friendly command-line interface for conversation export deduplication. Supports single file, batch directory processing, statistics, and integrity checks.

PRODUCTION-GRADE ERROR HANDLING:

  • Custom exception hierarchy (7 exceptions)
  • Dual logging (file + stdout)
  • Atomic file operations (temp → rename pattern)
  • Backup before modifications
  • Data integrity verification (checksums)
  • Hash collision detection
  • Resource cleanup with finally blocks
  • Input validation
  • Standardized exit codes (0/1/130)
  • User-friendly error messages

Usage: deduplicate-export --file export.json --session-id my-session deduplicate-export --batch MEMORY-CONTEXT/exports/ deduplicate-export --stats --session-id my-session deduplicate-export --integrity --storage-dir MEMORY-CONTEXT/dedup_state

Author: Claude + AZ1.AI License: MIT

File: deduplicate_export.py

Classes

DedupError

Base exception for deduplication operations

SourceFileError

Export file not found or unreadable

HashCollisionError

Hash collision detected during deduplication

ProcessingError

Deduplication processing failure

BackupError

Backup creation or restoration failure

OutputError

Output file write failure

DataIntegrityError

Data integrity verification failure

GracefulExit

Handle graceful shutdown on SIGINT/SIGTERM

Colors

ANSI color codes for terminal output

Functions

setup_logging(log_dir, verbose)

Configure dual logging (file + stdout).

compute_file_checksum(filepath)

Compute SHA-256 checksum of file.

create_backup(filepath, logger)

Create timestamped backup of file.

atomic_write(filepath, content, logger)

Atomically write content to file using temp + rename.

verify_data_integrity(original_file, processed_file, logger)

Verify data integrity after processing.

Print colored header

Print success message

Print error message

Print warning message

Print info message

extract_session_id_from_filename(filepath)

Extract session ID from export filename.

parse_export_file(filepath, logger)

Parse export file and convert to standard format.

process_single_file(filepath, session_id, dedup, dry_run, verbose, logger, graceful_exit)

Process a single export file with comprehensive error handling.

process_batch(directory, dedup, dry_run, verbose, logger, graceful_exit)

Process all export files in a directory.

show_statistics(session_id, dedup)

Display statistics for a session.

Usage

python deduplicate_export.py