
Deduplicate all tables in context.db - ensure unique entries.

Removes duplicate entries across all major tables:

  • decisions (by decision text)
  • code_patterns (by code content)
  • doc_index (by file_path) - already has UNIQUE constraint

Usage:

    python3 scripts/deduplicate-all-tables.py          # Dry run
    python3 scripts/deduplicate-all-tables.py --apply  # Apply changes
    python3 scripts/deduplicate-all-tables.py --stats  # Show statistics only

File: deduplicate-all-tables.py

Functions

get_all_stats(conn)

Get statistics for all major tables.
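
A minimal sketch of what such a statistics pass might look like, assuming the database is SQLite. The table names come from the list at the top of this page. The key column names (decision, code) and the function name get_all_stats_sketch are assumptions made for illustration, not the script's actual schema or code.

```python
import sqlite3

# Table names are from this page; the key columns for decisions and
# code_patterns are ASSUMED for illustration (file_path is documented).
KEY_COLUMNS = {
    "decisions": "decision",
    "code_patterns": "code",
    "doc_index": "file_path",
}


def get_all_stats_sketch(conn: sqlite3.Connection) -> dict:
    """Return {table: {"total", "unique", "duplicates"}} for each table."""
    stats = {}
    for table, column in KEY_COLUMNS.items():
        # COUNT(DISTINCT ...) skips NULLs, so rows with a NULL key are
        # reported as duplicates by this simple subtraction.
        total, unique = conn.execute(
            f"SELECT COUNT(*), COUNT(DISTINCT {column}) FROM {table}"
        ).fetchone()
        stats[table] = {
            "total": total,
            "unique": unique,
            "duplicates": total - unique,
        }
    return stats
```

The table and column names are interpolated from a fixed dictionary rather than user input, which keeps the f-string query safe in this sketch.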

deduplicate_decisions(conn, dry_run)

Deduplicate decisions table by decision text.
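
One common way to deduplicate by a text column in SQLite is to keep the lowest-rowid row in each group and delete the rest. The sketch below follows that approach under stated assumptions: the column is named decision, the table is an ordinary rowid table, and the function name is hypothetical. The real script may choose which row to keep differently.

```python
import sqlite3


def deduplicate_decisions_sketch(conn: sqlite3.Connection, dry_run: bool = True) -> int:
    """Count duplicate rows by decision text; delete them unless dry_run."""
    # ASSUMPTION: the text column is called "decision" and the table has
    # an implicit rowid. The earliest row of each group is kept.
    where = "rowid NOT IN (SELECT MIN(rowid) FROM decisions GROUP BY decision)"
    count = conn.execute(
        f"SELECT COUNT(*) FROM decisions WHERE {where}"
    ).fetchone()[0]
    if not dry_run and count:
        conn.execute(f"DELETE FROM decisions WHERE {where}")
        conn.commit()
    return count
```

In a dry run the function only reports how many rows would be removed, which mirrors the default behaviour described in the usage section.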

deduplicate_code_patterns(conn, dry_run)

Deduplicate code_patterns table by code content.
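
Before deleting anything, it can help to see which code values are stored more than once. The following is a hedged sketch of such a report, assuming a column named code. Both the column name and the helper duplicate_code_groups are hypothetical and not part of the documented script.

```python
import sqlite3


def duplicate_code_groups(conn: sqlite3.Connection, preview_chars: int = 40) -> list:
    """Return (preview, copies) for every code value stored more than once."""
    # ASSUMPTION: the content column is called "code".
    rows = conn.execute(
        "SELECT code, COUNT(*) AS copies FROM code_patterns "
        "GROUP BY code HAVING COUNT(*) > 1 "
        "ORDER BY copies DESC, code"
    ).fetchall()
    # Truncate long snippets so the report stays readable.
    return [((code or "")[:preview_chars], copies) for code, copies in rows]
```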

add_unique_constraints(conn, dry_run)

Add UNIQUE constraints to prevent future duplicates.
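
SQLite cannot add a UNIQUE constraint to an existing table with ALTER TABLE, so a typical approach is to create a unique index instead. The sketch below assumes that approach. The index names, column names, and function name are illustrative assumptions, and the script itself may do this another way.

```python
import sqlite3


def add_unique_indexes_sketch(conn: sqlite3.Connection, dry_run: bool = True) -> list:
    """Return the index statements; execute them unless dry_run."""
    # ASSUMPTION: column and index names are hypothetical.
    # doc_index is skipped because it already has a UNIQUE constraint.
    statements = [
        "CREATE UNIQUE INDEX IF NOT EXISTS idx_decisions_unique "
        "ON decisions(decision)",
        "CREATE UNIQUE INDEX IF NOT EXISTS idx_code_patterns_unique "
        "ON code_patterns(code)",
    ]
    if not dry_run:
        for statement in statements:
            # Fails with sqlite3.IntegrityError if duplicates remain,
            # so deduplication has to run first.
            conn.execute(statement)
        conn.commit()
    return statements
```

Once the indexes exist, inserting a duplicate value raises sqlite3.IntegrityError, which is what prevents future duplicates.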

vacuum_database(conn)

Vacuum the database to reclaim space.
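
Deleting rows does not shrink a SQLite file on its own under the default settings. The freed pages are returned to the filesystem by VACUUM. A minimal sketch, assuming Python's sqlite3 module with its default transaction handling (the function name is hypothetical):

```python
import sqlite3


def vacuum_database_sketch(conn: sqlite3.Connection) -> None:
    """Rebuild the database file so that freed pages are released."""
    # VACUUM cannot run inside an open transaction, so commit first.
    conn.commit()
    conn.execute("VACUUM")
```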

main()

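Command-line entry point. Performs a dry run by default; --apply applies the changes and --stats shows statistics only.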
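
The flags listed in the usage section (--apply and --stats) could be parsed along these lines. This is a sketch using argparse, not the script's actual main(). The helper name parse_args_sketch and the derived dry_run attribute are assumptions.

```python
import argparse


def parse_args_sketch(argv=None) -> argparse.Namespace:
    """Parse the documented flags; dry run is the default mode."""
    parser = argparse.ArgumentParser(
        description="Deduplicate all tables in context.db"
    )
    parser.add_argument(
        "--apply", action="store_true",
        help="Apply changes (without this flag, nothing is modified)",
    )
    parser.add_argument(
        "--stats", action="store_true",
        help="Show statistics only",
    )
    args = parser.parse_args(argv)
    # Hypothetical convenience attribute matching the dry_run parameters
    # of the functions documented above.
    args.dry_run = not args.apply
    return args
```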

Usage

python3 scripts/deduplicate-all-tables.py