# ADR 003: Use Perceptual Hashing for Frame Deduplication
## Status
Accepted
## Context
Video frames are extracted at configurable rates (e.g., 0.5 fps), resulting in many similar frames. We need to:
- Remove duplicates to reduce storage and processing
- Preserve content changes (slide transitions, scene changes)
- Process efficiently (potentially thousands of frames)
## Decision
Use perceptual hashing (pHash) with Hamming-distance comparison: keep a frame only when its 64-bit hash differs from the last kept frame's hash by more than a configurable threshold.
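For reference, the fingerprint itself comes from a low-frequency DCT of the downscaled grayscale frame. The sketch below follows the common pHash description (32×32 input, top-left 8×8 DCT block, median thresholding); it is a stdlib-only illustration and may differ in detail from the `imagehash` implementation, which should be used in practice:

```python
import math


def phash_bits(pixels):
    """Illustrative pHash over a 32x32 grayscale grid (list of lists of floats).

    Computes a 2D DCT-II, keeps the top-left 8x8 low-frequency block,
    and thresholds each coefficient against the median (DC term excluded).
    """
    N = 32
    dct = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (pixels[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            dct[u][v] = s
    # Flatten, drop the DC coefficient, threshold against the median.
    coeffs = [dct[u][v] for u in range(8) for v in range(8)][1:]
    med = sorted(coeffs)[len(coeffs) // 2]
    return [1 if c > med else 0 for c in coeffs]
```

Because only low frequencies survive, small per-pixel changes (compression noise, slight lighting shifts) rarely flip bits, while structural changes (a new slide) flip many.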
## Consequences
### Positive
- Fast computation (O(n) for n frames)
- Robust to minor changes (compression, slight variations)
- 64-bit fingerprints enable fast comparison
- Tunable sensitivity via the Hamming-distance threshold
- No ML model required
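The "fast comparison" of 64-bit fingerprints amounts to an XOR plus a popcount. A stdlib-only sketch (the hash values here are invented for illustration; `imagehash` overloads `-` to do the same computation):

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two 64-bit fingerprints."""
    return bin(a ^ b).count("1")


h1 = 0xD1C3A5F00F5A3C1B  # hypothetical pHash of frame A
h2 = h1 ^ 0b101          # the same hash with two bits flipped
print(hamming(h1, h2))   # -> 2
```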
### Negative
- May miss semantic duplicates (different visuals, same content)
- Not robust to major transformations (crops, rotations)
- Threshold requires tuning per content type
## Algorithm Details
```python
import imagehash
from PIL import Image


def phash_dedup(frames, threshold=10):
    """Keep only frames whose pHash differs from the last kept frame's
    hash by more than `threshold` bits (Hamming distance)."""
    unique = []
    last_hash = None
    for frame in frames:
        img = Image.open(frame.path)
        current = imagehash.phash(img)  # 64-bit perceptual hash
        # ImageHash overloads `-` as the Hamming distance between hashes.
        if last_hash is None or current - last_hash > threshold:
            unique.append(frame)
            last_hash = current  # compare future frames to the last *kept* frame
    return unique
```
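To see how the threshold interacts with the scan without decoding any images, the same keep/drop logic can be run over precomputed hashes (the hash values below are invented for illustration):

```python
def dedup_hashes(hashes, threshold=10):
    """Same scan as phash_dedup, but over precomputed 64-bit hash ints."""
    kept, last = [], None
    for i, h in enumerate(hashes):
        # Hamming distance via XOR + popcount.
        if last is None or bin(h ^ last).count("1") > threshold:
            kept.append(i)
            last = h
    return kept


hashes = [
    0xFFFF000000000000,  # frame 0: always kept (first frame)
    0xFFFF000000000007,  # frame 1: 3 bits from frame 0 -> near-duplicate, dropped
    0x0000FFFF00000000,  # frame 2: 32 bits from frame 0 -> scene change, kept
]
print(dedup_hashes(hashes))  # -> [0, 2]
```

Note that dropped frames do not advance `last`, so a slow drift across many near-duplicate frames is still measured against the last frame that was actually kept.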
## Alternatives Considered
| Alternative | Pros | Cons |
|---|---|---|
| SSIM | More accurate | Much slower |
| CNN embeddings | Semantic understanding | Requires GPU, model |
| Pixel diff | Fast | Not robust to encoding |
| aHash/dHash | Faster | Less accurate |
| Feature matching | Handles transforms | Complex, slow |
## Threshold Guidelines
| Content Type | Recommended Threshold (bits) |
|---|---|
| Slides/Presentations | 8-10 |
| Screen recordings | 10-12 |
| Talking head | 12-15 |
| Dynamic video | 15-20 |
## Notes
Consider running SSIM on pHash-filtered candidates for higher accuracy if needed.