# ADR 003: Use Perceptual Hashing for Frame Deduplication
## Status
Accepted
## Context
Video frames are extracted at configurable rates (e.g., 0.5 fps), resulting in many similar frames. We need to:
- Remove duplicates to reduce storage and processing
- Preserve content changes (slide transitions, scene changes)
- Process efficiently (potentially thousands of frames)
## Decision
Use perceptual hashing (pHash) with Hamming-distance comparison: keep a frame only when its 64-bit hash differs from the last kept frame's hash by more than a configurable threshold.
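For reference, the fingerprint itself comes from a low-frequency DCT of the downscaled grayscale frame. The sketch below follows the common pHash description (32×32 input, top-left 8×8 DCT block, median thresholding); it is a stdlib-only illustration and may differ in detail from the `imagehash` implementation, which should be used in practice:

```python
import math


def phash_bits(pixels):
    """Illustrative pHash over a 32x32 grayscale grid (list of lists of floats).

    Computes a 2D DCT-II, keeps the top-left 8x8 low-frequency block,
    and thresholds each coefficient against the median (DC term excluded).
    """
    N = 32
    dct = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (pixels[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            dct[u][v] = s
    # Flatten, drop the DC coefficient, threshold against the median.
    coeffs = [dct[u][v] for u in range(8) for v in range(8)][1:]
    med = sorted(coeffs)[len(coeffs) // 2]
    return [1 if c > med else 0 for c in coeffs]
```

Because only low frequencies survive, small per-pixel changes (compression noise, slight lighting shifts) rarely flip bits, while structural changes (a new slide) flip many.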
## Consequences
### Positive
- Fast computation (O(n) for n frames)
- Robust to minor changes (compression, slight variations)
- 64-bit fingerprints enable fast comparison
- Tunable sensitivity via the Hamming-distance threshold
- No ML model required
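The "fast comparison" of 64-bit fingerprints amounts to an XOR plus a popcount. A stdlib-only sketch (the hash values here are invented for illustration; `imagehash` overloads `-` to do the same computation):

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two 64-bit fingerprints."""
    return bin(a ^ b).count("1")


h1 = 0xD1C3A5F00F5A3C1B  # hypothetical pHash of frame A
h2 = h1 ^ 0b101          # the same hash with two bits flipped
print(hamming(h1, h2))   # -> 2
```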
### Negative
- May miss semantic duplicates (different visuals, same content)
- Not robust to major transformations (crops, rotations)
- Threshold requires tuning per content type
## Algorithm Details
```python
import imagehash
from PIL import Image


def phash_dedup(frames, threshold=10):
    """Keep only frames whose pHash differs from the last kept frame's
    hash by more than `threshold` bits (Hamming distance)."""
    unique = []
    last_hash = None
    for frame in frames:
        img = Image.open(frame.path)
        current = imagehash.phash(img)  # 64-bit perceptual hash
        # ImageHash overloads `-` as the Hamming distance between hashes.
        if last_hash is None or current - last_hash > threshold:
            unique.append(frame)
            last_hash = current  # compare future frames to the last *kept* frame
    return unique
```
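To see how the threshold interacts with the scan without decoding any images, the same keep/drop logic can be run over precomputed hashes (the hash values below are invented for illustration):

```python
def dedup_hashes(hashes, threshold=10):
    """Same scan as phash_dedup, but over precomputed 64-bit hash ints."""
    kept, last = [], None
    for i, h in enumerate(hashes):
        # Hamming distance via XOR + popcount.
        if last is None or bin(h ^ last).count("1") > threshold:
            kept.append(i)
            last = h
    return kept


hashes = [
    0xFFFF000000000000,  # frame 0: always kept (first frame)
    0xFFFF000000000007,  # frame 1: 3 bits from frame 0 -> near-duplicate, dropped
    0x0000FFFF00000000,  # frame 2: 32 bits from frame 0 -> scene change, kept
]
print(dedup_hashes(hashes))  # -> [0, 2]
```

Note that dropped frames do not advance `last`, so a slow drift across many near-duplicate frames is still measured against the last frame that was actually kept.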
## Alternatives Considered
| Alternative | Pros | Cons |
|---|---|---|
| SSIM | More accurate | Much slower |
| CNN embeddings | Semantic understanding | Requires GPU, model |
| Pixel diff | Fast | Not robust to encoding |
| aHash/dHash | Faster | Less accurate |
| Feature matching | Handles transforms | Complex, slow |
## Threshold Guidelines
| Content Type | Recommended Threshold (bits) |
|---|---|
| Slides/Presentations | 8-10 |
| Screen recordings | 10-12 |
| Talking head | 12-15 |
| Dynamic video | 15-20 |
## Notes
Consider running SSIM on pHash-filtered candidates for higher accuracy if needed.