ADR 003: Use Perceptual Hashing for Frame Deduplication

Status

Accepted

Context

Video frames are extracted at a configurable rate (e.g., 0.5 fps), so many consecutive frames are nearly identical. We need to:

  • Remove duplicates to reduce storage and processing
  • Preserve content changes (slide transitions, scene changes)
  • Process efficiently (potentially thousands of frames)

Decision

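Use perceptual hashing (pHash) with Hamming distance comparison: a frame is kept only when its hash differs from the hash of the last kept frame by more than a configurable threshold.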
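
To illustrate the comparison step: the Hamming distance between two hashes is the number of bit positions in which they differ. The sketch below treats each 64-bit hash as a plain Python integer. `hamming_distance` is a hypothetical helper name used only here, not part of the `imagehash` API, which exposes the same quantity through subtraction of two hash objects.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions in which two integer hashes differ."""
    # XOR leaves a 1 in every position where the two hashes disagree;
    # counting those 1 bits gives the Hamming distance.
    return bin(hash_a ^ hash_b).count("1")


# Two 64-bit fingerprints that differ in exactly three bit positions.
reference = 0x0F0F0F0F0F0F0F0F
candidate = reference ^ 0b111
distance = hamming_distance(reference, candidate)
```

With a threshold of 10, a distance of 3 would count as a duplicate and the candidate frame would be dropped.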

Consequences

Positive

  • Fast computation: one hash and one comparison per frame, so O(n) for n frames
  • Robust to minor changes (compression, slight variations)
  • 64-bit fingerprints enable fast comparison
  • Tunable through a single Hamming distance threshold
  • No ML model required

Negative

  • May miss semantic duplicates (different visuals, same content)
  • Not robust to major transformations (crops, rotations)
  • Threshold requires tuning per content type

Algorithm Details

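import imagehash
from PIL import Image


def phash_dedup(frames, threshold=10):
    """Keep frames whose pHash differs from the last kept frame by more than `threshold` bits."""
    unique = []
    last_hash = None  # hash of the most recently kept frame

    for frame in frames:
        # Close each image file promptly; there may be thousands of frames.
        with Image.open(frame.path) as img:
            current = imagehash.phash(img)  # 64-bit hash with the default hash size

        # Subtracting two ImageHash objects yields their Hamming distance.
        if last_hash is None or (current - last_hash) > threshold:
            unique.append(frame)
            last_hash = current

    return unique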
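
The effect of the threshold can be traced without any image files. The sketch below runs the same loop over precomputed hashes given as plain integers. `dedup_hashes` and the sample hash values are made up for illustration and are not part of the pipeline.

```python
def dedup_hashes(hashes, threshold=10):
    """Return the indices of hashes kept by the last-kept-frame comparison."""
    kept = []
    last = None  # hash of the most recently kept entry
    for index, value in enumerate(hashes):
        # Hamming distance: XOR, then count the differing bits.
        if last is None or bin(value ^ last).count("1") > threshold:
            kept.append(index)
            last = value
    return kept


base = 0x0F0F0F0F0F0F0F0F
near = base ^ 0b111      # differs from base in 3 bits
far = base ^ 0xFFFFFF    # differs from base in 24 bits
kept = dedup_hashes([base, near, base, far, far ^ 0b1])
```

Only the first frame and the first frame after the large change are kept. Because each frame is compared against the last kept frame rather than its immediate predecessor, a slow drift of small changes still accumulates until it crosses the threshold.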

Alternatives Considered

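| Alternative      | Pros                   | Cons                |
|------------------|------------------------|---------------------|
| SSIM             | More accurate          | Much slower         |
| CNN embeddings   | Semantic understanding | Requires GPU, model |
| Pixel diff       | Fast                   | Not robust to encoding |
| aHash/dHash      | Faster                 | Less accurate       |
| Feature matching | Handles transforms     | Complex, slow       |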

Threshold Guidelines

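| Content Type         | Recommended Threshold (Hamming distance) |
|----------------------|------------------------------------------|
| Slides/Presentations | 8-10                                     |
| Screen recordings    | 10-12                                    |
| Talking head         | 12-15                                    |
| Dynamic video        | 15-20                                    |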
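
One way to carry these guidelines into configuration is a lookup table keyed by content type. The names below (`THRESHOLD_RANGES`, `pick_threshold`, the content-type keys and the `aggressive` flag) are assumptions for this sketch, not existing project settings. The ranges are copied from the table above.

```python
# Recommended Hamming distance ranges per content type (low, high).
THRESHOLD_RANGES = {
    "slides": (8, 10),
    "screen_recording": (10, 12),
    "talking_head": (12, 15),
    "dynamic_video": (15, 20),
}


def pick_threshold(content_type, aggressive=False):
    """Pick a threshold from the recommended range for a content type.

    The low end keeps more frames; the high end removes more near-duplicates.
    """
    if content_type not in THRESHOLD_RANGES:
        raise ValueError(f"unknown content type: {content_type!r}")
    low, high = THRESHOLD_RANGES[content_type]
    return high if aggressive else low
```

Since a frame is kept only when its distance exceeds the threshold, starting at the low end of a range is the safer default when missing a content change is worse than storing an extra frame.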

Notes

Consider running SSIM on pHash-filtered candidates for higher accuracy if needed.
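
A rough sketch of one possible reading of such a second pass follows: frames that survived pHash filtering are compared with a structural similarity score, and a frame is dropped when it is still too similar to the last kept one. Everything here is an assumption for illustration. `global_ssim` computes the SSIM formula once over the whole image instead of averaging over local windows as standard SSIM does (for example `structural_similarity` in scikit-image), frames are represented as flat lists of 8-bit grayscale values, and the `min_ssim` value of 0.95 is a placeholder, not a tuned setting.

```python
# Stabilising constants from the SSIM definition for 8-bit images.
C1 = (0.01 * 255) ** 2
C2 = (0.03 * 255) ** 2


def global_ssim(x, y):
    """Single-window SSIM over two equal-length grayscale pixel sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + C1) * (2 * cov + C2)
    denominator = (mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2)
    return numerator / denominator


def refine_with_ssim(frames, min_ssim=0.95):
    """Drop frames that are still too similar to the last kept frame."""
    kept = []
    for pixels in frames:
        if not kept or global_ssim(kept[-1], pixels) < min_ssim:
            kept.append(pixels)
    return kept


original = [10, 200, 30, 180] * 16
brighter = [p + 1 for p in original]     # nearly identical frame
inverted = [255 - p for p in original]   # structurally different frame
survivors = refine_with_ssim([original, brighter, inverted])
```

In this toy input the slightly brighter frame is removed and the inverted frame is kept. The slower comparison only runs on frames that pHash has already let through, which keeps the added cost bounded.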