ADR 007: Configurable Whisper Model Sizes

Status

Accepted

Context

The pipeline supports local Whisper transcription, but different use cases require different trade-offs between speed and accuracy. We need to support multiple model sizes (tiny, base, small, medium, large-v3) with automatic optimization based on hardware and requirements.

Decision

Support configurable Whisper model sizes with automatic optimization selection.

Model Options

| Model    | Size  | VRAM  | RAM    | Speed        | Accuracy (WER) | Use Case     |
|----------|-------|-------|--------|--------------|----------------|--------------|
| tiny     | 39M   | ~1GB  | ~500MB | 32x realtime | 58.0           | Quick tests  |
| base     | 74M   | ~1GB  | ~1GB   | 16x realtime | 48.0           | Balanced     |
| small    | 244M  | ~2GB  | ~2GB   | 6x realtime  | 38.0           | Good quality |
| medium   | 769M  | ~5GB  | ~5GB   | 2x realtime  | 32.0           | High quality |
| large-v3 | 1550M | ~10GB | ~10GB  | 1x realtime  | 25.0           | Best quality |
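
The table's approximate memory figures can be encoded as a lookup for the auto-selection logic. The names below (`MODEL_RAM_GB`, `fits_in_memory`) are illustrative, not part of the accepted decision:

```python
# Approximate RAM needed per model, in GB (from the table above).
MODEL_RAM_GB = {
    "tiny": 0.5,
    "base": 1.0,
    "small": 2.0,
    "medium": 5.0,
    "large-v3": 10.0,
}

def fits_in_memory(model_size: str, available_gb: float) -> bool:
    """Return True if the model's approximate RAM footprint fits."""
    return MODEL_RAM_GB[model_size] <= available_gb
```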

Configuration

from typing import Literal

class TranscriptionConfig:
    model_size: Literal["tiny", "base", "small", "medium", "large-v3", "auto"]
    device: Literal["cpu", "cuda", "auto"]
    compute_type: Literal["int8", "float16", "float32"]

Auto-Selection Logic

def select_optimal_model(available_memory_gb: float, priority: str) -> str:
    """
    Select the optimal model size for the available hardware and priority.

    Args:
        available_memory_gb: Available system memory in GB
        priority: "speed" | "quality" | "balanced"

    Returns:
        Model size string
    """
    if priority == "speed":
        return "base" if available_memory_gb >= 1 else "tiny"
    elif priority == "quality":
        if available_memory_gb >= 10:
            return "large-v3"
        if available_memory_gb >= 5:
            return "medium"
        if available_memory_gb >= 2:
            return "small"
        return "base"
    else:  # balanced
        if available_memory_gb >= 2:
            return "small"
        if available_memory_gb >= 1:
            return "base"
        return "tiny"
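
The configuration also exposes device: "auto". A companion sketch for resolving it, assuming torch is installed for the CUDA check (with a CPU fallback when it is not):

```python
def select_device(requested: str = "auto") -> str:
    """Resolve device="auto" to a concrete device string.

    Falls back to CPU when torch is absent or no GPU is visible.
    torch is an assumed optional dependency, not mandated by this ADR.
    """
    if requested != "auto":
        return requested
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        return "cpu"
```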

Consequences

Positive

  • Users can optimize for their hardware
  • Supports resource-constrained environments
  • Enables quality-first processing when needed
  • Backward compatible (defaults to "base")

Negative

  • Requires model download on first run
  • Different models produce slightly different outputs
  • More configuration complexity

Implementation

import whisper  # openai-whisper

class TranscriptionEngine:
    def __init__(
        self,
        model_size: str = "auto",
        device: str = "auto",
        priority: str = "balanced",
    ):
        if model_size == "auto":
            import psutil
            # Convert available bytes to GB before consulting the selection logic.
            memory = psutil.virtual_memory().available / (1024**3)
            model_size = select_optimal_model(memory, priority)

        # load_model accepts an explicit device; None lets whisper pick one.
        self.model = whisper.load_model(
            model_size,
            device=None if device == "auto" else device,
        )
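
The compute_type option from the configuration is not wired up above. One plausible default policy (an assumption, not part of the accepted decision) is int8 on CPU and float16 on GPU:

```python
def default_compute_type(device: str) -> str:
    """Pick a sensible precision for the resolved device.

    int8 keeps CPU inference fast and small; float16 suits CUDA GPUs.
    This policy is a sketch, not prescribed by the ADR.
    """
    return "float16" if device == "cuda" else "int8"
```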

Notes

  • Models download automatically from HuggingFace on first use
  • Cached in ~/.cache/whisper/
  • Can be overridden via an environment variable, e.g. WHISPER_MODEL_SIZE=medium
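
The environment override from the last note could be read like this (a sketch; only the WHISPER_MODEL_SIZE variable name comes from the notes above, `resolve_model_size` is a hypothetical helper):

```python
import os

def resolve_model_size(configured: str = "auto") -> str:
    """Let the WHISPER_MODEL_SIZE environment variable override the config."""
    return os.environ.get("WHISPER_MODEL_SIZE", configured)
```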