# ADR 007: Configurable Whisper Model Sizes

## Status

Accepted

## Context
The pipeline supports local Whisper transcription, but different use cases require different trade-offs between speed and accuracy. We need to support multiple model sizes (tiny, base, small, medium, large-v3) with automatic optimization based on hardware and requirements.
## Decision

Support configurable Whisper model sizes with automatic optimization selection.

### Model Options
| Model | Parameters | VRAM | RAM | Speed | WER (lower is better) | Use Case |
|---|---|---|---|---|---|---|
| tiny | 39M | ~1 GB | ~500 MB | ~32x realtime | 58.0 | Quick tests |
| base | 74M | ~1 GB | ~1 GB | ~16x realtime | 48.0 | Balanced |
| small | 244M | ~2 GB | ~2 GB | ~6x realtime | 38.0 | Good quality |
| medium | 769M | ~5 GB | ~5 GB | ~2x realtime | 32.0 | High quality |
| large-v3 | 1550M | ~10 GB | ~10 GB | ~1x realtime | 25.0 | Best quality |
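The RAM column above can double as a fit check before loading a model. A minimal sketch; the `MODEL_REQUIREMENTS_GB` table and `fits_in_memory` helper are illustrative names, not part of the pipeline:

```python
# Approximate RAM requirements (GB) taken from the table above; illustrative only.
MODEL_REQUIREMENTS_GB = {
    "tiny": 0.5,
    "base": 1.0,
    "small": 2.0,
    "medium": 5.0,
    "large-v3": 10.0,
}

def fits_in_memory(model_size: str, available_memory_gb: float) -> bool:
    """Check whether a model's approximate RAM footprint fits in available memory."""
    return MODEL_REQUIREMENTS_GB[model_size] <= available_memory_gb
```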
### Configuration

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class TranscriptionConfig:
    model_size: Literal["tiny", "base", "small", "medium", "large-v3", "auto"]
    device: Literal["cpu", "cuda", "auto"]
    compute_type: Literal["int8", "float16", "float32"]
```
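Note that `Literal` annotations are hints only and are not enforced at runtime. A minimal runtime validation sketch; the `VALID_SIZES` set and `validate_model_size` helper are illustrative:

```python
VALID_SIZES = {"tiny", "base", "small", "medium", "large-v3", "auto"}

def validate_model_size(value: str) -> str:
    """Reject model sizes outside the supported set before loading."""
    if value not in VALID_SIZES:
        raise ValueError(f"unsupported model_size: {value!r}")
    return value
```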
### Auto-Selection Logic

```python
def select_optimal_model(available_memory_gb: float, priority: str) -> str:
    """Select the optimal model size based on hardware and priority.

    Args:
        available_memory_gb: Available system memory in GB.
        priority: "speed" | "quality" | "balanced"

    Returns:
        Model size string.
    """
    if priority == "speed":
        return "base" if available_memory_gb >= 1 else "tiny"
    elif priority == "quality":
        if available_memory_gb >= 10:
            return "large-v3"
        if available_memory_gb >= 5:
            return "medium"
        if available_memory_gb >= 2:
            return "small"
        return "base"
    else:  # balanced
        if available_memory_gb >= 2:
            return "small"
        if available_memory_gb >= 1:
            return "base"
        return "tiny"
```
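A few spot checks illustrate the thresholds. The function body is restated in condensed form here so the snippet runs standalone; behavior matches the selection logic above:

```python
def select_optimal_model(available_memory_gb: float, priority: str) -> str:
    # Condensed restatement of the selection logic above.
    if priority == "speed":
        return "base" if available_memory_gb >= 1 else "tiny"
    if priority == "quality":
        for threshold, model in [(10, "large-v3"), (5, "medium"), (2, "small")]:
            if available_memory_gb >= threshold:
                return model
        return "base"
    # balanced
    if available_memory_gb >= 2:
        return "small"
    return "base" if available_memory_gb >= 1 else "tiny"

print(select_optimal_model(16.0, "quality"))   # large-v3
print(select_optimal_model(6.0, "quality"))    # medium
print(select_optimal_model(4.0, "balanced"))   # small
print(select_optimal_model(0.5, "speed"))      # tiny
```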
## Consequences

### Positive

- Users can optimize for their hardware
- Supports resource-constrained environments
- Enables quality-first processing when needed
- Backward compatible (defaults to "base")

### Negative

- Requires model download on first run
- Different models produce slightly different outputs
- More configuration complexity
## Implementation

```python
import psutil
import whisper

class TranscriptionEngine:
    def __init__(
        self,
        model_size: str = "auto",
        device: str = "auto",
        priority: str = "balanced",
    ):
        if model_size == "auto":
            # Pick a model that fits in the currently available RAM.
            memory_gb = psutil.virtual_memory().available / (1024 ** 3)
            model_size = select_optimal_model(memory_gb, priority)
        # Passing device=None lets whisper fall back to CUDA when available.
        self.model = whisper.load_model(
            model_size, device=None if device == "auto" else device
        )
```
## Notes

- Models download automatically from HuggingFace on first use
- Weights are cached in `~/.cache/whisper/`
- Allow overriding the model size via an environment variable, e.g. `WHISPER_MODEL_SIZE=medium`
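The environment override could be wired in like this. A sketch only; the `resolve_model_size` helper name is an assumption:

```python
import os

def resolve_model_size(configured: str = "auto") -> str:
    """WHISPER_MODEL_SIZE, when set, takes precedence over the configured value."""
    return os.environ.get("WHISPER_MODEL_SIZE", configured)
```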