Updates Summary - Video-to-Knowledge Pipeline v1.2
Overview
Major enhancements adding interactive CLI, configurable Whisper models, Kimi Vision support, and presentation deck generation.
🆕 New Features
1. Interactive CLI (interactive_cli.py)
Launch with:
python -m src.pipeline
Features:
- 🎬 Process new video - Full guided setup wizard
- ⚡ Quick process - Use saved defaults
- 📦 Manage Whisper models - Download/view/manage models
- ⚙️ Configuration - View/reset settings
- Guided prompts for URL, model selection, FPS, output options
Configuration Persistence:
- Saves settings to ~/.config/video-pipeline/config.json
- Remembers last URL, model preferences, output format
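The persistence behavior above can be sketched as a small load/save pair. This is an illustrative sketch only; the function names and the exact keys written by interactive_cli.py are assumptions, not the actual API.

```python
import json
from pathlib import Path

# Default location matches the path described above.
CONFIG_PATH = Path.home() / ".config" / "video-pipeline" / "config.json"

def save_config(settings: dict, path: Path = CONFIG_PATH) -> None:
    """Persist CLI settings (last URL, model preference, output format) as JSON."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2))

def load_config(path: Path = CONFIG_PATH) -> dict:
    """Return previously saved settings, or an empty dict on first run."""
    if path.exists():
        return json.loads(path.read_text())
    return {}
```

On the next launch, the CLI can pre-fill its prompts from `load_config()` instead of asking for everything again.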
2. Whisper Model Manager (model_manager.py)
Supported Models:
| Model | Params | RAM | Disk | Relative speed |
|---|---|---|---|---|
| tiny | 39M | 0.5GB | 75MB | 32x |
| base | 74M | 1GB | 142MB | 16x |
| small | 244M | 2GB | 461MB | 6x |
| medium | 769M | 5GB | 1.4GB | 2x |
| large-v3 | 1550M | 10GB | 2.9GB | 1x |
Commands:
# List models
python -m src.pipeline models
# Download specific model
python -m src.pipeline models --download small
# Download all models
python -m src.pipeline models --download-all
# Interactive model manager
python -m src.pipeline models
Auto-Selection Logic:
- Detects available RAM using psutil
- Selects optimal model based on priority (speed/quality/balanced)
- Falls back gracefully to smaller models
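The selection logic above can be sketched as follows. The real code reads available memory via `psutil.virtual_memory().available`; here the core choice is a pure function so the memory figure is passed in. The headroom factors and the speed-priority behavior are illustrative assumptions, not the actual model_manager.py thresholds.

```python
# RAM requirements per model, from the table above (GB).
MODEL_RAM_GB = {"tiny": 0.5, "base": 1, "small": 2, "medium": 5, "large-v3": 10}

def select_model(available_gb: float, priority: str = "balanced") -> str:
    """Pick the largest Whisper model that fits, falling back toward 'tiny'."""
    # Leave headroom so transcription does not exhaust system memory.
    budget = available_gb * (0.5 if priority == "speed" else 0.8)
    for name in ("large-v3", "medium", "small", "base", "tiny"):
        if MODEL_RAM_GB[name] <= budget:
            if priority == "speed" and name in ("large-v3", "medium"):
                continue  # skip slow models when speed is prioritized
            return name
    return "tiny"  # graceful fallback on very constrained machines
```

With 16 GB free and `priority="quality"` this lands on `large-v3`; with 4 GB it falls back to `small`.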
3. Kimi Vision Integration (kimi_vision.py)
Alternative to Claude for frame analysis.
Setup:
export KIMI_API_KEY="your-key"
# or
export MOONSHOT_API_KEY="your-key"
Usage:
# Auto-detect (prefers Kimi if key available)
python -m src.pipeline "video.mp4"
# Force Kimi
python -m src.pipeline "video.mp4" --vision kimi
# Force Claude
python -m src.pipeline "video.mp4" --vision claude
Features:
- Same ImageAnalysis output format as Claude
- Rate limiting with semaphore
- Retry with exponential backoff
- Base64 image encoding
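Three of the behaviors listed above (provider auto-detection, base64 frame encoding, and retry with exponential backoff) can be sketched like this. Function names are illustrative assumptions, not the actual kimi_vision.py API.

```python
import base64
import os
import time

def pick_vision_provider(env=None) -> str:
    """Auto-detect: prefer Kimi when a Kimi/Moonshot key is present."""
    env = os.environ if env is None else env
    if env.get("KIMI_API_KEY") or env.get("MOONSHOT_API_KEY"):
        return "kimi"
    return "claude"

def encode_frame(image_bytes: bytes) -> str:
    """Base64-encode a frame for the vision API request body."""
    return base64.b64encode(image_bytes).decode("ascii")

def with_backoff(call, retries: int = 3, base_delay: float = 0.5):
    """Retry a flaky API call, doubling the delay after each failure."""
    for attempt in range(retries):
        try:
            return call()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * 2 ** attempt)
```

In the real module the retry wrapper runs inside an asyncio semaphore so that only a bounded number of frame-analysis requests are in flight at once.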
4. Presentation Deck Generator (deck_generator.py)
Creates interactive HTML presentations with synchronized transcript and frames.
Formats:
- HTML (default) - Full-featured interactive deck
- Markdown - Simple markdown slides
- Reveal.js - Slideshow using the Reveal.js framework
Usage:
# Generate HTML deck
python -m src.pipeline "video.mp4" --deck
# Specify format
python -m src.pipeline "video.mp4" --deck --deck-format html
# Open in browser
python -m src.pipeline deck outputs/deck/presentation.html
HTML Deck Features:
- ⏱️ Synchronized playback - Auto-advances based on transcript timing
- 🖼️ Dual view - Left: video frame, Right: transcript
- ⌨️ Keyboard navigation - Arrow keys, Home, End
- ▶️ Speed control - 1x, 1.5x, 2x playback
- 📊 Progress bar - Visual progress indicator
- 🎨 Dark theme - Easy on the eyes
File Structure:
outputs/deck/
├── presentation.html # Main deck file
├── presentation_data.json # Slide data
└── deck_images/ # Copied frame images
├── slide_000.png
├── slide_001.png
└── ...
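The synchronized playback above needs per-slide timing, which presentation_data.json presumably carries. A hedged sketch of how transcript segments could be paired with copied frames; the actual JSON schema and field names may differ.

```python
def build_slides(segments, frame_paths):
    """Pair each transcript segment with a frame and its display duration."""
    slides = []
    for i, seg in enumerate(segments):
        slides.append({
            "index": i,
            # Reuse the last frame if there are more segments than frames.
            "image": frame_paths[min(i, len(frame_paths) - 1)],
            "text": seg["text"],
            "start": seg["start"],
            # Auto-advance waits this long (divided by the playback speed).
            "duration": seg["end"] - seg["start"],
        })
    return slides
```

The HTML deck's JavaScript can then step through this list, showing each frame on the left and its transcript text on the right for `duration / speed` seconds.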
🔧 Enhanced Configuration
New PipelineConfig Options
PipelineConfig(
# Transcription backend
transcription_backend="local", # "local" or "openai-api"
# Whisper configuration
whisper_model="auto", # "tiny", "base", "small", "medium", "large-v3", "auto"
whisper_priority="balanced", # "speed", "quality", "balanced"
whisper_device="auto", # "cpu", "cuda", "auto"
# Vision provider
vision_provider="auto", # "auto", "kimi", "claude"
# Deck generation
create_deck=True, # Generate presentation deck
deck_format="html", # "html", "markdown", "revealjs"
)
Environment Variables
# Whisper
export WHISPER_MODEL_SIZE=medium
export WHISPER_PRIORITY=quality
export WHISPER_DEVICE=cuda
# Vision APIs
export KIMI_API_KEY="..."
export MOONSHOT_API_KEY="..."
export ANTHROPIC_API_KEY="..."
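One plausible way these Whisper variables feed the config, sketched with the defaults from PipelineConfig above; the actual precedence between environment variables and explicit config values is an assumption here.

```python
import os

def whisper_settings(env=None) -> dict:
    """Read Whisper options from the environment, with pipeline defaults."""
    env = os.environ if env is None else env
    return {
        "model": env.get("WHISPER_MODEL_SIZE", "auto"),
        "priority": env.get("WHISPER_PRIORITY", "balanced"),
        "device": env.get("WHISPER_DEVICE", "auto"),
    }
```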
📊 File Statistics
| Component | Files | Lines | Purpose |
|---|---|---|---|
| Core Modules | 14 | ~4,150 | Main implementation |
| Documentation | 13 | ~2,800 | SDD, TDD, ADRs, C4 |
| Total | 27 | ~6,950 | Complete system |
New Files Added
- src/model_manager.py (280 lines) - Model download & management
- src/interactive_cli.py (320 lines) - Interactive menu system
- src/kimi_vision.py (300 lines) - Kimi Vision API integration
- src/deck_generator.py (580 lines) - HTML/Markdown/Reveal.js decks
- src/audio_processor_local.py (320 lines) - Local Whisper with config
- docs/adr/007-configurable-whisper-models.md - ADR
- docs/adr/008-kimi-vision.md - ADR
Updated Files
- src/models.py - Added WhisperModel, TranscriptionPriority enums
- src/pipeline.py - Interactive CLI, deck generation, Kimi support
- README.md - Updated usage instructions
- PROJECT_SUMMARY.md - Updated feature list
- docs/00_document_inventory.md - Added new ADRs and modules
🚀 Quick Start Examples
1. Interactive Mode (Recommended)
python -m src.pipeline
# Follow the guided prompts
2. Quick Process with Specific Model
# Download small model first
python -m src.pipeline models --download small
# Process video
python -m src.pipeline process "video.mp4" --whisper-model small
3. Use Kimi + Generate Deck
export KIMI_API_KEY="..."
python -m src.pipeline process "video.mp4" \
--vision kimi \
--whisper-model small \
--deck \
--deck-format html
# Open deck
python -m src.pipeline deck outputs/deck/presentation.html
4. Batch Model Download
# Download recommended models
python -m src.pipeline models --download small
python -m src.pipeline models --download medium
# Or all at once
python -m src.pipeline models --download-all
📝 Documentation Updates
- ADR-007: Configurable Whisper Model Sizes
- ADR-008: Kimi Vision for Frame Analysis
- README.md: Added interactive mode, model management, deck generation
- PROJECT_SUMMARY.md: Updated feature list and module counts
🔮 Future Enhancements
Potential next features:
- Real-time streaming processing
- Speaker diarization (who's speaking)
- Multi-language translation
- Video Q&A chatbot
- Custom deck templates
- Export to PowerPoint
Updated: 2026-02-05
Version: 1.2