Updates Summary - Video-to-Knowledge Pipeline v1.2

Overview

Major enhancements adding interactive CLI, configurable Whisper models, Kimi Vision support, and presentation deck generation.


🆕 New Features

1. Interactive CLI (interactive_cli.py)

Launch with:

python -m src.pipeline

Features:

  • 🎬 Process new video - Full guided setup wizard
  • Quick process - Use saved defaults
  • 📦 Manage Whisper models - Download/view/manage models
  • ⚙️ Configuration - View/reset settings
  • Guided prompts for URL, model selection, FPS, output options

Configuration Persistence:

  • Saves settings to ~/.config/video-pipeline/config.json
  • Remembers last URL, model preferences, output format
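
The persistence described above can be sketched in a few lines. The path matches the one given; the function names are illustrative, not the actual `interactive_cli.py` API:

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".config" / "video-pipeline" / "config.json"

def save_config(settings: dict, path: Path = CONFIG_PATH) -> None:
    """Persist settings as JSON, creating the directory on first use."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2))

def load_config(path: Path = CONFIG_PATH) -> dict:
    """Return previously saved settings, or an empty dict on first run."""
    if path.exists():
        return json.loads(path.read_text())
    return {}
```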

2. Whisper Model Manager (model_manager.py)

Supported Models:

| Model    | Params | RAM   | Disk  | Speed |
|----------|--------|-------|-------|-------|
| tiny     | 39M    | 0.5GB | 75MB  | 32x   |
| base     | 74M    | 1GB   | 142MB | 16x   |
| small    | 244M   | 2GB   | 461MB | 6x    |
| medium   | 769M   | 5GB   | 1.4GB | 2x    |
| large-v3 | 1550M  | 10GB  | 2.9GB | 1x    |

Commands:

# List models / open the interactive model manager
python -m src.pipeline models

# Download specific model
python -m src.pipeline models --download small

# Download all models
python -m src.pipeline models --download-all

Auto-Selection Logic:

  • Detects available RAM using psutil
  • Selects optimal model based on priority (speed/quality/balanced)
  • Falls back gracefully to smaller models
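
A minimal sketch of that selection logic, using the RAM figures from the table above (in the pipeline, `available_gb` would come from `psutil.virtual_memory().available / 2**30`; the priority handling here is illustrative, not the module's exact rule):

```python
# RAM required per model in GB, taken from the table above.
MODEL_RAM = {"tiny": 0.5, "base": 1, "small": 2, "medium": 5, "large-v3": 10}
SIZES = ["tiny", "base", "small", "medium", "large-v3"]

def auto_select(available_gb: float, priority: str = "balanced") -> str:
    """Pick a model that fits in RAM; bias one size down when speed matters."""
    fitting = [m for m in SIZES if MODEL_RAM[m] <= available_gb] or ["tiny"]
    if priority == "speed" and len(fitting) > 1:
        return fitting[-2]   # one size smaller than the largest that fits
    return fitting[-1]       # largest model that fits
```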

3. Kimi Vision Integration (kimi_vision.py)

Alternative to Claude for frame analysis.

Setup:

export KIMI_API_KEY="your-key"
# or
export MOONSHOT_API_KEY="your-key"

Usage:

# Auto-detect (prefers Kimi if key available)
python -m src.pipeline "video.mp4"

# Force Kimi
python -m src.pipeline "video.mp4" --vision kimi

# Force Claude
python -m src.pipeline "video.mp4" --vision claude

Features:

  • Same ImageAnalysis output format as Claude
  • Rate limiting with semaphore
  • Retry with exponential backoff
  • Base64 image encoding
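
The rate-limiting, retry, and encoding pieces can be sketched as follows. This is an illustration of the techniques named above, not the actual `kimi_vision.py` code; the semaphore limit and delays are assumed values:

```python
import asyncio
import base64
import random
from pathlib import Path

_sem = asyncio.Semaphore(4)  # cap concurrent API calls (assumed limit)

def encode_image(path: Path) -> str:
    """Base64-encode a frame image for the vision API payload."""
    return base64.b64encode(path.read_bytes()).decode("ascii")

async def call_with_retry(call, retries: int = 3, base_delay: float = 1.0):
    """Run `call` under the semaphore, retrying with exponential backoff."""
    async with _sem:
        for attempt in range(retries):
            try:
                return await call()
            except Exception:
                if attempt == retries - 1:
                    raise
                # jittered exponential backoff: ~1s, ~2s, ~4s, ...
                await asyncio.sleep(base_delay * 2**attempt + random.random() * 0.1)
```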

4. Presentation Deck Generator (deck_generator.py)

Creates interactive HTML presentations with synchronized transcript and frames.

Formats:

  • HTML (default) - Full-featured interactive deck
  • Markdown - Simple markdown slides
  • Reveal.js - Browser slideshow built on the Reveal.js framework

Usage:

# Generate HTML deck
python -m src.pipeline "video.mp4" --deck

# Specify format
python -m src.pipeline "video.mp4" --deck --deck-format html

# Open in browser
python -m src.pipeline deck outputs/deck/presentation.html

HTML Deck Features:

  • ⏱️ Synchronized playback - Auto-advances based on transcript timing
  • 🖼️ Dual view - Left: video frame, Right: transcript
  • ⌨️ Keyboard navigation - Arrow keys, Home, End
  • ▶️ Speed control - 1x, 1.5x, 2x playback
  • 📊 Progress bar - Visual progress indicator
  • 🎨 Dark theme - Easy on the eyes

File Structure:

outputs/deck/
├── presentation.html        # Main deck file
├── presentation_data.json   # Slide data
└── deck_images/             # Copied frame images
    ├── slide_000.png
    ├── slide_001.png
    └── ...
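
To make the structure concrete, a plausible shape for presentation_data.json is one record per slide pairing a copied frame with its transcript segment and timing. The field names here are assumptions, not the generator's actual schema:

```python
import json

# Hypothetical slide records (illustrative field names).
slides = [
    {"index": 0, "image": "deck_images/slide_000.png",
     "start": 0.0, "end": 4.2, "text": "Welcome to the talk."},
    {"index": 1, "image": "deck_images/slide_001.png",
     "start": 4.2, "end": 9.8, "text": "First, the pipeline overview."},
]
payload = json.dumps({"slides": slides}, indent=2)
```

The `start`/`end` timestamps are what would drive the synchronized playback described below.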

🔧 Enhanced Configuration

New PipelineConfig Options

PipelineConfig(
    # Transcription backend
    transcription_backend="local",  # "local" or "openai-api"

    # Whisper configuration
    whisper_model="auto",           # "tiny", "base", "small", "medium", "large-v3", "auto"
    whisper_priority="balanced",    # "speed", "quality", "balanced"
    whisper_device="auto",          # "cpu", "cuda", "auto"

    # Vision provider
    vision_provider="auto",         # "auto", "kimi", "claude"

    # Deck generation
    create_deck=True,               # Generate presentation deck
    deck_format="html",             # "html", "markdown", "revealjs"
)

Environment Variables

# Whisper
export WHISPER_MODEL_SIZE=medium
export WHISPER_PRIORITY=quality
export WHISPER_DEVICE=cuda

# Vision APIs
export KIMI_API_KEY="..."
export MOONSHOT_API_KEY="..."
export ANTHROPIC_API_KEY="..."
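
The "auto" vision provider mirrors the rule stated earlier (prefer Kimi when a key is available). A sketch of that detection, with an illustrative function name:

```python
import os

def detect_vision_provider() -> str:
    """Prefer Kimi when either of its keys is set; fall back to Claude."""
    if os.environ.get("KIMI_API_KEY") or os.environ.get("MOONSHOT_API_KEY"):
        return "kimi"
    if os.environ.get("ANTHROPIC_API_KEY"):
        return "claude"
    raise RuntimeError("No vision API key configured")
```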

📊 File Statistics

| Component     | Files | Lines  | Purpose             |
|---------------|-------|--------|---------------------|
| Core Modules  | 14    | ~4,150 | Main implementation |
| Documentation | 13    | ~2,800 | SDD, TDD, ADRs, C4  |
| Total         | 27    | ~6,950 | Complete system     |

New Files Added

  1. src/model_manager.py (280 lines) - Model download & management
  2. src/interactive_cli.py (320 lines) - Interactive menu system
  3. src/kimi_vision.py (300 lines) - Kimi Vision API integration
  4. src/deck_generator.py (580 lines) - HTML/Markdown/Reveal.js decks
  5. src/audio_processor_local.py (320 lines) - Local Whisper with config
  6. docs/adr/007-configurable-whisper-models.md - ADR
  7. docs/adr/008-kimi-vision.md - ADR

Updated Files

  • src/models.py - Added WhisperModel, TranscriptionPriority enums
  • src/pipeline.py - Interactive CLI, deck generation, Kimi support
  • README.md - Updated usage instructions
  • PROJECT_SUMMARY.md - Updated feature list
  • docs/00_document_inventory.md - Added new ADRs and modules

🚀 Quick Start Examples

1. Interactive Mode

python -m src.pipeline
# Follow the guided prompts

2. Quick Process with Specific Model

# Download small model first
python -m src.pipeline models --download small

# Process video
python -m src.pipeline process "video.mp4" --whisper-model small

3. Use Kimi + Generate Deck

export KIMI_API_KEY="..."

python -m src.pipeline process "video.mp4" \
--vision kimi \
--whisper-model small \
--deck \
--deck-format html

# Open deck
python -m src.pipeline deck outputs/deck/presentation.html

4. Batch Model Download

# Download recommended models
python -m src.pipeline models --download small
python -m src.pipeline models --download medium

# Or all at once
python -m src.pipeline models --download-all

📝 Documentation Updates

  • ADR-007: Configurable Whisper Model Sizes
  • ADR-008: Kimi Vision for Frame Analysis
  • README.md: Added interactive mode, model management, deck generation
  • PROJECT_SUMMARY.md: Updated feature list and module counts

🔮 Future Enhancements

Potential next features:

  • Real-time streaming processing
  • Speaker diarization (who's speaking)
  • Multi-language translation
  • Video Q&A chatbot
  • Custom deck templates
  • Export to PowerPoint

Updated: 2026-02-05
Version: 1.2