Updates Summary - Video-to-Knowledge Pipeline v1.2

Overview

Major enhancements adding interactive CLI, configurable Whisper models, Kimi Vision support, and presentation deck generation.


🆕 New Features

1. Interactive CLI (interactive_cli.py)

Launch with:

python -m src.pipeline

Features:

  • 🎬 Process new video - Full guided setup wizard
  • Quick process - Use saved defaults
  • 📦 Manage Whisper models - Download/view/manage models
  • ⚙️ Configuration - View/reset settings
  • Guided prompts for URL, model selection, FPS, output options

Configuration Persistence:

  • Saves settings to ~/.config/video-pipeline/config.json
  • Remembers last URL, model preferences, output format
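
The persistence described above can be sketched in a few lines. The path matches the one given; the function names are illustrative, not the actual `interactive_cli.py` API:

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".config" / "video-pipeline" / "config.json"

def save_config(settings: dict, path: Path = CONFIG_PATH) -> None:
    """Persist settings as JSON, creating the directory on first use."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2))

def load_config(path: Path = CONFIG_PATH) -> dict:
    """Return previously saved settings, or an empty dict on first run."""
    if path.exists():
        return json.loads(path.read_text())
    return {}
```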

2. Whisper Model Manager (model_manager.py)

Supported Models:

| Model    | Params | RAM   | Disk  | Speed |
|----------|--------|-------|-------|-------|
| tiny     | 39M    | 0.5GB | 75MB  | 32x   |
| base     | 74M    | 1GB   | 142MB | 16x   |
| small    | 244M   | 2GB   | 461MB | 6x    |
| medium   | 769M   | 5GB   | 1.4GB | 2x    |
| large-v3 | 1550M  | 10GB  | 2.9GB | 1x    |

Commands:

# List models / open the interactive model manager
python -m src.pipeline models

# Download specific model
python -m src.pipeline models --download small

# Download all models
python -m src.pipeline models --download-all

Auto-Selection Logic:

  • Detects available RAM using psutil
  • Selects optimal model based on priority (speed/quality/balanced)
  • Falls back gracefully to smaller models
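
A minimal sketch of that selection logic, using the RAM figures from the table above (in the pipeline, `available_gb` would come from `psutil.virtual_memory().available / 2**30`; the priority handling here is illustrative, not the module's exact rule):

```python
# RAM required per model in GB, taken from the table above.
MODEL_RAM = {"tiny": 0.5, "base": 1, "small": 2, "medium": 5, "large-v3": 10}
SIZES = ["tiny", "base", "small", "medium", "large-v3"]

def auto_select(available_gb: float, priority: str = "balanced") -> str:
    """Pick a model that fits in RAM; bias one size down when speed matters."""
    fitting = [m for m in SIZES if MODEL_RAM[m] <= available_gb] or ["tiny"]
    if priority == "speed" and len(fitting) > 1:
        return fitting[-2]   # one size smaller than the largest that fits
    return fitting[-1]       # largest model that fits
```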

3. Kimi Vision Integration (kimi_vision.py)

Alternative to Claude for frame analysis.

Setup:

export KIMI_API_KEY="your-key"
# or
export MOONSHOT_API_KEY="your-key"

Usage:

# Auto-detect (prefers Kimi if key available)
python -m src.pipeline "video.mp4"

# Force Kimi
python -m src.pipeline "video.mp4" --vision kimi

# Force Claude
python -m src.pipeline "video.mp4" --vision claude

Features:

  • Same ImageAnalysis output format as Claude
  • Rate limiting with semaphore
  • Retry with exponential backoff
  • Base64 image encoding
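
The rate-limiting, retry, and encoding pieces can be sketched as follows. This is an illustration of the techniques named above, not the actual `kimi_vision.py` code; the semaphore limit and delays are assumed values:

```python
import asyncio
import base64
import random
from pathlib import Path

_sem = asyncio.Semaphore(4)  # cap concurrent API calls (assumed limit)

def encode_image(path: Path) -> str:
    """Base64-encode a frame image for the vision API payload."""
    return base64.b64encode(path.read_bytes()).decode("ascii")

async def call_with_retry(call, retries: int = 3, base_delay: float = 1.0):
    """Run `call` under the semaphore, retrying with exponential backoff."""
    async with _sem:
        for attempt in range(retries):
            try:
                return await call()
            except Exception:
                if attempt == retries - 1:
                    raise
                # jittered exponential backoff: ~1s, ~2s, ~4s, ...
                await asyncio.sleep(base_delay * 2**attempt + random.random() * 0.1)
```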

4. Presentation Deck Generator (deck_generator.py)

Creates interactive HTML presentations with synchronized transcript and frames.

Formats:

  • HTML (default) - Full-featured interactive deck
  • Markdown - Simple markdown slides
  • Reveal.js - Browser slideshow built on the Reveal.js framework

Usage:

# Generate HTML deck
python -m src.pipeline "video.mp4" --deck

# Specify format
python -m src.pipeline "video.mp4" --deck --deck-format html

# Open in browser
python -m src.pipeline deck outputs/deck/presentation.html

HTML Deck Features:

  • ⏱️ Synchronized playback - Auto-advances based on transcript timing
  • 🖼️ Dual view - Left: video frame, Right: transcript
  • ⌨️ Keyboard navigation - Arrow keys, Home, End
  • ▶️ Speed control - 1x, 1.5x, 2x playback
  • 📊 Progress bar - Visual progress indicator
  • 🎨 Dark theme - Easy on the eyes

File Structure:

outputs/deck/
├── presentation.html        # Main deck file
├── presentation_data.json   # Slide data
└── deck_images/             # Copied frame images
    ├── slide_000.png
    ├── slide_001.png
    └── ...
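
To make the structure concrete, a plausible shape for presentation_data.json is one record per slide pairing a copied frame with its transcript segment and timing. The field names here are assumptions, not the generator's actual schema:

```python
import json

# Hypothetical slide records (illustrative field names).
slides = [
    {"index": 0, "image": "deck_images/slide_000.png",
     "start": 0.0, "end": 4.2, "text": "Welcome to the talk."},
    {"index": 1, "image": "deck_images/slide_001.png",
     "start": 4.2, "end": 9.8, "text": "First, the pipeline overview."},
]
payload = json.dumps({"slides": slides}, indent=2)
```

The `start`/`end` timestamps are what would drive the synchronized playback described below.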

🔧 Enhanced Configuration

New PipelineConfig Options

PipelineConfig(
    # Transcription backend
    transcription_backend="local",  # "local" or "openai-api"

    # Whisper configuration
    whisper_model="auto",           # "tiny", "base", "small", "medium", "large-v3", "auto"
    whisper_priority="balanced",    # "speed", "quality", "balanced"
    whisper_device="auto",          # "cpu", "cuda", "auto"

    # Vision provider
    vision_provider="auto",         # "auto", "kimi", "claude"

    # Deck generation
    create_deck=True,               # Generate presentation deck
    deck_format="html",             # "html", "markdown", "revealjs"
)

Environment Variables

# Whisper
export WHISPER_MODEL_SIZE=medium
export WHISPER_PRIORITY=quality
export WHISPER_DEVICE=cuda

# Vision APIs
export KIMI_API_KEY="..."
export MOONSHOT_API_KEY="..."
export ANTHROPIC_API_KEY="..."
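
The "auto" vision provider mirrors the rule stated earlier (prefer Kimi when a key is available). A sketch of that detection, with an illustrative function name:

```python
import os

def detect_vision_provider() -> str:
    """Prefer Kimi when either of its keys is set; fall back to Claude."""
    if os.environ.get("KIMI_API_KEY") or os.environ.get("MOONSHOT_API_KEY"):
        return "kimi"
    if os.environ.get("ANTHROPIC_API_KEY"):
        return "claude"
    raise RuntimeError("No vision API key configured")
```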

📊 File Statistics

| Component     | Files | Lines  | Purpose             |
|---------------|-------|--------|---------------------|
| Core Modules  | 14    | ~4,150 | Main implementation |
| Documentation | 13    | ~2,800 | SDD, TDD, ADRs, C4  |
| Total         | 27    | ~6,950 | Complete system     |

New Files Added

  1. src/model_manager.py (280 lines) - Model download & management
  2. src/interactive_cli.py (320 lines) - Interactive menu system
  3. src/kimi_vision.py (300 lines) - Kimi Vision API integration
  4. src/deck_generator.py (580 lines) - HTML/Markdown/Reveal.js decks
  5. src/audio_processor_local.py (320 lines) - Local Whisper with config
  6. docs/adr/007-configurable-whisper-models.md - ADR
  7. docs/adr/008-kimi-vision.md - ADR

Updated Files

  • src/models.py - Added WhisperModel, TranscriptionPriority enums
  • src/pipeline.py - Interactive CLI, deck generation, Kimi support
  • README.md - Updated usage instructions
  • PROJECT_SUMMARY.md - Updated feature list
  • docs/00_document_inventory.md - Added new ADRs and modules

🚀 Quick Start Examples

1. Interactive Mode

python -m src.pipeline
# Follow the guided prompts

2. Quick Process with Specific Model

# Download small model first
python -m src.pipeline models --download small

# Process video
python -m src.pipeline process "video.mp4" --whisper-model small

3. Use Kimi + Generate Deck

export KIMI_API_KEY="..."

python -m src.pipeline process "video.mp4" \
--vision kimi \
--whisper-model small \
--deck \
--deck-format html

# Open deck
python -m src.pipeline deck outputs/deck/presentation.html

4. Batch Model Download

# Download recommended models
python -m src.pipeline models --download small
python -m src.pipeline models --download medium

# Or all at once
python -m src.pipeline models --download-all

📝 Documentation Updates

  • ADR-007: Configurable Whisper Model Sizes
  • ADR-008: Kimi Vision for Frame Analysis
  • README.md: Added interactive mode, model management, deck generation
  • PROJECT_SUMMARY.md: Updated feature list and module counts

🔮 Future Enhancements

Potential next features:

  • Real-time streaming processing
  • Speaker diarization (who's speaking)
  • Multi-language translation
  • Video Q&A chatbot
  • Custom deck templates
  • Export to PowerPoint

Updated: 2026-02-05
Version: 1.2