C4 Model - Level 1: System Context

System Context Diagram

System Context Narrative

Primary Actor: Content Creator

The Content Creator is the main user and starts the pipeline by providing a video URL. Typical Content Creators include:

A software engineer wanting to document technical talks
A researcher processing lecture videos
A technical writer generating documentation from demos

Secondary Actors

Knowledge Worker

Consumes the generated artifacts (executive summaries, glossaries) for:

Quick understanding of video content
Research compilation
Knowledge base building

Developer

Uses technical artifacts (SDD, TDD, ADR, C4 diagrams) to:

Understand system architecture
Implement similar patterns
Generate project documentation

External System Relationships

External System	Purpose	Data Flow
YouTube/Vimeo	Video source	Out: URL, In: Video file + metadata
OpenAI API	Speech-to-text transcription	Out: Audio, In: Transcript
Anthropic API	Vision analysis, LLM generation	Out: Images/Prompts, In: Analysis/Text
Serper API	Web search for context expansion	Out: Queries, In: Search results
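The relationships in the table above can also be captured as data, which makes it easier to keep a context diagram and the narrative in sync. The following is a minimal sketch, not the pipeline's actual code: the `ExternalSystem` class, its field names, and the `relationships` helper are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ExternalSystem:
    """One row of the External System Relationships table (hypothetical type)."""
    name: str
    purpose: str
    sends: str      # "Out": data the pipeline sends to the external system
    receives: str   # "In": data the pipeline gets back


EXTERNAL_SYSTEMS = [
    ExternalSystem("YouTube/Vimeo", "Video source", "URL", "Video file + metadata"),
    ExternalSystem("OpenAI API", "Speech-to-text transcription", "Audio", "Transcript"),
    ExternalSystem("Anthropic API", "Vision analysis, LLM generation",
                   "Images/Prompts", "Analysis/Text"),
    ExternalSystem("Serper API", "Web search for context expansion",
                   "Queries", "Search results"),
]


def relationships(pipeline: str = "Video-to-Knowledge Pipeline") -> List[str]:
    """Render each table row as two directed edges, one per direction."""
    lines = []
    for system in EXTERNAL_SYSTEMS:
        lines.append(f"{pipeline} -> {system.name}: {system.sends}")
        lines.append(f"{system.name} -> {pipeline}: {system.receives}")
    return lines


if __name__ == "__main__":
    for line in relationships():
        print(line)
```

Each external system yields one outbound and one inbound edge, so the four rows produce eight relationship lines that a diagramming tool could consume.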

System Boundary

The Video-to-Knowledge Pipeline encapsulates:

Video download and preprocessing
Audio extraction and transcription
Frame sampling and deduplication
Vision-based content analysis
Content synthesis and chunking
Multi-format artifact generation
Document inventory management
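The responsibilities listed above can be read as an ordered chain of stages. The sketch below assumes they run strictly in sequence and pass a shared context dictionary along; the source does not say whether that is the case (audio and frame processing could just as well run in parallel). All stage names, `make_stage`, and `run_pipeline` are hypothetical placeholders, and each stage only records that it ran.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, object]
Stage = Tuple[str, Callable[[Context], Context]]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only records its own name in the context."""
    def run(ctx: Context) -> Context:
        completed = list(ctx.get("completed", []))
        completed.append(name)
        return {**ctx, "completed": completed}
    return (name, run)


# One placeholder per responsibility inside the system boundary.
STAGES: List[Stage] = [make_stage(name) for name in (
    "download_and_preprocess",
    "extract_audio_and_transcribe",
    "sample_and_deduplicate_frames",
    "analyze_frames",
    "synthesize_and_chunk",
    "generate_artifacts",
    "update_document_inventory",
)]


def run_pipeline(url: str) -> Context:
    """Run every stage in order, threading the context through each one."""
    ctx: Context = {"url": url, "completed": []}
    for _name, stage in STAGES:
        ctx = stage(ctx)
    return ctx


if __name__ == "__main__":
    print(run_pipeline("https://example.com/video")["completed"])
```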

The system does NOT include:

Long-term video/audio storage (media is kept only for the duration of processing)
User authentication/authorization
Real-time collaboration features
Custom ML model training

Key Scenarios

Scenario 1: Technical Talk Processing

User provides YouTube URL of conference talk
System downloads video (15 min, 1080p)
Extracts audio and transcribes it into a 2,500-word transcript
Samples 450 frames (on average one every 2 seconds) and deduplicates them to 120 unique frames
Analyzes unique frames for diagrams/code
Generates SDD, TDD, ADR, C4 diagrams
Outputs structured documentation package
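The frame numbers in this scenario imply an average sampling interval of 900 s / 450 = 2 s, and deduplication keeps 120 / 450, or about 27%, of the sampled frames. The source does not say how duplicates are detected. The sketch below swaps in one common approach, an average hash compared by Hamming distance against the last kept frame, purely as an illustration. The `average_hash`, `hamming`, and `deduplicate` functions and the `max_distance` parameter are assumptions, and the tiny 2x2 "frames" stand in for real images.

```python
from typing import List, Optional, Sequence

Frame = Sequence[Sequence[int]]  # grayscale pixel grid, values 0-255


def average_hash(frame: Frame) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the frame mean."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def deduplicate(frames: List[Frame], max_distance: int) -> List[int]:
    """Return indices of kept frames; a frame is dropped when its hash is
    within max_distance bits of the most recently kept frame."""
    kept: List[int] = []
    last_hash: Optional[int] = None
    for index, frame in enumerate(frames):
        current = average_hash(frame)
        if last_hash is None or hamming(current, last_hash) > max_distance:
            kept.append(index)
            last_hash = current
    return kept


# Scenario 1 arithmetic: 15 minutes of video sampled into 450 frames.
interval_seconds = (15 * 60) / 450   # average seconds between sampled frames
kept_ratio = 120 / 450               # share of frames surviving deduplication

# Toy frames: one slide, the same slide with slight noise, then a new slide twice.
slide_a = [[10, 200], [10, 200]]
slide_a_noisy = [[12, 198], [11, 201]]
slide_b = [[200, 10], [200, 10]]
kept_indices = deduplicate([slide_a, slide_a_noisy, slide_b, slide_b], max_distance=0)
```

With these toy frames, the noisy copy of the first slide and the repeated second slide are dropped, so only indices 0 and 2 are kept.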

Scenario 2: Tutorial Video Processing

User provides Vimeo URL of software tutorial
System processes the video, which contains screen recordings
Applies a higher deduplication threshold to handle static screens
Emphasizes UI component detection
Generates step-by-step outline + architecture docs

Data Sensitivity

Data Type	Sensitivity	Handling
Video content	Medium	Ephemeral, deleted after processing
Transcripts	Low	Stored as output artifacts
API keys	High	Environment variables only
Search queries	Low	May be logged for debugging
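Two of the handling rules above, environment-only API keys and ephemeral video storage, translate directly into code. The sketch below is one way to express them with the Python standard library; it is not taken from the pipeline itself. The environment variable names, `load_api_keys`, and `process_ephemerally` are assumptions made for illustration.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict, Mapping, Tuple

# Assumed variable names; the source only says keys live in environment variables.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "SERPER_API_KEY")


def load_api_keys(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read API keys from the environment and fail fast if any are missing."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}


def process_ephemerally(video_bytes: bytes) -> Tuple[Path, bool]:
    """Write the video into a temporary directory that is deleted on exit.

    Returns the path that was used and whether the file existed during
    processing, so callers can verify that nothing is left behind.
    """
    with tempfile.TemporaryDirectory() as workdir:
        video_path = Path(workdir) / "input.mp4"
        video_path.write_bytes(video_bytes)
        existed_during_processing = video_path.exists()
        # Real processing (audio extraction, frame sampling) would happen here.
    return video_path, existed_during_processing
```

Passing the environment as a parameter keeps `load_api_keys` testable without touching real credentials, and the context manager removes the working directory even if processing raises.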
