C4 Model - Level 3: Component Level

Component Diagram: Deduplication Engine

Component Diagram: Artifact Generator

Component Descriptions

Deduplication Engine Components

Frame Loader

  • Responsibility: Load frame files in batches to manage memory
  • Interface: load_batch(paths: List[Path]) -> List[Image]
  • Key Feature: Streaming loader for large video files
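
The batching behaviour described above can be sketched as follows. This is a minimal illustration, not the project's loader: `iter_batches`, `load_in_batches`, and the `load_one` callback are hypothetical names, and the callback stands in for whatever actually decodes a frame (for example a wrapper around PIL's `Image.open`).

```python
from itertools import islice
from pathlib import Path
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(paths: Iterable[Path], batch_size: int) -> Iterator[List[Path]]:
    """Yield successive lists of at most batch_size paths."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(paths)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def load_in_batches(
    paths: Iterable[Path],
    batch_size: int,
    load_one: Callable[[Path], T],
) -> Iterator[List[T]]:
    """Decode one batch at a time so at most batch_size frames are held at once."""
    for batch in iter_batches(paths, batch_size):
        yield [load_one(path) for path in batch]


# Usage with a stand-in loader that only returns the file name (hypothetical paths).
frame_paths = [Path(f"frames/frame_{i:04d}.png") for i in range(5)]
batches = list(load_in_batches(frame_paths, batch_size=2, load_one=lambda p: p.name))
```

Because both functions are generators, a caller that consumes one batch before requesting the next never holds more than one batch of decoded frames.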

Perceptual Hasher

  • Responsibility: Compute 64-bit perceptual hash for each frame
  • Algorithm: pHash (discrete cosine transform based)
  • Interface: compute_hash(image: Image) -> ImageHash
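
To make the DCT-based idea concrete, here is a dependency-free sketch of the core of a pHash-style computation. It assumes the frame has already been converted to grayscale and downscaled to a small square matrix (32x32 here); a library implementation such as `imagehash.phash` handles those steps itself and may differ in detail. The function names are illustrative only.

```python
import math
import random
from statistics import median
from typing import List

Matrix = List[List[float]]


def dct_1d(values: List[float]) -> List[float]:
    """Unnormalized DCT-II of a sequence."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def dct_2d(pixels: Matrix) -> Matrix:
    """Apply the 1-D DCT to every row, then to every column."""
    rows = [dct_1d(row) for row in pixels]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def phash_bits(pixels: Matrix, hash_size: int = 8) -> int:
    """Return a hash_size**2-bit integer built from the low-frequency DCT block."""
    coeffs = dct_2d(pixels)
    low = [coeffs[r][c] for r in range(hash_size) for c in range(hash_size)]
    med = median(low)
    bits = 0
    for value in low:
        # Each coefficient contributes one bit: 1 if it is above the median.
        bits = (bits << 1) | (1 if value > med else 0)
    return bits


# Usage on a synthetic 32x32 grayscale frame (seeded so the result is repeatable).
rng = random.Random(0)
frame = [[rng.random() * 255 for _ in range(32)] for _ in range(32)]
frame_hash = phash_bits(frame)
```

Only the top-left 8x8 block of coefficients is kept because it carries the low-frequency structure of the image, which is what makes the hash tolerant of small pixel-level changes.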

Hash Comparer

  • Responsibility: Calculate Hamming distance between hashes
  • Logic: Fast XOR + bit count operation
  • Interface: distance(hash1: ImageHash, hash2: ImageHash) -> int
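
The XOR plus bit count logic can be sketched on plain integers. This assumes each hash is held as an `int`; the `ImageHash` objects used elsewhere in this document expose the same distance through subtraction.

```python
def hamming_distance(hash1: int, hash2: int) -> int:
    """Count the bit positions in which two integer hashes differ."""
    # XOR leaves a 1 wherever the inputs disagree; counting the 1s gives the distance.
    return bin(hash1 ^ hash2).count("1")


near = hamming_distance(0b1011, 0b0010)            # differs in two positions
far = hamming_distance(0xFFFFFFFFFFFFFFFF, 0x0)    # every bit of a 64-bit hash differs
```

On Python 3.10 or later, `(hash1 ^ hash2).bit_count()` is an equivalent and faster way to count the set bits.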

Duplicate Filter

  • Responsibility: Apply threshold to decide uniqueness
  • Configuration: threshold: int (default 10, a Hamming distance in bits out of 64)
  • Logic: distance > threshold ? unique : duplicate
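
A minimal sketch of the threshold rule, assuming integer hashes and assuming each frame is compared with the most recent unique frame rather than with its immediate predecessor. The function name `filter_duplicates` is hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def filter_duplicates(hashes: Iterable[int], threshold: int = 10) -> List[Tuple[int, bool]]:
    """Pair each hash with True (unique) or False (duplicate)."""
    last_unique: Optional[int] = None
    labelled: List[Tuple[int, bool]] = []
    for current in hashes:
        if last_unique is None:
            is_unique = True  # the first frame has nothing to be a duplicate of
        else:
            distance = bin(current ^ last_unique).count("1")
            is_unique = distance > threshold
        if is_unique:
            last_unique = current
        labelled.append((current, is_unique))
    return labelled


# Usage with a small threshold so the effect is visible on 8-bit toy hashes.
toy_hashes = [0b00000000, 0b00000001, 0b00000111, 0b00001111, 0b11111111]
flags = [unique for _, unique in filter_duplicates(toy_hashes, threshold=2)]
```

Comparing against the last unique frame prevents a slow drift, where every frame is close to its neighbour, from being collapsed into a single frame.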

Unique Frame Writer

  • Responsibility: Tag and write unique frames
  • Metadata: Sets is_unique=true, stores similarity_score
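
One way to store this metadata is a JSON sidecar file next to each unique frame. The sketch below is an assumption about storage, not the documented format: the sidecar layout, the `write_frame_metadata` name, and the example file name are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_frame_metadata(out_dir: Path, frame_name: str, similarity_score: float) -> Path:
    """Write a JSON sidecar that tags one frame as unique."""
    out_dir.mkdir(parents=True, exist_ok=True)
    sidecar = out_dir / f"{Path(frame_name).stem}.json"
    record = {
        "frame": frame_name,
        "is_unique": True,
        "similarity_score": similarity_score,
    }
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar


# Usage inside a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    sidecar_path = write_frame_metadata(Path(tmp) / "unique", "frame_0042.png", 0.6875)
    sidecar_name = sidecar_path.name
    saved = json.loads(sidecar_path.read_text(encoding="utf-8"))
```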

Artifact Generator Components

Template Router

  • Responsibility: Select appropriate template based on artifact type
  • Logic: Maps artifact_type enum to template file
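
The mapping can be as small as deriving a file name from the enum value. In this sketch the enum members follow the artifact types this document lists, but their string values and the `.j2` suffix convention are assumptions.

```python
from enum import Enum


class ArtifactType(Enum):
    # Member values are assumed; they double as template file stems.
    SDD = "sdd"
    TDD = "tdd"
    ADR = "adr"
    C4 = "c4"
    SUMMARY = "summary"
    GLOSSARY = "glossary"


def template_name(artifact_type: ArtifactType) -> str:
    """Return the template file name for an artifact type."""
    return f"{artifact_type.value}.j2"


all_templates = sorted(template_name(t) for t in ArtifactType)
```

Deriving the name from the enum value means adding a new artifact type only requires a new enum member and a matching template file.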

Jinja2 Templates

  • Responsibility: Define structure and prompts for each artifact
  • Types:
    • SDD: System design sections
    • TDD: Technical implementation details
    • ADR: Decision context, decision, consequences
    • C4: C1-C3 diagram definitions
    • Summary: Executive overview
    • Glossary: Term definitions

LLM Renderers

  • Responsibility: Send the rendered template to the Claude API as a prompt
  • Prompt Engineering: Structured prompts with examples
  • Output Parsing: Extract markdown from response
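
Output parsing could look like the sketch below, which assumes the model sometimes wraps its answer in a fence labelled `markdown` or `md` and sometimes returns bare markdown. The function name and that convention are assumptions, not the documented behaviour.

```python
import re

FENCE = "`" * 3  # built indirectly so the example itself nests cleanly inside a fence
_OPEN = re.compile(rf"^{FENCE}(?:markdown|md)[ \t]*$", re.MULTILINE)
_CLOSE = re.compile(rf"^{FENCE}[ \t]*$", re.MULTILINE)


def extract_markdown(response_text: str) -> str:
    """Return the markdown document contained in an LLM reply."""
    opening = _OPEN.search(response_text)
    if opening is None:
        return response_text.strip()  # no wrapper: treat the whole reply as markdown
    closings = [m for m in _CLOSE.finditer(response_text) if m.start() >= opening.end()]
    if not closings:
        return response_text[opening.end():].strip()
    # Use the LAST bare fence so nested blocks (e.g. Mermaid diagrams) survive.
    return response_text[opening.end():closings[-1].start()].strip()


reply = "\n".join([
    "Here is the document:",
    FENCE + "markdown",
    "# Title",
    "",
    FENCE + "mermaid",
    "graph TD",
    FENCE,
    FENCE,
    "Let me know if you need changes.",
])
document = extract_markdown(reply)
```

Matching the last closing fence rather than the first matters here because generated artifacts are expected to contain their own fenced diagrams.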

Post Processor

  • Responsibility: Validate and clean generated markdown
  • Checks:
    • Mermaid syntax validation
    • Markdown link integrity
    • Header hierarchy
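
As an illustration of one of these checks, the sketch below validates header hierarchy only; Mermaid and link validation are not covered. It assumes ATX-style `#` headings and treats a jump of more than one level as a problem. The function name is hypothetical.

```python
import re
from typing import List

FENCE = "`" * 3
_HEADING = re.compile(r"^(#{1,6})\s+\S")


def check_header_hierarchy(markdown: str) -> List[str]:
    """Return one message per heading that skips a level."""
    problems: List[str] = []
    previous_level = 0
    in_code = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_code = not in_code  # ignore '#' lines inside fenced code
            continue
        if in_code:
            continue
        match = _HEADING.match(line)
        if match is None:
            continue
        level = len(match.group(1))
        if level > previous_level + 1:
            problems.append(f"line {number}: h{level} follows h{previous_level}")
        previous_level = level
    return problems


sample = "\n".join(["# Title", "### Skipped", "## Fine", FENCE, "# not a heading", FENCE])
issues = check_header_hierarchy(sample)
```

With this rule a document whose first heading is deeper than level 1 is also reported, since it is compared against an implicit level 0.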

Front Matter Injector

  • Responsibility: Add YAML metadata header
  • Fields: id, type, title, version, date, parent_id
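
A front matter injector for these fields can be sketched without a YAML library. This version assumes every value is written as a double-quoted string, with `null` for missing values; the function name and the example values are hypothetical.

```python
import json
from datetime import date
from typing import Mapping

FIELD_ORDER = ("id", "type", "title", "version", "date", "parent_id")


def inject_front_matter(markdown: str, fields: Mapping[str, object]) -> str:
    """Prepend a YAML front matter block containing the known fields."""
    lines = ["---"]
    for key in FIELD_ORDER:
        value = fields.get(key)
        # json.dumps yields a double-quoted scalar, which is also valid YAML.
        rendered = "null" if value is None else json.dumps(str(value))
        lines.append(f"{key}: {rendered}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + markdown.lstrip("\n")


with_header = inject_front_matter(
    "# Use pHash\n",
    {
        "id": "adr-001",
        "type": "adr",
        "title": "Use pHash: why",
        "version": "1.0",
        "date": date(2024, 1, 15),
        "parent_id": None,
    },
)
```

Quoting every value avoids YAML surprises such as a title containing a colon or a version like 1.0 being read as a number.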

Component Interaction Sequences

Deduplication Sequence

Artifact Generation Sequence

Code-Level Design (C4 Level 3.5)

DeduplicationEngine Class

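from dataclasses import dataclass
from pathlib import Path
from typing import Iterator, List, Optional

import imagehash
from imagehash import ImageHash
from PIL import Image


@dataclass
class FrameResult:
    path: Path
    hash: ImageHash
    is_unique: bool
    similarity_score: float  # 0.0 (all bits differ) to 1.0 (identical hashes)


class DeduplicationEngine:
    """
    Removes perceptually similar video frames.
    """

    def __init__(
        self,
        threshold: int = 10,
        hash_size: int = 8,  # 8x8 = 64-bit hash
        batch_size: int = 100,
    ):
        self.threshold = threshold
        self.hash_size = hash_size
        self.batch_size = batch_size
        # Hash of the most recent unique frame; None until the first frame is seen.
        self._last_hash: Optional[ImageHash] = None

    def process_frames(
        self,
        frame_paths: List[Path]
    ) -> Iterator[FrameResult]:
        """
        Process frames and yield results with deduplication info.

        Yields FrameResult with fields:
        - path: Path to frame
        - hash: Perceptual hash
        - is_unique: Boolean
        - similarity_score: Float 0-1
        """
        for batch in self._batches(frame_paths):
            for path in batch:
                result = self._process_single(path)

                # Update state before yielding so it stays correct even if
                # the consumer stops iterating early.
                if result.is_unique:
                    self._last_hash = result.hash

                yield result

    def _batches(self, paths: List[Path]) -> Iterator[List[Path]]:
        """Split paths into consecutive chunks of at most batch_size."""
        for start in range(0, len(paths), self.batch_size):
            yield paths[start:start + self.batch_size]

    def _process_single(self, path: Path) -> FrameResult:
        # Close the file handle as soon as the hash has been computed.
        with Image.open(path) as image:
            current_hash = imagehash.phash(image, hash_size=self.hash_size)

        if self._last_hash is None:
            # First frame: nothing to compare against, so it is always unique.
            # The score is a placeholder because no comparison took place.
            return FrameResult(
                path=path,
                hash=current_hash,
                is_unique=True,
                similarity_score=1.0
            )

        # Subtracting two ImageHash objects gives their Hamming distance.
        distance = current_hash - self._last_hash
        is_unique = distance > self.threshold
        # Normalize by the hash length in bits (hash_size ** 2), which is the
        # largest possible distance, so the score spans the full 0-1 range.
        similarity = 1.0 - (distance / (self.hash_size ** 2))

        return FrameResult(
            path=path,
            hash=current_hash,
            is_unique=is_unique,
            similarity_score=max(0.0, similarity)
        )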

ArtifactGenerator Class

from pathlib import Path

import jinja2

# ArtifactType, SynthesizedContent, GenerationContext, Artifact and
# AnthropicClient are project types defined elsewhere.


class ArtifactGenerator:
    """
    Generates structured documentation artifacts from content.
    """

    TEMPLATE_DIR = Path("templates/")

    def __init__(self, llm_client: AnthropicClient):
        # messages.create is awaited below, so llm_client is assumed to be an
        # async client (for example anthropic.AsyncAnthropic).
        self.llm = llm_client
        self.env = jinja2.Environment(
            loader=jinja2.FileSystemLoader(self.TEMPLATE_DIR)
        )

    async def generate(
        self,
        artifact_type: ArtifactType,
        content: SynthesizedContent,
        context: GenerationContext
    ) -> Artifact:
        # Load and render template; the file name is derived from the enum value
        template = self.env.get_template(f"{artifact_type.value}.j2")
        prompt = template.render(
            content=content,
            context=context
        )

        # Generate via LLM
        response = await self.llm.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=8192,
            messages=[{"role": "user", "content": prompt}]
        )

        # The first content block of the response holds the generated text
        raw_content = response.content[0].text

        # Post-process
        cleaned = self._validate_markdown(raw_content)

        # Add front matter
        final = self._inject_front_matter(
            cleaned,
            artifact_type=artifact_type,
            content_id=content.id
        )

        return Artifact(
            type=artifact_type,
            content=final,
            metadata=self._extract_metadata(final)
        )