C4 Model - Level 3: Component Level
Component Diagram: Deduplication Engine
Component Diagram: Artifact Generator
Component Descriptions
Deduplication Engine Components
Frame Loader
- Responsibility: Load frame files in batches to manage memory
- Interface: load_batch(paths: List[Path]) -> List[Image]
- Key Feature: Streaming loader for large video files
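The batching behaviour can be sketched as a generator that yields fixed-size slices of the path list, so only one batch of images needs to be held in memory at a time. This is a minimal sketch: the helper name iter_batches and the example paths are illustrative assumptions, not part of the documented interface.

```python
from pathlib import Path
from typing import Iterator, List


def iter_batches(paths: List[Path], batch_size: int = 100) -> Iterator[List[Path]]:
    """Yield consecutive slices of `paths`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(paths), batch_size):
        yield paths[start:start + batch_size]


# Hypothetical frame paths; no files are opened in this sketch.
frame_paths = [Path(f"frames/frame_{i:04d}.png") for i in range(250)]
batch_sizes = [len(batch) for batch in iter_batches(frame_paths, batch_size=100)]
```

Because the function is a generator, a caller that loads and releases images batch by batch keeps peak memory proportional to batch_size rather than to the full frame count.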
Perceptual Hasher
- Responsibility: Compute 64-bit perceptual hash for each frame
- Algorithm: pHash (discrete cosine transform based)
- Interface: compute_hash(image: Image) -> ImageHash
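To make the DCT-based idea concrete, the following pure-Python sketch computes a pHash-style 64-bit value from a 32x32 grayscale matrix: take the 2D DCT, keep the 8x8 low-frequency block, and set one bit per coefficient depending on whether it exceeds the median. It assumes the image has already been resized and converted to grayscale; the function names are hypothetical, and a production implementation would more likely rely on a library such as imagehash.

```python
import math
import random
from statistics import median
from typing import List

Matrix = List[List[float]]


def dct_low_freq(pixels: Matrix, keep: int = 8) -> Matrix:
    """Return the top-left keep x keep block of the 2D DCT-II of a square matrix."""
    n = len(pixels)
    # basis[u][x] = cos(pi * (2x + 1) * u / (2n)) for the first `keep` frequencies
    basis = [[math.cos(math.pi * (2 * x + 1) * u / (2 * n)) for x in range(n)]
             for u in range(keep)]
    # 1D DCT along each row, then along each column, keeping low frequencies only.
    rows = [[sum(row[x] * basis[v][x] for x in range(n)) for v in range(keep)]
            for row in pixels]
    return [[sum(rows[y][v] * basis[u][y] for y in range(n)) for v in range(keep)]
            for u in range(keep)]


def phash_bits(pixels: Matrix, hash_size: int = 8) -> int:
    """Pack (coefficient > median) into a hash_size ** 2 bit integer."""
    low = dct_low_freq(pixels, keep=hash_size)
    flat = [c for row in low for c in row]
    med = median(flat)
    value = 0
    for c in flat:
        value = (value << 1) | (1 if c > med else 0)
    return value


# Synthetic 32x32 frames: a seeded random texture, a brighter copy, and another texture.
rng = random.Random(7)
frame = [[rng.uniform(0.0, 255.0) for _ in range(32)] for _ in range(32)]
brighter = [[p + 10.0 for p in row] for row in frame]
other = [[rng.uniform(0.0, 255.0) for _ in range(32)] for _ in range(32)]

hash_a = phash_bits(frame)
hash_b = phash_bits(brighter)
hash_c = phash_bits(other)
```

A uniform brightness shift only changes the DC coefficient, so hash_a and hash_b should match, while an unrelated texture should produce a different hash.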
Hash Comparer
- Responsibility: Calculate Hamming distance between hashes
- Logic: Fast XOR + bit count operation
- Interface: distance(hash1: ImageHash, hash2: ImageHash) -> int
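The XOR plus bit count logic can be illustrated on hashes stored as plain integers. This is a sketch under that assumption; the ImageHash type named in the interface wraps a boolean array instead, and the function name here is hypothetical.

```python
def hamming_distance(hash1: int, hash2: int) -> int:
    """Count the bit positions in which two integer hashes differ."""
    # XOR leaves a 1 wherever the inputs disagree; counting the 1s gives the distance.
    return bin(hash1 ^ hash2).count("1")


a = 0b10110000
b = 0b10010001
d = hamming_distance(a, b)  # the two values differ in exactly two bit positions
```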
Duplicate Filter
- Responsibility: Apply threshold to decide uniqueness
- Configuration: threshold: int (default 10)
- Logic: distance > threshold ? unique : duplicate
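A minimal sketch of the threshold rule, applied to a sequence of integer hashes. It assumes each frame is compared against the most recent unique frame, which mirrors the code-level design later in this document; the function name and the sample hash values are illustrative.

```python
from typing import Iterable, List, Optional, Tuple


def filter_duplicates(hashes: Iterable[int], threshold: int = 10) -> List[Tuple[int, bool]]:
    """Label each hash as unique (True) or duplicate (False)."""
    last_unique: Optional[int] = None
    labelled = []
    for h in hashes:
        if last_unique is None:
            is_unique = True  # the first frame has nothing to compare against
        else:
            distance = bin(h ^ last_unique).count("1")
            is_unique = distance > threshold
        if is_unique:
            last_unique = h
        labelled.append((h, is_unique))
    return labelled


hashes = [0x0, 0x1, 0xFFFF, 0xFFFE]
flags = [is_unique for _, is_unique in filter_duplicates(hashes, threshold=10)]
```

Here 0x1 is 1 bit away from 0x0 (duplicate), 0xFFFF is 16 bits away from 0x0 (unique), and 0xFFFE is 1 bit away from 0xFFFF (duplicate).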
Unique Frame Writer
- Responsibility: Tag and write unique frames
- Metadata: Sets is_unique=true, stores similarity_score
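One way the tagging step could look is sketched below. The storage format is not specified in this document, so the JSON sidecar file, the function name, and the output directory layout are all assumptions made for illustration.

```python
import json
import shutil
import tempfile
from pathlib import Path


def write_unique_frame(src: Path, out_dir: Path, similarity_score: float) -> Path:
    """Copy a unique frame into out_dir and write a JSON sidecar with its tags."""
    out_dir.mkdir(parents=True, exist_ok=True)
    dest = out_dir / src.name
    shutil.copy2(src, dest)
    sidecar = dest.with_suffix(".json")  # hypothetical sidecar convention
    sidecar.write_text(json.dumps({"is_unique": True, "similarity_score": similarity_score}))
    return sidecar


# Demonstration with a throwaway file standing in for a frame image.
with tempfile.TemporaryDirectory() as tmp:
    frame = Path(tmp) / "frame_0001.png"
    frame.write_bytes(b"not a real image")
    sidecar_path = write_unique_frame(frame, Path(tmp) / "unique", similarity_score=0.42)
    sidecar_name = sidecar_path.name
    metadata = json.loads(sidecar_path.read_text())
```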
Artifact Generator Components
Template Router
- Responsibility: Select appropriate template based on artifact type
- Logic: Maps artifact_type enum to template file
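The enum-to-template mapping might be sketched as follows. The enum member values are assumptions chosen to line up with the six artifact types in this document; the .j2 suffix and the templates directory follow the code-level design later on.

```python
from enum import Enum
from pathlib import Path


class ArtifactType(Enum):
    # Member values are assumed; they double as template file stems.
    SDD = "sdd"
    TDD = "tdd"
    ADR = "adr"
    C4 = "c4"
    SUMMARY = "summary"
    GLOSSARY = "glossary"


TEMPLATE_DIR = Path("templates")


def route_template(artifact_type: ArtifactType) -> Path:
    """Map an artifact type to its template file, e.g. ADR -> templates/adr.j2."""
    return TEMPLATE_DIR / f"{artifact_type.value}.j2"


adr_template = route_template(ArtifactType.ADR)
all_templates = [route_template(t).name for t in ArtifactType]
```

Deriving the file name from the enum value keeps the router free of a separate lookup table, at the cost of coupling enum values to file names.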
Jinja2 Templates
- Responsibility: Define structure and prompts for each artifact
- Types:
- SDD: System design sections
- TDD: Technical implementation details
- ADR: Decision context, decision, consequences
- C4: C1-C3 diagram definitions
- Summary: Executive overview
- Glossary: Term definitions
LLM Renderers
- Responsibility: Render the template into a prompt and send it to the Claude API
- Prompt Engineering: Structured prompts with examples
- Output Parsing: Extract markdown from response
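The output parsing step could be sketched as below, assuming the model sometimes wraps its whole answer in a fence tagged markdown or md. The function name is hypothetical. Only a wrapper around the entire response is removed, so fenced blocks inside the document (for example Mermaid diagrams) are left intact.

```python
FENCE = "`" * 3  # built programmatically so this example can itself sit in a fence
OPENERS = {FENCE + "markdown", FENCE + "md"}


def extract_markdown(response_text: str) -> str:
    """Strip an outer markdown fence from a response, if one wraps the whole text."""
    text = response_text.strip()
    lines = text.splitlines()
    if len(lines) >= 2 and lines[0].strip() in OPENERS and lines[-1].strip() == FENCE:
        return "\n".join(lines[1:-1]).strip()
    return text


wrapped = FENCE + "markdown\n# Title\n\nBody\n" + FENCE
unwrapped = extract_markdown(wrapped)
plain = extract_markdown("  # Plain\n")
```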
Post Processor
- Responsibility: Validate and clean generated markdown
- Checks:
- Mermaid syntax validation
- Markdown link integrity
- Header hierarchy
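As an illustration of the header hierarchy check, the sketch below reports headings that skip a level. It assumes ATX-style headings (lines starting with #) and ignores lines inside fenced code blocks; the function name and message format are hypothetical.

```python
import re
from typing import List

_HEADING = re.compile(r"^(#{1,6})\s+\S")


def find_heading_jumps(markdown: str) -> List[str]:
    """Report headings that skip a level, such as h1 followed directly by h3."""
    problems = []
    previous_level = 0
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith("`" * 3):
            in_fence = not in_fence  # '#' inside code fences is not a heading
            continue
        match = None if in_fence else _HEADING.match(line)
        if not match:
            continue
        level = len(match.group(1))
        if level > previous_level + 1:
            problems.append(f"line {number}: h{level} follows h{previous_level}")
        previous_level = level
    return problems


doc = "# Title\n### Skipped\n## Fine\n"
issues = find_heading_jumps(doc)
```

With this rule a document whose first heading is deeper than h1 is also reported, which may or may not be desirable depending on how artifacts are embedded.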
Front Matter Injector
- Responsibility: Add YAML metadata header
- Fields: id, type, title, version, date, parent_id
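A minimal sketch of front matter injection using only the standard library. It assumes every field is a string or missing, and relies on JSON string syntax being valid YAML for quoting; the function name and the sample values are illustrative, and a real implementation might use a YAML library instead.

```python
import json
from datetime import date
from typing import Dict, Optional

FIELD_ORDER = ("id", "type", "title", "version", "date", "parent_id")


def inject_front_matter(markdown: str, fields: Dict[str, Optional[str]]) -> str:
    """Prepend a YAML front matter block containing the documented fields."""
    lines = ["---"]
    for key in FIELD_ORDER:
        # json.dumps quotes strings (keeping ':' safe) and renders None as null.
        lines.append(f"{key}: {json.dumps(fields.get(key))}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + markdown.lstrip("\n")


result = inject_front_matter(
    "# ADR: Use pHash\n",
    {
        "id": "adr-001",
        "type": "adr",
        "title": "ADR: Use pHash",
        "version": "1.0",
        "date": date(2024, 1, 15).isoformat(),
        "parent_id": None,
    },
)
```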
Component Interaction Sequences
Deduplication Sequence
Artifact Generation Sequence
Code-Level Design (C4 Level 3.5)
DeduplicationEngine Class
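from dataclasses import dataclass
from pathlib import Path
from typing import Iterator, List, Optional

import imagehash
from imagehash import ImageHash
from PIL import Image


@dataclass
class FrameResult:
    """Per-frame deduplication result (fields as listed in process_frames)."""
    path: Path
    hash: ImageHash
    is_unique: bool
    similarity_score: float


class DeduplicationEngine:
    """
    Removes perceptually similar video frames.
    """

    def __init__(
        self,
        threshold: int = 10,
        hash_size: int = 8,  # 8x8 = 64-bit hash
        batch_size: int = 100,
    ):
        self.threshold = threshold
        self.hash_size = hash_size
        self.batch_size = batch_size
        self._last_hash: Optional[ImageHash] = None

    def process_frames(
        self,
        frame_paths: List[Path],
    ) -> Iterator[FrameResult]:
        """
        Process frames and yield results with deduplication info.

        Yields FrameResult with fields:
        - path: Path to frame
        - hash: Perceptual hash
        - is_unique: Boolean
        - similarity_score: Float 0-1
        """
        for batch in self._batches(frame_paths):
            for path in batch:
                result = self._process_single(path)
                # Update state before yielding so it does not depend on
                # when the consumer resumes the generator.
                if result.is_unique:
                    self._last_hash = result.hash
                yield result

    def _batches(self, paths: List[Path]) -> Iterator[List[Path]]:
        # Slice the path list into chunks of at most batch_size.
        for start in range(0, len(paths), self.batch_size):
            yield paths[start:start + self.batch_size]

    def _process_single(self, path: Path) -> FrameResult:
        # Close the file handle as soon as the hash is computed.
        with Image.open(path) as image:
            current_hash = imagehash.phash(image, hash_size=self.hash_size)

        if self._last_hash is None:
            # First frame: nothing to compare against, so it is unique.
            return FrameResult(
                path=path,
                hash=current_hash,
                is_unique=True,
                similarity_score=0.0,
            )

        # ImageHash subtraction returns the Hamming distance in bits.
        distance = current_hash - self._last_hash
        is_unique = distance > self.threshold
        # The maximum distance is the hash length: hash_size ** 2 bits.
        similarity = 1.0 - (distance / (self.hash_size ** 2))
        return FrameResult(
            path=path,
            hash=current_hash,
            is_unique=is_unique,
            similarity_score=max(0.0, similarity),
        )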
ArtifactGenerator Class
from pathlib import Path

import jinja2

# AnthropicClient, ArtifactType, SynthesizedContent, GenerationContext and
# Artifact are project types defined elsewhere. AnthropicClient is assumed to
# be an async client exposing messages.create (for example anthropic.AsyncAnthropic).


class ArtifactGenerator:
    """
    Generates structured documentation artifacts from content.
    """

    TEMPLATE_DIR = Path("templates/")

    def __init__(self, llm_client: AnthropicClient):
        self.llm = llm_client
        self.env = jinja2.Environment(
            loader=jinja2.FileSystemLoader(self.TEMPLATE_DIR)
        )

    async def generate(
        self,
        artifact_type: ArtifactType,
        content: SynthesizedContent,
        context: GenerationContext,
    ) -> Artifact:
        # Load and render template (file name is derived from the enum value)
        template = self.env.get_template(f"{artifact_type.value}.j2")
        prompt = template.render(
            content=content,
            context=context,
        )

        # Generate via LLM
        response = await self.llm.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=8192,
            messages=[{"role": "user", "content": prompt}],
        )
        # The first content block holds the generated text
        raw_content = response.content[0].text

        # Post-process
        cleaned = self._validate_markdown(raw_content)

        # Add front matter
        final = self._inject_front_matter(
            cleaned,
            artifact_type=artifact_type,
            content_id=content.id,
        )

        return Artifact(
            type=artifact_type,
            content=final,
            metadata=self._extract_metadata(final),
        )