
Auto Trigger Configuration (see skills/auto trigger framework/SKILL.md)

Code Summary Generator

How to Use This Skill

  1. Review the patterns and examples below
  2. Apply the relevant patterns to your implementation
  3. Follow the best practices outlined in this skill

Expert skill for generating concise, information-dense summaries of code files that preserve essential context while dramatically reducing token usage. Critical for multi-file implementations where full file history would exceed context limits.

When to Use

Use this skill when:

  • Implementing multiple files and need to preserve context
  • After writing a file, before clearing conversation history
  • Creating checkpoints for resumable implementations
  • Generating implementation documentation
  • You need file context but not full file contents

Don't use this skill when:

  • Working on a single file (full context available)
  • Debugging (need full file details)
  • Code review (need complete implementation)
  • First read of unfamiliar code (need full context)

Core Algorithm

Summary Generation Engine

import re
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Dict, List


class LanguageType(Enum):
    """Supported language types for analysis"""
    PYTHON = "python"
    TYPESCRIPT = "typescript"
    JAVASCRIPT = "javascript"
    RUST = "rust"
    GO = "go"
    UNKNOWN = "unknown"


@dataclass
class CodeSummary:
    """Concise summary of a code file"""
    file_path: str
    language: LanguageType
    purpose: str                 # 1-2 sentence description
    exports: List[str]           # Public API surface
    imports: List[str]           # External dependencies
    key_classes: List[str]       # Main classes/structs
    key_functions: List[str]     # Main functions
    patterns_used: List[str]     # Design patterns detected
    dependencies: List[str]      # Internal file dependencies
    line_count: int
    complexity_score: str        # Low/Medium/High
    token_estimate: int

    def to_compact_string(self) -> str:
        """Generate minimal token representation"""
        lines = [
            f"## {Path(self.file_path).name}",
            f"**Purpose:** {self.purpose}",
        ]

        if self.exports:
            lines.append(f"**Exports:** {', '.join(self.exports[:10])}")

        if self.key_classes:
            lines.append(f"**Classes:** {', '.join(self.key_classes[:5])}")

        if self.key_functions:
            lines.append(f"**Functions:** {', '.join(self.key_functions[:8])}")

        if self.dependencies:
            lines.append(f"**Depends on:** {', '.join(self.dependencies[:5])}")

        lines.append(f"**Lines:** {self.line_count} | **Complexity:** {self.complexity_score}")

        return "\n".join(lines)


class CodeSummaryGenerator:
    """
    Generate token-efficient summaries of code files.

    Key innovation: extract only the information needed to understand
    a file's role and API without reading the full implementation.
    This enables 70-90% token reduction in multi-file workflows.
    """

    # Language detection patterns (first entry is the file-extension pattern)
    LANGUAGE_PATTERNS = {
        LanguageType.PYTHON: [r"\.py$", r"^import ", r"^from .+ import"],
        LanguageType.TYPESCRIPT: [r"\.tsx?$", r"^import .+ from", r": \w+\[\]"],
        LanguageType.JAVASCRIPT: [r"\.jsx?$", r"^const .+ = require"],
        LanguageType.RUST: [r"\.rs$", r"^use ", r"^pub fn", r"^impl "],
        LanguageType.GO: [r"\.go$", r"^package ", r"^func "],
    }

    # Export patterns by language
    EXPORT_PATTERNS = {
        LanguageType.PYTHON: [
            r"^def (\w+)\(",   # Functions
            r"^class (\w+)",   # Classes
            r"^(\w+)\s*=",     # Module-level assignments
        ],
        LanguageType.TYPESCRIPT: [
            r"^export (?:const|let|var|function|class|interface|type|enum) (\w+)",
            r"^export default (?:class |function )?(\w+)",
            r"^export \{ (.+) \}",
        ],
        LanguageType.RUST: [
            r"^pub fn (\w+)",
            r"^pub struct (\w+)",
            r"^pub enum (\w+)",
            r"^pub trait (\w+)",
            r"^pub type (\w+)",
        ],
    }

    # Import patterns by language
    IMPORT_PATTERNS = {
        LanguageType.PYTHON: [
            r"^import (\S+)",
            r"^from (\S+) import",
        ],
        LanguageType.TYPESCRIPT: [
            r"^import .+ from ['\"]([^'\"]+)['\"]",
            r"^import ['\"]([^'\"]+)['\"]",
        ],
        LanguageType.RUST: [
            r"^use (\S+)",
        ],
    }

    def __init__(self):
        self.summaries: Dict[str, CodeSummary] = {}

    def detect_language(self, file_path: str, content: str) -> LanguageType:
        """Detect programming language from file path, falling back to content"""
        # Check extensions first: content heuristics overlap across languages
        # (e.g. `^import ` matches both Python and TypeScript sources)
        for lang, patterns in self.LANGUAGE_PATTERNS.items():
            if re.search(patterns[0], file_path):
                return lang
        for lang, patterns in self.LANGUAGE_PATTERNS.items():
            for pattern in patterns[1:]:
                if re.search(pattern, content, re.MULTILINE):
                    return lang
        return LanguageType.UNKNOWN

    def extract_exports(self, content: str, language: LanguageType) -> List[str]:
        """Extract public API surface (exports)"""
        exports = []
        patterns = self.EXPORT_PATTERNS.get(language, [])

        for pattern in patterns:
            matches = re.findall(pattern, content, re.MULTILINE)
            for match in matches:
                if isinstance(match, tuple):
                    exports.extend([m.strip() for m in match if m.strip()])
                else:
                    # Handle comma-separated exports (e.g. `export { a, b }`)
                    exports.extend([e.strip() for e in match.split(",")])

        # Deduplicate while preserving order; drop private names
        seen = set()
        unique_exports = []
        for exp in exports:
            if exp not in seen and not exp.startswith("_"):
                seen.add(exp)
                unique_exports.append(exp)

        return unique_exports

    def extract_imports(self, content: str, language: LanguageType) -> List[str]:
        """Extract external dependencies"""
        imports = []
        patterns = self.IMPORT_PATTERNS.get(language, [])

        for pattern in patterns:
            matches = re.findall(pattern, content, re.MULTILINE)
            imports.extend(matches)

        # Keep external imports only (relative imports start with ".")
        external = [imp for imp in imports if not imp.startswith(".")]

        return list(set(external))

    def extract_classes(self, content: str, language: LanguageType) -> List[str]:
        """Extract main class definitions"""
        patterns = {
            LanguageType.PYTHON: r"^class (\w+)",
            LanguageType.TYPESCRIPT: r"^(?:export )?class (\w+)",
            LanguageType.RUST: r"^(?:pub )?struct (\w+)",
        }

        pattern = patterns.get(language)
        if pattern:
            return re.findall(pattern, content, re.MULTILINE)
        return []

    def extract_functions(self, content: str, language: LanguageType) -> List[str]:
        """Extract main function definitions"""
        patterns = {
            LanguageType.PYTHON: r"^def (\w+)\(",
            LanguageType.TYPESCRIPT: r"^(?:export )?(?:async )?function (\w+)",
            LanguageType.RUST: r"^(?:pub )?(?:async )?fn (\w+)",
        }

        pattern = patterns.get(language)
        if pattern:
            funcs = re.findall(pattern, content, re.MULTILINE)
            # Filter out private/test functions
            return [f for f in funcs if not f.startswith("_") and not f.startswith("test_")]
        return []

    def detect_patterns(self, content: str) -> List[str]:
        """Detect common design patterns in code"""
        patterns = []

        pattern_indicators = {
            "Singleton": [r"_instance\s*=\s*None", r"getInstance\("],
            "Factory": [r"def create_", r"Factory", r"Builder"],
            "Observer": [r"subscribe\(", r"notify\(", r"addEventListener"],
            "Repository": [r"Repository", r"def get_by_", r"def find_"],
            "Service": [r"Service", r"async def process"],
            "Middleware": [r"middleware", r"def __call__\(self, request"],
            "Decorator": [r"@\w+\ndef", r"functools\.wraps"],
            "State Machine": [r"state\s*=", r"transition", r"StateMachine"],
        }

        for pattern_name, indicators in pattern_indicators.items():
            for indicator in indicators:
                if re.search(indicator, content, re.IGNORECASE):
                    patterns.append(pattern_name)
                    break  # One hit is enough to flag the pattern

        return patterns

    def calculate_complexity(self, content: str) -> str:
        """Estimate code complexity (Low/Medium/High)"""
        lines = content.split("\n")
        line_count = len(lines)

        # Deepest indentation level, assuming 4-space indents
        # (default=0 guards against content with no non-blank lines)
        nested_depth = max(
            ((len(line) - len(line.lstrip())) // 4
             for line in lines if line.strip()),
            default=0,
        )

        conditionals = len(re.findall(r"\b(?:if|elif|else|match|case)\b", content))
        loops = len(re.findall(r"\b(?:for|while|loop)\b", content))

        # Score each dimension 1-3, then sum
        score = (
            (1 if line_count < 100 else 2 if line_count < 300 else 3) +
            (1 if nested_depth < 4 else 2 if nested_depth < 6 else 3) +
            (1 if conditionals < 10 else 2 if conditionals < 25 else 3) +
            (1 if loops < 5 else 2 if loops < 10 else 3)
        )

        if score <= 5:
            return "Low"
        elif score <= 8:
            return "Medium"
        return "High"

    def generate_purpose(self, file_path: str, exports: List[str],
                         classes: List[str], content: str) -> str:
        """Generate concise purpose statement"""
        # Prefer the first line of a module docstring, if present
        docstring_match = re.search(r'^"""(.+?)"""', content, re.DOTALL)
        if docstring_match:
            first_line = docstring_match.group(1).strip().split("\n")[0]
            if len(first_line) < 100:
                return first_line

        # Otherwise, generate from filename and extracted names
        filename = Path(file_path).stem

        if classes:
            return f"Defines {', '.join(classes[:3])} for {filename.replace('_', ' ')} functionality"
        elif exports:
            return f"Provides {', '.join(exports[:3])} utilities for {filename.replace('_', ' ')}"
        return f"Implementation module for {filename.replace('_', ' ')}"

    def summarize(self, file_path: str, content: str) -> CodeSummary:
        """
        Generate a comprehensive summary of a code file.

        Args:
            file_path: Path to the file
            content: File content

        Returns:
            CodeSummary with all extracted information
        """
        language = self.detect_language(file_path, content)
        exports = self.extract_exports(content, language)
        imports = self.extract_imports(content, language)
        classes = self.extract_classes(content, language)
        functions = self.extract_functions(content, language)
        patterns = self.detect_patterns(content)
        complexity = self.calculate_complexity(content)
        purpose = self.generate_purpose(file_path, exports, classes, content)

        # Identify internal dependencies (relative imports). Note that
        # extract_imports() filters these out, so collect raw matches here.
        raw_imports: List[str] = []
        for pattern in self.IMPORT_PATTERNS.get(language, []):
            raw_imports.extend(re.findall(pattern, content, re.MULTILINE))
        internal_deps = [imp for imp in raw_imports if imp.startswith(".")]

        summary = CodeSummary(
            file_path=file_path,
            language=language,
            purpose=purpose,
            exports=exports,
            imports=imports,
            key_classes=classes,
            key_functions=functions,
            patterns_used=patterns,
            dependencies=internal_deps,
            line_count=len(content.split("\n")),
            complexity_score=complexity,
            token_estimate=len(content) // 4,  # Rough heuristic: ~4 chars per token
        )

        self.summaries[file_path] = summary
        return summary

    def get_implementation_context(self, file_paths: List[str]) -> str:
        """
        Generate combined context for multiple implemented files.

        Used when resuming multi-file implementations: provides
        essential context without full file contents.
        """
        context_lines = ["# Implementation Context\n"]

        for path in file_paths:
            if path in self.summaries:
                context_lines.append(self.summaries[path].to_compact_string())
                context_lines.append("")

        return "\n".join(context_lines)

Usage Examples

Single File Summary

# Summarize a file after implementation
generator = CodeSummaryGenerator()

with open("src/services/auth_service.py") as f:
    content = f.read()

summary = generator.summarize("src/services/auth_service.py", content)

print(summary.to_compact_string())
# Output:
# ## auth_service.py
# **Purpose:** Defines AuthService for authentication functionality
# **Exports:** AuthService, authenticate, create_token, verify_token
# **Classes:** AuthService
# **Functions:** authenticate, create_token, verify_token, hash_password
# **Depends on:** ./user_repository, ./token_manager
# **Lines:** 145 | **Complexity:** Medium

Multi-File Implementation Context

# Generate context for checkpoint/resume
generator = CodeSummaryGenerator()

# Summarize all implemented files
implemented_files = [
    "src/models/user.py",
    "src/services/auth_service.py",
    "src/handlers/auth_handler.py",
]

for file_path in implemented_files:
    with open(file_path) as f:
        generator.summarize(file_path, f.read())

# Get combined context (use instead of full file contents)
context = generator.get_implementation_context(implemented_files)
print(context)
# Output: Compact summaries of all 3 files (~200 tokens vs ~2000 tokens for full files)

Integration with Memory Optimization

async def implement_with_memory_optimization(plan: dict, llm_client) -> dict:
    """Memory-optimized multi-file implementation"""

    generator = CodeSummaryGenerator()
    implemented = []

    for file_spec in plan["files"]:
        # Implement file
        result = await llm_client.implement(
            file_spec=file_spec,
            context=generator.get_implementation_context(implemented),
        )

        # Generate summary immediately after write
        summary = generator.summarize(file_spec["path"], result["content"])
        implemented.append(file_spec["path"])

        # Clear conversation history, preserve summaries;
        # the next iteration uses summaries instead of full history
        llm_client.clear_history(preserve=["system_prompt", "plan"])

    return {
        "files": implemented,
        "summaries": generator.get_implementation_context(implemented),
        "token_savings": "70-90%",
    }

Best Practices

DO

  • Summarize immediately after file write - Before context clear
  • Include all exports - API surface is critical for consumers
  • Track dependencies - Internal file relationships matter
  • Use compact format - Minimize tokens while preserving meaning
  • Store summaries persistently - Enable checkpoint/resume
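
The last two points can be sketched as a small checkpoint layer. These helpers are illustrative (`save_summaries`/`load_summaries` are not part of the generator's API) and operate on the compact strings produced by `to_compact_string()`:

```python
import json
from pathlib import Path

def save_summaries(summaries: dict, path: str = "summaries.json") -> None:
    """Write {file_path: compact_summary} to disk for checkpoint/resume."""
    Path(path).write_text(json.dumps(summaries, indent=2))

def load_summaries(path: str = "summaries.json") -> dict:
    """Load previously saved summaries; empty dict if no checkpoint exists."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else {}
```

A resumed session can call `load_summaries()` first and feed the result into the next implementation prompt instead of re-reading full files.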

DON'T

  • Don't include implementation details - Just API surface
  • Don't include private members - Only public API
  • Don't include test files in summaries - Tests are separate
  • Don't regenerate summaries unnecessarily - Cache them
  • Don't lose language-specific patterns - Preserve idioms
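
"Cache them" can be made concrete with a content-hash cache; `SummaryCache` is an illustrative name, not part of the generator above:

```python
import hashlib
from typing import Any, Callable, Dict, Tuple

class SummaryCache:
    """Cache summaries keyed by content hash: regenerate only when a file changes."""

    def __init__(self) -> None:
        self._cache: Dict[str, Tuple[str, Any]] = {}

    def get_or_compute(self, file_path: str, content: str,
                       compute: Callable[[str, str], Any]) -> Any:
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        cached = self._cache.get(file_path)
        if cached is not None and cached[0] == digest:
            return cached[1]  # Unchanged content: reuse the cached summary
        summary = compute(file_path, content)
        self._cache[file_path] = (digest, summary)
        return summary
```

Here `compute` would typically be `generator.summarize`; hashing the content (rather than trusting mtimes) guarantees a regenerate exactly when the file actually changed.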

Configuration Reference

| Parameter | Default | Description |
|---|---|---|
| max_exports | 10 | Maximum exports to include |
| max_functions | 8 | Maximum functions to list |
| max_classes | 5 | Maximum classes to list |
| max_dependencies | 5 | Maximum deps to show |
| include_patterns | true | Detect design patterns |
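
These limits are currently hard-coded in `to_compact_string()`. One way to make them configurable (a sketch, not an existing API) is a small config dataclass whose defaults match the table above:

```python
from dataclasses import dataclass

@dataclass
class SummaryConfig:
    """Truncation limits for compact summaries."""
    max_exports: int = 10
    max_functions: int = 8
    max_classes: int = 5
    max_dependencies: int = 5
    include_patterns: bool = True
```

`to_compact_string()` would then slice with `self.config.max_exports` and friends instead of literal numbers.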

Integration with CODITECT

Primary Integration Points:

| Component | Integration Type | Usage |
|---|---|---|
| memory-optimization-agent | Primary consumer | Generate summaries after writes |
| implementation-tracker | Complementary | Track summary status per file |
| orchestrator | Workflow coordination | Multi-file implementation context |
| paper-to-code workflow | Step integration | Document generated code |

Workflow Integration:

# In memory-optimized-implementation.workflow.json
Step 5 (Generate Summary):
  skill: code-summary-generator
  inputs: implemented_file_content
  outputs: concise_summary
  next: Step 6 (Clear & Checkpoint)

Success Metrics

| Metric | Target | Measurement |
|---|---|---|
| Token reduction | 70-90% | Summary size vs full file |
| Export coverage | 100% | All public API captured |
| Context preservation | >95% | Essential info retained |
| Resume success rate | >95% | Can continue from summaries |
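
Token reduction can be measured with the same rough ~4-characters-per-token heuristic the generator uses for `token_estimate` (function names here are illustrative):

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic used throughout this skill: ~4 characters per token."""
    return max(1, len(text) // 4)

def token_reduction_pct(original: str, summary: str) -> float:
    """Percentage of tokens saved by using the summary instead of the full file."""
    original_tokens = estimate_tokens(original)
    saved = original_tokens - estimate_tokens(summary)
    return 100.0 * saved / original_tokens
```

A summary that is 10% of the original file's size yields roughly a 90% reduction, within the 70-90% target band.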

Success Output

When successful, this skill MUST output:

✅ SKILL COMPLETE: code-summary-generator

Completed:
- [x] Code language detected ({language})
- [x] Exports extracted ({count} exports)
- [x] Dependencies identified ({count} dependencies)
- [x] Complexity calculated ({complexity})
- [x] Summary generated

Outputs:
- Compact summary (~{token_count} tokens vs ~{original_tokens} tokens)
- Token reduction: {percentage}%
- Export coverage: {count}/{total} public APIs

Summary Location: {output_path or "returned in response"}

Completion Checklist

Before marking this skill as complete, verify:

  • Language detected correctly (Python, TypeScript, Rust, etc.)
  • All public exports captured (functions, classes, types)
  • External imports identified (dependencies)
  • Internal dependencies tracked (relative imports)
  • Complexity score calculated (Low/Medium/High)
  • Purpose statement generated or extracted
  • Token reduction achieved (target: 70-90%)
  • Summary preserves essential API context
  • No private members included in summary
  • Design patterns detected (if applicable)

Failure Indicators

This skill has FAILED if:

  • ❌ Language detection returns UNKNOWN for supported languages
  • ❌ Zero exports found for file with public API
  • ❌ Summary tokens exceed 30% of original file size
  • ❌ Essential API information missing from summary
  • ❌ Invalid file path or unreadable file content
  • ❌ Regex patterns fail on valid code syntax
  • ❌ Purpose statement is generic/meaningless
  • ❌ Private methods (_method, private) included in exports
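
Several of these indicators can be checked mechanically. A minimal validator (illustrative, not part of the skill; the "zero exports" check assumes the caller knows the file has a public API) might look like:

```python
from typing import List

def validate_summary(summary_text: str, original: str, exports: List[str]) -> List[str]:
    """Return failure reasons for a generated summary (empty list = pass)."""
    failures = []
    if len(summary_text) > 0.30 * len(original):
        failures.append("summary exceeds 30% of original file size")
    if not exports:
        failures.append("zero exports found")
    if any(name.startswith("_") for name in exports):
        failures.append("private members included in exports")
    return failures
```

Running this after `summarize()` turns the failure indicators above into an automated gate rather than a manual checklist.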

When NOT to Use

Do NOT use this skill when:

  • Working on a single file only - Full context is available and preferred
  • Debugging code - Need complete implementation details, not summaries
  • Code review - Need to see actual implementation, not API surface
  • First read of unfamiliar code - Need full context to understand
  • Writing tests - Need implementation details to write proper tests
  • Refactoring - Need complete code structure and logic
  • Security audit - Need full implementation to identify vulnerabilities
  • Performance optimization - Need actual code to identify bottlenecks

Use alternative approaches instead:

  • Single file work → Read full file directly
  • Debugging → Use debugger and full file inspection
  • Code review → Request complete file changes
  • Unfamiliar code → Read documentation first, then full files

Anti-Patterns (Avoid)

| Anti-Pattern | Problem | Solution |
|---|---|---|
| Summarizing before file write | No file content exists yet | Only summarize AFTER the file is written |
| Including implementation details | Defeats token reduction purpose | Extract API surface only (exports, signatures) |
| Regenerating summaries repeatedly | Wastes tokens and time | Cache summaries; regenerate only on file changes |
| Summarizing test files | Tests are a separate concern | Exclude test files from the summary workflow |
| Losing language-specific idioms | Generic summaries miss language nuances | Use language-specific patterns (EXPORT_PATTERNS, IMPORT_PATTERNS) |
| Over-truncating exports | Missing critical API information | Include at least the top 10 exports, 8 functions, 5 classes |
| No confidence scoring | Unclear summary reliability | Include confidence/quality metrics in output |
| Mixing public and private APIs | Summary includes internal details | Filter out private members (leading underscore, private keyword) |

Principles

This skill embodies CODITECT principles:

  • #1 Recycle → Extend → Re-Use → Create - Reuse summaries instead of regenerating
  • #5 Eliminate Ambiguity - Clear API surface extraction, no implementation details
  • #6 Clear, Understandable, Explainable - Purpose statements explain "what" without "how"
  • #8 No Assumptions - Language detection based on evidence, not filename alone
  • #10 Token Efficiency - 70-90% reduction while preserving essential context
  • First Principles - Understand API needs: exports matter, implementation doesn't (for context)

Related Standards:

Source Reference

This pattern was extracted from DeepCode (HKUDS/DeepCode) multi-agent system.

Original location: agents/code_agent.py - summary generation functions

Original codebase stats:

  • 51 Python files analyzed
  • 33,497 lines of code
  • 12 patterns extracted

See /submodules/labs/DeepCode/DEEP-ANALYSIS.md for complete analysis.