Auto Trigger Configuration (see skills/auto trigger framework/SKILL.md)
Code Summary Generator
How to Use This Skill
- Review the patterns and examples below
- Apply the relevant patterns to your implementation
- Follow the best practices outlined in this skill
Expert skill for generating concise, information-dense summaries of code files that preserve essential context while dramatically reducing token usage. Critical for multi-file implementations where full file history would exceed context limits.
When to Use
Use this skill when:
- Implementing multiple files and needing to preserve context
- After writing a file, before clearing conversation history
- Creating checkpoints for resumable implementations
- Generating implementation documentation
- Needing file context but not full file contents
Don't use this skill when:
- Working on a single file (full context available)
- Debugging (need full file details)
- Code review (need complete implementation)
- First read of unfamiliar code (need full context)
Core Algorithm
Summary Generation Engine
```python
import re
from typing import Dict, List
from dataclasses import dataclass
from enum import Enum
from pathlib import Path


class LanguageType(Enum):
    """Supported language types for analysis"""
    PYTHON = "python"
    TYPESCRIPT = "typescript"
    JAVASCRIPT = "javascript"
    RUST = "rust"
    GO = "go"
    UNKNOWN = "unknown"


@dataclass
class CodeSummary:
    """Concise summary of a code file"""
    file_path: str
    language: LanguageType
    purpose: str                 # 1-2 sentence description
    exports: List[str]           # Public API surface
    imports: List[str]           # External dependencies
    key_classes: List[str]       # Main classes/structs
    key_functions: List[str]     # Main functions
    patterns_used: List[str]     # Design patterns detected
    dependencies: List[str]      # Internal file dependencies
    line_count: int
    complexity_score: str        # Low/Medium/High
    token_estimate: int

    def to_compact_string(self) -> str:
        """Generate minimal token representation"""
        lines = [
            f"## {Path(self.file_path).name}",
            f"**Purpose:** {self.purpose}",
        ]
        if self.exports:
            lines.append(f"**Exports:** {', '.join(self.exports[:10])}")
        if self.key_classes:
            lines.append(f"**Classes:** {', '.join(self.key_classes[:5])}")
        if self.key_functions:
            lines.append(f"**Functions:** {', '.join(self.key_functions[:8])}")
        if self.dependencies:
            lines.append(f"**Depends on:** {', '.join(self.dependencies[:5])}")
        lines.append(f"**Lines:** {self.line_count} | **Complexity:** {self.complexity_score}")
        return "\n".join(lines)
```
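The compact rendering can be exercised on its own. The sketch below mirrors the `to_compact_string` format (file name heading, purpose, capped export list, stats line) as a standalone function; the sample file path and field values are illustrative.

```python
from pathlib import Path

def compact_summary(file_path, purpose, exports, line_count, complexity):
    # Mirrors CodeSummary.to_compact_string: one markdown line per field,
    # with the export list capped at 10 entries to bound token usage.
    lines = [f"## {Path(file_path).name}", f"**Purpose:** {purpose}"]
    if exports:
        lines.append(f"**Exports:** {', '.join(exports[:10])}")
    lines.append(f"**Lines:** {line_count} | **Complexity:** {complexity}")
    return "\n".join(lines)

out = compact_summary("src/auth_service.py", "Auth helpers", ["login", "logout"], 120, "Low")
print(out)
```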
```python
class CodeSummaryGenerator:
    """
    Generate token-efficient summaries of code files.

    Key Innovation: Extract only the information needed to understand
    a file's role and API without reading the full implementation.
    This enables 70-90% token reduction in multi-file workflows.
    """

    # Language detection patterns
    LANGUAGE_PATTERNS = {
        LanguageType.PYTHON: [r"\.py$", r"^import ", r"^from .+ import"],
        LanguageType.TYPESCRIPT: [r"\.tsx?$", r"^import .+ from", r": \w+\[\]"],
        LanguageType.JAVASCRIPT: [r"\.jsx?$", r"^const .+ = require"],
        LanguageType.RUST: [r"\.rs$", r"^use ", r"^pub fn", r"^impl "],
        LanguageType.GO: [r"\.go$", r"^package ", r"^func "],
    }

    # Export patterns by language
    EXPORT_PATTERNS = {
        LanguageType.PYTHON: [
            r"^def (\w+)\(",   # Functions
            r"^class (\w+)",   # Classes
            r"^(\w+)\s*=",     # Module-level assignments
        ],
        LanguageType.TYPESCRIPT: [
            r"^export (?:const|let|var|function|class|interface|type|enum) (\w+)",
            r"^export default (?:class |function )?(\w+)",
            r"^export \{ (.+) \}",
        ],
        LanguageType.RUST: [
            r"^pub fn (\w+)",
            r"^pub struct (\w+)",
            r"^pub enum (\w+)",
            r"^pub trait (\w+)",
            r"^pub type (\w+)",
        ],
    }

    # Import patterns by language; the capture group grabs the module path
    IMPORT_PATTERNS = {
        LanguageType.PYTHON: [
            r"^import (\S+)",
            r"^from (\S+) import",
        ],
        LanguageType.TYPESCRIPT: [
            r"^import .+ from ['\"]([^'\"]+)['\"]",
            r"^import ['\"]([^'\"]+)['\"]",
        ],
        LanguageType.RUST: [
            r"^use (\S+)",
        ],
    }

    def __init__(self):
        self.summaries: Dict[str, CodeSummary] = {}

    def detect_language(self, file_path: str, content: str) -> LanguageType:
        """Detect programming language from file path and content"""
        for lang, patterns in self.LANGUAGE_PATTERNS.items():
            for pattern in patterns:
                if re.search(pattern, file_path) or re.search(pattern, content, re.MULTILINE):
                    return lang
        return LanguageType.UNKNOWN
```
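Detection is evidence-based: the extension is tried first, then content cues. A minimal standalone sketch of that loop, with a reduced pattern table for illustration:

```python
import re

# Two of the language entries above, kept small for the demo
PATTERNS = {
    "python": [r"\.py$", r"^import ", r"^from .+ import"],
    "go": [r"\.go$", r"^package ", r"^func "],
}

def detect(path, content):
    # First language whose extension or content pattern matches wins
    for lang, pats in PATTERNS.items():
        if any(re.search(p, path) or re.search(p, content, re.MULTILINE) for p in pats):
            return lang
    return "unknown"

print(detect("main.go", "package main\n"))   # matched by extension
print(detect("", "from os import path\n"))   # matched by content alone
```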
```python
    def extract_exports(self, content: str, language: LanguageType) -> List[str]:
        """Extract public API surface (exports)"""
        exports = []
        patterns = self.EXPORT_PATTERNS.get(language, [])
        for pattern in patterns:
            matches = re.findall(pattern, content, re.MULTILINE)
            for match in matches:
                if isinstance(match, tuple):
                    exports.extend([m.strip() for m in match if m.strip()])
                else:
                    # Handle comma-separated exports (e.g. `export { a, b }`)
                    exports.extend([e.strip() for e in match.split(",")])
        # Deduplicate while preserving order, dropping private names
        seen = set()
        unique_exports = []
        for exp in exports:
            if exp not in seen and not exp.startswith("_"):
                seen.add(exp)
                unique_exports.append(exp)
        return unique_exports
```
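The order-preserving dedup plus private-name filter is the part that keeps the export list clean. A standalone sketch of the same shape, run against a toy source string:

```python
import re

SOURCE = """\
def authenticate(user):
    ...

def _hash(pw):
    ...

class AuthService:
    ...

def authenticate(user, token):
    ...
"""

# Multiline regex match, then dedup that preserves first-seen order
# and drops leading-underscore (private) names -- as in extract_exports.
names = re.findall(r"^(?:def|class) (\w+)", SOURCE, re.MULTILINE)
seen, exports = set(), []
for name in names:
    if name not in seen and not name.startswith("_"):
        seen.add(name)
        exports.append(name)
print(exports)  # ['authenticate', 'AuthService']
```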
```python
    def extract_imports(self, content: str, language: LanguageType) -> List[str]:
        """Extract external dependencies"""
        imports = []
        patterns = self.IMPORT_PATTERNS.get(language, [])
        for pattern in patterns:
            matches = re.findall(pattern, content, re.MULTILINE)
            imports.extend(matches)
        # Keep external imports only (relative imports start with ".")
        external = [imp for imp in imports if not imp.startswith(".")]
        return list(set(external))

    def extract_classes(self, content: str, language: LanguageType) -> List[str]:
        """Extract main class definitions"""
        patterns = {
            LanguageType.PYTHON: r"^class (\w+)",
            LanguageType.TYPESCRIPT: r"^(?:export )?class (\w+)",
            LanguageType.RUST: r"^(?:pub )?struct (\w+)",
        }
        pattern = patterns.get(language)
        if pattern:
            return re.findall(pattern, content, re.MULTILINE)
        return []

    def extract_functions(self, content: str, language: LanguageType) -> List[str]:
        """Extract main function definitions"""
        patterns = {
            LanguageType.PYTHON: r"^def (\w+)\(",
            LanguageType.TYPESCRIPT: r"^(?:export )?(?:async )?function (\w+)",
            LanguageType.RUST: r"^(?:pub )?(?:async )?fn (\w+)",
        }
        pattern = patterns.get(language)
        if pattern:
            funcs = re.findall(pattern, content, re.MULTILINE)
            # Filter out private/test functions
            return [f for f in funcs if not f.startswith("_") and not f.startswith("test_")]
        return []

    def detect_patterns(self, content: str) -> List[str]:
        """Detect common design patterns in code"""
        patterns = []
        pattern_indicators = {
            "Singleton": [r"_instance\s*=\s*None", r"getInstance\("],
            "Factory": [r"def create_", r"Factory", r"Builder"],
            "Observer": [r"subscribe\(", r"notify\(", r"addEventListener"],
            "Repository": [r"Repository", r"def get_by_", r"def find_"],
            "Service": [r"Service", r"async def process"],
            "Middleware": [r"middleware", r"def __call__\(self, request"],
            "Decorator": [r"@\w+\ndef", r"functools\.wraps"],
            "State Machine": [r"state\s*=", r"transition", r"StateMachine"],
        }
        for pattern_name, indicators in pattern_indicators.items():
            for indicator in indicators:
                if re.search(indicator, content, re.IGNORECASE):
                    patterns.append(pattern_name)
                    break
        return patterns
```
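Pattern detection is "any one indicator matches" per pattern name. A standalone sketch with a trimmed indicator table (two entries only, for illustration):

```python
import re

# A pattern counts as detected when any one of its indicator
# regexes matches, case-insensitively -- as in detect_patterns.
INDICATORS = {
    "Repository": [r"Repository", r"def get_by_", r"def find_"],
    "Observer": [r"subscribe\(", r"notify\("],
}

code = "class UserRepository:\n    def get_by_id(self, uid): ...\n"
found = [name for name, pats in INDICATORS.items()
         if any(re.search(p, code, re.IGNORECASE) for p in pats)]
print(found)  # ['Repository']
```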
```python
    def calculate_complexity(self, content: str) -> str:
        """Estimate code complexity (Low/Medium/High)"""
        lines = content.split("\n")
        line_count = len(lines)
        # Count complexity indicators; default=0 avoids a ValueError on empty files
        nested_depth = max(
            ((len(line) - len(line.lstrip())) // 4 for line in lines if line.strip()),
            default=0,
        )
        conditionals = len(re.findall(r"\b(if|elif|else|match|case)\b", content))
        loops = len(re.findall(r"\b(for|while|loop)\b", content))
        # Score calculation
        score = (
            (1 if line_count < 100 else 2 if line_count < 300 else 3) +
            (1 if nested_depth < 4 else 2 if nested_depth < 6 else 3) +
            (1 if conditionals < 10 else 2 if conditionals < 25 else 3) +
            (1 if loops < 5 else 2 if loops < 10 else 3)
        )
        if score <= 5:
            return "Low"
        elif score <= 8:
            return "Medium"
        return "High"

    def generate_purpose(self, file_path: str, exports: List[str],
                         classes: List[str], content: str) -> str:
        """Generate concise purpose statement"""
        # Try to extract from a leading module docstring
        docstring_match = re.search(r'^"""(.+?)"""', content, re.DOTALL)
        if docstring_match:
            first_line = docstring_match.group(1).strip().split("\n")[0]
            if len(first_line) < 100:
                return first_line
        # Otherwise generate from filename and exports
        filename = Path(file_path).stem
        if classes:
            return f"Defines {', '.join(classes[:3])} for {filename.replace('_', ' ')} functionality"
        elif exports:
            return f"Provides {', '.join(exports[:3])} utilities for {filename.replace('_', ' ')}"
        return f"Implementation module for {filename.replace('_', ' ')}"
```
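A worked example of the complexity heuristic on a tiny file. The thresholds here are simplified to the lowest band of the three-tier scoring above, which is enough to trace the arithmetic:

```python
import re

content = "def f(x):\n    if x:\n        for i in range(x):\n            print(i)\n"
lines = content.split("\n")

# default=0 guards the empty-file case that would make a bare max() raise
nested_depth = max(((len(l) - len(l.lstrip())) // 4 for l in lines if l.strip()), default=0)
conditionals = len(re.findall(r"\b(if|elif|else|match|case)\b", content))
loops = len(re.findall(r"\b(for|while|loop)\b", content))

# Simplified two-band scoring (the engine above uses three bands)
score = ((1 if len(lines) < 100 else 3)
         + (1 if nested_depth < 4 else 3)
         + (1 if conditionals < 10 else 3)
         + (1 if loops < 5 else 3))
label = "Low" if score <= 5 else "Medium" if score <= 8 else "High"
print(score, label)  # 4 Low
```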
```python
    def summarize(self, file_path: str, content: str) -> CodeSummary:
        """
        Generate a comprehensive summary of a code file.

        Args:
            file_path: Path to the file
            content: File content

        Returns:
            CodeSummary with all extracted information
        """
        language = self.detect_language(file_path, content)
        exports = self.extract_exports(content, language)
        imports = self.extract_imports(content, language)
        classes = self.extract_classes(content, language)
        functions = self.extract_functions(content, language)
        patterns = self.detect_patterns(content)
        complexity = self.calculate_complexity(content)
        purpose = self.generate_purpose(file_path, exports, classes, content)

        # Identify internal dependencies (relative imports). extract_imports()
        # filters these out, so match the raw import patterns directly here.
        raw_imports: List[str] = []
        for pattern in self.IMPORT_PATTERNS.get(language, []):
            raw_imports.extend(re.findall(pattern, content, re.MULTILINE))
        internal_deps = [imp for imp in raw_imports if imp.startswith(".")]

        summary = CodeSummary(
            file_path=file_path,
            language=language,
            purpose=purpose,
            exports=exports,
            imports=imports,
            key_classes=classes,
            key_functions=functions,
            patterns_used=patterns,
            dependencies=internal_deps,
            line_count=len(content.split("\n")),
            complexity_score=complexity,
            token_estimate=len(content) // 4,  # rough chars-per-token heuristic
        )
        self.summaries[file_path] = summary
        return summary

    def get_implementation_context(self, file_paths: List[str]) -> str:
        """
        Generate combined context for multiple implemented files.

        Used when resuming multi-file implementations - provides
        essential context without full file contents.
        """
        context_lines = ["# Implementation Context\n"]
        for path in file_paths:
            if path in self.summaries:
                context_lines.append(self.summaries[path].to_compact_string())
                context_lines.append("")
        return "\n".join(context_lines)
```
Usage Examples
Single File Summary
```python
# Summarize a file after implementation
generator = CodeSummaryGenerator()

with open("src/services/auth_service.py") as f:
    content = f.read()

summary = generator.summarize("src/services/auth_service.py", content)
print(summary.to_compact_string())

# Output:
# ## auth_service.py
# **Purpose:** Defines AuthService for authentication functionality
# **Exports:** AuthService, authenticate, create_token, verify_token
# **Classes:** AuthService
# **Functions:** authenticate, create_token, verify_token, hash_password
# **Depends on:** ./user_repository, ./token_manager
# **Lines:** 145 | **Complexity:** Medium
```
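A back-of-envelope check of the token savings the example claims, using the same chars-divided-by-4 heuristic as `token_estimate`. The file and summary sizes below are illustrative assumptions, not measurements:

```python
# chars // 4 is the same rough tokenization heuristic used by token_estimate
est = lambda chars: chars // 4

full_file_chars = 145 * 40   # ~145 lines at ~40 chars/line (illustrative)
summary_chars = 6 * 45       # ~6 compact summary lines (illustrative)

reduction = 100 * (1 - est(summary_chars) / est(full_file_chars))
print(f"~{reduction:.0f}% token reduction")
```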
Multi-File Implementation Context
```python
# Generate context for checkpoint/resume
generator = CodeSummaryGenerator()

# Summarize all implemented files
implemented_files = [
    "src/models/user.py",
    "src/services/auth_service.py",
    "src/handlers/auth_handler.py",
]
for file_path in implemented_files:
    with open(file_path) as f:
        generator.summarize(file_path, f.read())

# Get combined context (use instead of full file contents)
context = generator.get_implementation_context(implemented_files)
print(context)
# Output: compact summaries of all 3 files (~200 tokens vs ~2000 tokens for full files)
```
Integration with Memory Optimization
```python
async def implement_with_memory_optimization(plan: dict, llm_client) -> dict:
    """Memory-optimized multi-file implementation"""
    generator = CodeSummaryGenerator()
    implemented = []

    for file_spec in plan["files"]:
        # Implement file, passing summaries of prior files as context
        result = await llm_client.implement(
            file_spec=file_spec,
            context=generator.get_implementation_context(implemented),
        )

        # Generate summary immediately after write
        generator.summarize(file_spec["path"], result["content"])
        implemented.append(file_spec["path"])

        # Clear conversation history, preserve summaries;
        # the next iteration uses summaries instead of full history
        llm_client.clear_history(preserve=["system_prompt", "plan"])

    return {
        "files": implemented,
        "summaries": generator.get_implementation_context(implemented),
        "token_savings": "70-90%",
    }
```
Best Practices
DO
- Summarize immediately after file write - Before context clear
- Include all exports - API surface is critical for consumers
- Track dependencies - Internal file relationships matter
- Use compact format - Minimize tokens while preserving meaning
- Store summaries persistently - Enable checkpoint/resume
DON'T
- Don't include implementation details - Just API surface
- Don't include private members - Only public API
- Don't include test files in summaries - Tests are separate
- Don't regenerate summaries unnecessarily - Cache them
- Don't lose language-specific patterns - Preserve idioms
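The "cache them" rule above can be enforced mechanically. A hypothetical cache sketch keyed on file mtime: the summary is rebuilt only when the file changes on disk. `SummaryCache` and `get_or_build` are illustrative names, not part of the generator above:

```python
import os
import tempfile
from pathlib import Path

class SummaryCache:
    """Hypothetical cache: rebuild a summary only when the file's mtime changes."""
    def __init__(self):
        self._cache = {}  # path -> (mtime, summary)

    def get_or_build(self, path, build):
        mtime = Path(path).stat().st_mtime
        hit = self._cache.get(path)
        if hit and hit[0] == mtime:
            return hit[1]          # unchanged on disk: serve cached summary
        summary = build(path)
        self._cache[path] = (mtime, summary)
        return summary

# Demo: two lookups of an unchanged file trigger only one build
calls = []
def build(path):
    calls.append(path)
    return f"summary of {path}"

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("x = 1\n")
    tmp = f.name
cache = SummaryCache()
first = cache.get_or_build(tmp, build)
second = cache.get_or_build(tmp, build)
os.unlink(tmp)
print(len(calls))  # 1
```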
Configuration Reference
| Parameter | Default | Description |
|---|---|---|
| `max_exports` | 10 | Maximum exports to include |
| `max_functions` | 8 | Maximum functions to list |
| `max_classes` | 5 | Maximum classes to list |
| `max_dependencies` | 5 | Maximum dependencies to show |
| `include_patterns` | true | Detect design patterns |
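One way to surface these parameters is a config dataclass. This is a hypothetical sketch: the generator above hard-codes the same limits inline (`exports[:10]`, `key_functions[:8]`, and so on), so `SummaryConfig` is an assumed extension, not existing API:

```python
from dataclasses import dataclass

@dataclass
class SummaryConfig:
    # Hypothetical knobs mirroring the configuration table above
    max_exports: int = 10
    max_functions: int = 8
    max_classes: int = 5
    max_dependencies: int = 5
    include_patterns: bool = True

cfg = SummaryConfig(max_exports=3)
exports = ["login", "logout", "refresh", "verify", "revoke"]
shown = exports[:cfg.max_exports]
print(shown)  # ['login', 'logout', 'refresh']
```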
Integration with CODITECT
Primary Integration Points:
| Component | Integration Type | Usage |
|---|---|---|
| memory-optimization-agent | Primary consumer | Generate summaries after writes |
| implementation-tracker | Complementary | Track summary status per file |
| orchestrator | Workflow coordination | Multi-file implementation context |
| paper-to-code workflow | Step integration | Document generated code |
Workflow Integration:
```yaml
# In memory-optimized-implementation.workflow.json
"Step 5: Generate Summary":
  skill: code-summary-generator
  inputs: implemented_file_content
  outputs: concise_summary
  next: "Step 6 (Clear & Checkpoint)"
```
Success Metrics
| Metric | Target | Measurement |
|---|---|---|
| Token reduction | 70-90% | Summary size vs full file |
| Export coverage | 100% | All public API captured |
| Context preservation | >95% | Essential info retained |
| Resume success rate | >95% | Can continue from summaries |
Success Output
When successful, this skill MUST output:
```
✅ SKILL COMPLETE: code-summary-generator

Completed:
- [x] Code language detected ({language})
- [x] Exports extracted ({count} exports)
- [x] Dependencies identified ({count} dependencies)
- [x] Complexity calculated ({complexity})
- [x] Summary generated

Outputs:
- Compact summary (~{token_count} tokens vs ~{original_tokens} tokens)
- Token reduction: {percentage}%
- Export coverage: {count}/{total} public APIs

Summary Location: {output_path or "returned in response"}
```
Completion Checklist
Before marking this skill as complete, verify:
- Language detected correctly (Python, TypeScript, Rust, etc.)
- All public exports captured (functions, classes, types)
- External imports identified (dependencies)
- Internal dependencies tracked (relative imports)
- Complexity score calculated (Low/Medium/High)
- Purpose statement generated or extracted
- Token reduction achieved (target: 70-90%)
- Summary preserves essential API context
- No private members included in summary
- Design patterns detected (if applicable)
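Parts of this checklist can be checked mechanically. A hypothetical helper (`verify_summary` is an illustrative name, not part of the skill) that verifies two of the items: no private exports, and summary size within 30% of the original:

```python
def verify_summary(exports, summary_text, original_text):
    # Mechanizes two checklist items; the rest require human judgment
    return {
        "no_private_exports": all(not e.startswith("_") for e in exports),
        "under_token_budget": len(summary_text) <= 0.3 * len(original_text),
    }

checks = verify_summary(
    exports=["login", "logout"],
    summary_text="## auth.py\n**Exports:** login, logout",
    original_text="x" * 500,  # stand-in for a real file body
)
print(checks)
```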
Failure Indicators
This skill has FAILED if:
- ❌ Language detection returns UNKNOWN for supported languages
- ❌ Zero exports found for file with public API
- ❌ Summary tokens exceed 30% of original file size
- ❌ Essential API information missing from summary
- ❌ Invalid file path or unreadable file content
- ❌ Regex patterns fail on valid code syntax
- ❌ Purpose statement is generic/meaningless
- ❌ Private methods (_method, private) included in exports
When NOT to Use
Do NOT use this skill when:
- Working on a single file only - Full context is available and preferred
- Debugging code - Need complete implementation details, not summaries
- Code review - Need to see actual implementation, not API surface
- First read of unfamiliar code - Need full context to understand
- Writing tests - Need implementation details to write proper tests
- Refactoring - Need complete code structure and logic
- Security audit - Need full implementation to identify vulnerabilities
- Performance optimization - Need actual code to identify bottlenecks
Use alternative approaches instead:
- Single file work → Read full file directly
- Debugging → Use debugger and full file inspection
- Code review → Request complete file changes
- Unfamiliar code → Read documentation first, then full files
Anti-Patterns (Avoid)
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Summarizing before file write | No file content exists yet | Only summarize AFTER file is written |
| Including implementation details | Defeats token reduction purpose | Extract API surface only (exports, signatures) |
| Regenerating summaries repeatedly | Wastes tokens and time | Cache summaries, only regenerate on file changes |
| Summarizing test files | Tests are separate concern | Exclude test files from summary workflow |
| Losing language-specific idioms | Generic summaries miss language nuances | Use language-specific patterns (EXPORT_PATTERNS, IMPORT_PATTERNS) |
| Over-truncating exports | Missing critical API information | Include at least top 10 exports, 8 functions, 5 classes |
| No confidence scoring | Unclear summary reliability | Include confidence/quality metrics in output |
| Mixing public and private APIs | Summary includes internal details | Filter out private members (leading underscore, private keyword) |
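The "summarizing test files" anti-pattern suggests a filter at the workflow boundary. A hypothetical sketch covering common test-file naming conventions (the list of conventions here is an assumption, not exhaustive):

```python
from pathlib import Path

def is_test_file(path: str) -> bool:
    # Common conventions: pytest's test_*.py, *_test.py, and JS/TS *.test.*
    name = Path(path).name
    return name.startswith("test_") or name.endswith("_test.py") or ".test." in name

files = ["src/auth.py", "tests/test_auth.py", "src/auth_test.py", "src/auth.test.ts"]
to_summarize = [f for f in files if not is_test_file(f)]
print(to_summarize)  # ['src/auth.py']
```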
Principles
This skill embodies CODITECT principles:
- #1 Recycle → Extend → Re-Use → Create - Reuse summaries instead of regenerating
- #5 Eliminate Ambiguity - Clear API surface extraction, no implementation details
- #6 Clear, Understandable, Explainable - Purpose statements explain "what" without "how"
- #8 No Assumptions - Language detection based on evidence, not filename alone
- #10 Token Efficiency - 70-90% reduction while preserving essential context
- First Principles - Understand API needs: exports matter, implementation doesn't (for context)
Source Reference
This pattern was extracted from DeepCode (HKUDS/DeepCode) multi-agent system.
Original location: agents/code_agent.py - summary generation functions
Original codebase stats:
- 51 Python files analyzed
- 33,497 lines of code
- 12 patterns extracted
See /submodules/labs/DeepCode/DEEP-ANALYSIS.md for complete analysis.