Auto Trigger Configuration (see skills/auto trigger framework/SKILL.md)
Code Summary Generator
How to Use This Skill
- Review the patterns and examples below
- Apply the relevant patterns to your implementation
- Follow the best practices outlined in this skill
Expert skill for generating concise, information-dense summaries of code files that preserve essential context while dramatically reducing token usage. Critical for multi-file implementations where full file history would exceed context limits.
When to Use
Use this skill when:
- Implementing multiple files and needing to preserve context
- After writing a file, before clearing conversation history
- Creating checkpoints for resumable implementations
- Generating implementation documentation
- Needing file context but not full file contents
Don't use this skill when:
- Working on a single file (full context available)
- Debugging (need full file details)
- Code review (need complete implementation)
- First read of unfamiliar code (need full context)
Core Algorithm
Summary Generation Engine
```python
import re
from typing import Dict, List
from dataclasses import dataclass
from enum import Enum
from pathlib import Path


class LanguageType(Enum):
    """Supported language types for analysis"""
    PYTHON = "python"
    TYPESCRIPT = "typescript"
    JAVASCRIPT = "javascript"
    RUST = "rust"
    GO = "go"
    UNKNOWN = "unknown"


@dataclass
class CodeSummary:
    """Concise summary of a code file"""
    file_path: str
    language: LanguageType
    purpose: str                 # 1-2 sentence description
    exports: List[str]           # Public API surface
    imports: List[str]           # External dependencies
    key_classes: List[str]       # Main classes/structs
    key_functions: List[str]     # Main functions
    patterns_used: List[str]     # Design patterns detected
    dependencies: List[str]      # Internal file dependencies
    line_count: int
    complexity_score: str        # Low/Medium/High
    token_estimate: int

    def to_compact_string(self) -> str:
        """Generate minimal token representation"""
        lines = [
            f"## {Path(self.file_path).name}",
            f"**Purpose:** {self.purpose}",
        ]
        if self.exports:
            lines.append(f"**Exports:** {', '.join(self.exports[:10])}")
        if self.key_classes:
            lines.append(f"**Classes:** {', '.join(self.key_classes[:5])}")
        if self.key_functions:
            lines.append(f"**Functions:** {', '.join(self.key_functions[:8])}")
        if self.dependencies:
            lines.append(f"**Depends on:** {', '.join(self.dependencies[:5])}")
        lines.append(f"**Lines:** {self.line_count} | **Complexity:** {self.complexity_score}")
        return "\n".join(lines)
```
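The compact rendering can be exercised on its own. The sketch below mirrors the `to_compact_string` format (file name heading, purpose, capped export list, stats line) as a standalone function; the sample file path and field values are illustrative.

```python
from pathlib import Path

def compact_summary(file_path, purpose, exports, line_count, complexity):
    # Mirrors CodeSummary.to_compact_string: one markdown line per field,
    # with the export list capped at 10 entries to bound token usage.
    lines = [f"## {Path(file_path).name}", f"**Purpose:** {purpose}"]
    if exports:
        lines.append(f"**Exports:** {', '.join(exports[:10])}")
    lines.append(f"**Lines:** {line_count} | **Complexity:** {complexity}")
    return "\n".join(lines)

out = compact_summary("src/auth_service.py", "Auth helpers", ["login", "logout"], 120, "Low")
print(out)
```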
```python
class CodeSummaryGenerator:
    """
    Generate token-efficient summaries of code files.

    Key Innovation: Extract only the information needed to understand
    a file's role and API without reading the full implementation.
    This enables 70-90% token reduction in multi-file workflows.
    """

    # Language detection patterns
    LANGUAGE_PATTERNS = {
        LanguageType.PYTHON: [r"\.py$", r"^import ", r"^from .+ import"],
        LanguageType.TYPESCRIPT: [r"\.tsx?$", r"^import .+ from", r": \w+\[\]"],
        LanguageType.JAVASCRIPT: [r"\.jsx?$", r"^const .+ = require"],
        LanguageType.RUST: [r"\.rs$", r"^use ", r"^pub fn", r"^impl "],
        LanguageType.GO: [r"\.go$", r"^package ", r"^func "],
    }

    # Export patterns by language
    EXPORT_PATTERNS = {
        LanguageType.PYTHON: [
            r"^def (\w+)\(",   # Functions
            r"^class (\w+)",   # Classes
            r"^(\w+)\s*=",     # Module-level assignments
        ],
        LanguageType.TYPESCRIPT: [
            r"^export (?:const|let|var|function|class|interface|type|enum) (\w+)",
            r"^export default (?:class |function )?(\w+)",
            r"^export \{ (.+) \}",
        ],
        LanguageType.RUST: [
            r"^pub fn (\w+)",
            r"^pub struct (\w+)",
            r"^pub enum (\w+)",
            r"^pub trait (\w+)",
            r"^pub type (\w+)",
        ],
    }

    # Import patterns by language; the capture group grabs the module path
    IMPORT_PATTERNS = {
        LanguageType.PYTHON: [
            r"^import (\S+)",
            r"^from (\S+) import",
        ],
        LanguageType.TYPESCRIPT: [
            r"^import .+ from ['\"]([^'\"]+)['\"]",
            r"^import ['\"]([^'\"]+)['\"]",
        ],
        LanguageType.RUST: [
            r"^use (\S+)",
        ],
    }

    def __init__(self):
        self.summaries: Dict[str, CodeSummary] = {}

    def detect_language(self, file_path: str, content: str) -> LanguageType:
        """Detect programming language from file path and content"""
        for lang, patterns in self.LANGUAGE_PATTERNS.items():
            for pattern in patterns:
                if re.search(pattern, file_path) or re.search(pattern, content, re.MULTILINE):
                    return lang
        return LanguageType.UNKNOWN
```
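Detection is evidence-based: the extension is tried first, then content cues. A minimal standalone sketch of that loop, with a reduced pattern table for illustration:

```python
import re

# Two of the language entries above, kept small for the demo
PATTERNS = {
    "python": [r"\.py$", r"^import ", r"^from .+ import"],
    "go": [r"\.go$", r"^package ", r"^func "],
}

def detect(path, content):
    # First language whose extension or content pattern matches wins
    for lang, pats in PATTERNS.items():
        if any(re.search(p, path) or re.search(p, content, re.MULTILINE) for p in pats):
            return lang
    return "unknown"

print(detect("main.go", "package main\n"))   # matched by extension
print(detect("", "from os import path\n"))   # matched by content alone
```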
```python
    def extract_exports(self, content: str, language: LanguageType) -> List[str]:
        """Extract public API surface (exports)"""
        exports = []
        patterns = self.EXPORT_PATTERNS.get(language, [])
        for pattern in patterns:
            matches = re.findall(pattern, content, re.MULTILINE)
            for match in matches:
                if isinstance(match, tuple):
                    exports.extend([m.strip() for m in match if m.strip()])
                else:
                    # Handle comma-separated exports (e.g. `export { a, b }`)
                    exports.extend([e.strip() for e in match.split(",")])
        # Deduplicate while preserving order, dropping private names
        seen = set()
        unique_exports = []
        for exp in exports:
            if exp not in seen and not exp.startswith("_"):
                seen.add(exp)
                unique_exports.append(exp)
        return unique_exports
```
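The order-preserving dedup plus private-name filter is the part that keeps the export list clean. A standalone sketch of the same shape, run against a toy source string:

```python
import re

SOURCE = """\
def authenticate(user):
    ...

def _hash(pw):
    ...

class AuthService:
    ...

def authenticate(user, token):
    ...
"""

# Multiline regex match, then dedup that preserves first-seen order
# and drops leading-underscore (private) names -- as in extract_exports.
names = re.findall(r"^(?:def|class) (\w+)", SOURCE, re.MULTILINE)
seen, exports = set(), []
for name in names:
    if name not in seen and not name.startswith("_"):
        seen.add(name)
        exports.append(name)
print(exports)  # ['authenticate', 'AuthService']
```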
```python
    def extract_imports(self, content: str, language: LanguageType) -> List[str]:
        """Extract external dependencies"""
        imports = []
        patterns = self.IMPORT_PATTERNS.get(language, [])
        for pattern in patterns:
            matches = re.findall(pattern, content, re.MULTILINE)
            imports.extend(matches)
        # Keep external imports only (relative imports start with ".")
        external = [imp for imp in imports if not imp.startswith(".")]
        return list(set(external))

    def extract_classes(self, content: str, language: LanguageType) -> List[str]:
        """Extract main class definitions"""
        patterns = {
            LanguageType.PYTHON: r"^class (\w+)",
            LanguageType.TYPESCRIPT: r"^(?:export )?class (\w+)",
            LanguageType.RUST: r"^(?:pub )?struct (\w+)",
        }
        pattern = patterns.get(language)
        if pattern:
            return re.findall(pattern, content, re.MULTILINE)
        return []

    def extract_functions(self, content: str, language: LanguageType) -> List[str]:
        """Extract main function definitions"""
        patterns = {
            LanguageType.PYTHON: r"^def (\w+)\(",
            LanguageType.TYPESCRIPT: r"^(?:export )?(?:async )?function (\w+)",
            LanguageType.RUST: r"^(?:pub )?(?:async )?fn (\w+)",
        }
        pattern = patterns.get(language)
        if pattern:
            funcs = re.findall(pattern, content, re.MULTILINE)
            # Filter out private/test functions
            return [f for f in funcs if not f.startswith("_") and not f.startswith("test_")]
        return []

    def detect_patterns(self, content: str) -> List[str]:
        """Detect common design patterns in code"""
        patterns = []
        pattern_indicators = {
            "Singleton": [r"_instance\s*=\s*None", r"getInstance\("],
            "Factory": [r"def create_", r"Factory", r"Builder"],
            "Observer": [r"subscribe\(", r"notify\(", r"addEventListener"],
            "Repository": [r"Repository", r"def get_by_", r"def find_"],
            "Service": [r"Service", r"async def process"],
            "Middleware": [r"middleware", r"def __call__\(self, request"],
            "Decorator": [r"@\w+\ndef", r"functools\.wraps"],
            "State Machine": [r"state\s*=", r"transition", r"StateMachine"],
        }
        for pattern_name, indicators in pattern_indicators.items():
            for indicator in indicators:
                if re.search(indicator, content, re.IGNORECASE):
                    patterns.append(pattern_name)
                    break
        return patterns
```
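Pattern detection is "any one indicator matches" per pattern name. A standalone sketch with a trimmed indicator table (two entries only, for illustration):

```python
import re

# A pattern counts as detected when any one of its indicator
# regexes matches, case-insensitively -- as in detect_patterns.
INDICATORS = {
    "Repository": [r"Repository", r"def get_by_", r"def find_"],
    "Observer": [r"subscribe\(", r"notify\("],
}

code = "class UserRepository:\n    def get_by_id(self, uid): ...\n"
found = [name for name, pats in INDICATORS.items()
         if any(re.search(p, code, re.IGNORECASE) for p in pats)]
print(found)  # ['Repository']
```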
```python
    def calculate_complexity(self, content: str) -> str:
        """Estimate code complexity (Low/Medium/High)"""
        lines = content.split("\n")
        line_count = len(lines)
        # Count complexity indicators; default=0 avoids a ValueError on empty files
        nested_depth = max(
            ((len(line) - len(line.lstrip())) // 4 for line in lines if line.strip()),
            default=0,
        )
        conditionals = len(re.findall(r"\b(if|elif|else|match|case)\b", content))
        loops = len(re.findall(r"\b(for|while|loop)\b", content))
        # Score calculation
        score = (
            (1 if line_count < 100 else 2 if line_count < 300 else 3) +
            (1 if nested_depth < 4 else 2 if nested_depth < 6 else 3) +
            (1 if conditionals < 10 else 2 if conditionals < 25 else 3) +
            (1 if loops < 5 else 2 if loops < 10 else 3)
        )
        if score <= 5:
            return "Low"
        elif score <= 8:
            return "Medium"
        return "High"

    def generate_purpose(self, file_path: str, exports: List[str],
                         classes: List[str], content: str) -> str:
        """Generate concise purpose statement"""
        # Try to extract from a leading module docstring
        docstring_match = re.search(r'^"""(.+?)"""', content, re.DOTALL)
        if docstring_match:
            first_line = docstring_match.group(1).strip().split("\n")[0]
            if len(first_line) < 100:
                return first_line
        # Otherwise generate from filename and exports
        filename = Path(file_path).stem
        if classes:
            return f"Defines {', '.join(classes[:3])} for {filename.replace('_', ' ')} functionality"
        elif exports:
            return f"Provides {', '.join(exports[:3])} utilities for {filename.replace('_', ' ')}"
        return f"Implementation module for {filename.replace('_', ' ')}"
```
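A worked example of the complexity heuristic on a tiny file. The thresholds here are simplified to the lowest band of the three-tier scoring above, which is enough to trace the arithmetic:

```python
import re

content = "def f(x):\n    if x:\n        for i in range(x):\n            print(i)\n"
lines = content.split("\n")

# default=0 guards the empty-file case that would make a bare max() raise
nested_depth = max(((len(l) - len(l.lstrip())) // 4 for l in lines if l.strip()), default=0)
conditionals = len(re.findall(r"\b(if|elif|else|match|case)\b", content))
loops = len(re.findall(r"\b(for|while|loop)\b", content))

# Simplified two-band scoring (the engine above uses three bands)
score = ((1 if len(lines) < 100 else 3)
         + (1 if nested_depth < 4 else 3)
         + (1 if conditionals < 10 else 3)
         + (1 if loops < 5 else 3))
label = "Low" if score <= 5 else "Medium" if score <= 8 else "High"
print(score, label)  # 4 Low
```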
```python
    def summarize(self, file_path: str, content: str) -> CodeSummary:
        """
        Generate a comprehensive summary of a code file.

        Args:
            file_path: Path to the file
            content: File content

        Returns:
            CodeSummary with all extracted information
        """
        language = self.detect_language(file_path, content)
        exports = self.extract_exports(content, language)
        imports = self.extract_imports(content, language)
        classes = self.extract_classes(content, language)
        functions = self.extract_functions(content, language)
        patterns = self.detect_patterns(content)
        complexity = self.calculate_complexity(content)
        purpose = self.generate_purpose(file_path, exports, classes, content)

        # Identify internal dependencies (relative imports). extract_imports()
        # filters these out, so match the raw import patterns directly here.
        raw_imports: List[str] = []
        for pattern in self.IMPORT_PATTERNS.get(language, []):
            raw_imports.extend(re.findall(pattern, content, re.MULTILINE))
        internal_deps = [imp for imp in raw_imports if imp.startswith(".")]

        summary = CodeSummary(
            file_path=file_path,
            language=language,
            purpose=purpose,
            exports=exports,
            imports=imports,
            key_classes=classes,
            key_functions=functions,
            patterns_used=patterns,
            dependencies=internal_deps,
            line_count=len(content.split("\n")),
            complexity_score=complexity,
            token_estimate=len(content) // 4,  # rough chars-per-token heuristic
        )
        self.summaries[file_path] = summary
        return summary

    def get_implementation_context(self, file_paths: List[str]) -> str:
        """
        Generate combined context for multiple implemented files.

        Used when resuming multi-file implementations - provides
        essential context without full file contents.
        """
        context_lines = ["# Implementation Context\n"]
        for path in file_paths:
            if path in self.summaries:
                context_lines.append(self.summaries[path].to_compact_string())
                context_lines.append("")
        return "\n".join(context_lines)
```
Usage Examples
Single File Summary
```python
# Summarize a file after implementation
generator = CodeSummaryGenerator()

with open("src/services/auth_service.py") as f:
    content = f.read()

summary = generator.summarize("src/services/auth_service.py", content)
print(summary.to_compact_string())

# Output:
# ## auth_service.py
# **Purpose:** Defines AuthService for authentication functionality
# **Exports:** AuthService, authenticate, create_token, verify_token
# **Classes:** AuthService
# **Functions:** authenticate, create_token, verify_token, hash_password
# **Depends on:** ./user_repository, ./token_manager
# **Lines:** 145 | **Complexity:** Medium
```
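A back-of-envelope check of the token savings the example claims, using the same chars-divided-by-4 heuristic as `token_estimate`. The file and summary sizes below are illustrative assumptions, not measurements:

```python
# chars // 4 is the same rough tokenization heuristic used by token_estimate
est = lambda chars: chars // 4

full_file_chars = 145 * 40   # ~145 lines at ~40 chars/line (illustrative)
summary_chars = 6 * 45       # ~6 compact summary lines (illustrative)

reduction = 100 * (1 - est(summary_chars) / est(full_file_chars))
print(f"~{reduction:.0f}% token reduction")
```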
Multi-File Implementation Context
```python
# Generate context for checkpoint/resume
generator = CodeSummaryGenerator()

# Summarize all implemented files
implemented_files = [
    "src/models/user.py",
    "src/services/auth_service.py",
    "src/handlers/auth_handler.py",
]
for file_path in implemented_files:
    with open(file_path) as f:
        generator.summarize(file_path, f.read())

# Get combined context (use instead of full file contents)
context = generator.get_implementation_context(implemented_files)
print(context)
# Output: compact summaries of all 3 files (~200 tokens vs ~2000 tokens for full files)
```
Integration with Memory Optimization
```python
async def implement_with_memory_optimization(plan: dict, llm_client) -> dict:
    """Memory-optimized multi-file implementation"""
    generator = CodeSummaryGenerator()
    implemented = []

    for file_spec in plan["files"]:
        # Implement file, passing summaries of prior files as context
        result = await llm_client.implement(
            file_spec=file_spec,
            context=generator.get_implementation_context(implemented),
        )

        # Generate summary immediately after write
        generator.summarize(file_spec["path"], result["content"])
        implemented.append(file_spec["path"])

        # Clear conversation history, preserve summaries;
        # the next iteration uses summaries instead of full history
        llm_client.clear_history(preserve=["system_prompt", "plan"])

    return {
        "files": implemented,
        "summaries": generator.get_implementation_context(implemented),
        "token_savings": "70-90%",
    }
```
Best Practices
DO
- Summarize immediately after file write - Before context clear
- Include all exports - API surface is critical for consumers
- Track dependencies - Internal file relationships matter
- Use compact format - Minimize tokens while preserving meaning
- Store summaries persistently - Enable checkpoint/resume
DON'T
- Don't include implementation details - Just API surface
- Don't include private members - Only public API
- Don't include test files in summaries - Tests are separate
- Don't regenerate summaries unnecessarily - Cache them
- Don't lose language-specific patterns - Preserve idioms
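The "cache them" rule above can be enforced mechanically. A hypothetical cache sketch keyed on file mtime: the summary is rebuilt only when the file changes on disk. `SummaryCache` and `get_or_build` are illustrative names, not part of the generator above:

```python
import os
import tempfile
from pathlib import Path

class SummaryCache:
    """Hypothetical cache: rebuild a summary only when the file's mtime changes."""
    def __init__(self):
        self._cache = {}  # path -> (mtime, summary)

    def get_or_build(self, path, build):
        mtime = Path(path).stat().st_mtime
        hit = self._cache.get(path)
        if hit and hit[0] == mtime:
            return hit[1]          # unchanged on disk: serve cached summary
        summary = build(path)
        self._cache[path] = (mtime, summary)
        return summary

# Demo: two lookups of an unchanged file trigger only one build
calls = []
def build(path):
    calls.append(path)
    return f"summary of {path}"

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("x = 1\n")
    tmp = f.name
cache = SummaryCache()
first = cache.get_or_build(tmp, build)
second = cache.get_or_build(tmp, build)
os.unlink(tmp)
print(len(calls))  # 1
```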
Configuration Reference
| Parameter | Default | Description |
|---|---|---|
| `max_exports` | 10 | Maximum exports to include |
| `max_functions` | 8 | Maximum functions to list |
| `max_classes` | 5 | Maximum classes to list |
| `max_dependencies` | 5 | Maximum dependencies to show |
| `include_patterns` | true | Detect design patterns |
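One way to surface these parameters is a config dataclass. This is a hypothetical sketch: the generator above hard-codes the same limits inline (`exports[:10]`, `key_functions[:8]`, and so on), so `SummaryConfig` is an assumed extension, not existing API:

```python
from dataclasses import dataclass

@dataclass
class SummaryConfig:
    # Hypothetical knobs mirroring the configuration table above
    max_exports: int = 10
    max_functions: int = 8
    max_classes: int = 5
    max_dependencies: int = 5
    include_patterns: bool = True

cfg = SummaryConfig(max_exports=3)
exports = ["login", "logout", "refresh", "verify", "revoke"]
shown = exports[:cfg.max_exports]
print(shown)  # ['login', 'logout', 'refresh']
```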
Integration with CODITECT
Primary Integration Points:
| Component | Integration Type | Usage |
|---|---|---|
| memory-optimization-agent | Primary consumer | Generate summaries after writes |
| implementation-tracker | Complementary | Track summary status per file |
| orchestrator | Workflow coordination | Multi-file implementation context |
| paper-to-code workflow | Step integration | Document generated code |
Workflow Integration:
```yaml
# In memory-optimized-implementation.workflow.json
"Step 5: Generate Summary":
  skill: code-summary-generator
  inputs: implemented_file_content
  outputs: concise_summary
  next: "Step 6 (Clear & Checkpoint)"
```
Success Metrics
| Metric | Target | Measurement |
|---|---|---|
| Token reduction | 70-90% | Summary size vs full file |
| Export coverage | 100% | All public API captured |
| Context preservation | >95% | Essential info retained |
| Resume success rate | >95% | Can continue from summaries |
Success Output
When successful, this skill MUST output:
```
✅ SKILL COMPLETE: code-summary-generator

Completed:
- [x] Code language detected ({language})
- [x] Exports extracted ({count} exports)
- [x] Dependencies identified ({count} dependencies)
- [x] Complexity calculated ({complexity})
- [x] Summary generated

Outputs:
- Compact summary (~{token_count} tokens vs ~{original_tokens} tokens)
- Token reduction: {percentage}%
- Export coverage: {count}/{total} public APIs

Summary Location: {output_path or "returned in response"}
```
Completion Checklist
Before marking this skill as complete, verify:
- Language detected correctly (Python, TypeScript, Rust, etc.)
- All public exports captured (functions, classes, types)
- External imports identified (dependencies)
- Internal dependencies tracked (relative imports)
- Complexity score calculated (Low/Medium/High)
- Purpose statement generated or extracted
- Token reduction achieved (target: 70-90%)
- Summary preserves essential API context
- No private members included in summary
- Design patterns detected (if applicable)
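Parts of this checklist can be checked mechanically. A hypothetical helper (`verify_summary` is an illustrative name, not part of the skill) that verifies two of the items: no private exports, and summary size within 30% of the original:

```python
def verify_summary(exports, summary_text, original_text):
    # Mechanizes two checklist items; the rest require human judgment
    return {
        "no_private_exports": all(not e.startswith("_") for e in exports),
        "under_token_budget": len(summary_text) <= 0.3 * len(original_text),
    }

checks = verify_summary(
    exports=["login", "logout"],
    summary_text="## auth.py\n**Exports:** login, logout",
    original_text="x" * 500,  # stand-in for a real file body
)
print(checks)
```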
Failure Indicators
This skill has FAILED if:
- ❌ Language detection returns UNKNOWN for supported languages
- ❌ Zero exports found for file with public API
- ❌ Summary tokens exceed 30% of original file size
- ❌ Essential API information missing from summary
- ❌ Invalid file path or unreadable file content
- ❌ Regex patterns fail on valid code syntax
- ❌ Purpose statement is generic/meaningless
- ❌ Private methods (_method, private) included in exports
When NOT to Use
Do NOT use this skill when:
- Working on a single file only - Full context is available and preferred
- Debugging code - Need complete implementation details, not summaries
- Code review - Need to see actual implementation, not API surface
- First read of unfamiliar code - Need full context to understand
- Writing tests - Need implementation details to write proper tests
- Refactoring - Need complete code structure and logic
- Security audit - Need full implementation to identify vulnerabilities
- Performance optimization - Need actual code to identify bottlenecks
Use alternative approaches instead:
- Single file work → Read full file directly
- Debugging → Use debugger and full file inspection
- Code review → Request complete file changes
- Unfamiliar code → Read documentation first, then full files
Anti-Patterns (Avoid)
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Summarizing before file write | No file content exists yet | Only summarize AFTER file is written |
| Including implementation details | Defeats token reduction purpose | Extract API surface only (exports, signatures) |
| Regenerating summaries repeatedly | Wastes tokens and time | Cache summaries, only regenerate on file changes |
| Summarizing test files | Tests are separate concern | Exclude test files from summary workflow |
| Losing language-specific idioms | Generic summaries miss language nuances | Use language-specific patterns (EXPORT_PATTERNS, IMPORT_PATTERNS) |
| Over-truncating exports | Missing critical API information | Include at least top 10 exports, 8 functions, 5 classes |
| No confidence scoring | Unclear summary reliability | Include confidence/quality metrics in output |
| Mixing public and private APIs | Summary includes internal details | Filter out private members (leading underscore, private keyword) |
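The "summarizing test files" anti-pattern suggests a filter at the workflow boundary. A hypothetical sketch covering common test-file naming conventions (the list of conventions here is an assumption, not exhaustive):

```python
from pathlib import Path

def is_test_file(path: str) -> bool:
    # Common conventions: pytest's test_*.py, *_test.py, and JS/TS *.test.*
    name = Path(path).name
    return name.startswith("test_") or name.endswith("_test.py") or ".test." in name

files = ["src/auth.py", "tests/test_auth.py", "src/auth_test.py", "src/auth.test.ts"]
to_summarize = [f for f in files if not is_test_file(f)]
print(to_summarize)  # ['src/auth.py']
```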
Principles
This skill embodies CODITECT principles:
- #1 Recycle → Extend → Re-Use → Create - Reuse summaries instead of regenerating
- #5 Eliminate Ambiguity - Clear API surface extraction, no implementation details
- #6 Clear, Understandable, Explainable - Purpose statements explain "what" without "how"
- #8 No Assumptions - Language detection based on evidence, not filename alone
- #10 Token Efficiency - 70-90% reduction while preserving essential context
- First Principles - Understand API needs: exports matter, implementation doesn't (for context)
Source Reference
This pattern was extracted from DeepCode (HKUDS/DeepCode) multi-agent system.
Original location: agents/code_agent.py - summary generation functions
Original codebase stats:
- 51 Python files analyzed
- 33,497 lines of code
- 12 patterns extracted
See /submodules/labs/DeepCode/DEEP-ANALYSIS.md for complete analysis.