Codebase Analysis Patterns Skill
When to Use This Skill
Use this skill when analyzing a codebase's architecture, dependencies, design patterns, or quality metrics.
How to Use This Skill
- Review the patterns and examples below
- Apply the relevant patterns to your implementation
- Follow the best practices outlined in this skill
Architecture analysis, pattern detection, dependency mapping, and codebase metrics for comprehensive code understanding.
Core Capabilities
- Architecture Analysis - System structure, layer separation, module boundaries
- Dependency Mapping - Import graphs, circular dependencies, coupling metrics
- Pattern Detection - Design patterns, architectural patterns, idioms
- Code Metrics - Complexity, maintainability, technical debt
- Anti-Pattern Detection - Code smells, architectural violations, bad practices
Analysis Scope Decision Matrix
| Codebase Size | Files | Recommended Scope | Analysis Depth | Time Budget |
|---|---|---|---|---|
| Small | <50 | Full analysis | Deep (all metrics) | <5 min |
| Medium | 50-500 | Full with sampling | Standard | 5-15 min |
| Large | 500-5000 | Module-focused | Targeted | 15-30 min |
| Enterprise | >5000 | Domain-bounded | Critical paths only | 30-60 min |
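The size thresholds in the matrix above can be encoded in a small helper. This is a sketch; the function name and dictionary keys are illustrative, but the cut-offs mirror the table:

```python
def recommend_scope(file_count: int) -> dict:
    """Map a codebase's file count to the scope row in the decision matrix."""
    if file_count < 50:
        return {'scope': 'full', 'depth': 'deep', 'budget_min': 5}
    if file_count <= 500:
        return {'scope': 'full_with_sampling', 'depth': 'standard', 'budget_min': 15}
    if file_count <= 5000:
        return {'scope': 'module_focused', 'depth': 'targeted', 'budget_min': 30}
    return {'scope': 'domain_bounded', 'depth': 'critical_paths', 'budget_min': 60}

print(recommend_scope(1200))  # → {'scope': 'module_focused', 'depth': 'targeted', 'budget_min': 30}
```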
Scope Limits by Analysis Type:
| Analysis Type | Max Files | Max Depth | Exclude Patterns |
|---|---|---|---|
| Architecture | 1000 | 5 layers | test/, vendor/, node_modules/ |
| Dependency Graph | 500 | 3 hops | External packages, stdlib |
| Pattern Detection | 200 | Per-file | Generated code, config files |
| Code Metrics | 2000 | All | Binary files, minified code |
| Circular Deps | 300 | Full graph | Test fixtures, mocks |
Quick Decision: What to Analyze
What's your analysis goal?
├── Understand new codebase → Architecture + Dependency (Medium scope)
├── Find technical debt → Code Metrics + Anti-Patterns (Focused scope)
├── Pre-refactor audit → Full analysis (Targeted module)
├── Security review → Dependency + Pattern (Critical paths)
├── Onboard new dev → Architecture overview (High-level only)
└── Performance issues → Metrics + Dependencies (Hot paths)
Scope Validation Checklist:
- Excluded test directories (test/, tests/, __tests__/)
- Excluded vendor/external (node_modules/, vendor/, .venv/)
- Excluded generated code (dist/, build/, *.generated.*)
- Set max file limit appropriate for time budget
- Defined target modules (not whole monorepo)
- Confirmed language support (Python primary, TypeScript secondary)
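The exclusion items in the checklist above can be expressed as a single filter function (a sketch; the directory set and function name are illustrative, taken from the checklist entries):

```python
from pathlib import Path

# Directory names and filename patterns from the checklist above.
EXCLUDED_DIRS = {'test', 'tests', '__tests__', 'node_modules', 'vendor',
                 '.venv', 'dist', 'build'}

def passes_scope_filter(path: Path) -> bool:
    """Return True when a file survives the exclusion checklist."""
    if any(part in EXCLUDED_DIRS for part in path.parts):
        return False
    if '.generated.' in path.name:
        return False
    return True

print(passes_scope_filter(Path('src/app/models.py')))          # → True
print(passes_scope_filter(Path('node_modules/pkg/index.js')))  # → False
```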
Architecture Analysis Script
```python
# scripts/analyze_architecture.py
import ast
import json
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Optional, Set

EXCLUDED_PARTS = {'test', 'tests', '__tests__', 'node_modules', 'vendor',
                  '.venv', 'dist', 'build'}


class ArchitectureAnalyzer:
    """Comprehensive architecture analysis for Python codebases."""

    def __init__(self, root_path: str):
        self.root_path = Path(root_path)
        self.modules: Dict[str, 'ModuleInfo'] = {}
        self.dependencies: Dict[str, Set[str]] = defaultdict(set)
        self.layers: Dict[str, List[str]] = defaultdict(list)
        self.metrics: Dict = {}

    def analyze(self) -> Dict:
        """Run complete architecture analysis."""
        self._discover_modules()
        self._analyze_dependencies()
        self._detect_layers()
        self._calculate_metrics()
        return {
            'summary': self._generate_summary(),
            'modules': self._format_modules(),
            'dependencies': self._format_dependencies(),
            'layers': dict(self.layers),
            'metrics': self.metrics,
            'violations': self._detect_violations(),
        }

    def _discover_modules(self):
        """Discover all Python modules in the codebase."""
        for py_file in self.root_path.rglob('*.py'):
            if self._should_analyze(py_file):
                module_path = self._get_module_path(py_file)
                self.modules[module_path] = ModuleInfo(py_file)

    def _should_analyze(self, path: Path) -> bool:
        return not any(part in EXCLUDED_PARTS for part in path.parts)

    def _get_module_path(self, path: Path) -> str:
        return '.'.join(path.relative_to(self.root_path).with_suffix('').parts)

    def _analyze_dependencies(self):
        """Build the dependency graph using AST parsing."""
        for module_path, module_info in self.modules.items():
            try:
                with open(module_info.file_path, 'r') as f:
                    tree = ast.parse(f.read(), filename=str(module_info.file_path))
                visitor = ImportVisitor(module_path, self.root_path)
                visitor.visit(tree)
                self.dependencies[module_path] = visitor.imports
                module_info.imports = visitor.imports
                module_info.exports = self._extract_exports(tree)
                module_info.classes = [n.name for n in tree.body
                                       if isinstance(n, ast.ClassDef)]
                module_info.complexity = self._calculate_complexity(tree)
            except Exception as e:
                print(f"Error analyzing {module_path}: {e}")

    def _extract_exports(self, tree: ast.AST) -> List[str]:
        """Top-level function and class names."""
        return [n.name for n in tree.body
                if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))]

    def _calculate_complexity(self, tree: ast.AST) -> int:
        """Cyclomatic complexity: 1 + number of decision points."""
        complexity = 1
        for node in ast.walk(tree):
            if isinstance(node, (ast.If, ast.While, ast.For, ast.ExceptHandler)):
                complexity += 1
            elif isinstance(node, ast.BoolOp):
                complexity += len(node.values) - 1
        return complexity

    def _detect_layers(self):
        """Detect architectural layers based on module structure."""
        layer_patterns = {
            'presentation': ['views', 'controllers', 'ui', 'api', 'routes'],
            'application': ['services', 'handlers', 'use_cases', 'commands'],
            'domain': ['models', 'entities', 'domain', 'core'],
            'infrastructure': ['repositories', 'adapters', 'infrastructure', 'db'],
            'shared': ['utils', 'helpers', 'common', 'shared'],
        }
        for module_path in self.modules:
            parts = module_path.split('.')
            for layer, patterns in layer_patterns.items():
                if any(pattern in part.lower() for part in parts for pattern in patterns):
                    self.layers[layer].append(module_path)
                    break
            else:
                self.layers['other'].append(module_path)

    def _get_module_layer(self, module_path: str) -> Optional[str]:
        for layer, members in self.layers.items():
            if module_path in members:
                return layer
        return None

    def _calculate_metrics(self):
        """Calculate codebase-wide metrics."""
        if not self.modules:
            self.metrics = {}
            return
        self.metrics = {
            'total_modules': len(self.modules),
            'total_lines': sum(m.lines for m in self.modules.values()),
            'avg_complexity': sum(m.complexity for m in self.modules.values()) / len(self.modules),
            'max_complexity': max(m.complexity for m in self.modules.values()),
            'coupling': self._calculate_coupling(),
            'cohesion': self._calculate_cohesion(),
            'circular_dependencies': self._find_circular_deps(),
        }

    def _detect_violations(self) -> List[Dict]:
        """Detect architectural violations."""
        violations = []
        # Layer violations: lower layers must not import from higher ones
        # (e.g. infrastructure importing from presentation).
        layer_hierarchy = ['infrastructure', 'domain', 'application', 'presentation']
        for module_path, imports in self.dependencies.items():
            module_layer = self._get_module_layer(module_path)
            for imported in imports:
                import_layer = self._get_module_layer(imported)
                # Skip modules outside the hierarchy ('shared', 'other', external).
                if module_layer in layer_hierarchy and import_layer in layer_hierarchy:
                    if layer_hierarchy.index(module_layer) < layer_hierarchy.index(import_layer):
                        violations.append({
                            'type': 'layer_violation',
                            'module': module_path,
                            'imports': imported,
                            'severity': 'high',
                            'message': f'{module_layer} should not import from {import_layer}',
                        })
        # Circular dependencies
        for cycle in self.metrics.get('circular_dependencies', []):
            violations.append({
                'type': 'circular_dependency',
                'cycle': cycle,
                'severity': 'critical',
                'message': f'Circular dependency detected: {" -> ".join(cycle)}',
            })
        return violations

    def _calculate_coupling(self) -> float:
        """Average coupling per module (afferent + efferent)."""
        if not self.dependencies:
            return 0.0
        afferent = defaultdict(int)   # how many modules depend on this module
        efferent = defaultdict(int)   # how many modules this module depends on
        for module, imports in self.dependencies.items():
            efferent[module] = len(imports)
            for imported in imports:
                afferent[imported] += 1
        total_coupling = sum(afferent[m] + efferent[m] for m in self.modules)
        return total_coupling / len(self.modules)

    def _calculate_cohesion(self) -> float:
        """Average module cohesion (simplified LCOM-style metric)."""
        cohesion_scores = [self._lcom4(m) for m in self.modules.values() if m.classes]
        return sum(cohesion_scores) / len(cohesion_scores) if cohesion_scores else 0.0

    def _lcom4(self, module_info: 'ModuleInfo') -> float:
        # Placeholder: a real LCOM4 needs the per-class method/attribute
        # access graph; 1.0 means "treated as fully cohesive".
        return 1.0

    def _find_circular_deps(self) -> List[List[str]]:
        """Find circular dependencies with a DFS over the import graph."""
        cycles = []
        visited = set()
        rec_stack = []

        def dfs(node: str):
            if node in rec_stack:
                cycle = rec_stack[rec_stack.index(node):] + [node]
                # Avoid recording rotations of an already-found cycle.
                if not any(set(cycle) == set(c) for c in cycles):
                    cycles.append(cycle)
                return
            if node in visited:
                return
            visited.add(node)
            rec_stack.append(node)
            for neighbor in self.dependencies.get(node, []):
                dfs(neighbor)
            rec_stack.pop()

        for module in self.modules:
            dfs(module)
        return cycles

    def _generate_summary(self) -> Dict:
        return {
            'modules': len(self.modules),
            'layers': {layer: len(mods) for layer, mods in self.layers.items()},
        }

    def _format_modules(self) -> Dict:
        return {path: {'lines': m.lines, 'complexity': m.complexity, 'exports': m.exports}
                for path, m in self.modules.items()}

    def _format_dependencies(self) -> Dict:
        return {module: sorted(imports) for module, imports in self.dependencies.items()}


class ModuleInfo:
    """Information about a single module."""

    def __init__(self, file_path: Path):
        self.file_path = file_path
        self.imports: Set[str] = set()
        self.exports: List[str] = []
        self.complexity = 0
        self.lines = self._count_lines()
        self.classes: List[str] = []
        self.functions: List[str] = []

    def _count_lines(self) -> int:
        """Count non-empty, non-comment lines."""
        with open(self.file_path, 'r') as f:
            lines = f.readlines()
        return sum(1 for line in lines
                   if line.strip() and not line.strip().startswith('#'))


class ImportVisitor(ast.NodeVisitor):
    """AST visitor to extract top-level imported package names."""

    def __init__(self, module_path: str, root_path: Path):
        self.module_path = module_path
        self.root_path = root_path
        self.imports: Set[str] = set()

    def visit_Import(self, node):
        for alias in node.names:
            self.imports.add(alias.name.split('.')[0])
        self.generic_visit(node)

    def visit_ImportFrom(self, node):
        if node.module:
            self.imports.add(node.module.split('.')[0])
        self.generic_visit(node)


# Usage example
if __name__ == '__main__':
    analyzer = ArchitectureAnalyzer('/path/to/project')
    results = analyzer.analyze()
    print(json.dumps(results, indent=2))
```
Dependency Graph Visualization
```python
# scripts/visualize_dependencies.py
from typing import Dict, List, Set, Tuple

import matplotlib.pyplot as plt
import networkx as nx


class DependencyVisualizer:
    """Visualize codebase dependencies as graphs."""

    def __init__(self, dependencies: Dict[str, Set[str]]):
        self.dependencies = dependencies
        self.graph = self._build_graph()

    def _build_graph(self) -> nx.DiGraph:
        """Build a directed graph from the dependency map."""
        G = nx.DiGraph()
        for module, imports in self.dependencies.items():
            G.add_node(module)
            for imported in imports:
                G.add_edge(module, imported)
        return G

    def generate_report(self) -> Dict:
        """Generate a comprehensive dependency report."""
        n = self.graph.number_of_nodes()
        return {
            'summary': {
                'total_modules': n,
                'total_dependencies': self.graph.number_of_edges(),
                'avg_dependencies': self.graph.number_of_edges() / n if n else 0.0,
            },
            'metrics': {
                'most_imported': self._most_imported(),
                'most_dependent': self._most_dependent(),
                'isolated_modules': self._isolated_modules(),
                'hub_modules': self._hub_modules(),
            },
            'analysis': {
                'strongly_connected': [list(c) for c in
                                       nx.strongly_connected_components(self.graph)],
                'clustering_coefficient': nx.average_clustering(self.graph.to_undirected()),
                'graph_density': nx.density(self.graph),
            },
        }

    def _most_imported(self, top_n: int = 10) -> List[Tuple[str, int]]:
        """Most imported modules (highest in-degree)."""
        in_degrees = dict(self.graph.in_degree())
        return sorted(in_degrees.items(), key=lambda x: x[1], reverse=True)[:top_n]

    def _most_dependent(self, top_n: int = 10) -> List[Tuple[str, int]]:
        """Most dependent modules (highest out-degree)."""
        out_degrees = dict(self.graph.out_degree())
        return sorted(out_degrees.items(), key=lambda x: x[1], reverse=True)[:top_n]

    def _isolated_modules(self) -> List[str]:
        """Modules with no incoming or outgoing dependencies."""
        return [node for node in self.graph.nodes() if self.graph.degree(node) == 0]

    def _hub_modules(self, threshold: int = 5) -> List[str]:
        """Hub modules (total degree above the threshold)."""
        return [
            node for node in self.graph.nodes()
            if self.graph.in_degree(node) + self.graph.out_degree(node) > threshold
        ]

    def visualize(self, output_path: str = 'dependency_graph.png'):
        """Render the dependency graph to an image file."""
        plt.figure(figsize=(20, 20))
        # Force-directed layout; k controls node spacing
        pos = nx.spring_layout(self.graph, k=2, iterations=50)
        # Color nodes by importance (degree)
        node_colors = [self.graph.degree(node) for node in self.graph.nodes()]
        nx.draw_networkx_nodes(self.graph, pos, node_color=node_colors,
                               node_size=500, cmap=plt.cm.Blues)
        nx.draw_networkx_edges(self.graph, pos, edge_color='gray',
                               alpha=0.5, arrows=True, arrowsize=10)
        nx.draw_networkx_labels(self.graph, pos, font_size=8)
        plt.title('Module Dependency Graph', fontsize=16)
        plt.axis('off')
        plt.tight_layout()
        plt.savefig(output_path, dpi=300, bbox_inches='tight')
        print(f"Dependency graph saved to {output_path}")


# Usage
if __name__ == '__main__':
    dependencies = {
        'app.models': {'app.utils', 'sqlalchemy'},
        'app.views': {'app.models', 'app.services'},
        'app.services': {'app.models', 'app.repositories'},
        'app.repositories': {'app.models', 'app.db'},
    }
    visualizer = DependencyVisualizer(dependencies)
    report = visualizer.generate_report()
    visualizer.visualize()
```
Pattern Detection Engine
```typescript
// tools/pattern-detector.ts
interface PatternMatch {
  pattern: string;
  file: string;
  line: number;
  confidence: number;
  context: string;
}

interface GrepResult {
  file: string;
  line: number;
  context: string;
}

interface DesignPattern {
  name: string;
  indicators: string[];
  antiIndicators?: string[];
  validate: (ast: any) => boolean;
}

class PatternDetector {
  private patterns: DesignPattern[] = [
    {
      name: 'Singleton',
      indicators: ['__instance', 'getInstance', '_instance = None'],
      validate: (ast) => this.validateSingleton(ast),
    },
    {
      name: 'Factory',
      indicators: ['create', 'build', 'make', 'factory'],
      validate: (ast) => this.validateFactory(ast),
    },
    {
      name: 'Observer',
      indicators: ['subscribe', 'notify', 'observer', 'listener'],
      validate: (ast) => this.validateObserver(ast),
    },
    {
      name: 'Strategy',
      indicators: ['strategy', 'algorithm', 'execute'],
      validate: (ast) => this.validateStrategy(ast),
    },
    {
      name: 'Decorator',
      indicators: ['@decorator', 'wrapper', 'wrap'],
      validate: (ast) => this.validateDecorator(ast),
    },
  ];

  async detectPatterns(codebasePath: string): Promise<Map<string, PatternMatch[]>> {
    const results = new Map<string, PatternMatch[]>();
    for (const pattern of this.patterns) {
      const matches = await this.findPattern(codebasePath, pattern);
      if (matches.length > 0) {
        results.set(pattern.name, matches);
      }
    }
    return results;
  }

  private async findPattern(path: string, pattern: DesignPattern): Promise<PatternMatch[]> {
    const matches: PatternMatch[] = [];
    // Search for indicator keywords, then confirm with AST-level validation
    for (const indicator of pattern.indicators) {
      const grepResults = await this.grepCodebase(path, indicator);
      for (const result of grepResults) {
        const ast = await this.parseFile(result.file);
        if (pattern.validate(ast)) {
          matches.push({
            pattern: pattern.name,
            file: result.file,
            line: result.line,
            confidence: this.calculateConfidence(result, pattern),
            context: result.context,
          });
        }
      }
    }
    return matches;
  }

  private validateSingleton(ast: any): boolean {
    // Check for:
    // 1. Private constructor or __new__ override
    // 2. Class-level instance variable
    const hasPrivateConstructor = ast.classes?.some((cls: any) =>
      cls.methods?.some((m: any) => m.name === '__new__' || m.name === '__init__')
    );
    const hasInstanceVariable = ast.classes?.some((cls: any) =>
      cls.classVariables?.some((v: any) => v.name.includes('instance'))
    );
    return Boolean(hasPrivateConstructor && hasInstanceVariable);
  }

  private validateFactory(ast: any): boolean {
    // Check for:
    // 1. Method/function that returns instances
    // 2. Conditional logic to select the class
    const hasFactoryMethod = ast.functions?.some((fn: any) =>
      fn.name.toLowerCase().includes('create') ||
      fn.name.toLowerCase().includes('build') ||
      fn.name.toLowerCase().includes('make')
    );
    const hasConditionalReturn = ast.functions?.some((fn: any) =>
      fn.body?.includes('if') && fn.returns?.length > 1
    );
    return Boolean(hasFactoryMethod && hasConditionalReturn);
  }

  // The remaining validators follow the same shape; stubbed here.
  private validateObserver(ast: any): boolean { return true; }
  private validateStrategy(ast: any): boolean { return true; }
  private validateDecorator(ast: any): boolean { return true; }

  private calculateConfidence(result: GrepResult, pattern: DesignPattern): number {
    let confidence = 0.5; // Base confidence
    // Increase confidence for multiple indicators
    const indicatorCount = pattern.indicators.filter(ind =>
      result.context.toLowerCase().includes(ind.toLowerCase())
    ).length;
    confidence += indicatorCount * 0.1;
    // Decrease confidence for anti-indicators
    if (pattern.antiIndicators) {
      const antiCount = pattern.antiIndicators.filter(anti =>
        result.context.toLowerCase().includes(anti.toLowerCase())
      ).length;
      confidence -= antiCount * 0.15;
    }
    return Math.min(Math.max(confidence, 0), 1);
  }

  // Stubs: wire these to your search tool (e.g. ripgrep) and parser of choice.
  private async grepCodebase(path: string, term: string): Promise<GrepResult[]> {
    throw new Error('not implemented');
  }
  private async parseFile(file: string): Promise<any> {
    throw new Error('not implemented');
  }
}

// Usage
async function main() {
  const detector = new PatternDetector();
  const patterns = await detector.detectPatterns('/path/to/codebase');
  console.log('Detected Design Patterns:');
  for (const [pattern, matches] of patterns) {
    console.log(`\n${pattern}: ${matches.length} matches`);
    matches.forEach(match => {
      console.log(`  ${match.file}:${match.line} (confidence: ${match.confidence.toFixed(2)})`);
    });
  }
}
```
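The confidence formula used by `calculateConfidence` (base 0.5, plus 0.1 per matched indicator, minus 0.15 per anti-indicator, clamped to [0, 1]) can be sanity-checked with a standalone Python mirror. The function name here is illustrative:

```python
def pattern_confidence(context: str, indicators, anti_indicators=()) -> float:
    """Mirror of calculateConfidence: clamp(0.5 + 0.1*hits - 0.15*anti_hits, 0, 1)."""
    ctx = context.lower()
    hits = sum(1 for ind in indicators if ind.lower() in ctx)
    anti = sum(1 for a in anti_indicators if a.lower() in ctx)
    return min(max(0.5 + 0.1 * hits - 0.15 * anti, 0.0), 1.0)

print(pattern_confidence('def getInstance(cls): ...', ['getInstance', '__instance']))  # → 0.6
```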
Code Metrics Dashboard
```python
# scripts/metrics_dashboard.py
import ast
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FileMetrics:
    """Metrics for a single file."""
    path: str
    lines: int
    complexity: int
    functions: int
    classes: int
    imports: int
    maintainability_index: float


class MetricsCalculator:
    """Calculate comprehensive code metrics."""

    def __init__(self):
        self.thresholds = {
            'complexity': 10,
            'lines': 300,
            'functions_per_file': 15,
            'maintainability': 65,
        }

    def calculate_file_metrics(self, file_path: str) -> FileMetrics:
        """Calculate metrics for a single file."""
        with open(file_path, 'r') as f:
            content = f.read()
        tree = ast.parse(content)
        return FileMetrics(
            path=file_path,
            lines=len(content.splitlines()),
            complexity=self._calculate_complexity(tree),
            functions=len([n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]),
            classes=len([n for n in ast.walk(tree) if isinstance(n, ast.ClassDef)]),
            imports=len([n for n in ast.walk(tree)
                         if isinstance(n, (ast.Import, ast.ImportFrom))]),
            maintainability_index=self._calculate_maintainability(tree, content),
        )

    def _calculate_complexity(self, tree: ast.AST) -> int:
        """Calculate cyclomatic complexity."""
        complexity = 1  # Base complexity
        for node in ast.walk(tree):
            if isinstance(node, (ast.If, ast.While, ast.For, ast.ExceptHandler)):
                complexity += 1
            elif isinstance(node, ast.BoolOp):
                complexity += len(node.values) - 1
        return complexity

    def _calculate_maintainability(self, tree: ast.AST, content: str) -> float:
        """Calculate a simplified maintainability index.

        MI = 171 - 5.2*ln(V) - 0.23*G - 16.2*ln(L), clamped to [0, 100],
        with file size standing in for Halstead volume V.
        """
        volume = max(len(content), 1)             # guard against empty files
        complexity = self._calculate_complexity(tree)
        lines = max(len(content.splitlines()), 1)
        mi = max(0.0, 171 - 5.2 * math.log(volume)
                 - 0.23 * complexity - 16.2 * math.log(lines))
        return min(100.0, mi)

    def generate_report(self, metrics: List[FileMetrics]) -> Dict:
        """Generate a comprehensive metrics report."""
        if not metrics:
            return {'summary': {}, 'warnings': [], 'top_complex': [],
                    'low_maintainability': []}
        return {
            'summary': {
                'total_files': len(metrics),
                'total_lines': sum(m.lines for m in metrics),
                'avg_complexity': sum(m.complexity for m in metrics) / len(metrics),
                'avg_maintainability': sum(m.maintainability_index for m in metrics) / len(metrics),
            },
            'warnings': self._generate_warnings(metrics),
            'top_complex': sorted(metrics, key=lambda m: m.complexity, reverse=True)[:10],
            'low_maintainability': [m for m in metrics
                                    if m.maintainability_index < self.thresholds['maintainability']],
        }

    def _generate_warnings(self, metrics: List[FileMetrics]) -> List[Dict]:
        """Generate warnings for threshold violations."""
        warnings = []
        for metric in metrics:
            if metric.complexity > self.thresholds['complexity']:
                warnings.append({
                    'file': metric.path,
                    'type': 'high_complexity',
                    'value': metric.complexity,
                    'threshold': self.thresholds['complexity'],
                })
            if metric.lines > self.thresholds['lines']:
                warnings.append({
                    'file': metric.path,
                    'type': 'large_file',
                    'value': metric.lines,
                    'threshold': self.thresholds['lines'],
                })
        return warnings
```
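The cyclomatic complexity counter can be verified standalone against a small snippet. This sketch mirrors the `_calculate_complexity` logic above (1 base, +1 per `if`/`while`/`for`/`except`, +n-1 per boolean operator):

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """Count decision points, as in _calculate_complexity above."""
    tree = ast.parse(source)
    complexity = 1
    for node in ast.walk(tree):
        if isinstance(node, (ast.If, ast.While, ast.For, ast.ExceptHandler)):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            complexity += len(node.values) - 1
    return complexity

snippet = """
def f(x):
    if x > 0 and x < 10:
        for i in range(x):
            print(i)
    return x
"""
# 1 base + 1 (if) + 1 (and) + 1 (for)
print(cyclomatic_complexity(snippet))  # → 4
```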
Usage Examples
Analyze Codebase Architecture
Apply codebase-analysis-patterns skill to analyze architecture, detect layers, and map dependencies for the project
Generate Dependency Graph
Apply codebase-analysis-patterns skill to visualize module dependencies and identify circular dependencies
Detect Design Patterns
Apply codebase-analysis-patterns skill to detect Singleton, Factory, and Observer patterns in the codebase
Calculate Code Metrics
Apply codebase-analysis-patterns skill to calculate complexity, maintainability index, and identify high-risk files
Integration Points
- codebase-navigation - File discovery and structure understanding
- pattern-finding - Pattern matching and similar code detection
- code-review-patterns - Review methodology and quality gates
- software-design-patterns - Design pattern validation
Success Output
When this skill completes successfully, output:
✅ SKILL COMPLETE: codebase-analysis-patterns
Analysis Complete:
- [x] Architecture analyzed: [module-count] modules, [layer-count] layers
- [x] Dependency graph generated: [edge-count] dependencies
- [x] Design patterns detected: [pattern-count] patterns ([confidence-avg]% avg confidence)
- [x] Code metrics calculated: [file-count] files analyzed
- [x] Violations identified: [violation-count] issues
Outputs:
- Architecture report: [path]
- Dependency graph: [path] (PNG visualization)
- Pattern detection results: [path]
- Metrics dashboard: [path]
- Violation list: [path]
Metrics:
- Avg complexity: [score] (target: <10)
- Avg maintainability: [score] (target: >65)
- Coupling factor: [score]
- Circular dependencies: [count]
Completion Checklist
Before marking this skill as complete, verify:
- All Python modules discovered and analyzed
- Dependency graph built without errors
- Architectural layers correctly detected
- Design patterns validated with confidence scores
- Code metrics calculated for all files
- Violations categorized by severity
- Visualizations generated (dependency graph PNG)
- Reports exported to specified output paths
Failure Indicators
This skill has FAILED if:
- ❌ AST parsing errors on valid Python files
- ❌ Dependency graph is empty or missing modules that are known to exist
- ❌ Layer detection misclassifies >20% of modules
- ❌ Pattern detection confidence <50% on average
- ❌ Metrics calculation throws exceptions
- ❌ Circular dependency detection misses known cycles
- ❌ Visualization rendering fails
- ❌ Critical violations not identified (e.g., obvious layer violations)
When NOT to Use
Do NOT use this skill when:
- Non-Python codebase (JavaScript, Java, etc.)
  - Solution: Use language-specific analysis tools
- Simple script with <10 files
  - Solution: Manual code review is sufficient
- Analysis already performed and cached
  - Solution: Reuse existing analysis results
- Real-time analysis needed (this skill runs in batch mode)
  - Solution: Use incremental analysis tools
- Only a single metric needed (e.g., complexity)
  - Solution: Use a targeted metric tool
- Codebase uses unsupported Python syntax (e.g., Python 2)
  - Solution: Upgrade the codebase or use a compatible analyzer
- Architecture does not follow standard layering patterns
  - Solution: Configure custom layer detection rules first
Anti-Patterns (Avoid)
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Analyzing entire monorepo | Excessive time, irrelevant results | Scope to relevant subdirectories |
| Ignoring layer detection config | Misclassified modules | Configure layer patterns before analysis |
| Over-reliance on pattern detection | False positives | Validate high-confidence matches manually |
| Not filtering test files | Skewed metrics | Exclude test/ directories from metrics |
| Analyzing generated code | Noise in results | Exclude build/, dist/, generated/ |
| Single-threaded analysis | Slow on large codebases | Use parallel file processing |
| No validation of results | Incorrect conclusions | Cross-check metrics with manual review |
| Treating all violations equally | Misplaced priorities | Triage by severity (CRITICAL/HIGH/MEDIUM/LOW) |
| Analyzing without baseline | No improvement tracking | Establish baseline metrics first |
| Ignoring context | Inappropriate patterns flagged | Consider domain-specific patterns |
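The "single-threaded analysis" anti-pattern above can be avoided with a thin parallel wrapper. This is a sketch: `analyze_one` is a placeholder stand-in you would replace with `MetricsCalculator.calculate_file_metrics`, and threads are used because per-file reads are I/O-bound (switch to `ProcessPoolExecutor` for CPU-heavy AST work):

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def analyze_one(path: str) -> dict:
    """Placeholder per-file analysis; swap in the real metrics calculation."""
    text = Path(path).read_text()
    return {'path': path, 'lines': len(text.splitlines())}

def analyze_parallel(paths, workers: int = 8) -> list:
    """Analyze files concurrently, preserving input order in the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyze_one, paths))
```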
Principles
This skill embodies the following CODITECT principles:
#1 Recycle → Extend → Re-Use → Create
- Reuses AST parsing (Python's ast module)
- Extends with custom layer detection
- Creates comprehensive analysis on top of base tools
#2 Automation with Minimal Human Intervention
- Automated module discovery
- Automatic dependency graph construction
- Self-configuring layer detection
- Automated pattern detection and validation
#3 Separation of Concerns
- Architecture analysis separate from metrics
- Dependency mapping isolated from pattern detection
- Each analysis component independently runnable
#5 Eliminate Ambiguity
- Explicit layer hierarchy (infrastructure → domain → application → presentation)
- Clear violation severity levels
- Confidence scores for pattern detection
#6 Clear, Understandable, Explainable
- Visual dependency graphs
- Detailed metrics dashboard
- Violation reports with explanations
- Pattern matches with context
#7 First Principles Thinking
- Architecture based on fundamental software design principles
- Layer violations defined by dependency direction
- Complexity metrics grounded in cyclomatic complexity theory
#8 No Assumptions
- Validates module paths before analysis
- Confirms AST parseable before processing
- Checks layer patterns against actual structure
- Verifies pattern confidence before reporting