
Codebase Analysis Patterns Skill

When to Use This Skill

Use this skill when implementing codebase analysis patterns in your codebase.

How to Use This Skill

  1. Review the patterns and examples below
  2. Apply the relevant patterns to your implementation
  3. Follow the best practices outlined in this skill

Architecture analysis, pattern detection, dependency mapping, and codebase metrics for comprehensive code understanding.

Core Capabilities

  1. Architecture Analysis - System structure, layer separation, module boundaries
  2. Dependency Mapping - Import graphs, circular dependencies, coupling metrics
  3. Pattern Detection - Design patterns, architectural patterns, idioms
  4. Code Metrics - Complexity, maintainability, technical debt
  5. Anti-Pattern Detection - Code smells, architectural violations, bad practices

Analysis Scope Decision Matrix

| Codebase Size | Files | Recommended Scope | Analysis Depth | Time Budget |
|---|---|---|---|---|
| Small | <50 | Full analysis | Deep (all metrics) | <5 min |
| Medium | 50-500 | Full with sampling | Standard | 5-15 min |
| Large | 500-5000 | Module-focused | Targeted | 15-30 min |
| Enterprise | >5000 | Domain-bounded | Critical paths only | 30-60 min |
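The matrix above can be encoded as a small helper that picks a scope from a file count. This is a sketch of the table, not part of the skill's scripts; the field names and function name are illustrative:

```python
def recommend_scope(file_count: int) -> dict:
    """Map codebase size to the recommended scope from the decision matrix."""
    if file_count < 50:
        return {'size': 'small', 'scope': 'full', 'depth': 'deep', 'budget_min': 5}
    if file_count <= 500:
        return {'size': 'medium', 'scope': 'full-with-sampling', 'depth': 'standard', 'budget_min': 15}
    if file_count <= 5000:
        return {'size': 'large', 'scope': 'module-focused', 'depth': 'targeted', 'budget_min': 30}
    return {'size': 'enterprise', 'scope': 'domain-bounded', 'depth': 'critical-paths', 'budget_min': 60}
```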

Scope Limits by Analysis Type:

| Analysis Type | Max Files | Max Depth | Exclude Patterns |
|---|---|---|---|
| Architecture | 1000 | 5 layers | test/, vendor/, node_modules/ |
| Dependency Graph | 500 | 3 hops | External packages, stdlib |
| Pattern Detection | 200 | Per-file | Generated code, config files |
| Code Metrics | 2000 | All | Binary files, minified code |
| Circular Deps | 300 | Full graph | Test fixtures, mocks |

Quick Decision: What to Analyze

What's your analysis goal?
├── Understand new codebase → Architecture + Dependency (Medium scope)
├── Find technical debt → Code Metrics + Anti-Patterns (Focused scope)
├── Pre-refactor audit → Full analysis (Targeted module)
├── Security review → Dependency + Pattern (Critical paths)
├── Onboard new dev → Architecture overview (High-level only)
└── Performance issues → Metrics + Dependencies (Hot paths)

Scope Validation Checklist:

  • Excluded test directories (test/, tests/, __tests__/)
  • Excluded vendor/external (node_modules/, vendor/, .venv/)
  • Excluded generated code (dist/, build/, *.generated.*)
  • Set max file limit appropriate for time budget
  • Defined target modules (not whole monorepo)
  • Confirmed language support (Python primary, TypeScript secondary)
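The exclusion items in the checklist can be enforced with a small filter run before any file reaches an analyzer. A minimal sketch; the directory and glob lists simply restate the bullets above and should be tuned per project:

```python
from fnmatch import fnmatch
from pathlib import Path

# Directory names and filename globs from the scope checklist (adjust per project).
EXCLUDE_DIRS = {'test', 'tests', '__tests__', 'node_modules', 'vendor', '.venv', 'dist', 'build'}
EXCLUDE_GLOBS = ['*.generated.*', '*.min.js']

def in_scope(path: str) -> bool:
    """Return True if the file passes the scope checklist and should be analyzed."""
    p = Path(path)
    if any(part in EXCLUDE_DIRS for part in p.parts):
        return False
    return not any(fnmatch(p.name, pattern) for pattern in EXCLUDE_GLOBS)
```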

Architecture Analysis Script

# scripts/analyze_architecture.py
import ast
import json
from pathlib import Path
from collections import defaultdict
from typing import Dict, List, Set


class ArchitectureAnalyzer:
    """Comprehensive architecture analysis for Python codebases."""

    def __init__(self, root_path: str):
        self.root_path = Path(root_path)
        self.modules: Dict[str, ModuleInfo] = {}
        self.dependencies: Dict[str, Set[str]] = defaultdict(set)
        self.layers: Dict[str, List[str]] = defaultdict(list)
        self.metrics: Dict = {}

    def analyze(self) -> Dict:
        """Run complete architecture analysis."""
        self._discover_modules()
        self._analyze_dependencies()
        self._detect_layers()
        self._calculate_metrics()

        return {
            'summary': self._generate_summary(),
            'modules': self._format_modules(),
            'dependencies': self._format_dependencies(),
            'layers': dict(self.layers),
            'metrics': self.metrics,
            'violations': self._detect_violations(),
        }

    def _discover_modules(self):
        """Discover all Python modules in the codebase."""
        for py_file in self.root_path.rglob('*.py'):
            if self._should_analyze(py_file):
                module_path = self._get_module_path(py_file)
                self.modules[module_path] = ModuleInfo(py_file)

    def _analyze_dependencies(self):
        """Build the dependency graph using AST parsing."""
        for module_path, module_info in self.modules.items():
            try:
                with open(module_info.file_path, 'r') as f:
                    tree = ast.parse(f.read(), filename=str(module_info.file_path))

                visitor = ImportVisitor(module_path, self.root_path)
                visitor.visit(tree)

                self.dependencies[module_path] = visitor.imports
                module_info.imports = visitor.imports
                module_info.exports = self._extract_exports(tree)
                module_info.complexity = self._calculate_complexity(tree)
            except (SyntaxError, OSError) as e:
                print(f"Error analyzing {module_path}: {e}")

    def _detect_layers(self):
        """Detect architectural layers based on module structure."""
        layer_patterns = {
            'presentation': ['views', 'controllers', 'ui', 'api', 'routes'],
            'application': ['services', 'handlers', 'use_cases', 'commands'],
            'domain': ['models', 'entities', 'domain', 'core'],
            'infrastructure': ['repositories', 'adapters', 'infrastructure', 'db'],
            'shared': ['utils', 'helpers', 'common', 'shared'],
        }

        for module_path in self.modules:
            parts = module_path.split('.')
            for layer, patterns in layer_patterns.items():
                if any(pattern in part.lower() for part in parts for pattern in patterns):
                    self.layers[layer].append(module_path)
                    break
            else:
                self.layers['other'].append(module_path)

    def _calculate_metrics(self):
        """Calculate codebase-wide metrics."""
        module_count = len(self.modules) or 1  # guard against an empty codebase
        self.metrics = {
            'total_modules': len(self.modules),
            'total_lines': sum(m.lines for m in self.modules.values()),
            'avg_complexity': sum(m.complexity for m in self.modules.values()) / module_count,
            'max_complexity': max((m.complexity for m in self.modules.values()), default=0),
            'coupling': self._calculate_coupling(),
            'cohesion': self._calculate_cohesion(),
            'circular_dependencies': self._find_circular_deps(),
        }

    def _detect_violations(self) -> List[Dict]:
        """Detect architectural violations."""
        violations = []

        # Layer violations (e.g., infrastructure importing from presentation).
        # A layer may only depend on layers earlier in this list.
        layer_hierarchy = ['infrastructure', 'domain', 'application', 'presentation']

        for module_path, imports in self.dependencies.items():
            module_layer = self._get_module_layer(module_path)

            for imported in imports:
                import_layer = self._get_module_layer(imported)

                # 'shared'/'other' layers are outside the hierarchy, so skip them
                if module_layer in layer_hierarchy and import_layer in layer_hierarchy:
                    module_idx = layer_hierarchy.index(module_layer)
                    import_idx = layer_hierarchy.index(import_layer)

                    if module_idx < import_idx:
                        violations.append({
                            'type': 'layer_violation',
                            'module': module_path,
                            'imports': imported,
                            'severity': 'high',
                            'message': f'{module_layer} should not import from {import_layer}',
                        })

        # Circular dependencies
        for cycle in self.metrics['circular_dependencies']:
            violations.append({
                'type': 'circular_dependency',
                'cycle': cycle,
                'severity': 'critical',
                'message': f'Circular dependency detected: {" -> ".join(cycle)}',
            })

        return violations

    def _calculate_coupling(self) -> float:
        """Calculate average coupling (afferent + efferent)."""
        if not self.modules:
            return 0.0

        afferent = defaultdict(int)   # how many modules depend on this module
        efferent = defaultdict(int)   # how many modules this module depends on

        for module, imports in self.dependencies.items():
            efferent[module] = len(imports)
            for imported in imports:
                afferent[imported] += 1

        total_coupling = sum(afferent[m] + efferent[m] for m in self.modules)
        return total_coupling / len(self.modules)

    def _calculate_cohesion(self) -> float:
        """Calculate module cohesion (simplified LCOM metric)."""
        cohesion_scores = []

        for module_info in self.modules.values():
            if module_info.classes:
                # Score cohesion by how strongly a class's methods share state
                cohesion_scores.append(self._lcom4(module_info))

        return sum(cohesion_scores) / len(cohesion_scores) if cohesion_scores else 0.0

    def _find_circular_deps(self) -> List[List[str]]:
        """Find circular dependencies using DFS."""
        cycles = []
        visited = set()
        rec_stack = []

        def dfs(node: str):
            if node in rec_stack:
                cycle_start = rec_stack.index(node)
                cycle = rec_stack[cycle_start:] + [node]
                if cycle not in cycles:
                    cycles.append(cycle)
                return

            if node in visited:
                return

            visited.add(node)
            rec_stack.append(node)

            for neighbor in self.dependencies.get(node, []):
                dfs(neighbor)

            rec_stack.pop()

        for module in self.modules:
            dfs(module)

        return cycles


class ModuleInfo:
    """Information about a single module."""

    def __init__(self, file_path: Path):
        self.file_path = file_path
        self.imports: Set[str] = set()
        self.exports: List[str] = []
        self.complexity = 0
        self.lines = self._count_lines()
        self.classes: List[str] = []
        self.functions: List[str] = []

    def _count_lines(self) -> int:
        """Count non-empty, non-comment lines."""
        with open(self.file_path, 'r') as f:
            lines = f.readlines()
        return sum(1 for line in lines if line.strip() and not line.strip().startswith('#'))


class ImportVisitor(ast.NodeVisitor):
    """AST visitor that extracts imported module names."""

    def __init__(self, module_path: str, root_path: Path):
        self.module_path = module_path
        self.root_path = root_path
        self.imports: Set[str] = set()

    def visit_Import(self, node):
        for alias in node.names:
            # Keep the full dotted path so intra-project modules resolve
            self.imports.add(alias.name)
        self.generic_visit(node)

    def visit_ImportFrom(self, node):
        if node.module:
            self.imports.add(node.module)
        self.generic_visit(node)


# Usage example
if __name__ == '__main__':
    analyzer = ArchitectureAnalyzer('/path/to/project')
    results = analyzer.analyze()
    print(json.dumps(results, indent=2))
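The analyzer references several private helpers it never defines (`_should_analyze`, `_get_module_path`, `_extract_exports`, `_calculate_complexity`, `_get_module_layer`, `_lcom4`, and the `_generate_summary`/`_format_*` reporters). A minimal, standalone sketch of three of them, written as plain functions to show intent — one possible implementation, not the original's:

```python
import ast
from pathlib import Path

def get_module_path(py_file: Path, root: Path) -> str:
    """Convert src/app/models/user.py (root='src') to 'app.models.user'."""
    relative = py_file.relative_to(root).with_suffix('')
    parts = [p for p in relative.parts if p != '__init__']
    return '.'.join(parts)

def extract_exports(tree: ast.AST) -> list:
    """Top-level functions and classes -- a stand-in for honoring __all__."""
    return [node.name for node in tree.body
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))]

def calculate_complexity(tree: ast.AST) -> int:
    """Cyclomatic complexity: one linear path plus one per branch point."""
    complexity = 1
    for node in ast.walk(tree):
        if isinstance(node, (ast.If, ast.While, ast.For, ast.ExceptHandler)):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            complexity += len(node.values) - 1
    return complexity
```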

Dependency Graph Visualization

# scripts/visualize_dependencies.py
import networkx as nx
import matplotlib.pyplot as plt
from typing import Dict, List, Set


class DependencyVisualizer:
    """Visualize codebase dependencies as graphs."""

    def __init__(self, dependencies: Dict[str, Set[str]]):
        self.dependencies = dependencies
        self.graph = self._build_graph()

    def _build_graph(self) -> nx.DiGraph:
        """Build a directed graph from the dependency mapping."""
        G = nx.DiGraph()
        for module, imports in self.dependencies.items():
            G.add_node(module)
            for imported in imports:
                G.add_edge(module, imported)
        return G

    def generate_report(self) -> Dict:
        """Generate a comprehensive dependency report."""
        node_count = self.graph.number_of_nodes() or 1  # guard against an empty graph
        return {
            'summary': {
                'total_modules': self.graph.number_of_nodes(),
                'total_dependencies': self.graph.number_of_edges(),
                'avg_dependencies': self.graph.number_of_edges() / node_count,
            },
            'metrics': {
                'most_imported': self._most_imported(),
                'most_dependent': self._most_dependent(),
                'isolated_modules': self._isolated_modules(),
                'hub_modules': self._hub_modules(),
            },
            'analysis': {
                'strongly_connected': [sorted(c) for c in nx.strongly_connected_components(self.graph)],
                'clustering_coefficient': nx.average_clustering(self.graph.to_undirected()),
                'graph_density': nx.density(self.graph),
            },
        }

    def _most_imported(self, top_n: int = 10) -> List[tuple]:
        """Find the most imported modules (highest in-degree)."""
        in_degrees = dict(self.graph.in_degree())
        return sorted(in_degrees.items(), key=lambda x: x[1], reverse=True)[:top_n]

    def _most_dependent(self, top_n: int = 10) -> List[tuple]:
        """Find the most dependent modules (highest out-degree)."""
        out_degrees = dict(self.graph.out_degree())
        return sorted(out_degrees.items(), key=lambda x: x[1], reverse=True)[:top_n]

    def _isolated_modules(self) -> List[str]:
        """Find modules with no incoming or outgoing dependencies."""
        return [node for node in self.graph.nodes() if self.graph.degree(node) == 0]

    def _hub_modules(self, threshold: int = 5) -> List[str]:
        """Find hub modules (total degree above the threshold)."""
        return [
            node for node in self.graph.nodes()
            if self.graph.in_degree(node) + self.graph.out_degree(node) > threshold
        ]

    def visualize(self, output_path: str = 'dependency_graph.png'):
        """Render the dependency graph to an image."""
        plt.figure(figsize=(20, 20))

        # Force-directed (spring) layout; swap in a hierarchical layout if preferred
        pos = nx.spring_layout(self.graph, k=2, iterations=50)

        # Color nodes by importance (total degree)
        node_colors = [self.graph.degree(node) for node in self.graph.nodes()]

        nx.draw_networkx_nodes(
            self.graph, pos,
            node_color=node_colors,
            node_size=500,
            cmap=plt.cm.Blues,
        )
        nx.draw_networkx_edges(
            self.graph, pos,
            edge_color='gray',
            alpha=0.5,
            arrows=True,
            arrowsize=10,
        )
        nx.draw_networkx_labels(self.graph, pos, font_size=8)

        plt.title('Module Dependency Graph', fontsize=16)
        plt.axis('off')
        plt.tight_layout()
        plt.savefig(output_path, dpi=300, bbox_inches='tight')
        print(f"Dependency graph saved to {output_path}")


# Usage
dependencies = {
    'app.models': {'app.utils', 'sqlalchemy'},
    'app.views': {'app.models', 'app.services'},
    'app.services': {'app.models', 'app.repositories'},
    'app.repositories': {'app.models', 'app.db'},
}

visualizer = DependencyVisualizer(dependencies)
report = visualizer.generate_report()
visualizer.visualize()
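When networkx is not available, the same most-imported / most-dependent summaries fall out of the dependency dict directly with a `Counter`. A lightweight stand-in, not part of the visualizer above; the function name is illustrative:

```python
from collections import Counter

def degree_summary(dependencies: dict, top_n: int = 3) -> dict:
    """In-degree (times imported) and out-degree (imports made) per module."""
    in_degree = Counter()
    out_degree = Counter()
    for module, imports in dependencies.items():
        out_degree[module] = len(imports)
        for imported in imports:
            in_degree[imported] += 1
    return {
        'most_imported': in_degree.most_common(top_n),
        'most_dependent': out_degree.most_common(top_n),
    }

deps = {
    'app.models': {'app.utils', 'sqlalchemy'},
    'app.views': {'app.models', 'app.services'},
    'app.services': {'app.models', 'app.repositories'},
    'app.repositories': {'app.models', 'app.db'},
}
summary = degree_summary(deps)
# app.models is imported by views, services, and repositories -> in-degree 3
```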

Pattern Detection Engine

// tools/pattern-detector.ts
interface PatternMatch {
  pattern: string;
  file: string;
  line: number;
  confidence: number;
  context: string;
}

interface DesignPattern {
  name: string;
  indicators: string[];
  antiIndicators?: string[];
  validate: (ast: any) => boolean;
}

class PatternDetector {
  private patterns: DesignPattern[] = [
    {
      name: 'Singleton',
      indicators: ['__instance', 'getInstance', '_instance = None'],
      validate: (ast) => this.validateSingleton(ast)
    },
    {
      name: 'Factory',
      indicators: ['create', 'build', 'make', 'factory'],
      validate: (ast) => this.validateFactory(ast)
    },
    {
      name: 'Observer',
      indicators: ['subscribe', 'notify', 'observer', 'listener'],
      validate: (ast) => this.validateObserver(ast)
    },
    {
      name: 'Strategy',
      indicators: ['strategy', 'algorithm', 'execute'],
      validate: (ast) => this.validateStrategy(ast)
    },
    {
      name: 'Decorator',
      indicators: ['@decorator', 'wrapper', 'wrap'],
      validate: (ast) => this.validateDecorator(ast)
    }
  ];

  async detectPatterns(codebasePath: string): Promise<Map<string, PatternMatch[]>> {
    const results = new Map<string, PatternMatch[]>();

    for (const pattern of this.patterns) {
      const matches = await this.findPattern(codebasePath, pattern);
      if (matches.length > 0) {
        results.set(pattern.name, matches);
      }
    }

    return results;
  }

  private async findPattern(path: string, pattern: DesignPattern): Promise<PatternMatch[]> {
    const matches: PatternMatch[] = [];

    // Search for indicator keywords
    for (const indicator of pattern.indicators) {
      const grepResults = await this.grepCodebase(path, indicator);

      for (const result of grepResults) {
        const ast = await this.parseFile(result.file);

        if (pattern.validate(ast)) {
          matches.push({
            pattern: pattern.name,
            file: result.file,
            line: result.line,
            confidence: this.calculateConfidence(result, pattern),
            context: result.context
          });
        }
      }
    }

    return matches;
  }

  private validateSingleton(ast: any): boolean {
    // Check for:
    // 1. Private constructor or __new__ override
    // 2. Class-level instance variable
    // 3. getInstance or __new__ method that returns the same instance

    const hasPrivateConstructor = ast.classes?.some((cls: any) =>
      cls.methods?.some((m: any) => m.name === '__new__' || m.name === '__init__')
    );

    const hasInstanceVariable = ast.classes?.some((cls: any) =>
      cls.classVariables?.some((v: any) => v.name.includes('instance'))
    );

    return Boolean(hasPrivateConstructor && hasInstanceVariable);
  }

  private validateFactory(ast: any): boolean {
    // Check for:
    // 1. Method/function that returns instances
    // 2. Conditional logic to select a class
    // 3. No direct instantiation in calling code

    const hasFactoryMethod = ast.functions?.some((fn: any) =>
      fn.name.toLowerCase().includes('create') ||
      fn.name.toLowerCase().includes('build') ||
      fn.name.toLowerCase().includes('make')
    );

    const hasConditionalReturn = ast.functions?.some((fn: any) =>
      fn.body?.includes('if') && fn.returns?.length > 1
    );

    return Boolean(hasFactoryMethod && hasConditionalReturn);
  }

  // Placeholder validators: an indicator hit alone counts until real AST checks exist
  private validateObserver(ast: any): boolean { return true; }
  private validateStrategy(ast: any): boolean { return true; }
  private validateDecorator(ast: any): boolean { return true; }

  // Helpers omitted here: wire grepCodebase to a search tool (e.g., ripgrep)
  // and parseFile to a language-appropriate parser
  private async grepCodebase(path: string, term: string): Promise<any[]> { return []; }
  private async parseFile(file: string): Promise<any> { return {}; }

  private calculateConfidence(result: any, pattern: DesignPattern): number {
    let confidence = 0.5; // Base confidence

    // Increase confidence for each matched indicator
    const indicatorCount = pattern.indicators.filter(ind =>
      result.context.toLowerCase().includes(ind.toLowerCase())
    ).length;

    confidence += indicatorCount * 0.1;

    // Decrease confidence for anti-indicators
    if (pattern.antiIndicators) {
      const antiCount = pattern.antiIndicators.filter(anti =>
        result.context.toLowerCase().includes(anti.toLowerCase())
      ).length;

      confidence -= antiCount * 0.15;
    }

    return Math.min(Math.max(confidence, 0), 1);
  }
}

// Usage (await requires an async context)
async function main() {
  const detector = new PatternDetector();
  const patterns = await detector.detectPatterns('/path/to/codebase');

  console.log('Detected Design Patterns:');
  for (const [pattern, matches] of patterns) {
    console.log(`\n${pattern}: ${matches.length} matches`);
    matches.forEach(match => {
      console.log(`  ${match.file}:${match.line} (confidence: ${match.confidence.toFixed(2)})`);
    });
  }
}

main();
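The confidence formula in `calculateConfidence` is simple enough to check by hand. A Python transcription with the same constants (base 0.5, +0.1 per matched indicator, −0.15 per anti-indicator, clamped to [0, 1]) — a sketch for sanity-checking, not part of the detector:

```python
def pattern_confidence(context: str, indicators, anti_indicators=()) -> float:
    """Mirror of the TypeScript scoring: base 0.5, adjusted per (anti-)indicator hit."""
    text = context.lower()
    confidence = 0.5
    confidence += 0.1 * sum(1 for ind in indicators if ind.lower() in text)
    confidence -= 0.15 * sum(1 for anti in anti_indicators if anti.lower() in text)
    return min(max(confidence, 0.0), 1.0)
```

For a Singleton candidate whose context matches two of the three indicators, the score lands at 0.7, which the skill's own failure criteria treat as acceptable (>50% average).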

Code Metrics Dashboard

# scripts/metrics_dashboard.py
import ast
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FileMetrics:
    """Metrics for a single file."""
    path: str
    lines: int
    complexity: int
    functions: int
    classes: int
    imports: int
    maintainability_index: float


class MetricsCalculator:
    """Calculate comprehensive code metrics."""

    def __init__(self):
        self.thresholds = {
            'complexity': 10,
            'lines': 300,
            'functions_per_file': 15,
            'maintainability': 65,
        }

    def calculate_file_metrics(self, file_path: str) -> FileMetrics:
        """Calculate metrics for a single file."""
        with open(file_path, 'r') as f:
            content = f.read()
        tree = ast.parse(content)

        return FileMetrics(
            path=file_path,
            lines=len(content.splitlines()),
            complexity=self._calculate_complexity(tree),
            functions=len([n for n in ast.walk(tree)
                           if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]),
            classes=len([n for n in ast.walk(tree) if isinstance(n, ast.ClassDef)]),
            imports=len([n for n in ast.walk(tree)
                         if isinstance(n, (ast.Import, ast.ImportFrom))]),
            maintainability_index=self._calculate_maintainability(tree, content),
        )

    def _calculate_complexity(self, tree: ast.AST) -> int:
        """Calculate cyclomatic complexity."""
        complexity = 1  # base complexity: one linear path

        for node in ast.walk(tree):
            if isinstance(node, (ast.If, ast.While, ast.For, ast.ExceptHandler)):
                complexity += 1
            elif isinstance(node, ast.BoolOp):
                complexity += len(node.values) - 1

        return complexity

    def _calculate_maintainability(self, tree: ast.AST, content: str) -> float:
        """Calculate maintainability index (simplified)."""
        volume = max(1, len(content))              # character count as a volume proxy
        complexity = self._calculate_complexity(tree)
        lines = max(1, len(content.splitlines()))  # avoid log(0) on empty files

        # Simplified MI formula: 171 - 5.2*ln(V) - 0.23*G - 16.2*ln(L)
        mi = max(0, 171 - 5.2 * math.log(volume) - 0.23 * complexity - 16.2 * math.log(lines))
        return min(100, mi)

    def generate_report(self, metrics: List[FileMetrics]) -> Dict:
        """Generate a comprehensive metrics report."""
        count = len(metrics) or 1  # guard against empty input
        return {
            'summary': {
                'total_files': len(metrics),
                'total_lines': sum(m.lines for m in metrics),
                'avg_complexity': sum(m.complexity for m in metrics) / count,
                'avg_maintainability': sum(m.maintainability_index for m in metrics) / count,
            },
            'warnings': self._generate_warnings(metrics),
            'top_complex': sorted(metrics, key=lambda m: m.complexity, reverse=True)[:10],
            'low_maintainability': [
                m for m in metrics
                if m.maintainability_index < self.thresholds['maintainability']
            ],
        }

    def _generate_warnings(self, metrics: List[FileMetrics]) -> List[Dict]:
        """Generate warnings for threshold violations."""
        warnings = []

        for metric in metrics:
            if metric.complexity > self.thresholds['complexity']:
                warnings.append({
                    'file': metric.path,
                    'type': 'high_complexity',
                    'value': metric.complexity,
                    'threshold': self.thresholds['complexity'],
                })
            if metric.lines > self.thresholds['lines']:
                warnings.append({
                    'file': metric.path,
                    'type': 'large_file',
                    'value': metric.lines,
                    'threshold': self.thresholds['lines'],
                })

        return warnings
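The simplified maintainability formula is easy to sanity-check in isolation. The constants below are copied from the script (which itself simplifies the classic Halstead-based MI by using character count as volume); the worked numbers are illustrative:

```python
import math

def maintainability_index(volume: int, complexity: int, lines: int) -> float:
    """Simplified MI: 171 - 5.2*ln(V) - 0.23*G - 16.2*ln(L), clamped to [0, 100]."""
    mi = 171 - 5.2 * math.log(volume) - 0.23 * complexity - 16.2 * math.log(lines)
    return min(100.0, max(0.0, mi))

# A 2,000-character, 80-line file with complexity 12 scores roughly 57.7 --
# below the 65 threshold, so the dashboard would flag it as low maintainability.
score = maintainability_index(2000, 12, 80)
```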

Usage Examples

Analyze Codebase Architecture

Apply codebase-analysis-patterns skill to analyze architecture, detect layers, and map dependencies for the project

Generate Dependency Graph

Apply codebase-analysis-patterns skill to visualize module dependencies and identify circular dependencies

Detect Design Patterns

Apply codebase-analysis-patterns skill to detect Singleton, Factory, and Observer patterns in the codebase

Calculate Code Metrics

Apply codebase-analysis-patterns skill to calculate complexity, maintainability index, and identify high-risk files

Integration Points

  • codebase-navigation - File discovery and structure understanding
  • pattern-finding - Pattern matching and similar code detection
  • code-review-patterns - Review methodology and quality gates
  • software-design-patterns - Design pattern validation

Success Output

When this skill completes successfully, output:

✅ SKILL COMPLETE: codebase-analysis-patterns

Analysis Complete:
- [x] Architecture analyzed: [module-count] modules, [layer-count] layers
- [x] Dependency graph generated: [edge-count] dependencies
- [x] Design patterns detected: [pattern-count] patterns ([confidence-avg]% avg confidence)
- [x] Code metrics calculated: [file-count] files analyzed
- [x] Violations identified: [violation-count] issues

Outputs:
- Architecture report: [path]
- Dependency graph: [path] (PNG visualization)
- Pattern detection results: [path]
- Metrics dashboard: [path]
- Violation list: [path]

Metrics:
- Avg complexity: [score] (target: <10)
- Avg maintainability: [score] (target: >65)
- Coupling factor: [score]
- Circular dependencies: [count]

Completion Checklist

Before marking this skill as complete, verify:

  • All Python modules discovered and analyzed
  • Dependency graph built without errors
  • Architectural layers correctly detected
  • Design patterns validated with confidence scores
  • Code metrics calculated for all files
  • Violations categorized by severity
  • Visualizations generated (dependency graph PNG)
  • Reports exported to specified output paths

Failure Indicators

This skill has FAILED if:

  • ❌ AST parsing errors on valid Python files
  • ❌ Dependency graph omits known edges or splits unexpectedly into disconnected components
  • ❌ Layer detection misclassifies >20% of modules
  • ❌ Pattern detection confidence <50% on average
  • ❌ Metrics calculation throws exceptions
  • ❌ Circular dependency detection misses known cycles
  • ❌ Visualization rendering fails
  • ❌ Critical violations not identified (e.g., obvious layer violations)

When NOT to Use

Do NOT use this skill when:

  • Non-Python codebase (JavaScript, Java, etc.)
    • Solution: Use language-specific analysis tools
  • Simple script with <10 files
    • Solution: Manual code review sufficient
  • Analysis already performed and cached
    • Solution: Reuse existing analysis results
  • Real-time analysis needed (this skill is batch)
    • Solution: Use incremental analysis tools
  • Only need single metric (complexity, etc.)
    • Solution: Use targeted metric tool
  • Codebase uses unsupported Python syntax (e.g., Python 2)
    • Solution: Upgrade codebase or use compatible analyzer
  • Architecture not following standard patterns
    • Solution: Configure custom layer detection rules first

Anti-Patterns (Avoid)

| Anti-Pattern | Problem | Solution |
|---|---|---|
| Analyzing entire monorepo | Excessive time, irrelevant results | Scope to relevant subdirectories |
| Ignoring layer detection config | Misclassified modules | Configure layer patterns before analysis |
| Over-reliance on pattern detection | False positives | Validate high-confidence matches manually |
| Not filtering test files | Skewed metrics | Exclude test/ directories from metrics |
| Analyzing generated code | Noise in results | Exclude build/, dist/, generated/ |
| Single-threaded analysis | Slow on large codebases | Use parallel file processing |
| No validation of results | Incorrect conclusions | Cross-check metrics with manual review |
| Treating all violations equally | Misplaced priorities | Triage by severity (CRITICAL/HIGH/MEDIUM/LOW) |
| Analyzing without baseline | No improvement tracking | Establish baseline metrics first |
| Ignoring context | Inappropriate patterns flagged | Consider domain-specific patterns |

Principles

This skill embodies the following CODITECT principles:

#1 Recycle → Extend → Re-Use → Create

  • Reuses AST parsing (Python's ast module)
  • Extends with custom layer detection
  • Creates comprehensive analysis on top of base tools

#2 Automation with Minimal Human Intervention

  • Automated module discovery
  • Automatic dependency graph construction
  • Self-configuring layer detection
  • Automated pattern detection and validation

#3 Separation of Concerns

  • Architecture analysis separate from metrics
  • Dependency mapping isolated from pattern detection
  • Each analysis component independently runnable

#5 Eliminate Ambiguity

  • Explicit layer hierarchy (infrastructure → domain → application → presentation)
  • Clear violation severity levels
  • Confidence scores for pattern detection

#6 Clear, Understandable, Explainable

  • Visual dependency graphs
  • Detailed metrics dashboard
  • Violation reports with explanations
  • Pattern matches with context

#7 First Principles Thinking

  • Architecture based on fundamental software design principles
  • Layer violations defined by dependency direction
  • Complexity metrics grounded in cyclomatic complexity theory

#8 No Assumptions

  • Validates module paths before analysis
  • Confirms AST parseable before processing
  • Checks layer patterns against actual structure
  • Verifies pattern confidence before reporting