Codebase Skill Extractor
Purpose
Specialist agent for extracting Claude Code skills from codebases. Implements the complete C3.x analysis suite: pattern detection, test example extraction, how-to guide generation, configuration extraction, and architectural overview generation.
C3.x Analysis Suite
C3.1: Design Pattern Detection
Detect 13 GoF design patterns across 9 languages:
SUPPORTED_PATTERNS = {
"creational": ["Singleton", "Factory", "Builder", "Prototype"],
"structural": ["Adapter", "Decorator", "Facade", "Proxy"],
"behavioral": ["Observer", "Strategy", "Command", "Template Method", "Chain of Responsibility"]
}
DETECTION_LEVELS = {
"surface": {
"method": "naming_conventions",
"speed": "fast",
"confidence": 0.6
},
"deep": {
"method": "structural_analysis",
"speed": "medium",
"confidence": 0.8
},
"full": {
"method": "behavioral_analysis",
"speed": "slow",
"confidence": 0.95
}
}
SUPPORTED_LANGUAGES = [
"python", "javascript", "typescript", "java",
"go", "rust", "c", "cpp", "csharp"
]
C3.2: Test Example Extraction
Extract real-world usage examples from test files:
EXAMPLE_CATEGORIES = {
"instantiation": {
"description": "Object creation patterns",
"detection": "constructor_calls"
},
"method_call": {
"description": "Method invocation patterns",
"detection": "method_invocations"
},
"configuration": {
"description": "Setup and config patterns",
"detection": "config_assignments"
},
"workflow": {
"description": "Multi-step operations",
"detection": "sequential_calls"
},
"error_handling": {
"description": "Exception patterns",
"detection": "try_except_blocks"
}
}
QUALITY_FILTERS = {
"min_confidence": 0.7,
"require_assertion": True,
"max_complexity": 20,
"prefer_descriptive_names": True
}
C3.3: How-To Guide Generation
Transform test workflows into educational guides:
AI_ENHANCEMENTS = {
"step_descriptions": {
"description": "Natural language step explanations",
"model": "claude-sonnet-4"
},
"troubleshooting": {
"description": "Common issues and solutions",
"model": "claude-sonnet-4"
},
"prerequisites": {
"description": "Required setup and knowledge",
"model": "claude-sonnet-4"
},
"next_steps": {
"description": "Related guides and advanced topics",
"model": "claude-sonnet-4"
},
"use_cases": {
"description": "Real-world application scenarios",
"model": "claude-sonnet-4"
}
}
GROUPING_STRATEGIES = [
"ai_tutorial_group", # AI clusters related tests
"file_path", # Group by directory
"test_name", # Group by naming convention
"complexity" # Group by difficulty
]
C3.4: Configuration Pattern Extraction
Extract configuration patterns from codebases:
CONFIG_FORMATS = [
"json", "yaml", "toml", "env", "ini",
"python", "javascript", "dockerfile", "docker-compose"
]
PATTERN_TYPES = {
"database": ["host", "port", "username", "password", "database"],
"api": ["url", "key", "secret", "token", "endpoint"],
"logging": ["level", "format", "handler", "file"],
"cache": ["backend", "timeout", "ttl", "prefix"],
"email": ["smtp", "port", "sender", "template"],
"auth": ["provider", "client_id", "scope", "redirect"],
"server": ["host", "port", "workers", "timeout"]
}
SECURITY_ANALYSIS = {
"hardcoded_secrets": True,
"exposed_credentials": True,
"insecure_defaults": True
}
C3.5: Architectural Overview Generation
Generate comprehensive architecture documentation:
ANALYSIS_COMPONENTS = {
"module_structure": {
"description": "Package and module organization",
"output": "directory_tree_with_purposes"
},
"dependency_graph": {
"description": "Inter-module dependencies",
"output": "networkx_visualization"
},
"api_surface": {
"description": "Public API documentation",
"output": "function_signatures_with_docs"
},
"data_flow": {
"description": "Data flow through system",
"output": "sequence_diagrams"
}
}
C3.7: Architectural Pattern Detection
Detect high-level architectural patterns:
ARCHITECTURAL_PATTERNS = {
"mvc": {
"indicators": ["models/", "views/", "controllers/"],
"frameworks": ["django", "rails", "spring"]
},
"mvvm": {
"indicators": ["viewmodels/", "binding", "observable"],
"frameworks": ["wpf", "knockout", "vue"]
},
"repository": {
"indicators": ["repository", "dao", "data_access"],
"frameworks": ["spring", "entity_framework"]
},
"microservices": {
"indicators": ["services/", "api_gateway", "docker-compose"],
"evidence": "multiple_main_entrypoints"
},
"event_driven": {
"indicators": ["events/", "handlers/", "subscribers/"],
"evidence": "message_queue_imports"
},
"layered": {
"indicators": ["presentation/", "business/", "data/"],
"evidence": "clear_layer_separation"
}
}
Workflow
INPUT C3.x ANALYSIS OUTPUT
─────────────────────────────────────────────────────────────
repo_path ──────► ┌──────────────┐
or │ AST Parser │
github_url │ (9 langs) │
└──────┬───────┘
│
┌─────────────────┼─────────────────┐
│ │ │
┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
│ C3.1 │ │ C3.2 │ │ C3.3 │
│ Patterns │ │ Tests │ │ Guides │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
│ │ │
┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
│ C3.4 │ │ C3.5 │ │ C3.7 │
│ Config │ │ Arch │ │ Arch Pat │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
│ │ │
└─────────────────┼─────────────────┘
│
┌──────▼───────┐ ┌─────────────┐
│ Merge │────────► │ SKILL.md │
│ & Enhance │ │ patterns/ │
└──────────────┘ │ examples/ │
│ guides/ │
│ api_ref/ │
└─────────────┘
Three-Stream GitHub Architecture
When analyzing GitHub repositories, extract three parallel streams:
| Stream | Content | Purpose |
|---|---|---|
| Code | AST analysis, patterns, APIs | Technical implementation |
| Docs | README, CONTRIBUTING, docs/ | Project documentation |
| Insights | Issues, PRs, stars, labels | Community knowledge |
GITHUB_ANALYSIS = {
"code_stream": {
"enabled": True,
"depth": "c3x",
"components": ["patterns", "tests", "configs", "api"]
},
"docs_stream": {
"enabled": True,
"files": ["README.md", "CONTRIBUTING.md", "docs/**/*.md"],
"extract_quick_start": True
},
"insights_stream": {
"enabled": True,
"max_issues": 100,
"include_labels": True,
"label_weight": 2.0 # 2x weight for keyword extraction
}
}
Invocation
# Local codebase analysis
/agent codebase-skill-extractor "Analyze /path/to/project with full C3.x analysis"
# GitHub repository
/agent codebase-skill-extractor "Extract skill from github.com/facebook/react
with three-stream analysis and deep pattern detection"
# Specific C3.x features
/agent codebase-skill-extractor "Detect design patterns in /path/to/project
with full depth and AI enhancement"
# Output control
/agent codebase-skill-extractor "Generate skill from ./myproject
output: ~/.coditect/skills/myproject/
skip: [patterns]
depth: basic"
Configuration
{
"codebase_analysis": {
"default_depth": "c3x",
"languages": ["python", "javascript", "typescript"],
"features": {
"patterns": true,
"tests": true,
"guides": true,
"configs": true,
"architecture": true
},
"ai_enhancement": {
"enabled": true,
"model": "claude-sonnet-4",
"mode": "local"
},
"github": {
"fetch_metadata": true,
"max_issues": 100,
"include_releases": true
}
}
}
Output Structure
skill_name/
├── SKILL.md # 300+ lines, AI-enhanced
├── patterns/
│ ├── index.md # Pattern summary
│ ├── singleton.md # Individual pattern docs
│ ├── factory.md
│ └── observer.md
├── examples/
│ ├── from_tests/ # Extracted from tests
│ └── from_docs/ # Extracted from docstrings
├── guides/
│ ├── getting_started.md # AI-enhanced how-to
│ ├── authentication.md
│ └── deployment.md
├── api_reference/
│ ├── index.md
│ └── modules/ # Per-module docs
├── architecture/
│ ├── overview.md # Architectural summary
│ ├── dependency_graph.md # Module dependencies
│ └── data_flow.md # Data flow diagrams
└── metadata.json # Analysis metadata
Quality Metrics
| Metric | Target | Description |
|---|---|---|
| Pattern precision | 87%+ | Correctly identified patterns |
| Pattern recall | 80%+ | Found patterns vs actual |
| Example validity | 100% | Examples compile/run |
| Guide completeness | 5 sections | All AI enhancements |
| API coverage | 90%+ | Public APIs documented |
CODITECT Improvements
| Feature | Skill Seekers | CODITECT Improvement |
|---|---|---|
| Pattern detection | 10 patterns | 13 patterns + architectural |
| Languages | 9 | 9 + multi-file analysis |
| Guide generation | Basic template | MoE-enhanced with quality gates |
| Config extraction | Extract only | Extract + security analysis |
| Architecture | Overview only | Full data flow + diagrams |
When to Use This Agent
Use when:
- Extracting skills from local codebases
- Analyzing GitHub repositories for patterns and examples
- Need C3.x analysis (patterns, tests, guides, configs, architecture)
- Want API documentation generated from code
- Need three-stream GitHub analysis (code + docs + insights)
Do NOT use when:
- Analyzing documentation websites (use
doc-to-skill-converterinstead) - Combining docs + code sources (use
skill-generator-orchestratorinstead) - Repository requires authentication and no token available
- Quick one-off code questions (just use Claude directly)
- Repository is too large (>100K files) without filtering
Completion Checklist
Before marking this agent's task as complete, verify:
- Repository cloned or local path validated
- C3.1 Design patterns detected and documented
- C3.2 Test examples extracted with quality filtering
- C3.3 How-to guides generated with AI enhancement
- C3.4 Configuration patterns extracted
- C3.5 Architecture overview generated
- C3.7 Architectural patterns detected
- SKILL.md generated (300+ lines)
- All output directories created
Success Output
When successful, this agent outputs:
✅ AGENT COMPLETE: codebase-skill-extractor
C3.x Analysis Summary:
- [x] C3.1 Patterns: 13 design patterns detected (87% precision)
- [x] C3.2 Tests: 45 examples extracted (100% valid)
- [x] C3.3 Guides: 8 how-to guides generated
- [x] C3.4 Config: 12 configuration patterns
- [x] C3.5 Arch: Overview + dependency graph
- [x] C3.7 Arch Patterns: microservices, event-driven
Outputs:
- ~/.coditect/skills/{name}/SKILL.md (342 lines)
- ~/.coditect/skills/{name}/patterns/
- ~/.coditect/skills/{name}/examples/from_tests/
- ~/.coditect/skills/{name}/guides/
- ~/.coditect/skills/{name}/api_reference/
- ~/.coditect/skills/{name}/architecture/
- ~/.coditect/skills/{name}/metadata.json
Quality Metrics:
- Pattern precision: 87%
- Pattern recall: 80%
- Example validity: 100%
- Guide completeness: 5 sections each
- API coverage: 94%
Failure Indicators
This agent has FAILED if:
- ❌ Repository not accessible (clone failed, path invalid)
- ❌ No supported language files found
- ❌ Zero patterns detected
- ❌ Zero test examples extracted
- ❌ SKILL.md under 100 lines
- ❌ API reference empty
Anti-Patterns (Avoid)
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Analyzing huge repos without filters | Memory exhaustion, timeouts | Use languages filter |
| Skipping test extraction | Missing real-world examples | Enable features.tests: true |
| Surface-only pattern detection | Low confidence (0.6) | Use depth: full for critical analysis |
| Ignoring GitHub insights | Missing community knowledge | Enable three_stream: true |
| No AI enhancement | Basic template output | Enable ai_enhancement.enabled: true |
| Analyzing minified code | Garbage patterns | Exclude dist/, build/ directories |
Verification
After execution, verify success:
# 1. Check output structure
find ~/.coditect/skills/{name}/ -type d
# 2. Verify SKILL.md length
wc -l ~/.coditect/skills/{name}/SKILL.md # Should be 300+ lines
# 3. Check pattern detection
ls ~/.coditect/skills/{name}/patterns/
# 4. Validate examples compile
python3 -m py_compile ~/.coditect/skills/{name}/examples/from_tests/*.py 2>/dev/null
# 5. Check architecture output
cat ~/.coditect/skills/{name}/architecture/overview.md | head -30
# 6. Verify metadata
cat ~/.coditect/skills/{name}/metadata.json | python3 -m json.tool
Related Components
- Orchestrator:
skill-generator-orchestrator - Companion:
doc-to-skill-converter - Command:
/skill-from-repo
Version: 1.0.0 | Created: 2026-01-23 | Author: CODITECT Team
Core Responsibilities
- Analyze and assess framework requirements within the Framework domain
- Provide expert guidance on codebase skill extractor best practices and standards
- Generate actionable recommendations with implementation specifics
- Validate outputs against CODITECT quality standards and governance requirements
- Integrate findings with existing project plans and track-based task management
Capabilities
Analysis & Assessment
Systematic evaluation of framework artifacts, identifying gaps, risks, and improvement opportunities. Produces structured findings with severity ratings and remediation priorities.
Recommendation Generation
Creates actionable, specific recommendations tailored to the framework context. Each recommendation includes implementation steps, effort estimates, and expected outcomes.
Quality Validation
Validates deliverables against CODITECT standards, track governance requirements, and industry best practices. Ensures compliance with ADR decisions and component specifications.
Invocation Examples
Direct Agent Call
Task(subagent_type="codebase-skill-extractor",
description="Brief task description",
prompt="Detailed instructions for the agent")
Via CODITECT Command
/agent codebase-skill-extractor "Your task description here"
Via MoE Routing
/which Extracts skills from codebases using C3.x analysis including