Codebase Skill Extractor

Purpose

Specialist agent for extracting Claude Code skills from codebases. Implements the complete C3.x analysis suite: pattern detection, test example extraction, how-to guide generation, configuration extraction, and architectural overview generation.

C3.x Analysis Suite

C3.1: Design Pattern Detection

Detect 13 GoF design patterns across 9 languages:

SUPPORTED_PATTERNS = {
    "creational": ["Singleton", "Factory", "Builder", "Prototype"],
    "structural": ["Adapter", "Decorator", "Facade", "Proxy"],
    "behavioral": ["Observer", "Strategy", "Command", "Template Method", "Chain of Responsibility"]
}

DETECTION_LEVELS = {
    "surface": {
        "method": "naming_conventions",
        "speed": "fast",
        "confidence": 0.6
    },
    "deep": {
        "method": "structural_analysis",
        "speed": "medium",
        "confidence": 0.8
    },
    "full": {
        "method": "behavioral_analysis",
        "speed": "slow",
        "confidence": 0.95
    }
}

SUPPORTED_LANGUAGES = [
    "python", "javascript", "typescript", "java",
    "go", "rust", "c", "cpp", "csharp"
]

C3.2: Test Example Extraction

Extract real-world usage examples from test files:

EXAMPLE_CATEGORIES = {
    "instantiation": {
        "description": "Object creation patterns",
        "detection": "constructor_calls"
    },
    "method_call": {
        "description": "Method invocation patterns",
        "detection": "method_invocations"
    },
    "configuration": {
        "description": "Setup and config patterns",
        "detection": "config_assignments"
    },
    "workflow": {
        "description": "Multi-step operations",
        "detection": "sequential_calls"
    },
    "error_handling": {
        "description": "Exception patterns",
        "detection": "try_except_blocks"
    }
}

QUALITY_FILTERS = {
    "min_confidence": 0.7,
    "require_assertion": True,
    "max_complexity": 20,
    "prefer_descriptive_names": True
}

C3.3: How-To Guide Generation

Transform test workflows into educational guides:

AI_ENHANCEMENTS = {
    "step_descriptions": {
        "description": "Natural language step explanations",
        "model": "claude-sonnet-4"
    },
    "troubleshooting": {
        "description": "Common issues and solutions",
        "model": "claude-sonnet-4"
    },
    "prerequisites": {
        "description": "Required setup and knowledge",
        "model": "claude-sonnet-4"
    },
    "next_steps": {
        "description": "Related guides and advanced topics",
        "model": "claude-sonnet-4"
    },
    "use_cases": {
        "description": "Real-world application scenarios",
        "model": "claude-sonnet-4"
    }
}

GROUPING_STRATEGIES = [
    "ai_tutorial_group",  # AI clusters related tests
    "file_path",          # Group by directory
    "test_name",          # Group by naming convention
    "complexity"          # Group by difficulty
]

C3.4: Configuration Pattern Extraction

Extract configuration patterns from codebases:

CONFIG_FORMATS = [
    "json", "yaml", "toml", "env", "ini",
    "python", "javascript", "dockerfile", "docker-compose"
]

PATTERN_TYPES = {
    "database": ["host", "port", "username", "password", "database"],
    "api": ["url", "key", "secret", "token", "endpoint"],
    "logging": ["level", "format", "handler", "file"],
    "cache": ["backend", "timeout", "ttl", "prefix"],
    "email": ["smtp", "port", "sender", "template"],
    "auth": ["provider", "client_id", "scope", "redirect"],
    "server": ["host", "port", "workers", "timeout"]
}

SECURITY_ANALYSIS = {
    "hardcoded_secrets": True,
    "exposed_credentials": True,
    "insecure_defaults": True
}

C3.5: Architectural Overview Generation

Generate comprehensive architecture documentation:

ANALYSIS_COMPONENTS = {
    "module_structure": {
        "description": "Package and module organization",
        "output": "directory_tree_with_purposes"
    },
    "dependency_graph": {
        "description": "Inter-module dependencies",
        "output": "networkx_visualization"
    },
    "api_surface": {
        "description": "Public API documentation",
        "output": "function_signatures_with_docs"
    },
    "data_flow": {
        "description": "Data flow through system",
        "output": "sequence_diagrams"
    }
}

C3.7: Architectural Pattern Detection

Detect high-level architectural patterns:

ARCHITECTURAL_PATTERNS = {
    "mvc": {
        "indicators": ["models/", "views/", "controllers/"],
        "frameworks": ["django", "rails", "spring"]
    },
    "mvvm": {
        "indicators": ["viewmodels/", "binding", "observable"],
        "frameworks": ["wpf", "knockout", "vue"]
    },
    "repository": {
        "indicators": ["repository", "dao", "data_access"],
        "frameworks": ["spring", "entity_framework"]
    },
    "microservices": {
        "indicators": ["services/", "api_gateway", "docker-compose"],
        "evidence": "multiple_main_entrypoints"
    },
    "event_driven": {
        "indicators": ["events/", "handlers/", "subscribers/"],
        "evidence": "message_queue_imports"
    },
    "layered": {
        "indicators": ["presentation/", "business/", "data/"],
        "evidence": "clear_layer_separation"
    }
}

Workflow

INPUT                    C3.x ANALYSIS                OUTPUT
─────────────────────────────────────────────────────────────
repo_path     ──────►    ┌──────────────┐
    or                   │  AST Parser  │
github_url               │  (9 langs)   │
                         └──────┬───────┘
                                │
              ┌─────────────────┼─────────────────┐
              │                 │                 │
       ┌──────▼──────┐   ┌──────▼──────┐   ┌──────▼──────┐
       │    C3.1     │   │    C3.2     │   │    C3.3     │
       │  Patterns   │   │   Tests     │   │   Guides    │
       └──────┬──────┘   └──────┬──────┘   └──────┬──────┘
              │                 │                 │
       ┌──────▼──────┐   ┌──────▼──────┐   ┌──────▼──────┐
       │    C3.4     │   │    C3.5     │   │    C3.7     │
       │   Config    │   │    Arch     │   │   Arch Pat  │
       └──────┬──────┘   └──────┬──────┘   └──────┬──────┘
              │                 │                 │
              └─────────────────┼─────────────────┘
                                │
                         ┌──────▼───────┐          ┌─────────────┐
                         │    Merge     │────────► │  SKILL.md   │
                         │  & Enhance   │          │  patterns/  │
                         └──────────────┘          │  examples/  │
                                                   │  guides/    │
                                                   │  api_ref/   │
                                                   └─────────────┘

Three-Stream GitHub Architecture

When analyzing GitHub repositories, extract three parallel streams:

Stream	Content	Purpose
Code	AST analysis, patterns, APIs	Technical implementation
Docs	README, CONTRIBUTING, docs/	Project documentation
Insights	Issues, PRs, stars, labels	Community knowledge

GITHUB_ANALYSIS = {
    "code_stream": {
        "enabled": True,
        "depth": "c3x",
        "components": ["patterns", "tests", "configs", "api"]
    },
    "docs_stream": {
        "enabled": True,
        "files": ["README.md", "CONTRIBUTING.md", "docs/**/*.md"],
        "extract_quick_start": True
    },
    "insights_stream": {
        "enabled": True,
        "max_issues": 100,
        "include_labels": True,
        "label_weight": 2.0  # 2x weight for keyword extraction
    }
}

Invocation

# Local codebase analysis
/agent codebase-skill-extractor "Analyze /path/to/project with full C3.x analysis"

# GitHub repository
/agent codebase-skill-extractor "Extract skill from github.com/facebook/react
  with three-stream analysis and deep pattern detection"

# Specific C3.x features
/agent codebase-skill-extractor "Detect design patterns in /path/to/project
  with full depth and AI enhancement"

# Output control
/agent codebase-skill-extractor "Generate skill from ./myproject
  output: ~/.coditect/skills/myproject/
  skip: [patterns]
  depth: basic"

Configuration

{
  "codebase_analysis": {
    "default_depth": "c3x",
    "languages": ["python", "javascript", "typescript"],
    "features": {
      "patterns": true,
      "tests": true,
      "guides": true,
      "configs": true,
      "architecture": true
    },
    "ai_enhancement": {
      "enabled": true,
      "model": "claude-sonnet-4",
      "mode": "local"
    },
    "github": {
      "fetch_metadata": true,
      "max_issues": 100,
      "include_releases": true
    }
  }
}

Output Structure

skill_name/
├── SKILL.md                    # 300+ lines, AI-enhanced
├── patterns/
│   ├── index.md                # Pattern summary
│   ├── singleton.md            # Individual pattern docs
│   ├── factory.md
│   └── observer.md
├── examples/
│   ├── from_tests/             # Extracted from tests
│   └── from_docs/              # Extracted from docstrings
├── guides/
│   ├── getting_started.md      # AI-enhanced how-to
│   ├── authentication.md
│   └── deployment.md
├── api_reference/
│   ├── index.md
│   └── modules/                # Per-module docs
├── architecture/
│   ├── overview.md             # Architectural summary
│   ├── dependency_graph.md     # Module dependencies
│   └── data_flow.md            # Data flow diagrams
└── metadata.json               # Analysis metadata

Quality Metrics

Metric	Target	Description
Pattern precision	87%+	Correctly identified patterns
Pattern recall	80%+	Found patterns vs actual
Example validity	100%	Examples compile/run
Guide completeness	5 sections	All AI enhancements
API coverage	90%+	Public APIs documented

CODITECT Improvements

Feature	Skill Seekers	CODITECT Improvement
Pattern detection	10 patterns	13 patterns + architectural
Languages	9	9 + multi-file analysis
Guide generation	Basic template	MoE-enhanced with quality gates
Config extraction	Extract only	Extract + security analysis
Architecture	Overview only	Full data flow + diagrams

When to Use This Agent

Use when:

Extracting skills from local codebases
Analyzing GitHub repositories for patterns and examples
Need C3.x analysis (patterns, tests, guides, configs, architecture)
Want API documentation generated from code
Need three-stream GitHub analysis (code + docs + insights)

Do NOT use when:

Analyzing documentation websites (use doc-to-skill-converter instead)
Combining docs + code sources (use skill-generator-orchestrator instead)
Repository requires authentication and no token available
Quick one-off code questions (just use Claude directly)
Repository is too large (>100K files) without filtering

Completion Checklist

Before marking this agent's task as complete, verify:

Repository cloned or local path validated
C3.1 Design patterns detected and documented
C3.2 Test examples extracted with quality filtering
C3.3 How-to guides generated with AI enhancement
C3.4 Configuration patterns extracted
C3.5 Architecture overview generated
C3.7 Architectural patterns detected
SKILL.md generated (300+ lines)
All output directories created

Success Output

When successful, this agent outputs:

✅ AGENT COMPLETE: codebase-skill-extractor

C3.x Analysis Summary:
- [x] C3.1 Patterns: 13 design patterns detected (87% precision)
- [x] C3.2 Tests: 45 examples extracted (100% valid)
- [x] C3.3 Guides: 8 how-to guides generated
- [x] C3.4 Config: 12 configuration patterns
- [x] C3.5 Arch: Overview + dependency graph
- [x] C3.7 Arch Patterns: microservices, event-driven

Outputs:
- ~/.coditect/skills/{name}/SKILL.md (342 lines)
- ~/.coditect/skills/{name}/patterns/
- ~/.coditect/skills/{name}/examples/from_tests/
- ~/.coditect/skills/{name}/guides/
- ~/.coditect/skills/{name}/api_reference/
- ~/.coditect/skills/{name}/architecture/
- ~/.coditect/skills/{name}/metadata.json

Quality Metrics:
- Pattern precision: 87%
- Pattern recall: 80%
- Example validity: 100%
- Guide completeness: 5 sections each
- API coverage: 94%

Failure Indicators

This agent has FAILED if:

❌ Repository not accessible (clone failed, path invalid)
❌ No supported language files found
❌ Zero patterns detected
❌ Zero test examples extracted
❌ SKILL.md under 100 lines
❌ API reference empty

Anti-Patterns (Avoid)

Anti-Pattern	Problem	Solution
Analyzing huge repos without filters	Memory exhaustion, timeouts	Use `languages` filter
Skipping test extraction	Missing real-world examples	Enable `features.tests: true`
Surface-only pattern detection	Low confidence (0.6)	Use `depth: full` for critical analysis
Ignoring GitHub insights	Missing community knowledge	Enable `three_stream: true`
No AI enhancement	Basic template output	Enable `ai_enhancement.enabled: true`
Analyzing minified code	Garbage patterns	Exclude `dist/`, `build/` directories

Verification

After execution, verify success:

# 1. Check output structure
find ~/.coditect/skills/{name}/ -type d

# 2. Verify SKILL.md length
wc -l ~/.coditect/skills/{name}/SKILL.md  # Should be 300+ lines

# 3. Check pattern detection
ls ~/.coditect/skills/{name}/patterns/

# 4. Validate examples compile
python3 -m py_compile ~/.coditect/skills/{name}/examples/from_tests/*.py 2>/dev/null

# 5. Check architecture output
cat ~/.coditect/skills/{name}/architecture/overview.md | head -30

# 6. Verify metadata
cat ~/.coditect/skills/{name}/metadata.json | python3 -m json.tool

Orchestrator: skill-generator-orchestrator
Companion: doc-to-skill-converter
Command: /skill-from-repo

Version: 1.0.0 | Created: 2026-01-23 | Author: CODITECT Team

Core Responsibilities

Analyze and assess framework requirements within the Framework domain
Provide expert guidance on codebase skill extractor best practices and standards
Generate actionable recommendations with implementation specifics
Validate outputs against CODITECT quality standards and governance requirements
Integrate findings with existing project plans and track-based task management

Capabilities

Analysis & Assessment

Systematic evaluation of framework artifacts, identifying gaps, risks, and improvement opportunities. Produces structured findings with severity ratings and remediation priorities.

Recommendation Generation

Creates actionable, specific recommendations tailored to the framework context. Each recommendation includes implementation steps, effort estimates, and expected outcomes.

Quality Validation

Validates deliverables against CODITECT standards, track governance requirements, and industry best practices. Ensures compliance with ADR decisions and component specifications.

Invocation Examples

Direct Agent Call

Task(subagent_type="codebase-skill-extractor",
     description="Brief task description",
     prompt="Detailed instructions for the agent")

Via CODITECT Command

/agent codebase-skill-extractor "Your task description here"

Via MoE Routing

/which Extracts skills from codebases using C3.x analysis including

Purpose​

C3.x Analysis Suite​

C3.1: Design Pattern Detection​

C3.2: Test Example Extraction​

C3.3: How-To Guide Generation​

C3.4: Configuration Pattern Extraction​

C3.5: Architectural Overview Generation​

C3.7: Architectural Pattern Detection​

Workflow​

Three-Stream GitHub Architecture​

Invocation​

Configuration​

Output Structure​

Quality Metrics​

CODITECT Improvements​

When to Use This Agent​

Completion Checklist​

Success Output​

Failure Indicators​

Anti-Patterns (Avoid)​

Verification​

Related Components​

Core Responsibilities​

Capabilities​

Analysis & Assessment​

Recommendation Generation​

Quality Validation​

Invocation Examples​

Direct Agent Call​

Via CODITECT Command​

Via MoE Routing​