Auto-Claude SPEC PIPELINE - Deep Technical Analysis
Analysis Date: December 22, 2025
Target System: Auto-Claude v2.x Spec Creation Pipeline
Repository: /Users/halcasteel/PROJECTS/coditect-rollout-master/submodules/core/coditect-core/submodules/research/auto-claude/
Executive Summary
The Auto-Claude SPEC PIPELINE is a dynamic, multi-phase orchestration system that adaptively creates implementation specifications based on task complexity. It uses AI-driven complexity assessment to select between 3 and 8 phases, employs self-critique with extended thinking ("ultrathink"), and generates subtask-based implementation plans with built-in verification strategies.
Key Innovation: Unlike static workflow systems, Auto-Claude assesses task complexity FIRST, then dynamically constructs the appropriate pipeline—ranging from a 3-phase quick spec for simple changes to an 8-phase rigorous process for complex integrations.
1. COMPLEXITY ASSESSMENT SYSTEM
1.1 Three-Tier Classification
Location: auto-claude/spec/complexity.py:17-23
class Complexity(Enum):
"""Task complexity tiers that determine which phases to run."""
SIMPLE = "simple" # 1-2 files, single service, no integrations
STANDARD = "standard" # 3-10 files, 1-2 services, minimal integrations
COMPLEX = "complex" # 10+ files, multiple services, external integrations
Why This Matters: The enum defines the entire pipeline branching logic. Each tier maps to a specific phase sequence.
1.2 Dual Assessment Strategy
AI-Based Assessment (Primary): auto-claude/spec/complexity.py:344-436
The system runs an AI agent with the complexity_assessor.md prompt that analyzes:
async def run_ai_complexity_assessment(
spec_dir: Path,
task_description: str,
run_agent_fn,
) -> ComplexityAssessment | None:
"""Run AI agent to assess complexity. Returns None if it fails."""
Process:
- Loads `requirements.json` with full user context (lines 365-380)
- Loads `project_index.json` if available (lines 385-391)
- Invokes the `complexity_assessor.md` agent (lines 394-397)
- Parses AI output into a structured `ComplexityAssessment` (lines 399-430)
- Returns the assessment with recommended phases (line 427)
Heuristic Fallback: auto-claude/spec/complexity.py:79-341
If AI assessment fails, falls back to keyword-based heuristics:
class ComplexityAnalyzer:
"""Analyzes task description and context to determine complexity."""
SIMPLE_KEYWORDS = [
"fix", "typo", "update", "change", "rename", ...
]
COMPLEX_KEYWORDS = [
"integrate", "integration", "api", "sdk", "docker",
"kubernetes", "deploy", "authentication", ...
]
Algorithm (lines 156-208):
- Keyword frequency analysis (lines 164-172)
- External integration detection via regex (lines 210-231)
- Infrastructure change detection (lines 233-252)
- File/service estimation (lines 254-293)
- Confidence scoring (lines 295-341)
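The keyword pass above can be sketched in a few lines of scoring logic. This is an illustrative sketch: the keyword lists are truncated copies of the snippet shown earlier, and the thresholds are assumptions, not the actual values from `complexity.py`:

```python
import re

# Illustrative keyword sets; the real lists live in ComplexityAnalyzer
SIMPLE_KEYWORDS = {"fix", "typo", "update", "change", "rename"}
COMPLEX_KEYWORDS = {"integrate", "integration", "api", "sdk", "docker",
                    "kubernetes", "deploy", "authentication"}

def keyword_scores(task: str) -> tuple[int, int]:
    """Count whole-word keyword hits in the task description."""
    words = re.findall(r"[a-z]+", task.lower())
    return (sum(w in SIMPLE_KEYWORDS for w in words),
            sum(w in COMPLEX_KEYWORDS for w in words))

def classify(task: str) -> str:
    """Map keyword counts to a tier (thresholds are assumed)."""
    simple, complex_ = keyword_scores(task)
    if complex_ >= 2:
        return "complex"
    if complex_ == 0 and simple >= 1:
        return "simple"
    return "standard"
```

The real analyzer layers integration detection, infrastructure detection, and file estimation on top of this kind of scoring before computing confidence.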
1.3 Assessment Output Structure
File: complexity_assessment.json saved at spec_dir/complexity_assessment.json
{
"complexity": "simple|standard|complex",
"confidence": 0.75, # 0.0 to 1.0
"reasoning": "2-3 sentence explanation",
"signals": {
"simple_keywords": 3,
"complex_keywords": 0,
"estimated_files": 2,
"estimated_services": 1,
...
},
"recommended_phases": ["discovery", "quick_spec", "validation"],
"flags": {
"needs_research": false,
"needs_self_critique": false
}
}
Value Proposition: This structured output becomes the decision input for phase selection (lines 47-76).
1.4 Dynamic Phase Selection
Location: auto-claude/spec/complexity.py:47-76
def phases_to_run(self) -> list[str]:
"""Return list of phase names to run based on complexity."""
# If AI provided recommended phases, use those
if self.recommended_phases:
return self.recommended_phases
# Otherwise fall back to default phase sets
if self.complexity == Complexity.SIMPLE:
return ["discovery", "historical_context", "quick_spec", "validation"]
elif self.complexity == Complexity.STANDARD:
# Standard can optionally include research if flagged
phases = ["discovery", "historical_context", "requirements"]
if self.needs_research:
phases.append("research")
phases.extend(["context", "spec_writing", "planning", "validation"])
return phases
else: # COMPLEX
return [
"discovery", "historical_context", "requirements", "research",
"context", "spec_writing", "self_critique", "planning", "validation",
]
Key Insight: The AI can override default phase sets by providing custom recommended_phases. This allows context-aware adaptation beyond the three-tier system.
CODITECT Enhancement Opportunity: This dynamic phase selection pattern could be adapted for workflow library tasks—assessing task complexity before selecting the appropriate skill/agent chain.
2. SPEC CREATION PIPELINE ORCHESTRATION
2.1 Orchestrator Architecture
Location: auto-claude/spec/pipeline/orchestrator.py:49-646
The SpecOrchestrator class coordinates the entire pipeline:
class SpecOrchestrator:
"""Orchestrates the spec creation process with dynamic complexity adaptation."""
def __init__(
self,
project_dir: Path,
task_description: str | None = None,
model: str = "claude-sonnet-4-5-20250929",
thinking_level: str = "medium",
complexity_override: str | None = None,
use_ai_assessment: bool = True,
):
Core Responsibilities:
- Spec directory management (lines 87-104)
- Agent runner coordination (lines 114-125)
- Phase execution orchestration (lines 218-409)
- Phase output compaction (lines 161-186)
- Human review checkpoint (lines 408-409, 613-645)
2.2 Phase Execution Loop
Location: auto-claude/spec/pipeline/orchestrator.py:259-399
async def run(self, interactive: bool = True, auto_approve: bool = False) -> bool:
"""Run the spec creation process with dynamic phase selection."""
# Fixed phases (always run first)
# 1. Discovery
result = await run_phase("discovery", phase_executor.phase_discovery)
await self._store_phase_summary("discovery")
# 2. Requirements
result = await run_phase("requirements",
lambda: phase_executor.phase_requirements(interactive))
await self._store_phase_summary("requirements")
# 3. Complexity Assessment
result = await run_phase("complexity_assessment",
lambda: self._phase_complexity_assessment_with_requirements())
# Dynamic phases based on complexity
all_phases_to_run = self.assessment.phases_to_run()
phases_to_run = [p for p in all_phases_to_run
if p not in ["discovery", "requirements"]]
for phase_name in phases_to_run:
result = await run_phase(phase_name, all_phases[phase_name])
if result.success:
await self._store_phase_summary(phase_name)
else:
return False # Stop on failure
# Human review checkpoint
return self._run_review_checkpoint(auto_approve)
Pattern: Fixed setup → Dynamic adaptation → Human gate
2.3 Phase Result Model
Location: auto-claude/spec/phases/models.py:11-19
@dataclass
class PhaseResult:
"""Result of a phase execution."""
phase: str
success: bool
output_files: list[str]
errors: list[str]
retries: int
Retry Strategy: Each phase can retry up to MAX_RETRIES = 3 times (line 23).
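A phase-level retry wrapper in this spirit might look like the sketch below. The dictionary keys mirror the `PhaseResult` fields; the control flow itself is an assumption, not the orchestrator's actual code:

```python
MAX_RETRIES = 3  # matches the constant cited above

async def run_phase_with_retries(phase_name: str, phase_fn) -> dict:
    """Run a phase, retrying on exceptions, and report PhaseResult-style fields."""
    errors: list[str] = []
    for attempt in range(MAX_RETRIES):
        try:
            output_files = await phase_fn()
            return {"phase": phase_name, "success": True,
                    "output_files": output_files,
                    "errors": errors, "retries": attempt}
        except Exception as exc:  # record the failed attempt, then retry
            errors.append(str(exc))
    return {"phase": phase_name, "success": False,
            "output_files": [], "errors": errors, "retries": MAX_RETRIES}
```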
2.4 Phase Compaction Strategy
Location: auto-claude/spec/pipeline/orchestrator.py:161-186
async def _store_phase_summary(self, phase_name: str) -> None:
"""Summarize and store phase output for subsequent phases."""
try:
# Gather outputs from this phase
phase_output = gather_phase_outputs(self.spec_dir, phase_name)
if not phase_output:
return
# Summarize the output (target: 500 words)
summary = await summarize_phase_output(
phase_name,
phase_output,
model="claude-sonnet-4-5-20250929",
target_words=500,
)
if summary:
self._phase_summaries[phase_name] = summary
except Exception as e:
print_status(f"Phase summarization skipped: {e}", "warning")
Why This Matters: As phases accumulate, context size grows. Summarization prevents token overflow while preserving critical information for subsequent phases.
Usage: Summaries are injected into subsequent agent prompts (lines 151-158):
# Format prior phase summaries for context
prior_summaries = format_phase_summaries(self._phase_summaries)
return await runner.run_agent(
prompt_file,
additional_context,
interactive,
thinking_budget=thinking_budget,
prior_phase_summaries=prior_summaries if prior_summaries else None,
)
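The `format_phase_summaries` helper belongs to auto-claude; a minimal sketch of what such formatting could look like, with an assumed layout:

```python
def format_phase_summaries(summaries: dict[str, str]) -> str:
    """Join per-phase summaries into one context block for the next prompt."""
    return "\n\n".join(
        f"## Summary of {phase} phase\n{text}"
        for phase, text in summaries.items()
    )
```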
3. SELF-CRITIQUE WITH ULTRATHINK
3.1 Critique System Architecture
Location: auto-claude/spec/critique.py:1-370
The self-critique system implements a mandatory quality gate before marking subtasks complete:
@dataclass
class CritiqueResult:
"""Result of a self-critique evaluation."""
passes: bool
issues: list[str] = field(default_factory=list)
improvements_made: list[str] = field(default_factory=list)
recommendations: list[str] = field(default_factory=list)
3.2 Critique Prompt Generation
Location: auto-claude/spec/critique.py:50-164
def generate_critique_prompt(
subtask: dict,
files_modified: list[str],
patterns_from: list[str]
) -> str:
"""Generate a critique prompt for the agent to self-evaluate."""
Generates a structured checklist covering:
- **Code Quality Checklist** (lines 78-104)
  - Pattern adherence
  - Error handling
  - Code cleanliness
  - Best practices
- **Implementation Completeness** (lines 106-122)
  - All expected files modified
  - All expected files created
  - Requirements fully met
- **Potential Issues Analysis** (lines 124-132)
  - Agent must list concerns honestly
- **Improvements Made** (lines 134-140)
  - Agent must fix issues before continuing
- **Final Verdict** (lines 143-148)
  - PROCEED: YES/NO
  - REASON: Brief explanation
  - CONFIDENCE: High/Medium/Low
3.3 Critique Response Parser
Location: auto-claude/spec/critique.py:167-261
def parse_critique_response(response: str) -> CritiqueResult:
"""Parse the agent's critique response into structured data."""
# Extract PROCEED verdict
proceed_match = re.search(
r"\*\*PROCEED:\*\*\s*\[?\s*(YES|NO)", response, re.IGNORECASE
)
if proceed_match:
passes = proceed_match.group(1).upper() == "YES"
# Extract issues from Step 3
issues_section = re.search(
r"### STEP 3:.*?Potential Issues.*?\n\n(.*?)(?=###|\Z)",
response, re.DOTALL | re.IGNORECASE,
)
# ... parse and filter issues ...
# Extract improvements from Step 4
improvements_section = re.search(
r"### STEP 4:.*?Improvements Made.*?\n\n(.*?)(?=###|\Z)",
response, re.DOTALL | re.IGNORECASE,
)
# ... parse and filter improvements ...
Value: Structured parsing ensures critique results are machine-readable for decision logic.
3.4 Critique Decision Logic
Location: auto-claude/spec/critique.py:264-283
def should_proceed(result: CritiqueResult) -> bool:
"""Determine if the subtask should be marked complete based on critique."""
# Must pass the critique
if not result.passes:
return False
# If there are unresolved issues, don't proceed
if result.issues:
return False
return True
Rule: Subtask can ONLY proceed if:
- Agent explicitly said `PROCEED: YES`
- No unresolved issues remain
3.5 Spec-Level Self-Critique with Ultrathink
Prompt: auto-claude/prompts/spec_critic.md
Phase: Self-critique phase (runs only for COMPLEX tasks)
Key Instructions (lines 40-122):
## PHASE 1: DEEP ANALYSIS (USE EXTENDED THINKING)
**CRITICAL**: Use extended thinking for this phase. Think deeply about:
### 1.1: Technical Accuracy
Compare spec.md against research.json AND validate with Context7:
- **Package names**: Does spec use correct package names from research?
- **Import statements**: Do imports match researched API patterns?
- **API calls**: Do function signatures match documentation?
- **Configuration**: Are env vars and config options correct?
**USE CONTEXT7 TO VALIDATE TECHNICAL CLAIMS:**
If the spec mentions specific libraries or APIs, verify them against Context7:
Tool: mcp__context7__resolve-library-id
Tool: mcp__context7__get-library-docs
Process (lines 123-253):
- Load all context (spec.md, research.json, requirements.json, context.json)
- Deep analysis with extended thinking on 5 dimensions:
  - Technical accuracy (package names, APIs, config)
  - Completeness (requirements coverage, edge cases)
  - Consistency (naming, paths, patterns)
  - Feasibility (dependencies, infrastructure, order)
  - Research alignment (verified info, gotchas, recommendations)
- Catalog issues with severity (HIGH/MEDIUM/LOW) and categories
- Fix issues directly in spec.md using edit commands
- Create a critique report (`critique_report.json`)
- Verify fixes (validate markdown, check sections)
Extended Thinking Example (lines 300-318):
> "Looking at this spec.md, I need to deeply analyze it against the research findings...
>
> First, let me check all package names. The research says the package is [X],
> but the spec says [Y]. This is a mismatch that needs fixing.
>
> Let me also verify with Context7 - I'll look up the actual package name and
> API patterns to confirm...
> [Use mcp__context7__resolve-library-id to find the library]
> [Use mcp__context7__get-library-docs to check API patterns]
>
> Next, looking at the API patterns. The research shows initialization requires
> [steps], but the spec shows [different steps]. Let me cross-reference with
> Context7 documentation... Another issue confirmed.
> ...
CODITECT Enhancement Opportunity: This ultrathink self-critique pattern could be integrated as a quality gate before executing high-risk workflow steps.
4. SUBTASK-BASED IMPLEMENTATION PLANNING
4.1 Implementation Plan Model
Location: auto-claude/implementation_plan/plan.py:20-373
@dataclass
class ImplementationPlan:
"""Complete implementation plan for a feature/task."""
feature: str
workflow_type: WorkflowType = WorkflowType.FEATURE
services_involved: list[str] = field(default_factory=list)
phases: list[Phase] = field(default_factory=list)
final_acceptance: list[str] = field(default_factory=list)
# Status tracking (synced with UI)
status: str | None = None # backlog, in_progress, ai_review, human_review, done
planStatus: str | None = None # pending, in_progress, review, completed
qa_signoff: dict | None = None
Key Methods:
- Dependency Resolution (lines 178-189):
def get_available_phases(self) -> list[Phase]:
"""Get phases whose dependencies are satisfied."""
completed_phases = {p.phase for p in self.phases if p.is_complete()}
available = []
for phase in self.phases:
if phase.is_complete():
continue
deps_met = all(d in completed_phases for d in phase.depends_on)
if deps_met:
available.append(phase)
return available
- Next Subtask Selection (lines 192-198):
def get_next_subtask(self) -> tuple[Phase, Subtask] | None:
"""Get the next subtask to work on, respecting dependencies."""
for phase in self.get_available_phases():
pending = phase.get_pending_subtasks()
if pending:
return phase, pending[0]
return None
- Progress Tracking (lines 200-228):
def get_progress(self) -> dict:
"""Get overall progress statistics."""
total_subtasks = sum(len(p.subtasks) for p in self.phases)
done_subtasks = sum(
1 for p in self.phases for s in p.subtasks
if s.status == SubtaskStatus.COMPLETED
)
# ... calculate percentages, failed counts, etc.
- Follow-Up Phase Addition (lines 259-316):
def add_followup_phase(
self,
name: str,
subtasks: list[Subtask],
phase_type: PhaseType = PhaseType.IMPLEMENTATION,
parallel_safe: bool = False,
) -> Phase:
"""Add a new follow-up phase to an existing (typically completed) plan.
This allows users to extend completed builds with additional work.
The new phase depends on all existing phases to ensure proper sequencing.
"""
Value: Enables iterative enhancement of completed features without rebuilding plans from scratch.
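Taken together, `get_available_phases()` and `get_next_subtask()` support a simple execution driver. The loop below is a hypothetical sketch against objects with the same method names, not auto-claude's actual runner:

```python
async def execute_plan(plan, run_subtask) -> bool:
    """Drive subtasks in dependency order until none remain or one fails."""
    session_id = 1  # illustrative; real sessions are numbered by the runner
    while (nxt := plan.get_next_subtask()) is not None:
        phase, subtask = nxt
        subtask.start(session_id)
        try:
            output = await run_subtask(phase, subtask)
        except Exception as exc:
            subtask.fail(str(exc))
            return False
        subtask.complete(output)
    return True
```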
4.2 Subtask Model
Location: auto-claude/implementation_plan/subtask.py:17-133
@dataclass
class Subtask:
"""A single unit of implementation work."""
id: str
description: str
status: SubtaskStatus = SubtaskStatus.PENDING
# Scoping
service: str | None = None # Which service (backend, frontend, worker)
all_services: bool = False # True for integration subtasks
# Files
files_to_modify: list[str] = field(default_factory=list)
files_to_create: list[str] = field(default_factory=list)
patterns_from: list[str] = field(default_factory=list)
# Verification
verification: Verification | None = None
# For investigation subtasks
expected_output: str | None = None # Knowledge/decision output
actual_output: str | None = None # What was discovered
# Self-Critique
critique_result: dict | None = None # Results from self-critique before completion
Key Methods:
- Start Tracking (lines 107-114):
def start(self, session_id: int):
"""Mark subtask as in progress."""
self.status = SubtaskStatus.IN_PROGRESS
self.started_at = datetime.now().isoformat()
self.session_id = session_id
# Clear stale data from previous runs to ensure clean state
self.completed_at = None
self.actual_output = None
- Completion (lines 116-121):
def complete(self, output: str | None = None):
"""Mark subtask as done."""
self.status = SubtaskStatus.COMPLETED
self.completed_at = datetime.now().isoformat()
if output:
self.actual_output = output
- Failure (lines 123-128):
def fail(self, reason: str | None = None):
"""Mark subtask as failed."""
self.status = SubtaskStatus.FAILED
self.completed_at = None # Clear to maintain consistency (failed != completed)
if reason:
self.actual_output = f"FAILED: {reason}"
4.3 Verification Strategy Model
Location: auto-claude/implementation_plan/verification.py:14-53
@dataclass
class Verification:
"""How to verify a subtask is complete."""
type: VerificationType
run: str | None = None # Command to run
url: str | None = None # URL for API/browser tests
method: str | None = None # HTTP method for API tests
expect_status: int | None = None # Expected HTTP status
expect_contains: str | None = None # Expected content
scenario: str | None = None # Description for browser/manual tests
Verification Types:
class VerificationType(Enum):
"""Types of verification methods."""
COMMAND = "command" # Run shell command, check exit code
API = "api" # HTTP request, check status/content
BROWSER = "browser" # Manual browser test with scenario
MANUAL = "manual" # Human verification with instructions
NONE = "none" # No automated verification
Why This Matters: Each subtask has a built-in verification strategy, enabling automated QA checkpoints throughout implementation.
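A dispatcher over these verification types might look like the sketch below. The field names follow the `Verification` dataclass; the dispatch logic itself is an assumption, not auto-claude's verifier:

```python
import subprocess
import urllib.request

def verify(v: dict) -> bool:
    """Return True when the subtask's verification passes."""
    vtype = v.get("type", "none")
    if vtype == "command":
        # Shell out and treat exit code 0 as success
        proc = subprocess.run(v["run"], shell=True, capture_output=True)
        return proc.returncode == 0
    if vtype == "api":
        req = urllib.request.Request(v["url"], method=v.get("method", "GET"))
        with urllib.request.urlopen(req) as resp:
            body = resp.read().decode()
            if v.get("expect_status") is not None and resp.status != v["expect_status"]:
                return False
            return v.get("expect_contains", "") in body
    # browser/manual checks need a human in the loop; "none" passes trivially
    return vtype == "none"
```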
5. PLANNER AGENT PROMPT ARCHITECTURE
5.1 Phase 0: Deep Codebase Investigation (MANDATORY)
Location: auto-claude/prompts/planner.md:21-67
## PHASE 0: DEEP CODEBASE INVESTIGATION (MANDATORY)
**CRITICAL**: Before ANY planning, you MUST thoroughly investigate the existing
codebase. Poor investigation leads to plans that don't match the codebase's
actual patterns.
### 0.1: Understand Project Structure
find . -type f -name "*.py" -o -name "*.ts" -o -name "*.tsx" -o -name "*.js" | head -100
Identify:
- Main entry points (main.py, app.py, index.ts, etc.)
- Configuration files (settings.py, config.py, .env.example)
- Directory organization patterns
### 0.2: Analyze Existing Patterns for the Feature
**This is the most important step.** For whatever feature you're building,
find SIMILAR existing features:
# Example: If building "caching", search for existing cache implementations
grep -r "cache" --include="*.py" . | head -30
grep -r "redis\|memcache\|lru_cache" --include="*.py" . | head -30
**YOU MUST READ AT LEAST 3 PATTERN FILES** before planning
Pattern: Investigation → Pattern discovery → Plan creation
Why This Matters: Forces the agent to ground its plan in actual codebase patterns, preventing "generic" plans that don't match project conventions.
5.2 Context File Creation
Location: auto-claude/prompts/planner.md:89-157
The planner creates two critical context files:
- project_index.json (lines 93-119):
{
"project_type": "single|monorepo",
"services": {
"backend": {
"path": ".",
"tech_stack": ["python", "fastapi"],
"port": 8000,
"dev_command": "uvicorn main:app --reload",
"test_command": "pytest"
}
},
"infrastructure": {
"docker": false,
"database": "postgresql"
}
}
- context.json (lines 130-157):
{
"files_to_modify": {
"backend": ["app/services/existing_service.py", "app/routes/api.py"]
},
"files_to_reference": ["app/services/similar_service.py"],
"patterns": {
"service_pattern": "All services inherit from BaseService",
"route_pattern": "Routes use APIRouter with prefix and tags"
},
"existing_implementations": {
"description": "Found existing caching in app/utils/cache.py using Redis",
"relevant_files": ["app/utils/cache.py", "app/config.py"]
}
}
Value: These files become inputs for subsequent phases, ensuring consistency across the pipeline.
5.3 Workflow Type Awareness
Location: auto-claude/prompts/planner.md:161-200
The planner prompt defines 5 workflow types with distinct phase structures:
1. **FEATURE** (multi-service):
   - Phase 1: Backend/API (testable with curl)
   - Phase 2: Worker (background jobs)
   - Phase 3: Frontend (UI components)
   - Phase 4: Integration (wire everything)
2. **REFACTOR** (stage-based):
   - Phase 1: Add New (build alongside old)
   - Phase 2: Migrate (move consumers)
   - Phase 3: Remove Old (delete deprecated)
   - Phase 4: Cleanup (polish)
3. **INVESTIGATION** (bug hunting):
   - Phase 1: Reproduce (logging, reproduction)
   - Phase 2: Investigate (root cause → OUTPUT REQUIRED)
   - Phase 3: Fix (BLOCKED until phase 2)
   - Phase 4: Harden (tests, prevention)
4. **MIGRATION** (data pipeline):
   - Phase 1: Prepare (scripts, setup)
   - Phase 2: Test (small batch)
   - Phase 3: Execute (full migration)
   - Phase 4: Cleanup (verify, remove old)
5. **SIMPLE** (single-service quick):
   - No phases, just subtasks
Pattern: Workflow type determines phase structure and dependency graph.
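The type-to-phase mapping can be sketched as a lookup table. The short phase names here are shorthand for the structures listed above, not the actual phase IDs in generated plans:

```python
from enum import Enum

class WorkflowType(Enum):
    FEATURE = "feature"
    REFACTOR = "refactor"
    INVESTIGATION = "investigation"
    MIGRATION = "migration"
    SIMPLE = "simple"

# Shorthand templates; real plans carry full ids like "phase-1-backend"
PHASE_TEMPLATES: dict[WorkflowType, list[str]] = {
    WorkflowType.FEATURE: ["backend", "worker", "frontend", "integration"],
    WorkflowType.REFACTOR: ["add-new", "migrate", "remove-old", "cleanup"],
    WorkflowType.INVESTIGATION: ["reproduce", "investigate", "fix", "harden"],
    WorkflowType.MIGRATION: ["prepare", "test", "execute", "cleanup"],
    WorkflowType.SIMPLE: [],  # no phases, just subtasks
}

def phase_template(wt: WorkflowType) -> list[str]:
    """Return the default phase sequence for a workflow type."""
    return PHASE_TEMPLATES[wt]
```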
5.4 Implementation Plan Structure
Location: auto-claude/prompts/planner.md:215-264
{
"feature": "Short descriptive name",
"workflow_type": "feature|refactor|investigation|migration|simple",
"workflow_rationale": "Why this workflow type was chosen",
"phases": [
{
"id": "phase-1-backend",
"name": "Backend API",
"type": "implementation",
"depends_on": [],
"parallel_safe": true,
"subtasks": [
{
"id": "subtask-1-1",
"description": "Create data models for [feature]",
"service": "backend",
"files_to_modify": ["src/models/user.py"],
"files_to_create": ["src/models/analytics.py"],
"patterns_from": ["src/models/existing_model.py"],
"verification": {
"type": "command",
"command": "python -c \"from src.models.analytics import Analytics\"",
"expected": "OK"
},
"status": "pending"
}
]
}
]
}
Critical Requirements (lines 205-214):
- MUST use the Write tool to create `implementation_plan.json`
- Plan structure matches the `ImplementationPlan` dataclass
- Each subtask includes a verification strategy
- Phase dependencies are explicitly defined
6. COMPLEXITY ASSESSOR PROMPT ARCHITECTURE
6.1 Assessment Criteria
Location: auto-claude/prompts/complexity_assessor.md:103-132
The prompt defines 5 analysis dimensions:
1. **Scope Analysis** (lines 107-110):
   - How many files will likely be touched?
   - How many services are involved?
   - Is this localized or cross-cutting?
2. **Integration Analysis** (lines 112-116):
   - External services/APIs involved?
   - New dependencies needed?
   - Research required for correct usage?
3. **Infrastructure Analysis** (lines 118-122):
   - Docker/container changes?
   - Database schema changes?
   - Environment configuration changes?
   - Deployment considerations?
4. **Knowledge Analysis** (lines 124-127):
   - Codebase has patterns for this?
   - External docs research needed?
   - Unfamiliar technologies involved?
5. **Risk Analysis** (lines 129-132):
   - What could go wrong?
   - Security considerations?
   - Breaking existing functionality?
6.2 Decision Flowchart
Location: auto-claude/prompts/complexity_assessor.md:394-418
START
│
├─► Are there 2+ external integrations OR unfamiliar technologies?
│ YES → COMPLEX (needs research + critique)
│ NO ↓
│
├─► Are there infrastructure changes (Docker, DB, new services)?
│ YES → COMPLEX (needs research + critique)
│ NO ↓
│
├─► Is there 1 external integration that needs research?
│ YES → STANDARD + research phase
│ NO ↓
│
├─► Will this touch 3+ files across 1-2 services?
│ YES → STANDARD
│ NO ↓
│
└─► SIMPLE (1-2 files, single service, no integrations)
Pattern: Progressive elimination from high to low complexity.
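The flowchart translates directly into cascading conditionals. The signal names below are illustrative, not the assessor's actual schema:

```python
def assess_from_signals(signals: dict) -> str:
    """Progressive elimination from high to low complexity (signal names assumed)."""
    if signals.get("external_integrations", 0) >= 2 or signals.get("unfamiliar_tech", False):
        return "complex"   # needs research + critique
    if signals.get("infrastructure_changes", False):
        return "complex"   # needs research + critique
    if signals.get("external_integrations", 0) == 1:
        return "standard"  # plus a research phase
    if signals.get("estimated_files", 0) >= 3:
        return "standard"
    return "simple"        # 1-2 files, single service, no integrations
```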
6.3 Validation Recommendations
Location: auto-claude/prompts/complexity_assessor.md:261-391
The assessor also generates validation depth recommendations:
"validation_recommendations": {
"risk_level": "trivial|low|medium|high|critical",
"skip_validation": false,
"minimal_mode": false,
"test_types_required": ["unit", "integration", "e2e"],
"security_scan_required": true,
"staging_deployment_required": false,
"reasoning": "1-2 sentences explaining validation depth choice"
}
Risk Levels:
| Risk | When | Validation Depth |
|---|---|---|
| TRIVIAL | Docs-only, comments | Skip validation |
| LOW | Single service, <5 files | Unit tests only |
| MEDIUM | Multiple files, API changes | Unit + Integration |
| HIGH | DB changes, auth/security | Unit + Integration + E2E + Security scan |
| CRITICAL | Payments, data deletion | All above + Manual + Staging |
Value: Tailors QA rigor to actual risk, avoiding over-testing simple changes and under-testing critical ones.
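The table maps naturally onto a small lookup, sketched here with rules read off the rows above (the exact encoding is an assumption):

```python
# Assumed encoding of the risk table above
TEST_MATRIX = {
    "trivial":  [],
    "low":      ["unit"],
    "medium":   ["unit", "integration"],
    "high":     ["unit", "integration", "e2e"],
    "critical": ["unit", "integration", "e2e", "manual"],
}

def validation_plan(risk_level: str) -> dict:
    """Expand a risk level into validation_recommendations-style fields."""
    return {
        "risk_level": risk_level,
        "skip_validation": risk_level == "trivial",
        "test_types_required": TEST_MATRIX[risk_level],
        "security_scan_required": risk_level in ("high", "critical"),
        "staging_deployment_required": risk_level == "critical",
    }
```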
6.4 Example Assessments
Simple Task (lines 423-467):
{
"complexity": "simple",
"workflow_type": "simple",
"reasoning": "Single file UI change with no dependencies or infrastructure impact.",
"recommended_phases": ["discovery", "quick_spec", "validation"],
"validation_recommendations": {
"risk_level": "low",
"minimal_mode": true,
"test_types_required": ["unit"]
}
}
Complex Integration (lines 589-647):
{
"complexity": "complex",
"workflow_type": "feature",
"reasoning": "Multiple integrations (Graphiti, FalkorDB), infrastructure changes
(Docker Compose), and new architectural pattern (optional memory layer)",
"recommended_phases": [
"discovery", "requirements", "research", "context",
"spec_writing", "self_critique", "planning", "validation"
],
"flags": {
"needs_research": true,
"needs_self_critique": true,
"needs_infrastructure_setup": true
},
"validation_recommendations": {
"risk_level": "high",
"test_types_required": ["unit", "integration", "e2e"],
"security_scan_required": true,
"staging_deployment_required": true
}
}
7. SPEC CRITIC PROMPT ARCHITECTURE
7.1 Critique Categories
Location: auto-claude/prompts/spec_critic.md:288-294
- **Accuracy**: Technical correctness (packages, APIs, config)
- **Completeness**: Coverage of requirements and edge cases
- **Consistency**: Internal coherence of the document
- **Feasibility**: Practical implementability
- **Alignment**: Match with research findings
7.2 Severity Guidelines
Location: auto-claude/prompts/spec_critic.md:266-284
HIGH - Will cause implementation failure:
- Wrong package names
- Incorrect API signatures
- Missing critical requirements
- Invalid configuration
MEDIUM - May cause issues:
- Missing edge cases
- Incomplete error handling
- Unclear integration points
- Inconsistent patterns
LOW - Minor improvements:
- Terminology inconsistencies
- Documentation gaps
- Style issues
- Minor optimizations
7.3 Context7 Integration for Validation
Location: auto-claude/prompts/spec_critic.md:55-77
**USE CONTEXT7 TO VALIDATE TECHNICAL CLAIMS:**
If the spec mentions specific libraries or APIs, verify them against Context7:
# Step 1: Resolve library ID
Tool: mcp__context7__resolve-library-id
Input: { "libraryName": "[library from spec]" }
# Step 2: Verify API patterns mentioned in spec
Tool: mcp__context7__get-library-docs
Input: {
"context7CompatibleLibraryID": "[library-id]",
"topic": "[specific API or feature mentioned in spec]",
"mode": "code"
}
**Check for common spec errors:**
- Wrong package name (e.g., "react-query" vs "@tanstack/react-query")
- Outdated API patterns (e.g., using deprecated functions)
- Incorrect function signatures (e.g., wrong parameter order)
- Missing required configuration (e.g., missing env vars)
Pattern: External documentation lookup for technical validation beyond what's in the research phase.
8. KEY PATTERNS FOR CODITECT INTEGRATION
8.1 Pattern: Adaptive Pipeline Selection
What: System assesses task characteristics BEFORE selecting workflow
Implementation:
- Analyze task with AI agent (`complexity_assessor.md`)
- Generate structured assessment (`complexity_assessment.json`)
- Map assessment to phase list (`phases_to_run()`)
- Execute only the necessary phases
CODITECT Application:
- Add complexity assessment to the `/new-project` workflow
- Simple projects → streamlined agent chain
- Complex projects → full multi-agent orchestration
- Save assessment for future reference
Files to Reference:
- `auto-claude/spec/complexity.py` - Assessment logic
- `auto-claude/prompts/complexity_assessor.md` - AI assessment prompt
8.2 Pattern: Self-Critique Quality Gates
What: Agent must self-review before marking work complete
Implementation:
- Generate critique prompt with checklist (`generate_critique_prompt()`)
- Agent evaluates own work against criteria
- Parse response for issues (`parse_critique_response()`)
- Block completion if issues exist (`should_proceed()`)
CODITECT Application:
- Add critique step to high-stakes skills (infrastructure, security)
- Generate checklists based on skill requirements
- Parse and store critique results for audit trail
Files to Reference:
- `auto-claude/spec/critique.py` - Critique system
- `auto-claude/prompts/spec_critic.md` - Ultrathink critique prompt
8.3 Pattern: Dependency-Aware Subtask Execution
What: Automatically determine which tasks can run based on completion state
Implementation:
- Define phase dependencies (`depends_on` field)
- Track completion state (`status` field)
- Query available phases (`get_available_phases()`)
- Select next subtask (`get_next_subtask()`)
CODITECT Application:
- Implement dependency DAG for complex workflows
- Track workflow progress with subtask granularity
- Enable parallel execution where safe
- Auto-resume workflows from last completed step
Files to Reference:
- `auto-claude/implementation_plan/plan.py` - Dependency resolution
- `auto-claude/implementation_plan/subtask.py` - Status tracking
8.4 Pattern: Verification Strategy Per Task
What: Each unit of work defines how to verify its completion
Implementation:
- Define verification type (command, API, browser, manual)
- Specify verification parameters (URL, expected output, etc.)
- Execute verification after task completion
- Block progress if verification fails
CODITECT Application:
- Add verification fields to workflow subtasks
- Auto-generate verification commands based on task type
- Create verification report for audit/compliance
Files to Reference:
- `auto-claude/implementation_plan/verification.py` - Verification model
- `auto-claude/prompts/planner.md:215-264` - Verification in plans
8.5 Pattern: Phase Output Compaction
What: Summarize completed phases to prevent context overflow
Implementation:
- Gather outputs from the completed phase (`gather_phase_outputs()`)
- Summarize with AI to a target word count (500 words)
- Store summary in a phase dictionary (`_phase_summaries`)
- Inject summaries into subsequent agent prompts
CODITECT Application:
- Summarize long-running agent outputs
- Provide compact context for downstream agents
- Maintain critical information while reducing token usage
Files to Reference:
- `auto-claude/spec/pipeline/orchestrator.py:161-186` - Compaction logic
- `auto-claude/spec/compaction.py` - Summarization functions
8.6 Pattern: Workflow Type Awareness
What: Different types of work (feature, refactor, investigation) follow different patterns
Implementation:
- Define workflow types (FEATURE, REFACTOR, INVESTIGATION, MIGRATION, SIMPLE)
- Map each type to appropriate phase structure
- Use workflow type to determine phase dependencies
- Tailor subtask templates to workflow
CODITECT Application:
- Extend workflow library with workflow type metadata
- Auto-select appropriate agent sequence based on type
- Customize verification strategies per workflow type
Files to Reference:
- `auto-claude/prompts/planner.md:161-200` - Workflow type definitions
- `auto-claude/implementation_plan/plan.py:25` - WorkflowType enum
8.7 Pattern: Follow-Up Phase Addition
What: Extend completed plans with additional work without rebuilding
Implementation:
- Load completed plan
- Create new phase with subtasks
- Set new phase to depend on all existing phases
- Reset plan status to `in_progress`
- Save updated plan
CODITECT Application:
- Enable iterative feature enhancement
- Add follow-up tasks after initial project completion
- Maintain dependency chain across iterations
Files to Reference:
- `auto-claude/implementation_plan/plan.py:259-316` - `add_followup_phase()`
- `auto-claude/implementation_plan/plan.py:318-372` - `reset_for_followup()`
8.8 Pattern: Heuristic + AI Hybrid Assessment
What: Use fast heuristics as fallback when AI assessment fails
Implementation:
- Attempt AI assessment first
- If AI fails or returns None, fall back to heuristics
- Both methods return the same `ComplexityAssessment` structure
- Continue pipeline regardless of assessment method
CODITECT Application:
- Implement graceful degradation for agent failures
- Use heuristics for offline/low-bandwidth scenarios
- A/B test AI vs heuristic accuracy
Files to Reference:
- `auto-claude/spec/complexity.py:344-436` - AI assessment
- `auto-claude/spec/complexity.py:79-341` - Heuristic analyzer
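The fallback logic can be sketched like this. The keyword lists and the `method` field are illustrative assumptions; the real analyzer uses the much larger keyword and regex sets described in section 11.2.

```python
from dataclasses import dataclass

@dataclass
class ComplexityAssessment:
    complexity: str  # "simple" | "standard" | "complex"
    method: str      # which path produced the assessment

def heuristic_assessment(task: str) -> ComplexityAssessment:
    """Fast keyword fallback; always returns a result."""
    text = task.lower()
    if any(k in text for k in ("typo", "rename", "tweak")):
        return ComplexityAssessment("simple", "heuristic")
    if any(k in text for k in ("integration", "migration", "multi-service")):
        return ComplexityAssessment("complex", "heuristic")
    return ComplexityAssessment("standard", "heuristic")

def assess(task: str, ai_assessor) -> ComplexityAssessment:
    """Try AI first; degrade gracefully to heuristics on failure or None."""
    try:
        result = ai_assessor(task)
    except Exception:
        result = None
    return result if result is not None else heuristic_assessment(task)
```

Because both paths return the same structure, callers never need to know which method ran, which is what lets the pipeline continue unconditionally.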
9. ARCHITECTURAL INSIGHTS
9.1 Separation of Concerns
Orchestrator (coordination) vs PhaseExecutor (execution):
```
SpecOrchestrator
├── Determines which phases to run
├── Manages phase summaries (compaction)
├── Handles human review checkpoint
└── Coordinates AgentRunner

PhaseExecutor (composed of 4 mixins)
├── DiscoveryPhaseMixin - Discovery, context gathering
├── RequirementsPhaseMixin - Requirements, research, historical context
├── SpecPhaseMixin - Spec writing, self-critique
└── PlanningPhaseMixin - Implementation planning, validation
```
Value: Clean separation enables independent evolution of orchestration logic vs phase implementation.
9.2 Mixin-Based Phase Execution
Location: auto-claude/spec/phases/executor.py:19-33
```python
class PhaseExecutor(
    DiscoveryPhaseMixin,
    RequirementsPhaseMixin,
    SpecPhaseMixin,
    PlanningPhaseMixin,
):
    """Executes individual phases of spec creation."""
```
Pattern: Each mixin implements related phases, all composed into single executor.
Benefits:
- Phase implementations grouped by concern
- Easy to add new phase types
- Shared state via base class
- Single entry point for orchestrator
CODITECT Application: Similar mixin pattern could organize CODITECT skills by domain (data_engineering, ml_ops, infrastructure).
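A toy illustration of the composition, with invented names: each mixin implements its own phases, all sharing state through a common base class, so the composed executor is the single entry point the orchestrator calls.

```python
class PhaseState:
    """Shared base: every mixin reads and writes the same results dict."""

    def __init__(self) -> None:
        self.results: dict[str, str] = {}

class DiscoveryMixin(PhaseState):
    def run_discovery(self) -> None:
        self.results["discovery"] = "project structure indexed"

class SpecMixin(PhaseState):
    def run_spec_writing(self) -> None:
        # Reads the discovery output via shared state on the base class.
        context = self.results.get("discovery", "")
        self.results["spec"] = f"spec grounded in: {context}"

class Executor(DiscoveryMixin, SpecMixin):
    """Single entry point composed from phase mixins."""
```

Python's method resolution order ensures `PhaseState.__init__` runs once, so both mixins see the same `results` dictionary.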
9.3 File-Based Agent Communication
Pattern: Agents communicate via JSON files in spec directory, not in-memory state.
Inputs → Phase → Outputs:
```
requirements.json → Complexity Assessment → complexity_assessment.json
requirements.json + project_index.json → Research → research.json
research.json + requirements.json → Spec Writing → spec.md
spec.md + research.json → Self-Critique → critique_report.json + spec.md (updated)
spec.md + context.json → Planning → implementation_plan.json
```
Benefits:
- Inspectable intermediate results
- Easy debugging (read files directly)
- Resumable pipelines (restart from any phase)
- Persistence across sessions
CODITECT Application: Adopt file-based agent communication for workflow library to enable inspection and recovery.
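A minimal sketch of the pattern, with an invented `run_phase` helper: a phase reads its input JSON from the spec directory, does its work, and writes its output JSON back. Because outputs persist on disk, a rerun can skip any phase whose output file already exists, which is what makes the pipeline resumable.

```python
import json
import tempfile
from pathlib import Path

def run_phase(spec_dir: Path, name: str, inputs: list[str], work) -> dict:
    """Execute one phase, communicating only through files on disk."""
    out_file = spec_dir / f"{name}.json"
    if out_file.exists():  # resume: phase already completed
        return json.loads(out_file.read_text())
    loaded = {f: json.loads((spec_dir / f).read_text()) for f in inputs}
    result = work(loaded)
    out_file.write_text(json.dumps(result, indent=2))
    return result

# Demo wiring: requirements.json feeds a (stubbed) research phase.
spec_dir = Path(tempfile.mkdtemp())
(spec_dir / "requirements.json").write_text(json.dumps({"task": "add auth"}))
research = run_phase(
    spec_dir, "research", ["requirements.json"],
    lambda ins: {"findings": f"patterns for: {ins['requirements.json']['task']}"},
)
```

Intermediate results stay inspectable with nothing more than `cat research.json`, and debugging a failed phase means reading its input files directly.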
9.4 Retry with Backoff
Location: auto-claude/spec/phases/models.py:22-23
```python
# Maximum retry attempts for phase execution
MAX_RETRIES = 3
```
Each phase can retry up to 3 times; failures accumulate in the `PhaseResult.errors` list.
Pattern: Transient failures (API timeouts, temporary issues) don't fail the entire pipeline.
CODITECT Application: Add retry logic to network-dependent skills (API calls, database queries).
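The pattern can be sketched as follows. `PhaseResult` is a stand-in for the real model, and the exponential backoff schedule is an assumption; the document only confirms the retry cap and error accumulation.

```python
import time

MAX_RETRIES = 3  # matches spec/phases/models.py

class PhaseResult:
    """Stand-in result model: success flag plus accumulated errors."""

    def __init__(self) -> None:
        self.success = False
        self.errors: list[str] = []

def run_with_retries(phase_fn, base_delay: float = 0.01) -> PhaseResult:
    """Attempt a phase up to MAX_RETRIES times, backing off between tries."""
    result = PhaseResult()
    for attempt in range(MAX_RETRIES):
        try:
            phase_fn()
            result.success = True
            return result
        except Exception as exc:  # transient failures: timeouts, rate limits
            result.errors.append(f"attempt {attempt + 1}: {exc}")
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    return result  # all retries exhausted; errors preserved for diagnosis
```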
9.5 Human Review Checkpoint
Location: auto-claude/spec/pipeline/orchestrator.py:613-645
After all automated phases complete, a mandatory human review runs (unless --auto-approve):
```python
def _run_review_checkpoint(self, auto_approve: bool) -> bool:
    """Run the human review checkpoint."""
    review_state = run_review_checkpoint(
        spec_dir=self.spec_dir,
        auto_approve=auto_approve,
    )
    if not review_state.is_approved():
        print_status("Build will not proceed without approval.", "warning")
        return False
    return True
```
Pattern: Human-in-the-loop gate before high-cost implementation phase.
CODITECT Application: Add human approval gates before:
- Database migrations
- Production deployments
- High-cost operations (large ML training jobs)
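A minimal sketch of such a gate, assuming a CODITECT-style guard function (the name and signature are invented). The prompt function is injected so the gate is testable; in a CLI it would default to `input()`.

```python
def approval_gate(operation: str, auto_approve: bool, ask=input) -> bool:
    """Return True only if the high-risk operation may proceed.

    Mirrors the --auto-approve escape hatch: when set, the gate is
    bypassed; otherwise a human must explicitly answer yes.
    """
    if auto_approve:
        return True
    answer = ask(f"Approve '{operation}'? [y/N] ")
    return answer.strip().lower() == "y"
```

Defaulting to "no" on any answer other than an explicit "y" is the safe choice for operations like migrations and deployments.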
10. PROMPT ENGINEERING INSIGHTS
10.1 Structured Output Requirements
All agent prompts enforce structured output via explicit file creation instructions:
Example: complexity_assessor.md (lines 190-257)
Create `complexity_assessment.json`:

```shell
cat > complexity_assessment.json << 'EOF'
{
  "complexity": "[simple|standard|complex]",
  "confidence": [0.0-1.0],
  ...
}
EOF
```
Pattern: Template + Instructions + Example = Consistent structured output
10.2 Mandatory Investigation Phase
Location: planner.md (lines 21-67)
```markdown
## PHASE 0: DEEP CODEBASE INVESTIGATION (MANDATORY)

**CRITICAL**: Before ANY planning, you MUST thoroughly investigate the existing
codebase. Poor investigation leads to plans that don't match the codebase's
actual patterns.

**YOU MUST READ AT LEAST 3 PATTERN FILES** before planning
```
Pattern: Force information gathering before generation to ground outputs in reality.
CODITECT Application: Require codebase analysis before generating new code in workflows.
10.3 Checklist-Driven Critique
Location: critique.py (lines 78-104)
Critique prompt includes explicit checklists:
```markdown
### STEP 1: Code Quality Checklist

**Pattern Adherence:**
- [ ] Follows patterns from reference files exactly
- [ ] Variable naming matches codebase conventions
- [ ] Imports organized correctly
...
```
Pattern: Checkboxes create psychological commitment to thorough review.
CODITECT Application: Use checklist prompts for quality gates in critical workflows.
10.4 Extended Thinking for Complex Analysis
Location: spec_critic.md (lines 40-77, 299-318)
```markdown
## PHASE 1: DEEP ANALYSIS (USE EXTENDED THINKING)

**CRITICAL**: Use extended thinking for this phase.
```
Pattern: Explicitly request extended thinking for high-stakes analysis phases.
CODITECT Application: Request extended thinking for:
- Architecture decisions
- Security reviews
- Performance optimization analysis
10.5 Context7 Integration for Technical Validation
Location: spec_critic.md (lines 55-77)
Critic prompt uses external documentation lookup:
```
Tool: mcp__context7__resolve-library-id
Tool: mcp__context7__get-library-docs
```
Pattern: Combine internal knowledge with external authoritative sources.
CODITECT Application: Integrate Context7 or similar documentation tools for technical skill validation.
11. QUANTITATIVE METRICS
11.1 Phase Count by Complexity
| Complexity | Phase Count | Phases |
|---|---|---|
| SIMPLE | 3-4 | discovery, [historical_context], quick_spec, validation |
| STANDARD | 6-7 | discovery, [historical_context], requirements, [research], context, spec_writing, planning, validation |
| COMPLEX | 8-9 | discovery, [historical_context], requirements, research, context, spec_writing, self_critique, planning, validation |
Note: historical_context is optional based on Graphiti availability.
11.2 Assessment Criteria Count
Complexity Analyzer:
- 25 SIMPLE_KEYWORDS
- 16 COMPLEX_KEYWORDS
- 11 MULTI_SERVICE_KEYWORDS
- 11 integration regex patterns
- 11 infrastructure regex patterns
AI Assessment Dimensions:
- 5 analysis categories (scope, integrations, infrastructure, knowledge, risk)
- 5 risk levels (trivial, low, medium, high, critical)
11.3 Critique Checklist Items
Self-Critique Prompt:
- 4 pattern adherence checks
- 3 error handling checks
- 4 code cleanliness checks
- 4 best practices checks
- 3 implementation completeness checks
Total: 18 explicit quality checks per subtask
11.4 Verification Types
- 5 verification types (command, api, browser, manual, none)
- 6 verification parameters (run, url, method, expect_status, expect_contains, scenario)
12. FILE REFERENCES SUMMARY
Core Pipeline Files
| File | Lines | Purpose |
|---|---|---|
| `spec/complexity.py` | 467 | Complexity assessment (AI + heuristic) |
| `spec/critique.py` | 370 | Self-critique system |
| `spec/discovery.py` | 78 | Project structure discovery |
| `spec/pipeline/orchestrator.py` | 687 | Main pipeline orchestration |
| `spec/pipeline/models.py` | 264 | Pipeline data structures |
| `spec/phases/executor.py` | 77 | Phase execution coordinator |
| `spec/phases/models.py` | 24 | Phase result model |
Implementation Plan Files
| File | Lines | Purpose |
|---|---|---|
| `implementation_plan/plan.py` | 373 | Plan model with dependency resolution |
| `implementation_plan/subtask.py` | 133 | Subtask tracking and status |
| `implementation_plan/verification.py` | 54 | Verification strategy model |
Prompt Files
| File | Lines | Purpose |
|---|---|---|
| `prompts/complexity_assessor.md` | 676 | AI complexity assessment |
| `prompts/spec_critic.md` | 325 | Spec self-critique with ultrathink |
| `prompts/planner.md` | 300+ | Implementation plan generation |
Runner Files
| File | Lines | Purpose |
|---|---|---|
| `runners/spec_runner.py` | 200+ | CLI entry point for spec creation |
13. RECOMMENDATIONS FOR CODITECT INTEGRATION
13.1 High-Priority Patterns
- **Adaptive Pipeline Selection** (Priority: HIGH)
  - Add complexity assessment to `/new-project`
  - Map complexity to agent chain selection
  - Files: `spec/complexity.py` + `prompts/complexity_assessor.md`
- **Self-Critique Quality Gates** (Priority: HIGH)
  - Add critique step to security/infrastructure skills
  - Generate skill-specific checklists
  - File: `spec/critique.py`
- **Subtask-Based Execution** (Priority: MEDIUM)
  - Implement dependency DAG for complex workflows
  - Enable parallel subtask execution where safe
  - File: `implementation_plan/plan.py`
- **Verification Strategies** (Priority: MEDIUM)
  - Add verification metadata to workflow subtasks
  - Auto-generate verification commands
  - File: `implementation_plan/verification.py`
13.2 Medium-Priority Patterns
- **Phase Output Compaction** (Priority: MEDIUM)
  - Summarize long agent outputs
  - Inject summaries into downstream agents
  - File: `spec/compaction.py` (referenced, not analyzed in detail)
- **Workflow Type Awareness** (Priority: MEDIUM)
  - Add workflow type metadata to workflow library
  - Customize agent sequences per type
  - File: `prompts/planner.md` (workflow types section)
- **Human Review Checkpoints** (Priority: LOW)
  - Add approval gates before high-risk operations
  - File: `spec/pipeline/orchestrator.py:613-645`
13.3 Implementation Guidance
For Complexity Assessment:
- Create CODITECT skill: `/assess-task-complexity`
- Adapt `complexity_assessor.md` prompt for CODITECT context
- Store assessment in `.coditect/task-assessment.json`
- Use assessment to select agent chain from workflow library
For Self-Critique:
- Create a `critique-skill-execution` helper
- Generate checklists based on skill requirements
- Parse critique responses with regex (adapt from `critique.py`)
- Block skill completion if critique fails
For Subtask Execution:
- Extend workflow library format with subtask dependencies
- Implement `get_next_subtask()` logic for workflow execution
- Track subtask status in `.coditect/workflow-progress.json`
- Enable resume from last completed subtask
14. CONCLUSION
The Auto-Claude SPEC PIPELINE demonstrates adaptive orchestration at scale. By assessing task complexity FIRST and then dynamically constructing the appropriate workflow, it achieves both efficiency (3-phase pipeline for simple tasks) and rigor (8-phase pipeline with self-critique for complex integrations).
Key Takeaways for CODITECT:
- **Assessment Before Execution**: Don't assume one workflow fits all; assess first, adapt second.
- **Self-Critique as Quality Gate**: Agents can and should evaluate their own work before marking tasks complete.
- **Dependency-Aware Execution**: Track dependencies explicitly to enable safe parallelism and automatic unblocking.
- **Verification Per Task**: Each unit of work should define how to verify its completion.
- **File-Based Communication**: Intermediate outputs as files enable inspection, debugging, and resumption.
- **Workflow Type Taxonomy**: Different types of work (feature, refactor, investigation) follow different patterns; codify these patterns.
- **Human-in-the-Loop Gates**: Automate aggressively, but gate high-cost or high-risk operations with human approval.
Next Steps:
- Pilot complexity assessment for the `/new-project` workflow
- Add self-critique to infrastructure and security skills
- Extend workflow library with dependency metadata
- Implement verification command generation
Contact: This analysis prepared for CODITECT workflow library enhancement.