
Auto-Claude SPEC PIPELINE - Deep Technical Analysis

Analysis Date: December 22, 2025
Target System: Auto-Claude v2.x Spec Creation Pipeline
Repository: /Users/halcasteel/PROJECTS/coditect-rollout-master/submodules/core/coditect-core/submodules/research/auto-claude/


Executive Summary

The Auto-Claude SPEC PIPELINE is a sophisticated dynamic multi-phase orchestration system that adaptively creates implementation specifications based on task complexity. It uses AI-driven complexity assessment to select between 3-8 phases, employs self-critique with extended thinking (ultrathink), and generates subtask-based implementation plans with verification strategies.

Key Innovation: Unlike static workflow systems, Auto-Claude assesses task complexity FIRST, then dynamically constructs the appropriate pipeline—ranging from a 3-phase quick spec for simple changes to an 8-phase rigorous process for complex integrations.


1. COMPLEXITY ASSESSMENT SYSTEM

1.1 Three-Tier Classification

Location: auto-claude/spec/complexity.py:17-23

class Complexity(Enum):
    """Task complexity tiers that determine which phases to run."""

    SIMPLE = "simple"      # 1-2 files, single service, no integrations
    STANDARD = "standard"  # 3-10 files, 1-2 services, minimal integrations
    COMPLEX = "complex"    # 10+ files, multiple services, external integrations

Why This Matters: The enum defines the entire pipeline branching logic. Each tier maps to a specific phase sequence.

1.2 Dual Assessment Strategy

AI-Based Assessment (Primary): auto-claude/spec/complexity.py:344-436

The system runs an AI agent with the complexity_assessor.md prompt that analyzes:

async def run_ai_complexity_assessment(
    spec_dir: Path,
    task_description: str,
    run_agent_fn,
) -> ComplexityAssessment | None:
    """Run AI agent to assess complexity. Returns None if it fails."""

Process:

  1. Loads requirements.json with full user context (lines 365-380)
  2. Loads project_index.json if available (lines 385-391)
  3. Invokes the complexity_assessor.md agent (lines 394-397)
  4. Parses AI output into a structured ComplexityAssessment (lines 399-430)
  5. Returns the assessment with recommended phases (line 427)

Heuristic Fallback: auto-claude/spec/complexity.py:79-341

If AI assessment fails, the system falls back to keyword-based heuristics:

class ComplexityAnalyzer:
    """Analyzes task description and context to determine complexity."""

    SIMPLE_KEYWORDS = [
        "fix", "typo", "update", "change", "rename", ...
    ]

    COMPLEX_KEYWORDS = [
        "integrate", "integration", "api", "sdk", "docker",
        "kubernetes", "deploy", "authentication", ...
    ]

Algorithm (lines 156-208):

  1. Keyword frequency analysis (lines 164-172)
  2. External integration detection via regex (lines 210-231)
  3. Infrastructure change detection (lines 233-252)
  4. File/service estimation (lines 254-293)
  5. Confidence scoring (lines 295-341)
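The keyword-frequency step (1) can be sketched as follows. This is a minimal illustration, not the actual ComplexityAnalyzer logic: the keyword lists are abbreviated from the excerpt above and the thresholds are invented for the example.

```python
import re

# Abbreviated keyword lists (the real analyzer's lists are longer)
SIMPLE_KEYWORDS = {"fix", "typo", "update", "change", "rename"}
COMPLEX_KEYWORDS = {"integrate", "integration", "api", "sdk", "docker",
                    "kubernetes", "deploy", "authentication"}

def classify_by_keywords(task: str) -> str:
    """Count keyword hits in the task description and pick a tier.

    Thresholds here are illustrative, not the real scoring rules.
    """
    words = re.findall(r"[a-z]+", task.lower())
    simple_hits = sum(w in SIMPLE_KEYWORDS for w in words)
    complex_hits = sum(w in COMPLEX_KEYWORDS for w in words)
    if complex_hits >= 2:
        return "complex"
    if complex_hits == 1 or simple_hits == 0:
        return "standard"
    return "simple"

assert classify_by_keywords("fix typo in README") == "simple"
assert classify_by_keywords("integrate the payment api with docker") == "complex"
```

The real implementation layers regex-based integration detection, file/service estimation, and confidence scoring on top of this frequency count.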

1.3 Assessment Output Structure

File: complexity_assessment.json saved at spec_dir/complexity_assessment.json

{
  "complexity": "simple|standard|complex",
  "confidence": 0.75,                      # 0.0 to 1.0
  "reasoning": "2-3 sentence explanation",
  "signals": {
    "simple_keywords": 3,
    "complex_keywords": 0,
    "estimated_files": 2,
    "estimated_services": 1,
    ...
  },
  "recommended_phases": ["discovery", "quick_spec", "validation"],
  "flags": {
    "needs_research": false,
    "needs_self_critique": false
  }
}

Value Proposition: This structured output becomes the decision input for phase selection (lines 47-76).

1.4 Dynamic Phase Selection

Location: auto-claude/spec/complexity.py:47-76

def phases_to_run(self) -> list[str]:
    """Return list of phase names to run based on complexity."""
    # If AI provided recommended phases, use those
    if self.recommended_phases:
        return self.recommended_phases

    # Otherwise fall back to default phase sets
    if self.complexity == Complexity.SIMPLE:
        return ["discovery", "historical_context", "quick_spec", "validation"]
    elif self.complexity == Complexity.STANDARD:
        # Standard can optionally include research if flagged
        phases = ["discovery", "historical_context", "requirements"]
        if self.needs_research:
            phases.append("research")
        phases.extend(["context", "spec_writing", "planning", "validation"])
        return phases
    else:  # COMPLEX
        return [
            "discovery", "historical_context", "requirements", "research",
            "context", "spec_writing", "self_critique", "planning", "validation",
        ]

Key Insight: The AI can override default phase sets by providing custom recommended_phases. This allows context-aware adaptation beyond the three-tier system.

CODITECT Enhancement Opportunity: This dynamic phase selection pattern could be adapted for workflow library tasks—assessing task complexity before selecting the appropriate skill/agent chain.


2. SPEC CREATION PIPELINE ORCHESTRATION

2.1 Orchestrator Architecture

Location: auto-claude/spec/pipeline/orchestrator.py:49-646

The SpecOrchestrator class coordinates the entire pipeline:

class SpecOrchestrator:
    """Orchestrates the spec creation process with dynamic complexity adaptation."""

    def __init__(
        self,
        project_dir: Path,
        task_description: str | None = None,
        model: str = "claude-sonnet-4-5-20250929",
        thinking_level: str = "medium",
        complexity_override: str | None = None,
        use_ai_assessment: bool = True,
    ):

Core Responsibilities:

  1. Spec directory management (lines 87-104)
  2. Agent runner coordination (lines 114-125)
  3. Phase execution orchestration (lines 218-409)
  4. Phase output compaction (lines 161-186)
  5. Human review checkpoint (lines 408-409, 613-645)

2.2 Phase Execution Loop

Location: auto-claude/spec/pipeline/orchestrator.py:259-399

async def run(self, interactive: bool = True, auto_approve: bool = False) -> bool:
    """Run the spec creation process with dynamic phase selection."""

    # Fixed phases (always run first)
    # 1. Discovery
    result = await run_phase("discovery", phase_executor.phase_discovery)
    await self._store_phase_summary("discovery")

    # 2. Requirements
    result = await run_phase("requirements",
                             lambda: phase_executor.phase_requirements(interactive))
    await self._store_phase_summary("requirements")

    # 3. Complexity Assessment
    result = await run_phase("complexity_assessment",
                             lambda: self._phase_complexity_assessment_with_requirements())

    # Dynamic phases based on complexity
    all_phases_to_run = self.assessment.phases_to_run()
    phases_to_run = [p for p in all_phases_to_run
                     if p not in ["discovery", "requirements"]]

    for phase_name in phases_to_run:
        result = await run_phase(phase_name, all_phases[phase_name])
        if result.success:
            await self._store_phase_summary(phase_name)
        else:
            return False  # Stop on failure

    # Human review checkpoint
    return self._run_review_checkpoint(auto_approve)

Pattern: Fixed setup → Dynamic adaptation → Human gate

2.3 Phase Result Model

Location: auto-claude/spec/phases/models.py:11-19

@dataclass
class PhaseResult:
    """Result of a phase execution."""

    phase: str
    success: bool
    output_files: list[str]
    errors: list[str]
    retries: int

Retry Strategy: Each phase can retry up to MAX_RETRIES = 3 times (line 23).

2.4 Phase Compaction Strategy

Location: auto-claude/spec/pipeline/orchestrator.py:161-186

async def _store_phase_summary(self, phase_name: str) -> None:
    """Summarize and store phase output for subsequent phases."""
    try:
        # Gather outputs from this phase
        phase_output = gather_phase_outputs(self.spec_dir, phase_name)
        if not phase_output:
            return

        # Summarize the output (target: 500 words)
        summary = await summarize_phase_output(
            phase_name,
            phase_output,
            model="claude-sonnet-4-5-20250929",
            target_words=500,
        )

        if summary:
            self._phase_summaries[phase_name] = summary
    except Exception as e:
        print_status(f"Phase summarization skipped: {e}", "warning")

Why This Matters: As phases accumulate, context size grows. Summarization prevents token overflow while preserving critical information for subsequent phases.

Usage: Summaries are injected into subsequent agent prompts (lines 151-158):

# Format prior phase summaries for context
prior_summaries = format_phase_summaries(self._phase_summaries)

return await runner.run_agent(
    prompt_file,
    additional_context,
    interactive,
    thinking_budget=thinking_budget,
    prior_phase_summaries=prior_summaries if prior_summaries else None,
)

3. SELF-CRITIQUE WITH ULTRATHINK

3.1 Critique System Architecture

Location: auto-claude/spec/critique.py:1-370

The self-critique system implements a mandatory quality gate before marking subtasks complete:

@dataclass
class CritiqueResult:
    """Result of a self-critique evaluation."""

    passes: bool
    issues: list[str] = field(default_factory=list)
    improvements_made: list[str] = field(default_factory=list)
    recommendations: list[str] = field(default_factory=list)

3.2 Critique Prompt Generation

Location: auto-claude/spec/critique.py:50-164

def generate_critique_prompt(
    subtask: dict,
    files_modified: list[str],
    patterns_from: list[str],
) -> str:
    """Generate a critique prompt for the agent to self-evaluate."""

Generates a structured checklist covering:

  1. Code Quality Checklist (lines 78-104)

    • Pattern adherence
    • Error handling
    • Code cleanliness
    • Best practices
  2. Implementation Completeness (lines 106-122)

    • All expected files modified
    • All expected files created
    • Requirements fully met
  3. Potential Issues Analysis (lines 124-132)

    • Agent must list concerns honestly
  4. Improvements Made (lines 134-140)

    • Agent must fix issues before continuing
  5. Final Verdict (lines 143-148)

    • PROCEED: YES/NO
    • REASON: Brief explanation
    • CONFIDENCE: High/Medium/Low

3.3 Critique Response Parser

Location: auto-claude/spec/critique.py:167-261

def parse_critique_response(response: str) -> CritiqueResult:
    """Parse the agent's critique response into structured data."""

    # Extract PROCEED verdict
    proceed_match = re.search(
        r"\*\*PROCEED:\*\*\s*\[?\s*(YES|NO)", response, re.IGNORECASE
    )
    if proceed_match:
        passes = proceed_match.group(1).upper() == "YES"

    # Extract issues from Step 3
    issues_section = re.search(
        r"### STEP 3:.*?Potential Issues.*?\n\n(.*?)(?=###|\Z)",
        response, re.DOTALL | re.IGNORECASE,
    )
    # ... parse and filter issues ...

    # Extract improvements from Step 4
    improvements_section = re.search(
        r"### STEP 4:.*?Improvements Made.*?\n\n(.*?)(?=###|\Z)",
        response, re.DOTALL | re.IGNORECASE,
    )
    # ... parse and filter improvements ...

Value: Structured parsing ensures critique results are machine-readable for decision logic.

3.4 Critique Decision Logic

Location: auto-claude/spec/critique.py:264-283

def should_proceed(result: CritiqueResult) -> bool:
    """Determine if the subtask should be marked complete based on critique."""
    # Must pass the critique
    if not result.passes:
        return False

    # If there are unresolved issues, don't proceed
    if result.issues:
        return False

    return True

Rule: Subtask can ONLY proceed if:

  1. Agent explicitly said PROCEED: YES
  2. No unresolved issues remain
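Because both conditions are checked on plain data, the gate is easy to exercise standalone. The sketch below re-declares a trimmed CritiqueResult (dropping the fields the gate ignores) next to should_proceed:

```python
from dataclasses import dataclass, field

@dataclass
class CritiqueResult:
    """Trimmed to the two fields should_proceed actually reads."""
    passes: bool
    issues: list[str] = field(default_factory=list)

def should_proceed(result: CritiqueResult) -> bool:
    """Both conditions must hold: explicit PROCEED verdict and zero open issues."""
    return result.passes and not result.issues

# PROCEED: YES but one issue still open -> blocked
assert should_proceed(CritiqueResult(passes=True, issues=["missing test"])) is False
# PROCEED: NO -> blocked regardless of issues
assert should_proceed(CritiqueResult(passes=False)) is False
# PROCEED: YES with a clean issue list -> allowed
assert should_proceed(CritiqueResult(passes=True)) is True
```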

3.5 Spec-Level Self-Critique with Ultrathink

Prompt: auto-claude/prompts/spec_critic.md

Phase: Self-critique phase (runs only for COMPLEX tasks)

Key Instructions (lines 40-122):

## PHASE 1: DEEP ANALYSIS (USE EXTENDED THINKING)

**CRITICAL**: Use extended thinking for this phase. Think deeply about:

### 1.1: Technical Accuracy
Compare spec.md against research.json AND validate with Context7:
- **Package names**: Does spec use correct package names from research?
- **Import statements**: Do imports match researched API patterns?
- **API calls**: Do function signatures match documentation?
- **Configuration**: Are env vars and config options correct?

**USE CONTEXT7 TO VALIDATE TECHNICAL CLAIMS:**
If the spec mentions specific libraries or APIs, verify them against Context7:

Tool: mcp__context7__resolve-library-id
Tool: mcp__context7__get-library-docs

Process (lines 123-253):

  1. Load all context (spec.md, research.json, requirements.json, context.json)
  2. Deep analysis with extended thinking on 5 dimensions:
    • Technical accuracy (package names, APIs, config)
    • Completeness (requirements coverage, edge cases)
    • Consistency (naming, paths, patterns)
    • Feasibility (dependencies, infrastructure, order)
    • Research alignment (verified info, gotchas, recommendations)
  3. Catalog issues with severity (HIGH/MEDIUM/LOW) and categories
  4. Fix issues directly in spec.md using edit commands
  5. Create critique report (critique_report.json)
  6. Verify fixes (validate markdown, check sections)

Extended Thinking Example (lines 300-318):

> "Looking at this spec.md, I need to deeply analyze it against the research findings...
>
> First, let me check all package names. The research says the package is [X],
> but the spec says [Y]. This is a mismatch that needs fixing.
>
> Let me also verify with Context7 - I'll look up the actual package name and
> API patterns to confirm...
> [Use mcp__context7__resolve-library-id to find the library]
> [Use mcp__context7__get-library-docs to check API patterns]
>
> Next, looking at the API patterns. The research shows initialization requires
> [steps], but the spec shows [different steps]. Let me cross-reference with
> Context7 documentation... Another issue confirmed.
> ...

CODITECT Enhancement Opportunity: This ultrathink self-critique pattern could be integrated as a quality gate before executing high-risk workflow steps.


4. SUBTASK-BASED IMPLEMENTATION PLANNING

4.1 Implementation Plan Model

Location: auto-claude/implementation_plan/plan.py:20-373

@dataclass
class ImplementationPlan:
    """Complete implementation plan for a feature/task."""

    feature: str
    workflow_type: WorkflowType = WorkflowType.FEATURE
    services_involved: list[str] = field(default_factory=list)
    phases: list[Phase] = field(default_factory=list)
    final_acceptance: list[str] = field(default_factory=list)

    # Status tracking (synced with UI)
    status: str | None = None      # backlog, in_progress, ai_review, human_review, done
    planStatus: str | None = None  # pending, in_progress, review, completed
    qa_signoff: dict | None = None

Key Methods:

  1. Dependency Resolution (lines 178-189):

def get_available_phases(self) -> list[Phase]:
    """Get phases whose dependencies are satisfied."""
    completed_phases = {p.phase for p in self.phases if p.is_complete()}
    available = []

    for phase in self.phases:
        if phase.is_complete():
            continue
        deps_met = all(d in completed_phases for d in phase.depends_on)
        if deps_met:
            available.append(phase)

    return available

  2. Next Subtask Selection (lines 192-198):

def get_next_subtask(self) -> tuple[Phase, Subtask] | None:
    """Get the next subtask to work on, respecting dependencies."""
    for phase in self.get_available_phases():
        pending = phase.get_pending_subtasks()
        if pending:
            return phase, pending[0]
    return None

  3. Progress Tracking (lines 200-228):

def get_progress(self) -> dict:
    """Get overall progress statistics."""
    total_subtasks = sum(len(p.subtasks) for p in self.phases)
    done_subtasks = sum(
        1 for p in self.phases for s in p.subtasks
        if s.status == SubtaskStatus.COMPLETED
    )
    # ... calculate percentages, failed counts, etc.

  4. Follow-Up Phase Addition (lines 259-316):

def add_followup_phase(
    self,
    name: str,
    subtasks: list[Subtask],
    phase_type: PhaseType = PhaseType.IMPLEMENTATION,
    parallel_safe: bool = False,
) -> Phase:
    """Add a new follow-up phase to an existing (typically completed) plan.

    This allows users to extend completed builds with additional work.
    The new phase depends on all existing phases to ensure proper sequencing.
    """

Value: Enables iterative enhancement of completed features without rebuilding plans from scratch.

4.2 Subtask Model

Location: auto-claude/implementation_plan/subtask.py:17-133

@dataclass
class Subtask:
    """A single unit of implementation work."""

    id: str
    description: str
    status: SubtaskStatus = SubtaskStatus.PENDING

    # Scoping
    service: str | None = None  # Which service (backend, frontend, worker)
    all_services: bool = False  # True for integration subtasks

    # Files
    files_to_modify: list[str] = field(default_factory=list)
    files_to_create: list[str] = field(default_factory=list)
    patterns_from: list[str] = field(default_factory=list)

    # Verification
    verification: Verification | None = None

    # For investigation subtasks
    expected_output: str | None = None  # Knowledge/decision output
    actual_output: str | None = None    # What was discovered

    # Self-Critique
    critique_result: dict | None = None  # Results from self-critique before completion

Key Methods:

  1. Start Tracking (lines 107-114):

def start(self, session_id: int):
    """Mark subtask as in progress."""
    self.status = SubtaskStatus.IN_PROGRESS
    self.started_at = datetime.now().isoformat()
    self.session_id = session_id
    # Clear stale data from previous runs to ensure clean state
    self.completed_at = None
    self.actual_output = None

  2. Completion (lines 116-121):

def complete(self, output: str | None = None):
    """Mark subtask as done."""
    self.status = SubtaskStatus.COMPLETED
    self.completed_at = datetime.now().isoformat()
    if output:
        self.actual_output = output

  3. Failure (lines 123-128):

def fail(self, reason: str | None = None):
    """Mark subtask as failed."""
    self.status = SubtaskStatus.FAILED
    self.completed_at = None  # Clear to maintain consistency (failed != completed)
    if reason:
        self.actual_output = f"FAILED: {reason}"

4.3 Verification Strategy Model

Location: auto-claude/implementation_plan/verification.py:14-53

@dataclass
class Verification:
    """How to verify a subtask is complete."""

    type: VerificationType
    run: str | None = None              # Command to run
    url: str | None = None              # URL for API/browser tests
    method: str | None = None           # HTTP method for API tests
    expect_status: int | None = None    # Expected HTTP status
    expect_contains: str | None = None  # Expected content
    scenario: str | None = None         # Description for browser/manual tests

Verification Types:

class VerificationType(Enum):
    """Types of verification methods."""

    COMMAND = "command"  # Run shell command, check exit code
    API = "api"          # HTTP request, check status/content
    BROWSER = "browser"  # Manual browser test with scenario
    MANUAL = "manual"    # Human verification with instructions
    NONE = "none"        # No automated verification

Why This Matters: Each subtask has a built-in verification strategy, enabling automated QA checkpoints throughout implementation.


5. PLANNER AGENT PROMPT ARCHITECTURE

5.1 Phase 0: Deep Codebase Investigation (MANDATORY)

Location: auto-claude/prompts/planner.md:21-67

## PHASE 0: DEEP CODEBASE INVESTIGATION (MANDATORY)

**CRITICAL**: Before ANY planning, you MUST thoroughly investigate the existing
codebase. Poor investigation leads to plans that don't match the codebase's
actual patterns.

### 0.1: Understand Project Structure

find . -type f -name "*.py" -o -name "*.ts" -o -name "*.tsx" -o -name "*.js" | head -100

Identify:
- Main entry points (main.py, app.py, index.ts, etc.)
- Configuration files (settings.py, config.py, .env.example)
- Directory organization patterns

### 0.2: Analyze Existing Patterns for the Feature

**This is the most important step.** For whatever feature you're building,
find SIMILAR existing features:

# Example: If building "caching", search for existing cache implementations
grep -r "cache" --include="*.py" . | head -30
grep -r "redis\|memcache\|lru_cache" --include="*.py" . | head -30

**YOU MUST READ AT LEAST 3 PATTERN FILES** before planning

Pattern: Investigation → Pattern discovery → Plan creation

Why This Matters: Forces the agent to ground its plan in actual codebase patterns, preventing "generic" plans that don't match project conventions.

5.2 Context File Creation

Location: auto-claude/prompts/planner.md:89-157

The planner creates two critical context files:

  1. project_index.json (lines 93-119):

{
  "project_type": "single|monorepo",
  "services": {
    "backend": {
      "path": ".",
      "tech_stack": ["python", "fastapi"],
      "port": 8000,
      "dev_command": "uvicorn main:app --reload",
      "test_command": "pytest"
    }
  },
  "infrastructure": {
    "docker": false,
    "database": "postgresql"
  }
}

  2. context.json (lines 130-157):

{
  "files_to_modify": {
    "backend": ["app/services/existing_service.py", "app/routes/api.py"]
  },
  "files_to_reference": ["app/services/similar_service.py"],
  "patterns": {
    "service_pattern": "All services inherit from BaseService",
    "route_pattern": "Routes use APIRouter with prefix and tags"
  },
  "existing_implementations": {
    "description": "Found existing caching in app/utils/cache.py using Redis",
    "relevant_files": ["app/utils/cache.py", "app/config.py"]
  }
}

Value: These files become inputs for subsequent phases, ensuring consistency across the pipeline.

5.3 Workflow Type Awareness

Location: auto-claude/prompts/planner.md:161-200

The planner prompt defines 5 workflow types with distinct phase structures:

  1. FEATURE (multi-service):

    • Phase 1: Backend/API (testable with curl)
    • Phase 2: Worker (background jobs)
    • Phase 3: Frontend (UI components)
    • Phase 4: Integration (wire everything)
  2. REFACTOR (stage-based):

    • Phase 1: Add New (build alongside old)
    • Phase 2: Migrate (move consumers)
    • Phase 3: Remove Old (delete deprecated)
    • Phase 4: Cleanup (polish)
  3. INVESTIGATION (bug hunting):

    • Phase 1: Reproduce (logging, reproduction)
    • Phase 2: Investigate (root cause → OUTPUT REQUIRED)
    • Phase 3: Fix (BLOCKED until phase 2)
    • Phase 4: Harden (tests, prevention)
  4. MIGRATION (data pipeline):

    • Phase 1: Prepare (scripts, setup)
    • Phase 2: Test (small batch)
    • Phase 3: Execute (full migration)
    • Phase 4: Cleanup (verify, remove old)
  5. SIMPLE (single-service quick):

    • No phases, just subtasks

Pattern: Workflow type determines phase structure and dependency graph.

5.4 Implementation Plan Structure

Location: auto-claude/prompts/planner.md:215-264

{
  "feature": "Short descriptive name",
  "workflow_type": "feature|refactor|investigation|migration|simple",
  "workflow_rationale": "Why this workflow type was chosen",
  "phases": [
    {
      "id": "phase-1-backend",
      "name": "Backend API",
      "type": "implementation",
      "depends_on": [],
      "parallel_safe": true,
      "subtasks": [
        {
          "id": "subtask-1-1",
          "description": "Create data models for [feature]",
          "service": "backend",
          "files_to_modify": ["src/models/user.py"],
          "files_to_create": ["src/models/analytics.py"],
          "patterns_from": ["src/models/existing_model.py"],
          "verification": {
            "type": "command",
            "command": "python -c \"from src.models.analytics import Analytics\"",
            "expected": "OK"
          },
          "status": "pending"
        }
      ]
    }
  ]
}

Critical Requirements (lines 205-214):

  • MUST use Write tool to create implementation_plan.json
  • Plan structure matches ImplementationPlan dataclass
  • Each subtask includes verification strategy
  • Phase dependencies explicitly defined

6. COMPLEXITY ASSESSOR PROMPT ARCHITECTURE

6.1 Assessment Criteria

Location: auto-claude/prompts/complexity_assessor.md:103-132

The prompt defines 5 analysis dimensions:

  1. Scope Analysis (lines 107-110):

    • How many files will likely be touched?
    • How many services are involved?
    • Is this localized or cross-cutting?
  2. Integration Analysis (lines 112-116):

    • External services/APIs involved?
    • New dependencies needed?
    • Research required for correct usage?
  3. Infrastructure Analysis (lines 118-122):

    • Docker/container changes?
    • Database schema changes?
    • Environment configuration changes?
    • Deployment considerations?
  4. Knowledge Analysis (lines 124-127):

    • Codebase has patterns for this?
    • External docs research needed?
    • Unfamiliar technologies involved?
  5. Risk Analysis (lines 129-132):

    • What could go wrong?
    • Security considerations?
    • Breaking existing functionality?

6.2 Decision Flowchart

Location: auto-claude/prompts/complexity_assessor.md:394-418

START

├─► Are there 2+ external integrations OR unfamiliar technologies?
│ YES → COMPLEX (needs research + critique)
│ NO ↓

├─► Are there infrastructure changes (Docker, DB, new services)?
│ YES → COMPLEX (needs research + critique)
│ NO ↓

├─► Is there 1 external integration that needs research?
│ YES → STANDARD + research phase
│ NO ↓

├─► Will this touch 3+ files across 1-2 services?
│ YES → STANDARD
│ NO ↓

└─► SIMPLE (1-2 files, single service, no integrations)

Pattern: Progressive elimination from high to low complexity.
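The flowchart translates almost directly into code. A minimal sketch with illustrative signal names (the assessor's actual inputs come from the requirements and project index, not these parameters):

```python
def classify(external_integrations: int, unfamiliar_tech: bool,
             infra_changes: bool, files_touched: int) -> str:
    """Progressive elimination from high to low complexity,
    mirroring the assessor's decision flowchart."""
    # 2+ external integrations OR unfamiliar technologies -> COMPLEX
    if external_integrations >= 2 or unfamiliar_tech:
        return "complex"
    # Infrastructure changes (Docker, DB, new services) -> COMPLEX
    if infra_changes:
        return "complex"
    # One external integration needing research -> STANDARD + research phase
    if external_integrations == 1:
        return "standard"
    # 3+ files across 1-2 services -> STANDARD
    if files_touched >= 3:
        return "standard"
    # Otherwise: 1-2 files, single service, no integrations
    return "simple"

assert classify(2, False, False, 1) == "complex"
assert classify(0, False, True, 1) == "complex"
assert classify(0, False, False, 5) == "standard"
assert classify(0, False, False, 2) == "simple"
```

Ordering matters: each branch only fires after every higher-complexity condition has been eliminated.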

6.3 Validation Recommendations

Location: auto-claude/prompts/complexity_assessor.md:261-391

The assessor also generates validation depth recommendations:

"validation_recommendations": {
"risk_level": "trivial|low|medium|high|critical",
"skip_validation": false,
"minimal_mode": false,
"test_types_required": ["unit", "integration", "e2e"],
"security_scan_required": true,
"staging_deployment_required": false,
"reasoning": "1-2 sentences explaining validation depth choice"
}

Risk Levels:

Risk       When                           Validation Depth
TRIVIAL    Docs-only, comments            Skip validation
LOW        Single service, <5 files       Unit tests only
MEDIUM     Multiple files, API changes    Unit + Integration
HIGH       DB changes, auth/security      Unit + Integration + E2E + Security scan
CRITICAL   Payments, data deletion        All above + Manual + Staging

Value: Tailors QA rigor to actual risk, avoiding over-testing simple changes and under-testing critical ones.

6.4 Example Assessments

Simple Task (lines 423-467):

{
  "complexity": "simple",
  "workflow_type": "simple",
  "reasoning": "Single file UI change with no dependencies or infrastructure impact.",
  "recommended_phases": ["discovery", "quick_spec", "validation"],
  "validation_recommendations": {
    "risk_level": "low",
    "minimal_mode": true,
    "test_types_required": ["unit"]
  }
}

Complex Integration (lines 589-647):

{
  "complexity": "complex",
  "workflow_type": "feature",
  "reasoning": "Multiple integrations (Graphiti, FalkorDB), infrastructure changes (Docker Compose), and new architectural pattern (optional memory layer)",
  "recommended_phases": [
    "discovery", "requirements", "research", "context",
    "spec_writing", "self_critique", "planning", "validation"
  ],
  "flags": {
    "needs_research": true,
    "needs_self_critique": true,
    "needs_infrastructure_setup": true
  },
  "validation_recommendations": {
    "risk_level": "high",
    "test_types_required": ["unit", "integration", "e2e"],
    "security_scan_required": true,
    "staging_deployment_required": true
  }
}

7. SPEC CRITIC PROMPT ARCHITECTURE

7.1 Critique Categories

Location: auto-claude/prompts/spec_critic.md:288-294

- **Accuracy**: Technical correctness (packages, APIs, config)
- **Completeness**: Coverage of requirements and edge cases
- **Consistency**: Internal coherence of the document
- **Feasibility**: Practical implementability
- **Alignment**: Match with research findings

7.2 Severity Guidelines

Location: auto-claude/prompts/spec_critic.md:266-284

HIGH - Will cause implementation failure:

  • Wrong package names
  • Incorrect API signatures
  • Missing critical requirements
  • Invalid configuration

MEDIUM - May cause issues:

  • Missing edge cases
  • Incomplete error handling
  • Unclear integration points
  • Inconsistent patterns

LOW - Minor improvements:

  • Terminology inconsistencies
  • Documentation gaps
  • Style issues
  • Minor optimizations

7.3 Context7 Integration for Validation

Location: auto-claude/prompts/spec_critic.md:55-77

**USE CONTEXT7 TO VALIDATE TECHNICAL CLAIMS:**

If the spec mentions specific libraries or APIs, verify them against Context7:

# Step 1: Resolve library ID
Tool: mcp__context7__resolve-library-id
Input: { "libraryName": "[library from spec]" }

# Step 2: Verify API patterns mentioned in spec
Tool: mcp__context7__get-library-docs
Input: {
  "context7CompatibleLibraryID": "[library-id]",
  "topic": "[specific API or feature mentioned in spec]",
  "mode": "code"
}

**Check for common spec errors:**
- Wrong package name (e.g., "react-query" vs "@tanstack/react-query")
- Outdated API patterns (e.g., using deprecated functions)
- Incorrect function signatures (e.g., wrong parameter order)
- Missing required configuration (e.g., missing env vars)

Pattern: External documentation lookup for technical validation beyond what's in the research phase.


8. KEY PATTERNS FOR CODITECT INTEGRATION

8.1 Pattern: Adaptive Pipeline Selection

What: System assesses task characteristics BEFORE selecting workflow

Implementation:

  1. Analyze task with AI agent (complexity_assessor.md)
  2. Generate structured assessment (complexity_assessment.json)
  3. Map assessment to phase list (phases_to_run())
  4. Execute only necessary phases
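A minimal sketch of steps 3-4, assuming the assessment dict shape from Section 1.3 (the default phase lists here are abbreviated for the example):

```python
# Abbreviated tier defaults; the real pipeline includes more phases per tier
DEFAULT_PHASES = {
    "simple": ["discovery", "quick_spec", "validation"],
    "standard": ["discovery", "requirements", "spec_writing", "planning", "validation"],
    "complex": ["discovery", "requirements", "research", "spec_writing",
                "self_critique", "planning", "validation"],
}

def phases_for(assessment: dict) -> list[str]:
    """AI-recommended phases win; otherwise fall back to tier defaults."""
    return assessment.get("recommended_phases") or DEFAULT_PHASES[assessment["complexity"]]

ran = []
for phase in phases_for({"complexity": "simple", "recommended_phases": []}):
    ran.append(phase)  # stand-in for actually executing the phase

assert ran == ["discovery", "quick_spec", "validation"]
```

Note the `or` fallback: an empty or missing `recommended_phases` list yields the tier default, matching the override behavior in `phases_to_run()`.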

CODITECT Application:

  • Add complexity assessment to /new-project workflow
  • Simple projects → streamlined agent chain
  • Complex projects → full multi-agent orchestration
  • Save assessment for future reference

Files to Reference:

  • auto-claude/spec/complexity.py - Assessment logic
  • auto-claude/prompts/complexity_assessor.md - AI assessment prompt

8.2 Pattern: Self-Critique Quality Gates

What: Agent must self-review before marking work complete

Implementation:

  1. Generate critique prompt with checklist (generate_critique_prompt())
  2. Agent evaluates own work against criteria
  3. Parse response for issues (parse_critique_response())
  4. Block completion if issues exist (should_proceed())
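Steps 2-3 hinge on verdict extraction. A simplified sketch of the parsing step (the real regex also matches the bold `**PROCEED:**` markdown form shown in Section 3.3):

```python
import re

def parse_verdict(response: str) -> bool:
    """Extract the PROCEED verdict from a critique response.
    Returns False when no verdict is found, so ambiguous output blocks completion."""
    m = re.search(r"PROCEED:\s*\[?\s*(YES|NO)", response, re.IGNORECASE)
    return bool(m) and m.group(1).upper() == "YES"

critique = """### Final Verdict
PROCEED: YES
REASON: Pattern adherence confirmed, no open issues.
CONFIDENCE: High
"""
assert parse_verdict(critique) is True
assert parse_verdict("PROCEED: NO\nREASON: error handling missing") is False
assert parse_verdict("no explicit verdict") is False
```

Defaulting a missing verdict to "blocked" is the safe choice for a quality gate: silence never counts as approval.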

CODITECT Application:

  • Add critique step to high-stakes skills (infrastructure, security)
  • Generate checklists based on skill requirements
  • Parse and store critique results for audit trail

Files to Reference:

  • auto-claude/spec/critique.py - Critique system
  • auto-claude/prompts/spec_critic.md - Ultrathink critique prompt

8.3 Pattern: Dependency-Aware Subtask Execution

What: Automatically determine which tasks can run based on completion state

Implementation:

  1. Define phase dependencies (depends_on field)
  2. Track completion state (status field)
  3. Query available phases (get_available_phases())
  4. Select next subtask (get_next_subtask())
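The dependency query (step 3) reduces to a set-membership check. A sketch using plain dicts in place of the Phase dataclass:

```python
def available(phases: list[dict]) -> list[str]:
    """Phases whose dependencies are all complete, in declaration order."""
    done = {p["id"] for p in phases if p["complete"]}
    return [p["id"] for p in phases
            if not p["complete"] and all(d in done for d in p["depends_on"])]

plan = [
    {"id": "backend",     "depends_on": [],                      "complete": True},
    {"id": "frontend",    "depends_on": ["backend"],             "complete": False},
    {"id": "integration", "depends_on": ["backend", "frontend"], "complete": False},
]
# integration stays blocked until frontend completes
assert available(plan) == ["frontend"]
```

Selecting the next subtask is then just taking the first pending subtask from the first available phase, which is exactly what `get_next_subtask()` does.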

CODITECT Application:

  • Implement dependency DAG for complex workflows
  • Track workflow progress with subtask granularity
  • Enable parallel execution where safe
  • Auto-resume workflows from last completed step

Files to Reference:

  • auto-claude/implementation_plan/plan.py - Dependency resolution
  • auto-claude/implementation_plan/subtask.py - Status tracking

8.4 Pattern: Verification Strategy Per Task

What: Each unit of work defines how to verify its completion

Implementation:

  1. Define verification type (command, API, browser, manual)
  2. Specify verification parameters (URL, expected output, etc.)
  3. Execute verification after task completion
  4. Block progress if verification fails
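A sketch of the dispatch in step 3, using plain dicts for the Verification model. Only the command type is automated here; api, browser, and manual runners are left unimplemented, and the shell commands in the example assume a POSIX environment:

```python
import subprocess

def verify(verification: dict) -> bool:
    """Dispatch on verification type; a failing result blocks progress."""
    kind = verification["type"]
    if kind == "none":
        return True
    if kind == "command":
        # Exit code 0 means the verification passed
        return subprocess.run(verification["run"], shell=True).returncode == 0
    raise NotImplementedError(f"verification type {kind!r} needs a runner")

assert verify({"type": "none"}) is True
assert verify({"type": "command", "run": "true"}) is True    # POSIX no-op succeeds
assert verify({"type": "command", "run": "false"}) is False  # POSIX false fails
```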

CODITECT Application:

  • Add verification fields to workflow subtasks
  • Auto-generate verification commands based on task type
  • Create verification report for audit/compliance

Files to Reference:

  • auto-claude/implementation_plan/verification.py - Verification model
  • auto-claude/prompts/planner.md:215-264 - Verification in plans

8.5 Pattern: Phase Output Compaction

What: Summarize completed phases to prevent context overflow

Implementation:

  1. Gather outputs from completed phase (gather_phase_outputs())
  2. Summarize with AI to target word count (500 words)
  3. Store summary in phase dictionary (_phase_summaries)
  4. Inject summaries into subsequent agent prompts
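The flow can be sketched end to end with a stand-in summarizer (naive word truncation in place of the AI call; the heading format below is illustrative, not the real `format_phase_summaries` output):

```python
def summarize(text: str, target_words: int = 500) -> str:
    """Stand-in for the AI summarizer: naive truncation to the word budget."""
    return " ".join(text.split()[:target_words])

def format_phase_summaries(summaries: dict[str, str]) -> str:
    """Render prior-phase summaries as a prompt preamble."""
    return "\n\n".join(f"## {phase} summary\n{body}"
                       for phase, body in summaries.items())

# Accumulate a summary per completed phase, then inject into the next prompt
phase_summaries = {"discovery": summarize("repo uses FastAPI " * 400, target_words=5)}
prompt_context = format_phase_summaries(phase_summaries)
assert prompt_context.startswith("## discovery summary")
```

The key property is that downstream prompts grow by one bounded summary per phase instead of the full phase output.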

CODITECT Application:

  • Summarize long-running agent outputs
  • Provide compact context for downstream agents
  • Maintain critical information while reducing token usage

Files to Reference:

  • auto-claude/spec/pipeline/orchestrator.py:161-186 - Compaction logic
  • auto-claude/spec/compaction.py - Summarization functions

8.6 Pattern: Workflow Type Awareness

What: Different types of work (feature, refactor, investigation) follow different patterns

Implementation:

  1. Define workflow types (FEATURE, REFACTOR, INVESTIGATION, MIGRATION, SIMPLE)
  2. Map each type to appropriate phase structure
  3. Use workflow type to determine phase dependencies
  4. Tailor subtask templates to workflow
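Steps 1-2 amount to a lookup table. A sketch with phase names taken from the planner prompt's five workflow types:

```python
# Phase templates per workflow type, per the planner prompt (Section 5.3)
PHASE_TEMPLATES = {
    "feature":       ["backend", "worker", "frontend", "integration"],
    "refactor":      ["add_new", "migrate", "remove_old", "cleanup"],
    "investigation": ["reproduce", "investigate", "fix", "harden"],
    "migration":     ["prepare", "test", "execute", "cleanup"],
    "simple":        [],  # no phases, just subtasks
}

def phase_structure(workflow_type: str) -> list[str]:
    """Map a workflow type to its phase skeleton."""
    return PHASE_TEMPLATES[workflow_type]

assert phase_structure("refactor")[0] == "add_new"
assert phase_structure("simple") == []
```

In the real system each template also carries dependency edges (e.g., investigation's fix phase is blocked until the investigate phase produces its required output).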

CODITECT Application:

  • Extend workflow library with workflow type metadata
  • Auto-select appropriate agent sequence based on type
  • Customize verification strategies per workflow type

Files to Reference:

  • auto-claude/prompts/planner.md:161-200 - Workflow type definitions
  • auto-claude/implementation_plan/plan.py:25 - WorkflowType enum

8.7 Pattern: Follow-Up Phase Addition

What: Extend completed plans with additional work without rebuilding

Implementation:

  1. Load completed plan
  2. Create new phase with subtasks
  3. Set new phase to depend on all existing phases
  4. Reset plan status to in_progress
  5. Save updated plan
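The five steps above can be sketched with simplified Plan/Phase models (these are not the actual plan.py classes):

```python
from dataclasses import dataclass, field

@dataclass
class Phase:
    id: str
    subtasks: list
    depends_on: list = field(default_factory=list)

@dataclass
class Plan:
    phases: list
    status: str = "completed"

def add_followup_phase(plan: Plan, phase_id: str, subtasks: list) -> Phase:
    """Append a phase that depends on every existing phase, then reopen the plan."""
    new = Phase(id=phase_id, subtasks=subtasks,
                depends_on=[p.id for p in plan.phases])
    plan.phases.append(new)
    plan.status = "in_progress"  # reset so execution can resume
    return new
```

Because the new phase depends on all existing phases, the dependency chain stays intact across iterations.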

CODITECT Application:

  • Enable iterative feature enhancement
  • Add follow-up tasks after initial project completion
  • Maintain dependency chain across iterations

Files to Reference:

  • auto-claude/implementation_plan/plan.py:259-316 - add_followup_phase()
  • auto-claude/implementation_plan/plan.py:318-372 - reset_for_followup()

8.8 Pattern: Heuristic + AI Hybrid Assessment

What: Use fast heuristics as fallback when AI assessment fails

Implementation:

  1. Attempt AI assessment first
  2. If AI fails or returns None, fall back to heuristics
  3. Both methods return same ComplexityAssessment structure
  4. Continue pipeline regardless of assessment method
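A minimal sketch of the fallback logic, with both assessors passed in as callables (the function names are illustrative, not the complexity.py API):

```python
async def assess_complexity(task, ai_assess, heuristic_assess):
    """Try the AI assessor first; fall back to heuristics on failure or None."""
    try:
        result = await ai_assess(task)
    except Exception:
        result = None  # treat agent errors the same as an empty result
    if result is not None:
        return result
    return heuristic_assess(task)
```

Because both paths return the same structure, downstream phases never need to know which assessor ran.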

CODITECT Application:

  • Implement graceful degradation for agent failures
  • Use heuristics for offline/low-bandwidth scenarios
  • A/B test AI vs heuristic accuracy

Files to Reference:

  • auto-claude/spec/complexity.py:344-436 - AI assessment
  • auto-claude/spec/complexity.py:79-341 - Heuristic analyzer

9. ARCHITECTURAL INSIGHTS

9.1 Separation of Concerns

Orchestrator (coordination) vs PhaseExecutor (execution):

SpecOrchestrator
├── Determines which phases to run
├── Manages phase summaries (compaction)
├── Handles human review checkpoint
└── Coordinates AgentRunner

PhaseExecutor (composed of 4 mixins)
├── DiscoveryPhaseMixin - Discovery, context gathering
├── RequirementsPhaseMixin - Requirements, research, historical context
├── SpecPhaseMixin - Spec writing, self-critique
└── PlanningPhaseMixin - Implementation planning, validation

Value: Clean separation lets orchestration logic and phase implementations evolve independently.

9.2 Mixin-Based Phase Execution

Location: auto-claude/spec/phases/executor.py:19-33

class PhaseExecutor(
    DiscoveryPhaseMixin,
    RequirementsPhaseMixin,
    SpecPhaseMixin,
    PlanningPhaseMixin,
):
    """Executes individual phases of spec creation."""

Pattern: Each mixin implements related phases, all composed into single executor.

Benefits:

  • Phase implementations grouped by concern
  • Easy to add new phase types
  • Shared state via base class
  • Single entry point for orchestrator

CODITECT Application: Similar mixin pattern could organize CODITECT skills by domain (data_engineering, ml_ops, infrastructure).

9.3 File-Based Agent Communication

Pattern: Agents communicate via JSON files in spec directory, not in-memory state.

Inputs → Phase → Outputs:

requirements.json → Complexity Assessment → complexity_assessment.json
requirements.json + project_index.json → Research → research.json
research.json + requirements.json → Spec Writing → spec.md
spec.md + research.json → Self-Critique → critique_report.json + spec.md (updated)
spec.md + context.json → Planning → implementation_plan.json
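A generic phase runner following this pattern might look like the sketch below; the function name and signature are assumptions, not the actual orchestrator API:

```python
import json
from pathlib import Path

def run_phase(spec_dir: Path, inputs: list, output: str, agent) -> dict:
    """Read input artifacts from disk, run the agent, persist the output artifact."""
    context = {name: json.loads((spec_dir / name).read_text())
               for name in inputs if (spec_dir / name).exists()}
    result = agent(context)
    (spec_dir / output).write_text(json.dumps(result, indent=2))
    return result
```

Every intermediate artifact lands on disk, so any phase can be inspected, debugged, or re-run from its saved inputs.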

Benefits:

  • Inspectable intermediate results
  • Easy debugging (read files directly)
  • Resumable pipelines (restart from any phase)
  • Persistence across sessions

CODITECT Application: Adopt file-based agent communication for workflow library to enable inspection and recovery.

9.4 Retry with Backoff

Location: auto-claude/spec/phases/models.py:22-23

# Maximum retry attempts for phase execution
MAX_RETRIES = 3

Each phase can retry up to 3 times. Failures accumulate in PhaseResult.errors list.

Pattern: Transient failures (API timeouts, temporary issues) don't fail the entire pipeline.
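The analyzed file only shows the MAX_RETRIES constant; a retry loop with exponential backoff, as the section title describes, might look like this sketch (the function name and error-list convention are assumptions):

```python
import time

MAX_RETRIES = 3

def run_with_retry(phase_fn, errors: list, base_delay: float = 1.0):
    """Retry a phase up to MAX_RETRIES times, backing off exponentially.
    Failure messages accumulate in the caller-supplied errors list."""
    for attempt in range(MAX_RETRIES):
        try:
            return phase_fn()
        except Exception as exc:
            errors.append(str(exc))
            if attempt < MAX_RETRIES - 1:
                time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, ...
    raise RuntimeError(f"phase failed after {MAX_RETRIES} attempts: {errors}")
```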

CODITECT Application: Add retry logic to network-dependent skills (API calls, database queries).

9.5 Human Review Checkpoint

Location: auto-claude/spec/pipeline/orchestrator.py:613-645

After all automated phases complete, a human review checkpoint runs (skipped only with --auto-approve):

def _run_review_checkpoint(self, auto_approve: bool) -> bool:
    """Run the human review checkpoint."""
    review_state = run_review_checkpoint(
        spec_dir=self.spec_dir,
        auto_approve=auto_approve,
    )

    if not review_state.is_approved():
        print_status("Build will not proceed without approval.", "warning")
        return False

    return True

Pattern: Human-in-the-loop gate before high-cost implementation phase.

CODITECT Application: Add human approval gates before:

  • Database migrations
  • Production deployments
  • High-cost operations (large ML training jobs)

10. PROMPT ENGINEERING INSIGHTS

10.1 Structured Output Requirements

All agent prompts enforce structured output via explicit file creation instructions:

Example: complexity_assessor.md (lines 190-257)

Create `complexity_assessment.json`:

cat > complexity_assessment.json << 'EOF'
{
  "complexity": "[simple|standard|complex]",
  "confidence": [0.0-1.0],
  ...
}
EOF

Pattern: Template + Instructions + Example = Consistent structured output

10.2 Mandatory Investigation Phase

Location: planner.md (lines 21-67)

## PHASE 0: DEEP CODEBASE INVESTIGATION (MANDATORY)

**CRITICAL**: Before ANY planning, you MUST thoroughly investigate the existing
codebase. Poor investigation leads to plans that don't match the codebase's
actual patterns.

**YOU MUST READ AT LEAST 3 PATTERN FILES** before planning

Pattern: Force information gathering before generation to ground outputs in reality.

CODITECT Application: Require codebase analysis before generating new code in workflows.

10.3 Checklist-Driven Critique

Location: critique.py (lines 78-104)

Critique prompt includes explicit checklists:

### STEP 1: Code Quality Checklist

**Pattern Adherence:**
- [ ] Follows patterns from reference files exactly
- [ ] Variable naming matches codebase conventions
- [ ] Imports organized correctly
...

Pattern: Checkboxes create psychological commitment to thorough review.

CODITECT Application: Use checklist prompts for quality gates in critical workflows.

10.4 Extended Thinking for Complex Analysis

Location: spec_critic.md (lines 40-77, 299-318)

## PHASE 1: DEEP ANALYSIS (USE EXTENDED THINKING)

**CRITICAL**: Use extended thinking for this phase.

Pattern: Explicitly request extended thinking for high-stakes analysis phases.

CODITECT Application: Request extended thinking for:

  • Architecture decisions
  • Security reviews
  • Performance optimization analysis

10.5 Context7 Integration for Technical Validation

Location: spec_critic.md (lines 55-77)

Critic prompt uses external documentation lookup:

Tool: mcp__context7__resolve-library-id
Tool: mcp__context7__get-library-docs

Pattern: Combine internal knowledge with external authoritative sources.

CODITECT Application: Integrate Context7 or similar documentation tools for technical skill validation.


11. QUANTITATIVE METRICS

11.1 Phase Count by Complexity

| Complexity | Phase Count | Phases |
|------------|-------------|--------|
| SIMPLE     | 3-4         | discovery, [historical_context], quick_spec, validation |
| STANDARD   | 6-7         | discovery, [historical_context], requirements, [research], context, spec_writing, planning, validation |
| COMPLEX    | 8-9         | discovery, [historical_context], requirements, research, context, spec_writing, self_critique, planning, validation |

Note: historical_context is optional based on Graphiti availability.

11.2 Assessment Criteria Count

Complexity Analyzer:

  • 25 SIMPLE_KEYWORDS
  • 16 COMPLEX_KEYWORDS
  • 11 MULTI_SERVICE_KEYWORDS
  • 11 integration regex patterns
  • 11 infrastructure regex patterns

AI Assessment Dimensions:

  • 5 analysis categories (scope, integrations, infrastructure, knowledge, risk)
  • 5 risk levels (trivial, low, medium, high, critical)

11.3 Critique Checklist Items

Self-Critique Prompt:

  • 4 pattern adherence checks
  • 3 error handling checks
  • 4 code cleanliness checks
  • 4 best practices checks
  • 3 implementation completeness checks

Total: 18 explicit quality checks per subtask

11.4 Verification Types

  • 5 verification types (command, api, browser, manual, none)
  • 6 verification parameters (run, url, method, expect_status, expect_contains, scenario)

12. FILE REFERENCES SUMMARY

Core Pipeline Files

| File | Lines | Purpose |
|------|-------|---------|
| spec/complexity.py | 467 | Complexity assessment (AI + heuristic) |
| spec/critique.py | 370 | Self-critique system |
| spec/discovery.py | 78 | Project structure discovery |
| spec/pipeline/orchestrator.py | 687 | Main pipeline orchestration |
| spec/pipeline/models.py | 264 | Pipeline data structures |
| spec/phases/executor.py | 77 | Phase execution coordinator |
| spec/phases/models.py | 24 | Phase result model |

Implementation Plan Files

| File | Lines | Purpose |
|------|-------|---------|
| implementation_plan/plan.py | 373 | Plan model with dependency resolution |
| implementation_plan/subtask.py | 133 | Subtask tracking and status |
| implementation_plan/verification.py | 54 | Verification strategy model |

Prompt Files

| File | Lines | Purpose |
|------|-------|---------|
| prompts/complexity_assessor.md | 676 | AI complexity assessment |
| prompts/spec_critic.md | 325 | Spec self-critique with ultrathink |
| prompts/planner.md | 300+ | Implementation plan generation |

Runner Files

| File | Lines | Purpose |
|------|-------|---------|
| runners/spec_runner.py | 200+ | CLI entry point for spec creation |

13. RECOMMENDATIONS FOR CODITECT INTEGRATION

13.1 High-Priority Patterns

  1. Adaptive Pipeline Selection (Priority: HIGH)

    • Add complexity assessment to /new-project
    • Map complexity to agent chain selection
    • File: spec/complexity.py + prompts/complexity_assessor.md
  2. Self-Critique Quality Gates (Priority: HIGH)

    • Add critique step to security/infrastructure skills
    • Generate skill-specific checklists
    • File: spec/critique.py
  3. Subtask-Based Execution (Priority: MEDIUM)

    • Implement dependency DAG for complex workflows
    • Enable parallel subtask execution where safe
    • File: implementation_plan/plan.py
  4. Verification Strategies (Priority: MEDIUM)

    • Add verification metadata to workflow subtasks
    • Auto-generate verification commands
    • File: implementation_plan/verification.py

13.2 Medium-Priority Patterns

  1. Phase Output Compaction (Priority: MEDIUM)

    • Summarize long agent outputs
    • Inject summaries into downstream agents
    • File: spec/compaction.py (referenced, not analyzed in detail)
  2. Workflow Type Awareness (Priority: MEDIUM)

    • Add workflow type metadata to workflow library
    • Customize agent sequences per type
    • File: prompts/planner.md (workflow types section)
  3. Human Review Checkpoints (Priority: LOW)

    • Add approval gates before high-risk operations
    • File: spec/pipeline/orchestrator.py:613-645

13.3 Implementation Guidance

For Complexity Assessment:

  1. Create CODITECT skill: /assess-task-complexity
  2. Adapt complexity_assessor.md prompt for CODITECT context
  3. Store assessment in .coditect/task-assessment.json
  4. Use assessment to select agent chain from workflow library

For Self-Critique:

  1. Create critique-skill-execution helper
  2. Generate checklists based on skill requirements
  3. Parse critique responses with regex (adapt from critique.py)
  4. Block skill completion if critique fails

For Subtask Execution:

  1. Extend workflow library format with subtask dependencies
  2. Implement get_next_subtask() logic for workflow execution
  3. Track subtask status in .coditect/workflow-progress.json
  4. Enable resume from last completed subtask
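Step 2's get_next_subtask() could be sketched as a scan for the first pending subtask whose dependencies are all completed; the dict schema here is assumed, not the actual workflow-progress.json format:

```python
from typing import Optional

def get_next_subtask(subtasks: list) -> Optional[dict]:
    """Return the first pending subtask whose dependencies are all completed."""
    done = {s["id"] for s in subtasks if s["status"] == "completed"}
    for s in subtasks:
        if s["status"] == "pending" and set(s.get("depends_on", [])) <= done:
            return s
    return None  # nothing runnable: workflow blocked or finished
```

Calling this after each completion naturally yields resume-from-last-subtask behavior: completed entries are skipped and the first unblocked pending one runs next.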

14. CONCLUSION

The Auto-Claude SPEC PIPELINE demonstrates adaptive orchestration at scale. By assessing task complexity FIRST and then dynamically constructing the appropriate workflow, it achieves both efficiency (3-phase pipeline for simple tasks) and rigor (8-phase pipeline with self-critique for complex integrations).

Key Takeaways for CODITECT:

  1. Assessment Before Execution: Don't assume one workflow fits all—assess first, adapt second.

  2. Self-Critique as Quality Gate: Agents can and should evaluate their own work before marking tasks complete.

  3. Dependency-Aware Execution: Track dependencies explicitly to enable safe parallelism and automatic unblocking.

  4. Verification Per Task: Each unit of work should define how to verify its completion.

  5. File-Based Communication: Intermediate outputs as files enable inspection, debugging, and resumption.

  6. Workflow Type Taxonomy: Different types of work (feature, refactor, investigation) follow different patterns—codify these patterns.

  7. Human-in-the-Loop Gates: Automate aggressively, but gate high-cost or high-risk operations with human approval.

Next Steps:

  1. Pilot complexity assessment for /new-project workflow
  2. Add self-critique to infrastructure and security skills
  3. Extend workflow library with dependency metadata
  4. Implement verification command generation

This analysis was prepared for CODITECT workflow library enhancement.


End of Analysis