Auto-Claude SPEC PIPELINE - Deep Technical Analysis
Analysis Date: December 22, 2025
Target System: Auto-Claude v2.x Spec Creation Pipeline
Repository: /Users/halcasteel/PROJECTS/coditect-rollout-master/submodules/core/coditect-core/submodules/research/auto-claude/
Executive Summary
The Auto-Claude SPEC PIPELINE is a dynamic, multi-phase orchestration system that adaptively creates implementation specifications based on task complexity. It uses AI-driven complexity assessment to select between 3 and 8 phases, employs self-critique with extended thinking ("ultrathink"), and generates subtask-based implementation plans with built-in verification strategies.
Key Innovation: Unlike static workflow systems, Auto-Claude assesses task complexity FIRST, then dynamically constructs the appropriate pipeline—ranging from a 3-phase quick spec for simple changes to an 8-phase rigorous process for complex integrations.
1. COMPLEXITY ASSESSMENT SYSTEM
1.1 Three-Tier Classification
Location: auto-claude/spec/complexity.py:17-23
class Complexity(Enum):
"""Task complexity tiers that determine which phases to run."""
SIMPLE = "simple" # 1-2 files, single service, no integrations
STANDARD = "standard" # 3-10 files, 1-2 services, minimal integrations
COMPLEX = "complex" # 10+ files, multiple services, external integrations
Why This Matters: The enum defines the entire pipeline branching logic. Each tier maps to a specific phase sequence.
1.2 Dual Assessment Strategy
AI-Based Assessment (Primary): auto-claude/spec/complexity.py:344-436
The system runs an AI agent with the complexity_assessor.md prompt that analyzes:
async def run_ai_complexity_assessment(
spec_dir: Path,
task_description: str,
run_agent_fn,
) -> ComplexityAssessment | None:
"""Run AI agent to assess complexity. Returns None if it fails."""
Process:
- Loads `requirements.json` with full user context (lines 365-380)
- Loads `project_index.json` if available (lines 385-391)
- Invokes the `complexity_assessor.md` agent (lines 394-397)
- Parses AI output into a structured `ComplexityAssessment` (lines 399-430)
- Returns the assessment with recommended phases (line 427)
Heuristic Fallback: auto-claude/spec/complexity.py:79-341
If AI assessment fails, falls back to keyword-based heuristics:
class ComplexityAnalyzer:
"""Analyzes task description and context to determine complexity."""
SIMPLE_KEYWORDS = [
"fix", "typo", "update", "change", "rename", ...
]
COMPLEX_KEYWORDS = [
"integrate", "integration", "api", "sdk", "docker",
"kubernetes", "deploy", "authentication", ...
]
Algorithm (lines 156-208):
- Keyword frequency analysis (lines 164-172)
- External integration detection via regex (lines 210-231)
- Infrastructure change detection (lines 233-252)
- File/service estimation (lines 254-293)
- Confidence scoring (lines 295-341)
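The keyword pass above can be sketched in a few lines of scoring logic. This is an illustrative sketch: the keyword lists are truncated copies of the snippet shown earlier, and the thresholds are assumptions, not the actual values from `complexity.py`:

```python
import re

# Illustrative keyword sets; the real lists live in ComplexityAnalyzer
SIMPLE_KEYWORDS = {"fix", "typo", "update", "change", "rename"}
COMPLEX_KEYWORDS = {"integrate", "integration", "api", "sdk", "docker",
                    "kubernetes", "deploy", "authentication"}

def keyword_scores(task: str) -> tuple[int, int]:
    """Count whole-word keyword hits in the task description."""
    words = re.findall(r"[a-z]+", task.lower())
    return (sum(w in SIMPLE_KEYWORDS for w in words),
            sum(w in COMPLEX_KEYWORDS for w in words))

def classify(task: str) -> str:
    """Map keyword counts to a tier (thresholds are assumed)."""
    simple, complex_ = keyword_scores(task)
    if complex_ >= 2:
        return "complex"
    if complex_ == 0 and simple >= 1:
        return "simple"
    return "standard"
```

The real analyzer layers integration detection, infrastructure detection, and file estimation on top of this kind of scoring before computing confidence.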
1.3 Assessment Output Structure
File: complexity_assessment.json saved at spec_dir/complexity_assessment.json
{
"complexity": "simple|standard|complex",
"confidence": 0.75, # 0.0 to 1.0
"reasoning": "2-3 sentence explanation",
"signals": {
"simple_keywords": 3,
"complex_keywords": 0,
"estimated_files": 2,
"estimated_services": 1,
...
},
"recommended_phases": ["discovery", "quick_spec", "validation"],
"flags": {
"needs_research": false,
"needs_self_critique": false
}
}
Value Proposition: This structured output becomes the decision input for phase selection (lines 47-76).
1.4 Dynamic Phase Selection
Location: auto-claude/spec/complexity.py:47-76
def phases_to_run(self) -> list[str]:
"""Return list of phase names to run based on complexity."""
# If AI provided recommended phases, use those
if self.recommended_phases:
return self.recommended_phases
# Otherwise fall back to default phase sets
if self.complexity == Complexity.SIMPLE:
return ["discovery", "historical_context", "quick_spec", "validation"]
elif self.complexity == Complexity.STANDARD:
# Standard can optionally include research if flagged
phases = ["discovery", "historical_context", "requirements"]
if self.needs_research:
phases.append("research")
phases.extend(["context", "spec_writing", "planning", "validation"])
return phases
else: # COMPLEX
return [
"discovery", "historical_context", "requirements", "research",
"context", "spec_writing", "self_critique", "planning", "validation",
]
Key Insight: The AI can override default phase sets by providing custom recommended_phases. This allows context-aware adaptation beyond the three-tier system.
CODITECT Enhancement Opportunity: This dynamic phase selection pattern could be adapted for workflow library tasks—assessing task complexity before selecting the appropriate skill/agent chain.
2. SPEC CREATION PIPELINE ORCHESTRATION
2.1 Orchestrator Architecture
Location: auto-claude/spec/pipeline/orchestrator.py:49-646
The SpecOrchestrator class coordinates the entire pipeline:
class SpecOrchestrator:
"""Orchestrates the spec creation process with dynamic complexity adaptation."""
def __init__(
self,
project_dir: Path,
task_description: str | None = None,
model: str = "claude-sonnet-4-5-20250929",
thinking_level: str = "medium",
complexity_override: str | None = None,
use_ai_assessment: bool = True,
):
Core Responsibilities:
- Spec directory management (lines 87-104)
- Agent runner coordination (lines 114-125)
- Phase execution orchestration (lines 218-409)
- Phase output compaction (lines 161-186)
- Human review checkpoint (lines 408-409, 613-645)
2.2 Phase Execution Loop
Location: auto-claude/spec/pipeline/orchestrator.py:259-399
async def run(self, interactive: bool = True, auto_approve: bool = False) -> bool:
"""Run the spec creation process with dynamic phase selection."""
# Fixed phases (always run first)
# 1. Discovery
result = await run_phase("discovery", phase_executor.phase_discovery)
await self._store_phase_summary("discovery")
# 2. Requirements
result = await run_phase("requirements",
lambda: phase_executor.phase_requirements(interactive))
await self._store_phase_summary("requirements")
# 3. Complexity Assessment
result = await run_phase("complexity_assessment",
lambda: self._phase_complexity_assessment_with_requirements())
# Dynamic phases based on complexity
all_phases_to_run = self.assessment.phases_to_run()
phases_to_run = [p for p in all_phases_to_run
if p not in ["discovery", "requirements"]]
for phase_name in phases_to_run:
result = await run_phase(phase_name, all_phases[phase_name])
if result.success:
await self._store_phase_summary(phase_name)
else:
return False # Stop on failure
# Human review checkpoint
return self._run_review_checkpoint(auto_approve)
Pattern: Fixed setup → Dynamic adaptation → Human gate
2.3 Phase Result Model
Location: auto-claude/spec/phases/models.py:11-19
@dataclass
class PhaseResult:
"""Result of a phase execution."""
phase: str
success: bool
output_files: list[str]
errors: list[str]
retries: int
Retry Strategy: Each phase can retry up to MAX_RETRIES = 3 times (line 23).
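A phase-level retry wrapper in this spirit might look like the sketch below. The dictionary keys mirror the `PhaseResult` fields; the control flow itself is an assumption, not the orchestrator's actual code:

```python
MAX_RETRIES = 3  # matches the constant cited above

async def run_phase_with_retries(phase_name: str, phase_fn) -> dict:
    """Run a phase, retrying on exceptions, and report PhaseResult-style fields."""
    errors: list[str] = []
    for attempt in range(MAX_RETRIES):
        try:
            output_files = await phase_fn()
            return {"phase": phase_name, "success": True,
                    "output_files": output_files,
                    "errors": errors, "retries": attempt}
        except Exception as exc:  # record the failed attempt, then retry
            errors.append(str(exc))
    return {"phase": phase_name, "success": False,
            "output_files": [], "errors": errors, "retries": MAX_RETRIES}
```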
2.4 Phase Compaction Strategy
Location: auto-claude/spec/pipeline/orchestrator.py:161-186
async def _store_phase_summary(self, phase_name: str) -> None:
"""Summarize and store phase output for subsequent phases."""
try:
# Gather outputs from this phase
phase_output = gather_phase_outputs(self.spec_dir, phase_name)
if not phase_output:
return
# Summarize the output (target: 500 words)
summary = await summarize_phase_output(
phase_name,
phase_output,
model="claude-sonnet-4-5-20250929",
target_words=500,
)
if summary:
self._phase_summaries[phase_name] = summary
except Exception as e:
print_status(f"Phase summarization skipped: {e}", "warning")
Why This Matters: As phases accumulate, context size grows. Summarization prevents token overflow while preserving critical information for subsequent phases.
Usage: Summaries are injected into subsequent agent prompts (lines 151-158):
# Format prior phase summaries for context
prior_summaries = format_phase_summaries(self._phase_summaries)
return await runner.run_agent(
prompt_file,
additional_context,
interactive,
thinking_budget=thinking_budget,
prior_phase_summaries=prior_summaries if prior_summaries else None,
)
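The `format_phase_summaries` helper belongs to auto-claude; a minimal sketch of what such formatting could look like, with an assumed layout:

```python
def format_phase_summaries(summaries: dict[str, str]) -> str:
    """Join per-phase summaries into one context block for the next prompt."""
    return "\n\n".join(
        f"## Summary of {phase} phase\n{text}"
        for phase, text in summaries.items()
    )
```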
3. SELF-CRITIQUE WITH ULTRATHINK
3.1 Critique System Architecture
Location: auto-claude/spec/critique.py:1-370
The self-critique system implements a mandatory quality gate before marking subtasks complete:
@dataclass
class CritiqueResult:
"""Result of a self-critique evaluation."""
passes: bool
issues: list[str] = field(default_factory=list)
improvements_made: list[str] = field(default_factory=list)
recommendations: list[str] = field(default_factory=list)
3.2 Critique Prompt Generation
Location: auto-claude/spec/critique.py:50-164
def generate_critique_prompt(
subtask: dict,
files_modified: list[str],
patterns_from: list[str]
) -> str:
"""Generate a critique prompt for the agent to self-evaluate."""
Generates a structured checklist covering:
- **Code Quality Checklist** (lines 78-104)
  - Pattern adherence
  - Error handling
  - Code cleanliness
  - Best practices
- **Implementation Completeness** (lines 106-122)
  - All expected files modified
  - All expected files created
  - Requirements fully met
- **Potential Issues Analysis** (lines 124-132)
  - Agent must list concerns honestly
- **Improvements Made** (lines 134-140)
  - Agent must fix issues before continuing
- **Final Verdict** (lines 143-148)
  - PROCEED: YES/NO
  - REASON: Brief explanation
  - CONFIDENCE: High/Medium/Low
3.3 Critique Response Parser
Location: auto-claude/spec/critique.py:167-261
def parse_critique_response(response: str) -> CritiqueResult:
"""Parse the agent's critique response into structured data."""
# Extract PROCEED verdict
proceed_match = re.search(
r"\*\*PROCEED:\*\*\s*\[?\s*(YES|NO)", response, re.IGNORECASE
)
if proceed_match:
passes = proceed_match.group(1).upper() == "YES"
# Extract issues from Step 3
issues_section = re.search(
r"### STEP 3:.*?Potential Issues.*?\n\n(.*?)(?=###|\Z)",
response, re.DOTALL | re.IGNORECASE,
)
# ... parse and filter issues ...
# Extract improvements from Step 4
improvements_section = re.search(
r"### STEP 4:.*?Improvements Made.*?\n\n(.*?)(?=###|\Z)",
response, re.DOTALL | re.IGNORECASE,
)
# ... parse and filter improvements ...
Value: Structured parsing ensures critique results are machine-readable for decision logic.
3.4 Critique Decision Logic
Location: auto-claude/spec/critique.py:264-283
def should_proceed(result: CritiqueResult) -> bool:
"""Determine if the subtask should be marked complete based on critique."""
# Must pass the critique
if not result.passes:
return False
# If there are unresolved issues, don't proceed
if result.issues:
return False
return True
Rule: Subtask can ONLY proceed if:
- Agent explicitly said `PROCEED: YES`
- No unresolved issues remain
3.5 Spec-Level Self-Critique with Ultrathink
Prompt: auto-claude/prompts/spec_critic.md
Phase: Self-critique phase (runs only for COMPLEX tasks)
Key Instructions (lines 40-122):
## PHASE 1: DEEP ANALYSIS (USE EXTENDED THINKING)
**CRITICAL**: Use extended thinking for this phase. Think deeply about:
### 1.1: Technical Accuracy
Compare spec.md against research.json AND validate with Context7:
- **Package names**: Does spec use correct package names from research?
- **Import statements**: Do imports match researched API patterns?
- **API calls**: Do function signatures match documentation?
- **Configuration**: Are env vars and config options correct?
**USE CONTEXT7 TO VALIDATE TECHNICAL CLAIMS:**
If the spec mentions specific libraries or APIs, verify them against Context7:
Tool: mcp__context7__resolve-library-id
Tool: mcp__context7__get-library-docs
Process (lines 123-253):
- Load all context (spec.md, research.json, requirements.json, context.json)
- Deep analysis with extended thinking on 5 dimensions:
  - Technical accuracy (package names, APIs, config)
  - Completeness (requirements coverage, edge cases)
  - Consistency (naming, paths, patterns)
  - Feasibility (dependencies, infrastructure, order)
  - Research alignment (verified info, gotchas, recommendations)
- Catalog issues with severity (HIGH/MEDIUM/LOW) and categories
- Fix issues directly in spec.md using edit commands
- Create a critique report (`critique_report.json`)
- Verify fixes (validate markdown, check sections)
Extended Thinking Example (lines 300-318):
> "Looking at this spec.md, I need to deeply analyze it against the research findings...
>
> First, let me check all package names. The research says the package is [X],
> but the spec says [Y]. This is a mismatch that needs fixing.
>
> Let me also verify with Context7 - I'll look up the actual package name and
> API patterns to confirm...
> [Use mcp__context7__resolve-library-id to find the library]
> [Use mcp__context7__get-library-docs to check API patterns]
>
> Next, looking at the API patterns. The research shows initialization requires
> [steps], but the spec shows [different steps]. Let me cross-reference with
> Context7 documentation... Another issue confirmed.
> ...
CODITECT Enhancement Opportunity: This ultrathink self-critique pattern could be integrated as a quality gate before executing high-risk workflow steps.
4. SUBTASK-BASED IMPLEMENTATION PLANNING
4.1 Implementation Plan Model
Location: auto-claude/implementation_plan/plan.py:20-373
@dataclass
class ImplementationPlan:
"""Complete implementation plan for a feature/task."""
feature: str
workflow_type: WorkflowType = WorkflowType.FEATURE
services_involved: list[str] = field(default_factory=list)
phases: list[Phase] = field(default_factory=list)
final_acceptance: list[str] = field(default_factory=list)
# Status tracking (synced with UI)
status: str | None = None # backlog, in_progress, ai_review, human_review, done
planStatus: str | None = None # pending, in_progress, review, completed
qa_signoff: dict | None = None
Key Methods:
- Dependency Resolution (lines 178-189):
def get_available_phases(self) -> list[Phase]:
"""Get phases whose dependencies are satisfied."""
completed_phases = {p.phase for p in self.phases if p.is_complete()}
available = []
for phase in self.phases:
if phase.is_complete():
continue
deps_met = all(d in completed_phases for d in phase.depends_on)
if deps_met:
available.append(phase)
return available
- Next Subtask Selection (lines 192-198):
def get_next_subtask(self) -> tuple[Phase, Subtask] | None:
"""Get the next subtask to work on, respecting dependencies."""
for phase in self.get_available_phases():
pending = phase.get_pending_subtasks()
if pending:
return phase, pending[0]
return None
- Progress Tracking (lines 200-228):
def get_progress(self) -> dict:
"""Get overall progress statistics."""
total_subtasks = sum(len(p.subtasks) for p in self.phases)
done_subtasks = sum(
1 for p in self.phases for s in p.subtasks
if s.status == SubtaskStatus.COMPLETED
)
# ... calculate percentages, failed counts, etc.
- Follow-Up Phase Addition (lines 259-316):
def add_followup_phase(
self,
name: str,
subtasks: list[Subtask],
phase_type: PhaseType = PhaseType.IMPLEMENTATION,
parallel_safe: bool = False,
) -> Phase:
"""Add a new follow-up phase to an existing (typically completed) plan.
This allows users to extend completed builds with additional work.
The new phase depends on all existing phases to ensure proper sequencing.
"""
Value: Enables iterative enhancement of completed features without rebuilding plans from scratch.
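Taken together, `get_available_phases()` and `get_next_subtask()` support a simple execution driver. The loop below is a hypothetical sketch against objects with the same method names, not auto-claude's actual runner:

```python
async def execute_plan(plan, run_subtask) -> bool:
    """Drive subtasks in dependency order until none remain or one fails."""
    session_id = 1  # illustrative; real sessions are numbered by the runner
    while (nxt := plan.get_next_subtask()) is not None:
        phase, subtask = nxt
        subtask.start(session_id)
        try:
            output = await run_subtask(phase, subtask)
        except Exception as exc:
            subtask.fail(str(exc))
            return False
        subtask.complete(output)
    return True
```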
4.2 Subtask Model
Location: auto-claude/implementation_plan/subtask.py:17-133
@dataclass
class Subtask:
"""A single unit of implementation work."""
id: str
description: str
status: SubtaskStatus = SubtaskStatus.PENDING
# Scoping
service: str | None = None # Which service (backend, frontend, worker)
all_services: bool = False # True for integration subtasks
# Files
files_to_modify: list[str] = field(default_factory=list)
files_to_create: list[str] = field(default_factory=list)
patterns_from: list[str] = field(default_factory=list)
# Verification
verification: Verification | None = None
# For investigation subtasks
expected_output: str | None = None # Knowledge/decision output
actual_output: str | None = None # What was discovered
# Self-Critique
critique_result: dict | None = None # Results from self-critique before completion
Key Methods:
- Start Tracking (lines 107-114):
def start(self, session_id: int):
"""Mark subtask as in progress."""
self.status = SubtaskStatus.IN_PROGRESS
self.started_at = datetime.now().isoformat()
self.session_id = session_id
# Clear stale data from previous runs to ensure clean state
self.completed_at = None
self.actual_output = None
- Completion (lines 116-121):
def complete(self, output: str | None = None):
"""Mark subtask as done."""
self.status = SubtaskStatus.COMPLETED
self.completed_at = datetime.now().isoformat()
if output:
self.actual_output = output
- Failure (lines 123-128):
def fail(self, reason: str | None = None):
"""Mark subtask as failed."""
self.status = SubtaskStatus.FAILED
self.completed_at = None # Clear to maintain consistency (failed != completed)
if reason:
self.actual_output = f"FAILED: {reason}"
4.3 Verification Strategy Model
Location: auto-claude/implementation_plan/verification.py:14-53
@dataclass
class Verification:
"""How to verify a subtask is complete."""
type: VerificationType
run: str | None = None # Command to run
url: str | None = None # URL for API/browser tests
method: str | None = None # HTTP method for API tests
expect_status: int | None = None # Expected HTTP status
expect_contains: str | None = None # Expected content
scenario: str | None = None # Description for browser/manual tests
Verification Types:
class VerificationType(Enum):
"""Types of verification methods."""
COMMAND = "command" # Run shell command, check exit code
API = "api" # HTTP request, check status/content
BROWSER = "browser" # Manual browser test with scenario
MANUAL = "manual" # Human verification with instructions
NONE = "none" # No automated verification
Why This Matters: Each subtask has a built-in verification strategy, enabling automated QA checkpoints throughout implementation.
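A dispatcher over these verification types might look like the sketch below. The field names follow the `Verification` dataclass; the dispatch logic itself is an assumption, not auto-claude's verifier:

```python
import subprocess
import urllib.request

def verify(v: dict) -> bool:
    """Return True when the subtask's verification passes."""
    vtype = v.get("type", "none")
    if vtype == "command":
        # Shell out and treat exit code 0 as success
        proc = subprocess.run(v["run"], shell=True, capture_output=True)
        return proc.returncode == 0
    if vtype == "api":
        req = urllib.request.Request(v["url"], method=v.get("method", "GET"))
        with urllib.request.urlopen(req) as resp:
            body = resp.read().decode()
            if v.get("expect_status") is not None and resp.status != v["expect_status"]:
                return False
            return v.get("expect_contains", "") in body
    # browser/manual checks need a human in the loop; "none" passes trivially
    return vtype == "none"
```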
5. PLANNER AGENT PROMPT ARCHITECTURE
5.1 Phase 0: Deep Codebase Investigation (MANDATORY)
Location: auto-claude/prompts/planner.md:21-67
## PHASE 0: DEEP CODEBASE INVESTIGATION (MANDATORY)
**CRITICAL**: Before ANY planning, you MUST thoroughly investigate the existing
codebase. Poor investigation leads to plans that don't match the codebase's
actual patterns.
### 0.1: Understand Project Structure
find . -type f -name "*.py" -o -name "*.ts" -o -name "*.tsx" -o -name "*.js" | head -100
Identify:
- Main entry points (main.py, app.py, index.ts, etc.)
- Configuration files (settings.py, config.py, .env.example)
- Directory organization patterns
### 0.2: Analyze Existing Patterns for the Feature
**This is the most important step.** For whatever feature you're building,
find SIMILAR existing features:
# Example: If building "caching", search for existing cache implementations
grep -r "cache" --include="*.py" . | head -30
grep -r "redis\|memcache\|lru_cache" --include="*.py" . | head -30
**YOU MUST READ AT LEAST 3 PATTERN FILES** before planning
Pattern: Investigation → Pattern discovery → Plan creation
Why This Matters: Forces the agent to ground its plan in actual codebase patterns, preventing "generic" plans that don't match project conventions.
5.2 Context File Creation
Location: auto-claude/prompts/planner.md:89-157
The planner creates two critical context files:
- project_index.json (lines 93-119):
{
"project_type": "single|monorepo",
"services": {
"backend": {
"path": ".",
"tech_stack": ["python", "fastapi"],
"port": 8000,
"dev_command": "uvicorn main:app --reload",
"test_command": "pytest"
}
},
"infrastructure": {
"docker": false,
"database": "postgresql"
}
}
- context.json (lines 130-157):
{
"files_to_modify": {
"backend": ["app/services/existing_service.py", "app/routes/api.py"]
},
"files_to_reference": ["app/services/similar_service.py"],
"patterns": {
"service_pattern": "All services inherit from BaseService",
"route_pattern": "Routes use APIRouter with prefix and tags"
},
"existing_implementations": {
"description": "Found existing caching in app/utils/cache.py using Redis",
"relevant_files": ["app/utils/cache.py", "app/config.py"]
}
}
Value: These files become inputs for subsequent phases, ensuring consistency across the pipeline.
5.3 Workflow Type Awareness
Location: auto-claude/prompts/planner.md:161-200
The planner prompt defines 5 workflow types with distinct phase structures:
1. **FEATURE** (multi-service):
   - Phase 1: Backend/API (testable with curl)
   - Phase 2: Worker (background jobs)
   - Phase 3: Frontend (UI components)
   - Phase 4: Integration (wire everything)
2. **REFACTOR** (stage-based):
   - Phase 1: Add New (build alongside old)
   - Phase 2: Migrate (move consumers)
   - Phase 3: Remove Old (delete deprecated)
   - Phase 4: Cleanup (polish)
3. **INVESTIGATION** (bug hunting):
   - Phase 1: Reproduce (logging, reproduction)
   - Phase 2: Investigate (root cause → OUTPUT REQUIRED)
   - Phase 3: Fix (BLOCKED until phase 2)
   - Phase 4: Harden (tests, prevention)
4. **MIGRATION** (data pipeline):
   - Phase 1: Prepare (scripts, setup)
   - Phase 2: Test (small batch)
   - Phase 3: Execute (full migration)
   - Phase 4: Cleanup (verify, remove old)
5. **SIMPLE** (single-service quick):
   - No phases, just subtasks
Pattern: Workflow type determines phase structure and dependency graph.
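The type-to-phase mapping can be sketched as a lookup table. The short phase names here are shorthand for the structures listed above, not the actual phase IDs in generated plans:

```python
from enum import Enum

class WorkflowType(Enum):
    FEATURE = "feature"
    REFACTOR = "refactor"
    INVESTIGATION = "investigation"
    MIGRATION = "migration"
    SIMPLE = "simple"

# Shorthand templates; real plans carry full ids like "phase-1-backend"
PHASE_TEMPLATES: dict[WorkflowType, list[str]] = {
    WorkflowType.FEATURE: ["backend", "worker", "frontend", "integration"],
    WorkflowType.REFACTOR: ["add-new", "migrate", "remove-old", "cleanup"],
    WorkflowType.INVESTIGATION: ["reproduce", "investigate", "fix", "harden"],
    WorkflowType.MIGRATION: ["prepare", "test", "execute", "cleanup"],
    WorkflowType.SIMPLE: [],  # no phases, just subtasks
}

def phase_template(wt: WorkflowType) -> list[str]:
    """Return the default phase sequence for a workflow type."""
    return PHASE_TEMPLATES[wt]
```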
5.4 Implementation Plan Structure
Location: auto-claude/prompts/planner.md:215-264
{
"feature": "Short descriptive name",
"workflow_type": "feature|refactor|investigation|migration|simple",
"workflow_rationale": "Why this workflow type was chosen",
"phases": [
{
"id": "phase-1-backend",
"name": "Backend API",
"type": "implementation",
"depends_on": [],
"parallel_safe": true,
"subtasks": [
{
"id": "subtask-1-1",
"description": "Create data models for [feature]",
"service": "backend",
"files_to_modify": ["src/models/user.py"],
"files_to_create": ["src/models/analytics.py"],
"patterns_from": ["src/models/existing_model.py"],
"verification": {
"type": "command",
"command": "python -c \"from src.models.analytics import Analytics\"",
"expected": "OK"
},
"status": "pending"
}
]
}
]
}
Critical Requirements (lines 205-214):
- MUST use the Write tool to create `implementation_plan.json`
- Plan structure matches the `ImplementationPlan` dataclass
- Each subtask includes a verification strategy
- Phase dependencies are explicitly defined
6. COMPLEXITY ASSESSOR PROMPT ARCHITECTURE
6.1 Assessment Criteria
Location: auto-claude/prompts/complexity_assessor.md:103-132
The prompt defines 5 analysis dimensions:
1. **Scope Analysis** (lines 107-110):
   - How many files will likely be touched?
   - How many services are involved?
   - Is this localized or cross-cutting?
2. **Integration Analysis** (lines 112-116):
   - External services/APIs involved?
   - New dependencies needed?
   - Research required for correct usage?
3. **Infrastructure Analysis** (lines 118-122):
   - Docker/container changes?
   - Database schema changes?
   - Environment configuration changes?
   - Deployment considerations?
4. **Knowledge Analysis** (lines 124-127):
   - Codebase has patterns for this?
   - External docs research needed?
   - Unfamiliar technologies involved?
5. **Risk Analysis** (lines 129-132):
   - What could go wrong?
   - Security considerations?
   - Breaking existing functionality?
6.2 Decision Flowchart
Location: auto-claude/prompts/complexity_assessor.md:394-418
START
│
├─► Are there 2+ external integrations OR unfamiliar technologies?
│ YES → COMPLEX (needs research + critique)
│ NO ↓
│
├─► Are there infrastructure changes (Docker, DB, new services)?
│ YES → COMPLEX (needs research + critique)
│ NO ↓
│
├─► Is there 1 external integration that needs research?
│ YES → STANDARD + research phase
│ NO ↓
│
├─► Will this touch 3+ files across 1-2 services?
│ YES → STANDARD
│ NO ↓
│
└─► SIMPLE (1-2 files, single service, no integrations)
Pattern: Progressive elimination from high to low complexity.
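The flowchart translates directly into cascading conditionals. The signal names below are illustrative, not the assessor's actual schema:

```python
def assess_from_signals(signals: dict) -> str:
    """Progressive elimination from high to low complexity (signal names assumed)."""
    if signals.get("external_integrations", 0) >= 2 or signals.get("unfamiliar_tech", False):
        return "complex"   # needs research + critique
    if signals.get("infrastructure_changes", False):
        return "complex"   # needs research + critique
    if signals.get("external_integrations", 0) == 1:
        return "standard"  # plus a research phase
    if signals.get("estimated_files", 0) >= 3:
        return "standard"
    return "simple"        # 1-2 files, single service, no integrations
```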
6.3 Validation Recommendations
Location: auto-claude/prompts/complexity_assessor.md:261-391
The assessor also generates validation depth recommendations:
"validation_recommendations": {
"risk_level": "trivial|low|medium|high|critical",
"skip_validation": false,
"minimal_mode": false,
"test_types_required": ["unit", "integration", "e2e"],
"security_scan_required": true,
"staging_deployment_required": false,
"reasoning": "1-2 sentences explaining validation depth choice"
}
Risk Levels:
| Risk | When | Validation Depth |
|---|---|---|
| TRIVIAL | Docs-only, comments | Skip validation |
| LOW | Single service, <5 files | Unit tests only |
| MEDIUM | Multiple files, API changes | Unit + Integration |
| HIGH | DB changes, auth/security | Unit + Integration + E2E + Security scan |
| CRITICAL | Payments, data deletion | All above + Manual + Staging |
Value: Tailors QA rigor to actual risk, avoiding over-testing simple changes and under-testing critical ones.
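The table maps naturally onto a small lookup, sketched here with rules read off the rows above (the exact encoding is an assumption):

```python
# Assumed encoding of the risk table above
TEST_MATRIX = {
    "trivial":  [],
    "low":      ["unit"],
    "medium":   ["unit", "integration"],
    "high":     ["unit", "integration", "e2e"],
    "critical": ["unit", "integration", "e2e", "manual"],
}

def validation_plan(risk_level: str) -> dict:
    """Expand a risk level into validation_recommendations-style fields."""
    return {
        "risk_level": risk_level,
        "skip_validation": risk_level == "trivial",
        "test_types_required": TEST_MATRIX[risk_level],
        "security_scan_required": risk_level in ("high", "critical"),
        "staging_deployment_required": risk_level == "critical",
    }
```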
6.4 Example Assessments
Simple Task (lines 423-467):
{
"complexity": "simple",
"workflow_type": "simple",
"reasoning": "Single file UI change with no dependencies or infrastructure impact.",
"recommended_phases": ["discovery", "quick_spec", "validation"],
"validation_recommendations": {
"risk_level": "low",
"minimal_mode": true,
"test_types_required": ["unit"]
}
}
Complex Integration (lines 589-647):
{
"complexity": "complex",
"workflow_type": "feature",
"reasoning": "Multiple integrations (Graphiti, FalkorDB), infrastructure changes
(Docker Compose), and new architectural pattern (optional memory layer)",
"recommended_phases": [
"discovery", "requirements", "research", "context",
"spec_writing", "self_critique", "planning", "validation"
],
"flags": {
"needs_research": true,
"needs_self_critique": true,
"needs_infrastructure_setup": true
},
"validation_recommendations": {
"risk_level": "high",
"test_types_required": ["unit", "integration", "e2e"],
"security_scan_required": true,
"staging_deployment_required": true
}
}
7. SPEC CRITIC PROMPT ARCHITECTURE
7.1 Critique Categories
Location: auto-claude/prompts/spec_critic.md:288-294
- **Accuracy**: Technical correctness (packages, APIs, config)
- **Completeness**: Coverage of requirements and edge cases
- **Consistency**: Internal coherence of the document
- **Feasibility**: Practical implementability
- **Alignment**: Match with research findings
7.2 Severity Guidelines
Location: auto-claude/prompts/spec_critic.md:266-284
HIGH - Will cause implementation failure:
- Wrong package names
- Incorrect API signatures
- Missing critical requirements
- Invalid configuration
MEDIUM - May cause issues:
- Missing edge cases
- Incomplete error handling
- Unclear integration points
- Inconsistent patterns
LOW - Minor improvements:
- Terminology inconsistencies
- Documentation gaps
- Style issues
- Minor optimizations
7.3 Context7 Integration for Validation
Location: auto-claude/prompts/spec_critic.md:55-77
**USE CONTEXT7 TO VALIDATE TECHNICAL CLAIMS:**
If the spec mentions specific libraries or APIs, verify them against Context7:
# Step 1: Resolve library ID
Tool: mcp__context7__resolve-library-id
Input: { "libraryName": "[library from spec]" }
# Step 2: Verify API patterns mentioned in spec
Tool: mcp__context7__get-library-docs
Input: {
"context7CompatibleLibraryID": "[library-id]",
"topic": "[specific API or feature mentioned in spec]",
"mode": "code"
}
**Check for common spec errors:**
- Wrong package name (e.g., "react-query" vs "@tanstack/react-query")
- Outdated API patterns (e.g., using deprecated functions)
- Incorrect function signatures (e.g., wrong parameter order)
- Missing required configuration (e.g., missing env vars)
Pattern: External documentation lookup for technical validation beyond what's in the research phase.
8. KEY PATTERNS FOR CODITECT INTEGRATION
8.1 Pattern: Adaptive Pipeline Selection
What: System assesses task characteristics BEFORE selecting workflow
Implementation:
- Analyze task with AI agent (`complexity_assessor.md`)
- Generate structured assessment (`complexity_assessment.json`)
- Map assessment to phase list (`phases_to_run()`)
- Execute only the necessary phases
CODITECT Application:
- Add complexity assessment to the `/new-project` workflow
- Simple projects → streamlined agent chain
- Complex projects → full multi-agent orchestration
- Save assessment for future reference
Files to Reference:
- `auto-claude/spec/complexity.py` - Assessment logic
- `auto-claude/prompts/complexity_assessor.md` - AI assessment prompt
8.2 Pattern: Self-Critique Quality Gates
What: Agent must self-review before marking work complete
Implementation:
- Generate critique prompt with checklist (`generate_critique_prompt()`)
- Agent evaluates own work against criteria
- Parse response for issues (`parse_critique_response()`)
- Block completion if issues exist (`should_proceed()`)
CODITECT Application:
- Add critique step to high-stakes skills (infrastructure, security)
- Generate checklists based on skill requirements
- Parse and store critique results for audit trail
Files to Reference:
- `auto-claude/spec/critique.py` - Critique system
- `auto-claude/prompts/spec_critic.md` - Ultrathink critique prompt
8.3 Pattern: Dependency-Aware Subtask Execution
What: Automatically determine which tasks can run based on completion state
Implementation:
- Define phase dependencies (`depends_on` field)
- Track completion state (`status` field)
- Query available phases (`get_available_phases()`)
- Select next subtask (`get_next_subtask()`)
CODITECT Application:
- Implement dependency DAG for complex workflows
- Track workflow progress with subtask granularity
- Enable parallel execution where safe
- Auto-resume workflows from last completed step
Files to Reference:
- `auto-claude/implementation_plan/plan.py` - Dependency resolution
- `auto-claude/implementation_plan/subtask.py` - Status tracking
8.4 Pattern: Verification Strategy Per Task
What: Each unit of work defines how to verify its completion
Implementation:
- Define verification type (command, API, browser, manual)
- Specify verification parameters (URL, expected output, etc.)
- Execute verification after task completion
- Block progress if verification fails
CODITECT Application:
- Add verification fields to workflow subtasks
- Auto-generate verification commands based on task type
- Create verification report for audit/compliance
Files to Reference:
- `auto-claude/implementation_plan/verification.py` - Verification model
- `auto-claude/prompts/planner.md:215-264` - Verification in plans
8.5 Pattern: Phase Output Compaction
What: Summarize completed phases to prevent context overflow
Implementation:
- Gather outputs from the completed phase (`gather_phase_outputs()`)
- Summarize with AI to a target word count (500 words)
- Store summary in a phase dictionary (`_phase_summaries`)
- Inject summaries into subsequent agent prompts
CODITECT Application:
- Summarize long-running agent outputs
- Provide compact context for downstream agents
- Maintain critical information while reducing token usage
Files to Reference:
- `auto-claude/spec/pipeline/orchestrator.py:161-186` - Compaction logic
- `auto-claude/spec/compaction.py` - Summarization functions
8.6 Pattern: Workflow Type Awareness
What: Different types of work (feature, refactor, investigation) follow different patterns
Implementation:
- Define workflow types (FEATURE, REFACTOR, INVESTIGATION, MIGRATION, SIMPLE)
- Map each type to appropriate phase structure
- Use workflow type to determine phase dependencies
- Tailor subtask templates to workflow
CODITECT Application:
- Extend workflow library with workflow type metadata
- Auto-select appropriate agent sequence based on type
- Customize verification strategies per workflow type
Files to Reference:
- `auto-claude/prompts/planner.md:161-200` - Workflow type definitions
- `auto-claude/implementation_plan/plan.py:25` - WorkflowType enum
8.7 Pattern: Follow-Up Phase Addition
What: Extend completed plans with additional work without rebuilding
Implementation:
- Load completed plan
- Create new phase with subtasks
- Set new phase to depend on all existing phases
- Reset plan status to `in_progress`
- Save updated plan
CODITECT Application:
- Enable iterative feature enhancement
- Add follow-up tasks after initial project completion
- Maintain dependency chain across iterations
Files to Reference:
- `auto-claude/implementation_plan/plan.py:259-316` - `add_followup_phase()`
- `auto-claude/implementation_plan/plan.py:318-372` - `reset_for_followup()`
8.8 Pattern: Heuristic + AI Hybrid Assessment
What: Use fast heuristics as fallback when AI assessment fails
Implementation:
- Attempt AI assessment first
- If AI fails or returns None, fall back to heuristics
- Both methods return the same `ComplexityAssessment` structure
- Continue pipeline regardless of assessment method
CODITECT Application:
- Implement graceful degradation for agent failures
- Use heuristics for offline/low-bandwidth scenarios
- A/B test AI vs heuristic accuracy
Files to Reference:
- `auto-claude/spec/complexity.py:344-436` - AI assessment
- `auto-claude/spec/complexity.py:79-341` - Heuristic analyzer
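The fallback logic can be sketched like this. The keyword lists and the `method` field are illustrative assumptions; the real analyzer uses the much larger keyword and regex sets described in section 11.2.

```python
from dataclasses import dataclass

@dataclass
class ComplexityAssessment:
    complexity: str  # "simple" | "standard" | "complex"
    method: str      # which path produced the assessment

def heuristic_assessment(task: str) -> ComplexityAssessment:
    """Fast keyword fallback; always returns a result."""
    text = task.lower()
    if any(k in text for k in ("typo", "rename", "tweak")):
        return ComplexityAssessment("simple", "heuristic")
    if any(k in text for k in ("integration", "migration", "multi-service")):
        return ComplexityAssessment("complex", "heuristic")
    return ComplexityAssessment("standard", "heuristic")

def assess(task: str, ai_assessor) -> ComplexityAssessment:
    """Try AI first; degrade gracefully to heuristics on failure or None."""
    try:
        result = ai_assessor(task)
    except Exception:
        result = None
    return result if result is not None else heuristic_assessment(task)
```

Because both paths return the same structure, callers never need to know which method ran, which is what lets the pipeline continue unconditionally.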
9. ARCHITECTURAL INSIGHTS
9.1 Separation of Concerns
Orchestrator (coordination) vs PhaseExecutor (execution):
```
SpecOrchestrator
├── Determines which phases to run
├── Manages phase summaries (compaction)
├── Handles human review checkpoint
└── Coordinates AgentRunner

PhaseExecutor (composed of 4 mixins)
├── DiscoveryPhaseMixin - Discovery, context gathering
├── RequirementsPhaseMixin - Requirements, research, historical context
├── SpecPhaseMixin - Spec writing, self-critique
└── PlanningPhaseMixin - Implementation planning, validation
```
Value: Clean separation enables independent evolution of orchestration logic vs phase implementation.
9.2 Mixin-Based Phase Execution
Location: auto-claude/spec/phases/executor.py:19-33
```python
class PhaseExecutor(
    DiscoveryPhaseMixin,
    RequirementsPhaseMixin,
    SpecPhaseMixin,
    PlanningPhaseMixin,
):
    """Executes individual phases of spec creation."""
```
Pattern: Each mixin implements related phases, all composed into single executor.
Benefits:
- Phase implementations grouped by concern
- Easy to add new phase types
- Shared state via base class
- Single entry point for orchestrator
CODITECT Application: Similar mixin pattern could organize CODITECT skills by domain (data_engineering, ml_ops, infrastructure).
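A toy illustration of the composition, with invented names: each mixin implements its own phases, all sharing state through a common base class, so the composed executor is the single entry point the orchestrator calls.

```python
class PhaseState:
    """Shared base: every mixin reads and writes the same results dict."""

    def __init__(self) -> None:
        self.results: dict[str, str] = {}

class DiscoveryMixin(PhaseState):
    def run_discovery(self) -> None:
        self.results["discovery"] = "project structure indexed"

class SpecMixin(PhaseState):
    def run_spec_writing(self) -> None:
        # Reads the discovery output via shared state on the base class.
        context = self.results.get("discovery", "")
        self.results["spec"] = f"spec grounded in: {context}"

class Executor(DiscoveryMixin, SpecMixin):
    """Single entry point composed from phase mixins."""
```

Python's method resolution order ensures `PhaseState.__init__` runs once, so both mixins see the same `results` dictionary.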
9.3 File-Based Agent Communication
Pattern: Agents communicate via JSON files in spec directory, not in-memory state.
Inputs → Phase → Outputs:
```
requirements.json → Complexity Assessment → complexity_assessment.json
requirements.json + project_index.json → Research → research.json
research.json + requirements.json → Spec Writing → spec.md
spec.md + research.json → Self-Critique → critique_report.json + spec.md (updated)
spec.md + context.json → Planning → implementation_plan.json
```
Benefits:
- Inspectable intermediate results
- Easy debugging (read files directly)
- Resumable pipelines (restart from any phase)
- Persistence across sessions
CODITECT Application: Adopt file-based agent communication for workflow library to enable inspection and recovery.
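A minimal sketch of the pattern, with an invented `run_phase` helper: a phase reads its input JSON from the spec directory, does its work, and writes its output JSON back. Because outputs persist on disk, a rerun can skip any phase whose output file already exists, which is what makes the pipeline resumable.

```python
import json
import tempfile
from pathlib import Path

def run_phase(spec_dir: Path, name: str, inputs: list[str], work) -> dict:
    """Execute one phase, communicating only through files on disk."""
    out_file = spec_dir / f"{name}.json"
    if out_file.exists():  # resume: phase already completed
        return json.loads(out_file.read_text())
    loaded = {f: json.loads((spec_dir / f).read_text()) for f in inputs}
    result = work(loaded)
    out_file.write_text(json.dumps(result, indent=2))
    return result

# Demo wiring: requirements.json feeds a (stubbed) research phase.
spec_dir = Path(tempfile.mkdtemp())
(spec_dir / "requirements.json").write_text(json.dumps({"task": "add auth"}))
research = run_phase(
    spec_dir, "research", ["requirements.json"],
    lambda ins: {"findings": f"patterns for: {ins['requirements.json']['task']}"},
)
```

Intermediate results stay inspectable with nothing more than `cat research.json`, and debugging a failed phase means reading its input files directly.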
9.4 Retry with Backoff
Location: auto-claude/spec/phases/models.py:22-23
```python
# Maximum retry attempts for phase execution
MAX_RETRIES = 3
```
Each phase can retry up to 3 times; failures accumulate in the `PhaseResult.errors` list.
Pattern: Transient failures (API timeouts, temporary issues) don't fail the entire pipeline.
CODITECT Application: Add retry logic to network-dependent skills (API calls, database queries).
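The pattern can be sketched as follows. `PhaseResult` is a stand-in for the real model, and the exponential backoff schedule is an assumption; the document only confirms the retry cap and error accumulation.

```python
import time

MAX_RETRIES = 3  # matches spec/phases/models.py

class PhaseResult:
    """Stand-in result model: success flag plus accumulated errors."""

    def __init__(self) -> None:
        self.success = False
        self.errors: list[str] = []

def run_with_retries(phase_fn, base_delay: float = 0.01) -> PhaseResult:
    """Attempt a phase up to MAX_RETRIES times, backing off between tries."""
    result = PhaseResult()
    for attempt in range(MAX_RETRIES):
        try:
            phase_fn()
            result.success = True
            return result
        except Exception as exc:  # transient failures: timeouts, rate limits
            result.errors.append(f"attempt {attempt + 1}: {exc}")
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    return result  # all retries exhausted; errors preserved for diagnosis
```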
9.5 Human Review Checkpoint
Location: auto-claude/spec/pipeline/orchestrator.py:613-645
After all automated phases complete, a mandatory human review runs (unless --auto-approve):
```python
def _run_review_checkpoint(self, auto_approve: bool) -> bool:
    """Run the human review checkpoint."""
    review_state = run_review_checkpoint(
        spec_dir=self.spec_dir,
        auto_approve=auto_approve,
    )
    if not review_state.is_approved():
        print_status("Build will not proceed without approval.", "warning")
        return False
    return True
```
Pattern: Human-in-the-loop gate before high-cost implementation phase.
CODITECT Application: Add human approval gates before:
- Database migrations
- Production deployments
- High-cost operations (large ML training jobs)
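A minimal sketch of such a gate, assuming a CODITECT-style guard function (the name and signature are invented). The prompt function is injected so the gate is testable; in a CLI it would default to `input()`.

```python
def approval_gate(operation: str, auto_approve: bool, ask=input) -> bool:
    """Return True only if the high-risk operation may proceed.

    Mirrors the --auto-approve escape hatch: when set, the gate is
    bypassed; otherwise a human must explicitly answer yes.
    """
    if auto_approve:
        return True
    answer = ask(f"Approve '{operation}'? [y/N] ")
    return answer.strip().lower() == "y"
```

Defaulting to "no" on any answer other than an explicit "y" is the safe choice for operations like migrations and deployments.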
10. PROMPT ENGINEERING INSIGHTS
10.1 Structured Output Requirements
All agent prompts enforce structured output via explicit file creation instructions:
Example: complexity_assessor.md (lines 190-257)
Create `complexity_assessment.json`:

```shell
cat > complexity_assessment.json << 'EOF'
{
  "complexity": "[simple|standard|complex]",
  "confidence": [0.0-1.0],
  ...
}
EOF
```
Pattern: Template + Instructions + Example = Consistent structured output
10.2 Mandatory Investigation Phase
Location: planner.md (lines 21-67)
```markdown
## PHASE 0: DEEP CODEBASE INVESTIGATION (MANDATORY)

**CRITICAL**: Before ANY planning, you MUST thoroughly investigate the existing
codebase. Poor investigation leads to plans that don't match the codebase's
actual patterns.

**YOU MUST READ AT LEAST 3 PATTERN FILES** before planning
```
Pattern: Force information gathering before generation to ground outputs in reality.
CODITECT Application: Require codebase analysis before generating new code in workflows.
10.3 Checklist-Driven Critique
Location: critique.py (lines 78-104)
Critique prompt includes explicit checklists:
```markdown
### STEP 1: Code Quality Checklist

**Pattern Adherence:**
- [ ] Follows patterns from reference files exactly
- [ ] Variable naming matches codebase conventions
- [ ] Imports organized correctly
...
```
Pattern: Checkboxes create psychological commitment to thorough review.
CODITECT Application: Use checklist prompts for quality gates in critical workflows.
10.4 Extended Thinking for Complex Analysis
Location: spec_critic.md (lines 40-77, 299-318)
```markdown
## PHASE 1: DEEP ANALYSIS (USE EXTENDED THINKING)

**CRITICAL**: Use extended thinking for this phase.
```
Pattern: Explicitly request extended thinking for high-stakes analysis phases.
CODITECT Application: Request extended thinking for:
- Architecture decisions
- Security reviews
- Performance optimization analysis
10.5 Context7 Integration for Technical Validation
Location: spec_critic.md (lines 55-77)
Critic prompt uses external documentation lookup:
```
Tool: mcp__context7__resolve-library-id
Tool: mcp__context7__get-library-docs
```
Pattern: Combine internal knowledge with external authoritative sources.
CODITECT Application: Integrate Context7 or similar documentation tools for technical skill validation.
11. QUANTITATIVE METRICS
11.1 Phase Count by Complexity
| Complexity | Phase Count | Phases |
|---|---|---|
| SIMPLE | 3-4 | discovery, [historical_context], quick_spec, validation |
| STANDARD | 6-7 | discovery, [historical_context], requirements, [research], context, spec_writing, planning, validation |
| COMPLEX | 8-9 | discovery, [historical_context], requirements, research, context, spec_writing, self_critique, planning, validation |
Note: historical_context is optional based on Graphiti availability.
11.2 Assessment Criteria Count
Complexity Analyzer:
- 25 SIMPLE_KEYWORDS
- 16 COMPLEX_KEYWORDS
- 11 MULTI_SERVICE_KEYWORDS
- 11 integration regex patterns
- 11 infrastructure regex patterns
AI Assessment Dimensions:
- 5 analysis categories (scope, integrations, infrastructure, knowledge, risk)
- 5 risk levels (trivial, low, medium, high, critical)
11.3 Critique Checklist Items
Self-Critique Prompt:
- 4 pattern adherence checks
- 3 error handling checks
- 4 code cleanliness checks
- 4 best practices checks
- 3 implementation completeness checks
Total: 18 explicit quality checks per subtask
11.4 Verification Types
- 5 verification types (command, api, browser, manual, none)
- 6 verification parameters (run, url, method, expect_status, expect_contains, scenario)
12. FILE REFERENCES SUMMARY
Core Pipeline Files
| File | Lines | Purpose |
|---|---|---|
| `spec/complexity.py` | 467 | Complexity assessment (AI + heuristic) |
| `spec/critique.py` | 370 | Self-critique system |
| `spec/discovery.py` | 78 | Project structure discovery |
| `spec/pipeline/orchestrator.py` | 687 | Main pipeline orchestration |
| `spec/pipeline/models.py` | 264 | Pipeline data structures |
| `spec/phases/executor.py` | 77 | Phase execution coordinator |
| `spec/phases/models.py` | 24 | Phase result model |
Implementation Plan Files
| File | Lines | Purpose |
|---|---|---|
| `implementation_plan/plan.py` | 373 | Plan model with dependency resolution |
| `implementation_plan/subtask.py` | 133 | Subtask tracking and status |
| `implementation_plan/verification.py` | 54 | Verification strategy model |
Prompt Files
| File | Lines | Purpose |
|---|---|---|
| `prompts/complexity_assessor.md` | 676 | AI complexity assessment |
| `prompts/spec_critic.md` | 325 | Spec self-critique with ultrathink |
| `prompts/planner.md` | 300+ | Implementation plan generation |
Runner Files
| File | Lines | Purpose |
|---|---|---|
| `runners/spec_runner.py` | 200+ | CLI entry point for spec creation |
13. RECOMMENDATIONS FOR CODITECT INTEGRATION
13.1 High-Priority Patterns
- **Adaptive Pipeline Selection** (Priority: HIGH)
  - Add complexity assessment to `/new-project`
  - Map complexity to agent chain selection
  - Files: `spec/complexity.py` + `prompts/complexity_assessor.md`
- **Self-Critique Quality Gates** (Priority: HIGH)
  - Add critique step to security/infrastructure skills
  - Generate skill-specific checklists
  - File: `spec/critique.py`
- **Subtask-Based Execution** (Priority: MEDIUM)
  - Implement dependency DAG for complex workflows
  - Enable parallel subtask execution where safe
  - File: `implementation_plan/plan.py`
- **Verification Strategies** (Priority: MEDIUM)
  - Add verification metadata to workflow subtasks
  - Auto-generate verification commands
  - File: `implementation_plan/verification.py`
13.2 Medium-Priority Patterns
- **Phase Output Compaction** (Priority: MEDIUM)
  - Summarize long agent outputs
  - Inject summaries into downstream agents
  - File: `spec/compaction.py` (referenced, not analyzed in detail)
- **Workflow Type Awareness** (Priority: MEDIUM)
  - Add workflow type metadata to workflow library
  - Customize agent sequences per type
  - File: `prompts/planner.md` (workflow types section)
- **Human Review Checkpoints** (Priority: LOW)
  - Add approval gates before high-risk operations
  - File: `spec/pipeline/orchestrator.py:613-645`
13.3 Implementation Guidance
For Complexity Assessment:
- Create CODITECT skill: `/assess-task-complexity`
- Adapt `complexity_assessor.md` prompt for CODITECT context
- Store assessment in `.coditect/task-assessment.json`
- Use assessment to select agent chain from workflow library
For Self-Critique:
- Create a `critique-skill-execution` helper
- Generate checklists based on skill requirements
- Parse critique responses with regex (adapt from `critique.py`)
- Block skill completion if critique fails
For Subtask Execution:
- Extend workflow library format with subtask dependencies
- Implement `get_next_subtask()` logic for workflow execution
- Track subtask status in `.coditect/workflow-progress.json`
- Enable resume from last completed subtask
14. CONCLUSION
The Auto-Claude SPEC PIPELINE demonstrates adaptive orchestration at scale. By assessing task complexity FIRST and then dynamically constructing the appropriate workflow, it achieves both efficiency (3-phase pipeline for simple tasks) and rigor (8-phase pipeline with self-critique for complex integrations).
Key Takeaways for CODITECT:
- **Assessment Before Execution**: Don't assume one workflow fits all; assess first, adapt second.
- **Self-Critique as Quality Gate**: Agents can and should evaluate their own work before marking tasks complete.
- **Dependency-Aware Execution**: Track dependencies explicitly to enable safe parallelism and automatic unblocking.
- **Verification Per Task**: Each unit of work should define how to verify its completion.
- **File-Based Communication**: Intermediate outputs as files enable inspection, debugging, and resumption.
- **Workflow Type Taxonomy**: Different types of work (feature, refactor, investigation) follow different patterns; codify these patterns.
- **Human-in-the-Loop Gates**: Automate aggressively, but gate high-cost or high-risk operations with human approval.
Next Steps:
- Pilot complexity assessment for the `/new-project` workflow
- Add self-critique to infrastructure and security skills
- Extend workflow library with dependency metadata
- Implement verification command generation
Contact: This analysis prepared for CODITECT workflow library enhancement.