
Article Analysis: "I Built the Same App 9 Times"

Implications for Coditect Autonomous Development Platform

Source: Video transcript - AI-assisted development experiment
Analysis Date: January 2026
Relevance: CRITICAL - Validates core Coditect architecture decisions


Executive Summary

This experiment provides empirical evidence for architectural decisions embedded in the Coditect PRD Standards. The key finding—that intent preservation, not model intelligence, determines build quality—directly validates Coditect's Intent Understanding Framework and reveals a critical gap in current AI development tools that Coditect can exploit as competitive differentiation.


1. Article Summary

Experiment Design

| Parameter | Value |
|-----------|-------|
| Application | "Spotlight for words" - hotkey-triggered dictionary/etymology/synonym panel |
| Input | 15-minute voice memo describing desired features ("Intent Document") |
| PRD Generation | 8 different AI models with identical instructions |
| Build System | Claude Code with Opus 4.5 in planning mode |
| Control Variable | Only the PRD-generating model varied |

Models Tested

| Model | Type | PRD Generation Time | Notable Result |
|-------|------|---------------------|----------------|
| GPT-5.2 Instant | Fast/cheap | ~15 seconds | Scored highest (paradoxically) |
| GPT-5.2 Fast | Standard | Quick | Mid-range |
| GPT-5.2 Thinking | Reasoning | Moderate | Showed inline diff for misspellings |
| GPT-5.2 Pro | Premium | ~15 minutes | Missed basic features (misspelling → "hours" not "horse") |
| Opus 4.5 | Anthropic flagship | Moderate | Strong but not exceptional |
| Sonnet | Anthropic mid-tier | Quick | Adequate |
| Direct to Planning | No PRD step | N/A | Surprisingly good |
| Direct to Execution | No PRD, no plan | N/A | One of the best-looking outputs |

Hypothesis vs Reality

HYPOTHESIS (Expected):
┌─────────────────────────────────────────────────────────────┐
│ Smarter Model → Better PRD → Better Build                   │
│                                                             │
│ GPT-5.2 Pro >> GPT-5.2 Instant                              │
│                                                             │
│ Clear correlation: model intelligence ∝ output quality      │
└─────────────────────────────────────────────────────────────┘

REALITY (Observed):
┌─────────────────────────────────────────────────────────────┐
│ Model intelligence ≠ Build quality                          │
│                                                             │
│ All builds scored in the 80s (within noise margin)          │
│                                                             │
│ GPT-5.2 Instant OUTSCORED GPT-5.2 Pro                       │
│                                                             │
│ "I could not tell them apart"                               │
└─────────────────────────────────────────────────────────────┘

2. Key Findings

Finding 1: Planning Mode is a "Powerful Planning Filter"

The Claude Code planning step normalizes all inputs, smoothing out differences between PRDs regardless of quality. This has dual implications:

Positive: Users don't need expensive models for PRD generation
Negative: Planning mode also drops requirements silently

"The planning step was dropping 20 to 30% of my requirements. Not because they weren't clear, not because they were unreasonable, they were just lost."

Finding 2: Intent Preservation is the Critical Variable

When the author added explicit instructions to "carry intent through" and "explain why each one matters," results diverged dramatically:

| Model | Score WITHOUT intent instruction | Score WITH intent instruction |
|-------|----------------------------------|-------------------------------|
| GPT-5.2 Thinking | Mid-80s | Mid-80s (no change) |
| Opus 4.5 | Mid-80s | 99% |

With intent explicitly preserved, Opus 4.5 opened a 12-point gap over the next-closest model.

Finding 3: Silent Losses at Each Handoff

The "gap between your best idea and what gets built" occurs at multiple stages:

┌─────────────┐    LOSS #1     ┌─────────────┐    LOSS #2     ┌─────────────┐    LOSS #3     ┌─────────────┐
│   INTENT    │───────────────▶│     PRD     │───────────────▶│    PLAN     │───────────────▶│    BUILD    │
│  DOCUMENT   │  Intent lost   │             │  Requirements  │             │  Features      │             │
│             │  in PRD        │             │  dropped in    │             │  missing in    │             │
│             │  translation   │             │  planning      │             │  execution     │             │
└─────────────┘                └─────────────┘                └─────────────┘                └─────────────┘

Finding 4: Verification Loop Recovers Lost Requirements

| Planning Pass | Missing Items | Recovery Rate |
|---------------|---------------|---------------|
| First pass | 18 items | Baseline |
| Second pass | 8 items | 56% recovery |
| Third pass | Only ambiguous items | ~90%+ recovery |

Pattern discovered:

  1. Generate plan
  2. Compare plan against PRD
  3. Find everything missing → create fallout list
  4. Update plan to include missing items
  5. Repeat until stable
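As Python pseudocode, the loop above reduces to a small fixed-point iteration. This is an illustrative sketch: `generate_plan` is a placeholder for the model call, and the simple membership test stands in for whatever requirement-matching the real system uses.

```python
def plan_with_verification(prd_requirements, generate_plan, max_passes=5):
    """Repeat: plan, diff against the PRD, re-plan with the fallout list, until stable."""
    plan = generate_plan(prd_requirements, fallout=[])
    fallout = [r for r in prd_requirements if r not in plan]  # missing items
    for _ in range(max_passes):
        if not fallout:
            break  # plan now covers every requirement
        plan = generate_plan(prd_requirements, fallout=fallout)   # re-plan with fallout appended
        fallout = [r for r in prd_requirements if r not in plan]  # recompute what is still missing
    return plan, fallout
```

The key design point matches the experiment: quality comes from the comparison step, not from a smarter planner.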

3. Categorization of Insights

Category A: Architecture Validation

| Insight | Coditect Component Validated | Evidence |
|---------|------------------------------|----------|
| Intent preservation is critical | Intent Understanding Framework | 12-point score gap with intent instructions |
| Requirements get dropped silently | Disambiguation Framework | 20-30% loss in planning mode |
| Verification loops recover losses | Agent Interpretation Guide | 3-pass recovery pattern |
| Model intelligence matters less than process | Multi-agent token economics | Instant ≈ Pro in controlled experiment |

Category B: Competitive Intelligence

| Observation | Competitive Implication |
|-------------|-------------------------|
| Planning modes drop requirements | Cursor, Claude Code, Windsurf all have this gap |
| No tool verifies against original request | First-mover advantage for Coditect |
| Intent preservation isn't built-in | Differentiator for regulated industries |
| Triple-pass planning is manual | Automation opportunity |

Category C: Technical Debt Warnings

| Anti-Pattern Identified | Risk Level | Mitigation in Coditect |
|-------------------------|------------|------------------------|
| Single-pass planning | HIGH | Built-in verification loops |
| PRD as checklist (not conversation) | MEDIUM | Intent blocks with "why" |
| Trusting planning output | HIGH | Fallout list generation |
| Model-shopping for quality | LOW | Process > model intelligence |

4. Coditect Impact Analysis

4.1 Validation of PRD Standards Template

The article directly validates the following sections of the Coditect PRD Standards:

Intent Understanding Framework (Section 2)

Article Quote:

"The issue isn't how smart the model is, the issue is how much intent survives the translation."

Coditect Solution: The Intent Block template captures:

  • Strategic Context (the "why")
  • Success/Failure indicators
  • Implicit assumptions
  • Anti-requirements

This ensures intent is explicitly documented rather than implicitly assumed.

Disambiguation Framework (Section 3)

Article Quote:

"The remaining items were things that I hadn't really specified all that well anyway, they were pretty kind of ambiguous."

Coditect Solution: Ambiguity scoring (0.0-1.0) with escalation triggers ensures ambiguous requirements are resolved before planning, not lost during planning.
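A minimal sketch of such a gate, assuming requirements arrive as plain strings and that some `ambiguity_score` function (illustrative, not Coditect's actual scorer) produces the 0.0-1.0 value; the 0.7 escalation threshold is an assumed default, not a documented one.

```python
def triage_requirements(requirements, ambiguity_score, threshold=0.7):
    """Split requirements into those safe to plan and those needing human resolution."""
    ready, escalate = [], []
    for req in requirements:
        score = ambiguity_score(req)  # 0.0 = fully specified, 1.0 = totally ambiguous
        (escalate if score >= threshold else ready).append((req, score))
    return ready, escalate
```

The point is ordering: ambiguous items are resolved before planning starts, so the planner never gets the chance to drop them silently.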

AI Agent Interpretation Guide (Section 6)

Article Quote:

"They should be able to evaluate against the original request... It's kind of surprising that they're not doing that."

Coditect Solution: Rule 5 (Document Everything) and the verification loop requirement ensure agents validate against source documents at each stage.

4.2 Required Coditect Architecture Enhancements

Based on this analysis, Coditect should implement:

Enhancement 1: Planning Verification Service

┌─────────────────────────────────────────────────────────────┐
│ PLANNING VERIFICATION SERVICE                               │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│ INPUT: PRD + Generated Plan                                 │
│                                                             │
│ PROCESS:                                                    │
│   1. Extract all requirements from PRD                      │
│   2. Extract all planned items from Plan                    │
│   3. Diff: PRD requirements - Plan items = FALLOUT LIST     │
│   4. Score coverage percentage                              │
│   5. If coverage < 95%: ITERATE                             │
│   6. If coverage < 80%: ESCALATE to human                   │
│                                                             │
│ OUTPUT: Verified Plan + Fallout Report + Coverage Score     │
│                                                             │
└─────────────────────────────────────────────────────────────┘
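Assuming requirements and plan items can be matched by ID, the service's core decision logic is roughly the following; the 95%/80% thresholds come from the box above, everything else is an illustrative sketch rather than a defined API.

```python
def verify_plan(prd_req_ids, plan_item_ids):
    """Diff PRD against plan, score coverage, and decide the next action."""
    fallout = sorted(set(prd_req_ids) - set(plan_item_ids))  # requirements with no plan item
    coverage = 1 - len(fallout) / len(prd_req_ids)
    if coverage < 0.80:
        action = "ESCALATE"  # too much was dropped; a human should look
    elif coverage < 0.95:
        action = "ITERATE"   # re-plan with the fallout list appended to context
    else:
        action = "ACCEPT"
    return {"fallout": fallout, "coverage": coverage, "action": action}
```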

Enhancement 2: Intent Survival Tracking

```python
from dataclasses import dataclass


@dataclass
class IntentSurvivalMetrics:
    """Track intent through the development pipeline."""

    original_intent_items: int  # From intent document
    prd_intent_items: int       # Captured in PRD
    plan_intent_items: int      # Made it to plan
    build_intent_items: int     # Implemented in code

    @property
    def prd_survival_rate(self) -> float:
        return self.prd_intent_items / self.original_intent_items

    @property
    def plan_survival_rate(self) -> float:
        return self.plan_intent_items / self.prd_intent_items

    @property
    def build_survival_rate(self) -> float:
        return self.build_intent_items / self.plan_intent_items

    @property
    def total_survival_rate(self) -> float:
        return self.build_intent_items / self.original_intent_items

    def identify_loss_stage(self) -> str:
        """Find where most intent is lost."""
        losses = {
            "PRD Translation": 1 - self.prd_survival_rate,
            "Planning": 1 - self.plan_survival_rate,
            "Build": 1 - self.build_survival_rate,
        }
        return max(losses, key=losses.get)
```

Enhancement 3: Fallout List Automation

After each planning phase, automatically generate:

## FALLOUT REPORT - Planning Pass #1

### Missing Requirements (18 items)

| Req ID | Requirement | PRD Section | Severity |
|--------|-------------|-------------|----------|
| FR-012 | Word of the day integration | 4.3.12 | MEDIUM |
| FR-015 | Sound effects on lookup | 4.3.15 | LOW |
| FR-018 | Auto-dismiss behavior | 4.3.18 | HIGH |
| ... | ... | ... | ... |

### Coverage Analysis
- **PRD Requirements:** 67
- **Planned Items:** 49
- **Coverage:** 73.1%
- **Status:** ⚠️ BELOW THRESHOLD (95%)

### Recommended Action
Iterate planning with fallout list appended to context.
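A sketch of generating that report from a fallout list. The dict keys (`id`, `text`, `section`, `severity`) are illustrative field names, not an established schema.

```python
def render_fallout_report(pass_num, missing, prd_total, threshold=0.95):
    """Render a fallout list as a markdown report like the sample above."""
    planned = prd_total - len(missing)
    coverage = planned / prd_total
    lines = [
        f"## FALLOUT REPORT - Planning Pass #{pass_num}", "",
        f"### Missing Requirements ({len(missing)} items)", "",
        "| Req ID | Requirement | PRD Section | Severity |",
        "|--------|-------------|-------------|----------|",
    ]
    lines += [f"| {m['id']} | {m['text']} | {m['section']} | {m['severity']} |"
              for m in missing]
    status = "OK" if coverage >= threshold else f"BELOW THRESHOLD ({threshold:.0%})"
    lines += ["", "### Coverage Analysis",
              f"- **PRD Requirements:** {prd_total}",
              f"- **Planned Items:** {planned}",
              f"- **Coverage:** {coverage:.1%}",
              f"- **Status:** {status}"]
    return "\n".join(lines)
```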

4.3 Token Economics Impact

The article reveals that model cost ≠ output quality in planning-mediated workflows:

| Model | Cost (relative) | Output Quality | Implication |
|-------|-----------------|----------------|-------------|
| GPT-5.2 Instant | 1x | ~85% | Use for PRD generation |
| GPT-5.2 Pro | 100x+ | ~85% | Wasteful for PRD generation |
| Opus 4.5 (planning) | 10x | Normalizes all inputs | Use for planning orchestration |

Coditect Token Strategy:

  • Use cost-effective models for PRD generation
  • Invest tokens in verification loops rather than expensive base models
  • The 3-pass verification pattern costs ~3x tokens but recovers 90%+ of lost requirements
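A rough cost comparison using the table's relative multipliers (illustrative units, not actual prices):

```python
# Relative token cost of three PRD-generation strategies,
# using the table's rough multipliers (illustrative assumptions).
cheap_single   = 1 * 1    # GPT-5.2 Instant, one pass
cheap_verified = 1 * 3    # Instant plus a 3-pass verification loop
pro_single     = 100 * 1  # GPT-5.2 Pro, one pass

# Verification triples the cheap model's cost yet stays ~33x cheaper than
# a single Pro pass, while recovering 90%+ of dropped requirements.
assert cheap_verified < pro_single
```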

4.4 Competitive Positioning

┌─────────────────────────────────────────────────────────────────────────┐
│ COMPETITIVE LANDSCAPE                                                   │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│ CURRENT STATE (All Competitors):                                        │
│   ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐              │
│   │   PRD   │───▶│  Plan   │───▶│  Build  │───▶│ Output  │              │
│   └─────────┘    └─────────┘    └─────────┘    └─────────┘              │
│        │              │              │                                  │
│        ▼              ▼              ▼                                  │
│      [LOSS]         [LOSS]         [LOSS]                               │
│      10-20%         20-30%         10-15%  = 40-65% TOTAL LOSS          │
│                                                                         │
│ CODITECT (With Verification Loops):                                     │
│   ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐              │
│   │   PRD   │◀──▶│  Plan   │◀──▶│  Build  │───▶│ Output  │              │
│   └─────────┘    └─────────┘    └─────────┘    └─────────┘              │
│        │              │              │                                  │
│        ▼              ▼              ▼                                  │
│     [VERIFY]       [VERIFY]       [VERIFY]                              │
│       95%+           95%+           95%+   = <15% TOTAL LOSS            │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
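The diagram's totals are just compounded per-stage losses. A quick sanity check, assuming the midpoints of the ranges shown:

```python
def total_loss(stage_losses):
    """Compound per-stage losses into total end-to-end intent loss."""
    survival = 1.0
    for loss in stage_losses:
        survival *= (1 - loss)
    return 1 - survival

# Competitors, midpoints of the diagram's ranges: 15%, 25%, 12.5% per stage
print(f"{total_loss([0.15, 0.25, 0.125]):.0%}")  # 44%
# Coditect, at most 5% loss per verified stage
print(f"{total_loss([0.05, 0.05, 0.05]):.0%}")   # 14%
```

Both results land inside the diagram's stated bands (40-65% and <15%), which is why the per-stage verification targets were set at 95%+.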

5. Actionable Recommendations

Immediate (Week 1-2)

  1. Add fallout list generation to planning phase
  2. Implement 3-pass verification as default behavior
  3. Track intent survival metrics in development dashboard

Short-term (Month 1)

  1. Update PRD Standards template with explicit intent-carrying instructions
  2. Create Intent Preservation Score as a first-class metric
  3. Build automated PRD ↔ Plan diff tool

Medium-term (Quarter 1)

  1. Train custom model for intent extraction from voice memos
  2. Implement real-time verification during planning (not just post-hoc)
  3. Create "Intent Debugger" showing where losses occur

6. Key Quotes for Reference

"The gap between your best idea and what gets built isn't about using GPT-5 Pro versus 5.2 instant. It's about the silent losses at each handoff."

"The planning step was dropping 20 to 30% of my requirements. Not because they weren't clear, not because they were unreasonable, they were just lost."

"Your intent is the actual you in all of this."

"What actually matters here is getting your intent to survive the journey and then verifying that nothing got dropped along the way."

"They should be able to evaluate against the original request... It's kind of surprising that they're not doing that."


7. Conclusion

This experiment provides empirical validation for Coditect's core architectural decisions:

  1. Intent preservation > model intelligence - The Intent Understanding Framework addresses this
  2. Silent losses are the real enemy - Disambiguation and verification loops address this
  3. Verification against source is not standard - This is Coditect's competitive advantage
  4. Process beats model selection - Multi-pass verification recovers lost requirements

The finding that current AI development tools (Claude Code, Cursor, etc.) do not verify against original requirements represents a significant market opportunity for Coditect in regulated industries where requirement traceability is mandatory.


Analysis prepared for Coditect Architecture Team
Classification: Strategic Intelligence