LLM Provider Comparison
Performance Analysis by Paradigm and Use Case
Document ID: F4-LLM-COMPARISON
Version: 1.0
Category: Technical Reference
Provider Overview (January 2025)
| Provider | Model | Context | Input $/1M | Output $/1M | Strengths |
|---|---|---|---|---|---|
| Anthropic | Claude Opus 4 | 200K | $15.00 | $75.00 | Complex reasoning, safety |
| Anthropic | Claude Sonnet 4 | 200K | $3.00 | $15.00 | Best balance quality/cost |
| Anthropic | Claude Haiku 4 | 200K | $0.25 | $1.25 | Speed, efficiency |
| OpenAI | GPT-4o | 128K | $2.50 | $10.00 | Multimodal, tool use |
| OpenAI | GPT-4o-mini | 128K | $0.15 | $0.60 | Cost efficiency |
| Google | Gemini 1.5 Pro | 1M | $1.25 | $5.00 | Long context |
| Google | Gemini 1.5 Flash | 1M | $0.075 | $0.30 | Speed, long context |
| Meta | Llama 3.1 405B | 128K | Self-host | Self-host | Privacy, customization |
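As a worked example of the pricing columns, a small helper can estimate per-request cost from the table. The `RATES` keys and `request_cost` helper are illustrative names, and list prices change; actual billing may differ.

```python
# Per-1M-token rates (input, output) in USD, copied from the table above.
RATES = {
    "claude-opus-4":    (15.00, 75.00),
    "claude-sonnet-4":  (3.00, 15.00),
    "claude-haiku-4":   (0.25, 1.25),
    "gpt-4o":           (2.50, 10.00),
    "gpt-4o-mini":      (0.15, 0.60),
    "gemini-1.5-pro":   (1.25, 5.00),
    "gemini-1.5-flash": (0.075, 0.30),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request at list prices."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
```

For example, a Sonnet 4 request with 10K input and 1K output tokens costs about $0.045 at these rates.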
Paradigm Fit Matrix
LSR (Latent Space Reasoner)
| Model | Creative | Reasoning | Consistency | Recommended |
|---|---|---|---|---|
| Claude Opus 4 | ★★★★★ | ★★★★★ | ★★★★☆ | Complex creative |
| Claude Sonnet 4 | ★★★★☆ | ★★★★☆ | ★★★★☆ | General creative |
| GPT-4o | ★★★★☆ | ★★★★☆ | ★★★★☆ | Multimodal creative |
| Gemini Pro | ★★★★☆ | ★★★★☆ | ★★★☆☆ | Long-form creative |
LSR Recommendation: Claude Sonnet 4 for most use cases; Opus 4 for complex synthesis
GS (Grounded Synthesizer)
| Model | Citation | Accuracy | Synthesis | Recommended |
|---|---|---|---|---|
| Claude Sonnet 4 | ★★★★★ | ★★★★★ | ★★★★★ | Research, analysis |
| Claude Opus 4 | ★★★★★ | ★★★★★ | ★★★★★ | Complex research |
| GPT-4o | ★★★★☆ | ★★★★☆ | ★★★★☆ | General research |
| Gemini Pro | ★★★★☆ | ★★★★☆ | ★★★★☆ | Long-document analysis |
GS Recommendation: Claude Sonnet 4 for citation-heavy work; Gemini for very long documents
EP (Emergent Planner)
| Model | Planning | Tool Use | Adaptation | Recommended |
|---|---|---|---|---|
| Claude Opus 4 | ★★★★★ | ★★★★★ | ★★★★★ | Complex planning |
| Claude Sonnet 4 | ★★★★☆ | ★★★★★ | ★★★★☆ | General planning |
| GPT-4o | ★★★★☆ | ★★★★★ | ★★★★☆ | Tool-heavy tasks |
| Claude Haiku 4 | ★★★☆☆ | ★★★★☆ | ★★★☆☆ | Fast iteration |
EP Recommendation: Claude Sonnet 4 or GPT-4o for tool use; consider Haiku for iteration speed
VE (Verifiable Executor)
| Model | Instruction Following | Consistency | Audit Quality | Recommended |
|---|---|---|---|---|
| Claude Sonnet 4 | ★★★★★ | ★★★★★ | ★★★★★ | Compliance workflows |
| Claude Opus 4 | ★★★★★ | ★★★★☆ | ★★★★★ | Complex protocols |
| GPT-4o | ★★★★☆ | ★★★★☆ | ★★★★☆ | General workflows |
| Claude Haiku 4 | ★★★★☆ | ★★★★★ | ★★★★☆ | High-volume processing |
VE Recommendation: Claude models for compliance; Haiku for high-volume
Use Case Recommendations
Content Creation
- Best: Claude Sonnet 4
- Budget: GPT-4o-mini
- Premium: Claude Opus 4
Research & Analysis
- Best: Claude Sonnet 4
- Long documents: Gemini 1.5 Pro
- Budget: GPT-4o-mini + Gemini Flash
Customer Service
- Best: Claude Sonnet 4
- High volume: Claude Haiku 4
- Budget: GPT-4o-mini
Code Generation
- Best: Claude Sonnet 4
- Complex: Claude Opus 4
- Budget: GPT-4o-mini
Compliance/Audit
- Best: Claude Sonnet 4
- High volume: Claude Haiku 4
- Self-hosted: Llama 3.1 405B
Multi-Model Architecture
Recommended Tiering
┌─────────────────────────────────────────────┐
│              ROUTER/CLASSIFIER              │
│           (Haiku or GPT-4o-mini)            │
└─────────────────────┬───────────────────────┘
                      │
     ┌────────────────┼────────────────┐
     ▼                ▼                ▼
┌─────────┐      ┌─────────┐      ┌─────────┐
│  FAST   │      │ STANDARD│      │ COMPLEX │
│ Haiku / │      │ Sonnet /│      │  Opus   │
│  Mini   │      │ GPT-4o  │      │         │
└─────────┘      └─────────┘      └─────────┘
  Simple           Normal           Complex
  tasks            tasks            tasks
Router Implementation
class ModelRouter:
    def __init__(self):
        # A cheap model classifies each task; its label selects the tier.
        self.classifier = ClaudeHaiku()
        self.models = {
            'fast': ClaudeHaiku(),
            'standard': ClaudeSonnet(),
            'complex': ClaudeOpus()
        }

    async def route(self, task):
        complexity = await self.classifier.classify(task)
        # Fall back to the standard tier on an unrecognized label.
        model = self.models.get(complexity, self.models['standard'])
        return await model.execute(task)
Cost Optimization Strategies
Strategy 1: Tiered Execution
- Route 70% to Haiku/Mini (low cost)
- Route 25% to Sonnet/GPT-4o (medium cost)
- Route 5% to Opus (high cost)
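The blended monthly cost of this split is a weighted average over the tiers. A quick sketch, using illustrative per-task costs (roughly 10K input / 1K output tokens at the table prices above; those token counts are an assumption, not a measurement):

```python
# Illustrative per-task USD costs (~10K input / 1K output tokens
# at list prices) for the Haiku, Sonnet, and Opus tiers.
PER_TASK = {"fast": 0.00375, "standard": 0.045, "complex": 0.225}
SPLIT = {"fast": 0.70, "standard": 0.25, "complex": 0.05}

def blended_monthly_cost(tasks_per_month: int) -> float:
    """Weighted-average per-task cost across tiers, scaled to monthly volume."""
    per_task = sum(SPLIT[tier] * PER_TASK[tier] for tier in SPLIT)
    return tasks_per_month * per_task
```

At 100K tasks/month this works out to roughly $2,500, versus about $22,500 if every task went to the Opus tier under the same token assumptions.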
Strategy 2: Prompt Caching
- Anthropic: prompt caching via explicit cache_control breakpoints; repeated prefix reads are billed at a steep discount
- OpenAI: automatic prompt caching for repeated prefixes on recent models; add application-level caching for exact-repeat prompts
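Application-level caching can be sketched as a memoizing wrapper keyed on a hash of the full prompt. `CachedLLM` and the `model.call` interface are hypothetical names matching the code elsewhere in this document, and this is only sound when calls are deterministic (e.g. temperature 0):

```python
import hashlib

class CachedLLM:
    """Memoize responses by prompt hash. Only sound for deterministic
    calls (e.g. temperature=0), where a repeat prompt implies a repeat answer."""

    def __init__(self, model):
        self.model = model
        self._cache = {}

    async def call(self, prompt: str, **kwargs):
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key not in self._cache:
            # Cache miss: pay for one real call, then reuse the result.
            self._cache[key] = await self.model.call(prompt, **kwargs)
        return self._cache[key]
```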
Strategy 3: Batch Processing
- Anthropic: Message Batches API (50% discount, asynchronous results)
- OpenAI: Batch API (50% discount, results within 24 hours)
- Gemini: batch prediction via Vertex AI
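Regardless of provider, batch submission amounts to grouping queued requests into provider-sized chunks and projecting the discounted cost. A provider-agnostic sketch (the 50% default mirrors the discounts above; `chunk` and `batched_cost` are illustrative helpers, not SDK calls):

```python
def chunk(requests, size=100):
    """Group queued requests into fixed-size batches for a batch API."""
    return [requests[i:i + size] for i in range(0, len(requests), size)]

def batched_cost(on_demand_cost: float, discount: float = 0.5) -> float:
    """Projected cost after a batch-API discount."""
    return on_demand_cost * (1 - discount)
```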
Projected Cost Comparison (100K tasks/month; illustrative estimates at list prices)
| Strategy | Haiku | Sonnet | GPT-4o | Gemini Flash |
|---|---|---|---|---|
| All tasks | $375 | $4,500 | $3,750 | $112 |
| Tiered (70/25/5) | - | $2,475 | - | - |
| With caching (50%) | $188 | $2,250 | $1,875 | $56 |
Fallback Configuration
class LLMWithFallback:
    def __init__(self):
        self.primary = ClaudeSonnet()
        self.fallbacks = [
            GPT4o(),
            GeminiPro(),
            ClaudeHaiku()  # Last resort - fast but less capable
        ]

    async def call(self, prompt, **kwargs):
        for model in [self.primary] + self.fallbacks:
            try:
                return await model.call(prompt, **kwargs)
            except (RateLimitError, ServiceUnavailableError) as e:
                logger.warning(f"{model.name} failed: {e}")
                continue
        raise AllModelsFailedError()
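Falling back at the first rate limit gives up on capacity that often recovers within seconds. A hedged variant retries each model with exponential backoff before moving down the list; `RateLimitError` here is a stand-in for the real SDK exception type assumed above:

```python
import asyncio

class RateLimitError(Exception):
    """Assumed provider rate-limit exception (stand-in for the real SDK type)."""

async def call_with_backoff(model, prompt, retries=3, base_delay=1.0, **kwargs):
    """Retry a single model with exponential backoff before giving up on it."""
    for attempt in range(retries):
        try:
            return await model.call(prompt, **kwargs)
        except RateLimitError:
            if attempt == retries - 1:
                raise  # exhausted; let the caller advance to the next fallback
            # Delays grow 1x, 2x, 4x, ... of base_delay.
            await asyncio.sleep(base_delay * 2 ** attempt)
```

Wrapping each entry in the fallback list this way trades a little latency for far fewer unnecessary downgrades to a less capable model.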
Selection Quick Reference
| If you need... | Use... |
|---|---|
| Best overall quality | Claude Sonnet 4 |
| Complex reasoning | Claude Opus 4 |
| Lowest cost | Gemini Flash / GPT-4o-mini |
| Fastest response | Claude Haiku 4 |
| Long context (>200K) | Gemini 1.5 Pro |
| Multimodal | GPT-4o |
| Self-hosted/private | Llama 3.1 405B |
| Best tool use | Claude Sonnet 4 / GPT-4o |
Document maintained by CODITECT Engineering Team. Updated monthly.