/batch-pipeline
Manage staged LLM batch processing pipelines with checkpointing, cost estimation, and parallel execution.
System Prompt
EXECUTION DIRECTIVE: When the user invokes this command, you MUST:
- Verify batch-id is provided
- Confirm stage is valid
- For 'process' stage: Show cost estimate and request confirmation
- Execute requested stage with progress reporting
- Report results with metrics
DO NOT:
- Skip confirmation for process stage (involves API costs)
- Proceed with invalid batch-id
- Silently fail on errors
Usage
/batch-pipeline <stage> --batch-id <id> [options]
Pipeline Stages
┌─────────────────────────────────────────────────────────────┐
│ BATCH PIPELINE │
├─────────────────────────────────────────────────────────────┤
│ │
│ ACQUIRE → PREPARE → PROCESS → PARSE → RENDER │
│ │ │ │ │ │ │
│ ▼ ▼ ▼ ▼ ▼ │
│ raw.json prompt.md response.md parsed.json index.html │
│ │
│ [Cheap] [Cheap] [Expensive] [Cheap] [Cheap] │
│ [Fast] [Fast] [Slow] [Fast] [Fast] │
│ [Safe] [Safe] [Non-det] [Det] [Det] │
│ │
└─────────────────────────────────────────────────────────────┘
Stage 1: ACQUIRE
Fetch raw data from sources.
Input: Data sources (APIs, databases, files)
Output: {batch_dir}/{item_id}/raw.json
Cost: Low (data transfer only)
Deterministic: Yes
Stage 2: PREPARE
Generate prompts from raw data.
Input: raw.json files
Output: {batch_dir}/{item_id}/prompt.md
Cost: Low (local processing)
Deterministic: Yes
Stage 3: PROCESS
Execute LLM calls.
Input: prompt.md files
Output: {batch_dir}/{item_id}/response.md
Cost: HIGH (API calls)
Deterministic: NO
Stage 4: PARSE
Extract structured data from responses.
Input: response.md files
Output: {batch_dir}/{item_id}/parsed.json
Cost: Low (local processing)
Deterministic: Yes
Stage 5: RENDER
Generate final outputs.
Input: parsed.json files
Output: {output_dir}/index.html
Cost: Low (local processing)
Deterministic: Yes
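The stage outputs above double as checkpoints: an item is considered done for a stage when that stage's output file exists. A minimal sketch of that rule, assuming the per-item file layout shown under Directory Structure (the function name `pending_items` is illustrative, not part of the command):

```python
from pathlib import Path

# Order of stages and the file each one writes per item.
STAGE_OUTPUTS = {
    "acquire": "raw.json",
    "prepare": "prompt.md",
    "process": "response.md",
    "parse": "parsed.json",
}

def pending_items(batch_dir: str, stage: str) -> list[str]:
    """Return item IDs whose output for `stage` is missing.

    Checkpointing: items that already have the stage's output file
    are skipped on re-runs.
    """
    out_name = STAGE_OUTPUTS[stage]
    root = Path(batch_dir)
    return sorted(
        d.name for d in root.iterdir()
        if d.is_dir() and not (d / out_name).exists()
    )
```

This is why re-running a stage after a partial failure only touches the items that never produced output.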
Options
| Option | Type | Default | Description |
|---|---|---|---|
| --batch-id | string | required | Unique batch identifier |
| --limit | number | null | Limit items (for testing) |
| --workers | number | 5 | Parallel workers for process stage |
| --model | string | sonnet | Model for processing |
| --clean-stage | string | null | Stage to clean from |
Stage Commands
Acquire Data
/batch-pipeline acquire --batch-id 2025-01-15 --limit 100
Prepare Prompts
/batch-pipeline prepare --batch-id 2025-01-15
Process (LLM Calls)
/batch-pipeline process --batch-id 2025-01-15 --workers 10 --model sonnet
Parse Results
/batch-pipeline parse --batch-id 2025-01-15
Render Output
/batch-pipeline render --batch-id 2025-01-15
Run All Stages
/batch-pipeline all --batch-id 2025-01-15
Estimate Costs
/batch-pipeline estimate --batch-id 2025-01-15
Clean and Re-run
/batch-pipeline clean --batch-id 2025-01-15 --clean-stage process
Output Examples
Estimate Output
COST ESTIMATE: 2025-01-15
=========================
Items: 100
Estimated input tokens: 250,000
Estimated output tokens: 50,000
Cost Breakdown:
├── Input tokens: $2.50 (@ $10/MTok)
├── Output tokens: $1.50 (@ $30/MTok)
└── Total: $4.00
Note: Add 20-30% buffer for retries.
Proceed with processing? [y/N]
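The estimate above is straight token-count arithmetic. A sketch of the calculation, using the example rates shown (dollars per million tokens; the rates are placeholders from the example output, not real pricing):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float = 10.0, output_rate: float = 30.0) -> dict:
    """Convert token counts to dollars. Rates are $ per MTok."""
    input_cost = input_tokens / 1_000_000 * input_rate
    output_cost = output_tokens / 1_000_000 * output_rate
    return {
        "input": round(input_cost, 2),
        "output": round(output_cost, 2),
        "total": round(input_cost + output_cost, 2),
    }

# 250,000 input + 50,000 output at the example rates -> $2.50 + $1.50 = $4.00
```

Remember the 20-30% retry buffer noted above is on top of this figure.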
Progress Output
PROCESSING: 2025-01-15
======================
Workers: 10
Model: claude-sonnet
Progress:
[████████████░░░░░░░░] 60/100 (60%)
Completed:
├── item-0001: Done (1,245 chars)
├── item-0002: Done (982 chars)
├── item-0003: Error - Rate limited, queued for retry
...
Metrics:
├── Success rate: 98%
├── Avg response time: 2.3s
├── Tokens used: 180,000 / 250,000
└── Estimated remaining: 4 minutes
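A minimal sketch of the worker-pool behavior behind this output: a bounded pool, per-item progress, and failures collected instead of aborting the batch. `call_llm` is a stand-in for the real API call, not an actual client function:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_batch(items, call_llm, workers=5):
    """Run `call_llm(item)` across items with at most `workers` in flight.

    Failed items are recorded in `failures` rather than stopping the
    batch, so a follow-up run can retry only what failed.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(call_llm, item): item for item in items}
        for done, future in enumerate(as_completed(futures), start=1):
            item = futures[future]
            try:
                results[item] = future.result()
            except Exception as exc:
                failures[item] = str(exc)
            print(f"[{done}/{len(items)}] {item}")
    return results, failures
```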
Summary Output
BATCH COMPLETE: 2025-01-15
==========================
Total items: 100
Successful: 98
Failed: 2
Parse errors: 3
Output: output/2025-01-15/index.html
Cost Summary:
├── Actual tokens: 245,000
├── Actual cost: $3.85
└── Under estimate by: 4%
Next Steps:
1. Review output at output/2025-01-15/index.html
2. Check failed items in data/2025-01-15/
3. Re-run failures with: /batch-pipeline process --batch-id 2025-01-15
Directory Structure
data/
└── 2025-01-15/ # Batch directory
├── item-0001/
│ ├── raw.json # Stage 1 output
│ ├── prompt.md # Stage 2 output
│ ├── response.md # Stage 3 output
│ └── parsed.json # Stage 4 output
├── item-0002/
│ └── ...
└── all_results.json # Aggregated results
output/
└── 2025-01-15/
└── index.html # Stage 5 output
Error Handling
Retry Logic
- Rate limits: Automatic retry with exponential backoff
- Timeouts: Retry up to 3 times
- Parse errors: Logged but don't block pipeline
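The backoff policy above can be sketched as a small wrapper (a generic pattern, assuming jittered exponential delays; the helper name `with_retry` is illustrative):

```python
import random
import time

def with_retry(fn, max_attempts=3, base_delay=1.0):
    """Call `fn`, retrying on failure with exponential backoff + jitter.

    Delay doubles each attempt (1s, 2s, 4s, ... by default); the final
    failure is re-raised so the caller can log the item as failed.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)
```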
Recovery
- Each stage is checkpointed
- Re-run from any stage
- Clean specific stages to force re-processing
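One reading of `--clean-stage` ("stage to clean from") is that cleaning removes the named stage's output and everything downstream, so re-runs stay consistent. A sketch under that assumption, using the per-item files from Directory Structure:

```python
from pathlib import Path

STAGE_ORDER = ["raw.json", "prompt.md", "response.md", "parsed.json"]

def clean_from(batch_dir: str, stage_output: str) -> int:
    """Delete the given stage's output and all downstream outputs,
    for every item, forcing those stages to re-run. Returns the number
    of files removed."""
    start = STAGE_ORDER.index(stage_output)
    removed = 0
    for item in Path(batch_dir).iterdir():
        if not item.is_dir():
            continue
        for name in STAGE_ORDER[start:]:
            f = item / name
            if f.exists():
                f.unlink()
                removed += 1
    return removed
```

Cleaning downstream as well prevents stale `parsed.json` files from surviving a re-processed `response.md`.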
Related Components
Invokes Skills
project-development: Pipeline patterns and templates
Related Agents
multi-agent-coordinator: Can orchestrate batch processing
llm-judge: Can evaluate batch outputs
Related Workflows
batch-processing-workflow.md: Complete workflow documentation
Best Practices
Cost Management
- Always run `estimate` before `process`
- Use `--limit` for testing new prompts
- Monitor progress for unexpected costs
Reliability
- Use checkpointing (stages are automatic checkpoints)
- Review parse errors before re-running
- Keep batch sizes manageable (100-1000 items)
Performance
- Tune `--workers` based on rate limits
- Use the appropriate model for task complexity
- Optimize prompts in the `prepare` stage
Success Output
When batch pipeline completes:
✅ COMMAND COMPLETE: /batch-pipeline
Stage: <stage executed>
Batch ID: <batch-id>
Items: N processed
Success Rate: X%
Cost: $Y.YY
Output: <output path>
Completion Checklist
Before marking complete:
- Stage executed successfully
- All items processed
- Results saved to batch directory
- Metrics reported
- Next steps provided
Failure Indicators
This command has FAILED if:
- ❌ Invalid batch-id
- ❌ Stage execution error
- ❌ No progress reported
- ❌ Missing output files
When NOT to Use
Do NOT use when:
- Single item processing (use direct LLM call)
- No batch structure needed
- Testing prompts (use --limit first)
Anti-Patterns (Avoid)
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Skip estimate | Unexpected costs | Always estimate first |
| Too many workers | Rate limiting | Tune based on API limits |
| No checkpoints | Lost progress | Use staged approach |
Principles
This command embodies:
- #3 Complete Execution - Full pipeline stages
- #4 Separation of Concerns - Staged processing
- #5 Confirm Destructive - Cost confirmation
Full Standard: CODITECT-STANDARD-AUTOMATION.md