/batch-pipeline

Manage staged LLM batch processing pipelines with checkpointing, cost estimation, and parallel execution.

System Prompt

EXECUTION DIRECTIVE: When the user invokes this command, you MUST:

  1. Verify batch-id is provided
  2. Confirm stage is valid
  3. For 'process' stage: Show cost estimate and request confirmation
  4. Execute requested stage with progress reporting
  5. Report results with metrics

DO NOT:

  • Skip confirmation for process stage (involves API costs)
  • Proceed with invalid batch-id
  • Silently fail on errors

Usage

/batch-pipeline <stage> --batch-id <id> [options]

Pipeline Stages

┌────────────────────────────────────────────────────────────────┐
│                         BATCH PIPELINE                         │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│   ACQUIRE →  PREPARE  →   PROCESS   →    PARSE   →   RENDER    │
│      │          │            │             │            │      │
│      ▼          ▼            ▼             ▼            ▼      │
│  raw.json   prompt.md   response.md   parsed.json   index.html │
│                                                                │
│   [Cheap]    [Cheap]    [Expensive]     [Cheap]      [Cheap]   │
│   [Fast]     [Fast]       [Slow]        [Fast]       [Fast]    │
│   [Safe]     [Safe]      [Non-det]      [Det]        [Det]     │
│                                                                │
└────────────────────────────────────────────────────────────────┘
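The stage order above also drives --clean-stage: removing a stage's outputs invalidates every stage downstream of it. A minimal sketch of that ordering (the function and constant names are illustrative, not taken from the actual implementation):

```python
# Ordered pipeline stages and the artifact each one writes per item.
# Illustrative sketch only; names are not from the real implementation.
STAGES = ["acquire", "prepare", "process", "parse", "render"]
ARTIFACTS = {
    "acquire": "raw.json",
    "prepare": "prompt.md",
    "process": "response.md",
    "parse": "parsed.json",
    "render": "index.html",
}

def stages_from(stage: str) -> list[str]:
    """Stages from `stage` onward -- what cleaning that stage invalidates."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    return STAGES[STAGES.index(stage):]
```

For example, cleaning from process also invalidates parse and render, since both are derived from response.md.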

Stage 1: ACQUIRE

Fetch raw data from sources.

Input: Data sources (APIs, databases, files)
Output: {batch_dir}/{item_id}/raw.json
Cost: Low (data transfer only)
Deterministic: Yes

Stage 2: PREPARE

Generate prompts from raw data.

Input: raw.json files
Output: {batch_dir}/{item_id}/prompt.md
Cost: Low (local processing)
Deterministic: Yes
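The prepare stage is just deterministic templating over raw.json. A sketch, with a hypothetical one-line template standing in for whatever the real prepare stage uses:

```python
import json

# Hypothetical template -- the real one lives in the prepare stage's
# own configuration, not in this sketch.
TEMPLATE = "Summarize the following record:\n\n{record}\n"

def prepare_prompt(raw_json: str) -> str:
    """Turn a raw.json payload into the prompt.md body.

    Deterministic: the same raw input always yields the same prompt,
    which is what makes this stage safe to re-run."""
    record = json.loads(raw_json)
    return TEMPLATE.format(record=json.dumps(record, indent=2, sort_keys=True))
```

Sorting keys before serializing keeps the output byte-identical across runs even when the source JSON's key order varies.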

Stage 3: PROCESS

Execute LLM calls.

Input: prompt.md files
Output: {batch_dir}/{item_id}/response.md
Cost: HIGH (API calls)
Deterministic: NO
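The process stage fans prompts out to a bounded worker pool (the --workers option). A sketch with a stub in place of the real, billed API call:

```python
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    """Stub standing in for the real (non-deterministic, billed) API call."""
    return f"response to: {prompt}"

def process_batch(prompts: dict[str, str], workers: int = 5) -> dict[str, str]:
    """Run the expensive stage with a bounded worker pool.

    `workers` corresponds to --workers; keep it below your rate limit."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {item_id: pool.submit(call_llm, p)
                   for item_id, p in prompts.items()}
        return {item_id: f.result() for item_id, f in futures.items()}
```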

Stage 4: PARSE

Extract structured data from responses.

Input: response.md files
Output: {batch_dir}/{item_id}/parsed.json
Cost: Low (local processing)
Deterministic: Yes
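One common parsing approach -- assumed here for illustration, not mandated by the pipeline -- is to have the process-stage prompt request a fenced JSON block and extract it:

```python
import json
import re

def parse_response(response_md: str) -> dict:
    """Extract the first fenced JSON block from a response.md body.

    Assumes the prompt asked the model for a fenced json block; if the
    model did not comply, the error is logged upstream rather than
    blocking the pipeline."""
    match = re.search(r"```json\s*(.*?)```", response_md, re.DOTALL)
    if match is None:
        raise ValueError("no JSON block found in response")
    return json.loads(match.group(1))
```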

Stage 5: RENDER

Generate final outputs.

Input: parsed.json files
Output: {output_dir}/index.html
Cost: Low (local processing)
Deterministic: Yes
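A sketch of the render step as a pure function over all parsed items; the markup is illustrative, since the real stage defines its own output format:

```python
import html

def render_index(parsed_items: dict[str, dict]) -> str:
    """Assemble an index.html body from every item's parsed data.

    Deterministic: a pure function of the parsed inputs, with items
    sorted so the output is stable across runs."""
    rows = "".join(
        f"<li>{html.escape(item_id)}: {html.escape(str(data))}</li>"
        for item_id, data in sorted(parsed_items.items())
    )
    return f"<html><body><ul>{rows}</ul></body></html>"
```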

Options

Option         Type    Default   Description
--batch-id     string  required  Unique batch identifier
--limit        number  null      Limit items (for testing)
--workers      number  5         Parallel workers for process stage
--model        string  sonnet    Model for processing
--clean-stage  string  null      Stage to clean from

Stage Commands

Acquire Data

/batch-pipeline acquire --batch-id 2025-01-15 --limit 100

Prepare Prompts

/batch-pipeline prepare --batch-id 2025-01-15

Process (LLM Calls)

/batch-pipeline process --batch-id 2025-01-15 --workers 10 --model sonnet

Parse Results

/batch-pipeline parse --batch-id 2025-01-15

Render Output

/batch-pipeline render --batch-id 2025-01-15

Run All Stages

/batch-pipeline all --batch-id 2025-01-15

Estimate Costs

/batch-pipeline estimate --batch-id 2025-01-15

Clean and Re-run

/batch-pipeline clean --batch-id 2025-01-15 --clean-stage process

Output Examples

Estimate Output

COST ESTIMATE: 2025-01-15
=========================
Items: 100
Estimated input tokens: 250,000
Estimated output tokens: 50,000

Cost Breakdown:
├── Input tokens: $2.50 (@ $10/MTok)
├── Output tokens: $1.50 (@ $30/MTok)
└── Total: $4.00

Note: Add 20-30% buffer for retries.

Proceed with processing? [y/N]
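The breakdown above is plain per-million-token arithmetic. A sketch using the example's $10/$30-per-MTok rates (check current pricing before relying on these defaults):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float = 10.0, output_rate: float = 30.0) -> dict:
    """Cost in dollars at per-million-token (MTok) rates.

    Default rates match the example above, not current pricing."""
    input_cost = input_tokens / 1_000_000 * input_rate
    output_cost = output_tokens / 1_000_000 * output_rate
    return {
        "input": round(input_cost, 2),
        "output": round(output_cost, 2),
        "total": round(input_cost + output_cost, 2),
    }
```

For the example batch: 250,000 input tokens at $10/MTok is $2.50, 50,000 output tokens at $30/MTok is $1.50, totaling $4.00 before the retry buffer.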

Progress Output

PROCESSING: 2025-01-15
======================
Workers: 10
Model: claude-sonnet

Progress:
[████████████░░░░░░░░] 60/100 (60%)

Completed:
├── item-0001: Done (1,245 chars)
├── item-0002: Done (982 chars)
├── item-0003: Error - Rate limited, queued for retry
...

Metrics:
├── Success rate: 98%
├── Avg response time: 2.3s
├── Tokens used: 180,000 / 250,000
└── Estimated remaining: 4 minutes
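The bracketed progress bar is straightforward to reproduce; a sketch:

```python
def progress_bar(done: int, total: int, width: int = 20) -> str:
    """Render a fixed-width progress bar like the one shown above."""
    filled = done * width // total
    bar = "█" * filled + "░" * (width - filled)
    return f"[{bar}] {done}/{total} ({done * 100 // total}%)"
```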

Summary Output

BATCH COMPLETE: 2025-01-15
==========================
Total items: 100
Successful: 98
Failed: 2
Parse errors: 3

Output: output/2025-01-15/index.html

Cost Summary:
├── Actual tokens: 245,000
├── Actual cost: $3.85
└── Under estimate by: 4%

Next Steps:
1. Review output at output/2025-01-15/index.html
2. Check failed items in data/2025-01-15/
3. Re-run failures with: /batch-pipeline process --batch-id 2025-01-15

Directory Structure

data/
└── 2025-01-15/                 # Batch directory
    ├── item-0001/
    │   ├── raw.json            # Stage 1 output
    │   ├── prompt.md           # Stage 2 output
    │   ├── response.md         # Stage 3 output
    │   └── parsed.json         # Stage 4 output
    ├── item-0002/
    │   └── ...
    └── all_results.json        # Aggregated results

output/
└── 2025-01-15/
    └── index.html              # Stage 5 output
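The per-item layout above can be captured in a small helper; a sketch (the function name is illustrative):

```python
from pathlib import Path

def item_paths(batch_id: str, item_id: str) -> dict[str, Path]:
    """Artifact paths for one item under data/{batch_id}/{item_id}/,
    matching the tree above."""
    base = Path("data") / batch_id / item_id
    return {
        "raw": base / "raw.json",
        "prompt": base / "prompt.md",
        "response": base / "response.md",
        "parsed": base / "parsed.json",
    }
```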

Error Handling

Retry Logic

  • Rate limits: Automatic retry with exponential backoff
  • Timeouts: Retry up to 3 times
  • Parse errors: Logged but do not block the pipeline
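A sketch of the rate-limit retry policy with exponential backoff; the exception type and sleep function are injectable stand-ins so the policy can be tested without a live API:

```python
import time

class RateLimited(Exception):
    """Stand-in for the API client's rate-limit error."""

def with_retries(fn, max_attempts: int = 3, base_delay: float = 1.0,
                 sleep=time.sleep):
    """Retry `fn` on rate limits, doubling the delay each time (1s, 2s, 4s...).

    `sleep` is injectable so the backoff schedule can be tested
    without actually waiting."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```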

Recovery

  • Each stage is checkpointed
  • Re-run from any stage
  • Clean specific stages to force re-processing
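Because each stage writes one artifact per item, the checkpoint check reduces to 'does the output file already exist?'. A sketch with the filesystem abstracted behind a predicate:

```python
def needs_run(item_ids, artifact_exists) -> list:
    """Items whose stage output is missing -- the checkpoint check.

    `artifact_exists(item_id)` abstracts the filesystem, so re-running
    a stage only touches items that never completed."""
    return [i for i in item_ids if not artifact_exists(i)]
```

Re-running a stage therefore skips finished items automatically, while cleaning a stage deletes its artifacts so every item is picked up again.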

Invokes Skills

  • project-development: Pipeline patterns and templates
  • multi-agent-coordinator: Can orchestrate batch processing
  • llm-judge: Can evaluate batch outputs
  • batch-processing-workflow.md: Complete workflow documentation

Best Practices

Cost Management

  1. Always run estimate before process
  2. Use --limit for testing new prompts
  3. Monitor progress for unexpected costs

Reliability

  1. Use checkpointing (stages are automatic checkpoints)
  2. Review parse errors before re-running
  3. Keep batch sizes manageable (100-1000 items)

Performance

  1. Tune --workers based on rate limits
  2. Use appropriate model for complexity
  3. Optimize prompts in prepare stage

Success Output

When batch pipeline completes:

✅ COMMAND COMPLETE: /batch-pipeline
Stage: <stage executed>
Batch ID: <batch-id>
Items: N processed
Success Rate: X%
Cost: $Y.YY
Output: <output path>

Completion Checklist

Before marking complete:

  • Stage executed successfully
  • All items processed
  • Results saved to batch directory
  • Metrics reported
  • Next steps provided

Failure Indicators

This command has FAILED if:

  • ❌ Invalid batch-id
  • ❌ Stage execution error
  • ❌ No progress reported
  • ❌ Missing output files

When NOT to Use

Do NOT use when:

  • Single item processing (use direct LLM call)
  • No batch structure needed
  • Testing prompts (use --limit first)

Anti-Patterns (Avoid)

Anti-Pattern      Problem           Solution
Skip estimate     Unexpected costs  Always estimate first
Too many workers  Rate limiting     Tune based on API limits
No checkpoints    Lost progress     Use staged approach

Principles

This command embodies:

  • #3 Complete Execution - Full pipeline stages
  • #4 Separation of Concerns - Staged processing
  • #5 Confirm Destructive - Cost confirmation

Full Standard: CODITECT-STANDARD-AUTOMATION.md