/batch-pipeline

Manage staged LLM batch processing pipelines with checkpointing, cost estimation, and parallel execution.

System Prompt

EXECUTION DIRECTIVE: When the user invokes this command, you MUST:

  1. Verify batch-id is provided
  2. Confirm stage is valid
  3. For 'process' stage: Show cost estimate and request confirmation
  4. Execute requested stage with progress reporting
  5. Report results with metrics

DO NOT:

  • Skip confirmation for process stage (involves API costs)
  • Proceed with invalid batch-id
  • Silently fail on errors

Usage

/batch-pipeline <stage> --batch-id <id> [options]

Pipeline Stages

┌────────────────────────────────────────────────────────────────┐
│                         BATCH PIPELINE                         │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│   ACQUIRE →  PREPARE  →   PROCESS   →    PARSE   →   RENDER    │
│      │          │            │             │            │      │
│      ▼          ▼            ▼             ▼            ▼      │
│  raw.json   prompt.md   response.md   parsed.json   index.html │
│                                                                │
│   [Cheap]    [Cheap]    [Expensive]     [Cheap]      [Cheap]   │
│   [Fast]     [Fast]       [Slow]        [Fast]       [Fast]    │
│   [Safe]     [Safe]      [Non-det]      [Det]        [Det]     │
│                                                                │
└────────────────────────────────────────────────────────────────┘
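The stage order above also drives --clean-stage: removing a stage's outputs invalidates every stage downstream of it. A minimal sketch of that ordering (the function and constant names are illustrative, not taken from the actual implementation):

```python
# Ordered pipeline stages and the artifact each one writes per item.
# Illustrative sketch only; names are not from the real implementation.
STAGES = ["acquire", "prepare", "process", "parse", "render"]
ARTIFACTS = {
    "acquire": "raw.json",
    "prepare": "prompt.md",
    "process": "response.md",
    "parse": "parsed.json",
    "render": "index.html",
}

def stages_from(stage: str) -> list[str]:
    """Stages from `stage` onward -- what cleaning that stage invalidates."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    return STAGES[STAGES.index(stage):]
```

For example, cleaning from process also invalidates parse and render, since both are derived from response.md.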

Stage 1: ACQUIRE

Fetch raw data from sources.

Input: Data sources (APIs, databases, files)
Output: {batch_dir}/{item_id}/raw.json
Cost: Low (data transfer only)
Deterministic: Yes

Stage 2: PREPARE

Generate prompts from raw data.

Input: raw.json files
Output: {batch_dir}/{item_id}/prompt.md
Cost: Low (local processing)
Deterministic: Yes
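The prepare stage is just deterministic templating over raw.json. A sketch, with a hypothetical one-line template standing in for whatever the real prepare stage uses:

```python
import json

# Hypothetical template -- the real one lives in the prepare stage's
# own configuration, not in this sketch.
TEMPLATE = "Summarize the following record:\n\n{record}\n"

def prepare_prompt(raw_json: str) -> str:
    """Turn a raw.json payload into the prompt.md body.

    Deterministic: the same raw input always yields the same prompt,
    which is what makes this stage safe to re-run."""
    record = json.loads(raw_json)
    return TEMPLATE.format(record=json.dumps(record, indent=2, sort_keys=True))
```

Sorting keys before serializing keeps the output byte-identical across runs even when the source JSON's key order varies.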

Stage 3: PROCESS

Execute LLM calls.

Input: prompt.md files
Output: {batch_dir}/{item_id}/response.md
Cost: HIGH (API calls)
Deterministic: NO
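The process stage fans prompts out to a bounded worker pool (the --workers option). A sketch with a stub in place of the real, billed API call:

```python
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    """Stub standing in for the real (non-deterministic, billed) API call."""
    return f"response to: {prompt}"

def process_batch(prompts: dict[str, str], workers: int = 5) -> dict[str, str]:
    """Run the expensive stage with a bounded worker pool.

    `workers` corresponds to --workers; keep it below your rate limit."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {item_id: pool.submit(call_llm, p)
                   for item_id, p in prompts.items()}
        return {item_id: f.result() for item_id, f in futures.items()}
```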

Stage 4: PARSE

Extract structured data from responses.

Input: response.md files
Output: {batch_dir}/{item_id}/parsed.json
Cost: Low (local processing)
Deterministic: Yes
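One common parsing approach -- assumed here for illustration, not mandated by the pipeline -- is to have the process-stage prompt request a fenced JSON block and extract it:

```python
import json
import re

def parse_response(response_md: str) -> dict:
    """Extract the first fenced JSON block from a response.md body.

    Assumes the prompt asked the model for a fenced json block; if the
    model did not comply, the error is logged upstream rather than
    blocking the pipeline."""
    match = re.search(r"```json\s*(.*?)```", response_md, re.DOTALL)
    if match is None:
        raise ValueError("no JSON block found in response")
    return json.loads(match.group(1))
```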

Stage 5: RENDER

Generate final outputs.

Input: parsed.json files
Output: {output_dir}/index.html
Cost: Low (local processing)
Deterministic: Yes
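A sketch of the render step as a pure function over all parsed items; the markup is illustrative, since the real stage defines its own output format:

```python
import html

def render_index(parsed_items: dict[str, dict]) -> str:
    """Assemble an index.html body from every item's parsed data.

    Deterministic: a pure function of the parsed inputs, with items
    sorted so the output is stable across runs."""
    rows = "".join(
        f"<li>{html.escape(item_id)}: {html.escape(str(data))}</li>"
        for item_id, data in sorted(parsed_items.items())
    )
    return f"<html><body><ul>{rows}</ul></body></html>"
```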

Options

Option         Type    Default   Description
--batch-id     string  required  Unique batch identifier
--limit        number  null      Limit items (for testing)
--workers      number  5         Parallel workers for process stage
--model        string  sonnet    Model for processing
--clean-stage  string  null      Stage to clean from

Stage Commands

Acquire Data

/batch-pipeline acquire --batch-id 2025-01-15 --limit 100

Prepare Prompts

/batch-pipeline prepare --batch-id 2025-01-15

Process (LLM Calls)

/batch-pipeline process --batch-id 2025-01-15 --workers 10 --model sonnet

Parse Results

/batch-pipeline parse --batch-id 2025-01-15

Render Output

/batch-pipeline render --batch-id 2025-01-15

Run All Stages

/batch-pipeline all --batch-id 2025-01-15

Estimate Costs

/batch-pipeline estimate --batch-id 2025-01-15

Clean and Re-run

/batch-pipeline clean --batch-id 2025-01-15 --clean-stage process

Output Examples

Estimate Output

COST ESTIMATE: 2025-01-15
=========================
Items: 100
Estimated input tokens: 250,000
Estimated output tokens: 50,000

Cost Breakdown:
├── Input tokens: $2.50 (@ $10/MTok)
├── Output tokens: $1.50 (@ $30/MTok)
└── Total: $4.00

Note: Add 20-30% buffer for retries.

Proceed with processing? [y/N]
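The breakdown above is plain per-million-token arithmetic. A sketch using the example's $10/$30-per-MTok rates (check current pricing before relying on these defaults):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float = 10.0, output_rate: float = 30.0) -> dict:
    """Cost in dollars at per-million-token (MTok) rates.

    Default rates match the example above, not current pricing."""
    input_cost = input_tokens / 1_000_000 * input_rate
    output_cost = output_tokens / 1_000_000 * output_rate
    return {
        "input": round(input_cost, 2),
        "output": round(output_cost, 2),
        "total": round(input_cost + output_cost, 2),
    }
```

For the example batch: 250,000 input tokens at $10/MTok is $2.50, 50,000 output tokens at $30/MTok is $1.50, totaling $4.00 before the retry buffer.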

Progress Output

PROCESSING: 2025-01-15
======================
Workers: 10
Model: claude-sonnet

Progress:
[████████████░░░░░░░░] 60/100 (60%)

Completed:
├── item-0001: Done (1,245 chars)
├── item-0002: Done (982 chars)
├── item-0003: Error - Rate limited, queued for retry
...

Metrics:
├── Success rate: 98%
├── Avg response time: 2.3s
├── Tokens used: 180,000 / 250,000
└── Estimated remaining: 4 minutes
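The bracketed progress bar is straightforward to reproduce; a sketch:

```python
def progress_bar(done: int, total: int, width: int = 20) -> str:
    """Render a fixed-width progress bar like the one shown above."""
    filled = done * width // total
    bar = "█" * filled + "░" * (width - filled)
    return f"[{bar}] {done}/{total} ({done * 100 // total}%)"
```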

Summary Output

BATCH COMPLETE: 2025-01-15
==========================
Total items: 100
Successful: 98
Failed: 2
Parse errors: 3

Output: output/2025-01-15/index.html

Cost Summary:
├── Actual tokens: 245,000
├── Actual cost: $3.85
└── Under estimate by: 4%

Next Steps:
1. Review output at output/2025-01-15/index.html
2. Check failed items in data/2025-01-15/
3. Re-run failures with: /batch-pipeline process --batch-id 2025-01-15

Directory Structure

data/
└── 2025-01-15/                 # Batch directory
    ├── item-0001/
    │   ├── raw.json            # Stage 1 output
    │   ├── prompt.md           # Stage 2 output
    │   ├── response.md         # Stage 3 output
    │   └── parsed.json         # Stage 4 output
    ├── item-0002/
    │   └── ...
    └── all_results.json        # Aggregated results

output/
└── 2025-01-15/
    └── index.html              # Stage 5 output
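The per-item layout above can be captured in a small helper; a sketch (the function name is illustrative):

```python
from pathlib import Path

def item_paths(batch_id: str, item_id: str) -> dict[str, Path]:
    """Artifact paths for one item under data/{batch_id}/{item_id}/,
    matching the tree above."""
    base = Path("data") / batch_id / item_id
    return {
        "raw": base / "raw.json",
        "prompt": base / "prompt.md",
        "response": base / "response.md",
        "parsed": base / "parsed.json",
    }
```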

Error Handling

Retry Logic

  • Rate limits: Automatic retry with exponential backoff
  • Timeouts: Retry up to 3 times
  • Parse errors: Logged but do not block the pipeline
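A sketch of the rate-limit retry policy with exponential backoff; the exception type and sleep function are injectable stand-ins so the policy can be tested without a live API:

```python
import time

class RateLimited(Exception):
    """Stand-in for the API client's rate-limit error."""

def with_retries(fn, max_attempts: int = 3, base_delay: float = 1.0,
                 sleep=time.sleep):
    """Retry `fn` on rate limits, doubling the delay each time (1s, 2s, 4s...).

    `sleep` is injectable so the backoff schedule can be tested
    without actually waiting."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```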

Recovery

  • Each stage is checkpointed
  • Re-run from any stage
  • Clean specific stages to force re-processing
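Because each stage writes one artifact per item, the checkpoint check reduces to 'does the output file already exist?'. A sketch with the filesystem abstracted behind a predicate:

```python
def needs_run(item_ids, artifact_exists) -> list:
    """Items whose stage output is missing -- the checkpoint check.

    `artifact_exists(item_id)` abstracts the filesystem, so re-running
    a stage only touches items that never completed."""
    return [i for i in item_ids if not artifact_exists(i)]
```

Re-running a stage therefore skips finished items automatically, while cleaning a stage deletes its artifacts so every item is picked up again.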

Invokes Skills

  • project-development: Pipeline patterns and templates
  • multi-agent-coordinator: Can orchestrate batch processing
  • llm-judge: Can evaluate batch outputs
  • batch-processing-workflow.md: Complete workflow documentation

Best Practices

Cost Management

  1. Always run estimate before process
  2. Use --limit for testing new prompts
  3. Monitor progress for unexpected costs

Reliability

  1. Use checkpointing (stages are automatic checkpoints)
  2. Review parse errors before re-running
  3. Keep batch sizes manageable (100-1000 items)

Performance

  1. Tune --workers based on rate limits
  2. Use appropriate model for complexity
  3. Optimize prompts in prepare stage

Success Output

When batch pipeline completes:

✅ COMMAND COMPLETE: /batch-pipeline
Stage: <stage executed>
Batch ID: <batch-id>
Items: N processed
Success Rate: X%
Cost: $Y.YY
Output: <output path>

Completion Checklist

Before marking complete:

  • Stage executed successfully
  • All items processed
  • Results saved to batch directory
  • Metrics reported
  • Next steps provided

Failure Indicators

This command has FAILED if:

  • ❌ Invalid batch-id
  • ❌ Stage execution error
  • ❌ No progress reported
  • ❌ Missing output files

When NOT to Use

Do NOT use when:

  • Single item processing (use direct LLM call)
  • No batch structure needed
  • Testing prompts (use --limit first)

Anti-Patterns (Avoid)

Anti-Pattern      Problem           Solution
Skip estimate     Unexpected costs  Always estimate first
Too many workers  Rate limiting     Tune based on API limits
No checkpoints    Lost progress     Use staged approach

Principles

This command embodies:

  • #3 Complete Execution - Full pipeline stages
  • #4 Separation of Concerns - Staged processing
  • #5 Confirm Destructive - Cost confirmation

Full Standard: CODITECT-STANDARD-AUTOMATION.md