Efficiency Optimization Skill

Optimize AI agent workflows for maximum efficiency, minimal token usage, and faster task completion while maintaining quality.

Purpose

This skill ensures AI agent workflows are optimized for efficiency across multiple dimensions: token usage, execution time, context utilization, and resource consumption. It enables sustainable, cost-effective agentic operations.

When to Use

  • Planning multi-task workflows
  • Designing context handoffs between sessions
  • Optimizing agent selection and routing
  • Reducing token consumption
  • Improving task execution speed

Baseline Metrics Template

Before optimizing, establish baselines using this template:

# efficiency-baseline.yaml
project: "Your Project Name"
date: "YYYY-MM-DD"
session_id: "session-xxx"

# Token Metrics (measure before optimization)
tokens:
  context_loaded: 0     # Tokens loaded at session start
  execution_used: 0     # Tokens used during execution
  output_generated: 0   # Tokens in final output
  total: 0              # Sum of above
  lines_generated: 0    # Lines of code/docs generated
  token_per_line: 0.0   # Calculate: total / lines_generated
  target: 1.0           # Target: <1.0 tokens/line

# Time Metrics
time:
  sequential_estimate_min: 0    # If run sequentially
  actual_execution_min: 0       # Actual time taken
  parallelization_savings: 0    # Calculate: 1 - (actual / sequential)
  target_savings: 0.5           # Target: >50% time saved

# Context Metrics
context:
  available_tokens: 200000   # Context window size
  used_tokens: 0             # Tokens actually used
  utilization: 0.0           # Calculate: used / available
  target_utilization: 0.8    # Target: >80%

# Cache Metrics
cache:
  total_lookups: 0       # Times cache was queried
  cache_hits: 0          # Successful cache retrievals
  hit_rate: 0.0          # Calculate: hits / lookups
  target_hit_rate: 0.4   # Target: >40%

# Agent Metrics
agents:
  agents_used: []              # List of agents invoked
  total_active_time_min: 0     # Time agents were working
  total_elapsed_time_min: 0    # Wall clock time
  utilization: 0.0             # Calculate: active / elapsed
  target_utilization: 0.7      # Target: >70%

# Error Metrics
errors:
  total_attempts: 0         # Total task attempts
  failed_attempts: 0        # Failed attempts
  error_rate: 0.0           # Calculate: failed / total
  target_error_rate: 0.05   # Target: <5%

Quick Baseline Check:

| Metric | Your Value | Target | Status |
|--------|------------|--------|--------|
| Token/Line Ratio | _____ | <1.0 | |
| Time Savings | _____% | >50% | |
| Context Utilization | _____% | >80% | |
| Cache Hit Rate | _____% | >40% | |
| Agent Utilization | _____% | >70% | |
| Error Rate | _____% | <5% | |
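The calculated fields in the baseline template follow directly from the raw counts. A minimal sketch, assuming an input dict mirroring `efficiency-baseline.yaml` (the sample numbers are illustrative only):

```python
# Sketch: derive the calculated baseline fields from raw counts.
# The input dict mirrors efficiency-baseline.yaml; values are examples.

def derive_metrics(baseline: dict) -> dict:
    tokens = baseline["tokens"]
    time = baseline["time"]
    context = baseline["context"]
    return {
        "token_per_line": tokens["total"] / max(tokens["lines_generated"], 1),
        "parallelization_savings": 1 - time["actual_execution_min"]
                                       / max(time["sequential_estimate_min"], 1),
        "context_utilization": context["used_tokens"] / context["available_tokens"],
    }

metrics = derive_metrics({
    "tokens": {"total": 900, "lines_generated": 1000},
    "time": {"actual_execution_min": 34, "sequential_estimate_min": 75},
    "context": {"used_tokens": 160_000, "available_tokens": 200_000},
})
```

Each derived value can then be compared against its `target_*` field to fill in the Status column above.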

Efficiency Dimensions

1. Token Efficiency

Minimize token consumption while maintaining quality:

**Token Budget Analysis**

Task: Create E2E test suite for Track E.1
├── Context Loading: 2,500 tokens
│   ├── PILOT plan (relevant sections): 800 tokens
│   ├── Previous session context: 1,200 tokens
│   └── Pattern examples: 500 tokens
├── Execution: 15,000 tokens
│   ├── E.1.1 (10 tests): 3,000 tokens
│   ├── E.1.2 (15 tests): 3,500 tokens
│   ├── E.1.3 (14 tests): 3,200 tokens
│   ├── E.1.4 (15 tests): 2,800 tokens
│   └── E.1.5 (16 tests): 2,500 tokens
├── Output: 8,000 tokens
│   └── 2,150 lines of test code
└── Total: 25,500 tokens

Efficiency Score: 0.84 tokens/line (target: <1.0)

2. Time Efficiency

Minimize execution time through parallelization:

Sequential Execution (Baseline)
═══════════════════════════════════════════════════════════════
E.1.1 ████████████████████ 15 min
E.1.2 ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 12 min
E.1.3 ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 18 min
E.1.4 ░░░░░░░░░░░░░░░░░░░░ 14 min
E.1.5 ████████████████████ 16 min
Total: 75 minutes

Parallel Execution (Optimized)
═══════════════════════════════════════════════════════════════
E.1.1 ████████████████████ }
E.1.2 ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ } 18 min (parallel batch 1)
E.1.3 ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ }
E.1.4 ░░░░░░░░░░░░░░░░░░░░ } 16 min (parallel batch 2)
E.1.5 ████████████████████ }
Total: 34 minutes (55% reduction)
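The parallel schedule above follows from a simple rule: a batch takes as long as its longest task, and total wall-clock time is the sum of the batch maxima. A quick sketch using the example durations:

```python
# Sequential vs. parallel wall-clock time for the example schedule above.
durations = {"E.1.1": 15, "E.1.2": 12, "E.1.3": 18, "E.1.4": 14, "E.1.5": 16}
batches = [["E.1.1", "E.1.2", "E.1.3"], ["E.1.4", "E.1.5"]]

sequential = sum(durations.values())                           # 75 min
parallel = sum(max(durations[t] for t in b) for b in batches)  # 18 + 16 = 34 min
savings = 1 - parallel / sequential                            # ~55% reduction
```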

3. Context Efficiency

Maximize context window utilization:

# Context Window Optimization
def optimize_context(task: str, available_tokens: int) -> dict:
    """Optimize context selection for maximum relevance."""

    # Priority 1: Essential context (always include)
    essential = {
        "current_task": 200,   # Task description
        "plan_section": 500,   # Relevant plan section
        "constraints": 300     # Quality requirements
    }

    # Priority 2: Supporting context (include if room)
    supporting = {
        "patterns": 500,          # Similar implementations
        "session_history": 800,   # Recent decisions
        "error_solutions": 400    # Known solutions
    }

    # Priority 3: Background context (nice to have)
    background = {
        "full_plan": 2000,      # Complete plan
        "all_history": 3000,    # Full session history
        "codebase_map": 1500    # File structure
    }

    # Fit within budget
    context = {}
    remaining = available_tokens

    for category in [essential, supporting, background]:
        for key, tokens in category.items():
            if tokens <= remaining:
                context[key] = tokens
                remaining -= tokens

    return {
        "selected_context": context,
        "total_tokens": available_tokens - remaining,
        "utilization": (available_tokens - remaining) / available_tokens
    }

4. Resource Efficiency

Optimize agent and compute resource usage:

**Resource Utilization Report**

Agent Efficiency:
├── testing-specialist: 92% utilization
│   └─ 5 tasks, 1h 15m active, 0 idle
├── codi-qa-specialist: 0% (not used)
│   └─ Correctly avoided - testing-specialist better match
└── general-purpose: 8% utilization
    └─ Fallback for 2 simple tasks

Compute Efficiency:
├── API Calls: 23 (optimized from estimated 45)
├── Token Usage: 25,500 (budget: 50,000)
├── Cache Hits: 12/23 (52% reuse)
└── Error Retries: 0 (no wasted calls)

Optimization Strategies

Strategy 1: Batch Processing

Group related tasks for efficiency:

MAX_BATCH_TOKENS = 8_000  # Example per-batch token budget; tune per model

def batch_tasks(tasks: list[dict]) -> list[list[dict]]:
    """Group tasks into efficient batches."""
    batches = []
    current_batch = []
    current_tokens = 0

    for task in sorted(tasks, key=lambda t: t["estimated_tokens"]):
        if current_tokens + task["estimated_tokens"] <= MAX_BATCH_TOKENS:
            current_batch.append(task)
            current_tokens += task["estimated_tokens"]
        else:
            if current_batch:  # Avoid appending an empty batch
                batches.append(current_batch)
            current_batch = [task]
            current_tokens = task["estimated_tokens"]

    if current_batch:
        batches.append(current_batch)

    return batches

Strategy 2: Context Compression

Compress context for session handoffs:

# Full Context (2,500 tokens)
Session completed Track E.1 Integration Testing with 5 E2E test files...
[Full 50-line description]

# Compressed Context (<500 tokens)
**Track E.1 Complete** (2026-01-02)
- 5 E2E tests: signup, cross-platform, webhooks, offline, activation
- 70 test methods, 2,150 lines
- Next: Track E.2 (Performance Testing)
- Patterns: fixtures (api_client, mock_stripe, mock_redis)
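A compressor for this kind of handoff can be sketched as a small formatting function. The field names below are hypothetical, not a fixed schema:

```python
# Sketch: compress a session record into a compact handoff summary.
# The session field names here are illustrative assumptions.

def compress_handoff(session: dict) -> str:
    lines = [
        f"**{session['track']} Complete** ({session['date']})",
        f"- {session['summary']}",
        f"- Next: {session['next']}",
        f"- Patterns: {', '.join(session['patterns'])}",
    ]
    return "\n".join(lines)

summary = compress_handoff({
    "track": "Track E.1",
    "date": "2026-01-02",
    "summary": "70 test methods, 2,150 lines",
    "next": "Track E.2 (Performance Testing)",
    "patterns": ["api_client", "mock_stripe", "mock_redis"],
})
```

Keeping the output to a handful of structured lines is what holds the handoff under the 500-token budget.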

Strategy 3: Incremental Processing

Process large tasks incrementally:

async def process_incrementally(large_task: dict) -> list[dict]:
    """Process large task in efficient increments."""

    subtasks = decompose_task(large_task)
    results = []

    for i, subtask in enumerate(subtasks):
        # Process subtask
        result = await execute_subtask(subtask)
        results.append(result)

        # Update progress (enables early termination if needed)
        update_progress(large_task["id"], (i + 1) / len(subtasks))

        # Check for early success
        if can_terminate_early(results, large_task["criteria"]):
            break

    return results

Strategy 4: Smart Caching

Cache reusable results:

from datetime import datetime, timedelta, timezone

class EfficiencyCache:
    def __init__(self):
        self.pattern_cache = {}   # Reusable patterns
        self.decision_cache = {}  # Previous decisions
        self.output_cache = {}    # Generated outputs
        self.lookups = 0          # Total cache queries
        self.hits = 0             # Successful retrievals

    def cache_pattern(self, pattern_id: str, pattern: dict):
        """Cache reusable pattern for future tasks."""
        self.pattern_cache[pattern_id] = {
            "pattern": pattern,
            "uses": 0,
            "created": datetime.now(timezone.utc).isoformat()
        }

    def cache_decision(self, decision_key: str, decision: dict):
        """Cache a decision for reuse by later tasks."""
        self.decision_cache[decision_key] = {
            "decision": decision,
            "uses": 0,
            "created": datetime.now(timezone.utc).isoformat()
        }

    def get_cached_decision(self, decision_key: str) -> dict | None:
        """Retrieve cached decision if still valid."""
        self.lookups += 1
        cached = self.decision_cache.get(decision_key)
        if cached and not self._is_stale(cached):
            cached["uses"] += 1
            self.hits += 1
            return cached["decision"]
        return None

    def get_efficiency_report(self) -> dict:
        """Report cache efficiency."""
        return {
            "patterns_cached": len(self.pattern_cache),
            "pattern_reuses": sum(p["uses"] for p in self.pattern_cache.values()),
            "decisions_cached": len(self.decision_cache),
            "cache_hit_rate": self._calculate_hit_rate()
        }

    def _is_stale(self, cached: dict, max_age_hours: int = 24) -> bool:
        """Treat entries older than max_age_hours as stale."""
        created = datetime.fromisoformat(cached["created"])
        return datetime.now(timezone.utc) - created > timedelta(hours=max_age_hours)

    def _calculate_hit_rate(self) -> float:
        return self.hits / self.lookups if self.lookups else 0.0

Efficiency Metrics

| Metric | Target | Measurement |
|--------|--------|-------------|
| Token/Line Ratio | <1.0 | Tokens used / lines generated |
| Context Utilization | >80% | Relevant tokens / total context |
| Parallel Efficiency | >50% | Time saved via parallelization |
| Cache Hit Rate | >40% | Cached reuses / total lookups |
| Agent Utilization | >70% | Active time / total time |
| Error Rate | <5% | Failed attempts / total attempts |

Efficiency Anti-Patterns

| Anti-Pattern | Waste Type | Solution |
|--------------|------------|----------|
| Loading full context | Token waste | Selective context loading |
| Sequential when parallel possible | Time waste | Identify independent tasks |
| Re-computing decisions | Compute waste | Cache decisions |
| Over-engineering simple tasks | Effort waste | Match complexity to task |
| Verbose outputs | Token waste | Concise, structured output |

Session Handoff Optimization

Minimize context for efficient handoffs:

# Efficient Session Handoff (<500 tokens)

## Session Summary
- **Completed:** Track E.1 (5 tasks, 70 tests, 2,150 lines)
- **Duration:** 1h 15m
- **Token Usage:** 25,500 / 50,000 budget

## Key Decisions
1. Use pytest fixtures for all test setup
2. Mock stripe/redis for isolation
3. Separate classes per test domain

## Next Session
- **Priority:** Track E.2 Performance Testing
- **Dependencies:** None (E.1 complete)
- **Estimated:** 45 min, ~15,000 tokens

## Reusable Patterns
- `conftest.py` fixture pattern
- Mock decorator usage
- Parametrized test structure
Related Resources

  • Commands: /context-snapshot, /pilot --dashboard
  • Skills: process-transparency, task-accountability
  • Hooks: session-handoff
  • Scripts: context_snapshot.py
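Whether a handoff summary actually fits the 500-token budget can be checked with a rough heuristic. This sketch assumes the common approximation of ~4 characters per token; real tokenizer counts vary:

```python
# Rough check that a handoff summary stays under the 500-token budget.
# Assumes ~4 characters per token, a common heuristic; real tokenizers vary.

def estimate_tokens(text: str) -> int:
    return len(text) // 4

def within_handoff_budget(text: str, budget: int = 500) -> bool:
    return estimate_tokens(text) <= budget

handoff = "**Track E.1 Complete** - 70 tests, 2,150 lines. Next: Track E.2."
```

For accurate accounting, replace `estimate_tokens` with the actual tokenizer of the model in use.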

Success Output

When this skill completes successfully, output:

✅ SKILL COMPLETE: efficiency-optimization

Optimization Results:
- [x] Token efficiency achieved: [ratio] tokens/line (target: <1.0)
- [x] Time efficiency: [time-saved]% reduction via parallelization
- [x] Context utilization: [percentage]% (target: >80%)
- [x] Cache hit rate: [percentage]% (target: >40%)
- [x] Agent utilization: [percentage]% (target: >70%)

Metrics:
- Total tokens used: [used] / [budget]
- Execution time: [actual] (estimated: [estimate])
- Tasks parallelized: [count]
- Context compressed: [original-size] → [compressed-size]

Outputs:
- Efficiency report: [path]
- Optimization recommendations: [path]
- Session handoff summary: [path]

Completion Checklist

Before marking this skill as complete, verify:

  • Token/line ratio below 1.0 for generated content
  • Context utilization above 80%
  • Parallel execution achieved >50% time savings (where applicable)
  • Cache hit rate above 40% for reusable patterns
  • Agent utilization above 70% (no significant idle time)
  • Error rate below 5%
  • All efficiency metrics calculated and documented
  • Session handoff optimized (<500 tokens)

Failure Indicators

This skill has FAILED if:

  • ❌ Token/line ratio exceeds 1.5 (inefficient generation)
  • ❌ Context utilization below 50% (wasted context window)
  • ❌ Sequential execution when parallelization possible (time waste)
  • ❌ Zero cache reuse on repeated patterns
  • ❌ Agent utilization below 30% (poor task allocation)
  • ❌ Error rate exceeds 10% (quality issues)
  • ❌ Session handoff exceeds 2000 tokens (poor compression)
  • ❌ Actual task time exceeds estimate by >100% (planning failure)

When NOT to Use

Do NOT use this skill when:

  • Simple, one-off tasks with no reuse potential
    • Solution: Execute directly without optimization overhead
  • Tasks already optimized to target metrics
    • Solution: Focus on new optimization opportunities
  • Optimization overhead exceeds efficiency gains
    • Solution: Use simpler approach for small tasks
  • Real-time requirements preclude optimization analysis
    • Solution: Optimize in separate session
  • Task complexity unknown (cannot estimate tokens/time)
    • Solution: Run once first, then optimize on repeat
  • Quality must not be compromised for efficiency
    • Solution: Prioritize quality, optimize separately
  • Learning/exploration tasks (efficiency not primary goal)
    • Solution: Focus on understanding first, optimize later

Anti-Patterns (Avoid)

| Anti-Pattern | Problem | Solution |
|--------------|---------|----------|
| Loading full context every time | Token waste, slow startup | Selective context loading by priority |
| Sequential when parallel possible | Time waste | Identify independent tasks, batch process |
| Re-computing cached decisions | Compute waste | Implement decision cache with validation |
| Over-engineering simple tasks | Effort waste, complexity | Match optimization effort to task value |
| Verbose outputs for structured data | Token waste | Use concise, structured formats (JSON, tables) |
| No session handoff planning | Next session reloads full context | Compress learnings to <500 tokens |
| Ignoring cache opportunities | Repeated patterns recomputed | Track pattern usage, cache frequently used |
| Premature optimization | Complexity before understanding | Measure first, optimize second |
| Optimizing non-bottlenecks | Wasted effort | Profile to find actual bottlenecks |
| Sacrificing clarity for tokens | Comprehension loss | Balance efficiency with understandability |

Principles

This skill embodies the following CODITECT principles:

#1 Recycle → Extend → Re-Use → Create

  • Reuse patterns through caching (40%+ hit rate)
  • Extend existing workflows rather than rebuild
  • Create new patterns only when no reusable match

#2 Automation with Minimal Human Intervention

  • Automated efficiency metric calculation
  • Smart context compression for handoffs
  • Automatic batch processing and parallelization
  • Self-optimizing cache management

#3 Separation of Concerns

  • Token efficiency separate from time efficiency
  • Context management isolated from execution
  • Metrics collection independent of task logic

#4 Keep It Simple

  • Optimize only when ROI justifies effort
  • Simplest solution that meets efficiency targets
  • Avoid over-engineering for marginal gains

#5 Eliminate Ambiguity

  • Clear efficiency targets (token/line <1.0, context >80%)
  • Measurable success criteria
  • Explicit optimization vs. quality tradeoffs

#6 Clear, Understandable, Explainable

  • All optimizations documented with metrics
  • Compression maintains comprehension
  • Efficiency reports show impact clearly

#8 No Assumptions

  • Measure baseline before optimizing
  • Validate efficiency gains with metrics
  • Confirm cache validity before reuse

Skill Version: 1.0.0 Created: 2026-01-02 Author: CODITECT Process Refinement