Efficiency Optimization Skill
Optimize AI agent workflows for maximum efficiency, minimal token usage, and faster task completion while maintaining quality.
Purpose
This skill ensures AI agent workflows are optimized for efficiency across multiple dimensions: token usage, execution time, context utilization, and resource consumption. It enables sustainable, cost-effective agentic operations.
When to Use
- Planning multi-task workflows
- Designing context handoffs between sessions
- Optimizing agent selection and routing
- Reducing token consumption
- Improving task execution speed
Baseline Metrics Template
Before optimizing, establish baselines using this template:
```yaml
# efficiency-baseline.yaml
project: "Your Project Name"
date: "YYYY-MM-DD"
session_id: "session-xxx"

# Token Metrics (measure before optimization)
tokens:
  context_loaded: 0        # Tokens loaded at session start
  execution_used: 0        # Tokens used during execution
  output_generated: 0      # Tokens in final output
  total: 0                 # Sum of the above
  lines_generated: 0       # Lines of code/docs generated
  token_per_line: 0.0      # Calculate: total / lines_generated
  target: 1.0              # Target: <1.0 tokens/line

# Time Metrics
time:
  sequential_estimate_min: 0   # Estimated time if run sequentially
  actual_execution_min: 0      # Actual time taken
  parallelization_savings: 0   # Calculate: 1 - (actual / sequential)
  target_savings: 0.5          # Target: >50% time saved

# Context Metrics
context:
  available_tokens: 200000     # Context window size
  used_tokens: 0               # Tokens actually used
  utilization: 0.0             # Calculate: used / available
  target_utilization: 0.8      # Target: >80%

# Cache Metrics
cache:
  total_lookups: 0             # Times the cache was queried
  cache_hits: 0                # Successful cache retrievals
  hit_rate: 0.0                # Calculate: hits / lookups
  target_hit_rate: 0.4         # Target: >40%

# Agent Metrics
agents:
  agents_used: []              # List of agents invoked
  total_active_time_min: 0     # Time agents were working
  total_elapsed_time_min: 0    # Wall-clock time
  utilization: 0.0             # Calculate: active / elapsed
  target_utilization: 0.7      # Target: >70%

# Error Metrics
errors:
  total_attempts: 0            # Total task attempts
  failed_attempts: 0           # Failed attempts
  error_rate: 0.0              # Calculate: failed / total
  target_error_rate: 0.05      # Target: <5%
```
**Quick Baseline Check:**
| Metric | Your Value | Target | Status |
|---|---|---|---|
| Token/Line Ratio | _____ | <1.0 | ⬜ |
| Time Savings | _____% | >50% | ⬜ |
| Context Utilization | _____% | >80% | ⬜ |
| Cache Hit Rate | _____% | >40% | ⬜ |
| Agent Utilization | _____% | >70% | ⬜ |
| Error Rate | _____% | <5% | ⬜ |
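The derived fields in the template can be filled in mechanically. A minimal sketch, assuming the raw counts have already been collected (the flattened key names are illustrative, not part of the template):

```python
def derive_baseline(raw: dict) -> dict:
    """Compute the derived efficiency metrics from raw baseline counts.

    Keys are flattened versions of the template fields. Ratios guard
    against division by zero, so a fresh all-zero template is valid input.
    """
    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    return {
        "token_per_line": ratio(raw["total_tokens"], raw["lines_generated"]),
        "time_savings": ratio(
            raw["sequential_min"] - raw["actual_min"], raw["sequential_min"]
        ),
        "context_utilization": ratio(raw["used_tokens"], raw["available_tokens"]),
        "cache_hit_rate": ratio(raw["cache_hits"], raw["total_lookups"]),
        "agent_utilization": ratio(raw["active_min"], raw["elapsed_min"]),
        "error_rate": ratio(raw["failed_attempts"], raw["total_attempts"]),
    }
```

Each derived value can then be compared directly against the targets in the table (e.g. `token_per_line < 1.0`).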
Efficiency Dimensions
1. Token Efficiency
Minimize token consumption while maintaining quality:
**Token Budget Analysis**
```
Task: Create E2E test suite for Track E.1
├── Context Loading: 2,500 tokens
│   ├── PILOT plan (relevant sections): 800 tokens
│   ├── Previous session context: 1,200 tokens
│   └── Pattern examples: 500 tokens
├── Execution: 15,000 tokens
│   ├── E.1.1 (10 tests): 3,000 tokens
│   ├── E.1.2 (15 tests): 3,500 tokens
│   ├── E.1.3 (14 tests): 3,200 tokens
│   ├── E.1.4 (15 tests): 2,800 tokens
│   └── E.1.5 (16 tests): 2,500 tokens
├── Output: 8,000 tokens
│   └── 2,150 lines of test code
└── Total: 25,500 tokens

Efficiency Score: 0.84 tokens/line (target: <1.0)
```
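The arithmetic in the budget tree is easy to sanity-check in code (all numbers taken from the breakdown above; the dictionary names are illustrative):

```python
# Token budget from the breakdown above
context_loading = {"plan": 800, "session_context": 1200, "patterns": 500}
execution = {"E.1.1": 3000, "E.1.2": 3500, "E.1.3": 3200,
             "E.1.4": 2800, "E.1.5": 2500}
output_tokens = 8000

# Each branch should sum to its stated subtotal, and the subtotals
# to the stated total of 25,500 tokens.
total = sum(context_loading.values()) + sum(execution.values()) + output_tokens
```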
2. Time Efficiency
Minimize execution time through parallelization:
```
Sequential Execution (Baseline)
═══════════════════════════════════════════════════════════════
E.1.1 ████████████████████ 15 min
E.1.2 ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 12 min
E.1.3 ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 18 min
E.1.4 ░░░░░░░░░░░░░░░░░░░░ 14 min
E.1.5 ████████████████████ 16 min
Total: 75 minutes

Parallel Execution (Optimized)
═══════════════════════════════════════════════════════════════
E.1.1 ████████████████████ }
E.1.2 ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ } 18 min (parallel batch 1)
E.1.3 ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ }
E.1.4 ░░░░░░░░░░░░░░░░░░░░ } 16 min (parallel batch 2)
E.1.5 ████████████████████ }
Total: 34 minutes (55% reduction)
```
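The chart's arithmetic (each batch runs as long as its slowest task) can be sketched as follows, using the minute estimates above:

```python
def parallel_runtime(batches: list[list[int]]) -> int:
    """Wall-clock minutes when tasks inside each batch run concurrently:
    each batch is bounded by its slowest task."""
    return sum(max(batch) for batch in batches)

def savings(batches: list[list[int]]) -> float:
    """Fraction of time saved versus fully sequential execution."""
    sequential = sum(sum(batch) for batch in batches)
    return 1 - parallel_runtime(batches) / sequential

# Batch 1: E.1.1-E.1.3; Batch 2: E.1.4-E.1.5 (minutes from the chart)
batches = [[15, 12, 18], [14, 16]]
# parallel_runtime(batches) == 34; savings(batches) rounds to the 55% above
```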
3. Context Efficiency
Maximize context window utilization:
```python
# Context Window Optimization
def optimize_context(task: str, available_tokens: int) -> dict:
    """Optimize context selection for maximum relevance."""
    # Priority 1: Essential context (always include)
    essential = {
        "current_task": 200,     # Task description
        "plan_section": 500,     # Relevant plan section
        "constraints": 300,      # Quality requirements
    }
    # Priority 2: Supporting context (include if room)
    supporting = {
        "patterns": 500,         # Similar implementations
        "session_history": 800,  # Recent decisions
        "error_solutions": 400,  # Known solutions
    }
    # Priority 3: Background context (nice to have)
    background = {
        "full_plan": 2000,       # Complete plan
        "all_history": 3000,     # Full session history
        "codebase_map": 1500,    # File structure
    }
    # Greedily fit each priority tier within the remaining budget
    context = {}
    remaining = available_tokens
    for category in [essential, supporting, background]:
        for key, tokens in category.items():
            if tokens <= remaining:
                context[key] = tokens
                remaining -= tokens
    return {
        "selected_context": context,
        "total_tokens": available_tokens - remaining,
        "utilization": (available_tokens - remaining) / available_tokens,
    }
```
4. Resource Efficiency
Optimize agent and compute resource usage:
**Resource Utilization Report**
```
Agent Efficiency:
├── testing-specialist: 92% utilization
│   └─ 5 tasks, 1h 15m active, 0 idle
├── codi-qa-specialist: 0% (not used)
│   └─ Correctly avoided - testing-specialist was the better match
└── general-purpose: 8% utilization
    └─ Fallback for 2 simple tasks

Compute Efficiency:
├── API Calls: 23 (optimized from an estimated 45)
├── Token Usage: 25,500 (budget: 50,000)
├── Cache Hits: 12/23 (52% reuse)
└── Error Retries: 0 (no wasted calls)
```
Optimization Strategies
Strategy 1: Batch Processing
Group related tasks for efficiency:
```python
MAX_BATCH_TOKENS = 10_000  # Per-batch token budget (tune per model/workflow)

def batch_tasks(tasks: list[dict]) -> list[list[dict]]:
    """Group tasks into efficient batches within a token budget."""
    batches = []
    current_batch = []
    current_tokens = 0
    # Sort smallest-first so batches pack tightly
    for task in sorted(tasks, key=lambda t: t["estimated_tokens"]):
        if current_tokens + task["estimated_tokens"] <= MAX_BATCH_TOKENS:
            current_batch.append(task)
            current_tokens += task["estimated_tokens"]
        else:
            if current_batch:  # avoid emitting an empty batch
                batches.append(current_batch)
            current_batch = [task]
            current_tokens = task["estimated_tokens"]
    if current_batch:
        batches.append(current_batch)
    return batches
```
Strategy 2: Context Compression
Compress context for session handoffs:
```markdown
# Full Context (2,500 tokens)
Session completed Track E.1 Integration Testing with 5 E2E test files...
[Full 50-line description]
```

```markdown
# Compressed Context (<500 tokens)
**Track E.1 Complete** (2026-01-02)
- 5 E2E tests: signup, cross-platform, webhooks, offline, activation
- 70 test methods, 2,150 lines
- Next: Track E.2 (Performance Testing)
- Patterns: fixtures (api_client, mock_stripe, mock_redis)
```
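A compression step like this can be automated. The sketch below renders the compact form and enforces the budget with a rough ~4-characters-per-token estimate; the field names and the heuristic are illustrative assumptions, not a measured tokenizer:

```python
def compress_handoff(session: dict, max_tokens: int = 500) -> str:
    """Render a compact handoff summary and enforce a rough token budget.

    Assumes ~4 characters per token (a rule-of-thumb estimate only).
    """
    lines = [
        f"**{session['track']} Complete** ({session['date']})",
        f"- {session['summary']}",
        f"- Next: {session['next']}",
        f"- Patterns: {', '.join(session['patterns'])}",
    ]
    text = "\n".join(lines)
    if len(text) // 4 > max_tokens:
        raise ValueError("handoff exceeds token budget; trim further")
    return text
```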
Strategy 3: Incremental Processing
Process large tasks incrementally:
```python
async def process_incrementally(large_task: dict) -> list[dict]:
    """Process a large task in efficient increments.

    decompose_task, execute_subtask, update_progress, and
    can_terminate_early are workflow-specific hooks supplied elsewhere.
    """
    subtasks = decompose_task(large_task)
    results = []
    for i, subtask in enumerate(subtasks):
        result = await execute_subtask(subtask)
        results.append(result)
        # Update progress (enables early termination if needed)
        update_progress(large_task["id"], (i + 1) / len(subtasks))
        # Check for early success
        if can_terminate_early(results, large_task["criteria"]):
            break
    return results
```
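The early-termination behavior is worth illustrating with a runnable toy. Here the doubling step stands in for a real subtask executor, and the stop criterion is a simple running-total threshold; both are illustrative assumptions:

```python
import asyncio

async def run_until_satisfied(subtasks: list[int], target: int) -> list[int]:
    """Execute subtasks in order, stopping once results satisfy the target."""
    results = []
    for sub in subtasks:
        await asyncio.sleep(0)      # yield control, as real async I/O would
        results.append(sub * 2)     # stand-in for a real subtask executor
        if sum(results) >= target:  # early-success check
            break
    return results

out = asyncio.run(run_until_satisfied([1, 2, 3, 4, 5], target=10))
# Only three of the five subtasks run: out == [2, 4, 6]
```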
Strategy 4: Smart Caching
Cache reusable results:
```python
from datetime import datetime, timedelta, timezone

class EfficiencyCache:
    def __init__(self):
        self.pattern_cache = {}    # Reusable patterns
        self.decision_cache = {}   # Previous decisions
        self.output_cache = {}     # Generated outputs
        self._lookups = 0
        self._hits = 0

    def cache_pattern(self, pattern_id: str, pattern: dict):
        """Cache a reusable pattern for future tasks."""
        self.pattern_cache[pattern_id] = {
            "pattern": pattern,
            "uses": 0,
            "created": datetime.now(timezone.utc).isoformat(),
        }

    def get_cached_decision(self, decision_key: str) -> dict | None:
        """Retrieve a cached decision if still valid."""
        self._lookups += 1
        cached = self.decision_cache.get(decision_key)
        if cached and not self._is_stale(cached):
            cached["uses"] += 1
            self._hits += 1
            return cached["decision"]
        return None

    def _is_stale(self, cached: dict, ttl: timedelta = timedelta(hours=1)) -> bool:
        """Simple TTL check on the entry's creation timestamp; tune per workflow."""
        created = datetime.fromisoformat(cached["created"])
        return datetime.now(timezone.utc) - created > ttl

    def _calculate_hit_rate(self) -> float:
        return self._hits / self._lookups if self._lookups else 0.0

    def get_efficiency_report(self) -> dict:
        """Report cache efficiency."""
        return {
            "patterns_cached": len(self.pattern_cache),
            "pattern_reuses": sum(p["uses"] for p in self.pattern_cache.values()),
            "decisions_cached": len(self.decision_cache),
            "cache_hit_rate": self._calculate_hit_rate(),
        }
```
Efficiency Metrics
| Metric | Target | Measurement |
|---|---|---|
| Token/Line Ratio | <1.0 | Tokens used / lines generated |
| Context Utilization | >80% | Relevant tokens / total context |
| Parallel Efficiency | >50% | Time saved via parallelization |
| Cache Hit Rate | >40% | Cached reuses / total lookups |
| Agent Utilization | >70% | Active time / total time |
| Error Rate | <5% | Failed attempts / total attempts |
Session Handoff Optimization
Minimize context for efficient handoffs:
```markdown
# Efficient Session Handoff (<500 tokens)

## Session Summary
- **Completed:** Track E.1 (5 tasks, 70 tests, 2,150 lines)
- **Duration:** 1h 15m
- **Token Usage:** 25,500 / 50,000 budget

## Key Decisions
1. Use pytest fixtures for all test setup
2. Mock stripe/redis for isolation
3. Separate classes per test domain

## Next Session
- **Priority:** Track E.2 Performance Testing
- **Dependencies:** None (E.1 complete)
- **Estimated:** 45 min, ~15,000 tokens

## Reusable Patterns
- `conftest.py` fixture pattern
- Mock decorator usage
- Parametrized test structure
```
Related Components
- Commands: `/context-snapshot`, `/pilot --dashboard`
- Skills: process-transparency, task-accountability
- Hooks: session-handoff
- Scripts: context_snapshot.py
Success Output
When this skill completes successfully, output:
```
✅ SKILL COMPLETE: efficiency-optimization

Optimization Results:
- [x] Token efficiency achieved: [ratio] tokens/line (target: <1.0)
- [x] Time efficiency: [time-saved]% reduction via parallelization
- [x] Context utilization: [percentage]% (target: >80%)
- [x] Cache hit rate: [percentage]% (target: >40%)
- [x] Agent utilization: [percentage]% (target: >70%)

Metrics:
- Total tokens used: [used] / [budget]
- Execution time: [actual] (estimated: [estimate])
- Tasks parallelized: [count]
- Context compressed: [original-size] → [compressed-size]

Outputs:
- Efficiency report: [path]
- Optimization recommendations: [path]
- Session handoff summary: [path]
```
Completion Checklist
Before marking this skill as complete, verify:
- Token/line ratio below 1.0 for generated content
- Context utilization above 80%
- Parallel execution achieved >50% time savings (where applicable)
- Cache hit rate above 40% for reusable patterns
- Agent utilization above 70% (no significant idle time)
- Error rate below 5%
- All efficiency metrics calculated and documented
- Session handoff optimized (<500 tokens)
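The checklist comparisons can be mechanized. A small sketch that scores a metrics dict against the targets from the metrics table (key names are illustrative):

```python
def meets_targets(metrics: dict) -> dict:
    """Return pass/fail per metric against the documented targets."""
    checks = {
        "token_per_line": lambda v: v < 1.0,        # target: <1.0
        "time_savings": lambda v: v > 0.5,          # target: >50%
        "context_utilization": lambda v: v > 0.8,   # target: >80%
        "cache_hit_rate": lambda v: v > 0.4,        # target: >40%
        "agent_utilization": lambda v: v > 0.7,     # target: >70%
        "error_rate": lambda v: v < 0.05,           # target: <5%
    }
    return {name: check(metrics[name]) for name, check in checks.items()}
```

A session passes the checklist only when every entry in the returned dict is `True`.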
Failure Indicators
This skill has FAILED if:
- ❌ Token/line ratio exceeds 1.5 (inefficient generation)
- ❌ Context utilization below 50% (wasted context window)
- ❌ Sequential execution when parallelization possible (time waste)
- ❌ Zero cache reuse on repeated patterns
- ❌ Agent utilization below 30% (poor task allocation)
- ❌ Error rate exceeds 10% (quality issues)
- ❌ Session handoff exceeds 2000 tokens (poor compression)
- ❌ Actual task time exceeds estimate by >100% (planning failure)
When NOT to Use
Do NOT use this skill when:
- Simple, one-off tasks with no reuse potential
- Solution: Execute directly without optimization overhead
- Tasks already optimized to target metrics
- Solution: Focus on new optimization opportunities
- Optimization overhead exceeds efficiency gains
- Solution: Use simpler approach for small tasks
- Real-time requirements preclude optimization analysis
- Solution: Optimize in separate session
- Task complexity unknown (cannot estimate tokens/time)
- Solution: Run once first, then optimize on repeat
- Quality must not be compromised for efficiency
- Solution: Prioritize quality, optimize separately
- Learning/exploration tasks (efficiency not primary goal)
- Solution: Focus on understanding first, optimize later
Anti-Patterns (Avoid)
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Loading full context every time | Token waste, slow startup | Selective context loading by priority |
| Sequential when parallel possible | Time waste | Identify independent tasks, batch process |
| Re-computing cached decisions | Compute waste | Implement decision cache with validation |
| Over-engineering simple tasks | Effort waste, complexity | Match optimization effort to task value |
| Verbose outputs for structured data | Token waste | Use concise, structured formats (JSON, tables) |
| No session handoff planning | Next session context reload | Compress learnings to <500 tokens |
| Ignoring cache opportunities | Repeated pattern detection | Track pattern usage, cache frequently used |
| Premature optimization | Complexity before understanding | Measure first, optimize second |
| Optimizing non-bottlenecks | Wasted effort | Profile to find actual bottlenecks |
| Sacrificing clarity for tokens | Comprehension loss | Balance efficiency with understandability |
Principles
This skill embodies the following CODITECT principles:
#1 Recycle → Extend → Re-Use → Create
- Reuse patterns through caching (40%+ hit rate)
- Extend existing workflows rather than rebuild
- Create new patterns only when no reusable match
#2 Automation with Minimal Human Intervention
- Automated efficiency metric calculation
- Smart context compression for handoffs
- Automatic batch processing and parallelization
- Self-optimizing cache management
#3 Separation of Concerns
- Token efficiency separate from time efficiency
- Context management isolated from execution
- Metrics collection independent of task logic
#4 Keep It Simple
- Optimize only when ROI justifies effort
- Simplest solution that meets efficiency targets
- Avoid over-engineering for marginal gains
#5 Eliminate Ambiguity
- Clear efficiency targets (token/line <1.0, context >80%)
- Measurable success criteria
- Explicit optimization vs. quality tradeoffs
#6 Clear, Understandable, Explainable
- All optimizations documented with metrics
- Compression maintains comprehension
- Efficiency reports show impact clearly
#8 No Assumptions
- Measure baseline before optimizing
- Validate efficiency gains with metrics
- Confirm cache validity before reuse
Skill Version: 1.0.0 | Created: 2026-01-02 | Author: CODITECT Process Refinement