Efficiency Optimization Skill

Optimize AI agent workflows for maximum efficiency, minimal token usage, and faster task completion while maintaining quality.

Purpose

This skill ensures AI agent workflows are optimized for efficiency across multiple dimensions: token usage, execution time, context utilization, and resource consumption. It enables sustainable, cost-effective agentic operations.

When to Use

  • Planning multi-task workflows
  • Designing context handoffs between sessions
  • Optimizing agent selection and routing
  • Reducing token consumption
  • Improving task execution speed

Baseline Metrics Template

Before optimizing, establish baselines using this template:

# efficiency-baseline.yaml
project: "Your Project Name"
date: "YYYY-MM-DD"
session_id: "session-xxx"

# Token Metrics (measure before optimization)
tokens:
  context_loaded: 0     # Tokens loaded at session start
  execution_used: 0     # Tokens used during execution
  output_generated: 0   # Tokens in final output
  total: 0              # Sum of above
  lines_generated: 0    # Lines of code/docs generated
  token_per_line: 0.0   # Calculate: total / lines_generated
  target: 1.0           # Target: <1.0 tokens/line

# Time Metrics
time:
  sequential_estimate_min: 0    # If run sequentially
  actual_execution_min: 0       # Actual time taken
  parallelization_savings: 0    # Calculate: 1 - (actual / sequential)
  target_savings: 0.5           # Target: >50% time saved

# Context Metrics
context:
  available_tokens: 200000   # Context window size
  used_tokens: 0             # Tokens actually used
  utilization: 0.0           # Calculate: used / available
  target_utilization: 0.8    # Target: >80%

# Cache Metrics
cache:
  total_lookups: 0       # Times cache was queried
  cache_hits: 0          # Successful cache retrievals
  hit_rate: 0.0          # Calculate: hits / lookups
  target_hit_rate: 0.4   # Target: >40%

# Agent Metrics
agents:
  agents_used: []              # List of agents invoked
  total_active_time_min: 0     # Time agents were working
  total_elapsed_time_min: 0    # Wall clock time
  utilization: 0.0             # Calculate: active / elapsed
  target_utilization: 0.7      # Target: >70%

# Error Metrics
errors:
  total_attempts: 0         # Total task attempts
  failed_attempts: 0        # Failed attempts
  error_rate: 0.0           # Calculate: failed / total
  target_error_rate: 0.05   # Target: <5%

Quick Baseline Check:

| Metric | Your Value | Target | Status |
|--------|------------|--------|--------|
| Token/Line Ratio | _____ | <1.0 | |
| Time Savings | _____% | >50% | |
| Context Utilization | _____% | >80% | |
| Cache Hit Rate | _____% | >40% | |
| Agent Utilization | _____% | >70% | |
| Error Rate | _____% | <5% | |
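The calculated fields in the baseline template follow directly from the raw counts. A minimal sketch, assuming an input dict mirroring `efficiency-baseline.yaml` (the sample numbers are illustrative only):

```python
# Sketch: derive the calculated baseline fields from raw counts.
# The input dict mirrors efficiency-baseline.yaml; values are examples.

def derive_metrics(baseline: dict) -> dict:
    tokens = baseline["tokens"]
    time = baseline["time"]
    context = baseline["context"]
    return {
        "token_per_line": tokens["total"] / max(tokens["lines_generated"], 1),
        "parallelization_savings": 1 - time["actual_execution_min"]
                                       / max(time["sequential_estimate_min"], 1),
        "context_utilization": context["used_tokens"] / context["available_tokens"],
    }

metrics = derive_metrics({
    "tokens": {"total": 900, "lines_generated": 1000},
    "time": {"actual_execution_min": 34, "sequential_estimate_min": 75},
    "context": {"used_tokens": 160_000, "available_tokens": 200_000},
})
```

Each derived value can then be compared against its `target_*` field to fill in the Status column above.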

Efficiency Dimensions

1. Token Efficiency

Minimize token consumption while maintaining quality:

**Token Budget Analysis**

Task: Create E2E test suite for Track E.1
├── Context Loading: 2,500 tokens
│   ├── PILOT plan (relevant sections): 800 tokens
│   ├── Previous session context: 1,200 tokens
│   └── Pattern examples: 500 tokens
├── Execution: 15,000 tokens
│   ├── E.1.1 (10 tests): 3,000 tokens
│   ├── E.1.2 (15 tests): 3,500 tokens
│   ├── E.1.3 (14 tests): 3,200 tokens
│   ├── E.1.4 (15 tests): 2,800 tokens
│   └── E.1.5 (16 tests): 2,500 tokens
├── Output: 8,000 tokens
│   └── 2,150 lines of test code
└── Total: 25,500 tokens

Efficiency Score: 0.84 tokens/line (target: <1.0)

2. Time Efficiency

Minimize execution time through parallelization:

Sequential Execution (Baseline)
═══════════════════════════════════════════════════════════════
E.1.1 ████████████████████ 15 min
E.1.2 ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 12 min
E.1.3 ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 18 min
E.1.4 ░░░░░░░░░░░░░░░░░░░░ 14 min
E.1.5 ████████████████████ 16 min
Total: 75 minutes

Parallel Execution (Optimized)
═══════════════════════════════════════════════════════════════
E.1.1 ████████████████████ }
E.1.2 ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ } 18 min (parallel batch 1)
E.1.3 ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ }
E.1.4 ░░░░░░░░░░░░░░░░░░░░ } 16 min (parallel batch 2)
E.1.5 ████████████████████ }
Total: 34 minutes (55% reduction)
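The parallel schedule above follows from a simple rule: a batch takes as long as its longest task, and total wall-clock time is the sum of the batch maxima. A quick sketch using the example durations:

```python
# Sequential vs. parallel wall-clock time for the example schedule above.
durations = {"E.1.1": 15, "E.1.2": 12, "E.1.3": 18, "E.1.4": 14, "E.1.5": 16}
batches = [["E.1.1", "E.1.2", "E.1.3"], ["E.1.4", "E.1.5"]]

sequential = sum(durations.values())                           # 75 min
parallel = sum(max(durations[t] for t in b) for b in batches)  # 18 + 16 = 34 min
savings = 1 - parallel / sequential                            # ~55% reduction
```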

3. Context Efficiency

Maximize context window utilization:

# Context Window Optimization
def optimize_context(task: str, available_tokens: int) -> dict:
    """Optimize context selection for maximum relevance."""

    # Priority 1: Essential context (always include)
    essential = {
        "current_task": 200,   # Task description
        "plan_section": 500,   # Relevant plan section
        "constraints": 300     # Quality requirements
    }

    # Priority 2: Supporting context (include if room)
    supporting = {
        "patterns": 500,          # Similar implementations
        "session_history": 800,   # Recent decisions
        "error_solutions": 400    # Known solutions
    }

    # Priority 3: Background context (nice to have)
    background = {
        "full_plan": 2000,      # Complete plan
        "all_history": 3000,    # Full session history
        "codebase_map": 1500    # File structure
    }

    # Fit within budget
    context = {}
    remaining = available_tokens

    for category in [essential, supporting, background]:
        for key, tokens in category.items():
            if tokens <= remaining:
                context[key] = tokens
                remaining -= tokens

    return {
        "selected_context": context,
        "total_tokens": available_tokens - remaining,
        "utilization": (available_tokens - remaining) / available_tokens
    }

4. Resource Efficiency

Optimize agent and compute resource usage:

**Resource Utilization Report**

Agent Efficiency:
├── testing-specialist: 92% utilization
│   └─ 5 tasks, 1h 15m active, 0 idle
├── codi-qa-specialist: 0% (not used)
│   └─ Correctly avoided - testing-specialist better match
└── general-purpose: 8% utilization
    └─ Fallback for 2 simple tasks

Compute Efficiency:
├── API Calls: 23 (optimized from estimated 45)
├── Token Usage: 25,500 (budget: 50,000)
├── Cache Hits: 12/23 (52% reuse)
└── Error Retries: 0 (no wasted calls)

Optimization Strategies

Strategy 1: Batch Processing

Group related tasks for efficiency:

MAX_BATCH_TOKENS = 8_000  # Example per-batch token budget; tune per model

def batch_tasks(tasks: list[dict]) -> list[list[dict]]:
    """Group tasks into efficient batches."""
    batches = []
    current_batch = []
    current_tokens = 0

    for task in sorted(tasks, key=lambda t: t["estimated_tokens"]):
        if current_tokens + task["estimated_tokens"] <= MAX_BATCH_TOKENS:
            current_batch.append(task)
            current_tokens += task["estimated_tokens"]
        else:
            if current_batch:  # Avoid appending an empty batch
                batches.append(current_batch)
            current_batch = [task]
            current_tokens = task["estimated_tokens"]

    if current_batch:
        batches.append(current_batch)

    return batches

Strategy 2: Context Compression

Compress context for session handoffs:

# Full Context (2,500 tokens)
Session completed Track E.1 Integration Testing with 5 E2E test files...
[Full 50-line description]

# Compressed Context (<500 tokens)
**Track E.1 Complete** (2026-01-02)
- 5 E2E tests: signup, cross-platform, webhooks, offline, activation
- 70 test methods, 2,150 lines
- Next: Track E.2 (Performance Testing)
- Patterns: fixtures (api_client, mock_stripe, mock_redis)
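A compressor for this kind of handoff can be sketched as a small formatting function. The field names below are hypothetical, not a fixed schema:

```python
# Sketch: compress a session record into a compact handoff summary.
# The session field names here are illustrative assumptions.

def compress_handoff(session: dict) -> str:
    lines = [
        f"**{session['track']} Complete** ({session['date']})",
        f"- {session['summary']}",
        f"- Next: {session['next']}",
        f"- Patterns: {', '.join(session['patterns'])}",
    ]
    return "\n".join(lines)

summary = compress_handoff({
    "track": "Track E.1",
    "date": "2026-01-02",
    "summary": "70 test methods, 2,150 lines",
    "next": "Track E.2 (Performance Testing)",
    "patterns": ["api_client", "mock_stripe", "mock_redis"],
})
```

Keeping the output to a handful of structured lines is what holds the handoff under the 500-token budget.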

Strategy 3: Incremental Processing

Process large tasks incrementally:

async def process_incrementally(large_task: dict) -> list[dict]:
    """Process large task in efficient increments."""

    subtasks = decompose_task(large_task)
    results = []

    for i, subtask in enumerate(subtasks):
        # Process subtask
        result = await execute_subtask(subtask)
        results.append(result)

        # Update progress (enables early termination if needed)
        update_progress(large_task["id"], (i + 1) / len(subtasks))

        # Check for early success
        if can_terminate_early(results, large_task["criteria"]):
            break

    return results

Strategy 4: Smart Caching

Cache reusable results:

from datetime import datetime, timedelta, timezone

class EfficiencyCache:
    def __init__(self):
        self.pattern_cache = {}   # Reusable patterns
        self.decision_cache = {}  # Previous decisions
        self.output_cache = {}    # Generated outputs
        self.lookups = 0          # Total cache queries
        self.hits = 0             # Successful retrievals

    def cache_pattern(self, pattern_id: str, pattern: dict):
        """Cache reusable pattern for future tasks."""
        self.pattern_cache[pattern_id] = {
            "pattern": pattern,
            "uses": 0,
            "created": datetime.now(timezone.utc).isoformat()
        }

    def cache_decision(self, decision_key: str, decision: dict):
        """Cache a decision for reuse by later tasks."""
        self.decision_cache[decision_key] = {
            "decision": decision,
            "uses": 0,
            "created": datetime.now(timezone.utc).isoformat()
        }

    def get_cached_decision(self, decision_key: str) -> dict | None:
        """Retrieve cached decision if still valid."""
        self.lookups += 1
        cached = self.decision_cache.get(decision_key)
        if cached and not self._is_stale(cached):
            cached["uses"] += 1
            self.hits += 1
            return cached["decision"]
        return None

    def get_efficiency_report(self) -> dict:
        """Report cache efficiency."""
        return {
            "patterns_cached": len(self.pattern_cache),
            "pattern_reuses": sum(p["uses"] for p in self.pattern_cache.values()),
            "decisions_cached": len(self.decision_cache),
            "cache_hit_rate": self._calculate_hit_rate()
        }

    def _is_stale(self, cached: dict, max_age_hours: int = 24) -> bool:
        """Treat entries older than max_age_hours as stale."""
        created = datetime.fromisoformat(cached["created"])
        return datetime.now(timezone.utc) - created > timedelta(hours=max_age_hours)

    def _calculate_hit_rate(self) -> float:
        return self.hits / self.lookups if self.lookups else 0.0

Efficiency Metrics

| Metric | Target | Measurement |
|--------|--------|-------------|
| Token/Line Ratio | <1.0 | Tokens used / lines generated |
| Context Utilization | >80% | Relevant tokens / total context |
| Parallel Efficiency | >50% | Time saved via parallelization |
| Cache Hit Rate | >40% | Cached reuses / total lookups |
| Agent Utilization | >70% | Active time / total time |
| Error Rate | <5% | Failed attempts / total attempts |

Efficiency Anti-Patterns

| Anti-Pattern | Waste Type | Solution |
|--------------|------------|----------|
| Loading full context | Token waste | Selective context loading |
| Sequential when parallel possible | Time waste | Identify independent tasks |
| Re-computing decisions | Compute waste | Cache decisions |
| Over-engineering simple tasks | Effort waste | Match complexity to task |
| Verbose outputs | Token waste | Concise, structured output |

Session Handoff Optimization

Minimize context for efficient handoffs:

# Efficient Session Handoff (<500 tokens)

## Session Summary
- **Completed:** Track E.1 (5 tasks, 70 tests, 2,150 lines)
- **Duration:** 1h 15m
- **Token Usage:** 25,500 / 50,000 budget

## Key Decisions
1. Use pytest fixtures for all test setup
2. Mock stripe/redis for isolation
3. Separate classes per test domain

## Next Session
- **Priority:** Track E.2 Performance Testing
- **Dependencies:** None (E.1 complete)
- **Estimated:** 45 min, ~15,000 tokens

## Reusable Patterns
- `conftest.py` fixture pattern
- Mock decorator usage
- Parametrized test structure
Related Resources

  • Commands: /context-snapshot, /pilot --dashboard
  • Skills: process-transparency, task-accountability
  • Hooks: session-handoff
  • Scripts: context_snapshot.py
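Whether a handoff summary actually fits the 500-token budget can be checked with a rough heuristic. This sketch assumes the common approximation of ~4 characters per token; real tokenizer counts vary:

```python
# Rough check that a handoff summary stays under the 500-token budget.
# Assumes ~4 characters per token, a common heuristic; real tokenizers vary.

def estimate_tokens(text: str) -> int:
    return len(text) // 4

def within_handoff_budget(text: str, budget: int = 500) -> bool:
    return estimate_tokens(text) <= budget

handoff = "**Track E.1 Complete** - 70 tests, 2,150 lines. Next: Track E.2."
```

For accurate accounting, replace `estimate_tokens` with the actual tokenizer of the model in use.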

Success Output

When this skill completes successfully, output:

✅ SKILL COMPLETE: efficiency-optimization

Optimization Results:
- [x] Token efficiency achieved: [ratio] tokens/line (target: <1.0)
- [x] Time efficiency: [time-saved]% reduction via parallelization
- [x] Context utilization: [percentage]% (target: >80%)
- [x] Cache hit rate: [percentage]% (target: >40%)
- [x] Agent utilization: [percentage]% (target: >70%)

Metrics:
- Total tokens used: [used] / [budget]
- Execution time: [actual] (estimated: [estimate])
- Tasks parallelized: [count]
- Context compressed: [original-size] → [compressed-size]

Outputs:
- Efficiency report: [path]
- Optimization recommendations: [path]
- Session handoff summary: [path]

Completion Checklist

Before marking this skill as complete, verify:

  • Token/line ratio below 1.0 for generated content
  • Context utilization above 80%
  • Parallel execution achieved >50% time savings (where applicable)
  • Cache hit rate above 40% for reusable patterns
  • Agent utilization above 70% (no significant idle time)
  • Error rate below 5%
  • All efficiency metrics calculated and documented
  • Session handoff optimized (<500 tokens)

Failure Indicators

This skill has FAILED if:

  • ❌ Token/line ratio exceeds 1.5 (inefficient generation)
  • ❌ Context utilization below 50% (wasted context window)
  • ❌ Sequential execution when parallelization possible (time waste)
  • ❌ Zero cache reuse on repeated patterns
  • ❌ Agent utilization below 30% (poor task allocation)
  • ❌ Error rate exceeds 10% (quality issues)
  • ❌ Session handoff exceeds 2000 tokens (poor compression)
  • ❌ Actual task time exceeds estimate by >100% (planning failure)

When NOT to Use

Do NOT use this skill when:

  • Simple, one-off tasks with no reuse potential
    • Solution: Execute directly without optimization overhead
  • Tasks already optimized to target metrics
    • Solution: Focus on new optimization opportunities
  • Optimization overhead exceeds efficiency gains
    • Solution: Use simpler approach for small tasks
  • Real-time requirements preclude optimization analysis
    • Solution: Optimize in separate session
  • Task complexity unknown (cannot estimate tokens/time)
    • Solution: Run once first, then optimize on repeat
  • Quality must not be compromised for efficiency
    • Solution: Prioritize quality, optimize separately
  • Learning/exploration tasks (efficiency not primary goal)
    • Solution: Focus on understanding first, optimize later

Anti-Patterns (Avoid)

| Anti-Pattern | Problem | Solution |
|--------------|---------|----------|
| Loading full context every time | Token waste, slow startup | Selective context loading by priority |
| Sequential when parallel possible | Time waste | Identify independent tasks, batch process |
| Re-computing cached decisions | Compute waste | Implement decision cache with validation |
| Over-engineering simple tasks | Effort waste, complexity | Match optimization effort to task value |
| Verbose outputs for structured data | Token waste | Use concise, structured formats (JSON, tables) |
| No session handoff planning | Next session reloads full context | Compress learnings to <500 tokens |
| Ignoring cache opportunities | Repeated patterns recomputed | Track pattern usage, cache frequently used |
| Premature optimization | Complexity before understanding | Measure first, optimize second |
| Optimizing non-bottlenecks | Wasted effort | Profile to find actual bottlenecks |
| Sacrificing clarity for tokens | Comprehension loss | Balance efficiency with understandability |

Principles

This skill embodies the following CODITECT principles:

#1 Recycle → Extend → Re-Use → Create

  • Reuse patterns through caching (40%+ hit rate)
  • Extend existing workflows rather than rebuild
  • Create new patterns only when no reusable match

#2 Automation with Minimal Human Intervention

  • Automated efficiency metric calculation
  • Smart context compression for handoffs
  • Automatic batch processing and parallelization
  • Self-optimizing cache management

#3 Separation of Concerns

  • Token efficiency separate from time efficiency
  • Context management isolated from execution
  • Metrics collection independent of task logic

#4 Keep It Simple

  • Optimize only when ROI justifies effort
  • Simplest solution that meets efficiency targets
  • Avoid over-engineering for marginal gains

#5 Eliminate Ambiguity

  • Clear efficiency targets (token/line <1.0, context >80%)
  • Measurable success criteria
  • Explicit optimization vs. quality tradeoffs

#6 Clear, Understandable, Explainable

  • All optimizations documented with metrics
  • Compression maintains comprehension
  • Efficiency reports show impact clearly

#8 No Assumptions

  • Measure baseline before optimizing
  • Validate efficiency gains with metrics
  • Confirm cache validity before reuse

Skill Version: 1.0.0 Created: 2026-01-02 Author: CODITECT Process Refinement