
Context Optimization Techniques

Context optimization extends the effective capacity of limited context windows through strategic compression, masking, caching, and partitioning. The goal is not to magically enlarge the context window but to make better use of the capacity you have. Done well, optimization can double or triple effective capacity without requiring larger models or longer contexts.

When to Use

Use this skill when:

  • Context limits constrain task complexity
  • Optimizing for cost reduction (fewer tokens = lower costs)
  • Reducing latency for long conversations
  • Implementing long-running agent systems
  • Needing to handle larger documents or conversations
  • Building production systems at scale

Don't use this skill when:

  • Short conversations with ample context budget
  • Latency-insensitive applications where optimization overhead isn't worth it
  • Single-turn interactions

Core Concepts

Context optimization extends effective capacity through four primary strategies:

| Strategy | Mechanism | Token Savings | Quality Impact |
|---|---|---|---|
| Compaction | Summarize context near limits | 50-70% | Low if done well |
| Observation Masking | Replace verbose outputs with references | 60-80% | Very low |
| KV-Cache Optimization | Reuse cached computations | Cost savings | None |
| Context Partitioning | Split work across isolated contexts | Variable | Enables isolation |

Compaction Strategies

Compaction summarizes context contents when approaching limits, then reinitializes with the summary.

What to Compress (Priority Order)

  1. Tool outputs: Replace with summaries (highest impact)
  2. Old turns: Summarize early conversation
  3. Retrieved docs: Summarize if recent versions exist
  4. Never compress: System prompt
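The priority order above can be sketched as a loop that compresses the cheapest-to-lose content first and stops as soon as the context fits. This is a hypothetical sketch: the message dict shape (`role`, `type`, `content`) and the `summarize` / `over_budget` callables are assumptions, not part of any specific framework.

```python
def compact(messages: list[dict], summarize, over_budget) -> list[dict]:
    """Compress messages in priority order, stopping once under budget."""
    # Priority 1: tool outputs, 2: old turns, 3: retrieved docs.
    priorities = ["tool_output", "old_turn", "retrieved_doc"]
    for kind in priorities:
        for msg in messages:
            if not over_budget(messages):
                return messages  # stop as soon as we fit
            # The system prompt is never compressed.
            if msg["type"] == kind and msg["role"] != "system":
                msg["content"] = summarize(msg["content"])
    return messages
```

Because the loop re-checks the budget before each compression, it does the minimum damage needed rather than summarizing everything unconditionally.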

Summary Generation by Type

| Message Type | Preserve | Remove |
|---|---|---|
| Tool outputs | Key findings, metrics, conclusions | Verbose raw output |
| Conversational | Decisions, commitments, context shifts | Filler, back-and-forth |
| Documents | Key facts and claims | Supporting evidence, elaboration |

Observation Masking

Tool outputs can comprise 80%+ of token usage in agent trajectories. Once an agent has used a tool output to make a decision, keeping the full output provides diminishing value.

Masking Strategy Selection

| Condition | Action |
|---|---|
| Critical to current task | Never mask |
| Most recent turn | Never mask |
| Used in active reasoning | Never mask |
| 3+ turns ago | Consider masking |
| Purpose has been served | Consider masking |
| Already summarized | Always mask |
| Boilerplate headers/footers | Always mask |
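The table above can be encoded as a small decision function, with the "never mask" conditions taking precedence. The observation attribute names here are illustrative assumptions, not a standard schema.

```python
def masking_action(obs: dict) -> str:
    """Map observation attributes to 'never', 'consider', or 'always' mask."""
    # "Never mask" conditions take precedence over everything else.
    if obs.get("critical") or obs.get("turns_ago", 0) == 0 or obs.get("in_active_reasoning"):
        return "never"
    # "Always mask" conditions.
    if obs.get("already_summarized") or obs.get("boilerplate"):
        return "always"
    # "Consider masking" conditions.
    if obs.get("turns_ago", 0) >= 3 or obs.get("purpose_served"):
        return "consider"
    return "never"  # default to keeping the observation
```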

Implementation Pattern

```python
def mask_observation(observation: str, max_length: int = 500) -> str:
    if len(observation) <= max_length:
        return observation

    # Store full observation for later retrieval
    ref_id = store_observation(observation)
    key_info = extract_key_points(observation)

    return f"[Obs:{ref_id} elided. Key: {key_info}]"
```
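The pattern above assumes `store_observation()` and `extract_key_points()` exist. A minimal in-memory sketch of both, purely hypothetical (a production system would use a database and an LLM summarizer):

```python
import hashlib

_OBSERVATION_STORE: dict[str, str] = {}

def store_observation(observation: str) -> str:
    """Persist the full observation and return a short reference id."""
    ref_id = hashlib.sha256(observation.encode()).hexdigest()[:8]
    _OBSERVATION_STORE[ref_id] = observation
    return ref_id

def extract_key_points(observation: str, limit: int = 100) -> str:
    """Crude stand-in for an LLM summary: first line, truncated."""
    return observation.splitlines()[0][:limit]

def retrieve_observation(ref_id: str) -> str:
    """Recover the full output if the agent later needs it."""
    return _OBSERVATION_STORE[ref_id]
```

The reference id is what makes masking safe: the agent can always ask for the full output back instead of permanently losing it.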

KV-Cache Optimization

The KV-cache stores Key and Value tensors computed during inference. Caching across requests with identical prefixes avoids recomputation.

Cache-Friendly Context Ordering

```python
# Optimal ordering for cache hits
context = []

# 1. Stable content first (highly cacheable)
context.extend([system_prompt, tool_definitions])

# 2. Frequently reused elements
context.extend([reused_templates, common_instructions])

# 3. Unique content last (not cached)
context.extend([unique_query, dynamic_content])
```

Cache Stability Design

  • Avoid dynamic content like timestamps in stable sections
  • Use consistent formatting across sessions
  • Keep structure stable, vary only what must vary
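The timestamp rule is the one most often violated in practice. A sketch of cache-stable prompt assembly (the prompt contents here are illustrative): anything that changes per request goes after the stable prefix, so the prefix bytes are byte-identical across calls and remain cache-hittable.

```python
from datetime import datetime, timezone

STABLE_PREFIX = "\n".join([
    "You are a helpful assistant.",   # system prompt: identical every call
    "Tools: search, calculator",      # tool definitions: identical every call
])

def build_prompt(query: str) -> str:
    # The timestamp lives in the *dynamic* suffix; putting it in the
    # prefix would invalidate the KV-cache on every single request.
    now = datetime.now(timezone.utc).isoformat()
    return f"{STABLE_PREFIX}\nCurrent time: {now}\nUser: {query}"
```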

Context Partitioning

The most aggressive optimization: partition work across sub-agents with isolated contexts.

When to Partition

  • Single context cannot contain all required information
  • Subtasks are logically independent
  • Context degradation is already occurring
  • Parallel execution would speed completion

Partitioning Pattern

```
┌─────────────────────────────────────┐
│          Coordinator Agent          │
│   (Minimal context: task + routing) │
└─────────────────┬───────────────────┘
                  │
    ┌─────────────┼─────────────┐
    ▼             ▼             ▼
┌────────┐   ┌────────┐   ┌────────┐
│ Sub-1  │   │ Sub-2  │   │ Sub-3  │
│ (Clean │   │ (Clean │   │ (Clean │
│context)│   │context)│   │context)│
└────────┘   └────────┘   └────────┘
```

Each sub-agent operates in a clean context focused on its subtask.
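A hypothetical sketch of this pattern: the coordinator keeps only the task and routing information, each sub-agent gets a fresh context containing only its subtask, and only the compact results flow back. `run_subagent()` stands in for a real agent call.

```python
import asyncio

async def run_subagent(subtask: str) -> str:
    # Stand-in for a real agent invocation with an isolated context.
    return f"result({subtask})"

async def coordinate(task: str, subtasks: list[str]) -> str:
    # Sub-agents run in parallel, each in a clean context; the
    # coordinator's context never sees their intermediate tokens.
    results = await asyncio.gather(*(run_subagent(s) for s in subtasks))
    return f"{task}: " + "; ".join(results)
```

This is where the token savings come from: the coordinator pays only for the summaries, not for each sub-agent's full working context.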

Budget Management

Context Budget Allocation

```yaml
context_budget:
  system_prompt: 2000      # Fixed, never compress
  tool_definitions: 1500   # Fixed
  retrieved_docs: 3000     # Compressible
  message_history: 5000    # Compressible
  reserved_buffer: 2000    # Safety margin
  total_limit: 16000
```
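Enforcing such a budget can be sketched as below; the field names mirror the YAML, and the token counts are illustrative rather than from a real tokenizer.

```python
BUDGET = {
    "system_prompt": 2000,
    "tool_definitions": 1500,
    "retrieved_docs": 3000,
    "message_history": 5000,
    "reserved_buffer": 2000,
}
TOTAL_LIMIT = 16000
# Only these components may be summarized; the rest are fixed.
COMPRESSIBLE = ("retrieved_docs", "message_history")

def over_budget_components(usage: dict) -> list[str]:
    """Return compressible components exceeding their allocation."""
    return [k for k in COMPRESSIBLE if usage.get(k, 0) > BUDGET[k]]

def remaining(usage: dict) -> int:
    """Tokens left before hitting the hard limit."""
    return TOTAL_LIMIT - sum(usage.values())
```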

Trigger-Based Optimization

```python
def should_optimize(context_usage: int, budget: int) -> bool:
    utilization = context_usage / budget

    if utilization > 0.8:
        return True  # Hard trigger

    if utilization > 0.7 and quality_degradation_detected():
        return True  # Soft trigger with quality check

    return False
```

Performance Targets

| Technique | Target Savings | Quality Threshold |
|---|---|---|
| Compaction | 50-70% reduction | <5% quality degradation |
| Masking | 60-80% on masked obs | Minimal impact |
| Cache | 70%+ hit rate | No impact |
| Partitioning | Variable | Enables capability |

Optimization Decision Framework

```
Context utilization > 70%?
├── Yes → Check what dominates context
│   ├── Tool outputs dominate     → Apply observation masking
│   ├── Retrieved docs dominate   → Summarize or partition
│   ├── Message history dominates → Compaction with summarization
│   └── Multiple components       → Combine strategies
└── No → Continue, but monitor
```
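The decision tree above can be encoded as a function; the strategy names here are illustrative labels, not API calls.

```python
def select_strategy(usage: dict, budget: int) -> str:
    """Pick an optimization strategy based on what dominates the context."""
    if sum(usage.values()) / budget <= 0.7:
        return "monitor"  # under threshold: no optimization needed
    dominant = max(usage, key=usage.get)
    return {
        "tool_outputs": "observation_masking",
        "retrieved_docs": "summarize_or_partition",
        "message_history": "compaction",
    }.get(dominant, "combine_strategies")
```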

Example: Combined Optimization

```python
async def optimize_context(context: Context) -> Context:
    # Step 1: Mask old observations
    context.mask_old_observations(turns_threshold=3)

    # Step 2: Check if still over budget
    if context.utilization > 0.8:
        # Step 3: Compact message history
        context.compact_history(preserve_recent=5)

    # Step 4: If still over, partition to sub-agents
    if context.utilization > 0.9:
        return await partition_to_subagents(context)

    return context
```

Guidelines

  1. Measure before optimizing—know your current state
  2. Apply compaction before masking when possible
  3. Design for cache stability with consistent prompts
  4. Partition before context becomes problematic
  5. Monitor optimization effectiveness over time
  6. Balance token savings against quality preservation
  7. Test optimization at production scale
  8. Implement graceful degradation for edge cases

Skills

  • context-fundamentals - Context basics (prerequisite)
  • context-degradation - Understanding when to optimize
  • context-compression - Detailed compression strategies

Agents

  • context-health-analyst - Monitor optimization needs
  • multi-agent-coordinator - Partition coordination

Scripts

  • external/Agent-Skills-for-Context-Engineering/skills/context-optimization/scripts/compaction.py - Observation store and budget management

Success Output

When successful, this skill MUST output:

✅ SKILL COMPLETE: context-optimization

Completed:
- [x] Context utilization measured (before: X%, after: Y%)
- [x] Optimization strategy selected and applied
- [x] Token savings achieved (Z% reduction)
- [x] Quality validation passed (<5% degradation)
- [x] Performance metrics recorded

Optimization Applied:
- Strategy: [Compaction/Masking/Partitioning/Combined]
- Tokens before: X | Tokens after: Y | Savings: Z%
- Quality impact: <5% degradation
- Cache hit rate: N% (if KV-cache optimization)

Outputs:
- Optimized context with reduced token count
- Quality metrics report
- Performance benchmarks

Completion Checklist

Before marking this skill as complete, verify:

  • Baseline context utilization measured before optimization
  • Optimization strategy selected based on what dominates context
  • Compaction applied to message history (if needed)
  • Observation masking applied to old tool outputs (if needed)
  • KV-cache stability design implemented (if applicable)
  • Context partitioning to sub-agents (if needed)
  • Token reduction achieved (target: 50-70%)
  • Quality degradation measured (<5% threshold)
  • Performance benchmarks recorded (latency, throughput)
  • Budget allocation documented
  • Trigger-based optimization rules configured
  • Monitoring in place for ongoing optimization

Failure Indicators

This skill has FAILED if:

  • ❌ Context utilization still >80% after optimization
  • ❌ Quality degradation >5% (unacceptable quality loss)
  • ❌ No token savings achieved (ineffective optimization)
  • ❌ System prompt accidentally compressed (critical data lost)
  • ❌ Recent tool outputs masked (breaks current reasoning)
  • ❌ Cache hit rate <50% (poor cache design)
  • ❌ Partitioning overhead exceeds savings (wrong strategy)
  • ❌ Agent cannot complete task after optimization
  • ❌ Optimization triggers too frequently (thrashing)
  • ❌ No measurement before/after (cannot prove effectiveness)

When NOT to Use

Do NOT use context-optimization when:

  • Plenty of context available - Utilization <70%, no optimization needed
  • Single-turn interactions - No context accumulation to optimize
  • Short conversations - Optimization overhead not worth it
  • Latency-critical applications - Optimization adds processing time
  • Prototyping phase - Premature optimization
  • Context already minimal - Nothing left to compress
  • Quality cannot be compromised - Risk of degradation too high

  • Skip optimization when: Context utilization <70% and no quality issues
  • Use larger model when: More context needed and quality is paramount
  • Use this skill when: Context limits constrain capability or cost reduction needed

Anti-Patterns (Avoid)

| Anti-Pattern | Problem | Solution |
|---|---|---|
| Compress system prompt | Critical instructions lost, agent fails | Never compress system prompt |
| Mask recent outputs | Breaks current reasoning chain | Only mask outputs 3+ turns old |
| No quality measurement | Degradation goes unnoticed | Measure quality before and after |
| Optimize prematurely | Overhead without benefit | Wait until utilization >70% |
| Fixed compression ratio | Over/under compression | Adaptive compression based on content type |
| No baseline metrics | Cannot prove effectiveness | Measure tokens and quality before optimizing |
| Aggressive partitioning | Coordination overhead exceeds savings | Only partition when single context insufficient |
| Ignore cache stability | Poor cache hit rates | Design for cache reuse (stable prefixes) |
| No monitoring | Optimization degrades over time | Monitor utilization and quality continuously |
| One-size-fits-all | Wrong strategy for content type | Select strategy based on what dominates context |

Principles

This skill embodies the following CODITECT principles:

#2 First Principles Thinking:

  • Understand WHY context limits exist: computational cost, attention degradation
  • Apply optimization only when benefits exceed costs

#4 Measure and Verify:

  • Measure baseline before optimizing
  • Verify quality impact <5% threshold
  • Benchmark performance improvements

#5 Eliminate Ambiguity:

  • Clear optimization targets (50-70% reduction, <5% quality loss)
  • Explicit strategy selection criteria

#8 No Assumptions:

  • Never assume optimization worked (measure it)
  • Verify cache hit rates, don't assume high reuse
  • Test quality on representative examples

#10 Automation First:

  • Trigger-based optimization (not manual)
  • Automated quality validation
  • Dynamic strategy selection based on context composition

Full Standard: CODITECT-STANDARD-AUTOMATION.md