Context Compression Strategies
When agent sessions generate millions of tokens of conversation history, compression becomes mandatory. The naive approach is aggressive compression to minimize tokens per request. The correct optimization target is tokens per task: total tokens consumed to complete a task, including re-fetching costs when compression loses critical information.
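The difference between the two targets can be made concrete with a small sketch. The token counts below are hypothetical, chosen only to illustrate how aggressive compression can lose on the tokens-per-task metric even while winning on tokens-per-request:

```python
def tokens_per_task(request_tokens: list[int], refetch_tokens: int) -> int:
    """Total tokens consumed to finish a task, including re-fetching
    caused by information lost to compression."""
    return sum(request_tokens) + refetch_tokens

# Aggressive compression: smaller requests, but lost context forces re-reads.
aggressive = tokens_per_task([4_000] * 10, refetch_tokens=25_000)

# Moderate compression: larger requests, almost no re-fetching.
moderate = tokens_per_task([6_000] * 10, refetch_tokens=2_000)

assert aggressive == 65_000
assert moderate == 62_000  # moderate compression wins on tokens-per-task
```

With these illustrative numbers, the aggressive strategy sends 33% fewer tokens per request yet costs more per task once re-fetching is counted.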
When to Use
✅ Use this skill when:
- Agent sessions exceed context window limits
- Codebases exceed context windows (5M+ token systems)
- Designing conversation summarization strategies
- Debugging cases where agents "forget" modified files
- Building evaluation frameworks for compression quality
❌ Don't use this skill when:
- Short sessions well within context limits
- Tasks requiring complete conversation history
- Single-turn interactions
Core Concepts
Context compression trades token savings against information loss. Three production-ready approaches exist:
1. Anchored Iterative Summarization
Maintain structured, persistent summaries with explicit sections. When compression triggers, summarize only the newly-truncated span and merge with existing summary. Structure forces preservation by dedicating sections to specific information types.
Best for: Long-running sessions (100+ messages), file tracking, verifiable preservation
2. Opaque Compression
Produce compressed representations optimized for reconstruction fidelity. Achieves highest compression ratios (99%+) but sacrifices interpretability.
Best for: Maximum token savings, short sessions, low re-fetching costs
3. Regenerative Full Summary
Generate detailed structured summaries on each compression. Produces readable output but may lose details across repeated compression cycles.
Best for: Summary interpretability, clear phase boundaries, acceptable context review
Compression Method Comparison
| Method | Compression Ratio | Quality Score | Trade-off |
|---|---|---|---|
| Anchored Iterative | 98.6% | 3.70 | Best quality, slightly less compression |
| Regenerative | 98.7% | 3.44 | Good quality, moderate compression |
| Opaque | 99.3% | 3.35 | Best compression, quality loss |
The 0.7% additional tokens retained by structured summarization buys 0.35 quality points.
Structured Summary Sections
Effective structured summaries include explicit sections:
## Session Intent
[What the user is trying to accomplish]
## Files Modified
- auth.controller.ts: Fixed JWT token generation
- config/redis.ts: Updated connection pooling
- tests/auth.test.ts: Added mock setup for new config
## Decisions Made
- Using Redis connection pool instead of per-request connections
- Retry logic with exponential backoff for transient failures
## Current State
- 14 tests passing, 2 failing
- Remaining: mock setup for session service tests
## Next Steps
1. Fix remaining test failures
2. Run full test suite
3. Update documentation
This structure prevents silent loss because each section must be explicitly addressed.
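One way to enforce that rule in code is to fail loudly when a section is absent. The sketch below is a minimal, hypothetical implementation; the section names mirror the template above, but the function name and error-handling style are assumptions:

```python
REQUIRED_SECTIONS = [
    "Session Intent",
    "Files Modified",
    "Decisions Made",
    "Current State",
    "Next Steps",
]

def render_summary(sections: dict[str, str]) -> str:
    """Render a structured summary, refusing to emit one that
    silently omits a required section."""
    missing = [s for s in REQUIRED_SECTIONS if not sections.get(s, "").strip()]
    if missing:
        raise ValueError(f"summary is missing sections: {missing}")
    return "\n\n".join(
        f"## {name}\n{sections[name].strip()}" for name in REQUIRED_SECTIONS
    )
```

Raising instead of skipping turns "the model forgot the Files Modified section" from a silent quality regression into an immediate, debuggable error.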
The Artifact Trail Problem
Artifact trail integrity is the weakest dimension across all compression methods, scoring 2.2-2.5 out of 5.0. Even structured summarization struggles to maintain complete file tracking.
Coding agents need to know:
- Which files were created
- Which files were modified and what changed
- Which files were read but not changed
- Function names, variable names, error messages
This likely requires specialized handling beyond general summarization: a separate artifact index or explicit file-state tracking.
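A separate artifact index can be sketched as a small data structure that the agent harness updates on every tool call, so file state survives compression untouched. The class and method names here are hypothetical, not part of any existing library:

```python
from dataclasses import dataclass, field

@dataclass
class ArtifactIndex:
    """Tracks file state outside the summarizer, so compression
    cannot silently drop it."""
    created: set[str] = field(default_factory=set)
    modified: dict[str, str] = field(default_factory=dict)  # path -> what changed
    read_only: set[str] = field(default_factory=set)

    def record_read(self, path: str) -> None:
        # A file already created or modified stays in that stronger category.
        if path not in self.created and path not in self.modified:
            self.read_only.add(path)

    def record_write(self, path: str, change: str) -> None:
        self.read_only.discard(path)
        self.modified[path] = change
```

Because the index lives outside the conversation history, the "Which files have we modified?" probe can be answered from it directly, regardless of how lossy the summarizer is.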
Probe-Based Evaluation
Traditional metrics like ROUGE fail to capture functional compression quality. Probe-based evaluation directly measures functional quality by asking questions after compression:
| Probe Type | What It Tests | Example Question |
|---|---|---|
| Recall | Factual retention | "What was the original error message?" |
| Artifact | File tracking | "Which files have we modified?" |
| Continuation | Task planning | "What should we do next?" |
| Decision | Reasoning chain | "What did we decide about the Redis issue?" |
If compression preserved the right information, the agent answers correctly.
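A probe harness can be sketched in a few lines. The `ask` and `judge` callables stand in for the real agent and grader, which this document does not specify; their signatures are assumptions:

```python
# Probe types and example questions, mirroring the table above.
PROBES = [
    ("recall", "What was the original error message?"),
    ("artifact", "Which files have we modified?"),
    ("continuation", "What should we do next?"),
    ("decision", "What did we decide about the Redis issue?"),
]

def probe_success_rate(ask, judge) -> float:
    """ask(question) returns the agent's answer after compression;
    judge(probe_type, answer) returns True if the answer is acceptable.
    Both are supplied by the evaluation harness."""
    results = [judge(kind, ask(question)) for kind, question in PROBES]
    return sum(results) / len(results)
```

In practice the `judge` step is usually an LLM grader or a keyword check against the pre-compression transcript; either way the metric is functional ("can the agent still answer?") rather than lexical, which is exactly what ROUGE misses.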
Six Evaluation Dimensions
- Accuracy: Are technical details correct? File paths, function names, error codes
- Context Awareness: Does the response reflect current conversation state?
- Artifact Trail: Does the agent know which files were read or modified?
- Completeness: Does the response address all parts of the question?
- Continuity: Can work continue without re-fetching information?
- Instruction Following: Does the response respect stated constraints?
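These dimensions can be rolled up into a single quality score. An unweighted mean is the simplest aggregation and is assumed here for illustration; a production rubric might weight artifact trail more heavily:

```python
DIMENSIONS = [
    "accuracy", "context_awareness", "artifact_trail",
    "completeness", "continuity", "instruction_following",
]

def quality_score(scores: dict[str, float]) -> float:
    """Unweighted mean over the six dimensions, each scored 1-5.
    Refuses to score if any dimension is missing."""
    missing = set(DIMENSIONS) - scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    return sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)
```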
Compression Trigger Strategies
| Strategy | Trigger Point | Trade-off |
|---|---|---|
| Fixed threshold | 70-80% context utilization | Simple but may compress too early |
| Sliding window | Keep last N turns + summary | Predictable context size |
| Importance-based | Compress low-relevance sections first | Complex but preserves signal |
| Task-boundary | Compress at logical task completions | Clean summaries but unpredictable |
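The fixed-threshold strategy from the first row is the simplest to implement. A minimal sketch, with 0.75 assumed as a midpoint of the recommended 70-80% band:

```python
def should_compress(used_tokens: int, context_window: int,
                    threshold: float = 0.75) -> bool:
    """Fixed-threshold trigger: compress once context utilization
    crosses the 70-80% band (0.75 used here as a midpoint default)."""
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    return used_tokens / context_window >= threshold

assert should_compress(150_000, 200_000)      # 75% -> compress
assert not should_compress(100_000, 200_000)  # 50% -> too early
```

The other strategies layer policy on top of the same check: a sliding window decides *what* to keep, importance-based decides *what* to drop first, and task-boundary decides *when* within the threshold band to fire.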
Implementing Anchored Iterative Summarization
- Define explicit summary sections matching your agent's needs
- On first compression trigger, summarize truncated history into sections
- On subsequent compressions, summarize only new truncated content
- Merge new summary into existing sections rather than regenerating
- Track which information came from which compression cycle
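The merge-and-track steps above can be sketched as a single function. Representing sections as lists of lines and tagging each line with its compression cycle (`[c2]`, etc.) is an assumed convention for illustration:

```python
def merge_summaries(existing: dict[str, list[str]],
                    new_span: dict[str, list[str]],
                    cycle: int) -> dict[str, list[str]]:
    """Merge the summary of the newly truncated span into the anchored
    summary, tagging each new line with its compression cycle so
    provenance is preserved. Returns a new dict; inputs are not mutated."""
    merged = {section: list(lines) for section, lines in existing.items()}
    for section, lines in new_span.items():
        merged.setdefault(section, [])
        merged[section].extend(f"[c{cycle}] {line}" for line in lines)
    return merged
```

Because only the new span is summarized each cycle, earlier entries are never re-paraphrased, which is what protects them from the cumulative detail loss seen with full regeneration.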
Example: Debugging Session Compression
Original context (89,000 tokens, 178 messages):
- 401 error on /api/auth/login endpoint
- Traced through auth controller, middleware, session store
- Found stale Redis connection
- Fixed connection pooling, added retry logic
- 14 tests passing, 2 failing
Structured summary after compression:
## Session Intent
Debug 401 Unauthorized error on /api/auth/login despite valid credentials.
## Root Cause
Stale Redis connection in session store. JWT generated correctly but session could not be persisted.
## Files Modified
- auth.controller.ts: No changes (read only)
- config/redis.ts: Fixed connection pooling configuration
- services/session.service.ts: Added retry logic for transient failures
- tests/auth.test.ts: Updated mock setup
## Test Status
14 passing, 2 failing (mock setup issues)
## Next Steps
1. Fix remaining test failures (mock session service)
2. Run full test suite
3. Deploy to staging
Guidelines
- Optimize for tokens-per-task, not tokens-per-request
- Use structured summaries with explicit sections for file tracking
- Trigger compression at 70-80% context utilization
- Implement incremental merging rather than full regeneration
- Test compression quality with probe-based evaluation
- Track artifact trail separately if file tracking is critical
- Accept slightly lower compression ratios for better quality retention
- Monitor re-fetching frequency as a compression quality signal
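The last guideline can be monitored with a simple counter over the agent's tool-call log. The `(operation, path)` tuple format is an assumption about how the harness records calls:

```python
from collections import Counter

def refetch_rate(tool_calls: list[tuple[str, str]]) -> float:
    """Fraction of read calls that re-fetch a file already read this
    session. A rising rate after compression suggests the summary is
    dropping information the agent still needs."""
    reads = [path for op, path in tool_calls if op == "read"]
    if not reads:
        return 0.0
    counts = Counter(reads)
    refetches = sum(n - 1 for n in counts.values())
    return refetches / len(reads)
```

Comparing this rate before and after enabling compression gives a cheap, continuous quality signal without running a full probe evaluation.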
Related Components
Skills
- context-fundamentals - Context basics (prerequisite)
- context-degradation - Understanding what compression prevents
- context-optimization - Broader optimization strategies
Agents
- compression-evaluator - Evaluate compression quality using probes
- context-health-analyst - Monitor context health
Commands
- /context-health - Run context health analysis
Scripts
- external/Agent-Skills-for-Context-Engineering/skills/context-compression/scripts/compression_evaluator.py - Probe-based compression evaluation
Success Output
When successful, this skill MUST output:
✅ SKILL COMPLETE: context-compression
Completed:
- [x] Compression strategy selected and documented
- [x] Structured summary sections defined
- [x] Compression trigger strategy configured
- [x] Probe-based evaluation implemented
- [x] Tokens-per-task optimization verified
Outputs:
- Compression strategy specification (Anchored Iterative/Opaque/Regenerative)
- Structured summary template with explicit sections
- Compression quality metrics (compression ratio, quality score)
- Probe query results showing information retention
Metrics:
- Compression ratio: XX.X%
- Quality score: X.XX/5.0
- Artifact trail score: X.X/5.0
- Probe success rate: XX%
Completion Checklist
Before marking this skill as complete, verify:
- Compression method selected based on use case requirements
- Structured summary sections defined matching agent needs
- Compression trigger point configured (70-80% utilization recommended)
- Probe-based evaluation framework implemented
- All probe queries answered correctly after compression
- Artifact trail integrity verified (file tracking maintained)
- Tokens-per-task calculated and optimized (not just tokens-per-request)
- Re-fetching frequency monitored as quality signal
Failure Indicators
This skill has FAILED if:
- ❌ Agent cannot answer probe queries after compression (information loss)
- ❌ Artifact trail broken (agent doesn't know which files were modified)
- ❌ Re-fetching frequency increases (compression too aggressive)
- ❌ Quality score below 3.0/5.0 on probe evaluation
- ❌ Context continuity lost (agent cannot continue task without re-reading)
- ❌ Compression ratio below 90% (insufficient token savings)
When NOT to Use
Do NOT use this skill when:
- Short sessions within context limits (< 50% context window used)
- Tasks require complete conversation history without loss
- Single-turn interactions or simple Q&A
- Sessions with fewer than 50 messages
- Real-time chat where compression latency is unacceptable
- Use context-fundamentals for basic context management instead
Alternative skills for different needs:
- context-optimization - Broader optimization without compression
- context-degradation - Understanding context loss without compression
- memory-management - Long-term storage vs. compression trade-offs
Anti-Patterns (Avoid)
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Optimizing tokens-per-request only | Causes excessive re-fetching, higher total cost | Optimize tokens-per-task (includes re-fetch costs) |
| Using opaque compression for file tracking | Artifact trail score 2.2-2.5, files get lost | Use anchored iterative summarization with explicit file sections |
| Regenerating full summary each time | Quality degrades across compression cycles | Use incremental merging, append new summaries to existing |
| Compressing too early (< 50% context) | Unnecessary information loss | Wait until 70-80% context utilization |
| No probe-based validation | Silent information loss undetected | Implement probe queries matching task requirements |
| Skipping artifact trail section | Agent forgets modified files | Always include "Files Modified" section in structured summary |
| Measuring quality with ROUGE metrics | Doesn't capture functional compression quality | Use probe-based evaluation with task-specific questions |
Principles
This skill embodies CODITECT foundational principles:
#2 First Principles Thinking
- Optimization target is tokens-per-task, not tokens-per-request (includes re-fetching costs)
- Compression trades token savings against information loss
- Structure forces preservation by dedicating sections to specific information types
#5 Eliminate Ambiguity
- Explicit sections in structured summaries prevent silent loss
- Probe-based evaluation directly measures functional quality
- Clear failure indicators (quality score < 3.0, artifact trail broken)
#6 Clear, Understandable, Explainable
- Structured summaries remain interpretable (vs. opaque compression)
- Each section addresses specific information type explicitly
- Compression method comparison table shows clear trade-offs
#8 No Assumptions
- Verify compression quality with probe queries
- Monitor re-fetching frequency as validation signal
- Track artifact trail integrity separately if critical
Full Principles: CODITECT-STANDARD-AUTOMATION.md
Version: 1.1.0 | Updated: 2026-01-04 | Author: CODITECT Team