Context Compression Strategies

When agent sessions generate millions of tokens of conversation history, compression becomes mandatory. The naive approach is aggressive compression to minimize tokens per request. The correct optimization target is tokens per task: total tokens consumed to complete a task, including re-fetching costs when compression loses critical information.

When to Use

Use this skill when:

  • Agent sessions exceed context window limits
  • Codebases exceed context windows (5M+ token systems)
  • Designing conversation summarization strategies
  • Debugging cases where agents "forget" modified files
  • Building evaluation frameworks for compression quality

Don't use this skill when:

  • Short sessions well within context limits
  • Tasks requiring complete conversation history
  • Single-turn interactions

Core Concepts

Context compression trades token savings against information loss. Three production-ready approaches exist:

1. Anchored Iterative Summarization

Maintain structured, persistent summaries with explicit sections. When compression triggers, summarize only the newly-truncated span and merge with existing summary. Structure forces preservation by dedicating sections to specific information types.

Best for: Long-running sessions (100+ messages), file tracking, verifiable preservation

2. Opaque Compression

Produce compressed representations optimized for reconstruction fidelity. Achieves highest compression ratios (99%+) but sacrifices interpretability.

Best for: Maximum token savings, short sessions, low re-fetching costs

3. Regenerative Full Summary

Generate detailed structured summaries on each compression. Produces readable output but may lose details across repeated compression cycles.

Best for: Summary interpretability, clear phase boundaries, acceptable context review

Compression Method Comparison

| Method | Compression Ratio | Quality Score | Trade-off |
|---|---|---|---|
| Anchored Iterative | 98.6% | 3.70 | Best quality, slightly less compression |
| Regenerative | 98.7% | 3.44 | Good quality, moderate compression |
| Opaque | 99.3% | 3.35 | Best compression, quality loss |

Retaining an extra 0.7% of tokens with structured summarization buys 0.35 quality points (3.70 vs. 3.35).

Structured Summary Sections

Effective structured summaries include explicit sections:

## Session Intent
[What the user is trying to accomplish]

## Files Modified
- auth.controller.ts: Fixed JWT token generation
- config/redis.ts: Updated connection pooling
- tests/auth.test.ts: Added mock setup for new config

## Decisions Made
- Using Redis connection pool instead of per-request connections
- Retry logic with exponential backoff for transient failures

## Current State
- 14 tests passing, 2 failing
- Remaining: mock setup for session service tests

## Next Steps
1. Fix remaining test failures
2. Run full test suite
3. Update documentation

This structure prevents silent loss because each section must be explicitly addressed.
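Because every section must be explicitly addressed, a summary can be mechanically checked for completeness before it replaces history. A minimal sketch (section names follow the template above):

```python
REQUIRED_SECTIONS = [
    "Session Intent", "Files Modified", "Decisions Made",
    "Current State", "Next Steps",
]

def missing_sections(summary_markdown: str) -> list[str]:
    """Return required section headers absent from a structured summary."""
    present = {
        line[3:].strip()
        for line in summary_markdown.splitlines()
        if line.startswith("## ")
    }
    return [s for s in REQUIRED_SECTIONS if s not in present]
```

Rejecting a summary with missing sections forces the summarizer to state "none" explicitly rather than dropping a section silently.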

The Artifact Trail Problem

Artifact trail integrity is the weakest dimension across all compression methods, scoring 2.2-2.5 out of 5.0. Even structured summarization struggles to maintain complete file tracking.

Coding agents need to know:

  • Which files were created
  • Which files were modified and what changed
  • Which files were read but not changed
  • Function names, variable names, error messages

This likely requires specialized handling beyond general summarization: a separate artifact index or explicit file-state tracking.
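One way to keep the artifact trail out of the lossy summarization path is a separate index updated on every tool call, so compression cannot touch it. A sketch; the action names are illustrative, not from any specific agent framework:

```python
from collections import defaultdict

class ArtifactIndex:
    """Tracks file state outside the summarizer so compression cannot lose it."""

    def __init__(self):
        # Any file touched starts as "read"; tool calls upgrade the status.
        self.files = defaultdict(lambda: {"status": "read", "changes": []})

    def record(self, path: str, action: str, note: str = "") -> None:
        entry = self.files[path]
        if action in ("created", "modified"):
            # A created file stays "created" even after later edits.
            if entry["status"] != "created":
                entry["status"] = action
            if note:
                entry["changes"].append(note)

    def modified_files(self) -> list[str]:
        return sorted(p for p, e in self.files.items() if e["status"] != "read")
```

The index is small enough to inject verbatim into every compressed context, sidestepping the 2.2-2.5 artifact-trail scores entirely.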

Probe-Based Evaluation

Traditional metrics like ROUGE fail to capture functional compression quality. Probe-based evaluation directly measures functional quality by asking questions after compression:

| Probe Type | What It Tests | Example Question |
|---|---|---|
| Recall | Factual retention | "What was the original error message?" |
| Artifact | File tracking | "Which files have we modified?" |
| Continuation | Task planning | "What should we do next?" |
| Decision | Reasoning chain | "What did we decide about the Redis issue?" |

If compression preserved the right information, the agent answers correctly.
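A probe harness can be sketched as follows. Here `ask_agent` is a hypothetical callable that queries the agent over the compressed context, and the expected answers are extracted from the uncompressed history; real scoring would use fuzzier matching than substring checks:

```python
PROBES = [
    ("recall", "What was the original error message?", "401 Unauthorized"),
    ("artifact", "Which files have we modified?", "config/redis.ts"),
    ("continuation", "What should we do next?", "fix remaining test failures"),
]

def run_probes(compressed_context, ask_agent):
    """Score each probe 1 if the expected fact appears in the agent's answer."""
    results = {}
    for probe_type, question, expected in PROBES:
        answer = ask_agent(compressed_context, question)
        results[probe_type] = int(expected.lower() in answer.lower())
    success_rate = sum(results.values()) / len(results)
    return success_rate, results
```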

Six Evaluation Dimensions

  1. Accuracy: Are technical details correct? File paths, function names, error codes
  2. Context Awareness: Does the response reflect current conversation state?
  3. Artifact Trail: Does the agent know which files were read or modified?
  4. Completeness: Does the response address all parts of the question?
  5. Continuity: Can work continue without re-fetching information?
  6. Instruction Following: Does the response respect stated constraints?
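The headline quality score can be computed as the mean of the six dimension scores, each graded 0.0-5.0 by a judge. A minimal aggregation sketch (the equal weighting is an assumption; some tasks may weight the artifact trail higher):

```python
DIMENSIONS = [
    "accuracy", "context_awareness", "artifact_trail",
    "completeness", "continuity", "instruction_following",
]

def quality_score(scores: dict[str, float]) -> float:
    """Unweighted mean of the six dimension scores (each 0.0-5.0)."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    return sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)
```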

Compression Trigger Strategies

| Strategy | Trigger Point | Trade-off |
|---|---|---|
| Fixed threshold | 70-80% context utilization | Simple but may compress too early |
| Sliding window | Keep last N turns + summary | Predictable context size |
| Importance-based | Compress low-relevance sections first | Complex but preserves signal |
| Task-boundary | Compress at logical task completions | Clean summaries but unpredictable |
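The first two strategies are simple enough to sketch directly, assuming token counts are available from the model API:

```python
def should_compress(used_tokens: int, context_window: int,
                    threshold: float = 0.75) -> bool:
    """Fixed-threshold trigger: fire once utilization crosses ~70-80%."""
    return used_tokens / context_window >= threshold

def sliding_window(messages: list, keep_last: int = 20) -> tuple[list, list]:
    """Split history: the head goes to the summarizer, the tail stays verbatim."""
    return messages[:-keep_last], messages[-keep_last:]
```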

Implementing Anchored Iterative Summarization

  1. Define explicit summary sections matching your agent's needs
  2. On first compression trigger, summarize truncated history into sections
  3. On subsequent compressions, summarize only new truncated content
  4. Merge new summary into existing sections rather than regenerating
  5. Track which information came from which compression cycle
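The five steps above can be sketched as an incremental merge: each cycle summarizes only the newly truncated span and appends into existing sections, tagging entries with the cycle that produced them. `summarize_span` is a hypothetical stand-in for an LLM summarization call:

```python
def summarize_span(messages: list) -> dict[str, list]:
    """Hypothetical LLM call: condense a message span into per-section bullets."""
    return {"Files Modified": [], "Decisions Made": [], "Next Steps": []}

def compress_incrementally(history: list, state: dict) -> dict:
    """state = {'summary': {section: [(cycle, item), ...]},
                'consumed': int, 'cycle': int}"""
    new_span = history[state["consumed"]:]      # step 3: only new content
    notes = summarize_span(new_span)
    state["cycle"] += 1
    for section, items in notes.items():        # step 4: merge, don't regenerate
        state["summary"].setdefault(section, [])
        # step 5: tag each entry with its compression cycle
        state["summary"][section] += [(state["cycle"], item) for item in items]
    state["consumed"] = len(history)
    return state
```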

Example: Debugging Session Compression

Original context (89,000 tokens, 178 messages):

  • 401 error on /api/auth/login endpoint
  • Traced through auth controller, middleware, session store
  • Found stale Redis connection
  • Fixed connection pooling, added retry logic
  • 14 tests passing, 2 failing

Structured summary after compression:

## Session Intent
Debug 401 Unauthorized error on /api/auth/login despite valid credentials.

## Root Cause
Stale Redis connection in session store. JWT generated correctly but session could not be persisted.

## Files Modified
- auth.controller.ts: No changes (read only)
- config/redis.ts: Fixed connection pooling configuration
- services/session.service.ts: Added retry logic for transient failures
- tests/auth.test.ts: Updated mock setup

## Test Status
14 passing, 2 failing (mock setup issues)

## Next Steps
1. Fix remaining test failures (mock session service)
2. Run full test suite
3. Deploy to staging
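For the session above, the compression ratio follows directly from token counts; the 1,200-token summary size here is an assumed figure for illustration, not measured from the example:

```python
def compression_ratio(original_tokens: int, compressed_tokens: int) -> float:
    """Fraction of tokens removed, e.g. 89,000 -> 1,200 is ~98.7%."""
    return 1 - compressed_tokens / original_tokens

ratio = compression_ratio(89_000, 1_200)  # 1,200 is an assumed summary size
```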

Guidelines

  1. Optimize for tokens-per-task, not tokens-per-request
  2. Use structured summaries with explicit sections for file tracking
  3. Trigger compression at 70-80% context utilization
  4. Implement incremental merging rather than full regeneration
  5. Test compression quality with probe-based evaluation
  6. Track artifact trail separately if file tracking is critical
  7. Accept slightly lower compression ratios for better quality retention
  8. Monitor re-fetching frequency as a compression quality signal
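Guideline 1 can be made concrete: tokens-per-task sums every request in the session, so re-fetches forced by over-aggressive compression count against the total. The token figures below are illustrative:

```python
def tokens_per_task(requests: list[dict]) -> int:
    """Total tokens to finish the task; re-fetch requests count like any other."""
    return sum(r["prompt_tokens"] + r["completion_tokens"] for r in requests)

# Aggressive compression can lose on tokens-per-task despite smaller requests:
lossless = [{"prompt_tokens": 8_000, "completion_tokens": 500}] * 5
lossy = ([{"prompt_tokens": 5_000, "completion_tokens": 500}] * 5
         + [{"prompt_tokens": 6_000, "completion_tokens": 300}] * 3)  # re-fetches
```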

Skills

  • context-fundamentals - Context basics (prerequisite)
  • context-degradation - Understanding what compression prevents
  • context-optimization - Broader optimization strategies

Agents

  • compression-evaluator - Evaluate compression quality using probes
  • context-health-analyst - Monitor context health

Commands

  • /context-health - Run context health analysis

Scripts

  • external/Agent-Skills-for-Context-Engineering/skills/context-compression/scripts/compression_evaluator.py - Probe-based compression evaluation

Success Output

When successful, this skill MUST output:

✅ SKILL COMPLETE: context-compression

Completed:
- [x] Compression strategy selected and documented
- [x] Structured summary sections defined
- [x] Compression trigger strategy configured
- [x] Probe-based evaluation implemented
- [x] Tokens-per-task optimization verified

Outputs:
- Compression strategy specification (Anchored Iterative/Opaque/Regenerative)
- Structured summary template with explicit sections
- Compression quality metrics (compression ratio, quality score)
- Probe query results showing information retention

Metrics:
- Compression ratio: XX.X%
- Quality score: X.XX/5.0
- Artifact trail score: X.X/5.0
- Probe success rate: XX%

Completion Checklist

Before marking this skill as complete, verify:

  • Compression method selected based on use case requirements
  • Structured summary sections defined matching agent needs
  • Compression trigger point configured (70-80% utilization recommended)
  • Probe-based evaluation framework implemented
  • All probe queries answered correctly after compression
  • Artifact trail integrity verified (file tracking maintained)
  • Tokens-per-task calculated and optimized (not just tokens-per-request)
  • Re-fetching frequency monitored as quality signal

Failure Indicators

This skill has FAILED if:

  • ❌ Agent cannot answer probe queries after compression (information loss)
  • ❌ Artifact trail broken (agent doesn't know which files were modified)
  • ❌ Re-fetching frequency increases (compression too aggressive)
  • ❌ Quality score below 3.0/5.0 on probe evaluation
  • ❌ Context continuity lost (agent cannot continue task without re-reading)
  • ❌ Compression ratio below 90% (insufficient token savings)

When NOT to Use

Do NOT use this skill when:

  • Short sessions within context limits (< 50% context window used)
  • Tasks require complete conversation history without loss
  • Single-turn interactions or simple Q&A
  • Sessions with fewer than 50 messages
  • Real-time chat where compression latency is unacceptable
  • Basic context management suffices (use context-fundamentals instead)

Alternative skills for different needs:

  • context-optimization - Broader optimization without compression
  • context-degradation - Understanding context loss without compression
  • memory-management - Long-term storage vs. compression trade-offs

Anti-Patterns (Avoid)

| Anti-Pattern | Problem | Solution |
|---|---|---|
| Optimizing tokens-per-request only | Causes excessive re-fetching, higher total cost | Optimize tokens-per-task (includes re-fetch costs) |
| Using opaque compression for file tracking | Artifact trail score 2.2-2.5; files get lost | Use anchored iterative summarization with explicit file sections |
| Regenerating full summary each time | Quality degrades across compression cycles | Use incremental merging; append new summaries to existing |
| Compressing too early (< 50% context) | Unnecessary information loss | Wait until 70-80% context utilization |
| No probe-based validation | Silent information loss goes undetected | Implement probe queries matching task requirements |
| Skipping artifact trail section | Agent forgets modified files | Always include a "Files Modified" section in structured summaries |
| Measuring quality with ROUGE metrics | Doesn't capture functional compression quality | Use probe-based evaluation with task-specific questions |

Principles

This skill embodies CODITECT foundational principles:

#2 First Principles Thinking

  • Optimization target is tokens-per-task, not tokens-per-request (includes re-fetching costs)
  • Compression trades token savings against information loss
  • Structure forces preservation by dedicating sections to specific information types

#5 Eliminate Ambiguity

  • Explicit sections in structured summaries prevent silent loss
  • Probe-based evaluation directly measures functional quality
  • Clear failure indicators (quality score < 3.0, artifact trail broken)

#6 Clear, Understandable, Explainable

  • Structured summaries remain interpretable (vs. opaque compression)
  • Each section addresses specific information type explicitly
  • Compression method comparison table shows clear trade-offs

#8 No Assumptions

  • Verify compression quality with probe queries
  • Monitor re-fetching frequency as validation signal
  • Track artifact trail integrity separately if critical

Full Principles: CODITECT-STANDARD-AUTOMATION.md


Version: 1.1.0 | Updated: 2026-01-04 | Author: CODITECT Team