Context Degradation Patterns
Language models exhibit predictable degradation patterns as context length increases. Understanding these patterns is essential for diagnosing failures and designing resilient systems. Context degradation is not a binary state but a continuum of performance loss that manifests in several distinct ways.
When to Use
✅ Use this skill when:
- Agent performance degrades unexpectedly during long conversations
- Debugging cases where agents produce incorrect or irrelevant outputs
- Designing systems that must handle large contexts reliably
- Investigating "lost-in-middle" phenomena in agent outputs
- Analyzing context-related failures in agent behavior
❌ Don't use this skill when:
- Short conversations with minimal context accumulation
- Single-turn interactions without history
- Tasks where context never approaches limits
Core Concepts
Context degradation manifests through several distinct patterns:
| Pattern | Description | Symptom |
|---|---|---|
| Lost-in-Middle | Information in context center receives less attention | 10-40% lower recall for middle content |
| Context Poisoning | Errors compound through repeated reference | Persistent hallucinations despite correction |
| Context Distraction | Irrelevant information overwhelms relevant content | Quality drop even with single distractor |
| Context Confusion | Model cannot determine which context applies | Wrong tool calls, mixed requirements |
| Context Clash | Accumulated information directly conflicts | Contradictory outputs, inconsistent behavior |
Detailed Degradation Patterns
The Lost-in-Middle Phenomenon
The most well-documented degradation pattern. Models demonstrate U-shaped attention curves where information at the beginning and end of context receives reliable attention, while middle content suffers 10-40% lower recall accuracy.
Cause: Models allocate disproportionate attention to the first token (the BOS token) to stabilize internal states, creating an "attention sink" that depletes the attention budget available for middle tokens.
Mitigation:
- Place critical information at beginning or end of context
- Use summary structures that surface key information at attention-favored positions
- Explicit section headers and transitions help models navigate structure
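The repositioning idea above can be sketched as a small helper that splits critical blocks between the attention-favored start and end of the context. The block structure and the `critical` flag are hypothetical, standing in for whatever metadata your context assembler tracks:

```python
def order_for_attention(blocks):
    """Place critical blocks at the edges; bulk content in the middle."""
    critical = [b for b in blocks if b.get("critical")]
    bulk = [b for b in blocks if not b.get("critical")]
    if len(critical) <= 1:
        return critical + bulk
    # Split critical items between the start and the end of the context.
    split = (len(critical) + 1) // 2
    return critical[:split] + bulk + critical[split:]

blocks = [
    {"name": "task", "critical": True},
    {"name": "data_dump", "critical": False},
    {"name": "key_findings", "critical": True},
]
ordered = order_for_attention(blocks)
```

With two critical blocks, one lands first and one lands last, so the bulky data dump sits in the low-attention middle where its loss matters least.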
Context Poisoning
Occurs when hallucinations, errors, or incorrect information enters context and compounds through repeated reference.
Entry Points:
- Tool outputs containing errors or unexpected formats
- Retrieved documents with incorrect/outdated information
- Model-generated summaries introducing hallucinations
Symptoms:
- Degraded output quality on previously successful tasks
- Tool misalignment (wrong tools or parameters)
- Persistent hallucinations despite correction attempts
Recovery:
- Truncate context to before poisoning point
- Explicitly note poisoning and request re-evaluation
- Restart with clean context, preserving only verified information
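The first recovery step (truncating to before the poisoning point) can be sketched as follows. The `poisoned` flag is a hypothetical marker set by whatever validation you run against tool outputs or summaries:

```python
def truncate_before_poisoning(history):
    """Return history up to (not including) the first poisoned turn."""
    for i, turn in enumerate(history):
        if turn.get("poisoned"):
            return history[:i]
    return history

history = [
    {"role": "user", "text": "Summarize the report."},
    {"role": "assistant", "text": "Revenue was up 15%."},
    {"role": "assistant", "text": "The CEO resigned.", "poisoned": True},
    {"role": "user", "text": "Tell me more about the resignation."},
]
clean = truncate_before_poisoning(history)
```

Note that everything after the poisoned turn is dropped too, since later turns may already reference the hallucination.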
Context Distraction
Emerges when irrelevant information competes for limited attention budget.
The Distractor Effect: Even a single irrelevant document reduces performance on tasks involving relevant documents. The effect compounds with multiple distractors.
Mitigation:
- Apply relevance filtering before loading retrieved documents
- Use namespacing to make irrelevant sections easy to ignore
- Consider whether information truly needs to be in context vs. accessed through tool calls
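Relevance filtering before loading documents can be as simple as the sketch below. Lexical overlap is a deliberately crude placeholder; a real system would likely use embedding similarity, and the 0.3 threshold is an assumption you would tune:

```python
def relevance_score(query, doc):
    """Crude lexical overlap; real systems would use embeddings."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def filter_relevant(query, docs, threshold=0.3):
    """Drop documents below the relevance threshold before they reach context."""
    return [d for d in docs if relevance_score(query, d) >= threshold]

docs = [
    "quarterly revenue report for Q3",
    "office cafeteria lunch menu",
]
kept = filter_relevant("summarize the quarterly revenue report", docs)
```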
Context Confusion
Arises when the model cannot determine which part of the accumulated context applies to the current task. It is distinct from distraction: distraction concerns attention allocation, while confusion concerns applying the wrong context to the wrong task.
Signs:
- Responses addressing wrong aspects of queries
- Tool calls appropriate for different tasks
- Outputs mixing requirements from multiple sources
Solutions:
- Explicit task segmentation
- Clear transitions between task contexts
- State management isolating context for different objectives
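Explicit task segmentation can be sketched as a context builder that wraps each task in clearly delimited sections, making it easier for the model to keep requirements separate. The delimiter format is illustrative:

```python
def build_segmented_context(tasks):
    """Wrap each task's context in explicit, named section markers."""
    parts = []
    for name, content in tasks.items():
        parts.append(
            f"=== TASK: {name} ===\n{content}\n=== END TASK: {name} ==="
        )
    return "\n\n".join(parts)

ctx = build_segmented_context({
    "report": "Generate the Q3 revenue report.",
    "email": "Draft a follow-up email to the client.",
})
```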
Context Clash
Develops when accumulated information directly conflicts.
Sources:
- Multi-source retrieval with contradictory information
- Version conflicts (outdated + current information)
- Perspective conflicts (valid but incompatible viewpoints)
Resolution:
- Explicit conflict marking with clarification requests
- Priority rules establishing source precedence
- Version filtering excluding outdated information
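Priority rules and version filtering can be combined into one conflict-resolution pass over accumulated facts. The `priority` and `version` fields are hypothetical metadata your retrieval layer would attach:

```python
def resolve_conflicts(facts):
    """Keep one fact per key: highest priority wins, then newest version."""
    best = {}
    for f in facts:
        cur = best.get(f["key"])
        if cur is None or (f["priority"], f["version"]) > (cur["priority"], cur["version"]):
            best[f["key"]] = f
    return best

facts = [
    {"key": "api_url", "value": "v1.example.com", "priority": 1, "version": 1},
    {"key": "api_url", "value": "v2.example.com", "priority": 1, "version": 2},
]
resolved = resolve_conflicts(facts)
```

Only the winning fact per key enters context, so the model never sees the contradiction.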
Model-Specific Degradation Thresholds
| Model | Degradation Onset | Severe Degradation | Notes |
|---|---|---|---|
| GPT-5.2 | ~64K tokens | ~200K tokens | Best overall with thinking mode |
| Claude Opus 4.5 | ~100K tokens | ~180K tokens | Strong attention management |
| Claude Sonnet 4.5 | ~80K tokens | ~150K tokens | Optimized for agents/coding |
| Gemini 3 Pro | ~500K tokens | ~800K tokens | 1M context window |
The Four-Bucket Mitigation Approach
| Strategy | Description | Use Case |
|---|---|---|
| Write | Save context outside the window | Keep active context lean |
| Select | Pull only relevant context | Address distraction |
| Compress | Reduce tokens, preserve information | Extend effective capacity |
| Isolate | Split context across sub-agents | Prevent single-context growth |
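The "Write" bucket can be sketched as an external store that holds bulky content while only a short pointer stays in the active window. A plain dict stands in for a scratchpad file or database here:

```python
store = {}

def write_out(key, content):
    """Persist content externally; return a short pointer for the context."""
    store[key] = content
    return f"[stored: {key}, {len(content)} chars; fetch via tool call]"

# 15,000 characters of raw data never enter the context window.
pointer = write_out("q3_raw_data", "..." * 5000)
```

The agent later retrieves the full content through a tool call only if the task actually needs it, which is the Select bucket in action.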
Practical Detection
```text
# Context monitoring pattern
turn_1:  1000 tokens
turn_5:  8000 tokens
turn_10: 25000 tokens
turn_20: 60000 tokens  # ⚠️ Degradation begins
turn_30: 90000 tokens  # 🔴 Significant degradation
```
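Monitoring like this can be automated by checking the running token count against per-model thresholds. The numbers below mirror the model-specific table above and should be treated as rough defaults, not measured constants:

```python
# (degradation onset, severe degradation) in tokens; illustrative defaults.
THRESHOLDS = {
    "claude-sonnet-4.5": (80_000, 150_000),
    "gpt-5.2": (64_000, 200_000),
}

def context_health(model, tokens):
    """Classify context health relative to the model's degradation points."""
    onset, severe = THRESHOLDS[model]
    if tokens >= severe:
        return "SEVERE"
    if tokens >= onset:
        return "DEGRADING"
    return "GOOD"
```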
Mitigating Lost-in-Middle
```text
# Organize context with critical info at edges

[CURRENT TASK]        # At start (high attention)
- Goal: Generate quarterly report
- Deadline: End of week

[DETAILED CONTEXT]    # Middle (lower attention)
- 50 pages of data
- Multiple analysis sections

[KEY FINDINGS]        # At end (high attention)
- Revenue up 15%
- Costs down 8%
```
Guidelines
- Monitor context length and performance correlation during development
- Place critical information at beginning or end of context
- Implement compaction triggers before degradation becomes severe
- Validate retrieved documents for accuracy before adding to context
- Use versioning to prevent outdated information causing clash
- Segment tasks to prevent context confusion across objectives
- Design for graceful degradation rather than assuming perfect conditions
- Test with progressively larger contexts to find degradation thresholds
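The compaction-trigger and graceful-degradation guidelines above can be combined in one sketch: summarize only the older middle of the history while preserving recent turns and critical items (per the anti-pattern table later, uniform compaction loses hierarchy). The `summarize` function is a placeholder for a model call; here it just emits a stub:

```python
def summarize(turns):
    """Placeholder for an LLM summarization call."""
    return [{"role": "system", "text": f"[summary of {len(turns)} turns]"}]

def compact(history, keep_recent=5):
    """Compact old turns while keeping critical items and recent turns verbatim."""
    if len(history) <= keep_recent:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    critical = [t for t in old if t.get("critical")]
    summarizable = [t for t in old if not t.get("critical")]
    return critical + summarize(summarizable) + recent

history = [{"role": "user", "text": f"turn {i}"} for i in range(20)]
compacted = compact(history)
```

Twenty turns collapse to six entries (one summary plus the last five turns), and anything flagged critical would survive verbatim.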
Degradation Diagnostic Flowchart
Use this flowchart to diagnose context degradation issues:
```text
START: Agent performance degraded
│
├── Check context length
│   │
│   ├── <50K tokens → Likely NOT context degradation
│   │   └── Check: prompt quality, task complexity, model fit
│   │
│   └── >50K tokens → Continue diagnosis
│
├── Identify symptom pattern
│   │
│   ├── Middle content ignored
│   │   └── DIAGNOSIS: Lost-in-Middle
│   │       └── FIX: Move critical info to start/end
│   │
│   ├── Persistent errors despite correction
│   │   └── DIAGNOSIS: Context Poisoning
│   │       └── FIX: Truncate to before poisoning, restart
│   │
│   ├── Irrelevant info affecting output
│   │   └── DIAGNOSIS: Context Distraction
│   │       └── FIX: Filter relevance, namespace sections
│   │
│   ├── Wrong task context applied
│   │   └── DIAGNOSIS: Context Confusion
│   │       └── FIX: Explicit task segmentation, clear transitions
│   │
│   └── Contradictory outputs
│       └── DIAGNOSIS: Context Clash
│           └── FIX: Priority rules, version filtering
│
└── Select mitigation strategy
    │
    ├── WRITE: Save context externally, keep window lean
    ├── SELECT: Pull only relevant context per task
    ├── COMPRESS: Summarize while preserving key info
    └── ISOLATE: Split across sub-agents
```
Quick Diagnostic Questions:
| Question | Yes → Diagnosis |
|---|---|
| Is critical info in middle 50% of context? | Lost-in-Middle |
| Did a hallucination appear and persist? | Context Poisoning |
| Is >30% of context irrelevant to current task? | Context Distraction |
| Is agent mixing requirements from different tasks? | Context Confusion |
| Are there contradictory facts in context? | Context Clash |
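The quick-diagnostic table above maps directly to a rule list. The boolean signals are hypothetical flags produced by your own monitoring; more than one pattern can be present at once:

```python
def diagnose(signals):
    """Map observed symptoms to the degradation patterns they indicate."""
    rules = [
        ("critical_info_in_middle", "Lost-in-Middle"),
        ("persistent_hallucination", "Context Poisoning"),
        ("irrelevant_ratio_high", "Context Distraction"),
        ("mixed_task_requirements", "Context Confusion"),
        ("contradictory_facts", "Context Clash"),
    ]
    return [name for flag, name in rules if signals.get(flag)]

found = diagnose({
    "persistent_hallucination": True,
    "contradictory_facts": True,
})
```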
Related Components
Skills
- context-fundamentals - Context basics (prerequisite)
- context-optimization - Mitigation techniques
- context-compression - Compression strategies
Agents
- context-health-analyst - Real-time degradation detection
- compression-evaluator - Compression quality evaluation
Scripts
- external/Agent-Skills-for-Context-Engineering/skills/context-degradation/scripts/degradation_detector.py - Context health analysis
Success Output
When successful, this skill MUST output:
```text
✅ SKILL COMPLETE: context-degradation

Completed:
- [x] Degradation pattern identified (lost-in-middle detected)
- [x] Mitigation strategy applied (critical info moved to edges)
- [x] Context compaction performed (30% reduction)
- [x] Performance validation confirms improvement

Outputs:
- reports/context-health-analysis.md (Degradation assessment)
- logs/context-before-after.json (Token usage comparison)

Context Health: GOOD (68K tokens, below degradation threshold)
```
Completion Checklist
Before marking this skill as complete, verify:
- Context length monitored across session
- Degradation pattern correctly identified
- Mitigation strategy matches pattern type
- Critical information repositioned to attention-favored zones
- Context compaction reduces tokens by >20%
- Performance metrics show improvement post-mitigation
- No context poisoning introduced during compaction
- Lost-in-middle phenomenon mitigated
- Four-bucket strategy documented (Write/Select/Compress/Isolate)
- Model-specific thresholds respected
Failure Indicators
This skill has FAILED if:
- ❌ Context continues growing past degradation threshold
- ❌ Mitigation strategy worsens performance
- ❌ Critical information lost during compaction
- ❌ Context poisoning introduced by summarization errors
- ❌ Performance degradation persists after mitigation
- ❌ Wrong pattern identified (confusion vs distraction)
- ❌ Context clash unresolved after filtering
- ❌ No measurement of before/after performance
When NOT to Use
Do NOT use this skill when:
- Short conversations with <10K tokens (use context-fundamentals instead)
- Single-turn interactions without history
- Context never approaches model limits
- Task requires full conversation history (no compaction safe)
- Building systems with guaranteed <32K context
- Working with models that handle long context perfectly (if such exist)
- Debugging non-context-related failures (use other diagnostics)
Anti-Patterns (Avoid)
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Compacting too early | Unnecessary information loss | Wait until approaching threshold |
| Summarization without validation | Context poisoning via hallucinations | Validate summaries against originals |
| Ignoring lost-in-middle | Middle content unreliable | Place critical info at edges |
| No performance tracking | Can't measure mitigation success | Log metrics before/after |
| Uniform compaction | Loses context hierarchy | Preserve recent/critical content |
| Mixing conflicting contexts | Context clash | Segment or prioritize sources |
| Over-compaction | Loses necessary details | Compress to 70-80% not 10% |
| No model-specific tuning | Wrong thresholds | Use model-specific degradation points |
Principles
This skill embodies:
- #4 Monitor and Measure - Tracks context length and performance continuously
- #5 Eliminate Ambiguity - Clear pattern identification and mitigation mapping
- #6 Clear, Understandable, Explainable - Documents degradation causes and solutions
- #8 No Assumptions - Validates mitigation effectiveness with metrics
- #10 Test Everything - Probes context recall before/after to confirm improvement
Full Standard: CODITECT-STANDARD-AUTOMATION.md