Context Degradation Patterns

Language models exhibit predictable degradation patterns as context length increases. Understanding these patterns is essential for diagnosing failures and designing resilient systems. Context degradation is not a binary state but a continuum of declining performance that manifests in several distinct ways.

When to Use

Use this skill when:

  • Agent performance degrades unexpectedly during long conversations
  • Debugging cases where agents produce incorrect or irrelevant outputs
  • Designing systems that must handle large contexts reliably
  • Investigating "lost-in-middle" phenomena in agent outputs
  • Analyzing context-related failures in agent behavior

Don't use this skill when:

  • Short conversations with minimal context accumulation
  • Single-turn interactions without history
  • Tasks where context never approaches limits

Core Concepts

Context degradation manifests through several distinct patterns:

Pattern | Description | Symptom
Lost-in-Middle | Information in the context center receives less attention | 10-40% lower recall for middle content
Context Poisoning | Errors compound through repeated reference | Persistent hallucinations despite correction
Context Distraction | Irrelevant information overwhelms relevant content | Quality drop even with a single distractor
Context Confusion | Model cannot determine which context applies | Wrong tool calls, mixed requirements
Context Clash | Accumulated information directly conflicts | Contradictory outputs, inconsistent behavior

Detailed Degradation Patterns

The Lost-in-Middle Phenomenon

The most well-documented degradation pattern. Models demonstrate U-shaped attention curves where information at the beginning and end of context receives reliable attention, while middle content suffers 10-40% lower recall accuracy.

Cause: Models allocate disproportionate attention to the first tokens (e.g., the BOS token) to stabilize internal states, creating an "attention sink" that depletes the attention budget available for middle tokens.

Mitigation:

  • Place critical information at the beginning or end of context
  • Use summary structures that surface key information at attention-favored positions
  • Add explicit section headers and transitions to help models navigate the structure
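
As an illustration (the function and section labels are hypothetical, not a specific framework's API), a context assembler can pin critical items to the attention-favored edges:

```python
def assemble_context(critical: list[str], background: list[str]) -> str:
    """Pin critical items to the attention-favored edges of the prompt:
    the first critical item opens the context, the rest close it, and
    background material fills the lower-attention middle."""
    if not critical:
        return "\n\n".join(background)
    head, tail = critical[0], critical[1:]
    return "\n\n".join([head, *background, *tail])

prompt = assemble_context(
    critical=["[CURRENT TASK] Generate quarterly report",
              "[KEY FINDINGS] Revenue up 15%"],
    background=["[DETAILED CONTEXT] 50 pages of data"],
)
```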

Context Poisoning

Occurs when hallucinations, errors, or incorrect information enters context and compounds through repeated reference.

Entry Points:

  1. Tool outputs containing errors or unexpected formats
  2. Retrieved documents with incorrect/outdated information
  3. Model-generated summaries introducing hallucinations

Symptoms:

  • Degraded output quality on previously successful tasks
  • Tool misalignment (wrong tools or parameters)
  • Persistent hallucinations despite correction attempts

Recovery:

  • Truncate context to before poisoning point
  • Explicitly note poisoning and request re-evaluation
  • Restart with clean context, preserving only verified information
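
The truncation step can be sketched as follows; the message format and helper name are assumptions for illustration:

```python
def recover_from_poisoning(messages, poisoned_index, verified=None):
    """Drop the turn where bad information entered and everything
    after it, then re-append only independently verified facts."""
    clean = messages[:poisoned_index]
    return clean + list(verified or [])

history = [
    {"role": "user", "content": "Summarize the report"},
    {"role": "assistant", "content": "The report says X"},  # hallucination enters here
    {"role": "user", "content": "No, that's wrong"},
    {"role": "assistant", "content": "As I said, X..."},    # poisoning persists
]
recovered = recover_from_poisoning(
    history, poisoned_index=1,
    verified=[{"role": "user", "content": "Verified: the report says Y"}],
)
```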

Context Distraction

Emerges when irrelevant information competes for limited attention budget.

The Distractor Effect: Even a single irrelevant document reduces performance on tasks involving relevant documents. The effect compounds with multiple distractors.

Mitigation:

  • Apply relevance filtering before loading retrieved documents
  • Use namespacing to make irrelevant sections easy to ignore
  • Consider whether information truly needs to be in context vs. accessed through tool calls
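
A minimal relevance filter might look like this; naive term overlap stands in for whatever scoring your retrieval stack actually provides:

```python
def filter_relevant(docs, query_terms, min_overlap=1):
    """Keep only documents sharing at least min_overlap terms with
    the current query; everything else is treated as a distractor."""
    terms = {t.lower() for t in query_terms}
    kept = []
    for doc in docs:
        overlap = terms & {w.lower() for w in doc.split()}
        if len(overlap) >= min_overlap:
            kept.append(doc)
    return kept

docs = ["quarterly revenue report", "office party planning notes"]
relevant = filter_relevant(docs, ["revenue", "report"])
```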

Context Confusion

Arises when the model cannot determine which part of the context applies and lets the wrong information shape its response. This is distinct from distraction, which concerns attention dilution rather than misapplication.

Signs:

  • Responses addressing wrong aspects of queries
  • Tool calls appropriate for different tasks
  • Outputs mixing requirements from multiple sources

Solutions:

  • Explicit task segmentation
  • Clear transitions between task contexts
  • State management isolating context for different objectives
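
One way to sketch such state management, assuming a simple per-task buffer (all names are hypothetical):

```python
from collections import defaultdict

class TaskScopedContext:
    """Keep a separate message buffer per objective so context from
    one task cannot bleed into another."""
    def __init__(self):
        self._buffers = defaultdict(list)
        self.active = None

    def switch(self, task_id):
        self.active = task_id  # explicit transition between task contexts

    def add(self, message):
        self._buffers[self.active].append(message)

    def context(self):
        return list(self._buffers[self.active])  # only the active task's context

ctx = TaskScopedContext()
ctx.switch("report")
ctx.add("Goal: quarterly report")
ctx.switch("billing")
ctx.add("Goal: fix invoice")
```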

Context Clash

Develops when accumulated information directly conflicts.

Sources:

  • Multi-source retrieval with contradictory information
  • Version conflicts (outdated + current information)
  • Perspective conflicts (valid but incompatible viewpoints)

Resolution:

  • Explicit conflict marking with clarification requests
  • Priority rules establishing source precedence
  • Version filtering excluding outdated information
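
Priority rules and version filtering can be combined in a single pass; the fact schema below is an assumption for illustration:

```python
def resolve_clash(facts):
    """For each key, keep the fact from the highest-priority source,
    breaking ties by recency (higher version wins)."""
    best = {}
    for fact in facts:
        key = fact["key"]
        rank = (fact["priority"], fact["version"])
        if key not in best or rank > (best[key]["priority"], best[key]["version"]):
            best[key] = fact
    return best

facts = [
    {"key": "api_url", "value": "v1.example.com", "priority": 1, "version": 1},
    {"key": "api_url", "value": "v2.example.com", "priority": 1, "version": 2},
]
resolved = resolve_clash(facts)
```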

Model-Specific Degradation Thresholds

Model | Degradation Onset | Severe Degradation | Notes
GPT-5.2 | ~64K tokens | ~200K tokens | Best overall with thinking mode
Claude Opus 4.5 | ~100K tokens | ~180K tokens | Strong attention management
Claude Sonnet 4.5 | ~80K tokens | ~150K tokens | Optimized for agents/coding
Gemini 3 Pro | ~500K tokens | ~800K tokens | 1M context window

The Four-Bucket Mitigation Approach

Strategy | Description | Use Case
Write | Save context outside the window | Keep active context lean
Select | Pull only relevant context | Address distraction
Compress | Reduce tokens, preserve information | Extend effective capacity
Isolate | Split context across sub-agents | Prevent single-context growth

Practical Detection

# Context monitoring pattern
turn_1: 1000 tokens
turn_5: 8000 tokens
turn_10: 25000 tokens
turn_20: 60000 tokens # ⚠️ Degradation begins
turn_30: 90000 tokens # 🔴 Significant degradation
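
The same monitoring can be expressed as a small health check; the thresholds are illustrative and should be tuned to the model-specific table above:

```python
DEGRADATION_ONSET = 60_000  # tokens; tune per model
SEVERE = 90_000

def check_context_health(token_count):
    """Map a running token count to a health status, so compaction
    can be triggered before degradation becomes severe."""
    if token_count >= SEVERE:
        return "SEVERE: compact or restart now"
    if token_count >= DEGRADATION_ONSET:
        return "WARNING: degradation likely, schedule compaction"
    return "OK"
```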

Mitigating Lost-in-Middle

# Organize context with critical info at edges

[CURRENT TASK] # At start (high attention)
- Goal: Generate quarterly report
- Deadline: End of week

[DETAILED CONTEXT] # Middle (lower attention)
- 50 pages of data
- Multiple analysis sections

[KEY FINDINGS] # At end (high attention)
- Revenue up 15%
- Costs down 8%

Guidelines

  1. Monitor context length and performance correlation during development
  2. Place critical information at beginning or end of context
  3. Implement compaction triggers before degradation becomes severe
  4. Validate retrieved documents for accuracy before adding to context
  5. Use versioning to prevent outdated information causing clash
  6. Segment tasks to prevent context confusion across objectives
  7. Design for graceful degradation rather than assuming perfect conditions
  8. Test with progressively larger contexts to find degradation thresholds

Degradation Diagnostic Flowchart

Use this flowchart to diagnose context degradation issues:

START: Agent performance degraded
│
├── Check context length
│   ├── <50K tokens → Likely NOT context degradation
│   │   └── Check: prompt quality, task complexity, model fit
│   └── >50K tokens → Continue diagnosis
│
├── Identify symptom pattern
│   ├── Middle content ignored
│   │   └── DIAGNOSIS: Lost-in-Middle
│   │       └── FIX: Move critical info to start/end
│   ├── Persistent errors despite correction
│   │   └── DIAGNOSIS: Context Poisoning
│   │       └── FIX: Truncate to before poisoning, restart
│   ├── Irrelevant info affecting output
│   │   └── DIAGNOSIS: Context Distraction
│   │       └── FIX: Filter relevance, namespace sections
│   ├── Wrong task context applied
│   │   └── DIAGNOSIS: Context Confusion
│   │       └── FIX: Explicit task segmentation, clear transitions
│   └── Contradictory outputs
│       └── DIAGNOSIS: Context Clash
│           └── FIX: Priority rules, version filtering
│
└── Select mitigation strategy
    ├── WRITE: Save context externally, keep window lean
    ├── SELECT: Pull only relevant context per task
    ├── COMPRESS: Summarize while preserving key info
    └── ISOLATE: Split across sub-agents

Quick Diagnostic Questions:

Question | Yes → Diagnosis
Is critical info in the middle 50% of context? | Lost-in-Middle
Did a hallucination appear and persist? | Context Poisoning
Is >30% of context irrelevant to the current task? | Context Distraction
Is the agent mixing requirements from different tasks? | Context Confusion
Are there contradictory facts in context? | Context Clash
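
These questions map directly to a lookup; a minimal sketch (question order matches the table above):

```python
DIAGNOSTICS = [
    ("critical info in middle 50% of context", "Lost-in-Middle"),
    ("hallucination appeared and persisted", "Context Poisoning"),
    (">30% of context irrelevant to current task", "Context Distraction"),
    ("agent mixing requirements from different tasks", "Context Confusion"),
    ("contradictory facts in context", "Context Clash"),
]

def diagnose(answers):
    """Given one yes/no answer per question, return the diagnosis
    for every question answered 'yes'."""
    return [name for (_, name), yes in zip(DIAGNOSTICS, answers) if yes]
```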

Skills

  • context-fundamentals - Context basics (prerequisite)
  • context-optimization - Mitigation techniques
  • context-compression - Compression strategies

Agents

  • context-health-analyst - Real-time degradation detection
  • compression-evaluator - Compression quality evaluation

Scripts

  • external/Agent-Skills-for-Context-Engineering/skills/context-degradation/scripts/degradation_detector.py - Context health analysis

Success Output

When successful, this skill MUST output:

✅ SKILL COMPLETE: context-degradation

Completed:
- [x] Degradation pattern identified (lost-in-middle detected)
- [x] Mitigation strategy applied (critical info moved to edges)
- [x] Context compaction performed (30% reduction)
- [x] Performance validation confirms improvement

Outputs:
- reports/context-health-analysis.md (Degradation assessment)
- logs/context-before-after.json (Token usage comparison)

Context Health: GOOD (68K tokens, below degradation threshold)

Completion Checklist

Before marking this skill as complete, verify:

  • Context length monitored across session
  • Degradation pattern correctly identified
  • Mitigation strategy matches pattern type
  • Critical information repositioned to attention-favored zones
  • Context compaction reduces tokens by >20%
  • Performance metrics show improvement post-mitigation
  • No context poisoning introduced during compaction
  • Lost-in-middle phenomenon mitigated
  • Four-bucket strategy documented (Write/Select/Compress/Isolate)
  • Model-specific thresholds respected

Failure Indicators

This skill has FAILED if:

  • ❌ Context continues growing past degradation threshold
  • ❌ Mitigation strategy worsens performance
  • ❌ Critical information lost during compaction
  • ❌ Context poisoning introduced by summarization errors
  • ❌ Performance degradation persists after mitigation
  • ❌ Wrong pattern identified (confusion vs distraction)
  • ❌ Context clash unresolved after filtering
  • ❌ No measurement of before/after performance

When NOT to Use

Do NOT use this skill when:

  • Short conversations with <10K tokens (use context-fundamentals instead)
  • Single-turn interactions without history
  • Context never approaches model limits
  • Task requires full conversation history (no compaction is safe)
  • Building systems with guaranteed <32K context
  • Working with models that handle long context perfectly (if such exist)
  • Debugging non-context-related failures (use other diagnostics)

Anti-Patterns (Avoid)

Anti-Pattern | Problem | Solution
Compacting too early | Unnecessary information loss | Wait until approaching threshold
Summarization without validation | Context poisoning via hallucinations | Validate summaries against originals
Ignoring lost-in-middle | Middle content unreliable | Place critical info at edges
No performance tracking | Can't measure mitigation success | Log metrics before/after
Uniform compaction | Loses context hierarchy | Preserve recent/critical content
Mixing conflicting contexts | Context clash | Segment or prioritize sources
Over-compaction | Loses necessary details | Compress to 70-80%, not 10%
No model-specific tuning | Wrong thresholds | Use model-specific degradation points

Principles

This skill embodies:

  • #4 Monitor and Measure - Tracks context length and performance continuously
  • #5 Eliminate Ambiguity - Clear pattern identification and mitigation mapping
  • #6 Clear, Understandable, Explainable - Documents degradation causes and solutions
  • #8 No Assumptions - Validates mitigation effectiveness with metrics
  • #10 Test Everything - Probes context recall before/after to confirm improvement

Full Standard: CODITECT-STANDARD-AUTOMATION.md