Context Degradation Patterns

Language models exhibit predictable degradation patterns as context length increases. Understanding these patterns is essential for diagnosing failures and designing resilient systems. Context degradation is not a binary state but a continuum of declining performance that manifests in several distinct ways.

When to Use

Use this skill when:

  • Agent performance degrades unexpectedly during long conversations
  • Debugging cases where agents produce incorrect or irrelevant outputs
  • Designing systems that must handle large contexts reliably
  • Investigating "lost-in-middle" phenomena in agent outputs
  • Analyzing context-related failures in agent behavior

Don't use this skill when:

  • Short conversations with minimal context accumulation
  • Single-turn interactions without history
  • Tasks where context never approaches limits

Core Concepts

Context degradation manifests through several distinct patterns:

Pattern | Description | Symptom
Lost-in-Middle | Information in the context center receives less attention | 10-40% lower recall for middle content
Context Poisoning | Errors compound through repeated reference | Persistent hallucinations despite correction
Context Distraction | Irrelevant information overwhelms relevant content | Quality drop even with a single distractor
Context Confusion | Model cannot determine which context applies | Wrong tool calls, mixed requirements
Context Clash | Accumulated information directly conflicts | Contradictory outputs, inconsistent behavior

Detailed Degradation Patterns

The Lost-in-Middle Phenomenon

The most well-documented degradation pattern. Models demonstrate U-shaped attention curves where information at the beginning and end of context receives reliable attention, while middle content suffers 10-40% lower recall accuracy.

Cause: Models allocate disproportionate attention to the first tokens (e.g., the BOS token) to stabilize internal states, creating an "attention sink" that depletes the attention budget available for middle tokens.

Mitigation:

  • Place critical information at the beginning or end of context
  • Use summary structures that surface key information at attention-favored positions
  • Add explicit section headers and transitions to help models navigate the structure
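
As an illustration (the function and section labels are hypothetical, not a specific framework's API), a context assembler can pin critical items to the attention-favored edges:

```python
def assemble_context(critical: list[str], background: list[str]) -> str:
    """Pin critical items to the attention-favored edges of the prompt:
    the first critical item opens the context, the rest close it, and
    background material fills the lower-attention middle."""
    if not critical:
        return "\n\n".join(background)
    head, tail = critical[0], critical[1:]
    return "\n\n".join([head, *background, *tail])

prompt = assemble_context(
    critical=["[CURRENT TASK] Generate quarterly report",
              "[KEY FINDINGS] Revenue up 15%"],
    background=["[DETAILED CONTEXT] 50 pages of data"],
)
```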

Context Poisoning

Occurs when hallucinations, errors, or incorrect information enters context and compounds through repeated reference.

Entry Points:

  1. Tool outputs containing errors or unexpected formats
  2. Retrieved documents with incorrect/outdated information
  3. Model-generated summaries introducing hallucinations

Symptoms:

  • Degraded output quality on previously successful tasks
  • Tool misalignment (wrong tools or parameters)
  • Persistent hallucinations despite correction attempts

Recovery:

  • Truncate context to before poisoning point
  • Explicitly note poisoning and request re-evaluation
  • Restart with clean context, preserving only verified information
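
The truncation step can be sketched as follows; the message format and helper name are assumptions for illustration:

```python
def recover_from_poisoning(messages, poisoned_index, verified=None):
    """Drop the turn where bad information entered and everything
    after it, then re-append only independently verified facts."""
    clean = messages[:poisoned_index]
    return clean + list(verified or [])

history = [
    {"role": "user", "content": "Summarize the report"},
    {"role": "assistant", "content": "The report says X"},  # hallucination enters here
    {"role": "user", "content": "No, that's wrong"},
    {"role": "assistant", "content": "As I said, X..."},    # poisoning persists
]
recovered = recover_from_poisoning(
    history, poisoned_index=1,
    verified=[{"role": "user", "content": "Verified: the report says Y"}],
)
```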

Context Distraction

Emerges when irrelevant information competes for limited attention budget.

The Distractor Effect: Even a single irrelevant document reduces performance on tasks involving relevant documents. The effect compounds with multiple distractors.

Mitigation:

  • Apply relevance filtering before loading retrieved documents
  • Use namespacing to make irrelevant sections easy to ignore
  • Consider whether information truly needs to be in context vs. accessed through tool calls
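
A minimal relevance filter might look like this; naive term overlap stands in for whatever scoring your retrieval stack actually provides:

```python
def filter_relevant(docs, query_terms, min_overlap=1):
    """Keep only documents sharing at least min_overlap terms with
    the current query; everything else is treated as a distractor."""
    terms = {t.lower() for t in query_terms}
    kept = []
    for doc in docs:
        overlap = terms & {w.lower() for w in doc.split()}
        if len(overlap) >= min_overlap:
            kept.append(doc)
    return kept

docs = ["quarterly revenue report", "office party planning notes"]
relevant = filter_relevant(docs, ["revenue", "report"])
```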

Context Confusion

Arises when the model cannot determine which part of the context applies and lets the wrong information shape its response. This is distinct from distraction, which concerns attention dilution rather than misapplication.

Signs:

  • Responses addressing wrong aspects of queries
  • Tool calls appropriate for different tasks
  • Outputs mixing requirements from multiple sources

Solutions:

  • Explicit task segmentation
  • Clear transitions between task contexts
  • State management isolating context for different objectives
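
One way to sketch such state management, assuming a simple per-task buffer (all names are hypothetical):

```python
from collections import defaultdict

class TaskScopedContext:
    """Keep a separate message buffer per objective so context from
    one task cannot bleed into another."""
    def __init__(self):
        self._buffers = defaultdict(list)
        self.active = None

    def switch(self, task_id):
        self.active = task_id  # explicit transition between task contexts

    def add(self, message):
        self._buffers[self.active].append(message)

    def context(self):
        return list(self._buffers[self.active])  # only the active task's context

ctx = TaskScopedContext()
ctx.switch("report")
ctx.add("Goal: quarterly report")
ctx.switch("billing")
ctx.add("Goal: fix invoice")
```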

Context Clash

Develops when accumulated information directly conflicts.

Sources:

  • Multi-source retrieval with contradictory information
  • Version conflicts (outdated + current information)
  • Perspective conflicts (valid but incompatible viewpoints)

Resolution:

  • Explicit conflict marking with clarification requests
  • Priority rules establishing source precedence
  • Version filtering excluding outdated information
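
Priority rules and version filtering can be combined in a single pass; the fact schema below is an assumption for illustration:

```python
def resolve_clash(facts):
    """For each key, keep the fact from the highest-priority source,
    breaking ties by recency (higher version wins)."""
    best = {}
    for fact in facts:
        key = fact["key"]
        rank = (fact["priority"], fact["version"])
        if key not in best or rank > (best[key]["priority"], best[key]["version"]):
            best[key] = fact
    return best

facts = [
    {"key": "api_url", "value": "v1.example.com", "priority": 1, "version": 1},
    {"key": "api_url", "value": "v2.example.com", "priority": 1, "version": 2},
]
resolved = resolve_clash(facts)
```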

Model-Specific Degradation Thresholds

Model | Degradation Onset | Severe Degradation | Notes
GPT-5.2 | ~64K tokens | ~200K tokens | Best overall with thinking mode
Claude Opus 4.5 | ~100K tokens | ~180K tokens | Strong attention management
Claude Sonnet 4.5 | ~80K tokens | ~150K tokens | Optimized for agents/coding
Gemini 3 Pro | ~500K tokens | ~800K tokens | 1M context window

The Four-Bucket Mitigation Approach

Strategy | Description | Use Case
Write | Save context outside the window | Keep active context lean
Select | Pull only relevant context | Address distraction
Compress | Reduce tokens, preserve information | Extend effective capacity
Isolate | Split context across sub-agents | Prevent single-context growth

Practical Detection

# Context monitoring pattern
turn_1: 1000 tokens
turn_5: 8000 tokens
turn_10: 25000 tokens
turn_20: 60000 tokens # ⚠️ Degradation begins
turn_30: 90000 tokens # 🔴 Significant degradation
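
The same monitoring can be expressed as a small health check; the thresholds are illustrative and should be tuned to the model-specific table above:

```python
DEGRADATION_ONSET = 60_000  # tokens; tune per model
SEVERE = 90_000

def check_context_health(token_count):
    """Map a running token count to a health status, so compaction
    can be triggered before degradation becomes severe."""
    if token_count >= SEVERE:
        return "SEVERE: compact or restart now"
    if token_count >= DEGRADATION_ONSET:
        return "WARNING: degradation likely, schedule compaction"
    return "OK"
```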

Mitigating Lost-in-Middle

# Organize context with critical info at edges

[CURRENT TASK] # At start (high attention)
- Goal: Generate quarterly report
- Deadline: End of week

[DETAILED CONTEXT] # Middle (lower attention)
- 50 pages of data
- Multiple analysis sections

[KEY FINDINGS] # At end (high attention)
- Revenue up 15%
- Costs down 8%

Guidelines

  1. Monitor context length and performance correlation during development
  2. Place critical information at beginning or end of context
  3. Implement compaction triggers before degradation becomes severe
  4. Validate retrieved documents for accuracy before adding to context
  5. Use versioning to prevent outdated information causing clash
  6. Segment tasks to prevent context confusion across objectives
  7. Design for graceful degradation rather than assuming perfect conditions
  8. Test with progressively larger contexts to find degradation thresholds

Degradation Diagnostic Flowchart

Use this flowchart to diagnose context degradation issues:

START: Agent performance degraded
│
├── Check context length
│   ├── <50K tokens → Likely NOT context degradation
│   │   └── Check: prompt quality, task complexity, model fit
│   └── >50K tokens → Continue diagnosis
│
├── Identify symptom pattern
│   ├── Middle content ignored
│   │   └── DIAGNOSIS: Lost-in-Middle
│   │       └── FIX: Move critical info to start/end
│   ├── Persistent errors despite correction
│   │   └── DIAGNOSIS: Context Poisoning
│   │       └── FIX: Truncate to before poisoning, restart
│   ├── Irrelevant info affecting output
│   │   └── DIAGNOSIS: Context Distraction
│   │       └── FIX: Filter relevance, namespace sections
│   ├── Wrong task context applied
│   │   └── DIAGNOSIS: Context Confusion
│   │       └── FIX: Explicit task segmentation, clear transitions
│   └── Contradictory outputs
│       └── DIAGNOSIS: Context Clash
│           └── FIX: Priority rules, version filtering
│
└── Select mitigation strategy
    ├── WRITE: Save context externally, keep window lean
    ├── SELECT: Pull only relevant context per task
    ├── COMPRESS: Summarize while preserving key info
    └── ISOLATE: Split across sub-agents

Quick Diagnostic Questions:

Question | Yes → Diagnosis
Is critical info in the middle 50% of context? | Lost-in-Middle
Did a hallucination appear and persist? | Context Poisoning
Is >30% of context irrelevant to the current task? | Context Distraction
Is the agent mixing requirements from different tasks? | Context Confusion
Are there contradictory facts in context? | Context Clash
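
These questions map directly to a lookup; a minimal sketch (question order matches the table above):

```python
DIAGNOSTICS = [
    ("critical info in middle 50% of context", "Lost-in-Middle"),
    ("hallucination appeared and persisted", "Context Poisoning"),
    (">30% of context irrelevant to current task", "Context Distraction"),
    ("agent mixing requirements from different tasks", "Context Confusion"),
    ("contradictory facts in context", "Context Clash"),
]

def diagnose(answers):
    """Given one yes/no answer per question, return the diagnosis
    for every question answered 'yes'."""
    return [name for (_, name), yes in zip(DIAGNOSTICS, answers) if yes]
```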

Skills

  • context-fundamentals - Context basics (prerequisite)
  • context-optimization - Mitigation techniques
  • context-compression - Compression strategies

Agents

  • context-health-analyst - Real-time degradation detection
  • compression-evaluator - Compression quality evaluation

Scripts

  • external/Agent-Skills-for-Context-Engineering/skills/context-degradation/scripts/degradation_detector.py - Context health analysis

Success Output

When successful, this skill MUST output:

✅ SKILL COMPLETE: context-degradation

Completed:
- [x] Degradation pattern identified (lost-in-middle detected)
- [x] Mitigation strategy applied (critical info moved to edges)
- [x] Context compaction performed (30% reduction)
- [x] Performance validation confirms improvement

Outputs:
- reports/context-health-analysis.md (Degradation assessment)
- logs/context-before-after.json (Token usage comparison)

Context Health: GOOD (68K tokens, below degradation threshold)

Completion Checklist

Before marking this skill as complete, verify:

  • Context length monitored across session
  • Degradation pattern correctly identified
  • Mitigation strategy matches pattern type
  • Critical information repositioned to attention-favored zones
  • Context compaction reduces tokens by >20%
  • Performance metrics show improvement post-mitigation
  • No context poisoning introduced during compaction
  • Lost-in-middle phenomenon mitigated
  • Four-bucket strategy documented (Write/Select/Compress/Isolate)
  • Model-specific thresholds respected

Failure Indicators

This skill has FAILED if:

  • ❌ Context continues growing past degradation threshold
  • ❌ Mitigation strategy worsens performance
  • ❌ Critical information lost during compaction
  • ❌ Context poisoning introduced by summarization errors
  • ❌ Performance degradation persists after mitigation
  • ❌ Wrong pattern identified (confusion vs distraction)
  • ❌ Context clash unresolved after filtering
  • ❌ No measurement of before/after performance

When NOT to Use

Do NOT use this skill when:

  • Short conversations with <10K tokens (use context-fundamentals instead)
  • Single-turn interactions without history
  • Context never approaches model limits
  • Task requires full conversation history (no compaction is safe)
  • Building systems with guaranteed <32K context
  • Working with models that handle long context perfectly (if such exist)
  • Debugging non-context-related failures (use other diagnostics)

Anti-Patterns (Avoid)

Anti-Pattern | Problem | Solution
Compacting too early | Unnecessary information loss | Wait until approaching threshold
Summarization without validation | Context poisoning via hallucinations | Validate summaries against originals
Ignoring lost-in-middle | Middle content unreliable | Place critical info at edges
No performance tracking | Can't measure mitigation success | Log metrics before/after
Uniform compaction | Loses context hierarchy | Preserve recent/critical content
Mixing conflicting contexts | Context clash | Segment or prioritize sources
Over-compaction | Loses necessary details | Compress to 70-80%, not 10%
No model-specific tuning | Wrong thresholds | Use model-specific degradation points

Principles

This skill embodies:

  • #4 Monitor and Measure - Tracks context length and performance continuously
  • #5 Eliminate Ambiguity - Clear pattern identification and mitigation mapping
  • #6 Clear, Understandable, Explainable - Documents degradation causes and solutions
  • #8 No Assumptions - Validates mitigation effectiveness with metrics
  • #10 Test Everything - Probes context recall before/after to confirm improvement

Full Standard: CODITECT-STANDARD-AUTOMATION.md