# Adaptive Retry Parameters

Auto Trigger Configuration: see skills/auto trigger framework/SKILL.md

Expert skill for intelligent retry handling that adjusts LLM parameters based on error context. Provides graceful degradation when context limits are exceeded.

## How to Use This Skill

- Review the patterns and examples below
- Apply the relevant patterns to your implementation
- Follow the best practices outlined in this skill
## When to Use
Use this skill when:
- LLM API calls fail with context-exceeded errors
- Need graceful degradation instead of hard failure
- Implementing retry logic for LLM operations
- Building resilient agent orchestration
- Handling rate limits or quota errors
- Need consistent retry behavior across agents
Don't use this skill when:
- Error is not retryable (validation, auth failures)
- Simple HTTP retry is sufficient (network errors)
- User explicitly requested specific parameters
- Testing requires exact parameter control
## Core Algorithm

### Adaptive Request Parameters
```python
from dataclasses import dataclass
from typing import Optional, Dict, Any
from enum import Enum


class ErrorType(Enum):
    """Categorized error types for retry handling"""
    CONTEXT_EXCEEDED = "context_exceeded"
    RATE_LIMITED = "rate_limited"
    TIMEOUT = "timeout"
    SERVER_ERROR = "server_error"
    UNKNOWN = "unknown"


@dataclass
class AdaptiveRequestConfig:
    """Configuration for adaptive retry parameters"""
    base_max_tokens: int = 20000
    retry_reduction: float = 0.25       # Reduce by 25% per retry
    min_max_tokens: int = 4000          # Never go below this
    base_temperature: float = 0.7
    temperature_reduction: float = 0.1  # Reduce by 0.1 per retry
    min_temperature: float = 0.1
    max_retries: int = 3


class AdaptiveRequestParameters:
    """
    Adjusts LLM request parameters based on retry context.

    Key innovation: on context-exceeded errors, reduce max_tokens to request
    less output, giving more room for input context.
    """

    def __init__(self, config: Optional[AdaptiveRequestConfig] = None):
        self.config = config or AdaptiveRequestConfig()

    def classify_error(self, error: Exception) -> ErrorType:
        """Classify error to determine retry strategy"""
        error_str = str(error).lower()
        if any(term in error_str for term in [
            "context", "token limit", "maximum context", "too long"
        ]):
            return ErrorType.CONTEXT_EXCEEDED
        if any(term in error_str for term in [
            "rate limit", "rate_limit", "too many requests", "429"
        ]):
            return ErrorType.RATE_LIMITED
        if any(term in error_str for term in [
            "timeout", "timed out", "deadline exceeded"
        ]):
            return ErrorType.TIMEOUT
        if any(term in error_str for term in [
            "500", "502", "503", "504", "server error", "internal error"
        ]):
            return ErrorType.SERVER_ERROR
        return ErrorType.UNKNOWN

    def get_params(
        self,
        retry_count: int,
        error_type: Optional[ErrorType] = None,
        original_params: Optional[Dict[str, Any]] = None
    ) -> Dict[str, Any]:
        """
        Calculate optimal parameters for a retry attempt.

        Args:
            retry_count: Number of previous retry attempts (0 = first try)
            error_type: Type of error that triggered the retry
            original_params: Original request parameters

        Returns:
            Dictionary of adjusted parameters
        """
        original_params = original_params or {}

        # Base parameters
        max_tokens = original_params.get("max_tokens", self.config.base_max_tokens)
        temperature = original_params.get("temperature", self.config.base_temperature)

        # No adjustment on first try
        if retry_count == 0:
            return {
                "max_tokens": max_tokens,
                "temperature": temperature,
                "retry_count": 0,
            }

        # Adjust based on error type
        if error_type == ErrorType.CONTEXT_EXCEEDED:
            # Reduce max_tokens aggressively to leave room for input
            reduction = self.config.retry_reduction * retry_count
            max_tokens = max(
                self.config.min_max_tokens,
                int(max_tokens * (1 - reduction))
            )
            # Lower temperature for more focused output
            temperature = max(
                self.config.min_temperature,
                temperature - (self.config.temperature_reduction * retry_count)
            )
        elif error_type == ErrorType.RATE_LIMITED:
            # Don't change params, just add backoff (handled separately)
            pass
        elif error_type == ErrorType.TIMEOUT:
            # Reduce max_tokens slightly to speed up response
            max_tokens = max(
                self.config.min_max_tokens,
                int(max_tokens * 0.8)
            )
        elif error_type == ErrorType.SERVER_ERROR:
            # Slight temperature reduction for stability
            temperature = max(
                self.config.min_temperature,
                temperature - 0.1
            )

        return {
            "max_tokens": max_tokens,
            "temperature": round(temperature, 2),
            "retry_count": retry_count,
            "error_type": error_type.value if error_type else None,
        }

    def should_retry(self, error_type: ErrorType, retry_count: int) -> bool:
        """Determine if the error is retryable"""
        if retry_count >= self.config.max_retries:
            return False
        # Most transient errors are retryable with adjusted params
        retryable_errors = {
            ErrorType.CONTEXT_EXCEEDED,
            ErrorType.RATE_LIMITED,
            ErrorType.TIMEOUT,
            ErrorType.SERVER_ERROR,
        }
        return error_type in retryable_errors
```
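The context-exceeded reduction is linear in `retry_count` with a hard floor, so the full degradation schedule is easy to verify in isolation. A standalone sketch of that arithmetic under the default config (`base_max_tokens=20000`, `retry_reduction=0.25`, `min_max_tokens=4000`):

```python
def max_tokens_schedule(base: int, reduction: float, floor: int, max_retries: int) -> list:
    """Mirror of the context-exceeded branch: linear reduction with a hard floor."""
    return [max(floor, int(base * (1 - reduction * n))) for n in range(max_retries + 1)]

print(max_tokens_schedule(20000, 0.25, 4000, 3))
# [20000, 15000, 10000, 5000]
```

With more retries the floor takes over: the schedule never drops below `min_max_tokens`, which is what prevents the quality collapse warned about in the anti-patterns section.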
### Backoff Calculation
```python
import asyncio
import random
from typing import Optional


class AdaptiveBackoff:
    """Calculate backoff delays for retries"""

    def __init__(
        self,
        base_delay: float = 1.0,
        max_delay: float = 60.0,
        exponential_base: float = 2.0,
        jitter: bool = True
    ):
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.exponential_base = exponential_base
        self.jitter = jitter

    def get_delay(
        self,
        retry_count: int,
        error_type: Optional[ErrorType] = None
    ) -> float:
        """
        Calculate delay before the next retry.

        Args:
            retry_count: Current retry attempt number
            error_type: Type of error for specialized handling

        Returns:
            Delay in seconds
        """
        # Base exponential backoff
        delay = self.base_delay * (self.exponential_base ** retry_count)

        # Adjust for error type
        if error_type == ErrorType.RATE_LIMITED:
            # Longer delay for rate limits
            delay *= 2
        elif error_type == ErrorType.TIMEOUT:
            # Shorter delay, problem may be transient
            delay *= 0.5

        # Cap at max
        delay = min(delay, self.max_delay)

        # Add jitter to prevent thundering herd
        if self.jitter:
            delay = delay * (0.5 + random.random())

        return delay

    async def wait(self, retry_count: int, error_type: Optional[ErrorType] = None):
        """Wait for the calculated backoff delay"""
        delay = self.get_delay(retry_count, error_type)
        await asyncio.sleep(delay)
```
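With jitter disabled, the schedule is pure exponential growth capped at `max_delay`. A quick standalone check of that formula (mirroring `get_delay` without the error-type multipliers):

```python
def backoff_delay(retry_count: int, base_delay: float = 1.0,
                  exponential_base: float = 2.0, max_delay: float = 60.0) -> float:
    """Mirror of AdaptiveBackoff.get_delay with jitter disabled."""
    return min(base_delay * (exponential_base ** retry_count), max_delay)

print([backoff_delay(n) for n in range(8)])
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

With jitter enabled, each delay is scaled by a random factor in [0.5, 1.5), so concurrent clients recovering from the same outage spread their retries out instead of hammering the service in lockstep.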
### Integration with LLM Wrapper
```python
import asyncio
import logging
from typing import Dict, Optional

logger = logging.getLogger(__name__)


class AdaptiveLLMWrapper:
    """
    LLM wrapper with adaptive retry handling.

    Automatically adjusts parameters on failure and retries with
    optimal settings.
    """

    def __init__(
        self,
        llm_client,
        config: Optional[AdaptiveRequestConfig] = None
    ):
        self.llm_client = llm_client
        self.params = AdaptiveRequestParameters(config)
        self.backoff = AdaptiveBackoff()
        self.metrics = RetryMetrics()

    async def call(
        self,
        prompt: str,
        max_tokens: Optional[int] = None,
        temperature: Optional[float] = None,
        **kwargs
    ) -> str:
        """
        Make an LLM call with adaptive retry.

        Automatically adjusts parameters on failure and retries
        with optimal settings.
        """
        # Drop unset values so get_params falls back to config defaults
        original_params = {
            key: value
            for key, value in {"max_tokens": max_tokens, "temperature": temperature}.items()
            if value is not None
        }
        last_error = None

        for retry_count in range(self.params.config.max_retries + 1):
            # Get parameters for this attempt
            error_type = self.params.classify_error(last_error) if last_error else None
            params = self.params.get_params(retry_count, error_type, original_params)

            try:
                logger.info(
                    f"LLM call attempt {retry_count + 1}, "
                    f"max_tokens={params['max_tokens']}, "
                    f"temperature={params['temperature']}"
                )
                result = await self.llm_client.complete(
                    prompt=prompt,
                    max_tokens=params["max_tokens"],
                    temperature=params["temperature"],
                    **kwargs
                )
                # Record success
                self.metrics.record_success(retry_count)
                return result

            except Exception as e:
                last_error = e
                error_type = self.params.classify_error(e)
                logger.warning(
                    f"LLM call failed (attempt {retry_count + 1}): "
                    f"{error_type.value} - {str(e)[:100]}"
                )
                # Record failure
                self.metrics.record_failure(error_type, retry_count)

                # Check if we should retry (retry_count == retries used so far)
                if not self.params.should_retry(error_type, retry_count):
                    break

                # Wait before retry
                await self.backoff.wait(retry_count, error_type)

        # All retries exhausted
        self.metrics.record_exhausted()
        raise last_error


class RetryMetrics:
    """Track retry statistics for observability"""

    def __init__(self):
        self.total_calls = 0
        self.successful_calls = 0
        self.retried_calls = 0
        self.failed_calls = 0
        self.errors_by_type: Dict[str, int] = {}

    def record_success(self, retry_count: int):
        self.total_calls += 1
        self.successful_calls += 1
        if retry_count > 0:
            self.retried_calls += 1

    def record_failure(self, error_type: ErrorType, retry_count: int):
        self.errors_by_type[error_type.value] = (
            self.errors_by_type.get(error_type.value, 0) + 1
        )

    def record_exhausted(self):
        self.total_calls += 1
        self.failed_calls += 1

    def get_stats(self) -> dict:
        return {
            "total_calls": self.total_calls,
            "success_rate": (
                self.successful_calls / self.total_calls
                if self.total_calls > 0 else 0
            ),
            "retry_rate": (
                self.retried_calls / self.total_calls
                if self.total_calls > 0 else 0
            ),
            "errors_by_type": self.errors_by_type,
        }
```
## Usage Examples

### Basic Parameter Adjustment
```python
# Create parameter manager
params = AdaptiveRequestParameters()

# First attempt - use original params
p1 = params.get_params(retry_count=0)
# Result: {"max_tokens": 20000, "temperature": 0.7, "retry_count": 0}

# After a context-exceeded error
p2 = params.get_params(
    retry_count=1,
    error_type=ErrorType.CONTEXT_EXCEEDED
)
# Result: {"max_tokens": 15000, "temperature": 0.6, "retry_count": 1,
#          "error_type": "context_exceeded"}

# Second retry
p3 = params.get_params(
    retry_count=2,
    error_type=ErrorType.CONTEXT_EXCEEDED
)
# Result: {"max_tokens": 10000, "temperature": 0.5, "retry_count": 2,
#          "error_type": "context_exceeded"}
```
### With LLM Wrapper
```python
# Create adaptive wrapper
wrapper = AdaptiveLLMWrapper(
    llm_client=anthropic_client,
    config=AdaptiveRequestConfig(
        base_max_tokens=16000,
        retry_reduction=0.3,
        max_retries=3
    )
)

# Make call - automatically handles retries
try:
    response = await wrapper.call(
        prompt="Implement the authentication handler...",
        max_tokens=8000,
        temperature=0.5
    )
except Exception as e:
    logger.error(f"All retries failed: {e}")
    # Handle final failure
```
### Custom Configuration
```python
# Conservative config (for critical operations)
conservative = AdaptiveRequestConfig(
    base_max_tokens=10000,
    retry_reduction=0.2,   # Smaller reduction per retry
    min_max_tokens=4000,
    max_retries=5,         # More retry attempts
)

# Aggressive config (for speed)
aggressive = AdaptiveRequestConfig(
    base_max_tokens=20000,
    retry_reduction=0.4,   # Larger reduction
    min_max_tokens=2000,
    max_retries=2,         # Fewer retries
)
```
## Best Practices

### DO
- Classify errors properly - Different errors need different strategies
- Reduce max_tokens on context errors - Leave room for input
- Use exponential backoff - Prevent overwhelming the service
- Add jitter - Prevent thundering herd on recovery
- Track metrics - Monitor retry rates and patterns
- Log retry attempts - For debugging and observability
- Set minimum thresholds - Don't reduce parameters too far
### DON'T
- Don't retry validation errors - They won't succeed with different params
- Don't ignore error types - Each needs specific handling
- Don't retry forever - Set max retry limits
- Don't forget backoff - Rapid retries cause rate limits
- Don't hide failures - Surface errors after max retries
- Don't ignore original params - Use them as baseline
## Configuration Reference

| Parameter | Default | Description |
|---|---|---|
| `base_max_tokens` | 20000 | Starting max_tokens value |
| `retry_reduction` | 0.25 | Reduction per retry (25%) |
| `min_max_tokens` | 4000 | Never reduce below this |
| `base_temperature` | 0.7 | Starting temperature |
| `temperature_reduction` | 0.1 | Temperature reduction per retry |
| `min_temperature` | 0.1 | Never reduce below this |
| `max_retries` | 3 | Maximum retry attempts |
| `base_delay` | 1.0s | Initial backoff delay |
| `max_delay` | 60.0s | Maximum backoff delay |
## Error Type Handling

| Error Type | Parameter Adjustment | Backoff Multiplier |
|---|---|---|
| `CONTEXT_EXCEEDED` | Reduce max_tokens 25% per retry, temperature 0.1 per retry | 1x |
| `RATE_LIMITED` | No change | 2x |
| `TIMEOUT` | Reduce max_tokens 20% | 0.5x |
| `SERVER_ERROR` | Reduce temperature 0.1 | 1x |
| `UNKNOWN` | No change | 1x |
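The classification behind this table is plain substring matching on the error message. A self-contained sketch (mirroring `classify_error`, with plain strings standing in for the `ErrorType` enum):

```python
def classify(error_message: str) -> str:
    """Standalone mirror of classify_error; order matters (first match wins)."""
    msg = error_message.lower()
    if any(t in msg for t in ("context", "token limit", "maximum context", "too long")):
        return "context_exceeded"
    if any(t in msg for t in ("rate limit", "rate_limit", "too many requests", "429")):
        return "rate_limited"
    if any(t in msg for t in ("timeout", "timed out", "deadline exceeded")):
        return "timeout"
    if any(t in msg for t in ("500", "502", "503", "504", "server error", "internal error")):
        return "server_error"
    return "unknown"

print(classify("Error 429: Too Many Requests"))         # rate_limited
print(classify("Prompt is too long for maximum context"))  # context_exceeded
print(classify("Invalid API key"))                      # unknown - not retryable
```

Because matching is string-based, provider-specific exception types that do not mention these terms fall through to `UNKNOWN`; extend the term lists for your client library if needed.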
## Integration with CODITECT
Recommended integration points:
| Component | Configuration | Notes |
|---|---|---|
| Orchestrator | Conservative | Critical coordination |
| Implementation agents | Default | Balance speed/reliability |
| Research agents | Aggressive | Can accept shorter responses |
| Code generation | Conservative | Quality matters |
## Success Metrics
| Metric | Target | Measurement |
|---|---|---|
| Context-exceeded recoveries | 70%+ | Successful after param reduction |
| False exhaustion | <5% | Gave up when retry would succeed |
| Average retry count | <1.5 | Most calls succeed first try |
| Response quality after retry | >90% | Quality maintained with reduced params |
## Quick Scenario Configuration
Copy-paste configurations for common retry scenarios:
| Scenario | Config | Rationale |
|---|---|---|
| LLM Code Generation | max_retries=3, retry_reduction=0.25, min_max_tokens=4000 | Quality matters, allow 3 attempts |
| LLM Summarization | max_retries=2, retry_reduction=0.4, min_max_tokens=2000 | Shorter output acceptable |
| Critical Production | max_retries=5, retry_reduction=0.15, min_max_tokens=8000 | High reliability, slow reduction |
| Cost-Sensitive Batch | max_retries=1, retry_reduction=0.5, min_max_tokens=1000 | Fail fast, minimize token usage |
| Interactive Chat | max_retries=2, retry_reduction=0.3, min_max_tokens=3000 | Balance responsiveness and quality |
Decision Tree:

```
What's the priority?
├── Quality → Conservative: max_retries=5, retry_reduction=0.15
├── Speed → Aggressive: max_retries=2, retry_reduction=0.4
├── Cost → Minimal: max_retries=1, retry_reduction=0.5
└── Balance → Default: max_retries=3, retry_reduction=0.25
```
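In code, the decision tree collapses to a small lookup. A sketch (the function name and priority keys are illustrative; the values come from the tree above):

```python
def pick_config(priority: str) -> dict:
    """Map a priority from the decision tree to retry settings."""
    table = {
        "quality": {"max_retries": 5, "retry_reduction": 0.15},
        "speed":   {"max_retries": 2, "retry_reduction": 0.4},
        "cost":    {"max_retries": 1, "retry_reduction": 0.5},
        "balance": {"max_retries": 3, "retry_reduction": 0.25},
    }
    return table[priority]

print(pick_config("cost"))  # {'max_retries': 1, 'retry_reduction': 0.5}
```

The resulting dict can be splatted into the config, e.g. `AdaptiveRequestConfig(**pick_config("balance"))`.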
## Source Reference
This pattern was extracted from DeepCode (HKUDS/DeepCode) multi-agent system.
Original location: workflows/agent_orchestration_engine.py (Adaptive Request Parameters section)
Original codebase stats:
- 51 Python files analyzed
- 33,497 lines of code
- 12 patterns extracted
See /submodules/labs/DeepCode/DEEP-ANALYSIS.md for complete analysis.
## Multi-Context Window Support

### State Tracking

Retry State (JSON):
```json
{
  "checkpoint_id": "ckpt_retry_20251214",
  "retry_metrics": {
    "total_calls": 150,
    "successful_calls": 145,
    "retried_calls": 23,
    "failed_calls": 5,
    "errors_by_type": {
      "context_exceeded": 18,
      "rate_limited": 3,
      "timeout": 2
    }
  },
  "config": {
    "base_max_tokens": 20000,
    "retry_reduction": 0.25,
    "max_retries": 3
  }
}
```
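The rates reported in the progress notes below derive directly from these counters. A quick check, assuming retried-but-eventually-failed calls equal `failed_calls` (an assumption of this sketch, not something the checkpoint schema guarantees):

```python
metrics = {
    "total_calls": 150,
    "successful_calls": 145,
    "retried_calls": 23,
    "failed_calls": 5,
}

success_rate = metrics["successful_calls"] / metrics["total_calls"]
retry_rate = metrics["retried_calls"] / metrics["total_calls"]
# Recovery rate: retried calls that eventually succeeded / all retried calls
recovery_rate = (metrics["retried_calls"] - metrics["failed_calls"]) / metrics["retried_calls"]

print(f"{success_rate:.1%} {retry_rate:.1%} {recovery_rate:.1%}")
# 96.7% 15.3% 78.3%
```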
Progress Notes (Markdown):

```markdown
# Adaptive Retry Progress - 2025-12-14

## Metrics Summary
- Success rate: 96.7%
- Retry rate: 15.3%
- Recovery rate (of retried): 78.3%

## Observations
- Context exceeded most common error (78%)
- Rate limits rare after adding backoff
- Average 1.2 retries when needed

## Configuration Adjustments
- Increased retry_reduction from 0.2 to 0.25
- Added 2x multiplier for rate limit backoff
```
### Session Recovery

When starting a fresh context window:

- Load metrics: Read `.coditect/checkpoints/adaptive-retry-latest.json`
- Review configuration: Check whether adjustments are needed based on metrics
- Continue monitoring: Resume normal retry handling

Token Savings: ~30% reduction by maintaining the optimized configuration
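A minimal sketch of the load step, assuming the checkpoint path above (falling back to the default config when no checkpoint exists yet):

```python
import json
from pathlib import Path

# Checkpoint path from the recovery steps above
CHECKPOINT = Path(".coditect/checkpoints/adaptive-retry-latest.json")

def load_retry_state(path: Path = CHECKPOINT) -> dict:
    """Load prior retry metrics and config, or defaults when absent."""
    if path.exists():
        return json.loads(path.read_text())
    return {
        "retry_metrics": {},
        "config": {"base_max_tokens": 20000, "retry_reduction": 0.25, "max_retries": 3},
    }

state = load_retry_state()
# The restored config can then seed AdaptiveRequestConfig(**state["config"])
```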
## Success Output

When successful, this skill MUST output:

```
✅ SKILL COMPLETE: adaptive-retry

Completed:
- [x] AdaptiveRequestParameters class implemented with error classification
- [x] Exponential backoff with jitter configured
- [x] LLM wrapper with automatic retry logic deployed
- [x] Retry metrics tracking operational
- [x] Context-exceeded errors recover via max_tokens reduction
- [x] Rate limit handling with extended backoff functional

Outputs:
- src/retry/adaptive_request_parameters.py (Parameter adjustment logic)
- src/retry/adaptive_backoff.py (Backoff calculation with jitter)
- src/retry/adaptive_llm_wrapper.py (LLM client wrapper with retry)
- src/retry/retry_metrics.py (Success/failure tracking)
- config/retry_config.json (Retry configuration settings)

Performance Metrics:
- Context-exceeded recovery rate: 78% (target: >70%)
- Average retry count: 1.2 retries (target: <1.5)
- False exhaustion rate: 3% (target: <5%)
- Response quality after retry: 92% (target: >90%)
```
## Completion Checklist
Before marking this skill as complete, verify:
- AdaptiveRequestParameters correctly classifies error types (context, rate_limited, timeout, server_error)
- max_tokens reduces by configured percentage on context-exceeded errors
- Temperature adjusts downward on retries for more focused output
- Minimum thresholds prevent excessive parameter reduction
- Exponential backoff calculates delay correctly (base_delay * 2^retry_count)
- Jitter adds randomization to prevent thundering herd
- Error-specific backoff multipliers applied (2x for rate_limited, 0.5x for timeout)
- should_retry() correctly identifies retryable vs non-retryable errors
- RetryMetrics tracks total calls, successes, retries, and failures
- LLM wrapper exhausts all retries before raising final error
- Configuration allows customization of reduction percentages and max retries
- All outputs exist at expected locations and pass validation
## Failure Indicators
This skill has FAILED if:
- ❌ Context-exceeded errors not classified correctly
- ❌ max_tokens not reduced on retry attempts
- ❌ Parameters reduced below minimum thresholds
- ❌ Backoff delay not increasing exponentially
- ❌ Jitter not applied (multiple failures cluster at same time)
- ❌ Non-retryable errors (auth, validation) attempted for retry
- ❌ Retry count exceeds max_retries configuration
- ❌ Metrics not recording retry attempts
- ❌ Original parameters not used as baseline
- ❌ Wrapper allows infinite retry loops
- ❌ Rate limit errors retried immediately without backoff
## When NOT to Use
Do NOT use this skill when:
- Single LLM call with no retry requirement
- Error is definitively non-retryable (invalid API key, malformed request)
- User explicitly requested specific LLM parameters (must honor exact values)
- Testing requires deterministic behavior (retries introduce variability)
- Simple HTTP retry middleware sufficient (network errors only)
- Cost optimization prioritizes minimal API calls over success rate
- Real-time applications cannot tolerate retry delays
- Circuit breaker already open (retry will fail anyway)
Alternative approaches:
- Non-retryable errors: Fail fast and surface error to user
- Deterministic testing: Use mock LLM client with controlled responses
- Network errors only: Use standard HTTP retry library (e.g., requests.adapters.Retry)
- Cost-sensitive: Set max_retries=0 or use cheaper model
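For the deterministic-testing alternative, a scripted mock client removes retry variability entirely. A minimal sketch (class and helper names are illustrative; `complete` mirrors the signature `AdaptiveLLMWrapper` expects from its client):

```python
import asyncio

class MockLLMClient:
    """Scripted client: fails a fixed number of times, then succeeds."""
    def __init__(self, failures: int, response: str = "ok"):
        self.failures = failures
        self.response = response
        self.calls = 0

    async def complete(self, prompt: str, max_tokens: int, temperature: float, **kwargs) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise RuntimeError("429 too many requests")  # classifies as rate_limited
        return self.response

async def demo() -> str:
    client = MockLLMClient(failures=2)
    for _ in range(3):  # deterministic retry loop, no backoff
        try:
            return await client.complete("hi", max_tokens=100, temperature=0.0)
        except RuntimeError:
            continue
    return "exhausted"

print(asyncio.run(demo()))  # ok
```

Because the failure count is scripted, tests can assert exact call counts and outcomes without jitter or parameter drift.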
## Anti-Patterns (Avoid)
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Retrying validation errors | Validation won't succeed with different params | Classify errors correctly; only retry transient failures |
| Ignoring error type | All errors retried the same way | Use error classification to apply appropriate strategy |
| No backoff between retries | Rapid retries trigger rate limits | Implement exponential backoff with jitter |
| Reducing parameters too aggressively | Output quality degrades unacceptably | Set reasonable min_max_tokens and retry_reduction values |
| Infinite retries | Wastes tokens and time on doomed requests | Enforce max_retries limit |
| Hiding final failures | User never sees underlying error | Raise last error after max retries exhausted |
| Not tracking metrics | No visibility into retry patterns | Log all retry attempts with RetryMetrics |
| Overriding user parameters | User asked for specific temperature | Only adjust on retries, respect original on first attempt |
| Retrying without backoff | Overwhelms service during recovery | Always wait before retry (even 1 second minimum) |
## Principles
This skill embodies:
- #2 Resilience First - Graceful degradation via parameter adjustment prevents hard failures
- #3 Fail Gracefully - Reduce output quality incrementally rather than failing outright
- #5 Eliminate Ambiguity - Clear error classification determines retry strategy
- #6 Clear, Understandable, Explainable - Explicit parameter adjustment logic; no magic numbers
- #7 Optimize for Context - Context-exceeded errors handled specifically by reducing max_tokens
- #8 No Assumptions - Classify every error type; don't assume all errors are retryable
- #10 Automation First - Retry logic automated; no manual parameter tuning required
- #11 Observability - Metrics track retry patterns for continuous improvement
Full Standard: CODITECT-STANDARD-AUTOMATION.md