Auto Trigger Configuration (see skills/auto trigger framework/SKILL.md)

Adaptive Retry Parameters

How to Use This Skill

  1. Review the patterns and examples below
  2. Apply the relevant patterns to your implementation
  3. Follow the best practices outlined in this skill

Expert skill for intelligent retry handling that adjusts LLM parameters based on error context. Provides graceful degradation when context limits are exceeded.

When to Use

Use this skill when:

  • LLM API calls fail with context-exceeded errors
  • Need graceful degradation instead of hard failure
  • Implementing retry logic for LLM operations
  • Building resilient agent orchestration
  • Handling rate limits or quota errors
  • Need consistent retry behavior across agents

Don't use this skill when:

  • Error is not retryable (validation, auth failures)
  • Simple HTTP retry is sufficient (network errors)
  • User explicitly requested specific parameters
  • Testing requires exact parameter control

Core Algorithm

Adaptive Request Parameters

from dataclasses import dataclass
from typing import Optional, Dict, Any
from enum import Enum


class ErrorType(Enum):
    """Categorized error types for retry handling"""
    CONTEXT_EXCEEDED = "context_exceeded"
    RATE_LIMITED = "rate_limited"
    TIMEOUT = "timeout"
    SERVER_ERROR = "server_error"
    UNKNOWN = "unknown"


@dataclass
class AdaptiveRequestConfig:
    """Configuration for adaptive retry parameters"""
    base_max_tokens: int = 20000
    retry_reduction: float = 0.25       # Reduce by 25% per retry
    min_max_tokens: int = 4000          # Never go below this
    base_temperature: float = 0.7
    temperature_reduction: float = 0.1  # Reduce by 0.1 per retry
    min_temperature: float = 0.1
    max_retries: int = 3


class AdaptiveRequestParameters:
    """
    Adjusts LLM request parameters based on retry context.

    Key innovation: On context-exceeded errors, reduce max_tokens to request
    less output, giving more room for input context.
    """

    def __init__(self, config: Optional[AdaptiveRequestConfig] = None):
        self.config = config or AdaptiveRequestConfig()

    def classify_error(self, error: Exception) -> ErrorType:
        """Classify error to determine retry strategy"""
        error_str = str(error).lower()

        if any(term in error_str for term in [
            "context", "token limit", "maximum context", "too long"
        ]):
            return ErrorType.CONTEXT_EXCEEDED

        if any(term in error_str for term in [
            "rate limit", "rate_limit", "too many requests", "429"
        ]):
            return ErrorType.RATE_LIMITED

        if any(term in error_str for term in [
            "timeout", "timed out", "deadline exceeded"
        ]):
            return ErrorType.TIMEOUT

        if any(term in error_str for term in [
            "500", "502", "503", "504", "server error", "internal error"
        ]):
            return ErrorType.SERVER_ERROR

        return ErrorType.UNKNOWN

    def get_params(
        self,
        retry_count: int,
        error_type: Optional[ErrorType] = None,
        original_params: Optional[Dict[str, Any]] = None
    ) -> Dict[str, Any]:
        """
        Calculate optimal parameters for retry attempt.

        Args:
            retry_count: Number of previous retry attempts (0 = first try)
            error_type: Type of error that triggered retry
            original_params: Original request parameters

        Returns:
            Dictionary of adjusted parameters
        """
        original_params = original_params or {}

        # Base parameters (explicit None checks so callers may pass None through)
        max_tokens = original_params.get("max_tokens")
        if max_tokens is None:
            max_tokens = self.config.base_max_tokens
        temperature = original_params.get("temperature")
        if temperature is None:
            temperature = self.config.base_temperature

        # No adjustment on first try
        if retry_count == 0:
            return {
                "max_tokens": max_tokens,
                "temperature": temperature,
                "retry_count": 0,
            }

        # Adjust based on error type
        if error_type == ErrorType.CONTEXT_EXCEEDED:
            # Reduce max_tokens aggressively to leave room for input
            reduction = self.config.retry_reduction * retry_count
            max_tokens = max(
                self.config.min_max_tokens,
                int(max_tokens * (1 - reduction))
            )

            # Lower temperature for more focused output
            temperature = max(
                self.config.min_temperature,
                temperature - (self.config.temperature_reduction * retry_count)
            )

        elif error_type == ErrorType.RATE_LIMITED:
            # Don't change params, just add backoff (handled separately)
            pass

        elif error_type == ErrorType.TIMEOUT:
            # Reduce max_tokens slightly to speed up response
            max_tokens = max(
                self.config.min_max_tokens,
                int(max_tokens * 0.8)
            )

        elif error_type == ErrorType.SERVER_ERROR:
            # Slight temperature reduction for stability
            temperature = max(
                self.config.min_temperature,
                temperature - 0.1
            )

        return {
            "max_tokens": max_tokens,
            "temperature": round(temperature, 2),
            "retry_count": retry_count,
            "error_type": error_type.value if error_type else None,
        }

    def should_retry(self, error_type: ErrorType, retry_count: int) -> bool:
        """Determine if error is retryable"""
        if retry_count >= self.config.max_retries:
            return False

        # Most errors are retryable with adjusted params
        retryable_errors = {
            ErrorType.CONTEXT_EXCEEDED,
            ErrorType.RATE_LIMITED,
            ErrorType.TIMEOUT,
            ErrorType.SERVER_ERROR,
        }

        return error_type in retryable_errors
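
Because classification is a plain substring match, it can be sanity-checked in isolation. The sketch below copies that matching logic into a standalone function (enum values inlined as strings so the snippet runs on its own):

```python
# Standalone check of the substring-based classification above
# (logic copied from classify_error; ErrorType values inlined).
def classify(error: Exception) -> str:
    s = str(error).lower()
    if any(t in s for t in ("context", "token limit", "maximum context", "too long")):
        return "context_exceeded"
    if any(t in s for t in ("rate limit", "rate_limit", "too many requests", "429")):
        return "rate_limited"
    if any(t in s for t in ("timeout", "timed out", "deadline exceeded")):
        return "timeout"
    if any(t in s for t in ("500", "502", "503", "504", "server error", "internal error")):
        return "server_error"
    return "unknown"

print(classify(Exception("maximum context length exceeded")))  # context_exceeded
print(classify(Exception("429 Too Many Requests")))            # rate_limited
print(classify(ValueError("invalid API key")))                 # unknown
```

Note that messages like "invalid API key" fall through to unknown, which should_retry treats as non-retryable, matching the guidance against retrying auth failures.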

Backoff Calculation

import asyncio
import random
from typing import Optional


class AdaptiveBackoff:
    """Calculate backoff delays for retries"""

    def __init__(
        self,
        base_delay: float = 1.0,
        max_delay: float = 60.0,
        exponential_base: float = 2.0,
        jitter: bool = True
    ):
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.exponential_base = exponential_base
        self.jitter = jitter

    def get_delay(
        self,
        retry_count: int,
        error_type: Optional[ErrorType] = None
    ) -> float:
        """
        Calculate delay before next retry.

        Args:
            retry_count: Current retry attempt number
            error_type: Type of error for specialized handling

        Returns:
            Delay in seconds
        """
        # Base exponential backoff
        delay = self.base_delay * (self.exponential_base ** retry_count)

        # Adjust for error type
        if error_type == ErrorType.RATE_LIMITED:
            # Longer delay for rate limits
            delay *= 2

        elif error_type == ErrorType.TIMEOUT:
            # Shorter delay, problem may be transient
            delay *= 0.5

        # Cap at max
        delay = min(delay, self.max_delay)

        # Add jitter to prevent thundering herd
        if self.jitter:
            delay = delay * (0.5 + random.random())

        return delay

    async def wait(self, retry_count: int, error_type: Optional[ErrorType] = None):
        """Wait for calculated backoff delay"""
        delay = self.get_delay(retry_count, error_type)
        await asyncio.sleep(delay)
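
With jitter disabled, the delay schedule is deterministic and easy to verify by hand. A minimal sketch of the same formula, assuming the defaults (base_delay=1.0, exponential_base=2.0, max_delay=60.0):

```python
# Reproduces the get_delay math without jitter: exponential growth,
# error-type multiplier, then cap at max_delay.
base_delay, exponential_base, max_delay = 1.0, 2.0, 60.0

def expected_delay(retry_count: int, multiplier: float = 1.0) -> float:
    return min(base_delay * (exponential_base ** retry_count) * multiplier, max_delay)

print([expected_delay(n) for n in range(4)])   # [1.0, 2.0, 4.0, 8.0]
print(expected_delay(2, multiplier=2.0))       # rate-limited: 8.0
print(expected_delay(2, multiplier=0.5))       # timeout: 2.0
print(expected_delay(10))                      # capped: 60.0
```

With jitter enabled, each delay is additionally scaled by a random factor in [0.5, 1.5), so actual waits vary around these values.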

Integration with LLM Wrapper

import asyncio
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger(__name__)


class AdaptiveLLMWrapper:
    """
    LLM wrapper with adaptive retry handling.

    Automatically adjusts parameters on failure and retries with
    optimal settings.
    """

    def __init__(
        self,
        llm_client,
        config: Optional[AdaptiveRequestConfig] = None
    ):
        self.llm_client = llm_client
        self.params = AdaptiveRequestParameters(config)
        self.backoff = AdaptiveBackoff()
        self.metrics = RetryMetrics()

    async def call(
        self,
        prompt: str,
        max_tokens: Optional[int] = None,
        temperature: Optional[float] = None,
        **kwargs
    ) -> str:
        """
        Make LLM call with adaptive retry.

        Automatically adjusts parameters on failure and retries
        with optimal settings.
        """
        original_params = {
            "max_tokens": max_tokens,
            "temperature": temperature,
        }

        last_error = None
        for retry_count in range(self.params.config.max_retries + 1):
            # Get parameters for this attempt
            error_type = self.params.classify_error(last_error) if last_error else None
            params = self.params.get_params(retry_count, error_type, original_params)

            try:
                logger.info(
                    f"LLM call attempt {retry_count + 1}, "
                    f"max_tokens={params['max_tokens']}, "
                    f"temperature={params['temperature']}"
                )

                result = await self.llm_client.complete(
                    prompt=prompt,
                    max_tokens=params["max_tokens"],
                    temperature=params["temperature"],
                    **kwargs
                )

                # Record success
                self.metrics.record_success(retry_count)
                return result

            except Exception as e:
                last_error = e
                error_type = self.params.classify_error(e)

                logger.warning(
                    f"LLM call failed (attempt {retry_count + 1}): "
                    f"{error_type.value} - {str(e)[:100]}"
                )

                # Record failure
                self.metrics.record_failure(error_type, retry_count)

                # Check if we should retry
                if not self.params.should_retry(error_type, retry_count + 1):
                    break

                # Wait before retry
                await self.backoff.wait(retry_count, error_type)

        # All retries exhausted - record and surface the last error
        self.metrics.record_exhausted()
        raise last_error


class RetryMetrics:
    """Track retry statistics for observability"""

    def __init__(self):
        self.total_calls = 0
        self.successful_calls = 0
        self.retried_calls = 0
        self.failed_calls = 0
        self.errors_by_type: Dict[str, int] = {}

    def record_success(self, retry_count: int):
        self.total_calls += 1
        self.successful_calls += 1
        if retry_count > 0:
            self.retried_calls += 1

    def record_failure(self, error_type: ErrorType, retry_count: int):
        self.errors_by_type[error_type.value] = (
            self.errors_by_type.get(error_type.value, 0) + 1
        )

    def record_exhausted(self):
        self.total_calls += 1
        self.failed_calls += 1

    def get_stats(self) -> dict:
        return {
            "total_calls": self.total_calls,
            "success_rate": (
                self.successful_calls / self.total_calls
                if self.total_calls > 0 else 0
            ),
            "retry_rate": (
                self.retried_calls / self.total_calls
                if self.total_calls > 0 else 0
            ),
            "errors_by_type": self.errors_by_type,
        }

Usage Examples

Basic Parameter Adjustment

# Create parameter manager
params = AdaptiveRequestParameters()

# First attempt - use original params
p1 = params.get_params(retry_count=0)
# Result: {"max_tokens": 20000, "temperature": 0.7, "retry_count": 0}

# After context-exceeded error
p2 = params.get_params(
    retry_count=1,
    error_type=ErrorType.CONTEXT_EXCEEDED
)
# Result: {"max_tokens": 15000, "temperature": 0.6, "retry_count": 1,
#          "error_type": "context_exceeded"}

# Second retry
p3 = params.get_params(
    retry_count=2,
    error_type=ErrorType.CONTEXT_EXCEEDED
)
# Result: {"max_tokens": 10000, "temperature": 0.5, "retry_count": 2,
#          "error_type": "context_exceeded"}

With LLM Wrapper

# Create adaptive wrapper
wrapper = AdaptiveLLMWrapper(
    llm_client=anthropic_client,
    config=AdaptiveRequestConfig(
        base_max_tokens=16000,
        retry_reduction=0.3,
        max_retries=3
    )
)

# Make call - automatically handles retries
try:
    response = await wrapper.call(
        prompt="Implement the authentication handler...",
        max_tokens=8000,
        temperature=0.5
    )
except Exception as e:
    logger.error(f"All retries failed: {e}")
    # Handle final failure

Custom Configuration

# Conservative config (for critical operations)
conservative = AdaptiveRequestConfig(
    base_max_tokens=10000,
    retry_reduction=0.2,   # Smaller reduction per retry
    min_max_tokens=4000,
    max_retries=5,         # More retry attempts
)

# Aggressive config (for speed)
aggressive = AdaptiveRequestConfig(
    base_max_tokens=20000,
    retry_reduction=0.4,   # Larger reduction
    min_max_tokens=2000,
    max_retries=2,         # Fewer retries
)

Best Practices

DO

  • Classify errors properly - Different errors need different strategies
  • Reduce max_tokens on context errors - Leave room for input
  • Use exponential backoff - Prevent overwhelming the service
  • Add jitter - Prevent thundering herd on recovery
  • Track metrics - Monitor retry rates and patterns
  • Log retry attempts - For debugging and observability
  • Set minimum thresholds - Don't reduce parameters too far

DON'T

  • Don't retry validation errors - They won't succeed with different params
  • Don't ignore error types - Each needs specific handling
  • Don't retry forever - Set max retry limits
  • Don't forget backoff - Rapid retries cause rate limits
  • Don't hide failures - Surface errors after max retries
  • Don't ignore original params - Use them as baseline

Configuration Reference

| Parameter | Default | Description |
|-----------|---------|-------------|
| `base_max_tokens` | 20000 | Starting max_tokens value |
| `retry_reduction` | 0.25 | Reduction per retry (25%) |
| `min_max_tokens` | 4000 | Never reduce below this |
| `base_temperature` | 0.7 | Starting temperature |
| `temperature_reduction` | 0.1 | Temperature reduction per retry |
| `min_temperature` | 0.1 | Never reduce below this |
| `max_retries` | 3 | Maximum retry attempts |
| `base_delay` | 1.0 s | Initial backoff delay |
| `max_delay` | 60.0 s | Maximum backoff delay |

Error Type Handling

| Error Type | Parameter Adjustment | Backoff Multiplier |
|------------|----------------------|--------------------|
| CONTEXT_EXCEEDED | Reduce max_tokens 25%, temperature 0.1 | 1x |
| RATE_LIMITED | No change | 2x |
| TIMEOUT | Reduce max_tokens 20% | 0.5x |
| SERVER_ERROR | Reduce temperature 0.1 | 1x |
| UNKNOWN | No change | 1x |

Integration with CODITECT

Recommended integration points:

| Component | Configuration | Notes |
|-----------|---------------|-------|
| Orchestrator | Conservative | Critical coordination |
| Implementation agents | Default | Balance speed/reliability |
| Research agents | Aggressive | Can accept shorter responses |
| Code generation | Conservative | Quality matters |

Success Metrics

| Metric | Target | Measurement |
|--------|--------|-------------|
| Context-exceeded recoveries | 70%+ | Successful after param reduction |
| False exhaustion | <5% | Gave up when retry would succeed |
| Average retry count | <1.5 | Most calls succeed first try |
| Response quality after retry | >90% | Quality maintained with reduced params |

Quick Scenario Configuration

Copy-paste configurations for common retry scenarios:

| Scenario | Config | Rationale |
|----------|--------|-----------|
| LLM Code Generation | max_retries=3, retry_reduction=0.25, min_max_tokens=4000 | Quality matters, allow 3 attempts |
| LLM Summarization | max_retries=2, retry_reduction=0.4, min_max_tokens=2000 | Shorter output acceptable |
| Critical Production | max_retries=5, retry_reduction=0.15, min_max_tokens=8000 | High reliability, slow reduction |
| Cost-Sensitive Batch | max_retries=1, retry_reduction=0.5, min_max_tokens=1000 | Fail fast, minimize token usage |
| Interactive Chat | max_retries=2, retry_reduction=0.3, min_max_tokens=3000 | Balance responsiveness and quality |

Decision Tree:

What's the priority?
├── Quality → Conservative: max_retries=5, retry_reduction=0.15
├── Speed → Aggressive: max_retries=2, retry_reduction=0.4
├── Cost → Minimal: max_retries=1, retry_reduction=0.5
└── Balance → Default: max_retries=3, retry_reduction=0.25
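
The decision tree above can be encoded as a plain lookup of keyword arguments for AdaptiveRequestConfig; the dictionary and function names here are illustrative, not part of the skill's API:

```python
# Maps a priority to AdaptiveRequestConfig keyword arguments.
PRIORITY_CONFIGS = {
    "quality": {"max_retries": 5, "retry_reduction": 0.15},
    "speed":   {"max_retries": 2, "retry_reduction": 0.4},
    "cost":    {"max_retries": 1, "retry_reduction": 0.5},
    "balance": {"max_retries": 3, "retry_reduction": 0.25},
}

def config_kwargs(priority: str) -> dict:
    # Unknown priorities fall back to the balanced defaults.
    return PRIORITY_CONFIGS.get(priority, PRIORITY_CONFIGS["balance"])

print(config_kwargs("quality"))  # {'max_retries': 5, 'retry_reduction': 0.15}
```

The result would then be applied as `AdaptiveRequestConfig(**config_kwargs("quality"))`.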

Source Reference

This pattern was extracted from the DeepCode (HKUDS/DeepCode) multi-agent system.

Original location: workflows/agent_orchestration_engine.py (Adaptive Request Parameters section)

Original codebase stats:

  • 51 Python files analyzed
  • 33,497 lines of code
  • 12 patterns extracted

See /submodules/labs/DeepCode/DEEP-ANALYSIS.md for complete analysis.

Multi-Context Window Support

State Tracking

Retry State (JSON):

{
  "checkpoint_id": "ckpt_retry_20251214",
  "retry_metrics": {
    "total_calls": 150,
    "successful_calls": 145,
    "retried_calls": 23,
    "failed_calls": 5,
    "errors_by_type": {
      "context_exceeded": 18,
      "rate_limited": 3,
      "timeout": 2
    }
  },
  "config": {
    "base_max_tokens": 20000,
    "retry_reduction": 0.25,
    "max_retries": 3
  }
}
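
The summary rates reported in the progress notes can be derived directly from these counters; for example:

```python
import json

# Derive success and retry rates from the checkpoint counters.
state = json.loads("""
{"retry_metrics": {"total_calls": 150, "successful_calls": 145,
                   "retried_calls": 23, "failed_calls": 5}}
""")
m = state["retry_metrics"]
print(f"Success rate: {m['successful_calls'] / m['total_calls']:.1%}")  # Success rate: 96.7%
print(f"Retry rate: {m['retried_calls'] / m['total_calls']:.1%}")       # Retry rate: 15.3%
```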

Progress Notes (Markdown):

# Adaptive Retry Progress - 2025-12-14

## Metrics Summary
- Success rate: 96.7%
- Retry rate: 15.3%
- Recovery rate (of retried): 78.3%

## Observations
- Context exceeded most common error (78%)
- Rate limits rare after adding backoff
- Average 1.2 retries when needed

## Configuration Adjustments
- Increased retry_reduction from 0.2 to 0.25
- Added 2x multiplier for rate limit backoff

Session Recovery

When starting a fresh context window:

  1. Load metrics: Read .coditect/checkpoints/adaptive-retry-latest.json
  2. Review configuration: Check if adjustments needed based on metrics
  3. Continue monitoring: Resume normal retry handling
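
A minimal sketch of step 1; the checkpoint path follows the convention above, and the helper name is illustrative:

```python
import json
from pathlib import Path

def load_retry_checkpoint(path: str = ".coditect/checkpoints/adaptive-retry-latest.json"):
    """Return the saved retry state, or None when starting fresh."""
    p = Path(path)
    if not p.exists():
        return None
    return json.loads(p.read_text())

state = load_retry_checkpoint()
# Fall back to the default configuration when no checkpoint exists yet.
config_overrides = state.get("config", {}) if state else {}
```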

Token Savings: ~30% reduction by maintaining optimized configuration


Success Output

When successful, this skill MUST output:

✅ SKILL COMPLETE: adaptive-retry

Completed:
- [x] AdaptiveRequestParameters class implemented with error classification
- [x] Exponential backoff with jitter configured
- [x] LLM wrapper with automatic retry logic deployed
- [x] Retry metrics tracking operational
- [x] Context-exceeded errors recover via max_tokens reduction
- [x] Rate limit handling with extended backoff functional

Outputs:
- src/retry/adaptive_request_parameters.py (Parameter adjustment logic)
- src/retry/adaptive_backoff.py (Backoff calculation with jitter)
- src/retry/adaptive_llm_wrapper.py (LLM client wrapper with retry)
- src/retry/retry_metrics.py (Success/failure tracking)
- config/retry_config.json (Retry configuration settings)

Performance Metrics:
- Context-exceeded recovery rate: 78% (target: >70%)
- Average retry count: 1.2 retries (target: <1.5)
- False exhaustion rate: 3% (target: <5%)
- Response quality after retry: 92% (target: >90%)

Completion Checklist

Before marking this skill as complete, verify:

  • AdaptiveRequestParameters correctly classifies error types (context, rate_limited, timeout, server_error)
  • max_tokens reduces by configured percentage on context-exceeded errors
  • Temperature adjusts downward on retries for more focused output
  • Minimum thresholds prevent excessive parameter reduction
  • Exponential backoff calculates delay correctly (base_delay * 2^retry_count)
  • Jitter adds randomization to prevent thundering herd
  • Error-specific backoff multipliers applied (2x for rate_limited, 0.5x for timeout)
  • should_retry() correctly identifies retryable vs non-retryable errors
  • RetryMetrics tracks total calls, successes, retries, and failures
  • LLM wrapper exhausts all retries before raising final error
  • Configuration allows customization of reduction percentages and max retries
  • All outputs exist at expected locations and pass validation

Failure Indicators

This skill has FAILED if:

  • ❌ Context-exceeded errors not classified correctly
  • ❌ max_tokens not reduced on retry attempts
  • ❌ Parameters reduced below minimum thresholds
  • ❌ Backoff delay not increasing exponentially
  • ❌ Jitter not applied (multiple failures cluster at same time)
  • ❌ Non-retryable errors (auth, validation) attempted for retry
  • ❌ Retry count exceeds max_retries configuration
  • ❌ Metrics not recording retry attempts
  • ❌ Original parameters not used as baseline
  • ❌ Wrapper allows infinite retry loops
  • ❌ Rate limit errors retried immediately without backoff

When NOT to Use

Do NOT use this skill when:

  • Single LLM call with no retry requirement
  • Error is definitively non-retryable (invalid API key, malformed request)
  • User explicitly requested specific LLM parameters (must honor exact values)
  • Testing requires deterministic behavior (retries introduce variability)
  • Simple HTTP retry middleware sufficient (network errors only)
  • Cost optimization prioritizes minimal API calls over success rate
  • Real-time applications cannot tolerate retry delays
  • Circuit breaker already open (retry will fail anyway)

Alternative approaches:

  • Non-retryable errors: Fail fast and surface error to user
  • Deterministic testing: Use mock LLM client with controlled responses
  • Network errors only: Use standard HTTP retry library (e.g., requests.adapters.Retry)
  • Cost-sensitive: Set max_retries=0 or use cheaper model

Anti-Patterns (Avoid)

| Anti-Pattern | Problem | Solution |
|--------------|---------|----------|
| Retrying validation errors | Validation won't succeed with different params | Classify errors correctly; only retry transient failures |
| Ignoring error type | All errors retried the same way | Use error classification to apply appropriate strategy |
| No backoff between retries | Rapid retries trigger rate limits | Implement exponential backoff with jitter |
| Reducing parameters too aggressively | Output quality degrades unacceptably | Set reasonable min_max_tokens and retry_reduction values |
| Infinite retries | Wastes tokens and time on doomed requests | Enforce max_retries limit |
| Hiding final failures | User never sees underlying error | Raise last error after max retries exhausted |
| Not tracking metrics | No visibility into retry patterns | Log all retry attempts with RetryMetrics |
| Overriding user parameters | User asked for specific temperature | Only adjust on retries; respect original on first attempt |
| Retrying without backoff | Overwhelms service during recovery | Always wait before retry (even 1 second minimum) |

Principles

This skill embodies:

  • #2 Resilience First - Graceful degradation via parameter adjustment prevents hard failures
  • #3 Fail Gracefully - Reduce output quality incrementally rather than failing outright
  • #5 Eliminate Ambiguity - Clear error classification determines retry strategy
  • #6 Clear, Understandable, Explainable - Explicit parameter adjustment logic; no magic numbers
  • #7 Optimize for Context - Context-exceeded errors handled specifically by reducing max_tokens
  • #8 No Assumptions - Classify every error type; don't assume all errors are retryable
  • #10 Automation First - Retry logic automated; no manual parameter tuning required
  • #11 Observability - Metrics track retry patterns for continuous improvement

Full Standard: CODITECT-STANDARD-AUTOMATION.md