# Adaptive Retry Parameters

Auto Trigger Configuration: see skills/auto trigger framework/SKILL.md

Expert skill for intelligent retry handling that adjusts LLM parameters based on error context. Provides graceful degradation when context limits are exceeded.

## How to Use This Skill

- Review the patterns and examples below
- Apply the relevant patterns to your implementation
- Follow the best practices outlined in this skill
## When to Use
Use this skill when:
- LLM API calls fail with context-exceeded errors
- Need graceful degradation instead of hard failure
- Implementing retry logic for LLM operations
- Building resilient agent orchestration
- Handling rate limits or quota errors
- Need consistent retry behavior across agents
Don't use this skill when:
- Error is not retryable (validation, auth failures)
- Simple HTTP retry is sufficient (network errors)
- User explicitly requested specific parameters
- Testing requires exact parameter control
## Core Algorithm

### Adaptive Request Parameters
```python
from dataclasses import dataclass
from typing import Optional, Dict, Any
from enum import Enum


class ErrorType(Enum):
    """Categorized error types for retry handling"""
    CONTEXT_EXCEEDED = "context_exceeded"
    RATE_LIMITED = "rate_limited"
    TIMEOUT = "timeout"
    SERVER_ERROR = "server_error"
    UNKNOWN = "unknown"


@dataclass
class AdaptiveRequestConfig:
    """Configuration for adaptive retry parameters"""
    base_max_tokens: int = 20000
    retry_reduction: float = 0.25       # Reduce by 25% per retry
    min_max_tokens: int = 4000          # Never go below this
    base_temperature: float = 0.7
    temperature_reduction: float = 0.1  # Reduce by 0.1 per retry
    min_temperature: float = 0.1
    max_retries: int = 3


class AdaptiveRequestParameters:
    """
    Adjusts LLM request parameters based on retry context.

    Key innovation: on context-exceeded errors, reduce max_tokens to request
    less output, giving more room for input context.
    """

    def __init__(self, config: Optional[AdaptiveRequestConfig] = None):
        self.config = config or AdaptiveRequestConfig()

    def classify_error(self, error: Exception) -> ErrorType:
        """Classify error to determine retry strategy"""
        error_str = str(error).lower()
        if any(term in error_str for term in [
            "context", "token limit", "maximum context", "too long"
        ]):
            return ErrorType.CONTEXT_EXCEEDED
        if any(term in error_str for term in [
            "rate limit", "rate_limit", "too many requests", "429"
        ]):
            return ErrorType.RATE_LIMITED
        if any(term in error_str for term in [
            "timeout", "timed out", "deadline exceeded"
        ]):
            return ErrorType.TIMEOUT
        if any(term in error_str for term in [
            "500", "502", "503", "504", "server error", "internal error"
        ]):
            return ErrorType.SERVER_ERROR
        return ErrorType.UNKNOWN

    def get_params(
        self,
        retry_count: int,
        error_type: Optional[ErrorType] = None,
        original_params: Optional[Dict[str, Any]] = None
    ) -> Dict[str, Any]:
        """
        Calculate optimal parameters for a retry attempt.

        Args:
            retry_count: Number of previous retry attempts (0 = first try)
            error_type: Type of error that triggered the retry
            original_params: Original request parameters

        Returns:
            Dictionary of adjusted parameters
        """
        original_params = original_params or {}

        # Base parameters
        max_tokens = original_params.get("max_tokens", self.config.base_max_tokens)
        temperature = original_params.get("temperature", self.config.base_temperature)

        # No adjustment on first try
        if retry_count == 0:
            return {
                "max_tokens": max_tokens,
                "temperature": temperature,
                "retry_count": 0,
            }

        # Adjust based on error type
        if error_type == ErrorType.CONTEXT_EXCEEDED:
            # Reduce max_tokens aggressively to leave room for input
            reduction = self.config.retry_reduction * retry_count
            max_tokens = max(
                self.config.min_max_tokens,
                int(max_tokens * (1 - reduction))
            )
            # Lower temperature for more focused output
            temperature = max(
                self.config.min_temperature,
                temperature - (self.config.temperature_reduction * retry_count)
            )
        elif error_type == ErrorType.RATE_LIMITED:
            # Don't change params, just add backoff (handled separately)
            pass
        elif error_type == ErrorType.TIMEOUT:
            # Reduce max_tokens slightly to speed up response
            max_tokens = max(
                self.config.min_max_tokens,
                int(max_tokens * 0.8)
            )
        elif error_type == ErrorType.SERVER_ERROR:
            # Slight temperature reduction for stability
            temperature = max(
                self.config.min_temperature,
                temperature - 0.1
            )

        return {
            "max_tokens": max_tokens,
            "temperature": round(temperature, 2),
            "retry_count": retry_count,
            "error_type": error_type.value if error_type else None,
        }

    def should_retry(self, error_type: ErrorType, retry_count: int) -> bool:
        """Determine if the error is retryable"""
        if retry_count >= self.config.max_retries:
            return False
        # Most transient errors are retryable with adjusted params
        retryable_errors = {
            ErrorType.CONTEXT_EXCEEDED,
            ErrorType.RATE_LIMITED,
            ErrorType.TIMEOUT,
            ErrorType.SERVER_ERROR,
        }
        return error_type in retryable_errors
```
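The context-exceeded reduction is linear in `retry_count` with a hard floor, so the full degradation schedule is easy to verify in isolation. A standalone sketch of that arithmetic under the default config (`base_max_tokens=20000`, `retry_reduction=0.25`, `min_max_tokens=4000`):

```python
def max_tokens_schedule(base: int, reduction: float, floor: int, max_retries: int) -> list:
    """Mirror of the context-exceeded branch: linear reduction with a hard floor."""
    return [max(floor, int(base * (1 - reduction * n))) for n in range(max_retries + 1)]

print(max_tokens_schedule(20000, 0.25, 4000, 3))
# [20000, 15000, 10000, 5000]
```

With more retries the floor takes over: the schedule never drops below `min_max_tokens`, which is what prevents the quality collapse warned about in the anti-patterns section.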
### Backoff Calculation
```python
import asyncio
import random
from typing import Optional


class AdaptiveBackoff:
    """Calculate backoff delays for retries"""

    def __init__(
        self,
        base_delay: float = 1.0,
        max_delay: float = 60.0,
        exponential_base: float = 2.0,
        jitter: bool = True
    ):
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.exponential_base = exponential_base
        self.jitter = jitter

    def get_delay(
        self,
        retry_count: int,
        error_type: Optional[ErrorType] = None
    ) -> float:
        """
        Calculate delay before the next retry.

        Args:
            retry_count: Current retry attempt number
            error_type: Type of error for specialized handling

        Returns:
            Delay in seconds
        """
        # Base exponential backoff
        delay = self.base_delay * (self.exponential_base ** retry_count)

        # Adjust for error type
        if error_type == ErrorType.RATE_LIMITED:
            # Longer delay for rate limits
            delay *= 2
        elif error_type == ErrorType.TIMEOUT:
            # Shorter delay, problem may be transient
            delay *= 0.5

        # Cap at max
        delay = min(delay, self.max_delay)

        # Add jitter to prevent thundering herd
        if self.jitter:
            delay = delay * (0.5 + random.random())

        return delay

    async def wait(self, retry_count: int, error_type: Optional[ErrorType] = None):
        """Wait for the calculated backoff delay"""
        delay = self.get_delay(retry_count, error_type)
        await asyncio.sleep(delay)
```
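With jitter disabled, the schedule is pure exponential growth capped at `max_delay`. A quick standalone check of that formula (mirroring `get_delay` without the error-type multipliers):

```python
def backoff_delay(retry_count: int, base_delay: float = 1.0,
                  exponential_base: float = 2.0, max_delay: float = 60.0) -> float:
    """Mirror of AdaptiveBackoff.get_delay with jitter disabled."""
    return min(base_delay * (exponential_base ** retry_count), max_delay)

print([backoff_delay(n) for n in range(8)])
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

With jitter enabled, each delay is scaled by a random factor in [0.5, 1.5), so concurrent clients recovering from the same outage spread their retries out instead of hammering the service in lockstep.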
### Integration with LLM Wrapper
```python
import asyncio
import logging
from typing import Dict, Optional

logger = logging.getLogger(__name__)


class AdaptiveLLMWrapper:
    """
    LLM wrapper with adaptive retry handling.

    Automatically adjusts parameters on failure and retries with
    optimal settings.
    """

    def __init__(
        self,
        llm_client,
        config: Optional[AdaptiveRequestConfig] = None
    ):
        self.llm_client = llm_client
        self.params = AdaptiveRequestParameters(config)
        self.backoff = AdaptiveBackoff()
        self.metrics = RetryMetrics()

    async def call(
        self,
        prompt: str,
        max_tokens: Optional[int] = None,
        temperature: Optional[float] = None,
        **kwargs
    ) -> str:
        """
        Make an LLM call with adaptive retry.

        Automatically adjusts parameters on failure and retries
        with optimal settings.
        """
        # Drop unset values so get_params falls back to config defaults
        original_params = {
            key: value
            for key, value in {"max_tokens": max_tokens, "temperature": temperature}.items()
            if value is not None
        }
        last_error = None

        for retry_count in range(self.params.config.max_retries + 1):
            # Get parameters for this attempt
            error_type = self.params.classify_error(last_error) if last_error else None
            params = self.params.get_params(retry_count, error_type, original_params)

            try:
                logger.info(
                    f"LLM call attempt {retry_count + 1}, "
                    f"max_tokens={params['max_tokens']}, "
                    f"temperature={params['temperature']}"
                )
                result = await self.llm_client.complete(
                    prompt=prompt,
                    max_tokens=params["max_tokens"],
                    temperature=params["temperature"],
                    **kwargs
                )
                # Record success
                self.metrics.record_success(retry_count)
                return result

            except Exception as e:
                last_error = e
                error_type = self.params.classify_error(e)
                logger.warning(
                    f"LLM call failed (attempt {retry_count + 1}): "
                    f"{error_type.value} - {str(e)[:100]}"
                )
                # Record failure
                self.metrics.record_failure(error_type, retry_count)

                # Check if we should retry (retry_count == retries used so far)
                if not self.params.should_retry(error_type, retry_count):
                    break

                # Wait before retry
                await self.backoff.wait(retry_count, error_type)

        # All retries exhausted
        self.metrics.record_exhausted()
        raise last_error


class RetryMetrics:
    """Track retry statistics for observability"""

    def __init__(self):
        self.total_calls = 0
        self.successful_calls = 0
        self.retried_calls = 0
        self.failed_calls = 0
        self.errors_by_type: Dict[str, int] = {}

    def record_success(self, retry_count: int):
        self.total_calls += 1
        self.successful_calls += 1
        if retry_count > 0:
            self.retried_calls += 1

    def record_failure(self, error_type: ErrorType, retry_count: int):
        self.errors_by_type[error_type.value] = (
            self.errors_by_type.get(error_type.value, 0) + 1
        )

    def record_exhausted(self):
        self.total_calls += 1
        self.failed_calls += 1

    def get_stats(self) -> dict:
        return {
            "total_calls": self.total_calls,
            "success_rate": (
                self.successful_calls / self.total_calls
                if self.total_calls > 0 else 0
            ),
            "retry_rate": (
                self.retried_calls / self.total_calls
                if self.total_calls > 0 else 0
            ),
            "errors_by_type": self.errors_by_type,
        }
```
## Usage Examples

### Basic Parameter Adjustment
```python
# Create parameter manager
params = AdaptiveRequestParameters()

# First attempt - use original params
p1 = params.get_params(retry_count=0)
# Result: {"max_tokens": 20000, "temperature": 0.7, "retry_count": 0}

# After a context-exceeded error
p2 = params.get_params(
    retry_count=1,
    error_type=ErrorType.CONTEXT_EXCEEDED
)
# Result: {"max_tokens": 15000, "temperature": 0.6, "retry_count": 1,
#          "error_type": "context_exceeded"}

# Second retry
p3 = params.get_params(
    retry_count=2,
    error_type=ErrorType.CONTEXT_EXCEEDED
)
# Result: {"max_tokens": 10000, "temperature": 0.5, "retry_count": 2,
#          "error_type": "context_exceeded"}
```
### With LLM Wrapper
```python
# Create adaptive wrapper
wrapper = AdaptiveLLMWrapper(
    llm_client=anthropic_client,
    config=AdaptiveRequestConfig(
        base_max_tokens=16000,
        retry_reduction=0.3,
        max_retries=3
    )
)

# Make call - automatically handles retries
try:
    response = await wrapper.call(
        prompt="Implement the authentication handler...",
        max_tokens=8000,
        temperature=0.5
    )
except Exception as e:
    logger.error(f"All retries failed: {e}")
    # Handle final failure
```
### Custom Configuration
```python
# Conservative config (for critical operations)
conservative = AdaptiveRequestConfig(
    base_max_tokens=10000,
    retry_reduction=0.2,   # Smaller reduction per retry
    min_max_tokens=4000,
    max_retries=5,         # More retry attempts
)

# Aggressive config (for speed)
aggressive = AdaptiveRequestConfig(
    base_max_tokens=20000,
    retry_reduction=0.4,   # Larger reduction
    min_max_tokens=2000,
    max_retries=2,         # Fewer retries
)
```
## Best Practices

### DO
- Classify errors properly - Different errors need different strategies
- Reduce max_tokens on context errors - Leave room for input
- Use exponential backoff - Prevent overwhelming the service
- Add jitter - Prevent thundering herd on recovery
- Track metrics - Monitor retry rates and patterns
- Log retry attempts - For debugging and observability
- Set minimum thresholds - Don't reduce parameters too far
### DON'T
- Don't retry validation errors - They won't succeed with different params
- Don't ignore error types - Each needs specific handling
- Don't retry forever - Set max retry limits
- Don't forget backoff - Rapid retries cause rate limits
- Don't hide failures - Surface errors after max retries
- Don't ignore original params - Use them as baseline
## Configuration Reference

| Parameter | Default | Description |
|---|---|---|
| `base_max_tokens` | 20000 | Starting max_tokens value |
| `retry_reduction` | 0.25 | Reduction per retry (25%) |
| `min_max_tokens` | 4000 | Never reduce below this |
| `base_temperature` | 0.7 | Starting temperature |
| `temperature_reduction` | 0.1 | Temperature reduction per retry |
| `min_temperature` | 0.1 | Never reduce below this |
| `max_retries` | 3 | Maximum retry attempts |
| `base_delay` | 1.0s | Initial backoff delay |
| `max_delay` | 60.0s | Maximum backoff delay |
## Error Type Handling

| Error Type | Parameter Adjustment | Backoff Multiplier |
|---|---|---|
| `CONTEXT_EXCEEDED` | Reduce max_tokens 25% per retry, temperature 0.1 per retry | 1x |
| `RATE_LIMITED` | No change | 2x |
| `TIMEOUT` | Reduce max_tokens 20% | 0.5x |
| `SERVER_ERROR` | Reduce temperature 0.1 | 1x |
| `UNKNOWN` | No change | 1x |
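The classification behind this table is plain substring matching on the error message. A self-contained sketch (mirroring `classify_error`, with plain strings standing in for the `ErrorType` enum):

```python
def classify(error_message: str) -> str:
    """Standalone mirror of classify_error; order matters (first match wins)."""
    msg = error_message.lower()
    if any(t in msg for t in ("context", "token limit", "maximum context", "too long")):
        return "context_exceeded"
    if any(t in msg for t in ("rate limit", "rate_limit", "too many requests", "429")):
        return "rate_limited"
    if any(t in msg for t in ("timeout", "timed out", "deadline exceeded")):
        return "timeout"
    if any(t in msg for t in ("500", "502", "503", "504", "server error", "internal error")):
        return "server_error"
    return "unknown"

print(classify("Error 429: Too Many Requests"))         # rate_limited
print(classify("Prompt is too long for maximum context"))  # context_exceeded
print(classify("Invalid API key"))                      # unknown - not retryable
```

Because matching is string-based, provider-specific exception types that do not mention these terms fall through to `UNKNOWN`; extend the term lists for your client library if needed.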
## Integration with CODITECT
Recommended integration points:
| Component | Configuration | Notes |
|---|---|---|
| Orchestrator | Conservative | Critical coordination |
| Implementation agents | Default | Balance speed/reliability |
| Research agents | Aggressive | Can accept shorter responses |
| Code generation | Conservative | Quality matters |
## Success Metrics
| Metric | Target | Measurement |
|---|---|---|
| Context-exceeded recoveries | 70%+ | Successful after param reduction |
| False exhaustion | <5% | Gave up when retry would succeed |
| Average retry count | <1.5 | Most calls succeed first try |
| Response quality after retry | >90% | Quality maintained with reduced params |
## Quick Scenario Configuration
Copy-paste configurations for common retry scenarios:
| Scenario | Config | Rationale |
|---|---|---|
| LLM Code Generation | max_retries=3, retry_reduction=0.25, min_max_tokens=4000 | Quality matters, allow 3 attempts |
| LLM Summarization | max_retries=2, retry_reduction=0.4, min_max_tokens=2000 | Shorter output acceptable |
| Critical Production | max_retries=5, retry_reduction=0.15, min_max_tokens=8000 | High reliability, slow reduction |
| Cost-Sensitive Batch | max_retries=1, retry_reduction=0.5, min_max_tokens=1000 | Fail fast, minimize token usage |
| Interactive Chat | max_retries=2, retry_reduction=0.3, min_max_tokens=3000 | Balance responsiveness and quality |
Decision Tree:

```
What's the priority?
├── Quality → Conservative: max_retries=5, retry_reduction=0.15
├── Speed → Aggressive: max_retries=2, retry_reduction=0.4
├── Cost → Minimal: max_retries=1, retry_reduction=0.5
└── Balance → Default: max_retries=3, retry_reduction=0.25
```
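In code, the decision tree collapses to a small lookup. A sketch (the function name and priority keys are illustrative; the values come from the tree above):

```python
def pick_config(priority: str) -> dict:
    """Map a priority from the decision tree to retry settings."""
    table = {
        "quality": {"max_retries": 5, "retry_reduction": 0.15},
        "speed":   {"max_retries": 2, "retry_reduction": 0.4},
        "cost":    {"max_retries": 1, "retry_reduction": 0.5},
        "balance": {"max_retries": 3, "retry_reduction": 0.25},
    }
    return table[priority]

print(pick_config("cost"))  # {'max_retries': 1, 'retry_reduction': 0.5}
```

The resulting dict can be splatted into the config, e.g. `AdaptiveRequestConfig(**pick_config("balance"))`.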
## Source Reference
This pattern was extracted from DeepCode (HKUDS/DeepCode) multi-agent system.
Original location: workflows/agent_orchestration_engine.py (Adaptive Request Parameters section)
Original codebase stats:
- 51 Python files analyzed
- 33,497 lines of code
- 12 patterns extracted
See /submodules/labs/DeepCode/DEEP-ANALYSIS.md for complete analysis.
## Multi-Context Window Support

### State Tracking

Retry State (JSON):
```json
{
  "checkpoint_id": "ckpt_retry_20251214",
  "retry_metrics": {
    "total_calls": 150,
    "successful_calls": 145,
    "retried_calls": 23,
    "failed_calls": 5,
    "errors_by_type": {
      "context_exceeded": 18,
      "rate_limited": 3,
      "timeout": 2
    }
  },
  "config": {
    "base_max_tokens": 20000,
    "retry_reduction": 0.25,
    "max_retries": 3
  }
}
```
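The rates reported in the progress notes below derive directly from these counters. A quick check, assuming retried-but-eventually-failed calls equal `failed_calls` (an assumption of this sketch, not something the checkpoint schema guarantees):

```python
metrics = {
    "total_calls": 150,
    "successful_calls": 145,
    "retried_calls": 23,
    "failed_calls": 5,
}

success_rate = metrics["successful_calls"] / metrics["total_calls"]
retry_rate = metrics["retried_calls"] / metrics["total_calls"]
# Recovery rate: retried calls that eventually succeeded / all retried calls
recovery_rate = (metrics["retried_calls"] - metrics["failed_calls"]) / metrics["retried_calls"]

print(f"{success_rate:.1%} {retry_rate:.1%} {recovery_rate:.1%}")
# 96.7% 15.3% 78.3%
```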
Progress Notes (Markdown):

```markdown
# Adaptive Retry Progress - 2025-12-14

## Metrics Summary
- Success rate: 96.7%
- Retry rate: 15.3%
- Recovery rate (of retried): 78.3%

## Observations
- Context exceeded most common error (78%)
- Rate limits rare after adding backoff
- Average 1.2 retries when needed

## Configuration Adjustments
- Increased retry_reduction from 0.2 to 0.25
- Added 2x multiplier for rate limit backoff
```
### Session Recovery

When starting a fresh context window:

- Load metrics: Read `.coditect/checkpoints/adaptive-retry-latest.json`
- Review configuration: Check whether adjustments are needed based on metrics
- Continue monitoring: Resume normal retry handling

Token Savings: ~30% reduction by maintaining the optimized configuration
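A minimal sketch of the load step, assuming the checkpoint path above (falling back to the default config when no checkpoint exists yet):

```python
import json
from pathlib import Path

# Checkpoint path from the recovery steps above
CHECKPOINT = Path(".coditect/checkpoints/adaptive-retry-latest.json")

def load_retry_state(path: Path = CHECKPOINT) -> dict:
    """Load prior retry metrics and config, or defaults when absent."""
    if path.exists():
        return json.loads(path.read_text())
    return {
        "retry_metrics": {},
        "config": {"base_max_tokens": 20000, "retry_reduction": 0.25, "max_retries": 3},
    }

state = load_retry_state()
# The restored config can then seed AdaptiveRequestConfig(**state["config"])
```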
## Success Output

When successful, this skill MUST output:

```
✅ SKILL COMPLETE: adaptive-retry

Completed:
- [x] AdaptiveRequestParameters class implemented with error classification
- [x] Exponential backoff with jitter configured
- [x] LLM wrapper with automatic retry logic deployed
- [x] Retry metrics tracking operational
- [x] Context-exceeded errors recover via max_tokens reduction
- [x] Rate limit handling with extended backoff functional

Outputs:
- src/retry/adaptive_request_parameters.py (Parameter adjustment logic)
- src/retry/adaptive_backoff.py (Backoff calculation with jitter)
- src/retry/adaptive_llm_wrapper.py (LLM client wrapper with retry)
- src/retry/retry_metrics.py (Success/failure tracking)
- config/retry_config.json (Retry configuration settings)

Performance Metrics:
- Context-exceeded recovery rate: 78% (target: >70%)
- Average retry count: 1.2 retries (target: <1.5)
- False exhaustion rate: 3% (target: <5%)
- Response quality after retry: 92% (target: >90%)
```
## Completion Checklist
Before marking this skill as complete, verify:
- AdaptiveRequestParameters correctly classifies error types (context, rate_limited, timeout, server_error)
- max_tokens reduces by configured percentage on context-exceeded errors
- Temperature adjusts downward on retries for more focused output
- Minimum thresholds prevent excessive parameter reduction
- Exponential backoff calculates delay correctly (base_delay * 2^retry_count)
- Jitter adds randomization to prevent thundering herd
- Error-specific backoff multipliers applied (2x for rate_limited, 0.5x for timeout)
- should_retry() correctly identifies retryable vs non-retryable errors
- RetryMetrics tracks total calls, successes, retries, and failures
- LLM wrapper exhausts all retries before raising final error
- Configuration allows customization of reduction percentages and max retries
- All outputs exist at expected locations and pass validation
## Failure Indicators
This skill has FAILED if:
- ❌ Context-exceeded errors not classified correctly
- ❌ max_tokens not reduced on retry attempts
- ❌ Parameters reduced below minimum thresholds
- ❌ Backoff delay not increasing exponentially
- ❌ Jitter not applied (multiple failures cluster at same time)
- ❌ Non-retryable errors (auth, validation) attempted for retry
- ❌ Retry count exceeds max_retries configuration
- ❌ Metrics not recording retry attempts
- ❌ Original parameters not used as baseline
- ❌ Wrapper allows infinite retry loops
- ❌ Rate limit errors retried immediately without backoff
## When NOT to Use
Do NOT use this skill when:
- Single LLM call with no retry requirement
- Error is definitively non-retryable (invalid API key, malformed request)
- User explicitly requested specific LLM parameters (must honor exact values)
- Testing requires deterministic behavior (retries introduce variability)
- Simple HTTP retry middleware sufficient (network errors only)
- Cost optimization prioritizes minimal API calls over success rate
- Real-time applications cannot tolerate retry delays
- Circuit breaker already open (retry will fail anyway)
Alternative approaches:
- Non-retryable errors: Fail fast and surface error to user
- Deterministic testing: Use mock LLM client with controlled responses
- Network errors only: Use standard HTTP retry library (e.g., requests.adapters.Retry)
- Cost-sensitive: Set max_retries=0 or use cheaper model
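For the deterministic-testing alternative, a scripted mock client removes retry variability entirely. A minimal sketch (class and helper names are illustrative; `complete` mirrors the signature `AdaptiveLLMWrapper` expects from its client):

```python
import asyncio

class MockLLMClient:
    """Scripted client: fails a fixed number of times, then succeeds."""
    def __init__(self, failures: int, response: str = "ok"):
        self.failures = failures
        self.response = response
        self.calls = 0

    async def complete(self, prompt: str, max_tokens: int, temperature: float, **kwargs) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise RuntimeError("429 too many requests")  # classifies as rate_limited
        return self.response

async def demo() -> str:
    client = MockLLMClient(failures=2)
    for _ in range(3):  # deterministic retry loop, no backoff
        try:
            return await client.complete("hi", max_tokens=100, temperature=0.0)
        except RuntimeError:
            continue
    return "exhausted"

print(asyncio.run(demo()))  # ok
```

Because the failure count is scripted, tests can assert exact call counts and outcomes without jitter or parameter drift.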
## Anti-Patterns (Avoid)
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Retrying validation errors | Validation won't succeed with different params | Classify errors correctly; only retry transient failures |
| Ignoring error type | All errors retried the same way | Use error classification to apply appropriate strategy |
| No backoff between retries | Rapid retries trigger rate limits | Implement exponential backoff with jitter |
| Reducing parameters too aggressively | Output quality degrades unacceptably | Set reasonable min_max_tokens and retry_reduction values |
| Infinite retries | Wastes tokens and time on doomed requests | Enforce max_retries limit |
| Hiding final failures | User never sees underlying error | Raise last error after max retries exhausted |
| Not tracking metrics | No visibility into retry patterns | Log all retry attempts with RetryMetrics |
| Overriding user parameters | User asked for specific temperature | Only adjust on retries, respect original on first attempt |
| Retrying without backoff | Overwhelms service during recovery | Always wait before retry (even 1 second minimum) |
## Principles
This skill embodies:
- #2 Resilience First - Graceful degradation via parameter adjustment prevents hard failures
- #3 Fail Gracefully - Reduce output quality incrementally rather than failing outright
- #5 Eliminate Ambiguity - Clear error classification determines retry strategy
- #6 Clear, Understandable, Explainable - Explicit parameter adjustment logic; no magic numbers
- #7 Optimize for Context - Context-exceeded errors handled specifically by reducing max_tokens
- #8 No Assumptions - Classify every error type; don't assume all errors are retryable
- #10 Automation First - Retry logic automated; no manual parameter tuning required
- #11 Observability - Metrics track retry patterns for continuous improvement
Full Standard: CODITECT-STANDARD-AUTOMATION.md