

Multi-Provider LLM Fallback

How to Use This Skill

  1. Review the patterns and examples below
  2. Apply the relevant patterns to your implementation
  3. Follow the best practices outlined in this skill

Expert skill for intelligent routing across multiple LLM providers with automatic failover, cost optimization, and capability-based selection. Ensures high availability and optimal cost/performance balance.

When to Use

Use this skill when:

  • Building production systems that require high availability
  • You need cost optimization across multiple providers
  • You want automatic failover without manual intervention
  • Different tasks need different model capabilities
  • Rate limits on the primary provider are a concern

Don't use this skill when:

  • Single provider is sufficient and reliable
  • Testing/development with no reliability requirements
  • Fixed provider required by policy/compliance
  • Latency-critical with no time for fallback

Supported Providers

| Provider | Models | Strengths |
|---|---|---|
| Anthropic | Claude 3.5 Sonnet, Claude 3 Opus | Reasoning, coding, long context |
| OpenAI | GPT-4, GPT-4 Turbo, GPT-3.5 | General purpose, fast |
| Google | Gemini Pro, Gemini Ultra | Multimodal, large context |
| Azure OpenAI | GPT-4, GPT-3.5 (hosted) | Enterprise compliance |
| Local | Ollama, LM Studio | Privacy, no API costs |

Core Algorithm

Provider Router

from typing import List, Dict, Optional
from dataclasses import dataclass, field
from enum import Enum
from abc import ABC, abstractmethod
import time


class ProviderStatus(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    UNAVAILABLE = "unavailable"
    RATE_LIMITED = "rate_limited"


class TaskType(Enum):
    CODING = "coding"
    REASONING = "reasoning"
    CREATIVE = "creative"
    FAST_RESPONSE = "fast_response"
    LONG_CONTEXT = "long_context"
    COST_SENSITIVE = "cost_sensitive"


@dataclass
class ProviderConfig:
    """Configuration for an LLM provider"""
    name: str
    api_key_env: str  # Environment variable holding the API key
    base_url: Optional[str] = None
    models: List[str] = field(default_factory=list)
    default_model: str = ""
    max_tokens: int = 4096        # Maximum output tokens per request
    context_window: int = 8192    # Maximum input context (tokens)
    cost_per_1k_input: float = 0.0
    cost_per_1k_output: float = 0.0
    rate_limit_rpm: int = 60
    timeout_seconds: int = 120
    strengths: List[TaskType] = field(default_factory=list)
    priority: int = 50            # Higher = preferred


@dataclass
class ProviderHealth:
    """Health status of a provider"""
    provider: str
    status: ProviderStatus
    last_success: float = 0.0
    last_failure: float = 0.0
    failure_count: int = 0
    rate_limit_reset: float = 0.0
    latency_ms: float = 0.0


@dataclass
class RoutingDecision:
    """Result of a routing decision"""
    provider: str
    model: str
    reason: str
    fallback_chain: List[str]
    estimated_cost: float


class LLMProvider(ABC):
    """Abstract base for LLM providers"""

    @abstractmethod
    async def complete(self, prompt: str, **kwargs) -> Dict:
        ...

    @abstractmethod
    def health_check(self) -> ProviderHealth:
        ...
class MultiProviderRouter:
    """
    Intelligent router across multiple LLM providers.

    Key features:
    - Automatic failover on errors
    - Cost-based routing
    - Capability-based selection
    - Health monitoring
    - Rate limit awareness
    """

    # Default provider configurations
    DEFAULT_PROVIDERS = {
        "anthropic": ProviderConfig(
            name="anthropic",
            api_key_env="ANTHROPIC_API_KEY",
            models=["claude-3-5-sonnet-20241022", "claude-3-opus-20240229"],
            default_model="claude-3-5-sonnet-20241022",
            max_tokens=8192,
            context_window=200_000,
            cost_per_1k_input=0.003,
            cost_per_1k_output=0.015,
            rate_limit_rpm=50,
            timeout_seconds=120,
            strengths=[TaskType.CODING, TaskType.REASONING, TaskType.LONG_CONTEXT],
            priority=90
        ),
        "openai": ProviderConfig(
            name="openai",
            api_key_env="OPENAI_API_KEY",
            models=["gpt-4-turbo", "gpt-4", "gpt-3.5-turbo"],
            default_model="gpt-4-turbo",
            max_tokens=4096,
            context_window=128_000,
            cost_per_1k_input=0.01,
            cost_per_1k_output=0.03,
            rate_limit_rpm=60,
            timeout_seconds=60,
            strengths=[TaskType.FAST_RESPONSE, TaskType.CREATIVE],
            priority=80
        ),
        "google": ProviderConfig(
            name="google",
            api_key_env="GOOGLE_API_KEY",
            base_url="https://generativelanguage.googleapis.com",
            models=["gemini-pro", "gemini-ultra"],
            default_model="gemini-pro",
            max_tokens=8192,
            context_window=1_000_000,
            cost_per_1k_input=0.0005,
            cost_per_1k_output=0.0015,
            rate_limit_rpm=60,
            timeout_seconds=90,
            strengths=[TaskType.LONG_CONTEXT, TaskType.COST_SENSITIVE],
            priority=70
        ),
        "local": ProviderConfig(
            name="local",
            api_key_env="",  # No key needed
            base_url="http://localhost:11434",  # Ollama default
            models=["llama2", "codellama", "mistral"],
            default_model="codellama",
            max_tokens=4096,
            context_window=32_768,
            cost_per_1k_input=0.0,
            cost_per_1k_output=0.0,
            rate_limit_rpm=1000,
            timeout_seconds=300,
            strengths=[TaskType.COST_SENSITIVE],
            priority=30
        ),
    }

    def __init__(self, providers: Optional[Dict[str, ProviderConfig]] = None):
        self.providers = providers or self.DEFAULT_PROVIDERS
        self.health: Dict[str, ProviderHealth] = {}
        self.request_counts: Dict[str, int] = {}
        self._initialize_health()

    def _initialize_health(self):
        """Initialize health status for all providers"""
        for name in self.providers:
            self.health[name] = ProviderHealth(
                provider=name,
                status=ProviderStatus.HEALTHY,
                last_success=time.time()
            )
            self.request_counts[name] = 0

    def select_provider(
        self,
        task_type: Optional[TaskType] = None,
        prefer_cost: bool = False,
        prefer_speed: bool = False,
        exclude: Optional[List[str]] = None,
        required_context_length: int = 4000
    ) -> RoutingDecision:
        """
        Select the best provider for the task.

        Args:
            task_type: Type of task (affects capability matching)
            prefer_cost: Prioritize cost over quality
            prefer_speed: Prioritize speed over quality
            exclude: Providers to skip (e.g., already failed)
            required_context_length: Minimum context window needed

        Returns:
            RoutingDecision with selected provider and fallback chain
        """
        exclude = exclude or []
        candidates = []

        for name, config in self.providers.items():
            # Skip excluded and unhealthy providers
            if name in exclude:
                continue

            health = self.health[name]
            if health.status == ProviderStatus.UNAVAILABLE:
                continue

            # Skip rate-limited providers until their window resets
            if health.status == ProviderStatus.RATE_LIMITED:
                if time.time() < health.rate_limit_reset:
                    continue

            # Check context window requirement
            if config.context_window < required_context_length:
                continue

            # Calculate score
            score = config.priority

            # Boost for matching task type
            if task_type and task_type in config.strengths:
                score += 20

            # Adjust for cost preference
            if prefer_cost:
                # Lower cost = higher score
                cost_score = 100 - (config.cost_per_1k_output * 1000)
                score += cost_score * 0.3

            # Adjust for speed preference
            if prefer_speed:
                # Lower timeout = faster expected response
                speed_score = 100 - (config.timeout_seconds / 3)
                score += speed_score * 0.3

            # Penalize degraded providers
            if health.status == ProviderStatus.DEGRADED:
                score -= 20

            # Penalize high recent latency
            if health.latency_ms > 5000:
                score -= 10

            candidates.append((name, score, config))

        # Sort by score, best first
        candidates.sort(key=lambda x: x[1], reverse=True)

        if not candidates:
            raise ProviderError("No available providers")

        selected = candidates[0]
        fallback_chain = [c[0] for c in candidates[1:4]]  # Next 3 as fallbacks

        return RoutingDecision(
            provider=selected[0],
            model=selected[2].default_model,
            reason=f"Score: {selected[1]:.1f}, Priority: {selected[2].priority}",
            fallback_chain=fallback_chain,
            estimated_cost=selected[2].cost_per_1k_output
        )

    async def complete_with_fallback(
        self,
        prompt: str,
        task_type: Optional[TaskType] = None,
        prefer_cost: bool = False,
        prefer_speed: bool = False,
        max_retries: int = 3,
        **kwargs
    ) -> Dict:
        """
        Complete a request with automatic fallback.

        Tries providers in order until one succeeds or all are exhausted.
        """
        attempted = []
        last_error = None

        for attempt in range(max_retries):
            try:
                decision = self.select_provider(
                    task_type=task_type,
                    prefer_cost=prefer_cost,
                    prefer_speed=prefer_speed,
                    exclude=attempted
                )

                provider_name = decision.provider
                attempted.append(provider_name)

                # Execute request and measure latency
                start_time = time.time()
                result = await self._execute_request(
                    provider_name,
                    prompt,
                    decision.model,
                    **kwargs
                )
                latency = (time.time() - start_time) * 1000

                # Update health on success
                self._update_health(provider_name, success=True, latency=latency)

                return {
                    "content": result["content"],
                    "provider": provider_name,
                    "model": decision.model,
                    "attempts": len(attempted),
                    "latency_ms": latency
                }

            except RateLimitError as e:
                self._update_health(
                    attempted[-1],
                    success=False,
                    rate_limited=True,
                    reset_time=e.reset_time
                )
                last_error = e

            except ProviderError as e:
                # select_provider can raise before anything was attempted
                if attempted:
                    self._update_health(attempted[-1], success=False)
                last_error = e

            except TimeoutError as e:
                self._update_health(attempted[-1], success=False, timeout=True)
                last_error = e

        raise FallbackExhaustedError(
            f"All providers failed after {max_retries} attempts",
            attempted=attempted,
            last_error=last_error
        )

    async def _execute_request(
        self,
        provider_name: str,
        prompt: str,
        model: str,
        **kwargs
    ) -> Dict:
        """Execute a request against a specific provider"""
        config = self.providers[provider_name]

        # Provider-specific implementation
        if provider_name == "anthropic":
            return await self._anthropic_complete(config, prompt, model, **kwargs)
        elif provider_name == "openai":
            return await self._openai_complete(config, prompt, model, **kwargs)
        elif provider_name == "google":
            return await self._google_complete(config, prompt, model, **kwargs)
        elif provider_name == "local":
            return await self._local_complete(config, prompt, model, **kwargs)
        else:
            raise ValueError(f"Unknown provider: {provider_name}")

    def _update_health(
        self,
        provider: str,
        success: bool,
        latency: float = 0,
        rate_limited: bool = False,
        reset_time: float = 0,
        timeout: bool = False
    ):
        """Update provider health status"""
        health = self.health[provider]

        if success:
            health.status = ProviderStatus.HEALTHY
            health.last_success = time.time()
            health.failure_count = 0
            health.latency_ms = latency
        else:
            health.last_failure = time.time()
            health.failure_count += 1

            if rate_limited:
                health.status = ProviderStatus.RATE_LIMITED
                health.rate_limit_reset = reset_time
            elif health.failure_count >= 3:
                health.status = ProviderStatus.UNAVAILABLE
            else:
                health.status = ProviderStatus.DEGRADED

    def get_health_report(self) -> Dict:
        """Get health status of all providers"""
        return {
            name: {
                "status": health.status.value,
                "last_success": health.last_success,
                "failure_count": health.failure_count,
                "latency_ms": health.latency_ms
            }
            for name, health in self.health.items()
        }

    # Provider-specific implementations (simplified; raise rather than
    # silently return None so a missing implementation fails loudly)
    async def _anthropic_complete(self, config, prompt, model, **kwargs):
        # Implementation using the anthropic SDK
        raise NotImplementedError

    async def _openai_complete(self, config, prompt, model, **kwargs):
        # Implementation using the openai SDK
        raise NotImplementedError

    async def _google_complete(self, config, prompt, model, **kwargs):
        # Implementation using the google SDK
        raise NotImplementedError

    async def _local_complete(self, config, prompt, model, **kwargs):
        # Implementation using local Ollama/LM Studio
        raise NotImplementedError


# Custom exceptions
class ProviderError(Exception):
    """Base error for provider failures"""
    pass


class RateLimitError(ProviderError):
    """Raised when a provider reports rate limiting"""
    def __init__(self, message, reset_time=0):
        super().__init__(message)
        self.reset_time = reset_time


class FallbackExhaustedError(Exception):
    """Raised when every provider in the chain has failed"""
    def __init__(self, message, attempted=None, last_error=None):
        super().__init__(message)
        self.attempted = attempted or []
        self.last_error = last_error

Usage Examples

Basic Fallback

router = MultiProviderRouter()

# Simple completion with automatic fallback
result = await router.complete_with_fallback(
    prompt="Explain the visitor pattern in Python",
    task_type=TaskType.CODING
)

print(f"Provider: {result['provider']}")
print(f"Attempts: {result['attempts']}")
print(f"Content: {result['content']}")

Cost-Optimized Routing

# Prefer cheaper providers for simple tasks
result = await router.complete_with_fallback(
    prompt="Summarize this text...",
    task_type=TaskType.COST_SENSITIVE,
    prefer_cost=True
)
# Will prefer Google Gemini or local models
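The per-request cost behind this routing follows directly from the `cost_per_1k_input`/`cost_per_1k_output` fields in `ProviderConfig`. A minimal standalone sketch of that arithmetic (the helper name is illustrative, not part of the router's API):

```python
# Hypothetical helper mirroring the per-1K-token pricing fields in
# ProviderConfig; not part of the router's public API.

def estimate_cost(input_tokens: int, output_tokens: int,
                  cost_per_1k_input: float, cost_per_1k_output: float) -> float:
    """Dollar cost of one request at the given per-1K-token rates."""
    return (input_tokens / 1000) * cost_per_1k_input \
         + (output_tokens / 1000) * cost_per_1k_output

# 2K input / 1K output at Anthropic's listed rates ($0.003 / $0.015 per 1K):
print(round(estimate_cost(2000, 1000, 0.003, 0.015), 4))  # 0.021
```

At these rates the same request on GPT-4 Turbo ($0.01 / $0.03) would cost $0.05, which is why cost-sensitive routing favors Gemini or local models.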

Capability-Based Selection

# Select based on task requirements
decision = router.select_provider(
    task_type=TaskType.LONG_CONTEXT,
    required_context_length=100_000
)
# Will select providers with large context windows (Anthropic, Google)

Health Monitoring

# Get current provider health
health = router.get_health_report()

for provider, status in health.items():
    print(f"{provider}: {status['status']} (latency: {status['latency_ms']}ms)")
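Rate-limit awareness (the `rate_limit_rpm` field above) can be enforced client-side with a sliding 60-second window of request timestamps. A self-contained sketch of that idea, separate from the router class and illustrative only:

```python
from collections import deque

# Sketch of client-side RPM tracking: refuse to dispatch a request once
# the provider's rate_limit_rpm would be exceeded in the last 60 seconds.

class RpmTracker:
    def __init__(self, rate_limit_rpm: int):
        self.rate_limit_rpm = rate_limit_rpm
        self.timestamps = deque()  # send times within the current window

    def allow(self, now: float) -> bool:
        """Record a request at `now` if under the limit; else refuse it."""
        # Drop timestamps that have aged out of the 60-second window
        while self.timestamps and now - self.timestamps[0] >= 60:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.rate_limit_rpm:
            return False
        self.timestamps.append(now)
        return True

tracker = RpmTracker(rate_limit_rpm=3)
print([tracker.allow(t) for t in (0, 1, 2, 3)])  # [True, True, True, False]
print(tracker.allow(61))                         # True (window slid forward)
```

A router could consult such a tracker before dispatching, and route to a fallback provider instead of waiting for a 429.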

Integration with CODITECT

With Adaptive Retry

from skills.adaptive_retry import AdaptiveRetryParameters
import asyncio


async def robust_llm_call(prompt: str) -> Dict:
    """Combine adaptive retry with multi-provider fallback"""
    router = MultiProviderRouter()
    retry_params = AdaptiveRetryParameters()

    for retry_count in range(3):
        try:
            # Get adjusted parameters for this retry
            params = retry_params.get_params(retry_count)

            result = await router.complete_with_fallback(
                prompt=prompt,
                max_tokens=params["max_tokens"],
                temperature=params["temperature"]
            )
            return result

        except FallbackExhaustedError:
            if retry_count == 2:
                raise
            # Wait before retrying all providers
            await asyncio.sleep(retry_params.get_backoff(retry_count))

With Orchestrator

# In orchestrator configuration
llm_routing:
  skill: multi-provider-llm-fallback
  default_task_type: coding
  cost_threshold: 0.10     # Max cost per request
  timeout_threshold: 60    # Seconds before fallback

Configuration

| Parameter | Default | Description |
|---|---|---|
| max_retries | 3 | Maximum fallback attempts |
| failure_threshold | 3 | Failures before marking unavailable |
| rate_limit_buffer | 0.9 | Use 90% of rate limit |
| health_check_interval | 60 | Seconds between health checks |
| cost_tracking | true | Track and report costs |
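The `failure_threshold` parameter drives the health escalation seen in `_update_health`: zero failures is healthy, anything below the threshold is degraded, and at or beyond it the provider is marked unavailable. A tiny standalone sketch of that mapping (names are illustrative, not the router's API):

```python
from enum import Enum

class Status(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    UNAVAILABLE = "unavailable"

def escalate(failure_count: int, failure_threshold: int = 3) -> Status:
    """Map a consecutive-failure count to a health status."""
    if failure_count == 0:
        return Status.HEALTHY
    if failure_count < failure_threshold:
        return Status.DEGRADED
    return Status.UNAVAILABLE

print([escalate(n).value for n in range(4)])
# ['healthy', 'degraded', 'degraded', 'unavailable']
```

A successful request resets the count to zero, so one recovery is enough to restore a degraded provider to the candidate pool.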

Success Metrics

| Metric | Target |
|---|---|
| Request success rate | 99.5% |
| Average fallback attempts | <1.5 |
| Cost optimization | 20-40% savings |
| Latency overhead | <100ms |

Success Output

When this skill is successfully applied, output:

✅ SKILL COMPLETE: multi-provider-llm-fallback

Completed:
- [x] MultiProviderRouter configured with 4 providers (Anthropic, OpenAI, Google, Local)
- [x] Provider health monitoring active (status, latency, failure tracking)
- [x] Capability-based routing implemented (task type matching)
- [x] Automatic fallback tested (3 providers tried, success on 2nd attempt)
- [x] Cost optimization configured (prefer cheaper for cost-sensitive tasks)

Outputs:
- MultiProviderRouter class with intelligent routing
- ProviderConfig for all providers (API keys, models, costs, rate limits)
- Health monitoring dashboard (status, latency, failure counts)
- Fallback chain documentation (primary → fallback order)

Metrics:
- Request success rate: 99.5%
- Average fallback attempts: 1.2
- Cost savings: 32% (vs. always using most expensive)
- Average latency overhead: 65ms

Completion Checklist

Before marking this skill as complete, verify:

  • At least 2 providers configured with valid API keys
  • ProviderConfig includes models, costs, rate limits, strengths
  • Health monitoring tracks status, failures, latency for each provider
  • Routing logic selects provider based on task type, cost, speed preferences
  • Automatic fallback tries alternative providers on failure
  • Rate limit detection prevents hitting provider limits
  • Provider degradation detected after 3 failures
  • Health report shows current status of all providers
  • Integration with adaptive retry (if applicable)
  • Cost tracking and reporting implemented

Failure Indicators

This skill has FAILED if:

  • ❌ All providers unavailable (FallbackExhaustedError)
  • ❌ Primary provider always selected despite degradation
  • ❌ Rate limits exceeded causing 429 errors
  • ❌ No fallback attempted when primary fails
  • ❌ Health status not updated on failures
  • ❌ Cost optimization disabled or not working
  • ❌ Provider selection ignores task type capabilities
  • ❌ Latency overhead >500ms (routing too slow)

When NOT to Use

Do NOT use this skill when:

  • Single provider sufficient and reliable - complexity not justified
  • Fixed provider required by policy/compliance - no alternatives allowed
  • Testing/development with no availability requirements - single provider adequate
  • Latency-critical with no time for fallback (<100ms response time) - use single fast provider
  • Budget for only one provider - can't afford multiple API subscriptions
  • Provider selection requires human judgment - automated routing inappropriate

Use alternatives instead:

  • Single reliable provider → Direct API calls with retry
  • Compliance constraints → Use approved provider only
  • Development → Mock LLM responses
  • Latency-critical → Pre-computed responses or cache

Anti-Patterns (Avoid)

| Anti-Pattern | Problem | Solution |
|---|---|---|
| No health monitoring | Continues sending to failed provider | Track failures, mark unavailable after 3 |
| Always using most expensive | Unnecessary costs | Route cost-sensitive tasks to cheaper providers |
| No rate limit awareness | Hits 429 errors, wastes time | Track rate limits, pause provider when limited |
| Fallback to identical provider | No redundancy benefit | Ensure fallback chain has different providers |
| Ignoring task requirements | Wrong model for task (e.g., small context) | Match task type to provider strengths |
| No cost tracking | Budget overruns undetected | Log and aggregate costs per provider |
| Synchronous fallback only | High latency on failures | Consider parallel requests with circuit breaker |
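The last anti-pattern above suggests racing providers in parallel rather than falling back strictly in sequence. A hedged asyncio sketch of that idea, using dummy coroutines in place of real provider calls (this is not the `MultiProviderRouter` above, just the racing pattern in isolation):

```python
import asyncio

# Sketch of "parallel fallback": dispatch to several providers at once,
# take the first result that completes, cancel the rest.

async def fast_provider():
    await asyncio.sleep(0.01)   # stands in for a quick provider call
    return "fast-result"

async def slow_provider():
    await asyncio.sleep(1.0)    # stands in for a slow provider call
    return "slow-result"

async def race(*providers):
    tasks = [asyncio.create_task(p()) for p in providers]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:        # cancel the losers to avoid wasted work
        task.cancel()
    return done.pop().result()

print(asyncio.run(race(fast_provider, slow_provider)))  # fast-result
```

The trade-off is cost: racing pays for every dispatched request, so it fits latency-critical paths, not cost-sensitive batch work.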

Principles

This skill embodies:

  • #2 First Principles Thinking - Understand provider capabilities before routing
  • #3 Keep It Simple (KISS) - Start with 2 providers, add more only if needed
  • #4 Separation of Concerns - Router separate from provider-specific implementations
  • #7 Automation - Automatic health monitoring and failover
  • #8 No Assumptions - Verify provider availability, don't assume success
  • #11 Resilience - Graceful degradation with fallback chains

Full Standard: CODITECT-STANDARD-AUTOMATION.md

Provider Comparison Quick Reference

| Provider | Best For | Cost ($/1K out) | Context | Latency | Reliability |
|---|---|---|---|---|---|
| Anthropic Claude | Coding, Reasoning | $0.015 | 200K | Medium | High |
| OpenAI GPT-4 | General, Creative | $0.03 | 128K | Fast | High |
| Google Gemini | Long Context, Cost | $0.0015 | 1M | Medium | Medium |
| Azure OpenAI | Enterprise, Compliance | $0.03 | 128K | Fast | Very High |
| Local (Ollama) | Privacy, Cost | $0.00 | 32K | Variable | Depends |

Provider Selection Decision Tree:

What's the primary requirement?

├── Coding/Technical → Anthropic Claude (primary)
│   └── Fallback: OpenAI GPT-4
├── Cost Optimization → Google Gemini (primary)
│   └── Fallback: Local Ollama
├── Enterprise Compliance → Azure OpenAI (primary)
│   └── Fallback: Anthropic Claude
├── Long Context (>128K) → Google Gemini (primary)
│   └── Fallback: Anthropic Claude
└── Maximum Reliability → Multi-provider with fallback chain
    Anthropic → OpenAI → Google → Local

Recommended Fallback Chains:

| Task Type | Primary | Fallback 1 | Fallback 2 |
|---|---|---|---|
| Code Generation | Anthropic | OpenAI | Local |
| Creative Writing | OpenAI | Anthropic | Google |
| Long Document Analysis | Google | Anthropic | OpenAI |
| Enterprise/Compliance | Azure | Anthropic | Google |
| Cost-Sensitive Batch | Google | Local | Anthropic |
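These recommended chains can be kept as plain data and filtered against the current health report before dispatch. A standalone sketch (the dictionary mirrors the table above; the function name and keys are illustrative):

```python
# The fallback-chain table as lookup data; chain order is illustrative.
FALLBACK_CHAINS = {
    "code_generation": ["anthropic", "openai", "local"],
    "creative_writing": ["openai", "anthropic", "google"],
    "long_document_analysis": ["google", "anthropic", "openai"],
    "enterprise_compliance": ["azure", "anthropic", "google"],
    "cost_sensitive_batch": ["google", "local", "anthropic"],
}

def chain_for(task: str, unavailable: frozenset = frozenset()) -> list:
    """Return the fallback chain for a task, skipping unavailable providers."""
    return [p for p in FALLBACK_CHAINS[task] if p not in unavailable]

print(chain_for("code_generation", unavailable=frozenset({"anthropic"})))
# ['openai', 'local']
```

Keeping the chains as data rather than code makes them easy to override per deployment, e.g. from the orchestrator configuration shown earlier.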

Source Reference

Pattern extracted from DeepCode multi-agent system.

See /submodules/labs/DeepCode/DEEP-ANALYSIS.md for complete analysis.