ADR-004: Scaling Model for Agent Selection
Status
Proposed
Context
CODITECT currently selects agents and orchestration patterns using heuristic-based logic:
- MoE classifier routes to "best" agent based on skill similarity scores
- Multi-agent invocations use fixed counts (e.g., always 3 agents for peer review)
- No quantitative feedback on whether adding more agents improves or degrades performance
- Coordination overhead not measured or considered in selection
Brainqub3 Agent Labs provides a paper-aligned scaling model (arXiv:2512.08296) that predicts:
Performance Scaling:
P_hat = clip(beta0 + sum(beta_i*z_i) + sum(beta_ij*z_i*z_j), 0, 1)
Where:
- P_hat = predicted performance (0-1 scale)
- z_i = standardized task/architecture features
- beta_i = feature coefficients (learned from experiments)
- beta_ij = interaction terms (e.g., agent_count × task_complexity)
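Concretely, the performance equation can be evaluated in a few lines of Python. The coefficient values below are the illustrative code_analysis coefficients from the storage schema later in this ADR, not fitted values:

```python
def predict_performance(z, beta0, betas, interactions):
    """P_hat = clip(beta0 + sum(beta_i*z_i) + sum(beta_ij*z_i*z_j), 0, 1).

    z: standardized features by name; betas: per-feature coefficients;
    interactions: {(i, j): beta_ij} interaction coefficients.
    """
    p = beta0
    p += sum(betas[name] * value for name, value in z.items())
    p += sum(b * z[i] * z[j] for (i, j), b in interactions.items())
    return max(0.0, min(1.0, p))  # clip to [0, 1]

# Moderately scaled agent count on a high-complexity task:
p_hat = predict_performance(
    {"agent_count": 0.5, "complexity": 1.0},
    beta0=0.45,
    betas={"agent_count": 0.12, "complexity": 0.38},
    interactions={("agent_count", "complexity"): -0.08},
)
# 0.45 + 0.06 + 0.38 - 0.04 = approx 0.85
```

Note the negative interaction term: adding agents helps less (or hurts) as complexity rises, which is exactly the effect the heuristic router cannot express.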
Elasticity Estimation:
x_hat = clamp(x_base * (n/n0)^eta_n * (T/T0)^eta_T)
Where:
- x_hat = predicted value (performance, cost, time, etc.)
- n = agent count, T = task complexity
- eta_n, eta_T = elasticity coefficients
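A minimal sketch of the elasticity estimate, using the illustrative eta values from the storage schema later in this ADR. The clamp bounds are not specified in the equation above, so they are left as parameters here:

```python
def predict_elastic(x_base, n, T, eta_n, eta_T, n0=1, T0=1, lo=0.0, hi=float("inf")):
    """x_hat = clamp(x_base * (n/n0)^eta_n * (T/T0)^eta_T)."""
    x_hat = x_base * (n / n0) ** eta_n * (T / T0) ** eta_T
    return min(max(x_hat, lo), hi)  # clamp to [lo, hi]

# Predicted wall-clock time when scaling from 1 to 3 agents
# on a task twice the baseline complexity (baseline: 60s):
t_hat = predict_elastic(x_base=60.0, n=3, T=2.0, eta_n=0.65, eta_T=1.12)
```

An eta_n below 1 (here 0.65) encodes sublinear returns from adding agents, while an eta_T above 1 encodes superlinear cost growth with complexity.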
Coordination Metrics (5):
- Overhead %: Time spent coordinating vs productive work
- Message Density: Messages per agent per task
- Redundancy: Duplicate work across agents
- Efficiency: Output quality per unit coordination cost
- Error Amplification: How errors compound with agent count
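As a sketch, these five metrics could be derived from per-run counters like the following. The counter names and formulas are assumptions for illustration, not an Agent Labs API:

```python
from dataclasses import dataclass

@dataclass
class RunStats:
    coord_time: float       # seconds spent coordinating
    total_time: float       # total wall-clock seconds
    messages: int           # inter-agent messages sent
    agents: int
    tasks: int
    duplicate_outputs: int  # outputs judged redundant
    outputs: int
    quality: float          # 0-1 output quality score
    errors: int             # errors in final output
    seed_errors: int        # errors introduced by any single agent

def coordination_metrics(s: RunStats) -> dict:
    coord_cost = s.coord_time / s.total_time
    return {
        "overhead_pct": 100.0 * coord_cost,
        "message_density": s.messages / (s.agents * s.tasks),
        "redundancy": s.duplicate_outputs / s.outputs,
        "efficiency": s.quality / coord_cost,
        "error_amplification": s.errors / max(s.seed_errors, 1),
    }
```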
The question: Should CODITECT use Agent Labs scaling predictions to dynamically select agent count and architecture at runtime, rather than using static configurations?
Example scenarios where this matters:
- Simple task → MoE might predict coordination overhead > benefit for multi-agent
- Complex task → Model might recommend 5 agents (decentralised) over 3 (centralised)
- High-redundancy detection → Switch from Independent to Centralised to reduce waste
Decision
Use Agent Labs scaling model predictions to inform CODITECT agent selection decisions, with the following design:
1. Two-Phase Approach
Phase 1: Offline Calibration (Manual)
- Run Agent Labs experiments on representative CODITECT tasks
- Generate scaling curves for each (task_type, architecture) pair
- Store coefficients (beta, eta) in ~/.coditect-data/scaling-models/
- Update periodically (e.g., quarterly, or when new agent types are added)
Phase 2: Runtime Prediction (Automated)
- Before invoking multi-agent orchestration, query scaling model
- Predict performance for candidate configurations (1 agent, 3 agents, 5 agents; centralised vs decentralised)
- Select the configuration with the best predicted efficiency (quality / coordination_cost)
- Fall back to heuristic selection if no calibration data is available
2. Integration Points
MoE Router Enhancement:
```python
# Current: always select the single best agent
best_agent = moe_router.select(task)

# Enhanced: compare single-agent vs multi-agent predictions
sas_prediction = scaling_model.predict(task, architecture="sas", agents=1)
mas_prediction = scaling_model.predict(task, architecture="centralised", agents=3)

if mas_prediction.efficiency > sas_prediction.efficiency * 1.2:  # 20% threshold
    agents = moe_router.select_top_k(task, k=3)
else:
    agents = [best_agent]
```
Architecture Selection:
```python
# For known multi-agent tasks, compare architectures
predictions = {
    "centralised": scaling_model.predict(task, architecture="centralised", agents=3),
    "decentralised": scaling_model.predict(task, architecture="decentralised", agents=3),
    "hybrid": scaling_model.predict(task, architecture="hybrid", agents=3),
}
best_arch = max(predictions.items(), key=lambda x: x[1].efficiency)[0]
```
3. Coordination Collapse Detection
Trigger re-evaluation if coordination metrics exceed thresholds:
- overhead_pct > 40% → Reduce agent count or switch to SAS
- message_density > 10 → Switch from Decentralised to Centralised
- redundancy > 0.5 → Consolidate agents or use Independent
- error_amplification > 1.2 → Reduce agent count (errors compounding)
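A minimal sketch of this threshold check; the threshold values mirror the storage schema, and the action strings are illustrative:

```python
# Collapse thresholds and the remediation each one triggers.
THRESHOLDS = {
    "overhead_pct": 40.0,
    "message_density": 10.0,
    "redundancy": 0.5,
    "error_amplification": 1.2,
}
ACTIONS = {
    "overhead_pct": "reduce agent count or switch to SAS",
    "message_density": "switch from decentralised to centralised",
    "redundancy": "consolidate agents or use independent",
    "error_amplification": "reduce agent count (errors compounding)",
}

def check_collapse(metrics: dict) -> list:
    """Return the remediation actions triggered by the observed metrics."""
    return [ACTIONS[k] for k, limit in THRESHOLDS.items()
            if metrics.get(k, 0.0) > limit]

check_collapse({"overhead_pct": 55.0, "redundancy": 0.3})
# -> ["reduce agent count or switch to SAS"]
```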
4. Guardrails
- Always support manual override: --agents 5 --architecture hybrid ignores predictions
- Gradual rollout: Start with logging predictions only (no action); validate accuracy
- Fallback to heuristic: If scaling model unavailable or low confidence (<0.7), use existing logic
- Cost limits: Never exceed user-configured max agent count or API budget
Alternatives Considered
1. Static Configuration (Status Quo)
Pros:
- Simple, predictable
- No runtime overhead
- Proven to work
Cons:
- Suboptimal for many tasks (over/under-provision agents)
- No adaptation to task complexity
- Coordination overhead invisible
- Cannot detect coordination collapse
2. Heuristic-Based Scaling (No Model)
Pros:
- Lightweight, fast
- No calibration needed
Cons:
- Rules are brittle (e.g., "if task_tokens > 1000, use 3 agents")
- No quantitative basis
- Cannot predict efficiency vs overhead tradeoff
- Misses interaction effects (agent_count × complexity)
3. Reinforcement Learning Agent Selector
Pros:
- Could learn optimal policies end-to-end
- Adapts continuously
Cons:
- Requires large training dataset (hundreds of runs)
- Black box (no interpretability)
- High maintenance (model drift, retraining)
- Violates CODITECT principle of explainability
4. Always Use Maximum Agents (Brute Force)
Pros:
- Maximizes chance of success
Cons:
- Extremely expensive (10x API cost for 10 agents vs 1)
- High coordination overhead (diminishing returns)
- Slower (more latency)
- Violates cost optimization goals
5. User-Configured Policies (No Automation)
Pros:
- User has full control
- No surprises
Cons:
- High cognitive load (users must understand scaling dynamics)
- Requires expertise in coordination theory
- Error-prone (easy to misconfigure)
Consequences
Positive
- Data-Driven Decisions: Agent selection based on empirical evidence, not guesswork
- Cost Optimization: Avoid over-provisioning agents when coordination overhead > benefit
- Performance Improvement: Select architectures that maximize efficiency for specific task types
- Coordination Collapse Prevention: Detect and mitigate when adding agents hurts performance
- Transparent Tradeoffs: Users see predicted performance vs cost before execution
- Continuous Improvement: Scaling model improves as more calibration runs accumulate
- Adaptability: Responds to task complexity variations automatically
- Explainability: Model coefficients and predictions are interpretable (vs black-box ML)
Negative
- Calibration Overhead: Requires running experiments to generate scaling curves
- Model Accuracy Dependency: Predictions only useful if model is well-calibrated
- Runtime Latency: Querying scaling model adds <100ms to invocation time
- Complexity Increase: More moving parts in agent selection logic
- Potential Misclassification: Incorrect task type classification → wrong scaling curve
- Storage Requirements: Must persist scaling coefficients for each (task_type, architecture) pair
- Maintenance Burden: Scaling model needs periodic recalibration as agents evolve
Risks
- Model Overfitting: Scaling curves trained on a narrow task set don't generalize
  - Mitigation: Use diverse calibration tasks; validate on held-out tasks; fall back to heuristic if prediction confidence is low
- Prediction Errors Lead to Poor Performance: Model recommends 5 agents, coordination collapse occurs
  - Mitigation: Monitor actual coordination metrics; trigger re-evaluation if thresholds exceeded; allow manual override
- Calibration Cost: Running 100s of experiments is expensive in API tokens
  - Mitigation: Use small-scale tasks for calibration; apply findings to high-volume production tasks
- Stale Coefficients: CODITECT agents improve, but the scaling model is not recalibrated
  - Mitigation: Version scaling models alongside agent releases; flag "stale" warnings after 90 days
- Task Type Explosion: Every new task type requires separate calibration
  - Mitigation: Group similar tasks (e.g., "code_analysis", "document_generation"); use task embeddings for clustering
- User Confusion: Predictions contradict user intuition, eroding trust
  - Mitigation: Explain predictions ("3 agents predicted 20% faster with 15% lower cost"); log decision rationale
Implementation Notes
1. Calibration Workflow
```bash
# Define calibration task set
cat > calibration_tasks.yaml <<EOF
tasks:
  - id: code_review_simple
    type: code_analysis
    complexity: low
    description: "Review 50-line Python function"
  - id: code_review_complex
    type: code_analysis
    complexity: high
    description: "Review 500-line distributed system module"
  - id: doc_generation_api
    type: documentation
    complexity: medium
    description: "Generate API reference from OpenAPI spec"
EOF

# Run experiments (all architectures, agent counts 1-7)
/scaling-calibrate \
  --tasks calibration_tasks.yaml \
  --architectures sas,independent,centralised,decentralised,hybrid \
  --agent-counts 1,2,3,5,7 \
  --output ~/.coditect-data/scaling-models/calibration-2026-02-16/

# Train scaling model (fit coefficients)
/scaling-train \
  --experiments ~/.coditect-data/scaling-models/calibration-2026-02-16/ \
  --output ~/.coditect-data/scaling-models/coditect-v1.json

# Validate predictions
/scaling-validate \
  --model ~/.coditect-data/scaling-models/coditect-v1.json \
  --holdout-tasks validation_tasks.yaml
```
2. Runtime Prediction API
```python
# scripts/scaling-analysis/predictor.py
import json
from pathlib import Path
from typing import Dict, Tuple

class ScalingPredictor:
    def __init__(self, model_path: Path):
        with open(model_path) as f:
            self.model = json.load(f)
        self.coefficients = self.model["coefficients"]
        self.elasticities = self.model["elasticities"]

    def predict(
        self,
        task_type: str,
        architecture: str,
        agent_count: int,
    ) -> Dict[str, float]:
        """
        Returns:
            {
                "performance": 0.85,   # 0-1 scale
                "overhead_pct": 25.0,
                "efficiency": 3.4,
                "confidence": 0.82
            }
        """
        # Implementation: apply scaling law equations
        raise NotImplementedError

    def recommend_config(self, task_type: str) -> Tuple[str, int]:
        """Returns the (architecture, agent_count) with the best predicted efficiency."""
        best_config = None
        best_efficiency = 0.0
        for arch in ["sas", "independent", "centralised", "decentralised", "hybrid"]:
            for n in [1, 2, 3, 5, 7]:
                pred = self.predict(task_type, arch, n)
                if pred["efficiency"] > best_efficiency:
                    best_efficiency = pred["efficiency"]
                    best_config = (arch, n)
        return best_config
```
3. MoE Integration
```python
# Enhance the existing MoE router
from pathlib import Path
from scripts.scaling_analysis.predictor import ScalingPredictor

predictor = ScalingPredictor(
    Path("~/.coditect-data/scaling-models/coditect-v1.json").expanduser()
)

def select_agents(task: str, task_type: str = "code_analysis"):
    # Get recommendation
    recommended_arch, recommended_count = predictor.recommend_config(task_type)

    # Predict performance for the recommended config vs the SAS baseline
    mas_pred = predictor.predict(task_type, recommended_arch, recommended_count)
    sas_pred = predictor.predict(task_type, "sas", 1)

    # Log predictions
    print(f"[scaling] SAS: perf={sas_pred['performance']:.2f}")
    print(
        f"[scaling] {recommended_arch} (n={recommended_count}): "
        f"perf={mas_pred['performance']:.2f}, overhead={mas_pred['overhead_pct']:.0f}%"
    )

    # Decide based on the efficiency gain threshold
    if mas_pred["efficiency"] > sas_pred["efficiency"] * 1.2:
        # Use multi-agent
        agents = moe_router.select_top_k(task, k=recommended_count)
        return agents, recommended_arch
    # Use single agent
    return [moe_router.select(task)], "sas"
```
4. Storage Schema
Example model file at ~/.coditect-data/scaling-models/coditect-v1.json:

```json
{
  "version": "1.0",
  "created": "2026-02-16T10:30:00Z",
  "calibration_run_count": 245,
  "task_types": ["code_analysis", "documentation", "testing"],
  "coefficients": {
    "code_analysis": {
      "beta0": 0.45,
      "beta_agent_count": 0.12,
      "beta_complexity": 0.38,
      "beta_agent_count_x_complexity": -0.08
    }
  },
  "elasticities": {
    "code_analysis": {
      "eta_agent_count": 0.65,
      "eta_task_complexity": 1.12
    }
  },
  "thresholds": {
    "overhead_pct_max": 40.0,
    "message_density_max": 10.0,
    "redundancy_max": 0.5
  }
}
```
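Since the Risks section flags coefficients as stale after 90 days, the loader could check the model's created timestamp on startup. A minimal sketch (the helper name is hypothetical):

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path

STALE_AFTER = timedelta(days=90)  # recalibration window from the Risks section

def model_is_stale(model_path: Path, now=None) -> bool:
    """True if the model's 'created' timestamp is older than the window."""
    model = json.loads(model_path.read_text())
    # "Z" suffix handling for Python versions before 3.11
    created = datetime.fromisoformat(model["created"].replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return now - created > STALE_AFTER
```

A stale result would log a warning and route selection through the heuristic fallback rather than blocking execution.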
Rollout Plan
Stage 1 (Weeks 1-2): Calibration
- Run 200+ experiments on diverse CODITECT tasks
- Generate initial scaling model v1.0
- Validate predictions on held-out tasks (target: R² > 0.7)
Stage 2 (Weeks 3-4): Shadow Mode
- Integrate predictor into MoE router
- Log predictions only (no action)
- Compare predictions vs actual performance
- Tune thresholds and confidence levels
Stage 3 (Weeks 5-6): Gradual Activation
- Enable predictions for low-risk tasks (documentation, simple code analysis)
- Monitor coordination metrics
- Collect user feedback
Stage 4 (Weeks 7-8): Full Deployment
- Enable for all task types
- Continuous monitoring of prediction accuracy
- Recalibration every 90 days
References
- Paper: arXiv:2512.08296 - "Scaling Laws for Multi-Agent Systems"
- Scaling Model Equations: Section 3 "Mathematical Framework"
- Coordination Metrics: Section 4 "Coordination Dynamics"
- CODITECT MoE Router: scripts/moe_classifier/router.py
- Related ADRs:
- ADR-001: Agent Labs Adoption
- ADR-002: Integration Pattern
- ADR-003: Agent Orchestration Mapping
- ADR-005: Experiment Data Governance
Author: Claude (Sonnet 4.5) Date: 2026-02-16 Track: H (Framework) Task ID: H.0