ADR-001: Agent Labs Adoption as CODITECT Subsystem

Status

Proposed

Context

CODITECT is an AI agent platform with 776 agents, 445 skills, and 377 commands operating across 37 tracks. As the platform scales, understanding the performance characteristics of different agent orchestration patterns becomes critical for optimization and cost management.

Brainqub3 Agent Labs is an open-source framework specifically designed to benchmark and compare single-agent systems (SAS) versus multi-agent systems (MAS) using a paper-aligned scaling model from arXiv:2512.08296. The framework provides:

  • 5 orchestration patterns: SAS, Independent, Centralised, Decentralised, Hybrid
  • Mathematical scaling model: P_hat = clip(beta0 + sum_i(beta_i*z_i) + sum_{i<j}(beta_ij*z_i*z_j), 0, 1)
  • Elasticity estimation for agent count and task complexity scaling
  • 5 coordination metrics: overhead, message density, redundancy, efficiency, error amplification
  • Evaluator-first design with immutable, content-hashed runs
  • Claude Agent SDK compatibility for live agent execution
  • Custom HTML dashboard for analysis, served via Python's ThreadingHTTPServer
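The paper-aligned scaling model can be sketched in a few lines of Python. This is an illustrative implementation of the clipped linear-plus-interaction form only; the coefficient names, data structures, and call signature are assumptions, not the actual Agent Labs API.

```python
from itertools import combinations

def predict_performance(beta0, beta, beta_int, z):
    """Paper-aligned scaling prediction:
    P_hat = clip(beta0 + sum_i(beta_i*z_i) + sum_{i<j}(beta_ij*z_i*z_j), 0, 1).

    beta0    -- intercept
    beta     -- list of main-effect coefficients beta_i
    beta_int -- dict mapping index pairs (i, j) to interaction
                coefficients beta_ij (sparse: absent pairs are zero)
    z        -- list of standardized factors (e.g. agent count,
                task complexity)
    """
    p = beta0
    p += sum(b * zi for b, zi in zip(beta, z))
    p += sum(beta_int[(i, j)] * z[i] * z[j]
             for i, j in combinations(range(len(z)), 2)
             if (i, j) in beta_int)
    return min(1.0, max(0.0, p))  # clip to [0, 1]
```

Because the prediction is clipped to [0, 1], it can be read directly as an expected success probability for a given configuration.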

The question is whether to adopt Agent Labs as a subsystem for systematic MAS analysis within CODITECT, or pursue alternative approaches.

Decision

Adopt Brainqub3 Agent Labs as a git submodule-based subsystem for multi-agent scaling analysis within the CODITECT platform. The subsystem will be used to:

  1. Benchmark CODITECT agent orchestration patterns against the paper-aligned scaling model
  2. Generate empirical coordination metrics for different task types
  3. Inform runtime agent selection and architecture decisions
  4. Provide quantitative evidence for agent system optimization

Integration will follow the submodule pattern (see ADR-002) with CODITECT-specific adapters for agent mapping (see ADR-003).
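The adapter layer's core job, mapping CODITECT orchestration patterns onto the five Agent Labs architecture types, might look like the sketch below. All pattern names on the CODITECT side are hypothetical placeholders; the real mapping belongs in the ADR-003 adapter package.

```python
# The five orchestration types Agent Labs supports.
AGENT_LABS_TYPES = {"sas", "independent", "centralised", "decentralised", "hybrid"}

# Hypothetical CODITECT pattern names -> Agent Labs types.
PATTERN_MAP = {
    "single": "sas",
    "fan-out": "independent",
    "supervisor": "centralised",
    "peer-mesh": "decentralised",
    "mixed": "hybrid",
}

def map_pattern(coditect_pattern: str) -> str:
    """Map a CODITECT orchestration pattern to an Agent Labs type,
    failing loudly on unmapped patterns (the 'Potential Mismatch'
    consequence below)."""
    try:
        return PATTERN_MAP[coditect_pattern]
    except KeyError:
        raise ValueError(
            f"No Agent Labs mapping for pattern {coditect_pattern!r}; "
            "extend the adapter or exclude this pattern from scaling runs"
        )
```

Raising on unmapped patterns, rather than defaulting to a nearest type, keeps mismatches visible instead of silently skewing benchmark results.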

Alternatives Considered

1. Build Custom Scaling Analysis Framework

Pros:

  • Full control over implementation
  • Tailored specifically to CODITECT patterns
  • No external dependencies

Cons:

  • Duplicate effort (Agent Labs already implements paper model)
  • No academic validation (the paper-aligned model lends Agent Labs credibility a custom framework would lack)
  • Higher maintenance burden
  • Longer time to production

2. Use Different MAS Benchmarking Framework

Pros:

  • Might have different strengths or focus areas
  • Potentially larger community

Cons:

  • No surveyed framework offers an equivalent paper-aligned scaling model
  • Most MAS frameworks focus on domain-specific problems, not general orchestration patterns
  • Would still require similar integration work

3. Paper-Only Theoretical Analysis

Pros:

  • Zero code dependencies
  • Pure mathematical approach

Cons:

  • No empirical validation of CODITECT patterns
  • Cannot inform runtime decisions
  • Lacks reproducibility and tooling

4. Manual Benchmarking Without Framework

Pros:

  • Maximum flexibility
  • No framework constraints

Cons:

  • Ad-hoc, non-reproducible experiments
  • No standardized metrics
  • Difficult to compare across time or configurations

Consequences

Positive

  1. Academic Rigor: Paper-aligned model (arXiv:2512.08296) provides theoretical foundation for scaling analysis
  2. Empirical Evidence: Enables data-driven decisions about agent orchestration rather than intuition-based choices
  3. Standardized Metrics: 5 coordination metrics provide consistent vocabulary for discussing MAS performance
  4. Claude SDK Compatibility: Agent Labs uses the same claude-agent-sdk as CODITECT, minimizing adapter complexity
  5. Evaluator-First Design: Forces explicit success criteria before experiments, improving experiment quality
  6. Immutable Runs: Content-hashed run manifests enable reproducibility and audit trails
  7. Visualization: Built-in dashboard accelerates insight generation
  8. Cost Optimization: Scaling predictions can prevent coordination collapse and reduce wasted API calls
  9. Framework Maturity: Leverages existing, tested implementation rather than greenfield development
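Consequence 6 rests on content-hashed run manifests. The idea can be illustrated with a minimal sketch (this is the general technique, not Agent Labs' actual hashing scheme): serialize the manifest to canonical JSON so that logically identical runs always produce the same digest.

```python
import hashlib
import json

def run_content_hash(manifest: dict) -> str:
    """Content-hash a run manifest for immutability and audit trails.

    Canonical JSON (sorted keys, fixed separators) removes key-order
    and whitespace variance, so the digest depends only on content.
    """
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Any post-hoc edit to a stored run changes its digest, which is what makes the runs effectively immutable for auditing.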

Negative

  1. External Dependency: CODITECT subsystem depends on upstream Agent Labs maintenance and compatibility
  2. Learning Curve: Team must understand Agent Labs concepts, CLI, and scaling model mathematics
  3. Integration Overhead: Requires adapter layer to map CODITECT agents to Agent Labs architecture types
  4. Version Management: Need to track Agent Labs versions and handle breaking changes
  5. Storage Growth: Immutable runs accumulate storage, requiring archival strategy (see ADR-005)
  6. Potential Mismatch: Some CODITECT orchestration patterns may not map cleanly onto the 5 Agent Labs types

Risks

  1. Upstream Abandonment: If Agent Labs is no longer maintained, CODITECT would need to fork or replace

    • Mitigation: Open-source license allows forking; submodule pattern makes replacement feasible
  2. Model Accuracy: Scaling predictions depend on experiment quality and may not generalize to all CODITECT use cases

    • Mitigation: Validate predictions against production metrics; calibrate model with CODITECT-specific runs
  3. Overhead Cost: Running scaling experiments consumes API tokens and compute resources

    • Mitigation: Use small-scale calibration runs; apply findings to reduce production costs
  4. Complexity Increase: Adds another subsystem to CODITECT architecture

    • Mitigation: Clear boundaries via submodule; optional use (not in critical path)
  5. Paper Model Limitations: The scaling model is domain-general and may miss CODITECT-specific patterns

    • Mitigation: Extend model with CODITECT-specific factors if needed; use as baseline, not absolute truth
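The mitigation for risk 2 ("validate predictions against production metrics") can be as simple as tracking the mean absolute error between predicted and observed success rates. The helper below is an assumed validation workflow, not an Agent Labs feature.

```python
def prediction_mae(predicted, observed):
    """Mean absolute error between scaling-model predictions and
    observed production success rates.

    A persistently large value signals that the domain-general model
    needs recalibration with CODITECT-specific runs.
    """
    if len(predicted) != len(observed):
        raise ValueError("prediction/observation length mismatch")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)
```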

Implementation Notes

  1. Submodule Location: submodules/r-and-d/brainqub3-agent-labs/
  2. Adapter Package: scripts/scaling-analysis/ for CODITECT-specific integration
  3. Experiment Repository: ~/.coditect-data/scaling-experiments/ for run data
  4. Dashboard Access: HTML dashboard accessible via brainqub3 dashboard command
  5. Documentation: Create skills/scaling-analysis/SKILL.md for usage guidance

Author: Claude (Sonnet 4.5)
Date: 2026-02-16
Track: H (Framework)
Task ID: H.0