ADR-001: Agent Labs Adoption as CODITECT Subsystem

Status

Proposed

Context

CODITECT is an AI agent platform with 776 agents, 445 skills, and 377 commands operating across 37 tracks. As the platform scales, understanding the performance characteristics of different agent orchestration patterns becomes critical for optimization and cost management.

Brainqub3 Agent Labs is an open-source framework specifically designed to benchmark and compare single-agent systems (SAS) versus multi-agent systems (MAS) using a paper-aligned scaling model from arXiv:2512.08296. The framework provides:

  • 5 orchestration patterns: SAS, Independent, Centralised, Decentralised, Hybrid
  • Mathematical scaling model: P_hat = clip(beta0 + sum_i(beta_i*z_i) + sum_{i<j}(beta_ij*z_i*z_j), 0, 1)
  • Elasticity estimation for agent count and task complexity scaling
  • 5 coordination metrics: overhead, message density, redundancy, efficiency, error amplification
  • Evaluator-first design with immutable, content-hashed runs
  • Claude Agent SDK compatibility for live agent execution
  • Custom HTML dashboard for analysis, served via Python's ThreadingHTTPServer
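The paper-aligned scaling model can be sketched in a few lines of Python. This is an illustrative implementation of the clipped linear-plus-interaction form only; the coefficient names, data structures, and call signature are assumptions, not the actual Agent Labs API.

```python
from itertools import combinations

def predict_performance(beta0, beta, beta_int, z):
    """Paper-aligned scaling prediction:
    P_hat = clip(beta0 + sum_i(beta_i*z_i) + sum_{i<j}(beta_ij*z_i*z_j), 0, 1).

    beta0    -- intercept
    beta     -- list of main-effect coefficients beta_i
    beta_int -- dict mapping index pairs (i, j) to interaction
                coefficients beta_ij (sparse: absent pairs are zero)
    z        -- list of standardized factors (e.g. agent count,
                task complexity)
    """
    p = beta0
    p += sum(b * zi for b, zi in zip(beta, z))
    p += sum(beta_int[(i, j)] * z[i] * z[j]
             for i, j in combinations(range(len(z)), 2)
             if (i, j) in beta_int)
    return min(1.0, max(0.0, p))  # clip to [0, 1]
```

Because the prediction is clipped to [0, 1], it can be read directly as an expected success probability for a given configuration.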

The question is whether to adopt Agent Labs as a subsystem for systematic MAS analysis within CODITECT, or pursue alternative approaches.

Decision

Adopt Brainqub3 Agent Labs as a git submodule-based subsystem for multi-agent scaling analysis within the CODITECT platform. The subsystem will be used to:

  1. Benchmark CODITECT agent orchestration patterns against the paper-aligned scaling model
  2. Generate empirical coordination metrics for different task types
  3. Inform runtime agent selection and architecture decisions
  4. Provide quantitative evidence for agent system optimization

Integration will follow the submodule pattern (see ADR-002) with CODITECT-specific adapters for agent mapping (see ADR-003).
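The adapter layer's core job, mapping CODITECT orchestration patterns onto the five Agent Labs architecture types, might look like the sketch below. All pattern names on the CODITECT side are hypothetical placeholders; the real mapping belongs in the ADR-003 adapter package.

```python
# The five orchestration types Agent Labs supports.
AGENT_LABS_TYPES = {"sas", "independent", "centralised", "decentralised", "hybrid"}

# Hypothetical CODITECT pattern names -> Agent Labs types.
PATTERN_MAP = {
    "single": "sas",
    "fan-out": "independent",
    "supervisor": "centralised",
    "peer-mesh": "decentralised",
    "mixed": "hybrid",
}

def map_pattern(coditect_pattern: str) -> str:
    """Map a CODITECT orchestration pattern to an Agent Labs type,
    failing loudly on unmapped patterns (the 'Potential Mismatch'
    consequence below)."""
    try:
        return PATTERN_MAP[coditect_pattern]
    except KeyError:
        raise ValueError(
            f"No Agent Labs mapping for pattern {coditect_pattern!r}; "
            "extend the adapter or exclude this pattern from scaling runs"
        )
```

Raising on unmapped patterns, rather than defaulting to a nearest type, keeps mismatches visible instead of silently skewing benchmark results.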

Alternatives Considered

1. Build Custom Scaling Analysis Framework

Pros:

  • Full control over implementation
  • Tailored specifically to CODITECT patterns
  • No external dependencies

Cons:

  • Duplicate effort (Agent Labs already implements paper model)
  • No academic validation (the paper-aligned model lends Agent Labs credibility a custom framework would lack)
  • Higher maintenance burden
  • Longer time to production

2. Use Different MAS Benchmarking Framework

Pros:

  • Might have different strengths or focus areas
  • Potentially larger community

Cons:

  • No surveyed framework offers an equivalent paper-aligned scaling model
  • Most MAS frameworks focus on domain-specific problems, not general orchestration patterns
  • Would still require similar integration work

3. Paper-Only Theoretical Analysis

Pros:

  • Zero code dependencies
  • Pure mathematical approach

Cons:

  • No empirical validation of CODITECT patterns
  • Cannot inform runtime decisions
  • Lacks reproducibility and tooling

4. Manual Benchmarking Without Framework

Pros:

  • Maximum flexibility
  • No framework constraints

Cons:

  • Ad-hoc, non-reproducible experiments
  • No standardized metrics
  • Difficult to compare across time or configurations

Consequences

Positive

  1. Academic Rigor: Paper-aligned model (arXiv:2512.08296) provides theoretical foundation for scaling analysis
  2. Empirical Evidence: Enables data-driven decisions about agent orchestration rather than intuition-based choices
  3. Standardized Metrics: 5 coordination metrics provide consistent vocabulary for discussing MAS performance
  4. Claude SDK Compatibility: Agent Labs uses the same claude-agent-sdk as CODITECT, minimizing adapter complexity
  5. Evaluator-First Design: Forces explicit success criteria before experiments, improving experiment quality
  6. Immutable Runs: Content-hashed run manifests enable reproducibility and audit trails
  7. Visualization: Built-in dashboard accelerates insight generation
  8. Cost Optimization: Scaling predictions can prevent coordination collapse and reduce wasted API calls
  9. Framework Maturity: Leverages existing, tested implementation rather than greenfield development
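Consequence 6 rests on content-hashed run manifests. The idea can be illustrated with a minimal sketch (this is the general technique, not Agent Labs' actual hashing scheme): serialize the manifest to canonical JSON so that logically identical runs always produce the same digest.

```python
import hashlib
import json

def run_content_hash(manifest: dict) -> str:
    """Content-hash a run manifest for immutability and audit trails.

    Canonical JSON (sorted keys, fixed separators) removes key-order
    and whitespace variance, so the digest depends only on content.
    """
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Any post-hoc edit to a stored run changes its digest, which is what makes the runs effectively immutable for auditing.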

Negative

  1. External Dependency: CODITECT subsystem depends on upstream Agent Labs maintenance and compatibility
  2. Learning Curve: Team must understand Agent Labs concepts, CLI, and scaling model mathematics
  3. Integration Overhead: Requires adapter layer to map CODITECT agents to Agent Labs architecture types
  4. Version Management: Need to track Agent Labs versions and handle breaking changes
  5. Storage Growth: Immutable runs accumulate storage, requiring archival strategy (see ADR-005)
  6. Potential Mismatch: Some CODITECT orchestration patterns may not map cleanly onto the 5 Agent Labs types

Risks

  1. Upstream Abandonment: If Agent Labs is no longer maintained, CODITECT would need to fork or replace

    • Mitigation: Open-source license allows forking; submodule pattern makes replacement feasible
  2. Model Accuracy: Scaling predictions depend on experiment quality and may not generalize to all CODITECT use cases

    • Mitigation: Validate predictions against production metrics; calibrate model with CODITECT-specific runs
  3. Overhead Cost: Running scaling experiments consumes API tokens and compute resources

    • Mitigation: Use small-scale calibration runs; apply findings to reduce production costs
  4. Complexity Increase: Adds another subsystem to CODITECT architecture

    • Mitigation: Clear boundaries via submodule; optional use (not in critical path)
  5. Paper Model Limitations: The scaling model is domain-general and may miss CODITECT-specific patterns

    • Mitigation: Extend model with CODITECT-specific factors if needed; use as baseline, not absolute truth
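The mitigation for risk 2 ("validate predictions against production metrics") can be as simple as tracking the mean absolute error between predicted and observed success rates. The helper below is an assumed validation workflow, not an Agent Labs feature.

```python
def prediction_mae(predicted, observed):
    """Mean absolute error between scaling-model predictions and
    observed production success rates.

    A persistently large value signals that the domain-general model
    needs recalibration with CODITECT-specific runs.
    """
    if len(predicted) != len(observed):
        raise ValueError("prediction/observation length mismatch")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)
```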

Implementation Notes

  1. Submodule Location: submodules/r-and-d/brainqub3-agent-labs/
  2. Adapter Package: scripts/scaling-analysis/ for CODITECT-specific integration
  3. Experiment Repository: ~/.coditect-data/scaling-experiments/ for run data
  4. Dashboard Access: HTML dashboard accessible via brainqub3 dashboard command
  5. Documentation: Create skills/scaling-analysis/SKILL.md for usage guidance

Author: Claude (Sonnet 4.5)
Date: 2026-02-16
Track: H (Framework)
Task ID: H.0