Brainqub3 Agent Labs - Scaling Integration Assessment

Date: 2026-02-16
Author: Claude (Opus 4.6)
Task ID: H.0 (ad-hoc analysis)
Project: PILOT (coditect-rollout-master)


1. Executive Summary

Recommendation: Conditional Go - Adopt brainqub3/agent-labs as an offline architecture validation tool within the CODITECT control plane. Do NOT replace CODITECT's orchestration engine.

| Dimension | Score | Notes |
|---|---|---|
| Strategic Alignment | 70% | 4 MAS patterns map to CODITECT's 5 workflow patterns |
| Technical Quality | High | Clean Python, evaluator-first, immutable runs, paper-aligned model |
| Integration Effort | Low | MIT license, ~2,500 LOC, CLI-driven, file-based storage |
| Risk Level | Low | Read-only submodule, no production dependencies |
| Gap Severity | Medium | No multi-tenancy, Claude-only, no compliance hooks |

2. Repository Overview

| Property | Value |
|---|---|
| Repository | brainqub3/agent-labs |
| Paper | arXiv:2512.08296 - "Towards a Science of Scaling Agent Systems" |
| License | MIT |
| Language | Python 3.11+ |
| Package Manager | uv |
| LOC | ~2,500 (core) |
| Agent SDK | Claude Agent SDK (Anthropic) |
| Submodule Path | submodules/labs/agent-labs/ |
| Push Policy | NEVER push (third-party, read-only) |

3. Architecture Deep-Dive

3.1 Core Modules (Verified Against Source Code)

| Module | File | LOC | Purpose |
|---|---|---|---|
| Orchestrators | brainqub3/arena/orchestrators.py | 231 | 5 orchestration patterns (SAS, Independent, Centralised, Decentralised, Hybrid) |
| Runner | brainqub3/arena/runner.py | 1,088 | Experiment execution engine with AgentBackend |
| Paper Model | brainqub3/scaling/paper_model.py | 142 | Mixed-effects scaling model (R²=0.52) |
| Immutability | brainqub3/telemetry/immutability.py | 117 | SHA-256 per-file + Merkle tree hash for runs |
| Coordination | brainqub3/metrics/coordination.py | 31 | 4 coordination metric functions |
| Redundancy | brainqub3/metrics/redundancy.py | 80 | TF-IDF cosine similarity for work overlap |
| CLI | brainqub3/cli.py | 1,089 | Full CLI: doctor, run, dashboard, metrics, scenario |
| Dashboard | brainqub3/dashboard/webapp.py | ~500 | Custom HTML webapp (Python ThreadingHTTPServer) |

3.2 Orchestration Patterns

| Pattern | Class | Peer Exchange | Aggregation | Paper Default Overhead |
|---|---|---|---|---|
| SAS | SASOrchestrator | None | Direct output | 0% |
| Independent | IndependentOrchestrator | None | _majority_vote() (Counter.most_common) | 58% |
| Centralised | CentralisedOrchestrator | None | Orchestrator synthesis | 285% |
| Decentralised | DecentralisedOrchestrator | N refine rounds | Consensus vote | 263% |
| Hybrid | HybridOrchestrator | Assignment + peer rounds | Orchestrator synthesis | 515% |

Key implementation detail: build_orchestrator(architecture: str) factory maps string → class. OrchestratorContext dataclass provides execute_turn, record_inter_agent, record_orchestrator callbacks.
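The factory described above can be sketched as a simple registry lookup. The class names mirror the source, but the empty class bodies, the lowercase-key normalisation, and the error message are illustrative assumptions, not the actual implementation:

```python
# Hypothetical sketch of the build_orchestrator factory; class bodies
# are stand-ins, only the string -> class mapping shape is the point.

class SASOrchestrator: ...
class IndependentOrchestrator: ...
class CentralisedOrchestrator: ...
class DecentralisedOrchestrator: ...
class HybridOrchestrator: ...

_REGISTRY = {
    "sas": SASOrchestrator,
    "independent": IndependentOrchestrator,
    "centralised": CentralisedOrchestrator,
    "decentralised": DecentralisedOrchestrator,
    "hybrid": HybridOrchestrator,
}

def build_orchestrator(architecture: str):
    """Map an architecture string to an orchestrator instance (sketch)."""
    try:
        return _REGISTRY[architecture.lower()]()
    except KeyError:
        raise ValueError(f"unknown architecture: {architecture!r}")
```

A registry keyed by string keeps the CLI's `--arch` flag decoupled from the concrete classes, which is why adding a fifth pattern in Phase 2 amounts to one new class plus one registry entry.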

3.3 Scaling Model

Formula: P_hat = clip(beta0 + sum(beta_i * z_i) + sum(beta_ij * z_i * z_j), 0, 1)

Transforms (verified in paper_model.py:56-79):

  • I_centered = intelligence_index - 56.9 (I_mean_center from coefficients)
  • log1p(T) for tool count
  • log1p(n_agents) for agent count
  • log1p(overhead_pct) for overhead
  • log1p(error_amp_Ae) for error amplification

Interaction terms (9, verified in paper_model.py:105-114):

  • P_SA x log1p_n_agents, Ec x T, overhead_pct x T, Ae x T, R x n_agents
  • I x Ec, Ae x P_SA, c x I, I x log1p_T
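A minimal sketch of the clipped linear-plus-interactions form, assuming the caller supplies already-transformed features and fitted coefficients. All numeric values in the example are made up for illustration and do not come from the paper or from paper_model.py:

```python
import math

def predict_success(features: dict[str, float],
                    beta0: float,
                    betas: dict[str, float],
                    interactions: dict[tuple[str, str], float]) -> float:
    """Sketch of P_hat = clip(beta0 + sum(beta_i*z_i) + sum(beta_ij*z_i*z_j), 0, 1).

    `features` holds already-transformed z_i values (e.g. log1p(n_agents),
    mean-centred intelligence). Coefficients are caller-supplied here; the
    real model loads fitted values shipped with the paper.
    """
    p = beta0
    p += sum(betas[k] * features[k] for k in betas)                      # main effects
    p += sum(w * features[a] * features[b]                               # interactions
             for (a, b), w in interactions.items())
    return min(max(p, 0.0), 1.0)                                         # clip to [0, 1]

# Example with made-up coefficients:
z = {"log1p_n_agents": math.log1p(4), "I_centered": 60.0 - 56.9}
p = predict_success(z, beta0=0.5,
                    betas={"log1p_n_agents": 0.05, "I_centered": 0.01},
                    interactions={("I_centered", "log1p_n_agents"): -0.02})
```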

3.4 Coordination Metrics (Verified Formulas)

| Metric | Code Formula | Source |
|---|---|---|
| overhead_pct | ((turns_mas - turns_sas) / turns_sas) * 100 | coordination.py:7 |
| message_density | inter_agent_messages / (inter_agent_messages + turns_total) | coordination.py:15-16 |
| coordination_efficiency | success_rate / (turns_total / turns_sas) | coordination.py:22 |
| error_amplification | (1 - success_mas) / (1 - success_sas) | coordination.py:30 |
| redundancy_R | TF-IDF cosine similarity mean across worker outputs | redundancy.py:53-79 |

Important: Overhead is turns-based (not time-based). Message density uses the ratio of inter-agent messages to total message count (not messages/n_agents).
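The four coordination.py formulas transcribe directly into Python. This sketch reproduces them one-to-one; function names follow the metric names in the table, but the argument names and signatures are illustrative, not necessarily the exact source signatures:

```python
def overhead_pct(turns_mas: int, turns_sas: int) -> float:
    """Turns-based overhead: extra MAS turns relative to the SAS baseline."""
    return ((turns_mas - turns_sas) / turns_sas) * 100

def message_density(inter_agent_messages: int, turns_total: int) -> float:
    """Share of inter-agent messages in the total message count
    (note: NOT messages divided by n_agents)."""
    return inter_agent_messages / (inter_agent_messages + turns_total)

def coordination_efficiency(success_rate: float, turns_total: int,
                            turns_sas: int) -> float:
    """Success rate discounted by how many more turns MAS needed than SAS."""
    return success_rate / (turns_total / turns_sas)

def error_amplification(success_mas: float, success_sas: float) -> float:
    """Ratio of MAS failure rate to SAS failure rate (>1 means MAS amplifies errors)."""
    return (1 - success_mas) / (1 - success_sas)
```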

3.5 Run Immutability

  • SHA-256 per-file hashing (_sha256() in immutability.py:21-29)
  • Merkle-like tree hash from concatenated path:hash pairs
  • finalize_run() writes manifest, raises RuntimeError if already finalized
  • verify_run_manifest() checks file existence + hash match, returns (bool, list[str])

3.6 Key CLI Commands (Verified in cli.py)

uv run brainqub3 doctor                              # Environment health check
uv run brainqub3 run sas --task X --model Y          # Single-agent baseline
uv run brainqub3 run mas --task X --arch Z --model Y # Multi-agent experiment
uv run brainqub3 run elasticity --task X --arch Z    # Scaling grid sweep
uv run brainqub3 dashboard                           # Launch HTML dashboard on :8765
uv run brainqub3 task init my_task                   # Create new task scaffold
uv run brainqub3 metrics compute --run-id X          # Compute coordination metrics
uv run brainqub3 model predict --scenario X          # Run scaling prediction

3.7 Evaluator-First Enforcement

ExperimentRunner._run() (runner.py:800) calls self._run_eval_tests(task) before any experiment execution. This matches CODITECT's ground truth validation pattern.

run_mas() auto-generates SAS baseline by calling self.run_sas() first (runner.py:1059).
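The control flow above condenses to roughly the following. The class and method names follow runner.py, but the bodies are simplified stand-ins (the task shape and error handling are hypothetical), not the real implementation:

```python
# Hypothetical condensation of the evaluator-first flow described above.

class ExperimentRunner:
    def _run_eval_tests(self, task: dict) -> None:
        # Real code validates the task's ground-truth tests before any
        # agent execution; here we just gate on their presence.
        if not task.get("eval_tests"):
            raise ValueError("task has no evaluator tests; refusing to run")

    def run_sas(self, task: dict) -> dict:
        self._run_eval_tests(task)           # evaluator-first gate
        return {"architecture": "sas", "task": task["name"]}

    def run_mas(self, task: dict, architecture: str) -> dict:
        baseline = self.run_sas(task)        # SAS baseline auto-generated first
        return {"architecture": architecture, "baseline": baseline}
```

The ordering is the point: every MAS result carries its own freshly produced SAS baseline, so turns-based metrics like overhead_pct always have a like-for-like denominator.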


4. CODITECT Alignment Analysis

4.1 Strong Alignment (70%)

| CODITECT Capability | Agent Labs Equivalent | Alignment |
|---|---|---|
| 5 workflow patterns (MoE) | 4 MAS patterns + SAS baseline | Direct map for 4/5 |
| Ground truth validation | Evaluator-first with test enforcement | Exact match |
| Circuit Breaker / Token Budget | Coordination metrics (overhead, efficiency) | Input signal |
| Compliance audit trail | SHA-256 immutable run manifests | Compatible |
| 776 agent benchmarking | Arena runtime + controlled experiments | Direct use |
| Scaling predictions | Paper model with elasticity calibration | Unique value-add |

4.2 Gaps (30%)

| Gap | Impact | Mitigation |
|---|---|---|
| No multi-tenancy | Cannot share experiments across CODITECT tenants | Phase 2: adapter layer with tenant isolation |
| Claude-only SDK | No multi-model provider abstraction | Phase 2: provider interface extraction |
| No compliance hooks | FDA 21 CFR Part 11, HIPAA, SOC2 not enforced | Phase 2: wrapper with compliance middleware |
| Offline calibration only | No runtime adaptive selection | Phase 3: streaming predictions to Pattern Selector |
| 4 patterns (vs CODITECT 5+) | Missing Router/Pipeline patterns | Phase 2: custom orchestrator implementations |
| No observability | No OTEL, Prometheus integration | Phase 2: instrument with CODITECT telemetry |
| Local-first storage | Single-machine, no cloud persistence | Phase 3: GCS/S3 adapter for run storage |

5. Integration Phases

Phase 1: Immediate (0-2 weeks)

  • Use as-is for offline validation experiments
  • Author CODITECT-specific tasks (agent benchmarks, skill evaluation)
  • Read-only submodule at submodules/labs/agent-labs/
  • No production dependencies

Phase 2: Adapter Layer (4-6 weeks)

  • Build scripts/scaling-analysis/ wrapper package
  • Add multi-tenant experiment isolation
  • Extract provider interface for multi-model support
  • Add OTEL instrumentation to run lifecycle
  • Implement compliance middleware (audit events)
  • Add 5th orchestrator pattern (Router/Pipeline)
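One way the provider interface extraction could look, sketched as a structural Protocol. Nothing like ModelProvider or EchoProvider exists in agent-labs today; the names, the complete() signature, and the adapter shape are all hypothetical Phase 2 design assumptions:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class ModelProvider(Protocol):
    """Hypothetical Phase 2 provider interface (not present in agent-labs).

    Any vendor SDK wrapped behind this one method could back the runner,
    removing the current Claude-only coupling.
    """
    def complete(self, prompt: str, *, max_tokens: int = 1024) -> str: ...

class EchoProvider:
    """Toy stand-in showing the interface shape; a real adapter would
    delegate to the Claude Agent SDK (or another vendor SDK)."""
    def complete(self, prompt: str, *, max_tokens: int = 1024) -> str:
        return prompt[:max_tokens]
```

A structural Protocol keeps the adapter layer out of the read-only submodule: providers live in the scripts/scaling-analysis/ wrapper and satisfy the interface by shape alone, with no inheritance from agent-labs code.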

Phase 3: Runtime Integration (8-12 weeks, deferred)

  • Scaling model predictions feed Pattern Selector in real-time
  • Streaming elasticity calibration from production workloads
  • GCS/S3 backend for experiment data
  • Dashboard integration with CODITECT admin UI

6. Artifact Inventory

20 analysis artifacts were generated and code-verified:

| Category | Count | Location |
|---|---|---|
| Core documents | 4 | artifacts/ (executive-summary, coditect-impact, quick-start, glossary) |
| Architecture docs | 4 | artifacts/ (sdd, tdd, c4-architecture, mermaid-diagrams) |
| ADRs | 5 | artifacts/adrs/ (ADR-001 through ADR-005) |
| React dashboards | 6 | artifacts/dashboards/ (6 JSX components) |
| Index | 1 | artifacts/README.md |

Total: 20 files, ~17,500 lines
Location: analyze-new-artifacts/coditect-scaling-agent-systems/artifacts/

Discrepancies Found and Fixed

Four factual discrepancies between the generated artifacts and the actual source code were identified during the deep-dive review and corrected:

| Discrepancy | Affected Files | Fix Applied |
|---|---|---|
| "Streamlit dashboard" (actual: custom HTML ThreadingHTTPServer) | sdd, tdd, c4, glossary, mermaid, ADR-001, ADR-002, JSX | All 30+ refs corrected |
| run-experiment CLI command (actual: run sas / run mas) | mermaid-diagrams | 2 refs corrected |
| Overhead formula "coordination_time/total_time" (actual: turns-based) | mermaid-diagrams | 1 ref corrected |
| Message density "messages/n_agents" (actual: inter_agent/(inter_agent+turns)) | tdd, mermaid, c4 | 4 refs corrected |

Mermaid Diagram Validation

All 10 Mermaid diagrams rendered successfully via Mermaid Chart MCP:

| # | Diagram | Type | Status |
|---|---|---|---|
| 1 | System Context | graph TB | PASS |
| 2 | Arena Run Lifecycle | sequenceDiagram | PASS |
| 3 | Orchestration Patterns | graph LR | PASS |
| 4 | Scaling Model Data Flow | flowchart TD | PASS |
| 5 | Elasticity Estimation | flowchart TD | PASS |
| 6 | Coordination Metrics | flowchart LR | PASS |
| 7 | CODITECT Integration | graph TB | PASS |
| 8 | Data Storage | graph TD | PASS |
| 9 | Experiment Design | flowchart TD | PASS |
| 10 | Component Diagram | graph TB | PASS |

7. Risk Assessment

| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Upstream breaking changes | Medium | Low | Pin submodule to specific commit |
| Claude SDK version drift | Medium | Medium | Version constraint in adapter |
| Scaling model R²=0.52 limitations | High | Low | Use for ranking, not absolute predictions |
| License contamination | Low | Low | MIT license, permissive |
| Dependency conflicts with CODITECT | Low | Medium | Isolated venv per submodule |

8. Conclusion

Brainqub3 Agent Labs provides a rigorous, paper-aligned measurement rig that fills a gap in CODITECT's architecture validation capabilities. The 70% alignment with existing patterns, combined with low integration effort and zero production risk, makes it a clear Conditional Go for Phase 1 adoption.

The 30% gap (multi-tenancy, multi-model, compliance) is addressable through the adapter layer in Phase 2, and does not block immediate use for offline experiments.


Analysis Location: internal/analysis/agent-labs-scaling/agent-labs-scaling-integration-assessment-2026-02-16.md
Artifacts Location: analyze-new-artifacts/coditect-scaling-agent-systems/artifacts/
Submodule Location: submodules/labs/agent-labs/ (read-only, NEVER push)