Brainqub3 Agent Labs - Scaling Integration Assessment
Date: 2026-02-16 Author: Claude (Opus 4.6) Task ID: H.0 (ad-hoc analysis) Project: PILOT (coditect-rollout-master)
1. Executive Summary
Recommendation: Conditional Go - Adopt brainqub3/agent-labs as an offline architecture validation tool within the CODITECT control plane. Do NOT replace CODITECT's orchestration engine.
| Dimension | Score | Notes |
|---|---|---|
| Strategic Alignment | 70% | 4 MAS patterns map to CODITECT's 5 workflow patterns |
| Technical Quality | High | Clean Python, evaluator-first, immutable runs, paper-aligned model |
| Integration Effort | Low | MIT license, ~2,500 LOC, CLI-driven, file-based storage |
| Risk Level | Low | Read-only submodule, no production dependencies |
| Gap Severity | Medium | No multi-tenancy, Claude-only, no compliance hooks |
2. Repository Overview
| Property | Value |
|---|---|
| Repository | brainqub3/agent-labs |
| Paper | arXiv:2512.08296 - "Towards a Science of Scaling Agent Systems" |
| License | MIT |
| Language | Python 3.11+ |
| Package Manager | uv |
| LOC | ~2,500 (core) |
| Agent SDK | Claude Agent SDK (Anthropic) |
| Submodule Path | submodules/labs/agent-labs/ |
| Push Policy | NEVER push (third-party, read-only) |
3. Architecture Deep-Dive
3.1 Core Modules (Verified Against Source Code)
| Module | File | LOC | Purpose |
|---|---|---|---|
| Orchestrators | brainqub3/arena/orchestrators.py | 231 | 5 orchestration patterns (SAS, Independent, Centralised, Decentralised, Hybrid) |
| Runner | brainqub3/arena/runner.py | 1,088 | Experiment execution engine with AgentBackend |
| Paper Model | brainqub3/scaling/paper_model.py | 142 | Mixed-effects scaling model (R²=0.52) |
| Immutability | brainqub3/telemetry/immutability.py | 117 | SHA-256 per-file + Merkle tree hash for runs |
| Coordination | brainqub3/metrics/coordination.py | 31 | 4 coordination metric functions |
| Redundancy | brainqub3/metrics/redundancy.py | 80 | TF-IDF cosine similarity for work overlap |
| CLI | brainqub3/cli.py | 1,089 | Full CLI: doctor, run, dashboard, metrics, scenario |
| Dashboard | brainqub3/dashboard/webapp.py | ~500 | Custom HTML webapp (Python ThreadingHTTPServer) |
3.2 Orchestration Patterns
| Pattern | Class | Peer Exchange | Aggregation | Paper Default Overhead |
|---|---|---|---|---|
| SAS | SASOrchestrator | None | Direct output | 0% |
| Independent | IndependentOrchestrator | None | _majority_vote() (Counter.most_common) | 58% |
| Centralised | CentralisedOrchestrator | None | Orchestrator synthesis | 285% |
| Decentralised | DecentralisedOrchestrator | N refine rounds | Consensus vote | 263% |
| Hybrid | HybridOrchestrator | Assignment + peer rounds | Orchestrator synthesis | 515% |
Key implementation detail: the `build_orchestrator(architecture: str)` factory maps string → class. An `OrchestratorContext` dataclass provides `execute_turn`, `record_inter_agent`, and `record_orchestrator` callbacks.
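The factory-plus-context wiring above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the registry keys, the `...` class bodies, and the callback signatures are assumptions; only the class names and the string → class mapping come from the tables above.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class OrchestratorContext:
    # Callbacks supplied by the runner; signatures here are illustrative.
    execute_turn: Callable[[str, str], str]
    record_inter_agent: Callable[[str, str, str], None]
    record_orchestrator: Callable[[str], None]

class SASOrchestrator: ...
class IndependentOrchestrator: ...
class CentralisedOrchestrator: ...
class DecentralisedOrchestrator: ...
class HybridOrchestrator: ...

# Hypothetical registry; the real module may build this differently.
_REGISTRY = {
    "sas": SASOrchestrator,
    "independent": IndependentOrchestrator,
    "centralised": CentralisedOrchestrator,
    "decentralised": DecentralisedOrchestrator,
    "hybrid": HybridOrchestrator,
}

def build_orchestrator(architecture: str):
    """Map an architecture string to an orchestrator instance."""
    try:
        return _REGISTRY[architecture.lower()]()
    except KeyError:
        raise ValueError(f"Unknown architecture: {architecture!r}")
```

The registry pattern keeps the CLI's `--arch` flag decoupled from the concrete classes, which is why adding a fifth pattern in Phase 2 is a localized change.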
3.3 Scaling Model
Formula: P_hat = clip(beta0 + sum(beta_i * z_i) + sum(beta_ij * z_i * z_j), 0, 1)
Transforms (verified in paper_model.py:56-79):
- `I_centered = intelligence_index - 56.9` (`I_mean_center` from coefficients)
- `log1p(T)` for tool count
- `log1p(n_agents)` for agent count
- `log1p(overhead_pct)` for overhead
- `log1p(error_amp_Ae)` for error amplification
Interaction terms (9, verified in paper_model.py:105-114):
- P_SA x log1p_n_agents, Ec x T, overhead_pct x T, Ae x T, R x n_agents
- I x Ec, Ae x P_SA, c x I, I x log1p_T
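Putting the formula and transforms together, a prediction sketch might look like this. The coefficient dictionary keys and feature names are illustrative assumptions; only the clipped linear-plus-interactions form, the mean-centering constant, and the `log1p` transforms come from the section above.

```python
import math

def predict_success(features: dict, coef: dict, interactions: dict) -> float:
    """Sketch of P_hat = clip(beta0 + sum(beta_i*z_i) + sum(beta_ij*z_i*z_j), 0, 1).

    `coef` maps transformed-feature names to betas; `interactions` maps
    (name_a, name_b) pairs to interaction betas. Names are hypothetical.
    """
    z = {
        "I": features["intelligence_index"] - coef.get("I_mean_center", 56.9),
        "log1p_T": math.log1p(features["tool_count"]),
        "log1p_n_agents": math.log1p(features["n_agents"]),
        "log1p_overhead": math.log1p(features["overhead_pct"]),
        "log1p_Ae": math.log1p(features["error_amp_Ae"]),
    }
    score = coef.get("beta0", 0.0)
    for name, value in z.items():
        score += coef.get(name, 0.0) * value
    for (a, b), beta in interactions.items():
        score += beta * z[a] * z[b]
    return min(1.0, max(0.0, score))  # clip to [0, 1]
```

Because R² is only 0.52, outputs of a model like this are best read as rankings across architectures, not absolute success probabilities (see the risk table in section 7).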
3.4 Coordination Metrics (Verified Formulas)
| Metric | Code Formula | Source |
|---|---|---|
| overhead_pct | ((turns_mas - turns_sas) / turns_sas) * 100 | coordination.py:7 |
| message_density | inter_agent_messages / (inter_agent_messages + turns_total) | coordination.py:15-16 |
| coordination_efficiency | success_rate / (turns_total / turns_sas) | coordination.py:22 |
| error_amplification | (1 - success_mas) / (1 - success_sas) | coordination.py:30 |
| redundancy_R | TF-IDF cosine similarity mean across worker outputs | redundancy.py:53-79 |
Important: Overhead is turns-based, not time-based. Message density is the ratio of inter-agent messages to the sum of inter-agent messages and total turns, not messages/n_agents.
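The table's formulas translate directly into code. This sketch mirrors the formulas verbatim; function and parameter names follow the table, and division-by-zero guards present in the real module are elided for brevity.

```python
def overhead_pct(turns_mas: int, turns_sas: int) -> float:
    """Turns-based coordination overhead, as a percentage over the SAS baseline."""
    return (turns_mas - turns_sas) / turns_sas * 100

def message_density(inter_agent_messages: int, turns_total: int) -> float:
    """Share of inter-agent messages among all message events."""
    return inter_agent_messages / (inter_agent_messages + turns_total)

def coordination_efficiency(success_rate: float, turns_total: int,
                            turns_sas: int) -> float:
    """Success per unit of turn inflation relative to the SAS baseline."""
    return success_rate / (turns_total / turns_sas)

def error_amplification(success_mas: float, success_sas: float) -> float:
    """Ae > 1 means the MAS fails more often than its SAS baseline."""
    return (1 - success_mas) / (1 - success_sas)
```

For example, a MAS that takes 12 turns against an 8-turn SAS baseline has 50% overhead, regardless of wall-clock time.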
3.5 Run Immutability
- SHA-256 per-file hashing (`_sha256()` in `immutability.py:21-29`)
- Merkle-like tree hash from concatenated `path:hash` pairs
- `finalize_run()` writes manifest, raises `RuntimeError` if already finalized
- `verify_run_manifest()` checks file existence + hash match, returns `(bool, list[str])`
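The hashing scheme can be illustrated with a short sketch. This is not the module's actual code: chunk size, sort order, and the join format of the `path:hash` pairs are assumptions; only the SHA-256-per-file plus concatenated-pairs design comes from the bullets above.

```python
import hashlib
from pathlib import Path

def _sha256(path: Path) -> str:
    """Stream a file through SHA-256 in fixed-size chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def run_tree_hash(run_dir: Path) -> str:
    """Merkle-like digest: hash the sorted 'path:hash' pairs for every file.

    Any change to any file changes its pair, and therefore the run digest.
    """
    pairs = [
        f"{p.relative_to(run_dir)}:{_sha256(p)}"
        for p in sorted(run_dir.rglob("*"))
        if p.is_file()
    ]
    return hashlib.sha256("\n".join(pairs).encode()).hexdigest()
```

A verifier only needs to recompute the tree hash and compare it to the finalized manifest to detect tampering anywhere in the run directory.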
3.6 Key CLI Commands (Verified in cli.py)
```shell
uv run brainqub3 doctor                               # Environment health check
uv run brainqub3 run sas --task X --model Y           # Single-agent baseline
uv run brainqub3 run mas --task X --arch Z --model Y  # Multi-agent experiment
uv run brainqub3 run elasticity --task X --arch Z     # Scaling grid sweep
uv run brainqub3 dashboard                            # Launch HTML dashboard on :8765
uv run brainqub3 task init my_task                    # Create new task scaffold
uv run brainqub3 metrics compute --run-id X           # Compute coordination metrics
uv run brainqub3 model predict --scenario X           # Run scaling prediction
```
3.7 Evaluator-First Enforcement
`ExperimentRunner._run()` (runner.py:800) calls `self._run_eval_tests(task)` before any experiment execution. This matches CODITECT's ground truth validation pattern.
`run_mas()` auto-generates a SAS baseline by calling `self.run_sas()` first (runner.py:1059).
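The two ordering guarantees above can be captured in a minimal skeleton. The class and method names follow the text; everything inside the bodies (the `eval_tests_pass` attribute, return values) is a stub invented for illustration.

```python
class ExperimentRunner:
    """Illustrative skeleton of the evaluator-first ordering."""

    def _run_eval_tests(self, task) -> None:
        # The real runner executes the task's ground-truth tests and aborts
        # on failure; this stub just checks a hypothetical flag.
        if not getattr(task, "eval_tests_pass", True):
            raise RuntimeError("Evaluator tests failed; refusing to run")

    def run_sas(self, task) -> dict:
        self._run_eval_tests(task)       # tests gate every run
        return {"architecture": "sas", "turns": 1}

    def run_mas(self, task, architecture: str) -> dict:
        baseline = self.run_sas(task)    # SAS baseline auto-generated first
        return {"architecture": architecture, "baseline": baseline}
```

The key property is that no experiment, single- or multi-agent, can start without the evaluator passing, and no MAS result exists without its SAS baseline attached.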
4. CODITECT Alignment Analysis
4.1 Strong Alignment (70%)
| CODITECT Capability | Agent Labs Equivalent | Alignment |
|---|---|---|
| 5 workflow patterns (MoE) | 4 MAS patterns + SAS baseline | Direct map for 4/5 |
| Ground truth validation | Evaluator-first with test enforcement | Exact match |
| Circuit Breaker / Token Budget | Coordination metrics (overhead, efficiency) | Input signal |
| Compliance audit trail | SHA-256 immutable run manifests | Compatible |
| 776 agent benchmarking | Arena runtime + controlled experiments | Direct use |
| Scaling predictions | Paper model with elasticity calibration | Unique value-add |
4.2 Gaps (30%)
| Gap | Impact | Mitigation |
|---|---|---|
| No multi-tenancy | Cannot share experiments across CODITECT tenants | Phase 2: adapter layer with tenant isolation |
| Claude-only SDK | No multi-model provider abstraction | Phase 2: provider interface extraction |
| No compliance hooks | FDA 21 CFR Part 11, HIPAA, SOC2 not enforced | Phase 2: wrapper with compliance middleware |
| Offline calibration only | No runtime adaptive selection | Phase 3: streaming predictions to Pattern Selector |
| 4 patterns (vs CODITECT 5+) | Missing Router/Pipeline patterns | Phase 2: custom orchestrator implementations |
| No observability | No OTEL, Prometheus integration | Phase 2: instrument with CODITECT telemetry |
| Local-first storage | Single-machine, no cloud persistence | Phase 3: GCS/S3 adapter for run storage |
5. Integration Phases
Phase 1: Immediate (0-2 weeks)
- Use as-is for offline validation experiments
- Author CODITECT-specific tasks (agent benchmarks, skill evaluation)
- Read-only submodule at `submodules/labs/agent-labs/`
- No production dependencies
Phase 2: Adapter Layer (4-6 weeks)
- Build `scripts/scaling-analysis/` wrapper package
- Add multi-tenant experiment isolation
- Extract provider interface for multi-model support
- Add OTEL instrumentation to run lifecycle
- Implement compliance middleware (audit events)
- Add 5th orchestrator pattern (Router/Pipeline)
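One way the "provider interface extraction" item could look is a structural protocol that the existing Claude SDK backend satisfies. This is a hypothetical Phase 2 design sketch, not existing code: `ModelProvider`, `ClaudeProvider`, and all signatures are invented here for illustration.

```python
from typing import Protocol

class ModelProvider(Protocol):
    """Hypothetical provider interface the Phase 2 adapter could extract,
    making the Claude Agent SDK one implementation among several."""
    name: str
    def complete(self, prompt: str, *, max_tokens: int) -> str: ...

class ClaudeProvider:
    """Stub standing in for a wrapper around the Claude Agent SDK."""
    name = "claude"
    def complete(self, prompt: str, *, max_tokens: int) -> str:
        # Would delegate to the SDK; stubbed for illustration.
        return f"[{self.name}:{max_tokens}] {prompt}"

def run_with_provider(provider: ModelProvider, prompt: str) -> str:
    """Experiment code depends only on the protocol, never on a vendor SDK."""
    return provider.complete(prompt, max_tokens=256)
```

A structural `Protocol` keeps the upstream submodule untouched: the adapter layer owns the interface, and upstream's Claude backend is wrapped rather than modified, preserving the read-only push policy.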
Phase 3: Runtime Integration (8-12 weeks, deferred)
- Scaling model predictions feed Pattern Selector in real-time
- Streaming elasticity calibration from production workloads
- GCS/S3 backend for experiment data
- Dashboard integration with CODITECT admin UI
6. Artifact Inventory
20 analysis artifacts were generated and code-verified:
| Category | Count | Location |
|---|---|---|
| Core documents | 4 | artifacts/ (executive-summary, coditect-impact, quick-start, glossary) |
| Architecture docs | 4 | artifacts/ (sdd, tdd, c4-architecture, mermaid-diagrams) |
| ADRs | 5 | artifacts/adrs/ (ADR-001 through ADR-005) |
| React dashboards | 6 | artifacts/dashboards/ (6 JSX components) |
| Index | 1 | artifacts/README.md |
Total: 20 files, ~17,500 lines
Location: analyze-new-artifacts/coditect-scaling-agent-systems/artifacts/
Discrepancies Found and Fixed
4 factual discrepancies between generated artifacts and actual source code were identified during deep-dive review and corrected:
| Discrepancy | Affected Files | Fix Applied |
|---|---|---|
| "Streamlit dashboard" (actual: custom HTML ThreadingHTTPServer) | sdd, tdd, c4, glossary, mermaid, ADR-001, ADR-002, JSX | All 30+ refs corrected |
| `run-experiment` CLI command (actual: `run sas`/`run mas`) | mermaid-diagrams | 2 refs corrected |
| Overhead formula "coordination_time/total_time" (actual: turns-based) | mermaid-diagrams | 1 ref corrected |
| Message density "messages/n_agents" (actual: inter_agent/(inter_agent+turns)) | tdd, mermaid, c4 | 4 refs corrected |
Mermaid Diagram Validation
All 10 Mermaid diagrams rendered successfully via Mermaid Chart MCP:
| # | Diagram | Type | Status |
|---|---|---|---|
| 1 | System Context | graph TB | PASS |
| 2 | Arena Run Lifecycle | sequenceDiagram | PASS |
| 3 | Orchestration Patterns | graph LR | PASS |
| 4 | Scaling Model Data Flow | flowchart TD | PASS |
| 5 | Elasticity Estimation | flowchart TD | PASS |
| 6 | Coordination Metrics | flowchart LR | PASS |
| 7 | CODITECT Integration | graph TB | PASS |
| 8 | Data Storage | graph TD | PASS |
| 9 | Experiment Design | flowchart TD | PASS |
| 10 | Component Diagram | graph TB | PASS |
7. Risk Assessment
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Upstream breaking changes | Medium | Low | Pin submodule to specific commit |
| Claude SDK version drift | Medium | Medium | Version constraint in adapter |
| Scaling model R²=0.52 limitations | High | Low | Use for ranking, not absolute predictions |
| License contamination | Low | Low | MIT license, permissive |
| Dependency conflicts with CODITECT | Low | Medium | Isolated venv per submodule |
8. Conclusion
Brainqub3 Agent Labs provides a rigorous, paper-aligned measurement rig that fills a gap in CODITECT's architecture validation capabilities. The 70% alignment with existing patterns, combined with low integration effort and zero production risk, makes it a clear Conditional Go for Phase 1 adoption.
The 30% gap (multi-tenancy, multi-model, compliance) is addressable through the adapter layer in Phase 2, and does not block immediate use for offline experiments.
Analysis Location: internal/analysis/agent-labs-scaling/agent-labs-scaling-integration-assessment-2026-02-16.md
Artifacts Location: analyze-new-artifacts/coditect-scaling-agent-systems/artifacts/
Submodule Location: submodules/labs/agent-labs/ (read-only, NEVER push)