ADR-012: Mixture of Experts (MoE) Analysis Framework

Document: ADR-012-moe-analysis-framework
Version: 1.0.0
Purpose: Document architectural decisions for multi-agent research analysis with certainty scoring
Audience: Framework contributors, developers, AI agents
Date Created: 2025-12-19
Status: APPROVED
Depends On:
- ADR-011-uncertainty-quantification-framework
Related ADRs:
- ADR-010-autonomous-orchestration-system
- ADR-013-moe-judges-framework
Related Components:
- commands/moe-analyze.md
- agents/uncertainty-orchestrator.md
- skills/uncertainty-quantification/SKILL.md
Research Foundation:
- docs/09-research-analysis/ACADEMIC-RESEARCH-REFERENCES-UQ-MOE-2024-2025.md

Context and Problem Statement

The Research Quality Problem

Current multi-agent research workflows suffer from:

  1. Implicit Certainty - Findings presented without confidence indicators
  2. Evidence Opacity - Sources not distinguished by reliability or recency
  3. Hidden Gaps - Missing information not explicitly documented
  4. Overconfident Assertions - Claims made without acknowledging uncertainty
  5. No Inference Transparency - Speculative conclusions lack reasoning traces

Research Foundation

This ADR is supported by peer-reviewed research from 2024-2025:

| Research | Venue | Contribution | Certainty |
|---|---|---|---|
| Semantic Density | NeurIPS 2024 | Best AUROC in 26/28 cases | 96% |
| Self-Consistency | ICLR 2022 | +17.9% GSM8K improvement | 97% |
| Mixture-of-Agents | arXiv 2024 | 65.1% AlpacaEval win rate | 90% |
| UoT Framework | NeurIPS 2024 | 38.1% task completion improvement | 93% |
| Chain-of-Verification | ACL 2024 | 23% F1 improvement | 92% |

Full citations: See docs/09-research-analysis/ACADEMIC-RESEARCH-REFERENCES-UQ-MOE-2024-2025.md

Decision Drivers

  1. Factual Accuracy - CODITECT must never report unverified information as fact
  2. Explicit Uncertainty - When certainty is lacking, this must be communicated
  3. Research Validation - Claims must be traceable to evidence sources
  4. Logical Transparency - Inferred conclusions must show reasoning chains
  5. Ensemble Wisdom - Multiple perspectives reduce individual agent bias

Considered Options

Option A: Single-Agent Research with Confidence Prompting

  • Single agent with uncertainty-aware prompting
  • Rejected: Single-source bias, no cross-validation, limited perspectives

Option B: Multi-Agent Research without Certainty Scoring

  • Multiple agents aggregate findings informally
  • Rejected: No explicit certainty, inconsistent quality, hidden disagreements

Option C: MoE Analysis Framework with Certainty Scoring (Selected)

  • Multiple specialized agents with structured certainty outputs
  • Composite certainty scoring from weighted factors
  • Evidence validation protocol with source classification
  • Logical inference chains for speculative claims
  • Selected: Addresses all decision drivers with research backing

Option D: External Fact-Checking Service Integration

  • Third-party verification API
  • Rejected: Latency, cost, availability dependencies

Decision

Implement Option C: MoE Analysis Framework with the following architecture:

1. Multi-Agent Analyst Panel

Agent Composition (4-6 agents per analysis):

analysts = [
    {"type": "web-search-researcher", "focus": "External evidence gathering"},
    {"type": "thoughts-analyzer", "focus": "Internal documentation analysis"},
    {"type": "codebase-analyzer", "focus": "Implementation verification"},
    {"type": "qa-reviewer", "focus": "Analysis quality validation"},
    # Optional domain-specific agents
]

Research Basis: Mixture-of-Agents (arXiv 2024) demonstrates that an ensemble of agents outperforms a single, stronger model (65.1% vs 57.5% on AlpacaEval).

2. Certainty Scoring System

Composite Formula (from ADR-011):

certainty_score = (
    evidence_support * 0.40 +      # Quality of supporting sources
    source_reliability * 0.25 +    # Credibility of sources
    internal_consistency * 0.20 +  # Agent agreement level
    recency * 0.15                 # Information freshness
)

Research Basis: Semantic Density (NeurIPS 2024) achieves best AUROC in 26/28 test cases using multi-factor confidence analysis.

Certainty Levels:

| Score | Level | Required Action |
|---|---|---|
| 85-100% | HIGH | Report with confidence |
| 60-84% | MEDIUM | Note limitations, provide sources |
| 30-59% | LOW | Explicitly state uncertainty |
| 0-29% | INFERRED | Require logical inference chain |
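The composite formula and level mapping above can be sketched together in a few lines; the function names here are illustrative, not part of the framework API:

```python
def certainty_score(evidence_support, source_reliability,
                    internal_consistency, recency):
    """Composite certainty (0-100) from the four weighted factors (ADR-011)."""
    return (evidence_support * 0.40
            + source_reliability * 0.25
            + internal_consistency * 0.20
            + recency * 0.15)

def certainty_level(score):
    """Map a composite score to its required-action level."""
    if score >= 85:
        return "HIGH"
    if score >= 60:
        return "MEDIUM"
    if score >= 30:
        return "LOW"
    return "INFERRED"

score = certainty_score(95, 90, 85, 80)  # factors on a 0-100 scale -> 89.5
level = certainty_level(score)           # -> "HIGH"
```

Note that the weights sum to 1.0, so a score stays on the same 0-100 scale as its input factors.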

3. Evidence Validation Protocol

Source Classification:

| Source Type | Base Reliability | Recency Penalty |
|---|---|---|
| Peer-reviewed | 95% | -30% if >5 years |
| Government | 90% | -15% if >3 years |
| Academic institution | 85% | -15% if >3 years |
| Industry leader | 80% | -5% if >1 year |
| Reputable news | 70% | -10% if >1 year |
| Industry blog | 60% | -15% if >1 year |
| Unknown/no source | 20% | N/A |

Research Basis: RAGAS framework (industry standard) achieves 95% human agreement on faithfulness validation.
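A minimal sketch of the classification table as code; the dictionary layout and function name are illustrative assumptions, while the numbers come from the table above:

```python
from datetime import date

# source type -> (base_reliability, recency_penalty, age_threshold_years)
SOURCE_CLASSES = {
    "peer_reviewed":        (0.95, 0.30, 5),
    "government":           (0.90, 0.15, 3),
    "academic_institution": (0.85, 0.15, 3),
    "industry_leader":      (0.80, 0.05, 1),
    "reputable_news":       (0.70, 0.10, 1),
    "industry_blog":        (0.60, 0.15, 1),
    "unknown":              (0.20, 0.00, None),  # no recency penalty applies
}

def source_reliability(source_type, published_year, current_year=None):
    """Base reliability, minus the recency penalty once the source's age
    exceeds its class threshold."""
    base, penalty, threshold = SOURCE_CLASSES[source_type]
    current_year = current_year or date.today().year
    if threshold is not None and current_year - published_year > threshold:
        return max(base - penalty, 0.0)
    return base
```

For example, a peer-reviewed paper from 2018 evaluated in 2025 is 7 years old, so it drops from 0.95 to 0.65.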

Required Evidence Format:

{
  "claim": "Specific statement being made",
  "certainty_factor": 0.85,
  "certainty_basis": "evidence_backed",
  "evidence": [
    {
      "url": "https://example.com/source",
      "title": "Source Title",
      "venue": "Publication Venue",
      "year": 2024,
      "evidence_strength": "strong",
      "summary": "How source supports claim"
    }
  ],
  "missing_information": ["What would increase certainty"]
}
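A validator for this format can be sketched as follows; the required key sets are taken from the schema above, but the function itself is an illustrative assumption:

```python
REQUIRED_CLAIM_KEYS = {"claim", "certainty_factor", "certainty_basis",
                       "evidence", "missing_information"}
REQUIRED_EVIDENCE_KEYS = {"url", "title", "venue", "year",
                          "evidence_strength", "summary"}

def validate_claim(claim):
    """Return a list of schema problems; an empty list means the claim conforms."""
    problems = [f"missing key: {k}"
                for k in sorted(REQUIRED_CLAIM_KEYS - claim.keys())]
    for i, ev in enumerate(claim.get("evidence", [])):
        problems += [f"evidence[{i}] missing key: {k}"
                     for k in sorted(REQUIRED_EVIDENCE_KEYS - ev.keys())]
    if not claim.get("evidence"):
        # ties into the Logical Inference Protocol below
        problems.append("no evidence: mark INFERRED and attach a reasoning chain")
    return problems
```

An evidence-free claim is not rejected outright; it is routed to the inference protocol instead.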

4. Logical Inference Protocol

When evidence is insufficient, generate explicit reasoning:

## Inferred Conclusion: [Statement]

**Inference Type:** Deduction | Induction | Abduction
**Certainty:** [X%] (INFERRED)

### Reasoning Chain
1. **Premise 1:** [Statement]
   - Evidence: [Source or "Assumed based on..."]
   - Certainty: [X%]

2. **Premise 2:** [Statement]
   - Evidence: [Source or "Domain practice"]
   - Certainty: [X%]

3. **Therefore:** [Conclusion]

### Assumptions
- [List assumptions that, if false, invalidate conclusion]

### Falsification Criteria
- [Evidence that would disprove this inference]

Research Basis: Uncertainty of Thoughts (NeurIPS 2024) demonstrates 38.1% improvement through explicit uncertainty modeling.
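The template above can be carried as a structured record; these classes and the weakest-premise rule are hypothetical illustrations that mirror the template's fields, not an existing framework API:

```python
from dataclasses import dataclass

@dataclass
class Premise:
    statement: str
    evidence: str     # citation, or "Assumed based on ..."
    certainty: float  # 0-100

@dataclass
class InferenceChain:
    conclusion: str
    inference_type: str  # "deduction" | "induction" | "abduction"
    premises: list       # list of Premise
    assumptions: list    # if any is false, the conclusion is invalid
    falsification: list  # evidence that would disprove the inference

    @property
    def certainty(self) -> float:
        """Conservative rule: a chain is no more certain than its weakest premise."""
        return min(p.certainty for p in self.premises)
```

Keeping premises, assumptions, and falsification criteria as explicit fields lets downstream tooling audit an INFERRED finding mechanically.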

5. Quality Gates

| Condition | Action |
|---|---|
| Claim without evidence | Mark INFERRED, require reasoning chain |
| Source >2 years old | Flag recency concern, reduce reliability |
| Single-source claim | Mark LOW certainty minimum |
| Contradictory sources | Document conflict, require reconciliation |
| Agent disagreement >1.5σ | Investigate and document dissent |

Research Basis: Chain-of-Verification (ACL 2024) shows 23% F1 improvement with verification requirements.
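The gate conditions above can be sketched as a single check; the thresholds come from the table, while the function name and flag strings are illustrative:

```python
import statistics

def quality_gate_flags(finding, agent_scores, current_year):
    """Apply the gate conditions to one finding; agent_scores are the
    per-agent certainty estimates (0-100) for the claim."""
    flags = []
    evidence = finding.get("evidence", [])
    if not evidence:
        flags.append("no evidence: mark INFERRED, require reasoning chain")
    if len(evidence) == 1:
        flags.append("single source: mark LOW certainty minimum")
    if any(current_year - src["year"] > 2 for src in evidence):
        flags.append("source >2 years old: flag recency, reduce reliability")
    if len(agent_scores) > 1:
        mu = statistics.mean(agent_scores)
        sigma = statistics.stdev(agent_scores)
        # an agent more than 1.5 sample standard deviations from the mean
        if sigma and any(abs(s - mu) > 1.5 * sigma for s in agent_scores):
            flags.append("agent disagreement >1.5 sigma: document dissent")
    return flags
```

A finding may trigger several gates at once; each flag maps to one row of the table.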

Architecture

Workflow Phases

┌─────────────────────────────────────────────────────────────┐
│ Phase 1: Dispatch │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Analyst │ │ Analyst │ │ Analyst │ │ Analyst │ │
│ │ 1 │ │ 2 │ │ 3 │ │ 4 │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ │ │ │ │ │
│ └────────────┴─────┬──────┴────────────┘ │
│ ▼ │
├─────────────────────────────────────────────────────────────┤
│ Phase 2: Aggregation │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Cross-validate claims, identify contradictions, │ │
│ │ calculate composite certainty, document gaps │ │
│ └─────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
├─────────────────────────────────────────────────────────────┤
│ Phase 3: Conditional Research │
│ IF low_certainty_claims > threshold: │
│ Dispatch targeted follow-up research │
│ ELSE: │
│ Proceed to synthesis │
│ │ │
│ ▼ │
├─────────────────────────────────────────────────────────────┤
│ Phase 4: Synthesis │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Generate findings report with: │ │
│ │ - Certainty scores per finding │ │
│ │ - Evidence citations │ │
│ │ - Logical inference chains (where needed) │ │
│ │ - Explicit gaps and recommendations │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
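Under stub analysts, the four phases above can be sketched as follows; every name here (dispatch, aggregate, run_moe_analysis) is a hypothetical illustration of the flow, not the framework's actual API:

```python
def dispatch(analysts, query):
    """Phase 1: each analyst returns (statement, certainty 0-100) pairs."""
    return [analyst(query) for analyst in analysts]

def aggregate(reports):
    """Phase 2: cross-validate by averaging per-statement certainty
    across analysts (contradiction handling omitted in this sketch)."""
    scores = {}
    for report in reports:
        for statement, certainty in report:
            scores.setdefault(statement, []).append(certainty)
    return {s: sum(cs) / len(cs) for s, cs in scores.items()}

def run_moe_analysis(query, analysts, low_threshold=0.5):
    findings = aggregate(dispatch(analysts, query))
    # Phase 3: conditional follow-up when too many findings score LOW (<60)
    low = [s for s, c in findings.items() if c < 60]
    needs_followup = len(low) / len(findings) > low_threshold
    # Phase 4: synthesis (here, just the scored findings plus the gap list)
    return {"findings": findings,
            "follow_up_needed": needs_followup,
            "gaps": low if needs_followup else []}
```

For example, two analysts scoring statement "X" at 90 and 80 yield a composite of 85; a statement below 60 is queued for targeted follow-up research.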

Output Schema

{
  "analysis_id": "moe-[timestamp]",
  "query": "Original research question",
  "findings": [
    {
      "id": "F1",
      "statement": "Finding statement",
      "certainty": {
        "score": 85,
        "level": "HIGH",
        "factors": {
          "evidence_support": 90,
          "source_reliability": 85,
          "internal_consistency": 80,
          "recency": 75
        }
      },
      "evidence": [...],
      "inference_chain": null
    },
    {
      "id": "F2",
      "statement": "Inferred finding",
      "certainty": {
        "score": 25,
        "level": "INFERRED"
      },
      "evidence": [],
      "inference_chain": {
        "type": "deduction",
        "premises": [...],
        "assumptions": [...],
        "falsification": [...]
      }
    }
  ],
  "gaps": ["List of missing information"],
  "recommendations": ["Suggested follow-up research"],
  "metadata": {
    "agents_used": 4,
    "sources_validated": 12,
    "inference_chains_generated": 2,
    "overall_certainty": 72
  }
}

Consequences

Positive

  • Reduced Overconfidence - Explicit certainty prevents unwarranted assertions
  • Evidence Traceability - All claims linked to sources
  • Transparent Reasoning - Inference chains expose logic
  • Gap Documentation - Missing information explicitly captured
  • Quality Improvement - Gaps enable targeted follow-up research

Negative

  • Increased Token Usage - ~1000-2000 tokens per analysis for certainty metadata
  • Latency - Multi-agent coordination adds processing time
  • Complexity - More sophisticated output parsing required
  • Training - Users must understand certainty levels

Neutral

  • Shifts from implicit to explicit uncertainty expression
  • Changes agent prompting requirements

Implementation

Phase 1: Core Components (Week 1-2)

  • Create moe-analyze command specification
  • Create uncertainty-orchestrator agent
  • Create uncertainty-quantification skill
  • Implement certainty scoring functions
  • Define evidence validation schema

Phase 2: Integration (Week 3-4)

  • Integrate with existing analyst agents
  • Add certainty requirements to agent prompts
  • Implement quality gate checks
  • Create output formatting

Phase 3: Validation (Week 5-6)

  • Test against known-answer datasets
  • Calibrate certainty thresholds
  • Gather user feedback
  • Document edge cases

Validation Criteria

| Metric | Target | Measurement |
|---|---|---|
| Overconfidence Rate | <10% | Claims with HIGH certainty that are incorrect |
| Evidence Coverage | >90% | Claims with valid source citations |
| Gap Documentation | 100% | Analyses that document missing information |
| Inference Transparency | 100% | INFERRED claims with reasoning chains |
| User Certainty Understanding | >80% | Survey comprehension of certainty levels |

References

Primary Research (Tier 1: 95%+ Certainty)

  1. Semantic Density - Qiu & Miikkulainen, NeurIPS 2024

  2. Self-Consistency (CoT-SC) - Wang et al., ICLR 2022

  3. Uncertainty of Thoughts - Hu et al., NeurIPS 2024

  4. Chain-of-Verification - Meta AI, ACL 2024

Secondary Research (Tier 2: 85-94% Certainty)

  1. Mixture-of-Agents - Together AI, arXiv 2024

  2. RAGAS Framework - Explodinggradients

CODITECT Components

  • commands/moe-analyze.md - Command specification
  • agents/uncertainty-orchestrator.md - Orchestration agent
  • skills/uncertainty-quantification/SKILL.md - Certainty calculation patterns
  • docs/09-research-analysis/ACADEMIC-RESEARCH-REFERENCES-UQ-MOE-2024-2025.md - Full research catalog

Document Version: 1.0.0 Last Updated: 2025-12-19 Author: CODITECT Research Team Status: APPROVED