ADR-012: Mixture of Experts (MoE) Analysis Framework
Document: ADR-012-moe-analysis-framework
Version: 1.0.0
Purpose: Document architectural decisions for multi-agent research analysis with certainty scoring
Audience: Framework contributors, developers, AI agents
Date Created: 2025-12-19
Status: APPROVED
Depends On:
- ADR-011-uncertainty-quantification-framework
Related ADRs:
- ADR-010-autonomous-orchestration-system
- ADR-013-moe-judges-framework
Related Components:
- commands/moe-analyze.md
- agents/uncertainty-orchestrator.md
- skills/uncertainty-quantification/SKILL.md
Research Foundation:
- docs/09-research-analysis/ACADEMIC-RESEARCH-REFERENCES-UQ-MOE-2024-2025.md
Context and Problem Statement
The Research Quality Problem
Current multi-agent research workflows suffer from:
- Implicit Certainty - Findings presented without confidence indicators
- Evidence Opacity - Sources not distinguished by reliability or recency
- Hidden Gaps - Missing information not explicitly documented
- Overconfident Assertions - Claims made without acknowledging uncertainty
- No Inference Transparency - Speculative conclusions lack reasoning traces
Research Foundation
This ADR is supported by peer-reviewed research from 2024-2025:
| Research | Venue | Contribution | Certainty |
|---|---|---|---|
| Semantic Density | NeurIPS 2024 | Best AUROC 26/28 cases | 96% |
| Self-Consistency | ICLR 2023 | +17.9% GSM8K improvement | 97% |
| Mixture-of-Agents | arXiv 2024 | 65.1% AlpacaEval win rate | 90% |
| UoT Framework | NeurIPS 2024 | 38.1% task completion improvement | 93% |
| Chain-of-Verification | ACL 2024 | 23% F1 improvement | 92% |
Full citations: See docs/09-research-analysis/ACADEMIC-RESEARCH-REFERENCES-UQ-MOE-2024-2025.md
Decision Drivers
- Factual Accuracy - CODITECT must never report unverified information as fact
- Explicit Uncertainty - When certainty is lacking, this must be communicated
- Research Validation - Claims must be traceable to evidence sources
- Logical Transparency - Inferred conclusions must show reasoning chains
- Ensemble Wisdom - Multiple perspectives reduce individual agent bias
Considered Options
Option A: Single-Agent Research with Confidence Prompting
- Single agent with uncertainty-aware prompting
- Rejected: Single-source bias, no cross-validation, limited perspectives
Option B: Multi-Agent Research without Certainty Scoring
- Multiple agents aggregate findings informally
- Rejected: No explicit certainty, inconsistent quality, hidden disagreements
Option C: MoE Analysis Framework with Certainty Scoring (Selected)
- Multiple specialized agents with structured certainty outputs
- Composite certainty scoring from weighted factors
- Evidence validation protocol with source classification
- Logical inference chains for speculative claims
- Selected: Addresses all decision drivers with research backing
Option D: External Fact-Checking Service Integration
- Third-party verification API
- Rejected: Latency, cost, availability dependencies
Decision
Implement Option C: MoE Analysis Framework with the following architecture:
1. Multi-Agent Analyst Panel
Agent Composition (4-6 agents per analysis):
analysts = [
{"type": "web-search-researcher", "focus": "External evidence gathering"},
{"type": "thoughts-analyzer", "focus": "Internal documentation analysis"},
{"type": "codebase-analyzer", "focus": "Implementation verification"},
{"type": "qa-reviewer", "focus": "Analysis quality validation"},
# Optional domain-specific agents
]
Research Basis: Mixture-of-Agents (arXiv 2024) demonstrates ensemble wisdom outperforms single powerful models (65.1% vs 57.5% AlpacaEval).
2. Certainty Scoring System
Composite Formula (from ADR-011):
certainty_score = (
evidence_support * 0.40 + # Quality of supporting sources
source_reliability * 0.25 + # Credibility of sources
internal_consistency * 0.20 + # Agent agreement level
recency * 0.15 # Information freshness
)
Research Basis: Semantic Density (NeurIPS 2024) achieves best AUROC in 26/28 test cases using multi-factor confidence analysis.
Certainty Levels:
| Score | Level | Required Action |
|---|---|---|
| 85-100% | HIGH | Report with confidence |
| 60-84% | MEDIUM | Note limitations, provide sources |
| 30-59% | LOW | Explicitly state uncertainty |
| 0-29% | INFERRED | Require logical inference chain |
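The composite formula and level thresholds above can be sketched in Python. This is a minimal illustration of the scoring logic; the function names are illustrative, not part of the framework's actual API.

```python
def certainty_score(evidence_support, source_reliability,
                    internal_consistency, recency):
    """Weighted composite of the four certainty factors (each 0-100),
    using the ADR-011 weights: 0.40 / 0.25 / 0.20 / 0.15."""
    return (evidence_support * 0.40
            + source_reliability * 0.25
            + internal_consistency * 0.20
            + recency * 0.15)

def certainty_level(score):
    """Map a 0-100 composite score to the four certainty levels."""
    if score >= 85:
        return "HIGH"
    if score >= 60:
        return "MEDIUM"
    if score >= 30:
        return "LOW"
    return "INFERRED"

score = certainty_score(95, 90, 85, 80)
print(score, certainty_level(score))  # → 89.5 HIGH
```

Note that because evidence support carries a 0.40 weight, a claim with weak evidence cannot reach HIGH even with perfect scores on the other three factors.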
3. Evidence Validation Protocol
Source Classification:
| Source Type | Base Reliability | Recency Penalty |
|---|---|---|
| Peer-reviewed | 95% | -30% if >5 years |
| Government | 90% | -15% if >3 years |
| Academic institution | 85% | -15% if >3 years |
| Industry leader | 80% | -5% if >1 year |
| Reputable news | 70% | -10% if >1 year |
| Industry blog | 60% | -15% if >1 year |
| Unknown/no source | 20% | N/A |
Research Basis: RAGAS framework (industry standard) achieves 95% human agreement on faithfulness validation.
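The classification table can be encoded as a lookup of (base reliability, penalty, age threshold in years). Treating the recency penalty as a flat percentage-point deduction once the threshold is crossed is an assumption; the type keys and function name are illustrative.

```python
# (base reliability, recency penalty, age threshold in years)
SOURCE_TYPES = {
    "peer_reviewed":   (95, 30, 5),
    "government":      (90, 15, 3),
    "academic":        (85, 15, 3),
    "industry_leader": (80,  5, 1),
    "reputable_news":  (70, 10, 1),
    "industry_blog":   (60, 15, 1),
    "unknown":         (20,  0, None),  # no recency penalty applies
}

def source_reliability(source_type, age_years):
    """Base reliability, minus the recency penalty once the source
    is older than its threshold."""
    base, penalty, threshold = SOURCE_TYPES[source_type]
    if threshold is not None and age_years > threshold:
        return base - penalty
    return base

print(source_reliability("peer_reviewed", 6))  # → 65
```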
Required Evidence Format:
{
"claim": "Specific statement being made",
"certainty_factor": 0.85,
"certainty_basis": "evidence_backed",
"evidence": [
{
"url": "https://example.com/source",
"title": "Source Title",
"venue": "Publication Venue",
"year": 2024,
"evidence_strength": "strong",
"summary": "How source supports claim"
}
],
"missing_information": ["What would increase certainty"]
}
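A structural check of this format could look like the sketch below. The key sets mirror the JSON schema above; the validator itself (name and return shape) is an illustration, not part of the framework.

```python
REQUIRED_CLAIM_KEYS = {"claim", "certainty_factor", "certainty_basis",
                       "evidence", "missing_information"}
REQUIRED_EVIDENCE_KEYS = {"url", "title", "venue", "year",
                          "evidence_strength", "summary"}

def validate_claim(claim):
    """Return (ok, detail) after checking required keys at both levels."""
    missing = REQUIRED_CLAIM_KEYS - claim.keys()
    if missing:
        return False, f"claim missing keys: {sorted(missing)}"
    for i, ev in enumerate(claim["evidence"]):
        ev_missing = REQUIRED_EVIDENCE_KEYS - ev.keys()
        if ev_missing:
            return False, f"evidence[{i}] missing keys: {sorted(ev_missing)}"
    return True, "ok"
```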
4. Logical Inference Protocol
When evidence is insufficient, generate explicit reasoning:
## Inferred Conclusion: [Statement]
**Inference Type:** Deduction | Induction | Abduction
**Certainty:** [X%] (INFERRED)
### Reasoning Chain
1. **Premise 1:** [Statement]
- Evidence: [Source or "Assumed based on..."]
- Certainty: [X%]
2. **Premise 2:** [Statement]
- Evidence: [Source or "Domain practice"]
- Certainty: [X%]
3. **Therefore:** [Conclusion]
### Assumptions
- [List assumptions that, if false, invalidate conclusion]
### Falsification Criteria
- [Evidence that would disprove this inference]
Research Basis: Uncertainty of Thoughts (NeurIPS 2024) demonstrates 38.1% improvement through explicit uncertainty modeling.
5. Quality Gates
| Condition | Action |
|---|---|
| Claim without evidence | Mark INFERRED, require reasoning chain |
| Source >2 years old | Flag recency concern, reduce reliability |
| Single-source claim | Mark LOW certainty minimum |
| Contradictory sources | Document conflict, require reconciliation |
| Agent disagreement >1.5σ | Investigate and document dissent |
Research Basis: Chain-of-Verification (ACL 2024) shows 23% F1 improvement with verification requirements.
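The gate table can be sketched as a per-finding check. Interpreting ">1.5σ" as any agent's score deviating from the panel mean by more than 1.5 population standard deviations is an assumption, as is the finding structure; the contradictory-source gate is omitted because it requires claim-level semantics.

```python
import statistics

def quality_gate_flags(finding, agent_scores, current_year=2025):
    """Apply the quality-gate table to one finding; return triggered flags."""
    flags = []
    evidence = finding.get("evidence", [])
    if not evidence:
        flags.append("No evidence: mark INFERRED, require reasoning chain")
    elif len(evidence) == 1:
        flags.append("Single-source claim: mark LOW certainty minimum")
    for src in evidence:
        if current_year - src["year"] > 2:
            flags.append(f"Recency concern: {src['title']}")
    if len(agent_scores) >= 2:
        mean = statistics.mean(agent_scores)
        sigma = statistics.pstdev(agent_scores)
        if sigma > 0 and any(abs(s - mean) > 1.5 * sigma for s in agent_scores):
            flags.append("Agent disagreement >1.5 sigma: document dissent")
    return flags
```

For example, a finding with no evidence scored [80, 82, 81, 30] by the panel would trigger both the INFERRED gate and the disagreement gate.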
Architecture
Workflow Phases
┌─────────────────────────────────────────────────────────────┐
│ Phase 1: Dispatch │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Analyst │ │ Analyst │ │ Analyst │ │ Analyst │ │
│ │ 1 │ │ 2 │ │ 3 │ │ 4 │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ │ │ │ │ │
│ └────────────┴─────┬──────┴────────────┘ │
│ ▼ │
├─────────────────────────────────────────────────────────────┤
│ Phase 2: Aggregation │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Cross-validate claims, identify contradictions, │ │
│ │ calculate composite certainty, document gaps │ │
│ └─────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
├─────────────────────────────────────────────────────────────┤
│ Phase 3: Conditional Research │
│ IF low_certainty_claims > threshold: │
│ Dispatch targeted follow-up research │
│ ELSE: │
│ Proceed to synthesis │
│ │ │
│ ▼ │
├─────────────────────────────────────────────────────────────┤
│ Phase 4: Synthesis │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Generate findings report with: │ │
│ │ - Certainty scores per finding │ │
│ │ - Evidence citations │ │
│ │ - Logical inference chains (where needed) │ │
│ │ - Explicit gaps and recommendations │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
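The Phase 3 gate in the diagram can be sketched as a simple predicate over the aggregated findings. The threshold of two low-certainty findings is illustrative; the ADR does not fix a value.

```python
def needs_followup_research(findings, max_low_certainty=2):
    """Return True when enough findings fall below MEDIUM (60%) certainty
    to justify dispatching targeted follow-up research."""
    low = [f for f in findings if f["certainty"]["score"] < 60]
    return len(low) > max_low_certainty
```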
Output Schema
{
"analysis_id": "moe-[timestamp]",
"query": "Original research question",
"findings": [
{
"id": "F1",
"statement": "Finding statement",
"certainty": {
"score": 85,
"level": "HIGH",
"factors": {
"evidence_support": 90,
"source_reliability": 85,
"internal_consistency": 80,
"recency": 75
}
},
"evidence": [...],
"inference_chain": null
},
{
"id": "F2",
"statement": "Inferred finding",
"certainty": {
"score": 25,
"level": "INFERRED"
},
"evidence": [],
"inference_chain": {
"type": "deduction",
"premises": [...],
"assumptions": [...],
"falsification": [...]
}
}
],
"gaps": ["List of missing information"],
"recommendations": ["Suggested follow-up research"],
"metadata": {
"agents_used": 4,
"sources_validated": 12,
"inference_chains_generated": 2,
"overall_certainty": 72
}
}
Consequences
Positive
- Reduced Overconfidence - Explicit certainty prevents unwarranted assertions
- Evidence Traceability - All claims linked to sources
- Transparent Reasoning - Inference chains expose logic
- Gap Documentation - Missing information explicitly captured
- Quality Improvement - Gaps enable targeted follow-up research
Negative
- Increased Token Usage - ~1000-2000 tokens per analysis for certainty metadata
- Latency - Multi-agent coordination adds processing time
- Complexity - More sophisticated output parsing required
- Training - Users must understand certainty levels
Neutral
- Shifts from implicit to explicit uncertainty expression
- Changes agent prompting requirements
Implementation
Phase 1: Core Components (Week 1-2)
- Create `moe-analyze` command specification
- Create `uncertainty-orchestrator` agent
- Create `uncertainty-quantification` skill
- Implement certainty scoring functions
- Define evidence validation schema
Phase 2: Integration (Week 3-4)
- Integrate with existing analyst agents
- Add certainty requirements to agent prompts
- Implement quality gate checks
- Create output formatting
Phase 3: Validation (Week 5-6)
- Test against known-answer datasets
- Calibrate certainty thresholds
- Gather user feedback
- Document edge cases
Validation Criteria
| Metric | Target | Measurement |
|---|---|---|
| Overconfidence Rate | <10% | Claims with HIGH certainty that are incorrect |
| Evidence Coverage | >90% | Claims with valid source citations |
| Gap Documentation | 100% | Analyses that document missing information |
| Inference Transparency | 100% | INFERRED claims with reasoning chains |
| User Certainty Understanding | >80% | Survey comprehension of certainty levels |
References
Primary Research (Tier 1: 95%+ Certainty)
- Semantic Density - Qiu & Miikkulainen, NeurIPS 2024
  - URL: https://neurips.cc/virtual/2024/poster/95598
  - Contribution: Multi-factor confidence scoring methodology
- Self-Consistency (CoT-SC) - Wang et al., ICLR 2023
  - URL: https://arxiv.org/abs/2203.11171
  - Contribution: Internal consistency measurement approach
- Uncertainty of Thoughts - Hu et al., NeurIPS 2024
  - URL: https://arxiv.org/abs/2402.03271
  - Contribution: Explicit uncertainty modeling patterns
- Chain-of-Verification - Meta AI, ACL 2024
  - URL: https://arxiv.org/abs/2309.11495
  - Contribution: Evidence validation protocol design
Secondary Research (Tier 2: 85-94% Certainty)
- Mixture-of-Agents - Together AI, arXiv 2024
  - URL: https://arxiv.org/abs/2406.04692
  - Contribution: Multi-agent orchestration pattern
- RAGAS Framework - Explodinggradients
  - URL: https://docs.ragas.io
  - Contribution: Evidence validation metrics
CODITECT Components
- commands/moe-analyze.md - Command specification
- agents/uncertainty-orchestrator.md - Orchestration agent
- skills/uncertainty-quantification/SKILL.md - Certainty calculation patterns
- docs/09-research-analysis/ACADEMIC-RESEARCH-REFERENCES-UQ-MOE-2024-2025.md - Full research catalog
Document Version: 1.0.0 | Last Updated: 2025-12-19 | Author: CODITECT Research Team | Status: APPROVED