ADR-019: MoE Document Classification System
Status
Accepted - December 27, 2025
Context
CODITECT manages 18,000+ documents across 13 categories requiring accurate classification for:
- Semantic search - Documents must be correctly categorized for retrieval
- Workflow routing - Different document types require different processing
- Compliance - Regulatory documents need proper classification
- AI agent context - Agents need accurate document metadata for task execution
Manual classification is:
- Error-prone (human fatigue, inconsistency)
- Not scalable (18,000+ documents)
- Time-consuming (estimated 100+ hours for full inventory)
Decision
Implement a Mixture of Experts (MoE) Multi-Agent Classification System with:
- 5 Analyst Agents (specialized classification perspectives)
- 3 Judge Agents (validation and quality assurance)
- 1 Orchestrator Agent (coordination and consensus)
Architecture Overview
┌─────────────────────────────────────────────────────────────────┐
│ ORCHESTRATOR AGENT │
│ • Coordinates workflow • Aggregates results │
│ • Manages consensus • Handles escalations │
└─────────────────────────────────────────────────────────────────┘
│
┌─────────────────────┼─────────────────────┐
▼ ▼ ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ ANALYST │ │ ANALYST │ │ ANALYST │
│ POOL (5) │ │ CONSENSUS │ │ JUDGE │
│ │ │ ENGINE │ │ POOL (3) │
│ • Structural │ │ │ │ │
│ • Content │────▶│ • Vote Tally │────▶│ • Consistency │
│ • Metadata │ │ • Confidence │ │ • Quality │
│ • Semantic │ │ • Threshold │ │ • Domain │
│ • Pattern │ │ │ │ │
└───────────────┘ └───────────────┘ └───────────────┘
│
▼
┌───────────────────┐
│ FINAL DECISION │
│ + Audit Trail │
└───────────────────┘
Agent Specifications
1. Analyst Agents (5)
Each analyst provides an independent classification with confidence score.
1.1 Structural Analyst
Input: File path, extension, size, directory location
Analysis:
- Path pattern matching (/agents/ → agent, /docs/ → documentation)
- File extension validation (.md, .json, .py)
- Directory depth and naming conventions
- File size heuristics
Output:
{
"agent": "structural",
"classification": "agent",
"confidence": 0.85,
"reasoning": "Located in /agents/ directory with .md extension"
}
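The structural heuristics above can be sketched as follows. The pattern table, confidence values, and function name are illustrative, not CODITECT's actual rules:

```python
import re
from pathlib import PurePosixPath

# Hypothetical path-pattern table; real rules come from CODITECT conventions.
PATH_PATTERNS = [
    (re.compile(r"/agents/"), "agent"),
    (re.compile(r"/docs/"), "documentation"),
    (re.compile(r"/commands/"), "command"),
]

def structural_vote(path: str) -> dict:
    """Return a structural-analyst vote for a document path."""
    suffix = PurePosixPath(path).suffix
    for pattern, classification in PATH_PATTERNS:
        if pattern.search(path):
            # A matching directory plus a known extension boosts confidence.
            confidence = 0.85 if suffix in {".md", ".json", ".py"} else 0.60
            return {
                "agent": "structural",
                "classification": classification,
                "confidence": confidence,
                "reasoning": f"Path matches {pattern.pattern!r} with {suffix} extension",
            }
    return {"agent": "structural", "classification": "unknown",
            "confidence": 0.0, "reasoning": "No path pattern matched"}
```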
1.2 Content Analyst
Input: Document body (markdown content)
Analysis:
- Heading structure (H1, H2, H3 patterns)
- Section names (## Status, ## Decision → ADR)
- Code block presence and language
- List structures and formatting
Output:
{
"agent": "content",
"classification": "adr",
"confidence": 0.92,
"reasoning": "Contains ## Status, ## Context, ## Decision sections typical of ADRs"
}
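One way to sketch the heading-signature check described above; the signature table and confidence value are illustrative assumptions:

```python
import re

# Illustrative mapping from required heading sets to a document type.
SECTION_SIGNATURES = {
    "adr": {"## Status", "## Context", "## Decision"},
    "command": {"## Invocation", "## Arguments"},
}

def content_vote(body: str) -> dict:
    """Return a content-analyst vote based on heading structure."""
    headings = set(re.findall(r"^#{1,3} .+$", body, flags=re.MULTILINE))
    for doc_type, required in SECTION_SIGNATURES.items():
        if required <= headings:  # all required sections present
            return {"agent": "content", "classification": doc_type,
                    "confidence": 0.92,
                    "reasoning": f"Contains {', '.join(sorted(required))} sections"}
    return {"agent": "content", "classification": "unknown",
            "confidence": 0.0, "reasoning": "No heading signature matched"}
```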
1.3 Metadata Analyst
Input: YAML frontmatter
Analysis:
- type field value (if present)
- tags array analysis
- status and audience fields
- Custom metadata fields
Output:
{
"agent": "metadata",
"classification": "reference",
"confidence": 0.98,
"reasoning": "Frontmatter type field explicitly set to 'reference'"
}
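A minimal sketch of the frontmatter check. It uses a flat key: value parser for brevity; a real implementation would parse the frontmatter with a YAML library:

```python
def metadata_vote(text: str) -> dict:
    """Classify from the frontmatter type field; fall back when absent."""
    fields = {}
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break  # end of frontmatter block
            if ":" in line:
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
    if "type" in fields:
        return {"agent": "metadata", "classification": fields["type"],
                "confidence": 0.98,
                "reasoning": f"Frontmatter type field explicitly set to {fields['type']!r}"}
    return {"agent": "metadata", "classification": "unknown",
            "confidence": 0.0, "reasoning": "No type field in frontmatter"}
```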
1.4 Semantic Analyst
Input: Full document content
Analysis:
- LLM-based intent classification
- Topic extraction
- Semantic similarity to known document types
- Natural language understanding of purpose
Output:
{
"agent": "semantic",
"classification": "guide",
"confidence": 0.88,
"reasoning": "Document provides step-by-step instructions for user task completion"
}
1.5 Pattern Analyst
Input: Full document + CODITECT conventions
Analysis:
- CODITECT naming conventions
- Component patterns (agents have capabilities, commands have invocation)
- Cross-reference patterns
- Template compliance
Output:
{
"agent": "pattern",
"classification": "command",
"confidence": 0.91,
"reasoning": "Contains ## Invocation, ## Arguments per command template"
}
2. Judge Agents (3)
Judges validate analyst consensus and have veto authority.
2.1 Consistency Judge
Role: Ensure classification consistency across related documents
Checks:
- Cross-reference validation (linked documents have compatible types)
- Directory consistency (siblings should have related types)
- Historical consistency (similar documents classified same way)
Veto Conditions:
- Classification contradicts related documents
- Breaks established patterns in directory
2.2 Quality Judge
Role: Validate confidence thresholds and agreement quality
Checks:
- Minimum confidence threshold (≥0.70 per analyst)
- Agreement threshold (≥3/5 analysts agree)
- Confidence spread (no single outlier driving decision)
Veto Conditions:
- Consensus confidence < 0.85
- Only 2/5 analysts agree
- Single analyst confidence > 0.95 contradicts majority
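The Quality Judge's rules can be sketched directly from the thresholds above; the function name and (veto, reason) return shape are assumptions:

```python
PER_ANALYST_MIN = 0.70   # reject low-confidence individual votes
CONSENSUS_MIN = 0.85     # minimum consensus confidence

def quality_check(votes: dict, proposed: str):
    """Return (veto, reason) per the Quality Judge veto conditions."""
    agreeing = [v for v in votes.values() if v["classification"] == proposed]
    if len(agreeing) < 3:
        return True, f"Only {len(agreeing)}/5 analysts agree"
    usable = [v for v in agreeing if v["confidence"] >= PER_ANALYST_MIN]
    if not usable:
        return True, "No agreeing vote meets the per-analyst minimum"
    consensus = sum(v["confidence"] for v in usable) / len(usable)
    if consensus < CONSENSUS_MIN:
        return True, f"Consensus confidence {consensus:.2f} below {CONSENSUS_MIN}"
    # Outlier check: a very confident dissenter contradicting the majority.
    dissenters = [v for v in votes.values() if v["classification"] != proposed]
    if any(v["confidence"] > 0.95 for v in dissenters):
        return True, "High-confidence dissenting vote contradicts majority"
    return False, "OK"
```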
2.3 Domain Judge
Role: Enforce CODITECT-specific classification rules
Checks:
- ADR-018 type taxonomy compliance
- CODITECT component standards
- Organizational hierarchy rules
Veto Conditions:
- Classification uses invalid type
- Document violates CODITECT standards
- Metadata doesn't match classification
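A sketch of the Domain Judge's type and metadata checks. The type set shown is a hypothetical placeholder; the real taxonomy comes from ADR-018:

```python
# Hypothetical taxonomy for illustration; the authoritative list is in ADR-018.
VALID_TYPES = {"agent", "command", "adr", "guide", "reference", "documentation"}

def domain_check(classification: str, metadata: dict):
    """Return (veto, reason) per the Domain Judge veto conditions."""
    if classification not in VALID_TYPES:
        return True, f"{classification!r} is not a valid type"
    declared = metadata.get("type")
    if declared is not None and declared != classification:
        return True, (f"Metadata type {declared!r} contradicts "
                      f"classification {classification!r}")
    return False, "OK"
```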
3. Orchestrator Agent
Responsibilities:
- Dispatch: Send document to all 5 analysts in parallel
- Collect: Gather all analyst outputs with timeout (30s)
- Aggregate: Calculate weighted consensus
- Escalate: Invoke judges if confidence < 0.90
- Decide: Make final classification decision
- Audit: Log complete decision trail
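The dispatch/collect steps can be sketched with asyncio; the `analysts` mapping of names to async callables is an assumed interface for illustration:

```python
import asyncio

ANALYST_TIMEOUT_S = 30  # per-analyst response timeout

async def dispatch(document, analysts):
    """Run all analysts in parallel; timeouts and errors are collected
    separately so the orchestrator can escalate them to a human."""
    tasks = {name: asyncio.create_task(fn(document))
             for name, fn in analysts.items()}
    votes, failures = {}, {}
    for name, task in tasks.items():
        try:
            votes[name] = await asyncio.wait_for(task, timeout=ANALYST_TIMEOUT_S)
        except Exception as exc:  # includes TimeoutError
            failures[name] = repr(exc)
    return votes, failures
```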
Consensus Algorithm
Step 1: Collect Votes
votes = {
"structural": {"classification": "agent", "confidence": 0.85},
"content": {"classification": "agent", "confidence": 0.92},
"metadata": {"classification": "agent", "confidence": 0.98},
"semantic": {"classification": "command", "confidence": 0.75},
"pattern": {"classification": "agent", "confidence": 0.91}
}
Step 2: Calculate Agreement
from collections import Counter

vote_counts = Counter(v["classification"] for v in votes.values())
# {"agent": 4, "command": 1}
majority_type = vote_counts.most_common(1)[0][0] # "agent"
agreement_ratio = vote_counts[majority_type] / len(votes) # 4/5 = 0.80
Step 3: Calculate Weighted Confidence
# Only count votes for majority classification
majority_votes = [v for v in votes.values() if v["classification"] == majority_type]
weighted_confidence = sum(v["confidence"] for v in majority_votes) / len(majority_votes)
# (0.85 + 0.92 + 0.98 + 0.91) / 4 = 0.915
Step 4: Apply Thresholds
AGREEMENT_THRESHOLD = 0.60 # At least 3/5 analysts agree
CONFIDENCE_THRESHOLD = 0.85 # Average confidence of agreeing analysts
ESCALATION_THRESHOLD = 0.90 # Below this, invoke judges
if agreement_ratio >= AGREEMENT_THRESHOLD:
    if weighted_confidence >= ESCALATION_THRESHOLD:
        return FinalDecision(majority_type, weighted_confidence, "AUTO_APPROVED")
    elif weighted_confidence >= CONFIDENCE_THRESHOLD:
        return invoke_judges(votes, majority_type)
    else:
        return EscalateToHuman(votes, "LOW_CONFIDENCE")
else:
    return EscalateToHuman(votes, "NO_CONSENSUS")
Step 5: Judge Validation (if invoked)
def invoke_judges(votes, proposed_classification):
    # document and weighted_confidence come from the enclosing classification context
    vetos = []
    consistency_result = consistency_judge.validate(document, proposed_classification)
    if consistency_result.veto:
        vetos.append(("consistency", consistency_result.reason))
    quality_result = quality_judge.validate(votes, proposed_classification)
    if quality_result.veto:
        vetos.append(("quality", quality_result.reason))
    domain_result = domain_judge.validate(document, proposed_classification)
    if domain_result.veto:
        vetos.append(("domain", domain_result.reason))
    if vetos:
        return EscalateToHuman(votes, vetos)
    return FinalDecision(proposed_classification, weighted_confidence, "JUDGE_APPROVED")
Confidence Thresholds
| Threshold | Value | Purpose |
|---|---|---|
| Per-Analyst Minimum | 0.70 | Reject low-confidence individual votes |
| Agreement Ratio | 0.60 | Require 3/5 analysts to agree |
| Auto-Approval | 0.90 | Skip judges if consensus is strong |
| Judge Approval | 0.85 | Accept with judge validation |
| Escalation | < 0.85 | Require human review |
Escalation Rules
Automatic Human Escalation
- No Consensus: < 3/5 analysts agree on classification
- Low Confidence: Weighted confidence < 0.85
- Judge Veto: Any judge vetoes the classification
- Timeout: Analyst response timeout (> 30s)
- Error: Any agent throws exception
Escalation Output
{
"document": "/path/to/document.md",
"proposed_classification": "agent",
"votes": {...},
"confidence": 0.82,
"escalation_reason": "LOW_CONFIDENCE",
"vetos": [],
"recommended_action": "HUMAN_REVIEW"
}
Audit Trail
Every classification decision is logged:
{
"timestamp": "2025-12-27T10:30:00Z",
"document": "/path/to/document.md",
"document_hash": "sha256:abc123...",
"analysts": {
"structural": {
"classification": "agent", "confidence": 0.85, "duration_ms": 120
},
"content": {
"classification": "agent", "confidence": 0.92, "duration_ms": 450
},
"metadata": {
"classification": "agent", "confidence": 0.98, "duration_ms": 80
},
"semantic": {
"classification": "command", "confidence": 0.75, "duration_ms": 2300
},
"pattern": {
"classification": "agent", "confidence": 0.91, "duration_ms": 380
}
},
"consensus": {
"agreement_ratio": 0.80,
"weighted_confidence": 0.915,
"majority_classification": "agent"
},
"judges_invoked": false,
"final_decision": {
"classification": "agent",
"confidence": 0.915,
"approval_type": "AUTO_APPROVED"
},
"processing_time_ms": 2850
}
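Assembling one such entry is straightforward with the standard library; the field names follow the example above, and the function name is an assumption:

```python
import hashlib
import json
from datetime import datetime, timezone

def audit_record(path: str, content: bytes, analysts: dict,
                 consensus: dict, decision: dict) -> str:
    """Build one audit-trail entry as JSON, hashing the document content."""
    record = {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "document": path,
        "document_hash": "sha256:" + hashlib.sha256(content).hexdigest(),
        "analysts": analysts,
        "consensus": consensus,
        "final_decision": decision,
    }
    return json.dumps(record, indent=2)
```

Hashing the content (not just the path) lets later audits detect whether a document changed after it was classified.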
Success Metrics
| Metric | Target | Measurement |
|---|---|---|
| Accuracy | ≥ 99.9% | Compared to human-labeled ground truth |
| Precision | ≥ 99.5% | Per document type |
| Recall | ≥ 99.5% | Per document type |
| Processing Time | < 5s | Per document average |
| Escalation Rate | < 5% | Documents requiring human review |
| Judge Veto Rate | < 2% | Classifications rejected by judges |
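The per-type precision and recall targets above can be measured against the human-labeled ground truth with a small helper (function name and interface are assumptions):

```python
from collections import defaultdict

def per_type_metrics(predicted, truth):
    """Compute precision and recall per document type."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for p, t in zip(predicted, truth):
        if p == t:
            tp[t] += 1          # correct classification
        else:
            fp[p] += 1          # predicted type was wrong
            fn[t] += 1          # true type was missed
    types = set(tp) | set(fp) | set(fn)
    return {
        dt: {
            "precision": tp[dt] / (tp[dt] + fp[dt]) if tp[dt] + fp[dt] else 0.0,
            "recall": tp[dt] / (tp[dt] + fn[dt]) if tp[dt] + fn[dt] else 0.0,
        }
        for dt in types
    }
```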
Implementation Phases
Phase 2.1: Architecture (Week 1)
- Create ADR-019 (this document)
- Create C4 diagrams
- Design consensus algorithm pseudocode
Phase 2.2: Analyst Agents (Week 2)
- Implement Structural Analyst
- Implement Content Analyst
- Implement Metadata Analyst
- Implement Semantic Analyst
- Implement Pattern Analyst
Phase 2.3: Judge Agents (Week 3)
- Implement Consistency Judge
- Implement Quality Judge
- Implement Domain Judge
Phase 2.4: Orchestration (Week 4)
- Implement Orchestrator
- Implement consensus engine
- Implement audit trail system
Phase 2.5: Validation (Week 5)
- Pilot classification (100 docs)
- Full classification (18,000+ docs)
- Accuracy validation
Consequences
Positive
- Accuracy: Multi-agent consensus reduces single-point-of-failure errors
- Explainability: Full audit trail for every decision
- Scalability: Parallel analyst execution handles large document sets
- Quality Assurance: Judge layer catches edge cases
- Automation: Reduces manual classification effort by 95%+
Negative
- Complexity: Multi-agent system requires careful orchestration
- Cost: Semantic Analyst requires LLM API calls
- Latency: Full pipeline takes 2-5 seconds per document
Mitigations
- Complexity: Well-defined interfaces and comprehensive testing
- Cost: Batch processing and caching for similar documents
- Latency: Parallel analyst execution, async processing for bulk operations
References
- ADR-018 (document type taxonomy)
Author: AI Specialist Agent
Reviewed By: Pending stakeholder review
Approved By: Pending