ADR-019: MoE Document Classification System

Status

Accepted - December 27, 2025

Context

CODITECT manages 18,000+ documents across 13 categories requiring accurate classification for:

  1. Semantic search - Documents must be correctly categorized for retrieval
  2. Workflow routing - Different document types require different processing
  3. Compliance - Regulatory documents need proper classification
  4. AI agent context - Agents need accurate document metadata for task execution

Manual classification is:

  • Error-prone (human fatigue, inconsistency)
  • Not scalable (18,000+ documents)
  • Time-consuming (estimated 100+ hours for full inventory)

Decision

Implement a Mixture of Experts (MoE) Multi-Agent Classification System with:

  • 5 Analyst Agents (specialized classification perspectives)
  • 3 Judge Agents (validation and quality assurance)
  • 1 Orchestrator Agent (coordination and consensus)

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                       ORCHESTRATOR AGENT                        │
│   • Coordinates workflow        • Aggregates results            │
│   • Manages consensus           • Handles escalations           │
└─────────────────────────────────────────────────────────────────┘
                                │
          ┌─────────────────────┼─────────────────────┐
          ▼                     ▼                     ▼
  ┌───────────────┐     ┌───────────────┐     ┌───────────────┐
  │    ANALYST    │     │   CONSENSUS   │     │     JUDGE     │
  │   POOL (5)    │     │    ENGINE     │     │   POOL (3)    │
  │               │     │               │     │               │
  │ • Structural  │     │               │     │               │
  │ • Content     │────▶│ • Vote Tally  │────▶│ • Consistency │
  │ • Metadata    │     │ • Confidence  │     │ • Quality     │
  │ • Semantic    │     │ • Threshold   │     │ • Domain      │
  │ • Pattern     │     │               │     │               │
  └───────────────┘     └───────────────┘     └───────────────┘
                                │
                                ▼
                      ┌───────────────────┐
                      │  FINAL DECISION   │
                      │   + Audit Trail   │
                      └───────────────────┘

Agent Specifications

1. Analyst Agents (5)

Each analyst provides an independent classification with a confidence score.

1.1 Structural Analyst

Input: File path, extension, size, directory location

Analysis:

  • Path pattern matching (/agents/ → agent, /docs/ → documentation)
  • File extension validation (.md, .json, .py)
  • Directory depth and naming conventions
  • File size heuristics

Output:

{
  "agent": "structural",
  "classification": "agent",
  "confidence": 0.85,
  "reasoning": "Located in /agents/ directory with .md extension"
}
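The path-pattern rules can be sketched as a small lookup table. The directory names, types, and fallback confidence below are illustrative assumptions, not the full CODITECT mapping:

```python
from pathlib import PurePosixPath

# Illustrative path-pattern rules (assumed, not the full CODITECT mapping).
PATH_RULES = {
    "agents": ("agent", 0.85),
    "docs": ("documentation", 0.80),
    "adr": ("adr", 0.90),
}

def structural_vote(path: str) -> dict:
    """Classify a document from its path alone, per the Structural Analyst."""
    for part in PurePosixPath(path).parts:
        if part in PATH_RULES:
            classification, confidence = PATH_RULES[part]
            return {
                "agent": "structural",
                "classification": classification,
                "confidence": confidence,
                "reasoning": f"Located in /{part}/ directory",
            }
    # No rule matched: emit a low-confidence fallback vote.
    return {"agent": "structural", "classification": "unknown",
            "confidence": 0.30, "reasoning": "No path pattern matched"}
```

A real implementation would also weigh extension and file-size heuristics; the table keeps the sketch small.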

1.2 Content Analyst

Input: Document body (markdown content)

Analysis:

  • Heading structure (H1, H2, H3 patterns)
  • Section names (## Status, ## Decision → ADR)
  • Code block presence and language
  • List structures and formatting

Output:

{
  "agent": "content",
  "classification": "adr",
  "confidence": 0.92,
  "reasoning": "Contains ## Status, ## Context, ## Decision sections typical of ADRs"
}
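A minimal sketch of the heading check, using the ADR detection rule from the example output; the fallback classification and all confidence values are illustrative:

```python
import re

# ADR-style section headings the Content Analyst looks for (per the example above).
ADR_SECTIONS = {"status", "context", "decision"}

def content_vote(markdown: str) -> dict:
    """Vote 'adr' when all characteristic ## sections are present."""
    headings = {m.group(1).strip().lower()
                for m in re.finditer(r"^##\s+(.+)$", markdown, re.MULTILINE)}
    if ADR_SECTIONS <= headings:
        return {"agent": "content", "classification": "adr", "confidence": 0.92,
                "reasoning": "Contains ## Status, ## Context, ## Decision sections"}
    # No recognized section pattern: low-confidence fallback.
    return {"agent": "content", "classification": "unknown",
            "confidence": 0.40, "reasoning": "No characteristic section pattern"}
```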

1.3 Metadata Analyst

Input: YAML frontmatter

Analysis:

  • type field value (if present)
  • tags array analysis
  • status and audience fields
  • Custom metadata fields

Output:

{
  "agent": "metadata",
  "classification": "reference",
  "confidence": 0.98,
  "reasoning": "Frontmatter type field explicitly set to 'reference'"
}
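A sketch of the frontmatter lookup, using a deliberately minimal parser (flat `key: value` lines between `---` fences) rather than a full YAML library; field names follow the example above:

```python
def metadata_vote(document: str) -> dict:
    """Read an explicit `type:` field from YAML frontmatter.

    Minimal parser: assumes flat key: value lines between --- fences.
    """
    lines = document.splitlines()
    if not lines or lines[0].strip() != "---":
        return {"agent": "metadata", "classification": "unknown",
                "confidence": 0.20, "reasoning": "No frontmatter"}
    for line in lines[1:]:
        if line.strip() == "---":  # closing fence
            break
        key, _, value = line.partition(":")
        if key.strip() == "type" and value.strip():
            doc_type = value.strip()
            return {"agent": "metadata", "classification": doc_type,
                    "confidence": 0.98,
                    "reasoning": f"Frontmatter type field explicitly set to '{doc_type}'"}
    return {"agent": "metadata", "classification": "unknown",
            "confidence": 0.20, "reasoning": "No type field in frontmatter"}
```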

1.4 Semantic Analyst

Input: Full document content

Analysis:

  • LLM-based intent classification
  • Topic extraction
  • Semantic similarity to known document types
  • Natural language understanding of purpose

Output:

{
  "agent": "semantic",
  "classification": "guide",
  "confidence": 0.88,
  "reasoning": "Document provides step-by-step instructions for user task completion"
}

1.5 Pattern Analyst

Input: Full document + CODITECT conventions

Analysis:

  • CODITECT naming conventions
  • Component patterns (agents have capabilities, commands have invocation)
  • Cross-reference patterns
  • Template compliance

Output:

{
  "agent": "pattern",
  "classification": "command",
  "confidence": 0.91,
  "reasoning": "Contains ## Invocation, ## Arguments per command template"
}

2. Judge Agents (3)

Judges validate analyst consensus and have veto authority.

2.1 Consistency Judge

Role: Ensure classification consistency across related documents

Checks:

  • Cross-reference validation (linked documents have compatible types)
  • Directory consistency (siblings should have related types)
  • Historical consistency (similar documents classified same way)

Veto Conditions:

  • Classification contradicts related documents
  • Breaks established patterns in directory
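The directory-consistency check can be sketched as follows. The `sibling_types` index and the 80% dominance rule are assumptions; the ADR does not specify how existing classifications are looked up:

```python
from collections import Counter
from pathlib import PurePosixPath

def consistency_veto(document_path, proposed, sibling_types):
    """Veto when a classification breaks the established pattern of its directory.

    `sibling_types` maps known document paths to their existing classifications;
    it stands in for the real cross-reference index.
    """
    directory = str(PurePosixPath(document_path).parent)
    siblings = [t for p, t in sibling_types.items()
                if str(PurePosixPath(p).parent) == directory]
    if not siblings:
        return False, "No siblings to compare against"
    dominant, count = Counter(siblings).most_common(1)[0]
    # Assumed rule: if >= 80% of siblings share one type and the proposal differs, veto.
    if count / len(siblings) >= 0.8 and proposed != dominant:
        return True, f"Directory pattern is '{dominant}' ({count}/{len(siblings)} siblings)"
    return False, "Consistent with directory"
```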

2.2 Quality Judge

Role: Validate confidence thresholds and agreement quality

Checks:

  • Minimum confidence threshold (≥0.70 per analyst)
  • Agreement threshold (≥3/5 analysts agree)
  • Confidence spread (no single outlier driving decision)

Veto Conditions:

  • Consensus confidence < 0.85
  • Only 2/5 analysts agree
  • Single analyst confidence > 0.95 contradicts majority
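The three veto conditions map directly onto checks over the vote set. A sketch, reusing the vote format from the consensus algorithm below; return values and wording are illustrative:

```python
from collections import Counter

def quality_judge(votes):
    """Return (veto, reason) per the Quality Judge's veto conditions."""
    counts = Counter(v["classification"] for v in votes.values())
    majority_type, majority_n = counts.most_common(1)[0]
    majority = [v for v in votes.values() if v["classification"] == majority_type]
    consensus_conf = sum(v["confidence"] for v in majority) / len(majority)

    if majority_n < 3:  # fewer than 3/5 analysts agree
        return True, f"Only {majority_n}/{len(votes)} analysts agree"
    if consensus_conf < 0.85:  # consensus confidence below threshold
        return True, f"Consensus confidence {consensus_conf:.2f} below 0.85"
    # A single very confident dissenter contradicting the majority triggers a veto.
    for v in votes.values():
        if v["classification"] != majority_type and v["confidence"] > 0.95:
            return True, "High-confidence dissenter contradicts the majority"
    return False, "OK"
```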

2.3 Domain Judge

Role: Enforce CODITECT-specific classification rules

Checks:

  • ADR-018 type taxonomy compliance
  • CODITECT component standards
  • Organizational hierarchy rules

Veto Conditions:

  • Classification uses invalid type
  • Document violates CODITECT standards
  • Metadata doesn't match classification
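A sketch of the taxonomy check; the `VALID_TYPES` set is a placeholder, since the full ADR-018 taxonomy (13 categories) is not reproduced here:

```python
# Placeholder subset of the ADR-018 taxonomy; the real list has 13 categories.
VALID_TYPES = {"agent", "command", "adr", "guide", "reference", "documentation"}

def domain_veto(proposed, frontmatter_type=None):
    """Veto classifications that use an invalid type or contradict metadata."""
    if proposed not in VALID_TYPES:
        return True, f"'{proposed}' is not in the ADR-018 type taxonomy"
    if frontmatter_type and frontmatter_type != proposed:
        return True, "Metadata type does not match proposed classification"
    return False, "OK"
```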

3. Orchestrator Agent

Responsibilities:

  1. Dispatch: Send document to all 5 analysts in parallel
  2. Collect: Gather all analyst outputs with timeout (30s)
  3. Aggregate: Calculate weighted consensus
  4. Escalate: Invoke judges if confidence < 0.90
  5. Decide: Make final classification decision
  6. Audit: Log complete decision trail
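Steps 1 and 2 can be sketched with `asyncio`; the analyst callables and their return shape are assumptions consistent with the analyst outputs above:

```python
import asyncio

async def dispatch_analysts(document, analysts, timeout=30.0):
    """Run every analyst in parallel with a per-call timeout.

    `analysts` maps a name to an async callable returning a vote dict
    (the callables here are illustrative assumptions).
    """
    async def run(name, fn):
        return name, await asyncio.wait_for(fn(document), timeout)

    try:
        results = await asyncio.gather(*(run(n, f) for n, f in analysts.items()))
    except Exception as exc:
        # Escalation rules: a timeout or any agent error means human review.
        return {"escalate": True, "reason": type(exc).__name__}
    return {"escalate": False, "votes": dict(results)}
```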

Consensus Algorithm

Step 1: Collect Votes

votes = {
    "structural": {"classification": "agent", "confidence": 0.85},
    "content": {"classification": "agent", "confidence": 0.92},
    "metadata": {"classification": "agent", "confidence": 0.98},
    "semantic": {"classification": "command", "confidence": 0.75},
    "pattern": {"classification": "agent", "confidence": 0.91}
}

Step 2: Calculate Agreement

from collections import Counter

vote_counts = Counter(v["classification"] for v in votes.values())
# {"agent": 4, "command": 1}

majority_type = vote_counts.most_common(1)[0][0]            # "agent"
agreement_ratio = vote_counts[majority_type] / len(votes)   # 4/5 = 0.80

Step 3: Calculate Weighted Confidence

# Only count votes for majority classification
majority_votes = [v for v in votes.values() if v["classification"] == majority_type]
weighted_confidence = sum(v["confidence"] for v in majority_votes) / len(majority_votes)
# (0.85 + 0.92 + 0.98 + 0.91) / 4 = 0.915

Step 4: Apply Thresholds

AGREEMENT_THRESHOLD = 0.60   # At least 3/5 analysts agree
CONFIDENCE_THRESHOLD = 0.85  # Average confidence of agreeing analysts
ESCALATION_THRESHOLD = 0.90  # Below this, invoke judges

if agreement_ratio >= AGREEMENT_THRESHOLD:
    if weighted_confidence >= ESCALATION_THRESHOLD:
        return FinalDecision(majority_type, weighted_confidence, "AUTO_APPROVED")
    elif weighted_confidence >= CONFIDENCE_THRESHOLD:
        return invoke_judges(votes, majority_type)
    else:
        return EscalateToHuman(votes, "LOW_CONFIDENCE")
else:
    return EscalateToHuman(votes, "NO_CONSENSUS")

Step 5: Judge Validation (if invoked)

def invoke_judges(votes, proposed_classification):
    # `document` and `weighted_confidence` come from the enclosing orchestrator scope.
    vetos = []

    consistency_result = consistency_judge.validate(document, proposed_classification)
    if consistency_result.veto:
        vetos.append(("consistency", consistency_result.reason))

    quality_result = quality_judge.validate(votes, proposed_classification)
    if quality_result.veto:
        vetos.append(("quality", quality_result.reason))

    domain_result = domain_judge.validate(document, proposed_classification)
    if domain_result.veto:
        vetos.append(("domain", domain_result.reason))

    if vetos:
        return EscalateToHuman(votes, vetos)
    return FinalDecision(proposed_classification, weighted_confidence, "JUDGE_APPROVED")

Confidence Thresholds

| Threshold | Value | Purpose |
|-----------|-------|---------|
| Per-Analyst Minimum | 0.70 | Reject low-confidence individual votes |
| Agreement Ratio | 0.60 | Require 3/5 analysts to agree |
| Auto-Approval | 0.90 | Skip judges if consensus is strong |
| Judge Approval | 0.85 | Accept with judge validation |
| Escalation | < 0.85 | Require human review |

Escalation Rules

Automatic Human Escalation

  1. No Consensus: < 3/5 analysts agree on classification
  2. Low Confidence: Weighted confidence < 0.85
  3. Judge Veto: Any judge vetoes the classification
  4. Timeout: Analyst response timeout (> 30s)
  5. Error: Any agent throws exception

Escalation Output

{
  "document": "/path/to/document.md",
  "proposed_classification": "agent",
  "votes": {...},
  "confidence": 0.82,
  "escalation_reason": "LOW_CONFIDENCE",
  "vetos": [],
  "recommended_action": "HUMAN_REVIEW"
}

Audit Trail

Every classification decision is logged:

{
  "timestamp": "2025-12-27T10:30:00Z",
  "document": "/path/to/document.md",
  "document_hash": "sha256:abc123...",
  "analysts": {
    "structural": {"classification": "agent", "confidence": 0.85, "duration_ms": 120},
    "content": {"classification": "agent", "confidence": 0.92, "duration_ms": 450},
    "metadata": {"classification": "agent", "confidence": 0.98, "duration_ms": 80},
    "semantic": {"classification": "command", "confidence": 0.75, "duration_ms": 2300},
    "pattern": {"classification": "agent", "confidence": 0.91, "duration_ms": 380}
  },
  "consensus": {
    "agreement_ratio": 0.80,
    "weighted_confidence": 0.915,
    "majority_classification": "agent"
  },
  "judges_invoked": false,
  "final_decision": {
    "classification": "agent",
    "confidence": 0.915,
    "approval_type": "AUTO_APPROVED"
  },
  "processing_time_ms": 2850
}
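Appending each record to a log can be sketched as follows; JSONL is an assumed storage format, since the ADR specifies the record fields but not the persistence mechanism:

```python
import hashlib
import json
import time

def log_decision(log_path, document_path, content, record):
    """Append one audit record per classification decision as a JSON line.

    `record` carries the analysts, consensus, final_decision, and
    processing_time_ms fields shown above.
    """
    entry = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "document": document_path,
        "document_hash": "sha256:" + hashlib.sha256(content).hexdigest(),
        **record,
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
```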

Success Metrics

| Metric | Target | Measurement |
|--------|--------|-------------|
| Accuracy | ≥ 99.9% | Compared to human-labeled ground truth |
| Precision | ≥ 99.5% | Per document type |
| Recall | ≥ 99.5% | Per document type |
| Processing Time | < 5s | Per document average |
| Escalation Rate | < 5% | Documents requiring human review |
| Judge Veto Rate | < 2% | Classifications rejected by judges |

Implementation Phases

Phase 2.1: Architecture (Week 1)

  • Create ADR-019 (this document)
  • Create C4 diagrams
  • Design consensus algorithm pseudocode

Phase 2.2: Analyst Agents (Week 2)

  • Implement Structural Analyst
  • Implement Content Analyst
  • Implement Metadata Analyst
  • Implement Semantic Analyst
  • Implement Pattern Analyst

Phase 2.3: Judge Agents (Week 3)

  • Implement Consistency Judge
  • Implement Quality Judge
  • Implement Domain Judge

Phase 2.4: Orchestration (Week 4)

  • Implement Orchestrator
  • Implement consensus engine
  • Implement audit trail system

Phase 2.5: Validation (Week 5)

  • Pilot classification (100 docs)
  • Full classification (18,000+ docs)
  • Accuracy validation

Consequences

Positive

  • Accuracy: Multi-agent consensus reduces single-point-of-failure errors
  • Explainability: Full audit trail for every decision
  • Scalability: Parallel analyst execution handles large document sets
  • Quality Assurance: Judge layer catches edge cases
  • Automation: Reduces manual classification effort by 95%+

Negative

  • Complexity: Multi-agent system requires careful orchestration
  • Cost: Semantic Analyst requires LLM API calls
  • Latency: Full pipeline takes 2-5 seconds per document

Mitigations

  • Complexity: Well-defined interfaces and comprehensive testing
  • Cost: Batch processing and caching for similar documents
  • Latency: Parallel analyst execution, async processing for bulk operations

References

  • ADR-018 - Document type taxonomy (referenced by the Domain Judge)

Author: AI Specialist Agent
Reviewed By: Pending stakeholder review
Approved By: Pending