ADR-019: MoE Document Classification System
Status
Accepted - December 27, 2025
Context
CODITECT manages 18,000+ documents across 13 categories requiring accurate classification for:
- Semantic search - Documents must be correctly categorized for retrieval
- Workflow routing - Different document types require different processing
- Compliance - Regulatory documents need proper classification
- AI agent context - Agents need accurate document metadata for task execution
Manual classification is:
- Error-prone (human fatigue, inconsistency)
- Not scalable (18,000+ documents)
- Time-consuming (estimated 100+ hours for full inventory)
Decision
Implement a Mixture of Experts (MoE) Multi-Agent Classification System with:
- 5 Analyst Agents (specialized classification perspectives)
- 3 Judge Agents (validation and quality assurance)
- 1 Orchestrator Agent (coordination and consensus)
Architecture Overview
┌─────────────────────────────────────────────────────────────────┐
│ ORCHESTRATOR AGENT │
│ • Coordinates workflow • Aggregates results │
│ • Manages consensus • Handles escalations │
└─────────────────────────────────────────────────────────────────┘
│
┌─────────────────────┼─────────────────────┐
▼ ▼ ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ ANALYST │ │ ANALYST │ │ ANALYST │
│ POOL (5) │ │ CONSENSUS │ │ JUDGE │
│ │ │ ENGINE │ │ POOL (3) │
│ • Structural │ │ │ │ │
│ • Content │────▶│ • Vote Tally │────▶│ • Consistency │
│ • Metadata │ │ • Confidence │ │ • Quality │
│ • Semantic │ │ • Threshold │ │ • Domain │
│ • Pattern │ │ │ │ │
└───────────────┘ └───────────────┘ └───────────────┘
│
▼
┌───────────────────┐
│ FINAL DECISION │
│ + Audit Trail │
└───────────────────┘
Agent Specifications
1. Analyst Agents (5)
Each analyst provides an independent classification with confidence score.
1.1 Structural Analyst
Input: File path, extension, size, directory location
Analysis:
- Path pattern matching (/agents/ → agent, /docs/ → documentation)
- File extension validation (.md, .json, .py)
- Directory depth and naming conventions
- File size heuristics
Output:
{
"agent": "structural",
"classification": "agent",
"confidence": 0.85,
"reasoning": "Located in /agents/ directory with .md extension"
}
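The structural heuristics above can be sketched as follows. The pattern table, confidence values, and function name are illustrative, not CODITECT's actual rules:

```python
import re
from pathlib import PurePosixPath

# Hypothetical path-pattern table; real rules come from CODITECT conventions.
PATH_PATTERNS = [
    (re.compile(r"/agents/"), "agent"),
    (re.compile(r"/docs/"), "documentation"),
    (re.compile(r"/commands/"), "command"),
]

def structural_vote(path: str) -> dict:
    """Return a structural-analyst vote for a document path."""
    suffix = PurePosixPath(path).suffix
    for pattern, classification in PATH_PATTERNS:
        if pattern.search(path):
            # A matching directory plus a known extension boosts confidence.
            confidence = 0.85 if suffix in {".md", ".json", ".py"} else 0.60
            return {
                "agent": "structural",
                "classification": classification,
                "confidence": confidence,
                "reasoning": f"Path matches {pattern.pattern!r} with {suffix} extension",
            }
    return {"agent": "structural", "classification": "unknown",
            "confidence": 0.0, "reasoning": "No path pattern matched"}
```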
1.2 Content Analyst
Input: Document body (markdown content)
Analysis:
- Heading structure (H1, H2, H3 patterns)
- Section names (## Status, ## Decision → ADR)
- Code block presence and language
- List structures and formatting
Output:
{
"agent": "content",
"classification": "adr",
"confidence": 0.92,
"reasoning": "Contains ## Status, ## Context, ## Decision sections typical of ADRs"
}
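One way to sketch the heading-signature check described above; the signature table and confidence value are illustrative assumptions:

```python
import re

# Illustrative mapping from required heading sets to a document type.
SECTION_SIGNATURES = {
    "adr": {"## Status", "## Context", "## Decision"},
    "command": {"## Invocation", "## Arguments"},
}

def content_vote(body: str) -> dict:
    """Return a content-analyst vote based on heading structure."""
    headings = set(re.findall(r"^#{1,3} .+$", body, flags=re.MULTILINE))
    for doc_type, required in SECTION_SIGNATURES.items():
        if required <= headings:  # all required sections present
            return {"agent": "content", "classification": doc_type,
                    "confidence": 0.92,
                    "reasoning": f"Contains {', '.join(sorted(required))} sections"}
    return {"agent": "content", "classification": "unknown",
            "confidence": 0.0, "reasoning": "No heading signature matched"}
```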
1.3 Metadata Analyst
Input: YAML frontmatter
Analysis:
- type field value (if present)
- tags array analysis
- status and audience fields
- Custom metadata fields
Output:
{
"agent": "metadata",
"classification": "reference",
"confidence": 0.98,
"reasoning": "Frontmatter type field explicitly set to 'reference'"
}
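A minimal sketch of the frontmatter check. It uses a flat key: value parser for brevity; a real implementation would parse the frontmatter with a YAML library:

```python
def metadata_vote(text: str) -> dict:
    """Classify from the frontmatter type field; fall back when absent."""
    fields = {}
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break  # end of frontmatter block
            if ":" in line:
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
    if "type" in fields:
        return {"agent": "metadata", "classification": fields["type"],
                "confidence": 0.98,
                "reasoning": f"Frontmatter type field explicitly set to {fields['type']!r}"}
    return {"agent": "metadata", "classification": "unknown",
            "confidence": 0.0, "reasoning": "No type field in frontmatter"}
```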
1.4 Semantic Analyst
Input: Full document content
Analysis:
- LLM-based intent classification
- Topic extraction
- Semantic similarity to known document types
- Natural language understanding of purpose
Output:
{
"agent": "semantic",
"classification": "guide",
"confidence": 0.88,
"reasoning": "Document provides step-by-step instructions for user task completion"
}
1.5 Pattern Analyst
Input: Full document + CODITECT conventions
Analysis:
- CODITECT naming conventions
- Component patterns (agents have capabilities, commands have invocation)
- Cross-reference patterns
- Template compliance
Output:
{
"agent": "pattern",
"classification": "command",
"confidence": 0.91,
"reasoning": "Contains ## Invocation, ## Arguments per command template"
}
2. Judge Agents (3)
Judges validate analyst consensus and have veto authority.
2.1 Consistency Judge
Role: Ensure classification consistency across related documents
Checks:
- Cross-reference validation (linked documents have compatible types)
- Directory consistency (siblings should have related types)
- Historical consistency (similar documents classified same way)
Veto Conditions:
- Classification contradicts related documents
- Breaks established patterns in directory
2.2 Quality Judge
Role: Validate confidence thresholds and agreement quality
Checks:
- Minimum confidence threshold (≥0.70 per analyst)
- Agreement threshold (≥3/5 analysts agree)
- Confidence spread (no single outlier driving decision)
Veto Conditions:
- Consensus confidence < 0.85
- Only 2/5 analysts agree
- Single analyst confidence > 0.95 contradicts majority
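The Quality Judge's rules can be sketched directly from the thresholds above; the function name and (veto, reason) return shape are assumptions:

```python
PER_ANALYST_MIN = 0.70   # reject low-confidence individual votes
CONSENSUS_MIN = 0.85     # minimum consensus confidence

def quality_check(votes: dict, proposed: str):
    """Return (veto, reason) per the Quality Judge veto conditions."""
    agreeing = [v for v in votes.values() if v["classification"] == proposed]
    if len(agreeing) < 3:
        return True, f"Only {len(agreeing)}/5 analysts agree"
    usable = [v for v in agreeing if v["confidence"] >= PER_ANALYST_MIN]
    if not usable:
        return True, "No agreeing vote meets the per-analyst minimum"
    consensus = sum(v["confidence"] for v in usable) / len(usable)
    if consensus < CONSENSUS_MIN:
        return True, f"Consensus confidence {consensus:.2f} below {CONSENSUS_MIN}"
    # Outlier check: a very confident dissenter contradicting the majority.
    dissenters = [v for v in votes.values() if v["classification"] != proposed]
    if any(v["confidence"] > 0.95 for v in dissenters):
        return True, "High-confidence dissenting vote contradicts majority"
    return False, "OK"
```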
2.3 Domain Judge
Role: Enforce CODITECT-specific classification rules
Checks:
- ADR-018 type taxonomy compliance
- CODITECT component standards
- Organizational hierarchy rules
Veto Conditions:
- Classification uses invalid type
- Document violates CODITECT standards
- Metadata doesn't match classification
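A sketch of the Domain Judge's type and metadata checks. The type set shown is a hypothetical placeholder; the real taxonomy comes from ADR-018:

```python
# Hypothetical taxonomy for illustration; the authoritative list is in ADR-018.
VALID_TYPES = {"agent", "command", "adr", "guide", "reference", "documentation"}

def domain_check(classification: str, metadata: dict):
    """Return (veto, reason) per the Domain Judge veto conditions."""
    if classification not in VALID_TYPES:
        return True, f"{classification!r} is not a valid type"
    declared = metadata.get("type")
    if declared is not None and declared != classification:
        return True, (f"Metadata type {declared!r} contradicts "
                      f"classification {classification!r}")
    return False, "OK"
```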
3. Orchestrator Agent
Responsibilities:
- Dispatch: Send document to all 5 analysts in parallel
- Collect: Gather all analyst outputs with timeout (30s)
- Aggregate: Calculate weighted consensus
- Escalate: Invoke judges if confidence < 0.90
- Decide: Make final classification decision
- Audit: Log complete decision trail
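The dispatch/collect steps can be sketched with asyncio; the `analysts` mapping of names to async callables is an assumed interface for illustration:

```python
import asyncio

ANALYST_TIMEOUT_S = 30  # per-analyst response timeout

async def dispatch(document, analysts):
    """Run all analysts in parallel; timeouts and errors are collected
    separately so the orchestrator can escalate them to a human."""
    tasks = {name: asyncio.create_task(fn(document))
             for name, fn in analysts.items()}
    votes, failures = {}, {}
    for name, task in tasks.items():
        try:
            votes[name] = await asyncio.wait_for(task, timeout=ANALYST_TIMEOUT_S)
        except Exception as exc:  # includes TimeoutError
            failures[name] = repr(exc)
    return votes, failures
```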
Consensus Algorithm
Step 1: Collect Votes
votes = {
"structural": {"classification": "agent", "confidence": 0.85},
"content": {"classification": "agent", "confidence": 0.92},
"metadata": {"classification": "agent", "confidence": 0.98},
"semantic": {"classification": "command", "confidence": 0.75},
"pattern": {"classification": "agent", "confidence": 0.91}
}
Step 2: Calculate Agreement
from collections import Counter

vote_counts = Counter(v["classification"] for v in votes.values())
# {"agent": 4, "command": 1}
majority_type = vote_counts.most_common(1)[0][0] # "agent"
agreement_ratio = vote_counts[majority_type] / len(votes) # 4/5 = 0.80
Step 3: Calculate Weighted Confidence
# Only count votes for majority classification
majority_votes = [v for v in votes.values() if v["classification"] == majority_type]
weighted_confidence = sum(v["confidence"] for v in majority_votes) / len(majority_votes)
# (0.85 + 0.92 + 0.98 + 0.91) / 4 = 0.915
Step 4: Apply Thresholds
AGREEMENT_THRESHOLD = 0.60 # At least 3/5 analysts agree
CONFIDENCE_THRESHOLD = 0.85 # Average confidence of agreeing analysts
ESCALATION_THRESHOLD = 0.90 # Below this, invoke judges
if agreement_ratio >= AGREEMENT_THRESHOLD:
    if weighted_confidence >= ESCALATION_THRESHOLD:
        return FinalDecision(majority_type, weighted_confidence, "AUTO_APPROVED")
    elif weighted_confidence >= CONFIDENCE_THRESHOLD:
        return invoke_judges(votes, majority_type)
    else:
        return EscalateToHuman(votes, "LOW_CONFIDENCE")
else:
    return EscalateToHuman(votes, "NO_CONSENSUS")
Step 5: Judge Validation (if invoked)
def invoke_judges(votes, proposed_classification):
    # document and weighted_confidence come from the enclosing classification context
    vetos = []
    consistency_result = consistency_judge.validate(document, proposed_classification)
    if consistency_result.veto:
        vetos.append(("consistency", consistency_result.reason))
    quality_result = quality_judge.validate(votes, proposed_classification)
    if quality_result.veto:
        vetos.append(("quality", quality_result.reason))
    domain_result = domain_judge.validate(document, proposed_classification)
    if domain_result.veto:
        vetos.append(("domain", domain_result.reason))
    if vetos:
        return EscalateToHuman(votes, vetos)
    return FinalDecision(proposed_classification, weighted_confidence, "JUDGE_APPROVED")
Confidence Thresholds
| Threshold | Value | Purpose |
|---|---|---|
| Per-Analyst Minimum | 0.70 | Reject low-confidence individual votes |
| Agreement Ratio | 0.60 | Require 3/5 analysts to agree |
| Auto-Approval | 0.90 | Skip judges if consensus is strong |
| Judge Approval | 0.85 | Accept with judge validation |
| Escalation | < 0.85 | Require human review |
Escalation Rules
Automatic Human Escalation
- No Consensus: < 3/5 analysts agree on classification
- Low Confidence: Weighted confidence < 0.85
- Judge Veto: Any judge vetoes the classification
- Timeout: Analyst response timeout (> 30s)
- Error: Any agent throws exception
Escalation Output
{
"document": "/path/to/document.md",
"proposed_classification": "agent",
"votes": {...},
"confidence": 0.82,
"escalation_reason": "LOW_CONFIDENCE",
"vetos": [],
"recommended_action": "HUMAN_REVIEW"
}
Audit Trail
Every classification decision is logged:
{
"timestamp": "2025-12-27T10:30:00Z",
"document": "/path/to/document.md",
"document_hash": "sha256:abc123...",
"analysts": {
"structural": {
"classification": "agent", "confidence": 0.85, "duration_ms": 120
},
"content": {
"classification": "agent", "confidence": 0.92, "duration_ms": 450
},
"metadata": {
"classification": "agent", "confidence": 0.98, "duration_ms": 80
},
"semantic": {
"classification": "command", "confidence": 0.75, "duration_ms": 2300
},
"pattern": {
"classification": "agent", "confidence": 0.91, "duration_ms": 380
}
},
"consensus": {
"agreement_ratio": 0.80,
"weighted_confidence": 0.915,
"majority_classification": "agent"
},
"judges_invoked": false,
"final_decision": {
"classification": "agent",
"confidence": 0.915,
"approval_type": "AUTO_APPROVED"
},
"processing_time_ms": 2850
}
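Assembling one such entry is straightforward with the standard library; the field names follow the example above, and the function name is an assumption:

```python
import hashlib
import json
from datetime import datetime, timezone

def audit_record(path: str, content: bytes, analysts: dict,
                 consensus: dict, decision: dict) -> str:
    """Build one audit-trail entry as JSON, hashing the document content."""
    record = {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "document": path,
        "document_hash": "sha256:" + hashlib.sha256(content).hexdigest(),
        "analysts": analysts,
        "consensus": consensus,
        "final_decision": decision,
    }
    return json.dumps(record, indent=2)
```

Hashing the content (not just the path) lets later audits detect whether a document changed after it was classified.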
Success Metrics
| Metric | Target | Measurement |
|---|---|---|
| Accuracy | ≥ 99.9% | Compared to human-labeled ground truth |
| Precision | ≥ 99.5% | Per document type |
| Recall | ≥ 99.5% | Per document type |
| Processing Time | < 5s | Per document average |
| Escalation Rate | < 5% | Documents requiring human review |
| Judge Veto Rate | < 2% | Classifications rejected by judges |
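The per-type precision and recall targets above can be measured against the human-labeled ground truth with a small helper (function name and interface are assumptions):

```python
from collections import defaultdict

def per_type_metrics(predicted, truth):
    """Compute precision and recall per document type."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for p, t in zip(predicted, truth):
        if p == t:
            tp[t] += 1          # correct classification
        else:
            fp[p] += 1          # predicted type was wrong
            fn[t] += 1          # true type was missed
    types = set(tp) | set(fp) | set(fn)
    return {
        dt: {
            "precision": tp[dt] / (tp[dt] + fp[dt]) if tp[dt] + fp[dt] else 0.0,
            "recall": tp[dt] / (tp[dt] + fn[dt]) if tp[dt] + fn[dt] else 0.0,
        }
        for dt in types
    }
```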
Implementation Phases
Phase 2.1: Architecture (Week 1)
- Create ADR-019 (this document)
- Create C4 diagrams
- Design consensus algorithm pseudocode
Phase 2.2: Analyst Agents (Week 2)
- Implement Structural Analyst
- Implement Content Analyst
- Implement Metadata Analyst
- Implement Semantic Analyst
- Implement Pattern Analyst
Phase 2.3: Judge Agents (Week 3)
- Implement Consistency Judge
- Implement Quality Judge
- Implement Domain Judge
Phase 2.4: Orchestration (Week 4)
- Implement Orchestrator
- Implement consensus engine
- Implement audit trail system
Phase 2.5: Validation (Week 5)
- Pilot classification (100 docs)
- Full classification (18,000+ docs)
- Accuracy validation
Consequences
Positive
- Accuracy: Multi-agent consensus reduces single-point-of-failure errors
- Explainability: Full audit trail for every decision
- Scalability: Parallel analyst execution handles large document sets
- Quality Assurance: Judge layer catches edge cases
- Automation: Reduces manual classification effort by 95%+
Negative
- Complexity: Multi-agent system requires careful orchestration
- Cost: Semantic Analyst requires LLM API calls
- Latency: Full pipeline takes 2-5 seconds per document
Mitigations
- Complexity: Well-defined interfaces and comprehensive testing
- Cost: Batch processing and caching for similar documents
- Latency: Parallel analyst execution, async processing for bulk operations
References
- ADR-018 (document type taxonomy)
Author: AI Specialist Agent
Reviewed By: Pending stakeholder review
Approved By: Pending