ADR-026: Intent Classification System for LLM Inference Management
Status: Accepted
Date: 2026-01-02
Deciders: Hal Casteel (Founder/CEO/CTO), CODITECT Core Team
Technical Story: Enable accurate task routing and disambiguation through systematic intent classification
Context and Problem Statement
The Inference Ambiguity Problem
LLM-based systems face significant challenges when user requests are ambiguous:
- Vague Requests - "Help with the backend" could mean 50+ different actions
- Multiple Valid Interpretations - "Improve the API" has many valid solutions
- Scope Uncertainty - Unclear what's in/out of scope for a request
- Agent Routing - Wrong agent selection leads to suboptimal outputs
Impact: Without intent classification, systems typically fail in one of three ways:
- Guess incorrectly (hallucination risk)
- Ask too many questions (poor UX)
- Produce out-of-scope work (wasted effort)
Related Work
This ADR builds on:
- ADR-007 (Uncertainty Quantification) - Measuring confidence levels
- ADR-008 (MoE Analysis Framework) - Multi-expert routing patterns
- ADR-009 (MoE Judges Framework) - Validation and disambiguation
Academic Research Foundation
Recent research (2024-2025) informs this architecture:
| Paper | Key Insight | Application |
|---|---|---|
| Resolving Ambiguity Through Interaction (NAACL 2025) | Three-stage pipeline: identify ambiguous inputs, generate clarification questions, predict output with clarification | Classification pipeline stages |
| Ambiguity in LLMs is a Concept Missing Problem (arXiv 2025) | Sample multiple outputs to identify ambiguity via consistency analysis | Confidence calculation via sampling |
| Aligning Language Models to Handle Ambiguity (arXiv 2024) | Tree-of-Clarification (ToC) refines ambiguity within inputs | Structured clarification generation |
| Bridging Gap Between LLMs and Human Intentions (arXiv 2025) | LLMs struggle with ambiguous inputs, defaulting to training data preferences | Context enrichment to override defaults |
| Symbolic MoE: Adaptive Skill-based Routing (arXiv 2025) | Infer discrete skills needed to solve a problem for expert selection | Domain and action_type classification |
Decision Drivers
- Accuracy - Correctly interpret user intent >90% of the time
- Efficiency - Minimize clarification questions while maintaining accuracy
- Traceability - Document why specific interpretations were chosen
- Integration - Work with existing MoE router and agent framework
Considered Options
Option 1: Rule-Based Classification (Rejected)
- Keyword matching and regex patterns
- Pro: Fast, deterministic
- Con: Brittle, doesn't handle novel requests
Option 2: Embedding-Based Similarity (Partial)
- Compare request embeddings to known intent clusters
- Pro: Handles paraphrasing well
- Con: Requires training data, doesn't use context
Option 3: Context-Aware Multi-Signal Classification (Chosen)
- Combine lexical, contextual, and historical signals
- Use project plan and decisions as context
- Pro: Accurate, context-aware, traceable
- Con: More complex, requires context infrastructure
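For reference, Option 2's core mechanism reduces to nearest-centroid matching over embeddings. A minimal sketch of that comparison (toy vectors; a real system would use model-generated embeddings and learned cluster centroids):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_intent(request_vec: list[float], clusters: dict[str, list[float]]) -> str:
    """Return the intent cluster whose centroid is most similar to the request."""
    return max(clusters, key=lambda name: cosine(request_vec, clusters[name]))
```

This sketch also shows why Option 2 was only "partial": similarity to a centroid carries no project context, which is exactly what Option 3 adds.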
Decision
Implement Context-Aware Multi-Signal Intent Classification with the following architecture:
High-Level Architecture (Mermaid)
Clarification Flow (Mermaid)
Classification Dimensions
```yaml
intent_dimensions:
  domain:
    values: [backend, frontend, devops, security, documentation, architecture]
    weight: 0.25
    source: "Keyword analysis + file patterns"
  action_type:
    values: [implement, fix, review, design, test, deploy, document]
    weight: 0.25
    source: "Verb extraction + task history"
  scope:
    values: [specific_file, component, module, system, cross_cutting]
    weight: 0.20
    source: "Entity extraction + plan alignment"
  complexity:
    values: [trivial, simple, moderate, complex, architectural]
    weight: 0.15
    source: "Estimated effort + dependency analysis"
  urgency:
    values: [blocking, high, normal, low, backlog]
    weight: 0.15
    source: "Explicit markers + context inference"
```
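The per-dimension weights combine into a single overall score by weighted sum. A minimal sketch (the function name is an assumption; the weights are copied from the dimension config above and sum to 1.0):

```python
# Weights copied from the intent_dimensions config above; they sum to 1.0.
DIMENSION_WEIGHTS = {
    "domain": 0.25,
    "action_type": 0.25,
    "scope": 0.20,
    "complexity": 0.15,
    "urgency": 0.15,
}

def aggregate_confidence(per_dimension: dict[str, float]) -> float:
    """Weighted sum of per-dimension confidences (each in [0, 1])."""
    return sum(DIMENSION_WEIGHTS[dim] * conf for dim, conf in per_dimension.items())
```

With the example per-dimension confidences from the output schema later in this ADR (0.92, 0.88, 0.75, 0.80, 0.95), this yields 0.8625, consistent with the 0.86 overall_confidence shown there.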
Classification Pipeline
┌─────────────────────────────────────────────────────────────────────────────┐
│ INTENT CLASSIFICATION PIPELINE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ USER REQUEST │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ STAGE 1: LEXICAL ANALYSIS │ │
│ │ • Extract verbs (implement, fix, add, remove, update) │ │
│ │ • Extract entities (files, components, endpoints) │ │
│ │ • Detect vague terms (improve, help, fix, handle) │ │
│ │ • Output: lexical_features │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ STAGE 2: CONTEXT ENRICHMENT │ │
│ │ • Load active plan (PILOT-PARALLEL-EXECUTION-PLAN.md) │ │
│ │ • Load current task and track │ │
│ │ • Query /cxq for relevant decisions │ │
│ │ • Output: context_features │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ STAGE 3: MULTI-SIGNAL CLASSIFICATION │ │
│ │ • Classify each dimension independently │ │
│ │ • Calculate confidence per dimension │ │
│ │ • Aggregate weighted confidence │ │
│ │ • Output: classification_result │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ STAGE 4: CONFIDENCE CHECK │ │
│ │ │ │
│ │ if confidence >= 0.85: │ │
│ │ → Route to agent (ADR-008 MoE Router) │ │
│ │ │ │
│ │ if confidence 0.50-0.85: │ │
│ │ → Generate clarifying question │ │
│ │ → Present top interpretations with tradeoffs │ │
│ │ │ │
│ │ if confidence < 0.50: │ │
│ │ → Request more context from user │ │
│ │ → Suggest breaking down the request │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
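Stage 4's three-way threshold check can be captured in a small dispatch function. A sketch, where the thresholds are the ADR's own values and the action names are placeholders:

```python
ROUTE_THRESHOLD = 0.85     # at or above: route directly to an agent
CLARIFY_THRESHOLD = 0.50   # between thresholds: ask a clarifying question

def confidence_action(confidence: float) -> str:
    """Map overall classification confidence to the Stage 4 outcome."""
    if confidence >= ROUTE_THRESHOLD:
        return "route"            # hand off to the MoE router (ADR-008)
    if confidence >= CLARIFY_THRESHOLD:
        return "clarify"          # present top interpretations with tradeoffs
    return "request_context"      # ask for more context; suggest decomposition
```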
Integration with MoE Router
```python
# intent_classifier, moe_router, guardrail_engine, CLARIFICATION_THRESHOLD,
# and the Context/RoutingDecision types are collaborators defined elsewhere.
def classify_and_route(request: str, context: Context) -> RoutingDecision:
    """Intent classification integrated with MoE routing."""
    # Stage 1: Classify intent
    classification = intent_classifier.classify(request, context)

    # Stage 2: Check confidence threshold
    if classification.confidence < CLARIFICATION_THRESHOLD:
        return RoutingDecision(
            action="clarify",
            questions=generate_clarification_questions(classification),
            alternatives=classification.top_interpretations[:3],
        )

    # Stage 3: Select agent based on intent (ADR-008)
    agent = moe_router.select_agent(
        domain=classification.domain,
        action_type=classification.action_type,
        complexity=classification.complexity,
    )

    # Stage 4: Apply guardrails (ADR-027)
    guardrail_check = guardrail_engine.validate(
        intent=classification,
        agent=agent,
        context=context,
    )
    if not guardrail_check.passed:
        return RoutingDecision(
            action="blocked",
            reason=guardrail_check.reason,
            remediation=guardrail_check.suggested_action,
        )

    return RoutingDecision(
        action="execute",
        agent=agent,
        intent=classification,
        confidence=classification.confidence,
    )
```
Classification Output Schema
```yaml
intent_classification:
  request: "Original user request"
  timestamp: "2026-01-02T10:30:00Z"
  classification:
    domain: "backend"
    domain_confidence: 0.92
    action_type: "implement"
    action_confidence: 0.88
    scope: "component"
    scope_confidence: 0.75
    complexity: "moderate"
    complexity_confidence: 0.80
    urgency: "normal"
    urgency_confidence: 0.95
    overall_confidence: 0.86
  context_used:
    active_plan: "PILOT-PARALLEL-EXECUTION-PLAN.md"
    current_track: "A.3 Commerce API"
    current_task: "A.3.3 Checkout endpoint"
    relevant_decisions:
      - "AUTH-001: Use JWT with 1hr expiry"
      - "API-003: RESTful conventions"
  interpretations:
    - interpretation: "Implement checkout validation"
      confidence: 0.86
      rationale: "Matches current task A.3.3"
    - interpretation: "Add payment processing"
      confidence: 0.12
      rationale: "Payment is task A.3.4, not current"
  recommended_agent: "rust-expert-developer"
  routing_confidence: 0.88
```
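A consumer of this record might combine the overall confidence with the top-ranked interpretation as follows (a sketch: the key names come from the schema above, the thresholds from the Stage 4 confidence check):

```python
def routing_summary(record: dict) -> tuple[str, str]:
    """Return (Stage 4 action, best interpretation) for a classification record."""
    best = max(record["interpretations"], key=lambda i: i["confidence"])
    conf = record["classification"]["overall_confidence"]
    if conf >= 0.85:
        action = "execute"
    elif conf >= 0.50:
        action = "clarify"
    else:
        action = "request_context"
    return action, best["interpretation"]
```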
Consequences
Positive
- Reduced Hallucination - Context-aware classification prevents misinterpretation
- Better UX - Fewer unnecessary clarification questions
- Traceable Decisions - Full audit trail of why interpretations were chosen
- Agent Optimization - Right agent selected for each task type
Negative
- Context Dependency - Requires well-maintained project plans
- Initial Setup - Needs training on project-specific patterns
- Latency - Multi-stage pipeline adds processing time
Risks
- Over-reliance on Plan - May miss valid requests outside the current plan
  - Mitigation: Support "out-of-scope" handling with user confirmation
- Classification Drift - Learned patterns may become stale
  - Mitigation: Continual learning from user corrections
Implementation
Phase 1: Core Classification (Week 1-2)
- Implement lexical analyzer
- Build context enrichment pipeline
- Create multi-signal classifier
- Add confidence calculation
Phase 2: Integration (Week 3-4)
- Integrate with MoE router (ADR-008)
- Connect to guardrail engine (ADR-027)
- Add clarification question generation
- Implement audit logging
Phase 3: Learning (Week 5-6)
- Add user feedback collection
- Implement pattern learning
- Create classification dashboard
- Performance optimization
Related Components
- commands/which.md - Intent classification command
- agents/intent-classifier.md - Classification agent
- skills/intent-classification/SKILL.md - Classification skill
- scripts/core/intent_classifier.py - Core implementation
References
Internal ADRs
- ADR-007: Uncertainty Quantification Framework
- ADR-008: MoE Analysis Framework
- ADR-009: MoE Judges Framework
- ADR-027: Guardrail Engine (companion ADR)
- WHAT-IS-CODITECT.md: Intent Management & Inference Guardrails section
Academic References
Disambiguation & Clarification:
- Resolving Ambiguity Through Interaction with LMs - NAACL 2025
- Ambiguity in LLMs is a Concept Missing Problem - arXiv 2025
- Aligning Language Models to Explicitly Handle Ambiguity - arXiv 2024
- Bridging the Gap Between LLMs and Human Intentions - arXiv 2025
- Disambiguation in Conversational Question Answering - EMNLP 2025
MoE Routing & Expert Selection:
- Symbolic MoE: Adaptive Skill-based Routing - arXiv 2025
- Mixture of Experts in Large Language Models - arXiv 2025
- Visual Guide to Mixture of Experts
Uncertainty Estimation:
- Survey of Uncertainty Estimation in LLMs - ACM 2025
Suggested Search Terms for Further Research
| Category | Search Terms |
|---|---|
| Disambiguation | LLM clarification question generation, conversational ambiguity resolution, intent disambiguation neural |
| Routing | task routing LLM agents, expert selection multi-agent, query-level MoE routing |
| Uncertainty | semantic entropy LLM, self-consistency prompting, uncertainty of thoughts |
| Classification | intent classification transformer, multi-label intent detection, hierarchical intent recognition |
| Evaluation | disambiguation benchmark, ambiguity detection evaluation, intent classification metrics |