ADR-026: Intent Classification System for LLM Inference Management

Status: Accepted
Date: 2026-01-02
Deciders: Hal Casteel (Founder/CEO/CTO), CODITECT Core Team
Technical Story: Enable accurate task routing and disambiguation through systematic intent classification


Context and Problem Statement

The Inference Ambiguity Problem

LLM-based systems face significant challenges when user requests are ambiguous:

  1. Vague Requests - "Help with the backend" could mean 50+ different actions
  2. Multiple Valid Interpretations - "Improve the API" has many valid solutions
  3. Scope Uncertainty - Unclear what's in/out of scope for a request
  4. Agent Routing - Wrong agent selection leads to suboptimal outputs

Impact: Without intent classification, systems tend to:

  • Guess incorrectly (hallucination risk)
  • Ask too many questions (poor UX)
  • Produce out-of-scope work (wasted effort)

This ADR builds on:

  • ADR-007 (Uncertainty Quantification) - Measuring confidence levels
  • ADR-008 (MoE Analysis Framework) - Multi-expert routing patterns
  • ADR-009 (MoE Judges Framework) - Validation and disambiguation

Academic Research Foundation

Recent research (2024-2025) informs this architecture:

| Paper | Key Insight | Application |
| --- | --- | --- |
| Resolving Ambiguity Through Interaction (NAACL 2025) | Three-stage pipeline: identify ambiguous inputs, generate clarification questions, predict output with clarification | Classification pipeline stages |
| Ambiguity in LLMs is a Concept Missing Problem (arXiv 2025) | Sample multiple outputs to identify ambiguity via consistency analysis | Confidence calculation via sampling |
| Aligning Language Models to Handle Ambiguity (arXiv 2024) | Tree-of-Clarification (ToC) refines ambiguity within inputs | Structured clarification generation |
| Bridging Gap Between LLMs and Human Intentions (arXiv 2025) | LLMs struggle with ambiguous inputs, defaulting to training data preferences | Context enrichment to override defaults |
| Symbolic MoE: Adaptive Skill-based Routing (arXiv 2025) | Infer discrete skills needed to solve a problem for expert selection | Domain and action_type classification |

Decision Drivers

  1. Accuracy - Correctly interpret user intent >90% of the time
  2. Efficiency - Minimize clarification questions while maintaining accuracy
  3. Traceability - Document why specific interpretations were chosen
  4. Integration - Work with existing MoE router and agent framework

Considered Options

Option 1: Rule-Based Classification (Rejected)

  • Keyword matching and regex patterns
  • Pro: Fast, deterministic
  • Con: Brittle, doesn't handle novel requests

Option 2: Embedding-Based Similarity (Partial)

  • Compare request embeddings to known intent clusters
  • Pro: Handles paraphrasing well
  • Con: Requires training data, doesn't use context

Option 3: Context-Aware Multi-Signal Classification (Chosen)

  • Combine lexical, contextual, and historical signals
  • Use project plan and decisions as context
  • Pro: Accurate, context-aware, traceable
  • Con: More complex, requires context infrastructure

Decision

Implement Context-Aware Multi-Signal Intent Classification with the following architecture:

High-Level Architecture (Mermaid)

Clarification Flow (Mermaid)

Classification Dimensions

intent_dimensions:
  domain:
    values: [backend, frontend, devops, security, documentation, architecture]
    weight: 0.25
    source: "Keyword analysis + file patterns"

  action_type:
    values: [implement, fix, review, design, test, deploy, document]
    weight: 0.25
    source: "Verb extraction + task history"

  scope:
    values: [specific_file, component, module, system, cross_cutting]
    weight: 0.20
    source: "Entity extraction + plan alignment"

  complexity:
    values: [trivial, simple, moderate, complex, architectural]
    weight: 0.15
    source: "Estimated effort + dependency analysis"

  urgency:
    values: [blocking, high, normal, low, backlog]
    weight: 0.15
    source: "Explicit markers + context inference"
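Aggregation across dimensions is a weighted sum using the weights above (which sum to 1.0). A minimal sketch, using the per-dimension confidences from the example output schema later in this ADR:

```python
# Weights copied from the intent_dimensions config above; they sum to 1.0.
DIMENSION_WEIGHTS = {
    "domain": 0.25,
    "action_type": 0.25,
    "scope": 0.20,
    "complexity": 0.15,
    "urgency": 0.15,
}

def overall_confidence(per_dimension: dict) -> float:
    """Weighted sum of per-dimension confidences."""
    return sum(DIMENSION_WEIGHTS[d] * c for d, c in per_dimension.items())

score = overall_confidence({
    "domain": 0.92, "action_type": 0.88, "scope": 0.75,
    "complexity": 0.80, "urgency": 0.95,
})
```

This yields 0.8625, which rounds to the 0.86 `overall_confidence` shown in the example classification output.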

Classification Pipeline

┌─────────────────────────────────────────────────────────────────────────────┐
│ INTENT CLASSIFICATION PIPELINE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ USER REQUEST │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ STAGE 1: LEXICAL ANALYSIS │ │
│ │ • Extract verbs (implement, fix, add, remove, update) │ │
│ │ • Extract entities (files, components, endpoints) │ │
│ │ • Detect vague terms (improve, help, fix, handle) │ │
│ │ • Output: lexical_features │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ STAGE 2: CONTEXT ENRICHMENT │ │
│ │ • Load active plan (PILOT-PARALLEL-EXECUTION-PLAN.md) │ │
│ │ • Load current task and track │ │
│ │ • Query /cxq for relevant decisions │ │
│ │ • Output: context_features │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ STAGE 3: MULTI-SIGNAL CLASSIFICATION │ │
│ │ • Classify each dimension independently │ │
│ │ • Calculate confidence per dimension │ │
│ │ • Aggregate weighted confidence │ │
│ │ • Output: classification_result │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ STAGE 4: CONFIDENCE CHECK │ │
│ │ │ │
│ │ if confidence >= 0.85: │ │
│ │ → Route to agent (ADR-008 MoE Router) │ │
│ │ │ │
│ │ if confidence 0.50-0.85: │ │
│ │ → Generate clarifying question │ │
│ │ → Present top interpretations with tradeoffs │ │
│ │ │ │
│ │ if confidence < 0.50: │ │
│ │ → Request more context from user │ │
│ │ → Suggest breaking down the request │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
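Stage 1 of the pipeline can be sketched with simple token matching. The verb and vague-term lists come from the diagram above; the regex and the entity heuristic (path-like tokens containing `.` or `/`) are illustrative assumptions, not the production analyzer.

```python
import re

# Verb and vague-term vocabularies taken from Stage 1 in the diagram above;
# note "fix" appears in both, since it is precise only with a named target.
ACTION_VERBS = {"implement", "fix", "add", "remove", "update", "review",
                "design", "test", "deploy", "document"}
VAGUE_TERMS = {"improve", "help", "fix", "handle"}

def lexical_features(request: str) -> dict:
    """Stage 1 sketch: extract verbs, file-like entities, and vague terms."""
    tokens = re.findall(r"[A-Za-z_][\w./-]*", request.lower())
    return {
        "verbs": sorted(set(tokens) & ACTION_VERBS),
        "entities": sorted(t for t in tokens if "." in t or "/" in t),
        "vague_terms": sorted(set(tokens) & VAGUE_TERMS),
    }

feats = lexical_features("Help improve checkout.rs and fix the api/cart endpoint")
```

Here `feats` flags two vague terms alongside one concrete verb and two entities, giving the later stages both a disambiguation signal and anchors to match against the plan.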

Integration with MoE Router

def classify_and_route(request: str, context: Context) -> RoutingDecision:
    """Intent classification integrated with MoE routing."""

    # Stage 1: Classify intent
    classification = intent_classifier.classify(request, context)

    # Stage 2: Check confidence threshold
    if classification.confidence < CLARIFICATION_THRESHOLD:
        return RoutingDecision(
            action="clarify",
            questions=generate_clarification_questions(classification),
            alternatives=classification.top_interpretations[:3],
        )

    # Stage 3: Select agent based on intent (ADR-008)
    agent = moe_router.select_agent(
        domain=classification.domain,
        action_type=classification.action_type,
        complexity=classification.complexity,
    )

    # Stage 4: Apply guardrails (ADR-027)
    guardrail_check = guardrail_engine.validate(
        intent=classification,
        agent=agent,
        context=context,
    )

    if not guardrail_check.passed:
        return RoutingDecision(
            action="blocked",
            reason=guardrail_check.reason,
            remediation=guardrail_check.suggested_action,
        )

    return RoutingDecision(
        action="execute",
        agent=agent,
        intent=classification,
        confidence=classification.confidence,
    )
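The `generate_clarification_questions` helper referenced above is not specified in this ADR; one plausible sketch asks a targeted question per low-confidence dimension and then offers the top interpretations as explicit choices. The dataclasses and field names below are illustrative stand-ins for the real classification object, and the 0.70 per-dimension floor is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class Interpretation:
    interpretation: str
    confidence: float
    rationale: str

@dataclass
class IntentClassification:
    dimension_confidences: dict          # e.g. {"scope": 0.55, "domain": 0.92}
    top_interpretations: list = field(default_factory=list)

def generate_clarification_questions(c: IntentClassification,
                                     floor: float = 0.70) -> list:
    """One targeted question per low-confidence dimension (least certain
    first), then the candidate interpretations offered as choices."""
    questions = [
        f"Could you narrow down the {dim}? (confidence {conf:.2f})"
        for dim, conf in sorted(c.dimension_confidences.items(),
                                key=lambda kv: kv[1])
        if conf < floor
    ]
    if c.top_interpretations:
        options = "; ".join(i.interpretation for i in c.top_interpretations[:3])
        questions.append(f"Did you mean one of: {options}?")
    return questions

example = IntentClassification(
    dimension_confidences={"scope": 0.55, "domain": 0.92},
    top_interpretations=[
        Interpretation("Implement checkout validation", 0.86, "matches A.3.3"),
        Interpretation("Add payment processing", 0.12, "future task"),
    ],
)
questions = generate_clarification_questions(example)
```

Asking about the least certain dimension first keeps the exchange short: resolving `scope` alone is often enough to push the aggregate confidence over the routing threshold.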

Classification Output Schema

intent_classification:
  request: "Original user request"
  timestamp: "2026-01-02T10:30:00Z"

  classification:
    domain: "backend"
    domain_confidence: 0.92
    action_type: "implement"
    action_confidence: 0.88
    scope: "component"
    scope_confidence: 0.75
    complexity: "moderate"
    complexity_confidence: 0.80
    urgency: "normal"
    urgency_confidence: 0.95

  overall_confidence: 0.86

  context_used:
    active_plan: "PILOT-PARALLEL-EXECUTION-PLAN.md"
    current_track: "A.3 Commerce API"
    current_task: "A.3.3 Checkout endpoint"
    relevant_decisions:
      - "AUTH-001: Use JWT with 1hr expiry"
      - "API-003: RESTful conventions"

  interpretations:
    - interpretation: "Implement checkout validation"
      confidence: 0.86
      rationale: "Matches current task A.3.3"
    - interpretation: "Add payment processing"
      confidence: 0.12
      rationale: "Payment is task A.3.4, not current"

  recommended_agent: "rust-expert-developer"
  routing_confidence: 0.88
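A consumer of this schema applies the Stage 4 thresholds and picks the leading interpretation. A minimal sketch, treating the parsed YAML as a plain dict (thresholds are the 0.85/0.50 bands from the pipeline above):

```python
def decide(result: dict) -> str:
    """Stage 4 confidence check over a parsed classification result."""
    conf = result["overall_confidence"]
    if conf >= 0.85:
        return "route"            # hand off to the MoE router (ADR-008)
    if conf >= 0.50:
        return "clarify"          # generate a clarifying question
    return "request_context"      # too ambiguous: ask for more context

def best_interpretation(result: dict) -> str:
    """Highest-confidence candidate interpretation."""
    top = max(result["interpretations"], key=lambda i: i["confidence"])
    return top["interpretation"]

# Abbreviated version of the example schema above:
example = {
    "overall_confidence": 0.86,
    "interpretations": [
        {"interpretation": "Implement checkout validation", "confidence": 0.86},
        {"interpretation": "Add payment processing", "confidence": 0.12},
    ],
}
action = decide(example)               # "route"
choice = best_interpretation(example)  # "Implement checkout validation"
```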

Consequences

Positive

  1. Reduced Hallucination - Context-aware classification prevents misinterpretation
  2. Better UX - Fewer unnecessary clarification questions
  3. Traceable Decisions - Full audit trail of why interpretations were chosen
  4. Agent Optimization - Right agent selected for each task type

Negative

  1. Context Dependency - Requires well-maintained project plans
  2. Initial Setup - Needs training on project-specific patterns
  3. Latency - Multi-stage pipeline adds processing time

Risks

  1. Over-reliance on Plan - May miss valid requests outside current plan

    • Mitigation: Support "out-of-scope" handling with user confirmation
  2. Classification Drift - Patterns may become stale

    • Mitigation: Continual learning from user corrections

Implementation

Phase 1: Core Classification (Week 1-2)

  • Implement lexical analyzer
  • Build context enrichment pipeline
  • Create multi-signal classifier
  • Add confidence calculation

Phase 2: Integration (Week 3-4)

  • Integrate with MoE router (ADR-008)
  • Connect to guardrail engine (ADR-027)
  • Add clarification question generation
  • Implement audit logging

Phase 3: Learning (Week 5-6)

  • Add user feedback collection
  • Implement pattern learning
  • Create classification dashboard
  • Performance optimization

Related Files

  • commands/which.md - Intent classification command
  • agents/intent-classifier.md - Classification agent
  • skills/intent-classification/SKILL.md - Classification skill
  • scripts/core/intent_classifier.py - Core implementation

References

Internal ADRs

  • ADR-007: Uncertainty Quantification Framework
  • ADR-008: MoE Analysis Framework
  • ADR-009: MoE Judges Framework
  • ADR-027: Guardrail Engine (companion ADR)
  • WHAT-IS-CODITECT.md: Intent Management & Inference Guardrails section

Academic References

Disambiguation & Clarification:

MoE Routing & Expert Selection:

Uncertainty Estimation:

Suggested Search Terms for Further Research

| Category | Search Terms |
| --- | --- |
| Disambiguation | LLM clarification question generation, conversational ambiguity resolution, intent disambiguation neural |
| Routing | task routing LLM agents, expert selection multi-agent, query-level MoE routing |
| Uncertainty | semantic entropy LLM, self-consistency prompting, uncertainty of thoughts |
| Classification | intent classification transformer, multi-label intent detection, hierarchical intent recognition |
| Evaluation | disambiguation benchmark, ambiguity detection evaluation, intent classification metrics |