ADR-026: Intent Classification System for LLM Inference Management

Status: Accepted
Date: 2026-01-02
Deciders: Hal Casteel (Founder/CEO/CTO), CODITECT Core Team
Technical Story: Enable accurate task routing and disambiguation through systematic intent classification


Context and Problem Statement

The Inference Ambiguity Problem

LLM-based systems face significant challenges when user requests are ambiguous:

  1. Vague Requests - "Help with the backend" could mean 50+ different actions
  2. Multiple Valid Interpretations - "Improve the API" has many valid solutions
  3. Scope Uncertainty - Unclear what's in/out of scope for a request
  4. Agent Routing - Wrong agent selection leads to suboptimal outputs

Impact: Without intent classification, systems tend to:

  • Guess incorrectly (hallucination risk)
  • Ask too many questions (poor UX)
  • Produce out-of-scope work (wasted effort)

This ADR builds on:

  • ADR-007 (Uncertainty Quantification) - Measuring confidence levels
  • ADR-008 (MoE Analysis Framework) - Multi-expert routing patterns
  • ADR-009 (MoE Judges Framework) - Validation and disambiguation

Academic Research Foundation

Recent research (2024-2025) informs this architecture:

| Paper | Key Insight | Application |
| --- | --- | --- |
| Resolving Ambiguity Through Interaction (NAACL 2025) | Three-stage pipeline: identify ambiguous inputs, generate clarification questions, predict output with clarification | Classification pipeline stages |
| Ambiguity in LLMs is a Concept Missing Problem (arXiv 2025) | Sample multiple outputs to identify ambiguity via consistency analysis | Confidence calculation via sampling |
| Aligning Language Models to Handle Ambiguity (arXiv 2024) | Tree-of-Clarification (ToC) refines ambiguity within inputs | Structured clarification generation |
| Bridging Gap Between LLMs and Human Intentions (arXiv 2025) | LLMs struggle with ambiguous inputs, defaulting to training data preferences | Context enrichment to override defaults |
| Symbolic MoE: Adaptive Skill-based Routing (arXiv 2025) | Infer discrete skills needed to solve a problem for expert selection | Domain and action_type classification |

Decision Drivers

  1. Accuracy - Correctly interpret user intent >90% of the time
  2. Efficiency - Minimize clarification questions while maintaining accuracy
  3. Traceability - Document why specific interpretations were chosen
  4. Integration - Work with existing MoE router and agent framework

Considered Options

Option 1: Rule-Based Classification (Rejected)

  • Keyword matching and regex patterns
  • Pro: Fast, deterministic
  • Con: Brittle, doesn't handle novel requests

Option 2: Embedding-Based Similarity (Partial)

  • Compare request embeddings to known intent clusters
  • Pro: Handles paraphrasing well
  • Con: Requires training data, doesn't use context

Option 3: Context-Aware Multi-Signal Classification (Chosen)

  • Combine lexical, contextual, and historical signals
  • Use project plan and decisions as context
  • Pro: Accurate, context-aware, traceable
  • Con: More complex, requires context infrastructure

Decision

Implement Context-Aware Multi-Signal Intent Classification with the following architecture:

High-Level Architecture (Mermaid)

Clarification Flow (Mermaid)

Classification Dimensions

intent_dimensions:
  domain:
    values: [backend, frontend, devops, security, documentation, architecture]
    weight: 0.25
    source: "Keyword analysis + file patterns"

  action_type:
    values: [implement, fix, review, design, test, deploy, document]
    weight: 0.25
    source: "Verb extraction + task history"

  scope:
    values: [specific_file, component, module, system, cross_cutting]
    weight: 0.20
    source: "Entity extraction + plan alignment"

  complexity:
    values: [trivial, simple, moderate, complex, architectural]
    weight: 0.15
    source: "Estimated effort + dependency analysis"

  urgency:
    values: [blocking, high, normal, low, backlog]
    weight: 0.15
    source: "Explicit markers + context inference"
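Aggregation across dimensions is a weighted sum using the weights above (which sum to 1.0). A minimal sketch, using the per-dimension confidences from the example output schema later in this ADR:

```python
# Weights copied from the intent_dimensions config above; they sum to 1.0.
DIMENSION_WEIGHTS = {
    "domain": 0.25,
    "action_type": 0.25,
    "scope": 0.20,
    "complexity": 0.15,
    "urgency": 0.15,
}

def overall_confidence(per_dimension: dict) -> float:
    """Weighted sum of per-dimension confidences."""
    return sum(DIMENSION_WEIGHTS[d] * c for d, c in per_dimension.items())

score = overall_confidence({
    "domain": 0.92, "action_type": 0.88, "scope": 0.75,
    "complexity": 0.80, "urgency": 0.95,
})
```

This yields 0.8625, which rounds to the 0.86 `overall_confidence` shown in the example classification output.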

Classification Pipeline

┌─────────────────────────────────────────────────────────────────────────────┐
│ INTENT CLASSIFICATION PIPELINE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ USER REQUEST │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ STAGE 1: LEXICAL ANALYSIS │ │
│ │ • Extract verbs (implement, fix, add, remove, update) │ │
│ │ • Extract entities (files, components, endpoints) │ │
│ │ • Detect vague terms (improve, help, fix, handle) │ │
│ │ • Output: lexical_features │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ STAGE 2: CONTEXT ENRICHMENT │ │
│ │ • Load active plan (PILOT-PARALLEL-EXECUTION-PLAN.md) │ │
│ │ • Load current task and track │ │
│ │ • Query /cxq for relevant decisions │ │
│ │ • Output: context_features │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ STAGE 3: MULTI-SIGNAL CLASSIFICATION │ │
│ │ • Classify each dimension independently │ │
│ │ • Calculate confidence per dimension │ │
│ │ • Aggregate weighted confidence │ │
│ │ • Output: classification_result │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ STAGE 4: CONFIDENCE CHECK │ │
│ │ │ │
│ │ if confidence >= 0.85: │ │
│ │ → Route to agent (ADR-008 MoE Router) │ │
│ │ │ │
│ │ if confidence 0.50-0.85: │ │
│ │ → Generate clarifying question │ │
│ │ → Present top interpretations with tradeoffs │ │
│ │ │ │
│ │ if confidence < 0.50: │ │
│ │ → Request more context from user │ │
│ │ → Suggest breaking down the request │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
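Stage 1 of the pipeline can be sketched with simple token matching. The verb and vague-term lists come from the diagram above; the regex and the entity heuristic (path-like tokens containing `.` or `/`) are illustrative assumptions, not the production analyzer.

```python
import re

# Verb and vague-term vocabularies taken from Stage 1 in the diagram above;
# note "fix" appears in both, since it is precise only with a named target.
ACTION_VERBS = {"implement", "fix", "add", "remove", "update", "review",
                "design", "test", "deploy", "document"}
VAGUE_TERMS = {"improve", "help", "fix", "handle"}

def lexical_features(request: str) -> dict:
    """Stage 1 sketch: extract verbs, file-like entities, and vague terms."""
    tokens = re.findall(r"[A-Za-z_][\w./-]*", request.lower())
    return {
        "verbs": sorted(set(tokens) & ACTION_VERBS),
        "entities": sorted(t for t in tokens if "." in t or "/" in t),
        "vague_terms": sorted(set(tokens) & VAGUE_TERMS),
    }

feats = lexical_features("Help improve checkout.rs and fix the api/cart endpoint")
```

Here `feats` flags two vague terms alongside one concrete verb and two entities, giving the later stages both a disambiguation signal and anchors to match against the plan.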

Integration with MoE Router

def classify_and_route(request: str, context: Context) -> RoutingDecision:
    """Intent classification integrated with MoE routing."""

    # Stage 1: Classify intent
    classification = intent_classifier.classify(request, context)

    # Stage 2: Check confidence threshold
    if classification.confidence < CLARIFICATION_THRESHOLD:
        return RoutingDecision(
            action="clarify",
            questions=generate_clarification_questions(classification),
            alternatives=classification.top_interpretations[:3],
        )

    # Stage 3: Select agent based on intent (ADR-008)
    agent = moe_router.select_agent(
        domain=classification.domain,
        action_type=classification.action_type,
        complexity=classification.complexity,
    )

    # Stage 4: Apply guardrails (ADR-027)
    guardrail_check = guardrail_engine.validate(
        intent=classification,
        agent=agent,
        context=context,
    )

    if not guardrail_check.passed:
        return RoutingDecision(
            action="blocked",
            reason=guardrail_check.reason,
            remediation=guardrail_check.suggested_action,
        )

    return RoutingDecision(
        action="execute",
        agent=agent,
        intent=classification,
        confidence=classification.confidence,
    )
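The `generate_clarification_questions` helper referenced above is not specified in this ADR; one plausible sketch asks a targeted question per low-confidence dimension and then offers the top interpretations as explicit choices. The dataclasses and field names below are illustrative stand-ins for the real classification object, and the 0.70 per-dimension floor is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class Interpretation:
    interpretation: str
    confidence: float
    rationale: str

@dataclass
class IntentClassification:
    dimension_confidences: dict          # e.g. {"scope": 0.55, "domain": 0.92}
    top_interpretations: list = field(default_factory=list)

def generate_clarification_questions(c: IntentClassification,
                                     floor: float = 0.70) -> list:
    """One targeted question per low-confidence dimension (least certain
    first), then the candidate interpretations offered as choices."""
    questions = [
        f"Could you narrow down the {dim}? (confidence {conf:.2f})"
        for dim, conf in sorted(c.dimension_confidences.items(),
                                key=lambda kv: kv[1])
        if conf < floor
    ]
    if c.top_interpretations:
        options = "; ".join(i.interpretation for i in c.top_interpretations[:3])
        questions.append(f"Did you mean one of: {options}?")
    return questions

example = IntentClassification(
    dimension_confidences={"scope": 0.55, "domain": 0.92},
    top_interpretations=[
        Interpretation("Implement checkout validation", 0.86, "matches A.3.3"),
        Interpretation("Add payment processing", 0.12, "future task"),
    ],
)
questions = generate_clarification_questions(example)
```

Asking about the least certain dimension first keeps the exchange short: resolving `scope` alone is often enough to push the aggregate confidence over the routing threshold.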

Classification Output Schema

intent_classification:
  request: "Original user request"
  timestamp: "2026-01-02T10:30:00Z"

  classification:
    domain: "backend"
    domain_confidence: 0.92
    action_type: "implement"
    action_confidence: 0.88
    scope: "component"
    scope_confidence: 0.75
    complexity: "moderate"
    complexity_confidence: 0.80
    urgency: "normal"
    urgency_confidence: 0.95

  overall_confidence: 0.86

  context_used:
    active_plan: "PILOT-PARALLEL-EXECUTION-PLAN.md"
    current_track: "A.3 Commerce API"
    current_task: "A.3.3 Checkout endpoint"
    relevant_decisions:
      - "AUTH-001: Use JWT with 1hr expiry"
      - "API-003: RESTful conventions"

  interpretations:
    - interpretation: "Implement checkout validation"
      confidence: 0.86
      rationale: "Matches current task A.3.3"
    - interpretation: "Add payment processing"
      confidence: 0.12
      rationale: "Payment is task A.3.4, not current"

  recommended_agent: "rust-expert-developer"
  routing_confidence: 0.88
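A consumer of this schema applies the Stage 4 thresholds and picks the leading interpretation. A minimal sketch, treating the parsed YAML as a plain dict (thresholds are the 0.85/0.50 bands from the pipeline above):

```python
def decide(result: dict) -> str:
    """Stage 4 confidence check over a parsed classification result."""
    conf = result["overall_confidence"]
    if conf >= 0.85:
        return "route"            # hand off to the MoE router (ADR-008)
    if conf >= 0.50:
        return "clarify"          # generate a clarifying question
    return "request_context"      # too ambiguous: ask for more context

def best_interpretation(result: dict) -> str:
    """Highest-confidence candidate interpretation."""
    top = max(result["interpretations"], key=lambda i: i["confidence"])
    return top["interpretation"]

# Abbreviated version of the example schema above:
example = {
    "overall_confidence": 0.86,
    "interpretations": [
        {"interpretation": "Implement checkout validation", "confidence": 0.86},
        {"interpretation": "Add payment processing", "confidence": 0.12},
    ],
}
action = decide(example)               # "route"
choice = best_interpretation(example)  # "Implement checkout validation"
```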

Consequences

Positive

  1. Reduced Hallucination - Context-aware classification prevents misinterpretation
  2. Better UX - Fewer unnecessary clarification questions
  3. Traceable Decisions - Full audit trail of why interpretations were chosen
  4. Agent Optimization - Right agent selected for each task type

Negative

  1. Context Dependency - Requires well-maintained project plans
  2. Initial Setup - Needs training on project-specific patterns
  3. Latency - Multi-stage pipeline adds processing time

Risks

  1. Over-reliance on Plan - May miss valid requests outside current plan

    • Mitigation: Support "out-of-scope" handling with user confirmation
  2. Classification Drift - Patterns may become stale

    • Mitigation: Continual learning from user corrections

Implementation

Phase 1: Core Classification (Week 1-2)

  • Implement lexical analyzer
  • Build context enrichment pipeline
  • Create multi-signal classifier
  • Add confidence calculation

Phase 2: Integration (Week 3-4)

  • Integrate with MoE router (ADR-008)
  • Connect to guardrail engine (ADR-027)
  • Add clarification question generation
  • Implement audit logging

Phase 3: Learning (Week 5-6)

  • Add user feedback collection
  • Implement pattern learning
  • Create classification dashboard
  • Performance optimization

Related Files

  • commands/which.md - Intent classification command
  • agents/intent-classifier.md - Classification agent
  • skills/intent-classification/SKILL.md - Classification skill
  • scripts/core/intent_classifier.py - Core implementation

References

Internal ADRs

  • ADR-007: Uncertainty Quantification Framework
  • ADR-008: MoE Analysis Framework
  • ADR-009: MoE Judges Framework
  • ADR-027: Guardrail Engine (companion ADR)
  • WHAT-IS-CODITECT.md: Intent Management & Inference Guardrails section

Academic References

Disambiguation & Clarification:

MoE Routing & Expert Selection:

Uncertainty Estimation:

Suggested Search Terms for Further Research

| Category | Search Terms |
| --- | --- |
| Disambiguation | LLM clarification question generation, conversational ambiguity resolution, intent disambiguation neural |
| Routing | task routing LLM agents, expert selection multi-agent, query-level MoE routing |
| Uncertainty | semantic entropy LLM, self-consistency prompting, uncertainty of thoughts |
| Classification | intent classification transformer, multi-label intent detection, hierarchical intent recognition |
| Evaluation | disambiguation benchmark, ambiguity detection evaluation, intent classification metrics |