
Ambiguity and Intent Research Index

Academic papers on LLM disambiguation, intent classification, and uncertainty quantification (2024-2025)

Downloaded: January 2, 2026
Total Papers: 15
Total Size: ~45 MB


Quick Reference

| # | Paper | Venue | Year | Relevance |
|----|-------|-------|------|-----------|
| 01 | Disambiguation in Conversational QA Survey | EMNLP | 2025 | Disambiguation taxonomy |
| 02 | Aligning LLMs to Handle Ambiguity | EMNLP | 2024 | APA training pipeline |
| 03 | Mixture of Experts in LLMs Survey | arXiv | 2025 | MoE routing strategies |
| 04 | OLMoE: Open Mixture-of-Experts | arXiv | 2024 | Open-source MoE |
| 05 | Uncertainty Estimation Survey | arXiv | 2024 | Theoretical frameworks |
| 06 | Uncertainty: Supervised Approach | arXiv | 2024 | Hidden activation method |
| 07 | Self-Contradictory Hallucinations | ICLR | 2024 | Contradiction detection |
| 08 | ConU: Conformal Uncertainty | EMNLP | 2024 | Statistical guarantees |
| 09 | AMBROSIA Benchmark | NeurIPS | 2024 | Ambiguity benchmark |
| 10 | MIntRec2.0 Dataset | ICLR | 2024 | Intent recognition |
| 11 | Semantic Routing for Intent | arXiv | 2024 | Deterministic routing |
| 12 | Disambiguate First, Parse Later | arXiv | 2025 | Two-stage approach |
| 13 | Difficulty-Aware Agent Orchestration | arXiv | 2024 | DAAO framework |
| 14 | RCR-Router: Role-Aware Routing | arXiv | 2025 | Context routing |
| 15 | Talk to Right Specialists | arXiv | 2025 | Multi-agent QA |

Category 1: Disambiguation and Clarification

01-Disambiguation-ConvQA-Survey-EMNLP2025.pdf

Title: Disambiguation in Conversational Question Answering in the Era of LLMs and Agents: A Survey

Authors: Md Mehrab Tanjim, Yeonjun In, Xiang Chen, Victor S. Bursztyn, Ryan A. Rossi, Sungchul Kim, et al.

Venue: EMNLP 2025 (accepted)

arXiv: 2505.12543

Abstract: Comprehensive survey examining how ambiguity remains a fundamental challenge in NLP, particularly in conversational QA systems powered by LLMs. Provides definitions, categorizes disambiguation strategies, and analyzes trade-offs.

Key Contribution: First comprehensive framework for understanding disambiguation approaches in the LLM era, including agentic systems. Classification of various disambiguation strategies enabled by modern language models.

CODITECT Relevance: Directly applicable to designing disambiguation strategies for the /which command and MoE routing. Provides taxonomy of ambiguity types and resolution approaches that can inform guardrail design.


02-Aligning-LLMs-Handle-Ambiguity-EMNLP2024.pdf

Title: Aligning Language Models to Explicitly Handle Ambiguity

Authors: Hyuhng Joon Kim, Youna Kim, Cheonbok Park, Junyeob Kim, Choonghyun Park, Kang Min Yoo, Sang-goo Lee, Taeuk Kim

Venue: EMNLP 2024

arXiv: 2404.11972

Abstract: Addresses how LLMs struggle with ambiguous inputs containing ellipsis or imprecision. Introduces Alignment with Perceived Ambiguity (APA), a pipeline that trains models to recognize and manage ambiguous inputs using uncertainty assessments.

Key Contribution: A novel training approach that uses the model's own uncertainty to detect ambiguity rather than relying on external labels. Outperforms training on gold-standard labels in out-of-distribution scenarios.

CODITECT Relevance: The APA pipeline could be adapted for MoE confidence scoring. Using model uncertainty to trigger clarification is directly applicable to the disambiguation workflow.
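
As a rough illustration of the uncertainty-triggered clarification idea, the sketch below treats disagreement across repeated model samples as an ambiguity signal and routes to clarification when it is high. All names are illustrative; the paper's APA pipeline trains the model rather than gating at inference time.

```python
# Hypothetical sketch: gate on sample disagreement as an ambiguity proxy.
# `samples` stands in for repeated model generations for the same query.
from collections import Counter

def disagreement(samples: list[str]) -> float:
    """Fraction of samples that differ from the majority answer."""
    if not samples:
        return 1.0
    top = Counter(samples).most_common(1)[0][1]
    return 1.0 - top / len(samples)

def route(samples: list[str], threshold: float = 0.3) -> str:
    """Return 'clarify' when disagreement signals an ambiguous input."""
    return "clarify" if disagreement(samples) > threshold else "answer"
```

The threshold would need calibration against real routing outcomes; 0.3 here is arbitrary.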


12-Disambiguate-First-Parse-Later-2025.pdf

Title: Disambiguate First, Parse Later: Generating Interpretations for Ambiguity Resolution in Semantic Parsing

Venue: arXiv 2025

arXiv: 2502.18448

Abstract: Proposes using natural language to spell out ambiguity before mapping to logical forms. Leverages LLMs to generate preferred interpretations, then uses specialized infilling models.

Key Contribution: Novel two-stage approach: disambiguate in natural language first, then parse. Tested on AMBROSIA benchmark.

CODITECT Relevance: The "disambiguate first" principle aligns perfectly with CODITECT's workflow where /which should clarify intent before routing to agents. The approach of generating multiple interpretations maps to the MoE candidate generation phase.
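
The two-stage pattern can be sketched as a pipeline where stage 1 enumerates natural-language interpretations and stage 2 parses each one separately. Both stages are stubbed here; in the paper they are an LLM and a specialized infilling parser.

```python
# Minimal "disambiguate first, parse later" sketch. The interpretation
# generator and parser are placeholders, not the paper's models.
def interpret(question: str) -> list[str]:
    """Stage 1: spell out each reading in plain language (stubbed)."""
    if "cheapest flight" in question:
        return ["cheapest by total price", "cheapest by price per mile"]
    return [question]

def parse(interpretation: str) -> dict:
    """Stage 2: map one unambiguous reading to a structured query (stubbed)."""
    return {"query": interpretation, "form": "logical"}

def disambiguate_then_parse(question: str) -> list[dict]:
    return [parse(i) for i in interpret(question)]
```

The key design point is that ambiguity is resolved in natural language, where LLMs are strongest, before any formal parsing happens.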


Category 2: Mixture of Experts Routing

03-MoE-in-LLMs-Survey-2025.pdf

Title: Mixture of Experts in Large Language Models

Authors: Danyang Zhang, Junhao Song, Ziqian Bi, Xinyuan Song, Yingfang Yuan, Tianyang Wang, Joe Yeong, Junfeng Hao

Venue: arXiv 2025

arXiv: 2507.11181

Abstract: Comprehensive review of MoE architectures in LLMs covering expert gating, routing systems, hierarchical configurations, and sparse MoE designs.

Key Contribution: Systematic categorization of MoE routing strategies: token-level, modality-level, task-level, and other-level routing. Analysis of load balancing challenges and expert specialization.

CODITECT Relevance: Directly applicable to MoE task classification system. The task-level routing section is particularly relevant. Insights on load balancing and expert specialization inform agent dispatcher design.
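
The core routing mechanic the survey covers, top-k gating, is small enough to sketch: score each expert, keep the k best, renormalize. Scores here are given directly; in a real MoE they come from a learned gate network.

```python
# Illustrative top-k gating for sparse MoE routing (expert names are
# made up for the example).
import math

def top_k_gate(scores: dict[str, float], k: int = 2) -> dict[str, float]:
    """Softmax over expert scores, truncated to the k best and renormalized."""
    exps = {e: math.exp(s) for e, s in scores.items()}
    top = sorted(exps.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(w for _, w in top)
    return {e: w / total for e, w in top}
```

Load balancing, which the survey flags as a central challenge, is exactly the problem of keeping these truncated distributions from collapsing onto a few experts.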


04-OLMoE-Open-MoE-2024.pdf

Title: OLMoE: Open Mixture-of-Experts Language Models

Authors: Niklas Muennighoff, Luca Soldaini, Dirk Groeneveld, Kyle Lo, et al. (AI2, UW)

Venue: arXiv 2024

arXiv: 2409.02060

Abstract: Fully open-source sparse MoE model with 7B total parameters, 1B active per token. Trained on 5 trillion tokens with high expert specialization emerging naturally.

Key Contribution: First fully open MoE model with complete training artifacts (data, code, logs, checkpoints). Demonstrates natural expert specialization without explicit constraint.

CODITECT Relevance: Proof that expert specialization emerges naturally with proper architecture. OLMoE's routing mechanism could inspire CODITECT's agent dispatcher. Open-source nature allows experimentation with routing strategies.


13-Difficulty-Aware-Agent-Orchestration-2024.pdf

Title: Difficulty-Aware Agent Orchestration in LLM-Powered Workflows

Venue: arXiv 2024

arXiv: 2509.11079

Abstract: Proposes DAAO (Difficulty-Aware Agentic Orchestration), which adapts workflow depth, operator selection, and LLM assignment based on query difficulty. Uses VAE for difficulty estimation, modular operator allocator, and cost-aware LLM router.

Key Contribution: Dynamic routing based on task difficulty rather than static rules. Outperforms prior multi-agent systems in both accuracy and efficiency.

CODITECT Relevance: CODITECT could implement difficulty-aware routing where simple tasks go to lightweight agents and complex tasks trigger multi-agent orchestration. Cost-aware routing aligns with token optimization goals.
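
A hedged sketch of difficulty-aware routing: a scalar difficulty estimate picks the workflow depth. The word/clause heuristic and plan names below are invented for illustration; DAAO uses a learned (VAE-based) difficulty estimator.

```python
# Toy difficulty-aware orchestration: harder queries get deeper plans.
def estimate_difficulty(query: str) -> float:
    """Crude proxy: longer, multi-clause queries score as harder (0..1)."""
    words = len(query.split())
    clauses = query.count(",") + query.count(" and ")
    return min(1.0, 0.02 * words + 0.15 * clauses)

def orchestrate(query: str) -> dict:
    d = estimate_difficulty(query)
    if d < 0.2:
        return {"difficulty": d, "plan": "single-lightweight-agent"}
    if d < 0.6:
        return {"difficulty": d, "plan": "single-strong-agent"}
    return {"difficulty": d, "plan": "multi-agent-orchestration"}
```

Even this crude version captures the cost argument: cheap plans for easy queries, expensive orchestration only when warranted.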


14-RCR-Router-Context-Routing-2025.pdf

Title: RCR-Router: Role-Aware Context Routing for Multi-Agent LLM Systems

Venue: arXiv 2025

arXiv: 2508.04903

Abstract: Proposes dynamic, role-conditioned context routing with semantic importance scoring under token budget constraints.

Key Contribution: Selects semantically relevant context for each agent role instead of sharing the full context with every agent.

CODITECT Relevance: CODITECT agents have roles (specialists). RCR-Router's approach to selecting semantically relevant context per agent role directly applies to memory/context management.
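
The role-conditioned selection idea can be sketched as a budgeted greedy pack: score each memory item for relevance to the agent's role, then fill the token budget with the best items. The keyword-overlap scorer is a stand-in for RCR-Router's semantic importance scoring.

```python
# Illustrative role-aware context packing under a token budget.
# Item and field names are invented for the example.
def select_context(items: list[dict], role_keywords: set[str],
                   budget: int) -> list[dict]:
    """Greedily pack the highest-overlap items that fit the token budget."""
    scored = sorted(items,
                    key=lambda it: len(role_keywords & set(it["text"].split())),
                    reverse=True)
    chosen, used = [], 0
    for it in scored:
        if used + it["tokens"] <= budget:
            chosen.append(it)
            used += it["tokens"]
    return chosen
```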


15-Talk-Right-Specialists-QA-2025.pdf

Title: Talk to Right Specialists: Routing and Planning in Multi-agent System for QA

Venue: arXiv 2025

arXiv: 2501.07813

Abstract: Describes the knowledge boundaries of individual agents so that a router can direct each question to the appropriate specialist.

Key Contribution: Formalizes agent capability boundaries as an input to routing decisions.

CODITECT Relevance: CODITECT agents have explicit capability definitions (AGENT.md files). Formalizing "knowledge boundaries" would improve routing accuracy.
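
One way such boundary-based routing might look, assuming capability definitions reduce to keyword sets (a simplification of what AGENT.md files could declare): pick the agent whose boundary best covers the query, or no agent at all.

```python
# Hypothetical capability-boundary router; keyword sets stand in for
# richer boundary descriptions.
def route_to_specialist(query: str, boundaries: dict[str, set[str]],
                        min_overlap: int = 1):
    """Return the agent with the largest keyword overlap, or None."""
    words = set(query.lower().split())
    best, best_score = None, 0
    for agent, kw in boundaries.items():
        score = len(words & kw)
        if score > best_score:
            best, best_score = agent, score
    return best if best_score >= min_overlap else None
```

Returning `None` rather than guessing is the point: an explicit "no specialist fits" outcome enables fallback behavior.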


Category 3: Intent Classification

10-MIntRec2-Intent-Dataset-ICLR2024.pdf

Title: MIntRec2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations

Authors: Hanlei Zhang, Xin Wang, Hua Xu, Qianrui Zhou, Kai Gao, Jianhua Su, Jinyue Zhao, Wenrui Li, Yanting Chen

Venue: ICLR 2024

arXiv: 2403.10943

Abstract: First large-scale dataset for multimodal intent recognition and out-of-scope detection in multi-party conversations. 15K utterances across 30 intent classes with text, video, and audio modalities.

Key Contribution: Reveals a performance gap of over 30% between LLMs and humans (71% human accuracy). Establishes a challenging benchmark for intent classification with out-of-scope detection.

CODITECT Relevance: Out-of-scope detection is critical for MoE routing—knowing when NO agent is appropriate. The multimodal aspect is relevant for future CODITECT extensions (e.g., screenshot analysis, voice commands).
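
The out-of-scope decision itself is a thresholding step that can be sketched in a few lines: if no intent class clears a confidence floor, refuse to route. Scores are assumed to be calibrated probabilities from an upstream classifier; the intent names are illustrative.

```python
# Minimal out-of-scope gate over classifier confidence scores.
def classify_with_oos(scores: dict[str, float], floor: float = 0.5) -> str:
    """Return the top intent, or 'out-of-scope' when nothing is confident."""
    intent, p = max(scores.items(), key=lambda kv: kv[1])
    return intent if p >= floor else "out-of-scope"
```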


11-Semantic-Routing-Intent-2024.pdf

Title: Semantic Routing for Enhanced Performance of LLM-Assisted Intent-Based 5G Core Network Management

Venue: arXiv 2024

arXiv: 2404.15869

Abstract: Uses semantic routing with vector embeddings to achieve deterministic routing instead of LLM-based classification. Addresses scalability limitations of prompt-based methods and hallucination risks.

Key Contribution: Shows semantic router improves accuracy and efficiency vs. standalone LLM prompting. Deterministic routing reduces hallucination in intent extraction.

CODITECT Relevance: Semantic routing using pre-computed embeddings offers fast, low-cost intent classification. Could replace or supplement LLM-based routing in /which for common queries. The deterministic aspect addresses reliability concerns.
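
A deterministic semantic router reduces to nearest-neighbor search over precomputed route embeddings. The sketch below uses hand-made toy vectors; a real system would embed the query and route descriptions with an embedding model.

```python
# Sketch of embedding-based semantic routing by cosine similarity.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def semantic_route(query_vec: list[float],
                   routes: dict[str, list[float]],
                   threshold: float = 0.8):
    """Most similar route, or None below threshold (fall back to LLM routing)."""
    best, sim = max(((r, cosine(query_vec, v)) for r, v in routes.items()),
                    key=lambda kv: kv[1])
    return best if sim >= threshold else None
```

Because the route embeddings are precomputed, this path has no LLM call at all, which is where the speed and determinism come from.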


Category 4: Uncertainty Quantification

05-Uncertainty-Estimation-Survey-2024.pdf

Title: A Survey of Uncertainty Estimation in LLMs: Theory Meets Practice

Authors: Hsiu-Yuan Huang, Yutong Yang, Zhaoxi Zhang, Sanwoo Lee, Yunfang Wu

Venue: arXiv 2024

arXiv: 2410.15326

Abstract: Comprehensive survey distinguishing uncertainty from confidence, integrating theoretical frameworks (Bayesian, information-theoretic, ensemble-based) to systematize uncertainty estimation methods.

Key Contribution: Bridges theory and practice by categorizing methods within established frameworks. Addresses specific challenges for LLM uncertainty in real-world scenarios.

CODITECT Relevance: Provides theoretical foundation for MoE confidence scoring. Out-of-distribution detection techniques directly applicable to detecting queries CODITECT cannot handle confidently.


06-Uncertainty-Supervised-Approach-2024.pdf

Title: Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach

Authors: Linyu Liu, Yu Pan, Xiaocheng Li, Guanting Chen

Venue: arXiv 2024

arXiv: 2404.15993

Abstract: Supervised approach leveraging labeled datasets to estimate uncertainty using LLM hidden activations. Demonstrates robust transferability in out-of-distribution settings.

Key Contribution: Practical supervised method using hidden layer activations. Supports varying model access levels (black-box to white-box).

CODITECT Relevance: Hidden activation-based uncertainty estimation could enhance MoE confidence scoring without requiring model retraining. The supervised approach using labeled data (CODITECT's skill-learnings.json) is immediately applicable.
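
At inference time the supervised approach amounts to a small probe over hidden activations. The sketch below shows a linear probe with fixed toy weights; in the paper the weights are fit on labeled data (which is where something like skill-learnings.json would come in).

```python
# Hedged sketch of a linear uncertainty probe over hidden activations.
# Weights here are toy values, not trained parameters.
import math

def probe_uncertainty(activations: list[float],
                      weights: list[float], bias: float) -> float:
    """Sigmoid of a linear score over activations (near 0 = confident)."""
    score = sum(a * w for a, w in zip(activations, weights)) + bias
    return 1.0 / (1.0 + math.exp(-score))
```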


07-Self-Contradictory-Hallucinations-ICLR2024.pdf

Title: Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation

Venue: ICLR 2024

arXiv: 2305.15852

Abstract: Addresses self-contradiction in LLM outputs, found in 17.7% of sentences generated by ChatGPT. Proposes a prompting-based framework to detect and mitigate self-contradictions, which reveal model uncertainty.

Key Contribution: Self-contradiction as uncertainty indicator. Black-box mitigation without external knowledge retrieval. Iterative refinement to remove contradictions while preserving fluency.

CODITECT Relevance: Self-contradiction detection could be a guardrail in agent output validation. When agents produce contradictory statements in their reasoning, it signals low confidence and should trigger re-routing or clarification.
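
The guardrail's control flow can be illustrated with a toy pairwise check: compare sentences in an agent's output and re-route on contradiction. The string-negation heuristic below only stands in for the paper's LLM-based contradiction detector.

```python
# Toy self-contradiction guardrail; `contradicts` is a crude stand-in
# for an LLM-based detector.
def contradicts(a: str, b: str) -> bool:
    """True when one sentence is the plain negation of the other (toy)."""
    return a.replace(" not ", " ") == b or b.replace(" not ", " ") == a

def guardrail(sentences: list[str]) -> str:
    for i, s in enumerate(sentences):
        for t in sentences[i + 1:]:
            if contradicts(s, t):
                return "re-route"  # contradiction signals low confidence
    return "accept"
```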


08-ConU-Conformal-Uncertainty-EMNLP2024.pdf

Title: ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees

Venue: EMNLP 2024 Findings

arXiv: 2407.00499

Abstract: Applies conformal prediction to black-box LLMs for open-ended NLG tasks. Introduces uncertainty measure based on self-consistency theory, integrated into CP algorithm for correctness coverage guarantees.

Key Contribution: Rigorous statistical guarantees for uncertainty quantification. Outperforms prior SOTA methods across 7 LLMs on 4 NLG datasets with strict correctness coverage control.

CODITECT Relevance: Conformal prediction provides statistical guarantees that confidence scores are calibrated. Could ensure CODITECT's MoE routing decisions have provable reliability guarantees (e.g., "90% of high-confidence routes are correct").
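
The generic split-conformal recipe behind such guarantees is short: pick a confidence cutoff from held-out correct answers so that roughly a target fraction of them would be accepted. This sketches the standard recipe, not ConU's self-consistency-based measure; scores are assumed higher-is-better.

```python
# Minimal split-conformal calibration of an accept/reject cutoff.
import math

def calibrate_threshold(correct_scores: list[float], alpha: float = 0.1) -> float:
    """Cutoff that drops at most floor(alpha*(n+1)) lowest calibration scores."""
    s = sorted(correct_scores)
    k = math.floor(alpha * (len(s) + 1))
    return s[min(k, len(s) - 1)]

def accept(score: float, threshold: float) -> bool:
    """Accept a route when its confidence clears the calibrated cutoff."""
    return score >= threshold
```

The guarantee is marginal over future queries drawn like the calibration set, which is exactly the kind of statement behind claims such as "90% of high-confidence routes are correct".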


Category 5: Benchmarks

09-AMBROSIA-Benchmark-NeurIPS2024.pdf

Title: AMBROSIA: A Benchmark for Parsing Ambiguous Questions into Database Queries

Authors: Irina Saparina, Mirella Lapata

Venue: NeurIPS 2024 (Datasets & Benchmarks Track, Spotlight)

arXiv: 2406.19073

Abstract: Benchmark with 1,277 ambiguous questions, human-provided unambiguous interpretations, spanning 846 multi-table databases across 16 domains. Three ambiguity types: scope, attachment, vagueness.

Key Contribution: First benchmark for ambiguous semantic parsing. Shows that even advanced LLMs struggle to identify and interpret ambiguity. Llama3-70B achieves the best recall, but no model recovers all interpretations.

CODITECT Relevance: Provides test cases for evaluating CODITECT's disambiguation capabilities. The three ambiguity types (scope, attachment, vagueness) map well to command/agent invocation ambiguities. Could adapt dataset for CODITECT-specific benchmarking.


Application to CODITECT

ADR-026 (Intent Classification) Enhancements

| Paper | Applicable Technique |
|-------|----------------------|
| 02 | APA pipeline for confidence scoring |
| 10 | Out-of-scope detection patterns |
| 11 | Semantic routing for fast classification |
| 12 | "Disambiguate first" two-stage approach |

ADR-027 (Guardrail Engine) Enhancements

| Paper | Applicable Technique |
|-------|----------------------|
| 07 | Self-contradiction detection guardrail |
| 08 | Conformal prediction for calibrated confidence |
| 05 | Uncertainty frameworks for validation |

MoE Routing Improvements

| Paper | Applicable Technique |
|-------|----------------------|
| 03 | Task-level routing strategies |
| 04 | Natural expert specialization |
| 13 | Difficulty-aware orchestration |
| 14 | Role-conditioned context routing |
| 15 | Knowledge boundary formalization |

Research Gaps Identified

  1. Hierarchical agent routing - Most papers address flat query→agent routing, not orchestrator→specialist→sub-agent chains
  2. Developer tool intent classification - No benchmarks for AI-powered dev tools
  3. Multi-turn disambiguation - Papers focus on single-turn; CODITECT needs multi-turn clarification loops
  4. Continual learning for routing - Online learning from routing mistakes not well-covered
  5. Cost-aware routing - Limited work on computational cost trade-offs

Suggested Next Steps

  1. Read Priority Papers: 01, 02, 03, 13 (core concepts)
  2. Implement: Semantic routing from paper 11 for /which fast path
  3. Adapt: APA pipeline from paper 02 for confidence scoring
  4. Benchmark: Use AMBROSIA (paper 09) patterns for CODITECT eval suite
  5. Integrate: Self-contradiction detection (paper 07) as guardrail

Created: January 2, 2026
Location: coditect-core/00-ANALYZE-NEW-ARTIFACTS/Ambiguity-and-Intent-Research/
Related ADRs: ADR-026, ADR-027, ADR-007, ADR-008