Ambiguity and Intent Research Index
Academic papers on LLM disambiguation, intent classification, and uncertainty quantification (2024-2025)
Downloaded: January 2, 2026
Total Papers: 15
Total Size: ~45 MB
Quick Reference
| # | Paper | Venue | Year | Relevance |
|---|---|---|---|---|
| 01 | Disambiguation in Conversational QA Survey | EMNLP | 2025 | Disambiguation taxonomy |
| 02 | Aligning LLMs to Handle Ambiguity | EMNLP | 2024 | APA training pipeline |
| 03 | Mixture of Experts in LLMs Survey | arXiv | 2025 | MoE routing strategies |
| 04 | OLMoE: Open Mixture-of-Experts | arXiv | 2024 | Open-source MoE |
| 05 | Uncertainty Estimation Survey | arXiv | 2024 | Theoretical frameworks |
| 06 | Uncertainty: Supervised Approach | arXiv | 2024 | Hidden activation method |
| 07 | Self-Contradictory Hallucinations | ICLR | 2024 | Contradiction detection |
| 08 | ConU: Conformal Uncertainty | EMNLP | 2024 | Statistical guarantees |
| 09 | AMBROSIA Benchmark | NeurIPS | 2024 | Ambiguity benchmark |
| 10 | MIntRec2.0 Dataset | ICLR | 2024 | Intent recognition |
| 11 | Semantic Routing for Intent | arXiv | 2024 | Deterministic routing |
| 12 | Disambiguate First, Parse Later | arXiv | 2025 | Two-stage approach |
| 13 | Difficulty-Aware Agent Orchestration | arXiv | 2024 | DAAO framework |
| 14 | RCR-Router: Role-Aware Routing | arXiv | 2025 | Context routing |
| 15 | Talk to Right Specialists | arXiv | 2025 | Multi-agent QA |
Category 1: Disambiguation and Clarification
01-Disambiguation-ConvQA-Survey-EMNLP2025.pdf
Title: Disambiguation in Conversational Question Answering in the Era of LLMs and Agents: A Survey
Authors: Md Mehrab Tanjim, Yeonjun In, Xiang Chen, Victor S. Bursztyn, Ryan A. Rossi, Sungchul Kim, et al.
Venue: EMNLP 2025 (accepted)
arXiv: 2505.12543
Abstract: Comprehensive survey examining how ambiguity remains a fundamental challenge in NLP, particularly in conversational QA systems powered by LLMs. Provides definitions, categorizes disambiguation strategies, and analyzes trade-offs.
Key Contribution: First comprehensive framework for understanding disambiguation approaches in the LLM era, including agentic systems. Classification of various disambiguation strategies enabled by modern language models.
CODITECT Relevance: Directly applicable to designing disambiguation strategies for the /which command and MoE routing. Provides taxonomy of ambiguity types and resolution approaches that can inform guardrail design.
02-Aligning-LLMs-Handle-Ambiguity-EMNLP2024.pdf
Title: Aligning Language Models to Explicitly Handle Ambiguity
Authors: Hyuhng Joon Kim, Youna Kim, Cheonbok Park, Junyeob Kim, Choonghyun Park, Kang Min Yoo, Sang-goo Lee, Taeuk Kim
Venue: EMNLP 2024
arXiv: 2404.11972
Abstract: Addresses how LLMs struggle with ambiguous inputs containing ellipsis or imprecision. Introduces Alignment with Perceived Ambiguity (APA), a pipeline that trains models to recognize and manage ambiguous inputs using uncertainty assessments.
Key Contribution: Novel training approach that uses the model's own uncertainty to detect ambiguity rather than external labels. Outperforms gold-standard label training in out-of-distribution scenarios.
CODITECT Relevance: The APA pipeline could be adapted for MoE confidence scoring. Using model uncertainty to trigger clarification is directly applicable to the disambiguation workflow.
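The uncertainty-triggered clarification idea above can be sketched as a minimal confidence gate: if the entropy of the intent distribution is high, ask the user to clarify instead of routing. This is an illustrative assumption, not APA's actual training pipeline; the function names and threshold are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def route_or_clarify(intent_probs, threshold=0.8):
    """Route to the top intent when uncertainty is low; otherwise
    surface the top candidates and ask for clarification.
    (Hypothetical gate, not APA's algorithm.)"""
    if entropy(list(intent_probs.values())) > threshold:
        top2 = sorted(intent_probs, key=intent_probs.get, reverse=True)[:2]
        return ("clarify", top2)
    return ("route", max(intent_probs, key=intent_probs.get))
```

A flat distribution over three or more intents exceeds the threshold and triggers clarification, while a peaked one routes directly.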
12-Disambiguate-First-Parse-Later-2025.pdf
Title: Disambiguate First, Parse Later: Generating Interpretations for Ambiguity Resolution in Semantic Parsing
Venue: arXiv 2025
arXiv: 2502.18448
Abstract: Proposes using natural language to spell out ambiguity before mapping to logical forms. Leverages LLMs to generate preferred interpretations, then uses specialized infilling models.
Key Contribution: Novel two-stage approach: disambiguate in natural language first, then parse. Tested on AMBROSIA benchmark.
CODITECT Relevance: The "disambiguate first" principle aligns perfectly with CODITECT's workflow where /which should clarify intent before routing to agents. The approach of generating multiple interpretations maps to the MoE candidate generation phase.
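The two-stage principle can be sketched as a pipeline skeleton: stage 1 spells out each plausible reading in natural language, stage 2 parses each unambiguous reading. Both stage functions below are stubs standing in for LLM calls; their names and outputs are illustrative, not the paper's implementation.

```python
def generate_interpretations(question):
    """Stage 1 (stub): spell out each plausible reading of an
    ambiguous question in natural language. A real system would
    call an LLM here."""
    return [f"{question} (per department)", f"{question} (overall)"]

def parse_to_query(interpretation):
    """Stage 2 (stub): map one unambiguous interpretation to a
    logical form. A real system would call a semantic parser here."""
    return {"interpretation": interpretation, "query": "SELECT ..."}

def disambiguate_first_parse_later(question):
    """Disambiguate in natural language first, then parse each reading."""
    return [parse_to_query(i) for i in generate_interpretations(question)]
```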
Category 2: Mixture of Experts Routing
03-MoE-in-LLMs-Survey-2025.pdf
Title: Mixture of Experts in Large Language Models
Authors: Danyang Zhang, Junhao Song, Ziqian Bi, Xinyuan Song, Yingfang Yuan, Tianyang Wang, Joe Yeong, Junfeng Hao
Venue: arXiv 2025
arXiv: 2507.11181
Abstract: Comprehensive review of MoE architectures in LLMs covering expert gating, routing systems, hierarchical configurations, and sparse MoE designs.
Key Contribution: Systematic categorization of MoE routing strategies: token-level, modality-level, task-level, and other-level routing. Analysis of load balancing challenges and expert specialization.
CODITECT Relevance: Directly applicable to MoE task classification system. The task-level routing section is particularly relevant. Insights on load balancing and expert specialization inform agent dispatcher design.
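Task-level routing, as opposed to per-token routing, can be sketched as a single gating decision per task plus a load-balancing signal. This is a simplified illustration of the survey's taxonomy, not any specific system's router; the penalty function is a hypothetical auxiliary signal.

```python
def task_level_route(task_scores, k=2):
    """Task-level gating: score every expert once for the whole task
    and dispatch the top-k, instead of re-routing per token."""
    ranked = sorted(task_scores, key=task_scores.get, reverse=True)
    return ranked[:k]

def load_balance_penalty(dispatch_counts):
    """Crude load-balancing signal: variance of per-expert dispatch
    counts. A router can minimize this as an auxiliary objective."""
    n = len(dispatch_counts)
    mean = sum(dispatch_counts) / n
    return sum((c - mean) ** 2 for c in dispatch_counts) / n
```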
04-OLMoE-Open-MoE-2024.pdf
Title: OLMoE: Open Mixture-of-Experts Language Models
Authors: Niklas Muennighoff, Luca Soldaini, Dirk Groeneveld, Kyle Lo, et al. (AI2, UW)
Venue: arXiv 2024
arXiv: 2409.02060
Abstract: Fully open-source sparse MoE model with 7B total parameters, 1B active per token. Trained on 5 trillion tokens with high expert specialization emerging naturally.
Key Contribution: First fully open MoE model with complete training artifacts (data, code, logs, checkpoints). Demonstrates natural expert specialization without explicit constraint.
CODITECT Relevance: Proof that expert specialization emerges naturally with proper architecture. OLMoE's routing mechanism could inspire CODITECT's agent dispatcher. Open-source nature allows experimentation with routing strategies.
13-Difficulty-Aware-Agent-Orchestration-2024.pdf
Title: Difficulty-Aware Agent Orchestration in LLM-Powered Workflows
Venue: arXiv 2024
arXiv: 2509.11079
Abstract: Proposes DAAO (Difficulty-Aware Agentic Orchestration), which adapts workflow depth, operator selection, and LLM assignment based on query difficulty. Uses VAE for difficulty estimation, modular operator allocator, and cost-aware LLM router.
Key Contribution: Dynamic routing based on task difficulty rather than static rules. Outperforms prior multi-agent systems in both accuracy and efficiency.
CODITECT Relevance: CODITECT could implement difficulty-aware routing where simple tasks go to lightweight agents and complex tasks trigger multi-agent orchestration. Cost-aware routing aligns with token optimization goals.
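Difficulty-aware routing can be sketched as a two-tier dispatcher: a difficulty estimate gates between a cheap single agent and a deeper multi-agent workflow. The heuristic estimator below is a toy stand-in for DAAO's learned (VAE-based) estimator; the agent names and threshold are hypothetical.

```python
def estimate_difficulty(query):
    """Toy stand-in for a learned difficulty estimator: longer,
    multi-clause queries score higher (range clamped to [0, 1])."""
    return min(1.0, len(query.split()) / 40 + 0.2 * query.count(" and "))

def orchestrate(query, threshold=0.5):
    """Route easy queries to a lightweight agent and hard queries
    to a multi-agent workflow (hypothetical pipeline names)."""
    if estimate_difficulty(query) < threshold:
        return ["lightweight-agent"]
    return ["planner", "specialist", "reviewer"]
```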
14-RCR-Router-Context-Routing-2025.pdf
Title: RCR-Router: Role-Aware Context Routing for Multi-Agent LLM Systems
Venue: arXiv 2025
arXiv: 2508.04903
Abstract: Proposes dynamic, role-conditioned context routing for multi-agent LLM systems, combining semantic importance scoring with token budget constraints.
Key Contribution: A method for selecting semantically relevant context per agent role under an explicit token budget.
CODITECT Relevance: CODITECT agents have roles (specialists). RCR-Router's approach to selecting semantically relevant context per agent role directly applies to memory/context management.
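Role-conditioned context selection under a token budget can be sketched as a greedy fill: score each memory item by overlap with the agent role's keywords, then keep the highest-scoring items that fit. The keyword-overlap scorer is a deliberate simplification of RCR-Router's semantic importance scoring.

```python
def route_context(items, role_keywords, budget):
    """Greedy role-conditioned selection: score each memory item by
    keyword overlap with the agent's role, then keep the best-scoring
    items that still fit within the token budget."""
    def score(item):
        return sum(1 for w in role_keywords if w in item["text"])
    selected, used = [], 0
    for item in sorted(items, key=score, reverse=True):
        if score(item) > 0 and used + item["tokens"] <= budget:
            selected.append(item["text"])
            used += item["tokens"]
    return selected
```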
15-Talk-Right-Specialists-QA-2025.pdf
Title: Talk to Right Specialists: Routing and Planning in Multi-agent System for QA
Venue: arXiv 2025
arXiv: 2501.07813
Abstract: Proposes describing the knowledge boundaries of individual agents so that a planner can route each question to the appropriate specialist.
Key Contribution: Formalizes agent capability boundaries as the basis for routing decisions.
CODITECT Relevance: CODITECT agents have explicit capability definitions (AGENT.md files). Formalizing "knowledge boundaries" would improve routing accuracy.
Category 3: Intent Classification
10-MIntRec2-Intent-Dataset-ICLR2024.pdf
Title: MIntRec2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations
Authors: Hanlei Zhang, Xin Wang, Hua Xu, Qianrui Zhou, Kai Gao, Jianhua Su, Jinyue Zhao, Wenrui Li, Yanting Chen
Venue: ICLR 2024
arXiv: 2403.10943
Abstract: First large-scale dataset for multimodal intent recognition and out-of-scope detection in multi-party conversations. 15K utterances across 30 intent classes with text, video, and audio modalities.
Key Contribution: Reveals a 30%+ performance gap between LLMs and humans (71% human accuracy). Establishes a challenging benchmark for intent classification with out-of-scope detection.
CODITECT Relevance: Out-of-scope detection is critical for MoE routing—knowing when NO agent is appropriate. The multimodal aspect is relevant for future CODITECT extensions (e.g., screenshot analysis, voice commands).
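Out-of-scope detection as described above can be sketched as a similarity-with-rejection check: if the query's best match against every agent's capability embedding falls below a threshold, no agent is selected. The embeddings, threshold, and function names are illustrative assumptions.

```python
def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def classify_or_reject(query_vec, agent_vecs, threshold=0.6):
    """Return (best_agent, similarity), or (None, similarity) when
    even the best match is below threshold (out-of-scope)."""
    best_sim, best_agent = max(
        (cosine(query_vec, v), name) for name, v in agent_vecs.items())
    return (best_agent, best_sim) if best_sim >= threshold else (None, best_sim)
```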
11-Semantic-Routing-Intent-2024.pdf
Title: Semantic Routing for Enhanced Performance of LLM-Assisted Intent-Based 5G Core Network Management
Venue: arXiv 2024
arXiv: 2404.15869
Abstract: Uses semantic routing with vector embeddings to achieve deterministic routing instead of LLM-based classification. Addresses scalability limitations of prompt-based methods and hallucination risks.
Key Contribution: Shows semantic router improves accuracy and efficiency vs. standalone LLM prompting. Deterministic routing reduces hallucination in intent extraction.
CODITECT Relevance: Semantic routing using pre-computed embeddings offers fast, low-cost intent classification. Could replace or supplement LLM-based routing in /which for common queries. The deterministic aspect addresses reliability concerns.
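The fast-path idea can be sketched as nearest-neighbor routing over pre-computed route embeddings, with a margin-based fallback to full LLM classification when two routes are too close to call. The margin value and the "llm-fallback" sentinel are hypothetical design choices, not the paper's protocol.

```python
def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sum(x * x for x in a) ** 0.5) *
                  (sum(y * y for y in b) ** 0.5))

def semantic_route(query_vec, route_vecs, margin=0.1):
    """Deterministic fast path: compare the query embedding against
    pre-computed route embeddings; defer to an LLM classifier only
    when the top two routes are within the margin."""
    sims = sorted(((cosine(query_vec, v), r) for r, v in route_vecs.items()),
                  reverse=True)
    if len(sims) > 1 and sims[0][0] - sims[1][0] < margin:
        return "llm-fallback"
    return sims[0][1]
```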
Category 4: Uncertainty Quantification
05-Uncertainty-Estimation-Survey-2024.pdf
Title: A Survey of Uncertainty Estimation in LLMs: Theory Meets Practice
Authors: Hsiu-Yuan Huang, Yutong Yang, Zhaoxi Zhang, Sanwoo Lee, Yunfang Wu
Venue: arXiv 2024
arXiv: 2410.15326
Abstract: Comprehensive survey distinguishing uncertainty from confidence, integrating theoretical frameworks (Bayesian, information-theoretic, ensemble-based) to systematize uncertainty estimation methods.
Key Contribution: Bridges theory and practice by categorizing methods within established frameworks. Addresses specific challenges for LLM uncertainty in real-world scenarios.
CODITECT Relevance: Provides theoretical foundation for MoE confidence scoring. Out-of-distribution detection techniques directly applicable to detecting queries CODITECT cannot handle confidently.
06-Uncertainty-Supervised-Approach-2024.pdf
Title: Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach
Authors: Linyu Liu, Yu Pan, Xiaocheng Li, Guanting Chen
Venue: arXiv 2024
arXiv: 2404.15993
Abstract: Supervised approach leveraging labeled datasets to estimate uncertainty using LLM hidden activations. Demonstrates robust transferability in out-of-distribution settings.
Key Contribution: Practical supervised method using hidden layer activations. Supports varying model access levels (black-box to white-box).
CODITECT Relevance: Hidden activation-based uncertainty estimation could enhance MoE confidence scoring without requiring model retraining. The supervised approach using labeled data (CODITECT's skill-learnings.json) is immediately applicable.
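The supervised idea can be sketched as a small probe: fit a logistic-regression classifier on hidden activations labeled with correctness, then read its output as a confidence score. This toy SGD probe is a minimal stand-in for the paper's estimator; in practice a library classifier on real activations would be used.

```python
import math

def train_uncertainty_probe(activations, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression probe (plain SGD) on hidden
    activations to predict answer correctness."""
    w, b = [0.0] * len(activations[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(activations, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            g = 1 / (1 + math.exp(-z)) - y  # gradient of log-loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def confidence(w, b, x):
    """Probe output in [0, 1], read as confidence the answer is correct."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 / (1 + math.exp(-z))
```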
07-Self-Contradictory-Hallucinations-ICLR2024.pdf
Title: Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation
Venue: ICLR 2024
arXiv: 2305.15852
Abstract: Addresses self-contradiction in LLM outputs (17.7% of ChatGPT sentences). Proposes prompting-based framework to detect and mitigate self-contradictions, which reveal model uncertainty.
Key Contribution: Self-contradiction as uncertainty indicator. Black-box mitigation without external knowledge retrieval. Iterative refinement to remove contradictions while preserving fluency.
CODITECT Relevance: Self-contradiction detection could be a guardrail in agent output validation. When agents produce contradictory statements in their reasoning, it signals low confidence and should trigger re-routing or clarification.
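A guardrail of this shape can be sketched as a pairwise scan over an agent's output sentences. The toy contradiction test below only catches a sentence and its direct negation; a real guardrail would substitute an NLI model or the paper's prompting-based detector.

```python
def contradicts(a, b):
    """Toy contradiction test: the two sentences are identical except
    one is negated. Stand-in for an NLI model."""
    na, nb = a.replace(" not ", " "), b.replace(" not ", " ")
    return na == nb and a != b

def validate_output(sentences):
    """Guardrail: return self-contradictory sentence pairs in agent
    output; any hit signals low confidence and should trigger
    re-routing or clarification."""
    return [(a, b) for i, a in enumerate(sentences)
            for b in sentences[i + 1:] if contradicts(a, b)]
```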
08-ConU-Conformal-Uncertainty-EMNLP2024.pdf
Title: ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees
Venue: EMNLP 2024 Findings
arXiv: 2407.00499
Abstract: Applies conformal prediction to black-box LLMs for open-ended NLG tasks. Introduces uncertainty measure based on self-consistency theory, integrated into CP algorithm for correctness coverage guarantees.
Key Contribution: Rigorous statistical guarantees for uncertainty quantification. Outperforms prior SOTA methods across 7 LLMs on 4 NLG datasets with strict correctness coverage control.
CODITECT Relevance: Conformal prediction provides statistical guarantees that confidence scores are calibrated. Could ensure CODITECT's MoE routing decisions have provable reliability guarantees (e.g., "90% of high-confidence routes are correct").
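The coverage-guarantee mechanism can be sketched with split conformal prediction: calibrate a nonconformity threshold on held-out scored examples, then keep every candidate route or answer under that threshold. This is generic split-CP, not ConU's self-consistency-based measure; the scores here are hypothetical.

```python
import math

def conformal_threshold(cal_scores, alpha=0.1):
    """Split-conformal calibration: choose the nonconformity cutoff so
    that roughly (1 - alpha) of future correct answers fall below it."""
    n = len(cal_scores)
    k = min(n - 1, math.ceil((n + 1) * (1 - alpha)) - 1)
    return sorted(cal_scores)[k]

def prediction_set(candidate_scores, qhat):
    """Keep every candidate whose nonconformity is within the
    calibrated threshold; the set covers the correct answer with
    probability >= 1 - alpha (under exchangeability)."""
    return [c for c, s in candidate_scores.items() if s <= qhat]
```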
Category 5: Benchmarks
09-AMBROSIA-Benchmark-NeurIPS2024.pdf
Title: AMBROSIA: A Benchmark for Parsing Ambiguous Questions into Database Queries
Authors: Irina Saparina, Mirella Lapata
Venue: NeurIPS 2024 (Datasets & Benchmarks Track, Spotlight)
arXiv: 2406.19073
Abstract: Benchmark with 1,277 ambiguous questions, human-provided unambiguous interpretations, spanning 846 multi-table databases across 16 domains. Three ambiguity types: scope, attachment, vagueness.
Key Contribution: First benchmark for ambiguous semantic parsing. Shows even advanced LLMs struggle with ambiguity identification and interpretation; Llama3-70B achieves the best recall, but all models fail to recover the full set of interpretations.
CODITECT Relevance: Provides test cases for evaluating CODITECT's disambiguation capabilities. The three ambiguity types (scope, attachment, vagueness) map well to command/agent invocation ambiguities. Could adapt dataset for CODITECT-specific benchmarking.
Application to CODITECT
ADR-026 (Intent Classification) Enhancements
| Paper | Applicable Technique |
|---|---|
| 02 | APA pipeline for confidence scoring |
| 10 | Out-of-scope detection patterns |
| 11 | Semantic routing for fast classification |
| 12 | "Disambiguate first" two-stage approach |
ADR-027 (Guardrail Engine) Enhancements
| Paper | Applicable Technique |
|---|---|
| 07 | Self-contradiction detection guardrail |
| 08 | Conformal prediction for calibrated confidence |
| 05 | Uncertainty frameworks for validation |
MoE Routing Improvements
| Paper | Applicable Technique |
|---|---|
| 03 | Task-level routing strategies |
| 04 | Natural expert specialization |
| 13 | Difficulty-aware orchestration |
| 14 | Role-conditioned context routing |
| 15 | Knowledge boundary formalization |
Research Gaps Identified
- Hierarchical agent routing - Most papers address flat query→agent routing, not orchestrator→specialist→sub-agent chains
- Developer tool intent classification - No benchmarks for AI-powered dev tools
- Multi-turn disambiguation - Papers focus on single-turn; CODITECT needs multi-turn clarification loops
- Continual learning for routing - Online learning from routing mistakes is not well covered
- Cost-aware routing - Limited work on computational cost trade-offs
Suggested Next Steps
- Read Priority Papers: 01, 02, 03, 13 (core concepts)
- Implement: Semantic routing from paper 11 for `/which` fast path
- Adapt: APA pipeline from paper 02 for confidence scoring
- Benchmark: Use AMBROSIA (paper 09) patterns for CODITECT eval suite
- Integrate: Self-contradiction detection (paper 07) as guardrail
Created: January 2, 2026
Location: coditect-core/00-ANALYZE-NEW-ARTIFACTS/Ambiguity-and-Intent-Research/
Related ADRs: ADR-026, ADR-027, ADR-007, ADR-008