
Ambiguity and Intent Research Index

Academic papers on LLM disambiguation, intent classification, and uncertainty quantification (2024-2025)

Downloaded: January 2, 2026
Total Papers: 15
Total Size: ~45 MB


Quick Reference

| # | Paper | Venue | Year | Relevance |
|----|-------|-------|------|-----------|
| 01 | Disambiguation in Conversational QA Survey | EMNLP | 2025 | Disambiguation taxonomy |
| 02 | Aligning LLMs to Handle Ambiguity | EMNLP | 2024 | APA training pipeline |
| 03 | Mixture of Experts in LLMs Survey | arXiv | 2025 | MoE routing strategies |
| 04 | OLMoE: Open Mixture-of-Experts | arXiv | 2024 | Open-source MoE |
| 05 | Uncertainty Estimation Survey | arXiv | 2024 | Theoretical frameworks |
| 06 | Uncertainty: Supervised Approach | arXiv | 2024 | Hidden activation method |
| 07 | Self-Contradictory Hallucinations | ICLR | 2024 | Contradiction detection |
| 08 | ConU: Conformal Uncertainty | EMNLP | 2024 | Statistical guarantees |
| 09 | AMBROSIA Benchmark | NeurIPS | 2024 | Ambiguity benchmark |
| 10 | MIntRec2.0 Dataset | ICLR | 2024 | Intent recognition |
| 11 | Semantic Routing for Intent | arXiv | 2024 | Deterministic routing |
| 12 | Disambiguate First, Parse Later | arXiv | 2025 | Two-stage approach |
| 13 | Difficulty-Aware Agent Orchestration | arXiv | 2024 | DAAO framework |
| 14 | RCR-Router: Role-Aware Routing | arXiv | 2025 | Context routing |
| 15 | Talk to Right Specialists | arXiv | 2025 | Multi-agent QA |

Category 1: Disambiguation and Clarification

01-Disambiguation-ConvQA-Survey-EMNLP2025.pdf

Title: Disambiguation in Conversational Question Answering in the Era of LLMs and Agents: A Survey

Authors: Md Mehrab Tanjim, Yeonjun In, Xiang Chen, Victor S. Bursztyn, Ryan A. Rossi, Sungchul Kim, et al.

Venue: EMNLP 2025 (accepted)

arXiv: 2505.12543

Abstract: Comprehensive survey examining how ambiguity remains a fundamental challenge in NLP, particularly in conversational QA systems powered by LLMs. Provides definitions, categorizes disambiguation strategies, and analyzes trade-offs.

Key Contribution: First comprehensive framework for understanding disambiguation approaches in the LLM era, including agentic systems. Classification of various disambiguation strategies enabled by modern language models.

CODITECT Relevance: Directly applicable to designing disambiguation strategies for the /which command and MoE routing. Provides taxonomy of ambiguity types and resolution approaches that can inform guardrail design.


02-Aligning-LLMs-Handle-Ambiguity-EMNLP2024.pdf

Title: Aligning Language Models to Explicitly Handle Ambiguity

Authors: Hyuhng Joon Kim, Youna Kim, Cheonbok Park, Junyeob Kim, Choonghyun Park, Kang Min Yoo, Sang-goo Lee, Taeuk Kim

Venue: EMNLP 2024

arXiv: 2404.11972

Abstract: Addresses how LLMs struggle with ambiguous inputs containing ellipsis or imprecision. Introduces Alignment with Perceived Ambiguity (APA), a pipeline that trains models to recognize and manage ambiguous inputs using uncertainty assessments.

Key Contribution: A novel training approach that uses the model's own uncertainty to detect ambiguity rather than relying on external labels. Outperforms training on gold-standard labels in out-of-distribution scenarios.

CODITECT Relevance: The APA pipeline could be adapted for MoE confidence scoring. Using model uncertainty to trigger clarification is directly applicable to the disambiguation workflow.
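
As a rough illustration of the uncertainty-triggered clarification idea, the sketch below treats disagreement across repeated model samples as an ambiguity signal and routes to clarification when it is high. All names are illustrative; the paper's APA pipeline trains the model rather than gating at inference time.

```python
# Hypothetical sketch: gate on sample disagreement as an ambiguity proxy.
# `samples` stands in for repeated model generations for the same query.
from collections import Counter

def disagreement(samples: list[str]) -> float:
    """Fraction of samples that differ from the majority answer."""
    if not samples:
        return 1.0
    top = Counter(samples).most_common(1)[0][1]
    return 1.0 - top / len(samples)

def route(samples: list[str], threshold: float = 0.3) -> str:
    """Return 'clarify' when disagreement signals an ambiguous input."""
    return "clarify" if disagreement(samples) > threshold else "answer"
```

The threshold would need calibration against real routing outcomes; 0.3 here is arbitrary.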


12-Disambiguate-First-Parse-Later-2025.pdf

Title: Disambiguate First, Parse Later: Generating Interpretations for Ambiguity Resolution in Semantic Parsing

Venue: arXiv 2025

arXiv: 2502.18448

Abstract: Proposes using natural language to spell out ambiguity before mapping to logical forms. Leverages LLMs to generate preferred interpretations, then uses specialized infilling models.

Key Contribution: Novel two-stage approach: disambiguate in natural language first, then parse. Tested on AMBROSIA benchmark.

CODITECT Relevance: The "disambiguate first" principle aligns perfectly with CODITECT's workflow where /which should clarify intent before routing to agents. The approach of generating multiple interpretations maps to the MoE candidate generation phase.
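
The two-stage pattern can be sketched as a pipeline where stage 1 enumerates natural-language interpretations and stage 2 parses each one separately. Both stages are stubbed here; in the paper they are an LLM and a specialized infilling parser.

```python
# Minimal "disambiguate first, parse later" sketch. The interpretation
# generator and parser are placeholders, not the paper's models.
def interpret(question: str) -> list[str]:
    """Stage 1: spell out each reading in plain language (stubbed)."""
    if "cheapest flight" in question:
        return ["cheapest by total price", "cheapest by price per mile"]
    return [question]

def parse(interpretation: str) -> dict:
    """Stage 2: map one unambiguous reading to a structured query (stubbed)."""
    return {"query": interpretation, "form": "logical"}

def disambiguate_then_parse(question: str) -> list[dict]:
    return [parse(i) for i in interpret(question)]
```

The key design point is that ambiguity is resolved in natural language, where LLMs are strongest, before any formal parsing happens.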


Category 2: Mixture of Experts Routing

03-MoE-in-LLMs-Survey-2025.pdf

Title: Mixture of Experts in Large Language Models

Authors: Danyang Zhang, Junhao Song, Ziqian Bi, Xinyuan Song, Yingfang Yuan, Tianyang Wang, Joe Yeong, Junfeng Hao

Venue: arXiv 2025

arXiv: 2507.11181

Abstract: Comprehensive review of MoE architectures in LLMs covering expert gating, routing systems, hierarchical configurations, and sparse MoE designs.

Key Contribution: Systematic categorization of MoE routing strategies: token-level, modality-level, task-level, and other-level routing. Analysis of load balancing challenges and expert specialization.

CODITECT Relevance: Directly applicable to MoE task classification system. The task-level routing section is particularly relevant. Insights on load balancing and expert specialization inform agent dispatcher design.
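
The core routing mechanic the survey covers, top-k gating, is small enough to sketch: score each expert, keep the k best, renormalize. Scores here are given directly; in a real MoE they come from a learned gate network.

```python
# Illustrative top-k gating for sparse MoE routing (expert names are
# made up for the example).
import math

def top_k_gate(scores: dict[str, float], k: int = 2) -> dict[str, float]:
    """Softmax over expert scores, truncated to the k best and renormalized."""
    exps = {e: math.exp(s) for e, s in scores.items()}
    top = sorted(exps.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(w for _, w in top)
    return {e: w / total for e, w in top}
```

Load balancing, which the survey flags as a central challenge, is exactly the problem of keeping these truncated distributions from collapsing onto a few experts.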


04-OLMoE-Open-MoE-2024.pdf

Title: OLMoE: Open Mixture-of-Experts Language Models

Authors: Niklas Muennighoff, Luca Soldaini, Dirk Groeneveld, Kyle Lo, et al. (AI2, UW)

Venue: arXiv 2024

arXiv: 2409.02060

Abstract: Fully open-source sparse MoE model with 7B total parameters, 1B active per token. Trained on 5 trillion tokens with high expert specialization emerging naturally.

Key Contribution: First fully open MoE model with complete training artifacts (data, code, logs, checkpoints). Demonstrates natural expert specialization without explicit constraint.

CODITECT Relevance: Proof that expert specialization emerges naturally with proper architecture. OLMoE's routing mechanism could inspire CODITECT's agent dispatcher. Open-source nature allows experimentation with routing strategies.


13-Difficulty-Aware-Agent-Orchestration-2024.pdf

Title: Difficulty-Aware Agent Orchestration in LLM-Powered Workflows

Venue: arXiv 2024

arXiv: 2509.11079

Abstract: Proposes DAAO (Difficulty-Aware Agentic Orchestration), which adapts workflow depth, operator selection, and LLM assignment based on query difficulty. Uses VAE for difficulty estimation, modular operator allocator, and cost-aware LLM router.

Key Contribution: Dynamic routing based on task difficulty rather than static rules. Outperforms prior multi-agent systems in both accuracy and efficiency.

CODITECT Relevance: CODITECT could implement difficulty-aware routing where simple tasks go to lightweight agents and complex tasks trigger multi-agent orchestration. Cost-aware routing aligns with token optimization goals.
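
A hedged sketch of difficulty-aware routing: a scalar difficulty estimate picks the workflow depth. The word/clause heuristic and plan names below are invented for illustration; DAAO uses a learned (VAE-based) difficulty estimator.

```python
# Toy difficulty-aware orchestration: harder queries get deeper plans.
def estimate_difficulty(query: str) -> float:
    """Crude proxy: longer, multi-clause queries score as harder (0..1)."""
    words = len(query.split())
    clauses = query.count(",") + query.count(" and ")
    return min(1.0, 0.02 * words + 0.15 * clauses)

def orchestrate(query: str) -> dict:
    d = estimate_difficulty(query)
    if d < 0.2:
        return {"difficulty": d, "plan": "single-lightweight-agent"}
    if d < 0.6:
        return {"difficulty": d, "plan": "single-strong-agent"}
    return {"difficulty": d, "plan": "multi-agent-orchestration"}
```

Even this crude version captures the cost argument: cheap plans for easy queries, expensive orchestration only when warranted.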


14-RCR-Router-Context-Routing-2025.pdf

Title: RCR-Router: Role-Aware Context Routing for Multi-Agent LLM Systems

Venue: arXiv 2025

arXiv: 2508.04903

Abstract: Proposes dynamic, role-conditioned context routing with semantic importance scoring under token budget constraints.

Key Contribution: Selects semantically relevant context for each agent role instead of sharing the full context with every agent.

CODITECT Relevance: CODITECT agents have roles (specialists). RCR-Router's approach to selecting semantically relevant context per agent role directly applies to memory/context management.
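
The role-conditioned selection idea can be sketched as a budgeted greedy pack: score each memory item for relevance to the agent's role, then fill the token budget with the best items. The keyword-overlap scorer is a stand-in for RCR-Router's semantic importance scoring.

```python
# Illustrative role-aware context packing under a token budget.
# Item and field names are invented for the example.
def select_context(items: list[dict], role_keywords: set[str],
                   budget: int) -> list[dict]:
    """Greedily pack the highest-overlap items that fit the token budget."""
    scored = sorted(items,
                    key=lambda it: len(role_keywords & set(it["text"].split())),
                    reverse=True)
    chosen, used = [], 0
    for it in scored:
        if used + it["tokens"] <= budget:
            chosen.append(it)
            used += it["tokens"]
    return chosen
```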


15-Talk-Right-Specialists-QA-2025.pdf

Title: Talk to Right Specialists: Routing and Planning in Multi-agent System for QA

Venue: arXiv 2025

arXiv: 2501.07813

Abstract: Describes the knowledge boundaries of individual agents so that a router can direct each question to the appropriate specialist.

Key Contribution: Formalizes agent capability boundaries as an input to routing decisions.

CODITECT Relevance: CODITECT agents have explicit capability definitions (AGENT.md files). Formalizing "knowledge boundaries" would improve routing accuracy.
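
One way such boundary-based routing might look, assuming capability definitions reduce to keyword sets (a simplification of what AGENT.md files could declare): pick the agent whose boundary best covers the query, or no agent at all.

```python
# Hypothetical capability-boundary router; keyword sets stand in for
# richer boundary descriptions.
def route_to_specialist(query: str, boundaries: dict[str, set[str]],
                        min_overlap: int = 1):
    """Return the agent with the largest keyword overlap, or None."""
    words = set(query.lower().split())
    best, best_score = None, 0
    for agent, kw in boundaries.items():
        score = len(words & kw)
        if score > best_score:
            best, best_score = agent, score
    return best if best_score >= min_overlap else None
```

Returning `None` rather than guessing is the point: an explicit "no specialist fits" outcome enables fallback behavior.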


Category 3: Intent Classification

10-MIntRec2-Intent-Dataset-ICLR2024.pdf

Title: MIntRec2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations

Authors: Hanlei Zhang, Xin Wang, Hua Xu, Qianrui Zhou, Kai Gao, Jianhua Su, Jinyue Zhao, Wenrui Li, Yanting Chen

Venue: ICLR 2024

arXiv: 2403.10943

Abstract: First large-scale dataset for multimodal intent recognition and out-of-scope detection in multi-party conversations. 15K utterances across 30 intent classes with text, video, and audio modalities.

Key Contribution: Reveals a performance gap of over 30% between LLMs and humans (71% human accuracy). Establishes a challenging benchmark for intent classification with out-of-scope detection.

CODITECT Relevance: Out-of-scope detection is critical for MoE routing—knowing when NO agent is appropriate. The multimodal aspect is relevant for future CODITECT extensions (e.g., screenshot analysis, voice commands).
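
The out-of-scope decision itself is a thresholding step that can be sketched in a few lines: if no intent class clears a confidence floor, refuse to route. Scores are assumed to be calibrated probabilities from an upstream classifier; the intent names are illustrative.

```python
# Minimal out-of-scope gate over classifier confidence scores.
def classify_with_oos(scores: dict[str, float], floor: float = 0.5) -> str:
    """Return the top intent, or 'out-of-scope' when nothing is confident."""
    intent, p = max(scores.items(), key=lambda kv: kv[1])
    return intent if p >= floor else "out-of-scope"
```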


11-Semantic-Routing-Intent-2024.pdf

Title: Semantic Routing for Enhanced Performance of LLM-Assisted Intent-Based 5G Core Network Management

Venue: arXiv 2024

arXiv: 2404.15869

Abstract: Uses semantic routing with vector embeddings to achieve deterministic routing instead of LLM-based classification. Addresses scalability limitations of prompt-based methods and hallucination risks.

Key Contribution: Shows semantic router improves accuracy and efficiency vs. standalone LLM prompting. Deterministic routing reduces hallucination in intent extraction.

CODITECT Relevance: Semantic routing using pre-computed embeddings offers fast, low-cost intent classification. Could replace or supplement LLM-based routing in /which for common queries. The deterministic aspect addresses reliability concerns.
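
A deterministic semantic router reduces to nearest-neighbor search over precomputed route embeddings. The sketch below uses hand-made toy vectors; a real system would embed the query and route descriptions with an embedding model.

```python
# Sketch of embedding-based semantic routing by cosine similarity.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def semantic_route(query_vec: list[float],
                   routes: dict[str, list[float]],
                   threshold: float = 0.8):
    """Most similar route, or None below threshold (fall back to LLM routing)."""
    best, sim = max(((r, cosine(query_vec, v)) for r, v in routes.items()),
                    key=lambda kv: kv[1])
    return best if sim >= threshold else None
```

Because the route embeddings are precomputed, this path has no LLM call at all, which is where the speed and determinism come from.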


Category 4: Uncertainty Quantification

05-Uncertainty-Estimation-Survey-2024.pdf

Title: A Survey of Uncertainty Estimation in LLMs: Theory Meets Practice

Authors: Hsiu-Yuan Huang, Yutong Yang, Zhaoxi Zhang, Sanwoo Lee, Yunfang Wu

Venue: arXiv 2024

arXiv: 2410.15326

Abstract: Comprehensive survey distinguishing uncertainty from confidence, integrating theoretical frameworks (Bayesian, information-theoretic, ensemble-based) to systematize uncertainty estimation methods.

Key Contribution: Bridges theory and practice by categorizing methods within established frameworks. Addresses specific challenges for LLM uncertainty in real-world scenarios.

CODITECT Relevance: Provides theoretical foundation for MoE confidence scoring. Out-of-distribution detection techniques directly applicable to detecting queries CODITECT cannot handle confidently.


06-Uncertainty-Supervised-Approach-2024.pdf

Title: Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach

Authors: Linyu Liu, Yu Pan, Xiaocheng Li, Guanting Chen

Venue: arXiv 2024

arXiv: 2404.15993

Abstract: Supervised approach leveraging labeled datasets to estimate uncertainty using LLM hidden activations. Demonstrates robust transferability in out-of-distribution settings.

Key Contribution: Practical supervised method using hidden layer activations. Supports varying model access levels (black-box to white-box).

CODITECT Relevance: Hidden activation-based uncertainty estimation could enhance MoE confidence scoring without requiring model retraining. The supervised approach using labeled data (CODITECT's skill-learnings.json) is immediately applicable.
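
At inference time the supervised approach amounts to a small probe over hidden activations. The sketch below shows a linear probe with fixed toy weights; in the paper the weights are fit on labeled data (which is where something like skill-learnings.json would come in).

```python
# Hedged sketch of a linear uncertainty probe over hidden activations.
# Weights here are toy values, not trained parameters.
import math

def probe_uncertainty(activations: list[float],
                      weights: list[float], bias: float) -> float:
    """Sigmoid of a linear score over activations (near 0 = confident)."""
    score = sum(a * w for a, w in zip(activations, weights)) + bias
    return 1.0 / (1.0 + math.exp(-score))
```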


07-Self-Contradictory-Hallucinations-ICLR2024.pdf

Title: Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation

Venue: ICLR 2024

arXiv: 2305.15852

Abstract: Addresses self-contradiction in LLM outputs, found in 17.7% of sentences generated by ChatGPT. Proposes a prompting-based framework to detect and mitigate self-contradictions, which reveal model uncertainty.

Key Contribution: Self-contradiction as uncertainty indicator. Black-box mitigation without external knowledge retrieval. Iterative refinement to remove contradictions while preserving fluency.

CODITECT Relevance: Self-contradiction detection could be a guardrail in agent output validation. When agents produce contradictory statements in their reasoning, it signals low confidence and should trigger re-routing or clarification.
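
The guardrail's control flow can be illustrated with a toy pairwise check: compare sentences in an agent's output and re-route on contradiction. The string-negation heuristic below only stands in for the paper's LLM-based contradiction detector.

```python
# Toy self-contradiction guardrail; `contradicts` is a crude stand-in
# for an LLM-based detector.
def contradicts(a: str, b: str) -> bool:
    """True when one sentence is the plain negation of the other (toy)."""
    return a.replace(" not ", " ") == b or b.replace(" not ", " ") == a

def guardrail(sentences: list[str]) -> str:
    for i, s in enumerate(sentences):
        for t in sentences[i + 1:]:
            if contradicts(s, t):
                return "re-route"  # contradiction signals low confidence
    return "accept"
```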


08-ConU-Conformal-Uncertainty-EMNLP2024.pdf

Title: ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees

Venue: EMNLP 2024 Findings

arXiv: 2407.00499

Abstract: Applies conformal prediction to black-box LLMs for open-ended NLG tasks. Introduces uncertainty measure based on self-consistency theory, integrated into CP algorithm for correctness coverage guarantees.

Key Contribution: Rigorous statistical guarantees for uncertainty quantification. Outperforms prior SOTA methods across 7 LLMs on 4 NLG datasets with strict correctness coverage control.

CODITECT Relevance: Conformal prediction provides statistical guarantees that confidence scores are calibrated. Could ensure CODITECT's MoE routing decisions have provable reliability guarantees (e.g., "90% of high-confidence routes are correct").
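
The generic split-conformal recipe behind such guarantees is short: pick a confidence cutoff from held-out correct answers so that roughly a target fraction of them would be accepted. This sketches the standard recipe, not ConU's self-consistency-based measure; scores are assumed higher-is-better.

```python
# Minimal split-conformal calibration of an accept/reject cutoff.
import math

def calibrate_threshold(correct_scores: list[float], alpha: float = 0.1) -> float:
    """Cutoff that drops at most floor(alpha*(n+1)) lowest calibration scores."""
    s = sorted(correct_scores)
    k = math.floor(alpha * (len(s) + 1))
    return s[min(k, len(s) - 1)]

def accept(score: float, threshold: float) -> bool:
    """Accept a route when its confidence clears the calibrated cutoff."""
    return score >= threshold
```

The guarantee is marginal over future queries drawn like the calibration set, which is exactly the kind of statement behind claims such as "90% of high-confidence routes are correct".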


Category 5: Benchmarks

09-AMBROSIA-Benchmark-NeurIPS2024.pdf

Title: AMBROSIA: A Benchmark for Parsing Ambiguous Questions into Database Queries

Authors: Irina Saparina, Mirella Lapata

Venue: NeurIPS 2024 (Datasets & Benchmarks Track, Spotlight)

arXiv: 2406.19073

Abstract: Benchmark with 1,277 ambiguous questions, human-provided unambiguous interpretations, spanning 846 multi-table databases across 16 domains. Three ambiguity types: scope, attachment, vagueness.

Key Contribution: First benchmark for ambiguous semantic parsing. Shows that even advanced LLMs struggle to identify and interpret ambiguity. Llama3-70B achieves the best recall, but no model recovers all interpretations.

CODITECT Relevance: Provides test cases for evaluating CODITECT's disambiguation capabilities. The three ambiguity types (scope, attachment, vagueness) map well to command/agent invocation ambiguities. Could adapt dataset for CODITECT-specific benchmarking.


Application to CODITECT

ADR-026 (Intent Classification) Enhancements

| Paper | Applicable Technique |
|-------|----------------------|
| 02 | APA pipeline for confidence scoring |
| 10 | Out-of-scope detection patterns |
| 11 | Semantic routing for fast classification |
| 12 | "Disambiguate first" two-stage approach |

ADR-027 (Guardrail Engine) Enhancements

| Paper | Applicable Technique |
|-------|----------------------|
| 07 | Self-contradiction detection guardrail |
| 08 | Conformal prediction for calibrated confidence |
| 05 | Uncertainty frameworks for validation |

MoE Routing Improvements

| Paper | Applicable Technique |
|-------|----------------------|
| 03 | Task-level routing strategies |
| 04 | Natural expert specialization |
| 13 | Difficulty-aware orchestration |
| 14 | Role-conditioned context routing |
| 15 | Knowledge boundary formalization |

Research Gaps Identified

  1. Hierarchical agent routing - Most papers address flat query→agent routing, not orchestrator→specialist→sub-agent chains
  2. Developer tool intent classification - No benchmarks for AI-powered dev tools
  3. Multi-turn disambiguation - Papers focus on single-turn; CODITECT needs multi-turn clarification loops
  4. Continual learning for routing - Online learning from routing mistakes not well-covered
  5. Cost-aware routing - Limited work on computational cost trade-offs

Suggested Next Steps

  1. Read Priority Papers: 01, 02, 03, 13 (core concepts)
  2. Implement: Semantic routing from paper 11 for /which fast path
  3. Adapt: APA pipeline from paper 02 for confidence scoring
  4. Benchmark: Use AMBROSIA (paper 09) patterns for CODITECT eval suite
  5. Integrate: Self-contradiction detection (paper 07) as guardrail

Created: January 2, 2026
Location: coditect-core/00-ANALYZE-NEW-ARTIFACTS/Ambiguity-and-Intent-Research/
Related ADRs: ADR-026, ADR-027, ADR-007, ADR-008