“Who won the last elections?” Detecting Underspecified Queries in Question Answering with LLMs

Yunchong Huang

Abstract: This thesis presents a comprehensive investigation into semantically underspecified user queries in Question Answering (QA) scenarios of human-machine interaction. Drawing on a wide range of theoretical insights from linguistics and existing research in Natural Language Processing (NLP), we first establish a foundational understanding of semantic underspecification and tailor it into a definition suited to QA settings. Building on this, we propose a working taxonomy of underspecified queries in QA that is both theoretically grounded in linguistics and empirically validated against the data distributions observed in diverse QA datasets. We then demonstrate experimentally that prompt-based integration of this taxonomy into off-the-shelf State-of-the-Art (SotA) Large Language Models (LLMs) significantly improves the detection accuracy of underspecified queries, verifying the effectiveness of taxonomy-driven supervision. Applying the best-performing taxonomy-integrated LLM classifier to large-scale general QA datasets, we identify fully specified and underspecified query subsets and reveal that underspecified queries are widely present. Furthermore, proprietary SotA LLMs are consistently found to underperform on these underspecified queries in QA tasks. Since this pattern potentially stems from the neglect of underspecified queries in the construction of existing QA datasets rather than from model limitations, it raises fundamental concerns about the reliability and utility of current QA benchmarks and underscores the need to explicitly account for underspecified queries in future QA dataset development and related LLM research.
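As a rough illustration of what the prompt-based taxonomy integration described above can look like in practice, the minimal sketch below embeds a taxonomy of underspecification types into a classification prompt and queries an LLM. The category names, prompt wording, and model identifier are illustrative assumptions, not the thesis's actual taxonomy or experimental setup; the sketch uses the OpenAI chat completions API only as one example of an off-the-shelf LLM interface.

    # Minimal sketch of taxonomy-integrated, prompt-based detection of
    # underspecified queries. The taxonomy below is a hypothetical
    # placeholder, NOT the taxonomy proposed in the thesis.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Hypothetical taxonomy of underspecification types, given to the model
    # as in-context guidance for the binary detection task.
    TAXONOMY = """\
    1. Missing temporal reference (e.g., "Who won the last elections?")
    2. Missing geographic or entity reference (e.g., "What is the capital?")
    3. Ambiguous referent (e.g., "When did he resign?")
    4. Underspecified scope or criteria (e.g., "What is the best phone?")
    """

    def classify_query(query: str, model: str = "gpt-4o") -> str:
        """Label a query as 'specified' or 'underspecified' via the taxonomy."""
        prompt = (
            "A query is underspecified if it omits information needed to "
            "determine a unique answer. Taxonomy of underspecification types:\n"
            f"{TAXONOMY}\n"
            f"Query: {query}\n"
            "Answer with exactly one word: 'specified' or 'underspecified'."
        )
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,  # deterministic labels for classification
        )
        return response.choices[0].message.content.strip().lower()

    print(classify_query("Who won the last elections?"))  # likely 'underspecified'
    print(classify_query("Who won the 2020 US presidential election?"))

In this framing, the taxonomy acts as lightweight supervision: the model is not fine-tuned, but the enumerated categories anchor its judgment of what counts as underspecified, which is the mechanism the abstract credits for the improved detection accuracy.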