{"ID":2873378,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.06544","arxiv_id":"2509.06544","title":"Reason to Retrieve: Enhancing Query Understanding through Decomposition and Interpretation","abstract":"Query understanding (QU) aims to accurately infer user intent to improve document retrieval. It plays a vital role in modern search engines. While large language models (LLMs) have made notable progress in this area, their effectiveness has primarily been studied on short, keyword-based queries. With the rise of AI-driven search, long-form queries with complex intent become increasingly common, but they are underexplored in the context of LLM-based QU. To address this gap, we introduce ReDI, a reasoning-enhanced query understanding method through decomposition and interpretation. ReDI uses the reasoning and understanding capabilities of LLMs within a three-stage pipeline. (i) It decomposes a complex query into a set of targeted sub-queries to capture the user intent. (ii) It enriches each sub-query with detailed semantic interpretations to enhance the retrieval of intent-document matching. And (iii), after independently retrieving documents for each sub-query, ReDI uses a fusion strategy to aggregate the results and obtain the final ranking. We collect a large-scale dataset of real-world complex queries from a commercial search engine and distill the query understanding capabilities of DeepSeek-R1 into small models for practical application. Experiments on public benchmarks, including BRIGHT and BEIR, show that ReDI consistently outperforms strong baselines in both sparse and dense retrieval paradigms, demonstrating its effectiveness. We release our code, generated sub-queries, and interpretations at https://github.com/youngbeauty250/ReDI.","short_abstract":"Query understanding (QU) aims to accurately infer user intent to improve document retrieval. It plays a vital role in modern search engines. While large language models (LLMs) have made notable progress in this area, their effectiveness has primarily been studied on short, keyword-based queries. With the rise of AI-dri...","url_abs":"https://arxiv.org/abs/2509.06544","url_pdf":"https://arxiv.org/pdf/2509.06544v4","authors":"[\"Yunfei Zhong\",\"Jun Yang\",\"Yixing Fan\",\"Lixin Su\",\"Maarten de Rijke\",\"Ruqing Zhang\",\"Xueqi Cheng\"]","published":"2025-09-08T10:58:42Z","proceeding":"cs.IR","tasks":"[\"cs.IR\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":610049,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2873378,"paper_url":"https://arxiv.org/abs/2509.06544","paper_title":"Reason to Retrieve: Enhancing Query Understanding through Decomposition and Interpretation","repo_url":"https://github.com/youngbeauty250/ReDI","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
