{"ID":2859679,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.04468","arxiv_id":"2510.04468","title":"Improving IR-based Bug Localization with Semantics-Driven Query Reduction","abstract":"Despite decades of research, software bug localization remains challenging due to heterogeneous content and inherent ambiguities in bug reports. Existing methods, such as Information Retrieval (IR)-based approaches, often attempt to match source documents to bug reports, overlooking the context and semantics of the source code. On the other hand, Large Language Models (LLMs) (e.g., Transformer models) show promising results in understanding both texts and code. However, they have not yet been adapted well to localize software bugs using bug reports. They could also be data- or resource-intensive. To bridge this gap, we propose IQLoc, a novel approach that capitalizes on the strengths of both IR and LLMs for bug localization. In particular, we leverage the transformer-based model's understanding of code semantics to reason about its suspiciousness and to reformulate search queries and thus enhance bug localization using Information Retrieval. To evaluate IQLoc, we refine the Bench4BL benchmark dataset and extend it by incorporating ~30% more recent bug reports, resulting in a benchmark containing ~7.5K bug reports. We evaluate IQLoc using three performance metrics and compare it against eight baseline techniques. Experimental results demonstrate its superiority, achieving up to 100.40% and 78.08% in MAP, 61.49% and 64.58% in MRR, and 76.98% and 100.90% in HIT@K for the test bug reports with random and time-wise splits, respectively. Moreover, IQLoc improves MAP by 118.70% for bug reports with stack traces, 111.87% for those that include code elements, and 127.45% for those containing only descriptions in natural language. By integrating program semantic understanding into Information Retrieval, IQLoc mitigates several longstanding challenges of traditional IR-based approaches in bug localization.","short_abstract":"Despite decades of research, software bug localization remains challenging due to heterogeneous content and inherent ambiguities in bug reports. Existing methods, such as Information Retrieval (IR)-based approaches, often attempt to match source documents to bug reports, overlooking the context and semantics of the sou...","url_abs":"https://arxiv.org/abs/2510.04468","url_pdf":"https://arxiv.org/pdf/2510.04468v2","authors":"[\"Asif Mohammed Samir\",\"Mohammad Masudur Rahman\"]","published":"2025-10-06T03:43:38Z","proceeding":"cs.SE","tasks":"[\"cs.SE\"]","methods":"[\"Transformer\",\"Large Language Model\",\"Language Model\"]","has_code":false}