{"ID":2891887,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.16727","arxiv_id":"2507.16727","title":"Deliberative Searcher: Improving LLM Reliability via Reinforcement Learning with constraints","abstract":"Improving the reliability of large language models (LLMs) is critical for deploying them in real-world scenarios. In this paper, we propose Deliberative Searcher, the first framework to integrate certainty calibration with retrieval-based search for open-domain question answering. The agent performs multi-step reflection and verification over Wikipedia data and is trained with a reinforcement learning algorithm that optimizes for accuracy under a soft reliability constraint. Empirical results show that the proposed method improves alignment between model confidence and correctness, leading to more trustworthy outputs. This paper will be continuously updated.","short_abstract":"Improving the reliability of large language models (LLMs) is critical for deploying them in real-world scenarios. In this paper, we propose Deliberative Searcher, the first framework to integrate certainty calibration with retrieval-based search for open-domain question answering. The agent performs multi-step...","url_abs":"https://arxiv.org/abs/2507.16727","url_pdf":"https://arxiv.org/pdf/2507.16727v3","authors":"[\"Zhenyun Yin\",\"Shujie Wang\",\"Xuhong Wang\",\"Xingjun Ma\",\"Yinchun Wang\"]","published":"2025-07-22T16:09:34Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\",\"Language Model\"]","has_code":false}
