{"ID":3050183,"CreatedAt":"2026-06-04T02:13:16.786527022Z","UpdatedAt":"2026-06-06T07:53:07.675991959Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.04579","arxiv_id":"2606.04579","title":"SCI-PRM: A Tool Aware Process Reward Model for Scientific Reasoning Verification","abstract":"While Process Reward Models (PRMs) have achieved remarkable success in mathematical reasoning, their application in complex scientific domains such as biology, chemistry, and physics remains largely unexplored. Scientific problems demand not only logical rigor but also factual consistency and the precise use of domain-specific tools, areas where current models often suffer from hallucinations and a lack of verification. In this paper, we first construct SCIPRM70K, a large-scale dataset featuring Chain-of-Tool trajectories that explicitly interleave reasoning with the execution of scientific tools. Building upon this, we train an efficient reward model called Sci-PRM to provide fine-grained supervision on tool selection, execution accuracy, and result interpretation at each step in a single inference pass. Experiments demonstrate that Sci-PRM significantly enhances foundation models in two key aspects: (1) it enables effective test-time scaling via Best-of-N selection; and (2) when integrated into Reinforcement Learning, it serves as a dense reward signal that mitigates the critical issue of advantage disappearance, allowing the model to break through existing performance ceilings.","short_abstract":"While Process Reward Models (PRMs) have achieved remarkable success in mathematical reasoning, their application in complex scientific domains such as biology, chemistry, and physics remains largely unexplored. Scientific problems demand not only logical rigor but also factual consistency and the precise use of domai...","url_abs":"https://arxiv.org/abs/2606.04579","url_pdf":"https://arxiv.org/pdf/2606.04579v1","authors":"[\"Xiangyu Zhao\",\"Hengyuan Zhao\",\"Yiheng Wang\",\"Wanghan Xu\",\"Yuhao Zhou\",\"Qinglong Cao\",\"Zhiwang Zhou\",\"Lei Bai\",\"Wenlong Zhang\",\"Xiao-Ming Wu\"]","published":"2026-06-03T08:13:27Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
