{"ID":2854661,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.14703","arxiv_id":"2510.14703","title":"ToolPRM: Fine-Grained Inference Scaling of Structured Outputs for Function Calling","abstract":"Large language models (LLMs) excel at function calling, but inference scaling has been explored mainly for unstructured generation. We propose an inference-scaling framework for structured outputs that combines fine-grained beam search with \\textbf{ToolPRM}, a process reward model scoring each intra-call decision (function name and argument filling). We build the first fine-grained intra-call supervision dataset via function masking, rollout collection, and step-level annotation. ToolPRM outperforms outcome and coarse-grained reward models in predictive accuracy and yields consistent test-time gains on multiple function-calling benchmarks. We further show that structured generation follows ``\\textbf{explore more but retain less}'', since early JSON errors are unrecoverable.","short_abstract":"Large language models (LLMs) excel at function calling, but inference scaling has been explored mainly for unstructured generation. We propose an inference-scaling framework for structured outputs that combines fine-grained beam search with \\textbf{ToolPRM}, a process reward model scoring each intra-call decision (func...","url_abs":"https://arxiv.org/abs/2510.14703","url_pdf":"https://arxiv.org/pdf/2510.14703v2","authors":"[\"Jianghao Lin\",\"Yuanyuan Shi\",\"Xin Peng\",\"Renjie Ding\",\"Hairui Wang\",\"Yuxuan Peng\",\"Bizhe Bai\",\"Weixi Song\",\"Fengshuo Bai\",\"Huacan Chai\",\"Weinan Zhang\",\"Fei Huang\",\"Ying Wen\"]","published":"2025-10-16T14:06:03Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}