{"ID":2867151,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.19003","arxiv_id":"2509.19003","title":"Unveiling Chain of Step Reasoning for Vision-Language Models with Fine-grained Rewards","abstract":"Chain of thought reasoning has demonstrated remarkable success in large language models, yet its adaptation to vision-language reasoning remains an open challenge with unclear best practices. Existing attempts typically employ reasoning chains at a coarse-grained level, which struggle to perform fine-grained structured reasoning and, more importantly, make it difficult to evaluate the reward and quality of intermediate reasoning. In this work, we delve into chain of step reasoning for vision-language models, enabling accurate assessment of reasoning step quality and leading to effective reinforcement learning and inference-time scaling with fine-grained rewards. We present a simple, effective, and fully transparent framework, including the step-level reasoning data, process reward model (PRM), and reinforcement learning training. With the proposed approaches, our models set strong baselines with consistent improvements on challenging vision-language benchmarks. More importantly, we conduct a thorough empirical analysis and ablation study, unveiling the impact of each component and several intriguing properties of inference-time scaling. We believe this paper serves as a baseline for vision-language models and offers insights into more complex multimodal reasoning. Our dataset, PRM, and code will be available at https://github.com/baaivision/CoS.","short_abstract":"Chain of thought reasoning has demonstrated remarkable success in large language models, yet its adaptation to vision-language reasoning remains an open challenge with unclear best practices. Existing attempts typically employ reasoning chains at a coarse-grained level, which struggle to perform fine-grained structure...","url_abs":"https://arxiv.org/abs/2509.19003","url_pdf":"https://arxiv.org/pdf/2509.19003v1","authors":"[\"Honghao Chen\",\"Xingzhou Lou\",\"Xiaokun Feng\",\"Kaiqi Huang\",\"Xinlong Wang\"]","published":"2025-09-23T13:47:32Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Reinforcement Learning\",\"Language Model\"]","has_code":false,"code_links":[{"ID":609443,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2867151,"paper_url":"https://arxiv.org/abs/2509.19003","paper_title":"Unveiling Chain of Step Reasoning for Vision-Language Models with Fine-grained Rewards","repo_url":"https://github.com/baaivision/CoS","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
