{"ID":2851201,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.20579","arxiv_id":"2510.20579","title":"Open-o3-Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence","abstract":"Most video reasoning models only generate textual reasoning traces without indicating when and where key evidence appears. Recent models such as OpenAI-o3 have sparked wide interest in evidence-centered reasoning for images, yet extending this ability to videos is more challenging due to the need for joint temporal tracking and spatial localization across dynamic scenes. We introduce Open-o3-Video, a non-agent framework that integrates explicit spatio-temporal evidence into video reasoning by highlighting key timestamps, objects, and bounding boxes, making the reasoning process traceable and verifiable. To enable this capability, we first construct high-quality datasets STGR that provide unified spatio-temporal supervision, which is absent in existing resources. We further adopt a cold-start reinforcement learning strategy with specially designed rewards that jointly encourage answer accuracy, temporal alignment, and spatial precision. On the V-STAR benchmark, Open-o3-Video achieves state-of-the-art performance, improving mAM by 14.4% and mLGM by 24.2% over the Qwen2.5-VL baseline, and shows consistent gains across a range of video understanding benchmarks. Beyond accuracy, the grounded reasoning traces produced by Open-o3-Video support confidence-aware test-time scaling, improving answer reliability.","short_abstract":"Most video reasoning models only generate textual reasoning traces without indicating when and where key evidence appears. \nRecent models such as OpenAI-o3 have sparked wide interest in evidence-centered reasoning for images, yet extending this ability to videos is more challenging due to the need for joint temporal tra...","url_abs":"https://arxiv.org/abs/2510.20579","url_pdf":"https://arxiv.org/pdf/2510.20579v2","authors":"[\"Jiahao Meng\",\"Xiangtai Li\",\"Haochen Wang\",\"Yue Tan\",\"Tao Zhang\",\"Lingdong Kong\",\"Yunhai Tong\",\"Anran Wang\",\"Zhiyang Teng\",\"Yujing Wang\",\"Zhuochen Wang\"]","published":"2025-10-23T14:05:56Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\",\"cs.MM\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
