{"ID":2829170,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.13874","arxiv_id":"2512.13874","title":"SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning","abstract":"As humans, we are natural any-horizon reasoners, i.e., we can decide whether to iteratively skim long videos or watch short ones in full when necessary for a given task. With this in mind, one would expect video reasoning models to reason flexibly across different durations. However, SOTA models are still trained to predict answers in a single turn while processing a large number of frames, akin to watching an entire long video, requiring significant resources. This raises the question: Is it possible to develop performant any-horizon video reasoning systems? Inspired by human behavior, we first propose SAGE, an agent system that performs multi-turn reasoning on long videos while handling simpler problems in a single turn. Secondly, we introduce an easy synthetic data generation pipeline using Gemini-2.5-Flash to train the orchestrator, SAGE-MM, which lies at the core of SAGE. We further propose an effective RL post-training recipe essential for instilling any-horizon reasoning ability in SAGE-MM. Thirdly, we curate SAGE-Bench with an average duration of greater than 700 seconds for evaluating video reasoning ability in real-world entertainment use cases. Lastly, we empirically validate the effectiveness of our system, data, and RL recipe, observing notable improvements of up to 6.1% on open-ended video reasoning tasks, as well as an impressive 8.2% improvement on videos longer than 10 minutes.","short_abstract":"As humans, we are natural any-horizon reasoners, i.e., we can decide whether to iteratively skim long videos or watch short ones in full when necessary for a given task. With this in mind, one would expect video reasoning models to reason flexibly across different durations. However, SOTA models are still trained to pr...","url_abs":"https://arxiv.org/abs/2512.13874","url_pdf":"https://arxiv.org/pdf/2512.13874v2","authors":"[\"Jitesh Jain\",\"Jialuo Li\",\"Zixian Ma\",\"Jieyu Zhang\",\"Chris Dongjoo Kim\",\"Sangho Lee\",\"Rohun Tripathi\",\"Tanmay Gupta\",\"Christopher Clark\",\"Humphrey Shi\"]","published":"2025-12-15T20:14:19Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
