{"ID":2863890,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.25358","arxiv_id":"2509.25358","title":"SARM: Stage-Aware Reward Modeling for Long Horizon Robot Manipulation","abstract":"Large-scale robot learning has made progress on complex manipulation tasks, yet long-horizon, contact-rich problems, especially those involving deformable objects, remain challenging due to inconsistent demonstration quality. We propose a stage-aware, video-based reward modeling framework that jointly predicts task stage and fine-grained progress, using natural-language subtask annotations to derive consistent labels across variable-length demonstrations. This avoids the brittleness of frame-index-based labeling and provides stable supervision even in tasks like T-shirt folding. Our reward model is robust to demonstration variability, generalizes to out-of-distribution scenarios, and improves downstream policy training. Building on it, we introduce Reward-Aligned Behavior Cloning (RA-BC), which filters and reweights demonstrations based on reward estimates. Experiments show that our method significantly outperforms baselines in both real-world rollouts and human validation. On T-shirt folding, we achieve 83% success from the flattened state and 67% from the crumpled state, compared to 8% and 0% with vanilla BC. Overall, our results highlight reward modeling as a scalable and annotation-efficient solution for long-horizon robotic manipulation. Project website: https://qianzhong-chen.github.io/sarm.github.io/","short_abstract":"Large-scale robot learning has made progress on complex manipulation tasks, yet long-horizon, contact-rich problems, especially those involving deformable objects, remain challenging due to inconsistent demonstration quality. 
We propose a stage-aware, video-based reward modeling framework that jointly predicts task sta...","url_abs":"https://arxiv.org/abs/2509.25358","url_pdf":"https://arxiv.org/pdf/2509.25358v4","authors":"[\"Qianzhong Chen\",\"Justin Yu\",\"Mac Schwager\",\"Pieter Abbeel\",\"Yide Shentu\",\"Philipp Wu\"]","published":"2025-09-29T18:07:54Z","proceeding":"cs.RO","tasks":"[\"cs.RO\"]","methods":"[]","has_code":false}
