{"ID":2865386,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.22402","arxiv_id":"2509.22402","title":"ReLAM: Learning Anticipation Model for Rewarding Visual Robotic Manipulation","abstract":"Reward design remains a critical bottleneck in visual reinforcement learning (RL) for robotic manipulation. In simulated environments, rewards are conventionally designed based on the distance to a target position. However, such precise positional information is often unavailable in real-world visual settings due to sensory and perceptual limitations. In this study, we propose a method that implicitly infers spatial distances through keypoints extracted from images. Building on this, we introduce Reward Learning with Anticipation Model (ReLAM), a novel framework that automatically generates dense, structured rewards from action-free video demonstrations. ReLAM first learns an anticipation model that serves as a planner and proposes intermediate keypoint-based subgoals on the optimal path to the final goal, creating a structured learning curriculum directly aligned with the task's geometric objectives. Based on the anticipated subgoals, a continuous reward signal is provided to train a low-level, goal-conditioned policy under the hierarchical reinforcement learning (HRL) framework with provable sub-optimality bound. Extensive experiments on complex, long-horizon manipulation tasks show that ReLAM significantly accelerates learning and achieves superior performance compared to state-of-the-art methods.","short_abstract":"Reward design remains a critical bottleneck in visual reinforcement learning (RL) for robotic manipulation. In simulated environments, rewards are conventionally designed based on the distance to a target position. However, such precise positional information is often unavailable in real-world visual settings due to se...","url_abs":"https://arxiv.org/abs/2509.22402","url_pdf":"https://arxiv.org/pdf/2509.22402v1","authors":"[\"Nan Tang\",\"Jing-Cheng Pang\",\"Guanlin Li\",\"Chao Qian\",\"Yang Yu\"]","published":"2025-09-26T14:28:42Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.RO\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
