{"ID":2884688,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.06206","arxiv_id":"2508.06206","title":"Affordance-R1: Reinforcement Learning for Generalizable Affordance Reasoning in Multimodal Large Language Model","abstract":"Affordance grounding focuses on predicting the specific regions of objects that are associated with the actions to be performed by robots. It plays a vital role in the fields of human-robot interaction, human-object interaction, embodied manipulation, and embodied perception. Existing models often neglect the affordance shared among different objects because they lack the Chain-of-Thought (CoT) reasoning abilities, limiting their out-of-domain (OOD) generalization and explicit reasoning capabilities. To address these challenges, we propose Affordance-R1, the first unified affordance grounding framework that integrates cognitive CoT guided Group Relative Policy Optimization (GRPO) within a reinforcement learning paradigm. Specifically, we designed a sophisticated affordance function, which contains format, perception, and cognition rewards to effectively guide optimization directions. Furthermore, we constructed a high-quality affordance-centric reasoning dataset, ReasonAff, to support training. Trained exclusively via reinforcement learning with GRPO and without explicit reasoning data, Affordance-R1 achieves robust zero-shot generalization and exhibits emergent test-time reasoning capabilities. Comprehensive experiments demonstrate that our model outperforms well-established methods and exhibits open-world generalization. To the best of our knowledge, Affordance-R1 is the first to integrate GRPO-based RL with reasoning into affordance reasoning.\nThe code of our method and our dataset is released on https://github.com/hq-King/Affordance-R1.","short_abstract":"Affordance grounding focuses on predicting the specific regions of objects that are associated with the actions to be performed by robots. It plays a vital role in the fields of human-robot interaction, human-object interaction, embodied manipulation, and embodied perception. Existing models often neglect the affordanc...","url_abs":"https://arxiv.org/abs/2508.06206","url_pdf":"https://arxiv.org/pdf/2508.06206v5","authors":"[\"Hanqing Wang\",\"Shaoyang Wang\",\"Yiming Zhong\",\"Zemin Yang\",\"Jiamin Wang\",\"Zhiqing Cui\",\"Jiahao Yuan\",\"Yifan Han\",\"Mingyu Liu\",\"Yuexin Ma\"]","published":"2025-08-08T10:39:04Z","proceeding":"cs.RO","tasks":"[\"cs.RO\",\"cs.CV\"]","methods":"[\"Reinforcement Learning\",\"Language Model\"]","has_code":false,"code_links":[{"ID":611102,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2884688,"paper_url":"https://arxiv.org/abs/2508.06206","paper_title":"Affordance-R1: Reinforcement Learning for Generalizable Affordance Reasoning in Multimodal Large Language Model","repo_url":"https://github.com/hq-King/Affordance-R1","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
