{"ID":2895076,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.10814","arxiv_id":"2507.10814","title":"Versatile and Generalizable Manipulation via Goal-Conditioned Reinforcement Learning with Grounded Object Detection","abstract":"General-purpose robotic manipulation, including reach and grasp, is essential for deployment into households and workspaces involving diverse and evolving tasks. Recent advances propose using large pre-trained models, such as Large Language Models and object detectors, to boost robotic perception in reinforcement learning. These models, trained on large datasets via self-supervised learning, can process text prompts and identify diverse objects in scenes, an invaluable skill in RL where learning object interaction is resource-intensive. This study demonstrates how to integrate such models into Goal-Conditioned Reinforcement Learning to enable general and versatile robotic reach and grasp capabilities. We use a pre-trained object detection model to enable the agent to identify the object from a text prompt and generate a mask for goal conditioning. Mask-based goal conditioning provides object-agnostic cues, improving feature sharing and generalization. The effectiveness of the proposed framework is demonstrated in a simulated reach-and-grasp task, where the mask-based goal conditioning consistently maintains a $\\sim$90\\% success rate in grasping both in and out-of-distribution objects, while also ensuring faster convergence to higher returns.","short_abstract":"General-purpose robotic manipulation, including reach and grasp, is essential for deployment into households and workspaces involving diverse and evolving tasks. Recent advances propose using large pre-trained models, such as Large Language Models and object detectors, to boost robotic perception in reinforcement learn...","url_abs":"https://arxiv.org/abs/2507.10814","url_pdf":"https://arxiv.org/pdf/2507.10814v1","authors":"[\"Huiyi Wang\",\"Fahim Shahriar\",\"Alireza Azimi\",\"Gautham Vasan\",\"Rupam Mahmood\",\"Colin Bellinger\"]","published":"2025-07-14T21:21:46Z","proceeding":"cs.RO","tasks":"[\"cs.RO\"]","methods":"[\"Reinforcement Learning\",\"Language Model\"]","has_code":false}
