{"ID":2835520,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.23186","arxiv_id":"2511.23186","title":"Obstruction reasoning for robotic grasping","abstract":"Successful robotic grasping in cluttered environments not only requires a model to visually ground a target object but also to reason about obstructions that must be cleared beforehand. While current vision-language embodied reasoning models show emergent spatial understanding, they remain limited in terms of obstruction reasoning and accessibility planning. To bridge this gap, we present UNOGrasp, a learning-based vision-language model capable of performing visually-grounded obstruction reasoning to infer the sequence of actions needed to unobstruct the path and grasp the target object. We devise a novel multi-step reasoning process based on obstruction paths originated by the target object. We anchor each reasoning step with obstruction-aware visual cues to incentivize reasoning capability. UNOGrasp combines supervised and reinforcement finetuning through verifiable reasoning rewards. Moreover, we construct UNOBench, a large-scale dataset for both training and benchmarking, based on MetaGraspNetV2, with over 100k obstruction paths annotated by humans with obstruction ratios, contact points, and natural-language instructions. Extensive experiments and real-robot evaluations show that UNOGrasp significantly improves obstruction reasoning and grasp success across both synthetic and real-world environments, outperforming generalist and proprietary alternatives. Project website: https://tev-fbk.github.io/UnoGrasp/.","short_abstract":"Successful robotic grasping in cluttered environments not only requires a model to visually ground a target object but also to reason about obstructions that must be cleared beforehand. While current vision-language embodied reasoning models show emergent spatial understanding, they remain limited in terms of obstructi...","url_abs":"https://arxiv.org/abs/2511.23186","url_pdf":"https://arxiv.org/pdf/2511.23186v1","authors":"[\"Runyu Jiao\",\"Matteo Bortolon\",\"Francesco Giuliari\",\"Alice Fasoli\",\"Sergio Povoli\",\"Guofeng Mei\",\"Yiming Wang\",\"Fabio Poiesi\"]","published":"2025-11-28T13:53:12Z","proceeding":"cs.RO","tasks":"[\"cs.RO\",\"cs.AI\",\"cs.CV\"]","methods":"[\"Language Model\"]","has_code":false}
