{"ID":2869360,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.14967","arxiv_id":"2509.14967","title":"Affordance-Based Disambiguation of Surgical Instructions for Collaborative Robot-Assisted Surgery","abstract":"Effective human-robot collaboration in surgery is affected by the inherent ambiguity of verbal communication. This paper presents a framework for a robotic surgical assistant that interprets and disambiguates verbal instructions from a surgeon by grounding them in the visual context of the operating field. The system employs a two-level affordance-based reasoning process that first analyzes the surgical scene using a multimodal vision-language model and then reasons about the instruction using a knowledge base of tool capabilities. To ensure patient safety, a dual-set conformal prediction method is used to provide a statistically rigorous confidence measure for robot decisions, allowing it to identify and flag ambiguous commands. We evaluated our framework on a curated dataset of ambiguous surgical requests from cholecystectomy videos, demonstrating a general disambiguation rate of 60% and presenting a method for safer human-robot interaction in the operating room.","short_abstract":"Effective human-robot collaboration in surgery is affected by the inherent ambiguity of verbal communication. This paper presents a framework for a robotic surgical assistant that interprets and disambiguates verbal instructions from a surgeon by grounding them in the visual context of the operating field. The system e...","url_abs":"https://arxiv.org/abs/2509.14967","url_pdf":"https://arxiv.org/pdf/2509.14967v2","authors":"[\"Ana Davila\",\"Jacinto Colan\",\"Yasuhisa Hasegawa\"]","published":"2025-09-18T13:59:56Z","proceeding":"cs.RO","tasks":"[\"cs.RO\",\"cs.HC\"]","methods":"[\"Language Model\"]","has_code":false}