{"ID":3083783,"CreatedAt":"2026-06-05T06:46:15.197025399Z","UpdatedAt":"2026-06-07T05:32:54.120957816Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.06061","arxiv_id":"2606.06061","title":"A Conversational Framework for Human-Robot Collaborative Manipulation with Distributed Generative AI models","abstract":"This paper presents a distributed conversational framework for human-robot collaborative manipulation that integrates local language and vision-language models (VLMs) with a Robot Operating System 2 (ROS 2)-based execution stack. Language understanding, visual grounding, orchestration, and motion execution run as separate ROS 2 nodes, enabling flexible deployment across distributed hardware while maintaining a responsive control loop. From free-form user commands, the system generates structured action requests for pick, place, and handover. It uses a VLM to return image-space targets, which are converted into metric robot-frame goals using depth and calibration. A web dashboard exposes intermediate intent and grounding overlays (pixel, depth, and robot-frame) and requires explicit operator confirmation before any motion is executed. Experiments on a Franka FR3 platform evaluate end-to-end task reliability and latency under increasing working table scene ambiguity and compare alternative LLM/VLM configurations in the same pipeline. Code and full documentation are available at [github.com/cogrob-tuni/franka-llm](https://github.com/cogrob-tuni/franka-llm).","short_abstract":"This paper presents a distributed conversational framework for human-robot collaborative manipulation that integrates local language and vision-language models (VLMs) with a Robot Operating System 2 (ROS 2)-based execution stack. Language understanding, visual grounding, orchestration, and motion execution run as separ...","url_abs":"https://arxiv.org/abs/2606.06061","url_pdf":"https://arxiv.org/pdf/2606.06061v1","authors":"[\"Arash Ghasemzadeh Kakroudi\",\"Roel Pieters\"]","published":"2026-06-04T12:00:41Z","proceeding":"cs.RO","tasks":"[\"cs.RO\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":612831,"CreatedAt":"2026-06-05T06:46:15.197025399Z","UpdatedAt":"2026-06-05T06:46:15.197025399Z","DeletedAt":null,"paper_id":3083783,"paper_url":"https://arxiv.org/abs/2606.06061","paper_title":"A Conversational Framework for Human-Robot Collaborative Manipulation with Distributed Generative AI models","repo_url":"https://github.com/cogrob-tuni/franka-llm","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
