{"ID":2891242,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.17294","arxiv_id":"2507.17294","title":"VLA-Touch: Enhancing Vision-Language-Action Models with Dual-Level Tactile Feedback","abstract":"Tactile feedback is generally recognized to be crucial for effective interaction with the physical world. However, state-of-the-art Vision-Language-Action (VLA) models lack the ability to interpret and use tactile signals, limiting their effectiveness in contact-rich tasks. Incorporating tactile feedback into these systems is challenging due to the absence of large multi-modal datasets. We present VLA-Touch, an approach that enhances generalist robot policies with tactile sensing \\emph{without fine-tuning} the base VLA. Our method introduces two key innovations: (1) a pipeline that leverages a pretrained tactile-language model that provides semantic tactile feedback for high-level task planning, and (2) a diffusion-based controller that refines VLA-generated actions with tactile signals for contact-rich manipulation. Through real-world experiments, we demonstrate that our dual-level integration of tactile feedback improves task planning efficiency while enhancing execution precision. Code is open-sourced at \\href{https://github.com/jxbi1010/VLA-Touch}{this URL}.","short_abstract":"Tactile feedback is generally recognized to be crucial for effective interaction with the physical world. However, state-of-the-art Vision-Language-Action (VLA) models lack the ability to interpret and use tactile signals, limiting their effectiveness in contact-rich tasks. Incorporating tactile feedback into these sys...","url_abs":"https://arxiv.org/abs/2507.17294","url_pdf":"https://arxiv.org/pdf/2507.17294v2","authors":"[\"Jianxin Bi\",\"Kevin Yuchen Ma\",\"Ce Hao\",\"Mike Zheng Shou\",\"Harold Soh\"]","published":"2025-07-23T07:54:10Z","proceeding":"cs.RO","tasks":"[\"cs.RO\",\"cs.LG\"]","methods":"[\"Diffusion Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":611860,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2891242,"paper_url":"https://arxiv.org/abs/2507.17294","paper_title":"VLA-Touch: Enhancing Vision-Language-Action Models with Dual-Level Tactile Feedback","repo_url":"https://github.com/jxbi1010/VLA-Touch","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}