{"ID":2831531,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.07222","arxiv_id":"2512.07222","title":"Pay Less Attention to Function Words for Free Robustness of Vision-Language Models","abstract":"To address the trade-off between robustness and performance for robust VLM, we observe that function words could incur vulnerability of VLMs against cross-modal adversarial attacks, and propose Function-word De-Attention (FDA) accordingly to mitigate the impact of function words. Similar to differential amplifiers, our FDA calculates the original and the function-word cross-attention within attention heads, and differentially subtracts the latter from the former for more aligned and robust VLMs. Comprehensive experiments include 2 SOTA baselines under 6 different attacks on 2 downstream tasks, 3 datasets, and 3 models. Overall, our FDA yields an average 18/13/53% ASR drop with only 0.2/0.3/0.6% performance drops on the 3 tested models on retrieval, and a 90% ASR drop with a 0.3% performance gain on visual grounding. We demonstrate the scalability, generalization, and zero-shot performance of FDA experimentally, as well as in-depth ablation studies and analysis. Code is available at https://github.com/michaeltian108/FDA.","short_abstract":"To address the trade-off between robustness and performance for robust VLM, we observe that function words could incur vulnerability of VLMs against cross-modal adversarial attacks, and propose Function-word De-Attention (FDA) accordingly to mitigate the impact of function words. \nSimilar to differential amplifiers, our...","url_abs":"https://arxiv.org/abs/2512.07222","url_pdf":"https://arxiv.org/pdf/2512.07222v4","authors":"[\"Qiwei Tian\",\"Chenhao Lin\",\"Zhengyu Zhao\",\"Chao Shen\"]","published":"2025-12-08T07:05:18Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.CL\"]","methods":"[\"Language Model\"]","has_code":false,"code_links":[{"ID":606133,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2831531,"paper_url":"https://arxiv.org/abs/2512.07222","paper_title":"Pay Less Attention to Function Words for Free Robustness of Vision-Language Models","repo_url":"https://github.com/michaeltian108/FDA","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}