{"ID":2852906,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.17759","arxiv_id":"2510.17759","title":"VERA-V: Variational Inference Framework for Jailbreaking Vision-Language Models","abstract":"Vision-Language Models (VLMs) extend large language models with visual reasoning, but their multimodal design also introduces new, underexplored vulnerabilities. Existing multimodal red-teaming methods largely rely on brittle templates, focus on single-attack settings, and expose only a narrow subset of vulnerabilities. To address these limitations, we introduce VERA-V, a variational inference framework that recasts multimodal jailbreak discovery as learning a joint posterior distribution over paired text-image prompts. This probabilistic view enables the generation of stealthy, coupled adversarial inputs that bypass model guardrails. We train a lightweight attacker to approximate the posterior, allowing efficient sampling of diverse jailbreaks and providing distributional insights into vulnerabilities. VERA-V further integrates three complementary strategies: (i) typography-based text prompts that embed harmful cues, (ii) diffusion-based image synthesis that introduces adversarial signals, and (iii) structured distractors to fragment VLM attention. Experiments on HarmBench and HADES benchmarks show that VERA-V consistently outperforms state-of-the-art baselines on both open-source and frontier VLMs, achieving up to 53.75% higher attack success rate (ASR) over the best baseline on GPT-4o. We include the code on the project page available here: https://github.com/kxwhiowo/VERA-V","short_abstract":"Vision-Language Models (VLMs) extend large language models with visual reasoning, but their multimodal design also introduces new, underexplored vulnerabilities. Existing multimodal red-teaming methods largely rely on brittle templates, focus on single-attack settings, and expose only a narrow subset of vulnerabilities...","url_abs":"https://arxiv.org/abs/2510.17759","url_pdf":"https://arxiv.org/pdf/2510.17759v2","authors":"[\"Qilin Liao\",\"Anamika Lochab\",\"Ruqi Zhang\"]","published":"2025-10-20T17:12:10Z","proceeding":"cs.CR","tasks":"[\"cs.CR\",\"cs.CL\",\"cs.CV\",\"cs.LG\",\"stat.ML\"]","methods":"[\"Diffusion Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":608041,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2852906,"paper_url":"https://arxiv.org/abs/2510.17759","paper_title":"VERA-V: Variational Inference Framework for Jailbreaking Vision-Language Models","repo_url":"https://github.com/kxwhiowo/VERA-V","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
