{"ID":2877886,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.18649","arxiv_id":"2508.18649","title":"PRISM: Robust VLM Alignment with Principled Reasoning for Integrated Safety in Multimodality","abstract":"Safeguarding vision-language models (VLMs) is a critical challenge, as existing methods often suffer from over-defense, which harms utility, or rely on shallow alignment, failing to detect complex threats that require deep reasoning. To this end, we introduce PRISM (Principled Reasoning for Integrated Safety in Multimodality), a System 2-like framework that aligns VLMs through a structured four-stage reasoning process explicitly designed to handle three distinct categories of multimodal safety violations. Our framework consists of two key components: a structured reasoning pipeline that analyzes each violation category in dedicated stages, and PRISM-DPO, generated via Monte Carlo Tree Search (MCTS) to refine reasoning quality through Direct Preference Optimization. Comprehensive evaluations show that PRISM substantially reduces attack success rates on JailbreakV-28K and VLBreak, improves robustness against adaptive attacks, and generalizes to out-of-distribution multi-image threats, while better preserving model utility on benign multimodal benchmarks. Our code, data, and model weights are available at https://github.com/SaFoLab-WISC/PRISM.","short_abstract":"Safeguarding vision-language models (VLMs) is a critical challenge, as existing methods often suffer from over-defense, which harms utility, or rely on shallow alignment, failing to detect complex threats that require deep reasoning. To this end, we introduce PRISM (Principled Reasoning for Integrated Safety in Multimod...","url_abs":"https://arxiv.org/abs/2508.18649","url_pdf":"https://arxiv.org/pdf/2508.18649v2","authors":"[\"Nanxi Li\",\"Zhengyue Zhao\",\"G. Edward Suh\",\"Marco Pavone\",\"Chaowei Xiao\"]","published":"2025-08-26T03:45:19Z","proceeding":"cs.CR","tasks":"[\"cs.CR\",\"cs.AI\"]","methods":"[\"Language Model\"]","has_code":false,"code_links":[{"ID":610424,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2877886,"paper_url":"https://arxiv.org/abs/2508.18649","paper_title":"PRISM: Robust VLM Alignment with Principled Reasoning for Integrated Safety in Multimodality","repo_url":"https://github.com/SaFoLab-WISC/PRISM","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
