{"ID":2888739,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.05658","arxiv_id":"2508.05658","title":"Universally Unfiltered and Unseen:Input-Agnostic Multimodal Jailbreaks against Text-to-Image Model Safeguards","abstract":"Various (text) prompt filters and (image) safety checkers have been implemented to mitigate the misuse of Text-to-Image (T2I) models in creating Not-Safe-For-Work (NSFW) content. In order to expose potential security vulnerabilities of such safeguards, multimodal jailbreaks have been studied. However, existing jailbreaks are limited to prompt-specific and image-specific perturbations, which suffer from poor scalability and time-consuming optimization. To address these limitations, we propose Universally Unfiltered and Unseen (U3)-Attack, a multimodal jailbreak attack method against T2I safeguards. Specifically, U3-Attack optimizes an adversarial patch on the image background to universally bypass safety checkers and optimizes a safe paraphrase set from a sensitive word to universally bypass prompt filters while eliminating redundant computations. Extensive experimental results demonstrate the superiority of our U3-Attack on both open-source and commercial T2I models. For example, on the commercial Runway-inpainting model with both prompt filter and safety checker, our U3-Attack achieves $~4\\times$ higher success rates than the state-of-the-art multimodal jailbreak attack, MMA-Diffusion.","short_abstract":"Various (text) prompt filters and (image) safety checkers have been implemented to mitigate the misuse of Text-to-Image (T2I) models in creating Not-Safe-For-Work (NSFW) content. In order to expose potential security vulnerabilities of such safeguards, multimodal jailbreaks have been studied. However, existing jailbrea...","url_abs":"https://arxiv.org/abs/2508.05658","url_pdf":"https://arxiv.org/pdf/2508.05658v2","authors":"[\"Song Yan\",\"Hui Wei\",\"Jinlong Fei\",\"Guoliang Yang\",\"Zhengyu Zhao\",\"Zheng Wang\"]","published":"2025-07-30T10:06:38Z","proceeding":"cs.CR","tasks":"[\"cs.CR\",\"cs.CV\",\"cs.MM\"]","methods":"[\"Diffusion Model\"]","has_code":false}
