{"ID":2837606,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.19558","arxiv_id":"2511.19558","title":"SPQR: A Standardized Benchmark for Modern Safety Alignment Methods in Text-to-Image Diffusion Models","abstract":"Text-to-image diffusion models can emit copyrighted, unsafe, or private content. Safety alignment aims to suppress specific concepts, yet evaluations seldom test whether safety persists under benign downstream fine-tuning routinely applied after deployment (e.g., LoRA personalization, style/domain adapters). We study the stability of current safety methods under benign fine-tuning and observe frequent breakdowns. As true safety alignment must withstand even benign post-deployment adaptations, we introduce the SPQR benchmark (Safety-Prompt adherence-Quality-Robustness). SPQR is a single-scored metric that provides a standardized and reproducible framework to evaluate how well safety-aligned diffusion models preserve safety, utility, and robustness under benign fine-tuning, by reporting a single leaderboard score to facilitate comparisons. We conduct multilingual, domain-specific, and out-of-distribution analyses, along with category-wise breakdowns, to identify when safety alignment fails after benign fine-tuning, ultimately showcasing SPQR as a concise yet comprehensive benchmark for T2I safety alignment techniques for T2I models.","short_abstract":"Text-to-image diffusion models can emit copyrighted, unsafe, or private content. Safety alignment aims to suppress specific concepts, yet evaluations seldom test whether safety persists under benign downstream fine-tuning routinely applied after deployment (e.g., LoRA personalization, style/domain adapters). We study t...","url_abs":"https://arxiv.org/abs/2511.19558","url_pdf":"https://arxiv.org/pdf/2511.19558v1","authors":"[\"Mohammed Talha Alam\",\"Nada Saadi\",\"Fahad Shamshad\",\"Nils Lukas\",\"Karthik Nandakumar\",\"Fahkri Karray\",\"Samuele Poppi\"]","published":"2025-11-24T14:46:20Z","proceeding":"cs.CR","tasks":"[\"cs.CR\",\"cs.AI\",\"cs.CV\",\"cs.LG\"]","methods":"[\"Diffusion Model\",\"LoRA\"]","has_code":false}
