{"ID":2892927,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.13598","arxiv_id":"2507.13598","title":"GIFT: Gradient-aware Immunization of diffusion models against malicious Fine-Tuning with safe concepts retention","abstract":"We present GIFT: a Gradient-aware Immunization technique to defend diffusion models against malicious Fine-Tuning while preserving their ability to generate safe content. Existing safety mechanisms like safety checkers are easily bypassed, and concept erasure methods fail under adversarial fine-tuning. GIFT addresses this by framing immunization as a bi-level optimization problem: the upper-level objective degrades the model's ability to represent harmful concepts using representation noising and maximization, while the lower-level objective preserves performance on safe data. GIFT achieves robust resistance to malicious fine-tuning while maintaining safe generative quality. Experimental results show that our method significantly impairs the model's ability to re-learn harmful concepts while maintaining performance on safe content, offering a promising direction for creating inherently safer generative models resistant to adversarial fine-tuning attacks.","short_abstract":"We present GIFT: a Gradient-aware Immunization technique to defend diffusion models against malicious Fine-Tuning while preserving their ability to generate safe content. Existing safety mechanisms like safety checkers are easily bypassed, and concept erasure methods fail under adversarial fine-tuning. GIFT add...","url_abs":"https://arxiv.org/abs/2507.13598","url_pdf":"https://arxiv.org/pdf/2507.13598v1","authors":"[\"Amro Abdalla\",\"Ismail Shaheen\",\"Dan DeGenaro\",\"Rupayan Mallick\",\"Bogdan Raita\",\"Sarah Adel Bargal\"]","published":"2025-07-18T01:47:07Z","proceeding":"cs.CR","tasks":"[\"cs.CR\",\"cs.AI\",\"cs.CV\",\"cs.LG\"]","methods":"[\"Diffusion Model\"]","has_code":false}
