{"ID":2867090,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.21379","arxiv_id":"2509.21379","title":"SAEmnesia: Erasing Concepts in Diffusion Models with Supervised Sparse Autoencoders","abstract":"Concept unlearning in diffusion models is hampered by feature splitting, where concepts are distributed across many latent features, making their removal challenging and computationally expensive. We introduce SAEmnesia, a supervised sparse autoencoder framework that overcomes this by enforcing one-to-one concept-neuron mappings. By systematically labeling concepts during training, our method achieves feature centralization, binding each concept to a single, interpretable neuron. This enables highly targeted and efficient concept erasure. Compared to the state-of-the-art sparse autoencoder-based unlearning approach, SAEmnesia reduces hyperparameter search by 96.67% and achieves a 9.22% improvement on the UnlearnCanvas benchmark for objects. Our method also shows superior scalability in sequential unlearning, improving accuracy by 28.4% when removing nine objects, establishing a step forward for precise and controllable concept erasure. Moreover, SAEmnesia effectively suppresses nudity on the I2P benchmark and remains robust to adversarial attacks. Source code available at https://github.com/EIDOSLAB/SAEmnesia.","short_abstract":"Concept unlearning in diffusion models is hampered by feature splitting, where concepts are distributed across many latent features, making their removal challenging and computationally expensive. We introduce SAEmnesia, a supervised sparse autoencoder framework that overcomes this by enforcing one-to-one concept-neuro...","url_abs":"https://arxiv.org/abs/2509.21379","url_pdf":"https://arxiv.org/pdf/2509.21379v3","authors":"[\"Enrico Cassano\",\"Riccardo Renzulli\",\"Marco Nurisso\",\"Mirko Zaffaroni\",\"Alan Perotti\",\"Marco Grangetto\"]","published":"2025-09-23T11:29:30Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\"]","methods":"[\"Diffusion Model\"]","has_code":false,"code_links":[{"ID":609438,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2867090,"paper_url":"https://arxiv.org/abs/2509.21379","paper_title":"SAEmnesia: Erasing Concepts in Diffusion Models with Supervised Sparse Autoencoders","repo_url":"https://github.com/EIDOSLAB/SAEmnesia","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
