{"ID":2841086,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.12742","arxiv_id":"2511.12742","title":"Stabilizing Self-Consuming Diffusion Models with Latent Space Filtering","abstract":"As synthetic data proliferates across the Internet, it is often reused to train successive generations of generative models. This creates a ``self-consuming loop\" that can lead to training instability or \\textit{model collapse}. Common strategies to address the issue -- such as accumulating historical training data or injecting fresh real data -- either increase computational cost or require expensive human annotation. In this paper, we empirically analyze the latent space dynamics of self-consuming diffusion models and observe that the low-dimensional structure of latent representations extracted from synthetic data degrades over generations. Based on this insight, we propose \\textit{Latent Space Filtering} (LSF), a novel approach that mitigates model collapse by filtering out less realistic synthetic data from mixed datasets. Theoretically, we present a framework that connects latent space degradation to empirical observations. Experimentally, we show that LSF consistently outperforms existing baselines across multiple real-world datasets, effectively mitigating model collapse without increasing training cost or relying on human annotation.","short_abstract":"As synthetic data proliferates across the Internet, it is often reused to train successive generations of generative models. This creates a ``self-consuming loop\" that can lead to training instability or \\textit{model collapse}. 
Common strategies to address the issue -- such as accumulating historical training data or...","url_abs":"https://arxiv.org/abs/2511.12742","url_pdf":"https://arxiv.org/pdf/2511.12742v1","authors":"[\"Zhongteng Cai\",\"Yaxuan Wang\",\"Yang Liu\",\"Xueru Zhang\"]","published":"2025-11-16T19:17:00Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Diffusion Model\"]","has_code":false}
