{"ID":2923672,"CreatedAt":"2026-06-02T04:05:25.881865328Z","UpdatedAt":"2026-06-04T13:12:39.622923895Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.02237","arxiv_id":"2606.02237","title":"Why Are DMD Students Lazy? Understanding the Copying Behavior in Few-Step Distillation","abstract":"Distribution Matching Distillation (DMD) compresses pretrained diffusion models into efficient few-step generators by aligning their noised distributions across all scales. In principle, such distribution-level supervision remains agnostic to specific noise-data pairings of the teacher; this provides the student the freedom to remap latent noise, a behavior consistently observed in low-dimensional settings. Surprisingly, we find that in high-dimensional settings, distilled students spontaneously reproduce the original noise-data pairings of the teacher, a phenomenon we term copying. We demonstrate that copying is neither a byproduct of adversarial objectives nor a result of teacher memorization. Instead, our evidence suggests that copying is an emergent property arising from the limited geometric freedom of the student model during high-dimensional distillation.","short_abstract":"Distribution Matching Distillation (DMD) compresses pretrained diffusion models into efficient few-step generators by aligning their noised distributions across all scales. In principle, such distribution-level supervision remains agnostic to specific noise-data pairings of the teacher; this provides the student the fr...","url_abs":"https://arxiv.org/abs/2606.02237","url_pdf":"https://arxiv.org/pdf/2606.02237v1","authors":"[\"Shucheng Li\",\"Iolo Jones\",\"Alexander Tong\",\"Michael M. Bronstein\"]","published":"2026-06-01T13:30:19Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Diffusion Model\"]","has_code":false}