{"ID":2865634,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.22925","arxiv_id":"2509.22925","title":"Soft-Di[M]O: Improving One-Step Discrete Image Generation with Soft Embeddings","abstract":"One-step generators distilled from Masked Diffusion Models (MDMs) compress multiple sampling steps into a single forward pass, enabling efficient text and image synthesis. However, they suffer two key limitations: they inherit modeling bias from the teacher, and their discrete token outputs block gradient flow, preventing post-distillation refinements such as adversarial training, reward-based fine-tuning, and Test-Time Embedding Optimization (TTEO). In this work, we introduce soft embeddings, a simple relaxation that replaces discrete tokens with the expected embeddings under the generator's output distribution. Soft embeddings preserve representation fidelity for one-step discrete generator while providing a fully differentiable continuous surrogate that is compatible with teacher backbones and tokenizer decoders. Integrating soft embeddings into the Di[M]O distillation framework (denoted Soft-Di[M]O) makes one-step generators end-to-end trainable and enables straightforward application of GAN-based refinement, differentiable reward fine-tuning, and TTEO. Empirically, across multiple MDM teachers (e.g., MaskBit, MaskGen), Soft-Di[M]O achieves state-of-the-art one-step results: improved class-to-image performance, a one-step FID of 1.56 on ImageNet-256 with GAN-based refinement, along with higher GenEval and HPS scores on text-to-image with reward fine-tuning, and further gains from TTEO.","short_abstract":"One-step generators distilled from Masked Diffusion Models (MDMs) compress multiple sampling steps into a single forward pass, enabling efficient text and image synthesis. However, they suffer two key limitations: they inherit modeling bias from the teacher, and their discrete token outputs block gradient flow, prevent...","url_abs":"https://arxiv.org/abs/2509.22925","url_pdf":"https://arxiv.org/pdf/2509.22925v2","authors":"[\"Yuanzhi Zhu\",\"Xi Wang\",\"Stéphane Lathuilière\",\"Vicky Kalogeiton\"]","published":"2025-09-26T20:51:20Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\",\"cs.LG\"]","methods":"[\"Diffusion Model\",\"Generative Adversarial Network\"]","has_code":false}
