{"ID":2864583,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.23122","arxiv_id":"2509.23122","title":"Stochastic Interpolants via Conditional Dependent Coupling","abstract":"Existing image generation models face critical challenges regarding the trade-off between computation and fidelity. Specifically, models relying on a pretrained Variational Autoencoder (VAE) suffer from information loss, limited detail, and the inability to support end-to-end training. In contrast, models operating directly in the pixel space incur prohibitive computational cost. Although cascade models can mitigate computational cost, stage-wise separation prevents effective end-to-end optimization, hampers knowledge sharing, and often results in inaccurate distribution learning within each stage. To address these challenges, we introduce a unified multistage generative framework based on our proposed Conditional Dependent Coupling strategy. It decomposes the generative process into interpolant trajectories at multiple stages, ensuring accurate distribution learning while enabling end-to-end optimization. Importantly, the entire process is modeled as a single unified Diffusion Transformer, eliminating the need for disjoint modules and also enabling knowledge sharing. Extensive experiments demonstrate that our method achieves both high fidelity and efficiency across multiple resolutions.","short_abstract":"Existing image generation models face critical challenges regarding the trade-off between computation and fidelity. Specifically, models relying on a pretrained Variational Autoencoder (VAE) suffer from information loss, limited detail, and the inability to support end-to-end training. In contrast, models operating dir...","url_abs":"https://arxiv.org/abs/2509.23122","url_pdf":"https://arxiv.org/pdf/2509.23122v1","authors":"[\"Chenrui Ma\",\"Xi Xiao\",\"Tianyang Wang\",\"Xiao Wang\",\"Yanning Shen\"]","published":"2025-09-27T05:03:08Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Diffusion Model\",\"Transformer\",\"Variational Autoencoder\"]","has_code":false}
