{"ID":2841326,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.12207","arxiv_id":"2511.12207","title":"Mixture of States: Routing Token-Level Dynamics for Multimodal Generation","abstract":"We introduce MoS (Mixture of States), a novel fusion paradigm for multimodal diffusion models that merges modalities using flexible, state-based interactions. The core of MoS is a learnable, token-wise router that creates denoising timestep- and input-dependent interactions between modalities' hidden states, precisely aligning token-level features with the diffusion trajectory. This router sparsely selects the top-$k$ hidden states and is trained with an $ε$-greedy strategy, efficiently selecting contextual features with minimal learnable parameters and negligible computational overhead. We validate our design with text-to-image generation (MoS-Image) and editing (MoS-Editing), which achieve state-of-the-art results. With only 3B to 5B parameters, our models match or surpass counterparts up to $4\\times$ larger. These findings establish MoS as a flexible and compute-efficient paradigm for scaling multimodal diffusion models.","short_abstract":"We introduce MoS (Mixture of States), a novel fusion paradigm for multimodal diffusion models that merges modalities using flexible, state-based interactions. The core of MoS is a learnable, token-wise router that creates denoising timestep- and input-dependent interactions between modalities' hidden states, precisely...","url_abs":"https://arxiv.org/abs/2511.12207","url_pdf":"https://arxiv.org/pdf/2511.12207v2","authors":"[\"Haozhe Liu\",\"Ding Liu\",\"Mingchen Zhuge\",\"Zijian Zhou\",\"Tian Xie\",\"Sen He\",\"Yukang Yang\",\"Shuming Liu\",\"Yuren Cong\",\"Jiadong Guo\",\"Hongyu Xu\",\"Ke Xu\",\"Kam-Woh Ng\",\"Juan C. Pérez\",\"Juan-Manuel Pérez-Rúa\",\"Tao Xiang\",\"Wei Liu\",\"Shikun Liu\",\"Jürgen Schmidhuber\"]","published":"2025-11-15T13:24:57Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Diffusion Model\"]","has_code":false}
