{"ID":2886618,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.10009","arxiv_id":"2508.10009","title":"Beyond Hard Sharing: Efficient Multi-Task Speech-to-Text Modeling with Supervised Mixture of Experts","abstract":"Hard-parameter sharing is a common strategy to train a single model jointly across diverse tasks. However, this often leads to task interference, impeding overall model performance. To address the issue, we propose a simple yet effective Supervised Mixture of Experts (S-MoE). Unlike traditional Mixture of Experts models, S-MoE eliminates the need for training gating functions by utilizing special guiding tokens to route each task to its designated expert. By assigning each task to a separate feedforward network, S-MoE overcomes the limitations of hard-parameter sharing. We further apply S-MoE to a speech-to-text model, enabling the model to process mixed-bandwidth input while jointly performing automatic speech recognition (ASR) and speech translation (ST). Experimental results demonstrate the effectiveness of the proposed S-MoE, achieving a 6.35% relative improvement in Word Error Rate (WER) when applied to both the encoder and decoder.","short_abstract":"Hard-parameter sharing is a common strategy to train a single model jointly across diverse tasks. However, this often leads to task interference, impeding overall model performance. To address the issue, we propose a simple yet effective Supervised Mixture of Experts (S-MoE). Unlike traditional Mixture of Experts model...","url_abs":"https://arxiv.org/abs/2508.10009","url_pdf":"https://arxiv.org/pdf/2508.10009v1","authors":"[\"Hojun Jin\",\"Eunsoo Hong\",\"Ziwon Hyung\",\"Sungjun Lim\",\"Seungjin Lee\",\"Keunseok Cho\"]","published":"2025-08-05T23:56:11Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\",\"cs.SD\",\"eess.AS\"]","methods":"[\"Mixture of Experts\"]","has_code":false}
