{"ID":2898292,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.03343","arxiv_id":"2507.03343","title":"SHNU Multilingual Conversational Speech Recognition System for INTERSPEECH 2025 MLC-SLM Challenge","abstract":"This paper describes SHNU multilingual conversational speech recognition system (SHNU-mASR, team name-\"maybe\"), submitted to Track 1 of the INTERSPEECH 2025 MLC-SLM Challenge. Our system integrates a parallel-speech-encoder architecture with a large language model (LLM) to form a unified multilingual ASR framework. The parallel-speech-encoder consists of two pre-trained encoders, the Whisper-large-v3 encoder and mHuBERT-147 encoder. Their output embeddings are concatenated and fed into the LLM, enabling the model to leverage complementary acoustic and linguistic knowledge and achieve competitive performance. Moreover, we adopt a tri-stage training strategy to jointly update the low-rank adaptation modules and projector parameters of both the speech encoders and the LLM. In addition, we incorporate an additional language-aware prompt at the LLM input to enhance language-specific text generation. The SHNU-mASR system achieves an overall character/word error rate (CER/WER) of 11.76% on the blind evaluation set of the challenge, outperforming the official MLC-SLM baseline by 8.41 absolute CER/WER, without increasing the baseline training data.","short_abstract":"This paper describes SHNU multilingual conversational speech recognition system (SHNU-mASR, team name-\"maybe\"), submitted to Track 1 of the INTERSPEECH 2025 MLC-SLM Challenge. Our system integrates a parallel-speech-encoder architecture with a large language model (LLM) to form a unified multilingual ASR framework. The...","url_abs":"https://arxiv.org/abs/2507.03343","url_pdf":"https://arxiv.org/pdf/2507.03343v2","authors":"[\"Yuxiang Mei\",\"Yuang Zheng\",\"Dongxing Xu\",\"Yanhua Long\"]","published":"2025-07-04T07:10:33Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"eess.AS\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
