{"ID":2895146,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.09499","arxiv_id":"2507.09499","title":"The DKU System for Multi-Speaker Automatic Speech Recognition in MLC-SLM Challenge","abstract":"We present the DKU system for Task 2 of the MLC-SLM Challenge, which aims to perform multi-speaker automatic speech recognition directly from raw audio without oracle speaker labels or time boundaries. Our approach builds upon a diarization-aware framework that integrates speaker embeddings and temporal utterance boundaries into a Qwen2.5-based large language model (LLM). We then enhance the system's multilingual performance by fine-tuning language-specific adapters and LoRA modules within the LLM decoder. Finally, our system achieves a tcpWER of 23.56% and 18.08% on the development and test sets of the MLC-SLM dataset, respectively, substantially outperforming the official baseline.","short_abstract":"We present the DKU system for Task 2 of the MLC-SLM Challenge, which aims to perform multi-speaker automatic speech recognition directly from raw audio without oracle speaker labels or time boundaries. Our approach builds upon a diarization-aware framework integrating speaker embeddings and temporal utterance boundarie...","url_abs":"https://arxiv.org/abs/2507.09499","url_pdf":"https://arxiv.org/pdf/2507.09499v1","authors":"[\"Yuke Lin\",\"Ming Cheng\",\"Ze Li\",\"Ming Li\"]","published":"2025-07-13T05:30:39Z","proceeding":"eess.AS","tasks":"[\"eess.AS\",\"cs.SD\"]","methods":"[\"Large Language Model\",\"Language Model\",\"LoRA\"]","has_code":false}
