{"ID":2845750,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.03361","arxiv_id":"2511.03361","title":"Open Source State-Of-the-Art Solution for Romanian Speech Recognition","abstract":"In this work, we present a new state-of-the-art Romanian Automatic Speech Recognition (ASR) system based on NVIDIA's FastConformer architecture--explored here for the first time in the context of Romanian. We train our model on a large corpus of, mostly, weakly supervised transcriptions, totaling over 2,600 hours of speech. Leveraging a hybrid decoder with both Connectionist Temporal Classification (CTC) and Token-Duration Transducer (TDT) branches, we evaluate a range of decoding strategies including greedy, ALSD, and CTC beam search with a 6-gram token-level language model. Our system achieves state-of-the-art performance across all Romanian evaluation benchmarks, including read, spontaneous, and domain-specific speech, with up to 27% relative WER reduction compared to previous best-performing systems. In addition to improved transcription accuracy, our approach demonstrates practical decoding efficiency, making it suitable for both research and deployment in low-latency ASR applications.","short_abstract":"In this work, we present a new state-of-the-art Romanian Automatic Speech Recognition (ASR) system based on NVIDIA's FastConformer architecture--explored here for the first time in the context of Romanian. We train our model on a large corpus of, mostly, weakly supervised transcriptions, totaling over 2,600 hours of sp...","url_abs":"https://arxiv.org/abs/2511.03361","url_pdf":"https://arxiv.org/pdf/2511.03361v1","authors":"[\"Gabriel Pirlogeanu\",\"Alexandru-Lucian Georgescu\",\"Horia Cucu\"]","published":"2025-11-05T11:02:16Z","proceeding":"eess.AS","tasks":"[\"eess.AS\",\"cs.AI\"]","methods":"[\"Language Model\"]","has_code":false}
