{"ID":2884414,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.07014","arxiv_id":"2508.07014","title":"TurboBias: Universal ASR Context-Biasing powered by GPU-accelerated Phrase-Boosting Tree","abstract":"Recognizing specific key phrases is an essential task for contextualized Automatic Speech Recognition (ASR). However, most existing context-biasing approaches have limitations associated with the necessity of additional model training, significantly slow down the decoding process, or constrain the choice of the ASR system type. This paper proposes a universal ASR context-biasing framework that supports all major types: CTC, Transducers, and Attention Encoder-Decoder models. The framework is based on a GPU-accelerated word boosting tree, which enables it to be used in shallow fusion mode for greedy and beam search decoding without noticeable speed degradation, even with a vast number of key phrases (up to 20K items). The obtained results showed high efficiency of the proposed method, surpassing the considered open-source context-biasing approaches in accuracy and decoding speed. Our context-biasing framework is open-sourced as a part of the NeMo toolkit.","short_abstract":"Recognizing specific key phrases is an essential task for contextualized Automatic Speech Recognition (ASR). However, most existing context-biasing approaches have limitations associated with the necessity of additional model training, significantly slow down the decoding process, or constrain the choice of the ASR sys...","url_abs":"https://arxiv.org/abs/2508.07014","url_pdf":"https://arxiv.org/pdf/2508.07014v2","authors":"[\"Andrei Andrusenko\",\"Vladimir Bataev\",\"Lilit Grigoryan\",\"Vitaly Lavrukhin\",\"Boris Ginsburg\"]","published":"2025-08-09T15:27:07Z","proceeding":"eess.AS","tasks":"[\"eess.AS\",\"cs.AI\",\"cs.CL\",\"cs.SD\"]","methods":"[]","has_code":false}
