{"ID":2884136,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.07315","arxiv_id":"2508.07315","title":"FlexCTC: GPU-powered CTC Beam Decoding With Advanced Contextual Abilities","abstract":"While beam search improves speech recognition quality over greedy decoding, standard implementations are slow, often sequential, and CPU-bound. To fully leverage modern hardware capabilities, we present a novel open-source FlexCTC toolkit for fully GPU-based beam decoding, designed for Connectionist Temporal Classification (CTC) models. Developed entirely in Python and PyTorch, it offers a fast, user-friendly, and extensible alternative to traditional C++, CUDA, or WFST-based decoders. The toolkit features a high-performance, fully batched GPU implementation with eliminated CPU-GPU synchronization and minimized kernel launch overhead via CUDA Graphs. It also supports advanced contextualization techniques, including GPU-powered N-gram language model fusion and phrase-level boosting. These features enable accurate and efficient decoding, making them suitable for both research and production use.","short_abstract":"While beam search improves speech recognition quality over greedy decoding, standard implementations are slow, often sequential, and CPU-bound. To fully leverage modern hardware capabilities, we present a novel open-source FlexCTC toolkit for fully GPU-based beam decoding, designed for Connectionist Temporal Classifica...","url_abs":"https://arxiv.org/abs/2508.07315","url_pdf":"https://arxiv.org/pdf/2508.07315v2","authors":"[\"Lilit Grigoryan\",\"Vladimir Bataev\",\"Nikolay Karpov\",\"Andrei Andrusenko\",\"Vitaly Lavrukhin\",\"Boris Ginsburg\"]","published":"2025-08-10T12:15:57Z","proceeding":"eess.AS","tasks":"[\"eess.AS\",\"cs.AI\",\"cs.CL\",\"cs.LG\",\"cs.SD\"]","methods":"[\"Language Model\"]","has_code":false}