{"ID":2921885,"CreatedAt":"2026-06-02T02:42:49.606572591Z","UpdatedAt":"2026-06-03T22:46:55.310989306Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.01483","arxiv_id":"2606.01483","title":"MURMUR: An Efficient Inference System for Long-Form ASR","abstract":"Long-form automatic speech recognition (ASR) requires both high accuracy and low latency, but existing systems force a trade-off between the two. Chunk-based pipelines process audio in parallel windows for low latency, but lose cross-chunk context and need brittle heuristics to align speakers and timestamps at boundaries. Long-context ASR models resolve everything in a single pass for better accuracy, but are an order of magnitude slower. We propose Murmur, an inference system that overcomes this trade-off by operating at two levels. At the inter-chunk level, we revisit the chunk-based pipeline for modern long-context ASR, treating chunk size as a tunable hyperparameter, and show that intermediate chunk sizes strike a good balance of accuracy and latency. At the intra-chunk level, we exploit attention sparsity through a sliding window KV cache eviction policy applied to both output and speech tokens. On AMI-IHM, Murmur matches single-pass accuracy while reducing latency by 4.2x, with further gains from token eviction at less than 1% relative tcpWER degradation. The code of Murmur is available at https://github.com/uw-syfi/Murmur.","short_abstract":"Long-form automatic speech recognition (ASR) requires both high accuracy and low latency, but existing systems force a trade-off between the two. Chunk-based pipelines process audio in parallel windows for low latency, but lose cross-chunk context and need brittle heuristics to align speakers and timestamps at boundari...","url_abs":"https://arxiv.org/abs/2606.01483","url_pdf":"https://arxiv.org/pdf/2606.01483v1","authors":"[\"Wei-Tzu Lee\",\"Keisuke Kamahori\",\"Baris Kasikci\"]","published":"2026-05-31T22:54:57Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"eess.AS\"]","methods":"[]","has_code":false,"code_links":[{"ID":612615,"CreatedAt":"2026-06-02T02:42:49.606572591Z","UpdatedAt":"2026-06-02T02:42:49.606572591Z","DeletedAt":null,"paper_id":2921885,"paper_url":"https://arxiv.org/abs/2606.01483","paper_title":"MURMUR: An Efficient Inference System for Long-Form ASR","repo_url":"https://github.com/uw-syfi/Murmur","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}