{"ID":2886045,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.02974","arxiv_id":"2508.02974","title":"Real-time speech enhancement in noise for throat microphone using neural audio codec as foundation model","abstract":"We present a real-time speech enhancement demo using speech captured with a throat microphone. This demo aims to showcase the complete pipeline, from recording to deep learning-based post-processing, for speech captured in noisy environments with a body-conducted microphone. The throat microphone records skin vibrations, which naturally attenuate external noise, but this robustness comes at the cost of reduced audio bandwidth. To address this challenge, we fine-tune Kyutai's Mimi--a neural audio codec supporting real-time inference--on Vibravox, a dataset containing paired air-conducted and throat microphone recordings. We compare this enhancement strategy against state-of-the-art models and demonstrate its superior performance. The inference runs in an interactive interface that allows users to toggle enhancement, visualize spectrograms, and monitor processing latency.","short_abstract":"We present a real-time speech enhancement demo using speech captured with a throat microphone. This demo aims to showcase the complete pipeline, from recording to deep learning-based post-processing, for speech captured in noisy environments with a body-conducted microphone. The throat microphone records skin vibration...","url_abs":"https://arxiv.org/abs/2508.02974","url_pdf":"https://arxiv.org/pdf/2508.02974v1","authors":"[\"Julien Hauret\",\"Thomas Joubaud\",\"Éric Bavu\"]","published":"2025-08-05T00:46:12Z","proceeding":"eess.AS","tasks":"[\"eess.AS\"]","methods":"[]","has_code":false}
