{"ID":2846171,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.02454","arxiv_id":"2511.02454","title":"Improving DF-Conformer Using Hydra For High-Fidelity Generative Speech Enhancement on Discrete Codec Token","abstract":"The Dilated FAVOR Conformer (DF-Conformer) is an efficient variant of the Conformer architecture designed for speech enhancement (SE). It employs fast attention through positive orthogonal random features (FAVOR+) to mitigate the quadratic complexity associated with self-attention, while utilizing dilated convolution to expand the receptive field. This combination results in impressive performance across various SE models. In this paper, we propose replacing FAVOR+ with bidirectional selective structured state-space sequence models to achieve two main objectives:(1) enhancing global sequential modeling by eliminating the approximations inherent in FAVOR+, and (2) maintaining linear complexity relative to the sequence length. Specifically, we utilize Hydra, a bidirectional extension of Mamba, framed within the structured matrix mixer framework. Experiments conducted using a generative SE model on discrete codec tokens, known as Genhancer, demonstrate that the proposed method surpasses the performance of the DF-Conformer.","short_abstract":"The Dilated FAVOR Conformer (DF-Conformer) is an efficient variant of the Conformer architecture designed for speech enhancement (SE). It employs fast attention through positive orthogonal random features (FAVOR+) to mitigate the quadratic complexity associated with self-attention, while utilizing dilated convolution t...","url_abs":"https://arxiv.org/abs/2511.02454","url_pdf":"https://arxiv.org/pdf/2511.02454v1","authors":"[\"Shogo Seki\",\"Shaoxiang Dang\",\"Li Li\"]","published":"2025-11-04T10:32:49Z","proceeding":"cs.SD","tasks":"[\"cs.SD\"]","methods":"[]","has_code":false}
