{"ID":2865924,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.21003","arxiv_id":"2509.21003","title":"Query-Based Asymmetric Modeling with Decoupled Input-Output Rates for Speech Restoration","abstract":"Speech restoration in real-world conditions is challenging due to compounded distortions and mismatches between input and desired output rates. Most existing systems assume a fixed and shared input-output rate, relying on external resampling that incurs redundant computation and limits generality. We address this setting by formulating speech restoration under decoupled input-output rates, and propose TF-Restormer, a query-based asymmetric modeling framework. The encoder concentrates analysis on the observed input bandwidth using a time-frequency dual-path architecture, while a lightweight decoder reconstructs missing spectral content via frequency extension queries. This design enables a single model to operate consistently across arbitrary input-output rate pairs without redundant resampling. Experiments across diverse sampling rates, degradations, and operating modes show that TF-Restormer maintains stable restoration behavior and balanced perceptual quality, including in real-time streaming scenarios. Code and demos are available at https://tf-restormer.github.io/demo.","short_abstract":"Speech restoration in real-world conditions is challenging due to compounded distortions and mismatches between input and desired output rates. Most existing systems assume a fixed and shared input-output rate, relying on external resampling that incurs redundant computation and limits generality. We address this setti...","url_abs":"https://arxiv.org/abs/2509.21003","url_pdf":"https://arxiv.org/pdf/2509.21003v3","authors":"[\"Ui-Hyeop Shin\",\"Jaehyun Ko\",\"Woocheol Jeong\",\"Hyung-Min Park\"]","published":"2025-09-25T10:57:13Z","proceeding":"eess.AS","tasks":"[\"eess.AS\"]","methods":"[]","has_code":false}
