{"ID":3083700,"CreatedAt":"2026-06-05T06:46:15.197025399Z","UpdatedAt":"2026-06-07T09:00:11.459356253Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.06211","arxiv_id":"2606.06211","title":"FiLM-Based Speaker Conditioning of a SpeechLLM for Pathological Speech Recognition","abstract":"Automatic speech recognition (ASR) has advanced remarkably for standard speech; however, pathological speech from neurological conditions remains a significant challenge. We investigate speaker conditioning via Feature-wise Linear Modulation (FiLM), injecting x-vector-derived information into each transformer layer of a frozen ASR encoder to adapt internal representations to individual pathological speakers without modifying base model weights. We benchmark this for the ASR task against standard and parameter-efficient fine-tuning baselines, complemented by post-processing, on Spanish and English pathological speech. Additionally, we evaluate if the adapted model preserves the ability to answer speech-related questions. Results show that speaker-conditioned ASR is competitive with established adaptation strategies while retaining performance on non-conditioned speech.","short_abstract":"Automatic speech recognition (ASR) has advanced remarkably for standard speech; however, pathological speech from neurological conditions remains a significant challenge. We investigate speaker conditioning via Feature-wise Linear Modulation (FiLM), injecting x-vector-derived information into each transformer layer of...","url_abs":"https://arxiv.org/abs/2606.06211","url_pdf":"https://arxiv.org/pdf/2606.06211v1","authors":"[\"Fernando López\",\"Santosh Kesiraju\",\"Jordi Luque\"]","published":"2026-06-04T14:20:11Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.SD\",\"eess.AS\"]","methods":"[\"Transformer\",\"Large Language Model\"]","has_code":false}
