{"ID":2881681,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.12061","arxiv_id":"2508.12061","title":"VARAN: Variational Inference for Self-Supervised Speech Models Fine-Tuning on Downstream Tasks","abstract":"Conventional methods for aggregating layers in fine-tuned self-supervised speech models, such as using the final layer or weighted sum, suffer from information bottlenecks and static feature weighting for all dataset examples. We propose VARAN, a framework that dynamically tailors layer aggregation to individual inputs. By employing layer-specialized probing heads and data-dependent weighting, VARAN adaptively prioritizes each layer's features based on the input. Evaluations on automatic speech recognition and speech emotion recognition tasks demonstrate VARAN's superior performance, particularly when using the LoRA fine-tuning technique. The framework resolves the trade-off between preserving layer-specific information and enabling flexible feature utilization, advancing efficient adaptation of self-supervised speech representations.","short_abstract":"Conventional methods for aggregating layers in fine-tuned self-supervised speech models, such as using the final layer or weighted sum, suffer from information bottlenecks and static feature weighting for all dataset examples. We propose VARAN, a framework that dynamically tailors layer aggregation to individual inputs...","url_abs":"https://arxiv.org/abs/2508.12061","url_pdf":"https://arxiv.org/pdf/2508.12061v1","authors":"[\"Daria Diatlova\",\"Nikita Balagansky\",\"Alexander Varlamov\",\"Egor Spirin\"]","published":"2025-08-16T14:26:59Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"LoRA\"]","has_code":false}
