{"ID":2825510,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.21204","arxiv_id":"2512.21204","title":"SpidR-Adapt: A Universal Speech Representation Model for Few-Shot Adaptation","abstract":"Human infants, with only a few hundred hours of speech exposure, acquire basic units of new languages, highlighting a striking efficiency gap compared to the data-hungry self-supervised speech models. To address this gap, this paper introduces SpidR-Adapt for rapid adaptation of speech units to new languages using minimal unlabeled data. We cast such low-resource speech representation learning as a meta-learning problem and construct a multi-task adaptive pre-training (MAdaPT) protocol which formulates the adaptation process as a bi-level optimization framework. To enable scalable meta-training under this framework, we propose a novel heuristic solution, first-order bi-level optimization (FOBLO), avoiding heavy computation costs. Finally, we stabilize meta-training by using a robust initialization through interleaved supervision which alternates self-supervised and supervised objectives. Empirically, SpidR-Adapt achieves rapid gains in phonemic discriminability (ABX) and downstream spoken language modeling scores (sWUGGY, sBLIMP, tSC), surpassing in-domain toplines after training on less than 1h of target-language audio and delivering $100\\times$ greater data efficiency than standard multi-task training. These findings highlight a practical, architecture-agnostic path toward biologically inspired, data-efficient representations. We open-source the training code and model checkpoints at https://github.com/facebookresearch/spidr-adapt.","short_abstract":"Human infants, with only a few hundred hours of speech exposure, acquire basic units of new languages, highlighting a striking efficiency gap compared to the data-hungry self-supervised speech models. \nTo address this gap, this paper introduces SpidR-Adapt for rapid adaptation of speech units to new languages using mini...","url_abs":"https://arxiv.org/abs/2512.21204","url_pdf":"https://arxiv.org/pdf/2512.21204v2","authors":"[\"Mahi Luthra\",\"Jiayi Shen\",\"Maxime Poli\",\"Angelo Ortiz\",\"Yosuke Higuchi\",\"Youssef Benchekroun\",\"Martin Gleize\",\"Charles-Eric Saint-James\",\"Dongyan Lin\",\"Phillip Rust\",\"Angel Villar\",\"Surya Parimi\",\"Vanessa Stark\",\"Rashel Moritz\",\"Juan Pino\",\"Yann LeCun\",\"Emmanuel Dupoux\"]","published":"2025-12-24T14:33:16Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Language Model\"]","has_code":false,"code_links":[{"ID":605670,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2825510,"paper_url":"https://arxiv.org/abs/2512.21204","paper_title":"SpidR-Adapt: A Universal Speech Representation Model for Few-Shot Adaptation","repo_url":"https://github.com/facebookresearch/spidr-adapt","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
