{"ID":2842786,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.09232","arxiv_id":"2511.09232","title":"POTSA: A Cross-Lingual Speech Alignment Framework for Speech-to-Text Translation","abstract":"Speech Large Language Models have achieved breakthroughs in multilingual speech-to-text translation. However, existing approaches often overlook semantic commonalities across source languages, leading to biased translation performance. In this work, we propose POTSA (Parallel Optimal Transport for Speech Alignment), a new framework based on cross-lingual parallel speech pairs and Optimal Transport, designed to bridge high- and low-resource translation gaps. First, we introduce a Bias Compensation module to coarsely align initial speech representations. Second, we impose token-level OT constraints on a Q-Former using parallel pairs to establish fine-grained representation consistency. Then, we apply a layer scheduling strategy to focus OT constraints on semantically beneficial layers. Experiments on FLEURS show our method achieves SOTA performance, with +1.29 BLEU over five common languages and +2.93 BLEU on zero-shot languages, using only 10 hours of parallel speech per language.","short_abstract":"Speech Large Language Models have achieved breakthroughs in multilingual speech-to-text translation. However, existing approaches often overlook semantic commonalities across source languages, leading to biased translation performance. In this work, we propose POTSA (Parallel Optimal Transport for Speech Alignment), a...","url_abs":"https://arxiv.org/abs/2511.09232","url_pdf":"https://arxiv.org/pdf/2511.09232v3","authors":"[\"Xuanchen Li\",\"Chenrui Cui\",\"Tianrui Wang\",\"Meng Ge\",\"Zikang Huang\",\"Yizhou Peng\",\"Jin Li\",\"Yuheng Lu\",\"Yu Jiang\",\"Nyima Tashi\",\"Longbiao Wang\",\"Jianwu Dang\"]","published":"2025-11-12T11:47:56Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.SD\"]","methods":"[\"Language Model\"]","has_code":false}
