{"ID":3083573,"CreatedAt":"2026-06-05T06:46:15.197025399Z","UpdatedAt":"2026-06-07T10:04:37.499725329Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.06428","arxiv_id":"2606.06428","title":"Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation","abstract":"Prior work has shown that large language models (LLMs) can translate unseen or low-resource languages by undergoing continued training or even by encoding a grammar book in their context. However, both methods typically overfit specific languages, with limited zero-shot transfer at test time. To translate extremely low-resource languages at scale, we argue that LLMs must acquire the meta-skill of utilizing in-context linguistic knowledge rather than memorizing specific languages. In this paper, we propose a reinforcement learning (RL) approach to unseen language translation given rich linguistic context, using a surface-level translation metric (chrF) as the reward. Empirically, despite the lightweight reward, our RL-trained models effectively extract and apply relevant linguistic information from the provided context, leading to better translations on completely unseen languages than in-context learning or supervised fine-tuning. Our analyses suggest that outcome-based RL can extend beyond conventional reasoning tasks like math and coding to serve as a recipe for language learning from context.","short_abstract":"Prior work has shown that large language models (LLMs) can translate unseen or low-resource languages by undergoing continued training or even by encoding a grammar book in their context. However, both methods typically overfit specific languages, with limited zero-shot transfer at test time. To translate extremely low...","url_abs":"https://arxiv.org/abs/2606.06428","url_pdf":"https://arxiv.org/pdf/2606.06428v1","authors":"[\"Hanxu Hu\",\"Zdeněk Šnajdr\",\"Pinzhen Chen\",\"Jannis Vamvas\",\"Rico Sennrich\"]","published":"2026-06-04T17:32:06Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\",\"Language Model\"]","has_code":false}
