{"ID":2892939,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.13618","arxiv_id":"2507.13618","title":"Seed-X: Building Strong Multilingual Translation LLM with 7B Parameters","abstract":"Multilingual translation stands as a challenging task for large language models (LLMs) to handle intricate language patterns and stilted translations that arise in automated translations. In this paper, we introduce Seed-X, a family of open-source LLMs comprising instruct and reasoning models, pushing the limits of translation capability with 7B parameter size. The base model is pre-trained on a diverse, high-quality dataset encompassing both monolingual and bilingual content across 28 languages, harnessing the full potential of multilingual data. The instruct model is then finetuned to translate by Chain-of-Thought (CoT) reasoning and further enhanced through reinforcement learning (RL) to achieve better generalization across diverse language pairs. Seed-X achieves performance comparable to leading closed-source models, including Gemini-2.5 and GPT-4o, across 28 languages, and significantly outperforms larger open-source models in both automatic metrics and human evaluations. We share the best practices through our optimization process, and make the parameter public available for advancing translation research and applications.","short_abstract":"Multilingual translation stands as a challenging task for large language models (LLMs) to handle intricate language patterns and stilted translations that arise in automated translations. In this paper, we introduce Seed-X, a family of open-source LLMs comprising instruct and reasoning models, pushing the limits of tra...","url_abs":"https://arxiv.org/abs/2507.13618","url_pdf":"https://arxiv.org/pdf/2507.13618v4","authors":"[\"Shanbo Cheng\",\"Yu Bao\",\"Qian Cao\",\"Luyang Huang\",\"Liyan Kang\",\"Zhicheng Liu\",\"Yu Lu\",\"Wenhao Zhu\",\"Jingwen Chen\",\"Zhichao Huang\",\"Tao Li\",\"Yifu Li\",\"Huiying Lin\",\"Sitong Liu\",\"Ningxin Peng\",\"Shuaijie She\",\"Lu Xu\",\"Nuo Xu\",\"Sen Yang\",\"Runsheng Yu\",\"Yiming Yu\",\"Liehao Zou\",\"Hang Li\",\"Lu Lu\",\"Yuxuan Wang\",\"Yonghui Wu\"]","published":"2025-07-18T03:19:43Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\",\"Language Model\"]","has_code":false}
