{"ID":2851127,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.20449","arxiv_id":"2510.20449","title":"LM-mixup: Text Data Augmentation via Language Model based Mixup","abstract":"Instruction tuning is crucial for aligning Large Language Models (LLMs), yet the quality of instruction-following data varies significantly. While high-quality data is paramount, it is often scarce; conversely, abundant low-quality data is frequently discarded, leading to substantial information loss. Existing data augmentation methods struggle to augment this low-quality data effectively, and the evaluation of such techniques remains poorly defined. To address this, we formally define the task of Instruction Distillation: distilling multiple low-quality and redundant inputs into high-quality and coherent instruction-output pairs. Specifically, we introduce a comprehensive data construction pipeline to create MIXTURE, a 144K-sample dataset pairing low-quality or semantically redundant imperfect instruction clusters with their high-quality distillations. We then introduce LM-Mixup, by first performing supervised fine-tuning on MIXTURE and then optimizing it with reinforcement learning. This process uses three complementary reward signals: quality, semantic alignment, and format compliance, via Group Relative Policy Optimization (GRPO). We demonstrate that LM-Mixup effectively augments imperfect datasets: fine-tuning LLMs on its distilled data, which accounts for only about 3% of the entire dataset, not only surpasses full-dataset training but also competes with state-of-the-art high-quality data selection methods across multiple benchmarks. Our work establishes that low-quality data is a valuable resource when properly distilled and augmented with LM-Mixup, significantly enhancing the efficiency and performance of instruction-tuned LLMs.","short_abstract":"Instruction tuning is crucial for aligning Large Language Models (LLMs), yet the quality of instruction-following data varies significantly. While high-quality data is paramount, it is often scarce; conversely, abundant low-quality data is frequently discarded, leading to substantial information loss. Existing data aug...","url_abs":"https://arxiv.org/abs/2510.20449","url_pdf":"https://arxiv.org/pdf/2510.20449v1","authors":"[\"Zhijie Deng\",\"Zhouan Shen\",\"Ling Li\",\"Yao Zhou\",\"Zhaowei Zhu\",\"Yanji He\",\"Wei Wang\",\"Jiaheng Wei\"]","published":"2025-10-23T11:33:35Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\",\"Language Model\"]","has_code":false}
