{"ID":2898584,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.02302","arxiv_id":"2507.02302","title":"DoMIX: An Efficient Framework for Exploiting Domain Knowledge in Fine-Tuning","abstract":"Domain-Adaptive Pre-training (DAP) has recently gained attention for its effectiveness in fine-tuning pre-trained models. Building on this, continual DAP has been explored to develop pre-trained models capable of incrementally incorporating different domain datasets. However, existing continual DAP methods face several limitations: (1) high computational cost and GPU memory usage during training; (2) sensitivity to incremental data order; and (3) providing a single, generalized model for all end tasks, which contradicts the essence of DAP. In this paper, we propose DoMIX, a novel approach that addresses these challenges by leveraging LoRA modules, a representative parameter-efficient fine-tuning (PEFT) method. Our approach enables efficient and parallel domain-adaptive pre-training that is robust to domain order and effectively utilizes accumulated knowledge to provide tailored pre-trained models for specific tasks. We also demonstrate that our method can be extended beyond the DAP setting to standard LLM fine-tuning scenarios. Code is available at https://github.com/dohoonkim-ai/DoMIX.","short_abstract":"Domain-Adaptive Pre-training (DAP) has recently gained attention for its effectiveness in fine-tuning pre-trained models. Building on this, continual DAP has been explored to develop pre-trained models capable of incrementally incorporating different domain datasets. However, existing continual DAP methods face several...","url_abs":"https://arxiv.org/abs/2507.02302","url_pdf":"https://arxiv.org/pdf/2507.02302v1","authors":"[\"Dohoon Kim\",\"Donghun Kang\",\"Taesup Moon\"]","published":"2025-07-03T04:13:01Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\",\"cs.CV\",\"cs.LG\"]","methods":"[\"Large Language Model\",\"LoRA\"]","has_code":false,"code_links":[{"ID":612416,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2898584,"paper_url":"https://arxiv.org/abs/2507.02302","paper_title":"DoMIX: An Efficient Framework for Exploiting Domain Knowledge in Fine-Tuning","repo_url":"https://github.com/dohoonkim-ai/DoMIX","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
