{"ID":2855361,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.13998","arxiv_id":"2510.13998","title":"BitNet Distillation","abstract":"In this paper, we present BitNet Distillation (BitDistill), a lightweight pipeline that fine-tunes off-the-shelf full-precision LLMs (e.g., Qwen) into 1.58-bit precision (i.e., ternary weights {-1, 0, 1}) for specific downstream tasks, achieving strong task-specific performance with minimal computational cost. Specifically, BitDistill incorporates three key techniques: the SubLN module, as introduced in BitNet; multi-head attention distillation, based on MiniLM; and continual pre-training, which serves as a crucial warm-up step to mitigate the scalability issue of the performance gap between finetuned full-precision and 1.58-bit LLMs on specific tasks. Experimental results show that BitDistill achieves performance comparable to the full-precision counterpart models across model size, while enabling up to 10x memory savings and 2.65x faster inference on CPUs. Code is available at https://github.com/microsoft/BitNet.","short_abstract":"In this paper, we present BitNet Distillation (BitDistill), a lightweight pipeline that fine-tunes off-the-shelf full-precision LLMs (e.g., Qwen) into 1.58-bit precision (i.e., ternary weights {-1, 0, 1}) for specific downstream tasks, achieving strong task-specific performance with minimal computational cost. \nSpecific...","url_abs":"https://arxiv.org/abs/2510.13998","url_pdf":"https://arxiv.org/pdf/2510.13998v1","authors":"[\"Xun Wu\",\"Shaohan Huang\",\"Wenhui Wang\",\"Ting Song\",\"Li Dong\",\"Yan Xia\",\"Furu Wei\"]","published":"2025-10-15T18:28:12Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.CL\"]","methods":"[\"Large Language Model\"]","has_code":false,"code_links":[{"ID":608249,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2855361,"paper_url":"https://arxiv.org/abs/2510.13998","paper_title":"BitNet Distillation","repo_url":"https://github.com/microsoft/BitNet","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
