{"ID":2845179,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.04063","arxiv_id":"2511.04063","title":"DartQuant: Efficient Rotational Distribution Calibration for LLM Quantization","abstract":"Quantization plays a crucial role in accelerating the inference of large-scale models, and rotational matrices have been shown to effectively improve quantization performance by smoothing outliers. However, end-to-end fine-tuning of rotational optimization algorithms incurs high computational costs and is prone to overfitting. To address this challenge, we propose an efficient distribution-aware rotational calibration method, DartQuant, which reduces the complexity of rotational optimization by constraining the distribution of the activations after rotation. This approach also effectively reduces reliance on task-specific losses, thereby mitigating the risk of overfitting. Additionally, we introduce the QR-Orth optimization scheme, which replaces expensive alternating optimization with a more efficient solution. In a variety of model quantization experiments, DartQuant demonstrates superior performance. Compared to existing methods, it achieves 47$\\times$ acceleration and 10$\\times$ memory savings for rotational optimization on a 70B model. Furthermore, it is the first to successfully complete rotational calibration for a 70B model on a single 3090 GPU, making quantization of large language models feasible in resource-constrained environments. Code is available at https://github.com/CAS-CLab/DartQuant.git.","short_abstract":"Quantization plays a crucial role in accelerating the inference of large-scale models, and rotational matrices have been shown to effectively improve quantization performance by smoothing outliers. However, end-to-end fine-tuning of rotational optimization algorithms incurs high computational costs and is prone to over...","url_abs":"https://arxiv.org/abs/2511.04063","url_pdf":"https://arxiv.org/pdf/2511.04063v1","authors":"[\"Yuantian Shao\",\"Yuanteng Chen\",\"Peisong Wang\",\"Jianlin Yu\",\"Jing Lin\",\"Yiwu Yao\",\"Zhihui Wei\",\"Jian Cheng\"]","published":"2025-11-06T05:05:24Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.CL\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":607348,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2845179,"paper_url":"https://arxiv.org/abs/2511.04063","paper_title":"DartQuant: Efficient Rotational Distribution Calibration for LLM Quantization","repo_url":"https://github.com/CAS-CLab/DartQuant.git","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
