{"ID":2851379,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.20984","arxiv_id":"2510.20984","title":"Learning Grouped Lattice Vector Quantizers for Low-Bit LLM Compression","abstract":"Large Language Models (LLMs) have demonstrated remarkable capabilities but typically require extensive computational resources and memory for inference. Post-training quantization (PTQ) can effectively reduce these demands by storing weights in lower bit-width formats. However, standard uniform quantization often leads to notable performance degradation, particularly in low-bit scenarios. In this work, we introduce a Grouped Lattice Vector Quantization (GLVQ) framework that assigns each group of weights a customized lattice codebook, defined by a learnable generation matrix. To address the non-differentiability of the quantization process, we adopt Babai rounding to approximate nearest-lattice-point search during training, which enables stable optimization of the generation matrices. Once trained, decoding reduces to a simple matrix-vector multiplication, yielding an efficient and practical quantization pipeline. Experiments on multiple benchmarks show that our approach achieves a better trade-off between model size and accuracy compared to existing post-training quantization baselines, highlighting its effectiveness in deploying large models under stringent resource constraints. Our source code is available on GitHub repository: https://github.com/xzhang9308/GLVQ.","short_abstract":"Large Language Models (LLMs) have demonstrated remarkable capabilities but typically require extensive computational resources and memory for inference. Post-training quantization (PTQ) can effectively reduce these demands by storing weights in lower bit-width formats. However, standard uniform quantization often leads...","url_abs":"https://arxiv.org/abs/2510.20984","url_pdf":"https://arxiv.org/pdf/2510.20984v2","authors":"[\"Xi Zhang\",\"Xiaolin Wu\",\"Jiamang Wang\",\"Weisi Lin\"]","published":"2025-10-23T20:19:48Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":607896,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2851379,"paper_url":"https://arxiv.org/abs/2510.20984","paper_title":"Learning Grouped Lattice Vector Quantizers for Low-Bit LLM Compression","repo_url":"https://github.com/xzhang9308/GLVQ","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
