{"ID":2890990,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.18553","arxiv_id":"2507.18553","title":"The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm","abstract":"Quantizing the weights of large language models (LLMs) from 16-bit to lower bitwidth is the de facto approach to deploy massive transformers onto more affordable accelerators. While GPTQ emerged as one of the standard methods for one-shot post-training quantization at LLM scale, its inner workings are described as a sequence of algebraic updates that obscure geometric meaning or worst-case guarantees. In this work, we show that, when executed back-to-front (from the last to first dimension) for a linear layer, GPTQ is mathematically identical to Babai's nearest plane algorithm for the classical closest vector problem (CVP) on a lattice defined by the Hessian matrix of the layer's inputs. This equivalence is based on a sophisticated mathematical argument, and has two analytical consequences: first, the GPTQ error propagation step gains an intuitive geometric interpretation; second, GPTQ inherits the error upper bound of Babai's algorithm under the assumption that no weights are clipped. Leveraging this bound, we design post-training quantization methods that avoid clipping, and outperform the original GPTQ. In addition, we provide efficient GPU inference kernels for the resulting representation. Taken together, these results place GPTQ on a firm theoretical footing and open the door to importing decades of progress in lattice algorithms towards the design of future quantization algorithms for billion-parameter models. Source code is available at https://github.com/IST-DASLab/GPTQ-Babai.","short_abstract":"Quantizing the weights of large language models (LLMs) from 16-bit to lower bitwidth is the de facto approach to deploy massive transformers onto more affordable accelerators. While GPTQ emerged as one of the standard methods for one-shot post-training quantization at LLM scale, its inner workings are described as a se...","url_abs":"https://arxiv.org/abs/2507.18553","url_pdf":"https://arxiv.org/pdf/2507.18553v4","authors":"[\"Jiale Chen\",\"Yalda Shabanzadeh\",\"Elvir Crnčević\",\"Torsten Hoefler\",\"Dan Alistarh\"]","published":"2025-07-24T16:22:18Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.DS\",\"cs.IT\"]","methods":"[\"Transformer\",\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":611836,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2890990,"paper_url":"https://arxiv.org/abs/2507.18553","paper_title":"The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm","repo_url":"https://github.com/IST-DASLab/GPTQ-Babai","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}