{"ID":3050390,"CreatedAt":"2026-06-04T02:13:16.786527022Z","UpdatedAt":"2026-06-05T07:50:16.0004273Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.04050","arxiv_id":"2606.04050","title":"LiftQuant: Continuous Bit-Width LLM via Dimensional Lifting and Projection","abstract":"Existing quantization methods are fundamentally limited by rigid, integer-based bit-widths (e.g., 2, 3-bit), resulting in a ``deployment gap\" where Large Language Models cannot be optimally fitted to specific memory budgets. To bridge this gap, we introduce LiftQuant, a novel framework that enables continuous bit-width control for true Pareto-optimal deployment. The core innovation is a ``lift-then-project\" mechanism which approximates low-dimensional weight vectors by projecting a simple 1-bit lattice from a higher-dimensional ``lifted\" space. Crucially, the effective bit-width is determined simply by the ratio of the lifted dimension to the original dimension, which allows the bit-width to be tuned quasi-continuously, as the dimension is a flexible structural parameter. This projection generates a structured yet non-uniform codebook, capturing the expressive power of Vector Quantization (VQ). As an advantage over VQ, LiftQuant's decoding path relies solely on linear transformations and 1-bit uniform quantizers, which keeps it hardware-friendly. This flexibility is transformative: LiftQuant enables a 70B LLM to be compressed to 2.4 bits to precisely fit a 24GB GPU, where its performance significantly surpasses state-of-the-art 2-bit models fitted on the same device. Our code and checkpoints are available at https://github.com/Heliulu/LiftQuant.","short_abstract":"Existing quantization methods are fundamentally limited by rigid, integer-based bit-widths (e.g., 2, 3-bit), resulting in a ``deployment gap\" where Large Language Models cannot be optimally fitted to specific memory budgets. 
To bridge this gap, we introduce LiftQuant, a novel framework that enables continuous bit-width...","url_abs":"https://arxiv.org/abs/2606.04050","url_pdf":"https://arxiv.org/pdf/2606.04050v1","authors":"[\"Liulu He\",\"XuanAng Liu\",\"Juntao Liu\",\"Taolue Feng\",\"Ting Lu\",\"Chunsheng Gan\",\"Zhiyv Peng\",\"Yuan Du\",\"Huanrui Yang\",\"Yijiang Liu\",\"Li Du\"]","published":"2026-06-02T08:52:04Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":612791,"CreatedAt":"2026-06-04T02:13:16.786527022Z","UpdatedAt":"2026-06-04T02:13:16.786527022Z","DeletedAt":null,"paper_id":3050390,"paper_url":"https://arxiv.org/abs/2606.04050","paper_title":"LiftQuant: Continuous Bit-Width LLM via Dimensional Lifting and Projection","repo_url":"https://github.com/Heliulu/LiftQuant","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}