{"ID":2886285,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.03332","arxiv_id":"2508.03332","title":"Exploring Layer-wise Information Effectiveness for Post-Training Quantization in Small Language Models","abstract":"Large language models with billions of parameters are often over-provisioned: many layers contribute little unique information yet dominate the memory and energy footprint during inference. We present LieQ (Layer-wise information effectiveness Quantization), a hardware-native, metric-driven post-training quantization framework that addresses the critical challenge of maintaining accuracy in sub-8B models (those with fewer than 8B parameters) under extreme low-bit compression. LieQ keeps uniform bit-width within each layer while mixing precision across layers, preserving standard multiplication kernels and avoiding irregular memory access, codebooks, or irregular formats at inference time. Our method uncovers a strong correlation between layer-wise functional saliency and representational compactness, revealing that layers with higher training-induced energy concentration are functionally irreplaceable. Leveraging this insight, we propose a purely geometry-driven sensitivity proxy that enables automatic bit-width allocation under a target average-bit budget without expensive gradient updates or inference-based perplexity probing. At sub-2-bit precision, LieQ consistently reduces the large accuracy gap typically observed for naive 2-bit baselines on Qwen3 and LLaMA3.x families, while retaining standard-kernel efficiency. These properties make LieQ a practical path toward deploying small language models on resource-constrained edge devices. 
Code will be available here: https://github.com/HeXiao-55/LieQ-official.git.","short_abstract":"Large language models with billions of parameters are often over-provisioned: many layers contribute little unique information yet dominate the memory and energy footprint during inference. We present LieQ Layer-wise information effectiveness Quantization, a hardware-native, metric-driven post-training quantization fra...","url_abs":"https://arxiv.org/abs/2508.03332","url_pdf":"https://arxiv.org/pdf/2508.03332v2","authors":"[\"He Xiao\",\"Qingyao Yang\",\"Dirui Xie\",\"Wendong Xu\",\"Zunhai Su\",\"Runming yang\",\"Wenyong Zhou\",\"Haobo Liu\",\"Zhengwu Liu\",\"Ngai Wong\"]","published":"2025-08-05T11:17:04Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[\"Language Model\"]","has_code":false,"code_links":[{"ID":611293,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2886285,"paper_url":"https://arxiv.org/abs/2508.03332","paper_title":"Exploring Layer-wise Information Effectiveness for Post-Training Quantization in Small Language Models","repo_url":"https://github.com/HeXiao-55/LieQ-official.git","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}