{"ID":2864544,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.03267","arxiv_id":"2510.03267","title":"PT$^2$-LLM: Post-Training Ternarization for Large Language Models","abstract":"Large Language Models (LLMs) have shown impressive capabilities across diverse tasks, but their large memory and compute demands hinder deployment. Ternarization has gained attention as a promising compression technique, delivering substantial size reduction and high computational efficiency. However, its potential in the post-training quantization (PTQ) setting remains underexplored, due to the challenge of training-free parameter optimization and the quantization difficulty posed by outliers and dispersed weights. To address these issues, we propose PT$^2$-LLM, a post-training ternarization framework tailored for LLMs. At its core is an Asymmetric Ternary Quantizer equipped with a two-stage refinement pipeline: (1) Iterative Ternary Fitting (ITF), which alternates between optimal ternary grid construction and flexible rounding to minimize quantization error, and (2) Activation-aware Grid Alignment (AGA), which further refines the ternary grid to better match full-precision outputs. In addition, we propose a plug-and-play Structural Similarity-based Reordering (SSR) strategy that leverages inter-column structural similarity to ease quantization and mitigate outlier effects, further enhancing overall performance. Extensive experiments demonstrate that PT$^2$-LLM delivers competitive performance against state-of-the-art (SOTA) 2-bit PTQ methods with lower memory cost, while also accelerating both prefill and decoding to achieve end-to-end speedup. The code and models will be available at https://github.com/XIANGLONGYAN/PT2-LLM.","short_abstract":"Large Language Models (LLMs) have shown impressive capabilities across diverse tasks, but their large memory and compute demands hinder deployment. \nTernarization has gained attention as a promising compression technique, delivering substantial size reduction and high computational efficiency. However, its potential in...","url_abs":"https://arxiv.org/abs/2510.03267","url_pdf":"https://arxiv.org/pdf/2510.03267v2","authors":"[\"Xianglong Yan\",\"Chengzhu Bao\",\"Zhiteng Li\",\"Tianao Zhang\",\"Kaicheng Yang\",\"Haotong Qin\",\"Ruobing Xie\",\"Xingwu Sun\",\"Yulun Zhang\"]","published":"2025-09-27T03:01:48Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":609164,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2864544,"paper_url":"https://arxiv.org/abs/2510.03267","paper_title":"PT$^2$-LLM: Post-Training Ternarization for Large Language Models","repo_url":"https://github.com/XIANGLONGYAN/PT2-LLM","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}