{"ID":2851226,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.21879","arxiv_id":"2510.21879","title":"TernaryCLIP: Efficiently Compressing Vision-Language Models with Ternary Weights and Distilled Knowledge","abstract":"Recent years have witnessed an increasing interest in image-text contrastive modeling, exemplified by models such as Contrastive Language-Image Pretraining (CLIP). In this paper, we propose the TernaryCLIP, a lightweight computational framework that converts connection weights of both vision and text encoders of CLIP into the ternary format, instead of full-precision or floating ones. TernaryCLIP incorporates quantization-aware training and distillation modules, preventing precision degradation and enabling low-cost and high-efficiency computations. Comprehensive experiments demonstrate that TernaryCLIP can achieve up to 99\\% ternarized weights with 1.58-bit representation, 16.98 $\\times$ compression ratio, 2.3 $\\times$ inference acceleration, 16 $\\times$ storage reduction, 10 $\\times$ memory optimization, and 60\\% sparsity while maintaining promising performance on zero-shot image classification and image-text retrieval tasks across 41 commonly used datasets. Our work highlights the feasibility of extreme quantization for large multimodal models, supporting effective and efficient deployment on resource-constrained devices. The model and code can be accessed from Hugging Face and GitHub.","short_abstract":"Recent years have witnessed an increasing interest in image-text contrastive modeling, exemplified by models such as Contrastive Language-Image Pretraining (CLIP). In this paper, we propose the TernaryCLIP, a lightweight computational framework that converts connection weights of both vision and text encoders of CLIP i...","url_abs":"https://arxiv.org/abs/2510.21879","url_pdf":"https://arxiv.org/pdf/2510.21879v1","authors":"[\"Shu-Hao Zhang\",\"Wei-Cheng Tang\",\"Chen Wu\",\"Peng Hu\",\"Nan Li\",\"Liang-Jie Zhang\",\"Qi Zhang\",\"Shao-Qun Zhang\"]","published":"2025-10-23T14:53:32Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\"]","methods":"[\"Language Model\"]","has_code":false}