{"ID":2882911,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.09981","arxiv_id":"2508.09981","title":"LLMC+: Benchmarking Vision-Language Model Compression with a Plug-and-play Toolkit","abstract":"Large Vision-Language Models (VLMs) exhibit impressive multi-modal capabilities but suffer from prohibitive computational and memory demands, due to their long visual token sequences and massive parameter sizes. To address these issues, recent works have proposed training-free compression methods. However, existing efforts often suffer from three major limitations: (1) Current approaches do not decompose techniques into comparable modules, hindering fair evaluation across spatial and temporal redundancy. (2) Evaluation confined to simple single-turn tasks, failing to reflect performance in realistic scenarios. (3) Isolated use of individual compression techniques, without exploring their joint potential. To overcome these gaps, we introduce LLMC+, a comprehensive VLM compression benchmark with a versatile, plug-and-play toolkit. LLMC+ supports over 20 algorithms across five representative VLM families and enables systematic study of token-level and model-level compression. Our benchmark reveals that: (1) Spatial and temporal redundancies demand distinct technical strategies. (2) Token reduction methods degrade significantly in multi-turn dialogue and detail-sensitive tasks. (3) Combining token and model compression achieves extreme compression with minimal performance loss. We believe LLMC+ will facilitate fair evaluation and inspire future research in efficient VLM. Our code is available at https://github.com/ModelTC/LightCompress.","short_abstract":"Large Vision-Language Models (VLMs) exhibit impressive multi-modal capabilities but suffer from prohibitive computational and memory demands, due to their long visual token sequences and massive parameter sizes. \nTo address these issues, recent works have proposed training-free compression methods. However, existing eff...","url_abs":"https://arxiv.org/abs/2508.09981","url_pdf":"https://arxiv.org/pdf/2508.09981v2","authors":"[\"Chengtao Lv\",\"Bilang Zhang\",\"Yang Yong\",\"Ruihao Gong\",\"Yushi Huang\",\"Shiqiao Gu\",\"Jiajun Wu\",\"Yumeng Shi\",\"Jinyang Guo\",\"Wenya Wang\"]","published":"2025-08-13T17:54:49Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":610934,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2882911,"paper_url":"https://arxiv.org/abs/2508.09981","paper_title":"LLMC+: Benchmarking Vision-Language Model Compression with a Plug-and-play Toolkit","repo_url":"https://github.com/ModelTC/LightCompress","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
