{"ID":2859126,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.05544","arxiv_id":"2510.05544","title":"Activation-Informed Pareto-Guided Low-Rank Compression for Efficient LLM/VLM","abstract":"Large language models (LLM) and vision-language models (VLM) have achieved state-of-the-art performance, but they impose significant memory and computing challenges in deployment. We present a novel low-rank compression framework to address this challenge. First, we upper bound the change of network loss via layer-wise activation-based compression errors, filling a theoretical gap in the literature. We then formulate low-rank model compression as a bi-objective optimization and prove that a single uniform tolerance yields surrogate Pareto-optimal heterogeneous ranks. Based on our theoretical insights, we propose Pareto-Guided Singular Value Decomposition (PGSVD), a zero-shot pipeline that improves activation-aware compression via Pareto-guided rank selection and alternating least-squares implementation. We apply PGSVD to both LLM and VLM, showing better accuracy at the same compression levels and inference speedup.","short_abstract":"Large language models (LLM) and vision-language models (VLM) have achieved state-of-the-art performance, but they impose significant memory and computing challenges in deployment. We present a novel low-rank compression framework to address this challenge. First, we upper bound the change of network loss via layer-wise...","url_abs":"https://arxiv.org/abs/2510.05544","url_pdf":"https://arxiv.org/pdf/2510.05544v1","authors":"[\"Ryan Solgi\",\"Parsa Madinei\",\"Jiayi Tian\",\"Rupak Swaminathan\",\"Jing Liu\",\"Nathan Susanj\",\"Zheng Zhang\"]","published":"2025-10-07T03:07:47Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.LG\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}