{"ID":2880291,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.18227","arxiv_id":"2508.18227","title":"GM-Skip: Metric-Guided Transformer Block Skipping for Efficient Vision-Language Models","abstract":"Transformer-based Vision-Language Models (VLMs) have achieved impressive performance on tasks such as image captioning, object recognition, and visual reasoning, but their high computational cost hinders deployment in latency-sensitive applications like autonomous driving. We introduce GM-Skip, a flexible and metric-adaptive framework for Transformer block skipping that accelerates VLM inference while preserving output quality. GM-Skip features a greedy, metric-guided block selection strategy that uses metric feedback (e.g., accuracy, CIDEr) to identify redundant layers, along with a reverse-order deletion mechanism that preserves early foundational blocks to avoid performance collapse. To support diverse deployment needs, it incorporates a tunable trade-off between sparsity and performance via a score-sparsity balance objective. Experiments across multiple tasks and datasets, including COCO and CODA, show that GM-Skip consistently improves inference speed while maintaining task performance. On the COCO dataset, GM-Skip improves single-object classification accuracy on the Person category from 19.1 percent to 87.3 percent while skipping more than 40 percent of Transformer blocks. In real-world deployment, it achieves up to 45.4 percent latency reduction on single-object detection when integrated into an autonomous vehicle running Autoware.Universe, validating the effectiveness of its skip configurations and confirming its practical value in accelerating real-world inference.","short_abstract":"Transformer-based Vision-Language Models (VLMs) have achieved impressive performance on tasks such as image captioning, object recognition, and visual reasoning, but their high computational cost hinders deployment in latency-sensitive applications like autonomous driving. We introduce GM-Skip, a flexible and metric-ad...","url_abs":"https://arxiv.org/abs/2508.18227","url_pdf":"https://arxiv.org/pdf/2508.18227v1","authors":"[\"Lianming Huang\",\"Haibo Hu\",\"Qiao Li\",\"Xin He\",\"Nan Guan\",\"Chun Jason Xue\"]","published":"2025-08-20T14:53:28Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Transformer\",\"Language Model\"]","has_code":false}
