{"ID":2841321,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.12201","arxiv_id":"2511.12201","title":"OmniSparse: Training-Aware Fine-Grained Sparse Attention for Long-Video MLLMs","abstract":"Existing sparse attention methods primarily target inference-time acceleration by selecting critical tokens under predefined sparsity patterns. However, they often fail to bridge the training-inference gap and lack the capacity for fine-grained token selection across multiple dimensions such as queries, key-values (KV), and heads, leading to suboptimal performance and limited acceleration gains. In this paper, we introduce OmniSparse, a training-aware fine-grained sparse attention framework for long-video MLLMs, which operates in both training and inference with dynamic token budget allocation. Specifically, OmniSparse contains three adaptive and complementary mechanisms: (1) query selection via lazy-active classification, retaining active queries that capture broad semantic similarity while discarding most lazy ones that focus on limited local context and exhibit high functional redundancy; (2) KV selection with head-level dynamic budget allocation, where a shared budget is determined based on the flattest head and applied uniformly across all heads to ensure attention recall; and (3) KV cache slimming to reduce head-level redundancy by selectively fetching visual KV cache according to the head-level decoding query pattern. Experimental results show that OmniSparse matches the performance of full attention while achieving up to 2.7x speedup during prefill and 2.4x memory reduction during decoding.","short_abstract":"Existing sparse attention methods primarily target inference-time acceleration by selecting critical tokens under predefined sparsity patterns. However, they often fail to bridge the training-inference gap and lack the capacity for fine-grained token selection across multiple dimensions such as queries, key-values (KV)...","url_abs":"https://arxiv.org/abs/2511.12201","url_pdf":"https://arxiv.org/pdf/2511.12201v2","authors":"[\"Feng Chen\",\"Yefei He\",\"Shaoxuan He\",\"Yuanyu He\",\"Jing Liu\",\"Lequan Lin\",\"Akide Liu\",\"Zhaoyang Li\",\"Jiyuan Zhang\",\"Zhenbang Sun\",\"Bohan Zhuang\",\"Qi Wu\"]","published":"2025-11-15T13:14:17Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Large Language Model\"]","has_code":false}