{"ID":2868150,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.17072","arxiv_id":"2509.17072","title":"SnipSnap: A Joint Compression Format and Dataflow Co-Optimization Framework for Efficient Sparse LLM Accelerator Design","abstract":"The growing scale of large language models (LLMs) has intensified demands on computation and memory, making efficient inference a key challenge. While sparsity can reduce these costs, existing design space exploration (DSE) frameworks often overlook compression formats, a key factor for leveraging sparsity on accelerators. This paper proposes SnipSnap, a joint compression format and dataflow co-optimization framework for efficient sparse LLM accelerator design. SnipSnap introduces: (1) a hierarchical compression format encoding to expand the design space; (2) an adaptive compression engine for selecting formats under diverse sparsity; and (3) a progressive co-search workflow that jointly optimizes dataflow and compression formats. SnipSnap achieves 18.24% average memory energy savings via format optimization, along with 2248.3$\\times$ and 21.0$\\times$ speedups over Sparseloop and DiMO-Sparse frameworks, respectively.","short_abstract":"The growing scale of large language models (LLMs) has intensified demands on computation and memory, making efficient inference a key challenge. While sparsity can reduce these costs, existing design space exploration (DSE) frameworks often overlook compression formats, a key factor for leveraging sparsity on accelerat...","url_abs":"https://arxiv.org/abs/2509.17072","url_pdf":"https://arxiv.org/pdf/2509.17072v2","authors":"[\"Junyi Wu\",\"Chao Fang\",\"Zhongfeng Wang\"]","published":"2025-09-21T13:07:14Z","proceeding":"cs.AR","tasks":"[\"cs.AR\"]","methods":"[\"Large Language Model\",\"Language Model\",\"LoRA\"]","has_code":false}