{"ID":2835317,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.22849","arxiv_id":"2511.22849","title":"PerfMamba: Performance Analysis and Pruning of Selective State Space Models","abstract":"Recent advances in sequence modeling have introduced selective SSMs as promising alternatives to Transformer architectures, offering theoretical computational efficiency and sequence processing advantages. A comprehensive understanding of selective SSMs in runtime behavior, resource utilization patterns, and scaling characteristics still remains unexplored, thus obstructing their optimal deployment and further architectural improvements. This paper presents a thorough empirical study of Mamba-1 and Mamba-2, systematically profiled for performance to assess the design principles that contribute to their efficiency in state-space modeling. A detailed analysis of computation patterns, memory access, I/O characteristics, and scaling properties was performed for sequence lengths ranging from 64 to 16384 tokens. Our findings show that the SSM component, a central part of the selective SSM architecture, demands a significant portion of computational resources compared to other components in the Mamba block. Based on these insights, we propose a pruning technique that selectively removes low-activity states within the SSM component, achieving measurable throughput and memory gains while maintaining accuracy within a moderate pruning regime. This approach results in performance improvements across varying sequence lengths, achieving a 1.14x speedup and reducing memory usage by 11.50\\%. These results offer valuable guidance for designing more efficient SSM architectures that can be applied to a wide range of real-world applications.","short_abstract":"Recent advances in sequence modeling have introduced selective SSMs as promising alternatives to Transformer architectures, offering theoretical computational efficiency and sequence processing advantages. A comprehensive understanding of selective SSMs in runtime behavior, resource utilization patterns, and scaling ch...","url_abs":"https://arxiv.org/abs/2511.22849","url_pdf":"https://arxiv.org/pdf/2511.22849v1","authors":"[\"Abdullah Al Asif\",\"Mobina Kashaniyan\",\"Sixing Yu\",\"Juan Pablo Muñoz\",\"Ali Jannesari\"]","published":"2025-11-28T03:09:15Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Transformer\"]","has_code":false}
