{"ID":2895938,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.07396","arxiv_id":"2507.07396","title":"IML-Spikeformer: Input-aware Multi-Level Spiking Transformer for Speech Processing","abstract":"Spiking Neural Networks (SNNs), inspired by biological neural mechanisms, represent a promising neuromorphic computing paradigm that offers energy-efficient alternatives to traditional Artificial Neural Networks (ANNs). Despite proven effectiveness, SNN architectures have struggled to achieve competitive performance on large-scale speech processing tasks. Two key challenges hinder progress: (1) the high computational overhead during training caused by multi-timestep spike firing, and (2) the absence of large-scale SNN architectures tailored to speech processing tasks. To overcome the issues, we introduce Input-aware Multi-Level Spikeformer, i.e. IML-Spikeformer, a spiking Transformer architecture specifically designed for large-scale speech processing. Central to our design is the Input-aware Multi-Level Spike (IMLS) mechanism, which simulates multi-timestep spike firing within a single timestep using an adaptive, input-aware thresholding scheme. IML-Spikeformer further integrates a Re-parameterized Spiking Self-Attention (RepSSA) module with a Hierarchical Decay Mask (HDM), forming the HD-RepSSA module. This module enhances the precision of attention maps and enables modeling of multi-scale temporal dependencies in speech signals. Experiments demonstrate that IML-Spikeformer achieves word error rates of 6.0\\% on AiShell-1 and 3.4\\% on Librispeech-960, comparable to conventional ANN transformers while reducing theoretical inference energy consumption by 4.64$\\times$ and 4.32$\\times$ respectively. IML-Spikeformer marks an advance of scalable SNN architectures for large-scale speech processing in both task performance and energy efficiency. Our source code and model checkpoints are publicly available at github.com/Pooookeman/IML-Spikeformer.","short_abstract":"Spiking Neural Networks (SNNs), inspired by biological neural mechanisms, represent a promising neuromorphic computing paradigm that offers energy-efficient alternatives to traditional Artificial Neural Networks (ANNs). Despite proven effectiveness, SNN architectures have struggled to achieve competitive performance on...","url_abs":"https://arxiv.org/abs/2507.07396","url_pdf":"https://arxiv.org/pdf/2507.07396v2","authors":"[\"Zeyang Song\",\"Shimin Zhang\",\"Yuhong Chou\",\"Jibin Wu\",\"Haizhou Li\"]","published":"2025-07-10T03:26:24Z","proceeding":"cs.MM","tasks":"[\"cs.MM\",\"cs.LG\",\"cs.SD\",\"eess.AS\"]","methods":"[\"Transformer\"]","has_code":false}
