{"ID":2874328,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.05276","arxiv_id":"2509.05276","title":"SpikingBrain: Spiking Brain-inspired Large Models","abstract":"Mainstream Transformer-based large language models face major efficiency bottlenecks: training computation scales quadratically with sequence length, and inference memory grows linearly, limiting long-context processing. Building large models on non-NVIDIA platforms also poses challenges for stable and efficient training. To address this, we introduce SpikingBrain, a family of brain-inspired models designed for efficient long-context training and inference. SpikingBrain leverages the MetaX GPU cluster and focuses on three aspects: (1) Model Architecture: linear and hybrid-linear attention architectures with adaptive spiking neurons; (2) Algorithmic Optimizations: an efficient, conversion-based training pipeline and a dedicated spike coding framework; (3) System Engineering: customized training frameworks, operator libraries, and parallelism strategies tailored to MetaX hardware. Using these techniques, we develop two models: SpikingBrain-7B, a linear LLM, and SpikingBrain-76B, a hybrid-linear MoE LLM. These models demonstrate the feasibility of large-scale LLM development on non-NVIDIA platforms, and training remains stable for weeks on hundreds of MetaX GPUs with Model FLOPs Utilization at expected levels. SpikingBrain achieves performance comparable to open-source Transformer baselines while using only about 150B tokens for continual pre-training. Our models also significantly improve long-context efficiency and deliver inference with (partially) constant memory and event-driven spiking behavior. For example, SpikingBrain-7B attains over 100x speedup in Time to First Token for 4M-token sequences. Furthermore, the proposed spiking scheme achieves 69.15 percent sparsity, enabling low-power operation. Overall, this work demonstrates the potential of brain-inspired mechanisms to drive the next generation of efficient and scalable large model design.","short_abstract":"Mainstream Transformer-based large language models face major efficiency bottlenecks: training computation scales quadratically with sequence length, and inference memory grows linearly, limiting long-context processing. Building large models on non-NVIDIA platforms also poses challenges for stable and efficient traini...","url_abs":"https://arxiv.org/abs/2509.05276","url_pdf":"https://arxiv.org/pdf/2509.05276v4","authors":"[\"Yuqi Pan\",\"Yupeng Feng\",\"Jinghao Zhuang\",\"Siyu Ding\",\"Han Xu\",\"Zehao Liu\",\"Bohan Sun\",\"Yuhong Chou\",\"Xuerui Qiu\",\"Anlin Deng\",\"Anjie Hu\",\"Shurong Wang\",\"Peng Zhou\",\"Man Yao\",\"Jibin Wu\",\"Jian Yang\",\"Guoliang Sun\",\"Bo Xu\",\"Guoqi Li\"]","published":"2025-09-05T17:34:00Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"cs.CL\"]","methods":"[\"Transformer\",\"Large Language Model\",\"Language Model\"]","has_code":false}
