{"ID":2869395,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.15036","arxiv_id":"2509.15036","title":"NEURAL: An Elastic Neuromorphic Architecture with Hybrid Data-Event Execution and On-the-fly Attention Dataflow","abstract":"Spiking neural networks (SNNs) have emerged as a promising alternative to artificial neural networks (ANNs), offering improved energy efficiency by leveraging sparse and event-driven computation. However, existing hardware implementations of SNNs still suffer from the inherent spike sparsity and multi-timestep execution, which significantly increase latency and reduce energy efficiency. This study presents NEURAL, a novel neuromorphic architecture based on a hybrid data-event execution paradigm by decoupling sparsity-aware processing from neuron computation and using elastic first-in-first-out (FIFO). NEURAL supports on-the-fly execution of spiking QKFormer by embedding its operations within the baseline computing flow without requiring dedicated hardware units. It also integrates a novel window-to-time-to-first-spike (W2TTFS) mechanism to replace average pooling and enable full-spike execution. Furthermore, we introduce a knowledge distillation (KD)-based training framework to construct single-timestep SNN models with competitive accuracy. NEURAL is implemented on a Xilinx Virtex-7 FPGA and evaluated using ResNet-11, QKFResNet-11, and VGG-11. Experimental results demonstrate that, at the algorithm level, the VGG-11 model trained with KD improves accuracy by 3.20% on CIFAR-10 and 5.13% on CIFAR-100. At the architecture level, compared to existing SNN accelerators, NEURAL achieves a 50% reduction in resource utilization and a 1.97x improvement in energy efficiency.","short_abstract":"Spiking neural networks (SNNs) have emerged as a promising alternative to artificial neural networks (ANNs), offering improved energy efficiency by leveraging sparse and event-driven computation. However, existing hardware implementations of SNNs still suffer from the inherent spike sparsity and multi-timestep executio...","url_abs":"https://arxiv.org/abs/2509.15036","url_pdf":"https://arxiv.org/pdf/2509.15036v1","authors":"[\"Yuehai Chen\",\"Farhad Merchant\"]","published":"2025-09-18T15:02:22Z","proceeding":"cs.AR","tasks":"[\"cs.AR\"]","methods":"[]","has_code":false}