{"ID":2841290,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.12152","arxiv_id":"2511.12152","title":"A Digital SRAM-Based Compute-In-Memory Macro for Weight-Stationary Dynamic Matrix Multiplication in Transformer Attention Score Computation","abstract":"Compute-in-memory (CIM) techniques are widely employed in energy-efficient artificial intelligent (AI) processors. They alleviate power and latency bottlenecks caused by extensive data movements between compute and storage units. To extend these benefits to Transformer, this brief proposes a digital CIM macro to compute attention score. To eliminate dynamic matrix multiplication (MM), we reconstruct the computation as static MM using a combined QK-weight matrix, so that inputs can be directly fed to a single CIM macro to obtain the score results. However, this introduces a new challenge of 2-input static MM. The computation is further decomposed into four groups of bit-serial logical and addition operations. This allows 2-input to directly activate the word line via AND gate, thus realizing 2-input static MM with minimal overhead. A hierarchical zero-value bit skipping mechanism is introduced to prioritize skipping zero-value bits in the 2-input case. This mechanism effectively utilizes data sparsity of 2-input, significantly reducing redundant operations. Implemented in a 65-nm process, the 0.35 mm2 macro delivers 42.27 GOPS at 1.24 mW, yielding 34.1 TOPS/W energy and 120.77 GOPS/mm2 area efficiency. Compared to CPUs and GPUs, it achieves ~25x and ~13x higher efficiency, respectively. Against other Transformer-CIMs, it demonstrates at least 7x energy and 2x area efficiency gains, highlighting its strong potential for edge intelligence.","short_abstract":"Compute-in-memory (CIM) techniques are widely employed in energy-efficient artificial intelligent (AI) processors. They alleviate power and latency bottlenecks caused by extensive data movements between compute and storage units. To extend these benefits to Transformer, this brief proposes a digital CIM macro to comput...","url_abs":"https://arxiv.org/abs/2511.12152","url_pdf":"https://arxiv.org/pdf/2511.12152v3","authors":"[\"Jianyi Yu\",\"Tengxiao Wang\",\"Yuxuan Wang\",\"Xiang Fu\",\"Fei Qiao\",\"Ying Wang\",\"Rui Yuan\",\"Liyuan Liu\",\"Cong Shi\"]","published":"2025-11-15T10:47:44Z","proceeding":"cs.AR","tasks":"[\"cs.AR\",\"eess.SP\"]","methods":"[\"Transformer\"]","has_code":false}
