{"ID":2856794,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.10623","arxiv_id":"2510.10623","title":"ADiP: Adaptive-Precision Systolic Array for Matrix Multiplication Acceleration","abstract":"Transformers are at the core of modern AI nowadays. They rely heavily on matrix multiplication and require efficient acceleration due to their substantial memory and computational requirements. Quantization plays a vital role in reducing memory usage, and can be exploited for computations by designing reconfigurable architectures that enhance matrix multiplication by dynamically adjusting the precision. This paper proposes ADiP, a novel adaptive-precision systolic array architecture designed for efficient matrix multiplication acceleration. The proposed architecture consists of $N$ $\\times$ $N$ reconfigurable processing elements (PEs), along with shared shifters and accumulators. ADiP supports multiple computation modes, including symmetric single-matrix multiplication as well as asymmetric multi-matrix multiplication with a shared input matrix, thereby improving data reuse and PE utilization. By adapting to different precisions, ADiP achieves up to 4$\\times$ higher throughput and up to 4$\\times$ higher memory efficiency. Analytical models are developed for ADiP architecture, including latency and throughput for different architecture configurations. A comprehensive hardware design space exploration is demonstrated using commercial 22nm technology. Furthermore, ADiP is evaluated on different Transformer-based workloads from GPT-2 medium, BERT large, and BitNet-1.58B models, delivering total latency improvement up to 53.6%, and total energy improvement up to 24.4% for attention workloads in BitNet-1.58B model. \nAt a 64$\\times$64 size with reconfigurable 4,096 PEs, ADiP achieves a peak throughput of 8.192 TOPS, 16.384 TOPS, and 32.768 TOPS for 8bit$\\times$8bit, 8bit$\\times$4bit, and 8bit$\\times$2bit operations, respectively.","short_abstract":"Transformers are at the core of modern AI nowadays. They rely heavily on matrix multiplication and require efficient acceleration due to their substantial memory and computational requirements. Quantization plays a vital role in reducing memory usage, and can be exploited for computations by designing reconfigurable ar...","url_abs":"https://arxiv.org/abs/2510.10623","url_pdf":"https://arxiv.org/pdf/2510.10623v3","authors":"[\"Ahmed J. Abdelmaksoud\",\"Cristian Sestito\",\"Shiwei Wang\",\"Themis Prodromakis\"]","published":"2025-10-12T14:03:22Z","proceeding":"cs.AR","tasks":"[\"cs.AR\"]","methods":"[\"Transformer\",\"LoRA\"]","has_code":false}
