{"ID":2843245,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.07985","arxiv_id":"2511.07985","title":"PIMfused: Near-Bank DRAM-PIM with Fused-layer Dataflow for CNN Data Transfer Optimization","abstract":"Near-bank Processing-in-Memory (PIM) architectures integrate processing cores (PIMcores) close to DRAM banks to mitigate the high cost of off-chip memory accesses. When accelerating convolutional neural network (CNN) on DRAM-PIM, performance is often constrained by cross-bank (or cross-PIMcore) data transfers, which are induced by the conventional layer-by-layer dataflow that enforces inter-bank (or inter-PIMcore) dependencies across successive CNN layers. To address this challenge, we propose PIMfused, a hardware-software co-design that enables fused-layer dataflow for end-to-end CNN execution in near-bank DRAM-PIM. By adopting fused-layer dataflow, PIMfused improves data reuse and, more importantly, breaks inter-bank data dependencies, thereby optimizing cross-bank data transfers without sacrificing bank-level parallelism. We study the impact of buffer sizes and PIMcore parallelism (1-bank vs. 4-bank) on PIMfused using end-to-end ResNet18. We present three key takeaways and show that with 4-bank PIMcores, PIMfused achieves overall PPA gains over a GDDR6-AiM-like baseline, cutting memory cycles to 30.6%, energy to 83.4%, and area to 76.5%.","short_abstract":"Near-bank Processing-in-Memory (PIM) architectures integrate processing cores (PIMcores) close to DRAM banks to mitigate the high cost of off-chip memory accesses. When accelerating convolutional neural network (CNN) on DRAM-PIM, performance is often constrained by cross-bank (or cross-PIMcore) data transfers, which ar...","url_abs":"https://arxiv.org/abs/2511.07985","url_pdf":"https://arxiv.org/pdf/2511.07985v1","authors":"[\"Simei Yang\",\"Xinyu Shi\",\"Lu Zhao\",\"Yunyu Ling\",\"Quanjun Wang\",\"Francky Catthoor\"]","published":"2025-11-11T08:50:53Z","proceeding":"cs.AR","tasks":"[\"cs.AR\"]","methods":"[\"Convolutional Neural Network\"]","has_code":false}
