{"ID":2899047,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.01429","arxiv_id":"2507.01429","title":"Hardware-software co-exploration with racetrack memory based in-memory computing for CNN inference in embedded systems","abstract":"Deep neural networks generate and process large volumes of data, posing challenges for low-resource embedded systems. In-memory computing has been demonstrated as an efficient computing infrastructure and shows promise for embedded AI applications. Among newly-researched memory technologies, racetrack memory is a non-volatile technology that allows high data density fabrication, making it a good fit for in-memory computing. However, integrating in-memory arithmetic circuits with memory cells affects both the memory density and power efficiency. It remains challenging to build efficient in-memory arithmetic circuits on racetrack memory within area and energy constraints. To this end, we present an efficient in-memory convolutional neural network (CNN) accelerator optimized for use with racetrack memory. We design a series of fundamental arithmetic circuits as in-memory computing cells suited for multiply-and-accumulate operations. Moreover, we explore the design space of racetrack memory based systems and CNN model architectures, employing co-design to improve the efficiency and performance of performing CNN inference in racetrack memory while maintaining model accuracy. Our designed circuits and model-system co-optimization strategies achieve a small memory bank area with significant improvements in energy and performance for racetrack memory based embedded systems.","short_abstract":"Deep neural networks generate and process large volumes of data, posing challenges for low-resource embedded systems. In-memory computing has been demonstrated as an efficient computing infrastructure and shows promise for embedded AI applications. Among newly-researched memory technologies, racetrack memory is a non-v...","url_abs":"https://arxiv.org/abs/2507.01429","url_pdf":"https://arxiv.org/pdf/2507.01429v1","authors":"[\"Benjamin Chen Ming Choong\",\"Tao Luo\",\"Cheng Liu\",\"Bingsheng He\",\"Wei Zhang\",\"Joey Tianyi Zhou\"]","published":"2025-07-02T07:29:53Z","proceeding":"cs.ET","tasks":"[\"cs.ET\",\"cs.AI\",\"cs.AR\"]","methods":"[\"LoRA\",\"Convolutional Neural Network\"]","has_code":false}
