{"ID":2890256,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.20063","arxiv_id":"2507.20063","title":"Racing to Idle: Energy Efficiency of Matrix Multiplication on Heterogeneous CPU and GPU Architectures","abstract":"The paradigm shift towards multi-core and heterogeneous computing, driven by the fundamental power and thermal limits of single-core processors, has established energy efficiency as a first-class design constraint in high-performance computing (HPC). Heterogeneous systems, integrating traditional multi-core CPUs with specialized accelerators like discrete (dGPU) and integrated (iGPU) graphics processing units, offer a compelling path to navigating the trade-offs between performance and power. However, quantifying these trade-offs on widely accessible hardware remains a critical area of study. This paper presents a direct, empirical measurement of the performance and energy-to-solution of a canonical HPC workload -- a 4096x4096 matrix-matrix multiplication -- on three distinct compute architectures within a single consumer-grade laptop: a multi-core AMD Ryzen 7 5800H CPU, a discrete NVIDIA GeForce GTX 1650 GPU, and an integrated AMD Radeon Vega GPU. Using standard, validated, and minimally intrusive tools such as Linux perf and nvidia-smi, we find that the discrete GPU is not only the performance leader, achieving a 93.5x speedup over the CPU, but is also the most energy-efficient, consuming only 2% of the energy used by the CPU, resulting in a 50-fold improvement in energy efficiency. These findings provide a practical demonstration of the \"race to idle\" principle and offer clear, quantitative guidance on architectural choices for energy-aware software development.","short_abstract":"The paradigm shift towards multi-core and heterogeneous computing, driven by the fundamental power and thermal limits of single-core processors, has established energy efficiency as a first-class design constraint in high-performance computing (HPC). Heterogeneous systems, integrating traditional multi-core CPUs with s...","url_abs":"https://arxiv.org/abs/2507.20063","url_pdf":"https://arxiv.org/pdf/2507.20063v1","authors":"[\"Mufakir Qamar Ansari\",\"Mudabir Qamar Ansari\"]","published":"2025-07-26T21:15:05Z","proceeding":"cs.DC","tasks":"[\"cs.DC\",\"cs.CC\"]","methods":"[]","has_code":false}
