{"ID":2894497,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.11512","arxiv_id":"2507.11512","title":"Scaling the memory wall using mixed-precision -- HPG-MxP on an exascale machine","abstract":"Mixed-precision algorithms have been proposed as a way for scientific computing to benefit from some of the gains seen for artificial intelligence (AI) on recent high performance computing (HPC) platforms. A few applications dominated by dense matrix operations have seen substantial speedups by utilizing low precision formats such as FP16. However, a majority of scientific simulation applications are memory bandwidth limited. Beyond preliminary studies, the practical gain from using mixed-precision algorithms on a given HPC system is largely unclear. The High Performance GMRES Mixed Precision (HPG-MxP) benchmark has been proposed to measure the useful performance of a HPC system on sparse matrix-based mixed-precision applications. In this work, we present a highly optimized implementation of the HPG-MxP benchmark for an exascale system and describe our algorithm enhancements. We show for the first time a speedup of 1.6x using a combination of double- and single-precision on modern GPU-based supercomputers.","short_abstract":"Mixed-precision algorithms have been proposed as a way for scientific computing to benefit from some of the gains seen for artificial intelligence (AI) on recent high performance computing (HPC) platforms. A few applications dominated by dense matrix operations have seen substantial speedups by utilizing low precision...","url_abs":"https://arxiv.org/abs/2507.11512","url_pdf":"https://arxiv.org/pdf/2507.11512v1","authors":"[\"Aditya Kashi\",\"Nicholson Koukpaizan\",\"Hao Lu\",\"Michael Matheson\",\"Sarp Oral\",\"Feiyi Wang\"]","published":"2025-07-15T17:26:37Z","proceeding":"cs.DC","tasks":"[\"cs.DC\",\"cs.PF\",\"math.NA\"]","methods":"[]","has_code":false}