{"ID":2838987,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.16174","arxiv_id":"2511.16174","title":"Pipelined Dense Symmetric Eigenvalue Decomposition on Multi-GPU Architectures","abstract":"Large symmetric eigenvalue problems are commonly observed in many disciplines such as Chemistry and Physics, and several libraries including cuSOLVERMp, MAGMA and ELPA support computing large eigenvalue decomposition on multi-GPU or multi-CPU-GPU hybrid architectures. However, these libraries do not deliver satisfactory performance: all of them utilize only around 1.5\\% of the peak multi-GPU performance. In this paper, we propose a pipelined two-stage eigenvalue decomposition algorithm, with substantial optimizations, in place of the conventional sequential algorithm. On an 8$\\times$A100 platform, our implementation surpasses the state-of-the-art cuSOLVERMp and MAGMA baselines, delivering mean speedups of 5.74$\\times$ and 6.59$\\times$, respectively, with better strong and weak scalability.","short_abstract":"Large symmetric eigenvalue problems are commonly observed in many disciplines such as Chemistry and Physics, and several libraries including cuSOLVERMp, MAGMA and ELPA support computing large eigenvalue decomposition on multi-GPU or multi-CPU-GPU hybrid architectures. However, these libraries do not provide satisfied p...","url_abs":"https://arxiv.org/abs/2511.16174","url_pdf":"https://arxiv.org/pdf/2511.16174v1","authors":"[\"Hansheng Wang\",\"Ruiyi Zhan\",\"Dajun Huang\",\"Xingchen Liu\",\"Qiao Li\",\"Hancong Duan\",\"Dingwen Tao\",\"Guangming Tan\",\"Shaoshuai Zhang\"]","published":"2025-11-20T09:26:34Z","proceeding":"cs.MS","tasks":"[\"cs.MS\",\"cs.DC\"]","methods":"[]","has_code":false}