{"ID":2891878,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.16710","arxiv_id":"2507.16710","title":"AcceleratedKernels.jl: Cross-Architecture Parallel Algorithms from a Unified, Transpiled Codebase","abstract":"AcceleratedKernels.jl is introduced as a backend-agnostic library for parallel computing in Julia, natively targeting NVIDIA, AMD, Intel, and Apple accelerators via a unique transpilation architecture. Written in a unified, compact codebase, it enables productive parallel programming with minimised implementation and usage complexities. Benchmarks of arithmetic-heavy kernels show performance on par with C and OpenMP-multithreaded CPU implementations, with Julia sometimes offering more consistent and predictable numerical performance than conventional C compilers. Exceptional composability is highlighted as simultaneous CPU-GPU co-processing is achievable - such as CPU-GPU co-sorting - with transparent use of hardware-specialised MPI implementations. Tests on the Baskerville Tier 2 UK HPC cluster achieved world-class sorting throughputs of 538-855 GB/s using 200 NVIDIA A100 GPUs, comparable to the highest literature-reported figure of 900 GB/s achieved on 262,144 CPU cores. The use of direct NVLink GPU-to-GPU interconnects resulted in a 4.93x speedup on average; normalised by a combined capital, running and environmental cost, communication-heavy HPC tasks only become economically viable on GPUs if GPUDirect interconnects are employed.","short_abstract":"AcceleratedKernels.jl is introduced as a backend-agnostic library for parallel computing in Julia, natively targeting NVIDIA, AMD, Intel, and Apple accelerators via a unique transpilation architecture. Written in a unified, compact codebase, it enables productive parallel programming with minimised implementation and u...","url_abs":"https://arxiv.org/abs/2507.16710","url_pdf":"https://arxiv.org/pdf/2507.16710v1","authors":"[\"Andrei-Leonard Nicusan\",\"Dominik Werner\",\"Simon Branford\",\"Simon Hartley\",\"Andrew J. Morris\",\"Kit Windows-Yule\"]","published":"2025-07-22T15:45:06Z","proceeding":"cs.DC","tasks":"[\"cs.DC\",\"cs.PF\"]","methods":"[]","has_code":false}
