{"ID":2826414,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.19851","arxiv_id":"2512.19851","title":"An Adaptive Distributed Stencil Abstraction for GPUs","abstract":"The scientific computing ecosystem in Python is largely confined to single-node parallelism, creating a gap between high-level prototyping in NumPy and high-performance execution on modern supercomputers. The increasing prevalence of hardware accelerators and the need for energy efficiency have made resource adaptivity a critical requirement, yet traditional HPC abstractions remain rigid. To address these challenges, we present an adaptive, distributed abstraction for stencil computations on multi-node GPUs. This abstraction is built using CharmTyles, a framework based on the adaptive Charm++ runtime, and features a familiar NumPy-like syntax to minimize the porting effort from prototype to production code. We showcase the resource elasticity of our abstraction by dynamically rescaling a running application across a different number of nodes and present a performance analysis of the associated overheads. Furthermore, we demonstrate that our abstraction achieves significant performance improvements over both a specialized, high-performance stencil DSL and a generalized NumPy replacement.","short_abstract":"The scientific computing ecosystem in Python is largely confined to single-node parallelism, creating a gap between high-level prototyping in NumPy and high-performance execution on modern supercomputers. The increasing prevalence of hardware accelerators and the need for energy efficiency have made resource adaptivity...","url_abs":"https://arxiv.org/abs/2512.19851","url_pdf":"https://arxiv.org/pdf/2512.19851v1","authors":"[\"Aditya Bhosale\",\"Laxmikant Kale\"]","published":"2025-12-22T20:08:47Z","proceeding":"cs.DC","tasks":"[\"cs.DC\"]","methods":"[]","has_code":false}