{"ID":2867230,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.19150","arxiv_id":"2509.19150","title":"In-Transit Data Transport Strategies for Coupled AI-Simulation Workflow Patterns","abstract":"Coupled AI-Simulation workflows are becoming the major workloads for HPC facilities, and their increasing complexity necessitates new tools for performance analysis and prototyping of new in-situ workflows. We present SimAI-Bench, a tool designed to both prototype and evaluate these coupled workflows. In this paper, we use SimAI-Bench to benchmark the data transport performance of two common patterns on the Aurora supercomputer: a one-to-one workflow with co-located simulation and AI training instances, and a many-to-one workflow where a single AI model is trained from an ensemble of simulations. For the one-to-one pattern, our analysis shows that node-local and DragonHPC data staging strategies provide excellent performance compared to Redis and the Lustre file system. For the many-to-one pattern, we find that data transport becomes a dominant bottleneck as the ensemble size grows. Our evaluation reveals that the file system is the optimal solution among the tested strategies for the many-to-one pattern.","short_abstract":"Coupled AI-Simulation workflows are becoming the major workloads for HPC facilities, and their increasing complexity necessitates new tools for performance analysis and prototyping of new in-situ workflows. We present SimAI-Bench, a tool designed to both prototype and evaluate these coupled workflows. In this paper, we...","url_abs":"https://arxiv.org/abs/2509.19150","url_pdf":"https://arxiv.org/pdf/2509.19150v1","authors":"[\"Harikrishna Tummalapalli\",\"Riccardo Balin\",\"Christine M. Simpson\",\"Andrew Park\",\"Aymen Alsaadi\",\"Andrew E. Shao\",\"Wesley Brewer\",\"Shantenu Jha\"]","published":"2025-09-23T15:29:59Z","proceeding":"cs.DC","tasks":"[\"cs.DC\"]","methods":"[]","has_code":false}
