{"ID":2838989,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.16177","arxiv_id":"2511.16177","title":"Mitigating Shared Storage Congestion Using Control Theory","abstract":"Efficient data access in High-Performance Computing (HPC) systems is essential to the performance of intensive computing tasks. Traditional optimizations of the I/O stack aim to improve peak performance but are often workload specific and require deep expertise, making them difficult to generalize or re-use. In shared HPC environments, resource congestion can lead to unpredictable performance, causing slowdowns and timeouts. To address these challenges, we propose a self-adaptive approach based on Control Theory to dynamically regulate client-side I/O rates. Our approach leverages a small set of runtime system load metrics to reduce congestion and enhance performance stability. We implement a controller in a multi-node cluster and evaluate it on a real testbed under a representative workload. Experimental results demonstrate that our method effectively mitigates I/O congestion, reducing total runtime by up to 20% and lowering tail latency, while maintaining stable performance.","short_abstract":"Efficient data access in High-Performance Computing (HPC) systems is essential to the performance of intensive computing tasks. Traditional optimizations of the I/O stack aim to improve peak performance but are often workload specific and require deep expertise, making them difficult to generalize or re-use. In shared...","url_abs":"https://arxiv.org/abs/2511.16177","url_pdf":"https://arxiv.org/pdf/2511.16177v1","authors":"[\"Thomas Collignon\",\"Kouds Halitim\",\"Raphaël Bleuse\",\"Sophie Cerf\",\"Bogdan Robu\",\"Éric Rutten\",\"Lionel Seinturier\",\"Alexandre van Kempen\"]","published":"2025-11-20T09:31:26Z","proceeding":"cs.DC","tasks":"[\"cs.DC\",\"cs.AR\"]","methods":"[]","has_code":false}
