{"ID":2838859,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.15977","arxiv_id":"2511.15977","title":"Efficient Chromosome Parallelization for Precision Medicine Genomic Workflows","abstract":"Large-scale genomic workflows used in precision medicine can process datasets spanning tens to hundreds of gigabytes per sample, leading to high memory spikes, intensive disk I/O, and task failures due to out-of-memory errors. Simple static resource allocation methods struggle to handle the variability in per-chromosome RAM demands, resulting in poor resource utilization and long runtimes. In this work, we propose multiple mechanisms for adaptive, RAM-efficient parallelization of chromosome-level bioinformatics workflows. First, we develop a symbolic regression model that estimates per-chromosome memory consumption for a given task and introduces an interpolating bias to conservatively minimize over-allocation. Second, we present a dynamic scheduler that adaptively predicts RAM usage with a polynomial regression model, treating task packing as a Knapsack problem to optimally batch jobs based on predicted memory requirements. Additionally, we present a static scheduler that optimizes chromosome processing order to minimize peak memory while preserving throughput. Our proposed methods, evaluated on simulations and real-world genomic pipelines, provide new mechanisms to reduce memory overruns and balance load across threads. We thereby achieve faster end-to-end execution, showcasing the potential to optimize large-scale genomic workflows.","short_abstract":"Large-scale genomic workflows used in precision medicine can process datasets spanning tens to hundreds of gigabytes per sample, leading to high memory spikes, intensive disk I/O, and task failures due to out-of-memory errors. Simple static resource allocation methods struggle to handle the variability in per-chromosom...","url_abs":"https://arxiv.org/abs/2511.15977","url_pdf":"https://arxiv.org/pdf/2511.15977v1","authors":"[\"Daniel Mas Montserrat\",\"Ray Verma\",\"Míriam Barrabés\",\"Francisco M. de la Vega\",\"Carlos D. Bustamante\",\"Alexander G. Ioannidis\"]","published":"2025-11-20T02:14:56Z","proceeding":"cs.DC","tasks":"[\"cs.DC\",\"cs.AI\",\"cs.LG\",\"cs.PF\",\"q-bio.GN\"]","methods":"[]","has_code":false}
