{"ID":2875010,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.03263","arxiv_id":"2509.03263","title":"Estudio de la eficiencia en la escalabilidad de GPUs para el entrenamiento de Inteligencia Artificial","abstract":"Training large-scale deep learning models has become a key challenge for the scientific community and industry. While the massive use of GPUs can significantly speed up training times, this approach has a negative impact on efficiency. In this article, we present a detailed analysis of the times reported by MLPerf Training v4.1 on four workloads: BERT, Llama2 LoRA, RetinaNet, and Stable Diffusion, showing that there are configurations that optimise the relationship between performance, GPU usage, and efficiency. The results point to a break-even point that allows training times to be reduced while maximising efficiency.","short_abstract":"Training large-scale deep learning models has become a key challenge for the scientific community and industry. While the massive use of GPUs can significantly speed up training times, this approach has a negative impact on efficiency. In this article, we present a detailed analysis of the times reported by MLPerf Trai...","url_abs":"https://arxiv.org/abs/2509.03263","url_pdf":"https://arxiv.org/pdf/2509.03263v1","authors":"[\"David Cortes\",\"Carlos Juiz\",\"Belen Bermejo\"]","published":"2025-09-03T12:24:42Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"cs.PF\"]","methods":"[\"Diffusion Model\",\"LoRA\"]","has_code":false}