{"ID":2829381,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.12801","arxiv_id":"2512.12801","title":"Fine-Grained Energy Prediction For Parallellized LLM Inference With PIE-P","abstract":"With the widespread adoption of Large Language Models (LLMs), energy costs of running LLMs is quickly becoming a critical concern. However, precisely measuring the energy consumption of LLMs is often infeasible because hardware-based power monitors are not always accessible and software-based energy measurement tools are not accurate. While various prediction techniques have been developed to estimate LLM energy consumption, these approaches are limited to single-GPU environments and thus are not applicable to modern LLM inference which is typically parallelized across multiple GPUs. In this work, we remedy this gap and introduce PIE-P, a fine-grained energy prediction framework for multi-GPU inference, including tensor, pipeline, and data parallelism. Predicting the energy under parallelized inference is complicated by the non-determinism in inter-GPU communication, additional communication overheads, and difficulties in isolating energy during the communication/synchronization phase. We develop a scalable prediction framework that addresses these issues via precise sampling, fine-grained modeling of inter-GPU communication, and careful accounting of parallelization overhead. Our evaluation results show that PIE-P yields accurate and fine-grained energy predictions across parallelism strategies, significantly outperforming baselines.","short_abstract":"With the widespread adoption of Large Language Models (LLMs), energy costs of running LLMs is quickly becoming a critical concern. \nHowever, precisely measuring the energy consumption of LLMs is often infeasible because hardware-based power monitors are not always accessible and software-based energy measurement tools a...","url_abs":"https://arxiv.org/abs/2512.12801","url_pdf":"https://arxiv.org/pdf/2512.12801v1","authors":"[\"Anurag Dutt\",\"Young Won Choi\",\"Avirup Sil\",\"Anshul Gandhi\",\"Aruna Balasubramanian\",\"Niranjan Balasubramanian\"]","published":"2025-12-14T18:50:51Z","proceeding":"cs.DC","tasks":"[\"cs.DC\",\"cs.PF\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
