{"ID":2894673,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.09942","arxiv_id":"2507.09942","title":"Green-LLM: Optimal Workload Allocation for Environmentally-Aware Distributed Inference","abstract":"This paper investigates the optimal allocation of large language model (LLM) inference workloads across heterogeneous edge data centers over time. Each data center features on-site renewable generation and faces dynamic electricity prices and spatiotemporal variability in renewable availability. We propose Green-LLM, a lexicographic multi-objective optimization framework that addresses this challenge without requiring manual weight tuning. The proposed model incorporates real-world constraints, including token-dependent processing delay and energy consumption, heterogeneous hardware capabilities, dynamic renewable generation, and spatiotemporal variations in electricity prices and carbon intensity. Unlike existing approaches that optimize individual environmental metrics in isolation, Green-LLM jointly minimizes operational cost, carbon emissions, and delay penalty while enforcing water consumption constraints to ensure both sustainability and quality-of-service requirements. Numerical results demonstrate that Green-LLM achieves significant reductions in carbon emissions and water consumption while maintaining operational costs within 3% of the minimum and ensuring sub-2-second response latency. These findings show that sustainable LLM inference can be achieved without sacrificing service quality or economic efficiency.","short_abstract":"This paper investigates the optimal allocation of large language model (LLM) inference workloads across heterogeneous edge data centers over time. Each data center features on-site renewable generation and faces dynamic electricity prices and spatiotemporal variability in renewable availability. We propose Green-LLM, a...","url_abs":"https://arxiv.org/abs/2507.09942","url_pdf":"https://arxiv.org/pdf/2507.09942v3","authors":"[\"Jiaming Cheng\",\"Duong Tung Nguyen\"]","published":"2025-07-14T05:32:32Z","proceeding":"cs.NI","tasks":"[\"cs.NI\",\"cs.DC\",\"eess.SY\",\"math.OC\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}