{"ID":2892726,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.22911","arxiv_id":"2507.22911","title":"ElectriQ: A Benchmark for Assessing the Response Capability of Large Language Models in Power Marketing","abstract":"As power systems decarbonise and digitalise, high penetrations of distributed energy resources and flexible tariffs make electric power marketing (EPM) a key interface between regulation, system operation and sustainable-energy deployment. Many utilities still rely on human agents and rule- or intent-based chatbots with fragmented knowledge bases that struggle with long, cross-scenario dialogues and fall short of requirements for compliant, verifiable and DR-ready interactions. Meanwhile, frontier large language models (LLMs) show strong conversational ability but are evaluated on generic benchmarks that underweight sector-specific terminology, regulatory reasoning and multi-turn process stability. To address this gap, we present ElectriQ, a large-scale benchmark and evaluation framework for LLMs in EPM. ElectriQ contains over 550k dialogues across six service domains and 24 sub-scenarios and defines a unified protocol that combines human ratings, automatic metrics and two compliance stress tests: Statutory Citation Correctness and Long-Dialogue Consistency. Building on ElectriQ, we propose SEEK-RAG, a retrieval-augmented method that injects policy and domain knowledge during finetuning and inference. Experiments on 13 LLMs show that domain-aligned 7B models with SEEK-RAG match or surpass much larger models while reducing computational cost, providing an auditable, regulation-aware basis for deploying LLM-based EPM assistants that support demand-side management, renewable integration and resilient grid operation.","short_abstract":"As power systems decarbonise and digitalise, high penetrations of distributed energy resources and flexible tariffs make electric power marketing (EPM) a key interface between regulation, system operation and sustainable-energy deployment. Many utilities still rely on human agents and rule- or intent-based chatbots wit...","url_abs":"https://arxiv.org/abs/2507.22911","url_pdf":"https://arxiv.org/pdf/2507.22911v2","authors":"[\"Jinzhi Wang\",\"Qingke Peng\",\"Haozhou Li\",\"Zeyuan Zeng\",\"Jiangbo Zhang\",\"Kaixuan Yang\",\"Ningyong Wu\",\"Qinfeng Song\",\"Ruimeng Li\",\"Biyi Zhou\"]","published":"2025-07-19T02:28:51Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
