{"ID":2849079,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.24367","arxiv_id":"2510.24367","title":"LLM-as-a-Judge for Software Engineering: Literature Review, Vision, and the Road Ahead","abstract":"The rapid integration of Large Language Models (LLMs) into software engineering (SE) has revolutionized tasks like code generation, producing a massive volume of software artifacts. This surge has exposed a critical bottleneck: the lack of scalable, reliable methods to evaluate these outputs. Human evaluation is costly and time-consuming, while traditional automated metrics like BLEU fail to capture nuanced quality aspects. In response, the LLM-as-a-Judge paradigm - using LLMs for automated evaluation - has emerged. This approach leverages the advanced reasoning of LLMs, offering a path toward human-like nuance at automated scale. However, LLM-as-a-Judge research in SE is still in its early stages. This forward-looking SE 2030 paper aims to steer the community toward advancing LLM-as-a-Judge for evaluating LLM-generated software artifacts. We provide a literature review of existing SE studies, analyze their limitations, identify key research gaps, and outline a detailed roadmap. We envision these frameworks as reliable, robust, and scalable human surrogates capable of consistent, multi-faceted artifact evaluation by 2030. Our work aims to foster research and adoption of LLM-as-a-Judge frameworks, ultimately improving the scalability of software artifact evaluation.","short_abstract":"The rapid integration of Large Language Models (LLMs) into software engineering (SE) has revolutionized tasks like code generation, producing a massive volume of software artifacts. This surge has exposed a critical bottleneck: the lack of scalable, reliable methods to evaluate these outputs. Human evaluation is costly...","url_abs":"https://arxiv.org/abs/2510.24367","url_pdf":"https://arxiv.org/pdf/2510.24367v1","authors":"[\"Junda He\",\"Jieke Shi\",\"Terry Yue Zhuo\",\"Christoph Treude\",\"Jiamou Sun\",\"Zhenchang Xing\",\"Xiaoning Du\",\"David Lo\"]","published":"2025-10-28T12:44:54Z","proceeding":"cs.SE","tasks":"[\"cs.SE\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
