{"ID":2825826,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.20328","arxiv_id":"2512.20328","title":"Toward Explaining Large Language Models in Software Engineering Tasks","abstract":"Recent progress in Large Language Models (LLMs) has substantially advanced the automation of software engineering (SE) tasks, enabling complex activities such as code generation and code summarization. However, the black-box nature of LLMs remains a major barrier to their adoption in high-stakes and safety-critical domains, where explainability and transparency are vital for trust, accountability, and effective human supervision. Despite increasing interest in explainable AI for software engineering, existing methods lack domain-specific explanations aligned with how practitioners reason about SE artifacts. To address this gap, we introduce FeatureSHAP, the first fully automated, model-agnostic explainability framework tailored to software engineering tasks. Based on Shapley values, FeatureSHAP attributes model outputs to high-level input features through systematic input perturbation and task-specific similarity comparisons, while remaining compatible with both open-source and proprietary LLMs. We evaluate FeatureSHAP on two bi-modal SE tasks: code generation and code summarization. The results show that FeatureSHAP assigns less importance to irrelevant input features and produces explanations with higher fidelity than baseline methods. A practitioner survey involving 37 participants shows that FeatureSHAP helps practitioners better interpret model outputs and make more informed decisions. Collectively, FeatureSHAP represents a meaningful step toward practical explainable AI in software engineering. FeatureSHAP is available at https://github.com/deviserlab/FeatureSHAP.","short_abstract":"Recent progress in Large Language Models (LLMs) has substantially advanced the automation of software engineering (SE) tasks, enabling complex activities such as code generation and code summarization. However, the black-box nature of LLMs remains a major barrier to their adoption in high-stakes and safety-critical dom...","url_abs":"https://arxiv.org/abs/2512.20328","url_pdf":"https://arxiv.org/pdf/2512.20328v1","authors":"[\"Antonio Vitale\",\"Khai-Nguyen Nguyen\",\"Denys Poshyvanyk\",\"Rocco Oliveto\",\"Simone Scalabrino\",\"Antonio Mastropaolo\"]","published":"2025-12-23T12:56:18Z","proceeding":"cs.SE","tasks":"[\"cs.SE\",\"cs.AI\",\"cs.LG\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":605703,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2825826,"paper_url":"https://arxiv.org/abs/2512.20328","paper_title":"Toward Explaining Large Language Models in Software Engineering Tasks","repo_url":"https://github.com/deviserlab/FeatureSHAP","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
