{"ID":2872461,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.08865","arxiv_id":"2509.08865","title":"TraceRAG: A LLM-Based Framework for Explainable Android Malware Detection and Behavior Analysis","abstract":"Sophisticated evasion tactics in malicious Android applications, combined with their intricate behavioral semantics, enable attackers to conceal malicious logic within legitimate functions, underscoring the critical need for robust and in-depth analysis frameworks. However, traditional analysis techniques often fail to recover deeply hidden behaviors or provide human-readable justifications for their decisions. Inspired by advances in large language models (LLMs), we introduce TraceRAG, a retrieval-augmented generation (RAG) framework that bridges natural language queries and Java code to deliver explainable malware detection and analysis. First, TraceRAG generates summaries of method-level code snippets, which are indexed in a vector database. At query time, behavior-focused questions retrieve the most semantically relevant snippets for deeper inspection. Finally, based on the multi-turn analysis results, TraceRAG produces human-readable reports that present the identified malicious behaviors and their corresponding code implementations. Experimental results demonstrate that our method achieves 96% malware detection accuracy and 83.81% behavior identification accuracy based on updated VirusTotal (VT) scans and manual verification. Furthermore, expert evaluation confirms the practical utility of the reports generated by TraceRAG.","short_abstract":"Sophisticated evasion tactics in malicious Android applications, combined with their intricate behavioral semantics, enable attackers to conceal malicious logic within legitimate functions, underscoring the critical need for robust and in-depth analysis frameworks. 
However, traditional analysis techniques often fail to...","url_abs":"https://arxiv.org/abs/2509.08865","url_pdf":"https://arxiv.org/pdf/2509.08865v1","authors":"[\"Guangyu Zhang\",\"Xixuan Wang\",\"Shiyu Sun\",\"Peiyan Xiao\",\"Kun Sun\",\"Yanhai Xiong\"]","published":"2025-09-10T06:07:12Z","proceeding":"cs.SE","tasks":"[\"cs.SE\"]","methods":"[\"RAG\",\"Large Language Model\",\"Language Model\"]","has_code":false}