{"ID":2847299,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.00619","arxiv_id":"2511.00619","title":"GDPR-Bench-Android: A Benchmark for Evaluating Automated GDPR Compliance Detection in Android","abstract":"Automating the detection of EU General Data Protection Regulation (GDPR) violations in source code is a critical but underexplored challenge. We introduce \\textbf{GDPR-Bench-Android}, the first comprehensive benchmark for evaluating diverse automated methods for GDPR compliance detection in Android applications. It contains \\textbf{1951} manually annotated violation instances from \\textbf{15} open-source repositories, covering 23 GDPR articles at file-, module-, and line-level granularities. To enable a multi-paradigm evaluation, we contribute \\textbf{Formal-AST}, a novel, source-code-native formal method that serves as a deterministic baseline. We define two tasks: (1) \\emph{multi-granularity violation localization}, evaluated via Accuracy@\\textit{k}; and (2) \\emph{snippet-level multi-label classification}, assessed by macro-F1 and other classification metrics. We benchmark 11 methods, including eight state-of-the-art LLMs, our Formal-AST analyzer, a retrieval-augmented (RAG) method, and an agentic (ReAct) method. Our findings reveal that no single paradigm excels across all tasks. For Task 1, the ReAct agent achieves the highest file-level Accuracy@1 (17.38%), while the Qwen2.5-72B LLM leads at the line level (61.60%), in stark contrast to the Formal-AST method's 1.86%. For the difficult multi-label Task 2, the Claude-Sonnet-4.5 LLM achieves the best Macro-F1 (5.75%), while the RAG method yields the highest Macro-Precision (7.10%). These results highlight the task-dependent strengths of different automated approaches and underscore the value of our benchmark in diagnosing their capabilities. All resources are available at: https://github.com/Haoyi-Zhang/GDPR-Bench-Android.","short_abstract":"Automating the detection of EU General Data Protection Regulation (GDPR) violations in source code is a critical but underexplored challenge. We introduce \\textbf{GDPR-Bench-Android}, the first comprehensive benchmark for evaluating diverse automated methods for GDPR compliance detection in Android applications. It con...","url_abs":"https://arxiv.org/abs/2511.00619","url_pdf":"https://arxiv.org/pdf/2511.00619v1","authors":"[\"Huaijin Ran\",\"Haoyi Zhang\",\"Xunzhu Tang\"]","published":"2025-11-01T16:49:43Z","proceeding":"cs.SE","tasks":"[\"cs.SE\"]","methods":"[\"Large Language Model\"]","has_code":false,"code_links":[{"ID":607511,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2847299,"paper_url":"https://arxiv.org/abs/2511.00619","paper_title":"GDPR-Bench-Android: A Benchmark for Evaluating Automated GDPR Compliance Detection in Android","repo_url":"https://github.com/Haoyi-Zhang/GDPR-Bench-Android","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
