{"ID":2823003,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2601.01042","arxiv_id":"2601.01042","title":"SeRe: A Security-Related Code Review Dataset Aligned with Real-World Review Activities","abstract":"Software security vulnerabilities can lead to severe consequences, making early detection essential. Although code review serves as a critical defense mechanism against security flaws, relevant feedback remains scarce due to limited attention to security issues or a lack of expertise among reviewers. Existing datasets and studies primarily focus on general-purpose code review comments, either lacking security-specific annotations or being too limited in scale to support large-scale research. To bridge this gap, we introduce \\textbf{SeRe}, a \\textbf{security-related code review dataset}, constructed using an active learning-based ensemble classification approach. The proposed approach iteratively refines model predictions through human annotations, achieving high precision while maintaining reasonable recall. Using the fine-tuned ensemble classifier, we extracted 6,732 security-related reviews from 373,824 raw review instances, ensuring representativeness across multiple programming languages. Statistical analysis indicates that SeRe generally \\textbf{aligns with real-world security-related review distribution}. To assess both the utility of SeRe and the effectiveness of existing code review comment generation approaches, we benchmark state-of-the-art approaches on security-related feedback generation. By releasing SeRe along with our benchmark results, we aim to advance research in automated security-focused code review and contribute to the development of more effective secure software engineering practices.","short_abstract":"Software security vulnerabilities can lead to severe consequences, making early detection essential. Although code review serves as a critical defense mechanism against security flaws, relevant feedback remains scarce due to limited attention to security issues or a lack of expertise among reviewers. Existing datasets...","url_abs":"https://arxiv.org/abs/2601.01042","url_pdf":"https://arxiv.org/pdf/2601.01042v1","authors":"[\"Zixiao Zhao\",\"Yanjie Jiang\",\"Hui Liu\",\"Kui Liu\",\"Lu Zhang\"]","published":"2026-01-03T02:39:53Z","proceeding":"cs.SE","tasks":"[\"cs.SE\"]","methods":"[]","has_code":false}
