Early-Stage Prediction of Review Effort in AI-Generated Pull Requests

cs.SE arXiv:2601.00753

Abstract

As AI coding agents evolve from autocomplete tools to autonomous "AI workforce" teammates, they introduce a critical new bottleneck: human maintainers must now manage complex interaction loops rather than just reviewing code. Analyzing 33,707 agent-authored PRs, we uncover a stark two-regime reality: agents excel at narrow automation (28.3% of PRs merge instantly), but frequently fail at iterative refinement, leading to "ghosting" (abandonment) when faced with subjective feedback. This creates a hidden "attention tax" on maintainers. We introduce a creation-time Circuit Breaker model to predict high-maintenance PRs before human review begins. By leveraging simple static complexity cues (e.g., file types, patch size), our model identifies the "expensive tail" of contributions with AUC 0.96, enabling a gated triage process. At a 20% review budget, this approach captures 69% of the high-effort PRs, effectively allowing maintainers to fast-fail costly, low-quality agent contributions while fast-tracking simple fixes.
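The budgeted-triage figure above (share of high-effort PRs captured when only the top-scored 20% are reviewed) can be read as recall at a fixed review budget. The following is a minimal sketch of that computation, not the authors' implementation: the function name `recall_at_budget`, the toy scores, and the toy labels are all hypothetical and are not taken from the paper's data.

```python
from typing import Sequence


def recall_at_budget(
    scores: Sequence[float],
    is_high_effort: Sequence[bool],
    budget: float = 0.20,
) -> float:
    """Share of high-effort PRs found among the top `budget` fraction of PRs,
    ranked by predicted score (highest first).

    Hypothetical helper for illustration; the paper does not specify this API.
    """
    if len(scores) != len(is_high_effort):
        raise ValueError("scores and labels must have the same length")
    if not 0.0 < budget <= 1.0:
        raise ValueError("budget must be in (0, 1]")

    total_high = sum(1 for flag in is_high_effort if flag)
    if total_high == 0:
        return 0.0  # nothing to capture

    # Number of PRs the maintainers can afford to inspect closely.
    n_review = int(len(scores) * budget)

    # Rank PR indices by predicted score, highest risk first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = ranked[:n_review]

    captured = sum(1 for i in flagged if is_high_effort[i])
    return captured / total_high


# Toy example (made-up values): 10 PRs, 3 of which are truly high-effort.
toy_scores = [0.90, 0.80, 0.10, 0.20, 0.30, 0.05, 0.40, 0.15, 0.70, 0.25]
toy_labels = [True, True, False, False, False, False, False, False, True, False]

toy_recall = recall_at_budget(toy_scores, toy_labels, budget=0.20)
print(f"recall at 20% budget: {toy_recall:.2f}")
```

With a 20% budget on ten PRs, only the two highest-scored PRs are flagged, so the result is the fraction of the three high-effort PRs that land in those two slots. The same function applied to real model scores and labels would yield the kind of budget-versus-capture figure the abstract reports.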
