{"ID":2863195,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.24158","arxiv_id":"2509.24158","title":"Blockwise Missingness meets AI: A Tractable Solution for Semiparametric Inference","abstract":"We consider parameter estimation and inference when data feature blockwise, non-monotone missingness. Our approach, rooted in semiparametric theory and inspired by prediction-powered inference, leverages off-the-shelf AI (predictive or generative) models to handle missing completely at random mechanisms, by finding an approximation of the optimal estimating equation through a novel and tractable Restricted Anova hierarchY (RAY) approximation. The resulting Inference for Blockwise Missingness (RAY), or IBM(RAY), estimator incorporates pre-trained AI models and carefully controls asymptotic variance by tuning model-specific hyperparameters. We then extend IBM(RAY) to a general class of estimators. We find the most efficient estimator in this class, which we call IBM(Adaptive), by solving a constrained quadratic programming problem. All IBM estimators are unbiased and, crucially, asymptotically achieve guaranteed efficiency gains over a naive complete-case estimator, regardless of the predictive accuracy of the AI models used. We demonstrate the finite-sample performance and numerical stability of our method through simulation studies and an application to surface protein abundance estimation.","short_abstract":"We consider parameter estimation and inference when data feature blockwise, non-monotone missingness. \nOur approach, rooted in semiparametric theory and inspired by prediction-powered inference, leverages off-the-shelf AI (predictive or generative) models to handle missing completely at random mechanisms, by finding an...","url_abs":"https://arxiv.org/abs/2509.24158","url_pdf":"https://arxiv.org/pdf/2509.24158v1","authors":"[\"Qi Xu\",\"Lorenzo Testa\",\"Jing Lei\",\"Kathryn Roeder\"]","published":"2025-09-29T01:17:28Z","proceeding":"stat.ME","tasks":"[\"stat.ME\",\"stat.ML\"]","methods":"[]","has_code":false}
