{"ID":2860711,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.02625","arxiv_id":"2510.02625","title":"TabImpute: Universal Zero-Shot Imputation for Tabular Data","abstract":"Missing data is a widespread problem in tabular settings. Existing solutions range from simple averaging to complex generative adversarial networks, but due to each method's large variance in performance across real-world domains and time-consuming hyperparameter tuning, no universal imputation method exists. This performance variance is particularly pronounced in small datasets, where the models have the least amount of information. Building on TabPFN, a recent tabular foundation model for supervised learning, we propose TabImpute, a pre-trained transformer that delivers accurate and fast zero-shot imputations, requiring no fitting or hyperparameter tuning at inference time. To train and evaluate TabImpute, we introduce (i) an entry-wise featurization for tabular settings, enabling a 100x speedup over the previous TabPFN imputation method, (ii) a synthetic training data generation pipeline incorporating a diverse set of missingness patterns to enhance accuracy on real-world missing data problems, and (iii) MissBench, a comprehensive benchmark with 42 OpenML tables and 13 new missingness patterns. MissBench spans domains such as medicine, finance, and engineering, showcasing TabImpute's robust performance compared to numerous established imputation methods.","short_abstract":"Missing data is a widespread problem in tabular settings. Existing solutions range from simple averaging to complex generative adversarial networks, but due to each method's large variance in performance across real-world domains and time-consuming hyperparameter tuning, no universal imputation method exists. This perf...","url_abs":"https://arxiv.org/abs/2510.02625","url_pdf":"https://arxiv.org/pdf/2510.02625v4","authors":"[\"Jacob Feitelberg\",\"Dwaipayan Saha\",\"Kyuseong Choi\",\"Zaid Ahmad\",\"Anish Agarwal\",\"Raaz Dwivedi\"]","published":"2025-10-03T00:08:54Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Transformer\"]","has_code":false}
