{"ID":3049933,"CreatedAt":"2026-06-04T02:13:16.786527022Z","UpdatedAt":"2026-06-06T15:44:26.945507316Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.05073","arxiv_id":"2606.05073","title":"Learning What Not to Impute: An Uncertainty-Aware Diffusion Framework for Meaningful Missingness","abstract":"Missing value imputation is a fundamental task in machine learning, with most existing methods assuming that all missing entries correspond to unobserved regular values. In many real-world datasets, however, missingness may arise from two distinct sources: some entries are meaningfully missing (intrinsically absent and semantically valid), while others are missing due to the observation process and should be imputed. We formalize this distinction as a selective imputation problem, where the goal is to jointly infer which missing entries should be preserved and which should be recovered. To address this challenge, we propose Diff-Joint, a diffusion-based framework that jointly models tabular data together with a latent missingness mask. The method alternates between conditional sampling and uncertainty-aware aggregation to iteratively refine both imputed values and missingness labels. Empirical results on synthetic and real-world datasets demonstrate that Diff-Joint effectively identifies meaningfully missing entries while achieving competitive imputation accuracy and improved downstream task performance.","short_abstract":"Missing value imputation is a fundamental task in machine learning, with most existing methods assuming that all missing entries correspond to unobserved regular values. In many real-world datasets, however, missingness may arise from two distinct sources: some entries are meaningfully missing (intrinsically absent and...","url_abs":"https://arxiv.org/abs/2606.05073","url_pdf":"https://arxiv.org/pdf/2606.05073v1","authors":"[\"Lixing Zhang\",\"Yidong Ouyang\",\"Weifu Li\",\"Shixiang Zhu\",\"Guang Cheng\",\"Liyan Xie\"]","published":"2026-06-03T16:31:54Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Diffusion Model\"]","has_code":false}
