{"ID":2885505,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.04740","arxiv_id":"2508.04740","title":"MissMecha: An All-in-One Python Package for Studying Missing Data Mechanisms","abstract":"Incomplete data is a persistent challenge in real-world datasets, often governed by complex and unobservable missing mechanisms. Simulating missingness has become a standard approach for understanding its impact on learning and analysis. However, existing tools are fragmented, mechanism-limited, and typically focus only on numerical variables, overlooking the heterogeneous nature of real-world tabular data. We present MissMecha, an open-source Python toolkit for simulating, visualizing, and evaluating missing data under MCAR, MAR, and MNAR assumptions. MissMecha supports both numerical and categorical features, enabling mechanism-aware studies across mixed-type tabular datasets. It includes visual diagnostics, MCAR testing utilities, and type-aware imputation evaluation metrics. Designed to support data quality research, benchmarking, and education, MissMecha offers a unified platform for researchers and practitioners working with incomplete data.","short_abstract":"Incomplete data is a persistent challenge in real-world datasets, often governed by complex and unobservable missing mechanisms. Simulating missingness has become a standard approach for understanding its impact on learning and analysis. However, existing tools are fragmented, mechanism-limited, and typically focus onl...","url_abs":"https://arxiv.org/abs/2508.04740","url_pdf":"https://arxiv.org/pdf/2508.04740v1","authors":"[\"Youran Zhou\",\"Mohamed Reda Bouadjenek\",\"Sunil Aryal\"]","published":"2025-08-06T02:40:45Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.MS\"]","methods":"[]","has_code":false}
