{"ID":2830977,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.08179","arxiv_id":"2512.08179","title":"Distributional Random Forests for Complex Survey Designs on Reproducing Kernel Hilbert Spaces","abstract":"We study estimation of the conditional law $P(Y|X=x)$ and continuous functionals $Ψ(P(Y|X=x))$ when $Y$ takes values in a locally compact Polish space, $X \\in \\mathbb{R}^p$, and the observations arise from a complex survey design. We propose a survey-calibrated distributional random forest (SDRF) that incorporates complex-design features via a pseudo-population bootstrap, PSU-level honesty, and a Maximum Mean Discrepancy (MMD) split criterion computed from kernel mean embeddings of Hájek-type (design-weighted) node distributions. We provide a framework for analyzing forest-style estimators under survey designs; establish design consistency for the finite-population target and model consistency for the super-population target under explicit conditions on the design, kernel, resampling multipliers, and tree partitions. As far as we are aware, these are the first results on model-free estimation of conditional distributions under survey designs. Simulations under a stratified two-stage cluster design provide finite sample performance and demonstrate the statistical error price of ignoring the survey design. The broad applicability of SDRF is demonstrated using NHANES: We estimate the tolerance regions of the conditional joint distribution of two diabetes biomarkers, illustrating how distributional heterogeneity can support subgroup-specific risk profiling for diabetes mellitus in the U.S. population.","short_abstract":"We study estimation of the conditional law $P(Y|X=x)$ and continuous functionals $Ψ(P(Y|X=x))$ when $Y$ takes values in a locally compact Polish space, $X \\in \\mathbb{R}^p$, and the observations arise from a complex survey design. We propose a survey-calibrated distributional random forest (SDRF) that incorporates comp...","url_abs":"https://arxiv.org/abs/2512.08179","url_pdf":"https://arxiv.org/pdf/2512.08179v2","authors":"[\"Yating Zou\",\"Marcos Matabuena\",\"Michael R. Kosorok\"]","published":"2025-12-09T02:18:19Z","proceeding":"stat.ME","tasks":"[\"stat.ME\",\"stat.ML\"]","methods":"[]","has_code":false}
