{"ID":2829076,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.13565","arxiv_id":"2512.13565","title":"A Nonparametric Statistics Approach to Feature Selection in Deep Neural Networks with Theoretical Guarantees","abstract":"This paper tackles the problem of feature selection in a highly challenging setting: $\\mathbb{E}(y | \\boldsymbol{x}) = G(\\boldsymbol{x}_{\\mathcal{S}_0})$, where $\\mathcal{S}_0$ is the set of relevant features and $G$ is an unknown, potentially nonlinear function subject to mild smoothness conditions. Our approach begins with feature selection in deep neural networks, then generalizes the results to H{ö}lder smooth functions by exploiting the strong approximation capabilities of neural networks. Unlike conventional optimization-based deep learning methods, we reformulate neural networks as index models and estimate $\\mathcal{S}_0$ using the second-order Stein's formula. This gradient-descent-free strategy guarantees feature selection consistency with a sample size requirement of $n = Ω(p^2)$, where $p$ is the feature dimension. To handle high-dimensional scenarios, we further introduce a screening-and-selection mechanism that achieves nonlinear selection consistency when $n = Ω(s \\log p)$, with $s$ representing the sparsity level. Additionally, we refit a neural network on the selected features for prediction and establish performance guarantees under a relaxed sparsity assumption. Extensive simulations and real-data analyses demonstrate the strong performance of our method even in the presence of complex feature interactions.","short_abstract":"This paper tackles the problem of feature selection in a highly challenging setting: $\\mathbb{E}(y | \\boldsymbol{x}) = G(\\boldsymbol{x}_{\\mathcal{S}_0})$, where $\\mathcal{S}_0$ is the set of relevant features and $G$ is an unknown, potentially nonlinear function subject to mild smoothness conditions. Our approach begin...","url_abs":"https://arxiv.org/abs/2512.13565","url_pdf":"https://arxiv.org/pdf/2512.13565v1","authors":"[\"Junye Du\",\"Zhenghao Li\",\"Zhutong Gu\",\"Long Feng\"]","published":"2025-12-15T17:22:49Z","proceeding":"stat.ML","tasks":"[\"stat.ML\",\"cs.LG\"]","methods":"[]","has_code":false}
