{"ID":3004865,"CreatedAt":"2026-06-03T03:09:48.883664427Z","UpdatedAt":"2026-06-05T11:43:53.432517148Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.03549","arxiv_id":"2606.03549","title":"How Many Trees in a Random Forest? A Revisited Approach with Plateau Search and Optuna Integration","abstract":"Hyperparameter optimization (HPO) for Random Forest faces a specific difficulty in tuning the number of trees: the predictive score typically improves monotonically with ensemble size, so standard methods such as Tree-structured Parzen Estimator (TPE) and Hyperband require a predefined search range and often drive the estimate toward its right boundary. Early-stopping strategies avoid fixing such a range, but can be sensitive to score noise and prone to premature stopping. To address this, we propose an integrated triplet-based plateau-search algorithm that removes the number of trees from the direct TPE search space and still exploits information accumulated across HPO trials. The method adaptively tracks a near-minimal sufficient ensemble size by monitoring relative changes in the out-of-bag (OOB) score across a triplet of forest sizes and shifting this triplet accordingly. This yields an automated and user-interpretable procedure based on a tolerance parameter. We also provide a theoretical analysis: we relate the proposed relative OOB-score criterion to the gap between the current and limiting scores, and derive an asymptotic variance estimate for the corresponding OOB-based absolute relative difference. Experiments show that the selected number of trees can differ substantially from the common heuristic: for most classical benchmark datasets it is smaller, whereas for some high-dimensional bioinformatics datasets, such as Arcene and Dorothea, it is larger.\nThe source code and reproducible experiments are available at https://github.com/lange-am/rf_plateau_hpo.","short_abstract":"Hyperparameter optimization (HPO) for Random Forest faces a specific difficulty in tuning the number of trees: the predictive score typically improves monotonically with ensemble size, so standard methods such as Tree-structured Parzen Estimator (TPE) and Hyperband require a predefined search range and often drive the...","url_abs":"https://arxiv.org/abs/2606.03549","url_pdf":"https://arxiv.org/pdf/2606.03549v1","authors":"[\"Vadim Porvatov\",\"Andrey Dukhovny\",\"Andrey Lange\"]","published":"2026-06-02T12:10:43Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"math.PR\"]","methods":"[]","has_code":false,"code_links":[{"ID":612713,"CreatedAt":"2026-06-03T03:09:48.883664427Z","UpdatedAt":"2026-06-03T03:09:48.883664427Z","DeletedAt":null,"paper_id":3004865,"paper_url":"https://arxiv.org/abs/2606.03549","paper_title":"How Many Trees in a Random Forest? A Revisited Approach with Plateau Search and Optuna Integration","repo_url":"https://github.com/lange-am/rf_plateau_hpo","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
