{"ID":2875390,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.02648","arxiv_id":"2509.02648","title":"Optimizing Prognostic Biomarker Discovery in Pancreatic Cancer Through Hybrid Ensemble Feature Selection and Multi-Omics Data","abstract":"Prediction of patient survival using high-dimensional multi-omics data requires systematic feature selection methods that ensure predictive performance, sparsity, and reliability for prognostic biomarker discovery. We developed a hybrid ensemble feature selection (hEFS) approach that combines data subsampling with multiple prognostic models, integrating both embedded and wrapper-based strategies for survival prediction. Omics features are ranked using a voting-theory-inspired aggregation mechanism across models and subsamples, while the optimal number of features is selected via a Pareto front, balancing predictive accuracy and model sparsity without any user-defined thresholds. When applied to multi-omics datasets from three pancreatic cancer cohorts, hEFS identifies significantly fewer and more stable biomarkers compared to the conventional, late-fusion CoxLasso models, while maintaining comparable discrimination performance. Implemented within the open-source mlr3fselect R package, hEFS offers a robust, interpretable, and clinically valuable tool for prognostic modelling and biomarker discovery in high-dimensional survival settings.","short_abstract":"Prediction of patient survival using high-dimensional multi-omics data requires systematic feature selection methods that ensure predictive performance, sparsity, and reliability for prognostic biomarker discovery. We developed a hybrid ensemble feature selection (hEFS) approach that combines data subsampling with mult...","url_abs":"https://arxiv.org/abs/2509.02648","url_pdf":"https://arxiv.org/pdf/2509.02648v1","authors":"[\"John Zobolas\",\"Anne-Marie George\",\"Alberto López\",\"Sebastian Fischer\",\"Marc Becker\",\"Tero Aittokallio\"]","published":"2025-09-02T11:09:24Z","proceeding":"q-bio.GN","tasks":"[\"q-bio.GN\",\"cs.LG\",\"q-bio.QM\",\"stat.AP\"]","methods":"[]","has_code":false}
