{"ID":2892203,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.19530","arxiv_id":"2507.19530","title":"When Validation Fails: Cross-Institutional Blood Pressure Prediction and the Limits of Electronic Health Record-Based Models","abstract":"External validation remains rare in healthcare machine learning despite being critical for establishing real-world feasibility. We developed an ensemble framework to predict blood pressure from electronic health records, incorporating rigorous data leakage prevention. Internal validation on the MIMIC-III dataset yielded moderate performance for systolic (R^2 = 0.248, RMSE = 14.84 mmHg) and diastolic (R^2 = 0.297, RMSE = 8.27 mmHg) blood pressure. However, external validation on the eICU dataset revealed substantial generalization challenges. Baseline systolic performance dropped significantly from R^2 = 0.248 to -0.024, with RMSE increasing from 14.84 to 18.69 mmHg. To address potential confounding from feature imputation, we conducted an intersection-only experiment using 16 universally available features; this yielded worse external performance (R^2 = -0.115, RMSE = 17.32 mmHg), proving imputation artifacts were not the primary cause. Attempts at post-hoc correction, including linear and isotonic recalibration (R^2 ranging from -0.170 to 0.024) and domain adaptation via covariate shift reweighting (R^2 = -0.141), showed limited gains. This highlights fundamental cross-institutional barriers. Our root-cause analysis identified three primary obstacles to generalizability: (1) site-specific feature distributions, even among standard physiological variables; (2) underlying patient population differences with unique pathophysiologies; and (3) institutional variations in measurement protocols creating non-transferable learned patterns. These findings demonstrate that strong internal performance cannot guarantee cross-institutional deployment success. Transparent reporting of validation failures is essential for setting realistic expectations for predictive models. Code is available at https://github.com/mdbasit897/ehr-bp-ensemble.","short_abstract":"External validation remains rare in healthcare machine learning despite being critical for establishing real-world feasibility. We developed an ensemble framework to predict blood pressure from electronic health records, incorporating rigorous data leakage prevention. Internal validation on the MIMIC-III dataset yielde...","url_abs":"https://arxiv.org/abs/2507.19530","url_pdf":"https://arxiv.org/pdf/2507.19530v2","authors":"[\"Md Basit Azam\",\"Sarangthem Ibotombi Singh\"]","published":"2025-07-21T11:15:33Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[]","has_code":false,"code_links":[{"ID":611968,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2892203,"paper_url":"https://arxiv.org/abs/2507.19530","paper_title":"When Validation Fails: Cross-Institutional Blood Pressure Prediction and the Limits of Electronic Health Record-Based Models","repo_url":"https://github.com/mdbasit897/ehr-bp-ensemble","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}