{"ID":3084550,"CreatedAt":"2026-06-05T06:46:15.197025399Z","UpdatedAt":"2026-06-06T15:44:26.945507316Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.05265","arxiv_id":"2606.05265","title":"Data-efficient flood depth prediction through domain-aware coreset selection and tabular foundation models","abstract":"Near-real-time flood depth prediction demands surrogate models that are accurate, fast, and transferable across watersheds. Supervised surrogates can match physics-based simulators in accuracy but need millions of training rows per watershed and cannot extrapolate beyond their original mesh. We propose a domain-aware coreset construction pipeline that conditions a tabular foundation model at inference time. The pipeline stratifies storms by return period and most-affected watershed, then samples hexagons with a target-aware spatial selector. With 0.7% of the per-watershed training pool, the model attains a mean $R^2$ of 0.663 across nine Houston-area watersheds, within 98.5% of the supervised reference ($R^2$ = 0.673). It transfers to held-out watersheds without task-specific retraining, staying ahead of a coreset-trained supervised baseline. On real storms it exceeds the supervised reference on a far out-of-distribution case and trails it on a mostly in-distribution one. Domain-aware coreset construction lets tabular foundation models deliver data-efficient, watershed-transferable flood predictions without per-watershed training.","short_abstract":"Near-real-time flood depth prediction demands surrogate models that are accurate, fast, and transferable across watersheds. Supervised surrogates can match physics-based simulators in accuracy but need millions of training rows per watershed and cannot extrapolate beyond their original mesh. We propose a domain-aware c...","url_abs":"https://arxiv.org/abs/2606.05265","url_pdf":"https://arxiv.org/pdf/2606.05265v1","authors":"[\"Lipai Huang\",\"Adithi Srinath\",\"Manas Singh\",\"Junwei Ma\",\"Ali Mostafavi\"]","published":"2026-06-03T16:25:38Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[]","has_code":false}
