{"ID":2896176,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.07829","arxiv_id":"2507.07829","title":"Towards Benchmarking Foundation Models for Tabular Data With Text","abstract":"Foundation models for tabular data are rapidly evolving, with increasing interest in extending them to support additional modalities such as free-text features. However, existing benchmarks for tabular data rarely include textual columns, and identifying real-world tabular datasets with semantically rich text features is non-trivial. We propose a series of simple yet effective ablation-style strategies for incorporating text into conventional tabular pipelines. Moreover, we benchmark how state-of-the-art tabular foundation models can handle textual data by manually curating a collection of real-world tabular datasets with meaningful textual features. Our study is an important step towards improving benchmarking of foundation models for tabular data with text.","short_abstract":"Foundation models for tabular data are rapidly evolving, with increasing interest in extending them to support additional modalities such as free-text features. However, existing benchmarks for tabular data rarely include textual columns, and identifying real-world tabular datasets with semantically rich text features...","url_abs":"https://arxiv.org/abs/2507.07829","url_pdf":"https://arxiv.org/pdf/2507.07829v1","authors":"[\"Martin Mráz\",\"Breenda Das\",\"Anshul Gupta\",\"Lennart Purucker\",\"Frank Hutter\"]","published":"2025-07-10T15:01:31Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[]","has_code":false}
