{"ID":2842977,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.09665","arxiv_id":"2511.09665","title":"Generalization Can Emerge in Tabular Foundation Models From a Single Table","abstract":"Deep tabular modelling increasingly relies on in-context learning where, during inference, a model receives a set of $(x,y)$ pairs as context and predicts labels for new inputs without weight updates. We challenge the prevailing view that broad generalization here requires pre-training on large synthetic corpora (e.g., TabPFN priors) or a large collection of real data (e.g., TabDPT training datasets), discovering that a relatively small amount of data suffices for generalization. We find that simple self-supervised pre-training on just a \\emph{single} real table can produce surprisingly strong transfer across heterogeneous benchmarks. By systematically pre-training and evaluating on many diverse datasets, we analyze what aspects of the data are most important for building a Tabular Foundation Model (TFM) generalizing across domains. We then connect this to the pre-training procedure shared by most TFMs and show that the number and quality of \\emph{tasks} one can construct from a dataset is key to downstream performance.","short_abstract":"Deep tabular modelling increasingly relies on in-context learning where, during inference, a model receives a set of $(x,y)$ pairs as context and predicts labels for new inputs without weight updates. We challenge the prevailing view that broad generalization here requires pre-training on large synthetic corpora (e.g.,...","url_abs":"https://arxiv.org/abs/2511.09665","url_pdf":"https://arxiv.org/pdf/2511.09665v1","authors":"[\"Junwei Ma\",\"Nour Shaheen\",\"Alex Labach\",\"Amine Mhedhbi\",\"Frank Hutter\",\"Anthony L. Caterini\",\"Valentin Thomas\"]","published":"2025-11-12T19:12:40Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[]","has_code":false}
