{"ID":3053305,"CreatedAt":"2026-06-04T04:41:36.695875263Z","UpdatedAt":"2026-06-06T00:47:47.621308184Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.04320","arxiv_id":"2606.04320","title":"OpenRFM: Dissecting Relational In-Context Learning","abstract":"Relational Foundation Models (RFMs) promise a single pre-trained predictor that, given any relational database, returns predictions in one forward pass via relational in-context learning (ICL). Yet a substantial gap separates open RFMs from their commercial counterparts, and the origin of this gap has not been systematically understood. We dissect a representative framework, the Relational Transformer (RT), from two perspectives. Model side: we show that RT performs relation-level ICL, and a kernel regression view shows it fails when sparse label-cell coverage yields an underdetermined regression. Data side: we ablate RT's pre-training source and find that existing synthetic-only pre-training and in-distribution pre-training drive the same architecture into different regimes, lazy vs. feature-learning. Probing this gap reveals that the missing ingredient is a support-identifiable relational latent in the label-generation process. These two diagnoses translate into (1) a dual-stage ICL architecture that combines the relational backbone with a batch-level ICL layer lifted from a pre-trained tabular foundation model to overcome relation-level label scarcity, and (2) a homophily-aware synthetic plus continual real-data pre-training mixture, augmented with a prototype-based regularization. These choices define OpenRFM, a simple yet effective RFM that improves average task performance by approximately 30% over the RT backbone and surpasses the commercial model KumoRFMv1 on a large set of evaluation tasks.","short_abstract":"Relational Foundation Models (RFMs) promise a single pre-trained predictor that, given any relational database, returns predictions in one forward pass via relational in-context learning (ICL). Yet a substantial gap separates open RFMs from their commercial counterparts, and the origin of this gap has not been systemat...","url_abs":"https://arxiv.org/abs/2606.04320","url_pdf":"https://arxiv.org/pdf/2606.04320v1","authors":"[\"Zhikai Chen\",\"Junyu Yin\",\"Jialiang Gu\",\"Siheng Xiong\",\"Xiaoze Liu\",\"Ruowang Zhang\",\"Keren Zhou\",\"Kai Guo\"]","published":"2026-06-03T00:48:44Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[\"Transformer\"]","has_code":false}
