{"ID":2896668,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.08867","arxiv_id":"2507.08867","title":"Mind the Gap: Navigating Inference with Optimal Transport Maps","abstract":"Machine learning (ML) techniques have recently enabled enormous gains in sensitivity to new phenomena across the sciences. In particle physics, much of this progress has relied on excellent simulations of a wide range of physical processes. However, due to the sophistication of modern machine learning algorithms and their reliance on high-quality training samples, discrepancies between simulation and experimental data can significantly limit their effectiveness. In this work, we present a solution to this ``misspecification'' problem: a model calibration approach based on optimal transport, which we apply to high-dimensional simulations for the first time. We demonstrate the performance of our approach through jet tagging, using a dataset inspired by the CMS experiment at the Large Hadron Collider. A 128-dimensional internal jet representation from a powerful general-purpose classifier is studied; after calibrating this internal ``latent'' representation, we find that a wide variety of quantities derived from it for downstream tasks are also properly calibrated: using this calibrated high-dimensional representation, powerful new applications of jet flavor information can be utilized in LHC analyses. This is a key step toward allowing the unbiased use of ``foundation models'' in particle physics. More broadly, this calibration framework has broad applications for correcting high-dimensional simulations across the sciences.","short_abstract":"Machine learning (ML) techniques have recently enabled enormous gains in sensitivity to new phenomena across the sciences. In particle physics, much of this progress has relied on excellent simulations of a wide range of physical processes. However, due to the sophistication of modern machine learning algorithms and th...","url_abs":"https://arxiv.org/abs/2507.08867","url_pdf":"https://arxiv.org/pdf/2507.08867v2","authors":"[\"Malte Algren\",\"Tobias Golling\",\"Francesco Armando Di Bello\",\"Christopher Pollard\"]","published":"2025-07-09T16:28:21Z","proceeding":"physics.data-an","tasks":"[\"physics.data-an\",\"cs.LG\",\"hep-ex\",\"stat.ML\"]","methods":"[]","has_code":false}
