{"ID":2827943,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.15230","arxiv_id":"2512.15230","title":"ColliderML: The First Release of an OpenDataDetector High-Luminosity Physics Benchmark Dataset","abstract":"We introduce ColliderML - a large, open, experiment-agnostic dataset of fully simulated and digitised proton-proton collisions in High-Luminosity Large Hadron Collider conditions ($\\sqrt{s}=14$ TeV, mean pile-up $μ= 200$). ColliderML provides one million events across ten Standard Model and Beyond Standard Model processes, plus extensive single-particle samples, all produced with modern next-to-leading order matrix element calculation and showering, realistic per-event pile-up overlay, a validated OpenDataDetector geometry, and standard reconstructions. The release fills a major gap for machine learning (ML) research on detector-level data, provided on the ML-friendly Hugging Face platform. We present the physics coverage and the generation, simulation, digitisation and reconstruction pipeline, describe the format and access, and present initial collider physics benchmarks.","short_abstract":"We introduce ColliderML - a large, open, experiment-agnostic dataset of fully simulated and digitised proton-proton collisions in High-Luminosity Large Hadron Collider conditions ($\\sqrt{s}=14$ TeV, mean pile-up $μ= 200$). ColliderML provides one million events across ten Standard Model and Beyond Standard Model proces...","url_abs":"https://arxiv.org/abs/2512.15230","url_pdf":"https://arxiv.org/pdf/2512.15230v1","authors":"[\"Doğa Elitez\",\"Paul Gessinger\",\"Daniel Murnane\",\"Marcus Selchou Raaholt\",\"Andreas Salzburger\",\"Stine Kofoed Skov\",\"Andreas Stefl\",\"Anna Zaborowska\"]","published":"2025-12-17T09:30:44Z","proceeding":"hep-ex","tasks":"[\"hep-ex\",\"cs.LG\",\"physics.data-an\",\"physics.ins-det\"]","methods":"[]","has_code":false}
