{"ID":2828638,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.14658","arxiv_id":"2512.14658","title":"gridfm-datakit-v1: A Python Library for Scalable and Realistic Power Flow and Optimal Power Flow Data Generation","abstract":"We introduce gridfm-datakit-v1, a Python library for generating realistic and diverse Power Flow (PF) and Optimal Power Flow (OPF) datasets for training Machine Learning (ML) solvers. Existing datasets and libraries face three main challenges: (1) lack of realistic stochastic load and topology perturbations, limiting scenario diversity; (2) PF datasets are restricted to OPF-feasible points, hindering generalization of ML solvers to cases that violate operating limits (e.g., branch overloads or voltage violations); and (3) OPF datasets use fixed generator cost functions, limiting generalization across varying costs. gridfm-datakit addresses these challenges by: (1) combining global load scaling from real-world profiles with localized noise and supporting arbitrary N-k topology perturbations to create diverse yet realistic datasets; (2) generating PF samples beyond operating limits; and (3) producing OPF data with varying generator costs. It also scales efficiently to large grids (up to 10,000 buses). Comparisons with OPFData, OPF-Learn, PGLearn, and PF$Δ$ are provided. Available on GitHub at https://github.com/gridfm/gridfm-datakit under Apache 2.0 and via `pip install gridfm-datakit`.","short_abstract":"We introduce gridfm-datakit-v1, a Python library for generating realistic and diverse Power Flow (PF) and Optimal Power Flow (OPF) datasets for training Machine Learning (ML) solvers. \nExisting datasets and libraries face three main challenges: (1) lack of realistic stochastic load and topology perturbations, limiting s...","url_abs":"https://arxiv.org/abs/2512.14658","url_pdf":"https://arxiv.org/pdf/2512.14658v1","authors":"[\"Alban Puech\",\"Matteo Mazzonelli\",\"Celia Cintas\",\"Tamara R. Govindasamy\",\"Mangaliso Mngomezulu\",\"Jonas Weiss\",\"Matteo Baù\",\"Anna Varbella\",\"François Mirallès\",\"Kibaek Kim\",\"Le Xie\",\"Hendrik F. Hamann\",\"Etienne Vos\",\"Thomas Brunschwiler\"]","published":"2025-12-16T18:17:50Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"eess.SY\",\"math.OC\"]","methods":"[]","has_code":false,"code_links":[{"ID":605895,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2828638,"paper_url":"https://arxiv.org/abs/2512.14658","paper_title":"gridfm-datakit-v1: A Python Library for Scalable and Realistic Power Flow and Optimal Power Flow Data Generation","repo_url":"https://github.com/gridfm/gridfm-datakit","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
