{"ID":2852980,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.18075","arxiv_id":"2510.18075","title":"Batch Distillation Data for Developing Machine Learning Anomaly Detection Methods","abstract":"Machine learning (ML) holds great potential to advance anomaly detection (AD) in chemical processes. However, the development of ML-based methods is hindered by the lack of openly available experimental data. To address this gap, we have set up a laboratory-scale batch distillation plant and operated it to generate an extensive experimental database, covering fault-free experiments and experiments in which anomalies were intentionally induced, for training advanced ML-based AD methods. In total, 119 experiments were conducted across a wide range of operating conditions and mixtures. Most experiments containing anomalies were paired with a corresponding fault-free one. The database that we provide here includes time-series data from numerous sensors and actuators, along with estimates of measurement uncertainty. In addition, unconventional data sources -- such as concentration profiles obtained via online benchtop NMR spectroscopy and video and audio recordings -- are provided. Extensive metadata and expert annotations of all experiments are included. The anomaly annotations are based on an ontology developed in this work. The data are organized in a structured database and made freely available via doi.org/10.5281/zenodo.17395543. This new database paves the way for the development of advanced ML-based AD methods. As it includes information on the causes of anomalies, it further enables the development of interpretable and explainable ML approaches, as well as methods for anomaly mitigation.","short_abstract":"Machine learning (ML) holds great potential to advance anomaly detection (AD) in chemical processes. However, the development of ML-based methods is hindered by the lack of openly available experimental data. To address this gap, we have set up a laboratory-scale batch distillation plant and operated it to generate an...","url_abs":"https://arxiv.org/abs/2510.18075","url_pdf":"https://arxiv.org/pdf/2510.18075v2","authors":"[\"Justus Arweiler\",\"Indra Jungjohann\",\"Aparna Muraleedharan\",\"Heike Leitte\",\"Jakob Burger\",\"Kerstin Münnemann\",\"Fabian Jirasek\",\"Hans Hasse\"]","published":"2025-10-20T20:13:31Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Generative Adversarial Network\"]","has_code":false}
