{"ID":2826177,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.19772","arxiv_id":"2512.19772","title":"A K-Means, Ward and DBSCAN repeatability study","abstract":"Reproducibility is essential in machine learning because it ensures that a model or experiment yields the same scientific conclusion. For specific algorithms repeatability with bitwise identical results is also a key for scientific integrity because it allows debugging. We decomposed several very popular clustering algorithms: K-Means, DBSCAN and Ward into their fundamental steps, and we identify the conditions required to achieve repeatability at each stage. We use an implementation example with the Python library scikit-learn to examine the repeatable aspects of each method. Our results reveal inconsistent results with K-Means when the number of OpenMP threads exceeds two. This work aims to raise awareness of this issue among both users and developers, encouraging further investigation and potential fixes.","short_abstract":"Reproducibility is essential in machine learning because it ensures that a model or experiment yields the same scientific conclusion. For specific algorithms repeatability with bitwise identical results is also a key for scientific integrity because it allows debugging. We decomposed several very popular clustering alg...","url_abs":"https://arxiv.org/abs/2512.19772","url_pdf":"https://arxiv.org/pdf/2512.19772v1","authors":"[\"Anthony Bertrand\",\"Engelbert Mephu Nguifo\",\"Violaine Antoine\",\"David Hill\"]","published":"2025-12-22T09:30:33Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[]","has_code":false}
