{"ID":2872631,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.08625","arxiv_id":"2509.08625","title":"An upper bound on the silhouette evaluation metric for clustering","abstract":"The silhouette coefficient quantifies, for each observation, the balance between within-cluster cohesion and between-cluster separation, taking values in the range [-1,1]. The average silhouette width (ASW) is a widely used internal measure of clustering quality, with higher values indicating more cohesive and well-separated clusters. However, the dataset-specific maximum of ASW is typically unknown, and the standard upper limit of 1 is rarely attainable. In this work, we derive for each data point a sharp upper bound on its silhouette width and aggregate these to obtain a canonical upper bound for the ASW. This bound, often substantially below 1, enhances the interpretability of empirical ASW values by providing guidance on how close a given clustering result is to the best possible outcome for that dataset. We evaluate the usefulness of the upper bound on a variety of datasets and conclude that it can meaningfully enrich cluster quality evaluation; however, its practical relevance depends on the specific dataset. Finally, we extend the framework to establish an upper bound for the macro-averaged silhouette.","short_abstract":"The silhouette coefficient quantifies, for each observation, the balance between within-cluster cohesion and between-cluster separation, taking values in the range [-1,1]. The average silhouette width (ASW) is a widely used internal measure of clustering quality, with higher values indicating more cohesive and well-sep...","url_abs":"https://arxiv.org/abs/2509.08625","url_pdf":"https://arxiv.org/pdf/2509.08625v5","authors":"[\"Hugo Sträng\",\"Tai Dinh\"]","published":"2025-09-10T14:20:38Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[]","has_code":false}
