{"ID":2890478,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.19204","arxiv_id":"2507.19204","title":"Should Top-Down Clustering Affect Boundaries in Unsupervised Word Discovery?","abstract":"We investigate the problem of segmenting unlabeled speech into word-like units and clustering these to create a lexicon. Prior work can be categorized into two frameworks. Bottom-up methods first determine boundaries and then cluster the fixed segmented words into a lexicon. In contrast, top-down methods incorporate information from the clustered words to inform boundary selection. However, it is unclear whether top-down information is necessary to improve segmentation. To explore this, we look at two similar approaches that differ in whether top-down clustering informs boundary selection. Our simple bottom-up strategy predicts word boundaries using the dissimilarity between adjacent self-supervised features, then clusters the resulting segments to construct a lexicon. Our top-down system is an updated version of the ES-KMeans dynamic programming method that iteratively uses K-means to update its boundaries. On the five-language ZeroSpeech benchmarks, both approaches achieve comparable state-of-the-art results, with the bottom-up system being nearly five times faster. Through detailed analyses, we show that the top-down influence of ES-KMeans can be beneficial (depending on factors like the candidate boundaries), but in many cases the simple bottom-up method performs just as well. For both methods, we show that the clustering step is a limiting factor. Therefore, we recommend that future work focus on improved clustering techniques and learning more discriminative word-like representations.\nProject code repository: https://github.com/s-malan/prom-seg-clus.","short_abstract":"We investigate the problem of segmenting unlabeled speech into word-like units and clustering these to create a lexicon. Prior work can be categorized into two frameworks. Bottom-up methods first determine boundaries and then cluster the fixed segmented words into a lexicon. In contrast, top-down methods incorporate in...","url_abs":"https://arxiv.org/abs/2507.19204","url_pdf":"https://arxiv.org/pdf/2507.19204v2","authors":"[\"Simon Malan\",\"Benjamin van Niekerk\",\"Herman Kamper\"]","published":"2025-07-25T12:19:16Z","proceeding":"eess.AS","tasks":"[\"eess.AS\",\"cs.CL\",\"cs.SD\"]","methods":"[]","has_code":false,"code_links":[{"ID":611785,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2890478,"paper_url":"https://arxiv.org/abs/2507.19204","paper_title":"Should Top-Down Clustering Affect Boundaries in Unsupervised Word Discovery?","repo_url":"https://github.com/s-malan/prom-seg-clus","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
