{"ID":2877412,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.03373","arxiv_id":"2509.03373","title":"Cluster and then Embed: A Modular Approach for Visualization","abstract":"Dimensionality reduction methods such as t-SNE and UMAP are popular methods for visualizing data with a potential (latent) clustered structure. They are known to group data points at the same time as they embed them, resulting in visualizations with well-separated clusters that preserve local information well. However, t-SNE and UMAP also tend to distort the global geometry of the underlying data. We propose a more transparent, modular approach consisting of first clustering the data, then embedding each cluster, and finally aligning the clusters to obtain a global embedding. We demonstrate this approach on several synthetic and real-world datasets and show that it is competitive with existing methods, while being much more transparent.","short_abstract":"Dimensionality reduction methods such as t-SNE and UMAP are popular methods for visualizing data with a potential (latent) clustered structure. They are known to group data points at the same time as they embed them, resulting in visualizations with well-separated clusters that preserve local information well. However,...","url_abs":"https://arxiv.org/abs/2509.03373","url_pdf":"https://arxiv.org/pdf/2509.03373v1","authors":"[\"Elizabeth Coda\",\"Ery Arias-Castro\",\"Gal Mishne\"]","published":"2025-08-27T00:27:30Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"stat.ME\",\"stat.ML\"]","methods":"[]","has_code":false}
