ClustRecNet: A Novel End-to-End Deep Learning Framework for Clustering Algorithm Recommendation
Abstract
In unsupervised learning, identifying an effective clustering algorithm for a given tabular dataset remains a fundamental challenge. We introduce ClustRecNet, a novel end-to-end deep learning framework that recommends a suitable clustering algorithm by directly learning high-order representations of raw tabular data. To facilitate robust meta-learning, we construct a comprehensive repository of 34,000 synthetic datasets with diverse structures, run 10 prominent clustering algorithms on each, and use the Adjusted Rand Index (ARI) to establish ground-truth labels. ClustRecNet integrates convolutional, residual, and attention mechanisms to capture both local and global structural patterns, thereby bypassing the knowledge bottleneck associated with manual feature engineering. Extensive evaluations on synthetic and real-world benchmarks show that ClustRecNet consistently outperforms state-of-the-art Automated Machine Learning (AutoML) approaches, including ML2DAC and AutoML4Clust. Our framework achieves an average ARI gain of 0.497 over the well-known Calinski-Harabasz cluster validity index on synthetic data and an average ARI improvement of 15.3% over the leading AutoML approach (ML2DAC) on real-world benchmarks. To the best of our knowledge, this is the first work to successfully apply deep learning to automatically recommend suitable clustering algorithms for a given tabular dataset.
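The ground-truth labeling step above can be illustrated with a minimal sketch. It assumes that each synthetic dataset comes with its generating cluster labels, that every candidate algorithm's output partition is already available, and that the label is simply the algorithm with the highest ARI (ties broken by listing order). The algorithm names and partitions below are hypothetical placeholders, not the 10 algorithms used in the paper. ARI is computed with the standard pair-counting formula over the contingency table.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index from the contingency table (pair-counting form)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    # Number of same-cluster pairs inside each contingency cell and each marginal.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    total_pairs = comb(n, 2)
    expected = sum_rows * sum_cols / total_pairs if total_pairs else 0.0
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case (both partitions all-singletons or both one cluster).
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def best_algorithm(labels_true, candidate_partitions):
    """Hypothetical labeling rule: pick the algorithm whose partition maximizes ARI."""
    scores = {name: adjusted_rand_index(labels_true, pred)
              for name, pred in candidate_partitions.items()}
    # max() returns the first maximal key, so ties go to the earlier entry.
    return max(scores, key=scores.get), scores


# Toy usage with hypothetical partitions produced by three placeholder algorithms.
truth = [0, 0, 0, 1, 1, 1]
candidates = {
    "algorithm_A": [0, 0, 0, 1, 1, 1],  # recovers the true structure
    "algorithm_B": [0, 0, 1, 1, 2, 2],  # over-segments
    "algorithm_C": [0, 0, 0, 0, 0, 0],  # merges everything into one cluster
}
best, scores = best_algorithm(truth, candidates)
print(best, scores)
```

Here `algorithm_A` would become the dataset's label. A cluster validity index such as Calinski-Harabasz, by contrast, selects a partition without access to the true labels, which is why it serves as an unsupervised baseline.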