{"ID":2833971,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.02669","arxiv_id":"2512.02669","title":"SAND Challenge: Four Approaches for Dysartria Severity Classification","abstract":"This paper presents a unified study of four distinct modeling approaches for classifying dysarthria severity in the Speech Analysis for Neurodegenerative Diseases (SAND) challenge. All models tackle the same five class classification task using a common dataset of speech recordings. We investigate: (1) a ViT-OF method leveraging a Vision Transformer on spectrogram images, (2) a 1D-CNN approach using eight 1-D CNN's with majority-vote fusion, (3) a BiLSTM-OF approach using nine BiLSTM models with majority vote fusion, and (4) a Hierarchical XGBoost ensemble that combines glottal and formant features through a two stage learning framework. Each method is described, and their performances on a validation set of 53 speakers are compared. Results show that while the feature-engineered XGBoost ensemble achieves the highest macro-F1 (0.86), the deep learning models (ViT, CNN, BiLSTM) attain competitive F1-scores (0.70) and offer complementary insights into the problem.","short_abstract":"This paper presents a unified study of four distinct modeling approaches for classifying dysarthria severity in the Speech Analysis for Neurodegenerative Diseases (SAND) challenge. All models tackle the same five class classification task using a common dataset of speech recordings. We investigate: (1) a ViT-OF method...","url_abs":"https://arxiv.org/abs/2512.02669","url_pdf":"https://arxiv.org/pdf/2512.02669v1","authors":"[\"Gauri Deshpande\",\"Harish Battula\",\"Ashish Panda\",\"Sunil Kumar Kopparapu\"]","published":"2025-12-02T11:51:38Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"cs.AI\",\"cs.LG\"]","methods":"[\"Vision Transformer\",\"Transformer\",\"Convolutional Neural Network\"]","has_code":false}