{"ID":2870079,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.16256","arxiv_id":"2509.16256","title":"HausaMovieReview: A Benchmark Dataset for Sentiment Analysis in Low-Resource African Language","abstract":"The development of Natural Language Processing (NLP) tools for low-resource languages is critically hindered by the scarcity of annotated datasets. This paper addresses this fundamental challenge by introducing HausaMovieReview, a novel benchmark dataset comprising 5,000 YouTube comments in Hausa and code-switched English. The dataset was meticulously annotated by three independent annotators, demonstrating a robust agreement with a Fleiss' Kappa score of 0.85 between annotators. We used this dataset to conduct a comparative analysis of classical models (Logistic Regression, Decision Tree, K-Nearest Neighbors) and fine-tuned transformer models (BERT and RoBERTa). Our results reveal a key finding: the Decision Tree classifier, with an accuracy and F1-score of 89.72% and 89.60%, respectively, significantly outperformed the deep learning models. Our findings also provide a robust baseline, demonstrating that effective feature engineering can enable classical models to achieve state-of-the-art performance in low-resource contexts, thereby laying a solid foundation for future research. Keywords: Hausa, Kannywood, Low-Resource Languages, NLP, Sentiment Analysis","short_abstract":"The development of Natural Language Processing (NLP) tools for low-resource languages is critically hindered by the scarcity of annotated datasets. This paper addresses this fundamental challenge by introducing HausaMovieReview, a novel benchmark dataset comprising 5,000 YouTube comments in Hausa and code-switched Engl...","url_abs":"https://arxiv.org/abs/2509.16256","url_pdf":"https://arxiv.org/pdf/2509.16256v1","authors":"[\"Asiya Ibrahim Zanga\",\"Salisu Mamman Abdulrahman\",\"Abubakar Ado\",\"Abdulkadir Abubakar Bichi\",\"Lukman Aliyu Jibril\",\"Abdulmajid Babangida Umar\",\"Alhassan Adamu\",\"Shamsuddeen Hassan Muhammad\",\"Bashir Salisu Abubakar\"]","published":"2025-09-17T22:57:21Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Transformer\"]","has_code":false}
