{"ID":2842716,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.16680","arxiv_id":"2511.16680","title":"Shona spaCy: A Morphological Analyzer for an Under-Resourced Bantu Language","abstract":"Despite rapid advances in multilingual natural language processing (NLP), the Bantu language Shona remains under-served in terms of morphological analysis and language-aware tools. This paper presents Shona spaCy, an open-source, rule-based morphological pipeline for Shona built on the spaCy framework. The system combines a curated JSON lexicon with linguistically grounded rules to model noun-class prefixes (Mupanda 1-18), verbal subject concords, tense-aspect markers, ideophones, and clitics, integrating these into token-level annotations for lemma, part-of-speech, and morphological features. The toolkit is available via pip install shona-spacy, with source code at https://github.com/HappymoreMasoka/shona-spacy and a PyPI release at https://pypi.org/project/shona-spacy/0.1.4/. Evaluation on formal and informal Shona corpora yields 90% POS-tagging accuracy and 88% morphological-feature accuracy, while maintaining transparency in its linguistic decisions. By bridging descriptive grammar and computational implementation, Shona spaCy advances NLP accessibility and digital inclusion for Shona speakers and provides a template for morphological analysis tools for other under-resourced Bantu languages.","short_abstract":"Despite rapid advances in multilingual natural language processing (NLP), the Bantu language Shona remains under-served in terms of morphological analysis and language-aware tools. This paper presents Shona spaCy, an open-source, rule-based morphological pipeline for Shona built on the spaCy framework. The system combi...","url_abs":"https://arxiv.org/abs/2511.16680","url_pdf":"https://arxiv.org/pdf/2511.16680v1","authors":"[\"Happymore Masoka\"]","published":"2025-11-12T09:19:49Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[]","project_urls":"[\"https://pypi.org/project/shona-spacy/0.1.4/\"]","has_code":false,"code_links":[{"ID":607151,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2842716,"paper_url":"https://arxiv.org/abs/2511.16680","paper_title":"Shona spaCy: A Morphological Analyzer for an Under-Resourced Bantu Language","repo_url":"https://github.com/HappymoreMasoka/shona-spacy","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
