{"ID":2852704,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.17405","arxiv_id":"2510.17405","title":"AFRICAPTION: Establishing a New Paradigm for Image Captioning in African Languages","abstract":"Multimodal AI research has overwhelmingly focused on high-resource languages, hindering the democratization of advancements in the field. To address this, we present AfriCaption, a comprehensive framework for multilingual image captioning in 20 African languages. Our contributions are threefold: (i) a curated dataset built on Flickr8k, featuring semantically aligned captions generated via a context-aware selection and translation process; (ii) a dynamic, context-preserving pipeline that ensures ongoing quality through model ensembling and adaptive substitution; and (iii) the AfriCaption model, a 0.5B parameter vision-to-text architecture that integrates SigLIP and NLLB200 for caption generation across under-represented languages. This unified framework ensures ongoing data quality and establishes the first scalable image-captioning resource for under-represented African languages, laying the groundwork for truly inclusive multimodal AI.","short_abstract":"Multimodal AI research has overwhelmingly focused on high-resource languages, hindering the democratization of advancements in the field. To address this, we present AfriCaption, a comprehensive framework for multilingual image captioning in 20 African languages. Our contributions are threefold: (i) a curated datase...","url_abs":"https://arxiv.org/abs/2510.17405","url_pdf":"https://arxiv.org/pdf/2510.17405v1","authors":"[\"Mardiyyah Oduwole\",\"Prince Mireku\",\"Fatimo Adebanjo\",\"Oluwatosin Olajide\",\"Mahi Aminu Aliyu\",\"Jekaterina Novikova\"]","published":"2025-10-20T10:44:44Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[]","has_code":false}