{"ID":2846672,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.01512","arxiv_id":"2511.01512","title":"BanglaNirTox: A Large-scale Parallel Corpus for Explainable AI in Bengali Text Detoxification","abstract":"Toxic language in Bengali remains prevalent, especially in online environments, with few effective precautions against it. Although text detoxification has seen progress in high-resource languages, Bengali remains underexplored due to limited resources. In this paper, we propose a novel pipeline for Bengali text detoxification that combines Pareto class-optimized large language models (LLMs) and Chain-of-Thought (CoT) prompting to generate detoxified sentences. To support this effort, we construct BanglaNirTox, an artificially generated parallel corpus of 68,041 toxic Bengali sentences with class-wise toxicity labels, reasonings, and detoxified paraphrases, using Pareto-optimized LLMs evaluated on random samples. The resulting BanglaNirTox dataset is used to fine-tune language models to produce better detoxified versions of Bengali sentences. Our findings show that Pareto-optimized LLMs with CoT prompting significantly enhance the quality and consistency of Bengali text detoxification.","short_abstract":"Toxic language in Bengali remains prevalent, especially in online environments, with few effective precautions against it. Although text detoxification has seen progress in high-resource languages, Bengali remains underexplored due to limited resources. In this paper, we propose a novel pipeline for Bengali text detoxi...","url_abs":"https://arxiv.org/abs/2511.01512","url_pdf":"https://arxiv.org/pdf/2511.01512v1","authors":"[\"Ayesha Afroza Mohsin\",\"Mashrur Ahsan\",\"Nafisa Maliyat\",\"Shanta Maria\",\"Syed Rifat Raiyan\",\"Hasan Mahmud\",\"Md Kamrul Hasan\"]","published":"2025-11-03T12:26:04Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
