{"ID":2848541,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.25409","arxiv_id":"2510.25409","title":"BhashaBench V1: A Comprehensive Benchmark for the Quadrant of Indic Domains","abstract":"The rapid advancement of large language models (LLMs) has intensified the need for domain- and culture-specific evaluation. Existing benchmarks are largely Anglocentric and domain-agnostic, limiting their applicability to India-centric contexts. To address this gap, we introduce BhashaBench V1, the first domain-specific, multi-task, bilingual benchmark focusing on critical Indic knowledge systems. BhashaBench V1 contains 74,166 meticulously curated question-answer pairs, with 52,494 in English and 21,672 in Hindi, sourced from authentic government and domain-specific exams. It spans four major domains: Agriculture, Legal, Finance, and Ayurveda, comprising 90+ subdomains and covering 500+ topics, enabling fine-grained evaluation. Evaluation of 29+ LLMs reveals significant domain- and language-specific performance gaps, with especially large disparities in low-resource domains. For instance, GPT-4o achieves 76.49% overall accuracy in Legal but only 59.74% in Ayurveda. Models consistently perform better on English content than on Hindi content across all domains. Subdomain-level analysis shows that areas such as Cyber Law and International Finance perform relatively well, while Panchakarma, Seed Science, and Human Rights remain notably weak. BhashaBench V1 provides a comprehensive dataset for evaluating large language models across India's diverse knowledge domains. It enables assessment of models' ability to integrate domain-specific knowledge with bilingual understanding. 
All code, benchmarks, and resources are publicly available to support open research.","short_abstract":"The rapid advancement of large language models (LLMs) has intensified the need for domain- and culture-specific evaluation. Existing benchmarks are largely Anglocentric and domain-agnostic, limiting their applicability to India-centric contexts. To address this gap, we introduce BhashaBench V1, the first domain-specific,...","url_abs":"https://arxiv.org/abs/2510.25409","url_pdf":"https://arxiv.org/pdf/2510.25409v2","authors":"[\"Vijay Devane\",\"Mohd Nauman\",\"Bhargav Patel\",\"Aniket Mahendra Wakchoure\",\"Yogeshkumar Sant\",\"Shyam Pawar\",\"Viraj Thakur\",\"Ananya Godse\",\"Sunil Patra\",\"Neha Maurya\",\"Suraj Racha\",\"Nitish Kamal Singh\",\"Ajay Nagpal\",\"Piyush Sawarkar\",\"Kundeshwar Vijayrao Pundalik\",\"Rohit Saluja\",\"Ganesh Ramakrishnan\"]","published":"2025-10-29T11:27:08Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}