{"ID":2921750,"CreatedAt":"2026-06-02T02:42:49.606572591Z","UpdatedAt":"2026-06-03T05:56:00.181519634Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.01260","arxiv_id":"2606.01260","title":"IndoBias: A Dual Track Culturally Grounded Benchmark for LLMs Bias Evaluation in Indonesian Languages","abstract":"Despite being home to more than 1300 ethnic groups and 700 indigenous languages, bias in Large Language Models has not been fully studied in Indonesia, thus leaving a critical gap in evaluating representational fairness and localized stereotypes within its uniquely vast, multilingual, and diverse sociocultural landscape. To address this, we introduce IndoBias as a culturally-grounded bias benchmark to assess LLMs bias in Indonesian and three local languages: Javanese, Sundanese, and Makasar. IndoBias features dual perspective evaluation tracks: depth-oriented (with contrastive-pairs) and breadth-oriented (with generation-based), where the latter is grounded in social science frameworks (SPI, O*NET, and WGI). Our results show that existing LLMs -- particularly decoder models -- exhibit strong bias towards prototypical sentences in Indonesian, while local languages suffer higher bias under Ideology and Religion category. We also find that LLMs responses exhibit a non-uniform Stereotype Polarity when prompted with various local entities. Finally, we discover that, in Indonesian, Common Crawl texts introduce more bias during pretraining, compared to human-reviewed article texts (e.g., Wikipedia, News), whereas introducing local languages to pretraining generally increases bias. This work highlights the importance of studying bias in culture-specific context. Warning: This paper contains example data that may be offensive, harmful, or biased.","short_abstract":"Despite being home to more than 1300 ethnic groups and 700 indigenous languages, bias in Large Language Models has not been fully studied in Indonesia, thus leaving a critical gap in evaluating representational fairness and localized stereotypes within its uniquely vast, multilingual, and diverse sociocultural landscap...","url_abs":"https://arxiv.org/abs/2606.01260","url_pdf":"https://arxiv.org/pdf/2606.01260v1","authors":"[\"Ikhlasul Akmal Hanif\",\"Muhammad Falensi Azmi\",\"Filbert Aurelian Tjiaranata\",\"Eryawan Presma Yulianrifat\",\"Fajri Koto\"]","published":"2026-05-31T14:27:31Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
