{"ID":2875639,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.02834","arxiv_id":"2509.02834","title":"Clustering Discourses: Racial Biases in Short Stories about Women Generated by Large Language Models","abstract":"This study investigates how large language models, in particular LLaMA 3.2-3B, construct narratives about Black and white women in short stories generated in Portuguese. From 2100 texts, we applied computational methods to group semantically similar stories, allowing a selection for qualitative analysis. Three main discursive representations emerge: social overcoming, ancestral mythification and subjective self-realization. The analysis uncovers how grammatically coherent, seemingly neutral texts materialize a crystallized, colonially structured framing of the female body, reinforcing historical inequalities. The study proposes an integrated approach that combines machine learning techniques with qualitative, manual discourse analysis.","short_abstract":"This study investigates how large language models, in particular LLaMA 3.2-3B, construct narratives about Black and white women in short stories generated in Portuguese. From 2100 texts, we applied computational methods to group semantically similar stories, allowing a selection for qualitative analysis. Three main dis...","url_abs":"https://arxiv.org/abs/2509.02834","url_pdf":"https://arxiv.org/pdf/2509.02834v1","authors":"[\"Gustavo Bonil\",\"João Gondim\",\"Marina dos Santos\",\"Simone Hashiguti\",\"Helena Maia\",\"Nadia Silva\",\"Helio Pedrini\",\"Sandra Avila\"]","published":"2025-09-02T21:01:02Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Language Model\"]","has_code":false}