{"ID":2835180,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.00461","arxiv_id":"2512.00461","title":"Whose Personae? Synthetic Persona Experiments in LLM Research and Pathways to Transparency","abstract":"Synthetic personae experiments have become a prominent method in Large Language Model alignment research, yet the representativeness and ecological validity of these personae vary considerably between studies. Through a review of 63 peer-reviewed studies published between 2023 and 2025 in leading NLP and AI venues, we reveal a critical gap: task and population of interest are often underspecified in persona-based experiments, despite personalization being fundamentally dependent on these criteria. Our analysis shows substantial differences in user representation, with most studies focusing on limited sociodemographic attributes and only 35% discussing the representativeness of their LLM personae. Based on our findings, we introduce a persona transparency checklist that emphasizes representative sampling, explicit grounding in empirical data, and enhanced ecological validity. Our work provides both a comprehensive assessment of current practices and practical guidelines to improve the rigor and ecological validity of persona-based evaluations in language model alignment research.","short_abstract":"Synthetic personae experiments have become a prominent method in Large Language Model alignment research, yet the representativeness and ecological validity of these personae vary considerably between studies. \nThrough a review of 63 peer-reviewed studies published between 2023 and 2025 in leading NLP and AI venues, we...","url_abs":"https://arxiv.org/abs/2512.00461","url_pdf":"https://arxiv.org/pdf/2512.00461v1","authors":"[\"Jan Batzner\",\"Volker Stocker\",\"Bingjun Tang\",\"Anusha Natarajan\",\"Qinhao Chen\",\"Stefan Schmid\",\"Gjergji Kasneci\"]","published":"2025-11-29T12:27:34Z","proceeding":"cs.CY","tasks":"[\"cs.CY\",\"cs.CL\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
