{"ID":2845931,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.21695","arxiv_id":"2511.21695","title":"EvalCards: A Framework for Standardized Evaluation Reporting","abstract":"Evaluation has long been a central concern in NLP, and transparent reporting practices are more critical than ever in today's landscape of rapidly released open-access models. Drawing on a survey of recent work on evaluation and documentation, we identify three persistent shortcomings in current reporting practices: reproducibility, accessibility, and governance. We argue that existing standardization efforts remain insufficient and introduce Evaluation Disclosure Cards (EvalCards) as a path forward. EvalCards are designed to enhance transparency for both researchers and practitioners while providing a practical foundation to meet emerging governance requirements.","short_abstract":"Evaluation has long been a central concern in NLP, and transparent reporting practices are more critical than ever in today's landscape of rapidly released open-access models. Drawing on a survey of recent work on evaluation and documentation, we identify three persistent shortcomings in current reporting practices: re...","url_abs":"https://arxiv.org/abs/2511.21695","url_pdf":"https://arxiv.org/pdf/2511.21695v1","authors":"[\"Ruchira Dhar\",\"Danae Sanchez Villegas\",\"Antonia Karamolegkou\",\"Alice Schiavone\",\"Yifei Yuan\",\"Xinyi Chen\",\"Jiaang Li\",\"Stella Frank\",\"Laura De Grazia\",\"Monorama Swain\",\"Stephanie Brandl\",\"Daniel Hershcovich\",\"Anders Søgaard\",\"Desmond Elliott\"]","published":"2025-11-05T19:01:48Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\",\"cs.CY\"]","methods":"[]","has_code":false}
