{"ID":2838857,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.16707","arxiv_id":"2511.16707","title":"Large language models for automated PRISMA 2020 adherence checking","abstract":"Evaluating adherence to PRISMA 2020 guideline remains a burden in the peer review process. To address the lack of shareable benchmarks, we constructed a copyright-aware benchmark of 108 Creative Commons-licensed systematic reviews and evaluated ten large language models (LLMs) across five input formats. In a development cohort, supplying structured PRISMA 2020 checklists (Markdown, JSON, XML, or plain text) yielded 78.7-79.7% accuracy versus 45.21% for manuscript-only input (p less than 0.0001), with no differences between structured formats (p\u003e0.9). Across models, accuracy ranged from 70.6-82.8% with distinct sensitivity-specificity trade-offs, replicated in an independent validation cohort. We then selected Qwen3-Max (a high-sensitivity open-weight model) and extended evaluation to the full dataset (n=120), achieving 95.1% sensitivity and 49.3% specificity. Structured checklist provision substantially improves LLM-based PRISMA assessment, though human expert verification remains essential before editorial decisions.","short_abstract":"Evaluating adherence to PRISMA 2020 guideline remains a burden in the peer review process. To address the lack of shareable benchmarks, we constructed a copyright-aware benchmark of 108 Creative Commons-licensed systematic reviews and evaluated ten large language models (LLMs) across five input formats. In a developmen...","url_abs":"https://arxiv.org/abs/2511.16707","url_pdf":"https://arxiv.org/pdf/2511.16707v1","authors":"[\"Yuki Kataoka\",\"Ryuhei So\",\"Masahiro Banno\",\"Yasushi Tsujimoto\",\"Tomohiro Takayama\",\"Yosuke Yamagishi\",\"Takahiro Tsuge\",\"Norio Yamamoto\",\"Chiaki Suda\",\"Toshi A. Furukawa\"]","published":"2025-11-20T02:08:13Z","proceeding":"cs.SE","tasks":"[\"cs.SE\",\"cs.AI\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
