{"ID":2866268,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.21679","arxiv_id":"2509.21679","title":"ReviewScore: Misinformed Peer Review Detection with Large Language Models","abstract":"Peer review serves as a backbone of academic research, but in most AI conferences, the review quality is degrading as the number of submissions explodes. To reliably detect low-quality reviews, we define misinformed review points as either \"weaknesses\" in a review that contain incorrect premises, or \"questions\" in a review that can be already answered by the paper. We verify that 15.2% of weaknesses and 26.4% of questions are misinformed and introduce ReviewScore indicating if a review point is misinformed. To evaluate the factuality of each premise of weaknesses, we propose an automated engine that reconstructs every explicit and implicit premise from a weakness. We build a human expert-annotated ReviewScore dataset to check the ability of LLMs to automate ReviewScore evaluation. Then, we measure human-model agreements on ReviewScore using eight current state-of-the-art LLMs. The models show F1 scores of 0.4--0.5 and kappa scores of 0.3--0.4, indicating moderate agreement but also suggesting that fully automating the evaluation remains challenging. A thorough disagreement analysis reveals that most errors are due to models' incorrect reasoning. We also prove that evaluating premise-level factuality shows significantly higher agreements than evaluating weakness-level factuality.","short_abstract":"Peer review serves as a backbone of academic research, but in most AI conferences, the review quality is degrading as the number of submissions explodes. To reliably detect low-quality reviews, we define misinformed review points as either \"weaknesses\" in a review that contain incorrect premises, or \"questions\" in a re...","url_abs":"https://arxiv.org/abs/2509.21679","url_pdf":"https://arxiv.org/pdf/2509.21679v2","authors":"[\"Hyun Ryu\",\"Doohyuk Jang\",\"Hyemin S. Lee\",\"Joonhyun Jeong\",\"Gyeongman Kim\",\"Donghyeon Cho\",\"Gyouk Chu\",\"Minyeong Hwang\",\"Hyeongwon Jang\",\"Changhun Kim\",\"Haechan Kim\",\"Jina Kim\",\"Joowon Kim\",\"Yoonjeon Kim\",\"Kwanhyung Lee\",\"Chanjae Park\",\"Heecheol Yun\",\"Gregor Betz\",\"Eunho Yang\"]","published":"2025-09-25T22:55:05Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
