{"ID":2871802,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.10248","arxiv_id":"2509.10248","title":"Prompt Injection Attacks on LLM Generated Reviews of Scientific Publications","abstract":"The ongoing intense discussion on rising LLM usage in the scientific peer-review process has recently been mingled by reports of authors using hidden prompt injections to manipulate review scores. Since the existence of such \"attacks\" - although seen by some commentators as \"self-defense\" - would have a great impact on the further debate, this paper investigates the practicability and technical success of the described manipulations. Our systematic evaluation uses 1k reviews of 2024 ICLR papers generated by a wide range of LLMs shows two distinct results: I) very simple prompt injections are indeed highly effective, reaching up to 100% acceptance scores. II) LLM reviews are generally biased toward acceptance (\u003e95% in many models). Both results have great impact on the ongoing discussions on LLM usage in peer-review.","short_abstract":"The ongoing intense discussion on rising LLM usage in the scientific peer-review process has recently been mingled by reports of authors using hidden prompt injections to manipulate review scores. Since the existence of such \"attacks\" - although seen by some commentators as \"self-defense\" - would have a great impact on...","url_abs":"https://arxiv.org/abs/2509.10248","url_pdf":"https://arxiv.org/pdf/2509.10248v3","authors":"[\"Janis Keuper\"]","published":"2025-09-12T13:45:24Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Large Language Model\"]","has_code":false}
