{"ID":2824417,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.23587","arxiv_id":"2512.23587","title":"Can AI Recognize Its Own Reflection? Self-Detection Performance of LLMs in Computing Education","abstract":"The rapid advancement of Large Language Models (LLMs) presents a significant challenge to academic integrity within computing education. As educators seek reliable detection methods, this paper evaluates the capacity of three prominent LLMs (GPT-4, Claude, and Gemini) to identify AI-generated text in computing-specific contexts. We test their performance under both standard and 'deceptive' prompt conditions, where the models were instructed to evade detection. Our findings reveal a significant instability: while default AI-generated text was easily identified, all models struggled to correctly classify human-written work (with error rates up to 32%). Furthermore, the models were highly susceptible to deceptive prompts, with Gemini's output completely fooling GPT-4. Given that simple prompt alterations significantly degrade detection efficacy, our results demonstrate that these LLMs are currently too unreliable for making high-stakes academic misconduct judgments.","short_abstract":"The rapid advancement of Large Language Models (LLMs) presents a significant challenge to academic integrity within computing education. As educators seek reliable detection methods, this paper evaluates the capacity of three prominent LLMs (GPT-4, Claude, and Gemini) to identify AI-generated text in computing-specific...","url_abs":"https://arxiv.org/abs/2512.23587","url_pdf":"https://arxiv.org/pdf/2512.23587v1","authors":"[\"Christopher Burger\",\"Karmece Talley\",\"Christina Trotter\"]","published":"2025-12-29T16:35:52Z","proceeding":"cs.CY","tasks":"[\"cs.CY\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
