{"ID":2824781,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.22496","arxiv_id":"2512.22496","title":"Hierarchical Pedagogical Oversight: A Multi-Agent Adversarial Framework for Reliable AI Tutoring","abstract":"Large Language Models (LLMs) are increasingly deployed as automated tutors to address educator shortages; however, they often fail at pedagogical reasoning, frequently validating incorrect student solutions (sycophancy) or providing overly direct answers that hinder learning. We introduce Hierarchical Pedagogical Oversight (HPO), a framework that adapts structured adversarial synthesis to educational assessment. Unlike cooperative multi-agent systems that often drift toward superficial consensus, HPO enforces a dialectical separation of concerns: specialist agents first distill dialogue context, which then grounds a moderated, five-act debate between opposing pedagogical critics. We evaluate this framework on the MRBench dataset of 1,214 middle-school mathematics dialogues. Our 8B-parameter model achieves a Macro F1 of 0.845, outperforming GPT-4o (0.812) by 3.3% while using 20 times fewer parameters. These results establish adversarial reasoning as a critical mechanism for deploying reliable, low-compute pedagogical oversight in resource-constrained environments.","short_abstract":"Large Language Models (LLMs) are increasingly deployed as automated tutors to address educator shortages; however, they often fail at pedagogical reasoning, frequently validating incorrect student solutions (sycophancy) or providing overly direct answers that hinder learning. We introduce Hierarchical Pedagogical Overs...","url_abs":"https://arxiv.org/abs/2512.22496","url_pdf":"https://arxiv.org/pdf/2512.22496v1","authors":"[\"Saisab Sadhu\",\"Ashim Dhor\"]","published":"2025-12-27T06:42:07Z","proceeding":"cs.MA","tasks":"[\"cs.MA\",\"cs.AI\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}