{"ID":2857878,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.07748","arxiv_id":"2510.07748","title":"Haibu Mathematical-Medical Intelligent Agent:Enhancing Large Language Model Reliability in Medical Tasks via Verifiable Reasoning Chains","abstract":"Large Language Models (LLMs) show promise in medicine but are prone to factual and logical errors, which is unacceptable in this high-stakes field. To address this, we introduce the \"Haibu Mathematical-Medical Intelligent Agent\" (MMIA), an LLM-driven architecture that ensures reliability through a formally verifiable reasoning process. MMIA recursively breaks down complex medical tasks into atomic, evidence-based steps. This entire reasoning chain is then automatically audited for logical coherence and evidence traceability, similar to theorem proving. A key innovation is MMIA's \"bootstrapping\" mode, which stores validated reasoning chains as \"theorems.\" Subsequent tasks can then be efficiently solved using Retrieval-Augmented Generation (RAG), shifting from costly first-principles reasoning to a low-cost verification model. We validated MMIA across four healthcare administration domains, including DRG/DIP audits and medical insurance adjudication, using expert-validated benchmarks. Results showed MMIA achieved an error detection rate exceeding 98% with a false positive rate below 1%, significantly outperforming baseline LLMs. Furthermore, the RAG matching mode is projected to reduce average processing costs by approximately 85% as the knowledge base matures. In conclusion, MMIA's verifiable reasoning framework is a significant step toward creating trustworthy, transparent, and cost-effective AI systems, making LLM technology viable for critical applications in medicine.","short_abstract":"Large Language Models (LLMs) show promise in medicine but are prone to factual and logical errors, which is unacceptable in this high-stakes field. To address this, we introduce the \"Haibu Mathematical-Medical Intelligent Agent\" (MMIA), an LLM-driven architecture that ensures reliability through a formally verifiable r...","url_abs":"https://arxiv.org/abs/2510.07748","url_pdf":"https://arxiv.org/pdf/2510.07748v1","authors":"[\"Yilun Zhang\",\"Dexing Kong\"]","published":"2025-10-09T03:35:37Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"RAG\",\"Large Language Model\",\"Language Model\"]","has_code":false}
