{"ID":2831227,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.08659","arxiv_id":"2512.08659","title":"An Agentic AI System for Multi-Framework Communication Coding","abstract":"Clinical communication is central to patient outcomes, yet large-scale human annotation of patient-provider conversation remains labor-intensive, inconsistent, and difficult to scale. Existing approaches based on large language models typically rely on single-task models that lack adaptability, interpretability, and reliability, especially when applied across various communication frameworks and clinical domains. In this study, we developed a Multi-framework Structured Agentic AI system for Clinical Communication (MOSAIC), built on a LangGraph-based architecture that orchestrates four core agents, including a Plan Agent for codebook selection and workflow planning, an Update Agent for maintaining up-to-date retrieval databases, a set of Annotation Agents that applies codebook-guided retrieval-augmented generation (RAG) with dynamic few-shot prompting, and a Verification Agent that provides consistency checks and feedback. To evaluate performance, we compared MOSAIC outputs against gold-standard annotations created by trained human coders. We developed and evaluated MOSAIC using 26 gold standard annotated transcripts for training and 50 transcripts for testing, spanning rheumatology and OB/GYN domains. On the test set, MOSAIC achieved an overall F1 score of 0.928. Performance was highest in the Rheumatology subset (F1 = 0.962) and strongest for Patient Behavior (e.g., patients asking questions, expressing preferences, or showing assertiveness). Ablations revealed that MOSAIC outperforms baseline benchmarking.","short_abstract":"Clinical communication is central to patient outcomes, yet large-scale human annotation of patient-provider conversation remains labor-intensive, inconsistent, and difficult to scale. Existing approaches based on large language models typically rely on single-task models that lack adaptability, interpretability, and re...","url_abs":"https://arxiv.org/abs/2512.08659","url_pdf":"https://arxiv.org/pdf/2512.08659v1","authors":"[\"Bohao Yang\",\"Rui Yang\",\"Joshua M. Biro\",\"Haoyuan Wang\",\"Jessica L. Handley\",\"Brianna Richardson\",\"Sophia Bessias\",\"Nicoleta Economou-Zavlanos\",\"Armando D. Bedoya\",\"Monica Agrawal\",\"Michael M. Zavlanos\",\"Anand Chowdhury\",\"Raj M. Ratwani\",\"Kai Sun\",\"Kathryn I. Pollak\",\"Michael J. Pencina\",\"Chuan Hong\"]","published":"2025-12-09T14:46:16Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.LG\"]","methods":"[\"RAG\",\"Language Model\"]","has_code":false}