{"ID":2865732,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.20707","arxiv_id":"2509.20707","title":"An Automated Retrieval-Augmented Generation LLaMA-4 109B-based System for Evaluating Radiotherapy Treatment Plans","abstract":"Purpose: To develop a retrieval-augmented generation (RAG) system powered by LLaMA-4 109B for automated, protocol-aware, and interpretable evaluation of radiotherapy treatment plans. Methods and Materials: We curated a multi-protocol dataset of 614 radiotherapy plans across four disease sites and constructed a knowledge base containing normalized dose metrics and protocol-defined constraints. The RAG system integrates three core modules: a retrieval engine optimized across five SentenceTransformer backbones, a percentile prediction component based on cohort similarity, and a clinical constraint checker. These tools are directed by a large language model (LLM) using a multi-step prompt-driven reasoning pipeline to produce concise, grounded evaluations. Results: Retrieval hyperparameters were optimized using Gaussian Process on a scalarized loss function combining root mean squared error (RMSE), mean absolute error (MAE), and clinically motivated accuracy thresholds. The best configuration, based on all-MiniLM-L6-v2, achieved perfect nearest-neighbor accuracy within a 5-percentile-point margin and a sub-2pt MAE. When tested end-to-end, the RAG system achieved 100% agreement with the computed values by standalone retrieval and constraint-checking modules on both percentile estimates and constraint identification, confirming reliable execution of all retrieval, prediction and checking steps. Conclusion: Our findings highlight the feasibility of combining structured population-based scoring with modular tool-augmented reasoning for transparent, scalable plan evaluation in radiation therapy. 
The system offers traceable outputs, minimizes hallucination, and demonstrates robustness across protocols. Future directions include clinician-led validation and improved domain-adapted retrieval models to enhance real-world integration.","short_abstract":"Purpose: To develop a retrieval-augmented generation (RAG) system powered by LLaMA-4 109B for automated, protocol-aware, and interpretable evaluation of radiotherapy treatment plans. Methods and Materials: We curated a multi-protocol dataset of 614 radiotherapy plans across four disease sites and constructed a knowledg...","url_abs":"https://arxiv.org/abs/2509.20707","url_pdf":"https://arxiv.org/pdf/2509.20707v2","authors":"[\"Junjie Cui\",\"Peilong Wang\",\"Jason Holmes\",\"Leshan Sun\",\"Michael L. Hinni\",\"Barbara A. Pockaj\",\"Sujay A. Vora\",\"Terence T. Sio\",\"William W. Wong\",\"Nathan Y. Yu\",\"Steven E. Schild\",\"Joshua R. Niska\",\"Sameer R. Keole\",\"Jean-Claude M. Rwigema\",\"Samir H. Patel\",\"Lisa A. McGee\",\"Carlos A. Vargas\",\"Wei Liu\"]","published":"2025-09-25T03:18:31Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"RAG\",\"Transformer\",\"Large Language Model\",\"Language Model\"]","has_code":false}