{"ID":3130471,"CreatedAt":"2026-06-05T22:19:16.587842751Z","UpdatedAt":"2026-06-06T06:00:53.628101571Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2604.27082","arxiv_id":"2604.27082","title":"When Your LLM Reaches End-of-Life: A Framework for Confident Model Migration in Production Systems","abstract":"We present a framework for migrating production Large Language Model (LLM) based systems when the underlying model reaches end-of-life or requires replacement. The key contribution is a Bayesian statistical approach that calibrates automated evaluation metrics against human judgments, enabling confident model comparison even with limited manual evaluation data. We demonstrate this framework on a commercial question-answering system serving 5.3M monthly interactions across six global regions, evaluating correctness, refusal behavior, and stylistic adherence to successfully identify suitable replacement models. The framework is broadly applicable to any enterprise deploying LLM-based products, providing a principled, reproducible methodology for model migration that balances quality assurance with evaluation efficiency. This is a capability increasingly essential as the LLM ecosystem continues to evolve rapidly and organizations manage portfolios of AI-powered services across multiple models, regions, and use cases.","short_abstract":"We present a framework for migrating production Large Language Model (LLM) based systems when the underlying model reaches end-of-life or requires replacement. The key contribution is a Bayesian statistical approach that calibrates automated evaluation metrics against human judgments, enabling confident model compariso...","url_abs":"https://arxiv.org/abs/2604.27082","url_pdf":"https://arxiv.org/pdf/2604.27082v1","authors":"[\"Emma Casey\",\"David Roberts\",\"David Sim\",\"Ian Beaver\"]","published":"2026-04-29T18:22:50Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.LG\",\"cs.SE\"]","methods":"[\"Large Language Model\",\"Language Model\",\"Generative Adversarial Network\"]","has_code":false}
