{"ID":2836180,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.20975","arxiv_id":"2511.20975","title":"Aragog: Just-in-Time Model Routing for Scalable Serving of Agentic Workflows","abstract":"Agentic workflows have emerged as a powerful paradigm for solving complex, multi-stage tasks, but serving them at scale is computationally expensive given the many LLM inferences that each request must pass through. Configuration selection, or the cost-aware assignment of workflow agents to specific LLMs, can reduce these costs, but existing approaches bind configuration decisions before request execution, making them ill-suited for the heterogeneous and lengthy execution of workflows. Specifically, system loads can fluctuate rapidly and substantially during a request's lifetime, causing fixed configurations to quickly become suboptimal. We present Aragog, a system that progressively adapts a request's configuration throughout its execution to match runtime dynamics. To make this practical despite the massive space of workflow configurations, Aragog decouples the problem into two core elements -- a one-time routing step that identifies all accuracy-preserving configurations, and a cheap per-stage scheduler that selects among them using up-to-date system observations -- and introduces novel strategies to accelerate each. Across diverse workflows and model families, Aragog increases maximum serving throughput by 50.0--217.0\\% and reduces median latency by 32.5--78.9\\% at peak request rates, while maintaining accuracy comparable to the most expensive configurations.","short_abstract":"Agentic workflows have emerged as a powerful paradigm for solving complex, multi-stage tasks, but serving them at scale is computationally expensive given the many LLM inferences that each request must pass through. Configuration selection, or the cost-aware assignment of workflow agents to specific LLMs, can reduce th...","url_abs":"https://arxiv.org/abs/2511.20975","url_pdf":"https://arxiv.org/pdf/2511.20975v2","authors":"[\"Yinwei Dai\",\"Zhuofu Chen\",\"Anand Iyer\",\"Ravi Netravali\"]","published":"2025-11-26T02:05:00Z","proceeding":"cs.DC","tasks":"[\"cs.DC\"]","methods":"[\"Large Language Model\"]","has_code":false}
