{"ID":2828383,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.14142","arxiv_id":"2512.14142","title":"Astraea: A State-Aware Scheduling Engine for LLM-Powered Agents","abstract":"Large Language Models (LLMs) are increasingly being deployed as intelligent agents. Their multi-stage workflows, which alternate between local computation and calls to external network services like Web APIs, introduce a mismatch in their execution pattern and the scheduling granularity of existing inference systems such as vLLM. Existing systems typically focus on per-segment optimization which prevents them from minimizing the end-to-end latency of the complete agentic workflow, i.e., the global Job Completion Time (JCT) over the entire request lifecycle. To address this limitation, we propose Astraea, a service engine designed to shift the optimization from local segments to the global request lifecycle. Astraea employs a state-aware, hierarchical scheduling algorithm that integrates a request's historical state with future predictions. It dynamically classifies requests by their I/O and compute intensive nature and uses an enhanced HRRN policy to balance efficiency and fairness. Astraea also implements an adaptive KV cache manager that intelligently handles the agent state during I/O waits based on the system memory pressure. Extensive experiments show that Astraea reduces average JCT by up to 25.5\\% compared to baseline methods. Moreover, our approach demonstrates strong robustness and stability under high load across various model scales.","short_abstract":"Large Language Models (LLMs) are increasingly being deployed as intelligent agents. Their multi-stage workflows, which alternate between local computation and calls to external network services like Web APIs, introduce a mismatch in their execution pattern and the scheduling granularity of existing inference systems su...","url_abs":"https://arxiv.org/abs/2512.14142","url_pdf":"https://arxiv.org/pdf/2512.14142v1","authors":"[\"Hongqiu Ni\",\"Jiabao Zhang\",\"Guopeng Li\",\"Zilong Wang\",\"Ruiqi Wu\",\"Chi Zhang\",\"Haisheng Tan\"]","published":"2025-12-16T06:55:10Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}