{"ID":2823501,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2601.00397","arxiv_id":"2601.00397","title":"Revati: Transparent GPU-Free Time-Warp Emulation for LLM Serving","abstract":"Deploying LLMs efficiently requires testing hundreds of serving configurations, but evaluating each one on a GPU cluster takes hours and costs thousands of dollars. Discrete-event simulators are faster and cheaper, but they require re-implementing the serving system's control logic -- a burden that compounds as frameworks evolve. We present Revati, a time-warp emulator that enables performance modeling by directly executing real serving system code at simulation-like speed. The system intercepts CUDA API calls to virtualize device management, allowing serving frameworks to run without physical GPUs. Instead of executing GPU kernels, it performs time jumps -- fast-forwarding virtual time by predicted kernel durations. We propose a coordination protocol that synchronizes these jumps across distributed processes while preserving causality. On vLLM and SGLang, Revati achieves less than 5% prediction error across multiple models and parallelism configurations, while running 5-17x faster than real GPU execution.","short_abstract":"Deploying LLMs efficiently requires testing hundreds of serving configurations, but evaluating each one on a GPU cluster takes hours and costs thousands of dollars. Discrete-event simulators are faster and cheaper, but they require re-implementing the serving system's control logic -- a burden that compounds as framewo...","url_abs":"https://arxiv.org/abs/2601.00397","url_pdf":"https://arxiv.org/pdf/2601.00397v1","authors":"[\"Amey Agrawal\",\"Mayank Yadav\",\"Sukrit Kumar\",\"Anirudha Agrawal\",\"Garv Ghai\",\"Souradeep Bera\",\"Elton Pinto\",\"Sirish Gambhira\",\"Mohammad Adain\",\"Kasra Sohrab\",\"Chus Antonanzas\",\"Alexey Tumanov\"]","published":"2026-01-01T17:19:58Z","proceeding":"cs.DC","tasks":"[\"cs.DC\",\"cs.LG\"]","methods":"[\"Large Language Model\"]","has_code":false}
