{"ID":2876123,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.00642","arxiv_id":"2509.00642","title":"HADIS: Hybrid Adaptive Diffusion Model Serving for Efficient Text-to-Image Generation","abstract":"Text-to-image diffusion models have achieved remarkable visual quality but incur high computational costs, making latency-aware, scalable deployment challenging. To address this, we advocate a hybrid architecture that achieves query awareness when serving diffusion models. Unlike existing query-aware serving systems that cascade lightweight and heavyweight models with a fixed configuration, our hybrid architecture first routes each query directly to a suitable model variant, then reroutes it to a cascaded heavyweight model only if necessary. We theoretically analyze conditions for the hybrid architecture to outperform non-hybrid alternatives in latency and response quality. Building on this architecture, we design HADIS, a hybrid serving system for latency-aware diffusion models that jointly optimizes cascade model selection, query routing, and resource allocation. To reduce the complexity of resource management, HADIS uses an offline profiling phase to produce a Pareto-optimal cascade configuration table. At runtime, HADIS selects the best cascade configuration and GPU allocation given latency and workload constraints. Empirical evaluations on real-world traces demonstrate that HADIS improves response quality by up to 35% while reducing latency violation rates by 2.7-45$\\times$ compared to state-of-the-art model serving systems.","short_abstract":"Text-to-image diffusion models have achieved remarkable visual quality but incur high computational costs, making latency-aware, scalable deployment challenging. To address this, we advocate a hybrid architecture that achieves query awareness when serving diffusion models. Unlike existing query-aware serving systems th...","url_abs":"https://arxiv.org/abs/2509.00642","url_pdf":"https://arxiv.org/pdf/2509.00642v2","authors":"[\"Qizheng Yang\",\"Tung-I Chen\",\"Siyu Zhao\",\"Ramesh K. Sitaraman\",\"Hui Guan\"]","published":"2025-08-31T00:26:31Z","proceeding":"cs.DC","tasks":"[\"cs.DC\"]","methods":"[\"Diffusion Model\"]","has_code":false}