{"ID":2871757,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.12252","arxiv_id":"2509.12252","title":"SynergAI: Edge-to-Cloud Synergy for Architecture-Driven High-Performance Orchestration for AI Inference","abstract":"The rapid evolution of Artificial Intelligence (AI) and Machine Learning (ML) has significantly heightened computational demands, particularly for inference-serving workloads. While traditional cloud-based deployments offer scalability, they face challenges such as network congestion, high energy consumption, and privacy concerns. In contrast, edge computing provides low-latency and sustainable alternatives but is constrained by limited computational resources. In this work, we introduce SynergAI, a novel framework designed for performance- and architecture-aware inference serving across heterogeneous edge-to-cloud infrastructures. Built upon a comprehensive performance characterization of modern inference engines, SynergAI integrates a combination of offline and online decision-making policies to deliver intelligent, lightweight, and architecture-aware scheduling. By dynamically allocating workloads across diverse hardware architectures, it effectively minimizes Quality of Service (QoS) violations. We implement SynergAI within a Kubernetes-based ecosystem and evaluate its efficiency. Our results demonstrate that architecture-driven inference serving enables optimized and architecture-aware deployments on emerging hardware platforms, achieving an average reduction of 2.4x in QoS violations compared to a State-of-the-Art (SotA) solution.","short_abstract":"The rapid evolution of Artificial Intelligence (AI) and Machine Learning (ML) has significantly heightened computational demands, particularly for inference-serving workloads. While traditional cloud-based deployments offer scalability, they face challenges such as network congestion, high energy consumption, and priva...","url_abs":"https://arxiv.org/abs/2509.12252","url_pdf":"https://arxiv.org/pdf/2509.12252v1","authors":"[\"Foteini Stathopoulou\",\"Aggelos Ferikoglou\",\"Manolis Katsaragakis\",\"Dimosthenis Masouros\",\"Sotirios Xydis\",\"Dimitrios Soudris\"]","published":"2025-09-12T10:58:19Z","proceeding":"cs.DC","tasks":"[\"cs.DC\"]","methods":"[]","has_code":false}
