{"ID":2845507,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.04774","arxiv_id":"2511.04774","title":"SLOFetch: Compressed-Hierarchical Instruction Prefetching for Cloud Microservices","abstract":"Large-scale networked services rely on deep software stacks and microservice orchestration, which increase instruction footprints and create frontend stalls that inflate tail latency and energy. We revisit instruction prefetching for these cloud workloads and present a design that aligns with SLO-driven and self-optimizing systems. Building on the Entangling Instruction Prefetcher (EIP), we introduce a Compressed Entry that captures up to eight destinations around a base using 36 bits by exploiting spatial clustering, and a Hierarchical Metadata Storage scheme that keeps only L1-resident and frequently queried entries on chip while virtualizing bulk metadata into lower levels. We further add a lightweight Online ML Controller that scores prefetch profitability using context features and a bandit-adjusted threshold. On data center applications, our approach preserves EIP-like speedups with smaller on-chip state and improves efficiency for networked services in the ML era.","short_abstract":"Large-scale networked services rely on deep software stacks and microservice orchestration, which increase instruction footprints and create frontend stalls that inflate tail latency and energy. We revisit instruction prefetching for these cloud workloads and present a design that aligns with SLO-driven and self-optim...","url_abs":"https://arxiv.org/abs/2511.04774","url_pdf":"https://arxiv.org/pdf/2511.04774v3","authors":"[\"Zerui Bao\",\"Di Zhu\",\"Liu Jiang\",\"Shiqi Sheng\",\"Ziwei Wang\",\"Haoyun Zhang\"]","published":"2025-11-06T19:48:53Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AR\"]","methods":"[]","has_code":false}