{"ID":2883945,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.08438","arxiv_id":"2508.08438","title":"Selective KV-Cache Sharing to Mitigate Timing Side-Channels in LLM Inference","abstract":"Global KV-cache sharing is an effective optimization for accelerating large language model (LLM) inference, yet it introduces an API-visible timing side channel that lets adversaries infer sensitive user inputs from shared entries, leading to cross-tenant privacy risks. To address this problem, we introduce SafeKV (Secure and Flexible KV-cache Sharing), a system-level co-design of privacy enforcement and KV-cache management. SafeKV integrates lightweight detection and isolation directly into the serving runtime to eliminate cross-tenant reuse of sensitive KV-cache blocks under our threat model, while recovering most of the performance benefits of global sharing. Our key contributions are: (1) a three-tier asynchronous detection pipeline that decouples privacy classification from inference and supports streaming workloads, (2) a unified radix-tree-based memory manager with path compression and sensitivity-aware eviction for scalable selective isolation, and (3) an RDR-guided (Reuse Diversity Ratio) runtime safeguard that detects and bounds residual leakage. On large LLM backends, SafeKV reduces the time-to-first-token (TTFT) overhead compared to full isolation by up to 40.58% and raises throughput by up to 2.66x. Overall, SafeKV restores the efficiency of KV reuse while enforcing strong, practical privacy for multi-tenant LLM inference.","short_abstract":"Global KV-cache sharing is an effective optimization for accelerating large language model (LLM) inference, yet it introduces an API-visible timing side channel that lets adversaries infer sensitive user inputs from shared entries, leading to cross-tenant privacy risks. \nTo address this problem, we introduce SafeKV (Sec...","url_abs":"https://arxiv.org/abs/2508.08438","url_pdf":"https://arxiv.org/pdf/2508.08438v2","authors":"[\"Kexin Chu\",\"Zecheng Lin\",\"Dawei Xiang\",\"Zixu Shen\",\"Jianchang Su\",\"Cheng Chu\",\"Yiwei Yang\",\"Wenhui Zhang\",\"Wenfei Wu\",\"Wei Zhang\"]","published":"2025-08-11T19:55:44Z","proceeding":"cs.CR","tasks":"[\"cs.CR\",\"cs.LG\",\"cs.OS\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
