{"ID":2887498,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.06526","arxiv_id":"2508.06526","title":"PiKV: KV Cache Management System for Mixture of Experts","abstract":"As large-scale language models continue to scale up in both size and context length, the memory and communication cost of key-value (KV) cache storage has become a major bottleneck in multi-GPU and multi-node inference. While MoE-based architectures sparsify computation across experts, the corresponding KV caches remain dense and globally synchronized, resulting in significant overhead. We introduce \\textbf{PiKV}, a parallel and distributed KV cache serving framework tailored for MoE architectures. PiKV leverages \\textit{expert-sharded KV storage} to partition caches across GPUs, \\textit{PiKV routing} to reduce token-to-KV access, and \\textit{PiKV Scheduling} to adaptively retain query-relevant entries. To further reduce memory usage, PiKV integrates \\textit{PiKV Compression} modules into the caching pipeline for acceleration. PiKV is publicly available as an open-source software library: \\href{https://github.com/NoakLiu/PiKV}{https://github.com/NoakLiu/PiKV}. PiKV is still a living project, aiming to become a comprehensive KV cache management system for MoE architectures.","short_abstract":"As large-scale language models continue to scale up in both size and context length, the memory and communication cost of key-value (KV) cache storage has become a major bottleneck in multi-GPU and multi-node inference. While MoE-based architectures sparsify computation across experts, the corresponding KV caches remai...","url_abs":"https://arxiv.org/abs/2508.06526","url_pdf":"https://arxiv.org/pdf/2508.06526v3","authors":"[\"Dong Liu\",\"Yanxuan Yu\",\"Ben Lengerich\",\"Ying Nian Wu\"]","published":"2025-08-02T03:50:14Z","proceeding":"cs.DC","tasks":"[\"cs.DC\",\"cs.AI\",\"cs.AR\"]","methods":"[\"Mixture of Experts\",\"Language Model\"]","has_code":false,"code_links":[{"ID":611442,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2887498,"paper_url":"https://arxiv.org/abs/2508.06526","paper_title":"PiKV: KV Cache Management System for Mixture of Experts","repo_url":"https://github.com/NoakLiu/PiKV","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}