{"ID":2826283,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.22219","arxiv_id":"2512.22219","title":"Mirage Persistent Kernel: A Compiler and Runtime for Mega-Kernelizing Tensor Programs","abstract":"We introduce Mirage Persistent Kernel (MPK), the first compiler and runtime system that automatically transforms multi-GPU model inference into a single high-performance megakernel. MPK introduces an SM-level graph representation that captures data dependencies at the granularity of individual streaming multiprocessors (SMs), enabling cross-operator software pipelining, fine-grained kernel overlap, and other previously infeasible GPU optimizations. The MPK compiler lowers tensor programs into highly optimized SM-level task graphs and generates optimized CUDA implementations for all tasks, while the MPK in-kernel parallel runtime executes these tasks within a single mega-kernel using decentralized scheduling across SMs. Together, these components provide end-to-end kernel fusion with minimal developer effort, while preserving the flexibility of existing programming models. Our evaluation shows that MPK significantly outperforms existing kernel-per-operator LLM serving systems by reducing end-to-end inference latency by up to 1.7x, pushing LLM inference performance close to hardware limits. MPK is publicly available at https://github.com/mirage-project/mirage.","short_abstract":"We introduce Mirage Persistent Kernel (MPK), the first compiler and runtime system that automatically transforms multi-GPU model inference into a single high-performance megakernel. MPK introduces an SM-level graph representation that captures data dependencies at the granularity of individual streaming multiprocessors...","url_abs":"https://arxiv.org/abs/2512.22219","url_pdf":"https://arxiv.org/pdf/2512.22219v1","authors":"[\"Xinhao Cheng\",\"Zhihao Zhang\",\"Yu Zhou\",\"Jianan Ji\",\"Jinchen Jiang\",\"Zepeng Zhao\",\"Ziruo Xiao\",\"Zihao Ye\",\"Yingyi Huang\",\"Ruihang Lai\",\"Hongyi Jin\",\"Bohan Hou\",\"Mengdi Wu\",\"Yixin Dong\",\"Anthony Yip\",\"Zihao Ye\",\"Songting Wang\",\"Wenqin Yang\",\"Xupeng Miao\",\"Tianqi Chen\",\"Zhihao Jia\"]","published":"2025-12-22T14:18:20Z","proceeding":"cs.DC","tasks":"[\"cs.DC\",\"cs.LG\",\"cs.PL\"]","methods":"[\"Large Language Model\"]","has_code":false,"code_links":[{"ID":605729,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2826283,"paper_url":"https://arxiv.org/abs/2512.22219","paper_title":"Mirage Persistent Kernel: A Compiler and Runtime for Mega-Kernelizing Tensor Programs","repo_url":"https://github.com/mirage-project/mirage","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}