{"ID":2823399,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2601.00227","arxiv_id":"2601.00227","title":"FlashInfer-Bench: Building the Virtuous Cycle for AI-driven LLM Systems","abstract":"Recent advances show that large language models (LLMs) can act as autonomous agents capable of generating GPU kernels, but integrating these AI-generated kernels into real-world inference systems remains challenging. FlashInfer-Bench addresses this gap by establishing a standardized, closed-loop framework that connects kernel generation, benchmarking, and deployment. At its core, FlashInfer Trace provides a unified schema describing kernel definitions, workloads, implementations, and evaluations, enabling consistent communication between agents and systems. Built on real serving traces, FlashInfer-Bench includes a curated dataset, a robust correctness- and performance-aware benchmarking framework, a public leaderboard to track LLM agents' GPU programming capabilities, and a dynamic substitution mechanism (apply()) that seamlessly injects the best-performing kernels into production LLM engines such as SGLang and vLLM. Using FlashInfer-Bench, we further evaluate the performance and limitations of LLM agents, compare the trade-offs among different GPU programming languages, and provide insights for future agent design. FlashInfer-Bench thus establishes a practical, reproducible pathway for continuously improving AI-generated kernels and deploying them into large-scale LLM inference.","short_abstract":"Recent advances show that large language models (LLMs) can act as autonomous agents capable of generating GPU kernels, but integrating these AI-generated kernels into real-world inference systems remains challenging. FlashInfer-Bench addresses this gap by establishing a standardized, closed-loop framework that connects...","url_abs":"https://arxiv.org/abs/2601.00227","url_pdf":"https://arxiv.org/pdf/2601.00227v1","authors":"[\"Shanli Xing\",\"Yiyan Zhai\",\"Alexander Jiang\",\"Yixin Dong\",\"Yong Wu\",\"Zihao Ye\",\"Charlie Ruan\",\"Yingyi Huang\",\"Yineng Zhang\",\"Liangsheng Yin\",\"Aksara Bayyapu\",\"Luis Ceze\",\"Tianqi Chen\"]","published":"2026-01-01T06:18:53Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}