{"ID":2853501,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.17891","arxiv_id":"2510.17891","title":"TritonRL: Training LLMs to Think and Code Triton Without Cheating","abstract":"The rapid evolution of Large Language Models (LLMs) has driven a growing demand for automated, high-performance system kernels to accelerate machine learning workloads. We introduce TritonRL, a domain-specialized 8B-scale LLM for Triton programming, trained via a novel reinforcement learning (RL) framework. While Triton synthesis faces unique challenges, including data scarcity and a high susceptibility to reward hacking, our approach enables robust kernel generation through two primary innovations. First, we implement a multi-layered verification system that provides high-fidelity reward signals, ensuring that generated kernels are both syntactically and functionally valid. Second, we propose Hierarchical Reward Decomposition (HRD), which decouples reinforcement for high-level reasoning and low-level implementation to resolve the credit assignment problem in long-sequence generation. Comprehensive evaluations on KernelBench demonstrate that TritonRL achieves state-of-the-art correctness and runtime speedup, outperforming concurrent Triton-specific models and matching the performance of frontier models with over 100B parameters. Our results highlight the effectiveness of hardware-aware RL paradigms in specialized domain adaptation.","short_abstract":"The rapid evolution of Large Language Models (LLMs) has driven a growing demand for automated, high-performance system kernels to accelerate machine learning workloads. We introduce TritonRL, a domain-specialized 8B-scale LLM for Triton programming, trained via a novel reinforcement learning (RL) framework. While Trito...","url_abs":"https://arxiv.org/abs/2510.17891","url_pdf":"https://arxiv.org/pdf/2510.17891v2","authors":"[\"Jiin Woo\",\"Shaowei Zhu\",\"Allen Nie\",\"Zhen Jia\",\"Yida Wang\",\"Youngsuk Park\"]","published":"2025-10-18T21:36:10Z","proceeding":"cs.SE","tasks":"[\"cs.SE\",\"cs.LG\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\",\"Language Model\"]","has_code":false}