{"ID":2912143,"CreatedAt":"2026-06-01T17:08:20.426514845Z","UpdatedAt":"2026-06-01T20:22:36.403506024Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2605.29357","arxiv_id":"2605.29357","title":"PassNet: Scaling Large Language Models for Graph Compiler Pass Generation","abstract":"Modern tensor compilers such as TorchInductor deliver substantial speedups on mainstream models, yet face a systematic performance ceiling on long-tail workloads -- our profiling shows that 43% of real-world subgraphs experience end-to-end slowdowns under default compilation. While LLMs offer a path toward automated optimization, existing efforts focus on standalone kernel generation. We argue that pass generation -- where LLMs author structured graph transformations that integrate directly into compiler pipelines -- is the more appropriate abstraction. We propose PassNet, the first large-scale ecosystem for LLM-based compiler pass generation, comprising: (1) PassNet-Dataset, over 18K unique computational graphs from 100K real-world models; and (2) PassBench, 200 curated long-tail fusible tasks (comprising 2,060 subgraphs in total) evaluated under the Error-aware Speedup Score (ES_t) -- a metric unifying correctness, stability, and performance -- with layered integrity defenses against systematic LLM exploitation. Experiments reveal that PassBench is both highly discriminative and genuinely unsaturated: the best frontier model trails TorchInductor by 37% in aggregate, yet on individual subgraphs LLMs achieve up to 3x speedup over the same compiler -- indicating that the bottleneck is consistency, not capability. Fine-tuning a small model on merely ~4K PassNet trajectories yields a 2.67x improvement approaching frontier-model performance, demonstrating substantial headroom and validating PassNet as live training infrastructure for advancing LLM-driven compiler optimization. All data, benchmarks, and tooling are publicly available.","short_abstract":"Modern tensor compilers such as TorchInductor deliver substantial speedups on mainstream models, yet face a systematic performance ceiling on long-tail workloads -- our profiling shows that 43% of real-world subgraphs experience end-to-end slowdowns under default compilation. While LLMs offer a path toward automated op...","url_abs":"https://arxiv.org/abs/2605.29357","url_pdf":"https://arxiv.org/pdf/2605.29357v1","authors":"[\"Yiqun Liu\",\"Yingsheng Wu\",\"Ruqi Yang\",\"Enrong Zheng\",\"Honglei Qiu\",\"Sijun He\",\"Tai Liang\",\"Jingjing Wu\",\"Yuhan Zhou\",\"Yiwei Zhang\",\"Dongyan Chen\",\"Weihan Yi\",\"Xinqi Li\",\"Siqi Bao\"]","published":"2026-05-28T04:55:14Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.LG\",\"cs.PL\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
