{"ID":2854980,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.13158","arxiv_id":"2510.13158","title":"Behavioral Embeddings of Programs: A Quasi-Dynamic Approach for Optimization Prediction","abstract":"Learning effective numerical representations, or embeddings, of programs is a fundamental prerequisite for applying machine learning to automate and enhance compiler optimization. Prevailing paradigms, however, present a dilemma. Static representations, derived from source code or intermediate representation (IR), are efficient and deterministic but offer limited insight into how a program will behave or evolve under complex code transformations. Conversely, dynamic representations, which rely on runtime profiling, provide profound insights into performance bottlenecks but are often impractical for large-scale tasks due to prohibitive overhead and inherent non-determinism. This paper transcends this trade-off by proposing a novel quasi-dynamic framework for program representation. The core insight is to model a program's optimization sensitivity. We introduce the Program Behavior Spectrum, a new representation generated by probing a program's IR with a diverse set of optimization sequences and quantifying the resulting changes in its static features. To effectively encode this high-dimensional, continuous spectrum, we pioneer a compositional learning approach. Product Quantization is employed to discretize the continuous reaction vectors into structured, compositional sub-words. Subsequently, a multi-task Transformer model, termed PQ-BERT, is pre-trained to learn the deep contextual grammar of these behavioral codes. Comprehensive experiments on two representative compiler optimization tasks -- Best Pass Prediction and -Oz Benefit Prediction -- demonstrate that our method outperforms state-of-the-art static baselines. Our code is publicly available at https://github.com/Panhaolin2001/PREP/.","short_abstract":"Learning effective numerical representations, or embeddings, of programs is a fundamental prerequisite for applying machine learning to automate and enhance compiler optimization. Prevailing paradigms, however, present a dilemma. Static representations, derived from source code or intermediate representation (IR), are...","url_abs":"https://arxiv.org/abs/2510.13158","url_pdf":"https://arxiv.org/pdf/2510.13158v1","authors":"[\"Haolin Pan\",\"Jinyuan Dong\",\"Hongbin Zhang\",\"Hongyu Lin\",\"Mingjie Xing\",\"Yanjun Wu\"]","published":"2025-10-15T05:18:41Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[\"Transformer\"]","has_code":false,"code_links":[{"ID":608207,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2854980,"paper_url":"https://arxiv.org/abs/2510.13158","paper_title":"Behavioral Embeddings of Programs: A Quasi-Dynamic Approach for Optimization Prediction","repo_url":"https://github.com/Panhaolin2001/PREP","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
