{"ID":2868807,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.15927","arxiv_id":"2509.15927","title":"Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search","abstract":"Auto-bidding is a critical tool for advertisers to improve advertising performance. Recent progress has demonstrated that AI-Generated Bidding (AIGB), which learns a conditional generative planner from offline data, achieves superior performance compared to typical offline reinforcement learning (RL)-based auto-bidding methods. However, existing AIGB methods still face a performance bottleneck due to their inherent inability to explore beyond the static dataset with feedback. To address this, we propose \\textbf{AIGB-Pearl} (\\emph{\\textbf{P}lanning with \\textbf{E}valu\\textbf{A}tor via \\textbf{RL}}), a novel method that integrates generative planning and policy optimization. The core of AIGB-Pearl lies in constructing a trajectory evaluator to assess the quality of generated scores and designing a provably sound KL-Lipschitz-constrained score-maximization scheme to ensure safe and efficient exploration beyond the offline dataset. A practical algorithm that incorporates the synchronous coupling technique is further developed to ensure the model regularity required by the proposed scheme. Extensive experiments on both simulated and real-world advertising systems demonstrate the state-of-the-art performance of our approach.","short_abstract":"Auto-bidding is a critical tool for advertisers to improve advertising performance. Recent progress has demonstrated that AI-Generated Bidding (AIGB), which learns a conditional generative planner from offline data, achieves superior performance compared to typical offline reinforcement learning (RL)-based auto-bidding...","url_abs":"https://arxiv.org/abs/2509.15927","url_pdf":"https://arxiv.org/pdf/2509.15927v4","authors":"[\"Zhiyu Mou\",\"Yiqin Lv\",\"Miao Xu\",\"Qi Wang\",\"Yixiu Mao\",\"Jinghao Chen\",\"Qichen Ye\",\"Chao Li\",\"Rongquan Bai\",\"Chuan Yu\",\"Jian Xu\",\"Bo Zheng\"]","published":"2025-09-19T12:30:26Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[\"Reinforcement Learning\",\"LoRA\"]","has_code":false}
