{"ID":2891597,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.16186","arxiv_id":"2507.16186","title":"EBaReT: Expert-guided Bag Reward Transformer for Auto Bidding","abstract":"Reinforcement learning has been widely applied in automated bidding. Traditional approaches model bidding as a Markov Decision Process (MDP). Recently, some studies have explored using generative reinforcement learning methods to address long-term dependency issues in bidding environments. Although effective, these methods typically rely on supervised learning approaches, which are vulnerable to low data quality due to the amount of sub-optimal bids and low probability rewards resulting from the low click and conversion rates. Unfortunately, few studies have addressed these challenges. In this paper, we formalize the automated bidding as a sequence decision-making problem and propose a novel Expert-guided Bag Reward Transformer (EBaReT) to address concerns related to data quality and uncertainty rewards. Specifically, to tackle data quality issues, we generate a set of expert trajectories to serve as supplementary data in the training process and employ a Positive-Unlabeled (PU) learning-based discriminator to identify expert transitions. To ensure the decision also meets the expert level, we further design a novel expert-guided inference strategy. Moreover, to mitigate the uncertainty of rewards, we consider the transitions within a certain period as a \"bag\" and carefully design a reward function that leads to a smoother acquisition of rewards. Extensive experiments demonstrate that our model achieves superior performance compared to state-of-the-art bidding methods.","short_abstract":"Reinforcement learning has been widely applied in automated bidding. Traditional approaches model bidding as a Markov Decision Process (MDP). Recently, some studies have explored using generative reinforcement learning methods to address long-term dependency issues in bidding environments. Although effective, these met...","url_abs":"https://arxiv.org/abs/2507.16186","url_pdf":"https://arxiv.org/pdf/2507.16186v1","authors":"[\"Kaiyuan Li\",\"Pengyu Wang\",\"Yunshan Peng\",\"Pengjia Yuan\",\"Yanxiang Zeng\",\"Rui Xiang\",\"Yanhua Cheng\",\"Xialong Liu\",\"Peng Jiang\"]","published":"2025-07-22T02:56:36Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.IR\"]","methods":"[\"Reinforcement Learning\",\"Transformer\"]","has_code":false}
