{"ID":2825149,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.21527","arxiv_id":"2512.21527","title":"Generative Actor Critic","abstract":"Conventional Reinforcement Learning (RL) algorithms, typically focused on estimating or maximizing expected returns, face challenges when refining offline pretrained models with online experiences. This paper introduces Generative Actor Critic (GAC), a novel framework that decouples sequential decision-making by reframing \\textit{policy evaluation} as learning a generative model of the joint distribution over trajectories and returns, $p(τ, y)$, and \\textit{policy improvement} as performing versatile inference on this learned model. To operationalize GAC, we introduce a specific instantiation based on a latent variable model that features continuous latent plan vectors. We develop novel inference strategies for both \\textit{exploitation}, by optimizing latent plans to maximize expected returns, and \\textit{exploration}, by sampling latent plans conditioned on dynamically adjusted target returns. Experiments on Gym-MuJoCo and Maze2D benchmarks demonstrate GAC's strong offline performance and significantly enhanced offline-to-online improvement compared to state-of-the-art methods, even in absence of step-wise rewards.","short_abstract":"Conventional Reinforcement Learning (RL) algorithms, typically focused on estimating or maximizing expected returns, face challenges when refining offline pretrained models with online experiences. This paper introduces Generative Actor Critic (GAC), a novel framework that decouples sequential decision-making by refram...","url_abs":"https://arxiv.org/abs/2512.21527","url_pdf":"https://arxiv.org/pdf/2512.21527v1","authors":"[\"Aoyang Qin\",\"Deqian Kong\",\"Wei Wang\",\"Ying Nian Wu\",\"Song-Chun Zhu\",\"Sirui Xie\"]","published":"2025-12-25T06:31:11Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Reinforcement Learning\",\"LoRA\"]","has_code":false}
