{"ID":2888360,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.23501","arxiv_id":"2507.23501","title":"Adaptive Ensemble Aggregation for Actor-Critics","abstract":"Ensembles are ubiquitous in off-policy actor-critic learning, yet their efficacy depends critically on how they are aggregated. Current methods typically rely on static rules or task-specific hyperparameters to balance overestimation bias and variance, leaving the challenge of a truly adaptive approach open. We introduce Adaptive Ensemble Aggregation (AEA), an algorithm that dynamically constructs ensemble-based targets for both critic and actor updates directly from training dynamics. We prove that AEA converges to a unique equilibrium where the aggregation parameter minimizes value estimation error within a defined stability region. Theoretically, we establish that AEA achieves a shrinkage property where the estimation bias vanishes as the total ensemble size grows. Unlike subset-based methods like REDQ, which hit an information bottleneck determined by a fixed variance floor regardless of the ensemble size, AEA exploits the full ensemble to achieve optimal variance reduction-scaling inversely with the total number of models-and maximal Fisher information. Furthermore, we provide a formal guarantee for monotonic policy improvement under this adaptive regime. Extensive evaluations on various continuous control tasks demonstrate that AEA outperforms, on the majority of tasks, state-of-the-art baselines, providing a robust and self-calibrating framework for ensemble-based reinforcement learning.","short_abstract":"Ensembles are ubiquitous in off-policy actor-critic learning, yet their efficacy depends critically on how they are aggregated. Current methods typically rely on static rules or task-specific hyperparameters to balance overestimation bias and variance, leaving the challenge of a truly adaptive approach open. We introdu...","url_abs":"https://arxiv.org/abs/2507.23501","url_pdf":"https://arxiv.org/pdf/2507.23501v2","authors":"[\"Nicklas Werge\",\"Yi-Shan Wu\",\"Manuel Haussmann\",\"Bahareh Tasdighi\",\"Melih Kandemir\"]","published":"2025-07-31T12:40:50Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"stat.ML\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}