{"ID":2872754,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.08933","arxiv_id":"2509.08933","title":"Corruption-Tolerant Asynchronous Q-Learning with Near-Optimal Rates","abstract":"We study the problem of learning the optimal policy in a discounted, infinite-horizon reinforcement learning (RL) setting in the presence of adversarially corrupted rewards. To address this problem, we develop a novel robust variant of the \\(Q\\)-learning algorithm and analyze it under the challenging asynchronous sampling model with time-correlated data. Despite corruption, we prove that the finite-time guarantees of our approach match existing bounds, up to an additive term that scales with the fraction of corrupted samples. We also establish an information-theoretic lower bound, revealing that our guarantees are near-optimal. Notably, our algorithm is agnostic to the underlying reward distribution and provides the first finite-time robustness guarantees for asynchronous \\(Q\\)-learning. A key element of our analysis is a refined Azuma-Hoeffding inequality for almost-martingales, which may have broader applicability in the study of RL algorithms.","short_abstract":"We study the problem of learning the optimal policy in a discounted, infinite-horizon reinforcement learning (RL) setting in the presence of adversarially corrupted rewards. To address this problem, we develop a novel robust variant of the \\(Q\\)-learning algorithm and analyze it under the challenging asynchronous sampl...","url_abs":"https://arxiv.org/abs/2509.08933","url_pdf":"https://arxiv.org/pdf/2509.08933v2","authors":"[\"Sreejeet Maity\",\"Aritra Mitra\"]","published":"2025-09-10T18:56:39Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"eess.SY\",\"math.OC\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
