{"ID":2884527,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.05984","arxiv_id":"2508.05984","title":"Parameter-free Optimal Rates for Nonlinear Semi-Norm Contractions with Applications to $Q$-Learning","abstract":"Algorithms for solving \\textit{nonlinear} fixed-point equations -- such as average-reward \\textit{$Q$-learning} and \\textit{TD-learning} -- often involve semi-norm contractions. Achieving parameter-free optimal convergence rates for these methods via Polyak--Ruppert averaging has remained elusive, largely due to the non-monotonicity of such semi-norms. We close this gap by (i.) recasting the averaged error as a linear recursion involving a nonlinear perturbation, and (ii.) taming the nonlinearity by coupling the semi-norm's contraction with the monotonicity of a suitably induced norm. Our main result yields the first parameter-free $\\tilde{O}(1/\\sqrt{t})$ optimal rates for $Q$-learning in both average-reward and exponentially discounted settings, where $t$ denotes the iteration index. The result applies within a broad framework that accommodates synchronous and asynchronous updates, single-agent and distributed deployments, and data streams obtained either from simulators or along Markovian trajectories.","short_abstract":"Algorithms for solving \\textit{nonlinear} fixed-point equations -- such as average-reward \\textit{$Q$-learning} and \\textit{TD-learning} -- often involve semi-norm contractions. Achieving parameter-free optimal convergence rates for these methods via Polyak--Ruppert averaging has remained elusive, largely due to the no...","url_abs":"https://arxiv.org/abs/2508.05984","url_pdf":"https://arxiv.org/pdf/2508.05984v2","authors":"[\"Ankur Naskar\",\"Gugan Thoppe\",\"Vijay Gupta\"]","published":"2025-08-08T03:35:29Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[]","has_code":false}
