{"ID":2888981,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.21397","arxiv_id":"2507.21397","title":"Enabling Pareto-Stationarity Exploration in Multi-Objective Reinforcement Learning: A Multi-Objective Weighted-Chebyshev Actor-Critic Approach","abstract":"In many multi-objective reinforcement learning (MORL) applications, being able to systematically explore the Pareto-stationary solutions under multiple non-convex reward objectives with theoretical finite-time sample complexity guarantee is an important and yet under-explored problem. This motivates us to take the first step and fill the important gap in MORL. Specifically, in this paper, we propose a \\uline{M}ulti-\\uline{O}bjective weighted-\\uline{CH}ebyshev \\uline{A}ctor-critic (MOCHA) algorithm for MORL, which judiciously integrates the weighted-Chebyshev (WC) and actor-critic framework to enable Pareto-stationarity exploration systematically with finite-time sample complexity guarantee. Sample complexity result of MOCHA algorithm reveals an interesting dependency on $p_{\\min}$ in finding an $ε$-Pareto-stationary solution, where $p_{\\min}$ denotes the minimum entry of a given weight vector $\\mathbf{p}$ in WC-scalarization. By carefully choosing learning rates, the sample complexity for each exploration can be $\\tilde{\\mathcal{O}}(ε^{-2})$. Furthermore, simulation studies on a large KuaiRand offline dataset show that the performance of MOCHA algorithm significantly outperforms other baseline MORL approaches.","short_abstract":"In many multi-objective reinforcement learning (MORL) applications, being able to systematically explore the Pareto-stationary solutions under multiple non-convex reward objectives with theoretical finite-time sample complexity guarantee is an important and yet under-explored problem. 
This motivates us to take the firs...","url_abs":"https://arxiv.org/abs/2507.21397","url_pdf":"https://arxiv.org/pdf/2507.21397v1","authors":"[\"Fnu Hairi\",\"Jiao Yang\",\"Tianchen Zhou\",\"Haibo Yang\",\"Chaosheng Dong\",\"Fan Yang\",\"Michinari Momma\",\"Yan Gao\",\"Jia Liu\"]","published":"2025-07-29T00:11:59Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Reinforcement Learning\",\"LoRA\"]","has_code":false}
