{"ID":2870245,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.12764","arxiv_id":"2509.12764","title":"Myopic Optimality: why reinforcement learning portfolio management strategies lose money","abstract":"Myopic optimization (MO) outperforms reinforcement learning (RL) in portfolio management: RL yields lower or negative returns, higher variance, larger costs, heavier CVaR, lower profitability, and greater model risk. We model execution/liquidation frictions with mark-to-market accounting. Using Malliavin calculus (Clark-Ocone/BEL), we derive policy gradients and the risk shadow price, unifying HJB and KKT. This gives dual gap and convergence results: geometric MO vs. RL floors. We quantify phantom profit in RL via Malliavin policy-gradient contamination analysis and define a control-affects-dynamics (CAD) premium of RL, indicating that it is plausibly positive.","short_abstract":"Myopic optimization (MO) outperforms reinforcement learning (RL) in portfolio management: RL yields lower or negative returns, higher variance, larger costs, heavier CVaR, lower profitability, and greater model risk. We model execution/liquidation frictions with mark-to-market accounting. Using Malliavin calculus (Clar...","url_abs":"https://arxiv.org/abs/2509.12764","url_pdf":"https://arxiv.org/pdf/2509.12764v1","authors":"[\"Yuming Ma\"]","published":"2025-09-16T07:24:24Z","proceeding":"q-fin.TR","tasks":"[\"q-fin.TR\",\"math.OC\",\"math.PR\",\"q-fin.PM\",\"q-fin.RM\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
