{"ID":2882435,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.10804","arxiv_id":"2508.10804","title":"Non-Stationary Restless Multi-Armed Bandits with Provable Guarantee","abstract":"Online restless multi-armed bandits (RMABs) typically assume that each arm follows a stationary Markov Decision Process (MDP) with fixed state transitions and rewards. However, in real-world applications like healthcare and recommendation systems, these assumptions often break due to non-stationary dynamics, posing significant challenges for traditional RMAB algorithms. In this work, we specifically consider $N$-armd RMAB with non-stationary transition constrained by bounded variation budgets $B$. Our proposed \\rmab\\; algorithm integrates sliding window reinforcement learning (RL) with an upper confidence bound (UCB) mechanism to simultaneously learn transition dynamics and their variations. We further establish that \\rmab\\; achieves $\\widetilde{\\mathcal{O}}(N^2 B^{\\frac{1}{4}} T^{\\frac{3}{4}})$ regret bound by leveraging a relaxed definition of regret, providing a foundational theoretical framework for non-stationary RMAB problems for the first time.","short_abstract":"Online restless multi-armed bandits (RMABs) typically assume that each arm follows a stationary Markov Decision Process (MDP) with fixed state transitions and rewards. However, in real-world applications like healthcare and recommendation systems, these assumptions often break due to non-stationary dynamics, posing sig...","url_abs":"https://arxiv.org/abs/2508.10804","url_pdf":"https://arxiv.org/pdf/2508.10804v1","authors":"[\"Yu-Heng Hung\",\"Ping-Chun Hsieh\",\"Kai Wang\"]","published":"2025-08-14T16:26:00Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
