{"ID":2864608,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.23157","arxiv_id":"2509.23157","title":"Grouped Satisficing Paths in Pure Strategy Games: a Topological Perspective","abstract":"In game theory and multi-agent reinforcement learning (MARL), each agent selects a strategy, interacts with the environment and other agents, and subsequently updates its strategy based on the received payoff. This process generates a sequence of joint strategies $(s^t)_{t \\geq 0}$, where $s^t$ represents the strategy profile of all agents at time step $t$. A widely adopted principle in MARL algorithms is \"win-stay, lose-shift\", which dictates that an agent retains its current strategy if it achieves the best response. This principle exhibits a fixed-point property when the joint strategy has become an equilibrium. The sequence of joint strategies under this principle is referred to as a satisficing path, a concept first introduced in [40] and explored in the context of $N$-player games in [39]. A fundamental question arises regarding this principle: Under what conditions does every initial joint strategy $s$ admit a finite-length satisficing path $(s^t)_{0 \\leq t \\leq T}$ where $s^0=s$ and $s^T$ is an equilibrium? This paper establishes a sufficient condition for such a property, and demonstrates that any finite-state Markov game, as well as any $N$-player game, guarantees the existence of a finite-length satisficing path from an arbitrary initial strategy to some equilibrium. These results provide a stronger theoretical foundation for the design of MARL algorithms.","short_abstract":"In game theory and multi-agent reinforcement learning (MARL), each agent selects a strategy, interacts with the environment and other agents, and subsequently updates its strategy based on the received payoff. This process generates a sequence of joint strategies $(s^t)_{t \\geq 0}$, where $s^t$ represents the strategy...","url_abs":"https://arxiv.org/abs/2509.23157","url_pdf":"https://arxiv.org/pdf/2509.23157v1","authors":"[\"Yanqing Fu\",\"Chao Huang\",\"Chenrun Wang\",\"Zhuping Wang\"]","published":"2025-09-27T07:07:27Z","proceeding":"cs.GT","tasks":"[\"cs.GT\",\"cs.LG\",\"cs.MA\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}