{"ID":2861556,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.02149","arxiv_id":"2510.02149","title":"Reinforcement Learning with Action-Triggered Observations","abstract":"We study reinforcement learning problems where state observations are stochastically triggered by actions, a constraint common in many real-world applications. This framework is formulated as Action-Triggered Sporadically Traceable Markov Decision Processes (ATST-MDPs), where each action has a specified probability of triggering a state observation. We derive tailored Bellman optimality equations for this framework and introduce the action-sequence learning paradigm in which agents commit to executing a sequence of actions until the next observation arrives. Under the linear MDP assumption, value-functions are shown to admit linear representations in an induced action-sequence feature map. Leveraging this structure, we propose off-policy estimators with statistical error guarantees for such feature maps and introduce ST-LSVI-UCB, a variant of LSVI-UCB adapted for action-triggered settings. ST-LSVI-UCB achieves regret $\\widetilde O(\\sqrt{Kd^3(1-γ)^{-3}})$, where $K$ is the number of episodes, $d$ the feature dimension, and $γ$ the discount factor (per-step episode non-termination probability). Crucially, this work establishes the theoretical foundation for learning with sporadic, action-triggered observations while demonstrating that efficient learning remains feasible under such observation constraints.","short_abstract":"We study reinforcement learning problems where state observations are stochastically triggered by actions, a constraint common in many real-world applications. This framework is formulated as Action-Triggered Sporadically Traceable Markov Decision Processes (ATST-MDPs), where each action has a specified probability of...","url_abs":"https://arxiv.org/abs/2510.02149","url_pdf":"https://arxiv.org/pdf/2510.02149v1","authors":"[\"Alexander Ryabchenko\",\"Wenlong Mou\"]","published":"2025-10-02T16:00:50Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"math.OC\",\"stat.ML\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\"]","has_code":false}
