{"ID":2892574,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.14901","arxiv_id":"2507.14901","title":"Learning Nonlinear Causal Reductions to Explain Reinforcement Learning Policies","abstract":"Why do reinforcement learning (RL) policies fail or succeed? This is a challenging question due to the complex, high-dimensional nature of agent-environment interactions. In this work, we take a causal perspective on explaining the behavior of RL policies by viewing the states, actions, and rewards as variables in a low-level causal model. We introduce random perturbations to policy actions during execution and observe their effects on the cumulative reward, learning a simplified high-level causal model that explains these relationships. To this end, we develop a nonlinear Causal Model Reduction framework that ensures approximate interventional consistency, meaning the simplified high-level model responds to interventions in a similar way as the original complex system. We prove that for a class of nonlinear causal models, there exists a unique solution that achieves exact interventional consistency, ensuring learned explanations reflect meaningful causal patterns. Experiments on both synthetic causal models and practical RL tasks-including pendulum control and robot table tennis-demonstrate that our approach can uncover important behavioral patterns, biases, and failure modes in trained RL policies.","short_abstract":"Why do reinforcement learning (RL) policies fail or succeed? This is a challenging question due to the complex, high-dimensional nature of agent-environment interactions. In this work, we take a causal perspective on explaining the behavior of RL policies by viewing the states, actions, and rewards as variables in a lo...","url_abs":"https://arxiv.org/abs/2507.14901","url_pdf":"https://arxiv.org/pdf/2507.14901v1","authors":"[\"Armin Kekić\",\"Jan Schneider\",\"Dieter Büchler\",\"Bernhard Schölkopf\",\"Michel Besserve\"]","published":"2025-07-20T10:25:24Z","proceeding":"stat.ML","tasks":"[\"stat.ML\",\"cs.AI\",\"cs.LG\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}