{"ID":2832747,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.06204","arxiv_id":"2512.06204","title":"Quantifying Memory Use in Reinforcement Learning with Temporal Range","abstract":"How much does a trained RL policy actually use its past observations? We propose \\emph{Temporal Range}, a model-agnostic metric that treats first-order sensitivities of multiple vector outputs across a temporal window to the input sequence as a temporal influence profile and summarizes it by the magnitude-weighted average lag. Temporal Range is computed via reverse-mode automatic differentiation from the Jacobian blocks $\\partial y_s/\\partial x_t\\in\\mathbb{R}^{c\\times d}$ averaged over final timesteps $s\\in\\{t+1,\\dots,T\\}$ and is well-characterized in the linear setting by a small set of natural axioms. Across diagnostic and control tasks (POPGym; flicker/occlusion; Copy-$k$) and architectures (MLPs, RNNs, SSMs), Temporal Range (i) remains small in fully observed control, (ii) scales with the task's ground-truth lag in Copy-$k$, and (iii) aligns with the minimum history window required for near-optimal return as confirmed by window ablations. We also report Temporal Range for a compact Long Expressive Memory (LEM) policy trained on the task, using it as a proxy readout of task-level memory. Our axiomatic treatment draws on recent work on range measures, specialized here to temporal lag and extended to vector-valued outputs in the RL setting. Temporal Range thus offers a practical per-sequence readout of memory dependence for comparing agents and environments and for selecting the shortest sufficient context.","short_abstract":"How much does a trained RL policy actually use its past observations? We propose \\emph{Temporal Range}, a model-agnostic metric that treats first-order sensitivities of multiple vector outputs across a temporal window to the input sequence as a temporal influence profile and summarizes it by the magnitude-weighted aver...","url_abs":"https://arxiv.org/abs/2512.06204","url_pdf":"https://arxiv.org/pdf/2512.06204v1","authors":"[\"Rodney Lafuente-Mercado\",\"Daniela Rus\",\"T. Konstantin Rusch\"]","published":"2025-12-05T22:58:09Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
