{"ID":2877621,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.19828","arxiv_id":"2508.19828","title":"Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning","abstract":"Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of NLP tasks, but they remain fundamentally stateless, constrained by limited context windows that hinder long-horizon reasoning. Recent efforts to address this limitation often augment LLMs with an external memory bank, yet most existing pipelines are static and heuristic-driven, lacking a learned mechanism for deciding what to store, update, or retrieve. We present Memory-R1, a reinforcement learning (RL) framework that equips LLMs with the ability to actively manage and utilize external memory through two specialized agents: a Memory Manager that learns structured operations, including ADD, UPDATE, DELETE, and NOOP; and an Answer Agent that pre-selects and reasons over relevant entries. Both agents are fine-tuned with outcome-driven RL (PPO and GRPO), enabling adaptive memory management with minimal supervision. With only 152 training QA pairs, Memory-R1 outperforms strong baselines and generalizes across diverse question types, three benchmarks (LoCoMo, MSC, LongMemEval), and multiple model scales (3B-14B).","short_abstract":"Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of NLP tasks, but they remain fundamentally stateless, constrained by limited context windows that hinder long-horizon reasoning. \nRecent efforts to address this limitation often augment LLMs with an external memory bank, yet most...","url_abs":"https://arxiv.org/abs/2508.19828","url_pdf":"https://arxiv.org/pdf/2508.19828v5","authors":"[\"Sikuan Yan\",\"Xiufeng Yang\",\"Zuchao Huang\",\"Ercong Nie\",\"Zifeng Ding\",\"Zonggen Li\",\"Xiaowen Ma\",\"Jinhe Bi\",\"Kristian Kersting\",\"Jeff Z. Pan\",\"Hinrich Schütze\",\"Volker Tresp\",\"Yunpu Ma\"]","published":"2025-08-27T12:26:55Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.MA\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\",\"Language Model\"]","has_code":false}
