{"ID":2825704,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.20092","arxiv_id":"2512.20092","title":"Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents","abstract":"Temporal reasoning over long, multi-session dialogues is a critical capability for conversational agents. However, existing works and our pilot study have shown that as dialogue histories grow in length and accumulate noise, current long-context models struggle to accurately identify temporally pertinent information, significantly impairing reasoning performance. To address this, we introduce Memory-T1, a framework that learns a time-aware memory selection policy using reinforcement learning (RL). It employs a coarse-to-fine strategy, first pruning the dialogue history into a candidate set using temporal and relevance filters, followed by an RL agent that selects the precise evidence sessions. The RL training is guided by a multi-level reward function optimizing (i) answer accuracy, (ii) evidence grounding, and (iii) temporal consistency. In particular, the temporal consistency reward provides a dense signal by evaluating alignment with the query time scope at both the session-level (chronological proximity) and the utterance-level (chronological fidelity), enabling the agent to resolve subtle chronological ambiguities. On the Time-Dialog benchmark, Memory-T1 boosts a 7B model to an overall score of 67.0\\%, establishing a new state-of-the-art performance for open-source models and outperforming a 14B baseline by 10.2\\%. Ablation studies show temporal consistency and evidence grounding rewards jointly contribute to a 15.0\\% performance gain. Moreover, Memory-T1 maintains robustness up to 128k tokens, where baseline models collapse, proving effectiveness against noise in extensive dialogue histories. The code and datasets are publicly available at https://github.com/Elvin-Yiming-Du/Memory-T1/","short_abstract":"Temporal reasoning over long, multi-session dialogues is a critical capability for conversational agents. However, existing works and our pilot study have shown that as dialogue histories grow in length and accumulate noise, current long-context models struggle to accurately identify temporally pertinent information, s...","url_abs":"https://arxiv.org/abs/2512.20092","url_pdf":"https://arxiv.org/pdf/2512.20092v1","authors":"[\"Yiming Du\",\"Baojun Wang\",\"Yifan Xiang\",\"Zhaowei Wang\",\"Wenyu Huang\",\"Boyang Xue\",\"Bin Liang\",\"Xingshan Zeng\",\"Fei Mi\",\"Haoli Bai\",\"Lifeng Shang\",\"Jeff Z. Pan\",\"Yuxin Jiang\",\"Kam-Fai Wong\"]","published":"2025-12-23T06:37:29Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Reinforcement Learning\"]","has_code":false,"code_links":[{"ID":605689,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2825704,"paper_url":"https://arxiv.org/abs/2512.20092","paper_title":"Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents","repo_url":"https://github.com/Elvin-Yiming-Du/Memory-T1","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
