{"ID":2922160,"CreatedAt":"2026-06-02T02:42:49.606572591Z","UpdatedAt":"2026-06-02T17:44:34.312992241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.00832","arxiv_id":"2606.00832","title":"Momento: Evaluating Persistent Memory and Reasoning with Multi-Session Agentic Conversations","abstract":"Recent advances in agentic AI have enabled agents to complete complex tasks through tool use, reasoning, and multi-step planning. Yet existing benchmarks evaluate agents within a single session, ignoring past actions, stated preferences, and prior decisions that agents must integrate to fulfill personalized user goals. We introduce Momento, a benchmark for persistent agentic task completion in multi-session service environments, requiring agents to take consequential, tool-mediated actions while resolving temporal dependencies and evolving user goals across sessions. Experimental results reveal that current agents fail primarily through misestimation of user state, treating prior session history as a reliable proxy for current context rather than stale information requiring re-validation, highlighting a substantial gap between current agent capabilities and realistic long-horizon human-agent interaction.","short_abstract":"Recent advances in agentic AI have enabled agents to complete complex tasks through tool use, reasoning, and multi-step planning. Yet existing benchmarks evaluate agents within a single session, ignoring past actions, stated preferences, and prior decisions that agents must integrate to fulfill personalized user goals....","url_abs":"https://arxiv.org/abs/2606.00832","url_pdf":"https://arxiv.org/pdf/2606.00832v1","authors":"[\"Adril Putra Merin\",\"David Anugraha\",\"Ayu Purwarianti\",\"Genta Indra Winata\"]","published":"2026-05-30T18:08:51Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[]","has_code":false}
