{"ID":2862839,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.26340","arxiv_id":"2509.26340","title":"Memory-Driven Self-Improvement for Decision Making with Large Language Models","abstract":"Large language models (LLMs) have emerged as effective action policies for sequential decision-making (SDM) tasks due to their extensive prior knowledge. However, this broad yet general knowledge is often insufficient for specific decision-making tasks with limited task-related data, making it challenging to efficiently adapt LLMs to specific SDM tasks. To address this challenge, we propose a memory-driven self-improvement framework that combines LLM general prior knowledge with a compact memory of domain-specific experiences. Memory retains past interactions and associated Q-values, thereby capturing decision-relevant knowledge that facilitates accurate value estimation and informs the LLM prior refinement. The refined LLM prior, in turn, generates higher-reward trajectories that further enrich memory, forming a natural self-improvement framework where memory and LLM prior mutually reinforce each other. Experiments show that our memory-driven approach significantly outperforms both traditional RL and LLM-based baselines, e.g., improving performance by over 40\\% on in-distribution tasks and over 75\\% when generalized to unseen tasks in ALFWorld.","short_abstract":"Large language models (LLMs) have emerged as effective action policies for sequential decision-making (SDM) tasks due to their extensive prior knowledge. However, this broad yet general knowledge is often insufficient for specific decision-making tasks with limited task-related data, making it challenging to efficientl...","url_abs":"https://arxiv.org/abs/2509.26340","url_pdf":"https://arxiv.org/pdf/2509.26340v1","authors":"[\"Xue Yan\",\"Zijing Ou\",\"Mengyue Yang\",\"Yan Song\",\"Haifeng Zhang\",\"Yingzhen Li\",\"Jun Wang\"]","published":"2025-09-30T14:46:06Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
