{"ID":3004936,"CreatedAt":"2026-06-03T03:09:48.883664427Z","UpdatedAt":"2026-06-04T19:14:31.964469513Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.03099","arxiv_id":"2606.03099","title":"PhotoCraft: Agentic Reasoning with Hierarchical Self-Evolving Memory for Deep Image Search","abstract":"Deep Image Search requires multi-step reasoning over rich contextual cues, such as time, location, and event relations. However, most existing LLM-based agents are stateless and reactive, lacking persistent memory to maintain long-horizon context or transfer experience across tasks, which often leads to execution drift and experience isolation. To address these limitations, we propose PhotoCraft, a training-free, hierarchical memory system for photo-search agents. Inspired by human cognition, PhotoCraft equips MLLMs with working, episodic, and semantic memory, which are dynamically invoked during reasoning to preserve logical consistency and knowledge transferability throughout multi-step reasoning and answer generation. Extensive experiments on DISBench demonstrate that PhotoCraft consistently improves context-aware retrieval across diverse MLLM backbones, achieving gains of up to 18.5% and effectively mitigating key bottlenecks in memoryless deep image search, offering a practical path toward reliable and generalizable multimodal search agents.","short_abstract":"Deep Image Search requires multi-step reasoning over rich contextual cues, such as time, location, and event relations. However, most existing LLM-based agents are stateless and reactive, lacking persistent memory to maintain long-horizon context or transfer experience across tasks, which often leads to execution drift...","url_abs":"https://arxiv.org/abs/2606.03099","url_pdf":"https://arxiv.org/pdf/2606.03099v1","authors":"[\"Kailin Lyu\",\"Zhiqiang Yuan\",\"Jianwei He\",\"Qiwei Yan\",\"Xuanbo Su\",\"Nanxing Hu\",\"Yang Liu\",\"Ce Hao\",\"Shengqian Qin\",\"Lianyu Hu\",\"Jinchao Zhang\",\"Jie Zhou\"]","published":"2026-06-02T03:38:44Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Large Language Model\"]","has_code":false}
