{"ID":2882149,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.10337","arxiv_id":"2508.10337","title":"A Curriculum Learning Approach to Reinforcement Learning: Leveraging RAG for Multimodal Question Answering","abstract":"This paper describes the solutions of the Dianping-Trust-Safety team for the META CRAG-MM challenge. The challenge requires building a comprehensive retrieval-augmented generation system capable of multi-modal multi-turn question answering. The competition consists of three tasks: (1) answering questions using structured data retrieved from an image-based mock knowledge graph, (2) synthesizing information from both knowledge graphs and web search results, and (3) handling multi-turn conversations that require context understanding and information aggregation from multiple sources. For Task 1, our solution is based on a vision large language model, enhanced by supervised fine-tuning with knowledge distilled from GPT-4.1. We further applied curriculum learning strategies to guide reinforcement learning, resulting in improved answer accuracy and reduced hallucination. For Task 2 and Task 3, we additionally leveraged web search APIs to incorporate external knowledge, enabling the system to better handle complex queries and multi-turn conversations. Our approach achieved 1st place in Task 1 with a significant lead of 52.38%, and 3rd place in Task 3, demonstrating the effectiveness of integrating curriculum learning with reinforcement learning in our training pipeline.","short_abstract":"This paper describes the solutions of the Dianping-Trust-Safety team for the META CRAG-MM challenge. The challenge requires building a comprehensive retrieval-augmented generation system capable of multi-modal multi-turn question answering. 
The competition consists of three tasks: (1) answering questions using structu...","url_abs":"https://arxiv.org/abs/2508.10337","url_pdf":"https://arxiv.org/pdf/2508.10337v2","authors":"[\"Chenliang Zhang\",\"Lin Wang\",\"Yuanyuan Lu\",\"Yusheng Qi\",\"Kexin Wang\",\"Peixu Hou\",\"Wenshi Chen\"]","published":"2025-08-14T04:37:56Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.LG\"]","methods":"[\"RAG\",\"Reinforcement Learning\",\"Language Model\"]","has_code":false}