{"ID":2898029,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.03868","arxiv_id":"2507.03868","title":"From Query to Explanation: Uni-RAG for Multi-Modal Retrieval-Augmented Learning in STEM","abstract":"In AI-facilitated teaching, leveraging various query styles to interpret abstract educational content is crucial for delivering effective and accessible learning experiences. However, existing retrieval systems predominantly focus on natural text-image matching and lack the capacity to address the diversity and ambiguity inherent in real-world educational scenarios. To address this limitation, we develop a lightweight and efficient multi-modal retrieval module, named Uni-Retrieval, which extracts query-style prototypes and dynamically matches them with tokens from a continually updated Prompt Bank. This Prompt Bank encodes and stores domain-specific knowledge by leveraging a Mixture-of-Expert Low-Rank Adaptation (MoE-LoRA) module and can be adapted to enhance Uni-Retrieval's capability to accommodate unseen query types at test time. To enable natural language educational content generation, we integrate the original Uni-Retrieval with a compact instruction-tuned language model, forming a complete retrieval-augmented generation pipeline named Uni-RAG. Given a style-conditioned query, Uni-RAG first retrieves relevant educational materials and then generates human-readable explanations, feedback, or instructional content aligned with the learning objective. Experimental results on SER and other multi-modal benchmarks show that Uni-RAG outperforms baseline retrieval and RAG systems in both retrieval accuracy and generation quality, while maintaining low computational cost. Our framework provides a scalable, pedagogically grounded solution for intelligent educational systems, bridging retrieval and generation to support personalized, explainable, and efficient learning assistance across diverse STEM scenarios.","short_abstract":"In AI-facilitated teaching, leveraging various query styles to interpret abstract educational content is crucial for delivering effective and accessible learning experiences. However, existing retrieval systems predominantly focus on natural text-image matching and lack the capacity to address the diversity and ambigui...","url_abs":"https://arxiv.org/abs/2507.03868","url_pdf":"https://arxiv.org/pdf/2507.03868v1","authors":"[\"Xinyi Wu\",\"Yanhao Jia\",\"Luwei Xiao\",\"Shuai Zhao\",\"Fengkuang Chiang\",\"Erik Cambria\"]","published":"2025-07-05T02:44:38Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.CE\",\"cs.CY\",\"cs.MM\"]","methods":"[\"RAG\",\"Language Model\",\"LoRA\"]","has_code":false}
