{"ID":2855838,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.12668","arxiv_id":"2510.12668","title":"Understanding Parametric Knowledge Injection in Retrieval-Augmented Generation","abstract":"Context-grounded generation underpins many LLM applications, including long-document question answering (QA), conversational personalization, and retrieval-augmented generation (RAG). However, classic token-based context concatenation is costly for long inputs and can suffer from lost-in-the-middle effects at extreme context lengths. Recent work explores context parameterization, which encodes context into lightweight trainable parameters (e.g., LoRA adapters) injected into a frozen LLM. Extending this idea to retrieved evidence yields parametric RAG (P-RAG), which incorporates knowledge via parameter updates rather than token-level attention. In this paper, we present a systematic study of this emerging RAG paradigm, parametric knowledge injection. First, we reassess P-RAG under answer-presence accuracy and show that it does not consistently outperform standard token-based RAG (T-RAG), while combining both (PT-RAG) achieves the best overall performance. Second, we introduce a QA benchmark with up-to-date knowledge beyond the LLM's internal memory to enable controlled analysis. Our representational and mechanistic results indicate that parametric representations capture document-level semantics and primarily influence deeper feed-forward computations, providing high-level guidance but limited evidence consolidation. Finally, we evaluate parametric injection under key RAG challenges, demonstrating improved faithfulness under knowledge conflicts, stronger robustness to retrieval noise, and solid generalization to tasks beyond QA. Our findings clarify the strengths and limitations of parametric RAG and provide practical guidance for future retrieval-augmented LLM systems.","short_abstract":"Context-grounded generation underpins many LLM applications, including long-document question answering (QA), conversational personalization, and retrieval-augmented generation (RAG). However, classic token-based context concatenation is costly for long inputs and can suffer from lost-in-the-middle effects at extreme context lengths. R...","url_abs":"https://arxiv.org/abs/2510.12668","url_pdf":"https://arxiv.org/pdf/2510.12668v2","authors":"[\"Minghao Tang\",\"Shiyu Ni\",\"Jingtong Wu\",\"Zengxin Han\",\"Keping Bi\"]","published":"2025-10-14T16:05:01Z","proceeding":"cs.IR","tasks":"[\"cs.IR\",\"cs.CL\"]","methods":"[\"RAG\",\"Large Language Model\",\"LoRA\"]","has_code":false}
