{"ID":2838494,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.17044","arxiv_id":"2511.17044","title":"Parametric Retrieval-Augmented Generation using Latent Routing of LoRA Adapters","abstract":"Parametric Retrieval-Augmented Generation (PRAG) is a RAG approach that integrates external knowledge directly into model parameters using a LoRA adapter, aiming at reducing the inference cost compared to traditional RAG. However, current PRAG approaches adopt a \\textit{one-to-one} document encoding scheme, using a dedicated LoRA adapter for each individual document. This scheme introduces two major limitations: 1) As the number of documents increases, there will be a prohibitive cost for training and storage. 2) The LoRA adapters may largely overlap due to the shared knowledge across documents, making the approach highly inefficient. To overcome these challenges, we propose the Poly-PRAG approach, which uses a small set of LoRA adapters that are able to encode more general knowledge. Each document can be encoded using a combination of them through a latent routing function. By jointly training the LoRA adapters and the latent routing function, each LoRA adapter is able to encode a shared part of the knowledge across documents, and the routing function can select the best combination of adapters for a document. Experimental results on four benchmarks demonstrate the effectiveness of the Poly-PRAG compared to other strong PRAG baselines. In addition, this approach reduces the storage requirement by avoiding the need to store a large number of LoRA adapters and offers a more efficient way to encode external knowledge into LLMs.","short_abstract":"Parametric Retrieval-Augmented Generation (PRAG) is a RAG approach that integrates external knowledge directly into model parameters using a LoRA adapter, aiming at reducing the inference cost compared to traditional RAG. However, current PRAG approaches adopt a \\textit{one-to-one} document encoding scheme, using a ded...","url_abs":"https://arxiv.org/abs/2511.17044","url_pdf":"https://arxiv.org/pdf/2511.17044v2","authors":"[\"Zhan Su\",\"Fengran Mo\",\"Jinghan Zhang\",\"Yuchen Hui\",\"Jiaao Sun\",\"Jian-yun Nie\"]","published":"2025-11-21T08:44:21Z","proceeding":"cs.IR","tasks":"[\"cs.IR\"]","methods":"[\"RAG\",\"Large Language Model\",\"LoRA\"]","has_code":false}