{"ID":2832404,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.05411","arxiv_id":"2512.05411","title":"A Systematic Framework for Enterprise Knowledge Retrieval: Leveraging LLM-Generated Metadata to Enhance RAG Systems","abstract":"In enterprise settings, efficiently retrieving relevant information from large and complex knowledge bases is essential for operational productivity and informed decision-making. This research presents a systematic empirical framework for metadata enrichment using large language models (LLMs) to enhance document retrieval in Retrieval-Augmented Generation (RAG) systems. Our approach employs a structured pipeline that dynamically generates meaningful metadata for document segments, substantially improving their semantic representations and retrieval accuracy. Through a controlled 3 X 3 experimental matrix, we compare three chunking strategies -- semantic, recursive, and naive -- and evaluate their interactions with three embedding techniques -- content-only, TF-IDF weighted, and prefix-fusion -- isolating the contribution of each component through ablation analysis. The results demonstrate that metadata-enriched approaches consistently outperform content-only baselines, with recursive chunking paired with TF-IDF weighted embeddings yielding 82.5% precision and naive chunking with prefix-fusion achieving the strongest ranking quality (NDCG 0.813). Our evaluation employs cross-encoder reranking for silver-standard ground truth generation, with statistical significance confirmed via Bonferroni-corrected paired t-tests. These findings confirm that metadata enrichment improves vector space organization and retrieval effectiveness while maintaining sub-30 ms P95 latency, providing a quantitative decision framework for deploying high-performance, scalable RAG systems in enterprise settings.","short_abstract":"In enterprise settings, efficiently retrieving relevant information from large and complex knowledge bases is essential for operational productivity and informed decision-making. This research presents a systematic empirical framework for metadata enrichment using large language models (LLMs) to enhance document retrie...","url_abs":"https://arxiv.org/abs/2512.05411","url_pdf":"https://arxiv.org/pdf/2512.05411v2","authors":"[\"Pranav Pushkar Mishra\",\"Kranti Prakash Yeole\",\"Ramyashree Keshavamurthy\",\"Mokshit Bharat Surana\",\"Fatemeh Sarayloo\"]","published":"2025-12-05T04:05:06Z","proceeding":"cs.IR","tasks":"[\"cs.IR\",\"cs.AI\"]","methods":"[\"RAG\",\"Large Language Model\",\"Language Model\",\"Generative Adversarial Network\"]","has_code":false}