{"ID":2840004,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.16700","arxiv_id":"2511.16700","title":"RAG-Driven Data Quality Governance for Enterprise ERP Systems","abstract":"Enterprise ERP systems managing hundreds of thousands of employee records face critical data quality challenges when human resources departments perform decentralized manual entry across multiple languages. We present an end-to-end pipeline combining automated data cleaning with LLM-driven SQL query generation, deployed on a production system managing 240,000 employee records over six months. The system operates in two integrated stages: a multi-stage cleaning pipeline that performs translation normalization, spelling correction, and entity deduplication during periodic synchronization from Microsoft SQL Server to PostgreSQL; and a retrieval-augmented generation framework powered by GPT-4o that translates natural-language questions in Turkish, Russian, and English into validated SQL queries. The query engine employs LangChain orchestration, FAISS vector similarity search, and few-shot learning with 500+ validated examples. Our evaluation demonstrates 92.5% query validity, 95.1% schema compliance, and 90.7% semantic accuracy on 2,847 production queries. The system reduces query turnaround time from 2.3 days to under 5 seconds while maintaining 99.2% uptime, with GPT-4o achieving 46% lower latency and 68% cost reduction versus GPT-3.5. This modular architecture provides a reproducible framework for AI-native enterprise data governance, demonstrating real-world viability at enterprise scale with 4.3/5.0 user satisfaction.","short_abstract":"Enterprise ERP systems managing hundreds of thousands of employee records face critical data quality challenges when human resources departments perform decentralized manual entry across multiple languages. 
We present an end-to-end pipeline combining automated data cleaning with LLM-driven SQL query generation, deploye...","url_abs":"https://arxiv.org/abs/2511.16700","url_pdf":"https://arxiv.org/pdf/2511.16700v1","authors":"[\"Sedat Bin Vedat\",\"Enes Kutay Yarkan\",\"Meftun Akarsu\",\"Recep Kaan Karaman\",\"Arda Sar\",\"Çağrı Çelikbilek\",\"Savaş Saygılı\"]","published":"2025-11-18T12:08:44Z","proceeding":"cs.DB","tasks":"[\"cs.DB\",\"cs.AI\"]","methods":"[\"RAG\",\"Large Language Model\"]","has_code":false}
