{"ID":2895151,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.15863","arxiv_id":"2507.15863","title":"eSapiens's DEREK Module: Deep Extraction \u0026 Reasoning Engine for Knowledge with LLMs","abstract":"We present the DEREK (Deep Extraction \u0026 Reasoning Engine for Knowledge) Module, a secure and scalable Retrieval-Augmented Generation pipeline designed specifically for enterprise document question answering. Designed and implemented by eSapiens, the system ingests heterogeneous content (PDF, Office, web), splits it into 1,000-token overlapping chunks, and indexes them in a hybrid HNSW+BM25 store. User queries are refined by GPT-4o, retrieved via combined vector+BM25 search, reranked with Cohere, and answered by an LLM using CO-STAR prompt engineering. A LangGraph verifier enforces citation overlap, regenerating answers until every claim is grounded. On four LegalBench subsets, 1000-token chunks improve Recall@50 by approximately 1 pp and hybrid+rerank boosts Precision@10 by approximately 7 pp; the verifier raises TRACe Utilization above 0.50 and limits unsupported statements to less than 3%. All components run in containers, enforce end-to-end TLS 1.3 and AES-256. These results demonstrate that the DEREK module delivers accurate, traceable, and production-ready document QA with minimal operational overhead. The module is designed to meet enterprise demands for secure, auditable, and context-faithful retrieval, providing a reliable baseline for high-stakes domains such as legal and finance.","short_abstract":"We present the DEREK (Deep Extraction \u0026 Reasoning Engine for Knowledge) Module, a secure and scalable Retrieval-Augmented Generation pipeline designed specifically for enterprise document question answering. Designed and implemented by eSapiens, the system ingests heterogeneous content (PDF, Office, web), splits it int...","url_abs":"https://arxiv.org/abs/2507.15863","url_pdf":"https://arxiv.org/pdf/2507.15863v1","authors":"[\"Isaac Shi\",\"Zeyuan Li\",\"Fan Liu\",\"Wenli Wang\",\"Lewei He\",\"Yang Yang\",\"Tianyu Shi\"]","published":"2025-07-13T05:54:01Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"RAG\",\"Large Language Model\"]","has_code":false}
