{"ID":2827808,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2602.22224","arxiv_id":"2602.22224","title":"DS SERVE: A Framework for Efficient and Scalable Neural Retrieval","abstract":"We present DS-Serve, a framework that transforms large-scale text datasets, comprising half a trillion tokens, into a high-performance neural retrieval system. DS-Serve offers both a web interface and API endpoints, achieving low latency with modest memory overhead on a single node. The framework also supports inference-time trade-offs between latency, accuracy, and result diversity. We anticipate that DS-Serve will be broadly useful for a range of applications, including large-scale retrieval-augmented generation (RAG), training data attribution, training search agents, and beyond.","short_abstract":"We present DS-Serve, a framework that transforms large-scale text datasets, comprising half a trillion tokens, into a high-performance neural retrieval system. DS-Serve offers both a web interface and API endpoints, achieving low latency with modest memory overhead on a single node. The framework also supports inferenc...","url_abs":"https://arxiv.org/abs/2602.22224","url_pdf":"https://arxiv.org/pdf/2602.22224v1","authors":"[\"Jinjian Liu\",\"Yichuan Wang\",\"Xinxi Lyu\",\"Rulin Shao\",\"Joseph E. Gonzalez\",\"Matei Zaharia\",\"Sewon Min\"]","published":"2025-12-17T00:43:10Z","proceeding":"cs.IR","tasks":"[\"cs.IR\",\"cs.AI\",\"cs.CL\"]","methods":"[\"RAG\"]","has_code":false}