{"ID":2887649,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.01405","arxiv_id":"2508.01405","title":"Balancing the Blend: An Experimental Analysis of Trade-offs in Hybrid Search","abstract":"Hybrid search, the integration of lexical and semantic retrieval, has become a cornerstone of modern information retrieval systems, driven by demanding applications like Retrieval-Augmented Generation (RAG). The architectural design space for these systems is vast and complex, yet a systematic understanding of the trade-offs among their core components -- retrieval paradigms, combination schemes, and re-ranking methods -- is lacking. To address this, and informed by our experience building the Infinity open-source database, we present the first experimental analysis of advanced hybrid search architectures. Our framework integrates four retrieval paradigms -- Full-Text Search (FTS), Sparse Vector Search (SVS), Dense Vector Search (DVS), and Tensor Search (TenS) -- and evaluates their combinations and re-ranking strategies across 11 real-world datasets. Our results reveal three key findings: (1) A \"weakest link\" phenomenon, where a weak path can substantially degrade overall accuracy, highlighting the need for path-wise quality assessment before fusion. (2) A data-driven map of performance trade-offs, demonstrating that optimal configurations depend heavily on resource constraints and data characteristics, precluding a one-size-fits-all solution. (3) The identification of Tensor-based Re-ranking Fusion (TRF) as a high-efficacy alternative to mainstream fusion methods, offering the semantic power of tensor search at a fraction of the computational and memory cost. Our findings offer concrete guidelines for designing adaptive, scalable hybrid search systems and identify key directions for future research.","short_abstract":"Hybrid search, the integration of lexical and semantic retrieval, has become a cornerstone of modern information retrieval systems, driven by demanding applications like Retrieval-Augmented Generation (RAG). The architectural design space for these systems is vast and complex, yet a systematic understanding of the trad...","url_abs":"https://arxiv.org/abs/2508.01405","url_pdf":"https://arxiv.org/pdf/2508.01405v2","authors":"[\"Mengzhao Wang\",\"Boyu Tan\",\"Yunjun Gao\",\"Hai Jin\",\"Yingfeng Zhang\",\"Xiangyu Ke\",\"Xiaoliang Xu\",\"Yifan Zhu\"]","published":"2025-08-02T15:24:01Z","proceeding":"cs.DB","tasks":"[\"cs.DB\"]","methods":"[\"RAG\"]","has_code":false}
