{"ID":2823222,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2603.04402","arxiv_id":"2603.04402","title":"SearchGym: A Modular Infrastructure for Cross-Platform Benchmarking and Hybrid Search Orchestration","abstract":"The rapid growth of Retrieval-Augmented Generation (RAG) has created a proliferation of toolkits, yet a fundamental gap remains between experimental prototypes and robust, production-ready systems. We present SearchGym, a modular infrastructure designed for cross-platform benchmarking and hybrid search orchestration. Unlike existing model-centric frameworks, SearchGym decouples data representation, embedding strategies, and retrieval logic into stateful abstractions: Dataset, VectorSet, and App. This separation enables a Compositional Config Algebra, allowing designers to synthesize entire systems from hierarchical configurations while ensuring perfect reproducibility. Moreover, we analyze the \"Top-$k$ Cognizance\" in hybrid retrieval pipelines, demonstrating that the optimal sequence of semantic ranking and structured filtering is highly dependent on filter strength. Evaluated on the LitSearch expert-annotated benchmark, SearchGym achieves a 70% Top-100 retrieval rate. SearchGym reveals a design tension between generalizability and optimizability, presenting the potential where engineering optimization may serve as a tool for uncovering the causal mechanisms inherent in information retrieval across heterogeneous domains. An open-source implementation of SearchGym is available at: https://github.com/JeromeTH/search-gym","short_abstract":"The rapid growth of Retrieval-Augmented Generation (RAG) has created a proliferation of toolkits, yet a fundamental gap remains between experimental prototypes and robust, production-ready systems. We present SearchGym, a modular infrastructure designed for cross-platform benchmarking and hybrid search orchestration. U...","url_abs":"https://arxiv.org/abs/2603.04402","url_pdf":"https://arxiv.org/pdf/2603.04402v1","authors":"[\"Jerome Tze-Hou Hsu\"]","published":"2026-01-02T07:19:25Z","proceeding":"cs.IR","tasks":"[\"cs.IR\",\"cs.CL\"]","methods":"[\"RAG\"]","has_code":false,"code_links":[{"ID":605482,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2823222,"paper_url":"https://arxiv.org/abs/2603.04402","paper_title":"SearchGym: A Modular Infrastructure for Cross-Platform Benchmarking and Hybrid Search Orchestration","repo_url":"https://github.com/JeromeTH/search-gym","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
