{"ID":2850254,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.22215","arxiv_id":"2510.22215","title":"Hybrid-Vector Retrieval for Visually Rich Documents: Combining Single-Vector Efficiency and Multi-Vector Accuracy","abstract":"Retrieval over visually rich documents is essential for tasks such as legal discovery, scientific search, and enterprise knowledge management. Existing approaches fall into two paradigms: single-vector retrieval, which is efficient but coarse, and multi-vector retrieval, which is accurate but computationally expensive. To address this trade-off, we propose HEAVEN, a plug-and-play two-stage hybrid-vector framework. In the first stage, HEAVEN efficiently retrieves candidate pages using a single-vector method over Visually-Summarized Pages (VS-Pages), which assemble representative visual layouts from multiple pages. In the second stage, it reranks candidates with a multi-vector method while filtering query tokens by linguistic importance to reduce redundant computations. To evaluate retrieval systems under realistic conditions, we also introduce ViMDoc, a benchmark for visually rich, multi-document, and long-document retrieval. Across four benchmarks, HEAVEN attains 99.87% of the Recall@1 performance of multi-vector models on average while reducing per-query computation by 99.82%, achieving efficiency and accuracy. Our code and datasets are available at: https://github.com/juyeonnn/HEAVEN","short_abstract":"Retrieval over visually rich documents is essential for tasks such as legal discovery, scientific search, and enterprise knowledge management. Existing approaches fall into two paradigms: single-vector retrieval, which is efficient but coarse, and multi-vector retrieval, which is accurate but computationally expensive....","url_abs":"https://arxiv.org/abs/2510.22215","url_pdf":"https://arxiv.org/pdf/2510.22215v2","authors":"[\"Juyeon Kim\",\"Geon Lee\",\"Dongwon Choi\",\"Taeuk Kim\",\"Kijung Shin\"]","published":"2025-10-25T08:27:37Z","proceeding":"cs.IR","tasks":"[\"cs.IR\",\"cs.CV\"]","methods":"[]","has_code":false,"code_links":[{"ID":607780,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2850254,"paper_url":"https://arxiv.org/abs/2510.22215","paper_title":"Hybrid-Vector Retrieval for Visually Rich Documents: Combining Single-Vector Efficiency and Multi-Vector Accuracy","repo_url":"https://github.com/juyeonnn/HEAVEN","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
