{"ID":2824690,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.23044","arxiv_id":"2512.23044","title":"Video-Browser: Towards Agentic Open-web Video Browsing","abstract":"The evolution of autonomous agents is redefining information seeking, transitioning from passive retrieval to proactive, open-ended web research. However, a significant modality gap remains in processing the web's most dynamic and information-dense modality: video. In this paper, we first formalize the task of Agentic Video Browsing and introduce Video-BrowseComp, a benchmark evaluating open-ended agentic browsing tasks that enforce a mandatory dependency on videos. We observe that current paradigms struggle to reconcile the scale of open-ended video exploration with the need for fine-grained visual verification. Direct visual inference (e.g., RAG) maximizes perception but incurs prohibitive context costs, while text-centric summarization optimizes efficiency but often misses critical visual details required for accurate grounding. To address this, we propose Video-Browser, a novel agent leveraging Pyramidal Perception, filtering with cheap metadata and zooming in with expensive visual perception only when necessary. Experiments demonstrate that our approach achieves a 37.5% relative improvement while reducing token consumption by 58.3% compared to Direct visual inference, establishing a foundation for verifiable open-web video research. We open-source all codes, benchmark at {https://anonymous.4open.science/r/VideoBrowser} and {https://github.com/chrisx599/Video-Browser}.","short_abstract":"The evolution of autonomous agents is redefining information seeking, transitioning from passive retrieval to proactive, open-ended web research. However, a significant modality gap remains in processing the web's most dynamic and information-dense modality: video. In this paper, we first formalize the task of Agentic...","url_abs":"https://arxiv.org/abs/2512.23044","url_pdf":"https://arxiv.org/pdf/2512.23044v2","authors":"[\"Zhengyang Liang\",\"Yan Shu\",\"Xiangrui Liu\",\"Minghao Qin\",\"Kaixin Liang\",\"Nicu Sebe\",\"Zheng Liu\",\"Lizi Liao\"]","published":"2025-12-28T19:08:27Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"LoRA\"]","project_urls":"[\"https://anonymous.4open.science/r/VideoBrowser\"]","has_code":false,"code_links":[{"ID":605605,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2824690,"paper_url":"https://arxiv.org/abs/2512.23044","paper_title":"Video-Browser: Towards Agentic Open-web Video Browsing","repo_url":"https://github.com/chrisx599/Video-Browser","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
