{"ID":2846101,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.16925","arxiv_id":"2512.16925","title":"V-Agent: An Interactive Video Search System Using Vision-Language Models","abstract":"We introduce V-Agent, a novel multi-agent platform designed for advanced video search and interactive user-system conversations. By fine-tuning a vision-language model (VLM) with a small video preference dataset and enhancing it with a retrieval vector from an image-text retrieval model, we overcome the limitations of traditional text-based retrieval systems in multimodal scenarios. The VLM-based retrieval model independently embeds video frames and audio transcriptions from an automatic speech recognition (ASR) module into a shared multimodal representation space, enabling V-Agent to interpret both visual and spoken content for context-aware video search. This system consists of three agents (a routing agent, a search agent, and a chat agent) that work collaboratively to address user intents by refining search outputs and communicating with users. The search agent utilizes the VLM-based retrieval model together with an additional re-ranking module to further enhance video retrieval quality. Our proposed framework demonstrates state-of-the-art zero-shot performance on the MultiVENT 2.0 benchmark, highlighting its potential for both academic research and real-world applications. The retrieval model and demo videos are available at https://huggingface.co/NCSOFT/multimodal-embedding.","short_abstract":"We introduce V-Agent, a novel multi-agent platform designed for advanced video search and interactive user-system conversations. By fine-tuning a vision-language model (VLM) with a small video preference dataset and enhancing it with a retrieval vector from an image-text retrieval model, we overcome the limitations of...","url_abs":"https://arxiv.org/abs/2512.16925","url_pdf":"https://arxiv.org/pdf/2512.16925v2","authors":"[\"SunYoung Park\",\"Jong-Hyeon Lee\",\"Youngjune Kim\",\"Daegyu Sung\",\"Younghyun Yu\",\"Young-rok Cha\",\"Jeongho Ju\"]","published":"2025-11-04T07:24:45Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\",\"cs.IR\",\"cs.MA\"]","methods":"[\"Language Model\"]","has_code":false}
