{"ID":2855537,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.12126","arxiv_id":"2510.12126","title":"MetaCaptioner: Towards Generalist Visual Captioning with Open-source Suites","abstract":"Generalist visual captioning goes beyond a simple appearance description task, but requires integrating a series of visual cues into a caption and handling various visual domains. In this task, current open-source models present a large performance gap with commercial ones, which limits various applications such as data synthesis. To bridge the gap, this paper proposes CapFlow, a novel multi-agent collaboration workflow. CapFlow demonstrates for the first time that, by capitalizing on open-source models, it is possible to achieve caption quality on par with GPT-4.1 in various domains with an 89.5% reduction in costs. By leveraging CapFlow as the data synthesizer, we produce high-quality visual captions from image and video domains at scale, and obtain a generalist visual captioner via fine-tuning, namely MetaCaptioner. Through extensive experiments, we show that MetaCaptioner not only achieves comparable captioning capabilities with commercial models but also reaches top-tier multimodal performance in the open-source community. We hope CapFlow and MetaCaptioner can benefit future multimodal research by providing a strong and cost-effective visual captioning solution.","short_abstract":"Generalist visual captioning goes beyond a simple appearance description task, but requires integrating a series of visual cues into a caption and handling various visual domains. In this task, current open-source models present a large performance gap with commercial ones, which limits various applications such as dat...","url_abs":"https://arxiv.org/abs/2510.12126","url_pdf":"https://arxiv.org/pdf/2510.12126v3","authors":"[\"Zhenxin Lei\",\"Zhangwei Gao\",\"Changyao Tian\",\"Erfei Cui\",\"Guanzhou Chen\",\"Danni Yang\",\"Yuchen Duan\",\"Zhaokai Wang\",\"Wenhao Li\",\"Weiyun Wang\",\"Xiangyu Zhao\",\"Jiayi Ji\",\"Yu Qiao\",\"Wenhai Wang\",\"Gen Luo\"]","published":"2025-10-14T04:03:25Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false}
