{"ID":2848953,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.24134","arxiv_id":"2510.24134","title":"VC4VG: Optimizing Video Captions for Text-to-Video Generation","abstract":"Recent advances in text-to-video (T2V) generation highlight the critical role of high-quality video-text pairs in training models capable of producing coherent and instruction-aligned videos. However, strategies for optimizing video captions specifically for T2V training remain underexplored. In this paper, we introduce VC4VG (Video Captioning for Video Generation), a comprehensive caption optimization framework tailored to the needs of T2V models. We begin by analyzing caption content from a T2V perspective, decomposing the essential elements required for video reconstruction into multiple dimensions, and proposing a principled caption design methodology. To support evaluation, we construct VC4VG-Bench, a new benchmark featuring fine-grained, multi-dimensional, and necessity-graded metrics aligned with T2V-specific requirements. Extensive T2V fine-tuning experiments demonstrate a strong correlation between improved caption quality and video generation performance, validating the effectiveness of our approach. We release all benchmark tools and code at https://github.com/alimama-creative/VC4VG to support further research.","short_abstract":"Recent advances in text-to-video (T2V) generation highlight the critical role of high-quality video-text pairs in training models capable of producing coherent and instruction-aligned videos. However, strategies for optimizing video captions specifically for T2V training remain underexplored. In this paper, we introduc...","url_abs":"https://arxiv.org/abs/2510.24134","url_pdf":"https://arxiv.org/pdf/2510.24134v2","authors":"[\"Yang Du\",\"Zhuoran Lin\",\"Kaiqiang Song\",\"Biao Wang\",\"Zhicheng Zheng\",\"Tiezheng Ge\",\"Bo Zheng\",\"Qin Jin\"]","published":"2025-10-28T07:19:01Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\",\"cs.CL\"]","methods":"[]","has_code":false,"code_links":[{"ID":607662,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2848953,"paper_url":"https://arxiv.org/abs/2510.24134","paper_title":"VC4VG: Optimizing Video Captions for Text-to-Video Generation","repo_url":"https://github.com/alimama-creative/VC4VG","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}