{"ID":3084824,"CreatedAt":"2026-06-05T06:46:15.197025399Z","UpdatedAt":"2026-06-07T03:38:11.424509713Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.05665","arxiv_id":"2606.05665","title":"V2V-Bench: A Comprehensive Benchmark for Video-to-Video Generation Evaluation","abstract":"Video-to-video (V2V) generation is difficult to evaluate because outputs must both follow editing instructions and preserve frame-level correspondence with the source video, which existing T2V and I2V metrics do not capture. We introduce V2V-Bench, an 11-dimension benchmark organized into five categories: temporal alignment, structural fidelity, transformation quality, video quality, and semantic alignment. V2V-Bench pairs diverse source videos with challenging editing tasks and evaluates two commercial models, Grok Imagine and Gemini Veo3, and one open-source model, Open Sora 2. Results show complementary model strengths: Grok performs better on editing fidelity, while Veo3 achieves stronger visual quality. On six V2V-specific dimensions, V2V-Bench reaches a Spearman correlation of 0.905 with human judgments.","short_abstract":"Video-to-video (V2V) generation is difficult to evaluate because outputs must both follow editing instructions and preserve frame-level correspondence with the source video, which existing T2V and I2V metrics do not capture. We introduce V2V-Bench, an 11-dimension benchmark organized into five categories: temporal align...","url_abs":"https://arxiv.org/abs/2606.05665","url_pdf":"https://arxiv.org/pdf/2606.05665v1","authors":"[\"Tao Liu\",\"Leela Krishna\",\"Gouti Pavan Kumar\",\"Sreeja K\",\"Vishav Garg\"]","published":"2026-06-04T03:48:42Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Generative Adversarial Network\"]","has_code":false}
