{"ID":2840812,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.13897","arxiv_id":"2511.13897","title":"Temporal Realism Evaluation of Generated Videos Using Compressed-Domain Motion Vectors","abstract":"Temporal realism remains a central weakness of current generative video models, as most evaluation metrics prioritize spatial appearance and offer limited sensitivity to motion. We introduce a scalable, model-agnostic framework that assesses temporal behavior using motion vectors (MVs) extracted directly from compressed video streams. Codec-generated MVs from standards such as H.264 and HEVC provide lightweight, resolution-consistent descriptors of motion dynamics. We quantify realism by computing Kullback-Leibler, Jensen-Shannon, and Wasserstein divergences between MV statistics of real and generated videos. Experiments on the GenVidBench dataset containing videos from eight state-of-the-art generators reveal systematic discrepancies from real motion: entropy-based divergences rank Pika and SVD as closest to real videos, MV-sum statistics favor VC2 and Text2Video-Zero, and CogVideo shows the largest deviations across both measures. Visualizations of MV fields and class-conditional motion heatmaps further reveal center bias, sparse and piecewise constant flows, and grid-like artifacts that frame-level metrics do not capture. Beyond evaluation, we investigate MV-RGB fusion through channel concatenation, cross-attention, joint embedding, and a motion-aware fusion module. Incorporating MVs improves downstream classification across ResNet, I3D, and TSN backbones, with ResNet-18 and ResNet-34 reaching up to 97.4% accuracy and I3D achieving 99.0% accuracy on real-versus-generated discrimination. These findings demonstrate that compressed-domain MVs provide an effective temporal signal for diagnosing motion defects in generative videos and for strengthening temporal reasoning in discriminative models. The implementation is available at: https://github.com/KurbanIntelligenceLab/Motion-Vector-Learning","short_abstract":"Temporal realism remains a central weakness of current generative video models, as most evaluation metrics prioritize spatial appearance and offer limited sensitivity to motion. We introduce a scalable, model-agnostic framework that assesses temporal behavior using motion vectors (MVs) extracted directly from compresse...","url_abs":"https://arxiv.org/abs/2511.13897","url_pdf":"https://arxiv.org/pdf/2511.13897v1","authors":"[\"Mert Onur Cakiroglu\",\"Idil Bilge Altun\",\"Zhihe Lu\",\"Mehmet Dalkilic\",\"Hasan Kurban\"]","published":"2025-11-17T20:47:06Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false,"code_links":[{"ID":607005,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2840812,"paper_url":"https://arxiv.org/abs/2511.13897","paper_title":"Temporal Realism Evaluation of Generated Videos Using Compressed-Domain Motion Vectors","repo_url":"https://github.com/KurbanIntelligenceLab/Motion-Vector-Learning","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}