{"ID":2833398,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.03590","arxiv_id":"2512.03590","title":"Beyond Boundary Frames: Context-Centric Video Interpolation with Audio-Visual Semantics","abstract":"Video frame interpolation has long been challenged by limited controllability and interactivity, especially in scenarios involving fast, highly non-linear, and fine-grained motion. Although recent interactive interpolation methods have made progress, they remain largely boundary-centric and ignore auxiliary contextual signals beyond the start and end frames, leading to outputs that deviate from user-intended objectives. To address this issue, we reformulate VFI from a boundary-centric task into a context-centric generation problem. Based on this, we propose BBF (Beyond Boundary Frames), a context-centric video frame interpolation framework with decoupled multimodal conditioning, which jointly exploits endpoint-adjacent visual context, text semantics, and audio-correlated temporal dynamics. To balance endpoint consistency with context-dependent temporal evolution, BBF further introduces a multi-stream context integration mechanism, consisting of endpoint-constraint integration, evolution-prior integration, and temporal-context integration. In addition, BBF adopts a progressive training strategy to stabilize multimodal learning and improve controllable interpolation. Extensive experiments show that BBF outperforms specialized state-of-the-art methods on both generic interpolation and audio-visual synchronized generation tasks, establishing a unified framework for video frame interpolation under coordinated multimodal conditioning. The code, the model, and the interface will be released to facilitate further research.","short_abstract":"Video frame interpolation has long been challenged by limited controllability and interactivity, especially in scenarios involving fast, highly non-linear, and fine-grained motion. Although recent interactive interpolation methods have made progress, they remain largely boundary-centric and ignore auxiliary contextual...","url_abs":"https://arxiv.org/abs/2512.03590","url_pdf":"https://arxiv.org/pdf/2512.03590v2","authors":"[\"Yuchen Deng\",\"Xiuyang Wu\",\"Hai-Tao Zheng\",\"Jie Wang\",\"Feidiao Yang\",\"Yuxing Han\"]","published":"2025-12-03T09:22:13Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false}
