{"ID":2830237,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.10571","arxiv_id":"2512.10571","title":"AVI-Edit: Audio-sync Video Instance Editing with Granularity-Aware Mask Refiner","abstract":"Recent advancements in video generation highlight that realistic audio-visual synchronization is crucial for engaging content creation. However, existing video editing methods largely overlook audio-visual synchronization and lack the fine-grained spatial and temporal controllability required for precise instance-level edits. In this paper, we propose AVI-Edit, a framework for audio-sync video instance editing. We propose a granularity-aware mask refiner that iteratively refines coarse user-provided masks into precise instance-level regions. We further design a self-feedback audio agent to curate high-quality audio guidance, providing fine-grained temporal control. To facilitate this task, we additionally construct a large-scale dataset with instance-centric correspondence and comprehensive annotations. Extensive experiments demonstrate that AVI-Edit outperforms state-of-the-art methods in visual quality, condition following, and audio-visual synchronization. Project page: https://hjzheng.net/projects/AVI-Edit/.","short_abstract":"Recent advancements in video generation highlight that realistic audio-visual synchronization is crucial for engaging content creation. \nHowever, existing video editing methods largely overlook audio-visual synchronization and lack the fine-grained spatial and temporal controllability required for precise instance-level...","url_abs":"https://arxiv.org/abs/2512.10571","url_pdf":"https://arxiv.org/pdf/2512.10571v4","authors":"[\"Haojie Zheng\",\"Shuchen Weng\",\"Jingqi Liu\",\"Siqi Yang\",\"Boxin Shi\",\"Xinlong Wang\"]","published":"2025-12-11T11:58:53Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","project_urls":"[\"https://hjzheng.net/projects/AVI-Edit/\"]","has_code":false,"code_links":[{"ID":606012,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2830237,"paper_url":"https://arxiv.org/abs/2512.10571","paper_title":"AVI-Edit: Audio-sync Video Instance Editing with Granularity-Aware Mask Refiner","repo_url":"https://github.com/suimuc/AVI-Edit-Framework","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
