{"ID":2866735,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.20360","arxiv_id":"2509.20360","title":"EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning","abstract":"Recent advances in foundation models highlight a clear trend toward unification and scaling, showing emergent capabilities across diverse domains. While image generation and editing have rapidly transitioned from task-specific to unified frameworks, video generation and editing remain fragmented due to architectural limitations and data scarcity. In this work, we introduce EditVerse, a unified framework for image and video generation and editing within a single model. By representing all modalities, i.e., text, image, and video, as a unified token sequence, EditVerse leverages self-attention to achieve robust in-context learning, natural cross-modal knowledge transfer, and flexible handling of inputs and outputs with arbitrary resolutions and durations. To address the lack of video editing training data, we design a scalable data pipeline that curates 232K video editing samples and combines them with large-scale image and video datasets for joint training. Furthermore, we present EditVerseBench, the first benchmark for instruction-based video editing covering diverse tasks and resolutions. Extensive experiments and user studies demonstrate that EditVerse achieves state-of-the-art performance, surpassing existing open-source and commercial models, while exhibiting emergent editing and generation abilities across modalities.","short_abstract":"Recent advances in foundation models highlight a clear trend toward unification and scaling, showing emergent capabilities across diverse domains. \nWhile image generation and editing have rapidly transitioned from task-specific to unified frameworks, video generation and editing remain fragmented due to architectural li...","url_abs":"https://arxiv.org/abs/2509.20360","url_pdf":"https://arxiv.org/pdf/2509.20360v3","authors":"[\"Xuan Ju\",\"Tianyu Wang\",\"Yuqian Zhou\",\"He Zhang\",\"Qing Liu\",\"Nanxuan Zhao\",\"Zhifei Zhang\",\"Yijun Li\",\"Yuanhao Cai\",\"Shaoteng Liu\",\"Daniil Pakhomov\",\"Zhe Lin\",\"Soo Ye Kim\",\"Qiang Xu\"]","published":"2025-09-24T17:59:30Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false}
