{"ID":2869160,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.14638","arxiv_id":"2509.14638","title":"MultiEdit: Advancing Instruction-based Image Editing on Diverse and Challenging Tasks","abstract":"Current instruction-based image editing (IBIE) methods struggle with challenging editing tasks, as both editing types and sample counts of existing datasets are limited. Moreover, traditional dataset construction often contains noisy image-caption pairs, which may introduce biases and limit model capabilities in complex editing scenarios. To address these limitations, we introduce MultiEdit, a comprehensive dataset featuring over 107K high-quality image editing samples. It encompasses 6 challenging editing tasks through a diverse collection of 18 non-style-transfer editing types and 38 style transfer operations, covering a spectrum from sophisticated style transfer to complex semantic operations like person reference editing and in-image text editing. We employ a novel dataset construction pipeline that utilizes two multi-modal large language models (MLLMs) to generate visual-adaptive editing instructions and produce high-fidelity edited images, respectively. Extensive experiments demonstrate that fine-tuning foundational open-source models with our MultiEdit-Train set substantially improves models' performance on sophisticated editing tasks in our proposed MultiEdit-Test benchmark, while effectively preserving their capabilities on the standard editing benchmark. We believe MultiEdit provides a valuable resource for advancing research into more diverse and challenging IBIE capabilities. Our dataset is available at https://huggingface.co/datasets/inclusionAI/MultiEdit.","short_abstract":"Current instruction-based image editing (IBIE) methods struggle with challenging editing tasks, as both editing types and sample counts of existing datasets are limited. Moreover, traditional dataset construction often contains noisy image-caption pairs, which may introduce biases and limit model capabilities in comple...","url_abs":"https://arxiv.org/abs/2509.14638","url_pdf":"https://arxiv.org/pdf/2509.14638v1","authors":"[\"Mingsong Li\",\"Lin Liu\",\"Hongjun Wang\",\"Haoxing Chen\",\"Xijun Gu\",\"Shizhan Liu\",\"Dong Gong\",\"Junbo Zhao\",\"Zhenzhong Lan\",\"Jianguo Li\"]","published":"2025-09-18T05:33:38Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}