{"ID":2899318,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.01926","arxiv_id":"2507.01926","title":"IC-Custom: Diverse Image Customization via In-Context Learning","abstract":"Image customization, a crucial technique for industrial media production, aims to generate content that is consistent with reference images. However, current approaches conventionally separate image customization into position-aware and position-free customization paradigms and lack a universal framework for diverse customization, limiting their applications across various scenarios. To overcome these limitations, we propose IC-Custom, a unified framework that seamlessly integrates position-aware and position-free image customization through in-context learning. IC-Custom concatenates reference images with target images to a polyptych, leveraging DiT's multi-modal attention mechanism for fine-grained token-level interactions. We propose the In-context Multi-Modal Attention (ICMA) mechanism, which employs learnable task-oriented register tokens and boundary-aware positional embeddings to enable the model to effectively handle diverse tasks and distinguish between inputs in polyptych configurations. To address the data gap, we curated a 12K identity-consistent dataset with 8K real-world and 4K high-quality synthetic samples, avoiding the overly glossy, oversaturated look typical of synthetic data. IC-Custom supports various industrial applications, including try-on, image insertion, and creative IP customization. Extensive evaluations on our proposed ProductBench and the publicly available DreamBench demonstrate that IC-Custom significantly outperforms community workflows, closed-source models, and state-of-the-art open-source approaches. IC-Custom achieves about 73% higher human preference across identity consistency, harmony, and text alignment metrics, while training only 0.4% of the original model parameters. Project page: https://liyaowei-stu.github.io/project/IC_Custom","short_abstract":"Image customization, a crucial technique for industrial media production, aims to generate content that is consistent with reference images. However, current approaches conventionally separate image customization into position-aware and position-free customization paradigms and lack a universal framework for diverse cu...","url_abs":"https://arxiv.org/abs/2507.01926","url_pdf":"https://arxiv.org/pdf/2507.01926v3","authors":"[\"Yaowei Li\",\"Xiaoyu Li\",\"Zhaoyang Zhang\",\"Yuxuan Bian\",\"Gan Liu\",\"Xinyuan Li\",\"Jiale Xu\",\"Wenbo Hu\",\"Yating Liu\",\"Lingen Li\",\"Jing Cai\",\"Yuexian Zou\",\"Yancheng He\",\"Ying Shan\"]","published":"2025-07-02T17:36:38Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false}
