{"ID":2873354,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.06499","arxiv_id":"2509.06499","title":"TIDE: Achieving Balanced Subject-Driven Image Generation via Target-Instructed Diffusion Enhancement","abstract":"Subject-driven image generation (SDIG) aims to manipulate specific subjects within images while adhering to textual instructions, a task crucial for advancing text-to-image diffusion models. SDIG requires reconciling the tension between maintaining subject identity and complying with dynamic edit instructions, a challenge inadequately addressed by existing methods. In this paper, we introduce the Target-Instructed Diffusion Enhancing (TIDE) framework, which resolves this tension through target supervision and preference learning without test-time fine-tuning. TIDE pioneers target-supervised triplet alignment, modelling subject adaptation dynamics using a (reference image, instruction, target images) triplet. This approach leverages the Direct Subject Diffusion (DSD) objective, training the model with paired \"winning\" (balanced preservation-compliance) and \"losing\" (distorted) targets, systematically generated and evaluated via quantitative metrics. This enables implicit reward modelling for optimal preservation-compliance balance. Experimental results on standard benchmarks demonstrate TIDE's superior performance in generating subject-faithful outputs while maintaining instruction compliance, outperforming baseline methods across multiple quantitative metrics. TIDE's versatility is further evidenced by its successful application to diverse tasks, including structural-conditioned generation, image-to-image generation, and text-image interpolation. Our code is available at https://github.com/KomJay520/TIDE.","short_abstract":"Subject-driven image generation (SDIG) aims to manipulate specific subjects within images while adhering to textual instructions, a task crucial for advancing text-to-image diffusion models. SDIG requires reconciling the tension between maintaining subject identity and complying with dynamic edit instructions, a challe...","url_abs":"https://arxiv.org/abs/2509.06499","url_pdf":"https://arxiv.org/pdf/2509.06499v2","authors":"[\"Jibai Lin\",\"Bo Ma\",\"Yating Yang\",\"Xi Zhou\",\"Rong Ma\",\"Turghun Osman\",\"Ahtamjan Ahmat\",\"Rui Dong\",\"Lei Wang\"]","published":"2025-09-08T10:06:37Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Diffusion Model\"]","has_code":false,"code_links":[{"ID":610048,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2873354,"paper_url":"https://arxiv.org/abs/2509.06499","paper_title":"TIDE: Achieving Balanced Subject-Driven Image Generation via Target-Instructed Diffusion Enhancement","repo_url":"https://github.com/KomJay520/TIDE","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}