{"ID":2831485,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.07155","arxiv_id":"2512.07155","title":"CHIMERA: Adaptive Cache Injection and Semantic Anchor Prompting for Zero-shot Image Morphing with Morphing-oriented Metrics","abstract":"Recent diffusion-based image morphing methods typically interpolate inverted latents and reuse limited conditioning signals, which often yields unstable intermediates for heterogeneous endpoint pairs. In particular, (i) feature reuse is usually partial or non-adaptive, leading to abrupt structural changes or over-smoothing, and (ii) text conditions are commonly obtained independently per endpoint and then interpolated, which can introduce incompatible semantics. We present CHIMERA, a novel zero-shot diffusion morphing framework that addresses both issues via inversion-guided denoising with complementary feature reuse and text conditioning. Adaptive Cache Injection (ACI) caches a broader set of multi-scale diffusion features beyond Key--Value-only reuse during DDIM inversion, and re-injects them with layer- and timestep-aware scheduling to stabilize denoising and enable gradual fusion. Semantic Anchor Prompting (SAP) uses a vision-language model to generate a shared anchor-prompt and anchor-conditioned endpoint prompts, and injects the anchor into cross-attention to improve intermediate semantic coherence. Finally, we propose Global-Local Consistency Score (GLCS), a morphing-oriented metric that jointly captures global domain harmonization and local transition smoothness. Extensive experiments and a user study show that CHIMERA produces smoother and more semantically consistent morphs than prior methods, while remaining efficient and applicable across diverse diffusion backbones without retraining. 
Code and the project page will be released.","short_abstract":"Recent diffusion-based image morphing methods typically interpolate inverted latents and reuse limited conditioning signals, which often yields unstable intermediates for heterogeneous endpoint pairs. In particular, (i) feature reuse is usually partial or non-adaptive, leading to abrupt structural changes or over-smoot...","url_abs":"https://arxiv.org/abs/2512.07155","url_pdf":"https://arxiv.org/pdf/2512.07155v5","authors":"[\"Dahyeon Kye\",\"Jeahun Sung\",\"Minkyu Jeon\",\"Jihyong Oh\"]","published":"2025-12-08T04:39:12Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Diffusion Model\",\"Language Model\"]","has_code":false}
