{"ID":2823154,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2601.01294","arxiv_id":"2601.01294","title":"Diffusion Timbre Transfer Via Mutual Information Guided Inpainting","abstract":"We study timbre transfer as an inference-time editing problem for music audio. Starting from a strong pre-trained latent diffusion model, we introduce a lightweight procedure that requires no additional training: (i) a dimension-wise noise injection that targets latent channels most informative of instrument identity, and (ii) an early-step clamping mechanism that re-imposes the input's melodic and rhythmic structure during reverse diffusion. The method operates directly on audio latents and is compatible with text/audio conditioning (e.g., CLAP). We discuss design choices, analyze trade-offs between timbral change and structural preservation, and show that simple inference-time controls can meaningfully steer pre-trained models for style-transfer use cases.","short_abstract":"We study timbre transfer as an inference-time editing problem for music audio. Starting from a strong pre-trained latent diffusion model, we introduce a lightweight procedure that requires no additional training: (i) a dimension-wise noise injection that targets latent channels most informative of instrument identity,...","url_abs":"https://arxiv.org/abs/2601.01294","url_pdf":"https://arxiv.org/pdf/2601.01294v2","authors":"[\"Ching Ho Lee\",\"Javier Nistal\",\"Stefan Lattner\",\"Marco Pasini\",\"George Fazekas\"]","published":"2026-01-03T21:53:35Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"cs.AI\",\"eess.AS\"]","methods":"[\"Diffusion Model\"]","has_code":false}
