{"ID":2860395,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.04339","arxiv_id":"2510.04339","title":"Pitch-Conditioned Instrument Sound Synthesis From an Interactive Timbre Latent Space","abstract":"This paper presents a novel approach to neural instrument sound synthesis using a two-stage semi-supervised learning framework capable of generating pitch-accurate, high-quality music samples from an expressive timbre latent space. Existing approaches that achieve sufficient quality for music production often rely on high-dimensional latent representations that are difficult to navigate and provide unintuitive user experiences. We address this limitation through a two-stage training paradigm: first, we train a pitch-timbre disentangled 2D representation of audio samples using a Variational Autoencoder; second, we use this representation as conditioning input for a Transformer-based generative model. The learned 2D latent space serves as an intuitive interface for navigating and exploring the sound landscape. We demonstrate that the proposed method effectively learns a disentangled timbre space, enabling expressive and controllable audio generation with reliable pitch conditioning. Experimental results show the model's ability to capture subtle variations in timbre while maintaining a high degree of pitch accuracy. The usability of our method is demonstrated in an interactive web application, highlighting its potential as a step towards future music production environments that are both intuitive and creatively empowering: https://pgesam.faresschulz.com","short_abstract":"This paper presents a novel approach to neural instrument sound synthesis using a two-stage semi-supervised learning framework capable of generating pitch-accurate, high-quality music samples from an expressive timbre latent space. Existing approaches that achieve sufficient quality for music production often rely on h...","url_abs":"https://arxiv.org/abs/2510.04339","url_pdf":"https://arxiv.org/pdf/2510.04339v1","authors":"[\"Christian Limberg\",\"Fares Schulz\",\"Zhe Zhang\",\"Stefan Weinzierl\"]","published":"2025-10-05T20:03:30Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"cs.AI\",\"cs.LG\",\"eess.AS\",\"eess.SP\"]","methods":"[\"Transformer\"]","project_urls":"[\"https://pgesam.faresschulz.com\"]","has_code":false}
