{"ID":2860747,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.02672","arxiv_id":"2510.02672","title":"STSM-FiLM: A FiLM-Conditioned Neural Architecture for Time-Scale Modification of Speech","abstract":"Time-Scale Modification (TSM) of speech aims to alter the playback rate of audio without changing its pitch. While classical methods like Waveform Similarity-based Overlap-Add (WSOLA) provide strong baselines, they often introduce artifacts under non-stationary or extreme stretching conditions. We propose STSM-FiLM - a fully neural architecture that incorporates Feature-Wise Linear Modulation (FiLM) to condition the model on a continuous speed factor. By supervising the network using WSOLA-generated outputs, STSM-FiLM learns to mimic alignment and synthesis behaviors while benefiting from representations learned through deep learning. We explore four encoder-decoder variants: STFT-HiFiGAN, WavLM-HiFiGAN, Whisper-HiFiGAN, and EnCodec, and demonstrate that STSM-FiLM is capable of producing perceptually consistent outputs across a wide range of time-scaling factors. Overall, our results demonstrate the potential of FiLM-based conditioning to improve the generalization and flexibility of neural TSM models.","short_abstract":"Time-Scale Modification (TSM) of speech aims to alter the playback rate of audio without changing its pitch. While classical methods like Waveform Similarity-based Overlap-Add (WSOLA) provide strong baselines, they often introduce artifacts under non-stationary or extreme stretching conditions. We propose STSM-FiLM - a...","url_abs":"https://arxiv.org/abs/2510.02672","url_pdf":"https://arxiv.org/pdf/2510.02672v1","authors":"[\"Dyah A. M. G. Wisnu\",\"Ryandhimas E. Zezario\",\"Stefano Rini\",\"Fo-Rui Li\",\"Yan-Tsung Peng\",\"Hsin-Min Wang\",\"Yu Tsao\"]","published":"2025-10-03T02:09:41Z","proceeding":"eess.AS","tasks":"[\"eess.AS\",\"cs.SD\"]","methods":"[\"Generative Adversarial Network\"]","has_code":false}
