{"ID":2835177,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.00451","arxiv_id":"2512.00451","title":"STCTS: Generative Semantic Compression for Ultra-Low Bitrate Speech via Explicit Text-Prosody-Timbre Decomposition","abstract":"Voice communication in bandwidth-constrained environments--maritime, satellite, and tactical networks--remains prohibitively expensive. Traditional codecs struggle below 1 kbps, while existing semantic approaches (STT-TTS) sacrifice prosody and speaker identity. We present STCTS, a generative semantic compression framework enabling natural voice communication at 80 bps. STCTS explicitly decomposes speech into linguistic content, prosodic expression, and speaker timbre, applying tailored compression: context-aware text encoding (70 bps), sparse prosody transmission via TTS interpolation (\u003c14 bps at 0.1-1 Hz), and amortized speaker embedding. Evaluations on LibriSpeech demonstrate a 75x bitrate reduction versus Opus (6 kbps) and 12x versus EnCodec (1 kbps), while maintaining perceptual quality (NISQA MOS \u003e 4.26), graceful degradation under packet loss and noise resilience. We also discover a bimodal quality distribution with prosody sampling rate: sparse and dense updates both achieve high quality, while mid-range rates degrade due to perceptual discontinuities--guiding optimal configuration design. Beyond efficiency, our modular architecture supports privacy-preserving encryption, human-interpretable transmission, and flexible deployment on edge devices, offering a robust solution for ultra-low bandwidth scenarios.","short_abstract":"Voice communication in bandwidth-constrained environments--maritime, satellite, and tactical networks--remains prohibitively expensive. Traditional codecs struggle below 1 kbps, while existing semantic approaches (STT-TTS) sacrifice prosody and speaker identity. We present STCTS, a generative semantic compression frame...","url_abs":"https://arxiv.org/abs/2512.00451","url_pdf":"https://arxiv.org/pdf/2512.00451v2","authors":"[\"Siyu Wang\",\"Haitao Li\",\"Donglai Zhu\"]","published":"2025-11-29T11:53:15Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"cs.MM\"]","methods":"[]","has_code":false}
