{"ID":2874793,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.04667","arxiv_id":"2509.04667","title":"DarkStream: real-time speech anonymization with low latency","abstract":"We propose DarkStream, a streaming speech synthesis model for real-time speaker anonymization. To improve content encoding under strict latency constraints, DarkStream combines a causal waveform encoder, a short lookahead buffer, and transformer-based contextual layers. To further reduce inference time, the model generates waveforms directly via a neural vocoder, thus removing intermediate mel-spectrogram conversions. Finally, DarkStream anonymizes speaker identity by injecting a GAN-generated pseudo-speaker embedding into linguistic features from the content encoder. Evaluations show our model achieves strong anonymization, yielding close to 50% speaker verification EER (near-chance performance) on the lazy-informed attack scenario, while maintaining acceptable linguistic intelligibility (WER within 9%). By balancing low latency, robust privacy, and minimal intelligibility degradation, DarkStream provides a practical solution for privacy-preserving real-time speech communication.","short_abstract":"We propose DarkStream, a streaming speech synthesis model for real-time speaker anonymization. To improve content encoding under strict latency constraints, DarkStream combines a causal waveform encoder, a short lookahead buffer, and transformer-based contextual layers. To further reduce inference time, the model gener...","url_abs":"https://arxiv.org/abs/2509.04667","url_pdf":"https://arxiv.org/pdf/2509.04667v1","authors":"[\"Waris Quamer\",\"Ricardo Gutierrez-Osuna\"]","published":"2025-09-04T21:30:25Z","proceeding":"eess.AS","tasks":"[\"eess.AS\",\"cs.CL\",\"cs.LG\"]","methods":"[\"Transformer\",\"Generative Adversarial Network\"]","has_code":false}
