{"ID":2884465,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.07086","arxiv_id":"2508.07086","title":"SEF-MK: Speaker-Embedding-Free Voice Anonymization through Multi-k-means Quantization","abstract":"Voice anonymization protects speaker privacy by concealing identity while preserving linguistic and paralinguistic content. Self-supervised learning (SSL) representations encode linguistic features but preserve speaker traits. We propose a novel speaker-embedding-free framework called SEF-MK. Instead of using a single k-means model trained on the entire dataset, SEF-MK anonymizes SSL representations for each utterance by randomly selecting one of multiple k-means models, each trained on a different subset of speakers. We explore this approach from both attacker and user perspectives. Extensive experiments show that, compared to a single k-means model, SEF-MK with multiple k-means models better preserves linguistic and emotional content from the user's viewpoint. However, from the attacker's perspective, utilizing multiple k-means models boosts the effectiveness of privacy attacks. These insights can aid users in designing voice anonymization systems to mitigate attacker threats.","short_abstract":"Voice anonymization protects speaker privacy by concealing identity while preserving linguistic and paralinguistic content. Self-supervised learning (SSL) representations encode linguistic features but preserve speaker traits. We propose a novel speaker-embedding-free framework called SEF-MK. Instead of using a single...","url_abs":"https://arxiv.org/abs/2508.07086","url_pdf":"https://arxiv.org/pdf/2508.07086v2","authors":"[\"Beilong Tang\",\"Xiaoxiao Miao\",\"Xin Wang\",\"Ming Li\"]","published":"2025-08-09T19:47:34Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"cs.LG\",\"eess.AS\"]","methods":"[]","has_code":false}