{"ID":2869354,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.14959","arxiv_id":"2509.14959","title":"Discrete optimal transport is a strong audio adversarial attack","abstract":"In this paper, we introduce the discrete optimal transport voice conversion ($k$DOT-VC) method. Comparison with $k$NN-VC, SinkVC, and Gaussian optimal transport (MKL) demonstrates stronger domain adaptation abilities of our method. We use the probabilistic nature of optimal transport (OT) and show that $k$DOT-VC is an effective black-box adversarial attack against modern audio anti-spoofing countermeasures (CMs). Our attack operates as a post-processing, distribution-alignment step: frame-level WavLM embeddings of generated speech are aligned to an unpaired bona fide pool via entropic OT and a top-$k$ barycentric projection, then decoded with a neural vocoder. Ablation analysis indicates that distribution-level alignment is a powerful and stable attack for deployed CMs.","short_abstract":"In this paper, we introduce the discrete optimal transport voice conversion ($k$DOT-VC) method. Comparison with $k$NN-VC, SinkVC, and Gaussian optimal transport (MKL) demonstrates stronger domain adaptation abilities of our method. We use the probabilistic nature of optimal transport (OT) and show that $k$DOT-VC is an...","url_abs":"https://arxiv.org/abs/2509.14959","url_pdf":"https://arxiv.org/pdf/2509.14959v2","authors":"[\"Anton Selitskiy\",\"Akib Shahriyar\",\"Jishnuraj Prakasan\"]","published":"2025-09-18T13:46:16Z","proceeding":"eess.AS","tasks":"[\"eess.AS\",\"cs.AI\"]","methods":"[]","has_code":false}
