Discrete Optimal Transport Is A Strong Audio Adversarial Attack
2025 · Anton Selitskiy, Akib Shahriyar, Jishnuraj Prakasan
Abstract
In this paper, we introduce discrete optimal transport voice conversion (\(k\)DOT-VC). A comparison with \(k\)NN-VC, SinkVC, and Gaussian optimal transport (MKL) shows that our method adapts to the target domain more effectively. Exploiting the probabilistic nature of optimal transport (OT), we show that \(k\)DOT-VC is an effective black-box adversarial attack against modern audio anti-spoofing countermeasures (CMs). The attack operates as a post-processing, distribution-alignment step: frame-level WavLM embeddings of generated speech are aligned to an unpaired pool of bona fide speech via entropic OT and a top-\(k\) barycentric projection, then decoded with a neural vocoder. Ablation analysis indicates that distribution-level alignment is a powerful and stable attack against deployed CMs.
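The alignment step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the cosine cost, the uniform marginals, the values of `eps`, `n_iter`, and `k`, and all function names are hypothetical choices made here. Toy 2-D vectors stand in for WavLM frame embeddings, and the vocoder decoding step is omitted. Plain Python is used so the sketch runs without dependencies.

```python
import math

def cosine_cost(X, Y):
    """Pairwise cosine distance between source frames X and target pool Y.
    The choice of cosine distance is an assumption for this sketch."""
    def unit(v):
        n = math.sqrt(sum(a * a for a in v)) or 1.0
        return [a / n for a in v]
    Xn, Yn = [unit(x) for x in X], [unit(y) for y in Y]
    return [[1.0 - sum(a * b for a, b in zip(x, y)) for y in Yn] for x in Xn]

def sinkhorn(C, eps=0.1, n_iter=200):
    """Entropic OT plan between uniform marginals via Sinkhorn scaling."""
    n, m = len(C), len(C[0])
    a, b = 1.0 / n, 1.0 / m
    K = [[math.exp(-c / eps) for c in row] for row in C]  # Gibbs kernel
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternate row and column rescaling to match the marginals.
        u = [a / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

def topk_barycentric(P, Y, k=2):
    """Map each source frame to the weighted mean of the k target frames
    that receive the most transport mass from it."""
    dim = len(Y[0])
    mapped = []
    for row in P:
        idx = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        w = sum(row[j] for j in idx)  # renormalise over the kept entries
        mapped.append([sum(row[j] * Y[j][d] for j in idx) / w for d in range(dim)])
    return mapped

# Toy data: 2 source "frames" and an unpaired pool of 4 target "frames".
X = [[1.0, 0.0], [0.0, 1.0]]
Y = [[2.0, 0.1], [0.1, 2.0], [1.9, 0.0], [0.0, 2.1]]

P = sinkhorn(cosine_cost(X, Y))
mapped = topk_barycentric(P, Y, k=2)
```

Each row of `mapped` is a convex combination of target-pool frames, so the converted features lie inside the region spanned by the bona fide embeddings. In a full pipeline these vectors would then be passed to a vocoder.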
Related papers
- Unsupervised Noise Adaptive Speech Enhancement By Discriminator-constrained Optimal Transport (2021)
- Channel Adaptation For Speaker Verification Using Optimal Transport With Pseudo Label (2024)
- Collective Learning Mechanism Based Optimal Transport Generative Adversarial Network For Non-parallel Voice Conversion (2025)
- Neural Domain Alignment For Spoken Language Recognition Based On Optimal Transport (2023)
- Optimal Transport-based Adaptation In Dysarthric Speech Tasks (2021)
- Unsupervised Neural Adaptation Model Based On Optimal Transport For Spoken Language Identification (2020)
- Beyond Voice Identity Conversion: Manipulating Voice Attributes By Adversarial Learning Of Structured Disentangled Representations (2021)
- Discrete Unit Based Masking For Improving Disentanglement In Voice Conversion (2024)