
Discrete Optimal Transport and Voice Conversion

Abstract

We address the task of voice conversion (VC) using a vector-based interface. To align audio embeddings across speakers, we employ discrete optimal transport (OT) and approximate the transport map with the barycentric projection. Our evaluation shows that this approach yields high-quality voice conversion. We also perform an ablation study on the number of embeddings used, extending previous work on simple averaging of kNN and OT results. Additionally, we show that applying discrete OT as a post-processing step in audio generation can cause synthetic speech to be misclassified as real, revealing a novel and strong adversarial attack.
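As a rough illustration of the barycentric projection mentioned in the abstract, the sketch below maps each source-speaker embedding to the plan-weighted average of the target-speaker embeddings. It is not the authors' implementation. The entropic Sinkhorn solver (swapped in for an exact discrete OT solver), the uniform weights, the squared Euclidean cost, and the names `sinkhorn_plan`, `barycentric_projection`, `reg`, and `n_iters` are all assumptions made for this example.

```python
import numpy as np


def sinkhorn_plan(cost, a, b, reg=0.1, n_iters=500):
    """Approximate OT plan between weights a (n,) and b (m,) for cost (n, m).

    Assumption: entropic regularisation (Sinkhorn iterations) stands in
    for an exact discrete OT solver.
    """
    K = np.exp(-cost / reg)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)            # rescale to match column marginals
        u = a / (K @ v)              # rescale to match row marginals
    return u[:, None] * K * v[None, :]


def barycentric_projection(source, target, reg=0.1, n_iters=500):
    """Map each source vector to the plan-weighted mean of the target vectors."""
    n, m = len(source), len(target)
    a = np.full(n, 1.0 / n)          # assumption: uniform source weights
    b = np.full(m, 1.0 / m)          # assumption: uniform target weights

    # Assumption: squared Euclidean cost, scaled to [0, 1] for stability.
    diff = source[:, None, :] - target[None, :, :]
    cost = (diff ** 2).sum(axis=-1)
    cost = cost / cost.max()

    plan = sinkhorn_plan(cost, a, b, reg, n_iters)
    # Row-normalise the plan so each output is a convex combination of targets.
    return (plan @ target) / plan.sum(axis=1, keepdims=True)
```

Because every output row is a convex combination of target embeddings, the converted vectors stay inside the convex hull of the target speaker's embeddings. The value of `reg` controls how sharply each source vector concentrates its mass on a few targets.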
