SVSNet: An End-to-End Speaker Voice Similarity Assessment Model
2021 · Cheng-Hung Hu, Yu-Huai Peng, Junichi Yamagishi, et al.
Abstract
Neural evaluation metrics derived for numerous speech generation tasks have recently attracted great attention. In this paper, we propose SVSNet, the first end-to-end neural network model to assess the speaker voice similarity between converted speech and natural speech for voice conversion tasks. Unlike most neural evaluation metrics that use hand-crafted features, SVSNet directly takes the raw waveform as input to more completely utilize speech information for prediction. SVSNet consists of encoder, co-attention, distance calculation, and prediction modules and is trained in an end-to-end manner. The experimental results on the Voice Conversion Challenge 2018 and 2020 (VCC2018 and VCC2020) datasets show that SVSNet outperforms well-known baseline systems in the assessment of speaker similarity at the utterance and system levels.
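The abstract names four modules (encoder, co-attention, distance calculation, prediction) chained end-to-end on raw waveforms. The toy, pure-Python sketch below shows only that data flow, under loudly stated assumptions. The encoder is fixed-length framing plus one linear projection. The co-attention step is scaled dot-product attention in each direction. The distance is the absolute difference of time-averaged vectors. The predictor is a single sigmoid unit. Every function name, shape, and layer choice here (`init_params`, `encode`, `attend`, `similarity`, the frame length and hop) is a hypothetical illustration, not SVSNet's actual layers. The weights are random and untrained, so the scores carry no meaning.

```python
import math
import random


def init_params(frame_len=16, dim=8, seed=0):
    """Random, untrained weights for the toy model (hypothetical shapes)."""
    rng = random.Random(seed)
    return {
        "W_enc": [[rng.gauss(0, 0.3) for _ in range(frame_len)] for _ in range(dim)],
        "b_enc": [0.0] * dim,
        "W_out": [[rng.gauss(0, 0.3) for _ in range(2 * dim)]],
        "b_out": [0.0],
    }


def linear(vec, weights, bias):
    """Affine map: `weights` is a list of rows, one row per output unit."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, bias)]


def encode(wave, params, frame_len=16, hop=8):
    """Encoder stand-in: cut the raw waveform into frames, project each frame, apply tanh."""
    frames = [wave[i:i + frame_len] for i in range(0, len(wave) - frame_len + 1, hop)]
    if not frames:
        raise ValueError("waveform is shorter than one frame")
    return [[math.tanh(v) for v in linear(f, params["W_enc"], params["b_enc"])] for f in frames]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys):
    """Scaled dot-product attention: each query frame becomes a weighted mix of key frames."""
    dim = len(keys[0])
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys])
        out.append([sum(w * k[j] for w, k in zip(weights, keys)) for j in range(dim)])
    return out


def mean_pool(seq):
    """Average a sequence of frame vectors over time."""
    return [sum(frame[j] for frame in seq) / len(seq) for j in range(len(seq[0]))]


def similarity(wave_a, wave_b, params):
    """Return a score in (0, 1); higher means the toy model judges the two voices more alike."""
    h_a, h_b = encode(wave_a, params), encode(wave_b, params)
    # Co-attention stand-in: align each utterance to the other, in both directions.
    b_aligned_to_a = attend(h_a, h_b)  # frames of a query frames of b
    a_aligned_to_b = attend(h_b, h_a)  # frames of b query frames of a
    # Distance stand-in: absolute difference of time-averaged representations.
    dist = [abs(x - y) for x, y in zip(mean_pool(h_a), mean_pool(b_aligned_to_a))]
    dist += [abs(x - y) for x, y in zip(mean_pool(h_b), mean_pool(a_aligned_to_b))]
    # Prediction stand-in: one linear unit followed by a sigmoid.
    logit = linear(dist, params["W_out"], params["b_out"])[0]
    return 1.0 / (1.0 + math.exp(-logit))
```

Calling `similarity(wave_a, wave_b, init_params())` on two lists of samples returns a float in (0, 1). In a trained system the encoder, attention, and prediction weights would be learned jointly from similarity labels, which is the sense in which the abstract calls the model end-to-end.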
Related papers
- SVSNet+: Enhancing Speaker Voice Similarity Assessment Models With Representations From Speech Foundation Models (2024)
- MOSNet: Deep Learning Based Objective Assessment For Voice Conversion (2019)
- The NU Voice Conversion System For The Voice Conversion Challenge 2020: On The Effectiveness Of Sequence-to-sequence Models And Autoregressive Neural Vocoders (2020)
- The Neteasegames System For Voice Conversion Challenge 2020 With Vector-quantization Variational Autoencoder And WaveNet (2020)
- Measuring The Effectiveness Of Voice Conversion On Speaker Identification And Automatic Speech Recognition Systems (2019)
- Sequence-to-sequence Acoustic Modeling For Voice Conversion (2018)
- VITS-based Singing Voice Conversion System With DSPGAN Post-processing For SVCC2023 (2023)
- Predictions Of Subjective Ratings And Spoofing Assessments Of Voice Conversion Challenge 2020 Submissions (2020)