SVSNet+: Enhancing Speaker Voice Similarity Assessment Models With Representations From Speech Foundation Models
2024 · Chun Yin, Tai-Shih Chi, Yu Tsao, et al.
Abstract
Representations from pre-trained speech foundation models (SFMs) have achieved impressive performance in many downstream tasks. However, the potential benefits of incorporating pre-trained SFM representations into speaker voice similarity assessment have not been thoroughly investigated. In this paper, we propose SVSNet+, a model that integrates pre-trained SFM representations to improve speaker voice similarity assessment. Experimental results on the Voice Conversion Challenge 2018 and 2020 datasets show that SVSNet+ with WavLM representations significantly outperforms the baseline models. In addition, while fine-tuning WavLM on a small downstream-task dataset does not improve performance, using the same dataset to learn a weighted-sum representation of WavLM can substantially improve performance. Furthermore, when WavLM is replaced by other SFMs, SVSNet+ still outperforms the baseline models and exhibits strong generalization ability.
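The abstract contrasts fine-tuning WavLM with learning a weighted-sum representation of it. One common way to build such a representation (the SUPERB-style scheme) gives each SFM layer one learnable scalar, softmax-normalizes those scalars, and sums the per-layer features with the resulting weights, so only a handful of parameters are learned from the small dataset. The abstract does not state the exact formulation SVSNet+ uses, so treat the sketch below as an assumption-laden illustration. The function and parameter names (`weighted_layer_sum`, `layer_logits`) are hypothetical. Pure Python lists stand in for the tensors a real implementation would use, and in practice the logits would be trained jointly with the downstream model.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scalars."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_sum(
    hidden_states: Sequence[Sequence[Sequence[float]]],
    layer_logits: Sequence[float],
) -> List[List[float]]:
    """Combine per-layer SFM features into a single representation.

    Assumed (hypothetical) shapes:
      hidden_states: [num_layers][num_frames][feature_dim], one entry per SFM layer
      layer_logits:  [num_layers], one learnable scalar per layer
    Returns:
      [num_frames][feature_dim], the softmax-weighted sum over layers
    """
    if len(hidden_states) != len(layer_logits):
        raise ValueError("need exactly one logit per layer")
    weights = softmax(layer_logits)
    num_frames = len(hidden_states[0])
    dim = len(hidden_states[0][0])
    out = [[0.0] * dim for _ in range(num_frames)]
    # Accumulate each layer's features scaled by its normalized weight.
    for w, layer in zip(weights, hidden_states):
        for t in range(num_frames):
            for d in range(dim):
                out[t][d] += w * layer[t][d]
    return out


if __name__ == "__main__":
    # Two toy "layers", each with one frame of 2-dimensional features.
    layers = [[[1.0, 2.0]], [[3.0, 4.0]]]
    print(weighted_layer_sum(layers, [0.0, 0.0]))  # equal weights: layer mean
```

With equal logits the weights are uniform and the result is the mean of the layers, here `[[2.0, 3.0]]`. Pushing one logit much higher makes the output approach that single layer. This is why a weighted sum can adapt which layers matter without updating the SFM's own weights.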
Related papers
- SVSNet: An End-to-end Speaker Voice Similarity Assessment Model (2021)
- Leveraging Diverse Semantic-based Audio Pretrained Models For Singing Voice Conversion (2023)
- VISinger2+: End-to-end Singing Voice Synthesis Augmented By Self-supervised Learning Representation (2024)
- Adapting Speech Foundation Models For Unified Multimodal Speech Recognition With Large Language Models (2025)
- Audio-visual Representation Learning Via Knowledge Distillation From Speech Foundation Models (2025)
- MVNet: Memory Assistance And Vocal Reinforcement Network For Speech Enhancement (2022)
- XWSB: A Blend System Utilizing XLS-R And WavLM With SLS Classifier Detection System For SVDD 2024 Challenge (2024)
- Towards Supervised Performance On Speaker Verification With Self-supervised Learning By Leveraging Large-scale ASR Models (2024)