Reimagining Speech: A Scoping Review Of Deep Learning-powered Voice Conversion
2023 · Anders R. Bargum, Stefania Serafin, Cumhur Erkut
Abstract
Research on deep learning-powered voice conversion (VC) in speech-to-speech scenarios is becoming increasingly popular. Although many works in the field of voice conversion share a common global pipeline, there is considerable diversity in the underlying structures, methods, and neural sub-blocks used across research efforts. Thus, understanding why particular methods are chosen at each stage of the voice conversion pipeline can be challenging, and the actual hurdles in the proposed solutions are often unclear. To shed light on these aspects, this paper presents a scoping review that explores the use of deep learning in speech analysis, synthesis, and disentangled speech representation learning within modern voice conversion systems. We screened 621 publications from more than 38 different venues between 2017 and 2023, followed by an in-depth review of a final database of 123 eligible studies. Based on the review, we summ
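The abstract describes a shared global pipeline of speech analysis, a (disentangled) speech representation, and synthesis. The sketch below is a minimal, non-neural illustration of that decomposition. It assumes per-frame features (for example log-F0 or mel-spectrogram frames) have already been extracted, and that a separate vocoder would turn the output back into audio. In place of learned blocks, it uses the classic mean-variance (Gaussian) normalization baseline: utterance statistics stand in for a speaker embedding, and the normalized trajectory stands in for speaker-independent content. All function names are hypothetical and do not come from any of the reviewed systems.

```python
from statistics import fmean, pstdev
from typing import List, Tuple

Frame = List[float]                       # one feature vector per analysis frame (assumed pre-extracted)
Stats = Tuple[List[float], List[float]]   # per-dimension (mean, std): toy stand-in for a speaker embedding


def speaker_stats(frames: List[Frame]) -> Stats:
    """Utterance-level statistics, a hypothetical stand-in for a learned speaker encoder."""
    dims = list(zip(*frames))                 # transpose: one tuple per feature dimension
    means = [fmean(d) for d in dims]
    stds = [pstdev(d) or 1.0 for d in dims]   # replace zero variance with 1.0 to avoid division by zero
    return means, stds


def content(frames: List[Frame], stats: Stats) -> List[Frame]:
    """Speaker-normalized trajectory, a hypothetical stand-in for a learned content encoder."""
    means, stds = stats
    return [[(x - m) / s for x, m, s in zip(f, means, stds)] for f in frames]


def decode(content_frames: List[Frame], stats: Stats) -> List[Frame]:
    """Re-apply target-speaker statistics, a hypothetical stand-in for a neural decoder."""
    means, stds = stats
    return [[z * s + m for z, m, s in zip(f, means, stds)] for f in content_frames]


def convert(source: List[Frame], target_reference: List[Frame]) -> List[Frame]:
    """Source content + target speaker representation -> converted features (a vocoder would follow)."""
    return decode(content(source, speaker_stats(source)), speaker_stats(target_reference))
```

For example, converting the one-dimensional source frames `[[1.0], [2.0], [3.0]]` toward the reference `[[10.0], [20.0], [30.0]]` yields frames whose mean and standard deviation match the reference, here approximately `[[10.0], [20.0], [30.0]]`. In deep learning-powered systems, learned encoders and decoders take the place of these fixed first- and second-order statistics.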
Related papers
- An Overview Of Voice Conversion And Its Challenges: From Statistical Modeling To Deep Learning (2020)
- Generative Adversarial Network Based Voice Conversion: Techniques, Challenges, And Recent Advancements (2025)
- The Voice Conversion Challenge 2018: Promoting Development Of Parallel And Nonparallel Methods (2018)
- An Overview & Analysis Of Sequence-to-sequence Emotional Voice Conversion (2022)
- Beyond Voice Identity Conversion: Manipulating Voice Attributes By Adversarial Learning Of Structured Disentangled Representations (2021)
- How Far Are We From Robust Voice Conversion: A Survey (2020)
- A Comparative Study Of Self-supervised Speech Representation Based Voice Conversion (2022)
- Assem-VC: Realistic Voice Conversion By Assembling Modern Speech Synthesis Techniques (2021)