ConvS2S-VC: Fully Convolutional Sequence-to-sequence Voice Conversion
2018 · Hirokazu Kameoka, Kou Tanaka, Damian Kwasny, et al.
Abstract
This paper proposes a voice conversion (VC) method using sequence-to-sequence (seq2seq or S2S) learning, which flexibly converts not only the voice characteristics but also the pitch contour and duration of input speech. The proposed method, called ConvS2S-VC, has three key features. First, it uses a model with a fully convolutional architecture. This is particularly advantageous in that it is suitable for parallel computations using GPUs. It is also beneficial since it enables effective normalization techniques such as batch normalization to be used for all the hidden layers in the networks. Second, it achieves many-to-many conversion by simultaneously learning mappings among multiple speakers using only a single model instead of separately learning mappings between each speaker pair using a different model. This enables the model to fully utilize available training data collected from multiple speakers by capturing common latent features that can be shared across different speakers.
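The saving from the second feature can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes a pairwise approach trains one model per directed source-to-target pair, and the function name `pairwise_model_count` is hypothetical.

```python
def pairwise_model_count(num_speakers: int) -> int:
    """Count directed source->target speaker pairs.

    Assumption (not from the paper): a pairwise approach trains one
    model per conversion direction, so A->B and B->A count separately.
    """
    if num_speakers < 2:
        return 0
    return num_speakers * (num_speakers - 1)


# A many-to-many model covers every pair with a single network.
MANY_TO_MANY_MODEL_COUNT = 1

for n in (2, 4, 10):
    print(f"{n} speakers: {pairwise_model_count(n)} pairwise models "
          f"vs {MANY_TO_MANY_MODEL_COUNT} many-to-many model")
```

Under this assumption the pairwise count grows quadratically with the number of speakers, and each pairwise model sees only the data for its own two speakers, whereas the single model is trained on all of it.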
Related papers
- FastS2S-VC: Streaming Non-autoregressive Sequence-to-sequence Voice Conversion (2021) · 0.00
- AttS2S-VC: Sequence-to-sequence Voice Conversion With Attention And Context Preservation Mechanisms (2018) · 14.15
- Voice Reenactment With F0 And Timing Constraints And Adversarial Learning Of Conversions (2021) · 2.26
- Voice Conversion Using Sequence-to-sequence Learning Of Context Posterior Probabilities (2017) · 11.39
- S2VC: A Framework For Any-to-any Voice Conversion With Self-supervised Pretrained Representations (2021) · 12.25
- Any-to-one Sequence-to-sequence Voice Conversion Using Self-supervised Discrete Speech Representations (2020) · 0.00
- Transfer Learning From Speech Synthesis To Voice Conversion With Non-parallel Training Data (2020) · 12.74
- ConVoice: Real-time Zero-shot Voice Style Transfer With Convolutional Network (2020) · 0.00