FastS2S-VC: Streaming Non-autoregressive Sequence-to-sequence Voice Conversion
2021 · Hirokazu Kameoka, Kou Tanaka, Takuhiro Kaneko
Abstract
This paper proposes a non-autoregressive extension of our previously proposed sequence-to-sequence (S2S) model-based voice conversion (VC) methods. S2S model-based VC methods have attracted particular attention in recent years for their flexibility in converting not only the voice identity but also the pitch contour and local duration of input speech, which the encoder-decoder architecture with an attention mechanism makes possible. However, one obstacle to making these methods work in real time is their autoregressive (AR) structure. To overcome this obstacle, we develop a method, based on a teacher-student learning framework, for obtaining a model that is free from an AR structure yet behaves similarly to the original S2S models. In our method, called "FastS2S-VC", the student model consists of an encoder, a decoder, and an attention predictor. The attention predictor learns to predict attention distributions solely from source speech along with a target class index with the guidance o
Related papers
- ConvS2S-VC: Fully Convolutional Sequence-to-sequence Voice Conversion (2018)
- AttS2S-VC: Sequence-to-sequence Voice Conversion With Attention And Context Preservation Mechanisms (2018)
- FastVC: Fast Voice Conversion With Non-parallel Data (2020)
- Voice Reenactment With F0 And Timing Constraints And Adversarial Learning Of Conversions (2021)
- AAS-VC: On The Generalization Ability Of Automatic Alignment Search Based Non-autoregressive Sequence-to-sequence Voice Conversion (2023)
- S2VC: A Framework For Any-to-any Voice Conversion With Self-supervised Pretrained Representations (2021)
- FastSVC: Fast Cross-domain Singing Voice Conversion With Feature-wise Linear Modulation (2020)
- Any-to-one Sequence-to-sequence Voice Conversion Using Self-supervised Discrete Speech Representations (2020)