AttS2S-VC: Sequence-to-sequence Voice Conversion With Attention And Context Preservation Mechanisms
2018 · Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko, et al.
Abstract
This paper describes a method for voice conversion (VC) based on sequence-to-sequence (Seq2Seq) learning with attention and context preservation mechanisms. Seq2Seq has performed outstandingly in numerous sequence modeling tasks, such as speech synthesis and recognition, machine translation, and image captioning. In contrast to current VC techniques, our method 1) stabilizes and accelerates the training procedure by considering guided attention and the proposed context preservation losses, 2) allows not only spectral envelopes but also fundamental frequency contours and durations of speech to be converted, 3) requires no context information such as phoneme labels, and 4) requires no time-aligned source and target speech data in advance. In our experiment, the proposed VC framework can be trained in only one day, using only one GPU of an NVIDIA Tesla K80, while the quality of the synthesized speech is higher than that of speech converted by Gaussian mixture model-based VC and is co
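The abstract names a guided attention loss but does not give its formula. As an illustration only, the sketch below uses the penalty form commonly seen in attention-based speech synthesis, where attention mass far from the diagonal of the source-target alignment is penalized. The weight shape, the width parameter `g = 0.2`, and the function names are assumptions for this example, not details taken from the paper.

```python
import math


def guided_attention_weights(n_src, n_tgt, g=0.2):
    """Penalty matrix W[t][n]: 0 on the diagonal, approaching 1 far from it.

    Assumed form: W[t][n] = 1 - exp(-((n / n_src - t / n_tgt) ** 2) / (2 * g ** 2)).
    """
    weights = []
    for t in range(n_tgt):
        row = []
        for n in range(n_src):
            d = n / n_src - t / n_tgt
            row.append(1.0 - math.exp(-(d * d) / (2.0 * g * g)))
        weights.append(row)
    return weights


def guided_attention_loss(attention, g=0.2):
    """Mean of attention[t][n] * W[t][n] over all target/source positions."""
    n_tgt = len(attention)
    n_src = len(attention[0])
    w = guided_attention_weights(n_src, n_tgt, g)
    total = sum(
        attention[t][n] * w[t][n] for t in range(n_tgt) for n in range(n_src)
    )
    return total / (n_tgt * n_src)


# Toy 4x4 attention matrices (rows: target frames, columns: source frames).
diagonal = [[1.0 if n == t else 0.0 for n in range(4)] for t in range(4)]
anti_diagonal = [[1.0 if n == 3 - t else 0.0 for n in range(4)] for t in range(4)]

print(guided_attention_loss(diagonal))
print(guided_attention_loss(anti_diagonal))
```

Under this assumed form, a perfectly diagonal alignment incurs no penalty while an anti-diagonal one is penalized, which is the sense in which such a term can steer attention toward roughly monotonic alignments early in training.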
Related papers
- ConvS2S-VC: Fully Convolutional Sequence-to-sequence Voice Conversion (2018)
- FastS2S-VC: Streaming Non-autoregressive Sequence-to-sequence Voice Conversion (2021)
- Voice Conversion Using Sequence-to-sequence Learning Of Context Posterior Probabilities (2017)
- Any-to-one Sequence-to-sequence Voice Conversion Using Self-supervised Discrete Speech Representations (2020)
- Voice Transformer Network: Sequence-to-sequence Voice Conversion Using Transformer With Text-to-speech Pretraining (2019)
- AAS-VC: On The Generalization Ability Of Automatic Alignment Search Based Non-autoregressive Sequence-to-sequence Voice Conversion (2023)
- FastVC: Fast Voice Conversion With Non-parallel Data (2020)
- Towards General-purpose Text-instruction-guided Voice Conversion (2023)