Two-stage Training Method For Japanese Electrolaryngeal Speech Enhancement Based On Sequence-to-sequence Voice Conversion
2022 · Ding Ma, Lester Phillip Violeta, Kazuhiro Kobayashi, et al.
Abstract
Sequence-to-sequence (seq2seq) voice conversion (VC) models have greater potential for converting electrolaryngeal (EL) speech to normal speech (EL2SP) than conventional VC models. However, EL2SP based on seq2seq VC requires a sufficiently large amount of parallel data for model training, and it suffers significant performance degradation when the training data is insufficient. To address this issue, we propose a novel two-stage strategy to optimize the performance of EL2SP based on seq2seq VC when only a small parallel dataset is available. In contrast to previous studies, which rely on high-quality data augmentation, we first combine a large amount of imperfect synthetic parallel data of EL and normal speech with the original dataset for VC training. Then, a second training stage is conducted with the original parallel dataset only. The results show that the proposed method progressively improves the performance of EL2SP based on seq2seq VC.
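The two-stage schedule described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the pair format, the function names (`two_stage_schedule`, `run_two_stage`, `dummy_train`), and the dictionary used as model state are all hypothetical, and the training function is a stand-in that only records how many pairs each stage saw.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical pair format: (EL utterance id, normal-speech utterance id).
Pair = Tuple[str, str]


def two_stage_schedule(
    original: Sequence[Pair],
    synthetic: Sequence[Pair],
) -> List[Tuple[str, List[Pair]]]:
    """Return the training data used in each stage, in order."""
    stage1 = list(synthetic) + list(original)  # imperfect synthetic pairs + original pairs
    stage2 = list(original)                    # original parallel pairs only
    return [("stage1", stage1), ("stage2", stage2)]


def run_two_stage(
    model_state: dict,
    original: Sequence[Pair],
    synthetic: Sequence[Pair],
    train_fn: Callable[[dict, List[Pair]], dict],
) -> dict:
    """Train sequentially; stage 2 continues from the stage 1 state."""
    for _name, data in two_stage_schedule(original, synthetic):
        model_state = train_fn(model_state, data)
    return model_state


def dummy_train(state: dict, data: List[Pair]) -> dict:
    # Stand-in for a real seq2seq VC training loop (assumption):
    # it only appends the number of pairs seen in this stage.
    return {"history": state.get("history", []) + [len(data)]}


original_pairs = [("el_001", "sp_001"), ("el_002", "sp_002")]
synthetic_pairs = [(f"syn_el_{i}", f"syn_sp_{i}") for i in range(5)]
final_state = run_two_stage({}, original_pairs, synthetic_pairs, dummy_train)
print(final_state["history"])  # [7, 2]
```

In a real setup, `train_fn` would wrap the seq2seq VC training loop, and the second call would resume from the weights produced by the first.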
Related papers
- Limited Data Emotional Voice Conversion Leveraging Text-to-speech: Two-stage Sequence-to-sequence Training (2021)
- Non-parallel Sequence-to-sequence Voice Conversion With Disentangled Linguistic And Speaker Representations (2019)
- ATTS2S-VC: Sequence-to-sequence Voice Conversion With Attention And Context Preservation Mechanisms (2018)
- Any-to-one Sequence-to-sequence Voice Conversion Using Self-supervised Discrete Speech Representations (2020)
- ConvS2S-VC: Fully Convolutional Sequence-to-sequence Voice Conversion (2018)
- Voice Transformer Network: Sequence-to-sequence Voice Conversion Using Transformer With Text-to-speech Pretraining (2019)
- Speech Enhancement-assisted Voice Conversion In Noisy Environments (2021)
- Hierarchical Sequence To Sequence Voice Conversion With Limited Data (2019)