Disentangling Content And Fine-grained Prosody Information Via Hybrid ASR Bottleneck Features For Voice Conversion
2022 · Xintao Zhao, Feng Liu, Changhe Song, et al.
Abstract
Non-parallel voice conversion (VC) has recently achieved considerable breakthroughs by introducing bottleneck features (BNFs) extracted by an automatic speech recognition (ASR) model. However, the choice of BNFs has a significant impact on the VC result. For example, when BNFs are extracted from an ASR model trained with cross-entropy loss (CE-BNFs) and fed into a neural network to train a VC system, the timbre similarity of the converted speech degrades significantly. If BNFs are instead extracted from an ASR model trained with Connectionist Temporal Classification loss (CTC-BNFs), the naturalness of the converted speech may decrease. This phenomenon is caused by the difference in the information contained in the two kinds of BNFs. In this paper, we propose an any-to-one VC method that uses hybrid bottleneck features extracted from CTC-BNFs and CE-BNFs so that their advantages complement each other. A gradient reversal layer and instance normalization are used to extract prosody information from CE-BNFs and content information from CTC-BNF
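The two building blocks named in the abstract, instance normalization and a gradient reversal layer, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not say where either component sits in the network, and the names `instance_norm`, `GradientReversal`, `lam`, and the toy feature values are all assumptions made for illustration. Instance normalization here removes per-utterance mean and variance from each feature channel, and the gradient reversal layer passes features through unchanged while flipping the sign of the gradient that flows back.

```python
import math


def instance_norm(frames, eps=1e-5):
    """Normalize each feature channel over the time axis of ONE utterance.

    frames: list of T frames, each a list of D floats (a hypothetical BNF matrix).
    """
    num_frames = len(frames)
    num_channels = len(frames[0])
    normalized = [[0.0] * num_channels for _ in range(num_frames)]
    for c in range(num_channels):
        column = [frame[c] for frame in frames]
        mean = sum(column) / num_frames
        var = sum((x - mean) ** 2 for x in column) / num_frames
        scale = 1.0 / math.sqrt(var + eps)
        for t in range(num_frames):
            normalized[t][c] = (column[t] - mean) * scale
    return normalized


class GradientReversal:
    """Identity in the forward pass; flips and scales the gradient in the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam  # reversal strength (assumed hyperparameter)

    def forward(self, x):
        # Features pass through unchanged.
        return list(x)

    def backward(self, grad_output):
        # The upstream gradient is negated and scaled by lam.
        return [-self.lam * g for g in grad_output]


# Toy utterance: 4 frames, 2 channels (values are made up).
bnf = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
normed = instance_norm(bnf)

grl = GradientReversal(lam=0.5)
features = grl.forward([0.2, -0.4])
reversed_grad = grl.backward([1.0, -2.0])
```

In the toy input the second channel is ten times the first, yet after normalization the two channels are nearly identical, which shows how per-utterance scale and offset are discarded. In a real system both pieces would normally come from a deep learning framework with automatic differentiation instead of the manual `backward` method used here.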
Related papers
- Expressive-VC: Highly Expressive Voice Conversion With Attention Fusion Of Bottleneck And Perturbation Features (2022) 9.03
- Assem-VC: Realistic Voice Conversion By Assembling Modern Speech Synthesis Techniques (2021) 11.64
- Building Multi Lingual TTS Using Cross Lingual Voice Conversion (2020) 0.00
- Building Bilingual And Code-switched Voice Conversion With Limited Training Data Using Embedding Consistency Loss (2021) 0.00
- Enriching Source Style Transfer In Recognition-synthesis Based Non-parallel Voice Conversion (2021) 9.23
- Adversarial Speaker Disentanglement Using Unannotated External Data For Self-supervised Representation Based Voice Conversion (2023) 6.34
- AC-VC: Non-parallel Low Latency Phonetic Posteriorgrams Based Voice Conversion (2021) 7.50
- DNN-based Cross-lingual Voice Conversion Using Bottleneck Features (2019) 3.58