Investigation Of Learning Abilities On Linguistic Features In Sequence-to-sequence Text-to-speech Synthesis
2020 · Yusuke Yasuda, Xin Wang, Junichi Yamagishi
Abstract
Neural sequence-to-sequence text-to-speech synthesis (TTS) can produce high-quality speech directly from text or simple linguistic features such as phonemes. Unlike traditional pipeline TTS, the neural sequence-to-sequence TTS does not require manually annotated and complicated linguistic features such as part-of-speech tags and syntactic structures for system training. However, it must be carefully designed and well optimized so that it can implicitly extract useful linguistic features from the input features. In this paper we investigate under what conditions the neural sequence-to-sequence TTS can work well in Japanese and English along with comparisons with deep neural network (DNN) based pipeline TTS systems. Unlike past comparative studies, the pipeline systems also use autoregressive probabilistic modeling and a neural vocoder. We investigated systems from three aspects: a) model architecture, b) model parameter size, and c) language. For the model architecture aspect, we adopt
Related papers
- Applying Syntax–prosody Mapping Hypothesis And Prosodic Well-formedness Constraints To Neural Sequence-to-sequence Speech Synthesis (2022) 0.00
- Investigation Of Enhanced Tacotron Text-to-speech Synthesis Systems With Self-attention For Pitch Accent Language (2018) 12.54
- On The Problem Of Text-to-speech Model Selection For Synthetic Data Generation In Automatic Speech Recognition (2024) 4.52
- Sequence To Sequence Neural Speech Synthesis With Prosody Modification Capabilities (2019) 9.59
- Investigating Context Features Hidden In End-to-end TTS (2018) 0.00
- Neural Hmms Are All You Need (for High-quality Attention-free TTS) (2021) 7.50
- Evaluating Text-to-speech Synthesis From A Large Discrete Token-based Speech Language Model (2024) 0.00
- Robust Sequence-to-sequence Acoustic Modeling With Stepwise Monotonic Attention For Neural TTS (2019) 11.49