RWEN-TTS: Relation-aware Word Encoding Network For Natural Text-to-speech Synthesis
2022 · Shinhyeok Oh, Hyeongrae Noh, Yoonseok Hong, et al.
Abstract
With the advent of deep learning, a large number of text-to-speech (TTS) models that produce human-like speech have emerged. Recently, various approaches have been proposed to enrich the naturalness and expressiveness of TTS models by introducing syntactic and semantic information about the input text. Although these strategies have shown impressive results, they still have some limitations in utilizing language information. First, most approaches only use graph networks to utilize syntactic and semantic information, without considering linguistic features. Second, most previous works do not explicitly consider adjacent words when encoding syntactic and semantic information, even though adjacent words are usually meaningful when encoding the current word. To address these issues, we propose the Relation-aware Word Encoding Network (RWEN), which effectively incorporates syntactic and semantic information through two modules (i.e., Semantic-level Relation Encoding and Adjacent Word
Related papers
- Enhancing Word-level Semantic Representation Via Dependency Structure For Expressive Text-to-speech Synthesis (2021) 0.00
- Using Synthetic Audio To Improve The Recognition Of Out-of-vocabulary Words In End-to-end ASR Systems (2020) 12.33
- Improved Neural Language Model Fusion For Streaming Recurrent Neural Network Transducer (2020) 8.82
- Graphspeech: Syntax-aware Graph Attention Network For Neural Speech Synthesis (2020) 7.50
- Neural Speech Synthesis With Transformer Network (2018) 19.95
- Environment Aware Text-to-speech Synthesis (2021) 6.34
- Utilizing Neural Transducers For Two-stage Text-to-speech Via Semantic Token Prediction (2024) 0.00
- Improving RNN Transducer Modeling For End-to-end Speech Recognition (2019) 0.00