Text-to-Speech
50 papers tagged Text-to-Speech (ordered by heat_score)
Papers
- ESPnet-TTS: Unified, Reproducible, And Integratable Open Source End-to-end Text-to-speech Toolkit (2019). Tomoki Hayashi, Ryuichi Yamamoto, Katsuki Inoue, et al. Heat score: 23.32
- ESPnet: End-to-end Speech Processing Toolkit (2018). Shinji Watanabe, Takaaki Hori, Shigeki Karita, et al. Heat score: 22.17
- Text-free Prosody-aware Generative Spoken Language Modeling (2021). Eugene Kharitonov, Ann Lee, Adam Polyak, et al. Heat score: 20.95
- LibriTTS: A Corpus Derived From LibriSpeech For Text-to-speech (2019). Heiga Zen, Viet Dang, Rob Clark, et al. Heat score: 20.79
- Joint CTC-attention Based End-to-end Speech Recognition Using Multi-task Learning (2016). Suyoun Kim, Takaaki Hori, Shinji Watanabe. Heat score: 20.43
- A Comparative Study On Transformer Vs RNN In Speech Applications (2019). Shigeki Karita, Nanxin Chen, Tomoki Hayashi, et al. Heat score: 20.07
- Neural Speech Synthesis With Transformer Network (2018). Naihan Li, Shujie Liu, Yanqing Liu, et al. Heat score: 19.95
- Light Gated Recurrent Units For Speech Recognition (2018). Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, et al. Heat score: 18.90
- Streaming End-to-end Speech Recognition For Mobile Devices (2018). Yanzhang He, Tara N. Sainath, Rohit Prabhavalkar, et al. Heat score: 18.87
- Transformer Transducer: A Streamable Speech Recognition Model With Transformer Encoders And RNN-T Loss (2020). Qian Zhang, Han Lu, Hasim Sak, et al. Heat score: 18.58
- Dual-path Transformer Network: Direct Context-aware Modeling For End-to-end Monaural Speech Separation (2020). Jingjing Chen, Qirong Mao, Dong Liu. Heat score: 18.24
- Amphion: An Open-source Audio, Music And Speech Generation Toolkit (2023). Xueyao Zhang, Liumeng Xue, Yicheng Gu, et al. Heat score: 18.19
- Multimodal Speech Emotion Recognition Using Audio And Text (2018). Seunghyun Yoon, Seokhyun Byun, Kyomin Jung. Heat score: 18.02
- Deep Learning Enabled Semantic Communications With Speech Recognition And Synthesis (2022). Zhenzi Weng, Zhijin Qin, Xiaoming Tao, et al. Heat score: 17.85
- One-class Learning Towards Synthetic Voice Spoofing Detection (2020). You Zhang, Fei Jiang, Zhiyao Duan. Heat score: 17.31
- WeNet: Production Oriented Streaming And Non-streaming End-to-end Speech Recognition Toolkit (2021). Zhuoyuan Yao, Di Wu, Xiong Wang, et al. Heat score: 17.27
- ContextNet: Improving Convolutional Neural Networks For Automatic Speech Recognition With Global Context (2020). Wei Han, Zhengdong Zhang, Yu Zhang, et al. Heat score: 17.24
- Dense CNN With Self-attention For Time-domain Speech Enhancement (2020). Ashutosh Pandey, Deliang Wang. Heat score: 16.59
- Efficiently Trainable Text-to-speech System Based On Deep Convolutional Networks With Guided Attention (2017). Hideyuki Tachibana, Katsuya Uenoyama, Shunsuke Aihara. Heat score: 16.41
- NaturalSpeech: End-to-end Text To Speech Synthesis With Human-level Quality (2022). Xu Tan, Jiawei Chen, Haohe Liu, et al. Heat score: 16.32
- FastPitch: Parallel Text-to-speech With Pitch Prediction (2020). Adrian Łańcucki. Heat score: 16.23
- Exploring Architectures, Data And Units For Streaming End-to-end Speech Recognition With RNN-transducer (2018). Kanishka Rao, Haşim Sak, Rohit Prabhavalkar. Heat score: 16.21
- WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus For Speech Recognition (2021). Binbin Zhang, Hang Lv, Pengcheng Guo, et al. Heat score: 16.12
- Speech Emotion Recognition With Dual-sequence LSTM Architecture (2019). Jianyou Wang, Michael Xue, Ryan Culhane, et al. Heat score: 15.78
- Zero-shot Multi-speaker Text-to-speech With State-of-the-art Neural Speaker Embeddings (2019). Erica Cooper, Cheng-I Lai, Yusuke Yasuda, et al. Heat score: 15.67
- Deep Context: End-to-end Contextual Speech Recognition (2018). Golan Pundak, Tara N. Sainath, Rohit Prabhavalkar, et al. Heat score: 15.57
- Self-training For End-to-end Speech Recognition (2019). Jacob Kahn, Ann Lee, Awni Hannun. Heat score: 15.48
- ASSERT: Anti-spoofing With Squeeze-excitation And Residual Networks (2019). Cheng-I Lai, Nanxin Chen, Jesús Villalba, et al. Heat score: 15.40
- VERSA: A Versatile Evaluation Toolkit For Speech, Audio, And Music (2024). Jiatong Shi, Hye-Jin Shim, Jinchuan Tian, et al. Heat score: 15.28
- Learning Alignment For Multimodal Emotion Recognition From Speech (2019). Haiyang Xu, Hui Zhang, Kun Han, et al. Heat score: 15.22
- Direct Speech-to-speech Translation With A Sequence-to-sequence Model (2019). Ye Jia, Ron J. Weiss, Fadi Biadsy, et al. Heat score: 15.13
- Learning To Speak Fluently In A Foreign Language: Multilingual Speech Synthesis And Cross-language Voice Cloning (2019). Yu Zhang, Ron J. Weiss, Heiga Zen, et al. Heat score: 15.03
- Exploring Neural Transducers For End-to-end Speech Recognition (2017). Eric Battenberg, Jitong Chen, Rewon Child, et al. Heat score: 14.90
- Improving Speaker Discrimination Of Target Speech Extraction With Time-domain SpeakerBeam (2020). Marc Delcroix, Tsubasa Ochiai, Katerina Zmolikova, et al. Heat score: 14.76
- Recent Advances In Speech Language Models: A Survey (2024). Wenqian Cui, Dianzhi Yu, Xiaoqi Jiao, et al. Heat score: 14.64
- A Spelling Correction Model For End-to-end Speech Recognition (2019). Jinxi Guo, Tara N. Sainath, Ron J. Weiss. Heat score: 14.62
- Deep Contextualized Acoustic Representations For Semi-supervised Speech Recognition (2019). Shaoshi Ling, Yuzong Liu, Julian Salazar, et al. Heat score: 14.62
- Speech Emotion Recognition Using Multi-hop Attention Mechanism (2019). Seunghyun Yoon, Seokhyun Byun, Subhadeep Dey, et al. Heat score: 14.58
- Lightweight And High-fidelity End-to-end Text-to-speech With Multi-band Generation And Inverse Short-time Fourier Transform (2022). Masaya Kawamura, Yuma Shirahata, Ryuichi Yamamoto, et al. Heat score: 14.57
- Gated Recurrent Fusion With Joint Training Framework For Robust End-to-end Speech Recognition (2020). Cunhang Fan, Jiangyan Yi, Jianhua Tao, et al. Heat score: 14.55
- Direct Modelling Of Speech Emotion From Raw Speech (2019). Siddique Latif, Rajib Rana, Sara Khalifa, et al. Heat score: 14.55
- Fast Conformer With Linearly Scalable Attention For Efficient Speech Recognition (2023). Dima Rekesh, Nithin Rao Koluguri, Samuel Kriman, et al. Heat score: 14.47
- A Comparison Of Techniques For Language Model Integration In Encoder-decoder Speech Recognition (2018). Shubham Toshniwal, Anjuli Kannan, Chung-Cheng Chiu, et al. Heat score: 14.39
- Contextual Audio-visual Switching For Speech Enhancement In Real-world Environments (2018). Ahsan Adeel, Mandar Gogate, Amir Hussain. Heat score: 14.35
- Speech2AffectiveGestures: Synthesizing Co-speech Gestures With Generative Adversarial Affective Expression Learning (2021). Uttaran Bhattacharya, Elizabeth Childs, Nicholas Rewkowski, et al. Heat score: 14.35
- FastDiff: A Fast Conditional Diffusion Model For High-quality Speech Synthesis (2022). Rongjie Huang, Max W. Y. Lam, Jun Wang, et al. Heat score: 14.35
- Robust And Fine-grained Prosody Control Of End-to-end Speech Synthesis (2018). Younggun Lee, Taesu Kim. Heat score: 14.31
- Speech Denoising With Deep Feature Losses (2018). Francois G. Germain, Qifeng Chen, Vladlen Koltun. Heat score: 14.23
- Speech2Vec: A Sequence-to-sequence Framework For Learning Word Embeddings From Speech (2018). Yu-An Chung, James Glass. Heat score: 14.15
- Bytes Are All You Need: End-to-end Multilingual Speech Recognition And Synthesis With Bytes (2018). Bo Li, Yu Zhang, Tara Sainath, et al. Heat score: 14.15