cluster #9
50 papers in this cluster (ordered by heat_score)
Papers
- Natural TTS Synthesis By Conditioning WaveNet On Mel Spectrogram Predictions (2017). Jonathan Shen, Ruoming Pang, Ron J. Weiss, et al. heat_score: 24.07
- Text-free Prosody-aware Generative Spoken Language Modeling (2021). Eugene Kharitonov, Ann Lee, Adam Polyak, et al. heat_score: 20.95
- VQMIVC: Vector Quantization And Mutual Information-based Unsupervised Speech Representation Disentanglement For One-shot Voice Conversion (2021). Disong Wang, Liqun Deng, Yu Ting Yeung, et al. heat_score: 20.31
- A Comparison Of Discrete And Soft Speech Units For Improved Voice Conversion (2021). Benjamin van Niekerk, Marc-André Carbonneau, Julian Zaïdi, et al. heat_score: 20.25
- Neural Speech Synthesis With Transformer Network (2018). Naihan Li, Shujie Liu, Yanqing Liu, et al. heat_score: 19.95
- An Overview Of Voice Conversion And Its Challenges: From Statistical Modeling To Deep Learning (2020). Berrak Sisman, Junichi Yamagishi, Simon King, et al. heat_score: 18.53
- Amphion: An Open-source Audio, Music And Speech Generation Toolkit (2023). Xueyao Zhang, Liumeng Xue, Yicheng Gu, et al. heat_score: 18.19
- Efficiently Trainable Text-to-speech System Based On Deep Convolutional Networks With Guided Attention (2017). Hideyuki Tachibana, Katsuya Uenoyama, Shunsuke Aihara. heat_score: 16.41
- Prompting The Hidden Talent Of Web-scale Speech Models For Zero-shot Task Generalization (2023). Puyuan Peng, Brian Yan, Shinji Watanabe, et al. heat_score: 16.38
- NaturalSpeech: End-to-end Text To Speech Synthesis With Human-level Quality (2022). Xu Tan, Jiawei Chen, Haohe Liu, et al. heat_score: 16.32
- FastPitch: Parallel Text-to-speech With Pitch Prediction (2020). Adrian Łańcucki. heat_score: 16.23
- Zero-shot Multi-speaker Text-to-speech With State-of-the-art Neural Speaker Embeddings (2019). Erica Cooper, Cheng-I Lai, Yusuke Yasuda, et al. heat_score: 15.67
- VERSA: A Versatile Evaluation Toolkit For Speech, Audio, And Music (2024). Jiatong Shi, Hye-Jin Shim, Jinchuan Tian, et al. heat_score: 15.28
- Learning To Speak Fluently In A Foreign Language: Multilingual Speech Synthesis And Cross-language Voice Cloning (2019). Yu Zhang, Ron J. Weiss, Heiga Zen, et al. heat_score: 15.03
- Matcha-TTS: A Fast TTS Architecture With Conditional Flow Matching (2023). Shivam Mehta, Ruibo Tu, Jonas Beskow, et al. heat_score: 14.73
- ACVAE-VC: Non-parallel Many-to-many Voice Conversion With Auxiliary Classifier Variational Autoencoder (2018). Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, et al. heat_score: 14.69
- Lightweight And High-fidelity End-to-end Text-to-speech With Multi-band Generation And Inverse Short-time Fourier Transform (2022). Masaya Kawamura, Yuma Shirahata, Ryuichi Yamamoto, et al. heat_score: 14.57
- Robust And Fine-grained Prosody Control Of End-to-end Speech Synthesis (2018). Younggun Lee, Taesu Kim. heat_score: 14.31
- AGAIN-VC: A One-shot Voice Conversion Using Activation Guidance And Adaptive Instance Normalization (2020). Yen-Hao Chen, Da-Yi Wu, Tsung-Han Wu, et al. heat_score: 14.27
- Mellotron: Multispeaker Expressive Voice Synthesis By Conditioning On Rhythm, Pitch And Global Style Tokens (2019). Rafael Valle, Jason Li, Ryan Prenger, et al. heat_score: 14.19
- AttS2S-VC: Sequence-to-sequence Voice Conversion With Attention And Context Preservation Mechanisms (2018). Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko, et al. heat_score: 14.15
- Predicting Expressive Speaking Style From Text In End-to-end Speech Synthesis (2018). Daisy Stanton, Yuxuan Wang, RJ Skerry-Ryan. heat_score: 14.11
- Any-to-many Voice Conversion With Location-relative Sequence-to-sequence Modeling (2020). Songxiang Liu, Yuewen Cao, Disong Wang, et al. heat_score: 14.02
- Non-parallel Sequence-to-sequence Voice Conversion With Disentangled Linguistic And Speaker Representations (2019). Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai. heat_score: 14.02
- Speech Recognition With Augmented Synthesized Speech (2019). Andrew Rosenberg, Yu Zhang, Bhuvana Ramabhadran, et al. heat_score: 13.97
- Expressive Speech Synthesis Via Modeling Expressions With Variational Autoencoder (2018). Kei Akuzawa, Yusuke Iwasawa, Yutaka Matsuo. heat_score: 13.88
- VQVC+: One-shot Voice Conversion By Vector Quantization And U-Net Architecture (2020). Da-Yi Wu, Yen-Hao Chen, Hung-Yi Lee. heat_score: 13.34
- Semi-supervised Training For Improving Data Efficiency In End-to-end Speech Synthesis (2018). Yu-An Chung, Yuxuan Wang, Wei-Ning Hsu, et al. heat_score: 13.28
- Fully-hierarchical Fine-grained Prosody Modeling For Interpretable Speech Synthesis (2020). Guangzhi Sun, Yu Zhang, Ron J. Weiss, et al. heat_score: 13.28
- Voice Transformer Network: Sequence-to-sequence Voice Conversion Using Transformer With Text-to-speech Pretraining (2019). Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, et al. heat_score: 13.17
- Location-relative Attention Mechanisms For Robust Long-form Speech Synthesis (2019). Eric Battenberg, RJ Skerry-Ryan, Soroosh Mariooryad, et al. heat_score: 13.11
- Unsupervised Text-to-speech Synthesis By Unsupervised Automatic Speech Recognition (2022). Junrui Ni, Liming Wang, Heting Gao, et al. heat_score: 12.92
- Wave-Tacotron: Spectrogram-free End-to-end Text-to-speech Synthesis (2020). Ron J. Weiss, RJ Skerry-Ryan, Eric Battenberg, et al. heat_score: 12.81
- Meta-TTS: Meta-learning For Few-shot Speaker Adaptive Text-to-speech (2021). Sung-Feng Huang, Chyi-Jiunn Lin, Da-Rong Liu, et al. heat_score: 12.74
- Expressive TTS Training With Frame And Style Reconstruction Loss (2020). Rui Liu, Berrak Sisman, Guanglai Gao, et al. heat_score: 12.74
- Transfer Learning From Speech Synthesis To Voice Conversion With Non-parallel Training Data (2020). Mingyang Zhang, Yi Zhou, Li Zhao, et al. heat_score: 12.74
- Parallel Tacotron 2: A Non-autoregressive Neural TTS Model With Differentiable Duration Modeling (2021). Isaac Elias, Heiga Zen, Jonathan Shen, et al. heat_score: 12.68
- ConvS2S-VC: Fully Convolutional Sequence-to-sequence Voice Conversion (2018). Hirokazu Kameoka, Kou Tanaka, Damian Kwasny, et al. heat_score: 12.68
- Parallel Tacotron: Non-autoregressive And Controllable TTS (2020). Isaac Elias, Heiga Zen, Jonathan Shen, et al. heat_score: 12.54
- FragmentVC: Any-to-any Voice Conversion By End-to-end Extracting And Fusing Fine-grained Voice Fragments With Attention (2020). Yist Y. Lin, Chung-Ming Chien, Jheng-Hao Lin, et al. heat_score: 12.54
- Generating Diverse And Natural Text-to-speech Samples Using A Quantized Fine-grained VAE And Auto-regressive Prosody Prior (2020). Guangzhi Sun, Yu Zhang, Ron J. Weiss, et al. heat_score: 12.54
- Investigation Of Enhanced Tacotron Text-to-speech Synthesis Systems With Self-attention For Pitch Accent Language (2018). Yusuke Yasuda, Xin Wang, Shinji Takaki, et al. heat_score: 12.54
- VITS2: Improving Quality And Efficiency Of Single-stage Text-to-speech With Adversarial Learning And Architecture Design (2023). Jungil Kong, Jihoon Park, Beomjeong Kim, et al. heat_score: 12.40
- AVQVC: One-shot Voice Conversion By Vector Quantization With Applying Contrastive Learning (2022). Huaizhen Tang, Xulong Zhang, Jianzong Wang, et al. heat_score: 12.40
- S2VC: A Framework For Any-to-any Voice Conversion With Self-supervised Pretrained Representations (2021). Jheng-Hao Lin, Yist Y. Lin, Chung-Ming Chien, et al. heat_score: 12.25
- FluentSpeech: Stutter-oriented Automatic Speech Editing With Context-aware Diffusion Models (2023). Ziyue Jiang, Qian Yang, Jialong Zuo, et al. heat_score: 12.13
- JETS: Jointly Training FastSpeech2 And HiFi-GAN For End To End Text To Speech (2022). Dan Lim, Sunghee Jung, Eesung Kim. heat_score: 12.10
- Forward Attention In Sequence-to-sequence Acoustic Modelling For Speech Synthesis (2018). Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai. heat_score: 12.10
- Attentron: Few-shot Text-to-speech Utilizing Attention-based Variable-length Embedding (2020). Seungwoo Choi, Seungju Han, Dongyoung Kim, et al. heat_score: 12.02
- SpeechX: Neural Codec Language Model As A Versatile Speech Transformer (2023). Xiaofei Wang, Manthan Thakker, Zhuo Chen, et al. heat_score: 11.85