cluster #4
50 papers in this cluster (ordered by heat_score)
Papers
- Diffsinger: Singing Voice Synthesis Via Shallow Diffusion Mechanism (2021) | Jinglin Liu, Chengxi Li, Yi Ren, et al. | heat_score 23.76
- Convolutional Recurrent Neural Networks For Music Classification (2016) | Keunwoo Choi, George Fazekas, Mark Sandler, et al. | heat_score 18.98
- Stargan-vc: Non-parallel Many-to-many Voice Conversion With Star Generative Adversarial Networks (2018) | Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, et al. | heat_score 18.09
- Cyclegan-vc2: Improved Cyclegan-based Non-parallel Voice Conversion (2019) | Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, et al. | heat_score 17.45
- Voice Conversion From Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks (2017) | Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, et al. | heat_score 16.34
- Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks (2017) | Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari | heat_score 16.21
- Muq: Self-supervised Music Representation Learning With Mel Residual Vector Quantization (2025) | Haina Zhu, Yizhi Zhou, Hangting Chen, et al. | heat_score 15.66
- Singing Voice Data Scaling-up: An Introduction To Ace-opencpop And Ace-kising (2024) | Jiatong Shi, Yueqian Lin, Xinyi Bai, et al. | heat_score 15.48
- Multi-level And Multi-scale Feature Aggregation Using Pre-trained Convolutional Neural Networks For Music Auto-tagging (2017) | Jongpil Lee, Juhan Nam | heat_score 15.43
- Mmdenselstm: An Efficient Combination Of Convolutional And Recurrent Neural Networks For Audio Source Separation (2018) | Naoya Takahashi, Nabarun Goswami, Yuki Mitsufuji | heat_score 15.28
- CMGAN: Conformer-based Metric GAN For Speech Enhancement (2022) | Ruizhe Cao, Sherif Abdulatif, Bin Yang | heat_score 15.13
- Joint Robust Voicing Detection And Pitch Estimation Based On Residual Harmonics (2019) | Thomas Drugman, Abeer Alwan | heat_score 14.93
- CMGAN: Conformer-based Metric-gan For Monaural Speech Enhancement (2022) | Sherif Abdulatif, Ruizhe Cao, Bin Yang | heat_score 14.80
- Univnet: A Neural Vocoder With Multi-resolution Spectrogram Discriminators For High-fidelity Waveform Generation (2021) | Won Jang, Dan Lim, Jaesam Yoon, et al. | heat_score 14.80
- Conditional LSTM-GAN For Melody Generation From Lyrics (2019) | Yi Yu, Abhishek Srivastava, Simon Canales | heat_score 14.69
- Fast Spectrogram Inversion Using Multi-head Convolutional Neural Networks (2018) | Sercan O. Arik, Heewoo Jun, Gregory Diamos | heat_score 14.39
- Fastdiff: A Fast Conditional Diffusion Model For High-quality Speech Synthesis (2022) | Rongjie Huang, Max W. Y. Lam, Jun Wang, et al. | heat_score 14.35
- Vid2speech: Speech Reconstruction From Silent Video (2017) | Ariel Ephrat, Shmuel Peleg | heat_score 14.15
- Deep Cross-modal Correlation Learning For Audio And Lyrics In Music Retrieval (2017) | Yi Yu, Suhua Tang, Francisco Raposo, et al. | heat_score 14.06
- Cyclegan-vc3: Examining And Improving Cyclegan-vcs For Mel-spectrogram Conversion (2020) | Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, et al. | heat_score 14.02
- NNSVS: A Neural Network-based Singing Voice Synthesis Toolkit (2022) | Ryuichi Yamamoto, Reo Yoneyama, Tomoki Toda | heat_score 13.83
- Nnaudio: An On-the-fly GPU Audio To Spectrogram Conversion Toolbox Using 1D Convolution Neural Networks (2019) | Kin Wai Cheuk, Hans Anderson, Kat Agres, et al. | heat_score 13.70
- Starganv2-vc: A Diverse, Unsupervised, Non-parallel Framework For Natural-sounding Voice Conversion (2021) | Yinghao Aaron Li, Ali Zare, Nima Mesgarani | heat_score 13.70
- Multi-target Voice Conversion Without Parallel Data By Adversarially Learning Disentangled Audio Representations (2018) | Ju-Chieh Chou, Cheng-Chieh Yeh, Hung-Yi Lee, et al. | heat_score 13.60
- Musicldm: Enhancing Novelty In Text-to-music Generation Using Beat-synchronous Mixup Strategies (2023) | Ke Chen, Yusong Wu, Haohe Liu, et al. | heat_score 13.55
- Learning Latent Representations For Speech Generation And Transformation (2017) | Wei-Ning Hsu, Yu Zhang, James Glass | heat_score 13.50
- Improved Speech Reconstruction From Silent Video (2017) | Ariel Ephrat, Tavi Halperin, Shmuel Peleg | heat_score 13.34
- Istftnet: Fast And Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier Transform (2022) | Takuhiro Kaneko, Kou Tanaka, Hirokazu Kameoka, et al. | heat_score 13.34
- Opencpop: A High-quality Open Source Chinese Popular Song Corpus For Singing Voice Synthesis (2022) | Yu Wang, Xinsheng Wang, Pengcheng Zhu, et al. | heat_score 13.34
- Multi-singer: Fast Multi-singer Singing Voice Vocoder With A Large-scale Corpus (2021) | Rongjie Huang, Feiyang Chen, Yi Ren, et al. | heat_score 13.28
- Voice Impersonation Using Generative Adversarial Networks (2018) | Yang Gao, Rita Singh, Bhiksha Raj | heat_score 13.23
- Sample-level CNN Architectures For Music Auto-tagging Using Raw Waveforms (2017) | Taejun Kim, Jongpil Lee, Juhan Nam | heat_score 13.23
- F0-consistent Many-to-many Non-parallel Voice Conversion Via Conditional Autoencoder (2020) | Kaizhi Qian, Zeyu Jin, Mark Hasegawa-Johnson, et al. | heat_score 13.17
- Ms-sincresnet: Joint Learning Of 1D And 2D Kernels Using Multi-scale Sincnet And Resnet For Music Genre Classification (2021) | Pei-Chun Chang, Yong-Sheng Chen, Chang-Hsing Lee | heat_score 13.13
- End-to-end Lyrics Alignment For Polyphonic Music Using An Audio-to-character Recognition Model (2019) | Daniel Stoller, Simon Durand, Sebastian Ewert | heat_score 13.11
- Stylemelgan: An Efficient High-fidelity Adversarial Vocoder With Temporal Adaptive Normalization (2020) | Ahmed Mustafa, Nicola Pia, Guillaume Fuchs | heat_score 13.05
- Visinger: Variational Inference With Adversarial Learning For End-to-end Singing Voice Synthesis (2021) | Yongmao Zhang, Jian Cong, Heyang Xue, et al. | heat_score 12.99
- A Streamlined Encoder/decoder Architecture For Melody Extraction (2018) | Tsung-Han Hsieh, Li Su, Yi-Hsuan Yang | heat_score 12.68
- Lpips-attnwav2lip: Generic Audio-driven Lip Synchronization For Talking Head Generation In The Wild (2026) | Zhipeng Chen, Xinheng Wang, Lun Xie, et al. | heat_score 12.65
- Wasserstein GAN And Waveform Loss-based Acoustic Model Training For Multi-speaker Text-to-speech Synthesis Systems Using A Wavenet Vocoder (2018) | Yi Zhao, Shinji Takaki, Hieu-Thi Luong, et al. | heat_score 12.61
- Real-time Speech Frequency Bandwidth Extension (2020) | Yunpeng Li, Marco Tagliasacchi, Oleg Rybakov, et al. | heat_score 12.54
- Xiaoicesing: A High-quality And Integrated Singing Voice Synthesis System (2020) | Peiling Lu, Jie Wu, Jian Luan, et al. | heat_score 12.54
- Vocgan: A High-fidelity Real-time Vocoder With A Hierarchically-nested Adversarial Network (2020) | Jinhyeok Yang, Junmo Lee, Youngik Kim, et al. | heat_score 12.54
- Muskits-espnet: A Comprehensive Toolkit For Singing Voice Synthesis In New Paradigm (2024) | Yuning Wu, Jiatong Shi, Yifeng Yu, et al. | heat_score 12.50
- Vocal Melody Extraction Using Patch-based CNN (2018) | Li Su | heat_score 12.47
- Bigvsan: Enhancing Gan-based Neural Vocoders With Slicing Adversarial Network (2023) | Takashi Shibuya, Yuhta Takida, Yuki Mitsufuji | heat_score 12.43
- Nu-wave: A Diffusion Probabilistic Model For Neural Audio Upsampling (2021) | Junhyeok Lee, Seungu Han | heat_score 12.40
- Speech Waveform Synthesis From MFCC Sequences With Generative Adversarial Networks (2018) | Lauri Juvela, Bajibabu Bollepalli, Xin Wang, et al. | heat_score 12.25
- Neural Vocoder Is All You Need For Speech Super-resolution (2022) | Haohe Liu, Woosung Choi, Xubo Liu, et al. | heat_score 12.25
- Nu-wave 2: A General Neural Audio Upsampling Model For Various Sampling Rates (2022) | Seungu Han, Junhyeok Lee | heat_score 12.17