Music Generation
50 papers tagged Music Generation (ordered by heat_score)
Papers
- Audiolm: A Language Modeling Approach To Audio Generation (2022) | Zalán Borsos, Raphaël Marinier, Damien Vincent, et al. | 18.91
- Amphion: An Open-source Audio, Music And Speech Generation Toolkit (2023) | Xueyao Zhang, Liumeng Xue, Yicheng Gu, et al. | 18.19
- Storm: A Diffusion-based Stochastic Regeneration Model For Speech Enhancement And Dereverberation (2022) | Jean-Marie Lemercier, Julius Richter, Simon Welker, et al. | 15.43
- VERSA: A Versatile Evaluation Toolkit For Speech, Audio, And Music (2024) | Jiatong Shi, Hye-Jin Shim, Jinchuan Tian, et al. | 15.28
- Univnet: A Neural Vocoder With Multi-resolution Spectrogram Discriminators For High-fidelity Waveform Generation (2021) | Won Jang, Dan Lim, Jaesam Yoon, et al. | 14.80
- Deep Clustering And Conventional Networks For Music Separation: Stronger Together (2016) | Yi Luo, Zhuo Chen, John R. Hershey, et al. | 14.76
- Conditional LSTM-GAN For Melody Generation From Lyrics (2019) | Yi Yu, Abhishek Srivastava, Simon Canales | 14.69
- Lightweight And High-fidelity End-to-end Text-to-speech With Multi-band Generation And Inverse Short-time Fourier Transform (2022) | Masaya Kawamura, Yuma Shirahata, Ryuichi Yamamoto, et al. | 14.57
- Musicldm: Enhancing Novelty In Text-to-music Generation Using Beat-synchronous Mixup Strategies (2023) | Ke Chen, Yusong Wu, Haohe Liu, et al. | 13.55
- Audiosetcaps: An Enriched Audio-caption Dataset Using Automated Generation Pipeline With Large Audio And Language Models (2024) | Jisheng Bai, Haohe Liu, Mou Wang, et al. | 13.44
- Ms-sincresnet: Joint Learning Of 1D And 2D Kernels Using Multi-scale Sincnet And Resnet For Music Genre Classification (2021) | Pei-Chun Chang, Yong-Sheng Chen, Chang-Hsing Lee | 13.13
- Waveform Modeling And Generation Using Hierarchical Recurrent Neural Networks For Speech Bandwidth Extension (2018) | Zhen-Hua Ling, Yang Ai, Yu Gu, et al. | 12.99
- Lpips-attnwav2lip: Generic Audio-driven Lip Synchronization For Talking Head Generation In The Wild (2026) | Zhipeng Chen, Xinheng Wang, Lun Xie, et al. | 12.65
- VX2TEXT: End-to-end Learning Of Video-based Text Generation From Multimodal Inputs (2021) | Xudong Lin, Gedas Bertasius, Jue Wang, et al. | 12.17
- Weakly Supervised Deep Recurrent Neural Networks For Basic Dance Step Generation (2018) | Nelson Yalta, Shinji Watanabe, Kazuhiro Nakadai, et al. | 12.17
- Music Artist Classification With Convolutional Recurrent Neural Networks (2019) | Zain Nasrullah, Yue Zhao | 11.93
- Voxinstruct: Expressive Human Instruction-to-speech Generation With Unified Multilingual Codec Language Modelling (2024) | Yixuan Zhou, Xiaoyu Qin, Zeyu Jin, et al. | 11.81
- Mustango: Toward Controllable Text-to-music Generation (2023) | Jan Melechovsky, Zixun Guo, Deepanway Ghosal, et al. | 11.67
- The Effect Of Explicit Structure Encoding Of Deep Neural Networks For Symbolic Music Generation (2018) | Ke Chen, Weilin Zhang, Shlomo Dubnov, et al. | 11.49
- Auffusion: Leveraging The Power Of Diffusion And Large Language Models For Text-to-audio Generation (2024) | Jinlong Xue, Yayue Deng, Yingming Gao, et al. | 11.19
- Diverse And Aligned Audio-to-video Generation Via Text-to-video Model Adaptation (2023) | Guy Yariv, Itai Gat, Sagie Benaim, et al. | 11.19
- Emotiongesture: Audio-driven Diverse Emotional Co-speech 3D Gesture Generation (2023) | Xingqun Qi, Chen Liu, Lincheng Li, et al. | 10.97
- Speechbertscore: Reference-aware Automatic Evaluation Of Speech Generation Leveraging NLP Evaluation Metrics (2024) | Takaaki Saeki, Soumi Maiti, Shinnosuke Takamichi, et al. | 10.74
- Singgan: Generative Adversarial Network For High-fidelity Singing Voice Generation (2021) | Rongjie Huang, Chenye Cui, Feiyang Chen, et al. | 10.61
- Audio-based Music Classification With Densenet And Data Augmentation (2019) | Wenhao Bian, Jie Wang, Bojin Zhuang, et al. | 10.48
- Probability Density Distillation With Generative Adversarial Networks For High-quality Parallel Waveform Generation (2019) | Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim | 10.48
- Improving Trajectory Modelling For Dnn-based Speech Synthesis By Using Stacked Bottleneck Features And Minimum Generation Error Training (2016) | Zhizheng Wu, Simon King | 10.35
- Semi-recurrent Cnn-based VAE-GAN For Sequential Data Generation (2018) | Mohammad Akbari, Jie Liang | 10.21
- Diffusion-based Co-speech Gesture Generation Using Joint Text And Audio Representation (2023) | Anna Deichler, Shivam Mehta, Simon Alexanderson, et al. | 10.07
- Diffprosody: Diffusion-based Latent Prosody Generation For Expressive Speech Synthesis With Prosody Conditional Adversarial Training (2023) | Hyung-Seok Oh, Sang-Hoon Lee, Seong-Whan Lee | 10.07
- Audiotoken: Adaptation Of Text-conditioned Diffusion Models For Audio-to-image Generation (2023) | Guy Yariv, Itai Gat, Lior Wolf, et al. | 9.76
- Phase-aware Music Super-resolution Using Generative Adversarial Networks (2020) | Shichao Hu, Bin Zhang, Beici Liang, et al. | 9.59
- Muscaps: Generating Captions For Music Audio (2021) | Ilaria Manco, Emmanouil Benetos, Elio Quinton, et al. | 9.59
- Freetalker: Controllable Speech And Text-driven Gesture Generation Based On Diffusion Models For Enhanced Speaker Naturalness (2024) | Sicheng Yang, Zunnan Xu, Haiwei Xue, et al. | 9.59
- Unconditional Audio Generation With Generative Adversarial Networks And Cycle Regularization (2020) | Jen-Yu Liu, Yu-Hua Chen, Yin-Cheng Yeh, et al. | 9.41
- S2IGAN: Speech-to-image Generation Via Adversarial Learning (2020) | Xinsheng Wang, Tingting Qiao, Jihua Zhu, et al. | 9.23
- Musilingo: Bridging Music And Text With Pre-trained Language Models For Music Captioning And Query Response (2023) | Zihao Deng, Yinghao Ma, Yudong Liu, et al. | 9.03
- JEN-1: Text-guided Universal Music Generation With Omnidirectional Diffusion Models (2023) | Peike Li, Boyu Chen, Yao Yao, et al. | 9.03
- Espnet-codec: Comprehensive Training And Evaluation Of Neural Codecs For Audio, Music, And Speech (2024) | Jiatong Shi, Jinchuan Tian, Yihan Wu, et al. | 9.03
- Generating Lead Sheets With Affect: A Novel Conditional Seq2seq Framework (2021) | Dimos Makris, Kat R. Agres, Dorien Herremans | 8.60
- High-fidelity Audio Generation And Representation Learning With Guided Adversarial Autoencoder (2020) | Kazi Nazmul Haque, Rajib Rana, Björn W Schuller | 8.35
- Waveform Generation For Text-to-speech Synthesis Using Pitch-synchronous Multi-scale Generative Adversarial Networks (2018) | Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, et al. | 8.35
- Music Genre Classification Using Spectral Analysis And Sparse Representation Of The Signals (2018) | Mehdi Banitalebi-Dehkordi, Amin Banitalebi-Dehkordi | 8.09
- Adversarial Speech For Voice Privacy Protection From Personalized Speech Generation (2024) | Shihao Chen, Liping Chen, Jie Zhang, et al. | 8.09
- Lvcnet: Efficient Condition-dependent Modeling Network For Waveform Generation (2021) | Zhen Zeng, Jianzong Wang, Ning Cheng, et al. | 8.09
- Specdiff-gan: A Spectrally-shaped Noise Diffusion GAN For Speech And Music Synthesis (2024) | Teysir Baoueb, Haocheng Liu, Mathieu Fontaine, et al. | 7.81
- Omniflow: Any-to-any Generation With Multi-modal Rectified Flows (2024) | Shufan Li, Konstantinos Kallidromitis, Akash Gokul, et al. | 7.78
- Melody-conditioned Lyrics Generation With Seqgans (2020) | Yihao Chen, Alexander Lerch | 7.50
- Improving Adversarial Waveform Generation Based Singing Voice Conversion With Harmonic Signals (2022) | Haohan Guo, Zhiping Zhou, Fanbo Meng, et al. | 7.50
- Ezaudio: Enhancing Text-to-audio Generation With Efficient Diffusion Transformer (2024) | Jiarui Hai, Yong Xu, Hao Zhang, et al. | 7.50