cluster #3
50 papers in this cluster (ordered by heat_score, descending)
Papers
- Wavcaps: A Chatgpt-assisted Weakly-labelled Audio Captioning Dataset For Audio-language Multimodal Research (2023) | Xinhao Mei, Chutong Meng, Haohe Liu, et al. | 20.69
- Large-scale Contrastive Language-audio Pretraining With Feature Fusion And Keyword-to-caption Augmentation (2022) | Yusong Wu, Ke Chen, Tianyu Zhang, et al. | 19.60
- Words Can Shift: Dynamically Adjusting Word Representations Using Nonverbal Behaviors (2018) | Yansen Wang, Ying Shen, Zhun Liu, et al. | 18.79
- Dawn Of The Transformer Era In Speech Emotion Recognition: Closing The Valence Gap (2022) | Johannes Wagner, Andreas Triantafyllopoulos, Hagen Wierstorf, et al. | 18.59
- Multimodal Speech Emotion Recognition Using Audio And Text (2018) | Seunghyun Yoon, Seokhyun Byun, Kyomin Jung | 18.02
- A Joint Cross-attention Model For Audio-visual Fusion In Dimensional Emotion Recognition (2022) | R. Gnana Praveen, Wheidima Carneiro de Melo, Nasib Ullah, et al. | 18.00
- Clotho: An Audio Captioning Dataset (2019) | Konstantinos Drossos, Samuel Lipping, Tuomas Virtanen | 17.70
- Multimodal Transformer Networks For End-to-end Video-grounded Dialogue Systems (2019) | Hung Le, Doyen Sahoo, Nancy F. Chen, et al. | 17.12
- Speech Emotion Recognition With Global-aware Fusion On Multi-scale Feature Representation (2022) | Wenjing Zhu, Xiang Li | 16.53
- Seen And Unseen Emotional Style Transfer For Voice Conversion With A New Emotional Speech Dataset (2020) | Kun Zhou, Berrak Sisman, Rui Liu, et al. | 16.34
- Emotional Voice Conversion: Theory, Databases And ESD (2021) | Kun Zhou, Berrak Sisman, Rui Liu, et al. | 16.30
- Speech Emotion Recognition With Co-attention Based Multi-level Acoustic Information (2022) | Heqing Zou, Yuke Si, Chen Chen, et al. | 16.17
- Attentive Convolutional Neural Network Based Speech Emotion Recognition: A Study On The Impact Of Input Features, Signal Length, And Acted Speech (2017) | Michael Neumann, Ngoc Thang Vu | 16.14
- Emotion2vec: Self-supervised Pre-training For Speech Emotion Representation (2023) | Ziyang Ma, Zhisheng Zheng, Jiaxin Ye, et al. | 15.88
- End-to-end Generative Pretraining For Multimodal Video Captioning (2022) | Paul Hongsuck Seo, Arsha Nagrani, Anurag Arnab, et al. | 15.85
- Multi-modal Dense Video Captioning (2020) | Vladimir Iashin, Esa Rahtu | 15.80
- Speech Emotion Recognition With Dual-sequence LSTM Architecture (2019) | Jianyou Wang, Michael Xue, Ryan Culhane, et al. | 15.78
- Exploring Wav2vec 2.0 Fine-tuning For Improved Speech Emotion Recognition (2021) | Li-Wei Chen, Alexander Rudnicky | 15.67
- Attention Based Fully Convolutional Network For Speech Emotion Recognition (2018) | Yuanyuan Zhang, Jun Du, Zirui Wang, et al. | 15.25
- Learning Alignment For Multimodal Emotion Recognition From Speech (2019) | Haiyang Xu, Hui Zhang, Kun Han, et al. | 15.22
- Transfer Learning For Improving Speech Emotion Classification Accuracy (2018) | Siddique Latif, Rajib Rana, Shahzad Younis, et al. | 15.10
- Automatic Detection Of Depression In Speech Using Ensemble Convolutional Neural Networks (2024) | Adrián Vázquez-Romero, Ascensión Gallardo-Antolín | 15.06
- Light-sernet: A Lightweight Fully Convolutional Neural Network For Speech Emotion Recognition (2021) | Arya Aftab, Alireza Morsali, Shahrokh Ghaemmaghami, et al. | 14.90
- Hierarchical Multimodal Transformer To Summarize Videos (2021) | Bin Zhao, Maoguo Gong, Xuelong Li | 14.69
- Multi-task Semi-supervised Adversarial Autoencoding For Speech Emotion Recognition (2019) | Siddique Latif, Rajib Rana, Sara Khalifa, et al. | 14.58
- Speech Emotion Recognition Using Multi-hop Attention Mechanism (2019) | Seunghyun Yoon, Seokhyun Byun, Subhadeep Dey, et al. | 14.58
- VAST: A Vision-audio-subtitle-text Omni-modality Foundation Model And Dataset (2023) | Sihan Chen, Handong Li, Qunbo Wang, et al. | 14.55
- Direct Modelling Of Speech Emotion From Raw Speech (2019) | Siddique Latif, Rajib Rana, Sara Khalifa, et al. | 14.55
- Speechformer++: A Hierarchical Efficient Framework For Paralinguistic Speech Processing (2023) | Weidong Chen, Xiaofen Xing, Xiangmin Xu, et al. | 14.43
- Speech2affectivegestures: Synthesizing Co-speech Gestures With Generative Adversarial Affective Expression Learning (2021) | Uttaran Bhattacharya, Elizabeth Childs, Nicholas Rewkowski, et al. | 14.35
- Jointly Discovering Visual Objects And Spoken Words From Raw Sensory Input (2018) | David Harwath, Adrià Recasens, Dídac Surís, et al. | 14.27
- X-vectors Meet Emotions: A Study On Dependencies Between Emotion And Speaker Recognition (2020) | Raghavendra Pappagari, Tianzi Wang, Jesus Villalba, et al. | 14.23
- Perfect Match: Improved Cross-modal Embeddings For Audio-visual Synchronisation (2018) | Soo-Whan Chung, Joon Son Chung, Hong-Goo Kang | 14.19
- Progressive Neural Networks For Transfer Learning In Emotion Recognition (2017) | John Gideon, Soheil Khorram, Zakaria Aldeneh, et al. | 14.19
- DNN-HMM Based Speaker Adaptive Emotion Recognition Using Proposed Epoch And MFCC Features (2018) | Md. Shah Fahad, Jainath Yadav, Gyadhar Pradhan, et al. | 14.11
- An Overview Of Affective Speech Synthesis And Conversion In The Deep Learning Era (2022) | Andreas Triantafyllopoulos, Björn W. Schuller, Gökçe İymen, et al. | 14.11
- Word Discovery In Visually Grounded, Self-supervised Speech Models (2022) | Puyuan Peng, David Harwath | 14.08
- Enclap: Combining Neural Audio Codec And Audio-text Joint Embedding For Automated Audio Captioning (2024) | Jaeyeon Kim, Jaeyoon Jung, Jinjoo Lee, et al. | 14.03
- Automated Audio Captioning With Recurrent Neural Networks (2017) | Konstantinos Drossos, Sharath Adavanne, Tuomas Virtanen | 13.97
- Msemotts: Multi-scale Emotion Transfer, Prediction, And Control For Emotional Speech Synthesis (2022) | Yi Lei, Shan Yang, Xinsheng Wang, et al. | 13.97
- Information Fusion In Attention Networks Using Adaptive And Multi-level Factorized Bilinear Pooling For Audio-visual Emotion Recognition (2021) | Hengshun Zhou, Jun Du, Yuanyuan Zhang, et al. | 13.97
- Mingling Or Misalignment? Temporal Shift For Speech Emotion Recognition With Pre-trained Representations (2023) | Siyuan Shen, Feng Liu, Aimin Zhou | 13.84
- Facediffuser: Speech-driven 3D Facial Animation Synthesis Using Diffusion (2023) | Stefan Stan, Kazi Injamamul Haque, Zerrin Yumak | 13.79
- Jointly Fine-tuning "bert-like" Self Supervised Models To Improve Multimodal Speech Emotion Recognition (2020) | Shamane Siriwardhana, Andrew Reis, Rivindu Weerasekera, et al. | 13.74
- Shemo -- A Large-scale Validated Database For Persian Speech Emotion Detection (2019) | Omid Mohamad Nezami, Paria Jamshid Lou, Mansoureh Karami | 13.70
- Speech Emotion Recognition With Multiscale Area Attention And Data Augmentation (2021) | Mingke Xu, Fan Zhang, Xiaodong Cui, et al. | 13.65
- Can Audio Captions Be Evaluated With Image Caption Metrics? (2021) | Zelin Zhou, Zhiling Zhang, Xuenan Xu, et al. | 13.54
- Key-sparse Transformer For Multimodal Speech Emotion Recognition (2021) | Weidong Chen, Xiaofeng Xing, Xiangmin Xu, et al. | 13.50
- Attention-augmented End-to-end Multi-task Learning For Emotion Prediction From Speech (2019) | Zixing Zhang, Bingwen Wu, Bjoern Schuller | 13.50
- Audiosetcaps: An Enriched Audio-caption Dataset Using Automated Generation Pipeline With Large Audio And Language Models (2024) | Jisheng Bai, Haohe Liu, Mou Wang, et al. | 13.44