Audio Understanding
50 papers tagged Audio Understanding (ordered by heat_score)
Papers
- Large-scale Contrastive Language-audio Pretraining With Feature Fusion And Keyword-to-caption Augmentation (2022). Yusong Wu, Ke Chen, Tianyu Zhang, et al. heat_score: 19.60
- ESPnet-SE++: Speech Enhancement For Robust Speech Recognition, Translation, And Understanding (2022). Yen-Ju Lu, Xuankai Chang, Chenda Li, et al. heat_score: 18.72
- Multimodal Speech Emotion Recognition Using Audio And Text (2018). Seunghyun Yoon, Seokhyun Byun, Kyomin Jung. heat_score: 18.02
- Towards End-to-end Spoken Language Understanding (2018). Dmitriy Serdyuk, Yongqiang Wang, Christian Fuegen, et al. heat_score: 14.73
- Deep Cross-modal Correlation Learning For Audio And Lyrics In Music Retrieval (2017). Yi Yu, Suhua Tang, Francisco Raposo, et al. heat_score: 14.06
- AudioMNIST: Exploring Explainable Artificial Intelligence For Audio Analysis On A Simple Benchmark (2018). Sören Becker, Johanna Vielhaben, Marcel Ackermann, et al. heat_score: 13.50
- Joint Online Spoken Language Understanding And Language Modeling With Recurrent Neural Networks (2016). Bing Liu, Ian Lane. heat_score: 13.28
- CM-Net: A Novel Collaborative Memory Network For Spoken Language Understanding (2019). Yijin Liu, Fandong Meng, Jinchao Zhang, et al. heat_score: 13.28
- From Audio To Semantics: Approaches To End-to-end Spoken Language Understanding (2018). Parisa Haghani, Arun Narayanan, Michiel Bacchiani, et al. heat_score: 13.23
- SpeechBERT: An Audio-and-text Jointly Learned Language Model For End-to-end Spoken Question Answering (2019). Yung-Sung Chuang, Chi-Liang Liu, Hung-Yi Lee, et al. heat_score: 12.33
- Learning ASR-robust Contextualized Embeddings For Spoken Language Understanding (2019). Chao-Wei Huang, Yun-Nung Chen. heat_score: 12.02
- Adapting Pretrained Transformer To Lattices For Spoken Language Understanding (2020). Chao-Wei Huang, Yun-Nung Chen. heat_score: 12.00
- WAVPROMPT: Towards Few-shot Spoken Language Understanding With Frozen Language Models (2022). Heting Gao, Junrui Ni, Kaizhi Qian, et al. heat_score: 11.98
- Towards Understanding And Mitigating Audio Adversarial Examples For Speaker Recognition (2022). Guangke Chen, Zhe Zhao, Fu Song, et al. heat_score: 11.67
- Attention And Localization Based On A Deep Convolutional Recurrent Model For Weakly Supervised Audio Tagging (2017). Yong Xu, Qiuqiang Kong, Qiang Huang, et al. heat_score: 11.39
- Temporal Working Memory: Query-guided Segment Refinement For Enhanced Multimodal Understanding (2025). Xingjian Diao, Chunhui Zhang, Weiyi Wu, et al. heat_score: 11.33
- SD-Eval: A Benchmark Dataset For Spoken Dialogue Understanding Beyond Words (2024). Junyi Ao, Yuancheng Wang, Xiaohai Tian, et al. heat_score: 11.32
- Curriculum-based Transfer Learning For An Effective End-to-end Spoken Language Understanding And Domain Portability (2019). Antoine Caubrière, Natalia Tomashenko, Antoine Laurent, et al. heat_score: 10.74
- VALOR: Vision-audio-language Omni-perception Pretraining Model And Dataset (2023). Jing Liu, Sihan Chen, Xingjian He, et al. heat_score: 10.61
- Data Augmentation For Spoken Language Understanding Via Joint Variational Generation (2018). Kang Min Yoo, Youhyun Shin, Sang-Goo Lee. heat_score: 10.61
- SPLAT: Speech-language Joint Pre-training For Spoken Language Understanding (2020). Yu-An Chung, Chenguang Zhu, Michael Zeng. heat_score: 10.35
- GL-CLeF: A Global-local Contrastive Learning Framework For Cross-lingual Spoken Language Understanding (2022). Libo Qin, Qiguang Chen, Tianbao Xie, et al. heat_score: 10.35
- AudioBench: A Universal Benchmark For Audio Large Language Models (2024). Bin Wang, Xunlong Zou, Geyu Lin, et al. heat_score: 10.21
- SLUE Phase-2: A Benchmark Suite Of Diverse Spoken Language Understanding Tasks (2022). Suwon Shon, Siddhant Arora, Chyi-Jiunn Lin, et al. heat_score: 10.07
- Pretrained Semantic Speech Embeddings For End-to-end Spoken Language Understanding Via Cross-modal Teacher-student Learning (2020). Pavel Denisov, Ngoc Thang Vu. heat_score: 9.92
- ASR Error Management For Improving Spoken Language Understanding (2017). Edwin Simonnet, Sahar Ghannay, Nathalie Camelin, et al. heat_score: 9.92
- Large Language Models Are Strong Audio-visual Speech Recognition Learners (2024). Umberto Cappellazzo, Minsu Kim, Honglie Chen, et al. heat_score: 9.59
- ST-BERT: Cross-modal Language Model Pre-training For End-to-end Spoken Language Understanding (2020). Minjeong Kim, Gyuwan Kim, Sang-Woo Lee, et al. heat_score: 9.59
- Two-stage Textual Knowledge Distillation For End-to-end Spoken Language Understanding (2020). Seongbin Kim, Gyuwan Kim, Seongjin Shin, et al. heat_score: 9.41
- Speech-language Pre-training For End-to-end Spoken Language Understanding (2021). Yao Qian, Ximo Bian, Yu Shi, et al. heat_score: 9.41
- Using Speech Synthesis To Train End-to-end Spoken Language Understanding Models (2019). Loren Lugosch, Brett Meyer, Derek Nowrouzezahrai, et al. heat_score: 9.23
- Tie Your Embeddings Down: Cross-modal Latent Spaces For End-to-end Spoken Language Understanding (2020). Bhuvan Agrawal, Markus Müller, Martin Radfar, et al. heat_score: 9.03
- End-to-end Architectures For ASR-free Spoken Language Understanding (2019). Elisavet Palogiannidi, Ioannis Gkinis, George Mastrapas, et al. heat_score: 8.60
- Dynamic Time-aware Attention To Speaker Roles And Contexts For Spoken Language Understanding (2017). Po-Chun Chen, Ta-Chung Chi, Shang-Yu Su, et al. heat_score: 8.35
- End-to-end Spoken Language Understanding: Performance Analyses Of A Voice Command Task In A Low Resource Setting (2022). Thierry Desot, François Portet, Michel Vacher. heat_score: 8.35
- Towards Reducing The Need For Speech Training Data To Build Spoken Language Understanding Systems (2022). Samuel Thomas, Hong-Kwang J. Kuo, Brian Kingsbury, et al. heat_score: 8.35
- Multi-source Spatial Knowledge Understanding For Immersive Visual Text-to-speech (2024). Shuwei He, Rui Liu. heat_score: 8.17
- A Study On The Integration Of Pre-trained SSL, ASR, LM And SLU Models For Spoken Language Understanding (2022). Yifan Peng, Siddhant Arora, Yosuke Higuchi, et al. heat_score: 8.09
- Recent Advances In End-to-end Spoken Language Understanding (2019). Natalia Tomashenko, Antoine Caubriere, Yannick Esteve, et al. heat_score: 8.09
- PIN: A Novel Parallel Interactive Network For Spoken Language Understanding (2020). Peilin Zhou, Zhiqi Huang, Fenglin Liu, et al. heat_score: 8.09
- MFAAN: Unveiling Audio Deepfakes With A Multi-feature Authenticity Network (2023). Karthik Sivarama Krishnan, Koushik Sivarama Krishnan. heat_score: 7.81
- A Comparative Study On E-Branchformer Vs Conformer In Speech Recognition, Translation, And Understanding Tasks (2023). Yifan Peng, Kwangyoun Kim, Felix Wu, et al. heat_score: 7.81
- EzAudio: Enhancing Text-to-audio Generation With Efficient Diffusion Transformer (2024). Jiarui Hai, Yong Xu, Hao Zhang, et al. heat_score: 7.50
- On Joint Training With Interfaces For Spoken Language Understanding (2021). Anirudh Raju, Milind Rao, Gautam Tiwari, et al. heat_score: 7.16
- Attentive Contextual Carryover For Multi-turn End-to-end Spoken Language Understanding (2021). Kai Wei, Thanh Tran, Feng-Ju Chang, et al. heat_score: 7.16
- Audio Caption In A Car Setting With A Sentence-level Loss (2019). Xuenan Xu, Heinrich Dinkel, Mengyue Wu, et al. heat_score: 7.16
- A Hierarchical Decoding Model For Spoken Language Understanding From Unaligned Data (2019). Zijian Zhao, Su Zhu, Kai Yu. heat_score: 7.16
- Multi-modal And Multi-scale Spatial Environment Understanding For Immersive Visual Text-to-speech (2024). Rui Liu, Shuwei He, Yifan Hu, et al. heat_score: 6.79
- Style Attuned Pre-training And Parameter Efficient Fine-tuning For Spoken Language Understanding (2020). Jin Cao, Jun Wang, Wael Hamza, et al. heat_score: 6.77
- Cross-lingual Spoken Language Understanding With Regularized Representation Alignment (2020). Zihan Liu, Genta Indra Winata, Peng Xu, et al. heat_score: 6.77