cluster #7
50 papers in this cluster (ordered by heat_score)
Papers
- Espnet-tts: Unified, Reproducible, And Integratable Open Source End-to-end Text-to-speech Toolkit (2019). Tomoki Hayashi, Ryuichi Yamamoto, Katsuki Inoue, et al. Heat score: 23.32
- Espnet: End-to-end Speech Processing Toolkit (2018). Shinji Watanabe, Takaaki Hori, Shigeki Karita, et al. Heat score: 22.17
- Libritts: A Corpus Derived From Librispeech For Text-to-speech (2019). Heiga Zen, Viet Dang, Rob Clark, et al. Heat score: 20.79
- The 2020 Espnet Update: New Features, Broadened Applications, Performance Improvements, And Future Plans (2020). Shinji Watanabe, Florian Boyer, Xuankai Chang, et al. Heat score: 18.20
- Mintrec: A New Dataset For Multimodal Intent Recognition (2022). Hanlei Zhang, Hua Xu, Xin Wang, et al. Heat score: 17.08
- Speechgpt: Empowering Large Language Models With Intrinsic Cross-modal Conversational Abilities (2023). Dong Zhang, Shimin Li, Xin Zhang, et al. Heat score: 16.59
- Direct Speech-to-speech Translation With A Sequence-to-sequence Model (2019). Ye Jia, Ron J. Weiss, Fadi Biadsy, et al. Heat score: 15.13
- Large-scale Multilingual Speech Recognition With A Streaming End-to-end Model (2019). Anjuli Kannan, Arindrima Datta, Tara N. Sainath, et al. Heat score: 14.97
- Towards End-to-end Spoken Language Understanding (2018). Dmitriy Serdyuk, Yongqiang Wang, Christian Fuegen, et al. Heat score: 14.73
- Recent Advances In Speech Language Models: A Survey (2024). Wenqian Cui, Dianzhi Yu, Xiaoqi Jiao, et al. Heat score: 14.64
- Token-level Contrastive Learning With Modality-aware Prompting For Multimodal Intent Recognition (2023). Qianrui Zhou, Hua Xu, Hao Li, et al. Heat score: 14.17
- Bytes Are All You Need: End-to-end Multilingual Speech Recognition And Synthesis With Bytes (2018). Bo Li, Yu Zhang, Tara Sainath, et al. Heat score: 14.15
- AGIF: An Adaptive Graph-interactive Framework For Joint Multiple Intent Detection And Slot Filling (2020). Libo Qin, Xiao Xu, Wanxiang Che, et al. Heat score: 14.11
- Direct Speech-to-speech Translation With Discrete Units (2021). Ann Lee, Peng-Jen Chen, Changhan Wang, et al. Heat score: 13.97
- Dual-decoder Transformer For Joint Automatic Speech Recognition And Multilingual Speech Translation (2020). Hang Le, Juan Pino, Changhan Wang, et al. Heat score: 13.73
- Textless Speech-to-speech Translation On Real Data (2021). Ann Lee, Hongyu Gong, Paul-Ambroise Duquenne, et al. Heat score: 13.65
- Joint Online Spoken Language Understanding And Language Modeling With Recurrent Neural Networks (2016). Bing Liu, Ian Lane. Heat score: 13.28
- Cm-net: A Novel Collaborative Memory Network For Spoken Language Understanding (2019). Yijin Liu, Fandong Meng, Jinchao Zhang, et al. Heat score: 13.28
- From Audio To Semantics: Approaches To End-to-end Spoken Language Understanding (2018). Parisa Haghani, Arun Narayanan, Michiel Bacchiani, et al. Heat score: 13.23
- Emotion Rendering For Conversational Speech Synthesis With Heterogeneous Graph-based Context Modeling (2023). Rui Liu, Yifan Hu, Yi Ren, et al. Heat score: 13.15
- Leveraging Weakly Supervised Data To Improve End-to-end Speech-to-text Translation (2018). Ye Jia, Melvin Johnson, Wolfgang Macherey, et al. Heat score: 13.05
- Multimodal Machine Translation Through Visuals And Speech (2019). Umut Sulubacak, Ozan Caglayan, Stig-Arne Grönroos, et al. Heat score: 12.68
- Speechbert: An Audio-and-text Jointly Learned Language Model For End-to-end Spoken Question Answering (2019). Yung-Sung Chuang, Chi-Liang Liu, Hung-Yi Lee, et al. Heat score: 12.33
- A Novel Bi-directional Interrelated Model For Joint Intent Detection And Slot Filling (2019). Haihong E, Peiqing Niu, Zhongfu Chen, et al. Heat score: 12.33
- On The End-to-end Solution To Mandarin-english Code-switching Speech Recognition (2018). Zhiping Zeng, Yerbolat Khassanov, Van Tung Pham, et al. Heat score: 12.10
- Rate-adaptive Coding Mechanism For Semantic Communications With Multi-modal Data (2023). Yangshuo He, Guanding Yu, Yunlong Cai. Heat score: 11.93
- Speech Translation And The End-to-end Promise: Taking Stock Of Where We Are (2020). Matthias Sperber, Matthias Paulik. Heat score: 11.93
- Open Source Magicdata-ramc: A Rich Annotated Mandarin Conversational(ramc) Speech Dataset (2022). Zehui Yang, Yifan Chen, Lei Luo, et al. Heat score: 11.93
- Voxinstruct: Expressive Human Instruction-to-speech Generation With Unified Multilingual Codec Language Modelling (2024). Yixuan Zhou, Xiaoyu Qin, Zeyu Jin, et al. Heat score: 11.81
- A General Multi-task Learning Framework To Leverage Text Data For Speech To Text Tasks (2020). Yun Tang, Juan Pino, Changhan Wang, et al. Heat score: 11.67
- STEMM: Self-learning With Speech-text Manifold Mixup For Speech Translation (2022). Qingkai Fang, Rong Ye, Lei Li, et al. Heat score: 11.58
- Sd-eval: A Benchmark Dataset For Spoken Dialogue Understanding Beyond Words (2024). Junyi Ao, Yuancheng Wang, Xiaohai Tian, et al. Heat score: 11.32
- SALM: Speech-augmented Language Model With In-context Learning For Speech Recognition And Translation (2023). Zhehuai Chen, He Huang, Andrei Andrusenko, et al. Heat score: 11.29
- Code-switched Language Models Using Neural Based Synthetic Data From Parallel Sentences (2019). Genta Indra Winata, Andrea Madotto, Chien-Sheng Wu, et al. Heat score: 11.29
- Large Language Model Can Transcribe Speech In Multi-talker Scenarios With Versatile Instructions (2024). Lingwei Meng, Shujie Hu, Jiawen Kang, et al. Heat score: 11.23
- Syntactic And Semantic Features For Code-switching Factored Language Models (2017). Heike Adel, Ngoc Thang Vu, Katrin Kirchhoff, et al. Heat score: 11.19
- Transformer-transducers For Code-switched Speech Recognition (2020). Siddharth Dalmia, Yuzong Liu, Srikanth Ronanki, et al. Heat score: 10.97
- Improving Speech Translation By Understanding And Learning From The Auxiliary Text Translation Task (2021). Yun Tang, Juan Pino, Xian Li, et al. Heat score: 10.97
- A Crowdsourced Open-source Kazakh Speech Corpus And Initial Speech Recognition Baseline (2020). Yerbolat Khassanov, Saida Mussakhojayeva, Almas Mirzakhmetov, et al. Heat score: 10.85
- Enhanced Direct Speech-to-speech Translation Using Self-supervised Pre-training And Data Augmentation (2022). Sravya Popuri, Peng-Jen Chen, Changhan Wang, et al. Heat score: 10.85
- Curriculum-based Transfer Learning For An Effective End-to-end Spoken Language Understanding And Domain Portability (2019). Antoine Caubrière, Natalia Tomashenko, Antoine Laurent, et al. Heat score: 10.74
- Speechut: Bridging Speech And Text With Hidden-unit For Encoder-decoder Based Speech-text Pre-training (2022). Ziqiang Zhang, Long Zhou, Junyi Ao, et al. Heat score: 10.74
- Llast: Improved End-to-end Speech Translation System Leveraged By Large Language Models (2024). Xi Chen, Songyang Zhang, Qibing Bai, et al. Heat score: 10.67
- Data Augmentation For Spoken Language Understanding Via Joint Variational Generation (2018). Kang Min Yoo, Youhyun Shin, Sang-Goo Lee. Heat score: 10.61
- Synchronous Speech Recognition And Speech-to-text Translation With Interactive Decoding (2019). Yuchen Liu, Jiajun Zhang, Hao Xiong, et al. Heat score: 10.48
- Stacked Acoustic-and-textual Encoding: Integrating The Pre-trained Models Into Speech Translation Encoders (2021). Chen Xu, Bojie Hu, Yanyang Li, et al. Heat score: 10.48
- Towards Speech-to-text Translation Without Speech Recognition (2017). Sameer Bansal, Herman Kamper, Adam Lopez, et al. Heat score: 10.35
- SPLAT: Speech-language Joint Pre-training For Spoken Language Understanding (2020). Yu-An Chung, Chenguang Zhu, Michael Zeng. Heat score: 10.35
- Gl-clef: A Global-local Contrastive Learning Framework For Cross-lingual Spoken Language Understanding (2022). Libo Qin, Qiguang Chen, Tianbao Xie, et al. Heat score: 10.35
- Leveraging Pseudo-labeled Data To Improve Direct Speech-to-speech Translation (2022). Qianqian Dong, Fengpeng Yue, Tom Ko, et al. Heat score: 10.33