Speech Recognition
50 papers tagged Speech Recognition (ordered by heat_score)
Papers
- Conv-TasNet: Surpassing Ideal Time-frequency Magnitude Masking For Speech Separation (2018). Yi Luo, Nima Mesgarani. Heat score: 24.08
- WavLM: Large-scale Self-supervised Pre-training For Full Stack Speech Processing (2021). Sanyuan Chen, Chengyi Wang, Zhengyang Chen, et al. Heat score: 24.00
- VoxCeleb2: Deep Speaker Recognition (2018). Joon Son Chung, Arsha Nagrani, Andrew Zisserman. Heat score: 23.96
- ESPnet-TTS: Unified, Reproducible, And Integratable Open Source End-to-end Text-to-speech Toolkit (2019). Tomoki Hayashi, Ryuichi Yamamoto, Katsuki Inoue, et al. Heat score: 23.32
- ESPnet: End-to-end Speech Processing Toolkit (2018). Shinji Watanabe, Takaaki Hori, Shigeki Karita, et al. Heat score: 22.17
- Dual-path RNN: Efficient Long Sequence Modeling For Time-domain Single-channel Speech Separation (2019). Yi Luo, Zhuo Chen, Takuya Yoshioka. Heat score: 21.06
- State-of-the-art Speech Recognition With Sequence-to-sequence Models (2017). Chung-Cheng Chiu, Tara N. Sainath, Yonghui Wu, et al. Heat score: 21.01
- Multi-talker Speech Separation With Utterance-level Permutation Invariant Training Of Deep Recurrent Neural Networks (2017). Morten Kolbæk, Dong Yu, Zheng-Hua Tan, et al. Heat score: 20.90
- LibriTTS: A Corpus Derived From LibriSpeech For Text-to-speech (2019). Heiga Zen, Viet Dang, Rob Clark, et al. Heat score: 20.79
- Speaker Recognition From Raw Waveform With SincNet (2018). Mirco Ravanelli, Yoshua Bengio. Heat score: 20.65
- Joint CTC-attention Based End-to-end Speech Recognition Using Multi-task Learning (2016). Suyoun Kim, Takaaki Hori, Shinji Watanabe. Heat score: 20.43
- TasNet: Time-domain Audio Separation Network For Real-time, Single-channel Speech Separation (2017). Yi Luo, Nima Mesgarani. Heat score: 20.16
- A Comparative Study On Transformer Vs RNN In Speech Applications (2019). Shigeki Karita, Nanxin Chen, Tomoki Hayashi, et al. Heat score: 20.07
- Unsupervised Cross-lingual Representation Learning For Speech Recognition (2020). Alexis Conneau, Alexei Baevski, Ronan Collobert, et al. Heat score: 18.91
- Light Gated Recurrent Units For Speech Recognition (2018). Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, et al. Heat score: 18.90
- Fine-tuning Wav2vec2 For Speaker Recognition (2021). Nik Vaessen, David A. van Leeuwen. Heat score: 18.88
- Streaming End-to-end Speech Recognition For Mobile Devices (2018). Yanzhang He, Tara N. Sainath, Rohit Prabhavalkar, et al. Heat score: 18.87
- Speaker Recognition Based On Deep Learning: An Overview (2020). Zhongxin Bai, Xiao-Lei Zhang. Heat score: 18.86
- ESPnet-SE++: Speech Enhancement For Robust Speech Recognition, Translation, And Understanding (2022). Yen-Ju Lu, Xuankai Chang, Chenda Li, et al. Heat score: 18.72
- Recent Advances In End-to-end Automatic Speech Recognition (2021). Jinyu Li. Heat score: 18.62
- Dawn Of The Transformer Era In Speech Emotion Recognition: Closing The Valence Gap (2022). Johannes Wagner, Andreas Triantafyllopoulos, Hagen Wierstorf, et al. Heat score: 18.59
- Transformer Transducer: A Streamable Speech Recognition Model With Transformer Encoders And RNN-T Loss (2020). Qian Zhang, Han Lu, Hasim Sak, et al. Heat score: 18.58
- A WaveNet For Speech Denoising (2017). Dario Rethage, Jordi Pons, Xavier Serra. Heat score: 18.47
- Dual-path Transformer Network: Direct Context-aware Modeling For End-to-end Monaural Speech Separation (2020). Jingjing Chen, Qirong Mao, Dong Liu. Heat score: 18.24
- The 2020 ESPnet Update: New Features, Broadened Applications, Performance Improvements, And Future Plans (2020). Shinji Watanabe, Florian Boyer, Xuankai Chang, et al. Heat score: 18.20
- Amphion: An Open-source Audio, Music And Speech Generation Toolkit (2023). Xueyao Zhang, Liumeng Xue, Yicheng Gu, et al. Heat score: 18.19
- Multimodal Speech Emotion Recognition Using Audio And Text (2018). Seunghyun Yoon, Seokhyun Byun, Kyomin Jung. Heat score: 18.02
- A Joint Cross-attention Model For Audio-visual Fusion In Dimensional Emotion Recognition (2022). R. Gnana Praveen, Wheidima Carneiro de Melo, Nasib Ullah, et al. Heat score: 18.00
- Deep Learning Enabled Semantic Communications With Speech Recognition And Synthesis (2022). Zhenzi Weng, Zhijin Qin, Xiaoming Tao, et al. Heat score: 17.85
- W2v-BERT: Combining Contrastive Learning And Masked Language Modeling For Self-supervised Speech Pre-training (2021). Yu-An Chung, Yu Zhang, Wei Han, et al. Heat score: 17.78
- TERA: Self-supervised Learning Of Transformer Encoder Representation For Speech (2020). Andy T. Liu, Shang-Wen Li, Hung-Yi Lee. Heat score: 17.61
- VoiceFilter: Targeted Voice Separation By Speaker-conditioned Spectrogram Masking (2018). Quan Wang, Hannah Muckenhirn, Kevin Wilson, et al. Heat score: 17.48
- FunCodec: A Fundamental, Reproducible And Integrable Open-source Toolkit For Neural Speech Codec (2023). Zhihao Du, Shiliang Zhang, Kai Hu, et al. Heat score: 17.47
- WeNet: Production Oriented Streaming And Non-streaming End-to-end Speech Recognition Toolkit (2021). Zhuoyuan Yao, Di Wu, Xiong Wang, et al. Heat score: 17.27
- Mockingjay: Unsupervised Speech Representation Learning With Deep Bidirectional Transformer Encoders (2019). Andy T. Liu, Shu-Wen Yang, Po-Han Chi, et al. Heat score: 17.26
- ContextNet: Improving Convolutional Neural Networks For Automatic Speech Recognition With Global Context (2020). Wei Han, Zhengdong Zhang, Yu Zhang, et al. Heat score: 17.24
- MIntRec: A New Dataset For Multimodal Intent Recognition (2022). Hanlei Zhang, Hua Xu, Xin Wang, et al. Heat score: 17.08
- Exploring The Encoding Layer And Loss Function In End-to-end Speaker And Language Recognition System (2018). Weicheng Cai, Jinkun Chen, Ming Li. Heat score: 17.07
- Self-supervised Speaker Recognition With Loss-gated Learning (2021). Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, et al. Heat score: 16.93
- Speaker-independent Speech Separation With Deep Attractor Network (2017). Yi Luo, Zhuo Chen, Nima Mesgarani. Heat score: 16.84
- Attention-based Audio-visual Fusion For Robust Automatic Speech Recognition (2018). George Sterpu, Christian Saam, Naomi Harte. Heat score: 16.67
- Automatic Speech Recognition Using Advanced Deep Learning Approaches: A Survey (2024). Hamza Kheddar, Mustapha Hemis, Yassine Himeur. Heat score: 16.63
- Speech Emotion Recognition With Global-aware Fusion On Multi-scale Feature Representation (2022). Wenjing Zhu, Xiang Li. Heat score: 16.53
- Advances In Joint CTC-attention Based End-to-end Speech Recognition With A Deep CNN Encoder And RNN-LM (2017). Takaaki Hori, Shinji Watanabe, Yu Zhang, et al. Heat score: 16.49
- Adversarial Attacks Against Automatic Speech Recognition Systems Via Psychoacoustic Hiding (2018). Lea Schönherr, Katharina Kohls, Steffen Zeiler, et al. Heat score: 16.45
- CN-CELEB: A Challenging Chinese Speaker Recognition Dataset (2019). Yue Fan, Jiawen Kang, Lantian Li, et al. Heat score: 16.39
- Prompting The Hidden Talent Of Web-scale Speech Models For Zero-shot Task Generalization (2023). Puyuan Peng, Brian Yan, Shinji Watanabe, et al. Heat score: 16.38
- Seen And Unseen Emotional Style Transfer For Voice Conversion With A New Emotional Speech Dataset (2020). Kun Zhou, Berrak Sisman, Rui Liu, et al. Heat score: 16.34
- Replay And Synthetic Speech Detection With Res2Net Architecture (2020). Xu Li, Na Li, Chao Weng, et al. Heat score: 16.32
- Transformer-based Acoustic Modeling For Hybrid Speech Recognition (2019). Yongqiang Wang, Abdelrahman Mohamed, Duc Le, et al. Heat score: 16.30