Speech Translation
50 papers tagged Speech Translation (ordered by heat_score)
Papers
- Conv-TasNet: Surpassing Ideal Time-frequency Magnitude Masking For Speech Separation (2018) | Yi Luo, Nima Mesgarani | 24.08
- WavLM: Large-scale Self-supervised Pre-training For Full Stack Speech Processing (2021) | Sanyuan Chen, Chengyi Wang, Zhengyang Chen, et al. | 24.00
- ESPnet-TTS: Unified, Reproducible, And Integratable Open Source End-to-end Text-to-speech Toolkit (2019) | Tomoki Hayashi, Ryuichi Yamamoto, Katsuki Inoue, et al. | 23.32
- Dual-path RNN: Efficient Long Sequence Modeling For Time-domain Single-channel Speech Separation (2019) | Yi Luo, Zhuo Chen, Takuya Yoshioka | 21.06
- State-of-the-art Speech Recognition With Sequence-to-sequence Models (2017) | Chung-Cheng Chiu, Tara N. Sainath, Yonghui Wu, et al. | 21.01
- Multi-talker Speech Separation With Utterance-level Permutation Invariant Training Of Deep Recurrent Neural Networks (2017) | Morten Kolbæk, Dong Yu, Zheng-Hua Tan, et al. | 20.90
- Attention Is All You Need In Speech Separation (2020) | Cem Subakan, Mirco Ravanelli, Samuele Cornell, et al. | 20.59
- Joint CTC-attention Based End-to-end Speech Recognition Using Multi-task Learning (2016) | Suyoun Kim, Takaaki Hori, Shinji Watanabe | 20.43
- TasNet: Time-domain Audio Separation Network For Real-time, Single-channel Speech Separation (2017) | Yi Luo, Nima Mesgarani | 20.16
- A Comparative Study On Transformer Vs RNN In Speech Applications (2019) | Shigeki Karita, Nanxin Chen, Tomoki Hayashi, et al. | 20.07
- Neural Speech Synthesis With Transformer Network (2018) | Naihan Li, Shujie Liu, Yanqing Liu, et al. | 19.95
- Unsupervised Cross-lingual Representation Learning For Speech Recognition (2020) | Alexis Conneau, Alexei Baevski, Ronan Collobert, et al. | 18.91
- Light Gated Recurrent Units For Speech Recognition (2018) | Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, et al. | 18.90
- Streaming End-to-end Speech Recognition For Mobile Devices (2018) | Yanzhang He, Tara N. Sainath, Rohit Prabhavalkar, et al. | 18.87
- ESPnet-SE++: Speech Enhancement For Robust Speech Recognition, Translation, And Understanding (2022) | Yen-Ju Lu, Xuankai Chang, Chenda Li, et al. | 18.72
- Recent Advances In End-to-end Automatic Speech Recognition (2021) | Jinyu Li | 18.62
- Transformer Transducer: A Streamable Speech Recognition Model With Transformer Encoders And RNN-T Loss (2020) | Qian Zhang, Han Lu, Haşim Sak, et al. | 18.58
- Dual-path Transformer Network: Direct Context-aware Modeling For End-to-end Monaural Speech Separation (2020) | Jingjing Chen, Qirong Mao, Dong Liu | 18.24
- The 2020 ESPnet Update: New Features, Broadened Applications, Performance Improvements, And Future Plans (2020) | Shinji Watanabe, Florian Boyer, Xuankai Chang, et al. | 18.20
- Amphion: An Open-source Audio, Music And Speech Generation Toolkit (2023) | Xueyao Zhang, Liumeng Xue, Yicheng Gu, et al. | 18.19
- ASVspoof 2021: Towards Spoofed And Deepfake Speech Detection In The Wild (2022) | Xuechen Liu, Xin Wang, Md Sahidullah, et al. | 17.95
- Deep Learning Enabled Semantic Communications With Speech Recognition And Synthesis (2022) | Zhenzi Weng, Zhijin Qin, Xiaoming Tao, et al. | 17.85
- W2v-BERT: Combining Contrastive Learning And Masked Language Modeling For Self-supervised Speech Pre-training (2021) | Yu-An Chung, Yu Zhang, Wei Han, et al. | 17.78
- WeNet: Production Oriented Streaming And Non-streaming End-to-end Speech Recognition Toolkit (2021) | Zhuoyuan Yao, Di Wu, Xiong Wang, et al. | 17.27
- ContextNet: Improving Convolutional Neural Networks For Automatic Speech Recognition With Global Context (2020) | Wei Han, Zhengdong Zhang, Yu Zhang, et al. | 17.24
- Speaker-independent Speech Separation With Deep Attractor Network (2017) | Yi Luo, Zhuo Chen, Nima Mesgarani | 16.84
- Attention-based Audio-visual Fusion For Robust Automatic Speech Recognition (2018) | George Sterpu, Christian Saam, Naomi Harte | 16.67
- Automatic Speech Recognition Using Advanced Deep Learning Approaches: A Survey (2024) | Hamza Kheddar, Mustapha Hemis, Yassine Himeur | 16.63
- Speech Emotion Recognition With Global-aware Fusion On Multi-scale Feature Representation (2022) | Wenjing Zhu, Xiang Li | 16.53
- Advances In Joint CTC-attention Based End-to-end Speech Recognition With A Deep CNN Encoder And RNN-LM (2017) | Takaaki Hori, Shinji Watanabe, Yu Zhang, et al. | 16.49
- Adversarial Attacks Against Automatic Speech Recognition Systems Via Psychoacoustic Hiding (2018) | Lea Schönherr, Katharina Kohls, Steffen Zeiler, et al. | 16.45
- Prompting The Hidden Talent Of Web-scale Speech Models For Zero-shot Task Generalization (2023) | Puyuan Peng, Brian Yan, Shinji Watanabe, et al. | 16.38
- Replay And Synthetic Speech Detection With Res2Net Architecture (2020) | Xu Li, Na Li, Chao Weng, et al. | 16.32
- Transformer-based Acoustic Modeling For Hybrid Speech Recognition (2019) | Yongqiang Wang, Abdelrahman Mohamed, Duc Le, et al. | 16.30
- Exploring Architectures, Data And Units For Streaming End-to-end Speech Recognition With RNN-Transducer (2018) | Kanishka Rao, Haşim Sak, Rohit Prabhavalkar | 16.21
- Speech Emotion Recognition With Co-attention Based Multi-level Acoustic Information (2022) | Heqing Zou, Yuke Si, Chen Chen, et al. | 16.17
- Whispering LLaMA: A Cross-modal Generative Error Correction Framework For Speech Recognition (2023) | Srijith Radhakrishnan, Chao-Han Huck Yang, Sumeer Ahmad Khan, et al. | 16.15
- Exploring Speech Enhancement With Generative Adversarial Networks For Robust Speech Recognition (2017) | Chris Donahue, Bo Li, Rohit Prabhavalkar | 16.14
- WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus For Speech Recognition (2021) | Binbin Zhang, Hang Lv, Pengcheng Guo, et al. | 16.12
- WHAMR!: Noisy And Reverberant Single-channel Speech Separation (2019) | Matthew Maciejewski, Gordon Wichern, Emmett McQuinn, et al. | 16.10
- Multilingual Speech Recognition With A Single End-to-end Model (2017) | Shubham Toshniwal, Tara N. Sainath, Ron J. Weiss, et al. | 16.05
- T-GSA: Transformer With Gaussian-weighted Self-attention For Speech Enhancement (2019) | Jaeyoung Kim, Mostafa El-Khamy, Jungwon Lee | 15.95
- ViSQOL v3: An Open Source Production Ready Objective Speech And Audio Metric (2020) | Michael Chinen, Felicia S. C. Lim, Jan Skoglund, et al. | 15.83
- Speech Emotion Recognition With Dual-sequence LSTM Architecture (2019) | Jianyou Wang, Michael Xue, Ryan Culhane, et al. | 15.78
- BigSSL: Exploring The Frontier Of Large-scale Semi-supervised Learning For Automatic Speech Recognition (2021) | Yu Zhang, Daniel S. Park, Wei Han, et al. | 15.73
- Deep Context: End-to-end Contextual Speech Recognition (2018) | Golan Pundak, Tara N. Sainath, Rohit Prabhavalkar, et al. | 15.57
- Self-training For End-to-end Speech Recognition (2019) | Jacob Kahn, Ann Lee, Awni Hannun | 15.48
- Personalized Speech Recognition On Mobile Devices (2016) | Ian McGraw, Rohit Prabhavalkar, Raziel Alvarez, et al. | 15.37
- Neural Speech Recognizer: Acoustic-to-word LSTM Model For Large Vocabulary Speech Recognition (2016) | Hagen Soltau, Hank Liao, Haşim Sak | 15.16
- Direct Speech-to-speech Translation With A Sequence-to-sequence Model (2019) | Ye Jia, Ron J. Weiss, Fadi Biadsy, et al. | 15.13