Recent Developments on ESPnet Toolkit Boosted by Conformer
2020 · Pengcheng Guo, Florian Boyer, Xuankai Chang, et al.
Abstract
In this study, we present recent developments in ESPnet, the End-to-End Speech Processing toolkit, which mainly involve a recently proposed architecture called Conformer, the convolution-augmented Transformer. This paper shows results for a wide range of end-to-end speech processing applications, such as automatic speech recognition (ASR), speech translation (ST), speech separation (SS), and text-to-speech (TTS). Our experiments reveal various training tips and significant performance benefits obtained with the Conformer on different tasks. These results are competitive with, or even outperform, the current state-of-the-art Transformer models. We are preparing to release all-in-one recipes for all of the above tasks, using open-source and publicly available corpora, together with pre-trained models. Our aim for this work is to contribute to our research community by reducing the burden of preparing state-of-the-art research environments, which usually require substantial resources.
Related papers
- The 2020 ESPnet Update: New Features, Broadened Applications, Performance Improvements, And Future Plans (2020)
- ESPnet-TTS: Unified, Reproducible, And Integratable Open Source End-to-end Text-to-speech Toolkit (2019)
- ESPnet: End-to-end Speech Processing Toolkit (2018)
- Nextformer: A ConvNeXt Augmented Conformer For End-to-end Speech Recognition (2022)
- A Comparative Study On E-Branchformer Vs Conformer In Speech Recognition, Translation, And Understanding Tasks (2023)
- ESPnet-SE: End-to-end Speech Enhancement And Separation Toolkit Designed For ASR Integration (2020)
- Towards A Unified Conformer Structure: From ASR To ASV Task (2022)
- Fast Conformer With Linearly Scalable Attention For Efficient Speech Recognition (2023)