A Review Of On-device Fully Neural End-to-end Automatic Speech Recognition Algorithms
2020 · Chanwoo Kim, Dhananjaya Gowda, Dongsoo Lee, et al.
Abstract
In this paper, we review various end-to-end automatic speech recognition algorithms and their optimization techniques for on-device applications. Conventional speech recognition systems comprise a large number of discrete components such as an acoustic model, a language model, a pronunciation model, a text-normalizer, an inverse-text normalizer, a decoder based on a Weighted Finite State Transducer (WFST), and so on. To obtain sufficiently high speech recognition accuracy with such conventional speech recognition systems, a very large language model (up to 100 GB) is usually needed. Hence, the corresponding WFST size becomes enormous, which prohibits their on-device implementation. Recently, fully neural network end-to-end speech recognition algorithms have been proposed. Examples include speech recognition systems based on Connectionist Temporal Classification (CTC), Recurrent Neural Network Transducer (RNN-T), Attention-based Encoder-Decoder models (AED), Monotonic Chunk-wise Attenti
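The abstract contrasts WFST-based decoding with fully neural models such as CTC. The sketch below shows best-path (greedy) CTC decoding, which turns per-frame label scores into a label sequence without a WFST: take the argmax label at each frame, collapse consecutive repeats, then drop blanks. It is an illustration only, not the authors' method. It assumes that label index 0 is the CTC blank, and the function name `ctc_greedy_decode` is hypothetical.

```python
from typing import List, Sequence

# Assumption: label index 0 is reserved for the CTC blank symbol.
BLANK = 0


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    # Pick the highest-scoring label at each frame (ties go to the lowest index).
    best_path = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    decoded: List[int] = []
    prev = None
    for label in best_path:
        # Emit a label only if it differs from the previous frame's label and is not blank.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


# Hypothetical scores for 6 frames over labels {0: blank, 1: "a", 2: "b"}.
frames = [
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.9, 0.05, 0.05],
    [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8],
    [0.1, 0.1, 0.8],
]
print(ctc_greedy_decode(frames))  # best path [1, 1, 0, 1, 2, 2] -> [1, 1, 2]
```

The blank frame separating the two runs of label 1 lets the repeated "a" survive the collapse step. Without it, the two runs would merge into a single "a".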
Related papers
- Optimizing Speech Recognition For The Edge (2019)
- Streaming End-to-end Speech Recognition For Mobile Devices (2018)
- End-to-end Neural Systems For Automatic Children Speech Recognition: An Empirical Study (2021)
- Advances In All-neural Speech Recognition (2016)
- Analyzing Hidden Representations In End-to-end Automatic Speech Recognition Systems (2017)
- Improving RNN Transducer Modeling For End-to-end Speech Recognition (2019)
- End-to-end Adaptation With Backpropagation Through WFST For On-device Speech Recognition System (2019)
- On The Comparison Of Popular End-to-end Models For Large Scale Speech Recognition (2020)