Transformer Transducer: A Streamable Speech Recognition Model With Transformer Encoders And RNN-T Loss
2020 · Qian Zhang, Han Lu, Hasim Sak, et al.
Abstract
In this paper, we present an end-to-end speech recognition model with Transformer encoders that can be used in a streaming speech recognition system. Transformer computation blocks based on self-attention are used to encode both audio and label sequences independently. The activations from both audio and label encoders are combined with a feed-forward layer to compute a probability distribution over the label space for every combination of acoustic frame position and label history. This is similar to the Recurrent Neural Network Transducer (RNN-T) model, which uses RNNs for information encoding instead of Transformer encoders. The model is trained with the RNN-T loss, which is well suited to streaming decoding. We present results on the LibriSpeech dataset showing that limiting the left context for self-attention in the Transformer layers makes decoding computationally tractable for streaming, with only a slight degradation in accuracy. We also show that the full attention version of our model be
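Two mechanisms named in the abstract can be illustrated with a small sketch: restricting self-attention to a limited left context, and combining audio and label encoder outputs into a distribution for every (frame, label-history) pair. The code below is a minimal illustration and not the authors' implementation. The function names, the additive `tanh` combination, and the toy dimensions are all assumptions made for this example.

```python
import math


def left_context_mask(num_frames, left_context):
    """mask[t][s] is True when frame t may attend to frame s.

    Assumption for illustration: frame t sees itself and at most
    `left_context` earlier frames, and never any future frame.
    """
    return [
        [(t - left_context) <= s <= t for s in range(num_frames)]
        for t in range(num_frames)
    ]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def joint(audio_enc, label_enc, weights, bias):
    """Combine encoder outputs for every (t, u) pair.

    audio_enc: T vectors of size D (one per acoustic frame)
    label_enc: U vectors of size D (one per label history)
    weights:   V rows of size D, bias: V values (hypothetical output layer)
    Returns lattice[t][u], a probability distribution over V labels.
    """
    lattice = []
    for a in audio_enc:
        row = []
        for l in label_enc:
            # Assumed combination: element-wise sum followed by tanh.
            hidden = [math.tanh(x + y) for x, y in zip(a, l)]
            logits = [
                sum(w * h for w, h in zip(w_row, hidden)) + b
                for w_row, b in zip(weights, bias)
            ]
            row.append(softmax(logits))
        lattice.append(row)
    return lattice


if __name__ == "__main__":
    mask = left_context_mask(num_frames=4, left_context=1)
    for t, row in enumerate(mask):
        print(t, row)
```

With a fixed `left_context`, the number of frames each position attends to stays constant as the utterance grows, which is what keeps per-frame cost bounded in a streaming setting. The lattice returned by `joint` has one distribution per (t, u) cell, which is the grid of outputs over which an RNN-T style loss is defined.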
Related papers
- Conv-Transformer Transducer: Low Latency, Low Frame Rate, Streamable End-to-End Speech Recognition (2020)
- Transformer-Transducer: End-to-End Speech Recognition With Self-Attention (2019)
- Developing Real-Time Streaming Transformer Transducer For Speech Recognition On Large-Scale Dataset (2020)
- Transformer Transducer: One Model Unifying Streaming And Non-Streaming Speech Recognition (2020)
- Exploring Architectures, Data And Units For Streaming End-to-End Speech Recognition With RNN-Transducer (2018)
- Multitask Learning And Joint Optimization For Transformer-RNN-Transducer Speech Recognition (2020)
- Improving RNN Transducer Modeling For End-to-End Speech Recognition (2019)
- Cascade RNN-Transducer: Syllable Based Streaming On-Device Mandarin Speech Recognition With A Syllable-to-Character Converter (2020)