Hybrid Transducer And Attention Based Encoder-decoder Modeling For Speech-to-text Tasks
2023 · Yun Tang, Anna Y. Sun, Hirofumi Inaguma, et al.
Abstract
Transducer and Attention based Encoder-Decoder (AED) are two widely used frameworks for speech-to-text tasks. They are designed for different purposes and each has its own benefits and drawbacks for speech-to-text tasks. In order to leverage strengths of both modeling methods, we propose a solution by combining Transducer and Attention based Encoder-Decoder (TAED) for speech-to-text tasks. The new method leverages AED's strength in non-monotonic sequence to sequence learning while retaining Transducer's streaming property. In the proposed framework, Transducer and AED share the same speech encoder. The predictor in Transducer is replaced by the decoder in the AED model, and the outputs of the decoder are conditioned on the speech inputs instead of outputs from an unconditioned language model. The proposed solution ensures that the model is optimized by covering all possible read/write scenarios and creates a matched environment for streaming applications. We evaluate the proposed appro
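The architecture described in the abstract (a shared speech encoder, with the Transducer predictor replaced by an AED decoder whose outputs are conditioned on speech) can be illustrated with a toy sketch. This is not the authors' implementation. The function names (`attend`, `taed_lattice`), the single dot-product attention step, the additive tanh joiner, and the choice to let the decoder at lattice node (t, u) see only encoder frames 0..t are all assumptions made for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(query, frames):
    """Dot-product attention of one query vector over a list of encoder frames."""
    scores = [sum(q * f for q, f in zip(query, frame)) for frame in frames]
    weights = softmax(scores)
    dim = len(query)
    return [sum(w * frame[d] for w, frame in zip(weights, frames)) for d in range(dim)]

def taed_lattice(enc, tok_emb, out_proj):
    """Return probs[t][u], a distribution over (blank + vocab) at node (t, u).

    enc:      T encoder frames, each D floats (output of the shared speech encoder)
    tok_emb:  U+1 label-history embeddings, index 0 is the start symbol (hypothetical)
    out_proj: V+1 rows of D floats, row 0 stands for the blank symbol (hypothetical)
    """
    lattice = []
    for t in range(len(enc)):
        nodes = []
        for u in range(len(tok_emb)):
            # AED-style decoder state: the label history attends to the speech
            # seen so far (assumed here to be frames 0..t), so it is conditioned
            # on the input rather than being an unconditioned language model.
            context = attend(tok_emb[u], enc[: t + 1])
            dec = [e + c for e, c in zip(tok_emb[u], context)]
            # Transducer-style joiner: combine the current frame with the decoder state.
            joint = [math.tanh(h + d) for h, d in zip(enc[t], dec)]
            logits = [sum(w * j for w, j in zip(weights, joint)) for weights in out_proj]
            nodes.append(softmax(logits))
        lattice.append(nodes)
    return lattice

# Toy inputs: T=3 frames, U=1 emitted label, V=2 vocabulary entries, D=2 dimensions.
enc = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
tok_emb = [[0.0, 0.0], [0.2, 0.1]]
out_proj = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
probs = taed_lattice(enc, tok_emb, out_proj)
```

Each entry `probs[t][u]` plays the role of a Transducer lattice node, so a standard Transducer loss could in principle be summed over all read/write paths through it. The difference from a conventional predictor in this sketch is that the decoder state changes with `t` as more speech becomes available.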
Related papers
- Hybrid Attention-based Encoder-decoder Model For Efficient Language Model Adaptation (2023)
- Automatic Audio Captioning Using Attention Weighted Event Based Embeddings (2022)
- Optimizing Alignment Of Speech And Language Latent Spaces For End-to-end Speech Recognition And Understanding (2021)
- Alignment Knowledge Distillation For Online Streaming Attention-based Speech Recognition (2021)
- Investigating Methods To Improve Language Model Integration For Attention-based Encoder-decoder ASR Models (2021)
- Chunked Attention-based Encoder-decoder Model For Streaming Speech Recognition (2023)
- Joint CTC-attention Based End-to-end Speech Recognition Using Multi-task Learning (2016)
- Text-conditioned Transformer For Automatic Pronunciation Error Detection (2020)