Online Hybrid CTC/attention End-to-end Automatic Speech Recognition Architecture
2023 · Haoran Miao, Gaofeng Cheng, Pengyuan Zhang, et al.
Abstract
Recently, there has been increasing progress in end-to-end automatic speech recognition (ASR) architecture, which transcribes speech to text without any pre-trained alignments. One popular end-to-end approach is the hybrid Connectionist Temporal Classification (CTC) and attention (CTC/attention) based ASR architecture. However, how to deploy hybrid CTC/attention systems for online speech recognition is still a non-trivial problem. This article describes our proposed online hybrid CTC/attention end-to-end ASR architecture, which replaces all the offline components of conventional CTC/attention ASR architecture with their corresponding streaming components. Firstly, we propose stable monotonic chunk-wise attention (sMoChA) to stream the conventional global attention, and further propose monotonic truncated attention (MTA) to simplify sMoChA and solve the training-and-decoding mismatch problem of sMoChA. Secondly, we propose truncated CTC (T-CTC) prefix score to stream CTC prefix score ca
Related papers
- Transformer-based Online CTC/attention End-to-end Speech Recognition Architecture (2020) · 14.06
- Alignment Knowledge Distillation For Online Streaming Attention-based Speech Recognition (2021) · 7.16
- Unified Streaming And Non-streaming Two-pass End-to-end Model For Speech Recognition (2020) · 0.00
- Multi-stream End-to-end Speech Recognition (2019) · 8.35
- An Improved Hybrid CTC-attention Model For Speech Recognition (2018) · 0.00
- Audio Adversarial Examples For Robust Hybrid CTC/attention Speech Recognition (2020) · 3.58
- Streaming Chunk-aware Multihead Attention For Online End-to-end Speech Recognition (2020) · 8.60
- An Investigation Of Enhancing CTC Model For Triggered Attention-based Streaming ASR (2021) · 0.00