Text-conditioned Transformer For Automatic Pronunciation Error Detection
2020 · Zhan Zhang, Yuehai Wang, Jianyi Yang
Abstract
Automatic pronunciation error detection (APED) plays an important role in language learning. In previous ASR-based APED methods, the decoded results must be aligned with the target text to locate the errors. However, because decoding and alignment are independent processes, the prior knowledge about the target text is not fully utilized. In this paper, we propose using the target text as an extra condition for the Transformer backbone to handle the APED task. The proposed method outputs the error states by considering the relationship between the input speech and the target text in a fully end-to-end fashion. Moreover, because the known target text conditions the decoder input, the Transformer works in a feed-forward manner rather than autoregressively at inference, which can significantly speed up deployment. We set the ASR-based Transformer as the baseline APED model and conduct seve
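The alignment step that the abstract attributes to earlier ASR-based methods can be illustrated with a minimal sketch. It assumes a plain Levenshtein (edit-distance) alignment between the target phonemes and the decoded phonemes; the function name `align_errors`, the label names, and the phoneme symbols are hypothetical and are not taken from the paper, whose baseline may align differently.

```python
from typing import List, Tuple


def align_errors(target: List[str], decoded: List[str]) -> List[Tuple[str, str]]:
    """Align decoded phonemes to the target text and label each position.

    Illustrative only: a generic edit-distance alignment, not the paper's method.
    Returns (phoneme, label) pairs, where label is one of
    'correct', 'substitution', 'deletion', or 'insertion'.
    """
    n, m = len(target), len(decoded)
    # dp[i][j] = edit distance between target[:i] and decoded[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if target[i - 1] == decoded[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j - 1] + cost,  # match or substitution
                dp[i - 1][j] + 1,         # target phoneme missing from speech
                dp[i][j - 1] + 1,         # extra phoneme in speech
            )

    # Backtrace from the bottom-right corner to recover the alignment.
    labels: List[Tuple[str, str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if target[i - 1] == decoded[j - 1] else 1
            if dp[i][j] == dp[i - 1][j - 1] + cost:
                label = "correct" if cost == 0 else "substitution"
                labels.append((target[i - 1], label))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            labels.append((target[i - 1], "deletion"))
            i -= 1
        else:
            labels.append((decoded[j - 1], "insertion"))
            j -= 1
    labels.reverse()
    return labels


if __name__ == "__main__":
    # Hypothetical phoneme sequences for the word "cat".
    print(align_errors(["K", "AE", "T"], ["K", "AH", "T"]))
```

In this sketch the alignment runs only after decoding has finished, so the decoder never sees the target text. That separation is the limitation the abstract describes and the motivation for conditioning the Transformer on the target text directly.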
Related papers
- Patcorrect: Non-autoregressive Phoneme-augmented Transformer For ASR Error Correction (2023) 0.00
- Hybrid Transducer And Attention Based Encoder-decoder Modeling For Speech-to-text Tasks (2023) 6.77
- A CTC Alignment-based Non-autoregressive Transformer For End-to-end Automatic Speech Recognition (2023) 10.97
- Decoupling Pronunciation And Language For End-to-end Code-switching Automatic Speech Recognition (2020) 0.00
- Fast Offline Transformer-based End-to-end Automatic Speech Recognition For Real-world Applications (2021) 7.16
- Label-synchronous Speech-to-text Alignment For ASR Using Forward And Backward Transformers (2021) 0.00
- Improving Transformer-based Conversational ASR By Inter-sentential Attention Mechanism (2022) 7.50
- Controllable Time-delay Transformer For Real-time Punctuation Prediction And Disfluency Detection (2020) 10.48