Conversational Speech Recognition By Learning Conversation-level Characteristics
2022 · Kun Wei, Yike Zhang, Sining Sun, et al.
Abstract
Conversational automatic speech recognition (ASR) is the task of recognizing conversational speech involving multiple speakers. Unlike sentence-level ASR, conversational ASR can naturally take advantage of conversation-specific characteristics, such as role preference and topical coherence. This paper proposes a conversational ASR model that explicitly learns conversation-level characteristics under the prevalent end-to-end neural framework. The highlights of the proposed model are twofold. First, a latent variational module (LVM) is attached to a conformer-based encoder-decoder ASR backbone to learn role preference and topical coherence. Second, a topic model is adopted to bias the decoder outputs toward words in the predicted topics. Experiments on two Mandarin conversational ASR tasks show that the proposed model achieves up to a 12% relative character error rate (CER) reduction.
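To make the reported metric concrete, the following is a minimal sketch of how a character error rate and a relative CER reduction are commonly computed. It is not taken from the paper: the function names (`edit_distance`, `cer`, `relative_reduction`) are hypothetical, and the CER values in the usage lines are illustrative numbers chosen only to land on a 12% relative reduction, not results from the experiments.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(refs: list[str], hyps: list[str]) -> float:
    """Total character errors divided by total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


def relative_reduction(baseline_cer: float, new_cer: float) -> float:
    """Fraction of the baseline error removed by the new system."""
    return (baseline_cer - new_cer) / baseline_cer


# Illustrative values only (assumed, not from the paper):
# a baseline CER of 25% improved to 22% is a 12% relative reduction.
print(round(relative_reduction(0.25, 0.22), 4))
```

A relative reduction is measured against the baseline error, so a 12% relative reduction is a much smaller change in absolute CER points than the figure alone might suggest.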
Related papers
- VAIS ASR: Building A Conversational Speech Recognition System Using Language Model Combination (2019) 0.00
- Effective Cross-utterance Language Modeling For Conversational Speech Recognition (2021) 2.26
- Improving Transformer-based Conversational ASR By Inter-sentential Attention Mechanism (2022) 7.50
- Speaker Conditioned Acoustic Modeling For Multi-speaker Conversational ASR (2021) 4.52
- Leveraging Acoustic Contextual Representation By Audio-textual Cross-modal Learning For Conversational ASR (2022) 0.00
- Non-autoregressive End-to-end Approaches For Joint Automatic Speech Recognition And Spoken Language Understanding (2023) 5.84
- Visualizing Automatic Speech Recognition -- Means For A Better Understanding? (2022) 4.52
- EffectiveASR: A Single-step Non-autoregressive Mandarin Speech Recognition Architecture With High Accuracy And Inference Speed (2024) 3.58