Attentive Temporal Pooling For Conformer-based Streaming Language Identification In Long-form Speech
2022 · Quan Wang, Yang Yu, Jason Pelecanos, et al.
Abstract
In this paper, we introduce a novel language identification system based on conformer layers. We propose an attentive temporal pooling mechanism that lets the model carry information across long-form audio in a recurrent form, so that inference can be performed in a streaming fashion. Additionally, we investigate two domain adaptation approaches that adapt an existing language identification model to a new domain without retraining the model parameters. We perform a comparative study of different model topologies under different model-size constraints, and find that conformer-based models significantly outperform LSTM-based and transformer-based models. Our experiments also show that attentive temporal pooling and domain adaptation improve model accuracy.
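The abstract does not spell out the pooling equations, so the following is only a minimal sketch of one common way to compute attention-weighted pooling recurrently, not the authors' formulation. It assumes a softmax-style weighting with a fixed linear scorer; the class name `StreamingAttentivePooling`, the `scoring_vector` parameter, and the helper `batch_attentive_pooling` are all hypothetical. The point it illustrates is that a running weighted sum and a running normalizer are enough state to update the pooled embedding one frame at a time.

```python
import math
from typing import List, Sequence


class StreamingAttentivePooling:
    """Running attention-weighted mean of frame embeddings (illustrative only)."""

    def __init__(self, scoring_vector: Sequence[float]) -> None:
        # Hypothetical stand-in for a learned attention scorer.
        self.w = list(scoring_vector)
        self.weighted_sum = [0.0] * len(self.w)  # carried numerator
        self.weight_total = 0.0                  # carried normalizer
        self.max_score = -math.inf               # reference point for stable exponentials

    def update(self, frame: Sequence[float]) -> List[float]:
        """Consume one frame and return the pooled embedding so far."""
        score = sum(wi * xi for wi, xi in zip(self.w, frame))
        new_max = max(self.max_score, score)
        # Rescale the carried state to the new reference maximum.
        scale = math.exp(self.max_score - new_max) if self.weight_total > 0 else 0.0
        weight = math.exp(score - new_max)
        self.weighted_sum = [
            scale * s + weight * x for s, x in zip(self.weighted_sum, frame)
        ]
        self.weight_total = scale * self.weight_total + weight
        self.max_score = new_max
        return [s / self.weight_total for s in self.weighted_sum]


def batch_attentive_pooling(
    scoring_vector: Sequence[float], frames: Sequence[Sequence[float]]
) -> List[float]:
    """Non-streaming reference: softmax over all frames, then a weighted mean."""
    scores = [sum(wi * xi for wi, xi in zip(scoring_vector, f)) for f in frames]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(scoring_vector)
    return [sum(a * f[d] for a, f in zip(weights, frames)) / total for d in range(dim)]


if __name__ == "__main__":
    frames = [[0.5, -1.0], [2.0, 0.3], [-0.7, 1.2]]
    pooler = StreamingAttentivePooling([0.4, -0.2])
    for frame in frames:
        pooled = pooler.update(frame)
    print(pooled)
    print(batch_attentive_pooling([0.4, -0.2], frames))
```

Under these assumptions the streaming result after the last frame matches the batch computation over the whole sequence, while the state carried between frames stays a fixed size regardless of audio length.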
Related papers
- Accidental Learners: Spoken Language Identification In Multilingual Self-supervised Models (2022)
- Utterance-level End-to-end Language Identification Using Attention-based CNN-BLSTM (2019)
- Streaming Language Identification Using Combination Of Acoustic Representations And ASR Hypotheses (2020)
- Stateful Conformer With Cache-based Inference For Streaming Automatic Speech Recognition (2023)
- Multi-language Identification Using Convolutional Recurrent Neural Network (2016)
- Fast Conformer With Linearly Scalable Attention For Efficient Speech Recognition (2023)
- Spoken Language Identification Using Convnets (2019)
- Towards Effective And Compact Contextual Representation For Conformer Transducer Speech Recognition Systems (2023)