Signal Combination For Language Identification
2019 · Shengye Wang, Li Wan, Yang Yu, et al.
Abstract
Google's multilingual speech recognition system combines low-level acoustic signals with language-specific recognizer signals to better predict the language of an utterance. This paper presents our experience with different signal combination methods for improving overall language identification accuracy. We compare two models that combine signals from recognizers, a lattice-based ensemble model and a deep neural network model, against a baseline that uses only low-level acoustic signals. Experimental results show that the deep neural network model outperforms the lattice-based ensemble model, reducing the error rate from 5.5% in the baseline to 4.3%, a 21.8% relative reduction.
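The relative reduction quoted in the abstract follows directly from the two reported error rates. A minimal Python sketch of that arithmetic is given here; the function name `relative_reduction` is our own illustration and does not come from the paper.

```python
def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Return the fraction of the baseline error that was removed.

    Both arguments must use the same unit (here, percent).
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Error rates reported in the abstract, in percent.
baseline = 5.5   # acoustic-only baseline
combined = 4.3   # deep neural network signal combination

print(f"{relative_reduction(baseline, combined):.1%}")  # prints 21.8%
```

The absolute improvement is 1.2 percentage points; dividing by the 5.5% baseline gives the relative figure.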
Related papers
- Streaming Language Identification Using Combination Of Acoustic Representations And ASR Hypotheses (2020)
- Multi-language Identification Using Convolutional Recurrent Neural Network (2016)
- Enhancing Neural Spoken Language Recognition: An Exploration With Multilingual Datasets (2025)
- Speaker Recognition By Means Of A Combination Of Linear And Nonlinear Predictive Models (2022)
- System Combination For Short Utterance Speaker Recognition (2016)
- Spoken Language Identification Using ConvNets (2019)
- RNN-Transducer With Language Bias For End-to-end Mandarin-English Code-switching Speech Recognition (2020)
- Leveraging Language ID To Calculate Intermediate CTC Loss For Enhanced Code-switching Speech Recognition (2023)