Streaming Language Identification Using Combination Of Acoustic Representations And ASR Hypotheses
2020 · Chander Chandak, Zeynab Raeesy, Ariya Rastrow, et al.
Abstract
This paper presents our modeling and architecture approaches for building a highly accurate, low-latency language identification system to support multilingual spoken queries for voice assistants. A common approach to multilingual speech recognition is to run multiple monolingual ASR systems in parallel and rely on a language identification (LID) component that detects the input language. Conventionally, LID relies on acoustic-only information to detect the input language. We propose an approach that learns and combines acoustic-level representations with embeddings estimated on ASR hypotheses, resulting in up to 50% relative reduction of identification error rate compared to a model that uses acoustic-only features. Furthermore, to reduce processing cost and latency, we exploit a streaming architecture that identifies the spoken language early, once the system reaches a predetermined confidence level, alleviating the need to run multiple ASR systems until the end of the input query. The c
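The two ideas in the abstract, fusing acoustic and ASR-hypothesis features and stopping early at a confidence threshold, can be illustrated with a minimal sketch. This is not the authors' model. The fusion by concatenation followed by a single linear layer, the function names `fuse_and_score` and `streaming_lid`, the toy weights, and the 0.9 threshold are all assumptions made for illustration. The per-chunk vectors stand in for whatever learned acoustic representation and hypothesis embedding a real system would produce.

```python
import math
from typing import Iterable, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_score(acoustic: Vector, text: Vector,
                   weights: Sequence[Vector], bias: Vector) -> List[float]:
    """Concatenate both feature vectors and apply one linear layer + softmax.

    Concatenation is an assumed fusion scheme, chosen only for simplicity.
    """
    features = list(acoustic) + list(text)
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


def streaming_lid(chunks: Iterable[Tuple[Vector, Vector]],
                  languages: Sequence[str],
                  weights: Sequence[Vector], bias: Vector,
                  threshold: float = 0.9) -> Tuple[Optional[str], int, float]:
    """Consume (acoustic, text) feature pairs one chunk at a time.

    Returns (language, chunks_consumed, confidence). Stops as soon as the
    top posterior reaches `threshold`; otherwise returns the decision made
    on the last chunk.
    """
    best: Optional[str] = None
    confidence = 0.0
    consumed = 0
    for consumed, (acoustic, text) in enumerate(chunks, start=1):
        probs = fuse_and_score(acoustic, text, weights, bias)
        top = max(range(len(probs)), key=probs.__getitem__)
        best, confidence = languages[top], probs[top]
        if confidence >= threshold:
            break  # early exit: later chunks are never processed
    return best, consumed, confidence


# Toy usage with hypothetical 2-dimensional features per modality.
languages = ["en", "es"]
weights = [[1.0, 0.0, 1.0, 0.0],   # row scoring "en"
           [0.0, 1.0, 0.0, 1.0]]   # row scoring "es"
bias = [0.0, 0.0]
chunks = [
    ([0.5, 0.5], [0.5, 0.5]),  # ambiguous: both languages equally likely
    ([2.0, 0.0], [2.0, 0.0]),  # strong evidence for "en"
    ([0.0, 2.0], [0.0, 2.0]),  # not reached if the exit fires on chunk 2
]
language, steps, confidence = streaming_lid(chunks, languages, weights, bias)
```

In this toy run the first chunk is ambiguous, so the loop continues. The second chunk pushes the top posterior above the threshold and the third chunk is never read, which mirrors the abstract's point that parallel ASR systems need not run to the end of the query.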
Related papers
- Streaming End-to-end Bilingual ASR Systems With Joint Language Identification (2020)
- Exploring Spoken Language Identification Strategies For Automatic Transcription Of Multilingual Broadcast And Institutional Speech (2024)
- Advanced Accent/dialect Identification And Accentedness Assessment With Multi-embedding Models And Automatic Speech Recognition (2023)
- Signal Combination For Language Identification (2019)
- VAIS ASR: Building A Conversational Speech Recognition System Using Language Model Combination (2019)
- A Multimodal Approach To Device-directed Speech Detection With Large Language Models (2024)
- Joint Unsupervised And Supervised Learning For Context-aware Language Identification (2023)
- Combining Frame-synchronous And Label-synchronous Systems For Speech Recognition (2021)