Joint Unsupervised And Supervised Learning For Context-aware Language Identification
2023 · Jinseok Park, Hyung Yong Kim, Jihwan Park, et al.
Abstract
Language identification (LID) automatically recognizes the language of a spoken utterance. According to recent studies, LID models trained with an automatic speech recognition (ASR) task perform better than those trained with an LID task only. However, training the model to recognize speech requires additional text labels, and acquiring those labels is costly. To overcome this problem, we propose context-aware language identification that combines unsupervised and supervised learning without any text labels. The proposed method learns the context of speech through a masked language modeling (MLM) loss and is simultaneously trained to determine the language of the utterance with a supervised learning loss. The proposed joint learning was found to reduce the error rate by 15.6% compared to a model with the same structure trained with supervised learning only, on a subset of the VoxLingua107 dataset consisting of sub-three-second utterances in 11 languages.
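The joint objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the two losses are weighted or what the masked-prediction targets are, so the `mlm_weight` parameter, the discrete per-frame targets, and the function names below are all assumptions made for illustration.

```python
import math


def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    peak = max(logits)  # subtract the max for numerical stability
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_z - logits[target]


def joint_loss(lid_logits, lid_label, frame_logits, frame_targets, mask,
               mlm_weight=1.0):
    """Supervised LID loss plus a weighted masked-prediction loss.

    lid_logits    : utterance-level logits, one per language
    lid_label     : index of the true language
    frame_logits  : per-frame logits over hypothetical discrete targets
    frame_targets : per-frame target indices (assumed, not from the paper)
    mask          : True where the input frame was masked
    mlm_weight    : assumed scalar weight on the unsupervised term
    """
    supervised = cross_entropy(lid_logits, lid_label)
    # The masked-prediction loss is averaged over masked frames only.
    masked_losses = [
        cross_entropy(logits, target)
        for logits, target, is_masked in zip(frame_logits, frame_targets, mask)
        if is_masked
    ]
    mlm = sum(masked_losses) / len(masked_losses) if masked_losses else 0.0
    return supervised + mlm_weight * mlm, supervised, mlm


# Toy example: 11 languages, 3 frames, 4 hypothetical target classes.
total, supervised, mlm = joint_loss(
    lid_logits=[0.0] * 11,
    lid_label=3,
    frame_logits=[[0.0] * 4 for _ in range(3)],
    frame_targets=[0, 1, 2],
    mask=[True, False, True],
    mlm_weight=0.5,
)
```

Because both terms are computed from the same forward pass over the audio, the only labels this kind of objective needs are the language labels, which matches the abstract's claim that no text labels are required.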
Related papers
- Improved Language Identification Through Cross-lingual Self-supervised Learning (2021)
- Attention-based Contextual Language Model Adaptation For Speech Recognition (2021)
- A Compact End-to-end Model With Local And Global Context For Spoken Language Identification (2022)
- Streaming Language Identification Using Combination Of Acoustic Representations And ASR Hypotheses (2020)
- End-to-end Speech Recognition Contextualization With Large Language Models (2023)
- Low-resource Contextual Topic Identification On Speech (2018)
- Is Attention Always Needed? A Case Study On Language Identification From Speech (2021)
- A Semisupervised Approach For Language Identification Based On Ladder Networks (2016)