Speech Activity Detection Based On Multilingual Speech Recognition System
2020 · Seyyed Saeed Sarfjoo, Srikanth Madikeri, Petr Motlicek
Abstract
To better model contextual information and improve the generalization ability of a Speech Activity Detection (SAD) system, this paper leverages a multilingual Automatic Speech Recognition (ASR) system to perform SAD. Sequence-discriminative training of the Acoustic Model (AM) with the Lattice-Free Maximum Mutual Information (LF-MMI) loss function effectively extracts the contextual information of the input acoustic frames. Multilingual AM training makes the model robust to noise and to language variability. The index of the maximum output posterior is used as a frame-level speech/non-speech decision function. Majority voting and logistic regression are applied to fuse the language-dependent decisions. The multilingual ASR system is trained on 18 languages from the BABEL datasets, and the resulting SAD is evaluated on 3 different languages. On out-of-domain datasets, the proposed SAD model performs significantly better than the baseline models. On the Ester2 dataset, without using any in-
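The frame-level decision and the two fusion schemes named in the abstract can be sketched as follows. This is a minimal, hypothetical sketch, not the authors' implementation. It assumes each language-dependent output layer yields per-frame posteriors over its output units and that a known set of unit indices (the hypothetical `NONSPEECH` set) covers silence/noise; a frame is labeled speech unless its argmax unit falls in that set. Fusion uses either a strict majority vote (ties resolved as non-speech, an assumption) or a logistic regression whose features are the per-language binary decisions, fitted here by plain gradient descent on hypothetical reference labels. The function names and all posteriors are illustrative toy values, not data from the paper.

```python
import math
from typing import List, Sequence, Set, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def frame_decisions(posteriors: Sequence[Sequence[float]],
                    nonspeech_ids: Set[int]) -> List[int]:
    """Per frame: 1 (speech) unless the argmax output unit is a non-speech unit."""
    decisions = []
    for frame in posteriors:
        best = max(range(len(frame)), key=frame.__getitem__)  # index of max posterior
        decisions.append(0 if best in nonspeech_ids else 1)
    return decisions


def majority_vote(per_lang: Sequence[Sequence[int]]) -> List[int]:
    """Speech only if a strict majority of language heads vote speech (ties -> non-speech)."""
    n_lang = len(per_lang)
    return [1 if 2 * sum(votes) > n_lang else 0 for votes in zip(*per_lang)]


def train_logreg(per_lang: Sequence[Sequence[int]], labels: Sequence[int],
                 lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Fit one weight per language head plus a bias by batch gradient descent
    on the logistic (cross-entropy) loss."""
    X = [list(votes) for votes in zip(*per_lang)]  # one feature vector per frame
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, labels):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def logreg_fuse(per_lang: Sequence[Sequence[int]], w: List[float], b: float,
                threshold: float = 0.5) -> List[int]:
    """Speech if the fused probability reaches the threshold."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, votes)) + b) >= threshold else 0
            for votes in zip(*per_lang)]


# Toy usage: 3 hypothetical language heads, 4 frames, 3 output units, unit 0 = silence.
NONSPEECH = {0}
lang_a = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5], [0.6, 0.3, 0.1]]
lang_b = [[0.5, 0.4, 0.1], [0.3, 0.3, 0.4], [0.6, 0.2, 0.2], [0.2, 0.7, 0.1]]
lang_c = [[0.9, 0.05, 0.05], [0.2, 0.5, 0.3], [0.1, 0.1, 0.8], [0.5, 0.4, 0.1]]
per_lang = [frame_decisions(p, NONSPEECH) for p in (lang_a, lang_b, lang_c)]
labels = [0, 1, 1, 0]  # hypothetical reference labels for fitting the fusion weights

fused_mv = majority_vote(per_lang)  # [0, 1, 1, 0]
w, b = train_logreg(per_lang, labels)
fused_lr = logreg_fuse(per_lang, w, b)
print(fused_mv, fused_lr)
```

Majority voting needs no training data, whereas the logistic-regression fusion can learn per-language weights (for example, down-weighting a less reliable head) at the cost of needing labeled frames to fit them.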
Related papers
- Temporarily-aware Context Modelling Using Generative Adversarial Networks For Speech Activity Detection (2020) · 7.50
- End-to-end Audiovisual Speech Activity Detection With Bimodal Recurrent Neural Models (2018) · 10.48
- Incorporating VAD Into ASR System By Multi-task Learning (2021) · 4.52
- Multilingual Sequence-to-sequence Speech Recognition: Architecture, Transfer Learning, And Language Modeling (2018) · 13.84
- Analysis Of Multilingual Sequence-to-sequence Speech Recognition Systems (2018) · 0.00
- Building Robust And Scalable Multilingual ASR For Indian Languages (2025) · 0.00
- Streaming Language Identification Using Combination Of Acoustic Representations And ASR Hypotheses (2020) · 0.00
- Adaptive Activation Network For Low Resource Multilingual Speech Recognition (2022) · 0.00