Semi-supervised Acoustic Modelling For Five-lingual Code-switched ASR Using Automatically-segmented Soap Opera Speech
2020 · N. Wilkinson, A. Biswas, E. Yılmaz, et al.
Abstract
This paper considers the impact of automatic segmentation on the fully-automatic, semi-supervised training of automatic speech recognition (ASR) systems for five-lingual code-switched (CS) speech. Four automatic segmentation techniques were evaluated in terms of the recognition performance of an ASR system trained on the resulting segments in a semi-supervised manner. The system's output was compared with the recognition rates achieved by a semi-supervised system trained on manually assigned segments. Three of the automatic techniques use a newly proposed convolutional neural network (CNN) model for framewise classification, and include a novel form of HMM smoothing of the CNN outputs. Automatic segmentation was applied in combination with automatic speaker diarization. The best-performing segmentation technique was also tested without speaker diarization. An evaluation based on 248 unsegmented soap opera episodes indicated that voice activity detection (VAD) based on a CNN followed by
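As a rough illustration of the smoothing idea mentioned in the abstract, the sketch below runs a generic two-state Viterbi decoder over framewise speech probabilities. The abstract does not say what form the authors' HMM smoothing takes, so this is not their method. The function names `viterbi_smooth` and `frames_to_segments`, the self-transition probability `p_stay`, and the 10 ms frame shift are all assumptions made for the example, and the classifier posteriors are used directly as emission scores, which is an approximation.

```python
import numpy as np


def viterbi_smooth(speech_prob, p_stay=0.99, eps=1e-10):
    """Smooth framewise speech probabilities with a two-state HMM.

    States: 0 = non-speech, 1 = speech. `p_stay` is an assumed
    self-transition probability, not a value from the paper.
    """
    speech_prob = np.asarray(speech_prob, dtype=float)
    n = len(speech_prob)
    if n == 0:
        return np.zeros(0, dtype=int)
    # Emission log-scores per frame: column 0 = non-speech, column 1 = speech.
    log_emit = np.log(np.stack([1.0 - speech_prob, speech_prob], axis=1) + eps)
    # Symmetric transitions: a high p_stay penalises rapid label flips.
    log_trans = np.log(np.array([[p_stay, 1.0 - p_stay],
                                 [1.0 - p_stay, p_stay]]))
    score = np.empty((n, 2))
    back = np.zeros((n, 2), dtype=int)
    score[0] = np.log(0.5) + log_emit[0]  # uniform initial state prior
    for t in range(1, n):
        # cand[i, j]: best score in state i at t-1 plus the move to state j.
        cand = score[t - 1][:, None] + log_trans
        back[t] = np.argmax(cand, axis=0)
        score[t] = cand[back[t], [0, 1]] + log_emit[t]
    # Backtrace the best state sequence.
    path = np.empty(n, dtype=int)
    path[-1] = int(np.argmax(score[-1]))
    for t in range(n - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path


def frames_to_segments(labels, frame_shift=0.01):
    """Convert 0/1 frame labels to (start, end) speech segments in seconds."""
    segments = []
    start = None
    for i, lab in enumerate(labels):
        if lab == 1 and start is None:
            start = i
        elif lab == 0 and start is not None:
            segments.append((start * frame_shift, i * frame_shift))
            start = None
    if start is not None:
        segments.append((start * frame_shift, len(labels) * frame_shift))
    return segments


# Toy posteriors: a speech region containing one low-confidence frame.
probs = [0.1] * 5 + [0.9] * 3 + [0.2] + [0.9] * 4 + [0.1] * 5
labels = viterbi_smooth(probs)
segments = frames_to_segments(labels)
```

With these toy values, plain thresholding at 0.5 would split the speech region at the single low-confidence frame, whereas the smoothed labels keep it as one segment. Lowering `p_stay` makes the decoder follow the raw classifier output more closely.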
Related papers
- Semi-supervised Acoustic Model Training For Speech With Code-switching (2018)
- Semi-supervised Development Of ASR Systems For Multilingual Code-switched Speech In Under-resourced Languages (2020)
- Code-switching Detection With Data-augmented Acoustic And Language Models (2018)
- Unsupervised Speech Segmentation: A General Approach Using Speech Language Models (2025)
- Speaker Conditioned Acoustic Modeling For Multi-speaker Conversational ASR (2021)
- Acoustic And Textual Data Augmentation For Improved ASR Of Code-switching Speech (2018)
- Beyond Voice Activity Detection: Hybrid Audio Segmentation For Direct Speech Translation (2021)
- Smart Speech Segmentation Using Acousto-linguistic Features With Look-ahead (2022)