Exploiting Cross-lingual Speaker And Phonetic Diversity For Unsupervised Subword Modeling
2019 · Siyuan Feng, Tan Lee
Abstract
This research addresses the problem of acoustic modeling of low-resource languages for which transcribed training data is absent. The goal is to learn robust frame-level feature representations that can be used to identify and distinguish subword-level speech units. The proposed feature representations comprise various types of multilingual bottleneck features (BNFs) that are obtained via multi-task learning of deep neural networks (MTL-DNN). One of the key problems is how to acquire high-quality frame labels for untranscribed training data to facilitate supervised DNN training. It is shown that learning of robust BNF representations can be achieved by effectively leveraging transcribed speech data and well-trained automatic speech recognition (ASR) systems from one or more out-of-domain (resource-rich) languages. Out-of-domain ASR systems can be applied to perform speaker adaptation with untranscribed training data of the target language, and to decode the training speech into frame-l
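The abstract describes bottleneck features (BNFs) taken from a multi-task DNN whose output tasks are trained on frame labels obtained by decoding untranscribed target-language speech with out-of-domain ASR systems. The following is a minimal NumPy sketch of that network shape under stated assumptions: one shared ReLU layer, a linear bottleneck, one shared ReLU layer after it, and one softmax head per label source. The class name `MTLDNN`, the layer sizes, the input dimension and the per-task loss weights are hypothetical choices, not the authors' configuration. Only the forward pass, the weighted multi-task cross-entropy and BNF extraction are shown; training by backpropagation is omitted.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def softmax(z):
    # Subtract the row max for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


class MTLDNN:
    """Hypothetical MTL-DNN sketch: shared hidden layers, a linear bottleneck,
    and one softmax head per task (e.g. one per out-of-domain ASR system
    whose decoded frame labels serve as training targets)."""

    def __init__(self, in_dim, hidden_dim, bn_dim, task_sizes, seed=0):
        rng = np.random.default_rng(seed)

        def init(m, n):
            # He-style random initialisation (an assumption for this sketch).
            return rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n))

        self.W1, self.b1 = init(in_dim, hidden_dim), np.zeros(hidden_dim)
        self.Wbn, self.bbn = init(hidden_dim, bn_dim), np.zeros(bn_dim)
        self.W2, self.b2 = init(bn_dim, hidden_dim), np.zeros(hidden_dim)
        # One output layer per task; task_sizes[k] = number of label classes for task k.
        self.heads = [(init(hidden_dim, k), np.zeros(k)) for k in task_sizes]

    def bottleneck(self, x):
        # Frame-level BNFs: output of the linear (no nonlinearity) bottleneck layer.
        h = relu(x @ self.W1 + self.b1)
        return h @ self.Wbn + self.bbn

    def forward(self, x):
        # Returns the BNFs and one (frames x classes) posterior matrix per task.
        bnf = self.bottleneck(x)
        h2 = relu(bnf @ self.W2 + self.b2)
        return bnf, [softmax(h2 @ W + b) for W, b in self.heads]

    def mtl_loss(self, x, labels, weights=None):
        # Weighted sum of per-task mean frame-level cross-entropy losses.
        _, probs = self.forward(x)
        if weights is None:
            weights = [1.0] * len(probs)
        total = 0.0
        for p, y, w in zip(probs, labels, weights):
            total += w * -np.mean(np.log(p[np.arange(len(y)), y] + 1e-12))
        return total


# Usage with random stand-in data (all sizes hypothetical).
rng = np.random.default_rng(1)
x = rng.normal(size=(100, 39 * 11))  # 100 frames of 11-frame-spliced 39-dim features
model = MTLDNN(in_dim=429, hidden_dim=256, bn_dim=40, task_sizes=[120, 150])
labels = [rng.integers(0, 120, size=100), rng.integers(0, 150, size=100)]
print(model.bottleneck(x).shape)  # (100, 40)
print(model.mtl_loss(x, labels))
```

In this sketch, once the network were trained only `bottleneck` would still be needed: its per-frame output is the BNF vector, and the task-specific softmax heads exist only to shape the shared layers during training.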
Related papers
- Improving Unsupervised Subword Modeling Via Disentangled Speech Representation Learning And Transformation (2019)
- Multilingual And Unsupervised Subword Modeling For Zero-resource Languages (2018)
- The Effectiveness Of Unsupervised Subword Modeling With Autoregressive And Cross-lingual Phone-aware Networks (2020)
- Multilingual Bottleneck Features For Improving ASR Performance Of Code-switched Speech In Under-resourced Languages (2020)
- Learning Cross-lingual Mappings For Data Augmentation To Improve Low-resource Speech Recognition (2023)
- Unsupervised Acoustic Unit Discovery By Leveraging A Language-independent Subword Discriminative Feature Representation (2021)
- Combining Adversarial Training And Disentangled Speech Representation For Robust Zero-resource Subword Modeling (2019)
- Cross-lingual Low Resource Speaker Adaptation Using Phonological Features (2021)