Almost-unsupervised Speech Recognition With Close-to-zero Resource Based On Phonetic Structures Learned From Very Small Unpaired Speech And Text Data
2018 · Yi-Chen Chen, Chia-Hao Shen, Sung-Feng Huang, et al.
Abstract
Producing large amounts of annotated speech data for training ASR systems remains difficult for the more than 95% of the world's languages that are low-resourced. However, human babies start to learn a language from the sounds of a small number of exemplar words, without hearing a large amount of data. This paper presents preliminary work in this direction. Audio Word2Vec is used to obtain embeddings of spoken words, which carry phonetic information extracted from the signals. An autoencoder is used to generate embeddings of text words based on the articulatory features of their phoneme sequences. Both sets of embeddings, for spoken and text words, describe similar phonetic structures among words in their respective latent spaces. A mapping from the audio embeddings to the text embeddings therefore amounts to word-level ASR. This mapping can be learned by aligning a small number of spoken words and the corresponding text words in the embedding spaces. In the initial exper
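The alignment step described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the linear form of the mapping, the gradient-descent fit, the 2-D vectors, and the word labels are all assumptions made for illustration. The text embeddings are hand-picked points, and the "audio" embeddings are the same points rotated by 90 degrees, standing in for two latent spaces that share the same structure.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def fit_linear_map(audio, text, lr=0.1, steps=500):
    """Fit W minimizing the mean squared error ||W a - t||^2 over paired
    examples by plain gradient descent (an assumed, simplified objective)."""
    dim_out, dim_in = len(text[0]), len(audio[0])
    W = [[0.0] * dim_in for _ in range(dim_out)]
    n = len(audio)
    for _ in range(steps):
        grad = [[0.0] * dim_in for _ in range(dim_out)]
        for a, t in zip(audio, text):
            pred = matvec(W, a)
            for i in range(dim_out):
                err = pred[i] - t[i]
                for j in range(dim_in):
                    grad[i][j] += 2.0 * err * a[j] / n
        for i in range(dim_out):
            for j in range(dim_in):
                W[i][j] -= lr * grad[i][j]
    return W

def nearest_word(vec, vocab):
    """Return the vocabulary word whose embedding is closest to vec."""
    return min(vocab, key=lambda w: math.dist(vec, vocab[w]))

# Hypothetical text embeddings (toy 2-D points, not real model outputs).
text_vocab = {
    "cat": [1.0, 0.0],
    "dog": [0.0, 1.0],
    "sun": [-1.0, 0.0],
    "sea": [0.0, -1.0],
}

# Hypothetical audio embeddings: each text point rotated by 90 degrees.
audio_vocab = {w: [-v[1], v[0]] for w, v in text_vocab.items()}

# Only two word pairs are treated as annotated (the "small paired set").
paired = ["cat", "dog"]
W = fit_linear_map([audio_vocab[w] for w in paired],
                   [text_vocab[w] for w in paired])

# Recognize the spoken words that were never paired with text.
for word in ["sun", "sea"]:
    mapped = matvec(W, audio_vocab[word])
    print(word, "->", nearest_word(mapped, text_vocab))
```

In this toy setting two pairs are enough because the two spaces differ only by a linear transform and the paired audio vectors span the space. Real embeddings are higher-dimensional and noisy, so the number of pairs needed and the form of the mapping would differ.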
Related papers
- Unsupervised Neural And Bayesian Models For Zero-resource Speech Processing (2017)
- Towards Unsupervised Automatic Speech Recognition Trained By Unaligned Speech And Text Only (2018)
- Almost Unsupervised Text To Speech And Automatic Speech Recognition (2019)
- Unsupervised Learning For Sequence-to-sequence Text-to-speech For Low-resource Languages (2020)
- Bootstrap An End-to-end ASR System By Multilingual Training, Transfer Learning, Text-to-text Mapping And Synthetic Audio (2020)
- Self-supervised Language Learning From Raw Audio: Lessons From The Zero Resource Speech Challenge (2022)
- Towards Unsupervised Speech Recognition Without Pronunciation Models (2024)
- Pretraining By Backtranslation For End-to-end ASR In Low-resource Settings (2018)