The Effectiveness Of Unsupervised Subword Modeling With Autoregressive And Cross-lingual Phone-aware Networks
2020 · Siyuan Feng, Odette Scharenborg
Abstract
This study addresses unsupervised subword modeling, i.e., learning acoustic feature representations that can distinguish between subword units of a language. We propose a two-stage learning framework that combines self-supervised learning and cross-lingual knowledge transfer. The framework consists of autoregressive predictive coding (APC) as the front-end and a cross-lingual deep neural network (DNN) as the back-end. Experiments on the ABX subword discriminability task conducted with the Libri-light and ZeroSpeech 2017 databases showed that our approach is competitive with or superior to state-of-the-art studies. Comprehensive and systematic analyses at the phoneme and articulatory feature (AF) levels showed that our approach was better at capturing diphthong than monophthong vowel information; differences were also observed in the amount of information captured for different types of consonants. Moreover, a positive correlation was found between the effectiveness of the back-end in
Related papers
- Exploiting Cross-lingual Speaker And Phonetic Diversity For Unsupervised Subword Modeling (2019) 6.77
- Multilingual And Unsupervised Subword Modeling For Zero-resource Languages (2018) 7.81
- Unsupervised Acoustic Unit Discovery By Leveraging A Language-independent Subword Discriminative Feature Representation (2021) 5.84
- Improving Unsupervised Subword Modeling Via Disentangled Speech Representation Learning And Transformation (2019) 5.24
- Acoustic Data-driven Subword Modeling For End-to-end Speech Recognition (2021) 6.77
- Combining Adversarial Training And Disentangled Speech Representation For Robust Zero-resource Subword Modeling (2019) 7.16
- An Unsupervised Autoregressive Model For Speech Representation Learning (2019) 17.26
- Investigating The Impact Of Cross-lingual Acoustic-phonetic Similarities On Multilingual Speech Recognition (2022) 3.58