Extracting Domain Invariant Features By Unsupervised Learning For Robust Automatic Speech Recognition
2018 · Wei-Ning Hsu, James Glass
Abstract
The performance of automatic speech recognition (ASR) systems can be significantly compromised by previously unseen conditions, which is typically due to a mismatch between training and testing distributions. In this paper, we address robustness by studying domain invariant features, such that domain information becomes transparent to ASR systems, resolving the mismatch problem. Specifically, we investigate a recent model, called the Factorized Hierarchical Variational Autoencoder (FHVAE). FHVAEs learn to factorize sequence-level and segment-level attributes into different latent variables without supervision. We argue that the set of latent variables that contain segment-level information is our desired domain invariant feature for ASR. Experiments are conducted on Aurora-4 and CHiME-4, which demonstrate 41% and 27% absolute word error rate reductions respectively on mismatched domains.
Related papers
- Unsupervised Domain Adaptation For Robust Speech Recognition Via Variational Autoencoder-based Data Augmentation (2017)
- Analyzing The Robustness Of Unsupervised Speech Recognition (2021)
- Adversarial Learning Of Raw Speech Features For Domain Invariant Speech Recognition (2018)
- Weak-supervised Dysarthria-invariant Features For Spoken Language Understanding Using An FHVAE And Adversarial Training (2022)
- Toward Domain-invariant Speech Recognition Via Large Scale Training (2018)
- Unsupervised Representation Learning Of Speech For Dialect Identification (2018)
- Learning Invariant Representation And Risk Minimized For Unsupervised Accent Domain Adaptation (2022)
- Robust Speaker Recognition Using Unsupervised Adversarial Invariance (2019)