Low-rank And Sparse Soft Targets To Learn Better DNN Acoustic Models
2016 · Pranay Dighe, Afsaneh Asaei, Herve Bourlard
Abstract
Conventional deep neural networks (DNN) for speech acoustic modeling rely on Gaussian mixture models (GMM) and hidden Markov models (HMM) to obtain binary class labels as the targets for DNN training. Subword classes in speech recognition systems correspond to context-dependent tied states or senones. The present work addresses some limitations of GMM-HMM senone alignments for DNN training. We hypothesize that the senone probabilities obtained from a DNN trained with binary labels can provide more accurate targets to learn better acoustic models. However, DNN outputs bear inaccuracies which are exhibited as high-dimensional unstructured noise, whereas the informative components are structured and low-dimensional. We exploit principal component analysis (PCA) and sparse coding to characterize the senone subspaces. Enhanced probabilities obtained from low-rank and sparse reconstructions are used as soft targets for DNN acoustic modeling, which also enables training with untranscribed data.
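The low-rank reconstruction idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `low_rank_soft_targets`, the choice to work in the log domain, the softmax renormalization, the rank value, and the random stand-in data are all assumptions made for illustration only.

```python
import numpy as np

def low_rank_soft_targets(posteriors, rank):
    """Project posterior vectors onto their top-`rank` principal components
    and renormalize each row into a probability distribution.

    posteriors: array of shape (num_frames, num_senones), rows sum to 1.
    """
    eps = 1e-8
    # Assumption: apply PCA to log-posteriors rather than raw probabilities.
    logp = np.log(posteriors + eps)
    mean = logp.mean(axis=0, keepdims=True)
    centered = logp - mean

    # Principal directions are the right singular vectors of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:rank]  # shape (rank, num_senones)

    # Low-rank reconstruction: keep only the top-`rank` components.
    recon = centered @ basis.T @ basis + mean

    # Softmax maps the reconstruction back onto the probability simplex.
    recon = recon - recon.max(axis=1, keepdims=True)
    probs = np.exp(recon)
    return probs / probs.sum(axis=1, keepdims=True)

# Hypothetical stand-in for DNN senone posteriors: 200 frames, 10 classes.
rng = np.random.default_rng(0)
noisy = rng.dirichlet(np.ones(10), size=200)
soft_targets = low_rank_soft_targets(noisy, rank=3)
```

In this sketch the discarded components play the role of the unstructured noise, and the renormalized rows would serve as soft targets in place of binary labels. The sparse-coding variant mentioned in the abstract is not shown here.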
Related papers
- Graph Based Manifold Regularized Deep Neural Networks For Automatic Speech Recognition (2016)
- Domain Adaptation Using Class Similarity For Robust Speech Recognition (2020)
- DNN Based Speaker Recognition On Short Utterances (2016)
- Information Theoretic Analysis Of DNN-HMM Acoustic Modeling (2017)
- Dynamic Sparsity Neural Networks For Automatic Speech Recognition (2020)
- Multi-task Single Channel Speech Enhancement Using Speech Presence Probability As A Secondary Task Training Target (2020)
- Sequence Training Of DNN Acoustic Models With Natural Gradient (2018)
- Contaminated Speech Training Methods For Robust DNN-HMM Distant Speech Recognition (2017)