Enhanced Factored Three-way Restricted Boltzmann Machines For Speech Detection
2016 · Pengfei Sun, Jun Qin
Abstract
In this letter, we propose enhanced factored three-way restricted Boltzmann machines (EFTW-RBMs) for speech detection. The proposed model incorporates conditional feature learning by multiplying in the dynamical state of a third unit, which modulates the visible-hidden node pairs. Instead of recursively stacking previous speech frames as the third unit, correlation-related weighting coefficients are assigned to the contextual neighboring frames. Specifically, a threshold function is designed to capture the long-term features and blend in the globally stored speech structure. A factored low-rank approximation is introduced to reduce the number of parameters in the three-dimensional interaction tensor, on which a non-negative constraint is imposed to address the sparsity characteristic. Validation with the area under the ROC curve (AUC) and the signal distortion ratio (SDR) shows that our approach outperforms several existing 1D and 2D (i.e., time and time-frequency domain
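The factored low-rank approximation mentioned in the abstract can be illustrated with a minimal NumPy sketch. It assumes the common rank-F form of a factored three-way RBM, W[i, j, k] = sum_f Wv[i, f] * Wh[j, f] * Wz[k, f], which may differ from the exact EFTW-RBM parameterization. All names and sizes here (`Wv`, `Wh`, `Wz`, `b_h`, `n_f`, and so on) are hypothetical, the conditioning vector `z` is only a stand-in for the weighted contextual frames, and taking absolute values of random factors is just one simple way to obtain non-negative weights.

```python
import numpy as np

def factored_energy_term(v, h, z, Wv, Wh, Wz):
    """Three-way interaction energy, -sum_f (v.Wv)_f * (h.Wh)_f * (z.Wz)_f."""
    fv = v @ Wv  # visible units projected onto the F factors
    fh = h @ Wh  # hidden units projected onto the F factors
    fz = z @ Wz  # third (conditioning) unit projected onto the F factors
    return -np.sum(fv * fh * fz)

def hidden_activation(v, z, Wv, Wh, Wz, b_h):
    """P(h_j = 1 | v, z); the third unit z modulates each factor response."""
    pre = Wh @ ((v @ Wv) * (z @ Wz)) + b_h
    return 1.0 / (1.0 + np.exp(-pre))

rng = np.random.default_rng(0)
n_v, n_h, n_z, n_f = 6, 4, 5, 3  # hypothetical layer and factor sizes

# Non-negative factor matrices (assumption: abs of Gaussian draws).
Wv = np.abs(rng.normal(size=(n_v, n_f)))
Wh = np.abs(rng.normal(size=(n_h, n_f)))
Wz = np.abs(rng.normal(size=(n_z, n_f)))
b_h = np.zeros(n_h)

v = rng.normal(size=n_v)                          # current frame features
h = rng.integers(0, 2, size=n_h).astype(float)    # binary hidden state
z = rng.normal(size=n_z)                          # contextual conditioning vector

# Full interaction tensor, built only to check the factorization.
W_full = np.einsum("if,jf,kf->ijk", Wv, Wh, Wz)
e_full = -np.einsum("ijk,i,j,k->", W_full, v, h, z)
e_fact = factored_energy_term(v, h, z, Wv, Wh, Wz)

p_h = hidden_activation(v, z, Wv, Wh, Wz, b_h)
n_params_full = n_v * n_h * n_z
n_params_fact = n_f * (n_v + n_h + n_z)
```

With these sizes the full tensor holds 120 weights while the factored form holds 45, and the two energy computations agree, which is the parameter reduction the abstract refers to.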
Related papers
- Bayesian Learning Of LF-MMI Trained Time Delay Neural Networks For Speech Recognition (2020)
- Factorised Speaker-environment Adaptive Training Of Conformer Speech Recognition Systems (2023)
- Complex-valued Restricted Boltzmann Machine For Direct Speech Parameterization From Complex Spectra (2018)
- Vocal Tract Length Perturbation For Text-dependent Speaker Verification With Autoregressive Prediction Coding (2020)
- Bayesspeech: A Bayesian Transformer Network For Automatic Speech Recognition (2023)
- Content-context Factorized Representations For Automated Speech Recognition (2022)
- Deep Factorization For Speech Signal (2018)
- Wavelet Speech Enhancement Based On Nonnegative Matrix Factorization (2016)