Batch-normalized Joint Training For DNN-based Distant Speech Recognition
2017 · Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, et al.
Abstract
Improving distant speech recognition is a crucial step towards flexible human-machine interfaces. Current technology, however, still lacks robustness, especially under adverse acoustic conditions. Despite the significant progress made in recent years on both speech enhancement and speech recognition, one potential limitation of state-of-the-art technology lies in composing modules that are not well matched because they are not trained jointly. To address this concern, a promising approach consists in concatenating a speech enhancement and a speech recognition deep neural network and jointly updating their parameters as if they were within a single larger network. Unfortunately, joint training can be difficult because the output distribution of the speech enhancement system may change substantially during the optimization procedure. The speech recognition module would then have to deal with an input distribution that is non-stationary and unnormalized. To mitigate this
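The drift problem described in the abstract can be illustrated with a small sketch. The abstract is cut off before it states the remedy, so the use of batch normalization between the two networks is an assumption taken from the paper's title, not a reproduction of the authors' implementation. The `batch_norm` helper and the toy feature values are hypothetical and exist only for this illustration. The sketch shows that when the enhancement output drifts in scale and offset, the batch-normalized features handed to the recognizer stay almost unchanged.

```python
import math


def batch_norm(batch, eps=1e-5):
    """Normalize each feature dimension to zero mean and unit variance over the batch.

    Hypothetical helper: no learned scale/shift and no running statistics.
    """
    n = len(batch)
    dims = len(batch[0])
    means = [sum(row[d] for row in batch) / n for d in range(dims)]
    variances = [
        sum((row[d] - means[d]) ** 2 for row in batch) / n for d in range(dims)
    ]
    return [
        [(row[d] - means[d]) / math.sqrt(variances[d] + eps) for d in range(dims)]
        for row in batch
    ]


# Toy enhancement outputs (4 frames, 2 features) at two points in training.
# Assumption: the later outputs carry the same pattern with a drifted scale and offset.
early = [[0.1, 2.0], [0.3, 1.0], [0.2, 4.0], [0.6, 3.0]]
late = [[10.0 * x + 5.0 for x in row] for row in early]

norm_early = batch_norm(early)
norm_late = batch_norm(late)

# Largest element-wise gap between the raw batches and between the normalized batches.
raw_diff = max(abs(a - b) for ra, rb in zip(early, late) for a, b in zip(ra, rb))
max_diff = max(
    abs(a - b) for ra, rb in zip(norm_early, norm_late) for a, b in zip(ra, rb)
)

print("raw gap:", raw_diff)
print("normalized gap:", max_diff)
```

The raw batches differ by a large margin, while the normalized ones differ only through the small `eps` term, so the recognizer in this toy setting would see a nearly stationary input. A trainable version would normally also include a learned scale and shift per feature, which this sketch leaves out.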
Related papers
- A Network Of Deep Neural Networks For Distant Speech Recognition (2017) 10.35
- Contaminated Speech Training Methods For Robust DNN-HMM Distant Speech Recognition (2017) 4.52
- Ensemble Of Jointly Trained Deep Neural Network-based Acoustic Models For Reverberant Speech Recognition (2016) 0.00
- Incorporating Multi-target In Multi-stage Speech Enhancement Model For Better Generalization (2021) 0.00
- Progressive Joint Modeling In Unsupervised Single-channel Overlapped Speech Recognition (2017) 11.67
- Multi-modal Hybrid Deep Neural Network For Speech Enhancement (2016) 0.00
- Joint Training Of Speech Enhancement And Self-supervised Model For Noise-robust ASR (2022) 0.00
- Revisiting Joint Decoding Based Multi-talker Speech Recognition With DNN Acoustic Model (2021) 2.26