Progressive Joint Modeling In Unsupervised Single-channel Overlapped Speech Recognition
2017 · Zhehuai Chen, Jasha Droppo, Jinyu Li, et al.
Abstract
Unsupervised single-channel overlapped speech recognition is one of the hardest problems in automatic speech recognition (ASR). Permutation invariant training (PIT) is a state-of-the-art model-based approach that applies a single neural network to solve this single-input, multiple-output modeling problem. We propose to advance the current state of the art by imposing a modular structure on the neural network, applying a progressive pretraining regimen, and improving the objective function with transfer learning and a discriminative training criterion. The modular structure splits the problem into three sub-tasks: frame-wise interpreting, utterance-level speaker tracing, and speech recognition. The pretraining regimen uses these modules to solve progressively harder tasks. Transfer learning leverages parallel clean speech to improve the training targets for the network. Our discriminative training formulation is a modification of standard formulations that also penalizes competing ou
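As an illustration of the PIT idea the abstract builds on, here is a minimal sketch of an utterance-level permutation-invariant loss. It is not the authors' implementation: the function names `frame_loss` and `pit_loss` are hypothetical, and mean squared error on toy numeric frames stands in for whatever per-frame criterion a real recognizer would use.

```python
from itertools import permutations


def frame_loss(pred, target):
    """Mean squared error between two equal-length frame sequences (stand-in criterion)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def pit_loss(outputs, targets):
    """Try every assignment of output streams to reference streams and
    return the lowest total loss together with the assignment that achieves it."""
    best_loss, best_perm = None, None
    for perm in permutations(range(len(targets))):
        # perm[i] is the index of the reference assigned to output stream i
        total = sum(frame_loss(outputs[i], targets[j]) for i, j in enumerate(perm))
        if best_loss is None or total < best_loss:
            best_loss, best_perm = total, perm
    return best_loss, best_perm


# Toy data (assumed): two output streams whose order is swapped relative to the references.
outputs = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
targets = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
loss, perm = pit_loss(outputs, targets)
print(loss, perm)
```

Because the loss is taken over the best assignment for the whole utterance, the network is never penalized for emitting the speakers in a different order from the references. The search is exhaustive, so its cost grows factorially with the number of streams, which is acceptable for the two- or three-talker case.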