Convolutive Transfer Function Invariant SDR Training Criteria For Multi-channel Reverberant Speech Separation
2020 Β· Christoph Boeddeker, Wangyou Zhang, Tomohiro Nakatani, et al.
Abstract
Time-domain training criteria have proven to be very effective for the separation of single-channel non-reverberant speech mixtures. Likewise, mask-based beamforming has shown impressive performance in multi-channel reverberant speech enhancement and source separation. Here, we propose to combine neural network supported multi-channel source separation with a time-domain training objective function. For the objective we propose to use a convolutive transfer function invariant Signal-to-Distortion Ratio (CI-SDR) based loss. While this is a well-known evaluation metric (BSS Eval), it has not been used as a training objective before. To show the effectiveness, we demonstrate the performance on LibriSpeech based reverberant mixtures. On this task, the proposed system approaches the error rate obtained on single-source non-reverberant input, i.e., LibriSpeech test_clean, with a difference of only 1.2 percentage points, thus outperforming a conventional permutation invariant training based s
Authors
(none)
Tags
Stats
Related papers
- Multichannel Loss Function For Supervised Speech Source Separation By Mask-based Beamforming (2019)7.50
- Multi-channel Narrow-band Deep Speech Separation With Full-band Permutation Invariant Training (2021)9.41
- Spatial Loss For Unsupervised Multi-channel Source Separation (2022)7.16
- Multi-channel Multi-frame ADL-MVDR For Target Speech Separation (2020)0.00
- Short-time Deep-learning Based Source Separation For Speech Enhancement In Reverberant Environments With Beamforming (2020)0.00
- Two-stage Model And Optimal SI-SNR For Monaural Multi-speaker Speech Separation In Noisy Environment (2020)0.00
- End-to-end Dereverberation, Beamforming, And Speech Recognition With Improved Numerical Stability And Advanced Frontend (2021)10.97
- Distortion-controlled Training For End-to-end Reverberant Speech Separation With Auxiliary Autoencoding Loss (2020)5.84