Dilated U-net Based Approach For Multichannel Speech Enhancement From First-order Ambisonics Recordings
2020 · Amélie Bosca, Alexandre Guérin, Lauréline Perotin, et al.
Abstract
We present a CNN architecture for speech enhancement from multichannel first-order Ambisonics mixtures. Data-dependent spatial filters, derived from a mask-based approach, help an automatic speech recognition engine cope with adverse conditions of reverberation and competing speakers. The masks are predicted by a neural network fed with rough estimates of the speech and noise amplitude spectra, under the assumption that the directions of arrival are known. This study evaluates replacing the previously investigated recurrent LSTM network with a convolutional U-net under more challenging conditions that add a second competing speaker. We show that, owing to more accurate short-term mask prediction, the U-net architecture yields some improvement in word error rate. Moreover, the results indicate that dilated convolutional layers are beneficial in difficult situations with two interfering speakers, and/or where the target and interferences are close t
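As an illustration of how predicted masks can yield a data-dependent spatial filter, here is a minimal NumPy sketch. It assumes a multichannel Wiener filter built from mask-weighted covariance matrices, which is one common choice; the abstract does not state which filter the authors use. The function names, the array shapes and the reference channel are all hypothetical. The only fact taken as given is that a first-order Ambisonics signal has four channels.

```python
import numpy as np

def masked_covariance(X, mask):
    """Mask-weighted spatial covariance per frequency bin.

    X:    complex STFT, shape (F, T, C); C = 4 for first-order Ambisonics.
    mask: time-frequency mask in [0, 1], shape (F, T).
    Returns an array of shape (F, C, C).
    """
    # Weighted sum over frames of the outer products x x^H.
    num = np.einsum("ft,ftc,ftd->fcd", mask, X, X.conj())
    den = np.maximum(mask.sum(axis=1), 1e-8)  # guard against an all-zero mask
    return num / den[:, None, None]

def mwf_weights(phi_s, phi_n, ref=0, eps=1e-6):
    """Assumed filter: w = (Phi_s + Phi_n)^-1 Phi_s e_ref for each frequency."""
    C = phi_s.shape[-1]
    phi_x = phi_s + phi_n + eps * np.eye(C)  # diagonal loading for stability
    rhs = phi_s[:, :, ref, None]             # column `ref` of Phi_s, shape (F, C, 1)
    return np.linalg.solve(phi_x, rhs)[..., 0]

def apply_filter(w, X):
    """Beamformer output y[f, t] = w[f]^H x[f, t]."""
    return np.einsum("fc,ftc->ft", w.conj(), X)
```

In this sketch the speech and noise masks would come from the network, and the enhanced single-channel spectrogram returned by `apply_filter` would be passed on to the recognizer.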
Related papers
- Relunet: Relative Channel Fusion U-net For Multichannel Speech Enhancement (2024)
- Distortionless Multi-channel Target Speech Enhancement For Overlapped Speech Recognition (2020)
- Real-time Streaming Wave-u-net With Temporal Convolutions For Multichannel Speech Enhancement (2021)
- Using Recurrences In Time And Frequency Within U-net Architecture For Speech Enhancement (2018)
- Single-channel Speech Enhancement With Deep Complex U-networks And Probabilistic Latent Space Models (2023)
- Multichannel Speech Enhancement Without Beamforming (2021)
- Multi-modal Hybrid Deep Neural Network For Speech Enhancement (2016)
- Inter-channel Conv-tasnet For Multichannel Speech Enhancement (2021)