End-to-end Networks For Supervised Single-channel Speech Separation
2018 · Shrikant Venkataramani, Paris Smaragdis
Abstract
The performance of single channel source separation algorithms has improved greatly in recent times with the development and deployment of neural networks. However, many such networks continue to operate on the magnitude spectrogram of a mixture, and produce an estimate of source magnitude spectrograms, to perform source separation. In this paper, we interpret these steps as additional neural network layers and propose an end-to-end source separation network that allows us to estimate the separated speech waveform by operating directly on the raw waveform of the mixture. Furthermore, we also propose the use of masking based end-to-end separation networks that jointly optimize the mask and the latent representations of the mixture waveforms. These networks show a significant improvement in separation performance compared to existing architectures in our experiments. To train these end-to-end models, we investigate the use of composite cost functions that are derived from objective evalu
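As a rough illustration of the masking-based idea described in the abstract, the following is a minimal NumPy sketch, not the authors' architecture. It replaces the fixed spectrogram transform with a matrix front-end, applies a mask to the latent representation, and resynthesizes a waveform by overlap-add. The names `W_enc`, `W_mask`, `W_dec`, the layer sizes, and the use of an absolute value and a sigmoid are all assumptions chosen for illustration. The weights here are random; in a trained network they would be optimized jointly.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Slice a 1-D waveform into overlapping frames, shape (n_frames, frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def overlap_add(frames, hop, out_len):
    """Sum overlapping frames back into a 1-D waveform of length out_len."""
    n_frames, frame_len = frames.shape
    y = np.zeros(out_len)
    for i in range(n_frames):
        y[i * hop : i * hop + frame_len] += frames[i]
    return y

def separate(mixture, W_enc, W_mask, W_dec, hop):
    """Hypothetical masking-based separator operating on the raw waveform."""
    frame_len = W_enc.shape[0]
    frames = frame_signal(mixture, frame_len, hop)
    # Analysis front-end (assumed): a linear layer standing in for the
    # fixed transform, followed by a non-negativity step.
    latent = np.abs(frames @ W_enc)
    # Mask estimated from the latent representation, squashed by a sigmoid.
    mask = 1.0 / (1.0 + np.exp(-(latent @ W_mask)))
    # Apply the mask, then map back to waveform frames and overlap-add.
    out_frames = (mask * latent) @ W_dec
    return overlap_add(out_frames, hop, len(mixture)), mask

# Illustrative sizes only (assumptions, not values from the paper).
rng = np.random.default_rng(0)
frame_len, hop, n_latent = 64, 16, 128
W_enc = 0.1 * rng.standard_normal((frame_len, n_latent))
W_mask = 0.1 * rng.standard_normal((n_latent, n_latent))
W_dec = 0.1 * rng.standard_normal((n_latent, frame_len))

mixture = rng.standard_normal(1600)
estimate, mask = separate(mixture, W_enc, W_mask, W_dec, hop)
```

Because every step is a differentiable array operation, a loss computed on `estimate` could in principle be backpropagated through the synthesis, mask, and analysis weights together, which is the sense in which the mask and the latent representation are optimized jointly.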
Related papers
- End-to-end Multi-channel Speech Separation (2019) 0.00
- End-to-end Source Separation With Adaptive Front-ends (2017) 12.17
- Multi-channel Narrow-band Deep Speech Separation With Full-band Permutation Invariant Training (2021) 9.41
- End-to-end Training Of Time Domain Audio Separation And Recognition (2019) 10.35
- Multi-channel Speech Separation Using Spatially Selective Deep Non-linear Filters (2023) 10.35
- End-to-end Non-negative Autoencoders For Sound Source Separation (2019) 2.26
- End-to-end Speech Separation With Unfolded Iterative Phase Reconstruction (2018) 15.00
- Monaural Source Separation: From Anechoic To Reverberant Environments (2021) 10.61