Deep Speech Denoising With Vector Space Projections
2018 · Jeff Hetherly, Paul Gamble, Maria Barrios, et al.
Abstract
We propose an algorithm to denoise speech from a single microphone in the presence of non-stationary and dynamic noise. Our approach is inspired by the recent success of neural network models separating speakers from other speakers and singers from instrumental accompaniment. Unlike prior art, we leverage embedding spaces produced with source-contrastive estimation, a technique derived from negative sampling in natural language processing, while simultaneously obtaining a continuous inference mask. Our embedding space directly optimizes for the discrimination of speaker and noise by jointly modeling their characteristics. This space is generalizable in that it is not speaker- or noise-specific and is capable of denoising speech even if the model has not seen the speaker in the training set. Parameters are trained with dual objectives: one that promotes a selective bandpass filter that eliminates noise at time-frequency positions where noise power exceeds signal power, and another that pr…
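Two ideas in the abstract can be illustrated on toy data. The first objective removes time-frequency bins where noise power exceeds signal power, which behaves like an ideal binary mask. Source-contrastive estimation borrows the logistic loss of negative sampling. The sketch below is an illustration under stated assumptions, not the authors' implementation. The spectrogram values, the helper names (`binary_mask`, `apply_mask`, `softplus`, `contrastive_loss`), the additive-power mixture, the source vectors, and the exact form of the contrastive loss are all hypothetical.

```python
import math

# Toy power spectrograms (hypothetical values): rows are time frames,
# columns are frequency bins. Not data from the paper.
speech = [[4.0, 0.5], [1.0, 3.0]]
noise = [[1.0, 2.0], [1.0, 0.5]]
# Simplifying assumption: mixture power is the sum of speech and noise power
# (cross terms between the two sources are ignored).
mixture = [[s + n for s, n in zip(s_row, n_row)] for s_row, n_row in zip(speech, noise)]


def binary_mask(speech_power, noise_power):
    """Ideal-binary-mask-style target: keep a bin (1.0) unless its noise power
    exceeds its speech power, in which case the bin is removed (0.0)."""
    return [
        [1.0 if s >= n else 0.0 for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_power, noise_power)
    ]


def apply_mask(power, mask):
    """Multiply each time-frequency bin by its mask value."""
    return [[m * p for m, p in zip(m_row, p_row)] for m_row, p_row in zip(mask, power)]


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def contrastive_loss(embedding, target_vec, other_vec):
    """Assumed negative-sampling-style loss:
    -log sigmoid(v . u_target) - log sigmoid(-v . u_other).
    Low when a bin embedding v aligns with its dominant source's vector
    and points away from the competing source's vector."""
    pos = sum(v * u for v, u in zip(embedding, target_vec))
    neg = sum(v * u for v, u in zip(embedding, other_vec))
    # Identity used here: -log sigmoid(z) == softplus(-z)
    return softplus(-pos) + softplus(neg)


mask = binary_mask(speech, noise)
print(mask)                       # [[1.0, 0.0], [1.0, 1.0]]
print(apply_mask(mixture, mask))  # [[5.0, 0.0], [2.0, 3.5]]

speech_vec, noise_vec = [1.0, 0.0], [0.0, 1.0]  # hypothetical source vectors
good = contrastive_loss([2.0, -1.0], speech_vec, noise_vec)  # speech-like embedding
bad = contrastive_loss([-1.0, 2.0], speech_vec, noise_vec)   # noise-like embedding
print(good < bad)  # True
```

A bin where speech and noise power are equal is kept. The abstract only describes removing bins where noise exceeds signal power. In the paper the embeddings come from a trained neural network; here they are fixed toy vectors chosen to show the direction of the loss.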
Related papers
- Speech Denoising By Parametric Resynthesis (2019) · 7.16
- A Wavenet For Speech Denoising (2017) · 18.47
- End-to-end Recurrent Denoising Autoencoder Embeddings For Speaker Identification (2020) · 6.34
- Advancing The Dimensionality Reduction Of Speaker Embeddings For Speaker Diarisation: Disentangling Noise And Informing Speech Activity (2021) · 2.26
- Analysis Of DNN Speech Signal Enhancement For Robust Speaker Recognition (2018) · 11.39
- SpatialNet: Extensively Learning Spatial Information For Multichannel Joint Speech Separation, Denoising And Dereverberation (2023) · 13.88
- VoiceFilter: Targeted Voice Separation By Speaker-conditioned Spectrogram Masking (2018) · 17.48
- DenoiSpeech: Denoising Text To Speech With Frame-level Noise Modeling (2020) · 0.00