Speech Denoising With Deep Feature Losses
2018 · Francois G. Germain, Qifeng Chen, Vladlen Koltun
Abstract
We present an end-to-end deep learning approach to denoising speech signals by processing the raw waveform directly. Given input audio containing speech corrupted by an additive background signal, the system aims to produce a processed signal that contains only the speech content. Recent approaches have shown promising results using various deep network architectures. In this paper, we propose to train a fully-convolutional context aggregation network using a deep feature loss. That loss is based on comparing the internal feature activations in a different network, trained for acoustic environment detection and domestic audio tagging. Our approach outperforms the state-of-the-art in objective speech quality metrics and in large-scale perceptual experiments with human listeners. It also outperforms an identical network trained using traditional regression losses. The advantage of the new approach is particularly pronounced for the hardest data with the most intrusive background noise, f
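The deep feature loss described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract only says that the loss compares internal feature activations of a separate network, so the specifics below are assumptions made for illustration: a tiny randomly initialized convolutional stack stands in for the pretrained audio classification network, and the loss is a weighted sum of mean absolute differences between activations. The names `conv1d_same`, `feature_activations`, `deep_feature_loss`, and `make_random_weights` are hypothetical.

```python
import numpy as np


def conv1d_same(x, kernels):
    """Zero-padded 1-D convolution (odd kernel size assumed).

    x: array of shape (C_in, T); kernels: array of shape (C_out, C_in, K).
    Returns an array of shape (C_out, T).
    """
    c_out, _, k = kernels.shape
    pad = (k - 1) // 2
    t = x.shape[1]
    xp = np.pad(x, ((0, 0), (pad, pad)))
    out = np.zeros((c_out, t))
    for j in range(k):
        # Mix input channels for tap j, then accumulate over taps.
        out += kernels[:, :, j] @ xp[:, j:j + t]
    return out


def feature_activations(signal, weights):
    """Return the activations of every layer of the stand-in feature network."""
    h = signal[None, :]  # a single input channel: the raw waveform
    activations = []
    for w in weights:
        h = conv1d_same(h, w)
        h = np.where(h > 0, h, 0.2 * h)  # leaky ReLU (assumed nonlinearity)
        activations.append(h)
    return activations


def deep_feature_loss(denoised, clean, weights, layer_weights=None):
    """Weighted sum over layers of the mean absolute activation difference.

    The L1 distance and the per-layer weights are assumptions of this sketch.
    """
    feats_d = feature_activations(denoised, weights)
    feats_c = feature_activations(clean, weights)
    if layer_weights is None:
        layer_weights = [1.0] * len(weights)
    return float(sum(lam * np.mean(np.abs(fd - fc))
                     for lam, fd, fc in zip(layer_weights, feats_d, feats_c)))


def make_random_weights(rng, channels=(1, 8, 8), kernel_size=3):
    """Random kernels standing in for a pretrained feature network."""
    return [rng.standard_normal((c_out, c_in, kernel_size)) * 0.1
            for c_in, c_out in zip(channels[:-1], channels[1:])]


rng = np.random.default_rng(0)
weights = make_random_weights(rng)
clean = np.sin(np.linspace(0.0, 20.0, 256))          # toy "speech" signal
noisy = clean + 0.3 * rng.standard_normal(256)       # additive background

print("loss(clean, clean):", deep_feature_loss(clean, clean, weights))
print("loss(noisy, clean):", deep_feature_loss(noisy, clean, weights))
```

In a training setup, the feature network would stay fixed and only the denoising network would be updated to reduce this loss. A traditional regression loss, by contrast, would compare `denoised` and `clean` directly at the waveform level.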
Related papers
- Feature Enhancement With Deep Feature Losses For Speaker Verification (2019)
- HiFi-GAN: High-fidelity Denoising And Dereverberation Based On Speech Deep Features In Adversarial Networks (2020)
- A WaveNet For Speech Denoising (2017)
- Perceptual Loss Based Speech Denoising With An Ensemble Of Audio Pattern Recognition And Self-supervised Models (2020)
- Raw Waveform-based Speech Enhancement By Fully Convolutional Networks (2017)
- A Dual-staged Context Aggregation Method Towards Efficient End-to-end Speech Enhancement (2019)
- Deep Speech Denoising With Vector Space Projections (2018)
- A Comparative Evaluation Of Deep Learning Models For Speech Enhancement In Real-world Noisy Environments (2025)