Enhancement Of Coded Speech Using A Mask-based Post-filter
2020 · Srikanth Korse, Kishan Gupta, Guillaume Fuchs
Abstract
The quality of speech codecs deteriorates at low bitrates due to high quantization noise. A post-filter is generally employed to enhance the quality of the coded speech. In this paper, a data-driven post-filter relying on masking in the time-frequency domain is proposed. A fully connected neural network (FCNN), a convolutional encoder-decoder (CED) network and a long short-term memory (LSTM) network are implemented to estimate a real-valued mask per time-frequency bin. The proposed models were tested on the five lowest operating modes (6.65 kbps to 15.85 kbps) of the Adaptive Multi-Rate Wideband codec (AMR-WB). Both objective and subjective evaluations confirm the enhancement of the coded speech and show the superiority of the mask-based neural network system over a conventional heuristic post-filter of the kind used in standards such as ITU-T G.718.
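As an illustration of what masking in the time-frequency domain means, here is a minimal sketch of applying a real-valued mask to the complex spectrum of coded speech. The abstract only states that a network estimates one real-valued mask value per time-frequency bin; the function names, the magnitude-ratio definition of the training target, and the `max_gain` bound below are assumptions made for this example and may differ from the paper. The network itself is omitted, and an oracle mask computed from the clean magnitudes stands in for its output.

```python
# Hypothetical sketch: the names, the oracle-mask definition and max_gain are
# assumptions for illustration, not taken from the paper.

def oracle_mask(clean_mag, coded_mag, max_gain=2.0, eps=1e-12):
    """Magnitude ratio clean/coded per bin, limited to at most max_gain."""
    return [
        [min(max_gain, c / (d + eps)) for c, d in zip(clean_row, coded_row)]
        for clean_row, coded_row in zip(clean_mag, coded_mag)
    ]


def apply_mask(coded_stft, mask):
    """Scale each complex bin by its real-valued mask; the coded phase is kept."""
    return [
        [m * x for m, x in zip(mask_row, stft_row)]
        for mask_row, stft_row in zip(mask, coded_stft)
    ]


# Toy spectra: 2 frames x 3 frequency bins.
coded = [[1 + 1j, 0.5j, 2 + 0j], [0.2 + 0j, -1j, 1 - 1j]]
clean_mag = [[2.0, 0.5, 1.0], [0.4, 0.5, 1.0]]
coded_mag = [[abs(x) for x in row] for row in coded]

mask = oracle_mask(clean_mag, coded_mag)  # a trained model would predict this
enhanced = apply_mask(coded, mask)
```

Because the mask is real and non-negative, only the magnitude of each bin changes; the enhanced signal would then be resynthesized from `enhanced` with an inverse transform matching the analysis transform.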
Related papers
- A DNN Based Post-filter To Enhance The Quality Of Coded Speech In MDCT Domain (2022)
- Postgan: A Gan-based Post-processor To Enhance The Quality Of Coded Speech (2022)
- CQNV: A Combination Of Coarsely Quantized Bitstream And Neural Vocoder For Low Rate Speech Coding (2023)
- Low Bit-rate Speech Coding With VQ-VAE And A Wavenet Decoder (2019)
- Enhancing Low-quality Voice Recordings Using Disentangled Channel Factor And Neural Waveform Model (2020)
- Neural Feature Predictor And Discriminative Residual Coding For Low-bitrate Speech Coding (2022)
- Spectral Masking With Explicit Time-context Windowing For Neural Network-based Monaural Speech Enhancement (2024)
- LACE: A Light-weight, Causal Model For Enhancing Coded Speech Through Adaptive Convolutions (2023)