Learning-based A Posteriori Speech Presence Probability Estimation And Applications
2025 · Shuai Tao, Jesper Rindom Jensen, Yang Xiang, et al.
Abstract
The a posteriori speech presence probability (SPP) is the fundamental component of noise power spectral density (PSD) estimation, which can contribute to speech enhancement and speech recognition systems. Most existing SPP estimators can estimate SPP accurately from the background noise. Nevertheless, numerous challenges persist, including the difficulty of accurately estimating SPP from non-stationary noise with statistics-based methods and the high latency associated with deep learning-based approaches. This paper presents an improved SPP estimation approach based on deep learning to achieve higher SPP estimation accuracy, especially in non-stationary noise conditions. To promote the information extraction performance of the DNN, the global information of the observed signal and the local information of the decoupled frequency bins from the observed signal are connected as hybrid global-local information. The global information is extracted by one encoder. Then, one decoder and two f
Related papers
- Frequency Bin-wise Single Channel Speech Presence Probability Estimation Using Multiple DNNs (2023) 5.84
- Multi-task Single Channel Speech Enhancement Using Speech Presence Probability As A Secondary Task Training Target (2020) 4.52
- DNN-based Speech Presence Probability Estimation For Multi-frame Single-microphone Speech Enhancement (2019) 8.82
- Speech Enhancement Via Two-stage Dual Tree Complex Wavelet Packet Transform With A Speech Presence Probability Estimator (2016) 5.84
- Feature Joint-state Posterior Estimation In Factorial Speech Processing Models Using Deep Neural Networks (2017) 3.58
- Plugin Speech Enhancement: A Universal Speech Enhancement Framework Inspired By Dynamic Neural Network (2024) 0.00
- Noise-robust DSP-assisted Neural Pitch Estimation With Very Low Complexity (2023) 5.24
- Integrating Plug-and-play Data Priors With Weighted Prediction Error For Speech Dereverberation (2023) 0.00