Low-latency Neural Speech Phase Prediction Based On Parallel Estimation Architecture And Anti-wrapping Losses For Speech Generation Tasks
2024 · Yang Ai, Zhen-Hua Ling
Abstract
This paper presents a novel neural speech phase prediction model which predicts wrapped phase spectra directly from amplitude spectra. The proposed model is a cascade of a residual convolutional network and a parallel estimation architecture. The parallel estimation architecture is a core module for direct wrapped phase prediction. This architecture consists of two parallel linear convolutional layers and a phase calculation formula, imitating the process of calculating the phase spectra from the real and imaginary parts of complex spectra and strictly restricting the predicted phase values to the principal value interval. To avoid the error expansion issue caused by phase wrapping, we design anti-wrapping training losses defined between the predicted wrapped phase spectra and natural ones by activating the instantaneous phase error, group delay error and instantaneous angular frequency error using an anti-wrapping function. We mathematically demonstrate that the anti-wrapping function
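A minimal NumPy sketch of the two mechanisms named in the abstract follows. It rests on several assumptions that are not taken from the paper. The phase calculation formula is modeled with the two-argument arctangent `np.arctan2`, applied to the two parallel outputs as if they were real and imaginary parts. The anti-wrapping function is assumed to be f_AW(x) = |x − 2π·round(x/2π)|, the distance from x to the nearest multiple of 2π. Group delay and instantaneous angular frequency are approximated by first differences along the frequency and time axes. The helper names `parallel_phase_estimate`, `anti_wrap` and `anti_wrapping_losses` are hypothetical, and random projection matrices stand in for the two trained linear convolutional layers.

```python
import numpy as np

TWO_PI = 2.0 * np.pi


def parallel_phase_estimate(r: np.ndarray, i: np.ndarray) -> np.ndarray:
    """Turn the two parallel outputs into a wrapped phase.

    r and i play the roles of pseudo real and imaginary parts. np.arctan2 is
    assumed here as a stand-in for the paper's phase calculation formula; its
    output always lies in [-pi, pi], i.e. the principal value interval.
    """
    return np.arctan2(i, r)


def anti_wrap(x: np.ndarray) -> np.ndarray:
    """Assumed anti-wrapping function |x - 2*pi*round(x / (2*pi))|.

    Returns the distance from x to the nearest multiple of 2*pi, in [0, pi].
    """
    return np.abs(x - TWO_PI * np.round(x / TWO_PI))


def anti_wrapping_losses(p_hat: np.ndarray, p: np.ndarray):
    """Instantaneous phase, group delay and instantaneous angular frequency
    losses for phase spectra shaped (frames, freq_bins).

    Derivatives are approximated by first differences; the minus sign in the
    usual group delay definition is dropped because anti_wrap is even.
    """
    ip = np.mean(anti_wrap(p_hat - p))
    gd = np.mean(anti_wrap(np.diff(p_hat, axis=1) - np.diff(p, axis=1)))   # along frequency
    iaf = np.mean(anti_wrap(np.diff(p_hat, axis=0) - np.diff(p, axis=0)))  # along time
    return ip, gd, iaf


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames, bins, channels = 4, 5, 8

    # Hypothetical stand-ins for the two parallel linear convolutional layers:
    # a kernel-size-1 convolution over frames is a per-frame matrix multiply.
    hidden = rng.standard_normal((frames, channels))
    w_real = rng.standard_normal((channels, bins))
    w_imag = rng.standard_normal((channels, bins))
    p_hat = parallel_phase_estimate(hidden @ w_real, hidden @ w_imag)
    print(p_hat.min() >= -np.pi, p_hat.max() <= np.pi)  # True True

    # Error expansion: two phases only 0.02 rad apart across the +/-pi boundary.
    p_true, p_pred = np.pi - 0.01, -np.pi + 0.01
    print(abs(p_pred - p_true), anti_wrap(p_pred - p_true))  # roughly 6.2632 vs 0.02

    # A prediction that is off by whole turns of 2*pi is (almost) not penalized.
    p = rng.uniform(-np.pi, np.pi, size=(frames, bins))
    shifted = p + TWO_PI * rng.integers(-2, 3, size=p.shape)
    print(anti_wrapping_losses(shifted, p))  # all three values close to 0
```

The wrap-around case shows the error expansion issue the abstract refers to: a plain absolute error treats two nearly identical phases on opposite sides of ±π as about 2π apart, while the anti-wrapped error reports their true angular distance. Because the assumed f_AW is even and 2π-periodic, the sign convention chosen for group delay does not affect the loss.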