Noisy Speech Based Temporal Decomposition To Improve Fundamental Frequency Estimation
2021 · A. Queiroz, R. Coelho
Abstract
This paper introduces a novel method that classifies noisy speech frames as low-frequency or high-frequency in order to improve fundamental frequency (F0) estimation accuracy. In this proposal, the target signal is analyzed by means of ensemble empirical mode decomposition. Next, pitch information is extracted from the first decomposition modes. This feature indicates the frequency region where the F0 of speech should be located, so each frame is labeled as low-frequency (LF) or high-frequency (HF). The labels are then used to correct candidates extracted by a conventional fundamental frequency detection method, thereby improving the accuracy of the F0 estimate. The proposed method is evaluated in experiments on the CSTR and TIMIT databases, considering six acoustic noises under various signal-to-noise ratios. A pitch enhancement algorithm is adopted as the baseline in the evaluation, which considers three conventional estimators. Results show that the proposed method outperforms the c
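The candidate-correction step described in the abstract might look roughly like the sketch below. It is not the authors' algorithm: the abstract does not state the LF/HF boundary or the correction rule, so the 200 Hz boundary (`boundary_hz`), the octave halving/doubling rule, the use of 0 to mark unvoiced frames, and the function name `correct_f0_candidates` are all illustrative assumptions. The LF/HF labels are taken as given here; in the paper they come from pitch information in the first EEMD modes.

```python
import numpy as np


def correct_f0_candidates(candidates_hz, is_lf_frame, boundary_hz=200.0):
    """Move F0 candidates into the region indicated by the LF/HF frame label.

    Assumptions (not from the paper): frames are split at `boundary_hz`,
    a mismatch is treated as an octave error, and 0 marks an unvoiced frame.
    """
    f0 = np.asarray(candidates_hz, dtype=float).copy()
    lf = np.asarray(is_lf_frame, dtype=bool)
    if f0.shape != lf.shape:
        raise ValueError("candidates and labels must have the same shape")

    voiced = f0 > 0
    # LF frame but candidate in the high region: assume the estimator doubled.
    too_high = voiced & lf & (f0 >= boundary_hz)
    # HF frame but candidate in the low region: assume the estimator halved.
    too_low = voiced & ~lf & (f0 < boundary_hz)

    f0[too_high] /= 2.0
    f0[too_low] *= 2.0
    return f0


# Per-frame candidates from some conventional estimator (hypothetical values).
candidates = [120.0, 240.0, 110.0, 220.0, 0.0]
labels_lf = [True, True, False, False, True]
corrected = correct_f0_candidates(candidates, labels_lf)
```

Under these assumptions, only candidates that disagree with their frame label are changed, and unvoiced frames pass through untouched. A single halving or doubling does not guarantee the result lands in the expected region, which is one reason a real implementation would need the rule from the full paper.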