Waveform To Single Sinusoid Regression To Estimate The F0 Contour From Noisy Speech Using Recurrent Deep Neural Networks
2018 · Akihiro Kato, Tomi Kinnunen
Abstract
The fundamental frequency (F0) represents pitch in speech, determines its prosodic characteristics, and is needed in various speech analysis and synthesis tasks. Despite decades of research on this topic, F0 estimation at low signal-to-noise ratios (SNRs) in unexpected noise conditions remains difficult. This work proposes a new approach to noise-robust F0 estimation using a recurrent neural network (RNN) trained in a supervised manner. Recent studies employ deep neural networks (DNNs) for F0 tracking as a frame-by-frame classification task over quantised frequency states; we instead propose waveform-to-sinusoid regression to achieve both noise robustness and accurate estimation with increased frequency resolution. Experimental results on the PTDB-TUG corpus contaminated by additive noise (NOISEX-92) demonstrate that the proposed method improves the gross pitch error (GPE) rate and fine pitch error (FPE) by more than 35% at SNRs between -10 dB and +10 dB compared with well
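To make the idea of a single-sinusoid regression target concrete, the sketch below builds a sinusoid whose instantaneous frequency follows a frame-level F0 contour, then reads the frequency back from its zero crossings. The abstract does not say how the authors construct their targets, so everything here is an assumption for illustration: the helper names `f0_to_sinusoid` and `estimate_f0_from_sinusoid`, the 16 kHz sample rate, the 80-sample hop, and the choice to output zeros for unvoiced frames.

```python
import math


def f0_to_sinusoid(f0_frames, sample_rate=16000, hop=80):
    """Turn a frame-level F0 contour (Hz, 0 = unvoiced) into one sinusoid.

    Hypothetical helper, not the paper's procedure. Phase is accumulated
    sample by sample so F0 can change between frames without phase jumps.
    """
    samples = []
    phase = 0.0
    for f0 in f0_frames:
        for _ in range(hop):
            if f0 > 0:
                phase += 2.0 * math.pi * f0 / sample_rate
                samples.append(math.sin(phase))
            else:
                # Assumption: unvoiced frames are represented by silence.
                samples.append(0.0)
    return samples


def estimate_f0_from_sinusoid(samples, sample_rate=16000):
    """Estimate the frequency of a clean sinusoid from upward zero crossings."""
    crossings = [
        i for i in range(1, len(samples))
        if samples[i - 1] < 0 <= samples[i]
    ]
    if len(crossings) < 2:
        return 0.0  # too few periods to measure a frequency
    # Average period in samples between the first and last crossing.
    period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return sample_rate / period


# Usage: a constant 200 Hz contour over 20 frames (0.1 s at 16 kHz).
target = f0_to_sinusoid([200.0] * 20)
estimate = estimate_f0_from_sinusoid(target)
```

Because the frequency of such a target is continuous rather than drawn from a fixed set of quantised states, an estimate read from it is not limited to a classifier's bin width, which is the resolution argument the abstract makes for regression.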
Related papers
- A Regression Model Of Recurrent Deep Neural Networks For Noise Robust Estimation Of The Fundamental Frequency Contour Of Speech (2018) 4.52
- Real-time Pitch/F0 Detection Using Spectrogram Images And Convolutional Neural Networks (2025) 0.00
- Noisy Speech Based Temporal Decomposition To Improve Fundamental Frequency Estimation (2021) 5.24
- DEEPF0: End-to-end Fundamental Frequency Estimation For Music And Speech Signals (2021) 10.35
- Traditional Machine Learning For Pitch Detection (2019) 10.85
- Nebula: F0 Estimation And Voicing Detection By Modeling The Statistical Properties Of Feature Extractors (2017) 3.58
- Towards Parametric Speech Synthesis Using Gaussian-Markov Model Of Spectral Envelope And Wavelet-based Decomposition Of F0 (2022) 0.00
- Improving Speaker De-identification With Functional Data Analysis Of F0 Trajectories (2022) 10.85