Nebula: F0 Estimation And Voicing Detection By Modeling The Statistical Properties Of Feature Extractors
2017 · Kanru Hua
Abstract
An F0 and voicing status estimation algorithm for high-quality speech analysis/synthesis is proposed. The problem is approached from a different perspective: rather than modeling speech signals directly, the method models the behavior of feature extractors under noise. Under time-frequency locality assumptions, the joint distribution of extracted features and target F0 can be characterized by training a bank of Gaussian mixture models (GMMs) on artificial data generated from Monte Carlo simulations. The trained GMMs are then used to generate a set of conditional distributions on the predicted F0, which are combined and post-processed by the Viterbi algorithm to give a final F0 trajectory. Evaluation on the CSTR and CMU Arctic speech databases shows that the proposed method, trained on fully synthetic data, achieves lower gross error rates than state-of-the-art methods.
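The final Viterbi step can be illustrated with a minimal sketch. It is not the paper's implementation: the input `log_post` is a hypothetical stand-in for the combined per-frame conditional distributions on F0, the octave-distance `jump_penalty` is an assumed transition cost, and voicing states are left out for brevity.

```python
import math

def viterbi_f0(log_post, f0_bins, jump_penalty=2.0):
    """Return the most likely F0 path through per-frame log-probabilities.

    log_post[t][k] : log-probability of F0 bin k at frame t (hypothetical
                     input standing in for the combined GMM outputs).
    f0_bins[k]     : centre frequency of bin k in Hz.
    jump_penalty   : assumed cost per octave of frame-to-frame F0 change.
    """
    n_bins = len(f0_bins)
    log_f0 = [math.log2(f) for f in f0_bins]

    def step_cost(j, k):
        # Transition cost grows with the distance between bins in octaves.
        return jump_penalty * abs(log_f0[k] - log_f0[j])

    score = list(log_post[0])   # best path score ending in each bin
    back = []                   # back-pointers, one list per later frame
    for frame in log_post[1:]:
        new_score, ptr = [], []
        for k in range(n_bins):
            best_j = max(range(n_bins), key=lambda j: score[j] - step_cost(j, k))
            ptr.append(best_j)
            new_score.append(score[best_j] - step_cost(best_j, k) + frame[k])
        score = new_score
        back.append(ptr)

    # Trace the best final bin back to the first frame.
    k = max(range(n_bins), key=lambda j: score[j])
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return [f0_bins[k] for k in path]


# Toy example: the middle frame slightly prefers an octave error (200 Hz).
bins = [100.0, 200.0, 400.0]
frames = [
    [math.log(0.80), math.log(0.10), math.log(0.10)],
    [math.log(0.45), math.log(0.50), math.log(0.05)],
    [math.log(0.80), math.log(0.10), math.log(0.10)],
]
smoothed = viterbi_f0(frames, bins, jump_penalty=2.0)
framewise = viterbi_f0(frames, bins, jump_penalty=0.0)
```

With the transition penalty enabled, the isolated octave jump in the middle frame is smoothed away; with the penalty set to zero, the decoder reduces to a frame-by-frame argmax and keeps the jump.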
Related papers
- A Regression Model Of Recurrent Deep Neural Networks For Noise Robust Estimation Of The Fundamental Frequency Contour Of Speech (2018) 4.52
- Traditional Machine Learning For Pitch Detection (2019) 10.85
- Waveform To Single Sinusoid Regression To Estimate The F0 Contour From Noisy Speech Using Recurrent Deep Neural Networks (2018) 6.77
- Noisy Speech Based Temporal Decomposition To Improve Fundamental Frequency Estimation (2021) 5.24
- Singing Voice Separation And Vocal F0 Estimation Based On Mutual Combination Of Robust Principal Component Analysis And Subharmonic Summation (2016) 10.74
- Unsupervised Voice Activity Detection By Modeling Source And System Information Using Zero Frequency Filtering (2022) 6.34
- Towards Parametric Speech Synthesis Using Gaussian-Markov Model Of Spectral Envelope And Wavelet-based Decomposition Of F0 (2022) 0.00
- Hf0: A Hybrid Pitch Extraction Method For Multimodal Voice (2019) 0.00