Real-time Pitch/F0 Detection Using Spectrogram Images And Convolutional Neural Networks
2025 · Xufang Zhao, Omer Tsimhoni
Abstract
This paper presents a novel approach to F0 detection that uses Convolutional Neural Networks and image processing techniques to estimate pitch directly from spectrogram images. The approach achieves good detection accuracy: 92% of predicted pitch contours show strong or moderate correlation with the true pitch contours. In an experimental comparison with other state-of-the-art CNN methods, the approach improves the detection rate by approximately 5% across various Signal-to-Noise Ratio conditions.
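The accuracy figure in the abstract is stated in terms of correlation between predicted and true pitch contours. The sketch below shows one way such a figure could be computed, assuming Pearson correlation per contour and illustrative cut-offs of 0.7 (strong) and 0.4 (moderate). The abstract does not state which correlation measure or thresholds the authors used, so the function names, the thresholds, and the sample contours here are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        # A perfectly flat contour has no defined correlation; treat it as none.
        return 0.0
    return cov / math.sqrt(vx * vy)

# Hypothetical cut-offs (assumption, not taken from the paper).
STRONG, MODERATE = 0.7, 0.4

def correlation_label(r):
    """Bucket a signed correlation; negative values count as weak."""
    if r >= STRONG:
        return "strong"
    if r >= MODERATE:
        return "moderate"
    return "weak"

def share_strong_or_moderate(pairs):
    """Fraction of (predicted, true) contour pairs that are not weak."""
    labels = [correlation_label(pearson(pred, true)) for pred, true in pairs]
    return sum(label != "weak" for label in labels) / len(labels)

# Made-up F0 contours in Hz, one value per frame.
true_f0 = [120, 125, 130, 128, 122]
pred_close = [121, 126, 129, 127, 123]   # follows the true contour
pred_off = [130, 120, 131, 119, 128]     # moves against it

print(share_strong_or_moderate([(pred_close, true_f0), (pred_off, true_f0)]))
```

With these two made-up pairs, one contour tracks the reference and one does not, so the reported share is 0.5. The signed coefficient is used rather than its absolute value because a predicted contour that moves opposite to the true one should not count as a good detection.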
Related papers
- Traditional Machine Learning For Pitch Detection (2019)
- Waveform To Single Sinusoid Regression To Estimate The F0 Contour From Noisy Speech Using Recurrent Deep Neural Networks (2018)
- DEEPF0: End-to-end Fundamental Frequency Estimation For Music And Speech Signals (2021)
- Multiple F0 Estimation In Vocal Ensembles Using Convolutional Neural Networks (2020)
- A Regression Model Of Recurrent Deep Neural Networks For Noise Robust Estimation Of The Fundamental Frequency Contour Of Speech (2018)
- Hf0: A Hybrid Pitch Extraction Method For Multimodal Voice (2019)
- Noisy Speech Based Temporal Decomposition To Improve Fundamental Frequency Estimation (2021)
- Human Voice Pitch Estimation: A Convolutional Network With Auto-labeled And Synthetic Data (2023)