Supervised Initialization Of LSTM Networks For Fundamental Frequency Detection In Noisy Speech Signals
2019 · Marvin Coto-Jimenez
Abstract
Fundamental frequency is one of the most important parameters of human speech, relevant to the classification of accent, gender, speaking style, and age, and to speaker identification, among other tasks. Its reliable detection remains an important challenge for severely degraded signals. In previous work on detecting fundamental frequency in noisy speech with deep learning, networks such as Long Short-Term Memory (LSTM) have been initialized with random weights and then trained with the back-propagation through time algorithm. This work proposes a more efficient initialization, based on supervised training using an auto-associative network. This initialization provides a better starting point for detecting fundamental frequency in noisy speech. Its advantages are noticeable in objective measures of both detection accuracy and network training, in the presence of additive white noise.
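The abstract evaluates detection under additive white noise but does not say how the noisy signals were produced or at which noise levels. As a minimal sketch of that degradation only, the following mixes white Gaussian noise into a signal at a chosen signal-to-noise ratio. The function names, the 120 Hz test tone, the 16 kHz sampling rate, and the SNR values are all illustrative assumptions, not details taken from the paper.

```python
import math
import random


def add_white_noise(signal, snr_db, seed=0):
    """Return `signal` plus white Gaussian noise scaled to `snr_db` (hypothetical helper)."""
    rng = random.Random(seed)
    # Average power of the clean signal (assumed non-zero).
    signal_power = sum(x * x for x in signal) / len(signal)
    # Draw unit-variance noise, then measure its actual power.
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    noise_power = sum(n * n for n in noise) / len(noise)
    # Scale the noise so that 10*log10(signal_power / noise_power) equals snr_db.
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / noise_power)
    return [x + scale * n for x, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """Compute the SNR in dB from a clean signal and its noisy version."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)


# Assumed test tone: a 120 Hz sinusoid sampled at 16 kHz for 0.1 s.
sample_rate = 16000
f0_hz = 120.0
clean = [math.sin(2.0 * math.pi * f0_hz * n / sample_rate) for n in range(1600)]

noisy_0db = add_white_noise(clean, snr_db=0.0)
noisy_10db = add_white_noise(clean, snr_db=10.0)
```

Scaling by the measured noise power, rather than the nominal unit variance, makes the realized SNR match the requested value for the specific noise draw. The sketch does not handle an all-zero input signal.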
Related papers
- A Regression Model Of Recurrent Deep Neural Networks For Noise Robust Estimation Of The Fundamental Frequency Contour Of Speech (2018)
- Waveform To Single Sinusoid Regression To Estimate The F0 Contour From Noisy Speech Using Recurrent Deep Neural Networks (2018)
- An Attention Long Short-term Memory Based System For Automatic Classification Of Speech Intelligibility (2024)
- Noisy Speech Based Temporal Decomposition To Improve Fundamental Frequency Estimation (2021)
- Narrow-band Deep Filtering For Multichannel Speech Enhancement (2019)
- Multi-view Frequency LSTM: An Efficient Frontend For Automatic Speech Recognition (2020)
- DEEPF0: End-to-end Fundamental Frequency Estimation For Music And Speech Signals (2021)
- Tensor-train Long Short-term Memory For Monaural Speech Enhancement (2018)