Generating Mandarin And Cantonese F0 Contours With Decision Trees And BLSTMs
2018 · Weidong Yuan, Alan W Black
Abstract
This paper models the fundamental frequency (f0) contours of both Mandarin and Cantonese speech with decision trees and DNNs (deep neural networks). Different kinds of f0 representations and model architectures are tested for both decision trees and DNNs. A new model called Additive-BLSTM (additive bidirectional long short-term memory) is proposed, which predicts a base f0 contour and a residual f0 contour with two BLSTMs. On the objective measures of RMSE and correlation, the best decision-tree configuration combines tone-dependent trees with sample normalization and delta feature regularization, and the new Additive-BLSTM model with delta feature regularization performs better still. Subjective listening tests on both Mandarin and Cantonese, comparing a Random Forest model (multiple decision trees) with the Additive-BLSTM model, were also conducted and confirmed the advantage of the new model in listener preference.
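The abstract names the pieces of the Additive-BLSTM setup and its evaluation but gives no formulas. The following is a minimal Python sketch, under stated assumptions, of how those pieces could fit together. The final contour is taken to be the frame-wise sum of the base and residual predictions; the name "additive" suggests this, but the abstract does not confirm it. `delta` is a plain first-order frame difference. `delta_regularized_loss` and its `weight` parameter are a hypothetical form of delta feature regularization, not the paper's loss. RMSE and Pearson correlation use their standard definitions. The two BLSTMs themselves are omitted, and plain lists stand in for their outputs.

```python
import math
from typing import List, Sequence


def additive_f0(base: Sequence[float], residual: Sequence[float]) -> List[float]:
    """Combine base and residual contours by frame-wise addition.

    Assumption: the final f0 is the sum of the two (omitted) BLSTM outputs.
    """
    if len(base) != len(residual):
        raise ValueError("base and residual must have the same length")
    return [b + r for b, r in zip(base, residual)]


def delta(contour: Sequence[float]) -> List[float]:
    """First-order frame difference, a simple stand-in for delta features."""
    return [contour[t + 1] - contour[t] for t in range(len(contour) - 1)]


def rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Root mean squared error between two equal-length contours."""
    if len(pred) != len(target) or not pred:
        raise ValueError("contours must be non-empty and of equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


def correlation(pred: Sequence[float], target: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length contours."""
    if len(pred) != len(target) or not pred:
        raise ValueError("contours must be non-empty and of equal length")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    std_t = math.sqrt(sum((t - mean_t) ** 2 for t in target))
    if std_p == 0 or std_t == 0:
        raise ValueError("correlation is undefined for a constant contour")
    return cov / (std_p * std_t)


def delta_regularized_loss(pred: Sequence[float], target: Sequence[float],
                           weight: float = 0.5) -> float:
    """Mean squared error on f0 plus a weighted mean squared error on deltas.

    Hypothetical: the exact loss and the default weight are assumptions.
    Contours need at least two frames so that the deltas are non-empty.
    """
    mse = rmse(pred, target) ** 2
    delta_mse = rmse(delta(pred), delta(target)) ** 2
    return mse + weight * delta_mse


if __name__ == "__main__":
    # Toy contours in Hz, made up purely for illustration.
    base = [200.0, 210.0, 220.0, 215.0]
    residual = [5.0, -3.0, 0.0, 2.0]
    pred = additive_f0(base, residual)  # [205.0, 207.0, 220.0, 217.0]
    target = [204.0, 208.0, 221.0, 216.0]
    print(round(rmse(pred, target), 3))  # 1.0
    print(round(correlation(pred, target), 3))
    print(round(delta_regularized_loss(pred, target), 3))
```

The delta term penalizes frame-to-frame slope errors as well as absolute f0 errors. This is one plausible reading of why delta regularization would help with tonal contours, where the shape of the pitch movement carries the tone.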
Related papers
- The Realization Of Tones In Spontaneous Spoken Taiwan Mandarin: A Corpus-based Survey And Theory-driven Computational Modeling (2025)
- A Regression Model Of Recurrent Deep Neural Networks For Noise Robust Estimation Of The Fundamental Frequency Contour Of Speech (2018)
- Investigation Of Deep Neural Network Acoustic Modelling Approaches For Low Resource Accented Mandarin Speech Recognition (2022)
- Mandarin Tone Modeling Using Recurrent Neural Networks (2017)
- Waveform To Single Sinusoid Regression To Estimate The F0 Contour From Noisy Speech Using Recurrent Deep Neural Networks (2018)
- End-to-end Mandarin Tone Classification With Short Term Context Information (2021)
- Research On Modeling Units Of Transformer Transducer For Mandarin Speech Recognition (2020)
- Modeling L1 Influence On L2 Pronunciation: An MFCC-based Framework For Explainable Machine Learning And Pedagogical Feedback (2025)