Prediction Of Head Motion From Speech Waveforms With A Canonical-correlation-constrained Autoencoder
2020 · Jinhong Lu, Hiroshi Shimodaira
Abstract
This study investigates the direct use of speech waveforms to predict head motion for speech-driven head-motion synthesis, whereas the literature commonly uses spectral features such as MFCCs as the basic input, together with additional features such as energy and F0. We show that, rather than combining different features that all originate from waveforms, it is more effective to use the waveforms directly to predict the corresponding head motion. The challenge with the waveform-based approach is that waveforms contain a large amount of information irrelevant to predicting head motion, which hinders the training of neural networks. To overcome this problem, we propose a canonical-correlation-constrained autoencoder (CCCAE), whose hidden layers are trained not only to minimise the error but also to maximise the canonical correlation with head motion. Compared with an MFCC-based system, the proposed system shows comparable performance in objective evaluation, and better performance in subject
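The training objective described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the exact loss, so the form below (reconstruction error minus a weighted sum of canonical correlations), the weight `alpha`, the regulariser `eps`, and all function and variable names are assumptions chosen for illustration. Rows are frames, `h` stands for a hidden-layer activation matrix, and `y` for head-motion vectors.

```python
import numpy as np


def canonical_correlations(h, y, eps=1e-6):
    """Canonical correlations between h (n, p) and y (n, q), rows = frames."""
    n = h.shape[0]
    hc = h - h.mean(axis=0)
    yc = y - y.mean(axis=0)
    # Covariance estimates; eps * I keeps the Cholesky factorisation stable.
    s_hh = hc.T @ hc / (n - 1) + eps * np.eye(h.shape[1])
    s_yy = yc.T @ yc / (n - 1) + eps * np.eye(y.shape[1])
    s_hy = hc.T @ yc / (n - 1)
    # Whiten both views: T = L_h^{-1} S_hy L_y^{-T}.
    l_h = np.linalg.cholesky(s_hh)
    l_y = np.linalg.cholesky(s_yy)
    t = np.linalg.solve(l_h, s_hy)
    t = np.linalg.solve(l_y, t.T).T
    # The singular values of T are the canonical correlations.
    return np.linalg.svd(t, compute_uv=False)


def cccae_style_loss(x, x_hat, h, y, alpha=1.0):
    """Assumed objective: reconstruction MSE minus alpha * total correlation."""
    recon = np.mean((x - x_hat) ** 2)
    corr = canonical_correlations(h, y).sum()
    return recon - alpha * corr


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.standard_normal((2000, 3))       # hypothetical head-motion vectors
    x = rng.standard_normal((2000, 16))      # hypothetical waveform frames
    x_hat = x + 0.1 * rng.standard_normal(x.shape)  # stand-in reconstruction
    h_related = y @ np.array([[2.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 3.0]])
    h_unrelated = rng.standard_normal((2000, 3))
    print("loss, correlated hidden layer:  ", cccae_style_loss(x, x_hat, h_related, y))
    print("loss, uncorrelated hidden layer:", cccae_style_loss(x, x_hat, h_unrelated, y))
```

In an actual autoencoder, `h` and `x_hat` would come from the network and the loss would be minimised with an automatic-differentiation framework. The sketch only shows that, for the same reconstruction error, a hidden representation more correlated with head motion yields a lower loss.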
Related papers
- Speech Waveform Synthesis From MFCC Sequences With Generative Adversarial Networks (2018) · 12.25
- Multiview Canonical Correlation Analysis For Automatic Pathological Speech Detection (2024) · 2.26
- A Comparison Of Self-supervised Speech Representations As Input Features For Unsupervised Acoustic Word Embeddings (2020) · 7.16
- Bag-of-audio-words Based On Autoencoder Codebook For Continuous Emotion Prediction (2019) · 0.00
- Fast Spectrogram Inversion Using Multi-head Convolutional Neural Networks (2018) · 14.39
- Improved Speech Representations With Multi-target Autoregressive Predictive Coding (2020) · 10.97
- Learning Speech Representations From Raw Audio By Joint Audiovisual Self-supervision (2020) · 0.00
- Unsupervised Audiovisual Synthesis Via Exemplar Autoencoders (2020) · 0.00