Speech-driven Facial Animation Using Polynomial Fusion Of Features
2019 · Triantafyllos Kefalas, Konstantinos Vougioukas, Yannis Panagakis, et al.
Abstract
Speech-driven facial animation involves using a speech signal to generate realistic videos of talking faces. Recent deep learning approaches to facial synthesis rely on extracting low-dimensional representations and concatenating them, followed by a decoding step of the concatenated vector. This accounts for only first-order interactions of the features and ignores higher-order interactions. In this paper we propose a polynomial fusion layer that models the joint representation of the encodings by a higher-order polynomial, with the parameters modelled by a tensor decomposition. We demonstrate the suitability of this approach through experiments on generated videos evaluated on a range of metrics on video quality, audiovisual synchronisation and generation of blinks.
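To make the contrast between concatenation and polynomial fusion concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's implementation: it assumes only two encodings (`z_a`, `z_b`), a second-order polynomial, and a CP (rank-R) factorisation of the interaction tensor. All function and parameter names are hypothetical.

```python
import numpy as np


def concat_fusion(z_a, z_b, W, b):
    """First-order baseline: a linear map applied to the concatenated encodings."""
    return W @ np.concatenate([z_a, z_b]) + b


def polynomial_fusion(z_a, z_b, W, b, U_a, U_b, C):
    """Second-order fusion (assumed form): first-order term plus a bilinear term.

    The bilinear term uses an interaction tensor T of shape (d_out, d_a, d_b)
    that is never built explicitly. It is assumed to factorise (CP, rank R) as
        T[o, i, j] = sum_r C[o, r] * U_a[i, r] * U_b[j, r]
    with U_a: (d_a, R), U_b: (d_b, R), C: (d_out, R).
    """
    first_order = W @ np.concatenate([z_a, z_b]) + b
    # Project each encoding onto the R factors, multiply elementwise,
    # then mix the R interaction channels into the output space.
    second_order = C @ ((U_a.T @ z_a) * (U_b.T @ z_b))
    return first_order + second_order
```

With the factorised form, the second-order term costs on the order of R(d_a + d_b + d_out) parameters instead of d_out · d_a · d_b for the full tensor, which is the usual motivation for modelling the parameters with a tensor decomposition.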
Related papers
- PMMTalk: Speech-driven 3D Facial Animation From Complementary Pseudo Multi-modal Features (2023) 3.58
- Audio2Face: Generating Speech/face Animation From Single Audio With Attention-based Bidirectional LSTM Networks (2019) 12.10
- See The Speaker: Crafting High-resolution Talking Faces From Speech With Prior Guidance And Region Refinement (2025) 0.00
- From Faces To Voices: Learning Hierarchical Representations For High-quality Video-to-speech (2025) 0.00
- DF-3DFace: One-to-many Speech Synchronized 3D Face Animation With Diffusion (2023) 0.00
- Multi-layer Feature Fusion Convolution Network For Audio-visual Speech Enhancement (2021) 0.00
- SAiD: Speech-driven Blendshape Facial Animation With Diffusion (2023) 0.00
- Transformer-S2A: Robust And Efficient Speech-to-animation (2021) 8.35