Reconstructing Faces From Voices
2019 · Yandong Wen, Rita Singh, Bhiksha Raj
Abstract
Voice profiling aims at inferring various human parameters from their speech, e.g. gender, age, etc. In this paper, we address the challenge posed by a subtask of voice profiling - reconstructing someone's face from their voice. The task is designed to answer the question: given an audio clip spoken by an unseen person, can we picture a face that has as many common elements, or associations as possible with the speaker, in terms of identity? To address this problem, we propose a simple but effective computational framework based on generative adversarial networks (GANs). The network learns to generate faces from voices by matching the identities of generated faces to those of the speakers, on a training set. We evaluate the performance of the network by leveraging a closely related task - cross-modal matching. The results show that our model is able to generate faces that match several biometric characteristics of the speaker, and results in matching accuracies that are much better tha
Related papers
- From Inference To Generation: End-to-end Fully Self-supervised Generation Of Human Face From Speech (2020) · 0.00
- Voice Impersonation Using Generative Adversarial Networks (2018) · 13.23
- Facetron: A Multi-speaker Face-to-speech Model Based On Cross-modal Latent Representations (2021) · 0.00
- Seeing Your Speech Style: A Novel Zero-shot Identity-disentanglement Face-based Voice Conversion (2024) · 4.52
- Investigating Deep Neural Structures And Their Interpretability In The Domain Of Voice Conversion (2021) · 0.00
- From Faces To Voices: Learning Hierarchical Representations For High-quality Video-to-speech (2025) · 0.00
- Facespeak: Expressive And High-quality Speech Synthesis From Human Portraits Of Different Styles (2025) · 0.00
- Video-driven Speech Reconstruction Using Generative Adversarial Networks (2019) · 11.39