Controllable Generation Of Artificial Speaker Embeddings Through Discovery Of Principal Directions
2023 · Florian Lux, Pascal Tilli, Sarina Meyer, et al.
Abstract
Customizing voice and speaking style in a speech synthesis system with intuitive and fine-grained controls is challenging, given that little data with appropriate labels is available. Furthermore, editing an existing human's voice raises ethical concerns. In this paper, we propose a method to generate artificial speaker embeddings that cannot be linked to a real human while offering intuitive and fine-grained control over the voice and speaking style of the embeddings, without requiring any labels for speaker or style. The artificial and controllable embeddings can be fed to a speech synthesis system that was conditioned on embeddings of real humans during training, without sacrificing privacy during inference.
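The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea suggested by the title: find principal directions in a set of artificial embeddings and move an embedding along one of them. It is not the authors' implementation. The random `samples` array is a placeholder for embeddings drawn from a generative model, and the names `principal_directions` and `shift_along`, the embedding size of 64, and the shift amount are all hypothetical.

```python
import numpy as np


def principal_directions(embeddings, k):
    """Return the mean and the top-k principal directions of the embeddings."""
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    # Rows of vt are orthonormal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]


def shift_along(embedding, direction, amount):
    """Move an embedding by `amount` along a unit-length direction."""
    return embedding + amount * direction


rng = np.random.default_rng(0)
# Placeholder data (assumption): random vectors with decreasing variance per
# dimension, standing in for embeddings sampled from a generative model.
samples = rng.normal(size=(1000, 64)) * np.linspace(3.0, 0.1, 64)

mean, directions = principal_directions(samples, k=4)
base = samples[0]
# Shifting along one direction changes only the component on that axis.
edited = shift_along(base, directions[0], amount=2.0)
```

In a setup like this, each principal direction acts as one slider: the shift amount is the control value, and what the direction corresponds to perceptually would have to be determined by listening, since no labels are involved.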
Related papers
- Natural Language Guidance Of High-fidelity Text-to-speech With Synthetic Annotations (2024)
- Deep Encoder-decoder Models For Unsupervised Learning Of Controllable Speech Synthesis (2018)
- Rethinking Speaker Embeddings For Speech Generation: Sub-center Modeling For Capturing Intra-speaker Diversity (2024)
- Voxgenesis: Unsupervised Discovery Of Latent Speaker Manifold For Speech Synthesis (2024)
- Self-supervised Context-aware Style Representation For Expressive Speech Synthesis (2022)
- Vevo: Controllable Zero-shot Voice Imitation With Self-supervised Disentanglement (2025)
- Diffv2s: Diffusion-based Video-to-speech Synthesis With Vision-guided Speaker Embedding (2023)
- Anonymizing Speech With Generative Adversarial Networks To Preserve Speaker Privacy (2022)