DeepTalk: Vocal Style Encoding For Speaker Recognition And Speech Synthesis
2020 · Anurag Chowdhury, Arun Ross, Prabu David
Abstract
Automatic speaker recognition algorithms typically characterize speech audio using short-term spectral features that encode the physiological and anatomical aspects of speech production. Such algorithms do not fully capitalize on speaker-dependent characteristics present in behavioral speech features. In this work, we propose a prosody encoding network called DeepTalk for extracting vocal style features directly from raw audio data. The DeepTalk method outperforms several state-of-the-art speaker recognition systems across multiple challenging datasets. The speaker recognition performance is further improved by combining DeepTalk with a state-of-the-art physiological speech feature-based speaker recognition system. We also integrate DeepTalk into a current state-of-the-art speech synthesizer to generate synthetic speech. A detailed analysis of the synthetic speech shows that DeepTalk captures F0 contours essential for vocal style modeling. Furthermore, DeepTalk-based synthetic spee
Related papers
- DeepVox: Discovering Features From Raw Audio For Speaker Recognition In Non-ideal Audio Signals (2020)
- FaceSpeak: Expressive And High-quality Speech Synthesis From Human Portraits Of Different Styles (2025)
- Combining Automatic Speaker Verification And Prosody Analysis For Synthetic Speech Detection (2022)
- Vocal Style Factorization For Effective Speaker Recognition In Affective Scenarios (2023)
- Enriching Source Style Transfer In Recognition-synthesis Based Non-parallel Voice Conversion (2021)
- End-to-end Text-to-speech Based On Latent Representation Of Speaking Styles Using Spontaneous Dialogue (2022)
- STYLER: Style Factor Modeling With Rapidity And Robustness Via Speech Decomposition For Expressive And Controllable Neural Text To Speech (2021)
- Multi-speaker Multi-style Speech Synthesis With Timbre And Style Disentanglement (2022)