Time-domain Speech Super-resolution With GAN Based Modeling For Telephony Speaker Verification
2022 · Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, et al.
Abstract
Automatic Speaker Verification (ASV) technology has become commonplace in virtual assistants. However, its performance suffers when there is a mismatch between the training and test domains. Mixed-bandwidth training, i.e., pooling training data from both domains, is a preferred choice for developing a universal model that works for both narrowband and wideband domains. We propose complementing this technique by performing neural upsampling of narrowband signals, also known as bandwidth extension. Our main goal is to discover and analyze high-performing time-domain Generative Adversarial Network (GAN) based models to improve our downstream state-of-the-art ASV system. We choose GANs because they (1) are powerful for learning conditional distributions and (2) allow flexible plug-in usage as a pre-processor during training of the downstream task (ASV) with data augmentation. Prior works mainly focus on feature-domain bandwidth extension and use limited experimental setups. We address these limitati
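Bandwidth extension, as described above, learns to map narrowband (telephone) audio back to wideband audio. A common way to build paired training data, assumed here and not confirmed by the abstract, is to band-limit 16 kHz wideband speech to the 4 kHz telephone band, resample it to 8 kHz, and bring it back to 16 kHz so that input and target share a sample rate. The sketch below uses an ideal FFT brick-wall filter; the function names (`lowpass_fft`, `fft_upsample`, `make_bwe_pair`) are hypothetical, and a realistic telephony simulation may also involve codecs and channel effects that are not modeled here.

```python
import numpy as np


def lowpass_fft(x: np.ndarray, sr: int, cutoff_hz: float) -> np.ndarray:
    """Ideal brick-wall low-pass: zero every rFFT bin at or above cutoff_hz."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    spec[freqs >= cutoff_hz] = 0.0
    return np.fft.irfft(spec, n=len(x))


def fft_upsample(x: np.ndarray, factor: int) -> np.ndarray:
    """Upsample by an integer factor via spectral zero-padding.

    Assumes x has no energy at its own Nyquist bin (true after lowpass_fft),
    so that bin does not need to be split between positive/negative halves.
    """
    out_len = len(x) * factor
    spec = np.fft.rfft(x)
    padded = np.zeros(out_len // 2 + 1, dtype=complex)
    padded[: len(spec)] = spec
    # irfft divides by out_len, so rescale to preserve the original amplitude.
    return np.fft.irfft(padded, n=out_len) * factor


def make_bwe_pair(wideband: np.ndarray, sr: int = 16000, nb_sr: int = 8000):
    """Return a hypothetical (narrowband_input, wideband_target) pair at rate sr."""
    factor = sr // nb_sr
    # Remove content at/above the narrowband Nyquist so decimation cannot alias.
    band_limited = lowpass_fft(wideband, sr, nb_sr / 2)
    narrowband = band_limited[::factor]          # simulated 8 kHz "telephone" signal
    nb_input = fft_upsample(narrowband, factor)  # back at sr, high band left empty
    return nb_input[: len(wideband)], wideband


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    wb = rng.standard_normal(16000)  # 1 s of noise standing in for speech
    nb, target = make_bwe_pair(wb)
    print(nb.shape, target.shape)
```

A time-domain GAN generator for bandwidth extension would then be trained to map `nb_input` to the wideband target; that model is not sketched here. Keeping both signals at the same sample rate lets such a generator operate as a drop-in pre-processor in front of a 16 kHz ASV front end.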
Related papers
- Joint Domain Adaptation And Speech Bandwidth Extension Using Time-domain GANs For Speaker Verification (2022)
- Self-FiLM: Conditioning GANs With Self-supervised Representations For Bandwidth Extension Based Speaker Recognition (2023)
- Channel-aware Domain-adaptive Generative Adversarial Network For Robust Speech Recognition (2024)
- Generative Adversarial Speaker Embedding Networks For Domain Robust End-to-end Speaker Verification (2018)
- DSPGAN: A GAN-based Universal Vocoder For High-fidelity TTS By Time-frequency Domain Supervision From DSP (2022)
- Efficient Acoustic Feature Transformation In Mismatched Environments Using A Guided-GAN (2022)
- Fine-tuning Of Pre-trained End-to-end Speech Recognition With Generative Adversarial Networks (2021)
- Application Of ASV For Voice Identification After VC And Duration Predictor Improvement In TTS Models (2024)