Variable Frame Rate-based Data Augmentation To Handle Speaking-style Variability For Automatic Speaker Verification
2020 · Amber Afshan, Jinxi Guo, Soo Jin Park, et al.
Abstract
The effects of speaking-style variability on automatic speaker verification were investigated using the UCLA Speaker Variability database, which comprises multiple speaking styles per speaker. An x-vector/PLDA (probabilistic linear discriminant analysis) system was trained with the SRE and Switchboard databases with standard augmentation techniques and evaluated with utterances from the UCLA database. The equal error rate (EER) was low when enrollment and test utterances were of the same style (e.g., 0.98% and 0.57% for read and conversational speech, respectively), but it increased substantially when styles were mismatched between enrollment and test utterances. For instance, when enrolled with conversational utterances, the EER increased to 3.03%, 2.96% and 22.12% when tested on read, narrative, and pet-directed speech, respectively. To reduce the effect of style mismatch, we propose an entropy-based variable frame rate technique to artificially generate style-normalized representations
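The EER figures quoted in the abstract are the error rate at the decision threshold where false acceptances and false rejections are equally frequent. The following is a minimal sketch of that metric only, not the authors' scoring pipeline: the function name `equal_error_rate` and the toy scores are assumptions made up for illustration, and the crossing point is approximated by the nearest observed score rather than interpolated.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER (as a fraction) from verification trial scores.

    target_scores:    scores of same-speaker trials (higher = more similar)
    nontarget_scores: scores of different-speaker trials
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)

    # Use every observed score as a candidate decision threshold.
    thresholds = np.sort(np.concatenate([target, nontarget]))

    # False rejection rate: same-speaker trials scoring below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance rate: different-speaker trials at or above the threshold.
    far = np.array([(nontarget >= t).mean() for t in thresholds])

    # Pick the threshold where the two rates are closest and average them.
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0


# Toy scores (hypothetical, not taken from the paper).
eer = equal_error_rate([0.9, 0.8, 0.7, 0.2], [0.1, 0.3, 0.4, 0.6])
print(f"EER = {100 * eer:.2f}%")
```

In a style-mismatch experiment such as the one described, this computation would be repeated once per enrollment/test style pair, with the trial scores coming from the PLDA back end.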
Related papers
- Attention-based Conditioning Methods Using Variable Frame Rate For Style-robust Speaker Verification (2022)
- On-the-fly Feature Based Rapid Speaker Adaptation For Dysarthric And Elderly Speech Recognition (2022)
- Deep Representation Decomposition For Rate-invariant Speaker Verification (2022)
- Unsupervised Feature Enhancement For Speaker Verification (2019)
- Data Augmentation Enhanced Speaker Enrollment For Text-dependent Speaker Verification (2020)
- Unit Selection Synthesis Based Data Augmentation For Fixed Phrase Speaker Verification (2021)
- PAS: Partial Additive Speech Data Augmentation Method For Noise Robust Speaker Verification (2023)
- Improving Speaker Verification Robustness With Synthetic Emotional Utterances (2024)