Tackling The Score Shift In Cross-lingual Speaker Verification By Exploiting Language Information
2021 · Jenthe Thienpondt, Brecht Desplanques, Kris Demuynck
Abstract
This paper contains a post-challenge performance analysis on cross-lingual speaker verification of the IDLab submission to the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC-21). We show that current speaker embedding extractors consistently underestimate speaker similarity in within-speaker cross-lingual trials. Consequently, the typical training and scoring protocols do not put enough emphasis on the compensation of intra-speaker language variability. We propose two techniques to increase cross-lingual speaker verification robustness. First, we enhance our previously proposed Large-Margin Fine-Tuning (LM-FT) training stage with a mini-batch sampling strategy which increases the amount of intra-speaker cross-lingual samples within the mini-batch. Second, we incorporate language information in the logistic regression calibration stage. We integrate quality metrics based on soft and hard decisions of a VoxLingua107 language identification model. The proposed techniques result in a
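The language-aware calibration step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not define the quality metrics, so the hard feature (agreement of the top-1 predicted language) and the soft feature (inner product of the two language posterior vectors) are assumptions, as are all function names, the plain gradient-descent fit, and the made-up toy trials.

```python
import math

def language_features(post_a, post_b):
    """Hypothetical language quality features for one trial.

    post_a, post_b: language posterior vectors from some LID model.
    hard: 1.0 if both utterances share the same top-1 language, else 0.0.
    soft: inner product of the two posterior vectors.
    """
    top_a = max(range(len(post_a)), key=lambda i: post_a[i])
    top_b = max(range(len(post_b)), key=lambda i: post_b[i])
    hard = 1.0 if top_a == top_b else 0.0
    soft = sum(pa * pb for pa, pb in zip(post_a, post_b))
    return hard, soft

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def calibrate(weights, bias, x):
    """Calibrated log-odds for one trial with feature vector x."""
    return bias + sum(w * xi for w, xi in zip(weights, x))

def fit_calibration(features, labels, lr=1.0, epochs=2000):
    """Fit logistic regression by batch gradient descent (assumed optimizer)."""
    n, d = len(features), len(features[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(calibrate(weights, bias, x)) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Made-up toy trials: (raw score, posteriors A, posteriors B, is_target).
EN, NL = [0.9, 0.1], [0.2, 0.8]
toy_trials = [
    (0.65, EN, EN, 1),  # target, same language
    (0.40, EN, NL, 1),  # target, cross-lingual (lower raw score)
    (0.45, EN, EN, 0),  # non-target, same language
    (0.15, EN, NL, 0),  # non-target, cross-lingual
]
# Calibration input per trial: [raw score, hard language-match feature].
X = [[s, language_features(a, b)[0]] for s, a, b, _ in toy_trials]
y = [trial[3] for trial in toy_trials]
weights, bias = fit_calibration(X, y)
```

In this toy setup a single threshold on the raw score cannot separate targets from non-targets, while the language-match feature lets the calibrator apply a different effective threshold to cross-lingual trials. The soft feature returned by `language_features` could be appended as a third input in the same way.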
Related papers
- The IDLAB VoxSRC-20 Submission: Large Margin Fine-tuning And Quality-aware Score Calibration In DNN Based Speaker Verification (2020)
- Cross-lingual Speaker Verification With Deep Feature Learning (2017)
- Analyzing Speaker Verification Embedding Extractors And Back-ends Under Language And Channel Mismatch (2022)
- Squeezing Value Of Cross-domain Labels: A Decoupled Scoring Approach For Speaker Verification (2020)
- Cross-modal Speaker Verification And Recognition: A Multilingual Perspective (2020)
- The Phonexia VoxCeleb Speaker Recognition Challenge 2021 System Description (2021)
- Neural Scoring: A Refreshed End-to-end Approach For Speaker Recognition In Complex Conditions (2024)
- RefXVC: Cross-lingual Voice Conversion With Enhanced Reference Leveraging (2024)