Memory-efficient Training For Text-dependent SV With Independent Pre-trained Models
2024 · Seyed Ali Farokh, Hossein Zeinali
Abstract
This paper presents our submission to the Iranian division of the Text-Dependent Speaker Verification Challenge (TdSV) 2024. Conventional TdSV approaches typically model speaker and linguistic features jointly, which requires unsegmented inputs during training and incurs high computational costs. These methods also often fine-tune large-scale pre-trained speaker embedding models on the target-domain dataset, which may compromise the pre-trained models' original ability to capture speaker-specific characteristics. To overcome these limitations, we employ a TdSV system that uses two pre-trained models independently. We demonstrate that, by leveraging pre-trained models with targeted domain adaptation, competitive results can be achieved without the substantial computational cost of the joint fine-tuning on unsegmented inputs that conventional approaches require. Our best system reached a MinDCF of 0.0358 on the evaluation subset and secured first place in the challenge.
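The abstract reports its result as MinDCF (minimum normalized detection cost), so a short sketch of how such a value can be computed from trial scores may help when reading the 0.0358 figure. This is a generic illustration, not the challenge's scoring tool: the function name `min_dcf` and the default values for `p_target`, `c_miss`, and `c_fa` are assumptions made for this example, since the abstract does not state the cost parameters used in the evaluation.

```python
def min_dcf(target_scores, nontarget_scores, p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Minimum normalized detection cost over all decision thresholds.

    The default cost parameters are illustrative assumptions, not values
    taken from the abstract or the challenge.
    """
    tar = list(target_scores)
    non = list(nontarget_scores)
    if not tar or not non:
        raise ValueError("need at least one target and one non-target score")

    # Candidate thresholds: every observed score, plus +inf (reject everything).
    thresholds = sorted(set(tar) | set(non)) + [float("inf")]

    best = float("inf")
    for t in thresholds:
        # A trial is accepted when its score is >= the threshold.
        p_miss = sum(s < t for s in tar) / len(tar)   # targets rejected
        p_fa = sum(s >= t for s in non) / len(non)    # non-targets accepted
        dcf = c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa
        best = min(best, dcf)

    # Normalize by the cost of the better trivial system
    # (always reject or always accept).
    return best / min(c_miss * p_target, c_fa * (1.0 - p_target))


if __name__ == "__main__":
    # Toy scores, invented for illustration only.
    print(min_dcf([2.0, 3.0], [0.0, 1.0]))
```

Under this definition a value of 0 means the two score distributions are perfectly separable, and a value of 1 means the system does no better than a trivial one that always rejects or always accepts.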
Related papers
- The SVASR System For Text-dependent Speaker Verification (TdSV) AAIC Challenge 2024 (2024)
- Robust Text-dependent Speaker Verification Via Character-level Information Preservation For The SdSV Challenge 2020 (2020)
- Text-dependent Speaker Verification (TdSV) Challenge 2024: Challenge Evaluation Plan (2024)
- Integrating Frequency Translational Invariance In TDNNs And Frequency Positional Information In 2D ResNets To Enhance Speaker Verification (2021)
- A Text-dependent Speaker Verification Application Framework Based On Chinese Numerical String Corpus (2023)
- Exploring The Use Of An Unsupervised Autoregressive Model As A Shared Encoder For Text-dependent Speaker Verification (2020)
- Asymmetric And Trial-dependent Modeling: The Contribution Of LIA To SdSV Challenge Task 2 (2024)
- Short-segment Speaker Verification With Pre-trained Models And Multi-resolution Encoder (2025)