Memory-efficient Training For Deep Speaker Embedding Learning In Speaker Verification
2024 · Bei Liu, Yanmin Qian
Abstract
Recent speaker verification (SV) systems have shown a trend toward adopting deeper speaker embedding extractors. Although deeper and larger neural networks can significantly improve performance, their substantial memory requirements hinder training on consumer GPUs. In this paper, we explore a memory-efficient training strategy for deep speaker embedding learning in resource-constrained scenarios. Firstly, we conduct a systematic analysis of GPU memory allocation during SV system training. Empirical observations show that activations and optimizer states are the main sources of memory consumption. For activations, we design two types of reversible neural networks which eliminate the need to store intermediate activations during back-propagation, thereby significantly reducing memory usage without performance loss. For optimizer states, we introduce a dynamic quantization approach that replaces the original 32-bit floating-point values with a dynamic tree-based 8-bit data type. Experime
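The two ideas named in the abstract can be illustrated with a small sketch. It is not the authors' architecture: the abstract does not describe their two reversible designs, so the code below assumes a generic additive-coupling block, and the sub-function `f`, the weights `wf` and `wg`, and the helper `adam_state_bytes` are hypothetical names chosen for illustration. The point is only that a reversible block's inputs can be recomputed from its outputs, so they need not be stored for back-propagation, and that moving optimizer states from 32-bit to 8-bit values cuts their footprint to roughly a quarter (ignoring any bookkeeping the quantization scheme itself needs).

```python
import math

def f(x, w):
    # Hypothetical sub-function; any deterministic function would do here.
    return [math.tanh(w * v) for v in x]

def rev_forward(x1, x2, wf, wg):
    # Additive coupling (assumed design, not taken from the paper).
    y1 = [a + b for a, b in zip(x1, f(x2, wf))]
    y2 = [a + b for a, b in zip(x2, f(y1, wg))]
    return y1, y2

def rev_inverse(y1, y2, wf, wg):
    # Recompute the inputs from the outputs instead of storing them.
    x2 = [a - b for a, b in zip(y2, f(y1, wg))]
    x1 = [a - b for a, b in zip(y1, f(x2, wf))]
    return x1, x2

def adam_state_bytes(num_params, bytes_per_value):
    # Adam keeps two state values (first and second moment) per parameter.
    # Quantization overhead such as per-block constants is ignored here.
    return 2 * num_params * bytes_per_value

x1, x2 = [0.5, -1.2, 3.0], [2.0, 0.1, -0.7]
wf, wg = 0.8, -0.3

y1, y2 = rev_forward(x1, x2, wf, wg)
r1, r2 = rev_inverse(y1, y2, wf, wg)

# The reconstruction matches the inputs up to floating-point rounding.
inputs_recovered = all(
    math.isclose(a, b, abs_tol=1e-9) for a, b in zip(r1 + r2, x1 + x2)
)
print(inputs_recovered)

# Optimizer-state footprint for a hypothetical 10M-parameter model.
print(adam_state_bytes(10_000_000, 4), adam_state_bytes(10_000_000, 1))
```

In a real training loop the inverse would be called during the backward pass to regenerate each block's inputs on the fly, trading extra computation for lower activation memory.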
Related papers
- Hiddenspeaker: Generate Imperceptible Unlearnable Audios For Speaker Verification System (2024)
- Small Footprint Text-independent Speaker Verification For Embedded Systems (2020)
- Unified Hypersphere Embedding For Speaker Recognition (2018)
- Deep Speaker Embeddings For Far-field Speaker Recognition On Short Utterances (2020)
- Deep Speaker Embedding Learning With Multi-level Pooling For Text-independent Speaker Verification (2019)
- Towards Lightweight Speaker Verification Via Adaptive Neural Network Quantization (2024)
- Deep Neural Network Embeddings With Gating Mechanisms For Text-independent Speaker Verification (2019)
- Improving Embedding Extraction For Speaker Verification With Ladder Network (2020)