DropClass and DropAdapt: Dropping Classes for Deep Speaker Representation Learning
2020 · Chau Luu, Peter Bell, Steve Renals
Abstract
Many recent works on deep speaker embeddings train their feature extraction networks on large classification tasks, distinguishing between all speakers in a training set. Empirically, this has been shown to produce speaker-discriminative embeddings, even for unseen speakers. However, it is not clear that this is the optimal means of training embeddings that generalize well. This work proposes two approaches to learning embeddings, based on the notion of dropping classes during training. We demonstrate that both approaches can yield performance gains in speaker verification tasks. The first proposed method, DropClass, works via periodically dropping a random subset of classes from the training data and the output layer throughout training, resulting in a feature extractor trained on many different classification tasks. Combined with an additive angular margin loss, this method can yield a 7.9% relative improvement in equal error rate (EER) over a strong baseline on VoxCeleb. The second
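The class-dropping step that the abstract describes for DropClass can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names `drop_classes` and `slice_output_layer`, the number of dropped classes, and the toy data are all assumptions made for the example, and the abstract does not state how often the subset is resampled or how many classes are dropped.

```python
import random


def drop_classes(labels, num_classes, num_drop, rng):
    """Sample classes to drop, then filter examples and remap their labels.

    Hypothetical helper: returns the kept class ids, the indices of the
    training examples that survive, and their labels renumbered to
    0..len(kept)-1 so they index the reduced output layer.
    """
    dropped = set(rng.sample(range(num_classes), num_drop))
    kept = [c for c in range(num_classes) if c not in dropped]
    remap = {c: i for i, c in enumerate(kept)}
    keep_idx = [i for i, y in enumerate(labels) if y not in dropped]
    new_labels = [remap[labels[i]] for i in keep_idx]
    return kept, keep_idx, new_labels


def slice_output_layer(weights, kept):
    """Keep only the output-layer rows that belong to the remaining classes."""
    return [weights[c] for c in kept]


# Toy setup (assumed): 5 speakers, 2 utterances each, one weight row per speaker.
rng = random.Random(0)
labels = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
weights = [[float(c), float(c) + 0.5] for c in range(5)]

# One "period" of training: drop 2 of the 5 classes, train on what remains.
kept, keep_idx, new_labels = drop_classes(labels, num_classes=5, num_drop=2, rng=rng)
reduced_weights = slice_output_layer(weights, kept)
```

In a training loop, a call like this would be repeated periodically so that the feature extractor is trained against a different subset of speakers each time, while the shared embedding network is carried over between periods.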
Related papers
- Margin Matters: Towards More Discriminative Deep Neural Network Embeddings For Speaker Recognition (2019)
- DEAAN: Disentangled Embedding And Adversarial Adaptation Network For Robust Speaker Representation Learning (2020)
- A Comparative Re-assessment Of Feature Extractors For Deep Speaker Embeddings (2020)
- Intra-class Variation Reduction Of Speaker Representation In Disentanglement Framework (2020)
- Training Speaker Embedding Extractors Using Multi-speaker Audio With Unknown Speaker Boundaries (2022)
- Editnet: A Lightweight Network For Unsupervised Domain Adaptation In Speaker Verification (2022)
- Unified Hypersphere Embedding For Speaker Recognition (2018)
- How To Improve Your Speaker Embeddings Extractor In Generic Toolkits (2018)