ESPnet-SPK: Full Pipeline Speaker Embedding Toolkit With Reproducible Recipes, Self-supervised Front-ends, And Off-the-shelf Models
2024 · Jee-Weon Jung, Wangyou Zhang, Jiatong Shi, et al.
Abstract
This paper introduces ESPnet-SPK, a toolkit designed with several objectives for training speaker embedding extractors. First, we provide an open-source platform for researchers in the speaker recognition community to effortlessly build models. We provide several models, ranging from x-vector to recent SKA-TDNN. Through the modularized architecture design, variants can be developed easily. We also aspire to bridge developed models with other domains, facilitating the broad research community to effortlessly incorporate state-of-the-art embedding extractors. Pre-trained embedding extractors can be accessed in an off-the-shelf manner and we demonstrate the toolkit's versatility by showcasing its integration with two tasks. Another goal is to integrate with diverse self-supervised learning features. We release a reproducible recipe that achieves an equal error rate of 0.39% on the Vox1-O evaluation protocol using WavLM-Large with ECAPA-TDNN.
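The headline number in the abstract is an equal error rate (EER): the operating point at which the false acceptance rate and the false rejection rate of a speaker verification system coincide. The following is a minimal sketch of how an EER can be computed from a list of scored trials. It is not taken from ESPnet-SPK; the function name `equal_error_rate` and the toy scores are hypothetical, and tied scores are not treated specially.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER for a set of verification trials.

    scores: similarity score per trial (higher means "same speaker").
    labels: 1 for a target (same-speaker) trial, 0 for a non-target trial.
    """
    # Sweep the decision threshold from the highest score downwards.
    trials = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_target = sum(label for _, label in trials)
    n_nontarget = len(trials) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need at least one target and one non-target trial")

    # With the threshold above every score, nothing is accepted:
    # no false accepts, and every target trial is falsely rejected.
    false_accepts, false_rejects = 0, n_target
    best_gap, eer = 1.0, 0.5

    for _, label in trials:
        # Lowering the threshold past this trial accepts it.
        if label == 1:
            false_rejects -= 1
        else:
            false_accepts += 1
        far = false_accepts / n_nontarget  # false acceptance rate
        frr = false_rejects / n_target     # false rejection rate
        gap = abs(far - frr)
        if gap < best_gap:
            # Closest point to FAR == FRR seen so far; report their mean.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy trial list (made-up values, for illustration only).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
print(f"EER = {100 * equal_error_rate(toy_scores, toy_labels):.2f}%")  # EER = 25.00%
```

On a real protocol such as Vox1-O, the scores would typically be similarities between pairs of extracted speaker embeddings, and the labels would come from the trial list.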
Related papers
- Wespeaker: A Research And Production Oriented Speaker Embedding Learning Toolkit (2022) 6.22
- ESPnet: End-to-end Speech Processing Toolkit (2018) 22.17
- How To Improve Your Speaker Embeddings Extractor In Generic Toolkits (2018) 9.76
- ESPnet-SE: End-to-end Speech Enhancement And Separation Toolkit Designed For ASR Integration (2020) 13.55
- ESPnet-TTS: Unified, Reproducible, And Integratable Open Source End-to-end Text-to-speech Toolkit (2019) 23.32
- The 2020 ESPnet Update: New Features, Broadened Applications, Performance Improvements, And Future Plans (2020) 18.20
- ECAPA2: A Hybrid Neural Network Architecture And Training Strategy For Robust Speaker Embeddings (2024) 0.00
- EURO: ESPnet Unsupervised ASR Open-source Toolkit (2022) 6.77