3D-Speaker-Toolkit: An Open-source Toolkit For Multimodal Speaker Verification And Diarization
2024 · Yafeng Chen, Siqi Zheng, Hui Wang, et al.
Abstract
We introduce 3D-Speaker-Toolkit, an open-source toolkit for multimodal speaker verification and diarization, designed to meet the needs of both academic researchers and industrial practitioners. The toolkit combines acoustic, semantic, and visual information, fusing these modalities to provide robust speaker recognition. The acoustic module extracts speaker embeddings from acoustic features, using both fully supervised and self-supervised learning approaches. The semantic module uses language models to interpret the content and context of spoken language, helping the system distinguish speakers through linguistic patterns. The visual module applies image processing to facial features, which improves the precision of speaker diarization in multi-speaker environments. Collectively, these modules enable the 3D-Speaker-Toolkit to achieve substantially imp…
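The abstract does not describe the scoring back end, so the following is only a minimal sketch, under stated assumptions, of how speaker embeddings like those produced by the acoustic module are commonly consumed. It shows cosine scoring against a threshold for verification, agglomerative (average-linkage) clustering of per-segment embeddings for diarization, and a simple weighted late fusion of acoustic and visual similarities. The function names `verify`, `fused_score`, and `cluster_segments`, the 0.5 thresholds, the 0.7 audio weight, and the choice of clustering algorithm are all hypothetical illustrations, not the toolkit's API, defaults, or method.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def verify(enroll: Sequence[float], test: Sequence[float], threshold: float = 0.5) -> bool:
    """Accept a verification trial if the two embeddings are similar enough.
    The threshold is a hypothetical value; real systems calibrate it on held-out trials."""
    return cosine(enroll, test) >= threshold


def fused_score(acoustic_sim: float, visual_sim: float, w_audio: float = 0.7) -> float:
    """Hypothetical late fusion: weighted average of per-modality similarity scores."""
    return w_audio * acoustic_sim + (1.0 - w_audio) * visual_sim


def cluster_segments(embeddings: List[List[float]], threshold: float = 0.5) -> List[int]:
    """Assign a speaker label to each segment via greedy average-linkage clustering.
    Repeatedly merges the most similar pair of clusters until the best average
    cosine similarity falls below `threshold` (an assumed stopping rule)."""
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(c1: List[int], c2: List[int]) -> float:
        sims = [cosine(embeddings[i], embeddings[j]) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        best, pair = -2.0, (0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = linkage(clusters[a], clusters[b])
                if s > best:
                    best, pair = s, (a, b)
        if best < threshold:
            break
        a, b = pair  # b > a, so deleting index b leaves index a valid
        clusters[a].extend(clusters[b])
        del clusters[b]

    # Relabel so speaker IDs follow the order of first appearance.
    labels = [0] * len(embeddings)
    for spk, members in enumerate(sorted(clusters, key=min)):
        for i in members:
            labels[i] = spk
    return labels


if __name__ == "__main__":
    # Toy 2-D "embeddings" for four segments spoken by two speakers.
    segs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95]]
    print(cluster_segments(segs))  # → [0, 0, 1, 1]
    print(verify(segs[0], segs[1]))  # → True
```

Average linkage with a similarity threshold is used here only because it needs no preset speaker count and fits in a few lines; a production diarization pipeline may instead use spectral clustering, overlap handling, or learned fusion, and the abstract does not say which the toolkit adopts.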
Related papers
- A Toolkit For Joint Speaker Diarization And Identification With Application To Speaker-attributed ASR (2024)
- Integrating Audio, Visual, And Semantic Information For Enhanced Multimodal Speaker Diarization (2024)
- 3D-Speaker: A Large-scale Multi-device, Multi-distance, And Multi-dialect Corpus For Speech Representation Disentanglement (2023)
- Exploring Speaker-related Information In Spoken Language Understanding For Better Speaker Diarization (2023)
- WeSpeaker: A Research And Production Oriented Speaker Embedding Learning Toolkit (2022)
- Robust Acoustic Domain Identification With Its Application To Speaker Diarization (2022)
- Audio-visual Speaker Diarization Based On Spatiotemporal Bayesian Fusion (2016)
- Joint Training Of Speaker Embedding Extractor, Speech And Overlap Detection For Diarization (2024)