3D-Speaker: A Large-scale Multi-device, Multi-distance, And Multi-dialect Corpus For Speech Representation Disentanglement
2023 · Siqi Zheng, Luyao Cheng, Yafeng Chen, et al.
Abstract
Disentangling uncorrelated information in speech utterances is a crucial research topic within the speech community. Different speech-related tasks focus on extracting distinct speech representations while minimizing the effects of other, uncorrelated information. We present a large-scale speech corpus to facilitate research on speech representation disentanglement. 3D-Speaker contains over 10,000 speakers, each of whom is recorded simultaneously by multiple Devices located at different Distances, and some speakers speak multiple Dialects. The controlled combinations of multi-dimensional audio data yield a matrix of diverse blends of speech representation entanglement, thereby motivating intriguing methods to untangle them. The multi-domain nature of 3D-Speaker also makes it a suitable resource for evaluating large universal speech models and for experimenting with methods of out-of-domain learning and self-supervised learning. https://3dspeaker.github.io/
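The "matrix" of controlled combinations described in the abstract can be pictured as a grid over the three factors (device, distance, dialect). The following is a minimal sketch under stated assumptions, not the corpus's actual metadata schema or tooling: the field names (`id`, `speaker`, `device`, `distance`, `dialect`), the label values, and both helper functions are hypothetical and chosen only for illustration. It shows how same-speaker utterance pairs that differ in exactly one factor might be selected, for example to probe how well a representation ignores the recording device.

```python
from collections import defaultdict
from itertools import combinations, product

# Hypothetical factor names; the real corpus metadata may use different fields.
FACTORS = ("device", "distance", "dialect")


def condition_grid(devices, distances, dialects):
    """Enumerate every (device, distance, dialect) combination."""
    return list(product(devices, distances, dialects))


def cross_condition_pairs(utterances, vary):
    """Return ID pairs of same-speaker utterances that differ only in `vary`.

    `utterances` is a list of dicts with the hypothetical keys
    "id", "speaker", "device", "distance", and "dialect".
    """
    if vary not in FACTORS:
        raise ValueError(f"vary must be one of {FACTORS}")
    fixed = [f for f in FACTORS if f != vary]

    # Group utterances that share the speaker and the two fixed factors.
    groups = defaultdict(list)
    for utt in utterances:
        key = (utt["speaker"],) + tuple(utt[f] for f in fixed)
        groups[key].append(utt)

    # Within each group, keep only pairs whose varied factor differs.
    pairs = []
    for members in groups.values():
        for a, b in combinations(members, 2):
            if a[vary] != b[vary]:
                pairs.append((a["id"], b["id"]))
    return pairs
```

Calling `cross_condition_pairs(utterances, vary="device")` would hold distance and dialect fixed, so any difference between the paired recordings is attributable to the device alone; swapping in `"distance"` or `"dialect"` isolates the other factors in the same way.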
Related papers
- 3D-Speaker-Toolkit: An Open-source Toolkit For Multimodal Speaker Verification And Diarization (2024)
- Learning Disentangled Speech Representations (2023)
- ContentVec: An Improved Self-supervised Speech Representation By Disentangling Speakers (2022)
- Towards The Next Frontier In Speech Representation Learning Using Disentanglement (2024)
- Self-supervised Disentangled Representation Learning For Robust Target Speech Extraction (2023)
- Disentangled Representation Learning For Multilingual Speaker Recognition (2022)
- Disentangled Representation Learning For Environment-agnostic Speaker Recognition (2024)
- Advancing The Dimensionality Reduction Of Speaker Embeddings For Speaker Diarisation: Disentangling Noise And Informing Speech Activity (2021)