VoxCeleb2
Canonical
25 papers using it
First seen: 2022
VoxCeleb2 is a large-scale audio-visual dataset of speech from over 6,000 celebrity speakers, extracted from YouTube videos. It is widely used to train and evaluate speaker recognition systems.
Papers using VoxCeleb2 (24)
- Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer
- Self-Supervised Training of Speaker Encoder with Multi-Modal Diverse Positive Pairs
- One-Step Knowledge Distillation and Fine-Tuning in Using Large Pre-Trained Self-Supervised Learning Models for Speaker Verification
- Seeing Through the Conversation: Audio-Visual Speech Separation based on Diffusion Model
- Speaker Recognition Using Isomorphic Graph Attention Network Based Pooling on Self-Supervised Representation
- Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy
- CL-UZH submission to the NIST SRE 2024 Speaker Recognition Evaluation
- Multi-domain Adaptation By Self-supervised Learning For Speaker Verification
- Speaker Verification Using Attentive Multi-scale Convolutional Recurrent Network
- Libri2Vox Dataset: Target Speaker Extraction with Diverse Speaker Conditions and Synthetic Data
- Can large-scale vocoded spoofed data improve speech spoofing countermeasure with a self-supervised front end?
- Target Speech Diarization with Multimodal Prompts
- Mechanisms of Multimodal Synchronization: Insights from Decoder-Based Video-Text-to-Speech Synthesis
- Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels
- A vector quantized masked autoencoder for speech emotion recognition
- Few-Shot Speaker Identification Using Lightweight Prototypical Network with Feature Grouping and Interaction