VoxCeleb
Canonical
40 papers using it
First seen: 2022
VoxCeleb is a dataset containing naturally noisy speech recordings used to evaluate the generalization of speech separation systems in real-world scenarios.
Papers using VoxCeleb (39)
- CAM++: A Fast And Efficient Network For Speaker Verification Using Context-aware Masking
- MFA: TDNN With Multi-scale Frequency-channel Attention For Text-independent Speaker Verification With Short Utterances
- Leveraging ASR Pretrained Conformers For Speaker Verification Through Transfer Learning And Knowledge Distillation
- Convolution-based Channel-frequency Attention For Text-independent Speaker Verification
- Editnet: A Lightweight Network For Unsupervised Domain Adaptation In Speaker Verification
- Self-distillation Prototypes Network: Learning Robust Speaker Representations Without Supervision
- TASLA: Text-Aligned Speech Tokens with Multiple Layer-Aggregation
- Model Compression For Dnn-based Speaker Verification Using Weight Quantization
- NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech
- Disentangling Voice And Content With Self-supervision For Speaker Recognition
- Ring Mixing with Auxiliary Signal-to-Consistency-Error Ratio Loss for Unsupervised Denoising in Speech Separation
- Vclip: Face-based Speaker Generation by Face-voice Association Learning
- Rethinking Leveraging Pre-Trained Multi-Layer Representations for Speaker Verification
- Magnitude and Phase-based Feature Fusion Using Co-attention Mechanism for Speaker recognition
- Effective Modeling of Critical Contextual Information for TDNN-based Speaker Verification
- Short-Segment Speaker Verification with Pre-trained Models and Multi-Resolution Encoder
- Clustering-based hard negative sampling for supervised contrastive speaker verification
- MGFF-TDNN: A Multi-granularity Feature Fusion TDNN Model With Depth-wise Separable Module For Speaker Verification
- MGFF-TDNN: A Multi-Granularity Feature Fusion TDNN Model with Depth-Wise Separable Module for Speaker Verification
- Improving Transformer-based Networks With Locality For Automatic Speaker Verification
- SE/BN Adapter: Parametric Efficient Domain Adaptation For Speaker Recognition
- Self-film: Conditioning Gans With Self-supervised Representations For Bandwidth Extension Based Speaker Recognition
- Laugh Betrays You? Learning Robust Speaker Representation From Speech Containing Non-verbal Fragments
- Disentangling Voice and Content with Self-Supervision for Speaker Recognition
- CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking
- Convolution-Based Channel-Frequency Attention for Text-Independent Speaker Verification
- EDITnet: A Lightweight Network for Unsupervised Domain Adaptation in Speaker Verification
- Multi-Frequency Information Enhanced Channel Attention Module for Speaker Representation Learning
- Toroidal Probabilistic Spherical Discriminant Analysis
- Laugh Betrays You? Learning Robust Speaker Representation From Speech Containing Non-Verbal Fragments
- Model Compression for DNN-based Speaker Verification Using Weight Quantization
- Distance-based Weight Transfer from Near-field to Far-field Speaker Verification
- Self-FiLM: Conditioning GANs with self-supervised representations for bandwidth extension based speaker recognition
- Ordered and Binary Speaker Embedding
- Leveraging ASR Pretrained Conformers for Speaker Verification through Transfer Learning and Knowledge Distillation
- SE/BN Adapter: Parametric Efficient Domain Adaptation for Speaker Recognition
- AV-CrossNet: an Audiovisual Complex Spectral Mapping Network for Speech Separation By Leveraging Narrow- and Cross-Band Modeling
- M-Vec: Matryoshka Speaker Embeddings with Flexible Dimensions
- Neural Scoring: A Refreshed End-to-End Approach for Speaker Recognition in Complex Conditions