AMI
Emerging · 32 papers using it
First seen: 2022
The AMI dataset is a collection of annotated meeting recordings used to evaluate speech recognition systems, particularly in the context of unsupervised domain adaptation.
Papers using AMI (32)
- Multitask Detection Of Speaker Changes, Overlapping Speech And Voice Activity Using Wav2vec 2.0
- Scaling Multi-Talker ASR with Speaker-Agnostic Activity Streams
- Speaker Embeddings With Weakly Supervised Voice Activity Detection For Efficient Speaker Diarization
- Teaching the Teachers: Boosting unsupervised domain adaptation in speech recognition by ensemble update
- BiRQ: Bi-Level Self-Labeling Random Quantization for Self-Supervised Speech Recognition
- Towards Robust Overlapping Speech Detection: A Speaker-Aware Progressive Approach Using WavLM
- DiCoW: Diarization-Conditioned Whisper for Target Speaker Automatic Speech Recognition
- Tree-constrained Pointer Generator With Graph Neural Network Encodings For Contextual Speech Recognition
- End-to-end Multichannel Speaker-attributed ASR: Speaker Guided Decoder And Input Feature Analysis
- GPU-accelerated Guided Source Separation for Meeting Transcription
- Adapting self-supervised models to multi-talker speech recognition using speaker embeddings
- Supervised Hierarchical Clustering using Graph Neural Networks for Speaker Diarization
- Adapting Multi-Lingual ASR Models for Handling Multiple Talkers
- SURT 2.0: Advances in Transducer-based Multi-talker Speech Recognition
- End-to-end Multichannel Speaker-Attributed ASR: Speaker Guided Decoder and Input Feature Analysis
- XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models
- Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition
- ESSumm: Extractive Speech Summarization from Untranscribed Meeting
- G-Augment: Searching for the Meta-Structure of Data Augmentation Policies for ASR
- Spectral Clustering-aware Learning of Embeddings for Speaker Diarisation
- Multitask Detection of Speaker Changes, Overlapping Speech and Voice Activity Using wav2vec 2.0
- Speech separation with large-scale self-supervised learning
- Leveraging Cross-Utterance Context For ASR Decoding
- End-to-End Supervised Hierarchical Graph Clustering for Speaker Diarization
- On Speaker Attribution with SURT
- Progressive unsupervised domain adaptation for ASR using ensemble models and multi-stage training
- Concurrent Speaker Detection: A multi-microphone Transformer-Based Approach
- Speaker Embeddings With Weakly Supervised Voice Activity Detection For Efficient Speaker Diarization
- Advancing Multi-talker ASR Performance with Large Language Models
- LS-EEND: Long-Form Streaming End-to-End Neural Diarization with Online Attractor Extraction
- Improving Automatic Speech Recognition with Decoder-Centric Regularisation in Encoder-Decoder Models
- Online speaker diarization of meetings guided by speech separation