Spot The Conversation: Speaker Diarisation In The Wild
2020 · Joon Son Chung, Jaesung Huh, Arsha Nagrani, et al.
Abstract
The goal of this paper is speaker diarisation of videos collected 'in the wild'. We make three key contributions. First, we propose an automatic audio-visual diarisation method for YouTube videos. Our method consists of active speaker detection using audio-visual methods and speaker verification using self-enrolled speaker models. Second, we integrate our method into a semi-automatic dataset creation pipeline which significantly reduces the number of hours required to annotate videos with diarisation labels. Finally, we use this pipeline to create a large-scale diarisation dataset called VoxConverse, collected from 'in the wild' videos, which we will release publicly to the research community. The dataset contains overlapping speech, a large and diverse speaker pool, and challenging background conditions.
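The two stages named in the abstract, active speaker detection followed by speaker verification against self-enrolled models, can be illustrated with a minimal sketch. This is not the authors' implementation: the segment fields (`face_id`, `asd_score`, `embedding`), both thresholds, and the use of averaged embeddings with cosine similarity are all assumptions made for illustration. Segments where a hypothetical detector is confident that a visible face is speaking enrol that speaker; the remaining segments are assigned to the closest enrolled speaker, or left unlabelled.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_confident(seg, asd_threshold):
    """True when the (hypothetical) active speaker detector trusts this segment."""
    return seg["face_id"] is not None and seg["asd_score"] >= asd_threshold


def enrol(segments, asd_threshold=0.8):
    """Build one speaker model per face by averaging confident embeddings."""
    sums, counts = {}, defaultdict(int)
    for seg in segments:
        if not is_confident(seg, asd_threshold):
            continue
        spk, emb = seg["face_id"], seg["embedding"]
        prev = sums.get(spk, [0.0] * len(emb))
        sums[spk] = [s + e for s, e in zip(prev, emb)]
        counts[spk] += 1
    return {spk: [s / counts[spk] for s in vec] for spk, vec in sums.items()}


def diarise(segments, asd_threshold=0.8, sv_threshold=0.6):
    """Label each segment with a speaker id, or None if no model matches."""
    models = enrol(segments, asd_threshold)
    labels = []
    for seg in segments:
        if is_confident(seg, asd_threshold):
            labels.append(seg["face_id"])  # visual evidence is enough
            continue
        best, best_score = None, sv_threshold
        for spk, model in models.items():
            score = cosine(seg["embedding"], model)
            if score >= best_score:
                best, best_score = spk, score
        labels.append(best)  # audio-only fallback via verification
    return labels


# Toy data (assumed): two on-screen speakers, then two off-screen segments.
segments = [
    {"face_id": "A", "asd_score": 0.95, "embedding": [1.0, 0.0]},
    {"face_id": "B", "asd_score": 0.90, "embedding": [0.0, 1.0]},
    {"face_id": None, "asd_score": 0.0, "embedding": [0.9, 0.1]},
    {"face_id": None, "asd_score": 0.0, "embedding": [-1.0, -1.0]},
]
labels = diarise(segments)
print(labels)
```

In this toy input the third segment is off-screen but close to speaker A's enrolled model, so it inherits that label, while the fourth matches no model and stays unlabelled. A real system would also need to handle overlapping speech, which this sketch ignores.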
Related papers
- Speaker Diarization As A Fully Online Learning Problem In MiniVox (2020)
- Microsoft Speaker Diarization System For The VoxCeleb Speaker Recognition Challenge 2020 (2020)
- The HUAWEI Speaker Diarisation System For The VoxCeleb Speaker Diarisation Challenge (2020)
- Late Audio-visual Fusion For In-the-wild Speaker Diarization (2022)
- Data Efficient Child-adult Speaker Diarization With Simulated Conversations (2024)
- Look Who's Not Talking (2020)
- Exploring Detection-based Method For Speaker Diarization @ Ego4D Audio-only Diarization Challenge 2022 (2022)
- A Reinforcement Learning Framework For Online Speaker Diarization (2023)