Look Who's Not Talking
2020 · Youngki Kwon, Hee Soo Heo, Jaesung Huh, et al.
Abstract
The objective of this work is speaker diarisation of speech recordings 'in the wild'. Determining which segments contain speech is a crucial part of diarisation systems, and mistakes at this stage account for a large proportion of errors. In this paper, we present a simple but effective solution for speech activity detection based on speaker embeddings. In particular, we discover that the norm of the speaker embedding is an extremely effective indicator of speech activity. The method does not require an independent model for speech activity detection, and therefore allows speaker diarisation to be performed using a unified representation for both speaker modelling and speech activity detection. We perform a number of experiments on in-house and public datasets, in which our method outperforms popular baselines.
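The core idea in the abstract, using the norm of each speaker embedding as a speech activity score, can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `speech_activity_from_norms`, the fixed `threshold`, and the toy embeddings are all hypothetical, and the sketch assumes that windows containing speech yield larger norms than non-speech windows, which the abstract does not state explicitly.

```python
import numpy as np


def speech_activity_from_norms(embeddings, threshold):
    """Score each window by the L2 norm of its embedding and threshold it.

    embeddings: array-like of shape (n_windows, dim), one embedding per window.
    threshold:  hypothetical cut-off; in practice it would be tuned on held-out data.
    Returns (norms, mask), where mask[i] is True when window i is labelled as speech.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(embeddings, axis=1)  # one norm per window
    # Assumption: a larger norm indicates speech.
    return norms, norms > threshold


# Toy 2-dimensional embeddings, invented for illustration only.
toy_embeddings = [[3.0, 4.0], [0.3, 0.4], [6.0, 8.0]]
norms, mask = speech_activity_from_norms(toy_embeddings, threshold=2.0)
print(norms)
print(mask)
```

Because the same embeddings would also feed speaker clustering, a sketch like this needs no separate speech activity model, which is the unification the abstract describes.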
Related papers
- Advancing The Dimensionality Reduction Of Speaker Embeddings For Speaker Diarisation: Disentangling Noise And Informing Speech Activity (2021) 2.26
- Speaker Embeddings With Weakly Supervised Voice Activity Detection For Efficient Speaker Diarization (2024) 2.26
- Spot The Conversation: Speaker Diarisation In The Wild (2020) 15.31
- Joint Training Of Speaker Embedding Extractor, Speech And Overlap Detection For Diarization (2024) 2.26
- USED: Universal Speaker Extraction And Diarization (2023) 7.50
- Leveraging Speaker Embeddings In End-to-end Neural Diarization For Two-speaker Scenarios (2024) 0.00
- Target-speaker Voice Activity Detection: A Novel Approach For Multi-speaker Diarization In A Dinner Party Scenario (2020) 16.19
- Speaker Diarization Using Deep Recurrent Convolutional Neural Networks For Speaker Embeddings (2017) 9.41