Channel-combination Algorithms For Robust Distant Voice Activity And Overlapped Speech Detection
2024 · Théo Mariotte, Anthony Larcher, Silvio Montrésor, et al.
Abstract
Voice Activity Detection (VAD) and Overlapped Speech Detection (OSD) are key pre-processing tasks for speaker diarization. In meetings, it is often easier to capture speech with a distant device, but doing so severely degrades performance. We study a unified supervised learning framework for joint VAD and OSD (VAD+OSD) from distant multi-microphone recordings. This paper investigates various multi-channel VAD+OSD front-ends that weight and combine incoming channels. We propose three algorithms based on the Self-Attention Channel Combinator (SACC), previously proposed in the literature. Experiments on the AMI meeting corpus show that channel combination approaches bring significant VAD+OSD improvements in the distant speech scenario. Specifically, we explore the use of learned complex combination weights and demonstrate the benefits of such an approach in terms of explainability. Channel combination-based VAD+OSD systems are evaluated on the final ba
Related papers
- Joint Speech And Overlap Detection: A Benchmark Over Multiple Audio Setup And Speech Domains (2023) · 0.00
- Speech Enhancement Aided End-to-end Multi-task Learning For Voice Activity Detection (2020) · 11.49
- Cross-channel Attention-based Target Speaker Voice Activity Detection: Experimental Results For M2met Challenge (2022) · 10.07
- Adversarial Multi-task Deep Learning For Noise-robust Voice Activity Detection With Low Algorithmic Delay (2022) · 2.26
- Self-attention Channel Combinator Frontend For End-to-end Multichannel Far-field Speech Recognition (2021) · 7.81
- Audio-visual Approach For Multimodal Concurrent Speaker Detection (2024) · 0.00
- Multi-input Multi-output Target-speaker Voice Activity Detection For Unified, Flexible, And Robust Audio-visual Speaker Diarization (2024) · 0.00
- Multi-microphone Automatic Speech Segmentation In Meetings Based On Circular Harmonics Features (2023) · 0.00