Audio Inputs For Active Speaker Detection And Localization Via Microphone Array
2023 · Davide Berghi, Philip J. B. Jackson
Abstract
This study considers the problem of detecting and locating an active talker's horizontal position from multichannel audio captured by a microphone array. We refer to this as active speaker detection and localization (ASDL). Our goal was to investigate the performance of spatial acoustic features extracted from the multichannel audio as the input of a convolutional recurrent neural network (CRNN), in relation to the number of channels employed and additive noise. To this end, experiments were conducted to compare the generalized cross-correlation with phase transform (GCC-PHAT), the spatial cue-augmented log-spectrogram (SALSA) features, and a recently-proposed beamforming method, evaluating their robustness to various noise intensities. The array aperture and sampling density were tested by taking subsets from the 16-microphone array. Results and tests of statistical significance demonstrate the microphones' contribution to performance on the TragicTalkers dataset, which offers opportu
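As an illustration of one of the features named in the abstract, the following is a minimal textbook sketch of GCC-PHAT for a single microphone pair. It is not the authors' feature-extraction pipeline: the function names `dft` and `gcc_phat`, the `eps` regularizer, and the zero-padding choice are assumptions made for this example. A naive DFT is used only to keep the sketch dependency-free; practical code would use an FFT library.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (adequate for short demo frames)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat(sig, ref, eps=1e-12):
    """Estimate the delay (in samples) of `sig` relative to `ref` with GCC-PHAT.

    Hypothetical helper for illustration; a positive result means `sig` lags `ref`.
    """
    n = len(sig) + len(ref)  # zero-pad so the circular correlation does not wrap
    sig_f = dft(list(sig) + [0.0] * (n - len(sig)))
    ref_f = dft(list(ref) + [0.0] * (n - len(ref)))
    # Cross-power spectrum between the two channels.
    cross = [s * r.conjugate() for s, r in zip(sig_f, ref_f)]
    # Phase transform: discard magnitude so only the phase (delay) information remains.
    whitened = [c / (abs(c) + eps) for c in cross]
    # Back to the lag domain; the peak marks the most likely time difference of arrival.
    cc = [v.real for v in dft(whitened, inverse=True)]
    peak = max(range(n), key=lambda i: cc[i])
    # Indices in the upper half of the buffer correspond to negative lags.
    return peak if peak < n // 2 else peak - n
```

Calling `gcc_phat(sig, ref)` on two channels of a frame returns an integer lag. In a feature-based system such as the one the abstract describes, the full correlation sequence for each microphone pair, rather than only its peak, would typically serve as the network input.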