Target Speaker Voice Activity Detection With Transformers And Its Integration With End-to-end Neural Diarization
2022 · Dongmei Wang, Xiong Xiao, Naoyuki Kanda, et al.
Abstract
This paper describes a speaker diarization model based on target speaker voice activity detection (TS-VAD) using transformers. To overcome the original TS-VAD model's drawback of being unable to handle an arbitrary number of speakers, we investigate model architectures that use input tensors with variable-length time and speaker dimensions. Transformer layers are applied to the speaker axis to make the model output insensitive to the order of the speaker profiles provided to the TS-VAD model. Time-wise sequential layers are interspersed between these speaker-wise transformer layers to allow the temporal and cross-speaker correlations of the input speech signal to be captured. We also extend a diarization model based on end-to-end neural diarization with encoder-decoder-based attractors (EEND-EDA) by replacing its dot-product-based speaker detection layer with the transformer-based TS-VAD. Experimental results on VoxConverse show that using the transformers for the cross-speaker modeling …
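The abstract describes the architecture only at a high level, so the following is a minimal pure-Python sketch of the core idea, not the authors' implementation. It builds a `[time][speaker][feature]` tensor by pairing every frame with every speaker profile (concatenation is an assumption here), then alternates single-head self-attention across the speaker axis with a time-wise recurrence along the time axis. The weights are random and untrained, the tanh recurrence is a hypothetical stand-in for whatever sequential layer the paper uses, and all function names are illustrative. For contrast, `dot_product_detection` shows the EEND-EDA-style scoring (sigmoid of frame-embedding/attractor dot products) that the paper replaces with TS-VAD.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]         # weight matrix rows, or [speaker][feature]
Tensor3 = List[List[Vector]]  # [time][speaker][feature]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def speaker_self_attention(frame: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Single-head self-attention across the speakers of one frame, plus a residual.

    No positional encoding is applied on the speaker axis, so permuting the
    speakers permutes the output rows identically (permutation equivariance).
    """
    q = [matvec(wq, x) for x in frame]
    k = [matvec(wk, x) for x in frame]
    v = [matvec(wv, x) for x in frame]
    scale = math.sqrt(len(q[0]))
    out = []
    for qi, xi in zip(q, frame):
        scores = [sum(a * b for a, b in zip(qi, kj)) / scale for kj in k]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]  # numerically stable softmax
        total = sum(weights)
        mixed = [sum(w * vj[c] for w, vj in zip(weights, v)) / total for c in range(len(xi))]
        out.append([a + b for a, b in zip(xi, mixed)])
    return out

def time_recurrence(x: Tensor3, alpha: float = 0.5) -> Tensor3:
    """Hypothetical stand-in for a time-wise sequential layer such as an LSTM.

    h_t = tanh(alpha * h_{t-1} + x_t), elementwise, run separately per speaker
    with shared parameters, so speaker order again does not matter.
    """
    h = [[0.0] * len(spk) for spk in x[0]]
    out = []
    for frame in x:
        h = [[math.tanh(alpha * hp + xv) for hp, xv in zip(hs, xs)]
             for hs, xs in zip(h, frame)]
        out.append(h)
    return out

def ts_vad_sketch(feats: Matrix, profiles: Matrix, layers, w_out: Vector) -> Matrix:
    """Speech-activity probabilities of shape [T][S] for T frames and S profiles."""
    # Pair every frame with every profile by concatenation -> [T][S][D_f + D_p].
    x = [[f + p for p in profiles] for f in feats]
    for wq, wk, wv in layers:
        x = [speaker_self_attention(frame, wq, wk, wv) for frame in x]  # speaker axis
        x = time_recurrence(x)                                           # time axis
    return [[sigmoid(sum(w * v for w, v in zip(w_out, xs))) for xs in frame] for frame in x]

def dot_product_detection(embs: Matrix, attractors: Matrix) -> Matrix:
    """EEND-EDA-style detection: sigmoid of frame-embedding / attractor dot products."""
    return [[sigmoid(sum(e * a for e, a in zip(emb, att))) for att in attractors] for emb in embs]

def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]

# Usage with random, untrained weights (illustration only).
rng = random.Random(0)
D_F, D_P = 3, 2
D = D_F + D_P
layers = [tuple(rand_matrix(D, D, rng) for _ in range(3)) for _ in range(2)]
w_out = [rng.gauss(0.0, 0.3) for _ in range(D)]
feats = rand_matrix(6, D_F, rng)     # T = 6 frames
profiles = rand_matrix(3, D_P, rng)  # S = 3 speakers

probs = ts_vad_sketch(feats, profiles, layers, w_out)
perm = [2, 0, 1]
probs_perm = ts_vad_sketch(feats, [profiles[i] for i in perm], layers, w_out)
print(len(probs), len(probs[0]))  # 6 3
```

Both layer types share their weights across speakers and never assume a fixed speaker count, so the same `layers` accept any number of frames T and speakers S, and `probs_perm[t][j]` matches `probs[t][perm[j]]` up to floating-point rounding. If the EEND-EDA extension works as the abstract suggests, the attractors would take the place of `profiles` and `ts_vad_sketch` would replace `dot_product_detection`; the abstract does not specify those details.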
Related papers
- Transformer Attractors For Robust And Efficient End-to-end Neural Diarization (2023)
- Improving Transformer-based End-to-end Speaker Diarization By Assigning Auxiliary Losses To Attention Heads (2023)
- Target-speaker Voice Activity Detection With Improved I-vector Estimation For Unknown Number Of Speaker (2021)
- Target-speaker Voice Activity Detection Via Sequence-to-sequence Prediction (2022)
- Auxiliary Loss Of Transformer With Residual Connection For End-to-end Speaker Diarization (2021)
- Transcribe-to-diarize: Neural Speaker Diarization For Unlimited Number Of Speakers Using End-to-end Speaker-attributed ASR (2021)
- Profile-error-tolerant Target-speaker Voice Activity Detection (2023)
- Cross-channel Attention-based Target Speaker Voice Activity Detection: Experimental Results For M2met Challenge (2022)