Sequence-to-sequence Neural Diarization With Automatic Speaker Detection And Representation
2024 · Ming Cheng, Yuke Lin, Ming Li
Abstract
This paper proposes a novel Sequence-to-Sequence Neural Diarization (S2SND) framework to perform online and offline speaker diarization. It is developed from the sequence-to-sequence architecture of our previous target-speaker voice activity detection system and then evolves into a new diarization paradigm by addressing two critical problems. 1) Speaker Detection: The proposed approach can utilize partially given speaker embeddings to discover the unknown speaker and predict the target voice activities in the audio signal. It does not require a prior diarization system for speaker enrollment in advance. 2) Speaker Representation: The proposed approach can adopt the predicted voice activities as reference information to extract speaker embeddings from the audio signal simultaneously. The representation space of speaker embedding is jointly learned within the whole diarization network without using an extra speaker embedding model. During inference, the S2SND framework can process long a
Related papers
- Spatially-augmented Sequence-to-sequence Neural Diarization For Meetings (2025)
- Neural Speaker Diarization Using Memory-aware Multi-speaker Embedding With Sequence-to-sequence Architecture (2023)
- A Reinforcement Learning Framework For Online Speaker Diarization (2023)
- Speaker Diarization Using Deep Recurrent Convolutional Neural Networks For Speaker Embeddings (2017)
- Transcribe-to-diarize: Neural Speaker Diarization For Unlimited Number Of Speakers Using End-to-end Speaker-attributed ASR (2021)
- Online End-to-end Neural Diarization With Speaker-tracing Buffer (2020)
- SCDiar: A Streaming Diarization System Based On Speaker Change Detection And Speech Recognition (2025)
- Speaker Diarization With LSTM (2017)