End-to-end Multi-speaker Speech Recognition Using Speaker Embeddings And Transfer Learning
2019 · Pavel Denisov, Ngoc Thang Vu
Abstract
This paper presents our latest investigation into end-to-end automatic speech recognition (ASR) for overlapped speech. We propose training an end-to-end system conditioned on speaker embeddings and further improving it through transfer learning from clean speech. The proposed framework does not require any parallel non-overlapped speech material and is independent of the number of speakers. Our experimental results on overlapped speech datasets show that combining conditioning on speaker embeddings with transfer learning significantly improves ASR performance.
Related papers
- Joint Speaker Counting, Speech Recognition, And Speaker Identification For Overlapped Speech Of Any Number Of Speakers (2020)
- Streaming Multi-speaker ASR With RNN-T (2020)
- E2e-based Multi-task Learning Approach To Joint Speech And Accent Recognition (2021)
- A Purely End-to-end System For Multi-speaker Speech Recognition (2018)
- End-to-end Monaural Multi-speaker ASR System Without Pretraining (2018)
- Overlapped Speech Recognition From A Jointly Learned Multi-channel Neural Speech Extraction And Representation (2019)
- Unified Autoregressive Modeling For Joint End-to-end Multi-talker Overlapped Speech Recognition And Speaker Attribute Estimation (2021)
- Investigation Of End-to-end Speaker-attributed ASR For Continuous Multi-talker Recordings (2020)