Unified Autoregressive Modeling For Joint End-to-end Multi-talker Overlapped Speech Recognition And Speaker Attribute Estimation
2021 · Ryo Masumura, Daiki Okamura, Naoki Makishima, et al.
Abstract
In this paper, we present a novel modeling method for single-channel multi-talker overlapped automatic speech recognition (ASR) systems. Fully neural-network-based end-to-end models have dramatically improved the performance of multi-talker overlapped ASR tasks. One promising approach for end-to-end modeling is autoregressive modeling with serialized output training, in which the transcriptions of multiple speakers are recursively generated one after another. This enables us to naturally capture relationships between speakers. However, the conventional modeling method cannot explicitly take into account the speaker attributes of individual utterances, such as gender and age information. In fact, performance deteriorates when the speakers are the same gender or close in age. To address this problem, we propose unified autoregressive modeling for joint end-to-end multi-talker overlapped ASR and speaker attribute estimation. Our key idea is to handle gender and age estimation tasks within
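As a rough illustration of serialized output training, the sketch below flattens the transcriptions of overlapped speakers into a single target sequence that an autoregressive decoder could generate one speaker after another. It is a minimal sketch under assumptions, not the authors' implementation: the `<sc>` and `<eos>` tokens, the ordering by start time, the `Utterance` type, the `serialize` helper, and in particular the placement of gender and age tokens in front of each transcription are all hypothetical, since the abstract above is cut off before it describes how the attributes are handled.

```python
from typing import List, NamedTuple

SC = "<sc>"    # hypothetical speaker-change token
EOS = "<eos>"  # hypothetical end-of-sequence token


class Utterance(NamedTuple):
    start: float       # start time in seconds, used only for ordering (assumption)
    tokens: List[str]  # transcription tokens of one speaker
    gender: str        # e.g. "female"; the label set is an assumption
    age: str           # e.g. "adult"; the label set is an assumption


def serialize(utts: List[Utterance], with_attributes: bool = False) -> List[str]:
    """Flatten overlapped utterances into one target sequence, earliest speaker first."""
    out: List[str] = []
    for i, u in enumerate(sorted(utts, key=lambda u: u.start)):
        if i > 0:
            out.append(SC)  # marks the switch to the next speaker
        if with_attributes:
            # Assumed layout: attribute tokens precede the speaker's transcription.
            out += [f"<{u.gender}>", f"<{u.age}>"]
        out += u.tokens
    out.append(EOS)
    return out


mix = [
    Utterance(0.8, ["good", "morning"], "male", "adult"),
    Utterance(0.0, ["hello", "there"], "female", "adult"),
]
print(serialize(mix))
print(serialize(mix, with_attributes=True))
```

With `with_attributes=False` the target contains only the transcriptions separated by the speaker-change token, which corresponds to the conventional setup the abstract describes. With `with_attributes=True` the same decoder would also have to emit attribute tokens, which is one plausible way to fold attribute estimation into a single autoregressive sequence.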
Related papers
- Joint Speaker Counting, Speech Recognition, And Speaker Identification For Overlapped Speech Of Any Number Of Speakers (2020)
- Investigation Of End-to-end Speaker-attributed ASR For Continuous Multi-talker Recordings (2020)
- End-to-end Multi-speaker Speech Recognition Using Speaker Embeddings And Transfer Learning (2019)
- Progressive Joint Modeling In Unsupervised Single-channel Overlapped Speech Recognition (2017)
- Unified Modeling Of Multi-talker Overlapped Speech Recognition And Diarization With A Sidecar Separator (2023)
- E2E-based Multi-task Learning Approach To Joint Speech And Accent Recognition (2021)
- Streaming Multi-speaker ASR With RNN-T (2020)
- META-CAT: Speaker-informed Speech Embeddings Via Meta Information Concatenation For Multi-talker ASR (2024)