To Reverse The Gradient Or Not: An Empirical Comparison Of Adversarial And Multi-task Learning In Speech Recognition
2018 · Yossi Adi, Neil Zeghidour, Ronan Collobert, et al.
Abstract
Transcribed datasets typically contain speaker identity for each instance in the data. We investigate two ways to incorporate this information during training: Multi-Task Learning and Adversarial Learning. In multi-task learning, the goal is speaker prediction; we expect a performance improvement with this joint training if the two tasks of speech recognition and speaker recognition share a common set of underlying features. In contrast, adversarial learning is a means to learn representations invariant to the speaker. We then expect better performance if this learnt invariance helps generalizing to new speakers. While the two approaches seem natural in the context of speech recognition, they are incompatible because they correspond to opposite gradients back-propagated to the model. In order to better understand the effect of these approaches in terms of error rates, we compare both strategies in controlled settings. Moreover, we explore the use of additional untranscribed data in a s
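The claim that the two approaches send opposite gradients to the model can be illustrated with a toy scalar sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a single shared encoder weight `w`, squared-error stand-ins for the recognition and speaker losses, and a hypothetical weight `lam` on the speaker term. The only point is the sign of the speaker-loss gradient reaching the shared encoder.

```python
def encoder_gradient(w, x, y, s, a=1.0, b=1.0, lam=0.5, mode="multitask"):
    """Gradient of the combined objective w.r.t. a shared encoder weight w.

    Toy setup (illustrative assumptions, not the paper's model):
      h     = w * x            shared representation
      L_asr = (a * h - y)**2   stand-in for the recognition loss
      L_spk = (b * h - s)**2   stand-in for the speaker loss
    """
    if mode not in ("multitask", "adversarial"):
        raise ValueError("mode must be 'multitask' or 'adversarial'")

    h = w * x
    d_asr = 2.0 * (a * h - y) * a * x  # dL_asr/dw via the chain rule
    d_spk = 2.0 * (b * h - s) * b * x  # dL_spk/dw via the chain rule

    # Multi-task learning adds the speaker gradient to the encoder update;
    # adversarial learning reverses its sign before it reaches the encoder.
    sign = 1.0 if mode == "multitask" else -1.0
    return d_asr + sign * lam * d_spk


g_multitask = encoder_gradient(1.0, 2.0, 1.0, 3.0, mode="multitask")
g_adversarial = encoder_gradient(1.0, 2.0, 1.0, 3.0, mode="adversarial")
print(g_multitask, g_adversarial)
```

In both modes the speaker classifier itself would still be trained to predict the speaker; only the contribution passed back to the shared encoder flips sign, which is why the two strategies cannot be applied to the same encoder at once.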
Related papers
- Adversarial Training For Multi-domain Speaker Recognition (2020) 6.77
- Recognition-synthesis Based Non-parallel Voice Conversion With Adversarial Learning (2020) 0.00
- Enhancing And Adversarial: Improve ASR With Speaker Labels (2022) 5.24
- TIMIT Speaker Profiling: A Comparison Of Multi-task Learning And Single-task Learning Approaches (2024) 0.00
- Massively Multilingual Adversarial Speech Recognition (2019) 11.93
- To Train Or Not To Train Adversarially: A Study Of Bias Mitigation Strategies For Speaker Recognition (2022) 0.00
- Leveraging Speaker Embeddings With Adversarial Multi-task Learning For Age Group Classification (2023) 0.00
- Multi-task Adversarial Training Algorithm For Multi-speaker Neural Text-to-speech (2022) 0.00