Enhancing And Adversarial: Improve ASR With Speaker Labels
2022 · Wei Zhou, Haotian Wu, Jingjing Xu, et al.
Abstract
ASR can be improved by multi-task learning (MTL) with domain enhancing or domain adversarial training, which are two opposite objectives with the aim to increase/decrease domain variance towards domain-aware/agnostic ASR, respectively. In this work, we study how to best apply these two opposite objectives with speaker labels to improve conformer-based ASR. We also propose a novel adaptive gradient reversal layer for stable and effective adversarial training without tuning effort. Detailed analysis and experimental verification are conducted to show the optimal positions in the ASR neural network (NN) to apply speaker enhancing and adversarial training. We also explore their combination for further improvement, achieving the same performance as i-vectors plus adversarial training. Our best speaker-based MTL achieves 7% relative improvement on the Switchboard Hub5'00 set. We also investigate the effect of such speaker-based MTL w.r.t. cleaner dataset and weaker ASR NN.
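To make the two opposite objectives concrete, the following is a minimal sketch of how a speaker-loss gradient might reach a shared encoder layer under each one. It uses the standard fixed-scale gradient reversal idea, not the adaptive gradient reversal layer the paper proposes, since the abstract does not describe that rule. The names `GradientReversal`, `encoder_gradient`, `scale`, and `weight` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class GradientReversal:
    """Identity in the forward pass; scales gradients by -scale in the backward pass."""

    # Hypothetical fixed reversal strength (an assumption, not the paper's adaptive rule).
    scale: float = 1.0

    def forward(self, features):
        # Features pass through unchanged to the speaker classifier.
        return list(features)

    def backward(self, grad_from_speaker_loss):
        # The encoder receives the negated, scaled gradient, so it is pushed
        # to increase the speaker loss, i.e. to discard speaker information.
        return [-self.scale * g for g in grad_from_speaker_loss]


def encoder_gradient(grad_asr, grad_speaker, mode, weight=0.5):
    """Combine ASR and speaker-loss gradients at a shared encoder layer.

    mode="enhancing":   the speaker gradient is added as-is (speaker-aware).
    mode="adversarial": the speaker gradient goes through gradient reversal.
    """
    if mode == "enhancing":
        speaker_term = [weight * g for g in grad_speaker]
    elif mode == "adversarial":
        speaker_term = GradientReversal(scale=weight).backward(grad_speaker)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [a + s for a, s in zip(grad_asr, speaker_term)]
```

With the same ASR and speaker gradients, the two modes differ only in the sign of the speaker term, which is what makes one objective increase speaker variance in the encoder and the other decrease it.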
Related papers
- Unpaired Speech Enhancement By Acoustic And Adversarial Supervision For Speech Recognition (2018) 10.21
- Adversarial Training For Multi-domain Speaker Recognition (2020) 6.77
- Adapting End-to-end Neural Speaker Verification To New Languages And Recording Conditions With Adversarial Training (2018) 9.59
- Adversarial Feature-mapping For Speech Enhancement (2018) 10.48
- SuperM2M: Supervised And Mixture-to-mixture Co-learning For Speech Enhancement And Noise-robust ASR (2024) 5.24
- Speaker Verification Using End-to-end Adversarial Language Adaptation (2018) 11.19
- Auxiliary Interference Speaker Loss For Target-speaker Speech Recognition (2019) 9.76
- To Reverse The Gradient Or Not: An Empirical Comparison Of Adversarial And Multi-task Learning In Speech Recognition (2018) 9.59