Cross-domain Speech Recognition With Unsupervised Character-level Distribution Matching
2021 · Wenxin Hou, Jindong Wang, Xu Tan, et al.
Abstract
End-to-end automatic speech recognition (ASR) can achieve promising performance with large-scale training data. However, domain mismatch between training and testing data often degrades recognition accuracy. In this work, we focus on unsupervised domain adaptation for ASR and propose CMatch, a Character-level distribution matching method that performs fine-grained adaptation between each character in two domains. First, to obtain labels for the features belonging to each character, we achieve frame-level label assignment using Connectionist Temporal Classification (CTC) pseudo labels. Then, we match the character-level distributions using Maximum Mean Discrepancy (MMD). We train our algorithm using the self-training technique. Experiments on the Libri-Adapt dataset show that our proposed approach achieves 14.39% and 16.50% relative Word Error Rate (WER) reduction on cross-device and cross-environment ASR, respectively. We also comprehensively analyze the differe
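The two steps named in the abstract (frame-level label assignment from CTC outputs, then per-character distribution matching with Maximum Mean Discrepancy) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the Gaussian kernel with a fixed bandwidth `sigma`, the per-frame argmax labeling rule, the choice to skip the blank symbol (assumed to be index 0), and the plain averaging over shared characters are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(x, y, sigma=1.0):
    """RBF kernel matrix between the rows of x (n, d) and y (m, d)."""
    sq_dists = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma ** 2))


def mmd2(x, y, sigma=1.0):
    """Biased estimate of the squared MMD between two feature samples."""
    return (
        gaussian_kernel(x, x, sigma).mean()
        + gaussian_kernel(y, y, sigma).mean()
        - 2.0 * gaussian_kernel(x, y, sigma).mean()
    )


def frame_pseudo_labels(ctc_log_probs):
    """Label each frame with its most likely CTC output symbol.

    ctc_log_probs has shape (frames, vocab). Per-frame argmax is an
    assumed assignment rule, used here only for illustration.
    """
    return np.argmax(ctc_log_probs, axis=1)


def character_level_mmd(src_feats, src_labels, tgt_feats, tgt_labels,
                        blank=0, sigma=1.0):
    """Average squared MMD over characters present in both domains.

    Skipping the blank symbol is an assumption of this sketch.
    """
    shared = np.intersect1d(src_labels, tgt_labels)
    shared = shared[shared != blank]
    if shared.size == 0:
        return 0.0
    losses = [
        mmd2(src_feats[src_labels == c], tgt_feats[tgt_labels == c], sigma)
        for c in shared
    ]
    return float(np.mean(losses))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical encoder features and CTC outputs for two domains.
    src_feats = rng.normal(size=(40, 8))
    tgt_feats = rng.normal(loc=0.5, size=(40, 8))
    src_labels = frame_pseudo_labels(rng.normal(size=(40, 5)))
    tgt_labels = frame_pseudo_labels(rng.normal(size=(40, 5)))
    print(character_level_mmd(src_feats, src_labels, tgt_feats, tgt_labels))
```

In a training loop, a term like this would typically be added to the recognition loss with a weighting factor so that the encoder is pushed to produce similar per-character feature distributions in both domains. That weighting and the optimization details are not specified in the text above.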
Related papers
- MADI: Inter-domain Matching And Intra-domain Discrimination For Cross-domain Speech Recognition (2023)
- Unsupervised Domain Adaptation For Speech Recognition With Unsupervised Error Correction (2022)
- Iterative Pseudo-forced Alignment By Acoustic CTC Loss For Self-supervised ASR Domain Adaptation (2022)
- Analyzing The Robustness Of Unsupervised Speech Recognition (2021)
- Boosting Cross-domain Speech Recognition With Self-supervision (2022)
- Text-only Domain Adaptation For End-to-end Speech Recognition Through Down-sampling Acoustic Representation (2023)
- Unsupervised Domain Adaptation For Speech Recognition Via Uncertainty Driven Self-training (2020)
- Multiple-hypothesis CTC-based Semi-supervised Adaptation Of End-to-end Speech Recognition (2021)