Fully Learnable Front-end For Multi-channel Acoustic Modeling Using Semi-supervised Learning
2020 · Sanna Wager, Aparna Khare, Minhua Wu, et al.
Abstract
In this work, we investigated the teacher-student training paradigm to train a fully learnable multi-channel acoustic model for far-field automatic speech recognition (ASR). Using a large offline teacher model trained on beamformed audio, we trained a simpler multi-channel student acoustic model for use in the speech recognition system. For the student, the multi-channel feature extraction layers and the higher classification layers were trained jointly using the logits from the teacher model. In our experiments, compared to a baseline model trained on about 600 hours of transcribed data, a relative word error rate (WER) reduction of about 27.3% was achieved when using an additional 1800 hours of untranscribed data. We also investigated the benefit of pre-training the multi-channel front end to output the beamformed log-mel filter bank energies (LFBE) using an L2 loss. We found that pre-training improved the WER by 10.7% when compared to a multi-channel model directly initialized
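The two training objectives mentioned in the abstract can be sketched for a single frame as follows. This is a minimal illustration, not the authors' implementation: the abstract says only that the student is trained on the teacher's logits, so the cross-entropy against the teacher's softmax posteriors used here is an assumption, as are the function names and the per-frame list representation.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one frame's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def distillation_loss(teacher_logits, student_logits):
    """Cross-entropy between teacher and student posteriors for one frame.

    Assumption: the teacher's softmax output is used as a soft target.
    No transcription is needed, so untranscribed audio can be used.
    """
    teacher_log_probs = log_softmax(teacher_logits)
    student_log_probs = log_softmax(student_logits)
    return -sum(
        math.exp(t) * s for t, s in zip(teacher_log_probs, student_log_probs)
    )


def l2_pretraining_loss(predicted_lfbe, beamformed_lfbe):
    """Mean squared error between the front end's output and the
    beamformed LFBE target for one frame (hypothetical helper)."""
    return sum(
        (p - b) ** 2 for p, b in zip(predicted_lfbe, beamformed_lfbe)
    ) / len(predicted_lfbe)
```

With these definitions, a student whose logits match the teacher's gets a lower distillation loss than one that disagrees, and the L2 loss is zero only when the front end reproduces the beamformed features exactly.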
Related papers
- Developing Far-field Speaker System Via Teacher-student Learning (2018) 10.85
- Frequency Domain Multi-channel Acoustic Modeling For Distant Speech Recognition (2019) 9.92
- Student-teacher Learning For BLSTM Mask-based Speech Enhancement (2018) 9.59
- Teach An All-rounder With Experts In Different Domains (2019) 2.26
- A Unified Multichannel Far-field Speech Recognition System: Combining Neural Beamforming With Attention Based End-to-end Model (2024) 0.00
- Exploiting Single-channel Speech For Multi-channel End-to-end Speech Recognition (2021) 0.00
- Self-attention Channel Combinator Frontend For End-to-end Multichannel Far-field Speech Recognition (2021) 7.81
- Large-scale Domain Adaptation Via Teacher-student Learning (2017) 13.93