Lattice-based Lightly-supervised Acoustic Model Training
2019 · Joachim Fainberg, Ondřej Klejch, Steve Renals, et al.
Abstract
In the broadcast domain there is an abundance of related text data and partial transcriptions, such as closed captions and subtitles. This text data can be used for lightly supervised training, in which text matching the audio is selected using an existing speech recognition model. Current approaches to light supervision typically filter the data based on matching error rates between the transcriptions and biased decoding hypotheses. In contrast, semi-supervised training does not require matching text data, instead generating a hypothesis using a background language model. State-of-the-art semi-supervised training uses lattice-based supervision with the lattice-free MMI (LF-MMI) objective function. We propose a technique to combine inaccurate transcriptions with the lattices generated for semi-supervised training, thus preserving uncertainty in the lattice where appropriate. We demonstrate that this combined approach reduces the expected error rates over the lattices, and reduces the w
Related papers
- Unsupervised Model-based Speaker Adaptation Of End-to-end Lattice-free MMI Model For Speech Recognition (2022) · 2.26
- A Comparison Of Lattice-free Discriminative Training Criteria For Purely Sequence-trained Neural Network Acoustic Models (2018) · 4.52
- MATS: An Audio Language Model Under Text-only Supervision (2025) · 0.00
- Consistent Training And Decoding For End-to-end Speech Recognition Using Lattice-free MMI (2021) · 8.35
- Improving Sequence-to-sequence Acoustic Modeling By Adding Text-supervision (2018) · 9.92
- On Lattice-free Boosted MMI Training Of HMM And CTC-based Full-context ASR Models (2021) · 7.81
- Adapting Pretrained Transformer To Lattices For Spoken Language Understanding (2020) · 12.00
- Improving Audio Captioning Models With Fine-grained Audio Features, Text Embedding Supervision, And LLM Mix-up Augmentation (2023) · 8.82