Linguistic Search Optimization For Deep Learning Based LVCSR
2018 · Zhehuai Chen
Abstract
Recent advances in deep learning based large vocabulary continuous speech recognition (LVCSR) have created growing demand for large scale speech transcription. The inference process of a speech recognizer finds the sequence of labels whose corresponding acoustic and language model scores best match the input features [1]. The main computation consists of two stages: acoustic model (AM) inference and linguistic search over a weighted finite-state transducer (WFST). The large computational overheads of both stages hamper the wide application of LVCSR. Benefiting from stronger classifiers, deep learning, and more powerful computing devices, we propose general ideas and some initial trials to solve these fundamental problems.
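As a rough illustration of the linguistic search stage mentioned in the abstract, the sketch below runs a frame-synchronous Viterbi search with beam pruning over a tiny WFST-like graph. It is not the method proposed in the paper. The toy graph, the labels, the costs, and the function name `decode` are all hypothetical, and the sketch assumes there are no epsilon arcs and that the acoustic model has already produced per-frame costs (negative log probabilities).

```python
from collections import defaultdict

# Hypothetical toy graph: arcs are (src, dst, in_label, out_word, graph_cost).
# Costs are negative log probabilities (tropical semiring): lower is better.
ARCS = [
    (0, 1, "a", "A", 0.5),
    (0, 2, "b", "B", 1.0),
    (1, 1, "a", None, 0.1),  # self-loop: stay in the same unit, emit nothing
    (2, 2, "b", None, 0.1),
    (1, 3, "c", "C", 0.3),
    (2, 3, "c", "C", 0.3),
]
FINAL_STATES = {3}


def decode(acoustic_costs, arcs=ARCS, start=0, finals=FINAL_STATES, beam=10.0):
    """Return (cost, words) of the best path ending in a final state, or None.

    acoustic_costs: one dict per frame, mapping in_label -> acoustic cost.
    """
    by_src = defaultdict(list)
    for arc in arcs:
        by_src[arc[0]].append(arc)

    # One token per active state: (accumulated cost, output words so far).
    tokens = {start: (0.0, [])}
    for frame in acoustic_costs:
        new_tokens = {}
        for state, (cost, words) in tokens.items():
            for _, dst, in_label, out_word, graph_cost in by_src[state]:
                if in_label not in frame:
                    continue
                # Combine the path cost, the graph cost, and the acoustic cost.
                total = cost + graph_cost + frame[in_label]
                # Viterbi recombination: keep only the best token per state.
                if dst not in new_tokens or total < new_tokens[dst][0]:
                    new_words = words + [out_word] if out_word else words
                    new_tokens[dst] = (total, new_words)
        if not new_tokens:
            return None
        # Beam pruning: drop tokens that are too far behind the current best.
        best = min(cost for cost, _ in new_tokens.values())
        tokens = {s: t for s, t in new_tokens.items() if t[0] <= best + beam}

    reached = [t for s, t in tokens.items() if s in finals]
    return min(reached, key=lambda t: t[0]) if reached else None


# Three frames of made-up acoustic costs; the best path emits ["A", "C"].
frames = [
    {"a": 0.2, "b": 1.5, "c": 2.0},
    {"a": 0.3, "b": 1.0, "c": 2.0},
    {"a": 2.0, "b": 2.0, "c": 0.1},
]
result = decode(frames)
```

The number of tokens expanded per frame grows with the number of active states, which is where the search overhead noted in the abstract comes from. A smaller `beam` reduces that work at the risk of pruning the best path.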
Related papers
- From Hype To Insight: Rethinking Large Language Model Integration In Visual Speech Recognition (2025) 0.00
- Applying GPGPU To Recurrent Neural Network Language Model Based Fast Network Search In The Real-time LVCSR (2020) 2.26
- Comparison Of Lattice-free And Lattice-based Sequence Discriminative Training Criteria For LVCSR (2019) 5.84
- Advances In Very Deep Convolutional Neural Networks For LVCSR (2016) 0.00
- Investigating Decoder-only Large Language Models For Speech-to-text Translation (2024) 0.00
- State-of-the-art Speech Recognition With Sequence-to-sequence Models (2017) 21.01
- Zero-resource Speech Translation And Recognition With LLMs (2024) 3.58
- On Modular Training Of Neural Acoustics-to-word Model For LVCSR (2018) 10.07