Twin Regularization For Online Speech Recognition
2018 · Mirco Ravanelli, Dmitriy Serdyuk, Yoshua Bengio
Abstract
Online speech recognition is crucial for developing natural human-machine interfaces. This modality, however, is significantly more challenging than offline ASR, since real-time/low-latency constraints inevitably hinder the use of future information, which is known to be very helpful for making robust predictions. A popular way to mitigate this issue is to feed neural acoustic models with context windows that include some future frames. This introduces a latency that depends on the number of look-ahead features employed. This paper explores a different approach, based on estimating the future rather than waiting for it. Our technique encourages the hidden representations of a unidirectional recurrent network to embed some useful information about the future. Inspired by a recently proposed technique called Twin Networks, we add a regularization term that forces forward hidden states to be as close as possible to cotemporal backward ones, computed by a "twin" neural network…
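A minimal NumPy sketch of the regularizer described in the abstract, under assumptions the abstract does not state: both networks are plain tanh (Elman) RNNs, the distance is a squared L2 norm averaged over frames, and an optional matrix `proj` maps forward states into the backward state space. The helper names `rnn_states` and `twin_penalty` are hypothetical. The twin reads the utterance reversed, and its states are flipped back so that `h_b[t]` and `h_f[t]` refer to the same frame (i.e. they are cotemporal).

```python
import numpy as np


def rnn_states(x, W_in, W_rec, b):
    """Run a tanh (Elman) RNN over x of shape (T, d_in); return states of shape (T, d_h)."""
    d_h = W_rec.shape[0]
    h = np.zeros(d_h)
    states = np.empty((x.shape[0], d_h))
    for t in range(x.shape[0]):
        # h_t = tanh(x_t W_in + h_{t-1} W_rec + b)
        h = np.tanh(x[t] @ W_in + h @ W_rec + b)
        states[t] = h
    return states


def twin_penalty(x, fwd_params, bwd_params, proj=None):
    """Mean squared L2 distance between forward and cotemporal backward states (assumed form)."""
    h_f = rnn_states(x, *fwd_params)               # h_f[t] has seen x[0..t]
    h_b = rnn_states(x[::-1], *bwd_params)[::-1]   # h_b[t] has seen x[t..T-1]
    if proj is not None:
        h_f = h_f @ proj                           # optional learned map (assumption)
    # In an autograd framework, h_b would typically be detached so the penalty
    # only shapes the forward network (assumption, not stated in the abstract).
    return float(np.mean(np.sum((h_f - h_b) ** 2, axis=1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d_in, d_h = 20, 8, 16

    def init():
        return (rng.normal(scale=0.3, size=(d_in, d_h)),
                rng.normal(scale=0.3, size=(d_h, d_h)),
                np.zeros(d_h))

    x = rng.normal(size=(T, d_in))  # stand-in for T frames of acoustic features
    print(twin_penalty(x, init(), init()))
```

During training, the penalty would be added to the acoustic-model loss with a weight, e.g. `loss = task_loss + lam * twin_penalty(...)`, where `lam` is a hypothetical hyperparameter. Since only the forward network is needed at recognition time, the twin itself adds no look-ahead latency.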
Related papers
- Forward-backward Decoding For Regularizing End-to-end TTS (2019) · 6.77
- Gated Recurrent Unit Based Acoustic Modeling With Future Context (2018) · 7.16
- Unidirectional Memory-self-attention Transducer For Online Speech Recognition (2021) · 3.58
- Improved Speech Representations With Multi-target Autoregressive Predictive Coding (2020) · 10.97
- Batch-normalized Joint Training For DNN-based Distant Speech Recognition (2017) · 8.82
- TSNAT: Two-step Non-autoregressvie Transformer Models For Speech Recognition (2021) · 10.61
- Improved Neural Language Model Fusion For Streaming Recurrent Neural Network Transducer (2020) · 8.82
- Future Word Contexts In Neural Network Language Models (2017) · 8.35