Attention Based End To End Speech Recognition For Voice Search In Hindi And English
2021 · Raviraj Joshi, Venkateshan Kannan
Abstract
We describe our work on automatic speech recognition (ASR) for the voice search functionality of the Flipkart e-commerce platform. Starting with the deep learning architecture of Listen-Attend-Spell (LAS), we build upon and expand the model design and attention mechanisms to incorporate innovative approaches, including multi-objective training, multi-pass training, and external rescoring using language models and phoneme-based losses. Using these modifications, we report a relative WER improvement of 15.7% over state-of-the-art LAS models. Overall, we report an improvement of 36.9% over the phoneme-CTC system. The paper also provides an overview of the different components that can be tuned in a LAS-based system.
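The abstract reports results as relative WER improvements and mentions external rescoring with language models. The Python below is a minimal sketch of how those pieces fit together, assuming the common log-linear combination of ASR and LM scores over an n-best list; the paper's actual scoring function, LM and weights are not given here. `toy_lm`, `UNIGRAM`, `NBEST`, `lm_weight` and `length_bonus` are hypothetical illustrations, and all numbers are made up, not results from the paper.

```python
from typing import Callable, List, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = word-level edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion of a reference word
                           dp[i][j - 1] + 1,         # insertion of a hypothesis word
                           dp[i - 1][j - 1] + cost)  # match or substitution
    # Guard against an empty reference to avoid division by zero
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction of new_wer with respect to baseline_wer, in percent."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


def rescore_nbest(nbest: List[Tuple[str, float]],
                  lm_logprob: Callable[[str], float],
                  lm_weight: float = 0.3,
                  length_bonus: float = 0.0) -> str:
    """Return the hypothesis maximizing
    log P_asr + lm_weight * log P_lm + length_bonus * number_of_words.
    This interpolation is an assumed, generic rescoring scheme."""
    def total(item: Tuple[str, float]) -> float:
        text, asr_logprob = item
        return (asr_logprob + lm_weight * lm_logprob(text)
                + length_bonus * len(text.split()))
    return max(nbest, key=total)[0]


# Hypothetical unigram LM log-probabilities; unknown words get a floor of -10.
UNIGRAM = {"red": -2.0, "shoes": -3.0, "shoe": -6.0, "for": -1.5, "men": -2.5}


def toy_lm(text: str) -> float:
    return sum(UNIGRAM.get(w, -10.0) for w in text.split())


# Hypothetical n-best list: (hypothesis text, ASR log-probability)
NBEST = [("red shoes for men", -4.1), ("red shoe for men", -4.0)]

if __name__ == "__main__":
    print(rescore_nbest(NBEST, toy_lm, lm_weight=0.0))  # -> red shoe for men
    print(rescore_nbest(NBEST, toy_lm, lm_weight=0.3))  # -> red shoes for men
    print(word_error_rate("red shoes for men", "red shoe for men"))  # -> 0.25
```

Without the LM, the ASR score alone slightly prefers "red shoe for men"; adding the weighted LM score (-4.1 + 0.3 × -9.0 = -6.8 versus -4.0 + 0.3 × -12.0 = -7.6) flips the choice to the more plausible query. A "relative improvement" such as 15.7% is then computed by `relative_improvement` from the WERs of the baseline and the improved system.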
Related papers
- On Comparison Of Encoders For Attention Based End To End Speech Recognition In Standalone And Rescoring Mode (2022)
- State-of-the-art Speech Recognition With Sequence-to-sequence Models (2017)
- An Online Attention-based Model For Speech Recognition (2018)
- Attention-based Sequence-to-sequence Model For Speech Recognition: Development Of State-of-the-art System On Librispeech And Its Application To Non-native English (2018)
- Listen Attentively, And Spell Once: Whole Sentence Generation Via A Non-autoregressive Architecture For Low-latency Speech Recognition (2020)
- End-to-end Speech To Intent Prediction To Improve E-commerce Customer Support Voicebot In Hindi And English (2022)
- Attention-based End-to-end Speech Recognition On Voice Search (2017)
- Audio-attention Discriminative Language Model For ASR Rescoring (2019)