Exploring End-to-end Techniques For Low-resource Speech Recognition
2018 · Vladimir Bataev, Maxim Korenevsky, Ivan Medennikov, et al.
Abstract
In this work, we present a simple grapheme-based system for low-resource speech recognition using Babel data for Turkish spontaneous speech (80 hours). We investigated the performance of different neural network architectures, including fully convolutional, recurrent, and ResNet with GRU models, and we compare different features and normalization techniques as well. We also propose a modification of the CTC loss that uses segmentation during training, which improves results when decoding with a small beam size. Our best model achieved a word error rate (WER) of 45.8%, which, to our knowledge, is the best reported result for end-to-end systems using in-domain data for this task.
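The proposed training change modifies the CTC objective, so it may help to see what the unmodified objective computes. The sketch below evaluates the standard CTC negative log-likelihood of a grapheme sequence with the forward (alpha) recursion. It is not the authors' segmentation-based variant, whose details the abstract does not give. The function names, the blank index 0, and the plain-list input format are illustrative assumptions.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def log_add(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def ctc_loss(log_probs: Sequence[Sequence[float]],
             labels: Sequence[int],
             blank: int = 0) -> float:
    """Standard CTC loss: -log p(labels | frames), summed over all alignments.

    Illustrative sketch only (not the paper's segmentation variant).
    log_probs[t][k] is the log-probability of symbol k (e.g. a grapheme) at frame t;
    labels is the target grapheme sequence without blanks; blank=0 is assumed.
    """
    # Extended sequence with blanks around every label: -, l1, -, l2, ..., -
    ext: List[int] = [blank]
    for lab in labels:
        ext.extend([lab, blank])
    S, T = len(ext), len(log_probs)

    # alpha[s]: log-probability of all partial paths ending at ext[s] after frame t
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, T):
        new = [NEG_INF] * S
        for s in range(S):
            acc = alpha[s]                        # stay on the same symbol
            if s >= 1:
                acc = log_add(acc, alpha[s - 1])  # advance by one position
            # Skipping a blank is allowed only between two different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                acc = log_add(acc, alpha[s - 2])
            new[s] = acc + log_probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the final label or on the trailing blank
    total = alpha[S - 1]
    if S > 1:
        total = log_add(total, alpha[S - 2])
    return -total
```

For example, with two frames where the blank has probability 0.4 and grapheme `1` has probability 0.6, the target `[1]` has three alignments (`1 1`, `1 -`, `- 1`), so `math.exp(-ctc_loss(...))` is 0.36 + 0.24 + 0.24 = 0.84. In practice, frameworks provide batched implementations of this objective, such as PyTorch's `torch.nn.CTCLoss`.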