Exploring Turkish Speech Recognition via Hybrid CTC/Attention Architecture and Multi-Feature Fusion Network
2023 · Zeyu Ren, Nurmement Yolwas, Huiru Wang, et al.
Abstract
In recent years, end-to-end speech recognition based on deep learning has developed rapidly. Because Turkish speech data is scarce, Turkish speech recognition systems perform poorly. First, this paper studies a series of tuning techniques for speech recognition. The results show that the model performs best when data augmentation combining speed perturbation with noise addition is used and the beam search width is set to 16. Second, to make full use of effective feature information and improve the accuracy of feature extraction, this paper proposes a new feature extractor, LSPC. LSPC is combined with a LiGRU network to form a shared encoder structure, which also compresses the model. The results show that LSPC outperforms MSPC and VGGnet when only Fbank features are used, improving WER by 1.01% and 2.53%, respectively. Finally, based on the above two points, a new multi-feature fusion network is pr
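The combination of speed perturbation and noise addition named in the abstract can be illustrated with a minimal, standard-library-only sketch. Everything specific here is an assumption for illustration and not taken from the paper: the function names, the speed factors (0.9, 1.0, 1.1), the 20 dB signal-to-noise ratio, linear-interpolation resampling, and the use of white Gaussian noise in place of recorded noise.

```python
import math
import random


def speed_perturb(samples, factor):
    """Resample by linear interpolation so playback is `factor` times faster."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    out_len = max(1, int(round(len(samples) / factor)))
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = i * factor               # fractional read position in the input
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out


def add_noise(samples, snr_db, rng):
    """Add white Gaussian noise scaled to the requested SNR (in dB)."""
    if not samples:
        return []
    signal_power = sum(x * x for x in samples) / len(samples)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in samples]


def augment(samples, factors=(0.9, 1.0, 1.1), snr_db=20.0, seed=0):
    """Pick a random speed factor (assumed values), then add noise."""
    rng = random.Random(seed)
    factor = rng.choice(factors)
    return add_noise(speed_perturb(samples, factor), snr_db, rng)


if __name__ == "__main__":
    # One second of a 440 Hz tone at an assumed 16 kHz sample rate.
    tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    print(len(tone), len(augment(tone, seed=1)))
```

A speed factor above 1 shortens the utterance and a factor below 1 lengthens it, so each training utterance yields variants with different durations before noise is mixed in.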
Related papers
- An Improved Hybrid CTC-Attention Model for Speech Recognition (2018) · 0.00
- Exploring End-to-End Techniques for Low-Resource Speech Recognition (2018) · 5.84
- Speech Enhancement Using Multi-Stage Self-Attentive Temporal Convolutional Networks (2021) · 14.15
- Advances in Joint CTC-Attention Based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM (2017) · 16.49
- CTC-Segmentation of Large Corpora for German End-to-End Speech Recognition (2020) · 12.93
- Joint CTC-Attention Based End-to-End Speech Recognition Using Multi-Task Learning (2016) · 20.43
- Multi-Encoder Multi-Resolution Framework for End-to-End Speech Recognition (2018) · 0.00
- Gated Recurrent Fusion with Joint Training Framework for Robust End-to-End Speech Recognition (2020) · 14.55