Redapt: An Adaptor For Wav2vec 2 Encoding - Faster And Smaller Speech Translation Without Quality Compromise
2022 · Jinming Zhao, Hao Yang, Gholamreza Haffari, et al.
Abstract
Pre-trained speech Transformers have facilitated state-of-the-art (SotA) results in speech translation (ST); yet, using such encoders is computationally expensive. To improve this, we present a novel Reducer Adaptor block, RedApt, that can be seamlessly integrated within any Transformer-based speech encoding architecture. Integrating the pre-trained wav2vec 2 speech encoder with RedApt brings a 41% speedup and a 33% memory reduction with 24% fewer FLOPs at inference. To our positive surprise, our ST model with RedApt outperforms the SotA architecture by an average of 0.68 BLEU on 8 language pairs from MuST-C.
Related papers
- Speechformer: Reducing Information Loss In Direct Speech Translation (2021) 7.16
- Adatrans: Adapting With Boundary-based Shrinking For End-to-end Speech Translation (2022) 0.00
- Efficient Adapter Transfer Of Self-supervised Speech Models For Automatic Speech Recognition (2022) 12.68
- Daspeech: Directed Acyclic Transformer For Fast And High-quality Speech-to-speech Translation (2023) 5.24
- Conv-transformer Transducer: Low Latency, Low Frame Rate, Streamable End-to-end Speech Recognition (2020) 11.08
- Efficient Adapter Tuning Of Pre-trained Speech Models For Automatic Speaker Verification (2024) 0.00
- Self-supervised Rewiring Of Pre-trained Speech Encoders: Towards Faster Fine-tuning With Less Labels In Speech Processing (2022) 3.58
- An Adapter Based Pre-training For Efficient And Scalable Self-supervised Speech Representation Learning (2021) 8.35