Input-independent Attention Weights Are Expressive Enough: A Study Of Attention In Self-supervised Audio Transformers
2020 · Tsung-Han Wu, Chun-Chen Hsieh, Yen-Hao Chen, et al.
Abstract
In this paper, we seek solutions for reducing the computational complexity of transformer-based models for speech representation learning. We evaluate 10 attention algorithms, pre-train a transformer-based model with each of them in a self-supervised fashion, and use the resulting models as feature extractors on downstream tasks, including phoneme classification and speaker classification. With the help of t-SNE, PCA, and observation, we find that the attention weights in self-supervised audio transformers fall into four general cases. Based on these cases and further analysis, we use a specific set of attention weights to initialize the model. Our approach performs comparably to typical self-attention yet requires 20% less time in both training and inference.
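As a rough illustration of the idea in the title, the sketch below contrasts standard scaled dot-product self-attention with an input-independent variant whose attention weights come from a fixed logit matrix rather than from queries and keys. The abstract does not specify the parameterization, so the function names, the `logits` table, and the shapes here are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    # Standard attention: the weights depend on the input through q and k.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return weights @ v, weights

def input_independent_attention(x, logits, w_v):
    # Hypothetical variant: the weights come from a (learnable) logit table
    # indexed by position only, so no query/key projections are computed.
    t = x.shape[0]
    weights = softmax(logits[:t, :t])
    return weights @ (x @ w_v), weights

rng = np.random.default_rng(0)
T, d = 6, 4  # sequence length and feature size (arbitrary toy values)
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
logits = rng.standard_normal((T, T))

x1 = rng.standard_normal((T, d))
x2 = rng.standard_normal((T, d))

_, a1 = self_attention(x1, w_q, w_k, w_v)
_, a2 = self_attention(x2, w_q, w_k, w_v)
out1, b1 = input_independent_attention(x1, logits, w_v)
_, b2 = input_independent_attention(x2, logits, w_v)
```

In this sketch the input-independent variant skips the query and key projections and the `q @ k.T` product, which is one plausible source of the kind of time savings the abstract reports; the actual savings depend on the model and sequence length.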
Related papers
- When To Use Efficient Self Attention? Profiling Text, Speech And Image Transformer Variants (2023)
- An Attention-based Backend Allowing Efficient Fine-tuning Of Transformer Models For Speaker Verification (2022)
- Simplified Self-attention For Transformer-based End-to-end Speech Recognition (2020)
- SSAST: Self-supervised Audio Spectrogram Transformer (2021)
- Is Smaller Always Faster? Tradeoffs In Compressing Self-supervised Speech Transformers (2022)
- Self-supervised Rewiring Of Pre-trained Speech Encoders: Towards Faster Fine-tuning With Less Labels In Speech Processing (2022)
- Exploring Self-attention Mechanisms For Speech Separation (2022)
- A Low Latency Attention Module For Streaming Self-supervised Speech Representation Learning (2023)