A Low Latency Attention Module For Streaming Self-supervised Speech Representation Learning
2023 · Jianbo Ma, Siqi Pan, Deepak Chandran, et al.
Abstract
The transformer is a fundamental building block in deep learning, and the attention mechanism is the transformer's core component. Self-supervised speech representation learning (SSRL) represents a popular use-case for the transformer architecture. Due to transformers' acausal behavior, the use of transformers for SSRL has been predominantly focused on acausal applications. However, several media processing problems, such as speech processing, require real-time solutions. In this paper, we present an implementation of the attention module that enables training of SSRL architectures with low compute and memory requirements, while allowing real-time inference with low and fixed latency. The attention module proposed in this paper includes two components, streaming attention (SA) and low-latency streaming attention (LLSA). The SA represents our proposal for an efficient streaming SSRL implementation, while the LLSA solves the latency build-up problem of other streaming attention architect
Related papers
- Streaming Transformer-based Acoustic Models Using Self-attention With Augmented Memory (2020) · 0.00
- Conv-transformer Transducer: Low Latency, Low Frame Rate, Streamable End-to-end Speech Recognition (2020) · 11.08
- Simplified Self-attention For Transformer-based End-to-end Speech Recognition (2020) · 10.61
- Improving Streaming Transformer Based ASR Under A Framework Of Self-supervised Learning (2021) · 8.09
- Blockwise Streaming Transformer For Spoken Language Understanding And Simultaneous Speech Translation (2022) · 4.52
- Input-independent Attention Weights Are Expressive Enough: A Study Of Attention In Self-supervised Audio Transformers (2020) · 0.00
- AxLSTMs: Learning Self-supervised Audio Representations With xLSTMs (2024) · 2.26
- Transformer-transducer: End-to-end Speech Recognition With Self-attention (2019) · 0.00