Efficient Ensemble For Multimodal Punctuation Restoration Using Time-delay Neural Network
2023 · Xing Yi Liu, Homayoon Beigi
Abstract
Punctuation restoration plays an essential role in the post-processing procedure of automatic speech recognition, but model efficiency is a key requirement for this task. To that end, we present EfficientPunct, an ensemble method with a multimodal time-delay neural network that outperforms the current best model by 1.0 F1 points, using less than a tenth of its inference network parameters. We streamline a speech recognizer to efficiently output hidden layer acoustic embeddings for punctuation restoration, as well as BERT to extract meaningful text embeddings. By using forced alignment and temporal convolutions, we eliminate the need for attention-based fusion, greatly increasing computational efficiency and raising performance. EfficientPunct sets a new state of the art with an ensemble that weights BERT's purely language-based predictions slightly more than the multimodal network's predictions. Our code is available at https://github.com/lxy-peter/EfficientPunct.
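The ensemble step described in the abstract can be sketched as a weighted average of per-token class probabilities from the two models. This is a minimal illustration, not the authors' implementation: the label set, the function name `ensemble_predict`, and the weight `alpha = 0.6` are all assumptions chosen only to show a text-only model being weighted slightly more than a multimodal one.

```python
from typing import List, Sequence

# Hypothetical label set; the abstract does not enumerate the classes.
LABELS = ["none", "comma", "period", "question"]


def ensemble_predict(
    p_text: Sequence[Sequence[float]],
    p_multimodal: Sequence[Sequence[float]],
    alpha: float = 0.6,  # assumed weight on the text-only model (> 0.5)
) -> List[str]:
    """Blend two models' per-token class probabilities and pick the argmax.

    p_text and p_multimodal each hold one probability row per token,
    with one entry per label in LABELS.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(p_text) != len(p_multimodal):
        raise ValueError("both models must score the same number of tokens")

    predictions = []
    for text_row, mm_row in zip(p_text, p_multimodal):
        # Convex combination of the two distributions for this token.
        blended = [alpha * t + (1.0 - alpha) * m for t, m in zip(text_row, mm_row)]
        best = max(range(len(blended)), key=blended.__getitem__)
        predictions.append(LABELS[best])
    return predictions
```

With `alpha` above 0.5 the text-only scores dominate when the two models disagree narrowly, while a confident multimodal prediction can still override them. In practice such a weight would be tuned on held-out data.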
Related papers
- Unified Multimodal Punctuation Restoration Framework For Mixed-modality Corpus (2022)
- Improving Punctuation Restoration For Speech Transcripts Via External Data (2021)
- Improved Training For End-to-end Streaming Automatic Speech Recognition Model With Punctuation (2023)
- Multimodal Semi-supervised Learning Framework For Punctuation Prediction In Conversational Speech (2020)
- Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional Context For Continuous Speech Recognition (2023)
- End To End ASR System With Automatic Punctuation Insertion (2020)
- Longer Is (not Necessarily) Stronger: Punctuated Long-sequence Training For Enhanced Speech Recognition And Translation (2024)
- Replacing Human Audio With Synthetic Audio For On-device Unspoken Punctuation Prediction (2020)