Real-time Speech Frequency Bandwidth Extension
2020 · Yunpeng Li, Marco Tagliasacchi, Oleg Rybakov, et al.
Abstract
In this paper, we propose a lightweight model for frequency bandwidth extension of speech signals that increases the sampling frequency from 8 kHz to 16 kHz while restoring the high-frequency content to a level almost indistinguishable from the 16 kHz ground truth. The model architecture is based on SEANet (Sound EnhAncement Network), a wave-to-wave fully convolutional model that uses a combination of feature losses and adversarial losses to reconstruct an enhanced version of the input speech. In addition, we propose a variant of SEANet that can be deployed on-device in streaming mode, achieving an architectural latency of 16 ms. When profiled on a single core of a mobile CPU, processing one 16 ms frame takes only 1.5 ms. The low latency makes it viable for bi-directional voice communication systems.
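The frame arithmetic behind the streaming claim can be checked with a small sketch. It assumes that each 16 ms frame is buffered at the 8 kHz input rate and emitted at the 16 kHz output rate. Under that assumption, one frame is 128 input samples and 256 output samples, and 1.5 ms of compute per frame is about 9.4% of the real-time budget. The names `repeat_upsample` and `stream` are hypothetical. `repeat_upsample` is only a placeholder for the network (it is not SEANet and does not restore the high band), and the loop illustrates frame-by-frame processing rather than the authors' streaming implementation.

```python
from typing import Callable, List, Sequence

IN_RATE_HZ = 8_000    # narrowband input rate (from the abstract)
OUT_RATE_HZ = 16_000  # wideband output rate (from the abstract)
FRAME_MS = 16         # streaming frame length (from the abstract)


def samples_per_frame(rate_hz: int, frame_ms: int = FRAME_MS) -> int:
    """Number of samples in one frame of `frame_ms` milliseconds."""
    return rate_hz * frame_ms // 1000


def real_time_factor(compute_ms: float, frame_ms: float = FRAME_MS) -> float:
    """Compute time as a fraction of the frame duration; below 1.0 keeps up with real time."""
    return compute_ms / frame_ms


def repeat_upsample(frame: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for the model: emits each input sample twice so the
    output has the 16 kHz sample count. It does NOT restore the missing high band."""
    return [s for s in frame for _ in range(2)]


def stream(signal_8k: Sequence[float],
           process_frame: Callable[[Sequence[float]], List[float]]) -> List[float]:
    """Feed the signal to `process_frame` one complete 16 ms frame at a time."""
    n_in = samples_per_frame(IN_RATE_HZ)
    out: List[float] = []
    # Only complete frames are processed; in a live system a trailing partial
    # frame would wait in the buffer until more input arrives.
    for start in range(0, len(signal_8k) - n_in + 1, n_in):
        out.extend(process_frame(signal_8k[start:start + n_in]))
    return out


if __name__ == "__main__":
    print(samples_per_frame(IN_RATE_HZ), samples_per_frame(OUT_RATE_HZ))  # 128 256
    print(real_time_factor(1.5))  # 0.09375
    print(len(stream([0.0] * 8000, repeat_upsample)))  # 15872
```

One second of 8 kHz audio holds 62 complete 128-sample frames (the last 64 samples stay buffered), so the loop yields 62 × 256 = 15872 output samples. Processing in fixed frames like this is why the latency is tied to the frame length: no output for a frame can be produced before that frame's input has arrived.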
Related papers
- DSP-informed Bandwidth Extension Using Locally-conditioned Excitation And Linear Time-varying Filter Subnetworks (2024)
- LiSenNet: Lightweight Sub-band And Dual-path Modeling For Real-time Speech Enhancement (2024)
- BAE-Net: A Low Complexity And High Fidelity Bandwidth-adaptive Neural Network For Speech Super-resolution (2023)
- Speech Bandwidth Expansion Via High Fidelity Generative Adversarial Networks (2024)
- Multi-stage Speech Bandwidth Extension With Flexible Sampling Rate Control (2024)
- FB-MSTCN: A Full-band Single-channel Speech Enhancement Method Based On Multi-scale Temporal Convolutional Network (2022)
- Waveform Modeling And Generation Using Hierarchical Recurrent Neural Networks For Speech Bandwidth Extension (2018)
- High Fidelity Speech Enhancement With Band-split RNN (2022)