Waveform Modeling And Generation Using Hierarchical Recurrent Neural Networks For Speech Bandwidth Extension
2018 · Zhen-Hua Ling, Yang Ai, Yu Gu, et al.
Abstract
This paper presents a waveform modeling and generation method using hierarchical recurrent neural networks (HRNN) for speech bandwidth extension (BWE). Unlike conventional BWE methods, which predict spectral parameters for reconstructing wideband speech waveforms, this method models and predicts waveform samples directly without using vocoders. Inspired by SampleRNN, an unconditional neural audio generator, the HRNN model represents the distribution of each wideband or high-frequency waveform sample conditioned on the input narrowband waveform samples, using a neural network composed of long short-term memory (LSTM) layers and feed-forward (FF) layers. The LSTM layers form a hierarchical structure, and each layer operates at a specific temporal resolution to efficiently capture long-span dependencies between temporal sequences. Furthermore, additional conditions, such as the bottleneck (BN) features derived from narrowband speech using a deep neural network (DNN)-based
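The idea of layers operating at different temporal resolutions can be illustrated with a small framing sketch. This is not the authors' implementation: the frame sizes (16 and 4 samples) and the names `frame_signal` and `tier_views` are hypothetical, chosen only to show how a coarser tier takes fewer recurrent steps over the same narrowband input and so spans more time per step.

```python
# Illustrative sketch only: frame sizes and function names are assumptions,
# not values or identifiers taken from the paper.

def frame_signal(samples, frame_size):
    """Split a sample sequence into non-overlapping frames, dropping any ragged tail."""
    n_frames = len(samples) // frame_size
    return [samples[i * frame_size:(i + 1) * frame_size] for i in range(n_frames)]


def tier_views(narrowband, frame_sizes=(16, 4)):
    """Return, for each (hypothetical) tier, the frames that tier would consume.

    A larger frame size means fewer recurrent steps over the same signal,
    so that tier covers a longer time span per step.
    """
    return {size: frame_signal(narrowband, size) for size in frame_sizes}


narrowband = list(range(64))  # stand-in for 64 narrowband waveform samples
views = tier_views(narrowband)
steps_per_tier = {size: len(frames) for size, frames in views.items()}
```

In an actual hierarchical model, each tier's recurrent layer would process its own frame sequence and pass its state down to the finer tier as conditioning; the sketch covers only the framing step.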
Related papers
- DSP-informed Bandwidth Extension Using Locally-conditioned Excitation And Linear Time-varying Filter Subnetworks (2024) 2.26
- Towards High-quality And Efficient Speech Bandwidth Extension With Parallel Amplitude And Phase Prediction (2024) 0.00
- Multi-stage Speech Bandwidth Extension With Flexible Sampling Rate Control (2024) 6.34
- High-fidelity And Low-latency Universal Neural Vocoder Based On Multiband WaveRNN With Data-driven Linear Prediction For Discrete Waveform Modeling (2021) 6.77
- Speech Bandwidth Expansion Via High Fidelity Generative Adversarial Networks (2024) 0.00
- A Neural Vocoder With Hierarchical Generation Of Amplitude And Phase Spectra For Statistical Parametric Speech Synthesis (2019) 10.74
- BAE-Net: A Low Complexity And High Fidelity Bandwidth-adaptive Neural Network For Speech Super-resolution (2023) 6.77
- Real-time Speech Frequency Bandwidth Extension (2020) 12.54