Enhancing Low-quality Voice Recordings Using Disentangled Channel Factor And Neural Waveform Model
2020 · Haoyu Li, Yang Ai, Junichi Yamagishi
Abstract
High-quality speech corpora are essential foundations for most speech applications. However, such speech data are expensive and limited since they are collected in professional recording environments. In this work, we propose an encoder-decoder neural network to automatically enhance low-quality recordings to professional high-quality recordings. To address channel variability, we first filter out the channel characteristics from the original input audio using the encoder network with adversarial training. Next, we disentangle the channel factor from a reference audio. Conditioned on this factor, an auto-regressive decoder is then used to predict the target-environment Mel spectrogram. Finally, we apply a neural vocoder to synthesize the speech waveform. Experimental results show that the proposed system can generate a professional high-quality speech waveform when setting high-quality audio as the reference. It also improves speech enhancement performance compared with several state-o
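The data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the random linear maps stand in for trained networks, the dimensions are hypothetical, mean pooling for the channel factor is an assumption, and the adversarial training and neural vocoder stages are not modelled. It only shows how a channel-filtered encoding of the input and a channel factor from a reference recording could be combined by an auto-regressive decoder to predict Mel frames.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, not taken from the paper.
N_MEL, D_CONTENT, D_CHANNEL = 80, 32, 8

# Placeholder weights standing in for trained networks.
W_enc = rng.standard_normal((N_MEL, D_CONTENT)) * 0.1
W_ref = rng.standard_normal((N_MEL, D_CHANNEL)) * 0.1
W_dec = rng.standard_normal((D_CONTENT + D_CHANNEL + N_MEL, N_MEL)) * 0.1


def encode_content(mel):
    """Frame-wise features of the input recording. In the described system,
    adversarial training removes channel information here (not modelled)."""
    return np.tanh(mel @ W_enc)                    # (T, D_CONTENT)


def channel_factor(ref_mel):
    """One utterance-level vector summarising the reference channel.
    Mean pooling over time is an assumption made for this sketch."""
    return np.tanh(ref_mel @ W_ref).mean(axis=0)   # (D_CHANNEL,)


def decode(content, factor):
    """Auto-regressive decoding: each output frame is predicted from the
    current content frame, the channel factor, and the previous output."""
    prev = np.zeros(N_MEL)
    frames = []
    for c in content:
        prev = np.concatenate([c, factor, prev]) @ W_dec
        frames.append(prev)
    return np.stack(frames)                        # (T, N_MEL)


def enhance(input_mel, ref_mel):
    return decode(encode_content(input_mel), channel_factor(ref_mel))


low_quality = rng.standard_normal((50, N_MEL))    # stand-in for a low-quality recording
reference = rng.standard_normal((120, N_MEL))     # stand-in for a high-quality reference
target_mel = enhance(low_quality, reference)
print(target_mel.shape)  # (50, 80)
```

In a full system the predicted Mel spectrogram would then be passed to a neural vocoder to produce the waveform; swapping the reference recording changes only the channel factor, which is what lets the same input be rendered in different target environments.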
Related papers
- A Neural Denoising Vocoder For Clean Waveform Generation From Noisy Mel-spectrogram Based On Amplitude And Phase Predictions (2024) · 0.00
- LA-VocE: Low-SNR Audio-visual Speech Enhancement Using Neural Vocoders (2022) · 0.00
- Disentangled Feature Learning For Real-time Neural Speech Coding (2022) · 0.00
- Robust Vocal Quality Feature Embeddings For Dysphonic Voice Detection (2022) · 7.16
- UnivNet: A Neural Vocoder With Multi-resolution Spectrogram Discriminators For High-fidelity Waveform Generation (2021) · 14.80
- Speech Denoising By Parametric Resynthesis (2019) · 7.16
- HiFi-GAN: High-fidelity Denoising And Dereverberation Based On Speech Deep Features In Adversarial Networks (2020) · 0.00
- Enhancing Into The Codec: Noise Robust Speech Coding With Vector-quantized Autoencoders (2021) · 10.21