Improving Opus Low Bit Rate Quality With Neural Speech Synthesis
2019 · Jan Skoglund, Jean-Marc Valin
Abstract
The voice mode of the Opus audio coder can compress wideband speech at bit rates ranging from 6 kb/s to 40 kb/s. However, Opus is at its core a waveform-matching coder, and as the rate drops below 10 kb/s, quality degrades quickly. As the rate reduces even further, parametric coders tend to perform better than waveform coders. In this paper, we propose a backward-compatible way of improving low bit rate Opus quality by re-synthesizing speech from the decoded parameters. We compare two different neural generative models, WaveNet and LPCNet. WaveNet is a powerful, high-complexity, and high-latency architecture that is not feasible for a practical system, yet provides the best known achievable quality with generative models. LPCNet is a low-complexity, low-latency RNN-based generative model, and is practically implementable on mobile phones. We apply these systems with parameters from Opus coded at 6 kb/s as conditioning features for the generative models. A listening test shows that for the sa
Related papers
- A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet (2019)
- WaveNet Based Low Rate Speech Coding (2017)
- LPCNet: Improving Neural Speech Synthesis Through Linear Prediction (2018)
- Neural Speech Synthesis on a Shoestring: Improving the Efficiency of LPCNet (2022)
- Low Bit-Rate Speech Coding with VQ-VAE and a WaveNet Decoder (2019)
- Speech Quality Factors for Traditional and Neural-Based Low Bit Rate Vocoders (2020)
- PostGAN: A GAN-Based Post-Processor to Enhance the Quality of Coded Speech (2022)
- High Quality, Lightweight and Adaptable TTS Using LPCNet (2019)