Composition Of Deep And Spiking Neural Networks For Very Low Bit Rate Speech Coding
2016 · Milos Cernak, Alexandros Lazaridis, Afsaneh Asaei, et al.
Abstract
Most current very low bit rate (VLBR) speech coding systems use hidden Markov model (HMM) based speech recognition/synthesis techniques. This allows information (such as phonemes) to be transmitted segment by segment, which decreases the bit rate. However, an encoder based on phoneme speech recognition may create bursts of segmental errors. Segmental errors are further propagated to optional suprasegmental (such as syllable) information coding. Together with the errors of voicing detection in pitch parametrization, HMM-based speech coding creates speech discontinuities and unnatural speech sound artefacts. In this paper, we propose a novel VLBR speech coding framework based on neural networks (NNs) for end-to-end speech analysis and synthesis without HMMs. The speech coding framework relies on a phonological (sub-phonetic) representation of speech, and it is designed as a composition of deep and spiking NNs: a bank of phonological analysers at the transmitter, and a phonological synthes
Related papers
- Neural Feature Predictor And Discriminative Residual Coding For Low-bitrate Speech Coding (2022)
- CQNV: A Combination Of Coarsely Quantized Bitstream And Neural Vocoder For Low Rate Speech Coding (2023)
- Low Bit-rate Speech Coding With VQ-VAE And A Wavenet Decoder (2019)
- Latent-domain Predictive Neural Speech Coding (2022)
- Speech Quality Factors For Traditional And Neural-based Low Bit Rate Vocoders (2020)
- Deep Vocoder: Low Bit Rate Compression Of Speech With Deep Autoencoder (2019)
- Optimizing Neural Speech Codec For Low-bitrate Compression Via Multi-scale Encoding (2024)
- Wavenet Based Low Rate Speech Coding (2017)