Exploring TTS Without T Using Biologically/psychologically Motivated Neural Network Modules (zerospeech 2020)
2020 Β· Takashi Morita, Hiroki Koda
Abstract
In this study, we reported our exploration of Text-To-Speech without Text (TTS without T) in the Zero Resource Speech Challenge 2020, in which participants proposed an end-to-end, unsupervised system that learned speech recognition and TTS together. We addressed the challenge using biologically/psychologically motivated modules of Artificial Neural Networks (ANN), with a particular interest in unsupervised learning of human language as a biological/psychological problem. The system first processes Mel Frequency Cepstral Coefficient (MFCC) frames with an Echo-State Network (ESN), and simulates computations in cortical microcircuits. The outcome is discretized by our original Variational Autoencoder (VAE) that implements the Dirichlet-based Bayesian clustering widely accepted in computational linguistics and cognitive science. The discretized signal is then reverted into sound waveform via a neural-network implementation of the source-filter model for speech production.
Authors
(none)
Tags
Stats
Related papers
- The Zero Resource Speech Challenge 2019: TTS Without T (2019)13.17
- Exploration Of End-to-end Synthesisers Forzero Resource Speech Challenge 2020 (2020)4.52
- Transformer VQ-VAE For Unsupervised Unit Discovery And Speech Synthesis: Zerospeech 2020 Challenge (2020)9.41
- Unsupervised Neural And Bayesian Models For Zero-resource Speech Processing (2017)0.00
- ZMM-TTS: Zero-shot Multilingual And Multispeaker Speech Synthesis Conditioned On Self-supervised Discrete Speech Representations (2023)10.35
- Unsupervised Acoustic Unit Discovery For Speech Synthesis Using Discrete Latent-variable Neural Networks (2019)9.59
- Self-supervised Language Learning From Raw Audio: Lessons From The Zero Resource Speech Challenge (2022)10.07
- Learning To Speak From Text: Zero-shot Multilingual Text-to-speech With Unsupervised Text Pretraining (2023)8.82