Zero Resource Speech Synthesis Using Transcripts Derived From Perceptual Acoustic Units
2020 · Karthik Pandia D S, Hema A Murthy
Abstract
Zerospeech synthesis is the task of building vocabulary-independent speech synthesis systems when transcriptions are not available for the training data. It is therefore necessary to convert the training data into a sequence of fundamental acoustic units that can be used for synthesis at test time. This paper attempts to discover and model perceptual acoustic units consisting of steady-state and transient regions in speech. The transients roughly correspond to CV and VC units, while the steady-states correspond to sonorants and fricatives. The speech signal is first preprocessed by segmenting it into CVC-like units using a short-term energy-like contour. These CVC segments are clustered using a connected components-based graph clustering technique. The clustered CVC segments are initialized such that the onsets (CV) and decays (VC) correspond to transients, and the rhyme corresponds to steady-states. Following this initialization, the units are allowed to re-organise on the continuo
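The first two steps the abstract describes (energy-based segmentation, then connected-components clustering) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract only says the contour is "short-term energy-like" and does not give the segment distance, the threshold, or any function names, so `short_term_energy`, `split_at_valleys`, `cluster_connected`, the plain sum-of-squares energy, and the valley rule are all hypothetical stand-ins rather than the authors' method.

```python
def short_term_energy(samples, frame_len=4, hop=2):
    """Plain short-term energy: sum of squared samples per frame.

    Assumption: the paper uses an "energy-like" contour; this is the
    simplest such contour, not necessarily the one the authors use.
    """
    return [
        sum(x * x for x in samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


def split_at_valleys(energy):
    """Split the frame axis at strict local minima of the contour.

    Returns (start, end) frame ranges; each range is a rough
    syllable-like (CVC-like) segment under this toy rule.
    """
    valleys = [
        i for i in range(1, len(energy) - 1)
        if energy[i] < energy[i - 1] and energy[i] < energy[i + 1]
    ]
    bounds = [0] + valleys + [len(energy)]
    return list(zip(bounds[:-1], bounds[1:]))


def cluster_connected(items, distance, threshold):
    """Connected-components clustering with union-find.

    Two items share an edge when distance(a, b) < threshold; each
    connected component of that graph becomes one cluster.
    `distance` and `threshold` are hypothetical placeholders.
    """
    parent = list(range(len(items)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if distance(items[i], items[j]) < threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(items)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    contour = [1, 5, 2, 6, 1, 4, 3]
    print(split_at_valleys(contour))
    # Toy one-dimensional "segment features" with absolute difference
    # as the distance; real segments would need a sequence distance.
    feats = [0.0, 0.1, 5.0, 5.2, 9.0]
    print(cluster_connected(feats, lambda a, b: abs(a - b), 0.5))
```

In practice the segments would be variable-length feature sequences, so the distance would have to compare sequences rather than scalars; the scalar features in the usage lines are only there to keep the sketch self-contained.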
Related papers
- Exploration Of End-to-end Synthesisers For Zero Resource Speech Challenge 2020 (2020)
- Transformer VQ-VAE For Unsupervised Unit Discovery And Speech Synthesis: Zerospeech 2020 Challenge (2020)
- Unsupervised Acoustic Unit Discovery For Speech Synthesis Using Discrete Latent-variable Neural Networks (2019)
- The Zero Resource Speech Challenge 2019: TTS Without T (2019)
- The Zero Resource Speech Challenge 2020: Discovering Discrete Subword And Word Units (2020)
- Vector-quantized Neural Networks For Acoustic Unit Discovery In The Zerospeech 2020 Challenge (2020)
- Exploring TTS Without T Using Biologically/psychologically Motivated Neural Network Modules (Zerospeech 2020) (2020)
- Zero-shot Voice Conversion Via Self-supervised Prosody Representation Learning (2021)