Speech Representation Learning Combining Conformer CPC With Deep Cluster For The Zerospeech Challenge 2021
2021 · Takashi Maekaku, Xuankai Chang, Yuya Fujita, et al.
Abstract
We present a system for the Zero Resource Speech Challenge 2021 that combines Contrastive Predictive Coding (CPC) with deep cluster. In deep cluster, we first prepare pseudo-labels by clustering the outputs of a CPC network with k-means. We then train an additional autoregressive model to classify these pseudo-labels in a supervised manner. A phoneme-discriminative representation is obtained by running a second round of clustering on the outputs of the final layer of the autoregressive model. We show that replacing a Transformer layer with a Conformer layer leads to a further gain in a lexical metric. Experimental results show relative improvements of 35% in a phonetic metric, 1.5% in the lexical metric, and 2.3% in a syntactic metric over the CPC-small baseline, which is trained on LibriSpeech 460h data. We achieve top results in this challenge on the syntactic metric.
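The two-round clustering procedure described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Random vectors stand in for CPC outputs, a small one-hidden-layer softmax classifier stands in for the autoregressive model, and its hidden layer plays the role of the "final layer" whose outputs are re-clustered. All function names, sizes, and hyperparameters are hypothetical.

```python
import numpy as np


def kmeans(x, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means. Returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points.
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance of every point to every centroid: (n, k).
        dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # leave empty clusters where they are
                centroids[j] = members.mean(0)
    dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return centroids, dist.argmin(1)


def train_classifier(x, y, k, hidden=16, lr=0.5, n_epochs=200, seed=0):
    """Train a tiny MLP on pseudo-labels; return a function giving hidden outputs.

    Assumption: this MLP is only a stand-in for the autoregressive model.
    """
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.1, (x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.1, (hidden, k))
    b2 = np.zeros(k)
    onehot = np.eye(k)[y]
    for _ in range(n_epochs):
        h = np.tanh(x @ w1 + b1)
        logits = h @ w2 + b2
        logits = logits - logits.max(1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p = p / p.sum(1, keepdims=True)
        # Gradient of mean cross-entropy with respect to the logits.
        g = (p - onehot) / len(x)
        gh = (g @ w2.T) * (1.0 - h ** 2)  # backprop through tanh
        w2 -= lr * (h.T @ g)
        b2 -= lr * g.sum(0)
        w1 -= lr * (x.T @ gh)
        b1 -= lr * gh.sum(0)
    return lambda z: np.tanh(z @ w1 + b1)


def deep_cluster(features, k=4):
    """First-round k-means, supervised training, second-round k-means."""
    _, pseudo_labels = kmeans(features, k)                 # round 1
    encode = train_classifier(features, pseudo_labels, k)  # supervised step
    _, units = kmeans(encode(features), k, seed=1)         # round 2
    return pseudo_labels, units


# Hypothetical stand-in for frame-level CPC outputs: 200 frames, 8 dimensions.
features = np.random.default_rng(0).normal(size=(200, 8))
pseudo, units = deep_cluster(features, k=4)
```

Here `units` holds one discrete unit index per frame. In the system described above, the second-round clustering is what yields the phoneme-discriminative representation; the toy data in this sketch carries no such structure.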
Related papers
- Information Retrieval For Zerospeech 2021: The Submission By University Of Wroclaw (2021)
- Analyzing Speaker Information In Self-supervised Models To Improve Zero-resource Speech Processing (2021)
- Guided Contrastive Self-supervised Pre-training For Automatic Speech Recognition (2022)
- The Zero Resource Speech Benchmark 2021: Metrics And Baselines For Unsupervised Spoken Language Modeling (2020)
- The Zero Resource Speech Challenge 2020: Discovering Discrete Subword And Word Units (2020)
- Data Augmenting Contrastive Learning Of Speech Representations In The Time Domain (2020)
- Self-supervised Language Learning From Raw Audio: Lessons From The Zero Resource Speech Challenge (2022)
- Variable-rate Hierarchical CPC Leads To Acoustic Unit Discovery In Speech (2022)