Predicting Within And Across Language Phoneme Recognition Performance Of Self-supervised Learning Speech Pre-trained Models
2022 · Hang Ji, Tanvina Patel, Odette Scharenborg
Abstract
In this work, we analyzed and compared speech representations extracted from different frozen self-supervised learning (SSL) speech pre-trained models, examining how well they capture articulatory feature (AF) information and how well that ability predicts phone recognition performance in within-language and cross-language scenarios. Specifically, we compared CPC, wav2vec 2.0, and HuBERT. First, frame-level AF probing tasks were implemented. Subsequently, phone-level end-to-end ASR systems for phoneme recognition were implemented, and performance on the frame-level AF probing task was correlated with phone accuracy. Compared to the conventional MFCC speech representation, all SSL pre-trained speech representations captured more AF information and achieved better phoneme recognition performance within and across languages, with HuBERT performing best. The frame-level AF probing task is a good predictor of phoneme recognition performance, showing the importance of capturing AF inform
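The correlation step described in the abstract can be illustrated with a minimal sketch. It assumes a Pearson correlation over one (probing accuracy, phone accuracy) pair per representation; the abstract does not say which correlation measure the authors used, and the numbers below are made-up placeholders, not results from the paper. The function name `pearson` and the variable names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# PLACEHOLDER values for illustration only (NOT taken from the paper):
# one frame-level AF probing accuracy and one phone accuracy per representation.
representations = ["MFCC", "CPC", "wav2vec 2.0", "HuBERT"]
af_probe_accuracy = [0.50, 0.60, 0.70, 0.80]
phone_accuracy = [0.40, 0.50, 0.60, 0.70]

r = pearson(af_probe_accuracy, phone_accuracy)
print(f"Pearson r over {len(representations)} representations: {r:.3f}")
```

A value of r close to 1 would mean that representations scoring higher on the probing task also score higher on phone recognition, which is the sense in which the probing task acts as a predictor.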
Related papers
- Evidence Of Vocal Tract Articulation In Self-supervised Learning Of Speech (2022) · 9.41
- Automatic Pronunciation Assessment Using Self-supervised Speech Representation Learning (2022) · 0.00
- Investigation Of Ensemble Features Of Self-supervised Pretrained Models For Automatic Speech Recognition (2022) · 9.41
- Speech Representation Analysis Based On Inter- And Intra-model Similarities (2024) · 2.26
- Analyzing The Factors Affecting Usefulness Of Self-supervised Pre-trained Representations For Speech Recognition (2022) · 0.00
- Pushing The Limits Of Unsupervised Unit Discovery For SSL Speech Representation (2023) · 6.34
- An Empirical Analysis Of Speech Self-supervised Learning At Multiple Resolutions (2024) · 0.00
- Multi-resolution HuBERT: Multi-resolution Speech Self-supervised Learning With Masked Unit Prediction (2023) · 0.00