Blind Phoneme Segmentation With Temporal Prediction Errors
2016 · Paul Michel, Okko Räsänen, Roland Thiollière, et al.
Abstract
Phonemic segmentation of speech is a critical step in speech recognition systems. We propose a novel unsupervised algorithm based on sequence prediction models such as Markov chains and recurrent neural networks. Our approach consists of analyzing the error profile of a model trained to predict speech features frame by frame. Specifically, we try to learn the dynamics of speech in the MFCC space and hypothesize boundaries at local maxima of the prediction error. We evaluate our system on the TIMIT dataset, with improvements over similar methods.
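The boundary-detection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the predictor here is a placeholder that simply repeats the current frame (the paper uses trained Markov chains or recurrent networks), the features are synthetic stand-ins for MFCC frames, and the names `prediction_errors`, `hypothesize_boundaries`, and the `threshold` parameter are hypothetical.

```python
import numpy as np

def prediction_errors(features, predict_next):
    """Squared error between each frame and its prediction from the previous frame."""
    preds = np.array([predict_next(f) for f in features[:-1]])
    return np.sum((features[1:] - preds) ** 2, axis=1)

def hypothesize_boundaries(errors, threshold=0.0):
    """Frame indices at local maxima of the error curve that exceed a threshold."""
    e = np.asarray(errors)
    inner = np.arange(1, len(e) - 1)
    is_peak = (e[inner] > e[inner - 1]) & (e[inner] >= e[inner + 1]) & (e[inner] > threshold)
    # errors[t] scores the prediction of frame t + 1, so shift the index by one.
    return (inner[is_peak] + 1).tolist()

# Synthetic stand-in for MFCC frames (assumption): three "phones" of 20 frames,
# each a constant 13-dimensional vector plus small noise.
rng = np.random.default_rng(0)
means = np.stack([np.full(13, v) for v in (0.0, 5.0, -5.0)])
features = np.concatenate([m + 0.01 * rng.normal(size=(20, 13)) for m in means])

# Placeholder predictor (assumption): the next frame equals the current one.
errors = prediction_errors(features, lambda frame: frame)
boundaries = hypothesize_boundaries(errors, threshold=1.0)
print(boundaries)
```

On this toy input the error spikes where the synthetic phone changes, so the hypothesized boundaries fall at frames 20 and 40. With a trained sequence model in place of the placeholder, the same peak-picking step would apply to its frame-wise prediction error.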
Related papers
- Unsupervised Speech Recognition Via Segmental Empirical Output Distribution Matching (2018) · 0.00
- Sequence Prediction With Neural Segmental Models (2017) · 0.00
- Sequence Segmentation Using Joint RNN And Structured Prediction Models (2016) · 7.81
- Self-supervised Contrastive Learning For Unsupervised Phoneme Segmentation (2020) · 12.68
- Improving Speech Recognition Error Prediction For Modern And Off-the-shelf Speech Recognizers (2024) · 5.24
- Segmental Recurrent Neural Networks For End-to-end Speech Recognition (2016) · 0.00
- Segmental Contrastive Predictive Coding For Unsupervised Word Segmentation (2021) · 0.00
- Unsupervised Speech Segmentation And Variable Rate Representation Learning Using Segmental Contrastive Predictive Coding (2021) · 9.92