Neural Predictive Coding Using Convolutional Neural Networks Towards Unsupervised Learning Of Speaker Characteristics
2018 · Arindam Jati, Panayiotis Georgiou
Abstract
Learning speaker-specific features is vital in many applications, such as speaker recognition, diarization and speech recognition. This paper proposes a novel approach, which we term Neural Predictive Coding (NPC), to learn speaker-specific characteristics in a completely unsupervised manner from large amounts of unlabeled training data that may even contain many non-speech events and multi-speaker audio streams. The NPC framework exploits the proposed short-term active-speaker stationarity hypothesis, which assumes that two temporally close short speech segments belong to the same speaker; a common representation that encodes the commonalities of both segments should therefore capture the vocal characteristics of that speaker. We train a deep convolutional siamese network to produce "speaker embeddings" by learning to separate 'same' vs 'different' speaker pairs, which are generated from unlabeled audio streams. Two sets of experiments are done in different scenarios to evaluate the stre
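The pair-generation idea behind the stationarity hypothesis can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `sample_pair`, the parameters `win` and `max_gap`, the 50/50 balance between labels, and the choice to draw 'different' pairs from two distinct streams are all assumptions made for illustration. Each stream is treated as a plain sequence of feature frames.

```python
import random


def sample_pair(streams, win, max_gap, rng):
    """Sample one training pair from unlabeled streams (hypothetical helper).

    Returns (segment_a, segment_b, label), where label 1 means 'same'
    and label 0 means 'different'. Assumes at least two streams, each
    with at least 2 * win + max_gap frames.
    """
    i = rng.randrange(len(streams))
    s = streams[i]
    # Leave room for a second window plus the largest allowed gap.
    start_a = rng.randrange(len(s) - 2 * win - max_gap + 1)
    seg_a = s[start_a:start_a + win]

    if rng.random() < 0.5:  # assumption: balanced 'same' / 'different' pairs
        # 'same': a temporally close window from the same stream, which the
        # stationarity hypothesis assumes comes from the same speaker.
        gap = rng.randrange(max_gap + 1)
        start_b = start_a + win + gap
        return seg_a, s[start_b:start_b + win], 1

    # 'different': a window from another stream (assumption: distinct
    # streams are treated as distinct speakers).
    j = rng.choice([k for k in range(len(streams)) if k != i])
    other = streams[j]
    start_b = rng.randrange(len(other) - win + 1)
    return seg_a, other[start_b:start_b + win], 0


# Toy usage: frames are (stream_id, time_index) tuples instead of real features.
toy_streams = [[(sid, t) for t in range(60)] for sid in range(3)]
seg_a, seg_b, label = sample_pair(toy_streams, win=10, max_gap=5, rng=random.Random(0))
```

In a real pipeline the two segments would be fed to the two branches of the siamese network and the label would drive the same-vs-different training loss; neither label requires any speaker annotation.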
Related papers
- Non-autoregressive Predictive Coding For Learning Speech Representations From Local Dependencies (2020)
- Unsupervised Speech Segmentation And Variable Rate Representation Learning Using Segmental Contrastive Predictive Coding (2021)
- A Robust Frame-based Nonlinear Prediction System For Automatic Speech Coding (2016)
- Speaker Verification Using Convolutional Neural Networks (2018)
- Neural Feature Predictor And Discriminative Residual Coding For Low-bitrate Speech Coding (2022)
- Generative Pre-training For Speech With Autoregressive Predictive Coding (2019)
- Self-supervised Predictive Coding Models Encode Speaker And Phonetic Information In Orthogonal Subspaces (2023)
- An Unsupervised Autoregressive Model For Speech Representation Learning (2019)