Scenario Aware Speech Recognition: Advancements For Apollo Fearless Steps & Chime-4 Corpora
2021 · Szu-Jui Chen, Wei Xia, John H. L. Hansen
Abstract
In this study, we investigate triplet loss as the basis of an alternative feature representation for ASR. We consider TRILL, a general non-semantic speech representation trained with a self-supervised criterion based on triplet loss, and use it in acoustic modeling to represent the acoustic characteristics of each audio recording. This strategy is then applied to the CHiME-4 corpus and the CRSS-UTDallas Fearless Steps Corpus, with emphasis on the 100-hour challenge corpus, which consists of 5 selected NASA Apollo-11 channels. An analysis of the extracted embeddings provides the foundation needed to characterize training utterances into distinct groups based on distinguishing acoustic properties. Moreover, we demonstrate that the triplet-loss based embedding performs better than i-Vector in acoustic modeling, confirming that the triplet loss is more effective than a speaker feature. With additional techniques such as pronunciation and silence probability modeling, plus multi-style
Related papers
- Learning Efficient Representations For Keyword Spotting With Triplet Loss (2021) · 11.76
- Learning Acoustic Word Embeddings With Phonetically Associated Triplet Network (2018) · 0.00
- Towards A Competitive End-to-end Speech Recognition For Chime-6 Dinner Party Transcription (2020) · 6.77
- Triplet Entropy Loss: Improving The Generalisation Of Short Speech Language Identification Systems (2020) · 0.00
- Towards Learning A Universal Non-semantic Representation Of Speech (2020) · 14.43
- Triplet Based Embedding Distance And Similarity Learning For Text-independent Speaker Verification (2019) · 5.24
- Triplet Loss Based Embeddings For Forensic Speaker Identification In Spanish (2021) · 2.26
- End-to-end Triplet Loss Based Emotion Embedding System For Speech Emotion Recognition (2020) · 10.35