Robust Vocal Quality Feature Embeddings For Dysphonic Voice Detection
2022 · Jianwei Zhang, Julie Liss, Suren Jayasuriya, et al.
Abstract
Approximately 1.2% of the world's population has impaired voice production. As a result, automatic dysphonic voice detection has attracted considerable academic and clinical interest. However, existing methods for automated voice assessment often fail to generalize outside the training conditions or to other related applications. In this paper, we propose a deep learning framework for generating acoustic feature embeddings sensitive to vocal quality and robust across different corpora. A contrastive loss is combined with a classification loss to train our deep learning model jointly. Data warping methods are used on input voice samples to improve the robustness of our method. Empirical results demonstrate that our method not only achieves high in-corpus and cross-corpus classification accuracy but also generates good embeddings sensitive to voice quality and robust across different corpora. We also compare our results against three baseline methods on clean and three variations of dete
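The abstract states that a contrastive loss and a classification loss are trained jointly but does not give their exact form. The following is a minimal sketch of one common way to combine them, not the authors' implementation: it assumes a softmax cross-entropy classification term and a margin-based pairwise contrastive term over embeddings. The function names and the `alpha` and `margin` parameters are hypothetical and chosen only for illustration.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample (numerically stabilized)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pairwise_contrastive(emb_a, emb_b, same_class, margin=1.0):
    """Margin-based contrastive term (an assumed form, not from the paper).

    Same-class pairs are pulled together; different-class pairs are
    pushed apart until they are at least `margin` away from each other.
    """
    d = euclidean(emb_a, emb_b)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2

def joint_loss(embeddings, logits, labels, alpha=0.5, margin=1.0):
    """Mean classification loss plus alpha times mean contrastive loss.

    `alpha` is a hypothetical weighting between the two terms.
    """
    n = len(labels)
    ce = sum(cross_entropy(l, y) for l, y in zip(logits, labels)) / n
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    con = sum(
        pairwise_contrastive(embeddings[i], embeddings[j],
                             labels[i] == labels[j], margin)
        for i, j in pairs
    ) / max(len(pairs), 1)
    return ce + alpha * con

# Toy batch: two samples of class 0 sharing an embedding, one sample of class 1 far away.
embeddings = [[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]]
logits = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
labels = [0, 0, 1]
loss = joint_loss(embeddings, logits, labels)
```

In this toy batch the contrastive term is zero, because same-class embeddings coincide and the different-class embedding lies beyond the margin, so the total reduces to the cross-entropy of uniform two-class logits. In a real model both terms would be computed on network outputs and minimized together by gradient descent.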
Related papers
- Towards Robust Voice Pathology Detection (2019) 13.74
- Enhancing Low-quality Voice Recordings Using Disentangled Channel Factor And Neural Waveform Model (2020) 0.00
- Deep Embeddings For Robust User-based Amateur Vocal Percussion Classification (2022) 0.00
- Improving Robustness Of One-shot Voice Conversion With Deep Discriminative Speaker Encoder (2021) 5.84
- Feature Enhancement With Deep Feature Losses For Speaker Verification (2019) 10.61
- Learning To Detect Dysarthria From Raw Speech (2018) 11.85
- Efficient Speech Quality Assessment Using Self-supervised Framewise Embeddings (2022) 5.84
- A Comparative Re-assessment Of Feature Extractors For Deep Speaker Embeddings (2020) 8.09