NOMAD: Unsupervised Learning Of Perceptual Embeddings For Speech Enhancement And Non-matching Reference Audio Quality Assessment
2023 · Alessandro Ragano, Jan Skoglund, Andrew Hines
Abstract
This paper presents NOMAD (Non-Matching Audio Distance), a differentiable perceptual similarity metric that measures the distance of a degraded signal against non-matching references. The proposed method is based on learning deep feature embeddings via a triplet loss guided by the Neurogram Similarity Index Measure (NSIM) to capture degradation intensity. During inference, the similarity score between any two audio samples is computed as the Euclidean distance between their embeddings. NOMAD is fully unsupervised and can be used in general perceptual audio tasks, both for audio analysis (e.g., quality assessment) and for generative tasks such as speech enhancement and speech synthesis. The proposed method is evaluated on three tasks: ranking degradation intensity, predicting speech quality, and serving as a loss function for speech enhancement. Results indicate NOMAD outperforms other non-matching reference approaches in both ranking degradation intensity and quality assessment, exhibiting competitive performanc
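The two mechanisms named in the abstract, a triplet loss over embeddings and a Euclidean distance at inference, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the margin value, the rule for picking the positive sample from NSIM scores, and the averaging over several references are all assumptions made for illustration, and the embeddings are plain vectors rather than outputs of a trained network.

```python
import numpy as np


def euclidean_distance(a, b):
    """Euclidean distance between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.linalg.norm(a - b))


def pick_positive_negative(anchor_nsim, nsim_a, nsim_b):
    """Hypothetical triplet selection: the candidate whose NSIM score is
    closer to the anchor's is treated as the positive (index 0 or 1)."""
    if abs(anchor_nsim - nsim_a) <= abs(anchor_nsim - nsim_b):
        return 0, 1
    return 1, 0


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).
    The margin of 1.0 is an assumed default, not a value from the paper."""
    d_pos = euclidean_distance(anchor, positive)
    d_neg = euclidean_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def nonmatching_distance(test_emb, reference_embs):
    """Assumed scoring rule: mean distance from the test embedding to a set
    of reference embeddings that need not share content with the test."""
    return float(np.mean([euclidean_distance(test_emb, r) for r in reference_embs]))


if __name__ == "__main__":
    anchor, close, far = [0.0, 0.0], [1.0, 0.0], [3.0, 0.0]
    print(triplet_loss(anchor, close, far))           # triplet already satisfied
    print(nonmatching_distance([3.0, 4.0], [[0.0, 0.0], [0.0, 0.0]]))
```

In a real system the embeddings would come from a learned encoder and the loss would be written in an autodiff framework so that it can be used for training or as an enhancement loss; the sketch only shows the arithmetic.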
Related papers
- MetricNet: Towards Improved Modeling For Non-intrusive Speech Quality Assessment (2021)
- Multi-CMGAN+/+: Leveraging Multi-objective Speech Quality Metric Prediction For Speech Enhancement (2023)
- Towards Evaluating Generative Audio: Insights From Neural Audio Codec Embedding Distances (2025)
- Multi-metric Optimization Using Generative Adversarial Networks For Near-end Speech Intelligibility Enhancement (2021)
- More For Less: Non-intrusive Speech Quality Assessment With Limited Annotations (2021)
- Perceive And Predict: Self-supervised Speech Representation Based Loss Functions For Speech Enhancement (2023)
- Non-intrusive Speech Quality Assessment Using Neural Networks (2019)
- On The Behavior Of Intrusive And Non-intrusive Speech Enhancement Metrics In Predictive And Generative Settings (2023)