TorchAudio-Squim: Reference-less Speech Quality and Intelligibility Measures in TorchAudio
2023 · Anurag Kumar, Ke Tan, Zhaoheng Ni, et al.
Abstract
Measuring the quality and intelligibility of a speech signal is usually a critical step in the development of speech processing systems. To enable this, a variety of metrics that measure quality and intelligibility under different assumptions have been developed. In this paper, we introduce tools and a set of models to estimate such known metrics using deep neural networks. These models are made available in the well-established TorchAudio library, the core audio and speech processing library within the PyTorch deep learning framework. We refer to it as TorchAudio-Squim, TorchAudio-Speech QUality and Intelligibility Measures. More specifically, in the current version of TorchAudio-Squim, we establish and release models for estimating PESQ, STOI and SI-SDR among objective metrics and MOS among subjective metrics. We develop a novel approach for objective metric estimation and use a recently developed approach for subjective metric estimation. These models operate in a "reference-less" manner
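To make concrete what one of the objective metrics measures, here is a minimal sketch of the conventional, reference-based SI-SDR computation. It is not the paper's model: the Squim models estimate this kind of value from the degraded signal alone, whereas the formula below needs the clean reference. The function name `si_sdr`, the plain-list inputs and the zero-mean normalization step are assumptions made for illustration.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant signal-to-distortion ratio in dB.

    Intrusive metric: it needs the clean `reference` signal.
    Both inputs are equal-length sequences of float samples.
    """
    if len(estimate) != len(reference) or len(reference) == 0:
        raise ValueError("signals must be non-empty and of equal length")

    # Remove the mean from both signals (a common convention, assumed here).
    mean_e = sum(estimate) / len(estimate)
    mean_r = sum(reference) / len(reference)
    e = [x - mean_e for x in estimate]
    r = [x - mean_r for x in reference]

    ref_energy = sum(x * x for x in r)
    if ref_energy == 0.0:
        raise ValueError("reference signal has zero energy")

    # Project the estimate onto the reference to get the scaled target.
    alpha = sum(a * b for a, b in zip(e, r)) / ref_energy
    target = [alpha * x for x in r]

    # Everything in the estimate that is not the scaled target is distortion.
    distortion = [a - t for a, t in zip(e, target)]

    target_energy = sum(t * t for t in target)
    distortion_energy = sum(d * d for d in distortion)
    if distortion_energy == 0.0:
        return math.inf  # perfect reconstruction up to scale
    if target_energy == 0.0:
        return -math.inf  # estimate is orthogonal to the reference
    return 10.0 * math.log10(target_energy / distortion_energy)


if __name__ == "__main__":
    clean = [1.0, -1.0, 1.0, -1.0]
    noise = [0.1, 0.1, -0.1, -0.1]  # orthogonal to `clean`
    noisy = [c + n for c, n in zip(clean, noise)]
    print(f"SI-SDR: {si_sdr(noisy, clean):.2f} dB")
```

In the toy example, the distortion carries 1/100 of the target energy, so the result is about 20 dB, and rescaling `noisy` by any nonzero constant leaves it unchanged. A reference-less estimator aims to predict such a score when `clean` is unavailable. In TorchAudio releases that include the Squim models, the pretrained bundles are, to the best of our knowledge, exposed under `torchaudio.pipelines` as `SQUIM_OBJECTIVE` and `SQUIM_SUBJECTIVE`; check the library documentation for the exact usage.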
Related papers
- Non-intrusive Speech Quality Assessment Using Neural Networks (2019)
- SQuId: Measuring Speech Naturalness in Many Languages (2022)
- MetricNet: Towards Improved Modeling for Non-intrusive Speech Quality Assessment (2021)
- TorchAudio 2.1: Advancing Speech Recognition, Self-supervised Learning, and Audio Processing Components for PyTorch (2023)
- ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric (2020)
- InQSS: A Speech Intelligibility and Quality Assessment Model Using a Multi-task Learning Network (2021)
- STOI-Net: A Deep Learning Based Non-intrusive Speech Intelligibility Assessment Model (2020)
- More for Less: Non-intrusive Speech Quality Assessment with Limited Annotations (2021)