STOI-Net: A Deep Learning Based Non-intrusive Speech Intelligibility Assessment Model
2020 · Ryandhimas E. Zezario, Szu-Wei Fu, Chiou-Shann Fuh, et al.
Abstract
The calculation of most objective speech intelligibility assessment metrics requires clean speech as a reference. Such a requirement may limit the applicability of these metrics in real-world scenarios. To overcome this limitation, we propose a deep learning-based non-intrusive speech intelligibility assessment model, namely STOI-Net. The input and output of STOI-Net are speech spectral features and predicted STOI scores, respectively. The model is formed by the combination of a convolutional neural network and bidirectional long short-term memory (CNN-BLSTM) architecture with a multiplicative attention mechanism. Experimental results show that the STOI score estimated by STOI-Net has a good correlation with the actual STOI score when tested with noisy and enhanced speech utterances. The correlation values are 0.97 and 0.83, respectively, for the seen test condition (the test speakers and noise types are involved in the training set) and the unseen test condition (the test speakers and
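The abstract evaluates STOI-Net by the correlation between predicted and actual STOI scores. A minimal sketch of such a check follows, assuming the Pearson linear correlation coefficient (the excerpt above does not say which correlation measure was used). The function name `pearson_correlation` and the score lists are hypothetical and made up for illustration; they are not the authors' code or data.

```python
import math


def pearson_correlation(predicted, reference):
    """Pearson linear correlation between two equal-length score sequences."""
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    # Sums of squared deviations (unnormalised variances).
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_r)


# Made-up per-utterance scores for illustration only; not results from the paper.
reference_stoi = [0.52, 0.61, 0.70, 0.78, 0.85, 0.93]  # intrusive STOI (needs clean speech)
predicted_stoi = [0.55, 0.60, 0.68, 0.80, 0.83, 0.95]  # hypothetical model outputs

print(f"correlation = {pearson_correlation(predicted_stoi, reference_stoi):.3f}")
```

In practice the reference scores would come from an intrusive STOI implementation run on paired clean and degraded utterances, and the predictions from the non-intrusive model run on the degraded utterances alone.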
Related papers
- Monaural Speech Enhancement Using Deep Neural Networks By Maximizing A Short-time Objective Intelligibility Measure (2018) · 11.76
- MetricNet: Towards Improved Modeling For Non-intrusive Speech Quality Assessment (2021) · 0.00
- MOSNet: Deep Learning Based Objective Assessment For Voice Conversion (2019) · 16.90
- On The Relationship Between Short-time Objective Intelligibility And Short-time Spectral-amplitude Mean-square Error For Speech Enhancement (2018) · 9.23
- Quality-Net: An End-to-end Non-intrusive Speech Quality Assessment Model Based On BLSTM (2018) · 15.62
- Non-intrusive Speech Quality Assessment Using Neural Networks (2019) · 13.74
- InQSS: A Speech Intelligibility And Quality Assessment Model Using A Multi-task Learning Network (2021) · 9.76
- An Attention Long Short-term Memory Based System For Automatic Classification Of Speech Intelligibility (2024) · 12.33