SpeechBERTScore: Reference-aware Automatic Evaluation Of Speech Generation Leveraging NLP Evaluation Metrics
2024 · Takaaki Saeki, Soumi Maiti, Shinnosuke Takamichi, et al.
Abstract
While subjective assessments have been the gold standard for evaluating speech generation, there is a growing need for objective metrics that correlate highly with human subjective judgments, because such metrics are cost-efficient. This paper proposes reference-aware automatic evaluation methods for speech generation inspired by evaluation metrics in natural language processing. The proposed SpeechBERTScore computes the BERTScore on self-supervised dense speech features of the generated and reference speech, which can differ in sequence length. We also propose SpeechBLEU and SpeechTokenDistance, which are computed on discrete speech tokens. Evaluations on synthesized speech show that our methods correlate better with human subjective ratings than mel cepstral distortion and a recent mean opinion score prediction model. They are also effective for evaluating noisy speech and are applicable across languages.
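As a rough illustration of the idea in the abstract, the sketch below applies the standard BERTScore greedy-matching definition (precision, recall, F1 over pairwise cosine similarities) to two feature sequences of different lengths. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `bertscore_from_features` is hypothetical, the inputs are assumed to be already-extracted frame-level features of shape (frames, dimensions), and the abstract does not say which self-supervised model is used or which of the three values is reported as SpeechBERTScore.

```python
import numpy as np


def bertscore_from_features(gen, ref):
    """BERTScore-style greedy matching on two feature sequences.

    gen: array of shape (T_gen, D), features of the generated speech.
    ref: array of shape (T_ref, D), features of the reference speech.
    T_gen and T_ref may differ. Returns (precision, recall, f1).
    Hypothetical helper for illustration only.
    """
    gen = np.asarray(gen, dtype=np.float64)
    ref = np.asarray(ref, dtype=np.float64)

    # L2-normalise each frame so that dot products are cosine similarities.
    gen_n = gen / np.linalg.norm(gen, axis=1, keepdims=True)
    ref_n = ref / np.linalg.norm(ref, axis=1, keepdims=True)

    # Pairwise cosine similarity matrix of shape (T_gen, T_ref).
    sim = gen_n @ ref_n.T

    # Precision: each generated frame is matched to its most similar reference frame.
    precision = float(sim.max(axis=1).mean())
    # Recall: each reference frame is matched to its most similar generated frame.
    recall = float(sim.max(axis=0).mean())

    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom != 0.0 else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Random arrays stand in for real self-supervised speech features.
    rng = np.random.default_rng(0)
    ref_feats = rng.normal(size=(120, 16))
    gen_feats = rng.normal(size=(95, 16))
    print(bertscore_from_features(gen_feats, ref_feats))
```

Because the matching is a maximum over the similarity matrix rather than a frame-by-frame alignment, no time alignment or length equalisation is needed before scoring.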
Related papers
- Objective Evaluation Of Prosody And Intelligibility In Speech Synthesis Via Conditional Prediction Of Discrete Tokens (2025) 0.00
- Pairwise Evaluation Of Accent Similarity In Speech Synthesis (2025) 3.58
- A Textless Metric For Speech-to-speech Comparison (2022) 0.00
- A Reference-less Quality Metric For Automatic Speech Recognition Via Contrastive-learning Of A Multi-language Model With Self-supervision (2023) 2.51
- Evaluating User Perception Of Speech Recognition System Quality With Semantic Distance Metric (2021) 6.77
- H_eval: A New Hybrid Evaluation Metric For Automatic Speech Recognition Tasks (2022) 6.34
- On The Behavior Of Intrusive And Non-intrusive Speech Enhancement Metrics In Predictive And Generative Settings (2023) 0.00
- Evaluating Subtitle Segmentation For End-to-end Generation Systems (2022) 0.00