Pairwise Evaluation Of Accent Similarity In Speech Synthesis
2025 · Jinzuomu Zhong, Suyuan Liu, Dan Wells, et al.
Abstract
Despite growing interest in generating high-fidelity accents, the evaluation of accent similarity in speech synthesis remains underexplored. We aim to enhance both subjective and objective evaluation methods for accent similarity. Subjectively, we refine the XAB listening test by adding components that achieve higher statistical significance with fewer listeners and lower costs. Our method involves providing listeners with transcriptions, having them highlight perceived accent differences, and implementing meticulous screening for reliability. Objectively, we utilise pronunciation-related metrics, based on distances between vowel formants and phonetic posteriorgrams, to evaluate accent generation. Comparative experiments reveal that these metrics, alongside accent similarity, speaker similarity, and Mel Cepstral Distortion, can be used for this purpose. Moreover, our findings underscore significant limitations of common metrics like Word Error Rate in assessing underrepresented accents.
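For readers unfamiliar with two of the objective metrics named in the abstract, the sketch below shows the textbook definitions of Mel Cepstral Distortion (MCD) and Word Error Rate (WER). It is an illustration only, not the paper's implementation: the function names are hypothetical, the MCD version assumes the two mel-cepstral sequences are already time-aligned (for example by dynamic time warping) and skips coefficient 0, and the WER version assumes whitespace-tokenised text with no normalisation.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Mean MCD in dB over frames assumed to be time-aligned already."""
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("expected two non-empty, equally long frame sequences")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        # Coefficient 0 (overall energy) is conventionally excluded.
        squared = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(2.0 * squared)
    return total / len(ref_frames)


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)
```

In practice WER is computed on the output of a speech recogniser run over the synthesised audio, so the score also reflects how well that recogniser handles the accent in question, which is consistent with the limitation the abstract reports for underrepresented accents.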
Related papers
- Objective Evaluation Of Prosody And Intelligibility In Speech Synthesis Via Conditional Prediction Of Discrete Tokens (2025) 0.00
- Synthetic Cross-accent Data Augmentation For Automatic Speech Recognition (2023) 0.00
- Improving Accent Conversion With Reference Encoder And End-to-end Text-to-speech (2020) 0.00
- SpeechBERTScore: Reference-aware Automatic Evaluation Of Speech Generation Leveraging NLP Evaluation Metrics (2024) 10.74
- Location, Location: Enhancing The Evaluation Of Text-to-speech Synthesis Using The Rapid Prosody Transcription Paradigm (2021) 3.58
- Multi-scale Accent Modeling And Disentangling For Multi-speaker Multi-accent Text-to-speech Synthesis (2024) 2.26
- Disentangling Segmental And Prosodic Factors To Non-native Speech Comprehensibility (2024) 0.00
- English Accent Accuracy Analysis In A State-of-the-art Automatic Speech Recognition System (2021) 0.00