SOMOS: The Samsung Open MOS Dataset For The Evaluation Of Neural Text-to-speech Synthesis
2022 · Georgia Maniati, Alexandra Vioni, Nikolaos Ellinas, et al.
Abstract
In this work, we present the SOMOS dataset, the first large-scale mean opinion scores (MOS) dataset consisting solely of neural text-to-speech (TTS) samples. It can be employed to train automatic MOS prediction systems focused on the assessment of modern synthesizers, and can stimulate advancements in acoustic model evaluation. It consists of 20K synthetic utterances of the LJ Speech voice, a public domain speech dataset that is a common benchmark for building neural acoustic models and vocoders. Utterances are generated from 200 TTS systems, including vanilla neural acoustic models as well as models that allow prosodic variations. An LPCNet vocoder is used for all systems, so that the samples' variation depends only on the acoustic models. The synthesized utterances provide balanced and adequate domain and length coverage. We collect MOS naturalness evaluations on 3 English Amazon Mechanical Turk locales and share practices leading to reliable crowdsourced annotations for this task.
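As an illustration of how crowdsourced naturalness ratings are typically turned into MOS values, the sketch below averages individual listener scores per system and per utterance. It is a minimal example under stated assumptions: the tuple layout, the identifiers (`sys_a`, `utt_1`) and the helper `mos_by_key` are hypothetical and do not reflect the actual SOMOS file format or the authors' processing pipeline.

```python
from collections import defaultdict
from statistics import fmean


def mos_by_key(ratings, key_index):
    """Average the listener scores grouped by the field at key_index.

    Each rating is a (system_id, utterance_id, score) tuple; this layout
    is an assumption made for the example, not the SOMOS schema.
    """
    groups = defaultdict(list)
    for row in ratings:
        groups[row[key_index]].append(row[2])
    return {key: fmean(scores) for key, scores in groups.items()}


# Hypothetical toy ratings on a 1-5 naturalness scale.
ratings = [
    ("sys_a", "utt_1", 4),
    ("sys_a", "utt_1", 5),
    ("sys_a", "utt_2", 3),
    ("sys_b", "utt_3", 2),
    ("sys_b", "utt_3", 3),
]

system_mos = mos_by_key(ratings, 0)     # one MOS per TTS system
utterance_mos = mos_by_key(ratings, 1)  # one MOS per synthesized utterance
```

Utterance-level averages of this kind are the usual training targets for an automatic MOS predictor, while system-level averages are what is normally used to rank synthesizers.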
Related papers
- SingMOS: An Extensive Open-source Singing Voice Dataset For MOS Prediction (2024)
- Investigating Content-aware Neural Text-to-speech MOS Prediction Using Prosodic And Linguistic Features (2022)
- SAMOS: A Neural MOS Prediction Model Leveraging Semantic Representations And Acoustic Features (2024)
- AutoMOS: Learning A Non-intrusive Assessor Of Naturalness-of-speech (2016)
- Resource-efficient Fine-tuning Strategies For Automatic MOS Prediction In Text-to-speech For Low-resource Languages (2023)
- LDNet: Unified Listener Dependent Modeling In MOS Prediction For Synthetic Speech (2021)
- A Text-to-speech Pipeline, Evaluation Methodology, And Initial Fine-tuning Results For Child Speech Synthesis (2022)
- MnTTS: An Open-source Mongolian Text-to-speech Synthesis Dataset And Accompanied Baseline (2022)