VERSA: A Versatile Evaluation Toolkit For Speech, Audio, And Music
2024 · Jiatong Shi, Hye-Jin Shim, Jinchuan Tian, et al.
Abstract
In this work, we introduce VERSA, a unified and standardized evaluation toolkit designed for a wide range of speech, audio, and music signals. The toolkit features a Pythonic interface with flexible configuration and dependency control, making it user-friendly and efficient. With full installation, VERSA offers 65 metrics with 729 metric variations based on different configurations. These metrics draw on diverse external resources, including matching and non-matching reference audio, text transcriptions, and text captions. As a lightweight yet comprehensive toolkit, VERSA is versatile enough to support the evaluation of a wide range of downstream scenarios. To demonstrate its capabilities, this work highlights example use cases for VERSA, including audio coding, speech synthesis, speech enhancement, singing synthesis, and music generation. The toolkit is available at https://github.com/wavlab-speech/versa.
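The abstract describes two ideas that are easy to picture in code: metrics that depend on different external resources (matching reference audio, transcriptions, captions), and a single metric expanding into several "variations" through configuration. The sketch below is a minimal, hypothetical illustration of that pattern, assuming a simple registry in which each metric declares the resources it needs and is skipped when they are missing. `EvalInputs`, `Metric`, `REGISTRY`, `evaluate`, and the `snr`/`rms` entries are all illustrative names, not VERSA's actual API or configuration format; consult the repository for the real interface.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


@dataclass
class EvalInputs:
    """One sample to evaluate; optional fields stand in for external resources (hypothetical)."""
    pred: Sequence[float]                   # system output waveform
    ref: Optional[Sequence[float]] = None   # matching reference audio
    text: Optional[str] = None              # reference transcription
    caption: Optional[str] = None           # text caption


@dataclass(frozen=True)
class Metric:
    fn: Callable[..., float]
    requires: frozenset                     # names of EvalInputs fields the metric needs


def snr_db(pred, ref, eps=1e-12):
    """Signal-to-noise ratio of pred against a matching reference, in dB."""
    signal = sum(r * r for r in ref)
    noise = sum((r - p) ** 2 for r, p in zip(ref, pred))
    return 10.0 * math.log10((signal + eps) / (noise + eps))


def rms_db(pred, floor_db=-100.0):
    """Reference-free RMS level of pred in dB, never reported below floor_db."""
    rms = math.sqrt(sum(p * p for p in pred) / len(pred))
    if rms == 0:
        return floor_db
    return max(20.0 * math.log10(rms), floor_db)


# Hypothetical registry: one reference-based metric, one reference-free metric.
REGISTRY: Dict[str, Metric] = {
    "snr": Metric(snr_db, frozenset({"ref"})),
    "rms": Metric(rms_db, frozenset()),
}


def evaluate(config: List[dict], inputs: EvalInputs) -> Dict[str, Optional[float]]:
    """Run each configured metric whose required resources are available."""
    available = {k for k in ("ref", "text", "caption") if getattr(inputs, k) is not None}
    results: Dict[str, Optional[float]] = {}
    for entry in config:
        metric = REGISTRY[entry["name"]]
        # A distinct "key" lets the same metric appear with different settings.
        key = entry.get("key", entry["name"])
        if not metric.requires <= available:
            results[key] = None  # skipped: a required resource is missing
            continue
        resources = {k: getattr(inputs, k) for k in metric.requires}
        results[key] = metric.fn(inputs.pred, **resources, **entry.get("params", {}))
    return results


pred, ref = [0.4, -0.5, 0.5, -0.5], [0.5, -0.5, 0.5, -0.5]
config = [
    {"name": "snr"},
    {"name": "rms"},
    {"name": "rms", "key": "rms_floor0", "params": {"floor_db": 0.0}},  # a second variation
]
print(evaluate(config, EvalInputs(pred=pred, ref=ref)))
print(evaluate(config, EvalInputs(pred=pred)))  # "snr" is None: no reference supplied
```

The design choice illustrated here is that resource requirements live with each metric rather than with the caller, so the same configuration can be reused across scenarios with and without reference audio; how VERSA itself resolves missing resources may differ.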
Related papers
- ViSQOL v3: An Open Source Production Ready Objective Speech And Audio Metric (2020)
- ESPnet-Codec: Comprehensive Training And Evaluation Of Neural Codecs For Audio, Music, And Speech (2024)
- Audioeval: Automatic Dual-perspective And Multi-dimensional Evaluation Of Text-to-audio-generation (2025)
- Muskits-ESPnet: A Comprehensive Toolkit For Singing Voice Synthesis In New Paradigm (2024)
- H_eval: A New Hybrid Evaluation Metric For Automatic Speech Recognition Tasks (2022)
- OpenACE: An Open Benchmark For Evaluating Audio Coding Performance (2024)
- Investigations In Audio Captioning: Addressing Vocabulary Imbalance And Evaluating Suitability Of Language-centric Performance Metrics (2022)
- Semantic-WER: A Unified Metric For The Evaluation Of ASR Transcript For End Usability (2021)