ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech
2024 · Jiatong Shi, Jinchuan Tian, Yihan Wu, et al.
Abstract
Neural codecs have become crucial to recent speech and audio generation research. Beyond their signal compression capabilities, discrete codecs have also been found to improve downstream training efficiency and compatibility with autoregressive language models. However, as a growing range of downstream applications is investigated, it has become difficult to ensure fair comparisons across them. To address this, we present ESPnet-Codec, a new open-source platform built on ESPnet that focuses on neural codec training and evaluation. ESPnet-Codec offers recipes for audio, music, and speech that cover training and evaluation with several widely adopted codec models. Together with ESPnet-Codec, we present VERSA, a standalone evaluation toolkit, which provides a comprehensive evaluation of codec performance over 20 audio evaluation metrics. Notably, we demonstrate that ESPnet-Codec can be integrated into six ESPnet tasks, supporting diverse applications.
Related papers
- The 2020 ESPnet Update: New Features, Broadened Applications, Performance Improvements, and Future Plans (2020)
- FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec (2023)
- ESPnet: End-to-End Speech Processing Toolkit (2018)
- ESPnet-SE: End-to-End Speech Enhancement and Separation Toolkit Designed for ASR Integration (2020)
- PSCodec: A Series of High-fidelity Low-bitrate Neural Speech Codecs Leveraging Prompt Encoders (2024)
- Neural Speech and Audio Coding: Modern AI Technology Meets Traditional Codecs (2024)
- Codec-SUPERB @ SLT 2024: A Lightweight Benchmark for Neural Audio Codec Models (2024)
- Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm (2024)