Abstract

This paper introduces ESPnet, a new open-source platform for end-to-end speech processing. ESPnet focuses mainly on end-to-end automatic speech recognition (ASR) and adopts two widely used dynamic neural network toolkits, Chainer and PyTorch, as its main deep learning engines. ESPnet also follows the style of the Kaldi ASR toolkit for data processing, feature extraction and formats, and recipes, providing a complete setup for speech recognition and other speech processing experiments. This paper describes the overall architecture of the software platform, several important functionalities that differentiate ESPnet from other open-source ASR toolkits, and experimental results on major ASR benchmarks.

Authors

(none)

Tags

  • Speech Recognition
  • Text-to-Speech

Stats

  • citations: 903
  • S2 citations: (none)
  • github stars: 0
  • HF likes: 0
  • heat score: 22.17
  • arxiv key: watanabe2018espnet

Related papers