Feature Joint-state Posterior Estimation In Factorial Speech Processing Models Using Deep Neural Networks
2017 · Mahdi Khademian, Mohammad Mehdi Homayounpour
Abstract
This paper proposes a new method for calculating joint-state posteriors of mixed-audio features using deep neural networks, for use in factorial speech processing models. Factorial models require joint-state posterior information to perform joint decoding. The novelty of this work is its architecture, which enables the network to infer joint-state posteriors from pairs of state posteriors of stereo features. The paper defines an objective function that solves an underdetermined system of equations; the network uses this objective to extract joint-state posteriors. It develops the required expressions for fine-tuning the network in a unified way. The experiments compare the proposed network's decoding results to those of the vector Taylor series method and show a 2.3% absolute performance improvement in the monaural speech separation and recognition challenge. This achievement is substantial when we consider the simplicity of joint-state posterior extraction provided by deep
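To illustrate why recovering a joint-state posterior from a pair of per-source state posteriors is an underdetermined problem, here is a minimal sketch. It assumes, as an illustration only, that the system of equations consists of marginalization constraints (row sums and column sums of the joint posterior must match the two per-source posteriors); the abstract does not spell out the paper's actual equations or objective. The state counts, probability values, and function names are all hypothetical.

```python
import numpy as np


def marginal_constraint_matrix(n_a, n_b):
    """Return A such that A @ joint.ravel() stacks row sums, then column sums.

    Assumption for illustration: the joint posterior is an (n_a, n_b) matrix
    flattened in row-major order.
    """
    row_sums = np.kron(np.eye(n_a), np.ones((1, n_b)))  # shape (n_a, n_a * n_b)
    col_sums = np.kron(np.ones((1, n_a)), np.eye(n_b))  # shape (n_b, n_a * n_b)
    return np.vstack([row_sums, col_sums])


# Hypothetical per-source state posteriors for one frame.
p_a = np.array([0.6, 0.3, 0.1])
p_b = np.array([0.5, 0.3, 0.2])

A = marginal_constraint_matrix(len(p_a), len(p_b))
target = np.concatenate([p_a, p_b])

# One joint posterior consistent with the marginals: the independent product.
independent = np.outer(p_a, p_b)

# A second one: add a perturbation whose rows and columns each sum to zero.
perturbation = 0.05 * np.array([[1.0, -1.0, 0.0],
                                [-1.0, 1.0, 0.0],
                                [0.0, 0.0, 0.0]])
correlated = independent + perturbation

for joint in (independent, correlated):
    assert np.all(joint >= 0.0)
    assert np.allclose(A @ joint.ravel(), target)

# Fewer independent equations than unknowns, so the solution is not unique.
n_unknowns = A.shape[1]
n_independent_equations = np.linalg.matrix_rank(A)
print(n_unknowns, n_independent_equations)
```

Both joint matrices reproduce the same pair of marginals, so the constraints alone cannot pick one; some additional criterion, such as the objective function the abstract mentions, is needed to select a solution.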
Related papers
- Revisiting Joint Decoding Based Multi-talker Speech Recognition With DNN Acoustic Model (2021)
- Deep Factorization For Speech Signal (2018)
- Joint Modeling Of Code-switched And Monolingual ASR Via Conditional Factorization (2021)
- Tied Hidden Factors In Neural Networks For End-to-end Speaker Recognition (2018)
- Learning-based A Posteriori Speech Presence Probability Estimation And Applications (2025)
- Articulatory Representation Learning Via Joint Factor Analysis And Neural Matrix Factorization (2022)
- Joint Speaker Features Learning For Audio-visual Multichannel Speech Separation And Recognition (2024)
- Bayesian Learning Of LF-MMI Trained Time Delay Neural Networks For Speech Recognition (2020)