Multi-path RNN For Hierarchical Modeling Of Long Sequential Data And Its Application To Speaker Stream Separation
2020 · Keisuke Kinoshita, Thilo von Neumann, Marc Delcroix, et al.
Abstract
Recently, source separation performance has been greatly improved by time-domain audio source separation based on the dual-path recurrent neural network (DPRNN). DPRNN is a simple but effective model for long sequential data. While DPRNN is quite efficient at modeling sequential data of utterance length, i.e., about 5 to 10 seconds, it is harder to apply to longer sequences such as whole conversations consisting of multiple utterances. This is simply because, in such a case, the number of time steps consumed by its internal module, called the inter-chunk RNN, becomes extremely large. To mitigate this problem, this paper proposes a multi-path RNN (MPRNN), a generalized version of DPRNN, that models the input data in a hierarchical manner. In the MPRNN framework, the input data is represented at several (≥3) time-resolutions, each of which is modeled by a specific RNN sub-module. For example, the RNN sub-module that deals with the finest resolution may model temporal relationship
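The time-step argument in the abstract can be illustrated with a small back-of-the-envelope sketch. It assumes nested, non-overlapping chunks, which is a simplification and not necessarily the segmentation used in the paper; the function name `path_lengths`, the frame count, and the chunk sizes are all hypothetical values chosen for illustration.

```python
import math


def path_lengths(total_frames, chunk_sizes):
    """Return the number of time steps each RNN path would process.

    chunk_sizes lists the chunk length at each nesting level, finest first.
    Assumption: chunks are nested and non-overlapping (a simplification).
    """
    lengths = []
    remaining = total_frames
    for size in chunk_sizes:
        # The RNN at this level runs over one chunk at a time.
        lengths.append(min(size, remaining))
        # The next, coarser level sees one step per chunk of this level.
        remaining = math.ceil(remaining / size)
    # The coarsest RNN runs over whatever sequence is left.
    lengths.append(remaining)
    return lengths


# Hypothetical long recording of 480000 frames.
dual_path = path_lengths(480000, [200])       # two paths
multi_path = path_lengths(480000, [200, 40])  # three paths

print(dual_path)   # [200, 2400]
print(multi_path)  # [200, 40, 60]
```

Under these assumed numbers, the coarsest RNN in the two-path setup has to run over 2400 steps, while adding one intermediate resolution keeps every path at 200 steps or fewer.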