Device-directed Utterance Detection
2018 · Sri Harish Mallidi, Roland Maas, Kyle Goehner, et al.
Abstract
In this work, we propose a classifier for distinguishing device-directed queries from background speech in the context of interactions with voice assistants. Applications include rejection of false wake-ups or unintended interactions as well as enabling wake-word-free follow-up queries. Consider the example interaction: ("Computer, play music", "Computer, reduce the volume"). In this interaction, the user needs to repeat the wake-word ("Computer") for the second query. To allow for more natural interactions, the device could immediately re-enter the listening state after the first query (without wake-word repetition) and accept or reject a potential follow-up as device-directed or background speech. The proposed model consists of two long short-term memory (LSTM) neural networks trained on acoustic features and automatic speech recognition (ASR) 1-best hypotheses, respectively. A feed-forward deep neural network (DNN) is then trained to combine the acoustic and 1-best embeddings, deriv
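The architecture described in the abstract (one LSTM over acoustic features, one LSTM over the ASR 1-best hypothesis, and a feed-forward DNN that combines the two embeddings) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Every concrete choice here is an assumption made for illustration: the layer sizes, the toy vocabulary, the use of the final LSTM hidden state as the embedding, the single ReLU hidden layer, the sigmoid output, and the class names `LSTMEncoder` and `CombinerDNN`. The weights are random and untrained, so the score carries no meaning beyond showing the data flow.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMEncoder:
    """Single-layer LSTM; the final hidden state serves as the embedding (assumption)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One stacked weight matrix for the input, forget, cell, and output gates.
        self.W = rng.normal(0.0, 0.1, size=(4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)

    def encode(self, sequence):
        h = np.zeros(self.hidden_dim)
        c = np.zeros(self.hidden_dim)
        for x in sequence:
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
        return h


class CombinerDNN:
    """Feed-forward network over the concatenated embeddings (hypothetical layout)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.W1 = rng.normal(0.0, 0.1, size=(hidden_dim, input_dim))
        self.b1 = np.zeros(hidden_dim)
        self.w2 = rng.normal(0.0, 0.1, size=hidden_dim)
        self.b2 = 0.0

    def score(self, acoustic_emb, text_emb):
        x = np.concatenate([acoustic_emb, text_emb])
        hidden = np.maximum(0.0, self.W1 @ x + self.b1)  # ReLU
        return float(sigmoid(self.w2 @ hidden + self.b2))


rng = np.random.default_rng(0)

# Toy vocabulary and word vectors, purely illustrative.
VOCAB = {"<unk>": 0, "play": 1, "music": 2, "reduce": 3, "the": 4, "volume": 5}
word_vectors = rng.normal(0.0, 0.1, size=(len(VOCAB), 8))

acoustic_lstm = LSTMEncoder(input_dim=40, hidden_dim=16, rng=rng)
text_lstm = LSTMEncoder(input_dim=8, hidden_dim=16, rng=rng)
combiner = CombinerDNN(input_dim=32, hidden_dim=16, rng=rng)

# Stand-in inputs: 50 frames of 40-dimensional acoustic features and a 1-best string.
frames = rng.normal(size=(50, 40))
hypothesis = "reduce the volume".split()
tokens = word_vectors[[VOCAB.get(w, VOCAB["<unk>"]) for w in hypothesis]]

acoustic_emb = acoustic_lstm.encode(frames)
text_emb = text_lstm.encode(tokens)
score = combiner.score(acoustic_emb, text_emb)
print(f"device-directed score: {score:.3f}")
```

In a follow-up mode like the one the abstract motivates, a trained version of such a score would be compared against a threshold to accept or reject the utterance as device-directed; the threshold itself would have to be tuned on held-out data.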