Real-Time Multilingual Gesture and Speech Recognition System for Text-Based Interaction Using YOLOv8

Abstract

This paper presents a simple and effective real-time system that converts both speech and sign language into text, helping people communicate more easily. The system uses YOLOv8, a modern deep learning model, trained on an American Sign Language (ASL) dataset to recognize hand gestures accurately. By combining gesture recognition with a speech recognition module, it can also convert spoken words into text in real time, enabling smooth communication between hearing and hearing-impaired individuals.

The system is integrated using the Flask web framework, which provides a simple and accessible interface for users. This design allows people to use the system on computers and can potentially be extended to mobile or low-cost edge devices in the future. It also includes multilingual support, converting recognized English text into regional languages such as Telugu and Hindi. This feature makes the system more inclusive and useful in multilingual environments.

Performance tests showed that YOLOv8 achieved high accuracy for gesture detection, and the speech recognition module performed reliably in both quiet and moderately noisy environments. The system operates in real time with low latency, making it practical for everyday conversations and natural interaction.

Overall, this system provides a comprehensive communication solution that bridges the gap between spoken language and sign language. By combining gesture recognition, speech-to-text conversion, and multilingual translation, it promotes accessibility and inclusivity, helping both hearing and hearing-impaired individuals communicate more effectively in daily life.
