On-Device Visual Text Acquisition and Speech Synthesis for Assistive Applications Using Raspberry Pi

Abstract

This work presents an offline text-reading system that performs real-time visual-to-speech conversion without depending on cloud services. The proposed architecture integrates Optical Character Recognition (OCR), Text-to-Speech (TTS), and voice-command control within a unified Python framework on the Raspberry Pi 4B. A structured image preprocessing pipeline built with OpenCV improves text clarity, while the LSTM-based Tesseract OCR engine and the eSpeak TTS engine provide efficient recognition and synthesis on resource-constrained hardware. The system runs in real time and remains reliable across diverse lighting and background conditions. Through optimized parallel execution and hardware–software co-design, it achieves low latency and consistent accuracy while operating entirely offline. The findings show that compact, energy-efficient assistive devices with text-reading capabilities can be built without relying on cloud services.
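The capture, preprocess, OCR, and speech stages described above can be illustrated with a minimal sketch. It rests on several assumptions that do not come from this work: Otsu binarisation stands in for the preprocessing pipeline (with OpenCV the same step is usually `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`), the `tesseract` and `espeak` command-line tools are the assumed interface to the engines, and "parallel execution" is modelled as a separate speech thread fed by a queue. The names `run_pipeline`, `ocr`, `speak`, `tesseract_cmd`, and `espeak_cmd` are hypothetical.

```python
"""Hypothetical sketch: preprocess -> OCR -> speech, with OCR and TTS overlapped.

Assumptions (not from the paper): Otsu binarisation as preprocessing, and the
`tesseract` / `espeak` CLIs as engine interfaces. Commands are only built here.
"""
import queue
import threading
from typing import Callable, Iterable, List

Image = List[List[int]]  # grayscale pixels in 0..255


def otsu_threshold(img: Image) -> int:
    """Return the threshold that maximises between-class variance (Otsu)."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_b = sum_b = 0  # pixel count and intensity sum of the dark class (<= t)
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break
        sum_b += t * hist[t]
        m_b, m_f = sum_b / w_b, (sum_all - sum_b) / w_f
        var = w_b * w_f * (m_b - m_f) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(img: Image) -> Image:
    """Map pixels above the Otsu threshold to 255 and the rest to 0."""
    t = otsu_threshold(img)
    return [[255 if p > t else 0 for p in row] for row in img]


def tesseract_cmd(image_path: str, lang: str = "eng") -> List[str]:
    # "stdout" prints recognised text; --oem 1 selects the LSTM engine.
    return ["tesseract", image_path, "stdout", "-l", lang, "--oem", "1"]


def espeak_cmd(text: str, wpm: int = 150) -> List[str]:
    # -s sets the speaking rate in words per minute.
    return ["espeak", "-s", str(wpm), text]


def run_pipeline(frames: Iterable[Image],
                 ocr: Callable[[Image], str],
                 speak: Callable[[str], None]) -> None:
    """Run OCR on the main thread while a worker thread speaks results."""
    texts: queue.Queue = queue.Queue()
    done = object()  # sentinel telling the speech worker to stop

    def tts_worker() -> None:
        while True:
            item = texts.get()
            if item is done:
                break
            speak(item)

    worker = threading.Thread(target=tts_worker)
    worker.start()
    for frame in frames:
        text = ocr(binarize(frame)).strip()
        if text:  # skip frames where no text was recognised
            texts.put(text)
    texts.put(done)
    worker.join()
```

On a device, `ocr` might save the binarised frame and run `tesseract_cmd(...)` through `subprocess.run(..., capture_output=True, text=True)`, and `speak` might run `espeak_cmd(text)` the same way. Queuing text to a separate thread lets recognition of the next frame overlap with speech output for the current one.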
