Abstract
Current wearable devices are constrained by power consumption and size, so voice recognition is mostly offloaded to a cell phone. Local deployment has the potential to improve privacy and security and to reduce latency. This paper synthesizes several studies in this field to analyze the low-power and low-latency advantages of Field Programmable Gate Array (FPGA) hardware over traditional Central Processing Unit (CPU) and CPU plus Graphics Processing Unit (GPU) solutions for speech recognition tasks. It also shows the significant potential of FPGAs for energy efficiency and real-time performance when combined with Spiking Neural Networks (SNNs) and their hardware optimization strategies. Experimental data from the surveyed studies show that the combination of FPGA hardware and SNNs can achieve 11.5× and 40× the energy efficiency of CPU and GPU implementations, which provides a feasible path for implementing local speech processing in future wearable devices. Potential future optimization directions, such as further optimizing the on-chip memory layout and finding alternatives to the CPU controller, are also analyzed based on the current state of the technology.