Abstract
Deepfake media has rapidly emerged as one of the most concerning consequences of recent advances in artificial intelligence and generative modelling. With tools such as Generative Adversarial Networks (GANs), facial reenactment models, and audio-cloning systems becoming publicly accessible, even non-experts can now fabricate highly realistic audio and video content. Detecting deepfake content has therefore become an important requirement for protecting online calling platforms. While much of the existing research focuses on identifying manipulated content offline, very few platforms provide solutions that function during real-time communication such as video conferencing and voice calls. This paper presents a hybrid deepfake detection system that analyzes both video and audio signals during live calls. A small, fast convolutional neural network checks the video for inconsistencies, while a separate spectrogram-based classifier examines sound patterns for anomalies. The system is designed to respond with minimal delay so that users can be alerted during a live call. Several experiments were carried out using public datasets sourced from Kaggle and GitHub, together with custom-generated deepfake clips, to evaluate real-time performance. The results demonstrate reasonably high detection accuracy and low latency even on mid-range hardware, making the system suitable for deployment on mobile devices or integration into existing communication software. This paper aims to contribute a student-developed, resource-efficient approach to safeguarding digital interactions that is applicable in real time.