Inspiration

Phishing attacks remain one of the most common cybersecurity threats, tricking users into revealing passwords, banking details, and personal information. Many users — especially students, elderly individuals, and remote workers — cannot easily recognize sophisticated phishing pages.

I wanted to explore whether a real-time AI agent could act as a personal cybersecurity guardian that continuously monitors a user’s screen and warns them before they fall victim to scams. With the multimodal reasoning capabilities of Gemini, it became possible to build an agent that not only reads text but also visually understands suspicious web pages.

This idea led to the creation of Spooky, an AI-powered phishing detection agent that analyzes screen content in real time and alerts users instantly.

What it does

Spooky is a real-time AI cybersecurity agent that monitors the user’s screen and detects phishing attempts before sensitive information is entered.

Key capabilities include:

• Captures periodic screenshots of the user's screen
• Uses OCR to detect suspicious phishing keywords and scam language
• Extracts and analyzes visible URLs for phishing indicators
• Sends suspicious screens to Gemini for deep visual analysis
• Triggers a fullscreen alert when a threat is detected
• Provides voice warnings and explains the threat to the user
• Allows users to ask questions through voice interaction
• Logs detected threats to Firebase Cloud Firestore
• Provides a dashboard for monitoring and remote control

The system acts like a continuous AI security assistant protecting users from phishing attacks.

How we built it

Spooky was built using a combination of Python-based computer vision, speech interfaces, and cloud services.

The system architecture consists of several layers:

  1. Screen Monitoring

PyAutoGUI captures screenshots of the user’s screen at intervals.
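A minimal sketch of the capture loop, assuming a fixed polling interval. `capture_frames`, `grab_screen`, and the 5-second default are hypothetical names and values, not taken from Spooky's code. PyAutoGUI's `screenshot()` returns a Pillow image; the capture function is injectable so the loop can be exercised without a display.

```python
import time
from typing import Callable, Iterator, Optional


def grab_screen():
    """Capture the full screen with PyAutoGUI; returns a PIL.Image.Image."""
    import pyautogui  # imported lazily so this module still loads on headless machines

    return pyautogui.screenshot()


def capture_frames(
    interval_s: float = 5.0,  # hypothetical default polling interval
    max_frames: Optional[int] = None,
    capture: Callable[[], object] = grab_screen,
) -> Iterator[object]:
    """Yield one screenshot roughly every `interval_s` seconds."""
    taken = 0
    while max_frames is None or taken < max_frames:
        started = time.monotonic()
        yield capture()  # the caller analyzes the frame before the generator resumes
        taken += 1
        if max_frames is not None and taken >= max_frames:
            break
        # Sleep only for what is left of the interval, so slow analysis does not
        # stretch the gap between captures.
        time.sleep(max(0.0, interval_s - (time.monotonic() - started)))
```

Each yielded frame would then be handed to the local detection layer before the next capture is taken.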

  2. Local Threat Detection

OpenCV preprocesses each screenshot to improve OCR accuracy.

Tesseract OCR then extracts the on-screen text, which is scanned for suspicious keywords and phishing language.
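A sketch of the local detection step, assuming grayscale conversion plus Otsu thresholding as the OpenCV preprocessing (a common choice; the write-up does not say which operations Spooky uses). The phrase list and weights are hypothetical. Running `ocr_text` requires `opencv-python`, `numpy`, `pytesseract`, and a Tesseract installation; `keyword_score` is plain Python.

```python
from typing import Dict, List, Tuple

# Hypothetical phrase list and weights, for illustration only.
SUSPICIOUS_PHRASES: Dict[str, int] = {
    "verify your account": 3,
    "confirm your password": 3,
    "account suspended": 3,
    "unusual activity": 2,
    "gift card": 2,
    "urgent": 1,
    "act now": 1,
}


def ocr_text(screenshot) -> str:
    """Preprocess a PIL screenshot with OpenCV, then OCR it with Tesseract."""
    import cv2  # lazy imports: only needed when processing real screenshots
    import numpy as np
    import pytesseract

    rgb = np.array(screenshot.convert("RGB"))
    gray = cv2.cvtColor(rgb, cv2.COLOR_RGB2GRAY)
    # Otsu thresholding picks a global black/white cutoff automatically,
    # which often gives Tesseract cleaner glyph edges than raw pixels.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return pytesseract.image_to_string(binary)


def keyword_score(text: str) -> Tuple[int, List[str]]:
    """Sum the weights of every suspicious phrase found in the OCR text."""
    normalized = " ".join(text.lower().split())  # OCR often splits phrases across lines
    hits = [phrase for phrase in SUSPICIOUS_PHRASES if phrase in normalized]
    return sum(SUSPICIOUS_PHRASES[p] for p in hits), hits
```

Normalizing whitespace before matching matters because OCR frequently breaks a phrase such as "verify your account" across two lines.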

  3. URL Analysis

Regex-based heuristics extract the URLs visible on screen and flag suspicious domains, fake login URLs, and other common phishing patterns.
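The URL layer could look something like the following sketch. The regex, the brand list, and the lure words are hypothetical illustrations rather than Spooky's actual rules; the checks shown (raw IP hosts, punycode, hyphen-heavy or deeply nested hostnames, brand names outside the brand's real domain, and login-style paths) are common phishing indicators.

```python
import re
from typing import Dict, List
from urllib.parse import urlsplit

# Hypothetical rule set for illustration; not Spooky's actual heuristics.
URL_RE = re.compile(
    r"(?:https?://)?"  # optional scheme
    r"(?:\d{1,3}(?:\.\d{1,3}){3}|(?:[a-z0-9-]+\.)+[a-z]{2,})"  # IPv4 or dotted hostname
    r"(?::\d+)?(?:/\S*)?",  # optional port and path
    re.IGNORECASE,
)
IPV4_RE = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")
BRANDS = {"paypal": "paypal.com", "microsoft": "microsoft.com", "google": "google.com"}
LURE_WORDS = ("login", "signin", "verify", "secure", "account", "update")


def url_flags(url: str) -> List[str]:
    """Return the heuristic red flags raised by a single URL."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = (parts.hostname or "").lower()
    flags = []
    if IPV4_RE.fullmatch(host):
        flags.append("ip-host")
    if "xn--" in host:  # punycode, often used for look-alike characters
        flags.append("punycode")
    if host.count("-") >= 2:
        flags.append("many-hyphens")
    if host.count(".") >= 4:
        flags.append("deep-subdomains")
    for brand, real_domain in BRANDS.items():
        # Brand name in the hostname, but the host is not the brand's real domain.
        if brand in host and host != real_domain and not host.endswith("." + real_domain):
            flags.append("brand-impersonation:" + brand)
    if any(word in parts.path.lower() for word in LURE_WORDS):
        flags.append("lure-path")
    return flags


def scan_urls(text: str) -> Dict[str, List[str]]:
    """Find URL-like strings in OCR text and keep those with at least one flag."""
    results: Dict[str, List[str]] = {}
    for match in URL_RE.finditer(text):
        url = match.group(0).rstrip(".,;:)")
        flags = url_flags(url)
        if flags:
            results[url] = flags
    return results
```

Because each rule is a cheap string check, this layer can run on every screenshot without noticeable cost.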

  4. AI Threat Analysis

Suspicious screenshots are sent to Gemini (gemini-2.5-flash-lite) for multimodal visual reasoning.

Gemini determines whether the content represents phishing or social engineering.
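A sketch of the Gemini call using the Google Gen AI Python SDK (`google-genai`). The prompt, the JSON verdict format, and the helper names are assumptions for illustration. Requesting `application/json` output makes the reply easier to parse reliably, and `parse_verdict` also tolerates a reply wrapped in a Markdown code fence.

```python
import json
from typing import Any, Dict

MODEL = "gemini-2.5-flash-lite"

# Hypothetical prompt; Spooky's actual prompt is not published in this write-up.
PROMPT = (
    "You are a phishing detector. Look at this screenshot and decide whether it shows "
    "a phishing or social-engineering attempt (fake login pages, urgent payment or "
    "password requests, brand impersonation). Reply with JSON only: "
    '{"is_phishing": true|false, "confidence": 0.0-1.0, "reason": "<one sentence>"}'
)


def parse_verdict(raw: str) -> Dict[str, Any]:
    """Turn the model's JSON reply into a normalized verdict dict."""
    text = raw.strip()
    if text.startswith("```"):  # tolerate a reply wrapped in a Markdown code fence
        text = text.strip("`")
        if text.lower().startswith("json"):
            text = text[4:]
    data = json.loads(text)
    return {
        "is_phishing": bool(data.get("is_phishing", False)),
        "confidence": float(data.get("confidence", 0.0)),
        "reason": str(data.get("reason", "")),
    }


def analyze_screenshot(png_bytes: bytes) -> Dict[str, Any]:
    """Send one PNG screenshot to Gemini and parse the verdict."""
    from google import genai  # lazy import: requires the google-genai package
    from google.genai import types

    client = genai.Client()  # reads the API key from GEMINI_API_KEY (or GOOGLE_API_KEY)
    response = client.models.generate_content(
        model=MODEL,
        contents=[types.Part.from_bytes(data=png_bytes, mime_type="image/png"), PROMPT],
        config=types.GenerateContentConfig(response_mime_type="application/json"),
    )
    return parse_verdict(response.text)
```

A PyAutoGUI screenshot can be converted to PNG bytes with `io.BytesIO` and Pillow's `Image.save(buffer, format="PNG")` before calling `analyze_screenshot`.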

  5. User Interaction

pyttsx3 speaks the warnings aloud.

SpeechRecognition captures spoken questions, so users can ask the AI about a threat and hear its answer.
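A minimal sketch of the voice layer. The spoken wording, the 5-second listen timeout, and the choice of `recognize_google` (SpeechRecognition's free Google Web Speech backend) are assumptions; the write-up does not say which recognizer Spooky uses. Microphone input through SpeechRecognition also requires PyAudio.

```python
from typing import Any, Dict, Optional


def warning_message(verdict: Dict[str, Any]) -> str:
    """Build the sentence read aloud for a verdict (hypothetical wording)."""
    reason = str(verdict.get("reason") or "This page shows signs of phishing").rstrip(".")
    percent = round(100 * float(verdict.get("confidence", 0.0)))
    return (
        f"Warning. {reason}. I am {percent} percent confident this is a scam. "
        "Do not enter passwords or payment details on this page."
    )


def speak(text: str) -> None:
    """Read text aloud with pyttsx3 (offline text-to-speech); blocks until finished."""
    import pyttsx3  # lazy import so the message logic can be tested without audio

    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()


def listen_for_question(timeout_s: float = 5.0) -> Optional[str]:
    """Record one spoken question and transcribe it; None if nothing usable was heard."""
    import speech_recognition as sr  # lazy import: requires SpeechRecognition and PyAudio

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        try:
            audio = recognizer.listen(source, timeout=timeout_s, phrase_time_limit=10)
        except sr.WaitTimeoutError:
            return None
    try:
        # Assumed backend: Google's free web speech endpoint via SpeechRecognition.
        return recognizer.recognize_google(audio)
    except (sr.UnknownValueError, sr.RequestError):
        return None
```

The transcribed question could then be passed to Gemini together with the current verdict to produce a spoken answer.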

  6. Cloud Logging

Threat events are stored in Firebase Cloud Firestore for monitoring and analysis.

A Firebase-hosted dashboard allows remote monitoring and control.
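A sketch of threat logging with the Firebase Admin SDK (`firebase-admin`). The document schema, the `threats` collection name, and the service-account key path are hypothetical; the write-up does not describe Spooky's Firestore layout.

```python
import datetime as dt
from typing import Any, Dict, Iterable


def build_threat_event(
    verdict: Dict[str, Any], urls: Iterable[str], keywords: Iterable[str]
) -> Dict[str, Any]:
    """Assemble one Firestore document describing a detected threat (hypothetical schema)."""
    return {
        "is_phishing": bool(verdict.get("is_phishing", False)),
        "confidence": float(verdict.get("confidence", 0.0)),
        "reason": str(verdict.get("reason", "")),
        "urls": sorted(set(urls)),
        "keywords": sorted(set(keywords)),
        # Firestore stores timezone-aware datetimes as native Timestamp values.
        "detected_at": dt.datetime.now(dt.timezone.utc),
    }


def log_threat(
    event: Dict[str, Any],
    key_path: str = "serviceAccountKey.json",  # hypothetical service-account key file
    collection: str = "threats",  # hypothetical collection name
) -> str:
    """Write the event to Cloud Firestore and return the new document ID."""
    import firebase_admin  # lazy import: requires the firebase-admin package
    from firebase_admin import credentials, firestore

    try:
        firebase_admin.get_app()  # reuse the default app if it is already initialized
    except ValueError:
        firebase_admin.initialize_app(credentials.Certificate(key_path))
    _, doc_ref = firestore.client().collection(collection).add(event)
    return doc_ref.id
```

Storing a client-side UTC timestamp keeps `build_threat_event` testable offline; `firestore.SERVER_TIMESTAMP` is the alternative if server time is preferred.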

This layered approach keeps Gemini usage low: inexpensive local checks run on every screenshot, and only screens that look suspicious are escalated to the model.
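One way the escalation rule could be expressed, assuming the local layers produce a keyword score and a count of URL red flags. `GeminiGate`, its thresholds, and the 30-second cooldown are hypothetical. The cooldown keyed on normalized screen text keeps a phishing page that stays open from triggering a model call on every capture.

```python
import hashlib
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class GeminiGate:
    """Decides whether a screenshot is worth a Gemini call (hypothetical thresholds)."""

    min_keyword_score: int = 3
    cooldown_s: float = 30.0
    _last_sent: Dict[str, float] = field(default_factory=dict, init=False, repr=False)

    def should_escalate(
        self,
        ocr_text: str,
        keyword_score: int,
        url_flag_count: int,
        now: Optional[float] = None,
    ) -> bool:
        now = time.monotonic() if now is None else now
        # Layer 1: cheap local signals. Nothing suspicious means no API call.
        if keyword_score < self.min_keyword_score and url_flag_count == 0:
            return False
        # Layer 2: skip screens whose text was already analyzed recently.
        digest = hashlib.sha256(" ".join(ocr_text.lower().split()).encode()).hexdigest()
        last = self._last_sent.get(digest)
        if last is not None and now - last < self.cooldown_s:
            return False
        self._last_sent[digest] = now
        return True
```

In a long-running agent, old entries in `_last_sent` would also need periodic pruning.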

Challenges we ran into

Several technical challenges arose while building Spooky:

• Reducing API costs: Constantly sending screenshots to Gemini would be expensive, so a multi-layer OCR and heuristic filtering system was implemented before invoking the AI.

• Reliable OCR detection: Extracting readable text from screenshots required preprocessing with OpenCV to improve OCR accuracy.

• Real-time responsiveness: The system needed to detect threats quickly while avoiding excessive CPU usage.

• Speech interaction stability: Implementing interruptible text-to-speech and voice recognition required careful thread management.

• Balancing false positives: The detection pipeline needed to be aggressive enough to catch phishing attempts without constantly alerting the user unnecessarily.
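For the speech-stability challenge, one common pattern is to give text-to-speech a single owner thread that pulls messages from a queue, with a `threading.Event` the detector can set to cut off a warning in progress. This is a sketch under that assumption, not Spooky's implementation; `SpeechWorker` and `pyttsx3_speak` are hypothetical names. The `started-word` callback that calls `engine.stop()` follows the interruption example in the pyttsx3 documentation.

```python
import queue
import threading
from typing import Callable, Optional

# A speak function receives the text and an Event it should poll to stop early.
SpeakFn = Callable[[str, threading.Event], None]


def pyttsx3_speak(text: str, interrupt: threading.Event) -> None:
    """Speak with pyttsx3, stopping at the next word boundary once `interrupt` is set."""
    import pyttsx3  # lazy import: only needed for real audio output

    engine = pyttsx3.init()

    def on_word(name, location, length):
        if interrupt.is_set():
            engine.stop()

    token = engine.connect("started-word", on_word)
    try:
        engine.say(text)
        engine.runAndWait()
    finally:
        engine.disconnect(token)


class SpeechWorker:
    """Runs every utterance on one background thread (hypothetical helper)."""

    def __init__(self, speak: SpeakFn = pyttsx3_speak) -> None:
        self._speak = speak
        self._queue: "queue.Queue[Optional[str]]" = queue.Queue()
        self._interrupt = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def say(self, text: str) -> None:
        self._queue.put(text)

    def interrupt(self) -> None:
        """Drop queued messages, then cut off the one currently playing."""
        try:
            while True:
                self._queue.get_nowait()
        except queue.Empty:
            pass
        self._interrupt.set()

    def close(self) -> None:
        """Let queued speech finish, then stop the worker thread."""
        self._queue.put(None)
        self._thread.join()

    def _run(self) -> None:
        while True:
            text = self._queue.get()
            if text is None:  # sentinel from close()
                break
            self._interrupt.clear()
            self._speak(text, self._interrupt)
```

Draining the queue before setting the event ensures that a pending follow-up message is discarded rather than picked up the moment the current utterance stops.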

Accomplishments that we're proud of

• Building a fully functional autonomous AI agent rather than a simple chatbot
• Integrating multimodal Gemini analysis with computer vision and OCR
• Designing a layered detection pipeline that reduces unnecessary AI calls
• Creating a system that actively protects users rather than just analyzing data
• Successfully integrating Firebase cloud services for real-time logging and monitoring

What we learned

This project provided hands-on experience with building real-world AI agents that interact with users and the environment.

Key learnings included:

• Designing efficient AI pipelines that balance local processing with cloud AI models
• Integrating multimodal AI systems with traditional computer vision techniques
• Managing asynchronous systems involving speech, vision, and cloud logging
• Understanding how AI can be applied to practical cybersecurity problems

Most importantly, it demonstrated how AI agents can move beyond chat interfaces to actively assist and protect users in real-world environments.

What's next for Project Spooky

Future improvements for Spooky include:

• Real-time browser extension integration for more precise phishing detection
• Mobile and desktop applications for broader accessibility
• Enterprise dashboards for organizational security monitoring
• Improved machine learning models trained specifically on phishing interfaces
• Continuous real-time streaming analysis using Gemini Live capabilities

The long-term goal is to develop Spooky into a full AI-powered personal cybersecurity assistant that proactively protects users from online threats.

Built With

  • chart.js
  • cloudfirestore
  • computer-vision
  • firebase
  • firebase-admin
  • firebasehosting
  • googlegeminiapi
  • googlegenaisdk
  • multimodal-ai
  • opencv
  • pyautogui
  • python
  • python-dotenv
  • pyttsx3
  • speechrecognition
  • tailwindcss
  • tesseractocr
  • voice-interface