Inspiration
Phishing attacks remain one of the most common cybersecurity threats, tricking users into revealing passwords, banking details, and personal information. Many users — especially students, elderly individuals, and remote workers — cannot easily recognize sophisticated phishing pages.
I wanted to explore whether a real-time AI agent could act as a personal cybersecurity guardian that continuously monitors a user’s screen and warns them before they fall victim to scams. With the multimodal reasoning capabilities of Gemini, it became possible to build an agent that not only reads text but also visually understands suspicious web pages.
This idea led to the creation of Spooky, an AI-powered phishing detection agent that analyzes screen content in real time and alerts users instantly.
What it does
Spooky is a real-time AI cybersecurity agent that monitors the user’s screen and detects phishing attempts before sensitive information is entered.
Key capabilities include:
• Captures periodic screenshots of the user's screen
• Uses OCR to detect suspicious phishing keywords and scam language
• Extracts and analyzes visible URLs for phishing indicators
• Sends suspicious screens to Gemini for deep visual analysis
• Triggers a fullscreen alert when a threat is detected
• Provides voice warnings and explains the threat to the user
• Allows users to ask questions through voice interaction
• Logs detected threats to Firebase Cloud Firestore
• Provides a dashboard for monitoring and remote control
The system acts like a continuous AI security assistant protecting users from phishing attacks.
How we built it
Spooky was built using a combination of Python-based computer vision, speech interfaces, and cloud services.
The system architecture consists of several layers:
- Screen Monitoring
PyAutoGUI captures screenshots of the user’s screen at intervals.
- Local Threat Detection
OpenCV preprocesses each screenshot to improve OCR accuracy.
Tesseract OCR extracts the text, which is then scanned for suspicious keywords and phishing language.
- URL Analysis
Regex-based heuristics detect suspicious domains, fake login URLs, and malicious patterns.
- AI Threat Analysis
Suspicious screenshots are sent to Gemini (gemini-2.5-flash-lite) for multimodal visual reasoning.
Gemini determines whether the content represents phishing or social engineering.
- User Interaction
pyttsx3 provides spoken warnings through text-to-speech.
SpeechRecognition enables voice-based questions and answers with the AI.
- Cloud Logging
Threat events are stored in Firebase Cloud Firestore for monitoring and analysis.
A Firebase-hosted dashboard allows remote monitoring and control.
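As a rough illustration of the logging step, the sketch below builds a threat record and writes it through any object that exposes Firestore's `add(dict)` method. The field names, the `threat_events` collection name, and both helper functions are assumptions made for this example; the project's actual schema is not described here.

```python
from datetime import datetime, timezone


def build_threat_event(url, reasons, verdict):
    """Assemble one detection as a plain dict (hypothetical schema)."""
    return {
        "url": url,
        "reasons": list(reasons),
        "verdict": verdict,
        # Timezone-aware timestamp so dashboard ordering is unambiguous.
        "detected_at": datetime.now(timezone.utc),
    }


def log_threat(collection, event):
    """Write the event via an object with an `add(dict)` method.

    A Firestore CollectionReference fits this shape, and so does a
    simple stand-in object when testing offline.
    """
    collection.add(event)
    return event


# With the real SDK this might look like the following (assumed setup):
#   import firebase_admin
#   from firebase_admin import firestore
#   firebase_admin.initialize_app()
#   events = firestore.client().collection("threat_events")
#   log_threat(events, build_threat_event(url, reasons, "phishing"))
```

Passing the collection in as a parameter keeps the detection loop testable without network access or credentials.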
This layered approach keeps API usage low: only screens flagged by the local OCR and URL checks are sent to Gemini, so the AI-based analysis is reserved for suspicious content.
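The gating idea can be sketched in a few lines of Python. This is a minimal illustration, not Spooky's actual rules: the keyword list, the URL heuristics, and every function name below are assumptions chosen for the example. It takes text that OCR has already extracted and decides whether the screen is worth escalating to the AI layer.

```python
import re
from urllib.parse import urlparse

# Hypothetical keyword list; the project's real list is not published.
PHISHING_KEYWORDS = (
    "verify your account",
    "account suspended",
    "confirm your password",
    "urgent action required",
)

URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
IP_HOST = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")
LOGIN_WORDS = ("login", "signin", "verify", "secure", "update")


def keyword_hits(ocr_text):
    """Return the known scam phrases found in the OCR text."""
    text = ocr_text.lower()
    return [kw for kw in PHISHING_KEYWORDS if kw in text]


def url_flags(url):
    """Return a list of heuristic warnings for a single URL."""
    flags = []
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if IP_HOST.match(host):
        flags.append("raw IP address as host")
    if host.startswith("xn--") or ".xn--" in host:
        flags.append("punycode host")
    if "@" in parsed.netloc:
        flags.append("userinfo in URL")
    if host.count(".") >= 4:
        flags.append("many subdomains")
    lowered = url.lower()
    if lowered.startswith("http://") and any(w in lowered for w in LOGIN_WORDS):
        flags.append("login-style page over plain HTTP")
    return flags


def should_escalate(ocr_text):
    """Decide locally whether a screen deserves an AI call.

    Returns (escalate, reasons). Only when escalate is True would the
    screenshot be sent on to the multimodal model.
    """
    reasons = keyword_hits(ocr_text)
    for url in URL_PATTERN.findall(ocr_text):
        reasons += [f"{flag}: {url}" for flag in url_flags(url)]
    return bool(reasons), reasons
```

A screen reading "verify your account at http://192.168.4.20/login" would be escalated under these rules, while ordinary text with a plain HTTPS link would not. In practice the thresholds would need tuning against real pages, since OCR can garble URLs and keyword lists produce false positives.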
Challenges we ran into
Several technical challenges arose while building Spooky:
• Reducing API costs: Constantly sending screenshots to Gemini would be expensive, so a multi-layer OCR and heuristic filtering system was implemented before invoking the AI.
• Reliable OCR detection: Extracting readable text from screenshots required preprocessing with OpenCV to improve OCR accuracy.
• Real-time responsiveness: The system needed to detect threats quickly while avoiding excessive CPU usage.
• Speech interaction stability: Implementing interruptible text-to-speech and voice recognition required careful thread management.
• Balancing false positives: The detection pipeline needed to be aggressive enough to catch phishing attempts without constantly alerting the user unnecessarily.
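To make the thread-management point concrete, here is one possible pattern for interruptible speech, sketched under assumptions: the class name and its methods are hypothetical, and `speak_chunk` stands in for whatever blocking text-to-speech call is used (for example a wrapper around a pyttsx3 engine). The text is spoken one sentence at a time on a worker thread, and a `threading.Event` is checked between sentences so a new alert or a user question can cut the current message short.

```python
import threading


class InterruptibleSpeaker:
    """Speak text sentence by sentence on a worker thread (illustrative)."""

    def __init__(self, speak_chunk):
        # speak_chunk: any blocking callable that voices one sentence.
        self._speak_chunk = speak_chunk
        self._stop = threading.Event()
        self._thread = None

    def say(self, text):
        """Stop any current speech, then start speaking the new text."""
        self.interrupt()
        self._stop.clear()
        chunks = [s.strip() for s in text.split(".") if s.strip()]
        self._thread = threading.Thread(
            target=self._run, args=(chunks,), daemon=True
        )
        self._thread.start()

    def _run(self, chunks):
        for chunk in chunks:
            # The flag is only checked between sentences, so the
            # sentence in progress always finishes.
            if self._stop.is_set():
                return
            self._speak_chunk(chunk)

    def interrupt(self):
        """Ask the worker to stop after the current sentence."""
        self._stop.set()
        worker = self._thread
        # Avoid joining the worker from inside itself.
        if worker is not None and worker is not threading.current_thread():
            worker.join()

    def wait(self):
        """Block until the current message has finished or been stopped."""
        if self._thread is not None:
            self._thread.join()
```

The trade-off in this design is granularity: interruption takes effect at sentence boundaries rather than mid-word, which keeps the logic simple at the cost of a short delay.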
Accomplishments that we're proud of
• Building a fully functional autonomous AI agent rather than a simple chatbot
• Integrating multimodal Gemini analysis with computer vision and OCR
• Designing a layered detection pipeline that reduces unnecessary AI calls
• Creating a system that actively protects users rather than just analyzing data
• Successfully integrating Firebase cloud services for real-time logging and monitoring
What we learned
This project provided hands-on experience with building real-world AI agents that interact with users and the environment.
Key learnings included:
• Designing efficient AI pipelines that balance local processing with cloud AI models
• Integrating multimodal AI systems with traditional computer vision techniques
• Managing asynchronous systems involving speech, vision, and cloud logging
• Understanding how AI can be applied to practical cybersecurity problems
Most importantly, it demonstrated how AI agents can move beyond chat interfaces to actively assist and protect users in real-world environments.
What's next for Project Spooky
Future improvements for Spooky include:
• Real-time browser extension integration for more precise phishing detection
• Mobile and desktop applications for broader accessibility
• Enterprise dashboards for organizational security monitoring
• Improved machine learning models trained specifically on phishing interfaces
• Continuous real-time streaming analysis using Gemini Live capabilities
The long-term goal is to develop Spooky into a full AI-powered personal cybersecurity assistant that proactively protects users from online threats.