🛡️ Cognitive Guardian: Project Story
Inspiration
We spend hours every day inside a browser, and that browser has no idea what is happening to us. It does not know when we are staring at a phishing email that looks perfectly legitimate. It does not know when a headline is engineered to manipulate our emotions. It does not know when we have been mindlessly scrolling for 45 minutes instead of doing the thing we opened our laptop to do.
Traditional browser security tools rely on blocklists and pattern matching. They are reactive, static, and blind to the visual and contextual reality of what is actually on your screen. We wanted to build something that sees what you see, understands it, and protects you in the moment.
That idea became Cognitive Guardian, built around a single question: what if your browser had an immune system?
What It Does
Cognitive Guardian is an AI agent that lives inside your Chrome browser and acts as your real-time digital immune system. Instead of waiting for you to ask it something, it silently monitors your active tab and intervenes directly in the UI when it detects a threat.
It protects you across three fronts:
- Phishing and scam detection: It catches highly tailored spear-phishing attempts that blocklists miss, by analyzing the full visual and contextual picture of the page.
- Fake news and manipulation flagging: It highlights sensationalist language and low-credibility content directly on the screen, before you share or act on it.
- Burnout and doomscrolling intervention: It tracks prolonged unproductive scrolling and gently nudges you via text overlay and synthesized voice to take a break or refocus.
How We Built It
The project is split into two tightly coupled layers.
The frontend is a Chrome Extension (Manifest V3) with a Service Worker that continuously captures anonymized visual frames of the active tab and sends them to the backend over HTTP. A Content Script then receives the agent's decision, injects non-intrusive overlays and CSS highlights directly into the page, and speaks voice alerts via the Web Speech API.
The backend is a FastAPI server powered by Google's Agent Development Kit (ADK). Each incoming frame is analyzed by an LlmAgent backed by Gemini 2.5 Flash with multimodal vision. The agent returns a structured GuardianDecision JSON object specifying the action type, message, voice message, and a confidence score. Responses below 0.7 confidence are suppressed automatically, and a 15-second cooldown per domain prevents alert fatigue.
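The suppression logic can be sketched as a small gate in front of the agent's output. This is a minimal illustration, not the project's actual code: the writeup only says the GuardianDecision object carries an action type, message, voice message, and confidence score, so the field names and the `should_surface` helper below are assumptions.

```python
from dataclasses import dataclass
import time

CONFIDENCE_THRESHOLD = 0.7   # decisions below this are suppressed (from the writeup)
COOLDOWN_SECONDS = 15        # per-domain alert cooldown (from the writeup)

@dataclass
class GuardianDecision:
    # Field names are illustrative guesses at the structured JSON schema.
    action: str          # e.g. "phishing_alert", "focus_nudge"
    message: str         # text for the on-page overlay
    voice_message: str   # short phrase spoken via the Web Speech API
    confidence: float    # model's self-reported confidence in [0, 1]

_last_alert_at: dict = {}  # domain -> timestamp of the last surfaced alert

def should_surface(decision: GuardianDecision, domain: str, now: float = None) -> bool:
    """Drop low-confidence decisions and rate-limit alerts per domain."""
    now = time.monotonic() if now is None else now
    if decision.confidence < CONFIDENCE_THRESHOLD:
        return False
    last = _last_alert_at.get(domain)
    if last is not None and now - last < COOLDOWN_SECONDS:
        return False  # still inside this domain's cooldown window
    _last_alert_at[domain] = now
    return True
```

Keeping the gate on the server side means the extension never has to reason about thresholds; it simply renders whatever decision survives.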
Sessions are scoped per domain and maintain a rolling 10-frame history, giving the agent short-term memory of what you have been doing on a given site.
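Per-domain session scoping with a rolling 10-frame history can be sketched with a bounded deque; the names `record_frame` and `_sessions` are illustrative, not taken from the project.

```python
from collections import defaultdict, deque

FRAME_HISTORY = 10  # frames of short-term memory kept per domain (from the writeup)

# One session per domain: a bounded deque evicts the oldest frame automatically.
_sessions: dict = defaultdict(lambda: deque(maxlen=FRAME_HISTORY))

def record_frame(domain: str, frame: bytes) -> list:
    """Append the latest frame and return the rolling history for this domain."""
    session = _sessions[domain]
    session.append(frame)
    return list(session)
```

A `deque(maxlen=N)` keeps the memory footprint constant no matter how long a tab stays open, which matters when frames arrive continuously.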
Challenges We Ran Into
Getting the agent to be genuinely useful without being annoying was the hardest problem. An overly sensitive guardian becomes noise you learn to ignore. We had to carefully tune the confidence threshold, the cooldown logic, and the prompt design so that interventions feel earned rather than intrusive.
Working with continuous visual input over HTTP also required careful thinking about latency and frame management. We made deliberate tradeoffs to keep the extension lightweight and the user experience smooth.
Accomplishments We Are Proud Of
We built a fully working, end-to-end agentic system in a short hackathon window. The agent genuinely understands what is on the screen, not just what the URL says. Watching it catch a convincing phishing simulation that a traditional tool would have missed was a satisfying moment. So was hearing it speak a calm, contextual warning in the user's own language via the i18n-aware interface.
What We Learned
Multimodal agents are powerful precisely because they close the gap between what a user sees and what a system knows. Vision input unlocks an entirely different class of interventions that text or metadata alone cannot support. We also learned how much of the agent design work lives in the output schema and the session management, not just the model call itself.
What's Next for Cognitive Guardian
- Expanding threat categories to include dark patterns, aggressive cookie consent flows, and manipulative checkout UX
- A user feedback loop where accepted and rejected interventions improve the agent's calibration over time
- A lightweight on-device model option for users who want full offline privacy
- Firefox and Edge extension support
- A dashboard that surfaces weekly insights about your browsing health trends
Built With
- fastapi
- gemini
- google-adk
- google-cloud
- google-web-speech-api
- javascript
- python