Inspiration

The inspiration for CogniStream came from a simple frustration: modern AI coding assistants still require developers to stop, copy code, and ask questions. This constant context switching breaks flow. I wanted an AI that observes, understands, and assists naturally without being asked. With Gemini 3’s multimodal capabilities, I saw an opportunity to build an AI partner that could truly work alongside a developer in real time.

What it does

CogniStream is a real-time, multimodal Heads-Up Display (HUD) that monitors the user’s screen and provides instant, voice-enabled feedback. It proactively detects coding errors, logical issues, and UI/UX design problems directly from what’s visible on the screen—without requiring manual prompts or code input. The goal is to make AI assistance seamless, hands-free, and continuous.

How I built it

CogniStream is built using Python with OpenCV and PyAutoGUI to capture and monitor screen changes efficiently. Relevant frames are sent to the Gemini 3 Flash multimodal model for visual reasoning and contextual analysis. By configuring a high thinking level, the AI performs deeper logic checks before responding. A lightweight Tkinter interface acts as the HUD, while Text-to-Speech delivers real-time audio feedback.
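The change-detection step described above can be sketched as follows. This is a minimal illustration, not the actual implementation: it assumes frames arrive as NumPy `uint8` arrays (the format OpenCV works with and PyAutoGUI screenshots convert to), and the threshold value is arbitrary.

```python
import numpy as np

def frame_changed(prev: np.ndarray, curr: np.ndarray, threshold: float = 8.0) -> bool:
    """Return True when the mean absolute pixel difference exceeds threshold.

    Frames are expected as uint8 arrays of identical shape (H, W, C).
    The threshold of 8.0 is illustrative; a real HUD would tune it.
    """
    if prev.shape != curr.shape:
        return True  # resolution changed; treat as a new scene
    # Widen to int16 before subtracting so the uint8 math doesn't wrap around.
    diff = np.abs(prev.astype(np.int16) - curr.astype(np.int16))
    return float(diff.mean()) > threshold
```

Only frames for which `frame_changed` returns True would be forwarded to Gemini 3 Flash for analysis, which keeps the capture loop cheap while the model does the heavy reasoning.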

Challenges faced

One major challenge was balancing responsiveness with API efficiency. Continuous screen analysis can be expensive, so I implemented smart change detection to minimize unnecessary calls. Another challenge was ensuring AI feedback stayed accurate and contextual, especially during rapid screen updates.
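One way to sketch the call-gating logic is to require both a detected change and a minimum interval since the last API call. The class and parameter names below are hypothetical, chosen for illustration; the clock is injectable so the logic can be tested without real delays.

```python
import time

class CallThrottle:
    """Gate expensive API calls: fire only when the screen changed AND
    a minimum interval has elapsed since the last call."""

    def __init__(self, min_interval: float = 2.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock  # injectable for testing
        self._last_call = float("-inf")  # so the very first call is allowed

    def should_call(self, changed: bool) -> bool:
        now = self.clock()
        if changed and (now - self._last_call) >= self.min_interval:
            self._last_call = now
            return True
        return False
```

In the main loop, a frame would only be sent to the model when `should_call(frame_changed(prev, curr))` returns True, bounding the API call rate even during rapid screen updates.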

Accomplishments that we're proud of

I'm proud of building a fully functional, real-time AI assistant that doesn't behave like a chatbot. CogniStream demonstrates how Gemini 3 can be used proactively, visually, and with low latency to enhance real developer workflows.

What I learned

I learned how critical latency, context preservation, and reasoning depth are when building always-on AI systems. Multimodal models require thoughtful orchestration to feel helpful rather than intrusive.

What's next for CogniStream

Next, I plan to add multi-monitor support, deeper IDE-specific understanding, and user-customizable feedback levels. I also aim to explore long-term context memory so CogniStream can adapt to individual coding styles over time.

Built With

Python, OpenCV, PyAutoGUI, Tkinter, Gemini 3 Flash, Text-to-Speech