Inspiration

Developers often lose their "flow state" when they hit a cryptic terminal error or need to look up documentation. Existing AI tools require manually copying and pasting code, which is slow and disconnects the AI from the live development environment.

What it does

VisionDraft is a "hands-free" coding mentor that uses the Gemini 3 Live API to provide real-time, multimodal assistance. Unlike traditional chatbots, VisionDraft "sits" beside you, watching your screen and listening to your voice to debug errors, explain complex Python OOP concepts, and architect Flask/React applications as you build them, with no tab-switching required.

Accomplishments that we're proud of

Live Vision Debugging: Uses Gemini 3's multimodal capabilities to "see" terminal errors and syntax bugs directly from a screen share.

Voice-First Interaction: Powered by the Gemini Live API, allowing for low-latency, interruptible voice conversations while your hands stay on the keyboard.

Contextual Architecture Advice: Tuned to help developers transition from JavaScript to Python/Flask by explaining backend logic in real time.

Hands-Free Documentation: Ask "How do I implement a Flask Blueprint here?" and hear the step-by-step logic while looking at your specific file structure.
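
The "interruptible" part of voice-first interaction can be illustrated with a small client-side sketch. This is not the Gemini Live API's own mechanism; it only shows, under simplified assumptions, how playback of a spoken reply might stop as soon as the user starts talking. The names `speak`, `user_barges_in`, and `demo` are hypothetical, and appending to a list stands in for real audio output.

```python
import asyncio


async def speak(chunks, played, interrupted):
    """Play reply chunks one at a time, stopping once the user barges in."""
    for chunk in chunks:
        if interrupted.is_set():
            return False  # user started talking; drop the rest of the reply
        played.append(chunk)  # stand-in for sending audio to the speaker
        await asyncio.sleep(0)  # yield so the mic listener task can run
    return True


async def demo():
    played = []
    interrupted = asyncio.Event()

    async def user_barges_in():
        # Hypothetical mic listener: waits until two chunks are out, then interrupts.
        while len(played) < 2:
            await asyncio.sleep(0)
        interrupted.set()

    finished, _ = await asyncio.gather(
        speak(["Check", "line", "42", "for", "a", "typo"], played, interrupted),
        user_barges_in(),
    )
    return finished, played


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Checking the interruption flag between chunks keeps the reaction time bounded by the length of one chunk, which is why short audio chunks matter for a conversation that feels responsive.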

What we learned

AI Engine: Gemini 3 Flash (via the Google Gen AI SDK), chosen for high-speed reasoning and a 1M+ token context window.

Frontend: React with Tailwind CSS, using getDisplayMedia for real-time screen streaming.

Backend: Python & Flask serving as a WebSocket proxy to manage secure, low-latency communication between the client and the Gemini Live API.

Protocol: WebSockets for bidirectional audio/video streaming.
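
The proxy role described in the stack above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: `asyncio.Queue` objects stand in for the two WebSocket connections (browser and model), and the JSON envelope with `kind` and `data` fields is a hypothetical wire format chosen for the example.

```python
import asyncio
import base64
import json


def encode_chunk(kind, payload):
    """Wrap raw bytes in a JSON text frame (hypothetical envelope format)."""
    return json.dumps({"kind": kind, "data": base64.b64encode(payload).decode("ascii")})


def decode_chunk(frame):
    """Inverse of encode_chunk: return (kind, raw bytes)."""
    message = json.loads(frame)
    return message["kind"], base64.b64decode(message["data"])


async def pump(source, sink):
    """Forward frames from one queue to another until a None sentinel arrives."""
    while True:
        frame = await source.get()
        await sink.put(frame)
        if frame is None:
            return


async def relay(client_in, upstream_out, upstream_in, client_out):
    """Run both directions concurrently, as a proxy would for one session."""
    await asyncio.gather(pump(client_in, upstream_out), pump(upstream_in, client_out))


async def demo():
    client_in, upstream_out = asyncio.Queue(), asyncio.Queue()
    upstream_in, client_out = asyncio.Queue(), asyncio.Queue()

    # Browser -> proxy: one screen frame and one audio chunk, then close.
    for item in (encode_chunk("video", b"\x89PNG"), encode_chunk("audio", b"\x00\x01"), None):
        client_in.put_nowait(item)
    # Model -> proxy: one spoken reply chunk, then close.
    for item in (encode_chunk("audio", b"reply"), None):
        upstream_in.put_nowait(item)

    await relay(client_in, upstream_out, upstream_in, client_out)

    to_model = [decode_chunk(upstream_out.get_nowait()) for _ in range(2)]
    to_client = decode_chunk(client_out.get_nowait())
    return to_model, to_client


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Running the two `pump` tasks side by side is what makes the stream bidirectional: screen and microphone data flow toward the model while spoken replies flow back, and neither direction waits on the other. In a real deployment the queues would be replaced by the browser-facing and model-facing socket connections, with any credentials presumably kept on the server side of the proxy.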
