Inspiration
I use SaaS tools every day. When I get stuck, support is slow and painful. I open a ticket, explain my screen with text, send screenshots, and wait. The agent says “click Settings” — but there are three Settings buttons. Why can’t they just see what I see?
When Google launched the Gemini Live API with live audio and video, I knew I could fix this. SupportPilot was born: a live AI agent that watches your screen and talks to you like a friend looking over your shoulder.
What it does
SupportPilot is a voice-first AI support agent for complex SaaS apps. You share your mic and screen, then just talk.
The agent:
- Sees your screen in real time (1 frame per second, which is enough for slow-changing admin panels)
- Listens and speaks naturally
- Stops instantly when you interrupt (true barge-in)
- Finds exact buttons, errors, and pages
- Searches official docs live
- Uses your uploaded PDFs and SOPs (Vertex AI RAG)
- Pushes copyable text straight to your screen
- Remembers you across sessions
On the admin side, managers use a simple Django dashboard to create new agents with zero code — set persona, upload docs, pick voice, and deploy.
How I built it
I split it into two parts:
Agent side: React frontend captures mic + screen and sends them over WebSocket. FastAPI server streams everything to Gemini Live API using Google’s Agent Development Kit. I added four custom tools: analyze_screen, google_search, knowledge_base (RAG), and send_copy_text.
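To make the mic + screen stream concrete, here is a minimal sketch of how audio chunks and screen frames could share one WebSocket. This is an illustration only: the write-up does not describe the actual wire format, so the `MediaMessage` type, the `kind`/`data` field names, and the JSON-plus-base64 encoding are all assumptions.

```python
import base64
import json
from dataclasses import dataclass

# Hypothetical message kinds; not taken from the SupportPilot code.
AUDIO = "audio"
FRAME = "frame"
VALID_KINDS = (AUDIO, FRAME)


@dataclass(frozen=True)
class MediaMessage:
    kind: str       # AUDIO for a mic chunk, FRAME for a JPEG screen capture
    payload: bytes  # raw bytes of the chunk or frame


def encode_message(msg: MediaMessage) -> str:
    """Serialize one media chunk as a JSON string suitable for a text frame."""
    if msg.kind not in VALID_KINDS:
        raise ValueError(f"unknown kind: {msg.kind!r}")
    data = base64.b64encode(msg.payload).decode("ascii")
    return json.dumps({"kind": msg.kind, "data": data})


def decode_message(text: str) -> MediaMessage:
    """Parse a JSON string produced by encode_message back into a MediaMessage."""
    obj = json.loads(text)
    kind = obj["kind"]
    if kind not in VALID_KINDS:
        raise ValueError(f"unknown kind: {kind!r}")
    return MediaMessage(kind=kind, payload=base64.b64decode(obj["data"]))
```

Tagging each message with a kind lets the server route audio and screen frames to the model separately while keeping a single connection open. Binary WebSocket frames would avoid the base64 overhead; JSON is used here only because it is easy to inspect.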
Admin side: Django + Firestore. Uploading PDFs automatically sets up Vertex AI RAG for the agent. Clicking Deploy spins up an isolated Cloud Run container. An AI prompt generator drafts the agent's system instructions.
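As a rough illustration of what the dashboard might store per agent, here is a sketch of a config record that is validated and then turned into a plain dict, the shape a document database such as Firestore accepts. The `AgentConfig` class, its field names, and the validation rules are assumptions made for this example, not the project's actual schema.

```python
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class AgentConfig:
    """Hypothetical per-agent settings entered in the admin dashboard."""
    name: str
    persona: str
    voice: str
    doc_paths: list[str] = field(default_factory=list)  # uploaded docs for RAG

    def validate(self) -> None:
        """Reject configs that are missing the fields a deploy would need."""
        if not self.name.strip():
            raise ValueError("name is required")
        if not self.persona.strip():
            raise ValueError("persona is required")
        if not self.voice.strip():
            raise ValueError("voice is required")

    def to_document(self) -> dict[str, Any]:
        """Validate, then return a plain dict ready to be stored."""
        self.validate()
        return asdict(self)
```

For example, `AgentConfig(name="Billing helper", persona="Friendly and concise", voice="example-voice", doc_paths=["refund-sop.pdf"]).to_document()` returns a dict with those four keys, while an empty `name` raises `ValueError` before anything is saved.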
Everything runs on Google Cloud: Cloud Run, Firestore, Vertex AI, and Gemini Live.
Challenges I ran into
- Too much data from screen sharing → solved by sending one JPEG frame per second, which was enough for the agent to follow admin screens
- Agent called tools too often → fixed with clear system instructions on when to use each tool
- Agent made up menu paths → fixed by grounding answers in uploaded docs with Vertex AI RAG
- Barge-in felt hard → Gemini Live API handles it out of the box
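The 1 FPS fix from the list above boils down to dropping every frame that arrives less than a second after the last one sent. Here is a minimal sketch of that idea; the `FrameThrottle` class is a hypothetical helper written for this post, and in the real app the throttling may well happen in the browser rather than on the server.

```python
import time
from typing import Callable, Optional


class FrameThrottle:
    """Allow at most max_fps frames per second; callers drop the rest."""

    def __init__(
        self,
        max_fps: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if max_fps <= 0:
            raise ValueError("max_fps must be positive")
        self._min_interval = 1.0 / max_fps
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_sent: Optional[float] = None

    def should_send(self) -> bool:
        """Return True if enough time has passed since the last accepted frame."""
        now = self._clock()
        if self._last_sent is None or now - self._last_sent >= self._min_interval:
            self._last_sent = now
            return True
        return False
```

A capture loop would call `should_send()` for each new frame and forward only the frames for which it returns True. Using a monotonic clock keeps the interval stable even if the system time changes.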
What I learned
The Gemini Live API impressed me. Voice responses are fast, barge-in works naturally, and the model understands screens at just 1 FPS. A strong system prompt does a lot to keep the agent focused and helpful. Combining Vertex AI RAG with screen vision removes most of the guessing. Giving non-technical managers a no-code dashboard is powerful.
What’s next
- Analytics dashboard for resolution time and happiness scores
- Easy hand-off to human agents
- Mobile SDK to embed in apps
SupportPilot turns painful support tickets into instant, live help — all powered by Gemini Live.
Built With
- django
- docker
- fastapi
- gemini
- github-actions
- google-adk
- google-cloud
- google-cloud-firestore
- google-cloud-run
- python
- react
- typescript
- vertex-ai
- vite
- websockets