Inspiration

I use SaaS tools every day, and when I get stuck, support is slow and painful. I open a ticket, describe my screen in text, send screenshots, and wait. The agent says “click Settings”, but there are three Settings buttons. Why can’t they just see what I see?

When Google launched the Gemini Live API with real-time audio and video, I knew I could fix this. SupportPilot was born: a live AI agent that watches your screen and talks to you like a friend looking over your shoulder.

What it does

SupportPilot is a voice-first AI support agent for complex SaaS apps. You share your mic and screen, then just talk.

The agent:

  • Sees your screen in near real time (1 frame per second, which is enough for mostly static admin panels)
  • Listens and speaks naturally
  • Stops instantly when you interrupt (true barge-in)
  • Points to the exact buttons, error messages, and pages you need
  • Searches official docs live
  • Uses your uploaded PDFs and SOPs (Vertex AI RAG)
  • Pushes copyable text straight to your screen
  • Remembers you across sessions

On the admin side, managers use a simple Django dashboard to create new agents without writing code: set a persona, upload docs, pick a voice, and deploy.

How I built it

I split it into two parts:

Agent side: A React frontend captures the microphone and screen and sends both over a WebSocket. A FastAPI server streams everything to the Gemini Live API using Google’s Agent Development Kit (ADK). I added four custom tools: analyze_screen, google_search, knowledge_base (RAG), and send_copy_text.
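The traffic between the browser and the FastAPI server can be sketched as small JSON envelopes. This is a minimal sketch that assumes each WebSocket frame looks like `{"mime_type": ..., "data": ...}`, with audio and screen frames base64-encoded. It also shows one possible shape for `send_copy_text`: a plain Python function with a docstring, which is the form ADK can wrap as a function tool, that queues a message for the browser to render. The envelope format, `ALLOWED_MIME_TYPES`, `decode_client_message`, and `make_send_copy_text` are all hypothetical. Handing decoded chunks to ADK's live request queue is left out.

```python
import base64
import json
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical MIME types the React client is assumed to send.
ALLOWED_MIME_TYPES = {"audio/pcm", "image/jpeg", "text/plain"}


@dataclass
class ClientChunk:
    mime_type: str
    data: bytes


def decode_client_message(raw: str) -> ClientChunk:
    """Parse one WebSocket text frame from the browser into a MIME type plus raw bytes."""
    msg = json.loads(raw)
    mime_type = msg["mime_type"]
    if mime_type not in ALLOWED_MIME_TYPES:
        raise ValueError(f"unsupported mime_type: {mime_type}")
    if mime_type == "text/plain":
        return ClientChunk(mime_type, msg["data"].encode("utf-8"))
    # Microphone audio and screen frames arrive as base64-encoded binary payloads.
    return ClientChunk(mime_type, base64.b64decode(msg["data"]))


def make_send_copy_text(outbox: List[str]) -> Callable[[str], dict]:
    """Build a per-session tool that pushes copyable text to that session's browser."""

    def send_copy_text(text: str) -> dict:
        """Show `text` on the user's screen in a box they can copy with one click."""
        # The WebSocket sender loop (not shown) drains `outbox` to the browser.
        outbox.append(json.dumps({"type": "copy_text", "text": text}))
        return {"status": "sent"}

    return send_copy_text
```

Binding the outbox in a closure keeps the tool's visible signature down to `text`, so the model only sees the argument it is supposed to fill in.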

Admin side: Django + Firestore. Uploading PDFs automatically creates a Vertex AI RAG corpus. Clicking Deploy spins up an isolated Cloud Run container for that agent. An AI prompt generator drafts the agent’s system instructions.
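Each agent a manager creates comes down to one configuration record. Below is a minimal sketch of what such a Firestore document might hold, plus a helper that derives a Cloud Run-friendly service name (lowercase letters, digits, and hyphens) for the Deploy step. The field names, the `agent-` prefix, and both methods are hypothetical. The corpus string in the usage example only follows the general shape of a Vertex AI RAG corpus resource name.

```python
import re
from dataclasses import asdict, dataclass


@dataclass
class AgentConfig:
    """Hypothetical shape of one agent document stored in Firestore."""
    name: str                # display name chosen in the dashboard
    persona: str             # role and tone, e.g. "patient billing expert"
    voice_name: str          # prebuilt voice picked by the manager
    rag_corpus: str          # resource name of the RAG corpus created on upload
    system_instruction: str  # drafted by the AI prompt generator, editable by the manager

    def to_document(self) -> dict:
        # All fields are plain strings, so a flat dict is enough to store in Firestore.
        return asdict(self)

    def service_name(self) -> str:
        # Collapse anything outside [a-z0-9] into single hyphens and trim the ends.
        slug = re.sub(r"[^a-z0-9]+", "-", self.name.lower()).strip("-")
        return f"agent-{slug or 'unnamed'}"
```

Usage: `AgentConfig("Acme Billing Helper!", "patient billing expert", "Aoede", "projects/my-project/locations/us-central1/ragCorpora/123", "You are ...").service_name()` yields `"agent-acme-billing-helper"`, one isolated service per agent.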

Everything runs on Google Cloud: Cloud Run, Firestore, Vertex AI, and Gemini Live.

Challenges I ran into

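  • Screen sharing produced too much data → solved by sending one JPEG frame per second, which was still enough for the agent to read the UI
  • The agent called tools too often → fixed with clearer system instructions about when each tool should be used
  • The agent made up menu paths → fixed by grounding answers in the uploaded docs with Vertex AI RAG, which stopped the invented paths
  • Barge-in looked hard to implement → the Gemini Live API handles it out of the box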
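The 1 FPS fix amounts to a throttle in front of the screen-capture loop: capture as often as the browser likes, but only forward a frame once a full interval has passed since the last one sent. In SupportPilot this happens in the React frontend, so the Python below is only an illustration of the logic. `FrameThrottler` is a hypothetical name, and JPEG encoding is left out.

```python
import time
from typing import Callable, Optional


class FrameThrottler:
    """Lets through at most `fps` screen frames per second; everything else is dropped."""

    def __init__(self, fps: float = 1.0, clock: Callable[[], float] = time.monotonic) -> None:
        if fps <= 0:
            raise ValueError("fps must be positive")
        self.interval = 1.0 / fps
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_sent: Optional[float] = None

    def should_send(self) -> bool:
        now = self._clock()
        if self._last_sent is None or now - self._last_sent >= self.interval:
            self._last_sent = now
            return True
        return False
```

Dropping extra frames, rather than queueing them, means the model always sees the most recent screen instead of falling behind the user.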

What I learned

The Gemini Live API is impressive. Voice responses are fast, barge-in works naturally, and the model understands a screen from just one frame per second. A strong system prompt is what keeps the agent focused and helpful. Combining Vertex AI RAG with screen vision removes most of the guessing. And giving non-technical managers a no-code dashboard is powerful: they can build and deploy a new agent themselves.

What’s next

  • Analytics dashboard for resolution time and happiness scores
  • Easy hand-off to human agents
  • Mobile SDK to embed in apps

SupportPilot turns painful support tickets into instant, live help, all powered by the Gemini Live API.
