Inspiration

You are in a meeting, someone asks you something, and you just don't know the answer. You can't look it up without seeming distracted. You can't ask someone without interrupting the whole room. I wanted something that sits quietly and helps, like a really smart friend whispering in your ear.

What it does

MeetMind runs in a browser tab while you are on Zoom, Teams, Meet or whatever. You whisper a question, it answers through your earphones. Nobody knows. It can also see your screen if you share it, grounding its answers in real visual context.

How I built it

The browser captures mic audio at 16 kHz and streams it over WebSocket to a FastAPI backend on Google Cloud Run. Google ADK manages a live bidirectional session with Gemini 2.0 Flash Live. Audio responses come back at 24 kHz and play through your earphones. When you share your screen, frames are captured every 5 seconds and sent to Gemini alongside the audio. Ask "what do you see?" and it tells you instantly.
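To make the audio numbers above concrete, here is a minimal sketch of the sample-format arithmetic, written in Python for brevity even though the capture itself happens in the browser. The two sample rates come from the description above; the 16-bit mono PCM format and the helper names `floats_to_pcm16` and `chunk_duration_ms` are assumptions for illustration, not the project's actual code.

```python
import struct

INPUT_RATE_HZ = 16_000    # mic audio rate, from the description above
OUTPUT_RATE_HZ = 24_000   # response audio rate, from the description above
BYTES_PER_SAMPLE = 2      # assumption: 16-bit mono PCM


def floats_to_pcm16(samples):
    """Clamp float samples to [-1, 1] and pack them as little-endian int16."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def chunk_duration_ms(chunk, rate_hz):
    """Return how many milliseconds of audio a PCM chunk holds at rate_hz."""
    return len(chunk) / BYTES_PER_SAMPLE / rate_hz * 1000
```

Under these assumptions, 1,600 input samples make one 3,200-byte chunk covering 100 ms of speech, while the same number of bytes on the 24 kHz return path covers less time, which is why the playback side has to schedule chunks by duration rather than by count.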

Challenges

Audio was the hardest part. Getting the timing, scheduling and interruption logic right took 5 or 6 attempts before it clicked. Every time I spoke, the frontend was creating a new bubble for each chunk of speech instead of one clean sentence. Fixing it meant understanding how Gemini streams transcriptions incrementally and updating the existing bubble in place rather than creating new ones. Then agent transcripts were appearing twice, once while streaming and once as a final confirmation. Another deduplication issue, another length check. Every layer had something. None of it just worked out of the box.
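The two transcript fixes described above can be sketched roughly like this. It is a simplified Python model of the frontend logic, not the real code: the `Bubble` class, the `apply_transcript` function and the event shape (partial events carry the next chunk, the final event carries the full text) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Bubble:
    speaker: str
    text: str
    final: bool = False


def apply_transcript(bubbles, speaker, text, is_final):
    """Merge one transcript event into the list of chat bubbles."""
    last = bubbles[-1] if bubbles else None

    # Same speaker is still talking: update the open bubble in place.
    if last is not None and last.speaker == speaker and not last.final:
        if is_final:
            # Length check: only overwrite if the final text is at least
            # as long as what was already streamed.
            if len(text) >= len(last.text):
                last.text = text
            last.final = True
        else:
            last.text += text
        return bubbles

    # A final event repeating an already-closed bubble is a duplicate.
    if (is_final and last is not None
            and last.speaker == speaker and last.text == text):
        return bubbles

    # Otherwise this is a new turn, so it gets a new bubble.
    bubbles.append(Bubble(speaker, text, is_final))
    return bubbles
```

The key choice is that a bubble stays open until a final event closes it, so streamed chunks extend one sentence instead of spawning a bubble each, and the final confirmation replaces the text rather than being appended as a second copy.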

What I learned

This is my first hackathon and I did it solo. I learned more in these few weeks than I expected. Real-time audio streaming is genuinely hard. The ADK docs are your best friend. And shipping something imperfect is better than not shipping at all.

Built With

  • api
  • audio
  • fastapi
  • gemini-live-api
  • google-adk
  • google-cloud-run
  • web
  • websockets