Inspiration
We’ve all been there—or known someone who has. You’re discharged from the hospital with a stack of papers and a head full of fog. You’re vulnerable. You’re scared. And the most advanced piece of technology you have is a printed medical pamphlet. We saw this gap and realized it wasn't a medical problem; it was a design problem. I wanted to build something that feels less like a database and more like a human soul.
How I built it
I didn't just want to build a "chatbot." I wanted to build a presence. I used the Gemini 2.0 Live API to create a bidirectional, low-latency voice conversation. I grounded every answer in Firestore, so the agent speaks from the patient's own history rather than a generic knowledge base. And I gave it eyes with Gemini Vision, so it can see a healing incision and say, "You're doing great."
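To make that concrete, here is a minimal sketch of a bidirectional Live API session using the google-genai Python SDK. It is an illustration under stated assumptions, not our exact production code: the model id and system prompt are placeholders, and it assumes GOOGLE_API_KEY is set in the environment.

```python
# A minimal sketch of a bidirectional Live API session (google-genai SDK).
# Assumptions: GOOGLE_API_KEY is set; the model id and prompt are illustrative.
import asyncio
import wave

from google import genai
from google.genai import types

client = genai.Client()

CONFIG = types.LiveConnectConfig(
    response_modalities=["AUDIO"],  # have the model answer in voice
    system_instruction=(
        "You are a gentle post-discharge companion. Ground every answer "
        "in the patient's own care record."
    ),
)

async def main():
    # One WebSocket session carries both directions of the conversation.
    async with client.aio.live.connect(
        model="gemini-2.0-flash-live-001",  # illustrative model id
        config=CONFIG,
    ) as session:
        await session.send_client_content(
            turns={"role": "user",
                   "parts": [{"text": "How does my incision look today?"}]},
            turn_complete=True,
        )
        # The Live API returns 16-bit PCM audio at 24 kHz; save it to a file.
        with wave.open("reply.wav", "wb") as wf:
            wf.setnchannels(1)
            wf.setsampwidth(2)
            wf.setframerate(24000)
            async for response in session.receive():
                if response.data is not None:  # raw audio bytes from the model
                    wf.writeframes(response.data)

asyncio.run(main())
```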
The Learnings
I learned that the frontier of AI isn't in text boxes. It's in the voice. It's in the Live API. I learned that the Google Agent Development Kit (ADK) is the foundation for the next generation of ambient computing.
The Challenge: The Sound of the Future
The road to the future is never a straight line. Our biggest challenge was the Audio Pipeline.
The Gemini 2.5 Flash native-audio model is a pioneer, but pioneers take arrows. We faced a WebSocket 1007 ("invalid frame payload data") close that nearly stopped us. Why? Because the standard libraries weren't ready for how strictly this new model requires input audio to be formatted.
The Fix: We had to go deep. We bypassed the standard SDK serialization, monkey-patched the core connection logic, and forced a raw, strongly-typed send_realtime_input stream. We turned a broken connection into a seamless heartbeat.
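The shape of that fix looks roughly like the sketch below. It is not our patched internals: `mic_chunks` is a hypothetical async source of microphone frames, and the real point is the strict format, raw 16-bit, 16 kHz mono PCM with the exact MIME type, sent through `send_realtime_input`.

```python
# A sketch of the raw audio path after the fix, using the google-genai SDK's
# send_realtime_input. `mic_chunks` is a hypothetical async iterator yielding
# raw 16-bit, 16 kHz, mono little-endian PCM frames captured in the browser.
from google.genai import types

async def stream_microphone(session, mic_chunks):
    """Forward strictly formatted PCM frames straight to the Live API."""
    async for chunk in mic_chunks:
        await session.send_realtime_input(
            audio=types.Blob(
                data=chunk,
                # The exact MIME type matters: anything looser than raw PCM
                # at the declared rate is what earned us the 1007 closes.
                mime_type="audio/pcm;rate=16000",
            )
        )
```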
Built With
- docker
- fastapi
- firestore
- gemini-2.0-flash-(vision)
- gemini-2.0-flash-live-api
- google-agent-development-kit-(adk)
- google-cloud
- google-cloud-run
- javascript
- react
- sqlalchemy
- sqlite
- tailwind-css
- vite
- webaudio
- webrtc
- python