Inspiration

We’ve all been there, or known someone who has. You’re discharged from the hospital with a stack of papers and a head full of fog. You’re vulnerable. You’re scared. And the most advanced piece of technology you have is a printed medical pamphlet. We saw this gap and realized it wasn't a medical problem; it was a design problem. We wanted to build something that feels less like a database and more like a human presence.

How we built it

We didn't just want to build a "chatbot." We wanted to build a presence. We used the Gemini 2.0 Live API to create a bidirectional, low-latency voice conversation. We grounded every response in the patient's own history, stored in Firestore, so answers reflect their chart rather than generic advice. And we gave it eyes with Gemini Vision, so it can look at a healing incision and say, "You're doing great."
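Grounding in practice can be as simple as folding the patient's Firestore record into the session's system instruction. Here is a minimal sketch; the record shape (`name`, `procedure`, `discharge_notes`, `medications`) is our illustration, not the actual schema, and a real document would first be fetched with the Firestore client:

```python
def build_system_instruction(patient: dict) -> str:
    """Fold a patient's history into a system prompt for the Live session.

    The field names below are illustrative stand-ins; a real Firestore
    document would be fetched with the google-cloud-firestore client first.
    """
    meds = ", ".join(patient.get("medications", [])) or "none on record"
    return (
        f"You are a gentle post-discharge companion for {patient['name']}, "
        f"who is recovering from {patient['procedure']}. "
        f"Discharge notes: {patient['discharge_notes']} "
        f"Current medications: {meds}. "
        "Ground every answer in this history; never invent clinical details."
    )

# Hypothetical record for illustration only.
patient = {
    "name": "Alex",
    "procedure": "an appendectomy",
    "discharge_notes": "Keep the incision dry for 48 hours.",
    "medications": ["ibuprofen 400mg"],
}
prompt = build_system_instruction(patient)
```

The point of the design is that the model never answers from thin air: every session starts with this patient-specific context already in place.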

What we learned

We learned that the frontier of AI isn't in text boxes. It’s in the voice. It’s in the Live API. We learned that the Google Agent Development Kit (ADK) is a foundation for the next generation of ambient computing.

The Challenge: The Sound of the Future

The road to the future is never a straight line. Our biggest challenge was the audio pipeline.

The Gemini 2.5-flash-native-audio model is a pioneer, but pioneers take arrows. We hit a "1007 Invalid Frame" error (the WebSocket close code for invalid payload data) that nearly stopped us. Why? Because the standard libraries weren't ready for how strictly this new model requires its audio input to be formatted.

The Fix: We had to go deep. We bypassed the standard SDK serialization, monkey-patched the core connection logic, and forced a raw, strongly typed send_realtime_input stream. We turned a broken connection into a seamless heartbeat.
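Concretely, the fix came down to handing the socket exactly the bytes the model expects. A sketch, assuming the Live API wants raw 16-bit little-endian PCM at 16 kHz mono; the `stream_audio` coroutine and its `session` parameter mirror a google-genai live session but are not executed here:

```python
import struct

def to_pcm16(samples: list[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to raw 16-bit little-endian PCM,
    the frame format the model rejects with a 1007 close when it is wrong."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    return b"".join(struct.pack("<h", int(s * 32767)) for s in clipped)

async def stream_audio(session, samples: list[float]) -> None:
    """Push one raw frame through send_realtime_input, sidestepping the
    SDK's default serialization. `session` is assumed to be a google-genai
    live session; the Blob mime type pins the sample rate explicitly."""
    from google.genai import types  # deferred: only needed when streaming
    await session.send_realtime_input(
        audio=types.Blob(data=to_pcm16(samples),
                         mime_type="audio/pcm;rate=16000")
    )
```

Pinning the format at the byte level, rather than trusting the library's defaults, is what turned the flaky connection into a steady stream.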

Built With

Gemini Live API · Gemini Vision · Firestore · Google Agent Development Kit (ADK)
