Inspiration
Telemedicine has grown fast, but it still falls short for physical exams. Doctors cannot reach through a screen to listen to a patient's heart or lungs.
Two years ago, my uncle started experiencing chest pain that got worse over time. He was diagnosed with heart failure and needed regular checkups with a cardiologist, but the distance made it too hard for him to get to the clinic. Without regular monitoring, we feared he could lose his life.
This personal experience highlighted a massive gap in remote healthcare. We wanted to build a tool that guides patients step by step, ensuring anyone can capture clinical-grade audio from home without needing physical help.
What it does
Stezio is a smart stethoscope adapter paired with a web application. On top of it, we built an AI voice assistant using the Gemini Multimodal Live API that acts as a co-pilot for the patient: it watches the stethoscope placement through the phone camera and talks the patient through the exam in real time, saying things like "Move the adapter a bit to the left." If the patient says "Wait a second," the AI stops and waits.
How we built it
We divided the project into two main parts: the physical hardware and the software platform.
For the hardware, we designed and 3D-printed a custom adapter case that snaps onto any standard medical stethoscope. Inside the enclosure, we used a custom PCB that connects a microcontroller with a high-sensitivity MEMS microphone. This custom board allows us to capture clean and accurate heart sounds directly from the stethoscope.
For the software, we built the frontend patient portal using Next.js and Tailwind CSS, hosted on Vercel. We used the browser's native Web Audio API to handle local microphone access.
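The fiddliest part of browser capture is format conversion: the Web Audio API hands back 32-bit float samples, while the Live API's input stream expects raw 16-bit little-endian PCM (16 kHz mono in our pipeline). Here is a minimal sketch of that conversion step; the `getUserMedia`/AudioWorklet plumbing is omitted, and `floatTo16BitPCM` is our own helper name:

```typescript
// Convert a Web Audio Float32 buffer (samples in [-1, 1]) into
// 16-bit signed PCM, the format we stream to the Live API.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp before scaling so clipped input cannot overflow int16.
    const s = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}
```

The asymmetric scaling (32768 on the negative side, 32767 on the positive) keeps both extremes representable in the signed 16-bit range.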
The core intelligence of Stezio runs on Google Cloud. We set up a Node.js backend proxy and deployed it on Google Cloud Run to make sure our system is fast, scalable, and secure. This backend manages a continuous WebSocket connection between the patient's web app and the Google Gemini 2.5 Flash model.
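The relay logic at the heart of that proxy is small. The sketch below types the sockets structurally so it stays library-agnostic (in practice a server-side WebSocket library such as `ws` provides them); `SocketLike` and `bridge` are illustrative names, and auth, reconnection, and backpressure handling are omitted:

```typescript
// Minimal sketch of the Cloud Run proxy's relay loop. The browser
// never holds the Gemini API key: only the proxy opens the upstream
// connection, and this bridge just shuttles frames in both directions.
interface SocketLike {
  send(data: string): void;
  onMessage(handler: (data: string) => void): void;
}

// Forward patient audio/video frames upstream, and the model's
// spoken replies back down to the patient.
function bridge(patient: SocketLike, gemini: SocketLike): void {
  patient.onMessage((data) => gemini.send(data));
  gemini.onMessage((data) => patient.send(data));
}
```

Keeping the bridge this dumb is what keeps it fast: the proxy never parses or re-encodes the media frames, it only terminates TLS and hides the credentials.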
By using the Gemini Multimodal Live API, we created a true Live Agent. The web app streams real-time video from the patient's camera and audio from their microphone directly to Vertex AI. Gemini processes this multimodal feed instantly and talks back to the patient, telling them exactly where to move the stethoscope on their chest while handling sudden interruptions naturally.
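Outgoing media is framed as realtime-input messages on that stream. The `mediaChunks` shape below follows the BidiGenerateContent protocol as we used it (the API may evolve); `toRealtimeInput` is our own helper name, and the base64 payloads are placeholders:

```typescript
// Sketch: wrap one base64-encoded media chunk (PCM audio or a JPEG
// camera still) in the Live API's realtime-input message shape.
function toRealtimeInput(mimeType: string, base64Data: string) {
  return {
    realtimeInput: {
      mediaChunks: [{ mimeType, data: base64Data }],
    },
  };
}

// Example frames: 16 kHz PCM audio and a downscaled camera snapshot.
const audioFrame = toRealtimeInput("audio/pcm;rate=16000", "AAAA");
const videoFrame = toRealtimeInput("image/jpeg", "/9j/");
```

Audio is streamed continuously, while camera stills are sent at a low frame rate, which is enough for the model to judge stethoscope placement without saturating the uplink.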
Challenges we ran into
Latency was a big issue. A voice assistant feels broken if there is a long delay. We had to optimize our backend proxy to keep the voice response time under one second. Making the 3D-printed case fit securely onto different stethoscope brands also took many test prints.
What we learned
We learned a lot about combining custom hardware with live AI models. Working with the Gemini Multimodal Live API taught us how to manage continuous WebSocket connections. We had to figure out how to stream both video and audio in real time without causing high latency.
Deploying our Node.js backend on Google Cloud Run showed us how to set up a fast proxy that keeps our API keys secure while handling constant data streams.
On the hardware side, we improved our 3D printing skills to make sure the adapter fits tightly on a standard stethoscope. We also learned how to process raw audio signals from a MEMS microphone using our custom PCB and the microcontroller.
Finally, testing the AI voice assistant taught us that giving the model strict and clear rules is the only way to make it act like a helpful medical co-pilot instead of a generic chatbot.
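For reference, here is a condensed sketch of the style of system instruction that worked for us; the wording is illustrative, not our exact production prompt:

```typescript
// Illustrative system instruction for the co-pilot persona.
// Short, strict rules kept the model on task; open-ended prompts
// drifted into generic chatbot behavior.
const SYSTEM_INSTRUCTION = `
You are a stethoscope placement co-pilot, not a diagnostician.
- Give short, concrete directions ("a little to the left").
- Use the camera feed to confirm placement before asking the patient to hold still.
- If the patient speaks, stop talking immediately and listen.
- Never interpret heart sounds or offer a diagnosis; defer to the doctor.
`.trim();
```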
What's next for Stezio
We plan to test this with real doctors and patients to validate the clinical audio quality. We also want to optimize the power consumption of the hardware adapter.
Built With
- 3d-printing
- gemini-api
- geminiliveapi
- mediapipe
- next.js
- node.js
- tailwind.css
- typescript
- vercel
- vertexai
- web-audio-api