Inspiration
Medical errors and adverse drug interactions cause hundreds of thousands of preventable injuries every year. For elderly patients, managing complex medication schedules and communicating critical health history (like severe allergies) to pharmacies can be overwhelming and dangerous. We were inspired to build something beyond a standard chatbot—we wanted to engineer an Autonomous Medical Guardian. We envisioned an AI proxy that not only speaks on behalf of the patient but actively protects them by cross-referencing real-time conversations against their medical records to prevent fatal mistakes.
What it does
Onyx Aura Concierge is an ultra-low-latency, autonomous voice AI that handles pharmacy communications for patients.
- Instant Dossier: Uses Gemini 2.0 to parse unstructured medical PDFs into structured JSON profiles.
- Real-Time Telephony: Full-duplex Twilio WebSockets bridge web clients to live phone lines.
- Native Voice Synthesis: Leverages ElevenLabs Multilingual v3 and custom acoustic tuning for sub-second, hyper-realistic conversational speech.
- Autonomous Safety Net: Dynamically halts prescriptions that conflict with known patient allergies mid-conversation.
- Visual Pill Scanner: Patients can hold any medication bottle up to their webcam. Our vision model identifies the exact drug, cross-references it against their medical profile, and immediately sounds an alarm if a dangerous allergy conflict is detected.
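A minimal sketch of how the safety-net check might work, assuming the parsed patient profile and a small drug-to-class lookup. All names and data here are illustrative, not our production schema:

```python
# Hypothetical allergy cross-check: compare a drug mentioned in conversation
# (or scanned from a bottle) against the parsed patient profile.

PATIENT_PROFILE = {
    "name": "Jane Doe",
    "allergies": ["penicillin", "sulfa"],
}

# Minimal drug -> ingredient-class map, just for this demo check.
DRUG_CLASSES = {
    "amoxicillin": "penicillin",
    "bactrim": "sulfa",
    "ibuprofen": "nsaid",
}

def check_drug_safety(drug_name: str, profile: dict) -> tuple[bool, str]:
    """Return (is_safe, reason) for a drug mentioned mid-conversation."""
    drug_class = DRUG_CLASSES.get(drug_name.lower())
    if drug_class and drug_class in profile["allergies"]:
        return False, f"{drug_name} conflicts with {drug_class} allergy"
    return True, "no known conflict"

print(check_drug_safety("Amoxicillin", PATIENT_PROFILE))
# (False, 'Amoxicillin conflicts with penicillin allergy')
```

In the real system this decision would gate whether the voice agent confirms or rejects the prescription mid-call.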
How we built it
- Frontend: HTML, JavaScript, GSAP, and Tailwind CSS
- Backend: Python (FastAPI)
- Brain: Gemini 2.0 Flash
- Voice & Telephony: ElevenLabs v3 and Twilio
Challenges we ran into
Building real-time, two-way voice AI is notoriously difficult. Our biggest hurdle was managing the Time-to-First-Byte (TTFB) latency. If our backend took more than 2 seconds to respond, the Twilio line felt dead, and the pharmacist would hang up. We also ran into severe routing headaches—specifically, Twilio throwing 11100 - Invalid URL format errors when trying to establish the two-way listening channel. We had to completely refactor our WebSocket architecture to ensure we were streaming audio chunks asynchronously without blocking the main Python event loop.
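The fix boiled down to decoupling audio production from audio sending with a queue, so neither side blocks the event loop. A simplified, self-contained sketch of that pattern (the byte strings stand in for real ElevenLabs audio chunks, and the list append stands in for the Twilio WebSocket send):

```python
import asyncio

async def produce_audio(queue: asyncio.Queue) -> None:
    # Stand-in for the TTS chunk generator (ElevenLabs in the real app).
    for chunk in [b"chunk1", b"chunk2", b"chunk3"]:
        await queue.put(chunk)
    await queue.put(None)  # sentinel: stream finished

async def send_to_twilio(queue: asyncio.Queue, sent: list) -> None:
    # Stand-in for websocket.send_bytes(); appends instead of sending.
    while (chunk := await queue.get()) is not None:
        sent.append(chunk)

async def main() -> list:
    queue = asyncio.Queue()
    sent: list = []
    # Producer and consumer run concurrently; awaiting the queue yields
    # control back to the event loop instead of blocking it.
    await asyncio.gather(produce_audio(queue), send_to_twilio(queue, sent))
    return sent

print(asyncio.run(main()))
# [b'chunk1', b'chunk2', b'chunk3']
```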
Accomplishments that we're proud of
- Sub-Second Response Times: By bypassing standard text generation and streaming sentence chunks directly from Gemini into ElevenLabs, we achieved near-instantaneous conversational responses over a real phone line.
- The "Zero-Click" Onboarding: Successfully using Gemini Vision to accurately extract medical data from a raw PDF and update the frontend UI without a single manual form entry.
- The Live "Catch": Watching the system successfully listen to a live human pharmacist, cross-reference that audio against the JSON allergy profile, and autonomously reject a dangerous medication in real time.
- It was our first time working with Twilio, and getting the call to go through was our proudest moment, especially since we had been stuck on that bug for about five hours.
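The sentence-chunk streaming above can be sketched as a small generator that flushes a sentence to TTS the moment it is complete, instead of waiting for the full LLM response. This is an illustrative simplification, not our exact code, and the sentence splitter is deliberately naive (it would mis-split abbreviations like "Dr."):

```python
import re

def sentence_chunks(token_stream):
    """Yield complete sentences as soon as they arrive from a token stream,
    so TTS can start speaking before the full LLM response has finished."""
    buffer = ""
    for token in token_stream:
        buffer += token
        # Flush any complete sentences (ending in . ! or ? plus whitespace).
        while (m := re.search(r"[.!?]\s", buffer)):
            yield buffer[: m.end()].strip()
            buffer = buffer[m.end():]
    if buffer.strip():
        yield buffer.strip()  # flush whatever remains at end of stream

tokens = ["Hello", ", this is", " the patient's assistant. ", "Please hold", " on. "]
print(list(sentence_chunks(tokens)))
```

Each yielded sentence can be handed to the TTS API immediately, which is what cuts the perceived time-to-first-byte on the phone line.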
What we learned
We gained deep, practical experience in orchestrating asynchronous WebSockets in Python. We learned how to manipulate Twilio's TwiML tags for raw audio piping, how to engineer strict zero-shot prompts to force LLMs to output valid JSON schemas, and how to aggressively tune ElevenLabs' telephony parameters to prioritize latency over studio quality.
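For the raw audio piping, Twilio's `<Connect><Stream>` TwiML verb opens a bidirectional media stream to a WebSocket endpoint. A minimal helper that builds that TwiML as a string (the URL here is a placeholder; it must be an absolute `wss://` URL, and malformed values are a common cause of the 11100 "Invalid URL format" error mentioned above):

```python
def connect_stream_twiml(ws_url: str) -> str:
    """Build TwiML telling Twilio to open a two-way media stream to ws_url."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<Response>"
        f'<Connect><Stream url="{ws_url}" /></Connect>'
        "</Response>"
    )

print(connect_stream_twiml("wss://example.com/media"))
```

The FastAPI route answering Twilio's webhook returns this XML, after which Twilio starts exchanging base64 mu-law audio frames over the WebSocket.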
What's next for Onyx
We plan to add support for over 100 languages.
Built With
- elevenlabs
- fastapi
- geminiapi
- gsap
- html
- javascript
- python
- tailwind
- twilio