Inspiration
Medical errors and adverse drug interactions cause hundreds of thousands of preventable injuries every year. For elderly patients, managing complex medication schedules and communicating critical health history (like severe allergies) to pharmacies can be overwhelming and dangerous. We were inspired to build something beyond a standard chatbot—we wanted to engineer an Autonomous Medical Guardian. We envisioned an AI proxy that not only speaks on behalf of the patient but actively protects them by cross-referencing real-time conversations against their medical records to prevent fatal mistakes.
What it does
Onyx Aura Concierge is an ultra-low-latency, autonomous voice AI that handles pharmacy communications for patients.
- Instant Dossier: Uses Gemini 2.0 to parse unstructured medical PDFs into structured JSON profiles.
- Real-Time Telephony: Full-duplex Twilio WebSockets bridge web clients to live phone lines.
- Native Voice Synthesis: Leverages ElevenLabs Multilingual v3 and custom acoustic tuning for sub-second, hyper-realistic conversational speech.
- Autonomous Safety Net: Dynamically halts prescriptions that conflict with known patient allergies mid-conversation.
- Visual Pill Scanner: Patients can hold any medication bottle up to their webcam. Our vision model identifies the exact drug, cross-references it against their medical profile, and immediately sounds an alarm if a dangerous allergy conflict is detected.
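A minimal sketch of how the safety-net check might work, assuming the parsed patient profile and a small drug-to-class lookup. All names and data here are illustrative, not our production schema:

```python
# Hypothetical allergy cross-check: compare a drug mentioned in conversation
# (or scanned from a bottle) against the parsed patient profile.

PATIENT_PROFILE = {
    "name": "Jane Doe",
    "allergies": ["penicillin", "sulfa"],
}

# Minimal drug -> ingredient-class map, just for this demo check.
DRUG_CLASSES = {
    "amoxicillin": "penicillin",
    "bactrim": "sulfa",
    "ibuprofen": "nsaid",
}

def check_drug_safety(drug_name: str, profile: dict) -> tuple[bool, str]:
    """Return (is_safe, reason) for a drug mentioned mid-conversation."""
    drug_class = DRUG_CLASSES.get(drug_name.lower())
    if drug_class and drug_class in profile["allergies"]:
        return False, f"{drug_name} conflicts with {drug_class} allergy"
    return True, "no known conflict"

print(check_drug_safety("Amoxicillin", PATIENT_PROFILE))
# (False, 'Amoxicillin conflicts with penicillin allergy')
```

In the real system this decision would gate whether the voice agent confirms or rejects the prescription mid-call.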
How we built it
- Frontend: HTML, JavaScript, GSAP, and Tailwind CSS
- Backend: Python (FastAPI)
- Brain: Gemini 2.0 Flash
- Voice & Telephony: ElevenLabs v3 and Twilio
Challenges we ran into
Building real-time, two-way voice AI is notoriously difficult. Our biggest hurdle was managing the Time-to-First-Byte (TTFB) latency. If our backend took more than 2 seconds to respond, the Twilio line felt dead, and the pharmacist would hang up. We also ran into severe routing headaches—specifically, Twilio throwing 11100 - Invalid URL format errors when trying to establish the two-way listening channel. We had to completely refactor our WebSocket architecture to ensure we were streaming audio chunks asynchronously without blocking the main Python event loop.
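The fix boiled down to decoupling audio production from audio sending with a queue, so neither side blocks the event loop. A simplified, self-contained sketch of that pattern (the byte strings stand in for real ElevenLabs audio chunks, and the list append stands in for the Twilio WebSocket send):

```python
import asyncio

async def produce_audio(queue: asyncio.Queue) -> None:
    # Stand-in for the TTS chunk generator (ElevenLabs in the real app).
    for chunk in [b"chunk1", b"chunk2", b"chunk3"]:
        await queue.put(chunk)
    await queue.put(None)  # sentinel: stream finished

async def send_to_twilio(queue: asyncio.Queue, sent: list) -> None:
    # Stand-in for websocket.send_bytes(); appends instead of sending.
    while (chunk := await queue.get()) is not None:
        sent.append(chunk)

async def main() -> list:
    queue = asyncio.Queue()
    sent: list = []
    # Producer and consumer run concurrently; awaiting the queue yields
    # control back to the event loop instead of blocking it.
    await asyncio.gather(produce_audio(queue), send_to_twilio(queue, sent))
    return sent

print(asyncio.run(main()))
# [b'chunk1', b'chunk2', b'chunk3']
```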
Accomplishments that we're proud of
- Sub-Second Response Times: By bypassing standard text generation and streaming sentence chunks directly from Gemini into ElevenLabs, we achieved near-instantaneous conversational responses over a real phone line.
- The "Zero-Click" Onboarding: Successfully using Gemini Vision to accurately extract medical data from a raw PDF and update the frontend UI without a single manual form entry.
- The Live "Catch": Watching the system successfully listen to a live human pharmacist, cross-reference that audio against the JSON allergy profile, and autonomously reject a dangerous medication in real time.
- It was our first time working with Twilio, and getting the call to go through was our proudest moment, especially since we had been stuck on that bug for about five hours.
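The sentence-chunk streaming above can be sketched as a small generator that flushes a sentence to TTS the moment it is complete, instead of waiting for the full LLM response. This is an illustrative simplification, not our exact code, and the sentence splitter is deliberately naive (it would mis-split abbreviations like "Dr."):

```python
import re

def sentence_chunks(token_stream):
    """Yield complete sentences as soon as they arrive from a token stream,
    so TTS can start speaking before the full LLM response has finished."""
    buffer = ""
    for token in token_stream:
        buffer += token
        # Flush any complete sentences (ending in . ! or ? plus whitespace).
        while (m := re.search(r"[.!?]\s", buffer)):
            yield buffer[: m.end()].strip()
            buffer = buffer[m.end():]
    if buffer.strip():
        yield buffer.strip()  # flush whatever remains at end of stream

tokens = ["Hello", ", this is", " the patient's assistant. ", "Please hold", " on. "]
print(list(sentence_chunks(tokens)))
```

Each yielded sentence can be handed to the TTS API immediately, which is what cuts the perceived time-to-first-byte on the phone line.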
What we learned
We gained deep, practical experience in orchestrating asynchronous WebSockets in Python. We learned how to manipulate Twilio's TwiML tags for raw audio piping, how to engineer strict zero-shot prompts to force LLMs to output valid JSON schemas, and how to aggressively tune ElevenLabs' telephony parameters to prioritize latency over studio quality.
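For the raw audio piping, Twilio's `<Connect><Stream>` TwiML verb opens a bidirectional media stream to a WebSocket endpoint. A minimal helper that builds that TwiML as a string (the URL here is a placeholder; it must be an absolute `wss://` URL, and malformed values are a common cause of the 11100 "Invalid URL format" error mentioned above):

```python
def connect_stream_twiml(ws_url: str) -> str:
    """Build TwiML telling Twilio to open a two-way media stream to ws_url."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<Response>"
        f'<Connect><Stream url="{ws_url}" /></Connect>'
        "</Response>"
    )

print(connect_stream_twiml("wss://example.com/media"))
```

The FastAPI route answering Twilio's webhook returns this XML, after which Twilio starts exchanging base64 mu-law audio frames over the WebSocket.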
What's next for Onyx
We plan to add support for over 100 languages.
Built With
- elevenlabs
- fastapi
- geminiapi
- gsap
- html
- javascript
- python
- tailwind
- twilio