Voicebox


Inspiration

What it does

Voicebox is a suite of tools to help patients with expressive aphasia. It facilitates conversations between the patient and the caregiver: it listens to what the caregiver says, and an AI model then suggests possible responses for the stroke patient to choose from. People with expressive aphasia can often still read, so the patient can point to the option they want or move the mouse to it.

How we built it

For conversation facilitation, audio is sent to Deepgram for transcription. A low-latency LLM is then called to generate candidate responses for the stroke patient, and the local cache of OpenMoji and Twemoji images is searched for each one. If an option corresponds to an image in OpenMoji or Twemoji, that image is sent to the browser.
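The image lookup step can be sketched roughly as follows. This is a minimal illustration, not our actual code: the `EmojiEntry` shape, the keyword index, the file paths, and the function names `buildIndex` and `attachImages` are all assumptions made for the example. It matches each LLM option against a local keyword index and attaches the first image it finds.

```typescript
// Hypothetical shape of one cached emoji entry; field names are assumptions.
interface EmojiEntry {
  keywords: string[];
  imagePath: string;
  source: "openmoji" | "twemoji";
}

// A response option as it might be sent to the browser (assumed shape).
interface ResponseOption {
  text: string;
  imagePath?: string;
}

// Lowercase the text and split it into alphanumeric words.
function normalize(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .split(/\s+/)
    .filter((word) => word.length > 0);
}

// Build a keyword -> entry map once, so each lookup is a constant-time Map hit.
// If two entries share a keyword, the first one listed wins.
function buildIndex(entries: EmojiEntry[]): Map<string, EmojiEntry> {
  const index = new Map<string, EmojiEntry>();
  for (const entry of entries) {
    for (const keyword of entry.keywords) {
      const key = keyword.toLowerCase();
      if (!index.has(key)) {
        index.set(key, entry);
      }
    }
  }
  return index;
}

// Attach an image to each option whose words hit the index; others stay text-only.
function attachImages(
  options: string[],
  index: Map<string, EmojiEntry>
): ResponseOption[] {
  return options.map((text) => {
    for (const word of normalize(text)) {
      const hit = index.get(word);
      if (hit !== undefined) {
        return { text, imagePath: hit.imagePath };
      }
    }
    return { text };
  });
}
```

Building the index once up front keeps the per-request work small, which matters because this step sits between the LLM response and what the patient sees.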

Challenges we ran into

Audio recording often broke because of browser API restrictions. We had to work out how to fix the VAD (voice activity detection) package, but we got it working.

Accomplishments that we're proud of

We built the conversation feature to have as little latency as possible by generating a cache of candidate images from OpenMoji and Twemoji, then making it more extensive with Sana Sprint, Nvidia's ultra-low-latency image generator.
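One way to picture a cache that falls back to generation is sketched below. It is only an illustration under stated assumptions: the `ImageCache` class, its `seed` and `get` methods, and the `ImageGenerator` callback are hypothetical names, and the generator is a stand-in rather than the real Sana Sprint interface. Cached phrases return immediately, and concurrent requests for the same missing phrase share one generation call.

```typescript
// Hypothetical generator callback: takes a phrase, resolves to an image path.
// This stands in for an image model and is not a real Sana Sprint API.
type ImageGenerator = (phrase: string) => Promise<string>;

class ImageCache {
  // Images that are already available, keyed by lowercased phrase.
  private readonly ready = new Map<string, string>();
  // Generations currently in progress, so duplicate requests can share them.
  private readonly pending = new Map<string, Promise<string>>();
  private readonly generate: ImageGenerator;

  constructor(generate: ImageGenerator) {
    this.generate = generate;
  }

  // Preload a known image, e.g. one taken from an emoji set.
  seed(phrase: string, imagePath: string): void {
    this.ready.set(phrase.toLowerCase(), imagePath);
  }

  // Return the cached image, join an in-flight generation, or start a new one.
  async get(phrase: string): Promise<string> {
    const key = phrase.toLowerCase();

    const cached = this.ready.get(key);
    if (cached !== undefined) {
      return cached;
    }

    const existing = this.pending.get(key);
    if (existing !== undefined) {
      return existing;
    }

    const inFlight = this.generate(key).then((path) => {
      this.ready.set(key, path);
      return path;
    });
    this.pending.set(key, inFlight);

    // Clear the pending slot on success or failure so a failed call can be retried.
    const clear = (): void => {
      this.pending.delete(key);
    };
    inFlight.then(clear, clear);

    return inFlight;
  }
}
```

Sharing in-flight generations avoids paying for the same image twice when several options in one turn mention the same thing.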

What we learned

There is plenty of room for patients with expressive aphasia to get better. They just need the right tooling.

What's next for Voicebox

If we win, we will put the money toward developing Voicebox into a mobile application and paying the fee to publish it on the App Store. There are plenty of patients with expressive aphasia who could benefit from these tools.

Built With

antigravity
nextjs
openmoji
supabase
twemoji
vercel

Updates

Pooch63 Pillai started this project — Jun 12, 2026 09:57 PM EDT
