Inspiration
What it does
Voicebox is a suite of tools for people with expressive aphasia. Its conversation tool facilitates exchanges between the patient and a caregiver: it listens to what the caregiver says, and an AI then proposes possible responses for the patient to choose from. People with expressive aphasia can often still read, so they can select a response by pointing or by dragging the mouse to the option they want.
How we built it
For the conversation feature, audio is sent to Deepgram for transcription. A low-latency LLM is then called to generate candidate responses for the patient, and a local cache of OpenMoji and Twemoji images is searched for each one. If an option matches an image from OpenMoji or Twemoji, that image is sent to the browser.
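The image-matching step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `EmojiEntry` shape, the keyword-based matching, and the function names `tokenize`, `buildIndex`, and `attachImages` are all assumptions made for the example. The transcription and LLM calls are left out, so the sketch starts from a list of already generated option strings.

```typescript
// Hypothetical shape of one cached emoji image; the field names are assumptions.
interface EmojiEntry {
  keywords: string[];
  source: "openmoji" | "twemoji";
  path: string;
}

// A response option shown to the patient, with an image when one matched.
interface ResponseOption {
  text: string;
  image?: string;
}

// Lowercase and split on non-alphanumerics so "Water, please." yields ["water", "please"].
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// Build a keyword -> entry index once, so each lookup is a single Map access.
function buildIndex(entries: EmojiEntry[]): Map<string, EmojiEntry> {
  const index = new Map<string, EmojiEntry>();
  for (const entry of entries) {
    for (const keyword of entry.keywords) {
      const key = keyword.toLowerCase();
      // The first entry registered for a keyword wins.
      if (!index.has(key)) {
        index.set(key, entry);
      }
    }
  }
  return index;
}

// Attach the first matching image to each option; options without a match stay text-only.
function attachImages(
  options: string[],
  index: Map<string, EmojiEntry>
): ResponseOption[] {
  return options.map((text): ResponseOption => {
    for (const token of tokenize(text)) {
      const hit = index.get(token);
      if (hit !== undefined) {
        return { text, image: hit.path };
      }
    }
    return { text };
  });
}
```

Because the index lives in memory, the lookup adds almost nothing to the response time after the LLM returns, which fits the low-latency goal described above.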
Challenges we ran into
Audio recording often broke because of browser API restrictions. We also had to figure out how to fix the VAD (voice activity detection) package, and we got it working.
Accomplishments that we're proud of
We built the conversation feature to have as little latency as possible. To do that, we generated a cache of candidate images from OpenMoji and Twemoji, then extended it with images from Sana Sprint, Nvidia's ultra-low-latency image generator.
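One way to combine a pre-seeded image cache with a generator is sketched below. The write-up does not say whether the extra images were generated ahead of time or on demand, so this example assumes on-demand generation. The `ImageCache` class and the `ImageGenerator` callback are hypothetical names; the callback stands in for whatever call reaches the image generator, and no real Sana Sprint API is shown.

```typescript
// Hypothetical generator callback: takes a label, resolves to an image URL.
type ImageGenerator = (label: string) => Promise<string>;

class ImageCache {
  private readonly images = new Map<string, string>();
  private readonly pending = new Map<string, Promise<string>>();
  private readonly generate: ImageGenerator;

  // `seed` holds pre-cached images (for example from OpenMoji and Twemoji).
  constructor(seed: Record<string, string>, generate: ImageGenerator) {
    this.generate = generate;
    for (const [label, url] of Object.entries(seed)) {
      this.images.set(label.trim().toLowerCase(), url);
    }
  }

  // Return a cached image if present; otherwise generate one and remember it.
  async get(label: string): Promise<string> {
    const key = label.trim().toLowerCase();
    const cached = this.images.get(key);
    if (cached !== undefined) {
      return cached;
    }
    // Share one in-flight request between concurrent callers for the same label.
    let inFlight = this.pending.get(key);
    if (inFlight === undefined) {
      inFlight = this.generate(key)
        .then((url) => {
          this.images.set(key, url);
          return url;
        })
        .finally(() => {
          this.pending.delete(key);
        });
      this.pending.set(key, inFlight);
    }
    return inFlight;
  }
}
```

With this shape, only the first request for an unseen label pays the generation cost; later requests for the same label are served from memory.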
What we learned
There is plenty of room for patients with expressive aphasia to get better. They just need the right tooling.
What's next for Voicebox
If we win, we would put the money toward developing this into a mobile application and paying the fee to publish it on the App Store. Many patients with expressive aphasia could benefit from these tools.
Built With
- antigravity
- nextjs
- openmoji
- supabase
- twemoji
- vercel