Why'd we make SULI?

There are over 100 million people today who have trouble enunciating words clearly, whether due to hearing impairments, neurological disorders like dysarthria, or developmental conditions like childhood apraxia of speech. We wanted to build an interface that makes communication easier for people facing this barrier. We also wanted to enable these users to interact with voice assistants, which normally require clear, unbroken speech to understand a query.

What exactly does SULI do?

SULI offers a user-friendly web interface that uses your webcam to capture a video of you posing a question or issuing a command, no audio required. The key innovation is that SULI analyzes the movements of your lips to predict what you are saying, then converts that prediction into audible speech through text-to-speech technology. SULI also integrates with smart assistants such as ChatGPT and Google Assistant, so you can interact with them without ever vocalizing a sound or typing a single word.
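To make that flow concrete, here is a minimal Python sketch. Every function below is a placeholder we wrote for illustration, standing in for the real lip-reading model, assistant query, and text-to-speech step; none of these names come from SULI's actual code:

```python
# A minimal sketch of SULI's flow. Every function here is an
# illustrative placeholder, not part of SULI's real implementation.

def recognize_speech_from_video(video_path: str) -> str:
    # Placeholder: the lip-reading (VSR) model decodes mouth movements
    # in the captured video into text.
    return "what's the weather like today"

def ask_assistant(query: str) -> str:
    # Placeholder: the decoded text is forwarded to a voice assistant
    # (e.g. ChatGPT) exactly as if the user had spoken it aloud.
    return f"Handling query: {query}"

def speak(text: str) -> None:
    # Placeholder: a text-to-speech service voices the text.
    print(f"[TTS] {text}")

if __name__ == "__main__":
    query = recognize_speech_from_video("webcam_capture.mp4")
    speak(query)                 # the user's predicted speech, spoken aloud
    speak(ask_assistant(query))  # the assistant's answer, spoken aloud
```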

How we built it

We employed a state-of-the-art visual speech recognition model developed by Pingchuan Ma et al. (link). We used this model in our backend, together with the TTS and GPT APIs. Our user interface was built in React, with Tailwind for quick styling.
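As a rough sketch of the backend glue, assuming OpenAI's Python client for the GPT and TTS calls (our hackathon code may wire these differently, and the VSR step is omitted here since the model ships with its own inference pipeline):

```python
# Sketch of the backend step that turns lip-read text into a spoken reply.
# Assumes the OpenAI Python client (>= 1.0) and OPENAI_API_KEY in the env.
from openai import OpenAI

client = OpenAI()

def handle_lipread_text(decoded_text: str) -> bytes:
    # Forward the lip-read text to GPT as if the user had spoken it.
    chat = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": decoded_text}],
    )
    reply = chat.choices[0].message.content

    # Synthesize the reply as audio with the TTS API.
    speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply)
    return speech.content  # audio bytes (MP3 by default)
```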

What's next for SULI

We're just getting started!
- Our immediate next step is to speed up the processing of our pipeline. This might require that we utilize a cloud computing cluster or use more multiprocessing in the code (see the sketch after this list).
- We also want to provide more integrations with different services, especially with home assistants like Alexa and Google Home.
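As one example of the multiprocessing idea, per-frame preprocessing (face detection, lip cropping) could be fanned out across CPU cores before the lip-reading model runs over the whole clip. The function names here are purely illustrative:

```python
# Illustrative only: parallelizing per-frame preprocessing with a
# process pool. crop_lip_region is a made-up stand-in for real
# per-frame work (face detection, lip cropping, resizing).
from concurrent.futures import ProcessPoolExecutor

def crop_lip_region(frame_index: int) -> int:
    # Stand-in for CPU-heavy per-frame processing.
    return frame_index

def preprocess_frames(num_frames: int) -> list[int]:
    # Frames are independent, so they can be processed in parallel.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(crop_lip_region, range(num_frames)))

if __name__ == "__main__":
    print(f"Processed {len(preprocess_frames(64))} frames")
```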
