Inspiration
One afternoon, my friend was giving an online presentation for his class. We share a house, so I was there while he presented. I noticed that he completely mispronounced some words, and those words were essential to his presentation. When I talked to him about it afterwards, I realized he had been pronouncing those words that way for a long time.
Sometimes it's not that we don't want to improve; we just don't know what is incorrect in the first place.
What it does
Prepper is a tool that helps identify mispronunciations in a speech. Essentially, we provide a script and a recording of how we say it, and Prepper points out which words can be improved.
How we built it
The system combines AI models with a string alignment algorithm to determine which words can be improved. Users can also select each word to inspect the correct phonemes (Text2Phoneme model) and the phonemes of their own speech (Speech2Phoneme model), along with audio that demonstrates how to say the word correctly (Azure Speech Service).
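As a rough illustration of the comparison step, here is a minimal Python sketch that flags words whose spoken phonemes differ too much from the expected ones. It is not Prepper's actual code: the function name `flag_words`, the `(word, phonemes)` input shape, the ARPAbet-style symbols, and the 0.8 threshold are all assumptions made for this example, and it uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure.

```python
from difflib import SequenceMatcher


def flag_words(expected, spoken, threshold=0.8):
    """Return (word, score) for words whose phonemes look mispronounced.

    expected / spoken: lists of (word, [phonemes]) in the same word order.
    Both the input shape and the threshold are hypothetical choices.
    """
    flagged = []
    for (word, ref), (_, hyp) in zip(expected, spoken):
        # ratio() is 1.0 for identical sequences and drops toward 0.0
        # as fewer phonemes match in order.
        score = SequenceMatcher(None, ref, hyp, autojunk=False).ratio()
        if score < threshold:
            flagged.append((word, round(score, 2)))
    return flagged


# Phonemes as a text-to-phoneme model might emit them (illustrative values).
expected = [("data", ["D", "EY", "T", "AH"]), ("cache", ["K", "AE", "SH"])]
# Phonemes as a speech-to-phoneme model might hear them (illustrative values).
spoken = [("data", ["D", "EY", "T", "AH"]), ("cache", ["K", "AE", "CH", "EY"])]

print(flag_words(expected, spoken))
```

In this sketch only "cache" is flagged, because "data" matches its expected phonemes exactly. A real system would also need to split the recording's phoneme stream into words, which this example assumes has already been done.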
Challenges we ran into
There are a lot of components involved: one model for text-to-phoneme, one for audio-to-phoneme, a text-to-speech service, and a string alignment algorithm. Each of these methods needs further improvement to maximize the experience on the platform.
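To make the string alignment component more concrete, below is a minimal Python sketch of one common approach: unit-cost edit-distance alignment between an expected and a spoken phoneme sequence. This is an assumption about the kind of algorithm involved, not the one Prepper actually uses; the function name `align` and the sample phonemes are hypothetical.

```python
def align(ref, hyp):
    """Align two phoneme sequences using unit-cost edit distance.

    Returns a list of (ref_phoneme, hyp_phoneme) pairs, where None marks
    a phoneme that is missing on that side (a deletion or an insertion).
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitute = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(substitute, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            pairs.append((ref[i - 1], hyp[j - 1]))  # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))  # expected phoneme not spoken
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))  # extra phoneme spoken
            j -= 1
    pairs.reverse()
    return pairs


# Illustrative phonemes: expected "data" vs. a variant with a different vowel.
for expected_ph, spoken_ph in align(["D", "EY", "T", "AH"], ["D", "AE", "T", "AH"]):
    marker = "" if expected_ph == spoken_ph else "  <- differs"
    print(expected_ph, spoken_ph, marker)
```

When several alignments have the same cost, this sketch returns just one of them, so the exact pairing of mismatched phonemes can vary with the tie-breaking order. Weighted costs (for example, cheaper substitutions between similar-sounding phonemes) would be one possible refinement.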
What we learned
AI opens up new kinds of solutions that we can build to help others.
What's next for Prepper
First, we are going to improve the underlying methods. Then, we will try to make the rehearsal itself a lot smoother. For instance, instead of requiring a written script, we could use speech-to-text transcription. We could also measure the pitch and intonation of the presentation so that we can suggest ways to make it more interesting.
Built With
- azure
- azure-containers-app
- azure-cosmos
- express.js
- fast-api
- huggingface
- nuxtjs
- python
- speechapi
- wav2vec2