Sonant AI

Workflow of Conversation Mode
Workflow of creating a clone
Workflow of Practice mode

Inspiration

The idea started with a conversation about language learning and the limits of popular apps. We noticed that many learners rarely get real speaking practice, especially those using tools like Duolingo that focus on memorization instead of conversation. On top of that, learners may feel awkward or shy about their pronunciation. Our own experience with this led us to look for ways to make a difference in those learners' lives.

While exploring voice cloning, we wondered how it could make learning more personal. This led to the question: What if learners could hear a corrected version of their own speech? We then expanded the idea: What if they could also practice with an AI that speaks in their own voice? This became the core inspiration for our project.

What it does

The project helps people have conversations with native speakers in a language of their choice. These conversations are contextually rich: during development, we made sure the language felt natural and free-flowing. To help people become more confident speaking the language, we used Fish Audio's voice cloning. The cloned voice lets people hear themselves, pick up on their grammatical mistakes, and hear themselves sound more confident in the language they are trying to learn. We focused on new learners, to help them build confidence and enjoy the amazing process of learning a language.

How we built it

To build this project, we created a web app that integrates Fish Audio's voice cloning technology with speech-to-text, text-to-speech, and a large language model to create realistic conversations. A user can create a clone of their voice, which is stored and then used during practice. In practice mode, the user says a sentence and the audio is converted to text. The text is sent to a model for grammatical correction, and the corrected text is sent to Fish Audio to generate speech in the user's voice. In conversation mode, the user can choose the voice they want to listen to. The user's sentence is converted to text and sent to a model, along with the context of the conversation, to generate a reply. The reply is sent to Fish Audio to generate audio in the user's selected voice.
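The two flows described above can be sketched roughly as follows. This is a minimal illustration, not our actual implementation: the `Pipeline` class, the three callable types, and the prompt strings are all hypothetical placeholders, and the real speech-to-text, language model, and Fish Audio calls would be plugged in behind them.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical service interfaces (assumptions for this sketch, not real SDK calls).
SpeechToText = Callable[[bytes], str]        # audio -> transcript
LanguageModel = Callable[[str], str]         # prompt -> model output
TextToSpeech = Callable[[str, str], bytes]   # (text, voice_id) -> audio


@dataclass
class Pipeline:
    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech
    # Running transcript of the conversation, used as context for replies.
    history: List[str] = field(default_factory=list)

    def practice_turn(self, audio: bytes, user_voice_id: str) -> bytes:
        """Practice mode: transcribe, correct the grammar, speak it in the user's cloned voice."""
        transcript = self.stt(audio)
        corrected = self.llm(f"Correct the grammar of this sentence: {transcript}")
        return self.tts(corrected, user_voice_id)

    def conversation_turn(self, audio: bytes, selected_voice_id: str) -> bytes:
        """Conversation mode: transcribe, generate a reply with context, speak it in the chosen voice."""
        transcript = self.stt(audio)
        self.history.append(f"User: {transcript}")
        reply = self.llm("\n".join(self.history) + "\nAssistant:")
        self.history.append(f"Assistant: {reply}")
        return self.tts(reply, selected_voice_id)
```

Passing the three services in as plain callables keeps the flow independent of any particular provider, so each step can be replaced with a stub while testing the others.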

Challenges we ran into

In the planning stage, we spent a lot of time on design decisions: whether to focus on conversations, practice, or both. Since a hackathon has a fixed time limit, we had to work out which features would have the most impact so we could build the best possible product in the allotted time.

Accomplishments that we're proud of

We are proud of connecting so many moving pieces in a way that flows well, and of how we formed our idea, collaborated, and saw it through to fruition.

What we learned

Through the process of making this app, we learned how important a tool like this is for people learning a new language. In our own experience, apps like Duolingo and Babbel were never able to help us fully master a new language, because they mostly came down to memorization. Being able to hear yourself speak confidently in a language you are uncomfortable in really helps build confidence and motivation to speak that language. Moreover, speaking with native speakers greatly helps people learn new languages, and through our app we were able to provide that.

What's next for Sonant AI

We want to support more languages and add more dialogue options. We also want to build a better feedback system for our practice modes that gives specific compliments and feedback.