Inspiration

Our team member Rob has been learning French and wanted more tools to support his studies. He particularly wants to improve his pronunciation of French words, so he thought it would be cool to see if AI could help with that.

What it does

ALEC helps users practice and improve their pronunciation of words in a chosen language. It uses a chat format to give users words or phrases to say, lets the user record themselves saying the word or phrase, analyzes the pronunciation, and gives recommendations for how to improve. It can continue to give the user additional words/phrases to pronounce and can adjust the difficulty depending on how the user is doing.

How we built it

The core of ALEC is the Crew.ai pronunciation tutor crew, which consists of three agents that collaborate over A2A: a curriculum agent, a pronunciation coach agent, and a conversation agent. The curriculum agent manages which words and phrases are offered to the user, tracks the user's overall skill level, and adjusts the difficulty to match. The pronunciation coach agent assesses how accurate the user's pronunciation is and gives feedback on how to improve. The conversation agent is the friendly, user-facing agent that runs the chat and integrates the work of the other two agents to warmly and supportively guide the user through the pronunciation tasks.
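As a rough illustration of the curriculum agent's "track and adjust" behavior, here is a minimal plain-Python sketch. It is not our Crew.ai code: the `CurriculumTracker` class, the 1 to 5 level range, the three-attempt window, and the 0.85 and 0.5 thresholds are all hypothetical values chosen for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CurriculumTracker:
    """Hypothetical stand-in for the curriculum agent's level tracking."""
    level: int = 1          # current difficulty level (assumed range 1..5)
    min_level: int = 1
    max_level: int = 5
    window: int = 3         # number of recent attempts to average over
    scores: deque = field(default_factory=deque)

    def record(self, score: float) -> int:
        """Store a pronunciation score in [0, 1] and return the new level."""
        self.scores.append(score)
        if len(self.scores) > self.window:
            self.scores.popleft()
        # Only adjust once a full window of attempts is available.
        if len(self.scores) == self.window:
            average = sum(self.scores) / self.window
            if average >= 0.85 and self.level < self.max_level:
                self.level += 1      # user is doing well: harder material
                self.scores.clear()  # start a fresh window at the new level
            elif average <= 0.5 and self.level > self.min_level:
                self.level -= 1      # user is struggling: easier material
                self.scores.clear()
        return self.level
```

In a setup like ours, the pronunciation coach agent would supply the score and the conversation agent would use the returned level when asking for the next word or phrase.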

The web app uses a FastAPI backend and a Vite frontend, with the Google Text-to-Speech API playing the conversation agent's responses out loud to the user. A key component of the backend is a package called allosaurus, which takes a WAV file of the user's recording and transcribes it into IPA characters that represent the sounds the user actually produced. This IPA transcription can then be compared to the IPA representation of the correct pronunciation of the given word or phrase to see how close the user got.
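One simple way to do the comparison described above is an edit distance over phones. The sketch below is an illustration under assumptions, not our exact scoring code: it assumes both transcriptions arrive as space-separated IPA phones, and the function names and the normalization into a 0 to 1 score are hypothetical choices.

```python
def phone_edit_distance(expected: list[str], actual: list[str]) -> int:
    """Levenshtein distance between two phone sequences."""
    # prev[j] holds the distance between the expected phones seen so far
    # and the first j phones of `actual`.
    prev = list(range(len(actual) + 1))
    for i, exp_phone in enumerate(expected, start=1):
        curr = [i]
        for j, act_phone in enumerate(actual, start=1):
            cost = 0 if exp_phone == act_phone else 1
            curr.append(min(
                prev[j] + 1,         # expected phone missing from the recording
                curr[j - 1] + 1,     # extra phone in the recording
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def pronunciation_score(expected_ipa: str, actual_ipa: str) -> float:
    """Return a similarity in [0, 1]; 1.0 means the phone sequences match."""
    expected = expected_ipa.split()
    actual = actual_ipa.split()
    if not expected and not actual:
        return 1.0
    distance = phone_edit_distance(expected, actual)
    return 1.0 - distance / max(len(expected), len(actual))


# Example: target "bonjour" vs. a recording with an anglicized pronunciation.
target = "b ɔ̃ ʒ u ʁ"
heard = "b o n ʒ u ɹ"
print(pronunciation_score(target, heard))
```

Treating every substitution as equally wrong is a simplification. Weighting substitutions by how similar two phones sound would likely give fairer feedback, which is part of the fine-tuning this step still needs.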

Challenges we ran into

Well, time, as always, was a challenge! We didn't manage to implement everything we wanted to, but we are happy with how much progress we made. Working with allosaurus was a bit tricky, and turning audio files into accurate IPA transcriptions is a process that still needs some fine-tuning. Crew.ai, while very cool and powerful, has been challenging to use. Its docs and platform design made it hard to quickly work out how to use it in the way we wanted. We are still working on actually connecting our web app to Crew.ai.

Accomplishments that we're proud of

We came up with a fun concept that we enjoyed working on, and we built a running, usable web app for the user interface.

What we learned

We learned about agent protocols, about the surprisingly complex world of processing human speech into concrete phonemes, and about how to handle audio file creation and processing.

What's next for TBD

We initially wanted to deploy on Fly.io. We also wanted to use Stagehand in Crew.ai to let the agents navigate the web and find additional pronunciations of words, recorded by real people on howtopronounce.com, to provide to the user. We didn't have time to get to either of those things.
