What inspired us

Learning a language can be hard, and pronunciation is especially difficult. Part of the problem is that you, as a learner, often can't hear the difference between what you say and what a listener hears. Popular language apps that accept speech input only judge whether a whole word sounds right or wrong, which doesn't help when you get just one sound wrong. We therefore decided to create an app that takes a more fine-grained, state-of-the-art approach: it checks your pronunciation of each phoneme and can pick new test words that encourage learning at the phoneme level. For pronunciations you get wrong, we tried to give you other words in your target language with a similar pronunciation.

What we learnt

Only one of us really knew about linguistics beforehand (and therefore came up with the idea), so the other three had to learn quite a bit about phonetics and phonemes. Separately, the technologies we used were new to most or all of us, from NextJS to MongoDB, so we had to learn them quickly during the day and night. We also all realised that front-end work is not for us, especially TypeScript. Some would say, anything Microsoft. We should also probably have a rotating sleep schedule next time.

There was quite a lot of tuning involved in the phoneme similarity detection. Although at first glance we had thought it would be simple, there is a lot of complexity involved in such an application, from phonemes not matching up across languages to trying to include sentence context in our

How we built the project

Make a plan for the features we want, user story flows, technology choices and API endpoints. Spend 6 hours deciding on the framework: set up a branch per framework contender, then choose one. Split the team in half for backend and frontend. Make the backend logic in Python: read a couple of papers and make the phoneme recogniser AI, make the API endpoints with FastAPI, set up MongoDB for data storage, and put a model on GoogleCloud to offload some compute remotely. Simultaneously make the frontend in NextJS: make the pages, make the logic, make it look nice with Tailwind, and hook it up to the backend. Then try to fix bugs everywhere!

Specific features/technologies and where to find them:

Frontend features:

- NextJS with React for frontend code
- Users
- Initial few exercises to see user’s pronunciation
- Card-like sentence/word structure – think Quizlet words/examples to train the phoneme(s) user struggles with
- User speech input
- Highlight mistakes and then give tips/corrections underneath the card
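One way to choose which cards to show for the phoneme(s) a user struggles with is to rank candidate words by how many of those phonemes they contain. The snippet below is only a minimal sketch of that idea, not the project's actual code: the function `rank_practice_words`, the tiny `lexicon`, and its phoneme transcriptions are all made up for illustration, and it uses a plain dictionary lookup rather than the AI-generated examples mentioned elsewhere in this writeup.

```python
from __future__ import annotations

from typing import Mapping


def rank_practice_words(
    error_counts: Mapping[str, int],
    lexicon: Mapping[str, list[str]],
    limit: int = 3,
) -> list[str]:
    """Rank words by how heavily they exercise the learner's weak phonemes.

    error_counts: hypothetical per-phoneme count of past mistakes.
    lexicon: hypothetical mapping from a word to its phoneme sequence.
    """

    def weight(word: str) -> int:
        # Count each phoneme once per word so long words are not favoured
        # just because they repeat a sound.
        return sum(error_counts.get(p, 0) for p in set(lexicon[word]))

    candidates = [word for word in lexicon if weight(word) > 0]
    # Highest weight first; ties are broken alphabetically for stable output.
    return sorted(candidates, key=lambda w: (-weight(w), w))[:limit]


# Illustrative data only: rough transcriptions, not taken from the project.
lexicon = {
    "think": ["θ", "ɪ", "ŋ", "k"],
    "three": ["θ", "r", "iː"],
    "cat": ["k", "æ", "t"],
    "rather": ["r", "ɑː", "ð", "ə"],
}
errors = {"θ": 3, "r": 1}
practice = rank_practice_words(errors, lexicon, limit=2)
```

Here "three" ranks above "think" because it contains both sounds the learner has missed, while "cat" is dropped because it contains neither.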

Backend features:

- Database to store user information – MongoDB
- Python FastAPI for backend
- Speech to phoneme pipeline – AI model to generate
- API to access model answers and get database data
- Distance score between model answer and user input
- Check if distance score is over threshold to label as “wrong”
- Link “wrongness” to phonemes
- Generate information to user on what they got wrong and what sound(s) it should be
- AI generate sentence/word examples from user data about what they struggle with – GoogleCloud
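The distance score, threshold check and link back to individual phonemes can be sketched as follows. This is a minimal illustration under stated assumptions, not the scoring the project actually shipped: it uses plain Levenshtein distance over phoneme sequences with every edit costing 1, normalises by the length of the model answer, and uses an arbitrary threshold of 0.2. The names `Mismatch`, `align` and `is_wrong` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Mismatch:
    position: int         # index into the model answer's phoneme sequence
    expected: str | None  # None when the learner added an extra sound
    heard: str | None     # None when the learner dropped a sound


def align(reference: list[str], heard: list[str]) -> tuple[int, list[Mismatch]]:
    """Return the Levenshtein distance and the phoneme-level differences."""
    rows, cols = len(reference) + 1, len(heard) + 1
    # dist[i][j] = number of edits to turn reference[:i] into heard[:j]
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            sub = dist[i - 1][j - 1] + (reference[i - 1] != heard[j - 1])
            dist[i][j] = min(sub, dist[i - 1][j] + 1, dist[i][j - 1] + 1)

    # Walk back from the bottom-right cell to recover which phonemes differ.
    mismatches: list[Mismatch] = []
    i, j = len(reference), len(heard)
    while i > 0 or j > 0:
        differs = i > 0 and j > 0 and reference[i - 1] != heard[j - 1]
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + differs:
            if differs:  # substitution
                mismatches.append(Mismatch(i - 1, reference[i - 1], heard[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:  # dropped sound
            mismatches.append(Mismatch(i - 1, reference[i - 1], None))
            i -= 1
        else:  # extra sound
            mismatches.append(Mismatch(i, None, heard[j - 1]))
            j -= 1
    mismatches.reverse()
    return dist[-1][-1], mismatches


def is_wrong(reference: list[str], heard: list[str], threshold: float = 0.2):
    """Label an attempt as wrong when the normalised distance exceeds threshold."""
    distance, mismatches = align(reference, heard)
    score = distance / max(len(reference), 1)
    return score > threshold, score, mismatches


# Illustrative input: "think" pronounced with /s/ instead of /θ/.
wrong, score, mismatches = is_wrong(["θ", "ɪ", "ŋ", "k"], ["s", "ɪ", "ŋ", "k"])
```

Each `Mismatch` carries the expected and heard sound, which is enough to highlight the mistake on the card and tell the user what sound it should have been. A refinement would be to make substitutions between similar phonemes cheaper than substitutions between unrelated ones.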

Technologies:

- Python, FastAPI
- GoogleCloud, NLTK, PyAudio, Allosaurus
- MongoDB
- React/NextJS with Tailwind

Some papers

https://arxiv.org/pdf/2002.11800.pdf
