Atypical Speech Recognition


Inspiration

While volunteering at long-term care centers, I met many seniors with speech-language disorders. Other people, even their own families, had trouble understanding what they wanted to say. I hoped to provide a solution using machine learning.

What it does

For the scope of this project, I focused on simple commands. The tool recognizes spoken commands (such as yes/no, directions, and numbers) and outputs their orthographic transcription.
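As a rough illustration of that last step, here is a minimal sketch of how a classifier's raw scores could be turned into a transcription. The command list, the `decode` helper, and the confidence threshold are all assumptions made up for this example; they are not taken from the project's code.

```python
import math

# Hypothetical command vocabulary; the project's real label set is not listed here.
COMMANDS = ["yes", "no", "up", "down", "left", "right", "one", "two"]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode(scores, labels=COMMANDS, threshold=0.5):
    """Return (transcription, confidence), or (None, confidence) if unsure."""
    if len(scores) != len(labels):
        raise ValueError("expected one score per label")
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]
    if confidence < threshold:
        # Below the (assumed) threshold, report no transcription at all
        return None, confidence
    return labels[best], confidence
```

Returning `None` for low-confidence inputs is one possible design choice: for a tool like this, showing no transcription may be less harmful than showing a wrong one.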

How we built it

Dataset: TORGO (from the University of Toronto)
ML model: a convolutional neural network
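To show what a convolutional network does with a spectrogram-like input, here is a tiny forward pass written in plain NumPy. It is only a sketch: the layer sizes, the random weights, and the names `conv2d_valid`, `init_params`, and `forward` are assumptions for illustration, not the architecture or framework used in this project.

```python
import numpy as np


def conv2d_valid(image, kernels):
    """Cross-correlate a 2-D input with a bank of kernels (no padding, stride 1).

    image:   (height, width)
    kernels: (n_kernels, kh, kw)
    returns: (n_kernels, height - kh + 1, width - kw + 1)
    """
    _, kh, kw = kernels.shape
    # Every kh-by-kw patch of the input, shape (out_h, out_w, kh, kw)
    patches = np.lib.stride_tricks.sliding_window_view(image, (kh, kw))
    return np.einsum("hwij,kij->khw", patches, kernels)


def init_params(n_kernels=4, kernel_size=3, n_classes=8, seed=0):
    """Random, untrained weights; sizes are arbitrary illustrative choices."""
    rng = np.random.default_rng(seed)
    return {
        "kernels": rng.normal(0.0, 0.1, (n_kernels, kernel_size, kernel_size)),
        "dense_w": rng.normal(0.0, 0.1, (n_kernels, n_classes)),
        "dense_b": np.zeros(n_classes),
    }


def forward(spectrogram, params):
    """Conv -> ReLU -> global max pooling -> dense layer; returns class scores."""
    feature_maps = np.maximum(conv2d_valid(spectrogram, params["kernels"]), 0.0)
    pooled = feature_maps.max(axis=(1, 2))  # one value per kernel
    return pooled @ params["dense_w"] + params["dense_b"]
```

Global pooling makes the output size independent of the clip length, which can be convenient when recordings vary in duration. A real model would have more layers and trained weights.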

Challenges we ran into

Not enough speech data for training the model
Finding the best architecture for the CNN
Preprocessing raw speech data in general
Finding pre-existing models for transfer learning
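For the preprocessing challenge, one common approach is to turn each raw clip into a fixed-size log spectrogram before feeding it to a CNN. The sketch below assumes 16 kHz audio, 25 ms windows, and a 10 ms hop; these values and the function names `log_spectrogram` and `pad_or_trim` are assumptions, not the settings used in this project.

```python
import numpy as np


def log_spectrogram(signal, frame_len=400, hop=160, eps=1e-10):
    """Log-magnitude spectrogram, shape (n_frames, frame_len // 2 + 1).

    frame_len=400 and hop=160 give 25 ms windows every 10 ms at 16 kHz.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        # Pad very short clips with silence so at least one frame exists
        signal = np.pad(signal, (0, frame_len - signal.size))
    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + eps)  # eps avoids log(0) on silent frames


def pad_or_trim(spec, n_frames):
    """Force a fixed number of frames so every clip has the same input shape."""
    if spec.shape[0] >= n_frames:
        return spec[:n_frames]
    # Fill the missing frames with the quietest value seen in the clip
    pad = np.full((n_frames - spec.shape[0], spec.shape[1]), spec.min())
    return np.vstack([spec, pad])
```

Mel filterbanks or MFCCs are frequent next steps on top of this representation; they are left out here to keep the example short.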

Accomplishments that we're proud of

Too many! Just having a (kinda) working end model is making me proud already. This is also my first personal computer science project, so I really learned a lot.

What we learned

How to execute and manage a machine learning project, from looking for data to fine-tuning the model
How to use a CNN for speech recognition
Data augmentation techniques, and the pros and cons of data augmentation itself
Basic web development with Vue.js front-end and Flask back-end
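To make the data augmentation point concrete, here is a minimal sketch of two simple waveform augmentations: random time shifting and additive white noise at a chosen signal-to-noise ratio. The function names, the 20 dB default, and the shift range are assumptions for illustration; the augmentations actually used in this project are not documented here.

```python
import numpy as np


def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio (dB)."""
    signal = np.asarray(signal, dtype=np.float64)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def time_shift(signal, max_shift, rng):
    """Shift the clip by a random offset, padding with zeros (no wrap-around)."""
    signal = np.asarray(signal, dtype=np.float64)
    shift = int(rng.integers(-max_shift, max_shift + 1))
    shifted = np.zeros_like(signal)
    if shift > 0:
        shifted[shift:] = signal[:-shift]
    elif shift < 0:
        shifted[:shift] = signal[-shift:]
    else:
        shifted[:] = signal
    return shifted


def augment(signal, rng, n_copies=3, snr_db=20.0, max_shift=1600):
    """Create several shifted, noisy variants of one training clip."""
    return [
        add_noise(time_shift(signal, max_shift, rng), snr_db, rng)
        for _ in range(n_copies)
    ]
```

One trade-off worth keeping in mind: augmented copies add variety but no new speakers or words, so they are usually generated from training clips only, leaving validation and test clips untouched.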

What's next for Atypical Speech Recognition

Find more data
Go beyond simple commands (i.e., real speech recognition)
Learn more about state-of-the-art speech recognition methods, and adapt them to the target population using results from dysarthria research

Built With

flask
javascript
jupyter
python
vue.js

Updates

Xinyi Zhang started this project — Apr 09, 2021 08:10 AM EDT
