Inspiration
Today, voice assistants such as Apple Siri and Google Assistant can reply to us based on the words we speak. Our idea is that if they could also recognize our emotions, their suggestions might be more interesting. Knowing how a person feels could help attract customers and give companies in this industry a competitive advantage.
How we built it
Machine learning is a method that gives a system the ability to learn. It has been adopted across many industries and has become a commonly used tool (Team 2020). It is especially useful in the technology industry because it can be applied in many different ways and to a wide range of ideas. This project applies machine learning techniques to audio emotion classification using convolutional neural networks, with the goal of bringing it to current technology such as voice assistants. In more technical terms, our team developed the classification model using a 1D CNN. This is a supervised learning approach: we specified the target output classes and then trained the model to classify voices into those classes.
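As a rough illustration of what a 1D CNN classifier computes, the sketch below runs a single forward pass in plain NumPy: one convolution along the feature axis, a ReLU, global max pooling, and a softmax over the emotion classes. It is not our trained model. The layer sizes, the 40-value input (standing in for features extracted with librosa), the seven classes, and the random weights are all assumptions chosen for illustration.

```python
import numpy as np


def conv1d(x, kernels, bias):
    """Valid 1D convolution (cross-correlation, as deep learning libraries use).

    x:       (steps, in_channels)
    kernels: (width, in_channels, out_channels)
    bias:    (out_channels,)
    """
    width = kernels.shape[0]
    steps = x.shape[0] - width + 1
    out = np.empty((steps, kernels.shape[2]))
    for t in range(steps):
        # Contract each window over its step and input-channel axes.
        window = x[t:t + width]
        out[t] = np.tensordot(window, kernels, axes=([0, 1], [0, 1])) + bias
    return out


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def forward(x, params):
    hidden = np.maximum(conv1d(x, params["k"], params["b"]), 0.0)  # conv + ReLU
    pooled = hidden.max(axis=0)  # global max pooling over the sequence
    return softmax(pooled @ params["w"] + params["c"])  # class probabilities


# Hypothetical sizes: 40 features per clip, 8 filters of width 5, 7 classes.
n_features, n_filters, n_classes = 40, 8, 7
rng = np.random.default_rng(0)
params = {
    "k": rng.normal(scale=0.1, size=(5, 1, n_filters)),
    "b": np.zeros(n_filters),
    "w": rng.normal(scale=0.1, size=(n_filters, n_classes)),
    "c": np.zeros(n_classes),
}

# One clip, represented as a sequence of 40 feature values with 1 channel.
clip_features = rng.normal(size=(n_features, 1))
probs = forward(clip_features, params)
print(probs.round(3))  # untrained weights, so the probabilities are arbitrary
```

In training, the weights in `params` would be fitted so that the highest probability lands on the labelled emotion for each clip. A deep learning framework would handle that step; the sketch only shows the shape of the computation.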
Challenges we ran into
The main challenge was the variation in voice tones and languages. The training and test sets we used carry a high risk of overfitting: all the words in the dataset are spoken by only two women, so the model can easily overfit to their voices. Our one-dimensional CNN needs to be trained on more real-world data. We are currently seeing a problem with the test loss, and the model appears to be overfitted. We plan to apply an oversampling method to generate more data, as we strongly believe the problem lies with the input dataset. Oversampling is a technique that adjusts the class distribution of a dataset so that the model is trained on balanced data, which is intended to make its predictions more reliable. The method we use is random oversampling, which randomly selects examples from the minority class with replacement and adds them to the training dataset. The oversampling is implemented with a class called BalancedDataGenerator, which combines Image Data Generator and random oversampling.
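The random oversampling step described above can be sketched with the Python standard library alone. This is a minimal illustration, not the BalancedDataGenerator class itself: the function name `random_oversample`, the clip file names, and the class counts are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen examples (with replacement) until every
    class has as many examples as the largest class."""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    target = max(len(items) for items in by_class.values())

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw the missing examples from this class, with replacement.
        extra = rng.choices(items, k=target - len(items))
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical, imbalanced training set: 5 happy, 2 sad, 1 angry clip.
labels = ["happy"] * 5 + ["sad"] * 2 + ["angry"] * 1
clips = [f"clip_{i}.wav" for i in range(len(labels))]

balanced_clips, balanced_labels = random_oversample(clips, labels)
print(Counter(balanced_labels))  # every class now has 5 examples
```

Oversampling should be applied to the training split only. If duplicated clips also end up in the test split, the test loss stops measuring how the model behaves on unseen voices.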
Accomplishments that we're proud of
Our current model is able to classify voices with a 96% accuracy rate.
What we learned
How to construct a classification model, the characteristics of activation functions, and how to use a feature extraction library.
What's next for Emotions classification
Classifying emotions is only our first step. We aim to develop a machine learning model that can find the correlation between actions and emotions. With our next model, we will be able to offer solutions to many consumer brands, media and advertising firms, and digital-first brands.
Built With
- cnn
- colab
- librosa
- machine-learning
- python
- tess
- toronto