Human versus Machine Emotion Detection

Inspiration

Our project focuses on nonverbal communication, specifically how we express emotion through facial expressions and how we pick up on those cues. In recent years, a great deal of research has gone into machine learning algorithms that can do things normally considered very human, like playing chess or writing poetry, and emotion detection is no exception. Making sure machines can recognize human emotions is a key step toward a future of artificial intelligence that is safe, communicative, and helpful for humans.

What it does

Our project has three main scripts that we used to draw our conclusions: a data augmentation script that takes in an image set and outputs an altered one, a program that provides an interface for humans to label the data themselves and measures the accuracy of those labels, and the CNN that attempts to detect emotions. We tested three filters that we thought would make it more difficult to identify a person’s emotions: rotating the image upside down, covering the eyes, and covering the mouth. By analyzing performance with these filters, we hoped to see how resilient a human and a machine model are when the face has been partially obscured or manipulated.
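To make the three filters concrete, here is a minimal sketch of how they could be written with NumPy. It is not our actual augmentation script: the function names, the choice of a 180-degree rotation for "upside down", and the row fractions used for the eye and mouth bands are all assumptions that only make sense for roughly centered, cropped face images.

```python
import numpy as np


def rotate_upside_down(img):
    """Rotate the image by 180 degrees (assumed meaning of 'upside down')."""
    return np.rot90(img, 2)


def cover_rows(img, top_frac, bottom_frac, fill=0):
    """Return a copy of img with a horizontal band of rows set to `fill`.

    top_frac and bottom_frac are fractions of the image height.
    """
    out = img.copy()
    height = img.shape[0]
    top = int(round(height * top_frac))
    bottom = int(round(height * bottom_frac))
    out[top:bottom, ...] = fill
    return out


def cover_eyes(img):
    # Hypothetical band: assumes the eyes sit in the second quarter of the image.
    return cover_rows(img, 0.25, 0.50)


def cover_mouth(img):
    # Hypothetical band: assumes the mouth sits in the lower part of the image.
    return cover_rows(img, 0.65, 0.90)
```

Each function takes a single image as an array (grayscale or color) and returns a new array, so the original image set is left untouched and each filter can be applied to the whole set in a loop.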

How we built it

We wrote all three scripts in Python, using either Jupyter notebooks or Google Colab.

Challenges we ran into

As with any machine learning project, our number one difficulty was data. Face data is actually one of the most abundant data types, but privacy concerns often mean you must request access, and we didn’t have time for that. We ended up using a Kaggle dataset of about 30,000 faces, which was not bad, all things considered, but could definitely be improved.

Accomplishments that we're proud of

We got some interesting and surprising results. Our CNN achieved a maximum accuracy of 0.662 (around 66%) and an AUC of 0.898. This is quite good considering there are seven emotion classes, so guessing randomly would yield around 14% accuracy. To put this in perspective, when we labeled the control dataset ourselves, human performance was only 63%. We were really surprised at how low our own accuracy was. We think the contributing factors include the low-quality black-and-white images and the fact that there weren’t enough labels to cover the full spectrum of human emotions, which made some faces ambiguous to classify.
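The chance baseline and the accuracy figures above come down to simple arithmetic, sketched below. The helper names are ours for illustration and are not taken from the project scripts; the only numbers used are the seven classes and the 0.662 accuracy reported above.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if len(true_labels) == 0:
        raise ValueError("need at least one label")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


NUM_CLASSES = 7  # seven emotion classes, as stated in the writeup

# Expected accuracy when guessing uniformly at random among the classes.
CHANCE_ACCURACY = 1 / NUM_CLASSES


def lift_over_chance(acc, num_classes=NUM_CLASSES):
    """How many times better than uniform random guessing an accuracy is."""
    return acc * num_classes
```

With seven classes, `CHANCE_ACCURACY` is 1/7, or about 14%, and `lift_over_chance(0.662)` shows how far the CNN's reported accuracy sits above that baseline. The same `accuracy` function works for human labels and model predictions alike.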

What we learned

We found that a machine model can approach or even exceed human performance in detecting emotion, at least in the limited scope of our tests, though it was far less resilient to image perturbations. This was along the lines of what we expected, but it was interesting to see how augmenting the data affected the results. Particularly interesting was that the model seemed to rely slightly more on the mouth to identify emotions, while humans relied more on the eyes, though this difference could be within the margin of error.
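One rough way to check whether a gap between two accuracies is within the margin of error is a two-proportion z-test, sketched below. This is not an analysis we ran. The function name is hypothetical, and the test assumes the two accuracies come from independent samples, which is only an approximation if the same faces are used for both the eyes-covered and mouth-covered sets.

```python
import math


def two_proportion_z(acc_a, n_a, acc_b, n_b):
    """z statistic for the difference between two accuracies.

    acc_a, acc_b: accuracies in [0, 1]; n_a, n_b: number of images scored.
    Uses the pooled standard error of the two proportions.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    pooled = (acc_a * n_a + acc_b * n_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("standard error is zero; accuracies are all 0 or all 1")
    return (acc_a - acc_b) / std_err
```

As a rule of thumb, an absolute z value below about 1.96 means the difference is not significant at the 5% level, so a small gap measured on a small number of labeled images would likely fall within the margin of error.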

What's next for our project

If we had more time, future steps could include testing other models, such as RNNs, to see how their performance compares, or testing other types of data augmentation. We would also be interested to see how the results would differ with more and higher-quality training data, potentially including images with the eyes or mouths precisely hand-erased. Lastly, it would be useful to look into methods that make the model resilient to these image alterations, to improve future real-world performance.