We live in a world with increasing interaction between humans and robots, so it is in humans' best interest that robots understand us better: that way, they can perform the services we want faster and more accurately. Currently, the leading method by which autonomous agents interpret humans is Natural Language Processing (NLP). However, very few programs take human body language into account, even though body language is a tool humans routinely use to communicate with each other. An algorithm that predicts human emotion based partly on body language should therefore "understand" human behavior better than a similar algorithm that relies solely on NLP.
What it does
BotyLanguage displays a live video feed and can detect the emotional states of multiple people in its frame, based on a combination of static facial expressions and fluid body language.
How we built it
We built BotyLanguage with TensorFlow as our backend. We used OpenCV for the static image-recognition task, training a standard Convolutional Neural Network (CNN) to classify individual frames from the OpenCV live stream. We then implemented a multiresolution CNN with transfer learning to interpret body language, and factored that CNN's prediction into the program's overall emotion prediction.
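The multiresolution idea can be sketched as splitting each frame into a low-resolution "context" stream plus a full-resolution "fovea" crop before feeding them to the network. The function below is an illustrative NumPy sketch (names, sizes, and the strided downsampling are assumptions, not our actual preprocessing code):

```python
import numpy as np

def multires_streams(frame, out=89):
    """Split one video frame into the two inputs of a multiresolution CNN:
    a low-resolution 'context' stream covering the whole frame, and a
    full-resolution 'fovea' stream cropped from the frame center.
    (Illustrative sketch; a real pipeline would resize with OpenCV.)"""
    h, w, _ = frame.shape
    # Context stream: downsample the whole frame by strided sampling.
    context = frame[:: h // out, :: w // out][:out, :out]
    # Fovea stream: crop an out x out window from the frame center.
    top, left = (h - out) // 2, (w - out) // 2
    fovea = frame[top:top + out, left:left + out]
    return context, fovea

# A dummy 356x356 RGB frame standing in for one OpenCV capture.
frame = np.zeros((356, 356, 3), dtype=np.uint8)
context, fovea = multires_streams(frame)
```

Each stream then goes through its own convolutional tower, which is where the transfer-learned weights come in.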
Challenges we ran into
We were relatively inexperienced with video-classification methods before PennApps, so we initially struggled to adapt the standard CNN into a more complex CNN capable of detecting body language. However, after referencing some articles (for example, https://towardsdatascience.com/introduction-to-video-classification-6c6acbc57356), we found that most of the concepts behind designing a standard CNN carried over to the more complex one.
Another challenge was getting accurate predictions from the body-language CNN; body language can be difficult to read even for humans, so it makes sense that the CNN struggled to interpret it. Because of this, we ended up relying primarily on the standard CNN's predictions, with the body-language CNN carrying only a small weight in the final decision. The combination of the two CNNs nonetheless has the potential to make a very good prediction.
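The weighting described above amounts to a convex combination of the two networks' per-class probabilities. The sketch below shows the idea; the label set and the 0.85/0.15 split are illustrative values, not our tuned configuration:

```python
import numpy as np

EMOTIONS = ["happy", "sad", "angry", "neutral"]  # illustrative label set

def combine(face_probs, body_probs, body_weight=0.15):
    """Blend per-class probabilities from the facial-expression CNN and
    the body-language CNN. The body-language model gets a small weight
    because its predictions are noisier. (Weights are illustrative.)"""
    face_probs = np.asarray(face_probs, dtype=float)
    body_probs = np.asarray(body_probs, dtype=float)
    blended = (1.0 - body_weight) * face_probs + body_weight * body_probs
    return EMOTIONS[int(np.argmax(blended))], blended

# The face CNN is confident in "happy"; the body CNN mildly disagrees,
# but its small weight keeps it from flipping the final decision.
label, probs = combine([0.7, 0.1, 0.1, 0.1], [0.2, 0.4, 0.2, 0.2])
```

Raising `body_weight` as the body-language CNN improves is exactly the integration path we describe below under "What's next".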
Accomplishments that we're proud of
We are proud of giving future robots the tools they need to understand humans and take over the world!
What we learned
We learned how to design a neural network to interpret video segments. We also learned how to use OpenCV more effectively, e.g. finding specific facial contours and using them to predict emotion.
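As a toy illustration of the contour-based idea (the landmark coordinates and geometry below are made up, not output from our OpenCV code), contour points around the mouth can be turned into a crude smile feature:

```python
import numpy as np

def mouth_aspect_ratio(mouth):
    """Crude geometric feature from a mouth contour: mouth width divided
    by mouth height. A wide, flat mouth (high ratio) loosely suggests a
    smile. Points are (x, y) pairs; all values here are hypothetical."""
    mouth = np.asarray(mouth, dtype=float)
    width = np.linalg.norm(mouth[0] - mouth[1])   # left corner to right corner
    height = np.linalg.norm(mouth[2] - mouth[3])  # top lip to bottom lip
    return width / height

# Hypothetical landmarks: [left corner, right corner, top lip, bottom lip]
smiling = mouth_aspect_ratio([(10, 50), (60, 50), (35, 45), (35, 55)])
neutral = mouth_aspect_ratio([(20, 50), (50, 50), (35, 40), (35, 60)])
```

In practice, features like this complement the CNN's learned representation rather than replace it.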
What's next for BotyLanguage
We hope to improve our body-language CNN so that it can accurately interpret human body language, and then integrate it with our standard CNN to produce a very accurate prediction of human emotion.