Ever since I started distance learning, I’ve seen the many challenges that students face when trying to engage with their peers, TAs, and teachers. On Zoom, teachers can only see all of their students at once through gallery view, which shrinks every participant to a tiny tile. This limits the teacher’s ability to see how students are doing, whether they understand the concepts, and whether they are enjoying what is being taught. We take many things about in-person learning for granted, like face-to-face communication and dialogue, and especially the teacher’s ability to read students’ facial expressions and tell whether they actually understand the material.
What it does:
So, I set out to build an app that does exactly that: it analyzes students’ expressions using machine learning image recognition and gives the teacher a slider that shows the class’s average understanding at that moment. It will do this by analyzing every student’s incoming video feed, recognizing facial expressions, and counting how many people are confused versus not confused. The app/add-on can also notify the teacher when an individual student is confused.
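Here is a minimal sketch of how the slider value and the per-student notifications could be computed once each student has been marked confused or not. The function name `class_understanding`, the choice of "fraction of students not confused" as the slider value, and the student names are all assumptions made for illustration; the project itself does not specify them.

```python
from typing import Dict, List, Tuple


def class_understanding(confused_by_student: Dict[str, bool]) -> Tuple[float, List[str]]:
    """Return (slider value in [0, 1], names of students to flag to the teacher).

    The slider value is assumed to be the fraction of students who are
    NOT confused, so 1.0 means nobody looks confused.
    """
    if not confused_by_student:
        # Assumption: with no video feeds there is nothing to report.
        return 1.0, []
    confused = sorted(
        name for name, is_confused in confused_by_student.items() if is_confused
    )
    slider = 1.0 - len(confused) / len(confused_by_student)
    return slider, confused


# Hypothetical snapshot of one moment in a four-student class.
slider, to_notify = class_understanding(
    {"Ana": False, "Ben": True, "Chen": False, "Dev": False}
)
print(f"class understanding: {slider:.0%}, notify about: {to_notify}")
```

In a real add-on this would be recomputed every few seconds from the latest frame of each feed, so the slider moves as the class reacts.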
How I built it/What I learnt/Accomplishments:
I built my project by following a tutorial on how to build a convolutional neural network (CNN) from scratch. It was my first time learning about CNNs, and they were super cool! I learnt at a high level how to put together the layers of a deep network, like “Activation, MaxPooling2D, BatchNormalization”. I then trained my model on grayscale images of the seven universal facial expressions: [Anger, Disgust, Fear, Happiness, Neutral, Sadness, Surprise], running through several epochs (another thing I learnt over the course of this project). From epoch 1/10 to epoch 10/10, my model made significant jumps in accuracy, from 0.125 to 0.59. (To learn more, check out my accuracy vs. epoch graph.)
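To make two of those layer names more concrete, here is a small pure-Python sketch of what a ReLU activation and a 2x2 max-pooling step compute on a single feature map. This is not the tutorial's code or the Keras implementation; the helper names `relu` and `max_pool_2x2` and the sample numbers are made up for illustration, and batch normalization is left out.

```python
from typing import List

Matrix = List[List[float]]


def relu(feature_map: Matrix) -> Matrix:
    """ReLU activation: negative values become 0, positive values pass through."""
    return [[max(0.0, value) for value in row] for row in feature_map]


def max_pool_2x2(feature_map: Matrix) -> Matrix:
    """Keep the largest value in each non-overlapping 2x2 window.

    A trailing odd row or column is dropped, which halves each dimension.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r][c],
                feature_map[r][c + 1],
                feature_map[r + 1][c],
                feature_map[r + 1][c + 1],
            )
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


# A made-up 4x4 feature map, as might come out of a convolution layer.
feature_map = [
    [-1.0, 2.0, 0.5, -3.0],
    [4.0, -2.0, 1.0, 0.0],
    [0.0, -1.0, -5.0, -6.0],
    [3.0, 1.0, -2.0, -1.0],
]
pooled = max_pool_2x2(relu(feature_map))
print(pooled)  # [[4.0, 1.0], [3.0, 0.0]]
```

The 4x4 input shrinks to 2x2, which is why stacking these layers lets the network summarize larger and larger regions of a face.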
Testing was next. Instead of simply feeding images of those seven emotions back into the model, I decided to feed in images of confused faces (images that I got from the web, and some that I took of myself :) ). The model recognized these confused expressions mainly as a mix of disgust and anger. So, after the image recognition code chunk, I added a piece of code that checks whether the label is either ‘disgust’ or ‘anger’; if so, the image is marked as confused and the confused tally increases by 1.
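The check described above can be sketched roughly like this. The names `CONFUSED_PROXY_LABELS`, `is_confused`, and `tally_confusion` are hypothetical, and the list of labels stands in for whatever the model predicts for each image.

```python
from collections import Counter
from typing import Iterable

# The two labels the model tended to assign to confused faces.
CONFUSED_PROXY_LABELS = {"disgust", "anger"}


def is_confused(label: str) -> bool:
    """Treat a 'disgust' or 'anger' prediction as a sign of confusion."""
    return label.strip().lower() in CONFUSED_PROXY_LABELS


def tally_confusion(labels: Iterable[str]) -> Counter:
    """Count how many predicted labels are marked confused vs. not confused."""
    tally = Counter(confused=0, not_confused=0)
    for label in labels:
        tally["confused" if is_confused(label) else "not_confused"] += 1
    return tally


# Hypothetical predictions for four images.
tally = tally_confusion(["disgust", "happiness", "anger", "neutral"])
print(tally["confused"], "confused,", tally["not_confused"], "not confused")
```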
Since this was my first time, I faced quite a few setbacks along the way. The main challenge was simply writing and understanding the code for all the layers of the neural network. Even after a lot of reading, I still find CNNs very complicated, and I have a lot more to learn about them! A plethora of errors also cost me a lot of time. Different TensorFlow packages had different requirements for the version of Python I was running, the version of pip I had installed, etc. I spent a lot of time figuring out the software dependencies, debugging across multiple files, and coming up with alternative ways to solve problems!
The main thing I still have to do is build the Zoom add-on and use the Zoom API to integrate this algorithm into a classroom call. I also think there’s a lot more refinement I can do within the algorithm itself -- for example, taking other emotions like sadness into account, and having it measure a confusion score rather than a simple yes/no based on a couple of emotions. I also want to integrate nodding into the app -- a student nodding should indicate that they understand the material better! Finally, I also have to deal with the challenge of students not wanting to turn their cameras on.
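One possible shape for that confusion score is a weighted sum of the model's per-emotion probabilities in place of the yes/no check. This is only a sketch of the idea: the weights in `CONFUSION_WEIGHTS` are guesses that would need tuning on real confused faces, and the function name and sample probabilities are made up.

```python
from typing import Dict

# Hypothetical weights: how strongly each emotion hints at confusion.
CONFUSION_WEIGHTS = {"disgust": 1.0, "anger": 1.0, "sadness": 0.5}


def confusion_score(probabilities: Dict[str, float]) -> float:
    """Weighted sum of emotion probabilities, clamped to the range [0, 1]."""
    score = sum(
        CONFUSION_WEIGHTS.get(label, 0.0) * probability
        for label, probability in probabilities.items()
    )
    return max(0.0, min(1.0, score))


# Made-up model output for one face; the probabilities sum to 1.
score = confusion_score(
    {"disgust": 0.4, "anger": 0.2, "sadness": 0.2, "neutral": 0.2}
)
print(f"confusion score: {score:.2f}")  # about 0.70
```

Averaging these scores across the class would give the slider a smoother value than counting confused students one by one.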