Massive Open Online Courses (MOOCs) have been very successful at scaling access to high-quality video lectures, but they often fall short at keeping students engaged and at giving teachers the kind of nonverbal feedback found in more traditional classrooms. At their best, MOOCs employ exercises or surveys to measure student engagement, but these often prove trivial and/or biased.

What it does

Sensei is an EdTech application that tracks the emotions of students in an online classroom as they watch a lecture or participate in discussion. Sensei addresses the engagement problem by measuring it more directly: analyzing a webcam feed of the student. As the student watches the lecture, Sensei extracts their emotion (e.g., angry, happy, neutral) as a proxy for whether the student is frustrated, excited, disengaged, and so on. This per-student information is relayed back to the course content creator and may inform future content creation, personalized lessons, progress tracking, and more.

How we built it

Sensei is composed of two web applications that communicate with each other. We used Node.js with a MongoDB instance to build the frontend: users can create online classrooms and host video lectures in a Coursera-esque fashion. Students can enter private classrooms using a unique code or browse an open selection of online course lectures across a range of topics. When a student watches a lecture, Sensei hooks into their webcam stream, samples a sequence of images, and passes each image to a face-detection API to fetch bounding-box coordinates for each face in the image. The raw image (base64 encoded), along with the list of facial coordinates, is sent to our backend.
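The shape of one such request can be sketched as follows (a minimal sketch: the field names, the function name, and the (x, y, w, h) box convention are our illustrative assumptions, not the actual wire format):

```python
import base64


def build_frame_payload(jpeg_bytes, face_boxes):
    """Package one sampled webcam frame for the backend.

    face_boxes: list of (x, y, w, h) tuples from the face-detection
    API, one per detected face in the frame.
    """
    return {
        # Raw frame, base64-encoded so it survives a JSON body
        "image": base64.b64encode(jpeg_bytes).decode("ascii"),
        # One entry per detected face
        "faces": [{"x": x, "y": y, "w": w, "h": h} for (x, y, w, h) in face_boxes],
    }


payload = build_frame_payload(b"\xff\xd8...", [(100, 50, 48, 48)])
```

The payload would then be POSTed as JSON to the Flask backend.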

Deep Net for Emotion Classification

Our backend is an independent Flask app that handles the request described above. Upon receipt, each image is cropped (once per bounding box) to produce a set of face images. We trained our own deep neural network (an Xception-ResNet fusion [1], [2]) using Keras/TensorFlow to predict a set of 7 emotion probabilities (anger, disgust, fear, happy, sad, surprise, neutral) from each face image. The net was trained on the FER2013 dataset [2] and achieves around 66% classification accuracy (the Kaggle winner achieved 71%). Each face image is fed into the net, and the resulting probabilities are passed back to our Node instance, rendered, and saved to MongoDB.
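The crop-and-classify step can be sketched with NumPy alone (a sketch under assumptions: the actual Keras model call is replaced by a plain softmax over raw logits, the emotion ordering is ours, and FER2013's 48x48 grayscale input size is only used in the example box):

```python
import numpy as np

# The seven FER2013 classes, in the order we assume the net outputs them
EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]


def crop_faces(frame, boxes):
    """Crop one face sub-image per (x, y, w, h) bounding box."""
    return [frame[y:y + h, x:x + w] for (x, y, w, h) in boxes]


def to_emotion_dict(logits):
    """Softmax the net's raw outputs into one probability per emotion."""
    exp = np.exp(logits - np.max(logits))  # shift for numerical stability
    probs = exp / exp.sum()
    return dict(zip(EMOTIONS, probs.tolist()))


frame = np.zeros((480, 640), dtype=np.uint8)   # one grayscale webcam frame
faces = crop_faces(frame, [(100, 50, 48, 48)])  # 48x48, FER2013's input size
probs = to_emotion_dict(np.zeros(7))            # uniform: 1/7 per class
```

In the real pipeline each crop would be resized and normalized before being passed to the trained net; the dict of probabilities is what gets returned to the Node instance.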

From the frontend, users/students can see their own emotion probabilities in real time, and content creators/teachers can view average emotion statistics over time, aggregated across all views of each video. As a fun addition, we added a drop-down menu in the /webcam view that allows users to tag their own current emotions, providing us with new training examples for online retraining.
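The teacher-facing aggregation amounts to averaging the per-view probability dicts for a video (a minimal sketch; the function name and in-memory representation are ours, since the real version would aggregate over MongoDB documents):

```python
from collections import defaultdict


def average_emotions(samples):
    """Mean probability per emotion across all recorded samples.

    samples: list of {emotion: probability} dicts, one per sampled
    frame across all views of a single video.
    """
    totals = defaultdict(float)
    for sample in samples:
        for emotion, p in sample.items():
            totals[emotion] += p
    n = len(samples)
    return {emotion: total / n for emotion, total in totals.items()}


avg = average_emotions([
    {"happy": 0.75, "neutral": 0.25},
    {"happy": 0.25, "neutral": 0.75},
])
# happy and neutral both average to 0.5
```

Bucketing the samples by timestamp before averaging would give the over-time view shown to teachers.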

Note: Architecture published first by Octavio Arriaga & Paul Ploger

Github Repositories and Links

Challenges we ran into

Designing, training, and tuning neural nets always takes time. Deployment itself also took longer than expected: we deployed the Node app on AWS, but accessing the webcam requires HTTPS, and unfortunately, obtaining an SSL certificate took too long.

What we learned

Predicting even coarse emotion classes is a difficult problem. Compared to mainstream image classification, modern neural nets do not perform nearly as well here (~70% accuracy at best). But emotion classification holds a lot of potential for understanding explicit and implicit human reactions to complex situations in a way that questionnaires and interviews cannot. Beyond the education space, this technology shows promise in many fields where human interaction plays an integral role.

What's next for Sensei

This project was an exploratory effort for us to see what deep learning is capable of in an unconventional space and how far we can push educational technology. This is something we definitely wish to continue exploring, to see if we can distill our ideas into a more concrete product.

Built With

node.js, mongodb, flask, keras, tensorflow, aws
