Final Reflection: https://docs.google.com/document/d/1xudHK-m86HAOfmSd0riJf5OrpmZNeObdLkWXVzUQLB4/edit?usp=sharing
Poster: https://drive.google.com/file/d/1RNfQUBgQikzFLyblspLE4kl9ay7rX7iE/view?usp=sharing
Git Link: https://github.com/alapre/csci1470-final-proj (private but our TA has access!)
Check in 3 link: https://docs.google.com/document/d/1SV5z6_YIOz0YfqjjFmzoS8KbH3w4hI--IidRqGnyymo/edit?usp=sharing
Outline (check in 2):
Introduction: What problem are you trying to solve and why?
- We are trying to solve the problem of recognizing people's emotions/sentiments from their faces in images. This is an exciting area of computer vision and of deep learning more broadly, with applications ranging from simple analyses to identifying mental health risks through posts on social media.
If you are implementing an existing paper, describe the paper’s objectives and why you chose this paper:
- The paper we chose (A study on computer vision for facial emotion recognition | Scientific Reports), published by Nature Portfolio, aims to use deep learning for the task of facial emotion recognition (FER). The authors used the AffectNet database, which includes almost half a million images of faces categorized into 11 different emotional categories, 7 of which were used in this study. The authors also used RAF-DB, another database with roughly 30,000 images of faces similarly categorized into emotional classes. The authors use three methods: 1) a CNN, 2) squeeze-and-excitation, and 3) a residual neural network. The paper also uses interpretability methods, such as feature maps taken from the residual blocks of the model’s architecture, to understand which features of an image are most important for classifying the emotion of a face.
What kind of problem is this? Classification? Regression? Structured prediction? Reinforcement Learning? Unsupervised Learning? Etc.
- This problem is largely a classification problem: taking in an input image and labeling it as one of a fixed set of emotions. Our implementation of this paper may reduce the number of emotions we predict (happy, sad, etc.), but the overall goal is the same. As a stretch goal, we then hope to use feature maps, similar to the paper, to analyze which regions of an image the model relies on to classify it.
Related Work: Are you aware of any, or is there any prior work that you drew on to do your project? Please read and briefly summarize (no more than one paragraph) at least one paper/article/blog relevant to your topic beyond the paper you are re-implementing/novel idea you are researching.
- main paper to replicate: “Facial Emotion Recognition Using Conventional Machine Learning and Deep Learning Methods: Current Achievements, Analysis and Remaining Challenges”
- This paper analyzes conventional machine learning approaches to emotion recognition and compares them directly to deep learning approaches. The conventional approaches include SVMs, K-nearest neighbors, and Random Forest; while these are computationally inexpensive, their results are not as accurate as those produced by CNNs and RNNs. The paper also lays out the remaining challenges in FER. For instance, more progress is needed in classifying emotions based on micro-expressions, which arise from involuntary movements in different areas of the face.
- In this section, also include URLs to any public implementations you find of the paper you’re trying to implement. Please keep this as a “living list”–if you stumble across a new implementation later down the line, add it to this list.
Data: What data are you using (if any)?
If you’re using a standard dataset (e.g. MNIST), you can just mention that briefly. Otherwise, say something more about where your data come from (especially if there’s anything interesting about how you will gather it).
- Kaggle Facial Expression Recognition Dataset
- The dataset has sub-directories for both the training and validation data
- Under the train directory, there are 7 sub-directories, each named after an emotion and containing the images that correspond to that emotion
- These are the emotions/names of sub-folders: angry/disgust/fear/happy/neutral/sad/surprise
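A minimal sketch of how this folder layout could be turned into (image path, label) pairs, assuming the split directory contains the seven emotion sub-folders listed above. The function name `index_emotion_folders` and the accepted file extensions are our own illustrative choices, not something defined by the dataset.

```python
from pathlib import Path

# Folder names as listed above; their position fixes the integer label of each class.
EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]


def index_emotion_folders(split_dir, extensions=(".jpg", ".jpeg", ".png")):
    """Return (image_path, label_index) pairs, ordered by class and then file name."""
    split_dir = Path(split_dir)
    samples = []
    for label, emotion in enumerate(EMOTIONS):
        class_dir = split_dir / emotion
        if not class_dir.is_dir():
            # A missing emotion folder is skipped rather than treated as an error.
            continue
        for path in sorted(class_dir.iterdir()):
            # Keep only files with an image extension (assumed set, adjust as needed).
            if path.suffix.lower() in extensions:
                samples.append((str(path), label))
    return samples
```

Calling `index_emotion_folders` once on the train directory and once on the validation directory would give two lists that share the same label indices.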
How big is it? Will you need to do significant preprocessing?
- Size of data: 126 MB
- Preprocessing: Most likely not much is necessary. The main preprocessing we might do is to reduce the number of emotional categories (positive/negative rather than happy, disgust, fear, etc.) so that we can first build a simpler model before moving to a more complex one
- In relation to the given dataset, we have the following public examples for how the data is preprocessed:
- Overall: we can choose to augment images as part of preprocessing, but otherwise the approach is straightforward
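The positive/negative simplification mentioned above could look roughly like the sketch below. The grouping itself is an assumption on our part: which emotions count as positive is a judgment call, and here "neutral" and "surprise" are dropped because they could reasonably go either way. The names `COARSE_LABEL` and `coarsen` are hypothetical.

```python
# Hypothetical grouping of the seven folder names into two coarse classes.
# None marks a class that this sketch leaves out of the binary task.
COARSE_LABEL = {
    "happy": "positive",
    "angry": "negative",
    "disgust": "negative",
    "fear": "negative",
    "sad": "negative",
    "neutral": None,
    "surprise": None,
}


def coarsen(samples):
    """Map (item, emotion_name) pairs to (item, coarse_label), dropping ambiguous classes."""
    coarse = []
    for item, emotion in samples:
        group = COARSE_LABEL[emotion]
        if group is not None:
            coarse.append((item, group))
    return coarse
```

Note that this grouping puts four emotions in the negative class and one in the positive class, so class balance would need to be checked before training on it.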
Methodology: What is the architecture of your model?
- The architecture used is an SE-ResNet model (a ResNet with squeeze-and-excitation blocks)
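To make the squeeze-and-excitation idea concrete, here is a framework-free NumPy sketch of the forward pass of one SE gate and of a residual block that uses it. It is an illustration under assumptions, not the paper's exact architecture: the parameter names, the reduction from C to a smaller hidden size, and the `transform` argument (a stand-in for the convolutional branch) are all ours.

```python
import numpy as np


def squeeze_excite(features, w1, b1, w2, b2):
    """Rescale the channels of `features`, shape (H, W, C), with a learned gate.

    Assumed parameter shapes: w1 (C, C_reduced), b1 (C_reduced,),
    w2 (C_reduced, C), b2 (C,).
    """
    # Squeeze: global average pooling gives one number per channel.
    squeezed = features.mean(axis=(0, 1))
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel.
    hidden = np.maximum(0.0, squeezed @ w1 + b1)
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))
    # Scale: the gate broadcasts over the two spatial dimensions.
    return features * gate


def se_residual_block(x, transform, se_params):
    """Residual block: skip connection plus the SE-gated branch, then ReLU."""
    branch = squeeze_excite(transform(x), *se_params)
    return np.maximum(0.0, x + branch)
```

In a real model the gate parameters and the convolutional branch would be trained layers in a deep learning framework; the arithmetic per block is the same.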
How are you training the model?
- We are training the model on the dataset linked at the top of our paper and will use the ResNet as the neural network model. To test generalizability, we will use cross-database validation with two different databases: the Kaggle database and the Real-World Affective Faces database (RAF-DB), which share the same set of categories: surprise, fear, disgust, anger, sadness, happiness, and neutrality http://www.whdeng.cn/raf/model1.html
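Cross-database validation only works if both databases use the same label vocabulary, and the Kaggle folder names (angry, happy, sad, ...) differ slightly from the category names listed above (anger, happiness, sadness, ...). A small sketch of one way to align them follows. The class order in `SHARED_CLASSES` is an arbitrary choice of ours, and how RAF-DB actually encodes its labels on disk is not shown here and would need to be checked against its documentation.

```python
# Shared class order (arbitrary; it only has to be the same for both databases).
SHARED_CLASSES = [
    "surprise", "fear", "disgust", "anger", "sadness", "happiness", "neutrality",
]

# Kaggle folder name -> shared category name.
KAGGLE_TO_SHARED = {
    "angry": "anger",
    "disgust": "disgust",
    "fear": "fear",
    "happy": "happiness",
    "neutral": "neutrality",
    "sad": "sadness",
    "surprise": "surprise",
}


def shared_index(label, mapping=None):
    """Return the shared class index for a label, translating it first if a mapping is given."""
    name = mapping[label] if mapping is not None else label
    return SHARED_CLASSES.index(name)
```

With labels aligned this way, a model trained on one database can be evaluated on the other without any relabeling at prediction time.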
If you are implementing an existing paper, detail what you think will be the hardest part about implementing the model here.
- We could see the architecture being a problem because we have not worked with SE blocks yet. If needed, we can model our code more closely on the public examples linked above.
If you are doing something new, justify your design. Also note some backup ideas you may have to experiment with if you run into issues.
- NA
Metrics: What constitutes “success?”
- Broadly, success for facial emotion recognition means correctly identifying the labeled emotion of an image. This is very similar to how we defined accuracy when classifying MNIST images by handwritten digit class; now we will simply compare the predicted emotion of an image with its ground-truth label.
What experiments do you plan to run?
- We plan to hold out both a validation set and a test set from the data and evaluate the model on them (saving the test set for the very last run) to measure the accuracy of the model. Time permitting, we also plan to generate feature maps to understand the model’s decision process more thoroughly.
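One way the hold-out described above could be done is a stratified split, so that each emotion appears in roughly the same proportion in every subset. This is a sketch using only the standard library; the 10% fractions and the function name `stratified_split` are assumptions, not settings we have committed to.

```python
import random
from collections import defaultdict


def stratified_split(samples, val_frac=0.1, test_frac=0.1, seed=0):
    """Split (item, label) pairs into train/val/test lists, class by class."""
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append((item, label))

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_val = int(len(group) * val_frac)
        n_test = int(len(group) * test_frac)
        val.extend(group[:n_val])
        test.extend(group[n_val:n_val + n_test])
        train.extend(group[n_val + n_test:])
    return train, val, test
```

The test list would then be set aside and evaluated only once, after all model and hyperparameter choices have been made on the validation list.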
For most of our assignments, we have looked at the accuracy of the model. Does the notion of “accuracy” apply for your project, or is some other metric more appropriate?
- Yes - accuracy does apply for this project, as the predicted emotion for an image either matches its ground-truth label or does not (with success being defined as the former).
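Under that definition, accuracy is simply the fraction of images whose predicted emotion equals the ground-truth label. A minimal sketch (the function name is ours):

```python
def accuracy(predicted, truth):
    """Fraction of predictions that exactly match the ground-truth labels."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    if not truth:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == t for p, t in zip(predicted, truth))
    return correct / len(truth)
```

For example, three correct predictions out of four images give an accuracy of 0.75.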
If you are implementing an existing project, detail what the authors of that paper were hoping to find and how they quantified the results of their model.
- The authors hoped to show that a relatively lightweight model, compared to a transformer or other parameter-heavy model, could reach fairly high training and validation accuracy for classification. They then hoped to use feature maps and class activations to understand and highlight the segments of an image that were most important in classifying its emotion. The accuracy on AffectNet was 56%. The authors also applied transfer learning to this model using the RAF-DB data, and overlaid the feature maps they created on the images to show which parts of the face are important for emotion prediction.
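As a rough illustration of what overlaying a feature map on an image involves, the NumPy sketch below averages a layer's channels into one heat map, rescales it to [0, 1], upsamples it to the image size, and blends it with the image. This is our own simplified version, not the authors' procedure: averaging over channels, nearest-neighbour upsampling, and the `alpha` blending weight are all assumptions.

```python
import numpy as np


def feature_map_overlay(image, feature_maps, alpha=0.5):
    """Blend a grayscale image (H, W) in [0, 1] with a heat map from feature maps (h, w, C).

    Assumes H is a multiple of h and W is a multiple of w.
    """
    # Collapse the channels into a single activation map.
    heat = feature_maps.mean(axis=-1)
    # Rescale to [0, 1]; a constant map stays at zero.
    heat = heat - heat.min()
    peak = heat.max()
    if peak > 0:
        heat = heat / peak
    # Nearest-neighbour upsampling: repeat each cell as a block of pixels.
    H, W = image.shape
    h, w = heat.shape
    heat = np.kron(heat, np.ones((H // h, W // w)))
    # Alpha-blend the heat map over the image.
    return (1 - alpha) * image + alpha * heat
```

Bright regions in the result mark where the chosen layer responded most strongly, which is the kind of evidence used to argue which parts of the face matter for a prediction.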
What are your base, target, and stretch goals?
- Base: Use a CNN model architecture to predict the emotion class of facial image inputs with an accuracy higher than chance. Concretely, if we use 2 emotional categories (the final number will most likely be somewhere between 2 and 10, depending on the dataset we are able to download, load, and use), this means more than 50% correct predictions.
- Target: In addition to the Base goal, we hope to create a model very similar to the one used in the paper, which includes squeeze-and-excitation blocks as well as residual blocks. Hopefully this achieves higher accuracy than the Base model.
- Stretch: If time allows, we hope to generate feature maps in order to understand which segments of the images are most important for classification. If time allows further, it would be great to run other interpretability methods on the convolution layers, such as comparing the first layer, which will primarily detect edges and other low-level features, to the last convolutional layer, which will capture much higher-level image features such as parts of the human face.
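The "higher than chance" bar in the Base goal depends on the number of classes and on how balanced they are. The sketch below computes two reference points: uniform guessing over K classes, and always predicting the most common class. The function names and the example counts in the usage note are hypothetical, not measured from our dataset.

```python
def chance_accuracy(num_classes):
    """Expected accuracy of uniform random guessing over num_classes classes."""
    if num_classes < 1:
        raise ValueError("num_classes must be at least 1")
    return 1.0 / num_classes


def majority_baseline(class_counts):
    """Accuracy obtained by always predicting the most frequent class."""
    total = sum(class_counts.values())
    if total == 0:
        raise ValueError("class_counts must contain at least one example")
    return max(class_counts.values()) / total
```

With 2 classes, `chance_accuracy(2)` gives the 50% bar stated above, and with 7 classes it drops to about 14%. If the classes were imbalanced, say 60 "happy" images and 40 "sad" images, `majority_baseline` would give 0.6, which is the more demanding number to beat.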
Ethics: Choose 2 of the following bullet points to discuss; not all questions will be relevant to all projects so try to pick questions where there’s interesting engagement with your project. (Remember that there’s not necessarily an ethical/unethical binary; rather, we want to encourage you to think critically about your problem setup.)
What broader societal issues are relevant to your chosen problem space?
- One broader societal issue in facial emotion recognition is privacy and consent. Individuals may not be aware that their facial expressions are being analyzed, and their emotional states may be inferred without their consent.
- Furthermore, there could be real consequences for misclassification of an emotion depending on where it's implemented; for example, an individual can be wrongly labeled as threatening.
What is your dataset? Are there any concerns about how it was collected, or labeled? Is it representative? What kind of underlying historical or societal biases might it contain?
- The dataset may not be representative of diverse populations. If the dataset is skewed towards a particular demographic or cultural group, the model may perform poorly on individuals from underrepresented groups or misinterpret their emotional expressions due to cultural differences in facial expressions.
Division of labor: Briefly outline who will be responsible for which part(s) of the project.
- Jillian: Data pre-processing + initial EDA and visualizations
- Natalee: Model building (ResNet and SE block architecture)
- Anna: General help for model creation/optimization + Interpretability Methods if time (feature maps, etc.)