This project aims to enhance the interpretability of convolutional neural networks. It uses unsupervised learning to segment and cluster images, and linear classifiers to construct concept activation vectors.
You can find our final report here: Final Report: CNN You See What I See?
You can find our midpoint report here: Midpoint Report: CNN You See What I See?
You can find our github repository here: CNN You See What I See? Pytorch Implementation
Inspiration
We are implementing the paper “Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)”, which develops procedures for interpreting deep neural networks. In particular, the paper quantifies how much a network’s predictions rely on human-interpretable concepts, such as stripes or wheels. Dr. Been Kim’s lecture inspired us to choose this paper. The topic interested us because we believe interpretability of neural networks is crucial to trustworthy deployment in industry and may help researchers further isolate and resolve biases within models and data. Moreover, we found the methodology of the research fascinating and thought it would be a good final project to get us comfortable with manipulating and assessing deep neural networks (https://github.com/tensorflow/tcav).
The paper “Interpretability Beyond Feature Attribution” (Kim et al.) is the primary paper for this project. The authors introduce concept activation vectors (CAVs) and use them to probe the inner workings of a deep learning model as part of Testing with CAVs (TCAV). A CAV is derived by training a linear classifier to separate a concept’s examples from random counterexamples and then taking the vector orthogonal to the decision boundary. The authors were successful in this, as they were able to sort images using CAVs: one specific example sorts pictures of ties by how strongly they correspond to the concept “model women”, and the CAV correctly ranks pictures of women wearing ties as corresponding most strongly. TCAV also corroborated previous papers, confirming that stripes are an important characteristic for zebras, red is important for fire engines, and so on.
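As a minimal sketch of the CAV derivation described above (here `concept_acts` and `random_acts` are hypothetical arrays of flattened activations collected from one network layer), the CAV can be taken as the normal to a linear classifier’s decision boundary:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def compute_cav(concept_acts, random_acts):
    """Train a linear classifier between concept and random activations;
    the normalized coefficient vector is orthogonal to the decision
    boundary and serves as the concept activation vector (CAV)."""
    X = np.concatenate([concept_acts, random_acts])        # (n, d) flattened activations
    y = np.concatenate([np.ones(len(concept_acts)),        # 1 = concept examples
                        np.zeros(len(random_acts))])       # 0 = random counterexamples
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    cav = clf.coef_[0]
    return cav / np.linalg.norm(cav)                       # unit-length CAV
```

Logistic regression is just one choice of linear classifier here; the approach only requires that the decision boundary be linear.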
The paper evaluates CAVs with a metric called the TCAV score. Here, k is a particular class label and X_k is the set of inputs with that label. Essentially, the score measures the proportion of k-class inputs whose activations are positively influenced by the concept in question.
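For reference, the score as defined in Kim et al. is the fraction of class-k inputs whose class logit has a positive directional derivative along the CAV at layer l:

```latex
\mathrm{TCAV}_{C,k,l} \;=\; \frac{\bigl|\{\, x \in X_k : S_{C,k,l}(x) > 0 \,\}\bigr|}{\bigl|X_k\bigr|},
\qquad
S_{C,k,l}(x) \;=\; \nabla h_{l,k}\bigl(f_l(x)\bigr) \cdot v_C^{\,l}
```

Here f_l(x) is the activation of layer l on input x, h_{l,k} maps that activation to the logit for class k, and v_C^l is the CAV for concept C at layer l.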
Another paper that is particularly relevant to implementing TCAV and understanding interpretability of deep learning models is “Towards Automatic Concept-based Explanations” (Ghorbani et al.). Ghorbani et al. describe an algorithm called ACE, which automatically extracts the visual concepts a model relies on from a set of images of a class. Ghorbani’s work builds on Dr. Been Kim’s paper by extending these interpretability methods in a direction that lets humans directly visualize the concepts a model treats as most important for a given task (e.g., classifying an image). The ACE algorithm first takes a number of images from the same class and divides each into numerous segments. Then, using activations from a CNN layer, it clusters the segments into groups of similar elements (e.g., striped segments in pictures of zebras). Afterwards, a TCAV importance score is computed for each group of segments. To test the algorithm, Ghorbani et al. ran human trials in which people judged how well ACE grouped similar segments. In addition, as a validation step, they removed the segments belonging to the most important concepts from a set of images; the network then classified these images far less accurately, supporting the importance scores produced by ACE (https://github.com/amiratag/ACE).
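A minimal sketch of the first two ACE steps (segmentation and clustering), assuming `get_layer_activations` is a hypothetical helper that embeds resized segments with a chosen CNN layer; the segment and cluster counts are illustrative, not the paper’s settings:

```python
import numpy as np
from skimage.segmentation import slic
from skimage.transform import resize
from sklearn.cluster import KMeans

def ace_segments_and_clusters(images, get_layer_activations, n_clusters=25):
    """Segment each image into superpixels, embed every segment with a CNN
    layer, and cluster the embeddings into candidate concepts (ACE steps 1-2)."""
    patches = []
    for img in images:                                  # img: (H, W, 3) float array in [0, 1]
        seg_map = slic(img, n_segments=15, compactness=20)
        for seg_id in np.unique(seg_map):
            mask = (seg_map == seg_id)[..., None]       # (H, W, 1) boolean mask
            patch = np.where(mask, img, 1.0)            # blank out everything else
            patches.append(resize(patch, (299, 299)))   # InceptionV3 input size
    acts = get_layer_activations(np.stack(patches))     # (n_patches, d) embeddings
    labels = KMeans(n_clusters=n_clusters).fit_predict(acts)
    return patches, labels                              # each cluster is a candidate concept
```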
What it does
We want to find CAVs for a variety of different concepts on subsets of ImageNet. For instance, we want to find a concept vector that represents the notion of stripes when used in identifying zebras vs. horses.
Base Goal: Reimplement TCAV on existing model
Target Goal: Reimplement ACE
Stretch Goal: Apply ACE to facial recognition, specifically to sentiment associated with facial recognition
For this project, accuracy in the usual sense does not really apply to interpretability: what matters is that we produce results that make sense to us and that we can understand. Instead, we use a two-sided t-test on the TCAV scores obtained from training the model across a number of samples to determine whether a concept behaves consistently across samples.
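One way to run that check, sketched under the assumption that TCAV scores are also collected for CAVs trained on purely random data as a baseline (`concept_scores` and `random_scores` are hypothetical lists of scores):

```python
from scipy.stats import ttest_ind

def concept_is_significant(concept_scores, random_scores, alpha=0.05):
    """Two-sided t-test: keep the concept only if its TCAV scores differ
    significantly from scores produced by CAVs trained on random data."""
    _, p_value = ttest_ind(concept_scores, random_scores)
    return p_value < alpha
```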
How we built it
We train on data from ImageNet, which contains millions of images; however, we only use a small subset of roughly a few hundred to a few thousand photos to try to automatically develop human-interpretable concepts (e.g., stripes). We train linear classifiers on the activations at various layers of a pretrained InceptionV3; these binary classifiers distinguish our “concept data” (e.g., stripes) from random images.
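A minimal sketch of how those activations might be collected (the Mixed_7c layer choice, the image path, and the preprocessing constants are our assumptions about torchvision’s pretrained InceptionV3, not fixed choices):

```python
import torch
from torchvision import models, transforms
from PIL import Image

model = models.inception_v3(weights=models.Inception_V3_Weights.DEFAULT).eval()

activations = {}
def hook(module, inputs, output):
    # Flatten spatial dimensions so each image becomes one activation vector.
    activations["feats"] = output.detach().flatten(start_dim=1)

model.Mixed_7c.register_forward_hook(hook)

preprocess = transforms.Compose([
    transforms.Resize((299, 299)),                 # InceptionV3 expects 299x299 input
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("striped_example.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    model(img)                                     # hook fills activations["feats"]
concept_vector = activations["feats"]              # one row of input for the linear classifier
```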
Challenges we ran into
One challenge we encountered early on was getting the image data and separating it into a directory of random images and directories for various concepts (zigzag, striped, and dotted textures). This was difficult because we needed a dataset with the right distribution of target and concept images. Once we settled on the Broden dataset, we had to work out how to extract a random set of images given a CSV index, as well as separate out the images representing particular textures.
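A rough sketch of that filtering step; every file name and column name below (`broden_index.csv`, `texture`, `image`) is hypothetical, since the actual index format is not reproduced in this write-up:

```python
import pandas as pd

# Hypothetical path and columns: adjust to the real Broden index file.
index = pd.read_csv("broden_index.csv")

striped_images = index[index["texture"] == "striped"]["image"]    # one concept's images
random_images = index.sample(n=500, random_state=0)["image"]      # random counterexamples
```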
Another challenge was low accuracy (~52%) and a poorly behaving loss when training our linear classifier to differentiate vectors of a particular concept from vectors of random images. This was resolved once we obtained more concept images and properly included random images from the Broden dataset, which raised our accuracy to ~93%.
Lastly, a major challenge has been understanding how to implement TCAV itself. Specifically, we have found computing the directional derivative of the class prediction, taken with respect to a layer’s activations along a concept activation vector, to be especially difficult. We hope to better understand the paper and the source code in order to confront this issue.
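One possible way to compute this in PyTorch, assuming a forward hook like the one above stores the layer activation without detaching it (so it stays in the autograd graph) and the model is in eval mode, is to differentiate the class logit with respect to that activation and project the gradient onto the CAV:

```python
import torch

def directional_derivative(model, x, cav, class_idx, activations):
    """Sensitivity S_{C,k,l}(x): gradient of the class-k logit with respect to
    the stored layer activation, projected onto the (unit-length) CAV."""
    logits = model(x)                          # forward hook fills activations["feats"]
    act = activations["feats"]                 # must NOT be detached, and no torch.no_grad()
    grad = torch.autograd.grad(logits[0, class_idx], act)[0]
    return torch.dot(grad.flatten(), cav)      # > 0 means the concept pushes toward class k

# The TCAV score for class k would then be the fraction of its images for which
# directional_derivative(...) is positive.
```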
Accomplishments that we're proud of
Our CAV model is performing up to our expectations, having achieved an accuracy of 93% when classifying concept vectors and vectors of random images.
What we learned
What's next for CNN You See What I See: Interpreting Deep Learning Models
So far, we have finished preprocessing, though some file paths are still hard-coded and need to be cleaned up. We also need to expand our training dataset to increase accuracy. Our CAV implementation is complete and performs well (~93% accuracy); however, as mentioned above, it took some time to debug and improve, so we currently have only a rough implementation of TCAV. We will spend more time finishing TCAV and making our code more efficient in general. Once TCAV is done, and if time permits, we will work on our reach goal of automating the concept extraction process.
Ethics Questions
What broader societal issues are relevant to your chosen problem space?
Machine learning and deep learning are increasingly used to outsource decision-making in many aspects of daily life and in numerous industries, including but not limited to healthcare, agriculture, military training, finance, education, recruiting, engineering (including autonomous vehicles), and even musical composition. The most common motivation in these fields is the efficiency and speed of automated decision-making. However, the current black-box nature of such algorithms makes it quite challenging to study the direct implications of the code being run. In this context, Lo Piano (2020) notes that “interpretability is [...] sacrificed in favour of usability and effectiveness.” It is nonetheless crucial to shed light on what actually happens in the decision process, not only to increase accuracy but also to ensure fairness, accountability, and transparency and to reduce bias. Toward this goal, some questions to answer are: (1) to what extent do ML/DL mechanisms influence the final decision; (2) what is the dataset and its source; (3) does it reflect explicit or implicit biases, and if so, how; (4) what are the parameters and how are they weighted; and (5) which concepts are highlighted in the decision-making process? In our attempt to answer some of these questions, we are studying TCAV and ACE and trying to implement them ourselves.
Lo Piano, S. Ethical principles in machine learning and artificial intelligence: cases from the field and possible ways forward. Humanit Soc Sci Commun 7, 9 (2020). https://doi.org/10.1057/s41599-020-0501-9
What is your dataset? Are there any concerns about how it was collected, or labeled? Is it representative? What kind of underlying historical or societal biases might it contain?
We are using ImageNet as our dataset, which is an “image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of images,” as described on its website. Even though this is a huge dataset collected mainly for computer vision, we still had concerns about its representativeness and the implicit societal biases it may pass on to algorithms trained on it. For example, when the word “CEO” is looked up, do men and women get the same amount of representation? What about other gendered words, or gender stereotypes? Yang et al. (2020) consider three key factors in their study of the person subtree of ImageNet that may lead to problematic behavior: (1) the stagnant concept vocabulary of WordNet, (2) the attempt to exhaustively illustrate all categories with images (what happens to non-visual concepts?), and (3) the inequality of representation in the images within concepts. In their research, Yang et al. tackle these questions and take steps toward debiasing the person subtree (which itself is another problem requiring further investigation). We believe that TCAV, and our project by association, will help identify the effect of such biases on decision-making. The original TCAV paper, cited above, also mentions that this method could be useful in identifying biases embedded in a dataset. Returning to the CEO example, if (our automated) TCAVs indicate that gender, or gender-related cues, is a statistically significant concept for labeling images as “CEO,” we may be able to take a step toward identifying biases in both the datasets and the learning algorithms and debiasing them.
Yang, K., Qinami, K., Fei-Fei, L., Deng, J., & Russakovsky, O. (2020, January). Towards fairer datasets: Filtering and balancing the distribution of the people subtree in the imagenet hierarchy. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (pp. 547-558).