UPDATE FOR DATASET ETHICS --- CIFAR-10

Our dataset is CIFAR-10 (https://www.cs.toronto.edu/~kriz/cifar.html), a dataset of 60,000 32x32 color images in 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The classes are mutually exclusive, so each image belongs to exactly one class. CIFAR-10 takes its name from the Canadian Institute for Advanced Research, and its images were selected and labelled from the 80 Million Tiny Images dataset, in which every image is 32x32.
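As a quick sanity check on the class structure described above, the labels can be counted per class. This is only a minimal sketch: it assumes the "python version" of CIFAR-10 from the page linked above, whose batch files are pickled dictionaries with a b"labels" entry, and the helper names load_labels and class_counts are ours, not part of any library.

```python
import pickle
from collections import Counter

# Class names in label order (0-9), as listed on the CIFAR-10 page.
CLASS_NAMES = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]


def load_labels(batch_path):
    """Read the integer labels from one CIFAR-10 python batch file.

    Assumes the pickled-dict layout of the python version of the dataset.
    """
    with open(batch_path, "rb") as f:
        batch = pickle.load(f, encoding="bytes")
    return list(batch[b"labels"])


def class_counts(labels):
    """Return {class name: number of images} for a list of integer labels."""
    counts = Counter(labels)
    return {name: counts.get(i, 0) for i, name in enumerate(CLASS_NAMES)}
```

For example, class_counts(load_labels("cifar-10-batches-py/test_batch")) would report the per-class counts for the test batch, assuming the archive was extracted to that (hypothetical) path.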

There are serious concerns about the 80 Million Tiny Images dataset, which was collected by using nouns from WordNet as search terms. Its creators withdrew the dataset in June 2020 (https://groups.csail.mit.edu/vision/TinyImages/) due to the presence of “derogatory terms as categories and offensive images”, a direct result of using WordNet to drive the image collection. WordNet is a database of English words, so it contains both ordinary and offensive terms, and both kinds shaped what ended up in the 80 Million Tiny Images dataset.

Because CIFAR-10 is drawn from the 80 Million Tiny Images dataset, there is a real concern that biases and sensitive imagery could propagate into it. That concern is softened by the fact that CIFAR-10 has already been screened and labelled into ten generally acceptable classes, so overtly offensive images are not our primary worry. We are, however, concerned about how our model may pick up and perpetuate biases when it makes classifications. For example, if people of a certain race are overrepresented in the images of one class, the model may learn an unfair association and classify an image based on race rather than on its main subject.
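One simple way to start looking for uneven behavior is to compare accuracy class by class instead of relying on a single overall number. The sketch below is an illustration only: per_class_accuracy is a hypothetical helper of ours, and it assumes integer class labels for both the ground truth and the model's predictions. It can only reveal gaps between the ten labelled classes; biases tied to attributes that CIFAR-10 does not label (such as the race of people who appear in an image) would need additional annotations to measure.

```python
from collections import defaultdict


def per_class_accuracy(y_true, y_pred):
    """Return {label: fraction of that label's examples predicted correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    # Only labels that appear in y_true get an entry, so no division by zero.
    return {label: correct[label] / total[label] for label in sorted(total)}
```

A class whose accuracy sits well below the others is a natural place to inspect misclassified images by hand.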

Otherwise, given that CIFAR-10 focuses on vehicles and animals, which are relatively benign subjects, and that each class contains a variety of images (old planes, new planes, jets, etc.), we believe that CIFAR-10 is representative enough for our purposes.