Our interest in studying the biases present in image searches for different professions stems from a personal experience we had while growing up. As elementary school students, we were first introduced to different occupations through a poster-making project. We would go through newspapers and magazines to find and cut out images of professionals, which we would then paste onto our posters.
Now that the internet is accessible even to elementary school students, they mostly rely on websites like Google Images and Pinterest to find images for their projects. However, we have noticed that the images in these searches are often skewed toward a particular gender and/or ethnicity. For instance, a young girl of color who is interested in engineering might be discouraged from pursuing the field when she finds that most of the images she sees online are of white men. This can have a lasting impact on her and influence her future career choices.
What it does
The pipeline we built consists of two modules: a face detection model and a gender and ethnicity prediction model. The input is any set of images (for example, the results for one search query), and the output is the gender and ethnicity breakdown of the faces found in those images.
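The two-module structure can be sketched as follows. This is a minimal illustration, not the project's actual code: `detect_faces` and `predict_attributes` are hypothetical stand-ins for the face detection model and the gender and ethnicity model, and the breakdown is assumed to be computed per detected face rather than per image.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical interfaces (not the project's real API):
#   detect_faces(image)      -> list of face crops found in that image
#   predict_attributes(face) -> (gender_label, ethnicity_label)
FaceDetector = Callable[[object], List[object]]
AttributePredictor = Callable[[object], Tuple[str, str]]


def to_percentages(counts: Counter) -> Dict[str, float]:
    """Convert raw label counts into percentages of all detected faces."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


def run_pipeline(
    images: Iterable[object],
    detect_faces: FaceDetector,
    predict_attributes: AttributePredictor,
) -> Dict[str, Dict[str, float]]:
    """Chain face detection and attribute prediction, then aggregate."""
    genders: Counter = Counter()
    ethnicities: Counter = Counter()
    for image in images:
        for face in detect_faces(image):                  # module 1: find faces
            gender, ethnicity = predict_attributes(face)  # module 2: classify each face
            genders[gender] += 1
            ethnicities[ethnicity] += 1
    return {"gender": to_percentages(genders), "ethnicity": to_percentages(ethnicities)}


if __name__ == "__main__":
    # Toy stand-ins: each "image" is just a list of already-labelled faces,
    # so the detector returns the list and the predictor returns the label pair.
    toy_images = [[("male", "white")], [("female", "asian"), ("male", "white")], []]
    print(run_pipeline(toy_images, detect_faces=lambda img: img, predict_attributes=lambda face: face))
```

Passing the two models in as plain callables keeps the aggregation logic independent of whichever detector or Keras classifier is plugged in, and makes it easy to test with stubs like the ones above.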
How we built it
We scraped the web and stored roughly the first 100 image search results for several of the most common neutral, profession-related queries. These images are then fed into a pre-trained face detection model, which crops out each face and saves it to a separate folder. The cropped faces are then passed through the ethnicity and gender predictor, which outputs the distribution of ethnicity and gender across the dataset.
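The cropping step can be illustrated with a small NumPy sketch. The writeup does not name the face detector, so this assumes boxes arrive as `(x, y, width, height)` pixel tuples, the format OpenCV's Haar cascade `detectMultiScale` returns; `crop_faces` and its `margin` parameter are hypothetical. Clamping matters because a negative start index in a NumPy slice counts from the end of the array, so a box that spills past the top or left edge would otherwise produce a wrong or empty crop.

```python
from typing import Iterable, List, Tuple

import numpy as np


def crop_faces(
    image: np.ndarray,
    boxes: Iterable[Tuple[int, int, int, int]],
    margin: float = 0.0,
) -> List[np.ndarray]:
    """Return one crop per (x, y, w, h) box, clipped to the image bounds.

    image:  H x W x C array (e.g. np.asarray(PIL.Image.open(path)))
    boxes:  face boxes in pixels, assumed (x, y, width, height)
    margin: hypothetical option; pads each side by this fraction of the box size
    """
    height, width = image.shape[:2]
    crops = []
    for x, y, w, h in boxes:
        pad_x, pad_y = int(w * margin), int(h * margin)
        # Clamp to [0, width] x [0, height] so negative coordinates never wrap around.
        x0, y0 = max(0, x - pad_x), max(0, y - pad_y)
        x1, y1 = min(width, x + w + pad_x), min(height, y + h + pad_y)
        if x1 > x0 and y1 > y0:  # skip boxes that lie entirely outside the image
            crops.append(image[y0:y1, x0:x1].copy())
    return crops


if __name__ == "__main__":
    img = np.zeros((100, 80, 3), dtype=np.uint8)  # blank 100x80 RGB test image
    faces = crop_faces(img, [(10, 20, 30, 40), (-5, -5, 20, 20), (200, 200, 10, 10)])
    print([f.shape for f in faces])  # → [(40, 30, 3), (15, 15, 3)]
```

Each crop could then be written to the face folder, for example with Pillow's `Image.fromarray(crop).save(path)`.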
Challenges we ran into
A big challenge was collecting the data and cleaning the datasets to remove irrelevant images, such as logos or pictures of stethoscopes, which had to be done manually.
Accomplishments that we're proud of
By the end of this study, we had found some very interesting statistics. Even today, specific ethnicities and genders are underrepresented in search results. We are proud that we were able to find measurable bias in the data.
What we learned
Through this project, we learned skills such as web scraping and using Keras to deploy deep learning models. Beyond the technical skills, we also learned that we have a long way to go before the data on the internet is truly unbiased.
What's next for Uncovering Bias in Visual Media
Moving forward, there is a lot of scope for this project. Our initial study analyzes only a very small number of queries. Analyzing a wider variety of profession-related queries, as well as queries on other topics, would give a better picture of how biased online data is today.