We investigated racial bias in the American media’s reporting of gun-violence-related incidents by analyzing Professor Ellie Pavlick’s gun violence database: http://gun-violence.org/. We trained three RNN models on cleaned and tokenized data from the database. For the model trained on incidents involving both African-American and Caucasian individuals, we compared cosine similarities between vectors in its word-embedding matrix to try to surface implicit bias in the data. We also generated two blocks of text, one from the model trained only on incidents involving African-American individuals and one from the model trained only on incidents involving Caucasian individuals, and ran pre-trained sentiment analysis on the generated text to provide further insight into how media reporting of these incidents differs by race.

We arrived at this topic through our observation that the media tends to portray people of color more negatively than Caucasian people in similar instances. Given the growing body of research on racial bias in natural language processing models, we resolved to look at data these models might have been trained on and at how the language used in those datasets may be biased. We focused on gun-violence-related incidents specifically because of Kunal’s involvement with prison education and his interactions with people more directly affected by the loaded language used in the media. Incarcerated individuals expressed that they felt their treatment within the system was dictated by their race.

While looking at datasets containing information on incarcerated individuals, we came across Brown Professor Ellie Pavlick’s Gun Violence Database. We decided to use it because of its extensive curation of news articles about gun violence and its tabulation of their titles, bodies, races involved, genders involved, locations, etc. Additionally, given that Professor Pavlick teaches at Brown, we thought that working with her dataset would give us another resource if we ran into questions or issues with the data or analysis.

In addition to these experiments with RNN models, we had hoped to train GPT-2 and BERT models on our data and run the same cosine similarity and sentiment analyses on them. Due to time constraints, we were unable to get these models to work, but we plan to continue working on them to gain more insight into the data, since they have more complex architectures than our own RNN models.
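To make the embedding comparison described above more concrete, here is a minimal sketch of how cosine similarities between embedding vectors could be compared. It is not the code from our repository: the function names (`cosine_similarity`, `association_gap`), the placeholder vocabulary, and the two-dimensional vectors are all invented for illustration, and a real run would read the vectors from the trained model's embedding matrix instead.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def association_gap(embeddings, target, group_a, group_b):
    """Mean similarity of `target` to group_a words minus that to group_b words.

    A positive value means `target` sits closer to group_a in embedding space.
    """
    t = embeddings[target]
    mean_a = sum(cosine_similarity(t, embeddings[w]) for w in group_a) / len(group_a)
    mean_b = sum(cosine_similarity(t, embeddings[w]) for w in group_b) / len(group_b)
    return mean_a - mean_b


# Hypothetical toy vectors: NOT values from our trained models.
toy_embeddings = {
    "term_x": [1.0, 0.0],
    "group_a_word": [1.0, 0.0],
    "group_b_word": [0.0, 1.0],
}

gap = association_gap(toy_embeddings, "term_x", ["group_a_word"], ["group_b_word"])
print(gap)
```

In practice, `group_a` and `group_b` would be lists of words associated with each group, and `target` would range over descriptive terms of interest, so that the sign and size of the gap indicate which group a term is embedded closer to.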

Final Reflection: https://github.com/kunalhandaUCSB/gun-violence_nlp/blob/main/CS1470%20Final%20Project%20Final%20Reflection.pdf

Second Check-In: https://github.com/kunalhandaUCSB/gun-violence_nlp/blob/main/CS1470%20Final%20Project%20Check-in%202%20Reflection.pdf

Project Outline: https://github.com/kunalhandaUCSB/gun-violence_nlp/blob/main/CSCI%201470%20Final%20Project%20Outline.pdf

GitHub Repo: https://github.com/kunalhandaUCSB/gun-violence_nlp (all documents linked on this page are also available in the repository)
