Inspiration
Recent events in the US have only brought to the forefront, a long-standing racial bias pervading law enforcement. We sought to analyze the historical data of arrests made in 2 major cities, NYC and Chicago, over the past 3 years that might be used by AI developers for potential crime prediction. This data has been provided by government websites and hence is more likely to be widely used without concern for veracity. Furthermore, since most crimes are committed due to financial distress, we try to find a correlation between the arrest statistics and the median household income in the neighborhoods they occur in.
What it does
- Our analysis reveals a clear skew in the arrest data towards people of African American and Hispanic origins, despite the ratio of their population being much lesser. Were there to be a predictive model based on this data to predict the likelihood of crime using perpetrator race information as a feature, it would be highly disadvantageous to these communities. While comparing income, it was seen that on average, areas with a lower median income had higher arrest counts. We also observed a trend of lower median income with an increasing African-American population. This could have stemmed from the systemic racial bias that hindered their employment opportunities and gatekept them from living in more affluent neighborhoods. We conclude that predictive modeling for crime prevention cannot be unbiased in nature if solely based on historical data due to the deep-rooted biases that are reflected in any information collected from the past.
How we built it
- We found the data for crime stats in NYC from here and for Chicago from here. We retrieved demographic and income information for both cities on the US Census website. Pandas, Tableau, and OpenRefine were used for data gathering, cleaning, and analysis.
Challenges we ran into
- Finding the datasets and performing data cleaning
- Enormous dataset size (3M+ arrests in NYC the past 3 years alone) and lack of computing resources to process it
- Framing the problem statement
- Time constraints prevented us from being able to create our own unbiased predictive models.
- The datasets we had lacked the zip codes required to link the crime dataset with the demographic dataset. Therefore, we had to use latitude and longitude to hit the OpenStreetMaps API and retrieve the location's zip code.
Accomplishments that we're proud of
- This is the first hackathon for the 3 of us and completing the submission before the deadline was definitely an accomplishment.
- Exploring the intersection of AI and justice has been a novel and motivating experience for all of us.
- We were able to successfully find correlations between the median income of an area and its incarceration rate.
- In addition, we also concluded that certain races are more prone to incarceration due to the historical disadvantages they have faced.
What we learned
- Using alternative tools for EDA like OpenRefine and Tableau.
What's next for TBD
- Developing a streamlit-based interface to make the bias in such publicly available datasets easily visualizable to all developers
Log in or sign up for Devpost to join the conversation.